LexiVault workshop (led by Samantha Wray, the LexiVault Lead Developer)
LexiVault is an open-source, user-friendly web tool, developed as part of the SAVANT project, for querying annotated lexicons. It has been developed primarily for, but is not restricted to, supporting psycholinguistic research on low-resource languages. Psycholinguistically relevant measures, from word frequency to phonological neighborhood density, are readily available for well-resourced languages. For lesser-studied languages, however, researchers face substantial overhead building corpora and calculating these measures from scratch. LexiVault aims to close that gap.
Currently the tool hosts lexicons of Tagalog, Bangla, and multiple Arabic dialects, with searchable annotations including part-of-speech tags, morpheme frequency, transition probability, and more. We'd like to expand our offerings while helping you convert your bits and bobs of language data into a usable, shareable resource! This workshop is intended for anyone with corpus or behavioral data, in any amount, that they would like to process or annotate further for storage and use on the LexiVault site.
The focus of this two-day workshop will vary from participant to participant, depending on the starting state of your dataset and your interests, but could take the following forms:
- Automatic transcription of auditory data to create a text corpus from speech
- Stemming a text corpus to create a list of morphemes and their frequencies
- Part-of-speech tagging a text corpus
- Calculating minimal pairs and phonological neighborhood density from a text corpus

And finally, all paths lead to your resource being in a form that you (and others!) can easily query in the future.
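Definitions of these two measures vary across studies. The following is a minimal sketch that assumes the corpus has already been transcribed into phoneme sequences. It adopts a common definition of phonological neighbors: words differing by exactly one phoneme substitution, insertion, or deletion. A minimal pair is treated as two words of the same length that differ by a single substituted phoneme. The function names are hypothetical, and this is not LexiVault's own implementation.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Hypothetical helpers for illustration; not LexiVault's API.
Word = Tuple[str, ...]  # a word as a sequence of phoneme symbols


def one_edit_apart(a: Word, b: Word) -> bool:
    """True if a and b differ by exactly one substitution, insertion, or deletion."""
    if a == b:
        return False
    if len(a) == len(b):
        # Same length: exactly one mismatched position (a substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    # Make `a` the shorter word, then check whether deleting one phoneme from `b` yields `a`.
    if len(a) > len(b):
        a, b = b, a
    return any(b[:i] + b[i + 1:] == a for i in range(len(b)))


def is_minimal_pair(a: Word, b: Word) -> bool:
    """Same length and differing in exactly one phoneme position."""
    return len(a) == len(b) and one_edit_apart(a, b)


def neighborhood_density(lexicon: Sequence[Word]) -> Dict[Word, int]:
    """For each distinct word, count the other lexicon entries one edit away."""
    unique = sorted(set(lexicon))
    density = {w: 0 for w in unique}
    for a, b in combinations(unique, 2):  # brute-force pairwise comparison
        if one_edit_apart(a, b):
            density[a] += 1
            density[b] += 1
    return density


def minimal_pairs(lexicon: Sequence[Word]) -> List[Tuple[Word, Word]]:
    """List every minimal pair among the distinct words of the lexicon."""
    unique = sorted(set(lexicon))
    return [(a, b) for a, b in combinations(unique, 2) if is_minimal_pair(a, b)]


# Toy, made-up transcriptions for illustration only.
lexicon = [("k", "a", "t"), ("b", "a", "t"), ("k", "a", "p"), ("a", "t"), ("k", "a", "t", "s")]
print(neighborhood_density(lexicon)[("k", "a", "t")])  # 4: bat, kap, at, kats
print(minimal_pairs(lexicon))  # [(('b','a','t'), ('k','a','t')), (('k','a','p'), ('k','a','t'))]
```

The pairwise comparison is quadratic in lexicon size, which is fine for a small toy lexicon. A full corpus-derived lexicon would likely need an index, for example one keyed on single-deletion variants of each word.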