Corpora and AI Tools for Low-Resource Languages

Our team’s goal is to expand resources for linguistic corpora, especially for low-resource languages and varieties. This includes the collection of spoken or textual linguistic data from under-represented varieties, the use of AI to facilitate data transcription and annotation, the creation of accessible resources and tools for using these corpora, and the linguistic analysis of these varieties to better understand the structure and function of language. Team members will be involved in every step of this process to create more representative corpora for knowledge generation and public use.

This team is recruiting until June 1, 2026.

Team Leads: Chad Howe (Linguistics), Tianming Liu (Computer Science), Jon Forrest (Linguistics), Ryan Ka Yau Lai (Linguistics), Keith Langston (Linguistics)