
Corpus phonetics and speech technology infrastructure

Project: Research

Project Details

Description

Speech technology, including artificial intelligence (AI) trained on speech data, performs poorly in cases where little or no recorded audio data exists to train the required AI models. Building better speech technology in these cases requires creating collections of speech materials and their transcriptions. However, transcription is immensely time-consuming without the assistance of existing AI technologies. This project builds a high-quality speech data set to enable phonetics and phonology research for several low-data languages, and to model an approach to ease the “transcription bottleneck” assisted by techniques in AI and natural language processing (NLP). The project jointly engages the expert perspectives of users of target languages, linguists, and computer scientists, and establishes an infrastructure for collaborative, computationally mediated language work. Other benefits to society include bridging laboratory-style research and real-world applications and providing innovative educational opportunities for trainees.

This project builds a 60-hour corpus of naturalistic and read speech data recorded in the field, suitable for both AI/NLP applications and research in acoustic phonetics and phonology. Unsupervised or weakly supervised machine learning techniques are used to semi-automatically transcribe and annotate a portion of the speech corpus. This transcription and annotation process uses a novel human-in-the-loop approach making direct use of expert speaker inputs: transcripts produced for recorded audio by pretrained language models are corrected by trained language experts. These adjusted annotations are incorporated into subsequent rounds of model training and fine-tuning to further increase the accuracy of outputs. The target languages exhibit several unusual phonetic and phonological features that form the basis for exploratory phonetic and phonological research, such as complex lexical tone, stem-initial prominence with unclear acoustic correlates, vowels with consonant-like constriction features, and variable external sandhi processes. The speech corpus, annotations, and language models are available as a starting point for linguistics, NLP, and AI work on related languages with translational impact.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
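
The human-in-the-loop cycle described above (draft transcripts from a model, correction by language experts, corrected transcripts fed back into training) can be sketched roughly as follows. This is only an illustrative outline, not the project's actual pipeline: the function names `human_in_the_loop`, `transcribe`, `correct`, and `fine_tune` are hypothetical placeholders, and word error rate is assumed here as one possible way to track how much the experts had to change in each round.

```python
from typing import Callable, List, Sequence, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return 0.0 if not hyp else 1.0
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def human_in_the_loop(
    batches: Sequence[Sequence[str]],
    transcribe: Callable[[str], str],                       # hypothetical model call
    correct: Callable[[str, str], str],                     # hypothetical expert step
    fine_tune: Callable[[List[Tuple[str, str]]], None],     # hypothetical training step
) -> Tuple[List[Tuple[str, str]], List[float]]:
    """Run one correction round per batch and return (training pairs, mean WER per round)."""
    training_pairs: List[Tuple[str, str]] = []
    wer_per_round: List[float] = []
    for batch in batches:
        if not batch:
            continue  # nothing to transcribe in this round
        round_wers = []
        for clip_id in batch:
            draft = transcribe(clip_id)        # model proposes a transcript
            fixed = correct(clip_id, draft)    # expert corrects the draft
            round_wers.append(word_error_rate(fixed, draft))
            training_pairs.append((clip_id, fixed))
        fine_tune(training_pairs)              # corrected data feeds the next round
        wer_per_round.append(sum(round_wers) / len(round_wers))
    return training_pairs, wer_per_round
```

In this sketch, a falling mean word error rate across rounds would indicate that the model's drafts need less expert correction over time; the real system would substitute an actual speech model and annotation workflow for the three callables.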
Status: Active
Effective start/end date: 09/1/25 to 08/31/27

Funding

  • National Science Foundation: $101,674.00
