Audio to phone transcription – UROP Spring Symposium 2021

Jingze Li


Pronouns: He/him

Research Mentor(s): San Duanmu, Professor
Research Mentor School/College/Department: Linguistics, College of Literature, Science, and the Arts
Presentation Date: Thursday, April 22, 2021
Session: Session 4 (2pm-2:50pm)
Breakout Room: Room 15
Presenter: 7


The goal of this project is to find a method for locating the boundaries of each sound (consonant or vowel) in a stretch of speech and labeling each sound with the International Phonetic Alphabet (IPA). Commercially available tools can convert audio to words for widely used languages such as English and Spanish, but no such tool exists for resource-poor languages, many of which do not even have an orthography. Documenting endangered languages before they are lost forever is an urgent task, one that benefits both the people who speak them and researchers. The most time-consuming and difficult step in documenting a language is converting its recordings into transcriptions. In current practice, an unanalyzed language is still transcribed by hand, and each hour of recording takes an experienced linguist 100 hours to transcribe. We believe automatic transcription will save time and free linguists from this laborious work. Throughout the year we worked on identifying acoustic cues, practicing manual transcription, and understanding the signals associated with different features. Currently, I am working on a Japanese audio file and have been able to identify boundaries for major classes of sounds with about 50% accuracy. If time allows, we hope to distinguish among different consonants, nasals, and vowels and to identify the most effective cues for doing so, such as the intensity and sharpness of sounds.
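
To illustrate how an intensity cue could mark candidate sound boundaries, the following is a minimal Python sketch. It is not the method used in this project. The function names, the 10 ms frame length, and the 6 dB jump threshold are all assumptions chosen for illustration. It computes a short-time intensity level for each frame and reports the times at which the level changes sharply between adjacent frames.

```python
import math


def frame_intensity_db(samples, frame_len):
    """Return a relative intensity level (dB) for each non-overlapping frame."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        mean_sq = sum(x * x for x in frame) / frame_len
        # A small floor avoids log10(0) on silent frames.
        levels.append(10.0 * math.log10(mean_sq + 1e-12))
    return levels


def candidate_boundaries(samples, sample_rate, frame_ms=10.0, jump_db=6.0):
    """Return times (seconds) where intensity jumps by at least jump_db.

    frame_ms and jump_db are illustrative defaults, not tuned values.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000.0))
    levels = frame_intensity_db(samples, frame_len)
    boundaries = []
    for i in range(1, len(levels)):
        if abs(levels[i] - levels[i - 1]) >= jump_db:
            # The boundary is placed at the start of the later frame.
            boundaries.append(i * frame_len / sample_rate)
    return boundaries


# Synthetic demo: a quiet tone, then a loud tone, then a quiet tone again.
rate = 8000
quiet = [0.01 * math.sin(2 * math.pi * 200 * n / rate) for n in range(800)]
loud = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(800)]
demo_boundaries = candidate_boundaries(quiet + loud + quiet, rate)
print(demo_boundaries)
```

On the synthetic signal, the two reported times fall at the transitions between the quiet and loud segments. Real speech would need overlapping frames, smoothing, and additional cues, since intensity alone does not separate many adjacent sounds.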

Authors: Jingze Li, San Duanmu
Research Method: Qualitative Study
