Transcription methods for consistency, volume and efficiency Meghan Lammie Glenn, Stephanie M. Strassel, Haejoong Lee, Kazuaki Maeda, Ramez Zakhary, Xuansong Li
Outline • Introduction • Manual transcription overview • Approaches, genres, languages • Inter-transcriber consistency analysis • Results • Discussion • Conclusions and future work
Introduction • Linguistic Data Consortium supports language-related education, research and technology development • Programs that call for manual transcription include DARPA GALE, Phanotics, and NIST LRE, SRE, and RT • Transcription is a core component of many HLT research tasks, such as machine translation • Manual transcription efforts have been undertaken in many languages, including Chinese, English, Modern Standard Arabic, Arabic dialects, Pashto, Urdu, Farsi, Korean, Thai, Russian, and Spanish • LDC makes recommendations about transcription approaches for individual projects by balancing efficiency, cost, and program needs • Evaluation vs. training data? Genre? Timeline? • The current consistency study is informative for transcription teams • Could be used to establish baseline human performance for each task
Transcription spectrum overview • All manual transcripts share the same core elements • Time alignment at some level of granularity • Speaker identification • Transcript • Created in XTrans, LDC’s in-house transcription tool • Transcription methodologies target a range of data needs: • Small volumes (2 hours) vs. large volumes (thousands of hours) • Quick content transcript vs. meticulous transcription of all speaker utterances or noises • Auto time alignment vs. manual word- or phoneme-level alignment
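The shared core elements listed above (time alignment, speaker identification, and the transcribed text) can be modeled as a list of timestamped segments. This is an illustrative sketch only, not XTrans's actual file format; the segment fields, speaker labels, and example text are assumptions for demonstration.

```python
# Minimal model of a manual transcript's core elements: each segment
# carries a time span, a speaker label, and the transcribed words.
# Illustrative only -- XTrans's real output has additional fields.
from dataclasses import dataclass


@dataclass
class Segment:
    start: float   # segment start time, in seconds
    end: float     # segment end time, in seconds
    speaker: str   # speaker label, e.g. "speaker1"
    text: str      # transcribed words for this segment


# A toy two-turn transcript (hypothetical content).
transcript = [
    Segment(0.00, 2.35, "speaker1", "good evening and welcome"),
    Segment(2.35, 5.10, "speaker2", "thank you it's great to be here"),
]
```

Holding all three elements in one record is what makes the pairwise comparisons described later possible: two independent transcripts of the same audio can share identical time alignment while differing only in their text.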
Transcription consistency studies • Design overview • Pair-wise comparison of independent dual transcripts • Identical time-alignment • Scored with NIST’s SCLITE toolkit (Fiscus, 2006) • For English subset, differences adjudicated with in-house adjudication tool • Background study: EARS RT03 • English broadcast news (BN) and conversational telephone speech (CTS) • Careful transcription approach • All discrepancies adjudicated
Results of RT03 study • Error is expressed as “Word Disagreement Rate,” though it is derived in the same way as Word Error Rate • Not all transcription “errors” are truly mistakes, so “disagreement” is a more accurate term (Strassel, 2004)
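Since Word Disagreement Rate is derived the same way as Word Error Rate, it can be computed by minimum-edit-distance alignment of the two transcripts, then dividing the edit count by the reference length. The following is a minimal sketch of that computation, not the NIST SCLITE implementation used in the study.

```python
def word_disagreement_rate(ref: str, hyp: str) -> float:
    """Word-level edit distance between two transcripts, normalized
    by the length of the reference (same formula as word error rate)."""
    r, h = ref.split(), hyp.split()
    # Dynamic-programming table for Levenshtein distance over words.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(r)][len(h)] / len(r)


# One inserted word against a three-word reference: 1/3 disagreement.
rate = word_disagreement_rate("the cat sat", "the cat sat down")
```

In a dual-transcription setting neither transcript is privileged, so the choice of which one serves as the “reference” is arbitrary; this is one reason “disagreement” is a better label than “error.”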
Current consistency study • Basic assumptions • Same as in the RT03 study • For the purposes of the current study, LDC ignored stylistic differences such as capitalization or punctuation • A subset of English transcripts was further analyzed using LDC’s in-house adjudication tool • Approximately 65% of the differences across all the English quick transcripts were labeled insignificant, 20% were labeled judgment calls, and 15% were labeled transcriber errors
Genre effect • Spontaneous, conversational genres show lower agreement overall than broadcast news • BN is often highly scripted • Limited overlapping speech • Unplanned speech is hard! • Overlapping speech or cross-talk • Background noise • Fast speech • More regions of disfluency • Dialect or accent • The meeting, interview and telephone domains also add • Jargon about specific topics • Challenging acoustic conditions
Disfluencies • Regions of disfluency are by far the most prevalent contributors to transcriber disagreement • Hesitation sounds • Stammering • Restarts
Dialect • Arabic transcription typically targets Modern Standard Arabic (MSA) • Dialect poses a particular challenge in transcribing Arabic conversations • Real data contains significant volumes of dialectal Arabic, especially in the broadcast conversation domain • Transcribers may differ in their rendering of non-MSA regions • In the following examples, non-MSA regions are underlined • Discrepancies are highlighted
Conclusions and future work • Cross-language, cross-genre inter-annotator analysis showed agreement in the 90-100% range • Transcripts of planned speech are generally more consistent than those of spontaneous speech • Careful transcription methods yield higher agreement than quick transcription methods • Agreement is strongly conditioned by genre and language • When choosing a transcription approach for a project, LDC must balance efficiency, consistency, and cost • Further investigation of the Chinese results is needed • Future studies on larger datasets • Examination of sample selection • The data discussed here will ultimately be compiled into a single corpus and distributed via the LDC catalog
Acknowledgments • Many thanks to the LDC transcription team for their hard work and analysis for the 2010 consistency study • Thanks to Jonathan Fiscus for his guidance in running SCLITE • This work was supported in part by the Defense Advanced Research Projects Agency, GALE Program Grant No. HR0011-06-1-0003. The content of this paper does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.