Transcription methods for consistency, volume and efficiency Meghan Lammie Glenn, Stephanie M. Strassel, Haejoong Lee, Kazuaki Maeda, Ramez Zakhary, Xuansong Li
Outline • Introduction • Manual transcription overview • Approaches, genres, languages • Inter-transcriber consistency analysis • Results • Discussion • Conclusions and future work
Introduction • Linguistic Data Consortium supports language-related education, research and technology development • Programs that call for manual transcription include DARPA GALE, Phanotics, and NIST LRE, SRE, and RT • Transcription is a core component of many HLT research tasks, such as machine translation • Manual transcription efforts have been undertaken in many languages, including Chinese, English, Modern Standard Arabic, Arabic dialects, Pashto, Urdu, Farsi, Korean, Thai, Russian, and Spanish • LDC makes recommendations about transcription approaches for individual projects by balancing efficiency, cost, and program needs • Evaluation vs. training data? Genre? Timeline? • The current consistency study is informative for transcription teams • Could be used to establish baseline human performance for each task
Transcription spectrum overview • All manual transcripts share the same core elements • Time alignment at some level of granularity • Speaker identification • Transcript • Created in XTrans, LDC’s in-house transcription tool • Transcription methodologies target a range of data needs: • Small volumes (2 hours) vs. large volumes (thousands of hours) • Quick content transcript vs. meticulous transcription of all speaker utterances or noises • Auto time alignment vs. manual word- or phoneme-level alignment
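The shared core elements listed above (time alignment, speaker identification, and the transcribed text) can be modeled as a list of timestamped segments. This is an illustrative sketch only, not XTrans's actual file format; the segment fields, speaker labels, and example text are assumptions for demonstration.

```python
# Minimal model of a manual transcript's core elements: each segment
# carries a time span, a speaker label, and the transcribed words.
# Illustrative only -- XTrans's real output has additional fields.
from dataclasses import dataclass


@dataclass
class Segment:
    start: float   # segment start time, in seconds
    end: float     # segment end time, in seconds
    speaker: str   # speaker label, e.g. "speaker1"
    text: str      # transcribed words for this segment


# A toy two-turn transcript (hypothetical content).
transcript = [
    Segment(0.00, 2.35, "speaker1", "good evening and welcome"),
    Segment(2.35, 5.10, "speaker2", "thank you it's great to be here"),
]
```

Holding all three elements in one record is what makes the pairwise comparisons described later possible: two independent transcripts of the same audio can share identical time alignment while differing only in their text.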
Transcription consistency studies • Design overview • Pair-wise comparison of independent dual transcripts • Identical time-alignment • Scored with NIST’s SCLITE toolkit (Fiscus, 2006) • For English subset, differences adjudicated with in-house adjudication tool • Background study: EARS RT03 • English broadcast news (BN) and conversational telephone speech (CTS) • Careful transcription approach • All discrepancies adjudicated
Results of RT03 study • Error is expressed as “Word Disagreement Rate,” though it is derived in the same way as Word Error Rate • Not all transcription “errors” are truly mistakes, so “disagreement” is a more accurate term (Strassel, 2004)
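Since Word Disagreement Rate is derived the same way as Word Error Rate, it can be computed by minimum-edit-distance alignment of the two transcripts, then dividing the edit count by the reference length. The following is a minimal sketch of that computation, not the NIST SCLITE implementation used in the study.

```python
def word_disagreement_rate(ref: str, hyp: str) -> float:
    """Word-level edit distance between two transcripts, normalized
    by the length of the reference (same formula as word error rate)."""
    r, h = ref.split(), hyp.split()
    # Dynamic-programming table for Levenshtein distance over words.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(r)][len(h)] / len(r)


# One inserted word against a three-word reference: 1/3 disagreement.
rate = word_disagreement_rate("the cat sat", "the cat sat down")
```

In a dual-transcription setting neither transcript is privileged, so the choice of which one serves as the “reference” is arbitrary; this is one reason “disagreement” is a better label than “error.”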
Current consistency study • Basic assumptions • Same as in the RT03 study • For the purposes of the current study, LDC ignored stylistic differences such as capitalization or punctuation • A subset of English transcripts was further analyzed using LDC’s in-house adjudication tool • Approximately 65% of the differences across all the English quick transcripts were labeled insignificant, 20% were labeled judgment calls, and 15% were labeled transcriber errors
Genre effect • Spontaneous, conversational genres show lower agreement overall than broadcast news • BN is often highly scripted • Limited overlapping speech • Unplanned speech is hard! • Overlapping speech or cross-talk • Background noise • Fast speech • More regions of disfluency • Dialect or accent • The meeting, interview and telephone domains also add • Jargon about specific topics • Challenging acoustic conditions
Disfluencies • Regions of disfluency are by far the most prevalent contributors to transcriber disagreement • Hesitation sounds • Stammering • Restarts
Dialect • Arabic transcription typically targets Modern Standard Arabic (MSA) • Dialect poses a particular challenge in transcribing Arabic conversations • Real data contains significant volumes of dialectal Arabic, especially in the broadcast conversation domain • Transcribers may differ in their rendering of non-MSA regions • In the following examples, non-MSA regions are underlined • Discrepancies are highlighted
Conclusions and future work • Cross-language, cross-genre inter-annotator analysis showed agreement in the 90-100% range • Transcripts of planned speech are generally more consistent than those of spontaneous speech • Careful transcription methods yield higher agreement than quick transcription methods • Agreement is strongly conditioned by genre and language • When choosing a transcription approach for a project, LDC must balance efficiency, consistency, and cost • Further investigation of the Chinese results is needed • Future studies on larger datasets • Examination of sample selection • The data discussed here will ultimately be compiled into a single corpus and distributed via the LDC catalog
Acknowledgments • Many thanks to the LDC transcription team for their hard work and analysis for the 2010 consistency study • Thanks to Jonathan Fiscus for his guidance in running SCLITE • This work was supported in part by the Defense Advanced Research Projects Agency, GALE Program Grant No. HR0011-06-1-0003. The content of this paper does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.