Transcription methods for consistency, volume and efficiency



  1. Transcription methods for consistency, volume and efficiency Meghan Lammie Glenn, Stephanie M. Strassel, Haejoong Lee, Kazuaki Maeda, Ramez Zakhary, Xuansong Li

  2. Outline • Introduction • Manual transcription overview • Approaches, genres, languages • Inter-transcriber consistency analysis • Results • Discussion • Conclusions and future work

  3. Introduction • Linguistic Data Consortium supports language-related education, research and technology development • Programs that call for manual transcription include DARPA GALE, Phanotics, and NIST LRE, SRE, and RT • Transcription is a core component of many HLT research tasks, such as machine translation • Manual transcription efforts have been undertaken in many languages, including Chinese, English, Modern Standard Arabic, Arabic dialects, Pashto, Urdu, Farsi, Korean, Thai, Russian, and Spanish • LDC makes recommendations about transcription approaches for individual projects by balancing efficiency, cost, and program needs • Evaluation vs. training data? Genre? Timeline? • The current consistency study is informative for transcription teams • Could be used to establish baseline human performance for each task

  4. Transcription spectrum overview • All manual transcripts share the same core elements • Time alignment at some level of granularity • Speaker identification • Transcript • Created in XTrans, LDC’s in-house transcription tool • Transcription methodologies target a range of data needs: • Small volumes (2 hours) vs. large volumes (thousands of hours) • Quick content transcript vs. meticulous transcription of all speaker utterances or noises • Auto time alignment vs. manual word- or phoneme-level alignment
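To make the shared core elements above concrete, here is a minimal sketch of how a time-aligned, speaker-attributed transcript segment might be represented. It only illustrates the three elements named on the slide (time alignment, speaker identification, transcript text); the class name, field names, and list-of-segments layout are assumptions for the example, not XTrans's actual data format.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """One time-aligned, speaker-attributed unit of a transcript."""
    start: float   # segment start time in seconds
    end: float     # segment end time in seconds
    speaker: str   # speaker label assigned by the transcriber
    text: str      # the words transcribed for this segment

# A transcript is then just an ordered list of segments; quick methods
# use coarser segments, while careful transcription segments more finely.
transcript = [
    Segment(0.00, 2.35, "speaker1", "good morning and welcome to the program"),
    Segment(2.35, 4.10, "speaker2", "thank you it's great to be here"),
]
```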

  5. Quick transcription

  6. QTR example

  7. Quick-rich transcription

  8. QRTR example

  9. Careful transcription

  10. CTR example

  11. Comparison of methods

  12. Transcription consistency studies • Design overview • Pair-wise comparison of independent dual transcripts • Identical time-alignment • Scored with NIST’s SCLITE toolkit (Fiscus, 2006) • For English subset, differences adjudicated with in-house adjudication tool • Background study: EARS RT03 • English broadcast news (BN) and conversational telephone speech (CTS) • Careful transcription approach • All discrepancies adjudicated

  13. Results of RT03 study • Error expressed in terms of “Word Disagreement Rate”, though derived from Word Error Rate • Not all transcription “errors” are truly mistakes, so “disagreement” is a more accurate term. (Strassel, 2004)
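As a rough illustration of the metric (not the SCLITE implementation itself), the sketch below computes a word disagreement rate the same way word error rate is computed: the minimum number of substitutions, insertions, and deletions needed to align one transcriber's word sequence to the other's, divided by the number of words in the transcript treated as the reference. The function name and toy sentences are invented for the example; in practice, stylistic differences such as capitalization and punctuation would be normalized before scoring, as described for the current study.

```python
def word_disagreement_rate(ref_words, hyp_words):
    """Minimum-edit-distance word disagreement rate between two transcripts."""
    n, m = len(ref_words), len(hyp_words)
    # d[i][j] = edit distance between ref_words[:i] and hyp_words[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[n][m] / n if n else 0.0

# Two transcribers' renderings of the same segment:
t1 = "i uh i think we should go".split()
t2 = "i think we should go".split()
print(word_disagreement_rate(t1, t2))  # 2 differences / 7 words ≈ 0.29
```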

  14. Overview of current study

  15. Current consistency study • Basic assumptions • Same as in the previous study • For the purposes of the current study, LDC ignored stylistic differences such as capitalization or punctuation • A subset of English transcripts was further analyzed using LDC’s in-house adjudication tool • Approximately 65% of the differences across all the English quick transcripts were labeled insignificant differences, 20% were labeled judgment calls, and 15% were labeled transcriber errors

  16. Results of 2010 study

  17. Results of 2010 study

  18. Genre effect • Spontaneous, conversational genres show lower agreement overall than broadcast news • BN is often heavily scripted • Limited overlapping speech • Unplanned speech is hard! • Overlapping speech or cross-talk • Background noise • Fast speech • More regions of disfluency • Dialect or accent • Meeting, interview and telephone domains also add • Jargon about specific topics • Challenging acoustic conditions

  19. Conversational style

  20. Disfluencies • Regions of disfluency are by far the most prevalent contributors to transcriber disagreement • Hesitation sounds • Stammering • Restarts

  21. Dialect • Arabic transcription typically targets Modern Standard Arabic (MSA) • Dialect poses a particular challenge in transcribing Arabic conversations • Real data contains significant volumes of dialectal Arabic, especially in the broadcast conversation domain • Transcribers may differ in their rendering of non-MSA regions • In the following examples, non-MSA regions are underlined • Discrepancies are highlighted

  22. Conclusions and future work • Cross-language, cross-genre inter-annotator analysis showed agreement in the 90-100% range • Transcripts of planned speech are generally more consistent than those of spontaneous speech • Careful transcription methods result in higher rates of agreement than quick transcription methods • Agreement is strongly conditioned by genre and language • When choosing a transcription approach for a project, LDC must balance efficiency, consistency, and cost • Further investigation of the Chinese results is needed • Future study on larger datasets • Examination of sample selection • The data discussed here will ultimately be compiled into one corpus and distributed via the LDC catalog

  23. Acknowledgments • Many thanks to the LDC transcription team for their hard work and analysis for the 2010 consistency study • Thanks to Jonathan Fiscus for his guidance in running SCLITE • This work was supported in part by the Defense Advanced Research Projects Agency, GALE Program Grant No. HR0011-06-1-0003. The content of this paper does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.
