


Presentation Transcript


  1. Linguistic Resources for Meeting Recognition
     Meghan Glenn, Stephanie Strassel
     Linguistic Data Consortium
     {mlglenn, strassel}@ldc.upenn.edu
     http://www.ldc.upenn.edu/Projects/NISTMeet

  2. Scope of Work
     • Training data
       • (Pre-publication) distribution
     • Conference room test data
     • Transcription
       • Careful
       • Quick
       • Comparison and analysis
     • Infrastructure
       • XTrans Toolkit
       • Features for meetings

  3. RT-05S Training Data (distributed by LDC)
     • (Pre-publication*) distribution via e-corpus to RT-05 participants
     • All available from www.ldc.upenn.edu/Catalog

  4. RT-05S Evaluation Data (transcribed by LDC)
     • Conference room data
       • Ten meeting sessions, 12 minutes each
       • Contributed by five sites
       • Multiple recording conditions for each session
       • Primarily business meeting content
     • Transcribers report it was faster, easier, and more interesting to transcribe than the RT-04 meeting eval data
     • All data carefully transcribed (CTR)
     • Half of data quickly transcribed (QTR)
       • For contrastive study

  5. CTR Process
     • Using IHM channels
       • One exception – participant on speakerphone
     • 1st pass: manual segmentation
       • Turns → breath groups
       • 3-8 seconds per segment, designed for ease of transcription only
       • ~10 ms padding around each segment boundary
       • No segmentation or transcription of isolated speaker noise
     • 2nd pass: initial verbatim transcription
       • No time limit
       • Goal is to “get everything right”
     • 3rd pass: verify existing transcription and timestamps, add additional markup
       • Indicate proper names, filled pauses, noise, etc.
       • Revisit difficult sections
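The padding convention in the 1st pass above can be sketched in a few lines of Python. This is an illustrative sketch only, not LDC's actual segmentation tooling; the function name, the example segment times, and the choice to clip padding so adjacent segments never overlap are all assumptions.

```python
PAD = 0.010  # ~10 ms padding around each boundary, per the CTR guidelines

def pad_segments(segments, total_dur, pad=PAD):
    """Widen each (start, end) segment by `pad` seconds on both sides,
    clipped so segments stay inside the audio and never overlap a neighbor.
    `segments` is a sorted, non-overlapping list of (start, end) in seconds."""
    padded = []
    for i, (start, end) in enumerate(segments):
        lo = 0.0 if i == 0 else segments[i - 1][1]          # previous segment's end
        hi = total_dur if i == len(segments) - 1 else segments[i + 1][0]
        padded.append((max(lo, start - pad), min(hi, end + pad)))
    return padded

# Two close segments in a 10-second recording: the shared boundary is
# clipped rather than overlapped.
print(pad_segments([(1.0, 4.5), (4.505, 8.0)], 10.0))
```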

  6. CTR Quality Control
     • Additional QC pass by lead transcriber
       • Using mixed IHM recordings and/or SDM
     • Merge individual transcripts
       • Speaker assignment
       • Transcription accuracy, completeness
       • Markup consistency
     • Spell check
     • Syntax (format) check
     • Check consistency and accuracy of names, acronyms, terminology
     • Check silence (untranscribed) regions for missed speech using customized tool
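The last QC step, scanning untranscribed regions for missed speech, could work roughly as follows. This is a hedged sketch in the spirit of the customized tool the slide mentions, not its actual implementation; the windowed-RMS approach, the window size, and the energy threshold are all assumptions.

```python
import math

def flag_missed_speech(samples, rate, transcribed,
                       win=0.05, rms_thresh=0.02):
    """samples: mono audio as floats in [-1, 1]; rate: samples/second;
    transcribed: sorted, non-overlapping (start, end) pairs in seconds.
    Returns untranscribed regions containing at least one window whose
    RMS energy exceeds rms_thresh (i.e., likely missed speech)."""
    dur = len(samples) / rate
    # Build the complementary (untranscribed) regions.
    gaps, cursor = [], 0.0
    for s, e in transcribed:
        if s > cursor:
            gaps.append((cursor, s))
        cursor = max(cursor, e)
    if cursor < dur:
        gaps.append((cursor, dur))
    # Flag any gap with a high-energy window.
    step = int(win * rate)
    suspicious = []
    for s, e in gaps:
        seg = samples[int(s * rate):int(e * rate)]
        for i in range(0, max(len(seg) - step, 0) + 1, step):
            window = seg[i:i + step]
            if not window:
                break
            rms = math.sqrt(sum(x * x for x in window) / len(window))
            if rms > rms_thresh:
                suspicious.append((s, e))
                break
    return suspicious
```

A transcriber would then revisit only the flagged regions rather than re-auditing every silence.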

  7. QTR Process
     • 0th pass: automatic audio segmentation
       • Pause detection algorithm
       • No manual correction
     • 1st pass: verbatim transcription
       • Limited to five times real time
       • Goal is to “get the words right” only
       • No special markup, orthography, or capitalization
       • No extra time spent on difficult sections (e.g., disfluencies)
     • QC pass: minimal, semi-automated
       • Spell check
       • Format check
       • No check of transcript content, consistency of names/terms, etc.
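A pause-detection algorithm of the kind the 0th pass describes can be sketched as an energy gate: emit a segment boundary wherever windowed energy stays below a threshold for some minimum pause length. The slide does not specify the actual algorithm, so the thresholds, window size, and minimum pause duration below are illustrative assumptions.

```python
def segment_on_pauses(samples, rate, win=0.02,
                      energy_thresh=0.01, min_pause=0.3):
    """Split mono audio (floats in [-1, 1]) into (start, end) segments,
    in seconds, separated by pauses of at least min_pause seconds."""
    step = int(win * rate)
    # Classify each window as speech / non-speech by mean absolute amplitude.
    speech = [sum(abs(x) for x in samples[i:i + step]) / step > energy_thresh
              for i in range(0, len(samples) - step + 1, step)]
    segments, start, silent_run = [], None, 0
    min_silent = int(min_pause / win)
    for w, is_speech in enumerate(speech):
        if is_speech:
            if start is None:
                start = w          # a new segment begins
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silent:   # pause long enough: close segment
                segments.append((start * win, (w - silent_run + 1) * win))
                start, silent_run = None, 0
    if start is not None:                  # flush a segment still open at EOF
        segments.append((start * win, len(speech) * win))
    return segments
```

Consistent with the slide, the output would be used as-is, with no manual correction pass.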

  8. CTR vs. QTR

  9. Example

  10. Unique Challenges
     • Many speakers = takes longer to transcribe!
       • Impact of overlapping speech, even using IHM audio
       • Varying levels of speaker participation
       • Often no speech, but other speaker/background noise
     • Meeting content
       • All over the map, from games to technical meetings
     • Lack of customized transcription tools
       • Existing tools optimized for:
         • 1 channel, multiple speakers per channel (BN)
         • 2 channels, one speaker per channel (CTS)
       • Needed: a tool that merges features of each
         • Arbitrary number of channels, speakers
         • Easily move between mixed and individual signal playback
     • Access to video would also help disambiguate
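The "mixed vs. individual signal playback" requirement above amounts to down-mixing an arbitrary number of per-speaker (IHM) channels into one signal on demand. A minimal sketch, assuming equal-length float channels and a plain sample-wise average (real tools would likely resample, align, and normalize):

```python
def mix_channels(channels):
    """Average an arbitrary number of equal-length mono channels
    (lists of floats) into a single mixed channel for playback."""
    n = len(channels)
    return [sum(samples) / n for samples in zip(*channels)]

# Two short IHM channels mixed sample-wise into one playback signal.
mixed = mix_channels([[0.2, 0.4], [0.0, 0.4]])
```

Switching playback modes is then just a choice between `channels[i]` and `mix_channels(channels)`.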

  11. XTrans
     • Multipurpose speech annotation tool
       • Multilingual, multi-platform
       • Written in Python
       • AGTK infrastructure
     • Customized task modules
       • Careful transcription
         • Specialized QC functions
       • Quick transcription
         • Timed mode
       • “Metadata” annotation
         • Structural features, speaker diarization
       • Correction mode
         • e.g., correct automatic transcript or QTR → CTR
       • Comparison and adjudication of multiple transcripts
     • Allows video input

  12. One Channel View

  13. One Speaker, Multi-Channel

  14. Multi-Speaker, Multi-Channel

  15. Panel View

  16. Adjudication Mode
