Linguistic Resources for Meeting Recognition
Meghan Glenn, Stephanie Strassel
Linguistic Data Consortium
{mlglenn, strassel}@ldc.upenn.edu
http://www.ldc.upenn.edu/Projects/NISTMeet
Scope of Work
• Training data
  • (Pre-publication) distribution
• Conference room test data
• Transcription
  • Careful
  • Quick
  • Comparison and analysis
• Infrastructure
  • XTrans Toolkit
  • Features for meetings
RT-05S Training Data distributed by LDC
• (Pre-publication) distribution via e-corpus to RT-05 participants
• All available from www.ldc.upenn.edu/Catalog
RT-05S Evaluation Data transcribed by LDC
• Conference room data
  • Ten meeting sessions, 12 minutes each
  • Contributed by five sites
  • Multiple recording conditions for each session
  • Primarily business meeting content
  • Transcribers reported it was faster, easier, and more interesting to transcribe than the RT-04 meeting eval data
• All data carefully transcribed (CTR)
• Half of the data quickly transcribed (QTR)
  • For contrastive study
CTR Process
• Using IHM channels
  • One exception: a participant on speakerphone
• 1st pass: manual segmentation
  • Turns broken into breath groups
  • 3-8 seconds per segment, designed for ease of transcription only
  • ~10 ms padding around each segment boundary (see the sketch after this list)
  • No segmentation or transcription of isolated speaker noise
• 2nd pass: initial verbatim transcription
  • No time limit
  • Goal is to "get everything right"
• 3rd pass: verify existing transcription and timestamps, add additional markup
  • Indicate proper names, filled pauses, noise, etc.
  • Revisit difficult sections
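The padding detail lends itself to a small illustration. Below is a minimal Python sketch, not LDC's actual tooling; the function name and the (start, end) segment representation are invented for illustration. It pads each boundary by ~10 ms while clamping to the file edges and the previous padded segment:

# Hypothetical helper: pad manual segment boundaries by ~10 ms
# without running past the file edges or overlapping the
# neighboring segment. Times are in seconds.

PAD = 0.010  # ~10 ms of padding on each side

def pad_segments(segments, audio_duration, pad=PAD):
    """segments: sorted list of (start, end) tuples for one channel."""
    padded = []
    for start, end in segments:
        new_start = max(0.0, start - pad)
        new_end = min(audio_duration, end + pad)
        # Do not let padding overlap the previous padded segment.
        if padded and new_start < padded[-1][1]:
            new_start = padded[-1][1]
        padded.append((new_start, new_end))
    return padded

print(pad_segments([(1.00, 4.50), (4.52, 9.80)], audio_duration=12 * 60))
# -> [(0.99, 4.51), (4.51, 9.81)]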
CTR Quality Control
• Additional QC pass by lead transcriber
  • Using mixed IHM recordings and/or SDM
• Merge individual transcripts
  • Speaker assignment
  • Transcription accuracy, completeness
  • Markup consistency
• Spell check
• Syntax (format) check
• Check consistency and accuracy of names, acronyms, terminology
• Check silence (untranscribed) regions for missed speech using a customized tool (see the gap-finding sketch below)
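The last QC item, auditing untranscribed regions, reduces to a simple gap computation over the session timeline. A minimal sketch, assuming segments are (start, end) pairs in seconds pooled across all speakers; this is an illustration, not the customized LDC tool:

def untranscribed_regions(segments, audio_duration, min_gap=1.0):
    """List gaps longer than min_gap that no segment covers.

    segments: (start, end) tuples from every speaker, in any order.
    """
    gaps = []
    cursor = 0.0
    for start, end in sorted(segments):
        if start - cursor >= min_gap:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if audio_duration - cursor >= min_gap:
        gaps.append((cursor, audio_duration))
    return gaps

# A 12-minute session with two transcribed stretches:
print(untranscribed_regions([(0.5, 300.0), (310.0, 700.0)], 12 * 60))
# -> [(300.0, 310.0), (700.0, 720.0)]

A QC transcriber would then listen only to the reported gaps for missed speech.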
QTR Process
• 0th pass: automatic audio segmentation
  • Pause detection algorithm (an energy-based sketch follows this list)
  • No manual correction
• 1st pass: verbatim transcription
  • Limited to five times real time
  • Goal is to "get the words right" only
  • No special markup, orthography, or capitalization
  • No extra time spent on difficult sections (e.g., disfluencies)
• QC pass: minimal, semi-automated
  • Spell check
  • Format check
  • No check of transcript content, consistency of names/terms, etc.
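The slides do not specify the pause detection algorithm; a common baseline is frame-level RMS energy thresholding. A minimal sketch under that assumption (the function name, frame size, and thresholds are illustrative, and the input is assumed to be a mono NumPy array of samples):

import numpy as np

def auto_segment(samples, rate=16000, frame=0.02, thresh=0.01, min_pause=0.3):
    """Split audio into segments at pauses of at least min_pause seconds."""
    hop = int(frame * rate)
    n_frames = len(samples) // hop
    # RMS energy per 20 ms frame.
    rms = np.array([
        np.sqrt(np.mean(samples[i * hop:(i + 1) * hop] ** 2))
        for i in range(n_frames)
    ])
    speech = rms >= thresh
    segments, start, silent_run = [], None, 0
    for i, is_speech in enumerate(speech):
        if is_speech:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run * frame >= min_pause:
                # Close the segment at the last speech frame.
                segments.append((start * frame, (i - silent_run + 1) * frame))
                start, silent_run = None, 0
    if start is not None:
        segments.append((start * frame, n_frames * frame))
    return segments

Since there is no manual correction pass, any segmentation errors from the threshold flow directly into the transcription pass.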
Unique Challenges
• Many speakers = takes longer to transcribe!
  • Impact of overlapping speech, even using IHM audio
• Varying levels of speaker participation
  • Often no speech, but other speaker/background noise
• Meeting content
  • All over the map, from games to technical meetings
• Lack of customized transcription tools
  • Existing tools optimized for:
    • 1 channel, multiple speakers per channel (BN)
    • 2 channels, one speaker per channel (CTS)
  • Needed: a tool that merges features of each
    • Arbitrary number of channels and speakers
    • Easy movement between mixed and individual signal playback (see the mixdown sketch after this list)
    • Access to video would also help disambiguate
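One way to read "move between mixed and individual signal playback": derive the mixed signal on the fly from the per-speaker IHM channels. A minimal sketch, assuming equal-length NumPy arrays at a common sample rate; the names are illustrative, and this is not how any particular tool implements it:

import numpy as np

def mixdown(channels):
    """channels: list of 1-D float arrays, one IHM channel per speaker."""
    stacked = np.stack(channels)   # shape: (n_speakers, n_samples)
    return stacked.mean(axis=0)    # averaging keeps the mix in range

def playback_signal(channels, speaker=None):
    """speaker=None -> the mixed signal; otherwise one speaker's channel."""
    return mixdown(channels) if speaker is None else channels[speaker]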
XTrans
• Multipurpose speech annotation tool
  • Multilingual, multi-platform
  • Written in Python
  • AGTK infrastructure
• Customized task modules
  • Careful transcription
    • Specialized QC functions
  • Quick transcription
    • Timed mode
  • "Metadata" annotation
    • Structural features, speaker diarization
  • Correction mode
    • e.g., correct an automatic transcript, or QTR → CTR
  • Comparison and adjudication of multiple transcripts (see the sketch below)
• Allows video input
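For the comparison and adjudication mode, the core operation is aligning two transcripts of the same region and surfacing disagreements for a human to resolve. A minimal sketch using Python's standard difflib; it illustrates the idea only and is not XTrans code, and the example tokens are made up:

import difflib

def transcript_diffs(tokens_a, tokens_b):
    """Return the non-matching spans between two token sequences."""
    matcher = difflib.SequenceMatcher(a=tokens_a, b=tokens_b)
    return [
        (op, tokens_a[i1:i2], tokens_b[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]

# Made-up QTR vs. CTR renderings of the same stretch of speech:
qtr = "so we should uh move the meeting".split()
ctr = "so we should %uh move the the meeting".split()
print(transcript_diffs(qtr, ctr))
# -> [('replace', ['uh'], ['%uh']), ('insert', [], ['the'])]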