Linguistic Resources for Meeting Recognition
Meghan Glenn, Stephanie Strassel
Linguistic Data Consortium
{mlglenn, strassel}@ldc.upenn.edu
http://www.ldc.upenn.edu/Projects/NISTMeet
Scope of Work
• Training data
  • (Pre-publication) distribution
• Conference room test data
• Transcription
  • Careful
  • Quick
  • Comparison and analysis
• Infrastructure
  • XTrans Toolkit
  • Features for meetings
RT-05S Training Data distributed by LDC
• (Pre-publication) distribution via e-corpus to RT-05 participants
• All available from www.ldc.upenn.edu/Catalog
RT-05S Evaluation Data transcribed by LDC
• Conference room data
  • Ten meeting sessions, 12 minutes each
  • Contributed by five sites
  • Multiple recording conditions for each session
  • Primarily business meeting content
  • Transcribers reported it was faster, easier, and more interesting to transcribe than the RT-04 meeting eval data
• All data carefully transcribed (CTR)
• Half of the data quickly transcribed (QTR)
  • For contrastive study
CTR Process
• Using IHM channels
  • One exception: a participant on speakerphone
• 1st pass: manual segmentation
  • Turns broken into breath groups
  • 3-8 seconds per segment, designed for ease of transcription only
  • ~10 ms padding around each segment boundary (see the sketch after this list)
  • No segmentation or transcription of isolated speaker noise
• 2nd pass: initial verbatim transcription
  • No time limit
  • Goal is to "get everything right"
• 3rd pass: verify existing transcription and timestamps, add additional markup
  • Indicate proper names, filled pauses, noise, etc.
  • Revisit difficult sections
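The padding detail lends itself to a small illustration. Below is a minimal Python sketch, not LDC's actual tooling; the function name and the (start, end) segment representation are invented for illustration. It pads each boundary by ~10 ms while clamping to the file edges and the previous padded segment:

# Hypothetical helper: pad manual segment boundaries by ~10 ms
# without running past the file edges or overlapping the
# neighboring segment. Times are in seconds.

PAD = 0.010  # ~10 ms of padding on each side

def pad_segments(segments, audio_duration, pad=PAD):
    """segments: sorted list of (start, end) tuples for one channel."""
    padded = []
    for start, end in segments:
        new_start = max(0.0, start - pad)
        new_end = min(audio_duration, end + pad)
        # Do not let padding overlap the previous padded segment.
        if padded and new_start < padded[-1][1]:
            new_start = padded[-1][1]
        padded.append((new_start, new_end))
    return padded

print(pad_segments([(1.00, 4.50), (4.52, 9.80)], audio_duration=12 * 60))
# -> [(0.99, 4.51), (4.51, 9.81)]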
CTR Quality Control
• Additional QC pass by lead transcriber
  • Using mixed IHM recordings and/or SDM
• Merge individual transcripts
  • Speaker assignment
  • Transcription accuracy, completeness
  • Markup consistency
• Spell check
• Syntax (format) check
• Check consistency and accuracy of names, acronyms, terminology
• Check silence (untranscribed) regions for missed speech using a customized tool (see the gap-finding sketch below)
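The last QC item, auditing untranscribed regions, reduces to a simple gap computation over the session timeline. A minimal sketch, assuming segments are (start, end) pairs in seconds pooled across all speakers; this is an illustration, not the customized LDC tool:

def untranscribed_regions(segments, audio_duration, min_gap=1.0):
    """List gaps longer than min_gap that no segment covers.

    segments: (start, end) tuples from every speaker, in any order.
    """
    gaps = []
    cursor = 0.0
    for start, end in sorted(segments):
        if start - cursor >= min_gap:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if audio_duration - cursor >= min_gap:
        gaps.append((cursor, audio_duration))
    return gaps

# A 12-minute session with two transcribed stretches:
print(untranscribed_regions([(0.5, 300.0), (310.0, 700.0)], 12 * 60))
# -> [(300.0, 310.0), (700.0, 720.0)]

A QC transcriber would then listen only to the reported gaps for missed speech.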
QTR Process
• 0th pass: automatic audio segmentation
  • Pause detection algorithm (an energy-based sketch follows this list)
  • No manual correction
• 1st pass: verbatim transcription
  • Limited to five times real time
  • Goal is to "get the words right" only
  • No special markup, orthography, or capitalization
  • No extra time spent on difficult sections (e.g., disfluencies)
• QC pass: minimal, semi-automated
  • Spell check
  • Format check
  • No check of transcript content, consistency of names/terms, etc.
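The slides do not specify the pause detection algorithm; a common baseline is frame-level RMS energy thresholding. A minimal sketch under that assumption (the function name, frame size, and thresholds are illustrative, and the input is assumed to be a mono NumPy array of samples):

import numpy as np

def auto_segment(samples, rate=16000, frame=0.02, thresh=0.01, min_pause=0.3):
    """Split audio into segments at pauses of at least min_pause seconds."""
    hop = int(frame * rate)
    n_frames = len(samples) // hop
    # RMS energy per 20 ms frame.
    rms = np.array([
        np.sqrt(np.mean(samples[i * hop:(i + 1) * hop] ** 2))
        for i in range(n_frames)
    ])
    speech = rms >= thresh
    segments, start, silent_run = [], None, 0
    for i, is_speech in enumerate(speech):
        if is_speech:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run * frame >= min_pause:
                # Close the segment at the last speech frame.
                segments.append((start * frame, (i - silent_run + 1) * frame))
                start, silent_run = None, 0
    if start is not None:
        segments.append((start * frame, n_frames * frame))
    return segments

Since there is no manual correction pass, any segmentation errors from the threshold flow directly into the transcription pass.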
Unique Challenges
• Many speakers = takes longer to transcribe!
  • Impact of overlapping speech, even using IHM audio
• Varying levels of speaker participation
  • Often no speech, but other speaker/background noise
• Meeting content
  • All over the map, from games to technical meetings
• Lack of customized transcription tools
  • Existing tools optimized for:
    • 1 channel, multiple speakers per channel (BN)
    • 2 channels, one speaker per channel (CTS)
  • Needed: a tool that merges features of each
    • Arbitrary number of channels and speakers
    • Easy movement between mixed and individual signal playback (see the mixdown sketch after this list)
    • Access to video would also help disambiguate
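One way to read "move between mixed and individual signal playback": derive the mixed signal on the fly from the per-speaker IHM channels. A minimal sketch, assuming equal-length NumPy arrays at a common sample rate; the names are illustrative, and this is not how any particular tool implements it:

import numpy as np

def mixdown(channels):
    """channels: list of 1-D float arrays, one IHM channel per speaker."""
    stacked = np.stack(channels)   # shape: (n_speakers, n_samples)
    return stacked.mean(axis=0)    # averaging keeps the mix in range

def playback_signal(channels, speaker=None):
    """speaker=None -> the mixed signal; otherwise one speaker's channel."""
    return mixdown(channels) if speaker is None else channels[speaker]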
XTrans
• Multipurpose speech annotation tool
  • Multilingual, multi-platform
  • Written in Python
  • AGTK infrastructure
• Customized task modules
  • Careful transcription
    • Specialized QC functions
  • Quick transcription
    • Timed mode
  • "Metadata" annotation
    • Structural features, speaker diarization
  • Correction mode
    • e.g., correct an automatic transcript, or QTR → CTR
  • Comparison and adjudication of multiple transcripts (see the sketch below)
• Allows video input
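For the comparison and adjudication mode, the core operation is aligning two transcripts of the same region and surfacing disagreements for a human to resolve. A minimal sketch using Python's standard difflib; it illustrates the idea only and is not XTrans code, and the example tokens are made up:

import difflib

def transcript_diffs(tokens_a, tokens_b):
    """Return the non-matching spans between two token sequences."""
    matcher = difflib.SequenceMatcher(a=tokens_a, b=tokens_b)
    return [
        (op, tokens_a[i1:i2], tokens_b[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]

# Made-up QTR vs. CTR renderings of the same stretch of speech:
qtr = "so we should uh move the meeting".split()
ctr = "so we should %uh move the the meeting".split()
print(transcript_diffs(qtr, ctr))
# -> [('replace', ['uh'], ['%uh']), ('insert', [], ['the'])]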