
Welcome to the Rich Transcription 2005 Spring Meeting Recognition Evaluation Workshop



  1. Welcome to the Rich Transcription 2005 Spring Meeting Recognition Evaluation Workshop
     July 13, 2005
     Royal College of Physicians, Edinburgh, UK

  2. Today’s Agenda (Updated: July 5, 2005)
     • Meeting venue cleared: 18:00

  3. Administrative Points
     • Participants:
        • Pick up the hard-copy proceedings at the front desk
     • Presenters:
        • The agenda will be strictly followed
        • Time slots include Q&A time
        • Presenters should either:
           • Load their presentations on the computer at the front, or
           • Test their laptops during the breaks before their presentation
     • We’d like to thank:
        • The MLMI-05 organizing committee for hosting this workshop
        • Caroline Hastings for the workshop’s administration
        • All the volunteers: evaluation participants, data providers, transcribers, annotators, paper authors, presenters, and other contributors

  4. The Rich Transcription 2005 Spring Meeting Recognition Evaluation
     http://www.nist.gov/speech/tests/rt/rt2005/spring/
     Jonathan Fiscus, Nicolas Radde, John Garofolo, Audrey Le, Jerome Ajot, Christophe Laprun
     July 13, 2005
     Rich Transcription 2005 Spring Meeting Recognition Workshop at MLMI 2005

  5. Overview
     • Rich Transcription Evaluation Series
     • Research opportunities in the Meeting Domain
     • RT-05S Evaluation
        • Audio input conditions
        • Corpora
        • Evaluation tasks and results
     • Conclusion/Future

  6. The Rich Transcription Task
     [Diagram: human-to-human speech is processed by component recognition technologies to produce Rich Transcription (Speech-To-Text + Metadata), i.e., readable transcripts that support multiple applications: smart meeting rooms, translation, extraction, retrieval, and summarization]

  7. Rich Transcription Evaluation Series
     • Goal:
        • Develop recognition technologies that produce transcripts which are understandable by humans and useful for downstream processes
     • Domains:
        • Broadcast News (BN)
        • Conversational Telephone Speech (CTS)
        • Meeting Room speech
     • Parameterized “Black Box” evaluations
        • Evaluations control input conditions to investigate weaknesses/strengths
        • Sub-test scoring provides finer-grained diagnostics

  8. Research Opportunities in the Meeting Domain
     • Provides a fertile environment to advance the state of the art in technologies for understanding human interaction
     • Many potential applications
        • Meeting archives, interactive meeting rooms, remote collaborative systems
     • Important Human Language Technology challenges not posed by other domains
        • Varied forums and vocabularies
        • Highly interactive and overlapping spontaneous speech
        • Far-field speech effects
           • Ambient noise
           • Reverberation
           • Participant movement
        • Varied room configurations
           • Many microphone conditions
           • Many camera views
        • Multimedia information integration
           • Person, face, and head detection/tracking

  9. RT-05S Evaluation Tasks
     • Focus on core speech technologies
        • Speech-to-Text Transcription
        • Diarization “Who Spoke When”
        • Diarization “Speech Activity Detection”
        • Diarization “Source Localization”

  10. Five System Input Conditions
     • Distant microphone conditions
        • Multiple Distant Microphones (MDM)
           • Three or more centrally located table mics
        • Multiple Source Localization Arrays (MSLA)
           • Inverted “T” topology, 4-channel digital microphone array
        • Multiple Mark III digital microphone Arrays (MM3A)
           • Linear topology, 64-channel digital microphone array
     • Contrastive microphone conditions
        • Single Distant Microphone (SDM)
           • Center-most MDM microphone
           • Gauges the performance benefit of using multiple table mics
        • Individual Head Microphones (IHM)
           • Performance on clean speech
           • Similar to Conversational Telephone Speech
           • One speaker per channel, conversational speech

  11. Training/Development Corpora
     • Corpora provided at no cost to participants:
        • ICSI Meeting Corpus
        • ISL Meeting Corpus
        • NIST Meeting Pilot Corpus
        • Rich Transcription 2004 Spring (RT-04S) Development & Evaluation Data
        • Topic Detection and Tracking Phase 4 (TDT4) corpus
        • Fisher English conversational telephone speech corpus
        • CHIL development test set
        • AMI development test set and training set
     • Thanks to ELDA and LDC for making this possible

  12. RT-05S Evaluation Test Corpora: Conference Room Test Set
     • Goal-oriented small conference room meetings
        • Group meetings and decision-making exercises
        • Meetings involved 4-10 participants
     • 120 minutes: ten excerpts, each twelve minutes in duration
     • Five sites donated two meetings each:
        • Augmented Multiparty Interaction (AMI) Program, International Computer Science Institute (ICSI), NIST, and Virginia Tech (VT)
        • No VT data was available for system development
     • Similar test set construction was used for the RT-04S evaluation
     • Microphones:
        • Participants wore head microphones
        • Microphones were placed on the table among participants
        • AMI meetings included an 8-channel circular microphone array on the table
        • NIST meetings included 3 Mark III digital microphone arrays

  13. RT-05S Evaluation Test Corpora: Lecture Room Test Set
     • Technical lectures in small meeting rooms
        • Educational events where a single lecturer briefs an audience on a particular topic
        • Meeting excerpts involve one lecturer and up to five participating audience members
     • 150 minutes: 29 excerpts from 16 lectures
     • Two types of excerpts selected by CMU:
        • Lecturer excerpts: 89 minutes, 17 excerpts
        • Question & Answer (Q&A) excerpts: 61 minutes, 12 excerpts
     • All data collected at Karlsruhe University
     • Sensors:
        • The lecturer and at most two other participants wore head microphones
        • Microphones were placed on the table among participants
        • A source localization array was mounted on each of the room’s four walls
        • A Mark III array was mounted on the wall opposite the lecturer

  14. RT-05S Evaluation Participants

  15. Diarization “Who Spoke When” (SPKR) Task
     • Task definition:
        • Identify the number of participants in each meeting and create a list of speech time intervals for each such participant
     • Several input conditions:
        • Primary: MDM
        • Contrast: SDM, MSLA
     • Four participating sites: ICSI/SRI, ELISA, MQU, TNO

  16. SPKR System Evaluation Method
     • Primary metric:
        • Diarization Error Rate (DER): the ratio of erroneously attributed speaker time (missed, false alarm, or wrong speaker) to total reference speaker time
        • System output speaker segment sets are mapped to reference speaker segment sets so as to minimize the total error
        • Errors consist of:
           • Speaker assignment errors (i.e., speech was detected but not assigned to the right speaker)
           • False alarm detections
           • Missed detections
     • Systems were scored using the md-eval tool
        • Forgiveness collar of +/- 250 ms around reference segment boundaries
        • DER on non-overlapping speech is the primary metric
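The DER computation described on this slide can be sketched on a frame grid. This is a minimal illustration under stated assumptions, not the NIST md-eval tool: the 10 ms frame size, the brute-force search over speaker mappings, the (start, end, speaker) segment tuples, and all function names are hypothetical, and the inputs are assumed to contain only non-overlapping speech, matching the primary metric.

```python
from itertools import permutations

FRAME = 0.01   # assumed 10 ms scoring frames (not md-eval's own method)
COLLAR = 0.25  # +/- 250 ms forgiveness collar around reference boundaries


def to_frames(segments, n_frames):
    """Label each frame with a speaker id or None; assumes non-overlapping segments."""
    labels = [None] * n_frames
    for start, end, spk in segments:
        for i in range(int(round(start / FRAME)), min(n_frames, int(round(end / FRAME)))):
            labels[i] = spk
    return labels


def scored_frames(ref_segments, n_frames):
    """False for frames inside the collar around any reference segment boundary."""
    scored = [True] * n_frames
    c = int(round(COLLAR / FRAME))
    for start, end, _ in ref_segments:
        for b in (int(round(start / FRAME)), int(round(end / FRAME))):
            for i in range(max(0, b - c), min(n_frames, b + c)):
                scored[i] = False
    return scored


def best_mapping(ref, hyp, scored):
    """Brute-force one-to-one system-to-reference speaker mapping maximizing agreement."""
    ref_spk = sorted({r for r in ref if r is not None})
    hyp_spk = sorted({h for h in hyp if h is not None})
    candidates = ref_spk + [None] * len(hyp_spk)  # None = system speaker left unmapped
    best, best_score = {}, -1
    for choice in permutations(candidates, len(hyp_spk)):
        mapping = dict(zip(hyp_spk, choice))
        score = sum(1 for r, h, s in zip(ref, hyp, scored)
                    if s and r is not None and h is not None and mapping[h] == r)
        if score > best_score:
            best, best_score = mapping, score
    return best


def der(ref_segments, hyp_segments, duration):
    """DER = (missed + false alarm + speaker error time) / scored reference speech time."""
    n = int(round(duration / FRAME))
    ref = to_frames(ref_segments, n)
    hyp = to_frames(hyp_segments, n)
    scored = scored_frames(ref_segments, n)
    mapping = best_mapping(ref, hyp, scored)
    miss = false_alarm = speaker_error = total = 0
    for r, h, s in zip(ref, hyp, scored):
        if not s:
            continue  # inside a forgiveness collar
        if r is not None:
            total += 1
            if h is None:
                miss += 1
            elif mapping[h] != r:
                speaker_error += 1
        elif h is not None:
            false_alarm += 1
    if total == 0:
        raise ValueError("no scored reference speech")
    return (miss + false_alarm + speaker_error) / total
```

Under these assumptions, a system that labels two 10 s reference turns by speakers A and B as a single speaker gets a DER of 0.5: one turn maps correctly and the other counts entirely as speaker error.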

  17. RT-05S SPKR Results: Primary Systems, Non-Overlapping Speech
     • Conference room SDM DER is lower than MDM DER
        • A sign test indicates the differences are not significant
     • The primary ICSI/SRI Lecture Room system attributed the entire duration of each test excerpt to a single speaker
        • The ICSI/SRI contrastive system had a lower DER
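The slide does not say exactly how the sign test was applied. A common setup, assumed here, pairs the two conditions' error rates excerpt by excerpt, discards ties, and computes a two-sided binomial p-value with success probability 0.5. The `sign_test` helper is hypothetical.

```python
from math import comb


def sign_test(errors_a, errors_b):
    """Two-sided sign test on paired per-excerpt error rates; ties are discarded."""
    a_better = sum(1 for a, b in zip(errors_a, errors_b) if a < b)
    b_better = sum(1 for a, b in zip(errors_a, errors_b) if a > b)
    n = a_better + b_better
    if n == 0:
        return 1.0  # every pair tied: no evidence of a difference
    k = min(a_better, b_better)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

With ten excerpts and no ties, one condition must win at least nine of the ten pairs to reach p < 0.05 (9 of 10 gives p ≈ 0.021; 8 of 10 gives p ≈ 0.109), which is one reason small per-excerpt DER differences on a ten-excerpt test set are often not significant.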

  18. Lecture Room Results: Broken Down by Excerpt Type
     • Lecturer excerpt DERs are lower than Q&A excerpt DERs

  19. Historical Best System SPKR Performance on Conference Data
     • 20% relative reduction for MDM
     • 43% relative reduction for SDM
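A relative reduction compares the new error rate with the earlier one, rather than subtracting percentage points. Writing the earlier best error rate as the previous value and the RT-05S best as the new value:

```latex
\text{relative reduction} = \frac{E_{\text{prev}} - E_{\text{new}}}{E_{\text{prev}}}
```

For instance, with hypothetical numbers, a drop from 25% DER to 20% DER is a 5-point absolute reduction but a 20% relative reduction, since (25 - 20) / 25 = 0.2.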

  20. Diarization “Speech Activity Detection” (SAD) Task
     • Task definition:
        • Create a list of speech time intervals where at least one person is talking
     • Dry run evaluation for RT-05S
        • Proposed by CHIL
     • Several input conditions:
        • Primary: MDM
        • Contrast: SDM, MSLA, IHM
        • Systems designed for the IHM condition must detect speech while also rejecting crosstalk speech and breath noises; therefore, IHM systems are not directly comparable to MDM or SDM systems
     • Three participating sites: ELISA, Purdue, TNO

  21. SAD System Evaluation Method
     • Primary metric:
        • Diarization Error Rate (DER)
        • Same formula and software as used for the SPKR task
     • Reduced to a two-class problem: speech vs. non-speech
        • No speaker assignment errors, just false alarms and missed detections
     • Forgiveness collar of +/- 250 ms around reference segment boundaries
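Reducing DER to a two-class problem, as this slide describes, leaves only missed and false-alarm speech time. A minimal frame-based sketch follows; the 10 ms frames, the (start, end) segment tuples, and the `sad_der` name are assumptions, and this is not the NIST scoring software itself.

```python
FRAME = 0.01   # assumed 10 ms scoring frames
COLLAR = 0.25  # +/- 250 ms forgiveness collar around reference boundaries


def speech_frames(segments, n_frames):
    """Mark frames covered by any (start, end) speech segment."""
    speech = [False] * n_frames
    for start, end in segments:
        for i in range(int(round(start / FRAME)), min(n_frames, int(round(end / FRAME)))):
            speech[i] = True
    return speech


def sad_der(ref_segments, hyp_segments, duration):
    """Two-class DER: (missed + false alarm speech time) / scored reference speech time."""
    n = int(round(duration / FRAME))
    ref = speech_frames(ref_segments, n)
    hyp = speech_frames(hyp_segments, n)
    scored = [True] * n
    c = int(round(COLLAR / FRAME))
    for start, end in ref_segments:
        for b in (int(round(start / FRAME)), int(round(end / FRAME))):
            for i in range(max(0, b - c), min(n, b + c)):
                scored[i] = False  # frame falls inside a forgiveness collar
    miss = sum(1 for r, h, s in zip(ref, hyp, scored) if s and r and not h)
    false_alarm = sum(1 for r, h, s in zip(ref, hyp, scored) if s and h and not r)
    total = sum(1 for r, s in zip(ref, scored) if s and r)
    if total == 0:
        raise ValueError("no scored reference speech")
    return (miss + false_alarm) / total
```

Because the collar is applied around reference boundaries, a system segment that ends slightly late is forgiven for the first 250 ms of overrun; only the remainder counts as false alarm.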

  22. RT-05S SAD Results: Primary Systems
     • DERs for conference and lecture room MDM data are similar
     • Purdue did not compensate for breath noise and crosstalk

  23. Speech-To-Text (STT) Task
     • Task definition:
        • Systems output a single stream of time-tagged word tokens
     • Several input conditions:
        • Primary: MDM
        • Contrast: SDM, MSLA, IHM
     • Two participating sites: AMI and ICSI/SRI
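The single-stream requirement means that a system which decodes channels separately (for example, one channel per head microphone in the IHM condition) must merge its output into one time-ordered token sequence. A minimal sketch, assuming a hypothetical `Token` record with start time and duration in seconds; it does not reproduce any NIST file format.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    start: float       # seconds from the start of the excerpt
    duration: float    # seconds
    word: str
    channel: str = ""  # hypothetical label for the input channel the token came from


def merge_streams(*channel_tokens):
    """Merge per-channel token lists (each already sorted by start) into one stream."""
    return list(heapq.merge(*channel_tokens, key=lambda tok: tok.start))
```

`heapq.merge` is used because each channel's output is already time-sorted, so a k-way merge avoids re-sorting the full token list.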

  24. STT System Evaluation Method
     • Primary metric:
        • Word Error Rate (WER): the ratio of inserted, deleted, and substituted words to the total number of words in the reference
     • System and reference words are normalized to a common form
     • System words are mapped to reference words using a word-mediated dynamic programming string alignment program
     • Systems were scored using the NIST Scoring Toolkit (SCTK) version 2.1
        • A Spring 2005 update to the SCTK alignment tool can now score most of the overlapping speech in the distant microphone test material
        • Can now handle up to 5 simultaneous speakers
           • 98% of the Conference Room test set can be scored
           • 100% of the Lecture Room test set can be scored
        • Greatly improved over the Spring 2004 prototype
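WER as defined on this slide equals the minimum number of word substitutions, deletions, and insertions needed to turn the normalized reference into the system output, divided by the reference length. The sketch below uses an unweighted Levenshtein alignment and a hypothetical normalization (lowercasing and stripping ASCII punctuation); SCTK's own alignment weights and normalization rules differ in detail, so results may not match it exactly.

```python
import string


def normalize(text):
    """Hypothetical normalization: lowercase and drop ASCII punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation)).split()


def wer(reference, hypothesis):
    """(substitutions + deletions + insertions) / reference word count."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("empty reference")
    # prev[j] = fewest edits turning the previous reference prefix into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion of reference word r
                         cur[j - 1] + 1,          # insertion of system word h
                         prev[j - 1] + (r != h))  # substitution, or match at no cost
        prev = cur
    return prev[-1] / len(ref)
```

For example, dropping one word from a six-word reference gives a WER of 1/6; because insertions count as errors, WER can exceed 100%.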

  25. RT-05S STT Results: Primary Systems (Incl. Overlaps)
     [Chart: WER by microphone condition for the Lecture Room and Conference Room test sets]
     • First evaluation for the AMI team
     • IHM error rates for conference and lecture room data are comparable
     • ICSI/SRI lecture room MSLA WER is lower than its MDM/SDM WER

  26. Historical STT Performance in the Meeting Domain
     • Performance for ICSI/SRI has improved dramatically for all conditions

  27. Diarization “Source Localization” (SLOC) Task
     • Task definition:
        • Systems track the three-dimensional position of the lecturer (using audio input only)
        • Constrained to the lecturer subset of the Lecture Room test set
        • Evaluation protocol and metrics defined in the CHIL “Speaker Localization and Tracking – Evaluation Criteria” document
     • Dry run pilot evaluation for RT-05S
        • Proposed by CHIL
        • CHIL provided the scoring software and annotated the evaluation data
     • One evaluation condition:
        • Multiple source localization arrays
        • Required calibration of source localization microphone positions and video cameras
     • Three participating sites: ITC-irst, KU, TNO

  28. SLOC System Evaluation Method
     • Primary metric:
        • Root Mean Squared Error (RMSE): the square root of the mean squared Euclidean distance between the reference speaker position and the system-determined speaker position
        • Measured in millimeters at 667 ms intervals
     • IRST SLOC scoring software
        • Maurizio Omologo will give further details this afternoon
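RMSE over the 667 ms evaluation frames can be sketched as below, assuming exactly one reference and one system (x, y, z) position in millimeters per frame. How the CHIL criteria treat frames without a system estimate is not modeled here, and `rmse_mm` is a hypothetical name.

```python
import math


def rmse_mm(reference, system):
    """Root of the mean squared 3-D Euclidean error, in mm, over paired frames."""
    if not reference or len(reference) != len(system):
        raise ValueError("need one system position per reference frame")
    squared = [sum((r - s) ** 2 for r, s in zip(ref_pos, sys_pos))
               for ref_pos, sys_pos in zip(reference, system)]
    return math.sqrt(sum(squared) / len(squared))
```

An estimate that is 300 mm off in x and 400 mm off in y in every frame yields an RMSE of 500 mm. Squaring before averaging weights occasional large localization errors more heavily than a plain mean distance would.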

  29. RT-05S SLOC Results: Primary Systems
     • Issues:
        • What accuracy and resolution are needed for successful beamforming?
        • What will performance be for multiple speakers?

  30. Summary
     • Nine sites participated in the RT-05S evaluation
        • Up from six in RT-04S
     • Four evaluation tasks were supported across two meeting sub-domains:
        • Two experimental tasks, SAD and SLOC, were successfully completed
     • Dramatically lower STT and SPKR error rates for RT-05S

  31. Issues for RT-06 Meeting Eval
     • Domain
        • Sub-domains
     • Tasks
        • Require at least three sites per task
        • Agreed-upon primary condition for each task
     • Data contributions
        • Source data and annotations
     • Participation intent
     • Participation commitment
     • Decision-making process
        • Only sites with intent to participate will have input to the task definition
