Twenty-First Century Automatic Speech Recognition: Meeting Rooms and Beyond
ASR 2000, September 20, 2000
John Garofolo, John.Garofolo@NIST.gov
Challenges
• Target for the new millennium in ASR technology:
  • Meeting Room Transcription and Annotation Task
    • multiple sensors
      • stationary, mobile, and arrays of mics in conjunction with video input devices
    • noise and microphone robustness
    • speaker-independent recognition
    • speaker identification
    • automatic production of usable transcriptions with speakers identified and with properly formatted, capitalized, and punctuated text
• Perfect research task to move forward the state of the art
• Development infrastructure will require
  • new metrics, evaluation tools
  • new I/O specifications
  • research corpora, new methods of collecting, compiling, and annotating data
NIST Proposed Initiative
• Collaborate with ASR research community to create evaluation infrastructure
• Develop corpus design and transcription and ASR system output specifications
• Revise and update NIST SCLITE ASR scoring software to extend beyond classical word error rate measurements
• Collaborate with NIST Smart Space Lab to collect, transcribe, and annotate a pilot meeting room transcription corpus
• Sponsor evaluations and workshops
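For readers unfamiliar with the "classical word error rate" mentioned above: it is conventionally computed as (substitutions + deletions + insertions) divided by the number of reference words, using a minimum-edit-distance alignment between the reference and hypothesis transcripts. The following is a minimal sketch of that definition, not SCLITE's implementation; the function name `word_error_rate` is hypothetical, and whitespace tokenization with exact, case-sensitive token matching is an assumption made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count.

    Illustrative only: tokens are split on whitespace and compared exactly,
    so case and punctuation differences count as errors.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (Levenshtein distance over words).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the meeting starts now", "the meeting start now"))
```

Because the target transcripts are to be capitalized and punctuated, whether to normalize case and punctuation before scoring is a design choice the sketch leaves open; it simply treats any token mismatch as an error.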
Meeting Room Pilot Corpus
• Meeting type: Possible focus group discussions requiring information lookup and real consensus building
• Participants: At least 4 per meeting plus moderator. Native speakers?
• Multi-microphones:
  • Head-mounted ‘control’
  • Microphone array
  • Lapel mikes worn by, or desk-top mikes for, each participant
  • Table/wall-mounted stationary mikes
• Video: Wide-angle view positioned so that it can be correlated with mike array for source location. Possibly other views to capture faces head-on.
• Annotation:
  • Transcription (words with capitalization/punctuation)
  • Speaker ID
  • Background noise conditions
  • Some initial exploration of annotating dialogue, people movement, gestures, lip movement, interaction with devices
NIST Smart Space Test Bed Laboratory
[Figure: lab layout with labels for large screen display, camera elements, equipment room, microphone array, and array beams]
• 59-mic array, assorted conventional mics
• Cameras/video capture
• Large screen display
• Pervasive devices
• Palm tops
• Tablets
• Wireless LAN
• Data collection servers
• Gigabit Ethernet
• High-bandwidth data flow system
• Well-suited for creating pilot meeting corpus
Approach for 2000 - 2001
• NIST will collaborate closely with a few research sites that will be the early users of the data to create the project specifications.
  • Via e-mail list and Web site
• NIST will create a pilot meeting room data collection
  • Data storage will be a significant issue
• NIST will create evaluation software for the new domain
  • Update SCLITE + detection-based scoring software
• If feasible, NIST will coordinate an experimental evaluation
  • Late summer/early fall 2001
• NIST will host a workshop (~October 2001)
  • to discuss research issues
  • to introduce the pilot corpus to the wider research community
  • to discuss evaluation metrics and the dry-run evaluation
  • to plan for future efforts (kickoff for larger DARPA program?)
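To give a feel for why data storage is flagged as a significant issue, here is a rough back-of-the-envelope sketch for uncompressed audio from the 59-mic array alone. Only the channel count comes from the slides; the 16 kHz sample rate, 16-bit samples, and one-hour meeting length are assumptions chosen for illustration, and the helper `raw_audio_bytes` is hypothetical. Video and the conventional microphones would add to this figure.

```python
def raw_audio_bytes(channels: int, sample_rate_hz: int,
                    bytes_per_sample: int, seconds: int) -> int:
    """Size in bytes of uncompressed PCM audio across all channels."""
    return channels * sample_rate_hz * bytes_per_sample * seconds


# 59 channels is from the slides; every other value is an assumed example.
array_bytes = raw_audio_bytes(
    channels=59,
    sample_rate_hz=16_000,   # assumption
    bytes_per_sample=2,      # assumption: 16-bit samples
    seconds=3600,            # assumption: a one-hour meeting
)
print(f"{array_bytes / 1e9:.1f} GB per hour of array audio")  # 6.8 GB
```

Under these assumed settings the array alone produces several gigabytes per meeting hour, before any video is stored.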
21st Century Automatic Speech Recognition: Meeting Rooms and Beyond
John Garofolo, John.Garofolo@nist.gov
NIST Speech Group: http://www.nist.gov/speech
NIST Smart Space Lab: http://www.nist.gov/smartspace/
ASR 2000, September 20, 2000