Automatic Speech Attribute Transcription (ASAT) • Project Period: 10/01/04 – 9/30/08 • The ASAT Team • Mark Clements (clements@ece.gatech.edu) • Sorin Dusan (sdusan@speech.rutgers.edu) • Eric Fosler-Lussier (fosler@cse.ohio-state.edu) • Keith Johnson (kjohnson@ling.ohio-state.edu) • Fred Juang (juang@ece.gatech.edu) • Larry Rabiner (lrr@caip.rutgers.edu) • Chin-Hui Lee (Coordinator, chl@ece.gatech.edu) • NSF HLC Program Director (mharper@nsf.gov)
ASAT Paradigm and SoW • Five work areas, detailed on the following slides: (1) Bank of Speech Attribute Detectors, (2) Event Merger, (3) Evidence Verifier, (4) Knowledge Sources, (5) Overall System Prototypes and Common Platform
Bank of Speech Attribute Detectors • Each detected attribute is represented by a time series (event) • An example: a frame-based detector whose output in [0, 1] approximates a posterior probability (see the sketch below) • ANN-based attribute detectors • An example: nasal and stop detectors • Sound-specific parameters and feature detectors • An example: voice onset time (VOT) for voiced/unvoiced stop discrimination • Biologically-motivated processors and detectors • Analog detectors, short-term and long-term detectors • Perceptually-motivated processors and detectors • Converting speech into neural activity level functions • Others?
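Not in the original slides: a minimal sketch of what a frame-based attribute detector's interface might look like, assuming a single linear layer with a sigmoid output so every frame yields a score in [0, 1] readable as an attribute posterior. All names, feature dimensions, and weights are illustrative placeholders, not the project's actual detectors.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def detect_attribute(frames, weights, bias=0.0):
    # frames: (T, D) per-frame acoustic features; returns a length-T
    # time series of attribute posteriors in [0, 1] -- one "event" track.
    return sigmoid(frames @ weights + bias)

# Toy usage: random features/weights stand in for trained values.
rng = np.random.default_rng(0)
frames = rng.normal(size=(100, 13))   # 100 frames of 13-dim features
weights = rng.normal(size=13)
nasal_track = detect_attribute(frames, weights)
print(nasal_track.shape)              # (100,) -- one score per frame
```

One detector of this form per attribute (nasal, stop, VOT-based voicing, ...) yields the bank of parallel time series the slide describes.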
An Example: More Visible than a Spectrogram? • [Figure: stop, nasal, and vowel detector tracks aligned with a Mandarin utterance (syllables j+ve d+ing z+ii j+i g+ong h+e g+uo d+e m+ing +vn)] • Early acoustic-to-linguistic mapping!
Event Merger • Merge multiple time series into another time series, maintaining the same detector output characteristics • Combine temporal events • An example: combining phones into words (word detectors) • Combine spatial events • An example: combining vowel and nasal features into nasalized vowels (both merges are sketched below) • Extreme case: build a 20K-word recognizer by implementing 20K keyword detectors • Others: OOV detection, partial recognition
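A hypothetical sketch of the two merge operations named above, under the simplifying assumption that spatial merging uses a product rule over independent detector outputs and temporal merging pools peak phone evidence over hypothesized word spans; the project's actual merger is not specified here.

```python
import numpy as np

def merge_spatial(track_a, track_b):
    # Combine two frame-synchronous tracks (e.g., vowel and nasal) into one
    # track for a compound attribute (e.g., nasalized vowel). The product
    # rule assumes the detectors are independent -- an illustrative choice.
    return track_a * track_b

def merge_temporal(phone_tracks, word_segments):
    # Score a word from the peak evidence of each constituent phone over
    # its hypothesized (start, end) frame span, averaged across phones.
    peaks = [phone_tracks[p][s:e].max() for p, (s, e) in word_segments]
    return float(np.mean(peaks))

# Toy usage with made-up tracks (50 frames, clipped to [0, 1]):
rng = np.random.default_rng(1)
vowel = np.clip(rng.normal(0.7, 0.2, 50), 0.0, 1.0)
nasal = np.clip(rng.normal(0.5, 0.2, 50), 0.0, 1.0)
nasalized_vowel = merge_spatial(vowel, nasal)           # spatial merge
word_score = merge_temporal({"m": nasal, "a": vowel},   # temporal merge
                            [("m", (0, 20)), ("a", (20, 50))])
```

Note the output is again a 0-1 track or score, preserving the same output characteristics as the underlying detectors so mergers can be stacked.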
Evidence Verifier • Provide confidence measures for events and evidence • Utterance verification algorithms can be used (a sketch follows below) • Output recognized evidence (words and others) • Hypothesis testing is needed at every stage • Prune event and evidence lattices • Pruning threshold decisions • Minimum verification error (MVE) verifiers • Many new theories can be developed • Others?
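A hedged sketch of an utterance-verification style confidence test, assuming a frame-averaged log-likelihood ratio between a target model and a competing anti-model, thresholded for lattice pruning; all scores, names, and the threshold value are made up for illustration.

```python
import numpy as np

def confidence(ll_target, ll_anti):
    # Per-event confidence as a frame-averaged log-likelihood ratio of the
    # target model against a competing (anti) model.
    return float(np.mean(ll_target - ll_anti))

def verify(events, threshold=0.0):
    # Hypothesis test: keep events whose confidence clears the pruning
    # threshold; everything else is pruned from the evidence lattice.
    return [e for e in events
            if confidence(e["ll_target"], e["ll_anti"]) > threshold]

# Toy usage: two hypothesized word events with made-up frame log-likelihoods.
rng = np.random.default_rng(2)
events = [
    {"word": "yes", "ll_target": rng.normal(-5, 1, 30), "ll_anti": rng.normal(-6, 1, 30)},
    {"word": "no",  "ll_target": rng.normal(-6, 1, 30), "ll_anti": rng.normal(-5, 1, 30)},
]
accepted = verify(events)   # most likely keeps only "yes"
```

In an MVE verifier the threshold (and the model parameters) would be trained to minimize verification errors rather than fixed by hand as here.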
Knowledge Sources: Definition & Evaluation • Explore the large body of speech science literature • Define training, evaluation, and testing databases • Develop an objective evaluation methodology • Define detectors, mergers, verifiers, and recognizers • Define/collect evaluation data for all of them • Document all pieces on the web
Prototype ASR Systems and Platform • Continuous Phone Recognition: TIMIT? • Continuous Speech Recognition • Connected digit recognition • Wall Street Journal • Switchboard? • Establishment of a collaborative platform • Implementing a divide-and-conquer strategy • Developing a user community
Summary • ASAT Goal: Go beyond the state of the art • ASAT Spirit: Work for team excellence • ASAT team member responsibilities • MAC (Mark Clements): Event Fusion • SD (Sorin Dusan): Perception-based processing • EF (Eric Fosler-Lussier): Knowledge Integration (Event Merger) • KJ (Keith Johnson): Acoustic Phonetics • BHJ (Fred Juang): Evidence Verifier • LRR (Larry Rabiner): Attribute Detector • CHL (Chin-Hui Lee): Overall coordination