From Recognition To Understanding: Expanding the traditional scope of signal processing. Li Deng, Microsoft Research, Redmond, WA. Presented at the Banff Workshop, July 2009
Outline • Traditional scope of signal processing: “signal” dimension and “processing/task” dimension • Expansion along both dimensions • “signal” dimension • “task” dimension • Case study on the “task” dimension • From speech recognition to speech understanding • Three benefits for MMSP research
Signal Processing Constitution • “… The Field of Interest of the Society shall be the theory and application of filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals by digital or analog devices or techniques. The term ‘signal’ includes audio, video, speech, image, communication, geophysical, sonar, radar, medical, musical, and other signals…” (ARTICLE II) • Translate to a “matrix”: “Processing type” (row) vs. “Signal type” (column)
Speech Understanding: Case Study(Yaman, Deng, Yu, Acero: IEEE Trans ASLP, 2008) • Speech understanding: not to get “words” but to get “meaning/semantics” (actionable by the system) • Speech utterance classification as a simple form of speech “understanding” • Case study: ATIS domain (Airline Travel Info System) • “Understanding”: want to book a flight? or get info about ground transportation in SEA?
Traditional Approach to Speech Understanding/Classification • Goal: find the most likely semantic class for the rth acoustic signal • 1st stage: speech recognition (Automatic Speech Recognizer, driven by an Acoustic Model and a Language Model) • 2nd stage: semantic classification (Semantic Classifier, driven by a Classifier Model and Feature Functions)
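The two-stage pipeline above can be sketched as follows. This is a toy illustration only: `recognize` and `classify` are hypothetical stand-ins (a table lookup and a score maximizer), not the SAPI recognizer or the paper's max-entropy classifier.

```python
# Toy sketch of the traditional two-stage pipeline (hypothetical models,
# not the actual ASR/classifier used in the case study).

def recognize(signal_id, lexicon):
    """1st stage: return the single best word string (toy: table lookup)."""
    return lexicon.get(signal_id, "")

def classify(word_string, class_scores):
    """2nd stage: pick the semantic class whose model scores the words highest."""
    return max(class_scores, key=lambda c: class_scores[c].get(word_string, 0.0))

# Toy data: signal id -> transcript, and class -> {transcript: score}.
lexicon = {"utt1": "show flights to seattle"}
class_scores = {
    "FLIGHT": {"show flights to seattle": 0.9},
    "GROUND_TRANSPORT": {"show flights to seattle": 0.1},
}

words = recognize("utt1", lexicon)
semantic_class = classify(words, class_scores)
print(semantic_class)  # FLIGHT
```

Note that each stage optimizes its own criterion in isolation, which is exactly the weakness the integrated design addresses.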
Traditional/New Approach • Word error rate minimized in the 1st stage, • Understanding error rate minimized in the 2nd stage. • Lower word errors do not necessarily mean better understanding. • The new approach: integrate the two stages so that the overall “understanding” errors are minimized.
New Approach: Integrated Design • Pipeline: Automatic Speech Recognizer → N-best List → N-best List Rescoring using Semantic Classifier & LM → Training • Models: Acoustic Model, Language Model, Classifier Model, Feature Functions • Key Components: • Discriminative Training • N-best List Rescoring • Iterative Update of Parameters
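The rescoring step can be sketched as below. The weights `alpha`, `beta` and the toy scores are assumptions for illustration; the paper's actual integrative score combines acoustic, language-model, and classifier terms with trained parameters.

```python
# Sketch of N-best rescoring with an integrative score (hypothetical
# weights and toy scores, not the paper's exact formulation).

def rescore(nbest, lm_score, clf_score, classes, alpha=1.0, beta=1.0):
    """Pick the class maximizing a weighted sum of acoustic, LM, and
    classifier scores over all (class, sentence) pairs in the N-best list."""
    best = None
    for c in classes:
        for words, acoustic in nbest:
            total = acoustic + alpha * lm_score(words) + beta * clf_score(words, c)
            if best is None or total > best[0]:
                best = (total, c, words)
    return best[1]  # the winning semantic class

# Toy N-best list: (word string, acoustic log-score).
nbest = [("show flights to seattle", -5.0), ("show lights to seattle", -4.5)]
lm = {"show flights to seattle": -2.0, "show lights to seattle": -6.0}
clf = {("show flights to seattle", "FLIGHT"): 2.0,
       ("show lights to seattle", "FLIGHT"): 0.0,
       ("show flights to seattle", "GROUND"): 0.0,
       ("show lights to seattle", "GROUND"): 0.5}

winner = rescore(nbest, lambda w: lm[w], lambda w, c: clf[(w, c)],
                 ["FLIGHT", "GROUND"])
print(winner)  # FLIGHT
```

Here the second hypothesis has the better acoustic score, but the combined score still selects the correct class via the first hypothesis.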
Classification Decision Rule using N-Best List • Exact rule: sum over all possible word strings W • Approximation: maximize over W in the N-best list • The resulting quantity is the integrative score
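The slide's equations did not survive extraction; a standard form of this approximation, with notation assumed (X the acoustic signal, W a word string, C a semantic class), is:

```latex
\hat{C} = \arg\max_{C} \sum_{W} P(C \mid W)\, P(W \mid X)
\;\approx\; \arg\max_{C} \max_{W \in \mathrm{NBest}(X)} P(C \mid W)\, P(W \mid X)
```

The max over the N-best list makes the rule tractable while keeping the recognizer and classifier scores in a single integrative quantity.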
An Illustrative Example • The hypothesis with the best score yields the wrong class • The best sentence to yield the correct class has a low score, and integrated rescoring can recover it
Minimizing the Misclassifications • Define a misclassification function for each utterance • Pass it through a smooth loss function • Minimize the total loss over the training set
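The equations on this slide were lost in extraction; the standard minimum-classification-error (MCE) formulation they correspond to, with symbols assumed (g_C the discriminant score for class C, C_r the correct class of the rth utterance, γ a slope parameter), is:

```latex
d_r(X_r) = -\,g_{C_r}(X_r) + \max_{C \neq C_r} g_{C}(X_r), \qquad
\ell(d_r) = \frac{1}{1 + e^{-\gamma d_r}}, \qquad
L = \sum_{r} \ell\big(d_r(X_r)\big)
```

A positive d_r means a misclassification; the sigmoid loss ℓ smooths the 0/1 error so L can be minimized by gradient methods.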
Discriminative Training of Language Model Parameters • Find the language model probabilities that minimize the total classification loss • The update involves: a weighting factor, the count of each bigram in the word string of the correct class, and the count of the same bigram in the word string of the nth competitive class
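The bigram-count-difference structure of the update can be sketched as below. The learning rate and loss weight are hypothetical, and this gradient-ascent toy is not the paper's exact update rule; it only shows how correct-class bigrams are promoted and competitor bigrams demoted.

```python
# Sketch of a discriminative bigram-LM update (hypothetical learning rate
# and loss weight; the paper's exact rule is not reproduced here).

def bigram_counts(words):
    """Count bigrams in a word string."""
    toks = words.split()
    counts = {}
    for a, b in zip(toks, toks[1:]):
        counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts

def update_lm(log_probs, correct_words, competitor_words, weight, lr=0.1):
    """Raise bigram log-probs from the correct class's word string and
    lower those from the competitor's, scaled by the loss weight."""
    pos, neg = bigram_counts(correct_words), bigram_counts(competitor_words)
    for bg in set(pos) | set(neg):
        grad = weight * (pos.get(bg, 0) - neg.get(bg, 0))
        log_probs[bg] = log_probs.get(bg, 0.0) + lr * grad
    return log_probs

lm = update_lm({}, "flights to seattle", "lights to seattle", weight=0.5)
print(lm[("flights", "to")] > lm[("lights", "to")])  # True
```

Bigrams shared by both word strings (here "to seattle") cancel, so only the discriminating bigrams move.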
Discriminative Training of Semantic Classifier Parameters • Find the classifier model parameters that minimize the total classification loss, with the same weighting factor applied to each training utterance
Setup for the Experiments • ATIS II+III data is used: • 5798 training wave files • 914 test wave files • 410 development wave files (used for parameter tuning & stopping criteria) • Microsoft SAPI 6.1 speech recognizer is used. • MCE classifiers are built on top of max-entropy classifiers.
Experiments: Baseline System Performance • ASR transcription: • One-best matching sentence, W. • Classifier Training: • Max-entropy classifiers using one-best ASR transcription. • Classifier Testing: • Max-entropy classifiers using one-best ASR transcription.
Experimental Results • One iteration of training consists of: discriminative LM training, max-entropy classifier training, and discriminative classifier training • Evaluation pipeline: speech utterance → SAPI SR → classifiers → classification error rate (CER)
From Recognition to Understanding • This case study illustrates that joint design of the "recognition" and "understanding" components is beneficial • Drawn from the speech research area • Does speech translation support a similar conclusion? • Case studies from image/video research areas? Image recognition/understanding?
Summary • The "matrix" view of signal processing • "Signal type" as the column • "Task type" as the row • Benefit 1: Natural extension of the "column" elements (e.g., text/language as new signal types) & of the "row" (e.g., understanding as a new task) • Benefit 2: Cross-column breeding: e.g., can speech/audio and image/video recognition researchers learn from each other in terms of machine learning & SP techniques (similarities & differences)? • Benefit 3: Cross-row breeding: e.g., given the trend from speech recognition to understanding (& the kind of approach in the case study), what can we say about image/video and other media understanding?