LECTURE 28: STATE OF THE ART • Objectives: Language Modeling in ASR; Discriminative Feature Mapping; Example System; Course Evaluations • Resources: MB: Unsupervised LM Adaptation; RS: Statistical Language Modeling; DP: Discriminatively Trained Features; AS: Discriminative Adaptation; IBM: GALE Mandarin • URL: .../publications/courses/ece_8423/lectures/current/lecture_28.ppt • MP3: .../publications/courses/ece_8423/lectures/current/lecture_28.mp3
Speech Recognition Architectures • Core components: transduction, feature extraction, acoustic modeling (hidden Markov models), language modeling (statistical N-grams), search (Viterbi beam), and knowledge sources. • Our focus will be on the acoustic modeling components of the system. A minimal sketch of how these components combine follows.
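The components above combine through the Bayes decision rule: the decoder searches for the word sequence W maximizing P(A|W)·P(W), the product of the acoustic-model and language-model scores. Below is a minimal Python sketch of that combination; the model callables and the explicit candidate list are hypothetical placeholders, since a real system uses Viterbi beam search over an HMM lattice rather than enumerating hypotheses.

```python
import math

def decode(features, candidate_word_sequences, acoustic_model, language_model):
    """Pick argmax_W [ log P(A|W) + log P(W) ] over an explicit candidate list.

    acoustic_model(features, words) and language_model(words) are assumed to
    return log-probabilities; both are hypothetical stand-ins for real models.
    """
    best_seq, best_score = None, -math.inf
    for words in candidate_word_sequences:
        score = acoustic_model(features, words) + language_model(words)
        if score > best_score:
            best_seq, best_score = words, score
    return best_seq
```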
Statistical Language Modeling: N-Gram Models • The probability of a word sequence, $w_1^n = w_1, w_2, \ldots, w_n$, can be decomposed as: $P(w_1^n) = P(w_1) \prod_{i=2}^{n} P(w_i \mid w_1^{i-1})$ • Clearly, estimating this for every unique word history is prohibitive. A practical approach is to assume this probability depends only on an equivalence class of the history: $P(w_i \mid w_1^{i-1}) \approx P(w_i \mid \Phi(w_1^{i-1}))$ • There are three common simplifications, known as N-grams, we can make: unigram, $P(w_i)$; bigram, $P(w_i \mid w_{i-1})$; and trigram, $P(w_i \mid w_{i-2}, w_{i-1})$. • Of course, there are many other ways to merge histories, such as those based on linguistic context (e.g., parts of speech such as article or noun), and we can use higher-order N-grams.
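A short sketch of maximum-likelihood bigram estimation, $P(w_i \mid w_{i-1}) = c(w_{i-1}, w_i)/c(w_{i-1})$, may make this concrete. The function name and sentence-boundary tokens are illustrative choices, not part of the lecture; note the estimates are unsmoothed, which motivates the sparseness discussion below.

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Maximum-likelihood bigram model: P(w|h) = c(h, w) / c(h)."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]      # hypothetical boundary markers
        unigrams.update(tokens[:-1])            # history counts c(h)
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # pair counts c(h, w)
    return lambda w, h: bigrams[(h, w)] / unigrams[h] if unigrams[h] else 0.0

p = train_bigram_lm([["the", "cat", "sat"], ["the", "dog", "sat"]])
print(p("cat", "the"))  # 0.5: "the" is followed by "cat" in 1 of 2 cases
```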
MAP Language Model (LM) Adaptation • The LM adaptation problem is often described as an interpolation problem between an existing (background) LM and an LM estimated from new (adaptation) data. • Any of the approaches we have previously discussed can be employed. MAP adaptation can be shown to simplify to a count-merging form: $\hat{P}(w \mid h) = \dfrac{c_A(h,w) + \tau_h P_B(w \mid h)}{c_A(h) + \tau_h}$, where $c_A(\cdot)$ are counts from the adaptation data, $P_B$ is the background LM, and $\tau_h$ controls the strength of the prior. • If additional assumptions about the priors for the histories are made, this simplifies further to linear interpolation: $\hat{P}(w \mid h) = \lambda P_B(w \mid h) + (1-\lambda) P_A(w \mid h)$ • Most of the adaptation methods we have discussed previously can be applied to this problem because a language model, at its core, is just a likelihood model. • However, language models must also deal with the problem of unseen events, and hence models must be smoothed to account for sparseness of data.
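A minimal sketch of the count-merging update, assuming the standard Dirichlet-prior formulation of MAP LM adaptation (the function and argument names are hypothetical):

```python
def map_adapt_bigram(c_adapt_pair, c_adapt_hist, p_background, tau=10.0):
    """MAP bigram estimate: blend adaptation counts with a background LM.

    c_adapt_pair:  dict mapping (h, w) -> count in the adaptation data
    c_adapt_hist:  dict mapping h -> history count in the adaptation data
    p_background:  callable (w, h) -> background LM probability P_B(w|h)
    tau:           prior strength; assumed constant here rather than per-history
    """
    def p(w, h):
        return (c_adapt_pair.get((h, w), 0) + tau * p_background(w, h)) / \
               (c_adapt_hist.get(h, 0) + tau)
    return p
```

With $\tau_h$ chosen per history in proportion to the counts, this reduces to the linear interpolation form given above.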
Discriminatively-Trained Features • Features can also be adapted using a transformational approach similar to the one we used for Gaussian means: $y_t = x_t + M h_t$, where $h_t$ represents a transformation of the features into a high-dimensional space and $M$ represents a dimensionality-reducing transformation back to the feature space. This approach combines large-margin classification ideas (e.g., support vector machines) with traditional GMM approaches. • The transformation $M$ is typically estimated using a minimum phone error (MPE) criterion, and hence this method is often called fMPE.
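A sketch of the forward mapping $y_t = x_t + M h_t$, assuming $h_t$ is a vector of Gaussian posterior probabilities (a common choice in fMPE); the shapes and the zero-initialized $M$ are illustrative, since in practice $M$ is trained discriminatively with the MPE criterion.

```python
import numpy as np

def fmpe_transform(x, posteriors, M):
    """fMPE-style feature mapping: y_t = x_t + M h_t.

    x:          (T, d) original features
    posteriors: (T, D) high-dimensional Gaussian posteriors, D >> d
    M:          (d, D) projection back to the feature dimension
    """
    return x + posteriors @ M.T

# Toy shapes: 100 frames, 39-dim features, 1000 Gaussian posteriors.
x = np.random.randn(100, 39)
h = np.random.rand(100, 1000)
h /= h.sum(axis=1, keepdims=True)  # each frame's posteriors sum to one
M = np.zeros((39, 1000))           # placeholder; trained via MPE in practice
y = fmpe_transform(x, h, M)
print(y.shape)  # (100, 39): transformed features keep the original dimension
```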
Summary • Discussed adaptation of language models and showed that the process is similar to that for feature vectors. • Discussed feature-space adaptation (fMPE). • Reviewed a state-of-the-art system that uses many forms of adaptation. • Course evaluations…