Applications of Support Vector Machines to Speech Recognition Advisor: Dr. Hsu Graduate: Chun Kai Chen Authors: Aravind Ganapathiraju, Jonathan E. Hamaker and Joseph Picone IEEE 2004
Outline • Motivation • Objective • Introduction • Speech Recognition • Support Vector Machines • Experimental Results • Conclusions • Personal Opinion
Motivation • There are problems with a maximum likelihood (ML) formulation for applications such as speech recognition. • In higher-dimensional problems, an ML formulation will never achieve perfect classification.
Objective • Apply SVMs to overcome higher-dimensional problems and achieve perfect classification • Apply SVMs to large-vocabulary speech recognition • Develop and optimize an SVM/HMM hybrid system
Introduction • Speech Recognition • Speech Recognition Process • Hidden Markov Model • Application of SVMs to Speech Recognition: • Review the SVM approach • Discuss applications to speech recognition • Present experimental results
Speech Recognition Process (MFCC)
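The MFCC front end named above can be sketched end to end. This is a minimal illustration; the frame size, hop, FFT size, filterbank size, and 13 cepstral coefficients below are typical choices, not values taken from the paper:

```python
import numpy as np
from scipy.fft import dct

def mfcc(signal, sr=8000, frame_len=200, hop=80, nfft=256, n_filt=24, n_ceps=13):
    # Pre-emphasis to boost high frequencies
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Split into overlapping frames and apply a Hamming window
    n_frames = 1 + (len(sig) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = sig[idx] * np.hamming(frame_len)
    # Power spectrum of each frame
    pspec = np.abs(np.fft.rfft(frames, nfft)) ** 2 / nfft
    # Triangular filters spaced evenly on the mel scale
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = imel(np.linspace(mel(0.0), mel(sr / 2.0), n_filt + 2))
    bins = np.floor((nfft + 1) * pts / sr).astype(int)
    fbank = np.zeros((n_filt, nfft // 2 + 1))
    for i in range(1, n_filt + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log mel energies, then DCT to decorrelate -> cepstral coefficients
    logmel = np.log(pspec @ fbank.T + 1e-10)
    return dct(logmel, type=2, axis=1, norm='ortho')[:, :n_ceps]
```

Each row of the returned matrix is one 13-dimensional feature vector, one per 10 ms frame.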
Hidden Markov Model (1/2) • An HMM is a doubly stochastic process with an underlying stochastic process that is not observable (it is hidden) • It is described by a state-transition process • For speech modeling applications, the HMM is a generator of vector sequences.
Hidden Markov Model (2/2) • Finite-State Machine + Probability Process
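A sketch of this "finite-state machine plus probability" view: the forward algorithm scores an observation sequence against an HMM's transition matrix A, emission matrix B, and initial distribution pi. This minimal discrete-emission example stands in for the Gaussian-mixture HMMs actually used in speech:

```python
import numpy as np

def forward(A, B, pi, obs):
    """Forward algorithm: total likelihood of an observation sequence
    under an HMM with transitions A, emissions B, initial distribution pi."""
    alpha = pi * B[:, obs[0]]          # joint prob. of first obs and each state
    for t in obs[1:]:
        alpha = (alpha @ A) * B[:, t]  # propagate through transitions, then emit
    return alpha.sum()                 # marginalize over the hidden final state
```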
Problems with HMMs • Maximum likelihood (ML) • the standard criterion for estimating the model parameters • Expectation–maximization (EM) • an estimation procedure with good convergence properties, although it does not guarantee finding the global maximum • Problems with an ML formulation • it will never achieve perfect classification
SVM • The goal of support vector classification is to find a separating hyperplane in a high-dimensional feature space; this hyperplane gives the optimal margin between the classes. • ERM and SRM can be used to find a good hyperplane • ERM: empirical risk minimization • can be used to find a good hyperplane, although it does not guarantee a unique solution • SRM: structural risk minimization • helps choose the best hyperplane by ordering the hyperplanes based on their margin • Real-world classification problems • ANNs • attempt to overcome many of these problems • but suffer from slow convergence during training and a tendency to overfit the data.
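As a rough illustration of margin-based training, here is a linear SVM fit by subgradient descent on the hinge loss. This is a pedagogical stand-in for the quadratic-programming solvers used in practice, not the paper's training procedure:

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Minimize 0.5*||w||^2 + C * sum(hinge losses) by subgradient descent.
    y must contain labels in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        mask = margins < 1  # points inside or on the wrong side of the margin
        grad_w = w - C * (y[mask, None] * X[mask]).sum(axis=0)
        grad_b = -C * y[mask].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```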
Kernels • Allow a dot product to be computed in a higher-dimensional space • Linear • Polynomial • Radial basis function (RBF) • slower than polynomial kernels, but often gives better performance • Sigmoid
One-against-all method • yi are the class assignments • w is the weight vector defining the classifier • b is a bias term • εi are the slack variables
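These quantities enter the standard soft-margin SVM optimization problem, which in the usual notation reads:

```latex
\min_{w,\,b,\,\varepsilon}\ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \varepsilon_i
\quad \text{subject to} \quad
y_i\,(w^\top x_i + b) \ge 1 - \varepsilon_i,\qquad \varepsilon_i \ge 0,\quad i = 1,\dots,N.
```

Maximizing the margin while paying a penalty C per unit of slack trades model simplicity against training errors; in the one-against-all scheme, one such problem is solved per class.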
Applications to speech recognition • Hybrid approaches • SVMs cannot model the temporal structure of speech effectively • so we still need the HMM structure to model temporal evolution • The NN (here, the SVM) is used only to estimate posterior probabilities
Several issues arise • Posterior estimation • Segmental Modeling • N-best List Rescoring
Posterior estimation • There is significant overlap in the feature space. • SVMs provide a distance, or discriminant, that can be used to compare classifiers. • Main concern in using SVMs • the lack of a clear relationship between the distance from the margin and the posterior class probability • We used a sigmoid distribution to map the output distances to posteriors
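The sigmoid mapping can be sketched as follows. The parameter values A and B here are hypothetical; in practice they are fit on held-out data, as in Platt scaling:

```python
import numpy as np

def distance_to_posterior(d, A=-2.0, B=0.0):
    """Map an SVM output distance d to a posterior estimate via a sigmoid.
    A and B are illustrative; they would normally be estimated from data."""
    return 1.0 / (1.0 + np.exp(A * d + B))
```

With A negative, larger (more confident) distances map to posteriors closer to 1, and a distance of zero, i.e. a point on the decision boundary, maps to 0.5.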
Segmental Modeling (1/2) • At the frame level it is still not computationally feasible to train on all the data available in large corpora. • In our work, we have chosen to use a segment-based approach to avoid these issues. • Segmental data takes better advantage of the correlation in adjacent frames of speech data. • A related problem is the variable-length, or duration, problem.
Segmental Modeling (2/2) • A simple but effective approach motivated by the three-state HMMs is to assume that the segments are composed of a fixed number of sections. • The first and third sections model the transition into and out of the segment • The second section models the stable portion of the segment
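A minimal sketch of this fixed-section composition, assuming a 0.3/0.4/0.3 split across the three sections (the exact proportions are an assumption) and mean-pooling of the frames within each section:

```python
import numpy as np

def segment_features(frames, proportions=(0.3, 0.4, 0.3)):
    """Map a variable-length run of frames (n_frames x dim) to a fixed-length
    vector: split into transition-in / stable / transition-out sections and
    average the frames in each. The split proportions are an assumption."""
    n = len(frames)
    b1 = max(1, int(round(proportions[0] * n)))
    b2 = max(b1 + 1, int(round((proportions[0] + proportions[1]) * n)))
    b2 = min(b2, n - 1)  # keep the third section non-empty
    sections = [frames[:b1], frames[b1:b2], frames[b2:]]
    return np.concatenate([s.mean(axis=0) for s in sections])
```

Whatever the segment's duration, the output dimension is fixed at three times the frame dimension, which is what a fixed-input classifier like an SVM requires.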
N-best List Rescoring • Generate N-best lists using the HMM system • Produce an alignment for each hypothesis in the N-best list using the HMM system • Generate segment-level feature vectors from these alignments • Reorder the N-best list based on the likelihood; the top hypothesis is used to calibrate the performance of the system
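The rescoring step can be sketched as follows. The interpolation weight and the svm_score callback are hypothetical stand-ins for the system's actual score combination:

```python
def rescore_nbest(nbest, svm_score, weight=0.5):
    """Re-rank an N-best list of (hypothesis, hmm_log_likelihood) pairs by
    interpolating the HMM score with an SVM-derived score. The weight and
    the svm_score function are illustrative, not the paper's values."""
    scored = [(weight * hmm_ll + (1 - weight) * svm_score(hyp), hyp)
              for hyp, hmm_ll in nbest]
    scored.sort(key=lambda t: t[0], reverse=True)  # best combined score first
    return [hyp for _, hyp in scored]
```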
Experimental Results • The Deterding vowel data • a simple but popular static classification task • used to benchmark nonlinear classifiers • Spoken Letters and Numbers • spoken letters and numbers collected over long-distance telephone lines • OGI Alphadigits (AD) • confusable pairs in telephone-quality speech (e.g. “p” vs. “b”)
Conclusions • A support vector machine was applied as a classifier in a continuous speech recognition system. • A hybrid SVM/HMM system has been developed. • The results obtained in the experiments clearly indicate the classification power of SVMs and affirm the use of SVMs for acoustic modeling. • Further research into the segmentation issue is needed.
Personal Opinion • I need to study more and more… and I wish God could give me more time