
“Hidden Markov Models for Speech Recognition” by B.H. Juang and L.R. Rabiner

Explore the evolution of Automated Speech Recognition (ASR) from the 1950s to present, focusing on the use of Hidden Markov Models. Learn about the signal processing, feature extraction, and detection engines involved in ASR systems.




Presentation Transcript


  1. “Hidden Markov Models for Speech Recognition” by B.H. Juang and L.R. Rabiner • Papers We Love Bucharest • Stefan Alexandru Adam • 9th of November 2015 • TechHub

  2. The history of Automated Speech Recognition (ASR) • 1950-1960: Baby Talk (only digits) • 1970s: Speech Recognition Takes Off (vocabularies of around 1,011 words) • 1980s: Big Revolution – the prediction-based approach; discovery of the use of HMMs in Speech Recognition • See “Automatic Speech Recognition—A Brief History of the Technology Development” by B.H. Juang and Lawrence R. Rabiner • 1990s: Automatic Speech Recognition Comes to the Masses • 2000s: computer accuracy topped out at 80% • Now: Siri, Google Voice, Cortana

  3. Speech signal • The voice band is 0–4000 Hz • In telephony it is 300–3400 Hz • According to the Nyquist–Shannon sampling theorem, the sampling rate should be at least 8000 Hz (twice the signal bandwidth) • Phoneme – the smallest sound unit of a spoken language

  4. ASR Model • Pipeline: Voice signal → Preprocessing → Audio Signal → Features Extraction → Features Vector → Detection Engine → Words

  5. ASR Model – Preprocessing • Pipeline: Audio Signal → Noise Cancelation → Voice Activity Detection → (if voice) Pre-emphasis • Pre-emphasis filter: y[n] = x[n] − α·x[n−1], with α = 0.97 • It is assumed that the initial part of the signal (usually 200 ms) is noise and silence, so the voice-activity threshold is computed from the first 200 ms • The threshold usually depends on signal energy, power and zero-crossing rate
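The preprocessing step above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, the noise-threshold factor, and the frame layout are assumptions; only the α = 0.97 pre-emphasis filter and the "first 200 ms are noise" calibration come from the slide.

```python
# Sketch of preprocessing: pre-emphasis plus an energy-based voice
# activity detector (VAD) calibrated on the initial noise-only frames.
# Names and the `factor` parameter are illustrative assumptions.

def pre_emphasis(signal, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]; boosts the high frequencies."""
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def frame_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in frame)

def simple_vad(frames, noise_frames, factor=3.0):
    """Flag a frame as speech if its energy exceeds a threshold
    estimated from the noise-only frames (first ~200 ms)."""
    threshold = factor * max(frame_energy(f) for f in noise_frames)
    return [frame_energy(f) > threshold for f in frames]
```

A real detector would also use the zero-crossing rate mentioned on the slide; energy alone is the simplest variant.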

  6. ASR Model – Features Extraction: Frame Blocking and Windowing • Frame Blocking: each frame is K samples long and overlaps the previous one by P samples • Windowing: in order to remove discontinuities at the frame edges, a Hamming window is applied
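Frame blocking and windowing can be sketched directly from the slide's description; the function names are illustrative and the standard Hamming window formula is assumed:

```python
import math

def frame_signal(signal, K, P):
    """Split the signal into frames of K samples, each overlapping
    the previous one by P samples (hop size K - P)."""
    step = K - P
    return [signal[i:i + K] for i in range(0, len(signal) - K + 1, step)]

def hamming(K):
    """Hamming window: w[n] = 0.54 - 0.46*cos(2*pi*n / (K - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (K - 1))
            for n in range(K)]

def window_frame(frame):
    """Taper the frame edges toward zero to remove discontinuities."""
    return [s * w for s, w in zip(frame, hamming(len(frame)))]
```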

  7. ASR Model – Features Extraction • Extract the sensitive information from the signal • This information is represented as a vector of doubles • There are three main approaches: • Linear Prediction Coding (LPC) • Mel Frequency Cepstral Coefficients (MFCC) • Perceptual Linear Prediction (PLP)

  8. ASR Model – Features Extraction – LPC • The idea is to determine the transfer function of the vocal tract • A speech sample s(n) at time n can be modelled as a linear combination of the past p speech samples: s(n) ≈ a1·s(n−1) + a2·s(n−2) + … + ap·s(n−p) • Compute the autocorrelation of the frame, then solve for the coefficients with the Levinson–Durbin recursion • Usually p = 13
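The two steps named on the slide, autocorrelation followed by the Levinson–Durbin recursion, can be sketched as below. This is a textbook formulation, not code from the paper; function names and the sign convention (predictor coefficients such that s(n) ≈ Σ a_i·s(n−i)) are assumptions.

```python
def autocorrelation(signal, max_lag):
    """r[k] = sum_n s[n] * s[n-k] for k = 0..max_lag."""
    N = len(signal)
    return [sum(signal[n] * signal[n - k] for n in range(k, N))
            for k in range(max_lag + 1)]

def levinson_durbin(r, p):
    """Solve the Yule-Walker equations for the p LPC coefficients.
    Returns (coefficients a[1..p], final prediction error)."""
    a = [0.0] * (p + 1)
    e = r[0]
    for i in range(1, p + 1):
        # Reflection coefficient for order i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / e
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        e *= (1.0 - k * k)
    return a[1:], e
```

With r[k] = 0.5^k (an ideal first-order process) the recursion recovers a1 = 0.5 and a2 = 0, as expected.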

  9. ASR Model – Features Extraction – MFCC • Mel scale • A perceptual scale of pitches judged by listeners to be equal in distance from one another • Converts hertz into mel: m = 2595·log10(1 + f/700)

  10. ASR Model – Features Extraction – MFCC Mel Filter banks • Steps: • Establish a lower and a higher frequency (typically 0–4000 Hz) • Convert this interval to a mel interval • Divide the mel interval into equal parts (typically 26–40 filter banks) • Convert the boundaries back to hertz • Define the triangular filter bank functions
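The five steps above can be sketched as follows. The mel conversion uses the common 2595·log10(1 + f/700) formula; the FFT size, the mapping of hertz to FFT bins, and the triangular filter shape are conventional choices assumed here, not details from the paper.

```python
import math

def hz_to_mel(f):
    """Hertz to mel: m = 2595 * log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate, f_low=0.0, f_high=4000.0):
    """Triangular filters spaced uniformly on the mel scale.
    Each filter is a vector over the n_fft//2 + 1 FFT magnitude bins."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    # n_filters + 2 equally spaced mel points (edges + centers).
    mels = [lo + i * (hi - lo) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    banks = []
    for j in range(1, n_filters + 1):
        fb = [0.0] * (n_fft // 2 + 1)
        for k in range(bins[j - 1], bins[j]):          # rising edge
            fb[k] = (k - bins[j - 1]) / max(1, bins[j] - bins[j - 1])
        for k in range(bins[j], bins[j + 1]):          # falling edge
            fb[k] = (bins[j + 1] - k) / max(1, bins[j + 1] - bins[j])
        banks.append(fb)
    return banks
```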

  11. ASR Model – Features Extraction – MFCC Mel Filter banks

  12. ASR Model – Features Extraction – MFCC • Features Extraction flow: windowed frame → FFT → mel-scaled filter bank → log of the filter bank energies → DCT → keep only the first values (usually 13) • The signal energy E and the Δ and ΔΔ coefficients are usually included in the output
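The last step of the flow, turning log filter-bank energies into cepstral coefficients, can be sketched with a type-II DCT. This is the conventional MFCC recipe assumed here (the paper may differ in normalization); function names are illustrative.

```python
import math

def dct2(x):
    """Type-II DCT (unnormalized): decorrelates the log
    filter-bank energies into cepstral coefficients."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (n + 0.5) / N)
                for n in range(N))
            for k in range(N)]

def mfcc_from_log_energies(log_energies, n_coeffs=13):
    """Keep only the first n_coeffs DCT values, as on the slide."""
    return dct2(log_energies)[:n_coeffs]
```

For a constant input all the information lands in coefficient 0 and the rest vanish, which is exactly the decorrelation the DCT is used for.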

  13. ASR – Model – Detection Engine • Pattern matching • Hidden Markov Models • Neural Networks

  14. Markov Model • A Markov model is used to model randomly changing systems where future states depend only on the present state; such a state sequence is called a Markov process • S = set of states {S1, S2, …, Sn} • Π = initial state probabilities: πi = P(q1 = Si) • A = transition probabilities: aij = P(qt+1 = Sj | qt = Si)
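Under the Markov assumption on the slide, the probability of a whole state sequence factorizes into the initial probability times the chain of transition probabilities, which is a one-liner to compute (the function name is illustrative):

```python
def sequence_probability(pi, A, states):
    """P(q1,...,qT) = pi[q1] * prod_t A[q_{t-1}][q_t]
    under the first-order Markov assumption."""
    p = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[prev][cur]
    return p
```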

  15. Hidden Markov Model • V = output alphabet = {v1, v2, …, vm} • B = output emission probabilities: bj(k) = P(ot = vk | qt = Sj) • Notice that B doesn’t depend on time • The states of the system are hidden and we observe only the emissions

  16. Hidden Markov Model • t = 1; start in state Si with probability πi (i.e., q1 = Si); • forever do: • move from state Si to state Sj with probability aij (i.e., qt+1 = Sj); • emit observation symbol vk with probability bj(k); • t = t + 1;
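The generative procedure on the slide can be sketched directly. This is an illustrative sampler, not code from the paper; the seeded generator just makes the example reproducible.

```python
import random

def sample_hmm(pi, A, B, T, rng=random.Random(0)):
    """Generate T observation symbols from a discrete HMM:
    pick a start state from pi, then repeatedly emit a symbol
    via B and transition via A."""
    def draw(probs):
        # Inverse-CDF sampling from a discrete distribution.
        r, acc = rng.random(), 0.0
        for i, p in enumerate(probs):
            acc += p
            if r < acc:
                return i
        return len(probs) - 1

    state = draw(pi)
    observations = []
    for _ in range(T):
        observations.append(draw(B[state]))
        state = draw(A[state])
    return observations
```

With degenerate (0/1) probabilities the chain is deterministic, which makes the hidden dynamics easy to see: a two-state left-right loop emits its two symbols alternately.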

  17. Hidden Markov Model – Questions • Given the first t emissions, what is the probability of the system being in state Si at time t? • Knowing the current state, what is the probability of observing the following n emissions? • Knowing the emissions sequence, what is the most likely states sequence? • Knowing the emissions sequence o1…on, what is the probability of the system being in state Si at time k, where k < n?

  18. Hidden Markov Model • For each question there is a separate dynamic programming algorithm: • 1. Forward algorithm (recursive computation) • 2. Backward algorithm (recursive computation) • 3. Viterbi algorithm • 4. Forward–Backward algorithm, based on 1 and 2 • Baum–Welch algorithm (training), based on 4
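The first of these, the forward algorithm, fits in a few lines and also gives P(O | model), the quantity the detection engine later maximizes. A minimal sketch with illustrative names, without the scaling a production implementation would need to avoid underflow:

```python
def forward(pi, A, B, observations):
    """Forward algorithm: P(O | model) by dynamic programming.
    alpha[i] at step t is P(o_1..o_t, q_t = S_i)."""
    N = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1).
    alpha = [pi[i] * B[i][observations[0]] for i in range(N)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1}).
    for o in observations[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o]
                 for j in range(N)]
    # Termination: P(O) = sum_i alpha_T(i).
    return sum(alpha)
```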

  19. ASR Model – HMM Detection Engine • Flow: {training features} → Vector Quantization → Codebook → Learning a Discrete HMM → Word/phoneme Detecting

  20. ASR Model – HMM Detection Engine: Vector Quantization • Apply vector quantization and define the Codebook • Usually the Codebook has several hundred entries • Apply the K-Means algorithm

  21. ASR Model – HMM Detection Engine: Training • Given an HMM, record samples for a specific word or phoneme • Using the quantized features extracted, train the model with Viterbi or Baum–Welch • The quantized features play the role of an observations sequence • Usually the number of training samples is 10; the samples can be grouped in two categories (man and woman) • Label (tag) the model with the corresponding word or phoneme

  22. ASR Model – HMM Detection Engine: Detection • Given a quantized features sequence, iterate through each existing HMM and find out which one yields the maximum probability • Detection properties: • Detection is very fast (and easy to parallelize) • Accuracy is about 80% in the base implementation • Limitation: it doesn’t take the observation duration into consideration
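The detection loop described above is a simple argmax over the trained models, each scored with the forward algorithm. A minimal sketch with illustrative names; a practical version would compare log-probabilities and could score the models in parallel, since each is independent:

```python
def forward_prob(pi, A, B, observations):
    """P(O | model) via the forward algorithm."""
    N = len(pi)
    alpha = [pi[i] * B[i][observations[0]] for i in range(N)]
    for o in observations[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o]
                 for j in range(N)]
    return sum(alpha)

def detect(models, observations):
    """models: dict mapping a word/phoneme label to its (pi, A, B) HMM.
    Returns the label whose model assigns the observation sequence
    the highest probability."""
    return max(models, key=lambda label: forward_prob(*models[label],
                                                      observations))
```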

  23. ASR Model – HMM Detection Engine: Demo • https://github.com/AdamStefan/Speech-Recognition
