The HTK Book (for HTK Version 3.2.1) Young et al., 2002
Chapter 1: The Fundamentals of HTK • HTK is a toolkit for building hidden Markov models (HMMs). • It is primarily used to build automatic speech recognition (ASR) systems, but also other HMM-based systems: speaker recognition, image recognition, automatic text summarization, etc. • HTK has tools (modules) for both training and testing HMM systems.
How to Train and Test an ASR? • Things needed: a labeled speech corpus and a dictionary (plus a grammar). • Procedure: • 1. Divide the corpus into training, development and test sets. • 2. Train the acoustic models on the training set. • 3. Test on the development set, retrain, and test again until the results are satisfactory. • 4. Test once on the held-out test set.
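Step 1 of the procedure can be sketched as a reproducible random split. This is only an illustration: the function name `split_corpus`, the 80/10/10 proportions and the utterance IDs are assumptions, not something the slides prescribe.

```python
import random


def split_corpus(utterance_ids, dev_fraction=0.1, test_fraction=0.1, seed=0):
    """Split utterance IDs into disjoint training, development and test sets.

    The fractions and the seed are illustrative assumptions.
    """
    ids = sorted(utterance_ids)          # sort first so the split is reproducible
    random.Random(seed).shuffle(ids)     # shuffle with a private, seeded RNG
    n_dev = int(len(ids) * dev_fraction)
    n_test = int(len(ids) * test_fraction)
    dev = ids[:n_dev]
    test = ids[n_dev:n_dev + n_test]
    train = ids[n_dev + n_test:]
    return train, dev, test


# Hypothetical corpus of 100 utterances.
train, dev, test = split_corpus([f"utt{i:03d}" for i in range(100)])
print(len(train), len(dev), len(test))
```

In practice a split by speaker (so that no speaker appears in more than one set) is often preferable to a split by utterance.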
How to Build an ASR Using HTK? • Goal: a recognizer for voice dialing. • Task grammar, where $digit and $name are variables defined elsewhere in the grammar file: ( SENT-START ( DIAL <$digit> | (PHONE|CALL) $name) SENT-END )
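To make the grammar concrete, here is a small sketch that generates random sentences it accepts. In HTK grammar notation, angle brackets mean one or more repetitions and the vertical bar separates alternatives. The contents of `DIGITS` and `NAMES` and the cap of seven digits are hypothetical, since the slide does not show the definitions of $digit and $name.

```python
import random

# Hypothetical stand-ins for the $digit and $name variables.
DIGITS = ["ONE", "TWO", "THREE", "FOUR", "FIVE"]
NAMES = ["ALICE", "BOB"]


def random_sentence(rng, max_digits=7):
    """Return one random word sequence allowed by the voice-dialing grammar."""
    if rng.random() < 0.5:
        # DIAL <$digit>: DIAL followed by one or more digits
        n_digits = rng.randint(1, max_digits)
        body = ["DIAL"] + [rng.choice(DIGITS) for _ in range(n_digits)]
    else:
        # (PHONE|CALL) $name: either verb followed by exactly one name
        body = [rng.choice(["PHONE", "CALL"]), rng.choice(NAMES)]
    return ["SENT-START"] + body + ["SENT-END"]


rng = random.Random(0)
for _ in range(3):
    print(" ".join(random_sentence(rng)))
```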
Creating a Dictionary • HDMan builds the pronunciation dictionary and can also output a list of the phones it uses. • An HMM will be estimated for each of these phones.
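The relationship between the dictionary and the phone list can be sketched as follows. This is not HDMan itself, only an illustration of the idea: the function `phone_list` and the toy pronunciations are assumptions.

```python
def phone_list(dictionary):
    """Collect the sorted set of phones used by any pronunciation.

    `dictionary` maps each word to a list of pronunciations, and each
    pronunciation is a list of phone symbols.
    """
    phones = set()
    for pronunciations in dictionary.values():
        for pronunciation in pronunciations:
            phones.update(pronunciation)
    return sorted(phones)


# Hypothetical pronunciations, for illustration only.
toy_dict = {
    "CALL": [["k", "ao", "l", "sp"]],
    "DIAL": [["d", "ay", "ax", "l", "sp"]],
    "PHONE": [["f", "ow", "n", "sp"]],
}

print(phone_list(toy_dict))
```

Each symbol in the resulting list would get its own HMM.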
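Recording the Data • HSLab noname: record the speech with the HSLab tool. • HSGen: generate random prompt sentences (testprompts) from the word network (wdnet) and the dictionary (dict).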
Transcribing the Data • HMM training is supervised learning, so every training utterance needs a transcription.
Coding the Data • HTK supports frame-based parameterizations: FFT-based analysis, LPC coefficients, MFCCs, user-defined features, etc.
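"Frame-based" means the waveform is cut into short overlapping windows, and one feature vector is computed per window. The sketch below shows only the framing step. The 25 ms window and 10 ms shift are common choices assumed here, and `frame_signal` is a hypothetical helper, not an HTK function.

```python
def frame_signal(samples, sample_rate, window_ms=25.0, shift_ms=10.0):
    """Cut a sample sequence into overlapping fixed-length frames.

    Trailing samples that do not fill a whole window are dropped.
    """
    window = int(round(sample_rate * window_ms / 1000.0))
    shift = int(round(sample_rate * shift_ms / 1000.0))
    frames = []
    start = 0
    while start + window <= len(samples):
        frames.append(samples[start:start + window])
        start += shift
    return frames


# One second of silence at 16 kHz, as a stand-in for real audio.
frames = frame_signal([0.0] * 16000, sample_rate=16000)
print(len(frames), len(frames[0]))
```

A feature extractor (MFCC, LPC, etc.) would then map each frame to one vector.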
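Output Probability Specification • The most common choice is the continuous-density HMM (CDHMM), in which each state emits observations from a continuous distribution, typically a Gaussian mixture. • HTK also allows discrete output probabilities (for vector-quantized data).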
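As an illustration of a continuous-density output distribution, the sketch below evaluates the log-likelihood of one observation vector under a Gaussian mixture. Diagonal covariances are assumed for simplicity, and the function names are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of x under a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def log_mixture(x, weights, means, variances):
    """Log of sum_k weights[k] * N(x; means[k], variances[k])."""
    logs = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(logs)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(value - peak) for value in logs))


# Two-component mixture over 2-dimensional observations (made-up numbers).
print(log_mixture([0.5, -0.2],
                  weights=[0.6, 0.4],
                  means=[[0.0, 0.0], [1.0, -1.0]],
                  variances=[[1.0, 1.0], [0.5, 2.0]]))
```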
Flat Start Training • Specify the topology: usually left-to-right with 3 states and no skips. • Build a prototype HMM with reasonable initial guesses of its parameters (HCompV). • Create an MMF (master macro file). • Now use HRest or HERest for training.
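The idea behind a flat start can be sketched as follows: every state begins with the same global mean and variance computed over all training frames, and the transition matrix encodes a left-to-right topology without skips. The function names, the dictionary layout and the self-loop probability of 0.6 are assumptions for illustration, and this is not HTK's file format.

```python
def global_mean_and_variance(frames):
    """Per-dimension mean and variance over all feature vectors."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    return mean, var


def left_to_right_transitions(n_emitting=3, stay=0.6):
    """Transition matrix with self-loops and next-state moves only.

    Two extra non-emitting states (entry and exit) are added, following
    the HTK convention; the exit row is left as all zeros.
    """
    n = n_emitting + 2
    a = [[0.0] * n for _ in range(n)]
    a[0][1] = 1.0                      # entry state always moves to state 1
    for i in range(1, n - 1):
        a[i][i] = stay                 # stay in the same state
        a[i][i + 1] = 1.0 - stay       # or advance by exactly one state
    return a


def flat_start(frames, n_emitting=3):
    """Give every emitting state the same global statistics."""
    mean, var = global_mean_and_variance(frames)
    states = [{"mean": list(mean), "var": list(var)} for _ in range(n_emitting)]
    return {"states": states, "transitions": left_to_right_transitions(n_emitting)}


model = flat_start([[1.0, 2.0], [3.0, 6.0]])
print(model["states"][0])
```

Re-estimation then pulls the identical states apart as training proceeds.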
Realigning and Creating Triphones • Use a pseudo-recognition pass (forced alignment) to realign the training data, selecting the best-matching pronunciation for words that have multiple pronunciations.
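For the triphone step, the relabeling can be sketched as below. HTK writes a triphone as `l-p+r` (left context, phone, right context). The helper `word_internal_triphones` and the example phones are hypothetical, and the sketch covers only word-internal contexts.

```python
def word_internal_triphones(phones):
    """Convert one word's phone sequence to context-dependent labels.

    Phones at the word edges lose the missing context, giving
    biphones such as 'b+ih' and 'ih-t'.
    """
    labels = []
    for i, phone in enumerate(phones):
        left = phones[i - 1] + "-" if i > 0 else ""
        right = "+" + phones[i + 1] if i < len(phones) - 1 else ""
        labels.append(left + phone + right)
    return labels


print(word_internal_triphones(["b", "ih", "t"]))
```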
Other Issues • HTK supports supervised speaker adaptation (HEAdapt) and unsupervised speaker adaptation (HVite). • Language modeling: HTK supports n-gram language models.
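As a minimal illustration of an n-gram language model, the sketch below estimates bigram probabilities by maximum likelihood from a list of sentences. No smoothing or back-off is applied, which a real system would need; the function name and the toy sentences are assumptions.

```python
from collections import Counter


def bigram_probabilities(sentences):
    """Estimate P(current | previous) from raw bigram counts."""
    history_counts = Counter()
    bigram_counts = Counter()
    for words in sentences:
        for previous, current in zip(words, words[1:]):
            history_counts[previous] += 1
            bigram_counts[(previous, current)] += 1
    return {
        pair: count / history_counts[pair[0]]
        for pair, count in bigram_counts.items()
    }


# Toy training sentences in the style of the voice-dialing task.
probs = bigram_probabilities([
    ["SENT-START", "CALL", "ALICE", "SENT-END"],
    ["SENT-START", "DIAL", "ONE", "TWO", "SENT-END"],
])
print(probs[("SENT-START", "CALL")])
```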