The HTK Book (for HTK Version 3.2.1)

The HTK Book (for HTK Version 3.2.1). Young et al., 2002. Chapter 1 The Fundamentals of HTK. HTK is a toolkit for building hidden Markov models (HMMs). Primarily used to build ASRs, but also other HMM systems: speaker and image recognition, automatic text summarization etc.

carla-oneil

Presentation Transcript


  1. The HTK Book (for HTK Version 3.2.1) Young et al., 2002

  2. Chapter 1: The Fundamentals of HTK • HTK is a toolkit for building hidden Markov models (HMMs). • Primarily used to build automatic speech recognizers (ASRs), but also other HMM systems: speaker and image recognition, automatic text summarization, etc. • HTK has tools (modules) for both training and testing HMM systems.

  3. How to Train and Test an ASR? • Things needed: a labeled speech corpus and a dictionary (+ grammar). • Procedure: 1. Divide the corpus into training, development and test sets. 2. Train acoustic models. 3. Test, retrain, test … on the development set. 4. Test on the test data.
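
The first step of the procedure, dividing the corpus, can be sketched in a few lines of Python. This is a minimal illustration, not part of HTK: the utterance IDs, the 80/10/10 proportions and the function name `split_corpus` are all assumptions made for the example.

```python
import random

def split_corpus(utterance_ids, dev_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle utterance IDs and split them into train/dev/test lists."""
    ids = sorted(utterance_ids)           # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(ids)
    n_dev = int(len(ids) * dev_fraction)
    n_test = int(len(ids) * test_fraction)
    dev = ids[:n_dev]
    test = ids[n_dev:n_dev + n_test]
    train = ids[n_dev + n_test:]          # everything left over is training data
    return train, dev, test

# Hypothetical utterance IDs; a real corpus would supply its own file names.
utterances = [f"utt{i:04d}" for i in range(200)]
train, dev, test = split_corpus(utterances)
print(len(train), len(dev), len(test))
```

For speech data it is often preferable to split by speaker rather than by utterance, so that no speaker appears in more than one set; the sketch above does not do that.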

  4. How to Build an ASR Using HTK? • Goal: A recognizer for voice dialing. • Task grammar: ( SENT-START ( DIAL <$digit> | (PHONE|CALL) $name) SENT-END )
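
To make the grammar concrete, the sketch below draws random word sequences that it allows and checks candidate sequences against it. In HTK grammar notation, angle brackets denote one or more repetitions and `|` denotes alternatives. The slide does not define `$digit` or `$name`, so the word lists here are hypothetical, as are the function names.

```python
import random

# Hypothetical vocabulary: the slide does not define $digit or $name.
DIGITS = ["ONE", "TWO", "THREE", "FOUR", "FIVE"]
NAMES = ["ALICE", "BOB"]

def random_sentence(rng, max_digits=4):
    """Draw one word sequence allowed by the voice-dialing grammar."""
    if rng.random() < 0.5:
        # DIAL <$digit>: angle brackets mean one or more repetitions
        count = rng.randint(1, max_digits)
        body = ["DIAL"] + [rng.choice(DIGITS) for _ in range(count)]
    else:
        # (PHONE|CALL) $name
        body = [rng.choice(["PHONE", "CALL"]), rng.choice(NAMES)]
    return ["SENT-START"] + body + ["SENT-END"]

def is_valid(words):
    """Check whether a word sequence is accepted by the grammar."""
    if len(words) < 4 or words[0] != "SENT-START" or words[-1] != "SENT-END":
        return False
    body = words[1:-1]
    if body[0] == "DIAL":
        return len(body) >= 2 and all(w in DIGITS for w in body[1:])
    return len(body) == 2 and body[0] in ("PHONE", "CALL") and body[1] in NAMES

rng = random.Random(1)
for _ in range(3):
    print(" ".join(random_sentence(rng)))
```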

  5. Creating a Dictionary • HDMan builds the pronunciation dictionary and outputs a list of the phones used. • An HMM will be estimated for each of these phones.

  6. Recording the Data • HSLab noname: record the speech data. • HSGen (wdnet dict) testprompts: generate random prompt sentences from the word network and dictionary.

  7. Transcribing the Data • HMM training is supervised learning.

  8. Coding the Data • HTK supports frame-based parameterizations: FFT-based, LPC, MFCC, user-defined, etc.
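
"Frame-based" means the waveform is cut into short overlapping windows, and one feature vector is computed per window. The sketch below shows only that framing step, with a Hamming window applied to each frame. The 25 ms window and 10 ms shift are assumed typical values, not settings taken from the slides, and the code does not compute MFCCs or LPCs.

```python
import math

def frame_signal(samples, sample_rate, window_ms=25.0, shift_ms=10.0):
    """Cut a waveform into overlapping, Hamming-windowed frames."""
    win = int(round(sample_rate * window_ms / 1000.0))    # samples per frame
    shift = int(round(sample_rate * shift_ms / 1000.0))   # samples between frame starts
    hamming = [0.54 - 0.46 * math.cos(2 * math.pi * n / (win - 1)) for n in range(win)]
    frames = []
    start = 0
    while start + win <= len(samples):                    # drop the incomplete tail
        chunk = samples[start:start + win]
        frames.append([s * w for s, w in zip(chunk, hamming)])
        start += shift
    return frames

# One second of a synthetic 440 Hz tone at 16 kHz stands in for real speech.
rate = 16000
signal = [math.sin(2 * math.pi * 440 * t / rate) for t in range(rate)]
frames = frame_signal(signal, rate)
print(len(frames), len(frames[0]))
```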

  9. Output Probability Specification • The most common choice is the continuous-density HMM (CDHMM). • HTK also allows discrete output probabilities (for vector-quantized (VQ) data).
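
In a continuous-density HMM, each state's output probability is typically a mixture of Gaussians. The sketch below evaluates the log of such a mixture for one observation vector, assuming diagonal covariances. The function names and the toy parameters are illustrative only and are not HTK's internal representation.

```python
import math

def log_gaussian_diag(obs, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (o - m) ** 2 / v
        for o, m, v in zip(obs, mean, var)
    )

def log_output_prob(obs, weights, means, variances):
    """Log of a weighted sum of diagonal Gaussians (log-sum-exp for stability)."""
    logs = [
        math.log(c) + log_gaussian_diag(obs, mu, var)
        for c, mu, var in zip(weights, means, variances)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(value - top) for value in logs))

# Toy two-component mixture over 2-dimensional observations (made-up numbers).
weights = [0.7, 0.3]
means = [[0.0, 0.0], [2.0, 2.0]]
variances = [[1.0, 1.0], [0.5, 0.5]]
print(log_output_prob([0.1, -0.2], weights, means, variances))
```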

  10. Flat Start Training • Build a prototype HMM with reasonable initial guesses of its parameters (HCompV). • Specify the topology: usually left-to-right with 3 states and no skips. • Create a master macro file (MMF). • Now use HRest or HERest for training.
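
The idea behind a flat start is that every state begins with the same Gaussian, set to the global mean and variance of the training data, which is what HCompV computes. The sketch below illustrates that idea together with a 3-state left-to-right, no-skip transition matrix that includes non-emitting entry and exit states, following HTK's convention. The dictionary layout and the 0.6 self-loop probability are assumptions for illustration, not HTK's file format.

```python
def global_mean_var(frames):
    """Per-dimension mean and variance over all feature vectors."""
    count = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / count for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / count for d in range(dim)]
    return mean, var

def left_to_right_transitions(num_emitting=3, self_loop=0.6):
    """Transition matrix with entry and exit states and no skip transitions."""
    size = num_emitting + 2
    trans = [[0.0] * size for _ in range(size)]
    trans[0][1] = 1.0                        # entry state always moves to state 1
    for i in range(1, size - 1):
        trans[i][i] = self_loop              # stay in the same state
        trans[i][i + 1] = 1.0 - self_loop    # or advance to the next one
    return trans                             # exit row stays all zeros

def flat_start(frames, num_emitting=3):
    """Give every emitting state the same global mean and variance."""
    mean, var = global_mean_var(frames)
    states = [{"mean": list(mean), "var": list(var)} for _ in range(num_emitting)]
    return {"states": states, "trans": left_to_right_transitions(num_emitting)}

# Toy 2-dimensional feature vectors stand in for real MFCC frames.
prototype = flat_start([[1.0, 2.0], [3.0, 6.0]])
print(prototype["states"][0])
```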

  11. Realigning and Creating Triphones • Use pseudo-recognition to force-align the training data when words have multiple pronunciations.

  12. Evaluation
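
The slide gives no detail on scoring. Recognizers are commonly scored by aligning each recognized word sequence against its reference transcription and counting substitutions (S), deletions (D) and insertions (I) over N reference words. HTK's HResults tool reports percent correct, (N - D - S) / N, and accuracy, (N - D - S - I) / N. The sketch below computes both with a unit-cost edit-distance alignment, which is an assumption and may align some sentences differently from HResults.

```python
def align_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) of a minimum-edit alignment."""
    # Each cell holds (total_errors, subs, dels, ins) for a prefix pair.
    prev = [(j, 0, 0, j) for j in range(len(hyp) + 1)]   # empty ref: all insertions
    for i in range(1, len(ref) + 1):
        cur = [(i, 0, i, 0)]                              # empty hyp: all deletions
        for j in range(1, len(hyp) + 1):
            e, s, d, n = prev[j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (e, s, d, n)                       # match costs nothing
            else:
                diag = (e + 1, s + 1, d, n)               # substitution
            e, s, d, n = prev[j]
            delete = (e + 1, s, d + 1, n)                 # reference word missing
            e, s, d, n = cur[j - 1]
            insert = (e + 1, s, d, n + 1)                 # extra hypothesis word
            cur.append(min(diag, delete, insert))
        prev = cur
    _, subs, dels, ins = prev[-1]
    return subs, dels, ins

def score(ref, hyp):
    """Percent correct and accuracy for one reference/hypothesis pair."""
    subs, dels, ins = align_counts(ref, hyp)
    total = len(ref)
    hits = total - dels - subs
    return {"correct": 100.0 * hits / total, "accuracy": 100.0 * (hits - ins) / total}

reference = "DIAL ONE TWO THREE".split()
hypothesis = "DIAL ONE THREE THREE FOUR".split()
print(score(reference, hypothesis))
```

Accuracy can be negative when the recognizer inserts many extra words, which is why it is usually reported alongside percent correct.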

  13. Other Issues • HTK supports supervised and unsupervised speaker adaptation (HVite). • Language model: n-gram language models.
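
As an illustration of what an n-gram language model estimates, the sketch below computes unsmoothed maximum-likelihood bigram probabilities from a handful of sentences. The training sentences and the function name are made up for the example, and a practical model would add smoothing or back-off for unseen word pairs.

```python
from collections import Counter

def bigram_probs(sentences):
    """Maximum-likelihood estimate of P(next | previous) from word sequences."""
    history = Counter()   # how often each word is followed by another word
    pairs = Counter()     # how often each (previous, next) pair occurs
    for words in sentences:
        for previous, following in zip(words, words[1:]):
            history[previous] += 1
            pairs[(previous, following)] += 1
    return {pair: count / history[pair[0]] for pair, count in pairs.items()}

# Hypothetical training sentences in the style of the voice-dialing task.
corpus = [
    ["SENT-START", "CALL", "BOB", "SENT-END"],
    ["SENT-START", "PHONE", "BOB", "SENT-END"],
    ["SENT-START", "CALL", "ALICE", "SENT-END"],
]
probs = bigram_probs(corpus)
print(probs[("SENT-START", "CALL")])
```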