
Introduction of HMM Tool Kit (HTK)


jjoy

Presentation Transcript


  1. Introduction of HMM Tool Kit (HTK)

  2. Benefits of HTK
      • Recognized worldwide as a state-of-the-art speech recognition system
      • Supports a variety of input formats
      • Supports different feature types
      • Supports almost all common speech recognition technologies

  3. Detailed features of HTK
      • Supports a variety of input formats, e.g. pcm, wav, …, ALIEN (unknown), etc.
      • Feature extraction: MFCC, filterbank, PLP, LPC, …, etc.
      • Very flexible HMM definition
      • Training
        • Viterbi (segmentation)
        • Forward/Backward (Baum-Welch)
        • Single-model re-estimation (change feature)

  4. Detailed features of HTK
      • HMM system refinement
        • Context-dependent models
        • Parameter tying/clustering
        • Regression class tree (MLLR)
      • Language
        • Word grammar and network
        • Bigram language model
      • Decoding
        • Evaluate recognition results
        • Forced alignment
        • N-best lists/lattices

  5. Detailed features of HTK
      • HMM adaptation
        • MLLR/regression tree
        • MAP
        • Mean/variance

  6. HTK procedures
      • Data/setting preparation
        • Define acoustic units (phone table)
        • Define dictionary (word)
        • Define grammar/network
        • Collect speech database
        • Generate transcription
      • Feature extraction
        • Set configuration file for MFCC feature extraction
        • Prepare script files (corpus file)
      • Define HMM structure (prototype)
      • Training HMM models
        • Prepare script files (corpus file)
        • Set configuration file for training, recognition, …, etc.
        • Flat start (uniform segmentation)
        • Viterbi search (forced alignment: segmentation)
      • Recognition/performance evaluation
        • Viterbi search

  7. HMM/Data Setting • Phone Table • Dictionary • Grammar Rule • Define HMMs

  8. Define Acoustic Units (Phone table) • Using our traditional units: 100 RCD initials and 40 CI finals ……

  9. Dictionary (411-syllable table)
      • Using our traditional 411 syllables (plus silence)
      • Entry format: word phones_list
        …… …… …...

  10. Task Grammar Rule • Task: Phone Dialing • Dial three two six five four • Dial nine zero four one oh nine • Phone Woodland • Call Steve Young

  11. Generate Task Grammar Rule Network
      • Task: free-syllable decoding
      • Define gram file:
          $syllable = zhi | chi | ri | a | ……;
          ( SENT-START < $syllable [sil] > SENT-END )
      • Parse gram with HParse → wdnet
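
For a large syllable inventory the gram file is easier to generate than to type. The following is a minimal Python sketch, not part of the original slides: the function name `make_free_syllable_grammar` and the four-syllable list are illustrative assumptions (the real table has 411 syllables).

```python
def make_free_syllable_grammar(syllables):
    """Return HParse grammar text for a free-syllable loop.

    `syllables` is a list of syllable names; the grammar allows any
    sequence of them, each optionally followed by `sil`.
    """
    if not syllables:
        raise ValueError("need at least one syllable")
    alternatives = " | ".join(syllables)
    return (
        f"$syllable = {alternatives};\n"
        "( SENT-START < $syllable [sil] > SENT-END )\n"
    )


if __name__ == "__main__":
    # Hypothetical four-syllable inventory, for illustration only.
    print(make_free_syllable_grammar(["zhi", "chi", "ri", "a"]), end="")
```

Saving the returned text as `gram` and running `HParse gram wdnet` should then yield the word network, as on the slide.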

  12. Free-Syllable Decoding “wdnet”
      Syntax:
          # I – nodes
          # J – arcs
          N=? L=?
          # define nodes
          I=x W=www
          …
          # define arcs
          J=x S=y E=z
          …..
          ……
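
To make the node/arc layout concrete, here is a small Python sketch that reads a network written in the syntax above. The three-node sample network and the function name `parse_wdnet` are assumptions made for illustration; a real wdnet produced by HParse may carry additional fields that this sketch simply ignores.

```python
def parse_wdnet(text):
    """Parse a minimal word network into (nodes, arcs).

    nodes: dict mapping node index -> word label (None if no W= field)
    arcs:  list of (start_node, end_node) pairs
    """
    header, nodes, arcs = {}, {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = dict(tok.split("=", 1) for tok in line.split())
        if "I" in fields:                       # node definition
            nodes[int(fields["I"])] = fields.get("W")
        elif "J" in fields:                     # arc definition
            arcs.append((int(fields["S"]), int(fields["E"])))
        else:                                   # header line, e.g. N=? L=?
            header.update(fields)
    # Check the declared counts when the header provides them.
    if "N" in header and int(header["N"]) != len(nodes):
        raise ValueError("node count does not match N")
    if "L" in header and int(header["L"]) != len(arcs):
        raise ValueError("arc count does not match L")
    return nodes, arcs


# Hypothetical three-node network, for illustration only.
SAMPLE = """\
# define size
N=3 L=2
# define nodes
I=0 W=SENT-START
I=1 W=zhi
I=2 W=SENT-END
# define arcs
J=0 S=0 E=1
J=1 S=1 E=2
"""

if __name__ == "__main__":
    print(parse_wdnet(SAMPLE))
```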

  13. Database Preparation
      • Collect speech data
      • MLF (Master Label File)
        • The content is at word level
      • Transcribe the collected speech database
        Corpus files (training/test set) → Script files
      • Convert the MLF to phone-level labeling
      • Feature extraction (MFCC)

  14. Word/Phone-Level Transcriptions → use HLEd to transform
      Word Master Label File:
          #!MLF!#
          "*/4_t0062_t0062331.lab"
          tai
          yin
          .
          "*/4_t0062_t0062340.lab"
          .
          .
      Phone Master Label File:
          #!MLF!#
          "*/4_t0062_t0062331.lab"
          sil
          t
          ai
          NULL
          yin
          sil
          .
          "*/4_t0062_t0062340.lab"
          .
          .
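
HLEd performs this word-to-phone expansion from the dictionary. The Python sketch below only mimics the idea so the mapping in the example is easy to follow; the two-entry `LEXICON` is an assumption inferred from the labels shown above, and `words_to_phones` is a hypothetical helper, not an HTK function.

```python
# Hypothetical two-entry dictionary inferred from the example labels.
LEXICON = {
    "tai": ["t", "ai"],
    "yin": ["NULL", "yin"],
}


def words_to_phones(words, lexicon, boundary="sil"):
    """Expand a word-level label sequence into phone-level labels.

    Each word is replaced by its phone list, and `boundary` is added at
    the start and end of the utterance.
    """
    phones = [boundary]
    for word in words:
        if word not in lexicon:
            raise KeyError(f"word not in dictionary: {word}")
        phones.extend(lexicon[word])
    phones.append(boundary)
    return phones


if __name__ == "__main__":
    print(words_to_phones(["tai", "yin"], LEXICON))
```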

  15. HLEd edit script:
          EX
          IS sil sil
          DE sp

  16. Feature Extraction • HCopy: data copy (with format conversion)

  17. Script files
      • codetr.scp: each line lists a source file and its destination file
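
A script file of this shape can be produced with a few lines of Python. This is a sketch under assumptions: the directory names, the `.mfc` extension, and the function name `make_hcopy_script` are all illustrative and do not come from the slides.

```python
from pathlib import PurePosixPath


def make_hcopy_script(wav_paths, out_dir, ext=".mfc"):
    """Return script-file text with one 'source destination' pair per line."""
    lines = []
    for wav in wav_paths:
        stem = PurePosixPath(wav).stem               # file name without extension
        target = PurePosixPath(out_dir) / (stem + ext)
        lines.append(f"{wav} {target}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(make_hcopy_script(["wav/a01.wav", "wav/a02.wav"], "mfcc"), end="")
```

The resulting file would then be passed to HCopy with `-S codetr.scp` together with the feature extraction configuration.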

  18. Feature Extraction Configuration
          # byte order
          NATURALREADORDER=TRUE
          NATURALWRITEORDER=TRUE
          # Waveform parameters
          SOURCEFORMAT=ALIEN
          HEADERSIZE=256
          SOURCERATE=1250.0
          # Coding parameters
          TARGETKIND=MFCC_E
          TARGETRATE=100000.0
          SAVECOMPRESSED=F
          SAVEWITHCRC=T
          WINDOWSIZE=320000.0
          # ZMEANSOURCE=T
          USEHAMMING=T
          PREEMCOEF=0.97
          NUMCHANS=20
          USEPOWER=F
          # normalize the dynamic range of MFCC
          CEPLIFTER=22
          LOFREQ=0
          HIFREQ=4000
          NUMCEPS=12
          ENORMALISE=T
          DELTAWINDOW=2
          ACCWINDOW=2
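
HTK expresses SOURCERATE, TARGETRATE and WINDOWSIZE in units of 100 ns, which makes the numbers above hard to read at a glance. The Python sketch below converts them to more familiar quantities; the helper names are illustrative assumptions, and only the three values from the configuration are used.

```python
UNITS_PER_SECOND = 10_000_000   # HTK time unit is 100 ns
UNITS_PER_MS = 10_000


def sample_rate_hz(source_rate):
    """Sampling frequency implied by SOURCERATE (sample period in 100 ns units)."""
    return UNITS_PER_SECOND / source_rate


def to_ms(units):
    """Convert a duration in 100 ns units to milliseconds."""
    return units / UNITS_PER_MS


def to_samples(units, source_rate):
    """Convert a duration in 100 ns units to a number of samples."""
    return units / source_rate


if __name__ == "__main__":
    SOURCERATE, TARGETRATE, WINDOWSIZE = 1250.0, 100000.0, 320000.0
    print("sampling rate (Hz):", sample_rate_hz(SOURCERATE))
    print("frame shift (ms):  ", to_ms(TARGETRATE))
    print("window size (ms):  ", to_ms(WINDOWSIZE))
    print("window (samples):  ", to_samples(WINDOWSIZE, SOURCERATE))
```

With these settings the audio is sampled at 8 kHz, frames are 32 ms (256 samples) long and advance every 10 ms, which is consistent with HIFREQ=4000 being the Nyquist frequency.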

  19. HMM Configuration
      • Config file (command level): Command -C config_file
      • User defaults: > export HCONFIG=my_HTK_config
      • Built-in defaults: see Chap. 18 in the HTK manual

  20. Define HMM Structure

  21. HMM Prototype Definition
          ~o <VecSize> 39 <MFCC_Z_E_D_A>
          ~h "proto"
          <BeginHMM>
            <NumStates> 5
            <State> 2 <NumMixes> 4
              <Mixture> 1 0.25
                <Mean> 39
                  ……
                <Variance> 39
                  ……
            <TransP> 5
              0.0 1.0 0.0 0.0 0.0
              0.0 0.5 0.5 0.0 0.0
              0.0 0.0 0.5 0.5 0.0
              0.0 0.0 0.0 0.5 0.5
              0.0 0.0 0.0 0.0 0.0
          <EndHMM>
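
The transition matrix in the prototype follows a simple left-to-right pattern: the entry state jumps to the first emitting state, each emitting state either stays or moves on, and the exit state has no outgoing transitions. A minimal Python sketch that builds this matrix for any number of states is given below; the function names are illustrative assumptions, not HTK tools.

```python
def left_to_right_transp(num_states, self_loop=0.5):
    """Build a left-to-right transition matrix with no skips.

    State 0 is the non-emitting entry state and the last state is the
    non-emitting exit state, as in the prototype above.
    """
    if num_states < 3:
        raise ValueError("need entry, exit and at least one emitting state")
    n = num_states
    rows = [[0.0] * n for _ in range(n)]
    rows[0][1] = 1.0                        # entry -> first emitting state
    for i in range(1, n - 1):
        rows[i][i] = self_loop              # stay in the same state
        rows[i][i + 1] = 1.0 - self_loop    # advance to the next state
    return rows                             # last row stays all zeros


def format_transp(rows):
    """Format the matrix in the <TransP> layout used by the prototype."""
    lines = [f"<TransP> {len(rows)}"]
    lines += [" ".join(str(p) for p in row) for row in rows]
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_transp(left_to_right_transp(5)))
```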

  22. Create HMM Prototypes

  23. Training Procedure
      • Model initialization
        • Flat start (unknown segmentation → uniform segmentation)
        • Viterbi search (given segmentation)
        • Forward/backward only at word level
      • Model refinement
        • Mixture splitting
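
When no segmentation is known, a uniform segmentation simply divides the frames of an utterance equally among its labels. The Python sketch below illustrates that idea only; it is not how any HTK tool is implemented, and the function name and the 10-frame example are assumptions.

```python
def uniform_segmentation(num_frames, labels):
    """Split `num_frames` frames as evenly as possible among `labels`.

    Returns a list of (start_frame, end_frame, label) with end exclusive.
    """
    n = len(labels)
    if n == 0:
        raise ValueError("need at least one label")
    if num_frames < n:
        raise ValueError("fewer frames than labels")
    # Integer boundaries; every segment gets at least one frame.
    bounds = [i * num_frames // n for i in range(n + 1)]
    return [(bounds[i], bounds[i + 1], lab) for i, lab in enumerate(labels)]


if __name__ == "__main__":
    # Hypothetical 10-frame utterance with four phone labels.
    for seg in uniform_segmentation(10, ["sil", "t", "ai", "sil"]):
        print(seg)
```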

  24. Configuration file for Training/Test
          # byte order
          #BYTEORDER=VAX
          NATURALREADORDER=TRUE
          NATURALWRITEORDER=TRUE
          # MFCC parameters
          SOURCEFORMAT=HTK
          SOURCERATE=100000.0
          TARGETKIND=MFCC_E_D_A_Z
          TARGETRATE=100000.0
          DELTAWINDOW=2
          ACCWINDOW=2

  25. Training Corpus • Mat4500_train.scp • Mat4500_train_phones.mlf …… ……

  26. Flat start Viterbi search Forward/Backward

  27. Initialize HMMs from Flat Start

  28. Initialize HMMs from Viterbi Search

  29. Utterance Segmentation *.mlf • mat4500_train.mlf (phone-level with segmentation information) …

  30. Silence and short pause model
      • sp shares the middle state of sil
      • sil.hed:
          AT 2 4 0.2 {sil.transP}
          AT 4 2 0.2 {sil.transP}
          AT 1 3 0.3 {sp.transP}
          TI silst {sil.state[3],sp.state[2]}

  31. Mixture Splitting

  32. Mixture Splitting Script • MU2.hed ……

  33. Recognition/Evaluation Procedure Recognition Evaluation

  34. Test Corpus • Mat4500_test.scp • Mat4500_test.mlf ……

  35. Two Types of Result Formats

  36. Confusion Matrix Analysis

  37. Forced Alignment • Viterbi decoding • HVite with option -a • Yields statistics on the HMM segmentation • Useful for deciding the number of mixtures

  38. Speaker Adaptation – MLLR, MAP
      • MLLR
        • In the training phase, generate the state occupation statistics: % HERest -s
        • HHEd script:
            RN "models"    // rename hmmid
            LS "stats"     // load state occupation statistics
            RC 32 "rtree"  // regression classes = 32
          or
            RC 32 "rtree" {sil.state[2-4].mix}

  39. Forced alignment of adaptation data
          % HVite … -a … -I adapWords.mlf -m ….
      • Find global MLLR
          % HEAdapt -C … -g … -K global.tmf … -I adapPhone.mlf ….
          *.tmf : transform model file
      • Find MLLR regression tree
          % HEAdapt -C … -J global.tmf -K rc.tmf … -I adapPhone.mlf …
      • Recognition
          % HVite … -J rc.tmf ….

  40. MAP adaptation
      • HEAdapt -C … -j 0.9 … -k … -I adapPhone.mlf …
        -j : weight
        -k : use MLLR before MAP

  41. Further topics
      • Model/state tying (HMM definition)
      • Context-dependent model
      • Fast training/search (beam search)
      • Insertion/deletion problem
        → Duration constraint
        → Word transition penalty
      • Word lattice output

  42. Detailed options for the HTK commands
      • HCompV
        • Typical arguments:
            HCompV -C xxx -f 0.01 -m -S *.scp -M output_dir hmm
        • -m : update means
        • -f f : set varFloor to f × global variance
        • Variance floor in the hmm macro file:
            ~o …
            ~v "varFloor1"
            <Variance> 38
            ………………..
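
The `-f` option scales the global variance of the training data to obtain a per-dimension floor. The Python sketch below shows that arithmetic on a tiny made-up data set. It assumes the population variance (dividing by the number of frames); whether HCompV uses exactly this normalisation is not stated on the slide, and the function names are illustrative.

```python
def global_variance(frames):
    """Per-dimension variance over all frames (population variance)."""
    n = len(frames)
    if n == 0:
        raise ValueError("no frames")
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [sum((f[d] - means[d]) ** 2 for f in frames) / n for d in range(dim)]


def variance_floor(frames, f=0.01):
    """Variance floor vector: f times the global variance."""
    return [f * v for v in global_variance(frames)]


if __name__ == "__main__":
    # Made-up two-dimensional feature vectors, for illustration only.
    data = [[1.0, 2.0], [3.0, 6.0]]
    print("global variance:", global_variance(data))
    print("variance floor: ", variance_floor(data, f=0.01))
```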

  43. Detailed options for the HTK commands
      • HERest
        • Typical arguments:
            HERest -C xxx -I *.mlf -t 250.0 150.0 1000.0 -S *.scp -H hmm_macros -H hmm_defs -M output_dir hmmlist
        • -t f [i l] : set the pruning threshold to f (f → f+i until f = l)
        • -T : tracing option, octal number, command dependent
          • 00020 : show occupation counts
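
The three numbers after `-t` define a sequence of pruning thresholds: start at f and add i until the limit l is reached. The Python sketch below lists that sequence for the values in the typical command line. The function name is an illustrative assumption, and including the limit itself as the final value is also an assumption about the boundary case.

```python
def pruning_schedule(f, i, l):
    """List the thresholds f, f+i, f+2i, ... that do not exceed l."""
    if i <= 0:
        raise ValueError("increment must be positive")
    thresholds = []
    t = f
    while t <= l:
        thresholds.append(t)
        t += i
    return thresholds


if __name__ == "__main__":
    # Values from the typical HERest arguments: -t 250.0 150.0 1000.0
    print(pruning_schedule(250.0, 150.0, 1000.0))
```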

  44. Detailed options for the HTK commands
      • HVite
        • Typical arguments:
            HVite -H hmm_macros -H hmm_defs -S *.scp -i output_mlf -w wdnet -p 0.0 -s 5.0 -t 250 dict tiedlist
        • -t f [i l] : set the pruning threshold to f (f → f+i until f = l)
        • -m : show model boundaries
        • -a : forced alignment, used with -I input.mlf
        • -p, -s : word insertion penalty, weight for grammar score

  45. Detailed options for the HTK commands
      • HResults
        • Typical arguments:
            HResults -I *.mlf hmmlist answer.mlf
        • -n : use NIST-style scoring
        • -e s t : label t is made equivalent to s
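
The figures that HResults reports are derived from the counts of deletions (D), substitutions (S) and insertions (I) against N reference labels. The Python sketch below computes the two usual percentages, %Correct = (N-D-S)/N and Accuracy = (N-D-S-I)/N, as defined in the HTK Book. The function names and the example counts are made up for illustration.

```python
def percent_correct(n, d, s):
    """%Correct: labels recognised correctly, ignoring insertions."""
    if n <= 0:
        raise ValueError("N must be positive")
    return 100.0 * (n - d - s) / n


def percent_accuracy(n, d, s, i):
    """Accuracy: like %Correct but also penalising insertions."""
    if n <= 0:
        raise ValueError("N must be positive")
    return 100.0 * (n - d - s - i) / n


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    N, D, S, I = 100, 5, 10, 3
    print("Correct (%): ", percent_correct(N, D, S))
    print("Accuracy (%):", percent_accuracy(N, D, S, I))
```

Accuracy can be much lower than %Correct when the recognizer inserts many labels, which is the insertion/deletion trade-off that the word insertion penalty (`-p` in HVite) is meant to control.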

  46. Detailed options for the HTK commands
      • HInit
        • Typical arguments:
            HInit -S *.scp -M hmm_macro -H hmm_defs model
      • HRest
        • Typical arguments:
            HRest -S *.scp -M hmm_macro -H hmm_defs model
      • HSLab
        • Use WaveSurfer.
