470 likes | 502 Views
Introduction of HMM Tool Kit (HTK). Benefits of HTK. World recognized state-of-the-art speech recognition system Support a variety of different input formats Support different features Support almost all common speech recognition technologies. Detail features of HTK.
E N D
Benefits of HTK • World recognized state-of-the-art speech recognition system • Support a variety of different input formats • Support different features • Support almost all common speech recognition technologies
Detail features of HTK • HTK can support a variety of different formats ex : pcm, wav, …, ALIEN(unknown), etc. • Feature extraction: • MFCC, filterbank, PLP, LPC, …, etc. • Very free HMM definition • Training • Viterbi (segmentation) • Forward/Backward (Baun-Welch) • Single model re-estimation (change feature)
Detail features of HTK • HMM system refinement • Context-dependent model • Parameter tying/clustering • Regression class tree (MLLR) • Language • Word grammar and network • Bigram language model • Decoding • Evaluate recognition results • Forced alignment • NBest lists/lattices
Detail features of HTK • HMM adaptation • MLLR/Regression Tree • MAP • Mean/variance
HTK procedures • Data/Setting preparation • Define Acoustic units (phone table) • Define Dictionary (word) • Define grammar/network • Collect speech database • Generate transcription • Feature Extraction • Set configuration file for MFCC feature extraction • Prepare Script files (corpus file) • Define HMMs structure (prototype) • Training HMM models • Prepare Script files (corpus file) • Set configuration file for training, recognition, …,etc. • Flat start (uniform segmentation) • Viterbi search (forced alignment : segmentation ) • Recognition/Performance Evaluation • Viterbi search
HMM/Data Setting • Phone Table • Dictionary • Grammar Rule • Define HMMs
Define Acoustic Units (Phone table) • Using our traditional 100 RCD initials 40 CI finals ……
Dictionary (411 Syllable table) • Using our traditional 411 syllables (plus silence) word phones_list …… …… …...
Task Grammar Rule • Task: Phone Dialing • Dial three two six five four • Dial nine zero four one oh nine • Phone Woodland • Call Steve Young
Generate Task Grammar Rule Network • Task: free-syllable decoding • Define gram file $syllable=zhi | chi | ri | a | ……; ( SENT-START < $syllable [sil] > SENT-END ) • Parsing gram by HParse wdnet
Free-Syllable Decoding “wdnet” Syntax : # I – Nodes # J – arcs N=? L=? # define nodes I=x W=www … # define arcs J=x S=y E=z ….. ……
Database Preparation • Collect speech data • MLF (Master Label File) • The content is in word level • Transcribe the collected speech database Corpus files (training/test set) Script files • Change MLF into Phone level labeling • Feature Extraction (MFCC)
Word Master Label File #!MLF!# “*/4_t0062_t0062331.lab” tai yin . “*/4_t0062_t0062340.lab” . . Phone Master Label File #!MLF!# “*/4_t0062_t0062331.lab” sil t ai NULL yin sil . “*/4_t0062_t0062340.lab” . . Word/Phone-Level Transcriptions using HLEd to transform
EX IS sil sil DE sp
Feature Extraction • HCOPY : Data Copy (with format changing)
Script files • codetr.scp source destination
# byte order NATURALREADORDER=TRUE NATURALWRITEORDER=TRUE # Waveform parameters SOURCEFORMAT=ALIEN HEADERSIZE=256 SOURCERATE=1250.0 # Coding parameters TARGETKIND=MFCC_E TARGETRATE=100000.0 SAVECOMPRESSED=F SAVEWITHCRC=T WINDOWSIZE=320000.0 # ZMEANSOURCE=T USEHAMMING=T PREEMCOEF=0.97 NUMCHANS=20 USEPOWER=F #normalized the dynamic range of MFCC CEPLIFTER=22 LOFREQ=0 HIFREQ=4000 NUMCEPS=12 ENORMALISE=T DELTAWINDOW=2 ACCWINDOW=2 Feature Extraction Configuration
HMM Configuration • Config File (command-level) Command –C config_file • User Defaults > export HCONFIG=my_HTK_config • Built-in Defaults ref Chap 18 in HTK manual
~o <VecSize> 39 <MFCC_Z_E_D_A> ~h "proto" <BeginHMM> <NumStates> 5 <State> 2 <NumMixes> 4 <Mixture> 1 0.25 <Mean> 39 …… <Variance> 39 …… <TransP> 5 0.0 1.0 0.0 0.0 0.0 0.0 0.5 0.5 0.0 0.0 0.0 0.0 0.5 0.5 0.0 0.0 0.0 0.0 0.5 0.5 0.0 0.0 0.0 0.0 0.0 <EndHMM> HMM Prototype Definition
Training Procedure • Model Initialization • Flat start (unknown segmentation uniform segmentation) • Viterbi search (given segmentation) • Forward/backward only in word level • Model Refinement • Mixture splitting
Configuration file for Training/Test # byte order #BYTEORDER=VAX NATURALREADORDER=TRUE NATURALWRITEORDER=TRUE # MFCC parameters SOURCEFORMAT=HTK SOURCERATE=100000.0 TARGETKIND=MFCC_E_D_A_Z TARGETRATE=100000.0 DELATWINDOW=2 ACCWINDOW=2
Training Corpus • Mat4500_train.scp • Mat4500_train_phones.mlf …… ……
Flat start Viterbi search Forward/Backward
Utterance Segmentation *.mlf • mat4500_train.mlf (phone-level with segmentation information) …
Silence and short pause model • sp share the middle state for silence • Sil.hed: AT 2 4 0.2 {sil.transP} AT 4 2 0.2 {sil.transP} AT 1 3 0.3{sp.transP} TI silst {sil.state[3],sp.state[2]}
Mixture Splitting Script • MU2.hed ……
Recognition/Evaluation Procedure Recognition Evaluation
Test Corpus • Mat4500_test.scp • Mat4500_test.mlf ……
Force Alignment • Viterbi decoding • HVite using option -a • You can get some statistics of the HMM segmentation • Useful for mixture number determined
Speaker Adaptation – MLLR, MAP • MLLR • In training phase generate the states occupation statistics % HERest –s • HHed RN “models” //ReName hmmid LS “stats” //loads states occupation statistics RC 32 “rtree” //Regression class = 32 or RC 32 “rtree” {sil.state[2-4].mix}
force alignment of adaptation data %Hvite … -a … -I adapWords.mlf -m …. • Find global MLLR %HEAdapt –C … -g … -K global.tmf …-I adapPhone.mlf …. *.tmf : transform model file • Find MLLR regression Tree] %HEAdapt –C … -J global.tmf –K rc.tmf …-I adapPhone.mlf … • Recognition %HVite … -J rc.tmf ….
MAP adaptation • HEAdapt –C … -j 0.9 …-k …-I adapPhone.mlf … -j : weight -k : using MLLR before MAP
Further topics • Model/state tying (HMM definition) • Context-dependent model • Fast training/search (Beam search) • Insertion/Deletion problem Duration constraint word transition penalty • Word Lattice output
Detail options for the HTK commands • HCompV • Typical arguments HCompV –C xxx –f 0.01 –m –S *.scp –M output_dir hmm • -m : update mean • -f f : set varFloor to f*global variance in hmm macro ~o … ~v “varFloor1” <Variance> 38 ………………..
Detail options for the HTK commands • HERest • Typical arguments HERest –C xxx –I *.mlf –t 250.0 150.0 1000.0 -S *.scp –H hmm_macros –H hmm_defs –M output_dir hmmlist • -t f [i l] : set the pruning threshold to f f f+i until f=l • -T tracing option octal number, command dependent • 00020 show occupation counts
Detail options for the HTK commands • HVite • Typical arguments HVite –H hmm_macros –H hmm_defs –S *.scp –i output_mlf –w wdnet –p 0.0 –s 5.0 –t 250 dict tiedlist • -t f [i l] : set the pruning threshold to f f f+i until f=l • -m : show model boundaries • -a : force alignment, -I input.mlf • -p, -s : word insertion penalty, weight for grammar score
Detail options for the HTK commands • HResult • Typical arguments HResult –I *.mlf hmmlist answer.mlf • -n : use NIST • -e s t : label t is made equivalent to s
Detail options for the HTK commands • HInit • Typical arguments HInit –S *.scp –M hmm_macro –H hmm_defs model • HRest • Typical arguments HRest –S *.scp –M hmm_macro –H hmm_defs model • HSLab • Use wavesufer.