Discover the benefits and in-depth features of HTK - a world-recognized tool supporting various input formats and speech recognition technologies. Explore its detailed features like feature extraction, model refinement, adaptation techniques, and more. Learn how to define acoustic units, set up HMM structure, conduct training, and evaluate performance. Dive into tasks like phone dialing and free-syllable decoding, and understand database preparation processes for speech recognition projects.
Benefits of HTK • A world-recognized, state-of-the-art speech recognition toolkit • Supports a variety of different input formats • Supports many different feature types • Supports almost all common speech recognition technologies
Detailed features of HTK • Supports a variety of input formats, e.g. pcm, wav, …, ALIEN (unknown), etc. • Feature extraction: MFCC, filterbank, PLP, LPC, …, etc. • Very flexible HMM definition • Training: Viterbi (segmentation), Forward/Backward (Baum-Welch), single-model re-estimation (feature change)
Detailed features of HTK • HMM system refinement: context-dependent models, parameter tying/clustering, regression class tree (MLLR) • Language: word grammar and network, bigram language model • Decoding: evaluation of recognition results, forced alignment, N-best lists/lattices
Detailed features of HTK • HMM adaptation: MLLR/regression tree, MAP, mean/variance adaptation
HTK procedures • Data/setting preparation • Define acoustic units (phone table) • Define dictionary (words) • Define grammar/network • Collect speech database • Generate transcriptions • Feature extraction • Set configuration file for MFCC feature extraction • Prepare script files (corpus files) • Define HMM structure (prototype) • Training HMM models • Prepare script files (corpus files) • Set configuration file for training, recognition, etc. • Flat start (uniform segmentation) • Viterbi search (forced alignment: segmentation) • Recognition/performance evaluation • Viterbi search (see the command-pipeline sketch below)
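For orientation, the steps above correspond roughly to the following command pipeline. This is only a minimal sketch; file and directory names such as config, proto, hmm0/, train.scp, phonelist are placeholders for this set-up, not fixed HTK names:
HParse gram wdnet                                        # grammar -> word network
HCopy -T 1 -C config -S codetr.scp                       # waveforms -> MFCC feature files
HCompV -C config -f 0.01 -m -S train.scp -M hmm0 proto   # flat-start initialization
HERest -C config -I phones.mlf -t 250.0 150.0 1000.0 -S train.scp -H hmm0/macros -H hmm0/hmmdefs -M hmm1 phonelist   # Baum-Welch re-estimation (repeat several times)
HVite -C config -H hmmN/macros -H hmmN/hmmdefs -S test.scp -i recout.mlf -w wdnet -p 0.0 -s 5.0 dict phonelist       # recognition
HResults -I ref.mlf phonelist recout.mlf                 # scoring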
HMM/Data Setting • Phone Table • Dictionary • Grammar Rule • Define HMMs
Define Acoustic Units (Phone table) • Using our traditional 100 right-context-dependent (RCD) INITIALs and 40 context-independent (CI) FINALs ……
Dictionary (411-syllable table) • Using our traditional 411 syllables (plus silence) • Format: word followed by its phone list (see the hypothetical fragment below) …… …… ……
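A hypothetical fragment of such a dictionary in HTK format (one syllable per line, followed by its INITIAL/FINAL unit sequence; the entries here are illustrative only, based on the transcription example later in this deck):
tai   t ai
yin   NULL yin
sil   sil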
Task Grammar Rule • Task: Phone Dialing • Dial three two six five four • Dial nine zero four one oh nine • Phone Woodland • Call Steve Young
Generate Task Grammar Rule Network • Task: free-syllable decoding • Define a gram file: $syllable = zhi | chi | ri | a | ……; ( SENT-START < $syllable [sil] > SENT-END ) • Parse gram with HParse to produce wdnet
Free-Syllable Decoding "wdnet" Syntax:
# N = number of nodes (I lines), L = number of arcs (J lines)
N=?  L=?
# node definitions: I = node index, W = word label
I=x  W=www
…
# arc definitions: J = arc index, S = start node, E = end node
J=x  S=y  E=z
…
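For illustration only, a hand-written toy network in this syntax that accepts exactly one syllable, either zhi or chi (the wdnet actually generated by HParse for the full free-syllable loop is much larger):
VERSION=1.0
N=4  L=4
I=0  W=!NULL
I=1  W=zhi
I=2  W=chi
I=3  W=!NULL
J=0  S=0  E=1
J=1  S=0  E=2
J=2  S=1  E=3
J=3  S=2  E=3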
Database Preparation • Collect speech data • Transcribe the collected speech database into a word-level MLF (Master Label File) • Prepare corpus files (training/test set) and script files • Convert the word-level MLF into phone-level labels • Feature extraction (MFCC)
Word/Phone-Level Transcriptions (use HLEd to transform)
Word Master Label File:
#!MLF!#
"*/4_t0062_t0062331.lab"
tai
yin
.
"*/4_t0062_t0062340.lab"
.
.
Phone Master Label File:
#!MLF!#
"*/4_t0062_t0062331.lab"
sil
t
ai
NULL
yin
sil
.
"*/4_t0062_t0062340.lab"
.
.
HLEd edit script:
EX           # expand each word into its phone sequence using the dictionary
IS sil sil   # insert sil at the start and end of every utterance
DE sp        # delete all short-pause (sp) labels
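The script above is applied with HLEd, roughly as follows (the script, dictionary, and MLF names are placeholders for this set-up):
HLEd -l '*' -d dict -i phones.mlf mkphones.led words.mlf
-d dict        : dictionary used by the EX command to expand words into phones
-i phones.mlf  : write the resulting phone-level MLF
-l '*'         : use a wildcard directory pattern in the output label names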
Feature Extraction • HCopy : data copy (with format conversion)
Script files • codetr.scp : each line lists a source waveform and a destination feature file (source  destination)
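A hypothetical codetr.scp (the paths are placeholders) and the HCopy call that processes it:
data/train/t0062331.pcm   mfc/train/t0062331.mfc
data/train/t0062340.pcm   mfc/train/t0062340.mfc
HCopy -T 1 -C config -S codetr.scp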
Feature Extraction Configuration
# byte order
NATURALREADORDER=TRUE
NATURALWRITEORDER=TRUE
# waveform parameters
SOURCEFORMAT=ALIEN
HEADERSIZE=256
SOURCERATE=1250.0
# coding parameters
TARGETKIND=MFCC_E
TARGETRATE=100000.0
SAVECOMPRESSED=F
SAVEWITHCRC=T
WINDOWSIZE=320000.0
# ZMEANSOURCE=T
USEHAMMING=T
PREEMCOEF=0.97
NUMCHANS=20
USEPOWER=F
# normalize the dynamic range of the MFCCs
CEPLIFTER=22
LOFREQ=0
HIFREQ=4000
NUMCEPS=12
ENORMALISE=T
DELTAWINDOW=2
ACCWINDOW=2
HMM Configuration • Config file (command level): command -C config_file • User defaults: > export HCONFIG=my_HTK_config • Built-in defaults: see Chap. 18 of the HTK manual
HMM Prototype Definition
~o <VecSize> 39 <MFCC_Z_E_D_A>
~h "proto"
<BeginHMM>
  <NumStates> 5
  <State> 2 <NumMixes> 4
    <Mixture> 1 0.25
      <Mean> 39
        ……
      <Variance> 39
        ……
  <TransP> 5
    0.0 1.0 0.0 0.0 0.0
    0.0 0.5 0.5 0.0 0.0
    0.0 0.0 0.5 0.5 0.0
    0.0 0.0 0.0 0.5 0.5
    0.0 0.0 0.0 0.0 0.0
<EndHMM>
Training Procedure • Model initialization • Flat start (unknown segmentation → uniform segmentation) • Viterbi search (given segmentation) • Forward/Backward (word-level transcriptions only) • Model refinement • Mixture splitting
Configuration file for Training/Test
# byte order
# BYTEORDER=VAX
NATURALREADORDER=TRUE
NATURALWRITEORDER=TRUE
# MFCC parameters
SOURCEFORMAT=HTK
SOURCERATE=100000.0
TARGETKIND=MFCC_E_D_A_Z
TARGETRATE=100000.0
DELTAWINDOW=2
ACCWINDOW=2
Training Corpus • Mat4500_train.scp • Mat4500_train_phones.mlf …… ……
Training flow • Flat start • Viterbi search • Forward/Backward (see the sketch below)
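A minimal flat-start sketch using the corpus files above (the directory names hmm0/, hmm1/ and the phone list name are placeholders; the per-phone cloning step is done with a small script or editor, not by a single HTK tool):
# compute global mean/variance and a variance floor from the training data
HCompV -C config -f 0.01 -m -S Mat4500_train.scp -M hmm0 proto
# clone hmm0/proto once per phone to build hmm0/hmmdefs, then re-estimate:
HERest -C config -I Mat4500_train_phones.mlf -t 250.0 150.0 1000.0 -S Mat4500_train.scp -H hmm0/macros -H hmm0/hmmdefs -M hmm1 phonelist
# repeat HERest for several passes (hmm1 -> hmm2 -> …)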
Utterance Segmentation *.mlf • mat4500_train.mlf (phone-level with segmentation information) …
Silence and short-pause model • sp shares (is tied to) the centre state of the silence model • sil.hed:
AT 2 4 0.2 {sil.transP}
AT 4 2 0.2 {sil.transP}
AT 1 3 0.3 {sp.transP}
TI silst {sil.state[3],sp.state[2]}
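The script is applied with HHEd once the sil/sp models are in the model set, for example (model directories and the phone list name are placeholders):
HHEd -H hmm4/macros -H hmm4/hmmdefs -M hmm5 sil.hed phonelist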
Mixture Splitting Script • MU2.hed ……
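A possible content of MU2.hed, assuming 3 emitting states per model (states 2-4): the MU command doubles the number of Gaussians in every state of every model. It is applied with HHEd (directory names are placeholders), and is normally followed by a few more HERest passes:
MU 2 {*.state[2-4].mix}
HHEd -H hmm/macros -H hmm/hmmdefs -M hmm_2mix MU2.hed phonelist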
Recognition/Evaluation Procedure • Recognition • Evaluation
Test Corpus • Mat4500_test.scp • Mat4500_test.mlf ……
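A possible recognition-and-scoring run over this test corpus (the dictionary, word network, and model directory names are placeholders):
HVite -C config -H hmm/macros -H hmm/hmmdefs -S Mat4500_test.scp -i recout.mlf -w wdnet -p 0.0 -s 5.0 -t 250.0 dict phonelist
HResults -I Mat4500_test.mlf phonelist recout.mlf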
Forced Alignment • Viterbi decoding: HVite using option -a • Gives statistics of the HMM segmentation • Useful for determining the number of mixtures
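A sketch of such a forced-alignment pass (the word-level MLF and output MLF names are hypothetical):
HVite -a -m -C config -H hmm/macros -H hmm/hmmdefs -I Mat4500_train_words.mlf -i aligned.mlf -S Mat4500_train.scp dict phonelist
-a : align against the word-level transcription supplied with -I
-m : include model (and hence segment) boundaries in the output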
Speaker Adaptation - MLLR, MAP • MLLR • In the training phase, generate the state occupation statistics: % HERest -s … • HHEd script:
RN "models"     // rename the HMM set identifier
LS "stats"      // load the state occupation statistics
RC 32 "rtree"   // build a regression class tree with 32 classes
or RC 32 "rtree" {sil.state[2-4].mix}
• Forced alignment of the adaptation data: % HVite … -a … -I adapWords.mlf -m … • Find a global MLLR transform: % HEAdapt -C … -g … -K global.tmf … -I adapPhone.mlf …  (*.tmf : transform model file) • Find MLLR regression-tree transforms: % HEAdapt -C … -J global.tmf -K rc.tmf … -I adapPhone.mlf … • Recognition: % HVite … -J rc.tmf …
MAP adaptation • HEAdapt -C … -j 0.9 … -k … -I adapPhone.mlf … • -j : MAP adaptation weight • -k : apply MLLR before MAP
Further topics • Model/state tying (HMM definition) • Context-dependent models • Fast training/search (beam search) • Insertion/deletion problem: duration constraints, word transition penalty • Word lattice output
Detailed options for the HTK commands • HCompV • Typical arguments: HCompV -C xxx -f 0.01 -m -S *.scp -M output_dir hmm • -m : update the means • -f f : set the variance floor to f × global variance, stored in the hmm macro file as ~o … ~v "varFloor1" <Variance> 38 ……
Detailed options for the HTK commands • HERest • Typical arguments: HERest -C xxx -I *.mlf -t 250.0 150.0 1000.0 -S *.scp -H hmm_macros -H hmm_defs -M output_dir hmmlist • -t f [i l] : set the pruning threshold to f; on failure increase it by i, up to the limit l • -T : tracing option (octal number, command dependent), e.g. 00020 shows occupation counts
Detailed options for the HTK commands • HVite • Typical arguments: HVite -H hmm_macros -H hmm_defs -S *.scp -i output_mlf -w wdnet -p 0.0 -s 5.0 -t 250 dict tiedlist • -t f : beam pruning threshold • -m : show model boundaries • -a : forced alignment (with -I input.mlf) • -p, -s : word insertion penalty, grammar scale factor
Detailed options for the HTK commands • HResults • Typical arguments: HResults -I *.mlf hmmlist answer.mlf • -n : use NIST format • -e s t : label t is made equivalent to label s
Detailed options for the HTK commands • HInit • Typical arguments: HInit -S *.scp -M hmm_macro -H hmm_defs model • HRest • Typical arguments: HRest -S *.scp -M hmm_macro -H hmm_defs model • HSLab • Use WaveSurfer instead.