Dynamic Time Warping and Minimum Distance Paths for Speech Recognition

Isolated word recognition
• Task:
  • Build an isolated ‘word’ recogniser, e.g. voice dialling on mobile phones
• Method:
  • Record, parameterise and store a vocabulary of reference words
  • Record the test word to be recognised and parameterise it
  • Measure the distance between the test word and each reference word
  • Choose the reference word ‘closest’ to the test word

Words are parameterised on a frame-by-frame basis
• Choose a frame length over which speech remains reasonably stationary
• Overlap frames, e.g. 40ms frames with a 10ms frame shift
• We want to compare frames of test and reference words, i.e. calculate distances between them
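The framing step above can be sketched as follows. This is a minimal illustration, not the deck's own code: the function name `split_into_frames`, the 8 kHz sample rate and the synthetic signal are all assumptions, and a real front end would go on to compute features for each frame.

```python
import numpy as np

def split_into_frames(signal, sample_rate, frame_ms=40, shift_ms=10):
    """Cut a 1-D signal into overlapping frames, one frame per row.

    frame_ms and shift_ms default to the 40ms / 10ms example on the slide.
    """
    signal = np.asarray(signal, dtype=float)
    frame_len = int(sample_rate * frame_ms / 1000)   # samples per frame
    shift = int(sample_rate * shift_ms / 1000)       # samples between frame starts
    if len(signal) < frame_len:
        return np.empty((0, frame_len))              # too short for a single frame
    n_frames = 1 + (len(signal) - frame_len) // shift
    return np.stack([signal[i * shift : i * shift + frame_len]
                     for i in range(n_frames)])

# Hypothetical input: 100 ms of samples at an assumed 8 kHz sample rate.
signal = np.arange(800, dtype=float)
frames = split_into_frames(signal, sample_rate=8000)
print(frames.shape)
```

Each frame starts one shift (here 80 samples) after the previous one, so consecutive frames share three quarters of their samples.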

Calculating Distances
• Easy:
  • Sum differences between corresponding frames
• Problem:
  • Number of frames won’t always correspond

Solution 1: Linear Time Warping
• Stretch shorter sound
• Problem?
  • Some sounds stretch more than others
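A rough sketch of linear time warping, assuming one-dimensional frame values and simple linear interpolation (the slide does not say how the stretch is done). The helper name `linear_warp` is hypothetical, and the two sequences are the Test and Reference values used in the DTW example.

```python
import numpy as np

def linear_warp(seq, target_len):
    """Stretch (or shrink) seq uniformly to target_len values by interpolation."""
    seq = np.asarray(seq, dtype=float)
    old_pos = np.linspace(0.0, 1.0, num=len(seq))    # original frame positions
    new_pos = np.linspace(0.0, 1.0, num=target_len)  # positions after stretching
    return np.interp(new_pos, old_pos, seq)

test = np.array([5, 3, 9, 7, 3], dtype=float)
reference = np.array([4, 7, 4], dtype=float)

stretched = linear_warp(reference, len(test))
frame_by_frame = np.abs(test - stretched).sum()      # sum of per-frame differences
print(stretched, frame_by_frame)
```

Every part of the shorter sound is stretched by the same factor, which is exactly the weakness the slide points out.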

Solution 2: Dynamic Time Warping (DTW)
• Example sequences (from the figure): Test = 5 3 9 7 3, Reference = 4 7 4
• Using a dynamic alignment, make the most similar frames correspond
• Find the distance between two utterances using these corresponding frames

Digression: Dynamic Programming
• The shortest route from Dublin to Limerick goes through:
  • Kildare
  • Monasterevin
  • Portlaoise
  • Mountrath
  • Roscrea
  • Nenagh
• Now consider the shortest route from Dublin to Nenagh
• What towns does the route go through?

Intercity Example

• Place the distance between frame r of Test and frame c of Reference in cell(r,c) of the distance matrix
• Compute the minimum distance to each point and place it in the mindist matrix, e.g.:
  mindist(5,3) = min{1 + mindist(5,2), 1 + mindist(4,2), 1 + mindist(4,3)}
• We can also find the path through the grid that minimizes the total cost of the path
[Figures: distance and mindist grids with Test and Reference axes]
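The mindist recurrence and the minimum-cost path can be sketched as below. Several things here are assumptions rather than the deck's own method: frames are single numbers, the frame distance is the absolute difference, the path runs from the first cell to the last, and indices are 0-based (the slide counts from 1). The function name `dtw` is hypothetical. The sequences are the Test (5 3 9 7 3) and Reference (4 7 4) values from the DTW example.

```python
import numpy as np

def dtw(test, reference):
    """Return (total minimum distance, path) for two 1-D sequences."""
    test = np.asarray(test, dtype=float)
    reference = np.asarray(reference, dtype=float)
    n, m = len(test), len(reference)

    # dist[r, c]: distance between frame r of Test and frame c of Reference.
    dist = np.abs(test[:, None] - reference[None, :])

    # mindist[r, c]: cheapest cost of any allowed path from (0, 0) to (r, c).
    mindist = np.full((n, m), np.inf)
    mindist[0, 0] = dist[0, 0]
    for r in range(n):
        for c in range(m):
            if r == 0 and c == 0:
                continue
            best_prev = min(
                mindist[r, c - 1] if c > 0 else np.inf,
                mindist[r - 1, c - 1] if r > 0 and c > 0 else np.inf,
                mindist[r - 1, c] if r > 0 else np.inf,
            )
            mindist[r, c] = dist[r, c] + best_prev

    # Backtrack from the last cell, always stepping to the cheapest predecessor.
    r, c = n - 1, m - 1
    path = [(r, c)]
    while (r, c) != (0, 0):
        candidates = []
        if c > 0:
            candidates.append((mindist[r, c - 1], (r, c - 1)))
        if r > 0 and c > 0:
            candidates.append((mindist[r - 1, c - 1], (r - 1, c - 1)))
        if r > 0:
            candidates.append((mindist[r - 1, c], (r - 1, c)))
        _, (r, c) = min(candidates, key=lambda item: item[0])
        path.append((r, c))
    path.reverse()
    return mindist[n - 1, m - 1], path

total, path = dtw([5, 3, 9, 7, 3], [4, 7, 4])
print(total)  # 5.0
print(path)   # [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2)]
```

The path pairs test frames 9 and 7 with reference frame 7, and the two 3s with the two 4s, which is the "most similar frames correspond" behaviour described for DTW.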

• Examples so far are uni-dimensional
• Speech is multi-dimensional, e.g. two dimensions, using points (4,3) and (5,2)
[Figure: the two points plotted on axes running from 1 to 5]
• Distance equations for 2 dimensions and for the multi-dimensional case: [equations not preserved in the transcript]
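The slide's own distance equations are not available here. A common choice, and the one assumed in this sketch, is the Euclidean distance; other measures (for example city-block distance) would fit the same framework. The symbols a, b and K are introduced for illustration only.

```latex
% Two dimensions, points (x_1, y_1) and (x_2, y_2):
d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}

% Worked example for the points (4,3) and (5,2):
d = \sqrt{(4 - 5)^2 + (3 - 2)^2} = \sqrt{2} \approx 1.414

% K dimensions, feature vectors a = (a_1, \dots, a_K) and b = (b_1, \dots, b_K):
d(a, b) = \sqrt{\sum_{k=1}^{K} (a_k - b_k)^2}
```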

Constraints
• Global
  • Endpoint detection
  • Path should be close to diagonal
• Local
  • Must always travel upwards or eastwards
  • No jumps
  • Slope weighting
  • Consecutive moves upwards/eastwards

Global Constraints

Local Constraints
[Figure: mindist(r,c) is reached from mindist(r,c-1), mindist(r-1,c-1) or mindist(r-1,c); the moves are labelled with weights 1, 1 and 2]
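Slope weighting can be sketched by multiplying the local distance by a weight that depends on the move taken. The transcript does not show which move carries which weight, so the assignment below (2 for the diagonal, 1 for the other two, as in the common symmetric form) is an assumption, as are the function name `dtw_weighted`, the parameter names and the absolute-difference frame distance.

```python
import numpy as np

def dtw_weighted(test, reference, w_east=1.0, w_diag=2.0, w_north=1.0):
    """DTW cost where each move scales the local distance by a slope weight.

    w_east  applies to the move from (r, c-1),
    w_diag  applies to the move from (r-1, c-1),
    w_north applies to the move from (r-1, c).
    """
    test = np.asarray(test, dtype=float)
    reference = np.asarray(reference, dtype=float)
    dist = np.abs(test[:, None] - reference[None, :])
    n, m = dist.shape

    mindist = np.full((n, m), np.inf)
    mindist[0, 0] = dist[0, 0]          # assumed: start cell counted once, unweighted
    for r in range(n):
        for c in range(m):
            if r == 0 and c == 0:
                continue
            options = []
            if c > 0:
                options.append(mindist[r, c - 1] + w_east * dist[r, c])
            if r > 0 and c > 0:
                options.append(mindist[r - 1, c - 1] + w_diag * dist[r, c])
            if r > 0:
                options.append(mindist[r - 1, c] + w_north * dist[r, c])
            mindist[r, c] = min(options)
    return mindist[n - 1, m - 1]

weighted = dtw_weighted([5, 3, 9, 7, 3], [4, 7, 4])
unweighted = dtw_weighted([5, 3, 9, 7, 3], [4, 7, 4], w_diag=1.0)
print(weighted, unweighted)  # 8.0 5.0
```

With all weights equal to 1 the function reduces to the plain recurrence; giving the diagonal a weight of 2 removes its advantage of covering two frames in one step.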

Points to Note
• DTW really only suitable for small vocabularies and/or speaker dependent recognition
• Should normalise for reference length
• Can use multiple utterances and cluster them
• Poor performance if recording environment changes
• High computation cost
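One way to combine length normalisation with the choose-the-closest-word step is sketched below. Dividing the DTW cost by the number of reference frames is an assumption (other normalisations, such as dividing by path length, are also used), and the function names, word labels and reference templates are hypothetical.

```python
import numpy as np

def dtw_cost(test, reference):
    """Plain DTW cost between two 1-D sequences (absolute-difference distance)."""
    test = np.asarray(test, dtype=float)
    reference = np.asarray(reference, dtype=float)
    dist = np.abs(test[:, None] - reference[None, :])
    n, m = dist.shape
    # Padded with an extra row and column of inf so the borders need no special cases.
    mindist = np.full((n + 1, m + 1), np.inf)
    mindist[0, 0] = 0.0
    for r in range(1, n + 1):
        for c in range(1, m + 1):
            mindist[r, c] = dist[r - 1, c - 1] + min(
                mindist[r, c - 1], mindist[r - 1, c - 1], mindist[r - 1, c]
            )
    return mindist[n, m]

def recognise(test, references):
    """Return the closest word and all length-normalised scores."""
    scores = {word: dtw_cost(test, ref) / len(ref)
              for word, ref in references.items()}
    best_word = min(scores, key=scores.get)
    return best_word, scores

# Hypothetical two-word vocabulary with made-up 1-D templates.
references = {"a": [4, 7, 4], "b": [1, 1, 1, 1]}
best_word, scores = recognise([5, 3, 9, 7, 3], references)
print(best_word, scores)
```

The double loop visits every cell of the grid for every reference word, which illustrates the high computation cost noted above as the vocabulary grows.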

Evaluation
• Performance of designs only comparable by evaluation
• Use a test set
• For single word recognition we can simply quote % accuracy
• In error analysis, it can be helpful to use a confusion matrix

Confusion Matrix
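A small sketch of both evaluation measures, taking % accuracy as the percentage of test words recognised correctly. The word lists are made-up illustrative data, not results from the deck, and the function name `evaluate` is hypothetical.

```python
from collections import Counter

def evaluate(true_labels, predicted_labels):
    """Return (% accuracy, confusion counts keyed by (true, predicted))."""
    pairs = list(zip(true_labels, predicted_labels))
    correct = sum(1 for true, predicted in pairs if true == predicted)
    accuracy = 100.0 * correct / len(pairs)
    confusion = Counter(pairs)   # how often each true word was recognised as each word
    return accuracy, confusion

# Made-up test set: the spoken words and what a recogniser returned for them.
true_words = ["one", "two", "two", "nine", "nine"]
recognised = ["one", "two", "nine", "nine", "one"]

accuracy, confusion = evaluate(true_words, recognised)
print(accuracy)  # 60.0
for (true, predicted), count in sorted(confusion.items()):
    print(f"{true:>5} -> {predicted:<5} {count}")
```

Off-diagonal entries such as ("two", "nine") show which word pairs the recogniser mixes up, which a single accuracy figure hides.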
