A Single End-Point DTW Algorithm for Keyword Spotting
Yong-Sun Choi and Soo-Young Lee
Brain Science Research Center and Department of Electrical Engineering and Computer Science
Korea Advanced Institute of Science and Technology
Contents
• Keyword Spotting
  • Meaning & Necessity
  • Problems
• Dynamic Time Warping (DTW)
  • Advantages of DTW
  • Some conventional types & the proposed DTW type
• Experimental Results
  • Verification of the proposed DTW's performance
  • Standard threshold setting
  • Results under various conditions
• Conclusions
Keyword Spotting
• Meaning
  • Detection of pre-defined keywords in continuous speech
  • Example:
    • Keywords: ‘open’, ‘window’
    • Input: “um… okay, uh… please open the… uh… window”
• Necessity
  • Humans may utter OOV (Out-Of-Vocabulary) words and sometimes stammer
  • But the machine only needs specific words for recognition
Problems & Goal
• Difficulties
  • of the process
    • End-point detection (EPD) of the speech segment
    • Rejection of OOVs
  • of the implementation
    • A heavy computational load
    • Complex algorithms
    • Hard to build a real hardware system
• Goal
  • A simple and fast algorithm
DTW for Keyword Spotting
• Hidden Markov Model (HMM)
  • A statistical model: needs a large amount of data for training
  • Complex algorithm: hard to implement as a hardware system
  • Many parameters: can cause memory problems
• Dynamic Time Warping (DTW)
  • Advantages
    • Small amount of data needed for training
    • Simple algorithm (addition & multiplication)
    • Small amount of stored data
  • Weak points
    • Needs an EPD process; many calculations
General DTW Process
• Both end points are known
• Repeated searches
• Finding corresponding frames
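The general process above — aligning two sequences whose end points are both known by finding the corresponding frames via dynamic programming — can be sketched as follows. This is the textbook DTW recursion, not the paper's proposed variant; the Euclidean frame distance and path-length normalization are assumptions.

```python
import numpy as np

def dtw_distance(ref, test):
    """Classic DTW with both end points known.

    ref, test: 2-D arrays of shape (frames, features).
    Returns the accumulated distance along the optimal warping path,
    normalized by the sum of the two pattern lengths (an assumed choice).
    """
    n, m = len(ref), len(test)
    # Local frame-to-frame distances (Euclidean, assumed)
    d = np.linalg.norm(ref[:, None, :] - test[None, :, :], axis=2)
    D = np.full((n, m), np.inf)
    D[0, 0] = d[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            # Admissible predecessors: up, left, diagonal
            prev = []
            if i > 0:
                prev.append(D[i - 1, j])
            if j > 0:
                prev.append(D[i, j - 1])
            if i > 0 and j > 0:
                prev.append(D[i - 1, j - 1])
            D[i, j] = d[i, j] + min(prev)
    return D[-1, -1] / (n + m)
```

Filling the full n-by-m grid is what makes this approach expensive — the motivation for the reduced-search variants on the following slides.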
Advanced DTW
• Myers, Rabiner, and Rosenberg
  • No EPD process
  • A series of small-area searches
    • Global search within one area
    • The next area is set around the best match point of the local area
  • Reduces the amount of calculation, but it remains large
  • Tested on isolated word recognition
Proposal – Shape & Weights
• No EPD process
• Only one path
  • Select the best match point and search again from that point
  • Fewer computations
• Modified weights
  • To compensate for weight-sum differences
    • For search
    • For distance accumulation
Proposal – End Point
• Small search area
  • Successive local searches
  • The search starts at a single point
• End condition
  • When the point reaches the last frame of the reference pattern
  • The end point is thus set automatically
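The single-path idea on these two slides — follow only the best local match at each step, and stop when the path reaches the last reference frame, which fixes the end point automatically — can be illustrated with a greedy sketch. This is a hypothetical reconstruction, not the paper's exact search shape or weights; the three admissible moves and the step normalization are assumptions.

```python
import numpy as np

def single_path_match(ref, test, start):
    """Greedy one-path search (illustrative sketch of the proposal).

    Starting from test frame `start`, repeatedly take the locally best
    of three moves (diagonal, advance reference, advance test).  The
    search ends when the path reaches the last reference frame, so the
    keyword's end point in `test` falls out automatically.
    Returns (normalized accumulated distance, detected end frame).
    """
    i, j = 0, start                      # i: reference frame, j: test frame
    acc = np.linalg.norm(ref[0] - test[j])
    steps = 1
    while i < len(ref) - 1 and j < len(test) - 1:
        moves = [(i + 1, j + 1), (i + 1, j), (i, j + 1)]
        costs = [np.linalg.norm(ref[a] - test[b]) for a, b in moves]
        k = int(np.argmin(costs))
        i, j = moves[k]
        acc += costs[k]
        steps += 1
    return acc / steps, j
```

Because only one path is extended per starting frame, the cost per hypothesis is linear in the pattern length rather than quadratic, which is the source of the computational savings claimed for the proposal.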
Proposal – Distance
• Modified distance
  • Uses the difference of the pattern lengths
  • Pattern lengths of the same word are similar to each other
DTW – Computation Loads
• 3 types
Database & Experimental Sets
• DB: RoadRally
  • For keyword spotting
  • Telephone-channel speech
  • Usage
    • 11 keywords (434 occurrences in total)
    • Read speech from 40 male speakers (47 min. in total) in the Stonehenge section
• Set construction
  • 4 subsets (about 108 keywords per set)
  • 3 sets for training, 1 set for testing
  • 2 reference patterns per keyword per set
Verification Result
• Isolated word recognition
• 3 sets for training, 1 set for testing
Experimental Setup
• Assumption
  • Any frame can be the last frame of a keyword
• Threshold
  • To reject OOVs
  • 1 threshold per reference
  • Standard threshold: no false alarms on the training set
• Result presentation
  • ROC (Receiver Operating Characteristic) curves
    • X-axis: false alarms / hour / keyword
    • Y-axis: recognition rate
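The standard-threshold rule described above — one threshold per reference, set so that the training set produces no false alarms — can be sketched as follows. Setting the threshold just below the best-scoring OOV (imposter) match guarantees zero training-set false alarms; the margin `eps` and the helper names are assumptions for illustration.

```python
def standard_threshold(imposter_distances, eps=1e-6):
    """Largest threshold with zero false alarms on the training set:
    just below the smallest imposter (OOV) distance, assuming lower
    distance means a better match."""
    return min(imposter_distances) - eps

def recognition_rate(keyword_distances, threshold):
    """Fraction of true keyword occurrences accepted at the threshold
    (the ROC's Y-axis quantity)."""
    hits = sum(1 for d in keyword_distances if d <= threshold)
    return hits / len(keyword_distances)
```

Sweeping the threshold upward from this standard value trades false alarms per hour per keyword (X-axis) against recognition rate (Y-axis), tracing out the ROC curves shown in the results.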
Threshold Setting & Recognition Rate on the Training Set
• Training set = test set (no false alarms)
Result – DTW & HMM
• ROC curves
Changing Conditions
• Number of keywords
• Number of references
Conclusion
• Proposed DTW
  • Advantages
    • Simple structure: addition & multiplication (good for hardware)
    • No EPD processing
    • Very small computational load
    • Small amount of stored data (only keyword information): small memory
  • Good performance
    • Keyword spotting
    • Better than HMM when training data is scarce