
DTW for Speech Recognition

J.-S. Roger Jang (張智星), jang@cs.nthu.edu.tw, http://www.cs.nthu.edu.tw/~jang. MIR Lab (Multimedia Information Retrieval Lab, 多媒體資訊檢索實驗室), Department of Computer Science, Tsing Hua Univ. (清華大學 資工系).


Presentation Transcript


  1. DTW for Speech Recognition
     J.-S. Roger Jang (張智星), jang@cs.nthu.edu.tw, http://www.cs.nthu.edu.tw/~jang
     MIR Lab (Multimedia Information Retrieval Lab, 多媒體資訊檢索實驗室), Department of Computer Science, Tsing Hua Univ. (清華大學 資工系)

  2. Dynamic Time Warping (DTW)
     • Characteristics
       • Pattern-matching-based approach
       • Requires less memory and computation
       • Suitable for speaker-dependent recognition
       • Suitable for small to medium vocabularies
       • Suitable for microprocessor/chip implementation
     • Applications
       • Speaker identification and verification for surveillance
       • Voice commands for mobile phones and toys

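  3. Dynamic Time Warping: Type 1
     • t: input MFCC matrix (each column is a frame's feature vector)
     • r: reference MFCC matrix
     • Local paths: 27-45-63 degrees
     • DTW recurrence: [equation not preserved in the transcript]
     • [Figure: grid with axis i labeled t(i-1), t(i) and axis j labeled r(j-1), r(j)]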
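The recurrence on this slide did not survive in the transcript. A minimal sketch of one common reading of 27-45-63-degree local paths follows: each cell (i, j) may be reached from (i-1, j-1), (i-1, j-2), or (i-2, j-1). The function names `frame_dist` and `dtw_type1`, the Euclidean frame distance, and the list-of-frames layout are assumptions for illustration, not the author's code.

```python
import math

def frame_dist(a, b):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_type1(t, r):
    """Type-1 DTW sketch (27-45-63 degree local paths), both corners fixed.

    t, r: lists of frames; each frame is a list of floats.
    Returns the total distance, or math.inf if no valid path exists.
    """
    m, n = len(t), len(r)
    INF = math.inf
    D = [[INF] * n for _ in range(m)]
    D[0][0] = frame_dist(t[0], r[0])
    # Every allowed step advances both i and j by at least 1,
    # so row 0 and column 0 (apart from the corner) stay unreachable.
    for i in range(1, m):
        for j in range(1, n):
            best = D[i - 1][j - 1]                 # 45-degree step
            if j >= 2:
                best = min(best, D[i - 1][j - 2])  # 63-degree step
            if i >= 2:
                best = min(best, D[i - 2][j - 1])  # 27-degree step
            if best < INF:
                D[i][j] = frame_dist(t[i], r[j]) + best
    return D[m - 1][n - 1]
```

Note that the 27- and 63-degree steps skip a frame on one axis, so in this sketch a skipped frame contributes nothing to the total distance.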

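  4. Dynamic Time Warping: Type 2
     • t: input MFCC matrix (each row is a frame's feature vector)
     • r: reference MFCC matrix
     • Local paths: 0-45-90 degrees
     • DTW recurrence: [equation not preserved in the transcript]
     • [Figure: grid with axis i labeled t(i-1), t(i) and axis j labeled r(j-1), r(j)]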
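For 0-45-90-degree local paths, the usual reading is that cell (i, j) may be reached from (i-1, j), (i-1, j-1), or (i, j-1). The sketch below assumes that reading, a Euclidean frame distance, and fixed corners; `frame_dist` and `dtw_type2` are illustrative names, not taken from the slides.

```python
import math

def frame_dist(a, b):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_type2(t, r):
    """Type-2 DTW sketch (0-45-90 degree local paths), both corners fixed.

    t, r: lists of frames; each frame is a list of floats.
    Returns the total distance along the best path.
    """
    m, n = len(t), len(r)
    INF = math.inf
    D = [[INF] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            d = frame_dist(t[i], r[j])
            if i == 0 and j == 0:
                D[i][j] = d
                continue
            best = INF
            if i > 0:
                best = min(best, D[i - 1][j])      # 0-degree step
            if i > 0 and j > 0:
                best = min(best, D[i - 1][j - 1])  # 45-degree step
            if j > 0:
                best = min(best, D[i][j - 1])      # 90-degree step
            D[i][j] = d + best
    return D[m - 1][n - 1]
```

Unlike the type-1 paths, these steps never skip a frame, so any two non-empty sequences can be aligned regardless of their length ratio.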

  5. Local Path Constraints
     • Type 1: 27-45-63 local paths
     • Type 2: 0-45-90 local paths

  6. Path Penalty for Type-1 DTW
     • Path penalty
       • No penalty for the 45-degree path
       • Some penalty for paths that deviate from 45 degrees
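One way to realize such a penalty is to add a constant to the accumulated distance whenever the 27- or 63-degree step is taken. The sketch below assumes that form; the parameter name `penalty`, its default value, and the Euclidean frame distance are hypothetical choices, since the slide does not give the actual penalty.

```python
import math

def frame_dist(a, b):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_type1_penalized(t, r, penalty=0.5):
    """Type-1 DTW sketch with a constant penalty on non-45-degree steps.

    penalty: hypothetical additive cost for each 27- or 63-degree step.
    Returns the penalized total distance, or math.inf if no path exists.
    """
    m, n = len(t), len(r)
    INF = math.inf
    D = [[INF] * n for _ in range(m)]
    D[0][0] = frame_dist(t[0], r[0])
    for i in range(1, m):
        for j in range(1, n):
            best = D[i - 1][j - 1]                           # 45 degrees, no penalty
            if j >= 2:
                best = min(best, D[i - 1][j - 2] + penalty)  # 63 degrees
            if i >= 2:
                best = min(best, D[i - 2][j - 1] + penalty)  # 27 degrees
            if best < INF:
                D[i][j] = frame_dist(t[i], r[j]) + best
    return D[m - 1][n - 1]
```

With `penalty=0` this reduces to unpenalized type-1 DTW; larger values bias the alignment toward the diagonal.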

  7. DTW Paths of “Match Corners”
     • We assume the speed of the user's acoustic input is between 1/2 and 2 times that of the intended sentence.
     • Both corners are fixed. (Endpoint detection is critical.)
     • Suitable for voice command applications
     • [Figure: DTW paths on the i-j plane]
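With both corners fixed and 27-45-63-degree local paths, the speed assumption becomes a simple length check that can be run before any DTW computation. The sketch below assumes the step set (1,1), (1,2), (2,1) in (i, j); the function name `corners_reachable` is illustrative.

```python
def corners_reachable(m, n):
    """Check whether a type-1 path can join (0, 0) to (m-1, n-1).

    m: number of input frames, n: number of reference frames.
    With steps (1,1), (1,2), (2,1), the end corner is reachable exactly
    when neither axis advances more than twice as far as the other.
    """
    if m < 1 or n < 1:
        return False
    di, dj = m - 1, n - 1
    return di <= 2 * dj and dj <= 2 * di

# Usage: skip references whose length makes a corner-to-corner path impossible.
candidates = [n for n in (20, 45, 100, 250) if corners_reachable(100, n)]
```

A check like this lets a voice-command recognizer discard references of incompatible length cheaply.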

  8. DTW Paths of “Match Anywhere”
     • No fixed anchor positions
     • Suitable for retrieval of personal spoken documents
     • [Figure: DTW paths on the i-j plane]
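One possible interpretation of “match anywhere” is that the whole input t is aligned against any contiguous stretch of the reference r, so the path may start and end at any j. The sketch below assumes that interpretation together with 0-45-90-degree local paths; `dtw_match_anywhere` and its return value are illustrative, not the author's definition.

```python
import math

def frame_dist(a, b):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_match_anywhere(t, r):
    """Align all of t against any contiguous part of r (free start/end in j).

    Returns (best_distance, end_j), where end_j is the reference frame
    index at which the best path ends.
    """
    m, n = len(t), len(r)
    # Free start: the first input frame may match any reference frame at no extra cost.
    prev = [frame_dist(t[0], r[j]) for j in range(n)]
    for i in range(1, m):
        cur = [0.0] * n
        cur[0] = prev[0] + frame_dist(t[i], r[0])
        for j in range(1, n):
            best = min(prev[j], prev[j - 1], cur[j - 1])  # 0, 45, 90 degrees
            cur[j] = frame_dist(t[i], r[j]) + best
        prev = cur
    # Free end: take the cheapest cell in the last input column.
    end_j = min(range(n), key=lambda j: prev[j])
    return prev[end_j], end_j
```

For example, matching the query `[[1], [2]]` against the reference `[[9], [1], [2], [9]]` finds the embedded copy ending at reference index 2 with distance 0.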

  9. Other Variants
     • Local constraints
     • Start/ending area

  10. Implementation Issues
      • To save memory
        • Use a 2-column table for type-1 DTW
        • Use a 1-column table for type-2 DTW
      • To avoid too many if-then statements
        • Pad type-1 DTW with two-layer padding
        • Pad type-2 DTW with one-layer padding
      • To find a suitable path
        • Minimize total distance
        • Minimize average distance
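The memory-saving idea for type-2 DTW can be sketched as follows: because cell (i, j) depends only on (i-1, j), (i-1, j-1), and (i, j-1), a single column updated in place is enough when only the final distance is needed. The function name `dtw_type2_one_column`, the Euclidean frame distance, and the fixed-corner setting are assumptions for illustration.

```python
import math

def frame_dist(a, b):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_type2_one_column(t, r):
    """Type-2 DTW distance using one column of len(r) cells.

    The column holds D[i-1][*] on entry to iteration i and is overwritten
    with D[i][*]; `diag` carries the value D[i-1][j-1] that would otherwise
    be lost when col[j-1] is overwritten.
    """
    n = len(r)
    col = [0.0] * n
    # First input frame: only 90-degree moves are possible.
    col[0] = frame_dist(t[0], r[0])
    for j in range(1, n):
        col[j] = col[j - 1] + frame_dist(t[0], r[j])
    for i in range(1, len(t)):
        diag = col[0]                      # D[i-1][0]
        col[0] = col[0] + frame_dist(t[i], r[0])
        for j in range(1, n):
            left = col[j]                  # D[i-1][j], before overwrite
            col[j] = frame_dist(t[i], r[j]) + min(left, diag, col[j - 1])
            diag = left
    return col[n - 1]
```

The trade-off is that the full path can no longer be backtracked; type-1 DTW would need two stored columns in the same scheme because its steps reach back to i-2.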

  11. DTW Path of “Match Corners”

  12. DTW Path of “Match Anywhere”

  13. DTW Path of “Match Anywhere”

  14. DTW for Spoken Document Retrieval
      • Applications
        • Voice-based audio/video retrieval
      • Issues in spoken document retrieval (SDR) using DTW
        • Speaker normalization
        • Vocal tract length normalization (VTLN)
        • Frequency warping
        • Efficiency

  15. DTW for Speaker-independent Voice Command Recognition
      • Applications
        • Digit recognition
      • Technical highlights
        • Extensive recordings
        • Clustering within each command
        • Some indexing methods for DTW
        • Suitable for small-vocabulary applications
