
Advanced Microphone Array and ASR Integration


Presentation Transcript


  1. Advanced Microphone Array and ASR Integration Professor: Yuan-Fu Liao National Taipei University of Technology

  2. Overview • Introduction • Microphone Array and ASR Integration • Noise - Phase Error Filtering • Maximum Likelihood-based Integration • Maximum Classification Error-like Integration • Reverberation - Subband Filtering-and-Sum • Maximum Likelihood-based Integration • Maximum Classification Error-based Integration • Summary

  3. Traditional Beamforming + ASR • Pipeline: first enhance the speech with a beamformer, then feed the enhanced signal into the recognizer
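To make the traditional pipeline concrete, here is a minimal delay-and-sum sketch in Python/NumPy; the function name, argument layout and integer-sample alignment are illustrative assumptions rather than the configuration used in these slides. Its output would then pass through the usual feature extraction and HMM decoding stages.

```python
import numpy as np

def delay_and_sum(mics, steering_delays):
    """Minimal delay-and-sum sketch (illustrative, not the slides' setup).

    mics            : (M, N) array of M microphone channels
    steering_delays : per-channel steering delays in samples
    """
    M, N = mics.shape
    out = np.zeros(N)
    for m in range(M):
        d = int(round(steering_delays[m]))
        aligned = np.zeros(N)
        if d >= 0:                       # advance channel m by d samples
            aligned[:N - d] = mics[m, d:]
        else:                            # delay channel m by |d| samples
            aligned[-d:] = mics[m, :N + d]
        out += aligned
    return out / M                       # average of the time-aligned channels
```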

  4. Bridge the Gap between the Array and the Speech Recognizer • Take advantage of the available a priori knowledge, i.e., the underlying recognition model • Directly feed the output of the recognizer back to the microphone array

  5. References • Noise - dual-microphone phase error filtering • G. Shi, P. Aarabi and H. Jiang, "Phase-Based Dual-Microphone Speech Enhancement Using A Prior Speech Model," IEEE Trans. Audio, Speech, and Language Processing, vol. 15, pp. 109-118, 2007. • C. Kim, K. Kumar, B. Raj and R. M. Stern, "Signal separation for robust speech recognition based on phase difference information obtained in the frequency domain," in INTERSPEECH 2009, pp. 2495-2498, 2009. • Hsien-Cheng Liao, Yuan-Fu Liao and Chin-Hui Lee, "Maximum Confidence Measure Based Interaural Phase Difference Estimation for Noise Masking in Dual-Microphone Robust Speech Recognition," in Interspeech 2011. • Reverberation - subband filtering-and-sum • M. L. Seltzer, B. Raj and R. M. Stern, "Likelihood-maximizing beamforming for robust hands-free speech recognition," IEEE Trans. Speech and Audio Processing, vol. 12, no. 5, pp. 489-498, Sep. 2004. • M. L. Seltzer and R. M. Stern, "Subband likelihood-maximizing beamforming for speech recognition in reverberant environments," IEEE Trans. Audio, Speech, and Language Processing, vol. 14, no. 6, pp. 2109-2121, Nov. 2006. • Yuan-Fu Liao and I-Yun Xu, "Subband minimum classification error beamforming for speech recognition in reverberant environments," in ICASSP 2010.

  6. Signal Modeling (ITD) • Figure: two-microphone geometry for the interaural time delay. Sampling rate 8000 Hz, microphone spacing 0.05 m (mic1, mic2), sound source at angle Φ; the path-length difference between the two microphones is 0.05 × cos(Φ), which gives the interaural time delay Δt.
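Using the geometry above and the usual far-field assumption (speed of sound c ≈ 343 m/s, a value assumed here rather than given on the slide), the interaural time delay and its equivalent in samples at fs = 8000 Hz are

$$\Delta t = \frac{d\cos\Phi}{c} = \frac{0.05\,\cos\Phi}{343}\ \mathrm{s}, \qquad \Delta n = f_s\,\Delta t = 8000\cdot\frac{0.05\,\cos\Phi}{343} \approx 1.17\,\cos\Phi\ \text{samples}.$$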

  7. Binary Masking • Figure: binary masking block diagram. The left and right microphone signals (micL, micR) are transformed by FFT; time-frequency bins with ITD < τ are kept (speaker) and bins with ITD > τ are removed (interference).
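A minimal sketch of the ITD-based binary masking step, assuming an STFT front end and a per-bin phase-difference-to-delay conversion; the window length, hop size and τ value below are hypothetical, and the X-score-driven τ selection of the next slide is not modeled.

```python
import numpy as np

def binary_mask_by_itd(left, right, fs=8000, n_fft=256, tau=2.5e-4):
    """Keep time-frequency bins whose estimated inter-channel delay is
    below tau (target near broadside); zero the rest (interference).
    Window length, hop and tau are illustrative assumptions."""
    hop = n_fft // 2
    win = np.hanning(n_fft)
    starts = range(0, len(left) - n_fft + 1, hop)
    # STFTs of the left and right channels
    L = np.array([np.fft.rfft(win * left[i:i + n_fft]) for i in starts])
    R = np.array([np.fft.rfft(win * right[i:i + n_fft]) for i in starts])
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    # Per-bin phase difference converted to a delay estimate (seconds)
    phase_diff = np.angle(L * np.conj(R))
    with np.errstate(divide='ignore', invalid='ignore'):
        itd = np.abs(phase_diff) / (2 * np.pi * freqs)
    itd[:, 0] = 0.0                      # DC bin carries no phase cue
    mask = (itd < tau).astype(float)     # binary mask
    return L * mask                      # masked (left-channel) spectrogram
```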

  8. Optimal τ Estimation • Figure: block diagram of the one-stage, automatic optimal τ estimation system. Its components include the left and right microphone signals, a short-time Fourier transform module, an interaural time difference computation module, a threshold (τ) adjustment module, a feature-vector computation module, N+1 speech-command models, speech recognition, X-score computation and output, maximum X-score selection, a threshold input, an "at least one" yes/no check, and the output of the recognition result / threshold.

  9. Testing Database • Recording setup for the re-recorded dual-microphone waveforms • Anechoic room: 5 x 4 x 3 m³ • Microphone position: center of the anechoic room • Spacing between the two microphones: 5 cm • Microphone height: 1 m • Distance from the target source to the center of the microphone pair: 30 cm • Babble noise source angles: 30° & 60° • Test corpus • 50 commands (e.g., forward, backward, …) • 11 speakers (6 males & 5 females) • 547 utterances in total • Noise added artificially • SNR: 0, 6, 12, 18 dB

  10. Recognition Model • Training Data • MAT2000 DB4 • Features • 25 ms/frame without overlap • 13 dims (8 ceps, 4 delta ceps, delta C0) • Recognition Model • 100 2-state RCD Initials + 38 2-state CI Finals • 2-mixture Gaussians per state

  11. Performance of online τ estimation • Figure: results for the 30° and 60° noise-source angles across the SNR (dB) conditions

  12. Reverberation - Subband Filtering-and-Sum • Introduction • Maximum Likelihood-based Integration • Maximum Classification Error-based Integration

  13. Introduction • Reverberant Model • Noise-Free Model in the Time Domain
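The equations on this slide were images; a standard noise-free time-domain reverberant model, consistent with the cited Seltzer-Stern formulation (notation chosen here), is

$$x_m[n] = h_m[n] * s[n] = \sum_{p=0}^{P-1} h_m[p]\, s[n-p], \qquad m = 1,\dots,M,$$

where $s[n]$ is the clean speech, $h_m[n]$ is the room impulse response from the talker to microphone $m$, and $P$ is its length; reverberation therefore acts as a long convolutional distortion rather than an additive noise.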

  14. Speech Reverberation - Time Domain

  15. Speech Reverberation - Frequency Domain • Figure panels: Clean Speech, Noisy Speech

  16. Basic idea of LiMaBeam Iterative procedure, utterance-based: • Do beamforming • Decode the utterance • Given the most likely HMM state sequence, optimize the beamformer parameters for this sequence • Stop when the likelihood has converged
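Below is a toy, self-contained sketch of this loop in Python; the decoder, feature extraction and state-likelihood functions are stand-ins labeled as such, not the actual HMM recognizer or MFCC front end, and the filter-and-sum beamformer is reduced to one weight per channel for brevity.

```python
import numpy as np
from scipy.optimize import minimize

def beamform(weights, mics):
    """Filter-and-sum reduced to a single weight per channel (toy)."""
    return weights @ mics                      # (M,) @ (M, N) -> (N,)

def extract_features(signal, frame=200):
    """Toy features: log-energy per 25 ms frame at 8 kHz, no overlap."""
    n = len(signal) // frame
    frames = signal[:n * frame].reshape(n, frame)
    return np.log(np.sum(frames ** 2, axis=1) + 1e-8)

def decode(features):
    """Stand-in decoder: pretend state 0 is the most likely everywhere."""
    return np.zeros(len(features), dtype=int)

def state_log_likelihood(features, states):
    """Stand-in acoustic log-likelihood of features given fixed states."""
    means = np.where(states == 0, 0.0, 5.0)
    return -0.5 * float(np.sum((features - means) ** 2))

def limabeam(mics, n_iter=5, tol=1e-4):
    w = np.ones(mics.shape[0]) / mics.shape[0]  # start from uniform weights
    prev = -np.inf
    for _ in range(n_iter):
        y = beamform(w, mics)                   # 1) beamform
        states = decode(extract_features(y))    # 2) decode the utterance
        # 3) optimize the beamformer for this fixed state sequence
        neg_ll = lambda w_: -state_log_likelihood(
            extract_features(beamform(w_, mics)), states)
        w = minimize(neg_ll, w, method="Nelder-Mead").x
        ll = -neg_ll(w)
        if ll - prev < tol:                     # 4) likelihood has converged
            break
        prev = ll
    return w

# Example on two channels of synthetic "audio"
w_opt = limabeam(np.random.default_rng(0).standard_normal((2, 8000)))
```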

  17. Subband Likelihood-Maximizing Beamforming

  18. Formulation
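The slide's formulation was shown as images; a sketch of the subband filter-and-sum model and the likelihood-maximizing objective in the style of the cited Seltzer-Stern papers (notation chosen here) is

$$Y_k[l] = \sum_{m=1}^{M} \sum_{p=0}^{P-1} h_{m,k}[p]\, X_{m,k}[l-p],$$

where $X_{m,k}[l]$ is subband $k$ of microphone $m$ at frame $l$ and $h_{m,k}$ are the per-microphone, per-subband filter taps. The taps are chosen to maximize the acoustic log-likelihood of the features $z_l(h)$ computed from the beamformed subbands, given the state sequence $\hat{q}$ from the previous decoding pass:

$$\hat{h} = \arg\max_{h} \sum_{l} \log p\big(z_l(h) \mid \hat{q}_l\big).$$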

  19. Subband Minimum Classification Error Beamforming

  20. MCE Criterion
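The equations on this slide are not recoverable from the transcript; the standard MCE criterion, as usually written when applied to beamformer parameters $h$ (notation assumed here), combines a misclassification measure with a smooth sigmoid loss:

$$d(z;h) = -g_{c}(z;h) + \frac{1}{\eta}\,\log\!\Big[\frac{1}{K-1}\sum_{j\neq c} e^{\eta\, g_{j}(z;h)}\Big], \qquad \ell(d) = \frac{1}{1 + e^{-\gamma d}},$$

where $g_c$ is the discriminant (log-likelihood) score of the correct class, the $g_j$ are those of the $K-1$ competing classes, and $\eta,\gamma>0$ are smoothing constants; the beamformer parameters are then updated by gradient descent on the expected loss.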

  21. TCC300 Reverberation Experiment • Experimental Setting • Microphone array with 7 microphones, 5.66 cm between adjacent microphones • Speaker 2 m away from the array • Room reverberation time T60 = 0.3~1.3 sec • TCC300 database, 29 speakers, each with 5 calibration and 10 test utterances • Evaluation by free-syllable decoding / syllable error rate (no language model) • Experimental Results

  22. Typical Spectrum Examples • Figure panels: Clean Speech, Noisy Speech, Delay-and-Sum, MCE beamformer

  23. Summary • Take advantage of the available a priori knowledge, i.e., the underlying recognition model • Directly feed the output of the recognizer back to the microphone array • The classification-error criterion performs better than the likelihood criterion

  24. - End - Thanks for your attention!!
