
Study on Speaker Recognition Based on HHT

Study on Speaker Recognition Based on HHT. Advisors: Prof. 謝傳璋 and Prof. 王昭男. Student: 吳明弦. Date: 98/12/10. Outline: 1. Abstract 2. Instantaneous frequency 3. EMD and IMF 4. Speech signal preprocessing 5. Vector quantization 6. Conclusion 7. References.





Presentation Transcript


  1. Study on Speaker Recognition Based on HHT • Advisors: Prof. 謝傳璋 and Prof. 王昭男 • Student: 吳明弦 • Date: 98/12/10

  2. Outline • 1. Abstract • 2. Instantaneous frequency • 3. EMD and IMF • 4. Speech signal preprocessing • 5. Vector quantization • 6. Conclusion • 7. References

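  3. Abstract • Speaker recognition is a broad discipline closely tied to psychology, signal processing, computer science, and phonetics. It is used to enable communication between machines and people and to improve the accuracy of identity verification. • Speech signals are nonlinear and non-stationary, whereas traditional Fourier analysis is linear. The Hilbert transform (applicable to both linear and nonlinear signals) is therefore needed to show how the frequency content changes over time. • The concept of empirical mode decomposition (EMD) is also introduced. Because real-world signals consist of multiple frequency components, the original signal is represented as a finite number of intrinsic mode functions (IMFs) plus one trend signal. • The Hilbert transform has already been applied successfully to speaker recognition, for example in speech end-point detection and feature extraction, which supports the design of speaker recognition systems that reach the desired accuracy. It is also applied today in fields such as earthquake analysis, railway tracks, and financial management.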

  4. Speaker Recognition Process • Preprocessing → Feature extraction → Speech signal features → Comparison with speaker database → Decision (yes or no)

  5. HHT Process • Input data → Empirical Mode Decomposition (EMD) by the sifting process → Intrinsic Mode Functions (IMFs) → Is the residue a trend or a constant? (if no, sift again) → Hilbert transform → Hilbert spectrum → Marginal spectrum

  6. Fourier analysis • x=0.5*sin(2*pi*15*t)+2*sin(2*pi*40*t)
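
The test signal on this slide can be reproduced and its spectrum checked with a short script. This is only a sketch: the sampling rate of 400 Hz is borrowed from the dt = 1/400 shown on slide 9, and the 1-second duration is an assumption chosen so that 15 Hz and 40 Hz fall exactly on FFT bins.

```python
import numpy as np

fs = 400                      # sampling rate in Hz (assumed, from dt = 1/400 on slide 9)
t = np.arange(fs) / fs        # 1 second of samples (assumed duration)
x = 0.5 * np.sin(2 * np.pi * 15 * t) + 2 * np.sin(2 * np.pi * 40 * t)

# One-sided amplitude spectrum: scaling |FFT| by 2/N makes a sine of
# amplitude A appear as a peak of height A.
freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
amplitude = 2.0 * np.abs(np.fft.rfft(x)) / len(x)

# Frequencies of the two largest peaks, sorted in ascending order.
peaks = np.sort(freqs[np.argsort(amplitude)[-2:]])
print(peaks, amplitude[15], amplitude[40])
```

The spectrum shows which frequencies are present (15 Hz and 40 Hz) and their amplitudes, but not when they occur, which is the limitation the Hilbert-based analysis in the following slides addresses.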

  7. Analytic signal

  8. Hilbert transform

  9. Instantaneous frequency • 1. Mean value = 0, dt = 1/400

  10. Instantaneous frequency • 2. Mean value < 1

  11. Instantaneous frequency • 3. Mean value > 1
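
The three cases on slides 9 to 11 can be explored numerically. The sketch below assumes the test signal is a unit-amplitude sine with a constant offset (the slides do not state the signal), builds the analytic signal by zeroing the negative-frequency half of the FFT, and takes the instantaneous frequency as the derivative of the unwrapped phase. The 2 Hz tone, the 10-second duration, and the function names are illustrative choices.

```python
import numpy as np

def analytic_signal(x):
    """Return x + j*H[x] by suppressing the negative-frequency FFT bins."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep the DC bin unchanged
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin unchanged
        weights[1:n // 2] = 2.0           # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def instantaneous_frequency(x, dt):
    """Instantaneous frequency in Hz from the unwrapped phase derivative."""
    phase = np.unwrap(np.angle(analytic_signal(x)))
    return np.diff(phase) / (2.0 * np.pi * dt)

dt = 1.0 / 400                            # sampling interval from slide 9
t = np.arange(4000) * dt                  # 10 s of samples (assumed duration)
f0 = 2.0                                  # tone frequency in Hz (assumed)
for offset in (0.0, 0.5, 1.5):            # mean = 0, mean < 1, mean > 1
    x = offset + np.sin(2 * np.pi * f0 * t)
    freq = instantaneous_frequency(x, dt)
    print(offset, freq.min(), freq.max())
```

Under these assumptions the zero-mean case gives a constant 2 Hz, an offset below the amplitude gives a fluctuating but positive frequency, and an offset above the amplitude produces negative values, which is why a signal must be reduced to zero-mean components before the Hilbert transform is physically meaningful.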

  12. EMD • Oscillation modes are defined by the characteristic time scales of the signal: the time intervals between successive maxima and minima are used to analyze local properties. • Sifting process for x(t): • 1. Find all local maxima and minima of x(t); connect the maxima with a cubic spline to form the upper envelope and the minima to form the lower envelope. • 2. Take the mean of the upper and lower envelopes to obtain the mean envelope m1(t). • 3. Compute h1(t) = x(t) - m1(t) as the first candidate component; this completes the first sifting pass. If h1(t) does not satisfy the IMF conditions, repeat the sifting on h1(t) until it does.
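
The three steps on slide 12 can be sketched as a single pass in Python. This is a minimal illustration, not the presenter's implementation: it assumes SciPy's `CubicSpline` for the envelopes, uses a simple strict-inequality extremum detector, and ignores the end effects that a practical EMD has to handle. The two-tone test signal and the function names are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def local_extrema(x):
    """Indices of strict interior local maxima and minima of a 1-D array."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
    return maxima, minima

def sift_once(t, x):
    """One pass: h1(t) = x(t) - m1(t), with m1 the mean of the two envelopes."""
    maxima, minima = local_extrema(x)
    upper = CubicSpline(t[maxima], x[maxima])(t)   # envelope through the maxima
    lower = CubicSpline(t[minima], x[minima])(t)   # envelope through the minima
    m1 = 0.5 * (upper + lower)                     # mean envelope
    return x - m1, m1

fs = 1000                                  # sampling rate in Hz (assumed)
t = np.arange(fs) / fs
fast = np.sin(2 * np.pi * 20 * t)          # fast oscillation (illustrative)
slow = 2 * np.sin(2 * np.pi * 2 * t)       # slow oscillation (illustrative)
x = fast + slow

h1, m1 = sift_once(t, x)
interior = (t > 0.2) & (t < 0.8)           # stay away from the spline end effects
print(np.max(np.abs(h1[interior] - fast[interior])))
```

For this test signal a single pass already separates the two tones in the interior of the record: m1(t) follows the slow component and h1(t) the fast one. A full EMD would repeat the pass until the IMF conditions hold, then subtract the IMF and continue on the residue.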

  13. Sifting process • 1. x(t)

  14. Sifting process • 2. m1(t), h1(t)

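  15. IMF • Purpose of the sifting process: • 1. Remove riding waves, so that each component contains a single oscillation mode. • 2. Make the waveform more symmetric, smoothing uneven amplitudes. • IMF properties (each component obtained by sifting must satisfy both): • 1. The number of local extrema (maxima and minima) and the number of zero crossings are equal or differ by at most one. • 2. The mean of the envelope defined by the local maxima and the envelope defined by the local minima is zero.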
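
The first IMF property on slide 15 is easy to test numerically. The sketch below checks only that counting condition (extrema versus zero crossings); the envelope-mean condition would additionally need the spline envelopes from the sifting step. The function names are illustrative.

```python
import numpy as np

def count_extrema_and_zero_crossings(h):
    """Count slope sign changes (extrema) and signal sign changes (zero crossings)."""
    d = np.diff(h)
    n_extrema = int(np.sum(d[:-1] * d[1:] < 0))
    n_zero_crossings = int(np.sum(h[:-1] * h[1:] < 0))
    return n_extrema, n_zero_crossings

def satisfies_count_condition(h):
    """True if the two counts are equal or differ by at most one."""
    n_extrema, n_zero_crossings = count_extrema_and_zero_crossings(h)
    return abs(n_extrema - n_zero_crossings) <= 1

t = np.arange(1000) / 1000.0
tone = np.sin(2 * np.pi * 5 * t + 0.3)     # oscillates about zero
print(satisfies_count_condition(tone))
print(satisfies_count_condition(tone + 3))  # same extrema, but never crosses zero
```

The offset tone fails the check because it has extrema but no zero crossings, which is the kind of component that sifting is meant to correct.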

  16. Hilbert Spectrum

  17. Production of the speech signal • Voiced (periodic impulse) or unvoiced (aperiodic) excitation → Vocal tract → Speech signal

  18. End-point detection principles • 1. Energy • e(i)= • Voiced speech has more energy than unvoiced speech, • but unvoiced segments may contain strong background noise and can therefore also show very high energy.

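  19. End-point detection principles • 2. Zero-crossing rate • ZCR(i)= • Voiced → small zero-crossing rate • Unvoiced → large zero-crossing rate • Decision rule: a frame whose energy exceeds a threshold is marked 1. If the A frames after it also exceed the threshold, that frame may be the start of speech. The preceding B frames are then checked backwards, and frames that fall below a threshold are marked 0, which confirms the start of speech.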
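
The formulas for e(i) and ZCR(i) did not survive in the transcript, so the sketch below uses common textbook definitions as an assumption: energy as the sum of squared samples in frame i, and zero-crossing rate as the fraction of adjacent sample pairs in the frame whose signs differ. The frame length, the hop size, and the synthetic signal (noise followed by a tone) are illustrative.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split x into frames; trailing samples that do not fill a frame are dropped."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n_frames)])

def short_time_energy(frames):
    """Sum of squared samples in each frame (assumed definition of e(i))."""
    return np.sum(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs with opposite signs (assumed ZCR(i))."""
    signs = np.sign(frames)
    return np.mean(signs[:, :-1] * signs[:, 1:] < 0, axis=1)

fs = 8000                                           # sampling rate in Hz (assumed)
rng = np.random.default_rng(0)
noise = 0.01 * rng.standard_normal(800)             # weak noise standing in for unvoiced input
tone = np.sin(2 * np.pi * 100 * np.arange(800) / fs)  # 100 Hz tone standing in for voiced speech
x = np.concatenate([noise, tone])

frames = frame_signal(x, frame_len=200, hop=200)    # 8 frames: 4 noise, 4 tone
energy = short_time_energy(frames)
zcr = zero_crossing_rate(frames)
print(np.round(energy, 3))
print(np.round(zcr, 3))
```

The tone frames show high energy and a low zero-crossing rate, while the noise frames show the opposite, which matches the voiced and unvoiced behaviour described on slides 18 and 19. Thresholds for an actual detector would have to be chosen for the recording conditions.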

  20. End-point detection method • 1. Frequency change, dt = 0.1

  21. End-point detection method

  22. End-point detection method

  23. End-point detection method • 2. Phase change, dt = 0.1

  24. End-point detection method

  25. End-point detection method

  26. Pre-emphasis and silence removal • Signal amplitude • < 1/10 of the maximum amplitude • → silence

  27. Before pre-emphasis and after pre-emphasis
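
Both preprocessing steps can be sketched in a few lines. The slides do not give the pre-emphasis filter, so the first-order form y[n] = x[n] - alpha * x[n-1] with alpha = 0.95 is an assumption (a commonly used choice). The silence rule follows slide 26, applied here per frame rather than per sample, with an assumed frame length of 256 samples.

```python
import numpy as np

def pre_emphasis(x, alpha=0.95):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1], with y[0] = x[0]."""
    return np.append(x[0], x[1:] - alpha * x[:-1])

def remove_silence(x, frame_len=256, ratio=0.1):
    """Drop frames whose peak amplitude is below ratio * max|x| (the 1/10 rule)."""
    threshold = ratio * np.max(np.abs(x))
    n_frames = len(x) // frame_len                   # trailing partial frame is dropped
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    keep = np.max(np.abs(frames), axis=1) >= threshold
    return frames[keep].ravel()

quiet = np.full(512, 0.01)                           # stand-in for silence
loud = np.sin(2 * np.pi * np.arange(512) / 32)       # stand-in for speech
x = np.concatenate([quiet, loud, quiet[:256]])

trimmed = remove_silence(x)
emphasized = pre_emphasis(trimmed)
print(len(x), len(trimmed), len(emphasized))
```

Working per frame avoids discarding the individual low-amplitude samples that occur inside speech whenever the waveform passes through zero.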

  28. Feature extraction • Speaker 1 speech signal • hello

  29. Instantaneous frequency

  30. Instantaneous frequency

  31. Hilbert Spectrum

  32. Speaker 2 Speech Signal

  33. Instantaneous frequency

  34. Instantaneous frequency

  35. Hilbert Spectrum

  36. Speaker 1 Speech Signal

  37. Instantaneous frequency

  38. Instantaneous frequency

  39. Hilbert Spectrum

  40. Pulse code modulation • 1. Uniform quantization • Source: 王小川, Speech Signal Processing (語音訊號處理)

  41. Scalar quantization • 2. Non-uniform quantization • Source: 王小川, Speech Signal Processing (語音訊號處理)
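
The difference between the two scalar quantizers on slides 40 and 41 can be illustrated as follows. The uniform quantizer is a mid-rise design over [-1, 1]. The slides do not say which non-uniform scheme is meant, so mu-law companding with mu = 255 is used here purely as an assumed example of one.

```python
import numpy as np

def uniform_quantize(x, n_bits=8, x_max=1.0):
    """Mid-rise uniform quantizer: 2**n_bits equal steps over [-x_max, x_max]."""
    levels = 2 ** n_bits
    step = 2.0 * x_max / levels
    index = np.clip(np.floor(x / step), -levels // 2, levels // 2 - 1)
    return (index + 0.5) * step

def mu_law_quantize(x, n_bits=8, mu=255.0):
    """Non-uniform quantizer: mu-law compress, quantize uniformly, then expand."""
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    q = uniform_quantize(compressed, n_bits)
    return np.sign(q) * np.expm1(np.abs(q) * np.log1p(mu)) / mu

small = np.array([0.01])                   # a low-amplitude sample
print(abs(uniform_quantize(small)[0] - 0.01))
print(abs(mu_law_quantize(small)[0] - 0.01))
```

With the same 8 bits, the non-uniform quantizer spends more levels near zero, so the low-amplitude sample is reproduced with a smaller error than under uniform quantization, at the cost of coarser steps for large amplitudes.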

  42. Vector quantization • Goal: the smallest mean quantization error • Conditions: • (1) Nearest-neighbor selection rule • (2) Quantization value
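
The two conditions on slide 42 can be sketched as one assignment step and one update step. The sketch assumes squared Euclidean distance as the error measure and assumes that the quantization value of each region is the centroid of the vectors assigned to it; neither is stated on the slide. The function names and the toy data are illustrative.

```python
import numpy as np

def nearest_codeword(vectors, codebook):
    """Condition (1): index of the closest codeword for each input vector."""
    distances = np.sum((vectors[:, None, :] - codebook[None, :, :]) ** 2, axis=2)
    return np.argmin(distances, axis=1)

def update_centroids(vectors, labels, codebook):
    """Condition (2), as assumed here: each codeword becomes the mean of its region."""
    new_codebook = codebook.copy()
    for k in range(len(codebook)):
        members = vectors[labels == k]
        if len(members) > 0:               # an empty region keeps its old codeword
            new_codebook[k] = members.mean(axis=0)
    return new_codebook

def mean_distortion(vectors, codebook):
    """Mean squared distance between each vector and its nearest codeword."""
    labels = nearest_codeword(vectors, codebook)
    return np.mean(np.sum((vectors - codebook[labels]) ** 2, axis=1))

vectors = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
codebook = np.array([[1.0, 0.0], [9.0, 0.0]])

labels = nearest_codeword(vectors, codebook)
improved = update_centroids(vectors, labels, codebook)
print(mean_distortion(vectors, codebook), mean_distortion(vectors, improved))
```

Alternating the two steps never increases the mean distortion, which is why they are iterated when a codebook is trained.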

  43. Generating the vector quantization codebook • Centroid splitting algorithm • 1. Initialization • Compute one centroid from all training data • → initial codebook • 2. Splitting • After n stages of splitting there are 2^n centroids; each input vector is compared with all centroids and assigned to the one at the smallest distance • → once the region of each input vector is known, recompute the centroids; repeat until the codebook reaches the target size
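
The centroid splitting procedure on slide 43 can be sketched as below. Several details are assumptions not given on the slide: each centroid is split into two copies scaled by (1 + epsilon) and (1 - epsilon), the refinement runs for a fixed number of iterations instead of testing convergence, and the target size is a power of two. The multiplicative split does not separate a centroid that lies exactly at the origin.

```python
import numpy as np

def splitting_codebook(vectors, codebook_size, epsilon=0.01, n_iterations=20):
    """Grow a codebook 1 -> 2 -> 4 -> ... by splitting and refining centroids."""
    # Step 1: the centroid of all training data is the initial codebook.
    codebook = vectors.mean(axis=0, keepdims=True)
    while len(codebook) < codebook_size:
        # Step 2: split every centroid into two slightly perturbed copies.
        codebook = np.vstack([codebook * (1 + epsilon), codebook * (1 - epsilon)])
        for _ in range(n_iterations):
            # Assign each vector to its nearest centroid ...
            distances = np.sum((vectors[:, None, :] - codebook[None, :, :]) ** 2, axis=2)
            labels = np.argmin(distances, axis=1)
            # ... then recompute the centroid of every non-empty region.
            for k in range(len(codebook)):
                members = vectors[labels == k]
                if len(members) > 0:
                    codebook[k] = members.mean(axis=0)
    return codebook

# Toy training data: two well-separated clusters (illustrative only).
vectors = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.2],
                    [5.0, 5.0], [5.2, 4.8], [4.8, 5.2]])
codebook = splitting_codebook(vectors, codebook_size=2)
print(codebook[np.argsort(codebook[:, 0])])
```

For the toy data the two codewords settle on the two cluster centres. In a speaker recognition system one such codebook would be trained per speaker from that speaker's feature vectors.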

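  44. Conclusion • This presentation briefly introduced empirical mode decomposition, intrinsic mode functions, the Hilbert spectrum, the concept of speech recognition, and speech preprocessing. The feature extraction method currently used for speaker recognition is based on the Hilbert transform, which suits nonlinear, non-stationary speech signals. The extracted features show when a speaker is talking. In addition, distances to the codebooks of the speech database built with vector quantization are compared to determine which speaker is talking. This shows the importance of instantaneous frequency.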

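  45. References • 1. N. E. Huang, Z. Shen, S. R. Long, M. C. Wu, H. H. Shih, Q. Zheng, N.-C. Yen, C. C. Tung, and H. H. Liu, "The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis." • 2. 方建, "Research on Speech Recognition Technology Based on HHT," master's thesis, Institute of Communication and Information Systems, Harbin Engineering University, 2006. • 3. 許豔紅, "Application of the HHT Transform in Speaker Recognition," master's thesis, Institute of Electronic Information and Technology, Zhejiang University, 2005. • 4. 王小川, Speech Signal Processing (語音訊號處理), 2007.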

  46. Next steps • 1. Design the speaker recognition system • 2. Find a speaker database

  47. Thank you
