
Phone Boundary Detection using Sample-based Acoustic Parameters

Yih-Ru Wang, Institute of Communication Engineering, National Chiao Tung University, Hsinchu, Taiwan, ROC.

Presentation Transcript


  1. Phone Boundary Detection using Sample-based Acoustic Parameters. Yih-Ru Wang, Institute of Communication Engineering, National Chiao Tung University, Hsinchu, Taiwan, ROC. 2011/7/12, NGASR Workshop

  2. Outline • Motivation, Background • Why sample-based? • Sample-based Acoustic Parameters & Phone Boundary Detector • Experimental Results • Conclusions and Future Work

  3. Motivation • Find the synchronous “clock” for segment-based systems such as detection-based ASR and Computer Aided Language Learning (CALL) systems • [Block diagram: the speech signal feeds the Speech Attribution Detectors and the Phone Boundary Detector; the Phone Boundary Detector supplies the synchronous “clock” for the detection-based ASR or CALL system]

  4. Background • Tasks of phonetic segmentation: phone alignment (experts achieve an 87% inclusion rate within a 10 ms tolerance) and phone boundary detection • Phone alignment uses model-based methods: HMM, MBE-HMM (Minimum Boundary Error HMM), HMM + fine tuning using SVM, … • Phone boundary detection uses metric-based methods, i.e. a measure of speech signal change: norm of the delta MFCC feature vector (Rabiner, 2006); KL distance or BIC of the speech signal • Frame-based features, like MFCC, were used

  5. Why sample-based? • Transient vs. stationary • Accuracy and precision, especially for ‘short’ phones, e.g. plosives • Acoustic features with high frequency resolution, like MFCC, are used to ‘recognize’ phones in speech • To detect the pronunciation manner/position (acoustic) changes in the speech signal, increase the time resolution and decrease the frequency resolution of the features

  6. To find useful measures of speech signal change in a sample-based system • Sample-based acoustic parameters were proposed • Pros of the sample-based method: better accuracy and precision; proper detection of the boundaries of short phones • Cons of the sample-based method: complexity of the system? Higher false alarm rate?

  7. Sample-based Acoustic Parameters & Phone Boundary Detector • Sub-band signal envelope: six sub-bands used for landmark detection (Liu, 1996) • ROR (rate of rise) of the sub-band signal envelope: the delta term of a feature • Band-pass frequencies: 5.0–8.0 kHz, 3.5–5.0 kHz, 2.0–3.5 kHz, 1.5–2.0 kHz, 0.8–1.2 kHz, 0.0–0.4 kHz
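The envelope and its ROR on slide 7 can be illustrated with a minimal sketch. The slides do not say how the envelope or the delta term is computed, so everything here is an assumption: the envelope is taken as a moving average of the rectified signal, the ROR as a dB difference across a small symmetric lag, and the names `envelope` and `rate_of_rise` are hypothetical. Band-pass filtering is left out; the input is assumed to be one already band-limited sub-band signal.

```python
import math

def envelope(signal, win=5):
    """Smoothed magnitude envelope: full-wave rectify, then moving average.
    (Assumed method; the slides do not specify the envelope detector.)"""
    half = win // 2
    rect = [abs(s) for s in signal]
    out = []
    for n in range(len(rect)):
        lo, hi = max(0, n - half), min(len(rect), n + half + 1)
        out.append(sum(rect[lo:hi]) / (hi - lo))
    return out

def rate_of_rise(env, lag=2, floor=1e-8):
    """Per-sample ROR: dB difference of the envelope across +/- lag samples.
    Indices are clamped at the edges of the signal."""
    db = [20.0 * math.log10(max(e, floor)) for e in env]
    last = len(db) - 1
    return [db[min(last, n + lag)] - db[max(0, n - lag)] for n in range(len(db))]

# Toy sub-band signal: near-silence followed by an abrupt onset at sample 20.
sig = [0.01] * 20 + [1.0] * 20
ror = rate_of_rise(envelope(sig))
peak = max(range(len(ror)), key=lambda n: ror[n])  # sample with the largest rise
```

On this toy input the largest ROR falls within a couple of samples of the onset, which is the kind of time resolution a frame-based delta feature cannot offer.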

  8. [Figure: waveform, envelope, and sub-band signal envelopes (5.0–8.0 kHz down to 0.0–0.4 kHz) for TIMIT utterance FDRW0/sx293, “Please take this dirty table cloth to the cleaners for me”; segment labels: Stop | Glide | Vowel | Nasal | Vowel | Fricative | Fricative | Vowel | Nasal | Vowel | Silence]

  9. [Figure: ROR of the signal envelope and ROR of the sub-band signal envelopes for “Please take this dirty cloth…”; a span of about 20 ms is marked]

  10. The norm of the sub-band signal envelopes can be a useful measure of signal change • A sample-based spectral entropy can be defined from the normalized sub-band signal envelopes, indexed by sub-band i (the equation is not preserved in this transcript) • A sample-based spectral KL distance between the speech signals at two adjacent times [n, n+1] can be defined in the same way (equation not preserved)
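Since the equations on slide 10 did not survive, the following is only a sketch of one plausible reading, not the author's definition. It assumes the sub-band envelopes at a sample are normalized to sum to one, that the entropy is the usual Shannon entropy of those values, and that the KL distance is the symmetric form between samples n and n+1. The function names are hypothetical.

```python
import math

def normalize(envs, floor=1e-12):
    """Scale the sub-band envelope values at one sample so they sum to 1."""
    vals = [max(e, floor) for e in envs]
    total = sum(vals)
    return [v / total for v in vals]

def spectral_entropy(envs):
    """Shannon entropy of the normalized sub-band envelopes (assumed form)."""
    p = normalize(envs)
    return -sum(pi * math.log(pi) for pi in p)

def spectral_kl(envs_n, envs_n1):
    """Symmetric KL distance KL(p||q) + KL(q||p) between the normalized
    envelopes at samples n and n+1 (assumed form)."""
    p, q = normalize(envs_n), normalize(envs_n1)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

# Six sub-band envelope values at two adjacent samples (made-up numbers).
flat = [1.0] * 6                              # energy spread evenly
low_heavy = [0.1, 0.1, 0.2, 0.4, 2.0, 6.0]    # energy concentrated in low bands
h_flat = spectral_entropy(flat)
h_low = spectral_entropy(low_heavy)
d_same = spectral_kl(flat, flat)
d_change = spectral_kl(flat, low_heavy)
```

Under these assumptions an even spectrum gives the maximum entropy of log 6, and the KL distance is zero when the spectral shape does not change between the two samples.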

  11. An example of sample-based spectral entropy and its ROR • [Figure panels: sample-based spectral entropy; ROR of spectral entropy]

  12. An example of sample-based spectral KL distance • [Figure: sample-based spectral KL distance] • It can be used to find the signal change points more accurately and precisely.

  13. An MLP was used as the phone boundary detector • [Block diagram of the proposed training/test procedure]

  14. Candidate pre-selection: find all the speech samples, with index n, that satisfy the selection condition (the condition is not preserved in this transcript) • Pre-selection can be used to reduce the complexity and the false alarms (FA) of the sample-based system • After candidate pre-selection, an MLP was used as the boundary detector
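The pre-selection condition itself is missing from the transcript, so the sketch below swaps in a generic rule purely for illustration: keep sample n when a change measure (for example a spectral KL distance) is a local maximum above a threshold. The function `preselect_candidates`, the threshold, and the `min_gap` parameter are all hypothetical and should not be read as the condition used on the slide.

```python
def preselect_candidates(change, threshold, min_gap=1):
    """Return sample indices n where change[n] is a local maximum above
    threshold, keeping candidates at least min_gap samples apart.
    (Illustrative rule only; the slide's actual condition is unknown.)"""
    picked = []
    for n in range(1, len(change) - 1):
        is_peak = change[n] >= change[n - 1] and change[n] > change[n + 1]
        if is_peak and change[n] > threshold:
            if not picked or n - picked[-1] >= min_gap:
                picked.append(n)
    return picked

# Made-up change measure with two clear peaks and one weak one.
change = [0.0, 0.1, 0.9, 0.2, 0.05, 0.3, 0.1, 0.0, 1.2, 0.4]
candidates = preselect_candidates(change, threshold=0.5)
```

Only the surviving candidates would then be passed to the classifier, which is how pre-selection cuts both the computation and the number of chances for a false alarm.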

  15. The AP features used for the MLP detector

  16. Iterative training procedure

  17. 2nd stage: • Use a similarity measure of segmental acoustic signals • Use a GMM to model the pdf of a speech segment • The KL-1 distance of CCGMM (Wang, 2004): a common GMM is used to represent the pdfs of the two segments

  18. Similarity measure of two speech segments: • Discrete KL-1 distance of the CCGMM coefficients • Discrete KL-2 distance of the CCGMM coefficients

  19. The discrete KL-1 distance is the mean of the log-likelihood ratio of the two pdfs • It measures the similarity of the two pdfs • Find higher-order statistics of the log-likelihood pdfs (Wang, 2008): variance and skewness
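The statistics named on slide 19 can be sketched as follows. This is an assumption-laden illustration, not the formulation from the cited work: the two segments are represented by discrete weight vectors p and q over the same mixture components, the log-ratio log(p_k/q_k) is treated as a random variable distributed according to p, and its mean (the KL divergence from p to q), variance, and skewness are computed. The function name is hypothetical.

```python
import math

def log_ratio_moments(p, q, floor=1e-12):
    """Mean, variance and skewness of log(p_k / q_k) under the weights p.
    The mean equals the discrete KL divergence KL(p || q)."""
    r = [math.log(max(pk, floor) / max(qk, floor)) for pk, qk in zip(p, q)]
    mean = sum(pk * rk for pk, rk in zip(p, r))
    var = sum(pk * (rk - mean) ** 2 for pk, rk in zip(p, r))
    third = sum(pk * (rk - mean) ** 3 for pk, rk in zip(p, r))
    skew = third / var ** 1.5 if var > 0 else 0.0
    return mean, var, skew

# Made-up component weights for two segments sharing a common mixture.
seg_a = [0.5, 0.3, 0.2]
seg_b = [0.2, 0.3, 0.5]
kl_ab, var_ab, skew_ab = log_ratio_moments(seg_a, seg_b)
same = log_ratio_moments(seg_a, seg_a)
```

Identical segments give all three statistics equal to zero; the variance and skewness add information about how the mismatch is distributed across components, which the mean alone does not capture.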

  20. Using segmental similarity

  21. Experimental Results • Database: TIMIT • After candidate pre-selection, 1 out of every 116 samples was selected, with 0.9% missed detections (MD) due to pre-selection • Performance of the MLP boundary detector:

  22. Performance of the sample-based boundary detector

  23. An example of the proposed phone boundary detector

  24. Accuracy of the sample-based boundary detector

  25. Comparison with Dr. Rabiner’s work (2006) • Dr. Rabiner’s result: (22.8%59.2%)

  26. Error analysis: mean absolute error (MAE) of the detected boundaries • Overall: 7.6/12.4 ms (sample-based/HMM system) • * fewer than 100 samples

  27. Accuracy of the proposed method

  28. Error analysis (1st stage): missed detection rate (MDR) and false alarm rate (FAR)
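The slides report MDR, FAR, and MAE without giving their exact definitions, so the sketch below fixes one common convention as an assumption: each reference boundary is matched to at most one detected boundary within a tolerance, MDR is the fraction of unmatched reference boundaries, FAR is the fraction of unmatched detections, and MAE is the mean absolute time error of the matched pairs. The function `score_boundaries` is hypothetical and expects non-empty inputs.

```python
def score_boundaries(ref, det, tol):
    """Greedy one-to-one matching of boundary times within +/- tol.
    Returns (MDR, FAR, MAE) under the definitions assumed in the lead-in.
    Both ref and det must be non-empty."""
    ref, det = sorted(ref), sorted(det)
    used = [False] * len(det)
    errors = []
    for r in ref:
        best, best_err = None, tol
        for j, d in enumerate(det):
            # Take the closest still-unused detection inside the tolerance.
            if not used[j] and abs(d - r) <= best_err:
                best, best_err = j, abs(d - r)
        if best is not None:
            used[best] = True
            errors.append(best_err)
    mdr = 1.0 - len(errors) / len(ref)
    far = used.count(False) / len(det)
    mae = sum(errors) / len(errors) if errors else float("nan")
    return mdr, far, mae

# Made-up boundary times in ms, scored with a 20 ms tolerance.
reference = [100, 200, 300, 400]
detected = [104, 198, 350, 405, 500]
mdr, far, mae = score_boundaries(reference, detected, tol=20)
```

Here three of four reference boundaries are found, two of five detections are spurious, and the matched errors are 4, 2, and 5 ms. Other conventions (for example FAR normalized by the number of non-boundary candidates) would give different numbers from the same detections.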

  29. Conclusions & Future Work • Several sample-based acoustic parameters, which could properly model the speech signal change, were proposed • Using the sample-based APs in the phone boundary detector, better precision and accuracy were achieved • Future work: segment-based speech attribution detectors

  30. Segment-based attribution detector • Segment-based attribution recognizer • Operating point: 3% MDR, 20% FAR • Each contour is coded using Legendre polynomials

  31. Set the operating point to a low MD rate and a high FA rate: 80123 segments / 62465 phones • Feature extraction uses the Legendre coefficients of the AP contours
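Coding a contour with Legendre coefficients can be sketched as a projection onto the first few Legendre polynomials. The slides do not state the polynomial order or the fitting method, so this is an assumption: the segment is mapped onto [-1, 1], the integral is approximated with a midpoint rule, and the order defaults to 3. The function name `legendre_coeffs` is hypothetical.

```python
def legendre_coeffs(contour, order=3):
    """Approximate Legendre coefficients c_k = (2k+1)/2 * integral f(x) P_k(x) dx
    of a contour sampled uniformly over a segment, via the midpoint rule."""
    n = len(contour)
    coeffs = [0.0] * (order + 1)
    for i, f in enumerate(contour):
        x = -1.0 + (2 * i + 1) / n          # midpoint of the i-th cell in [-1, 1]
        p_prev, p_cur = 1.0, x              # P_0(x), P_1(x)
        for k in range(order + 1):
            if k == 0:
                pk = 1.0
            elif k == 1:
                pk = x
            else:
                # Bonnet recurrence: k P_k = (2k-1) x P_{k-1} - (k-1) P_{k-2}
                pk = ((2 * k - 1) * x * p_cur - (k - 1) * p_prev) / k
                p_prev, p_cur = p_cur, pk
            coeffs[k] += (2 * k + 1) * f * pk / n
    return coeffs

# Made-up contour: a straight line 2 + 3x sampled at 50 points over a segment.
xs = [-1.0 + (2 * i + 1) / 50 for i in range(50)]
line = [2.0 + 3.0 * x for x in xs]
c = legendre_coeffs(line)
```

For the straight-line contour the first two coefficients recover the offset and the slope and the higher ones stay near zero, so a segment of any duration is reduced to a fixed-length feature vector.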

  32. Preliminary results • Frame-based system using 9-frame features • Converted to accuracy over time: 81.2% • Only 6 band-pass envelopes were used • Phone alignment
