Phone Boundary Detection using Sample-based Acoustic Parameters
Yih-Ru Wang
Institute of Communication Engineering, National Chiao Tung University, Hsinchu, Taiwan, ROC
2011/7/12, NGASR Workshop

Outline
• Motivation, Background
• Why sample-based?
• Sample-based Acoustic Parameters & Phone Boundary Detector
• Experimental results
• Conclusions and future work

Motivation
• Find the synchronous "clock" for detection-based ASR and Computer-Aided Language Learning (CALL) systems
[Figure: block diagram with the labels "Segment-based system", "Speech signal", "Speech Attribution Detectors", "Phone Boundary Detector", "Synchronous clock for the system", and "Detection-based ASR, CALL system"]

Background
• Tasks of phonetic segmentation
  • Phone alignment: 87% inclusion rate at a 10 ms tolerance for human experts
  • Phone boundary detection
• Phone alignment: model-based methods
  • HMM, MBE-HMM (Minimum Boundary Error HMM), HMM + fine tuning using SVM, …
• Phone boundary detection: metric-based methods
  • A measure of speech signal change
  • Norm of the delta MFCC feature vector (Rabiner, 2006)
  • KL distance or BIC of the speech signal
• Frame-based features, such as MFCC, were used

Why sample-based?
• Transient vs. stationary
• Accuracy and precision
  • Especially for 'short' phones, e.g. plosives
• Acoustic features with high frequency resolution, such as MFCC, are used to 'recognize' phones in speech
• To detect changes in pronunciation manner/position (acoustics) in the speech signal
  • Increase the time resolution and decrease the frequency resolution of the features

• Goal: find useful measures of speech signal change in a sample-based system
  • Sample-based acoustic parameters were proposed
• Pros of the sample-based method
  • Better accuracy and precision
  • Proper detection of the boundaries of short phones
• Cons of the sample-based method
  • System complexity?
  • Higher false-alarm rate?

Sample-based Acoustic Parameters & Phone Boundary Detector
• Sub-band signal envelope
  • Six sub-bands used for landmark detection (Liu, 1996)
• ROR (rate of rise) of the sub-band signal envelope
  • The delta term of a feature
• Bandpass frequencies: 5.0 – 8.0 kHz, 3.5 – 5.0 kHz, 2.0 – 3.5 kHz, 1.5 – 2.0 kHz, 0.8 – 1.2 kHz, 0.0 – 0.4 kHz
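The slide only describes the ROR as "the delta term of a feature", so the following is a minimal sketch under that reading: it applies the usual regression-slope delta to a per-sample envelope. The function name `rate_of_rise`, the `half_window` parameter, and the edge clamping are assumptions made for illustration, not details taken from the talk.

```python
def rate_of_rise(envelope, half_window=2):
    """Regression-slope delta of a per-sample envelope (list of floats).

    half_window is a hypothetical parameter; the talk does not state
    the window length used for the delta term.
    """
    n = len(envelope)
    norm = 2 * sum(k * k for k in range(1, half_window + 1))
    ror = []
    for t in range(n):
        acc = 0.0
        for k in range(1, half_window + 1):
            ahead = envelope[min(t + k, n - 1)]   # clamp at the right edge
            behind = envelope[max(t - k, 0)]      # clamp at the left edge
            acc += k * (ahead - behind)
        ror.append(acc / norm)
    return ror


# A linear ramp with slope 0.5 per sample has a delta of 0.5 away from the edges.
ramp = [0.5 * t for t in range(10)]
ror = rate_of_rise(ramp)
```

In a six-band setup the same function would be applied to each sub-band envelope separately, giving six ROR contours per utterance.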

[Figure: waveform, signal envelope, and sub-band signal envelopes (5.0 – 8.0 kHz down to 0.0 – 0.4 kHz) for TIMIT utterance FDRW0/sx293, "Please take this dirty table cloth to the cleaners for me", with phone-class labels (stop, glide, vowel, nasal, fricative, silence)]

[Figure: ROR of the signal envelope and ROR of the sub-band signal envelopes for "Please take this dirty cloth…", with a span of ~20 ms marked]

• The norm of the sub-band signal envelopes can be a useful measure of signal change
• Sample-based spectral entropy can be defined as [equation not recovered], where [symbol not recovered] is the i-th normalized sub-band signal envelope
• The sample-based spectral KL distance between the speech signal at two adjacent times [n, n+1] can be defined as [equation not recovered]
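The defining equations are not legible in this transcript, so the sketch below falls back on the textbook forms: entropy as -Σ p_i log p_i over the normalized sub-band envelopes at one sample, and KL distance as Σ p_i log(p_i / q_i) between the normalized envelopes at samples n and n+1. Whether the talk used exactly these forms (for example a symmetric KL) is an assumption, and the names `normalize`, `spectral_entropy`, `spectral_kl`, and the floor `EPS` are hypothetical.

```python
import math

EPS = 1e-12  # floor that keeps log() finite; an assumption, not from the slides


def normalize(envelopes):
    """Scale the sub-band envelope values at one sample so they sum to 1."""
    floored = [max(e, EPS) for e in envelopes]
    total = sum(floored)
    return [e / total for e in floored]


def spectral_entropy(envelopes):
    """Entropy of the normalized sub-band envelopes at one sample."""
    p = normalize(envelopes)
    return -sum(pi * math.log(pi) for pi in p)


def spectral_kl(env_now, env_next):
    """Asymmetric KL distance between adjacent samples n and n+1."""
    p = normalize(env_now)
    q = normalize(env_next)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


flat = [1.0] * 6                          # energy spread evenly over six bands
peaky = [10.0, 0.1, 0.1, 0.1, 0.1, 0.1]   # energy concentrated in one band
```

With six bands the entropy is largest (log 6) for the flat case and drops as energy concentrates in one band, while the KL distance is zero when the band distribution does not change between adjacent samples.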

[Figure: an example of sample-based spectral entropy and its ROR]

[Figure: an example of the sample-based spectral KL distance]
• It can be used to find signal change points more accurately and precisely

• An MLP was used as the phone boundary detector
[Figure: block diagram of the proposed training/test procedure]

Candidate pre-selection
• Find all speech samples, with index n, that satisfy [condition not recovered]
• Pre-selection reduces the complexity and the false-alarm (FA) rate of the sample-based system
• After candidate pre-selection, an MLP was used as the boundary detector

[Figure: the AP features used for the MLP detector]

[Figure: iterative training procedure]

2nd stage
• Use a similarity measure of segmental acoustic signals
• Use a GMM to model the pdf of a speech segment
• The KL1 distance of CCGMM (Wang, 2004): a common GMM represents the pdfs of the two segments

Similarity measure of two speech segments
• Discrete KL-1 distance of the CCGMM coefficients
• Discrete KL-2 distance using the CCGMM coefficients

• The discrete KL-1 distance is the mean of the log-likelihood ratio of the two pdfs
  • It measures the similarity of the two pdfs
• Find higher-order statistics of the log-likelihood ratio (Wang, 2008)
  • Variance and skewness of the log-likelihood ratio
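As a rough illustration of the idea, suppose the two segments share the same Gaussian components and differ only in their mixture weights w1 and w2. The discrete KL distance is then the w1-weighted mean of log(w1_k / w2_k), and the variance and skewness follow from the same weighted moments. The exact definitions are not legible in this transcript, so reading "KL-2" as the symmetric sum of the two directed distances is an assumption, as are the function names and the `eps` floor. Estimating the weights themselves is not shown.

```python
import math


def log_ratio_stats(w1, w2, eps=1e-12):
    """Mean, variance, and skewness of log(w1_k / w2_k) under weights w1.

    w1 and w2 are mixture weights of two segments over a shared set of
    Gaussian components (each list sums to 1). The mean equals the
    discrete KL distance KL(w1 || w2).
    """
    ratios = [math.log(max(a, eps) / max(b, eps)) for a, b in zip(w1, w2)]
    mean = sum(a * r for a, r in zip(w1, ratios))
    var = sum(a * (r - mean) ** 2 for a, r in zip(w1, ratios))
    if var == 0.0:
        skew = 0.0  # identical weight vectors: no spread, no skew
    else:
        third = sum(a * (r - mean) ** 3 for a, r in zip(w1, ratios))
        skew = third / var ** 1.5
    return mean, var, skew


def symmetric_kl(w1, w2, eps=1e-12):
    """Sum of the two directed discrete KL distances (assumed 'KL-2')."""
    return log_ratio_stats(w1, w2, eps)[0] + log_ratio_stats(w2, w1, eps)[0]


same = log_ratio_stats([0.5, 0.3, 0.2], [0.5, 0.3, 0.2])
kl, var, skew = log_ratio_stats([0.5, 0.5], [0.9, 0.1])
```

Identical weight vectors give zero for all three statistics, so larger values indicate that the segments on either side of a candidate boundary are acoustically less similar.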

Use segmental similarity

Experimental Results
• Database: TIMIT
• After candidate pre-selection:
  • 1 out of every 116 samples was selected
  • 0.9% missed detections (MD) were due to candidate pre-selection
• Performance of the MLP boundary detector: [table not recovered]

[Figure: performance of the sample-based boundary detector]

[Figure: an example of the proposed phone boundary detector]

[Figure: accuracy of the sample-based boundary detector]

Comparison with Dr. Rabiner's work (2006)
• Dr. Rabiner's result: (22.8%59.2%)

Error analysis: MAE of detected boundaries
• Overall: 7.6 / 12.4 ms (sample-based / HMM system)
• * Number of samples less than 100

Accuracy of the proposed method

Error analysis (1 stage): MDR and FAR
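For readers who want to reproduce this kind of error analysis, here is a minimal sketch of how the MDR, FAR, and MAE could be computed from reference and detected boundary times. The greedy one-to-one matching, the 20 ms default tolerance, and the choice to divide false alarms by the number of detections are all assumptions; the talk does not spell out its scoring rules in the recovered text.

```python
def evaluate_boundaries(reference, detected, tolerance_ms=20.0):
    """Return (MDR, FAR, MAE in ms) for boundary times given in ms.

    Each reference boundary is matched to the closest unused detection
    within tolerance_ms. Unmatched references are missed detections and
    unmatched detections are false alarms.
    """
    detected = sorted(detected)
    used = [False] * len(detected)
    errors = []
    for ref in sorted(reference):
        best, best_err = None, None
        for j, det in enumerate(detected):
            err = abs(det - ref)
            if used[j] or err > tolerance_ms:
                continue
            if best is None or err < best_err:
                best, best_err = j, err
        if best is not None:
            used[best] = True
            errors.append(best_err)
    hits = len(errors)
    mdr = 1.0 - hits / len(reference) if reference else 0.0
    far = (len(detected) - hits) / len(detected) if detected else 0.0
    mae = sum(errors) / hits if hits else float("nan")
    return mdr, far, mae


# Toy example: two of three reference boundaries are found, two detections are spurious.
mdr, far, mae = evaluate_boundaries([100, 250, 400], [104, 260, 330, 520])
```

Lowering the detector threshold moves the operating point toward a lower MDR and a higher FAR, which is the trade-off the tables on these slides report.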

Conclusions & Future Work
• Several sample-based acoustic parameters, which can properly model speech signal change, were proposed
• Using the sample-based APs in the phone boundary detector gave better precision and accuracy
• Segment-based speech attribution detectors

Segment-based Attribution Detector
• Segment-based attribution recognizer
• Operating point: 3% MDR, 20% FAR
• Each contour is coded using Legendre polynomials

• The operating point is set to a low MD rate and a high FA rate: 80123 segments / 62465 phones
• Feature extraction uses the Legendre coefficients of the AP contours
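Coding a contour with Legendre polynomials turns a variable-length segment into a fixed-length feature vector. The sketch below maps the segment onto [-1, 1] and approximates the projection coefficients c_k = (2k+1)/2 ∫ f(x) P_k(x) dx with a midpoint sum. The polynomial order of 3 and the function names are assumptions; the talk does not state which order was used.

```python
def legendre_values(order, x):
    """P_0(x) .. P_order(x) via Bonnet's recursion."""
    values = [1.0, x]
    for k in range(1, order):
        values.append(((2 * k + 1) * x * values[k] - k * values[k - 1]) / (k + 1))
    return values[: order + 1]


def legendre_coefficients(contour, order=3):
    """Approximate Legendre coefficients of a contour (list of floats).

    The contour samples are placed at the midpoints of equal cells on
    [-1, 1], so segments of any length give order + 1 coefficients.
    """
    n = len(contour)
    coeffs = [0.0] * (order + 1)
    for j, y in enumerate(contour):
        x = -1.0 + (2 * j + 1) / n  # midpoint of the j-th cell
        for k, p in enumerate(legendre_values(order, x)):
            coeffs[k] += (2 * k + 1) / 2 * y * p * (2.0 / n)
    return coeffs


# A straight-line contour 2 + 3x should give coefficients close to [2, 3, 0, 0].
n_points = 400
line = [2.0 + 3.0 * (-1.0 + (2 * j + 1) / n_points) for j in range(n_points)]
coeffs = legendre_coefficients(line)
```

The low-order coefficients have a direct reading: c_0 is the mean level of the contour over the segment, c_1 its overall slope, and c_2 its curvature.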

Preliminary results
• Frame-based system using 9-frame features
• Converted to accuracy over time: 81.2%
• Only 6 band-pass envelopes were used
• Phone alignment