Pitch Estimation Speech Recognition Raymond Sastraputera
Outline • Introduction • Frame/Buffer • Algorithm • Silent Detector • Estimate Pitch • Correlation and Candidate • Optimal Candidate • Buffer Delay • Added Bias • Test and Result • Conclusion
Introduction • Estimates the pitch of a speech signal • Written in C++
Frame/Buffer • Frame segments are shifted with no overlap • [Figure: frame segments within the buffer]
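A minimal sketch of the framing step follows. It assumes the signal is already in memory as float samples and that a trailing partial frame is simply dropped; the slide does not say how the end of the buffer is handled. `split_into_frames` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Split a signal into consecutive, non-overlapping frames of frame_len samples
// (the shift equals the frame length). Trailing samples that do not fill a
// whole frame are dropped in this sketch.
std::vector<std::vector<float>> split_into_frames(const std::vector<float>& signal,
                                                  std::size_t frame_len) {
    assert(frame_len > 0);
    std::vector<std::vector<float>> frames;
    for (std::size_t start = 0; start + frame_len <= signal.size(); start += frame_len) {
        auto first = signal.begin() + static_cast<std::ptrdiff_t>(start);
        frames.emplace_back(first, first + static_cast<std::ptrdiff_t>(frame_len));
    }
    return frames;
}
```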
Silent Detector • Initial detection of silence • Measure: |max(x)| + |max(y)| + |max(z)| + |min(x)| + |min(y)| + |min(z)| • Threshold value (50 dB) • [Figure: vectors X, Y, Z]
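The silence test can be sketched as follows. The peak measure follows the slide, but the slide does not state how that sum is compared with a threshold given in dB, so this sketch assumes a conversion of 20·log10(measure). `peak_measure` and `is_silent` are hypothetical names, and x, y, z are assumed to be non-empty sample vectors.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Peak measure from the slide: |max(v)| + |min(v)| summed over x, y and z.
float peak_measure(const std::vector<float>& x, const std::vector<float>& y,
                   const std::vector<float>& z) {
    float total = 0.0f;
    for (const std::vector<float>* v : {&x, &y, &z}) {
        assert(!v->empty());
        auto mm = std::minmax_element(v->begin(), v->end());
        total += std::fabs(*mm.second) + std::fabs(*mm.first);
    }
    return total;
}

// Assumption: the measure is expressed in dB as 20*log10(measure) and
// compared with silent_threshold (50 dB); the slide gives only the threshold.
bool is_silent(const std::vector<float>& x, const std::vector<float>& y,
               const std::vector<float>& z, float silent_threshold_db = 50.0f) {
    float m = peak_measure(x, y, z);
    if (m <= 0.0f) return true;  // all-zero input: log10 undefined, treat as silence
    return 20.0f * std::log10(m) < silent_threshold_db;
}
```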
Estimate Pitch (Correlation) • Correlation of two vectors
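The slide does not give the correlation formula. A common choice for this kind of period search, and the one assumed here, is the normalised cross-correlation P(x, y) = Σ x[j]·y[j] / sqrt(Σ x[j]² · Σ y[j]²), which is 1 for vectors of identical shape regardless of scale and 0 for orthogonal vectors. `correlation` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed normalised correlation of two equal-length vectors.
// Returns 0 when either vector has zero energy.
double correlation(const std::vector<float>& x, const std::vector<float>& y) {
    assert(x.size() == y.size());
    double xy = 0.0, xx = 0.0, yy = 0.0;
    for (std::size_t j = 0; j < x.size(); ++j) {
        xy += static_cast<double>(x[j]) * y[j];
        xx += static_cast<double>(x[j]) * x[j];
        yy += static_cast<double>(y[j]) * y[j];
    }
    if (xx == 0.0 || yy == 0.0) return 0.0;
    return xy / std::sqrt(xx * yy);
}
```

Normalising by the energies keeps the score in [-1, 1], which is what lets a fixed threshold such as 0.88 be meaningful across loud and quiet frames.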
Estimate Pitch (Correlation) • Correlation P(x,y) • Calculate for different window sizes (nm) • The window size is the pitch period (in samples) • Window sizes whose correlation is above the threshold become candidates with score 1 • [Figure: vector x and vector y, each nm samples long, within X, Y, Z]
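A sketch of the score-1 search, assuming x is the nm samples just before a reference index t and y is the nm samples starting at t, with nm scanned over a range of periods. `Candidate`, `corr`, `score1_candidates` and the choice of t are assumptions for illustration, and `corr` is an assumed normalised correlation rather than a formula taken from the slides.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Candidate {
    std::size_t n;  // window size nm, i.e. the pitch period in samples
    double p;       // correlation value that made it a candidate
    int score;      // 1 after P(x,y)
};

// Normalised correlation of two length-n segments (assumed definition).
double corr(const float* a, const float* b, std::size_t n) {
    double ab = 0.0, aa = 0.0, bb = 0.0;
    for (std::size_t j = 0; j < n; ++j) {
        ab += static_cast<double>(a[j]) * b[j];
        aa += static_cast<double>(a[j]) * a[j];
        bb += static_cast<double>(b[j]) * b[j];
    }
    return (aa == 0.0 || bb == 0.0) ? 0.0 : ab / std::sqrt(aa * bb);
}

// First stage: for every window size n in [n_min, n_max], compare
// x = s[t-n, t) with y = s[t, t+n); keep n when P(x,y) > threshold.
std::vector<Candidate> score1_candidates(const std::vector<float>& s, std::size_t t,
                                         std::size_t n_min, std::size_t n_max,
                                         double threshold) {
    assert(n_min >= 1 && n_min <= n_max && n_max <= t && t + n_max <= s.size());
    std::vector<Candidate> out;
    for (std::size_t n = n_min; n <= n_max; ++n) {
        double p = corr(&s[t - n], &s[t], n);
        if (p > threshold) out.push_back({n, p, 1});
    }
    return out;
}
```

If min_pitch and max_pitch are read as Hz, the slide's parameters (20 kHz sampling, 50 to 400 Hz) would put nm in the range of roughly 50 to 400 samples.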
Estimate Pitch (Correlation) • Correlation P(y,z) • Calculate for different window sizes (nm) • Only for window sizes that have a score-1 candidate • A correlation value above the threshold promotes the candidate to score 2 • [Figure: vector y and vector z, each nm samples long, within X, Y, Z]
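Assuming the same layout of x, y and z around a reference index t, the second stage re-tests only the score-1 window sizes, comparing y = s[t, t+nm) with the following segment z = s[t+nm, t+2nm). The threshold is passed in because the slide does not say whether it equals the first-stage threshold. `Candidate`, `corr` and `promote_to_score2` are hypothetical names, and `corr` is an assumed normalised correlation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Candidate {
    std::size_t n;  // window size nm = pitch period in samples
    double p;       // P(x,y) value that made it a candidate
    int score;      // 1 after P(x,y), 2 after P(y,z)
};

// Normalised correlation of two length-n segments (assumed definition).
double corr(const float* a, const float* b, std::size_t n) {
    double ab = 0.0, aa = 0.0, bb = 0.0;
    for (std::size_t j = 0; j < n; ++j) {
        ab += static_cast<double>(a[j]) * b[j];
        aa += static_cast<double>(a[j]) * a[j];
        bb += static_cast<double>(b[j]) * b[j];
    }
    return (aa == 0.0 || bb == 0.0) ? 0.0 : ab / std::sqrt(aa * bb);
}

// Second stage: y = s[t, t+n), z = s[t+n, t+2n), evaluated only for
// window sizes that already hold a score-1 candidate.
void promote_to_score2(const std::vector<float>& s, std::size_t t,
                       std::vector<Candidate>& cands, double threshold) {
    for (Candidate& c : cands) {
        if (c.score != 1) continue;
        assert(c.n >= 1 && t + 2 * c.n <= s.size());
        if (corr(&s[t], &s[t + c.n], c.n) > threshold) c.score = 2;
    }
}
```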
Estimate Pitch (Correlation) • Correlation Q(n,m) • Calculate for different window sizes (nm) • nMAX is the largest nm among the candidates • Optimal candidate: the current candidate is chosen if its Qnm × 0.77 is higher than the preceding candidate's Qnm • [Figure: vector x and vector z within X, Y, Z, marked nMAX and nm]
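The optimal-candidate rule on this slide can be sketched as a scan over candidates that already have a Q value; how Q(n,m) itself is computed is not reproduced here. The sketch assumes candidates are visited in increasing nm and reads "preceding candidate" as the candidate chosen so far. `Scored` and `optimal_period` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Scored {
    std::size_t n;  // window size nm in samples
    double q;       // Q value for this window size
};

// Walk candidates in increasing n; a later candidate replaces the current
// choice only if its Q times the multiplier (Qnm_optimal_multiplier = 0.77)
// still exceeds the Q of the candidate chosen so far.
std::size_t optimal_period(const std::vector<Scored>& cands, double multiplier = 0.77) {
    assert(!cands.empty());
    Scored best = cands.front();
    for (std::size_t i = 1; i < cands.size(); ++i) {
        if (cands[i].q * multiplier > best.q) best = cands[i];
    }
    return best.n;
}
```

Because 0.77 < 1, under this reading a longer-period candidate must beat the current choice by roughly 30% (1 / 0.77 ≈ 1.3) before it is preferred, which favours the shorter period when scores are close.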
Estimate Pitch (Candidate)
• Candidate score 1: correlation P(x,y)
  • No candidate → silence
  • Single candidate → compute P(y,z)
    • Score stays at 1 → hold
    • Score 2 → estimated pitch
  • Multiple candidates → compute P(y,z)
• Candidate score 2: correlation P(y,z)
  • No candidate → compute Q(n,m) over the score-1 candidates
  • Single candidate → estimated pitch
  • Multiple candidates → compute Q(n,m)
• Optimal pitch: correlation Q(n,m)
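The decision tree above can be written as a small function of two counts: how many score-1 candidates P(x,y) produced, and how many of them P(y,z) promoted to score 2. `Decision` and `decide` are hypothetical names; the function only reports which branch applies and leaves the correlation work to the caller.

```cpp
#include <cassert>
#include <cstddef>

enum class Decision {
    Silence,         // no score-1 candidate
    Hold,            // single candidate whose score stays at 1
    EstimatedPitch,  // exactly one candidate reached score 2
    QOverScore1,     // several score-1 candidates, none reached score 2
    QOverScore2      // several candidates reached score 2
};

// count1: candidates found by P(x,y) (score 1)
// count2: how many of them were promoted by P(y,z) (score 2)
Decision decide(std::size_t count1, std::size_t count2) {
    assert(count2 <= count1);
    if (count1 == 0) return Decision::Silence;
    if (count1 == 1) return count2 == 1 ? Decision::EstimatedPitch : Decision::Hold;
    if (count2 == 0) return Decision::QOverScore1;
    if (count2 == 1) return Decision::EstimatedPitch;
    return Decision::QOverScore2;
}
```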
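Estimate Pitch (Optimal Candidate)
• The optimal candidate comes from one of:
  • A single candidate with score 2
  • Q(n,m) computed over
    • candidates with score 2, or
    • candidates with score 1 (when none reached score 2)
  • A frame on hold, once the next frame's estimated pitch is neither silence nor on hold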
Buffer Delay • Delays returning the estimated pitch value • Needed to limit how long a frame can stay on hold
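One way to realise the buffer delay, under stated assumptions, is a FIFO that releases frames in order but stops at a frame on hold, so later results wait behind it. The slides say a held frame is resolved once the next frame is neither silence nor on hold, but not how. This sketch assumes the held frame takes whichever of its candidates is closest to that frame's period, that a following silent frame leaves it unresolved, and that a hold still pending when the buffer exceeds its capacity (for example pitch_buffer_size = 20) is released with no pitch. `FrameResult` and `PitchDelayBuffer` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Result for one frame. period == 0 means silence or unresolved.
struct FrameResult {
    bool on_hold = false;
    std::size_t period = 0;               // pitch period in samples
    std::vector<std::size_t> candidates;  // kept while the frame is on hold
};

class PitchDelayBuffer {
public:
    explicit PitchDelayBuffer(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Push the newest frame; return the frames that are ready, oldest first.
    std::vector<FrameResult> push(FrameResult f) {
        if (!f.on_hold) {
            for (FrameResult& h : pending_) {
                if (h.on_hold) resolve(h, f.period);
            }
        }
        pending_.push_back(std::move(f));

        std::vector<FrameResult> ready;
        while (!pending_.empty() &&
               (!pending_.front().on_hold || pending_.size() > capacity_)) {
            FrameResult r = std::move(pending_.front());
            pending_.pop_front();
            if (r.on_hold) {  // held too long: release without a pitch
                r.on_hold = false;
                r.period = 0;
            }
            ready.push_back(std::move(r));
        }
        return ready;
    }

private:
    // Assumed rule: pick the held candidate closest to the next definite
    // period; if the next frame is silence (period 0), leave it unresolved.
    static void resolve(FrameResult& h, std::size_t next_period) {
        h.on_hold = false;
        h.period = 0;
        if (next_period == 0) return;
        bool found = false;
        std::size_t best_diff = 0;
        for (std::size_t c : h.candidates) {
            std::size_t diff = c > next_period ? c - next_period : next_period - c;
            if (!found || diff < best_diff) {
                h.period = c;
                best_diff = diff;
                found = true;
            }
        }
    }

    std::size_t capacity_;
    std::deque<FrameResult> pending_;
};
```

The capacity bound is what limits the duration of a hold: no frame waits behind a held frame for more than `capacity` pushes.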
Bias • Conditions: • The two previous frames are not silent • The previous frame is not on hold • The previous frame's pitch is between 5/8 and 7/4 of the pitch of the frame before it
Bias • P(x,y) is doubled
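Read together, the two Bias slides describe a gate on the current frame's P(x,y). The sketch below assumes pitch is stored in Hz per frame, treats the 5/8 and 7/4 bounds as inclusive, and doubles whatever P(x,y) value is passed in; the slides do not say whether every candidate or only some are boosted. `FrameInfo`, `bias_applies` and `biased_pxy` are hypothetical names.

```cpp
#include <cassert>

// Per-frame state needed by the bias test (hypothetical layout).
struct FrameInfo {
    bool silent;
    bool on_hold;
    double pitch;  // estimated pitch in Hz (assumed unit)
};

// prev1: the previous frame; prev2: the frame before it.
bool bias_applies(const FrameInfo& prev1, const FrameInfo& prev2,
                  double bias_min = 5.0 / 8.0, double bias_max = 7.0 / 4.0) {
    if (prev1.silent || prev2.silent) return false;  // two previous frames not silent
    if (prev1.on_hold) return false;                 // previous frame not on hold
    assert(prev2.pitch > 0.0);
    const double ratio = prev1.pitch / prev2.pitch;  // previous vs. preceding pitch
    return ratio >= bias_min && ratio <= bias_max;
}

// When the conditions hold, the slide doubles P(x,y).
double biased_pxy(double pxy, bool apply) { return apply ? 2.0 * pxy : pxy; }
```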
Test Parameters • correlation_threshold_silent(0.88) • Qnm_optimal_multiplier(0.77) • sample_rate(20000.0F) • max_pitch(400) • min_pitch(50) • pitch_buffer_size(20) • bias_max_frequency(7/4) • bias_min_frequency(5/8) • silent_threshold(50.0F)
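The test parameters map naturally onto a configuration struct. The field names and values are copied from the slide; the struct itself, the field types, and the reading of min_pitch and max_pitch as Hz (so that the window-size range in samples is sample_rate / pitch) are assumptions. `PitchParams`, `min_period` and `max_period` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Test parameters from the slide gathered in one place (struct and types assumed).
struct PitchParams {
    float correlation_threshold_silent = 0.88F;
    float Qnm_optimal_multiplier = 0.77F;
    float sample_rate = 20000.0F;  // samples per second
    int max_pitch = 400;           // assumed Hz
    int min_pitch = 50;            // assumed Hz
    std::size_t pitch_buffer_size = 20;
    float bias_max_frequency = 7.0F / 4.0F;
    float bias_min_frequency = 5.0F / 8.0F;
    float silent_threshold = 50.0F;  // dB, per the Silent Detector slide

    // Shortest and longest window size nm in samples, assuming pitch in Hz.
    std::size_t min_period() const {
        assert(max_pitch > 0);
        return static_cast<std::size_t>(sample_rate / static_cast<float>(max_pitch));
    }
    std::size_t max_period() const {
        assert(min_pitch > 0);
        return static_cast<std::size_t>(sample_rate / static_cast<float>(min_pitch));
    }
};
```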
Conclusion • Some improvements could increase the performance of the pitch estimator: • Reduce the search space • Add the first-order derivative of the pitch • Filter outliers/noise • The current algorithm might not be fast enough to run in real time
Reference • Bagshaw, Paul Christopher. Automatic Prosodic Analysis for Computer Aided Pronunciation Teaching. The University of Edinburgh (1994).