Pitch Estimation by Enhanced Super Resolution Determinator. By Sunya Santananchai and Chia-Ho Ling
Objective • Estimate the fundamental frequency of speech using the Enhanced Super Resolution determinator (eSRFD).
Introduction • The fundamental frequency of speech is defined as the rate of glottal pulses generated by the vibration of the vocal folds. • The pitch of speech is the perceptual correlate of fundamental frequency. • The fundamental frequency of speech is important in the prosodic features of stress and intonation.
Fundamental frequency determination algorithms (FDAs) • FDAs determine the fundamental frequency of a speech waveform, that is, they analyse the pitch automatically. • It is desirable to examine methods of fundamental frequency extraction that use radically different techniques.
Algorithms to determine the fundamental frequency • Cepstrum-based fundamental frequency determinator (CFD) (Noll, 1969) • Harmonic product spectrum (HPS) (Schroeder, 1968; Noll, 1970) • Feature-based fundamental frequency tracker (FBFT) (Phillips, 1985) • Parallel processing method (PP) (Gold & Rabiner, 1969) • Integrated fundamental frequency tracking algorithm (IFTA) (Secrest & Doddington, 1983) • Super resolution fundamental frequency determinator (SRFD) (Medan et al., 1991)
Enhanced Super Resolution determinator (eSRFD) • Based on the SRFD method, which uses a waveform similarity metric: the normalized cross-correlation coefficient. • Enhances the performance of the SRFD algorithm to reduce the occurrence of errors.
The eSRFD algorithm • The speech waveform is initially passed through a low-pass filter.
Each frame of filtered sample data is processed by the silence detector. • The signal is analysed frame by frame, in non-overlapping frames at 6.4 ms intervals. • Each frame contains a set of samples. • The samples are divided into three consecutive segments.
Analysis segments for the enhanced super resolution determinator
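The frame-by-frame analysis can be sketched as follows. This is only an illustrative Python sketch, not the MATLAB implementation described in these slides; the function name `split_into_frames` and the 20 kHz sample rate in the usage note are assumptions, and the division of each frame into three segments is not reproduced here.

```python
def split_into_frames(samples, sample_rate_hz, frame_interval_s=0.0064):
    """Split a sample sequence into non-overlapping frames.

    frame_interval_s defaults to the 6.4 ms interval given in the slides.
    Trailing samples that do not fill a whole frame are dropped
    (an assumption; the slides do not say how the tail is handled).
    """
    frame_len = int(round(sample_rate_hz * frame_interval_s))
    if frame_len <= 0:
        raise ValueError("sample rate too low for the requested frame interval")
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]
```

For example, at an assumed sample rate of 20 kHz each 6.4 ms frame holds 128 samples, so 300 samples yield two complete frames.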
Normalized cross-correlation for a ‘voiced’ frame: • If a frame of data is not classified as silence or unvoiced, candidate values for the fundamental period are sought using the first normalized cross-correlation coefficient, computed between segments of the frame.
Threshold definition for candidate values • Candidate values of the fundamental period are obtained by locating peaks in the normalized cross-correlation coefficient whose value exceeds a specified threshold.
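The candidate search can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the two segments compared are simply the n samples before and the n samples after a hypothetical `center` index, the search range and threshold are free parameters, and the names `normalized_cross_correlation` and `candidate_periods` are not from the slides. The exact eSRFD segment definitions are not reproduced.

```python
import math


def normalized_cross_correlation(x, y):
    """Return sum(x*y) / sqrt(sum(x^2) * sum(y^2)), or 0.0 for empty energy."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den > 0 else 0.0


def candidate_periods(signal, center, n_min, n_max, threshold):
    """Return period candidates (in samples) at local peaks above threshold.

    Assumption: for each trial period n, segment x is the n samples before
    `center` and segment y is the n samples starting at `center`.
    """
    coeffs = {}
    for n in range(n_min, n_max + 1):
        if center - n < 0 or center + n > len(signal):
            continue  # not enough samples for this trial period
        x = signal[center - n:center]
        y = signal[center:center + n]
        coeffs[n] = normalized_cross_correlation(x, y)

    candidates = []
    for n, c in sorted(coeffs.items()):
        left = coeffs.get(n - 1, float("-inf"))
        right = coeffs.get(n + 1, float("-inf"))
        # Keep n only if it is a local peak that exceeds the threshold.
        if c > threshold and c >= left and c >= right:
            candidates.append(n)
    return candidates
```

For a pure sinusoid with a period of 20 samples, the coefficient is close to 1 at n = 20 and lower at the neighbouring trial periods, so 20 is the only candidate in a search range of 10 to 30 samples.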
A second normalized cross-correlation coefficient • The frame is classified as ‘voiced’ if it has at least one candidate whose first coefficient exceeds the threshold. • For each such candidate, a second normalized cross-correlation coefficient is determined.
Candidate scores • Candidates whose second coefficient exceeds the threshold are given a score of 2; the others are given a score of 1. • If there are one or more candidates with a score of 2 in a frame, all candidates with a score of 1 are removed from the list of candidates. • If there is only one candidate (with a score of 1 or 2), that candidate is assumed to be the best estimate of the fundamental period of the frame.
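The scoring and pruning rules above can be sketched in Python as follows. The function name `score_candidates` and the dictionary layout (candidate period mapped to its second coefficient) are assumptions made for illustration.

```python
def score_candidates(second_coeffs, threshold):
    """Score candidates and drop score-1 candidates if any score-2 exists.

    second_coeffs: dict mapping candidate period -> second coefficient
                   (hypothetical layout, chosen for this sketch).
    Returns a dict mapping each surviving candidate to its score.
    """
    scores = {n: (2 if c > threshold else 1) for n, c in second_coeffs.items()}
    if any(s == 2 for s in scores.values()):
        # At least one strong candidate: discard all weak ones.
        scores = {n: s for n, s in scores.items() if s == 2}
    return scores
```

If the returned dictionary holds a single entry, that candidate is taken as the estimate for the frame; otherwise a further selection step is needed.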
Otherwise, an optimal fundamental period is sought from the set of remaining candidates by calculating a coefficient for each candidate. • The coefficient of the first candidate is initially assumed to be the optimal value. If the coefficient of a subsequent candidate multiplied by 0.77 exceeds the current optimal value, that subsequent candidate becomes the optimal one.
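A minimal Python sketch of this selection rule is given below. It assumes the candidates are visited in a fixed order (for example, ascending period) and that the per-candidate coefficient has already been computed; the name `select_optimal` and the argument layout are illustrative, not taken from the slides.

```python
def select_optimal(candidates, coefficient_of, factor=0.77):
    """Pick the optimal candidate using the 0.77 weighting rule.

    candidates:     non-empty list of candidate periods, in visiting order
                    (the ordering is an assumption of this sketch).
    coefficient_of: dict mapping each candidate to its coefficient.
    """
    best = candidates[0]
    best_value = coefficient_of[best]
    for n in candidates[1:]:
        # A later candidate must beat the current optimum even after
        # its coefficient is scaled down by the factor.
        if coefficient_of[n] * factor > best_value:
            best, best_value = n, coefficient_of[n]
    return best
```

With coefficients 0.70 and 0.95 the second candidate wins because 0.95 × 0.77 = 0.7315 > 0.70; with 0.80 and 0.95 the first candidate is kept because 0.7315 < 0.80.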
If there is only one candidate with a score of 1 and no candidate with a score of 2, the status of the frame is reconsidered depending on the state of the previous frame. • If the previous frame is ‘silent’, the current value is held and the decision depends on the next frame. • If the next frame is also ‘silent’, the current frame is considered ‘silent’. • Otherwise, the current frame is considered ‘voiced’ and the held value is taken as a good estimate for the current frame.
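The hold-and-decide rule can be sketched in Python as below. The function name `resolve_weak_candidate` is hypothetical, and the branch for a previous frame that is not ‘silent’ (accept the candidate immediately) is an assumption, since the slides only describe the case where the previous frame is ‘silent’.

```python
def resolve_weak_candidate(prev_is_silent, next_is_silent, held_period):
    """Decide the status of a frame with a single score-1 candidate.

    Returns a (status, period) tuple, where period is None for 'silent'.
    """
    if not prev_is_silent:
        # Assumption: with a non-silent previous frame the candidate is accepted.
        return ("voiced", held_period)
    if next_is_silent:
        # Isolated weak candidate between two silent frames: treat as silent.
        return ("silent", None)
    # The next frame is not silent, so the held value is accepted.
    return ("voiced", held_period)
```

Note that the decision for such a frame can only be made once the next frame has been classified, so an implementation needs one frame of look-ahead.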
Modification: apply biasing to the two normalized cross-correlation coefficients • Biasing is applied if all of the following conditions hold: • The two previous frames were classified as ‘voiced’. • The value of the previous frame is not being temporarily held. • The value estimated for the previous frame is less than 7/4 times that of its preceding voiced frame, and greater than 5/8 times it. • The biasing tends to increase the percentage of unvoiced regions of speech being incorrectly classified as ‘voiced’.
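The three biasing conditions can be checked with a short Python sketch. The function and parameter names are hypothetical, and the sketch only tests whether biasing applies; how the bias itself modifies the coefficients is not given in these slides and is not reproduced.

```python
def biasing_applies(prev_two_voiced, prev_value_held, prev_value, preceding_voiced_value):
    """Return True if all three biasing conditions hold.

    prev_two_voiced:        the two previous frames were classified as voiced.
    prev_value_held:        the previous frame's value is temporarily held.
    prev_value:             the value estimated for the previous frame.
    preceding_voiced_value: the value of the voiced frame preceding it.
    """
    lower = (5 / 8) * preceding_voiced_value
    upper = (7 / 4) * preceding_voiced_value
    return bool(prev_two_voiced and not prev_value_held and lower < prev_value < upper)
```

For a preceding value of 100, the previous frame's value must lie strictly between 62.5 and 175 for biasing to apply.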
Calculate the fundamental period: • The fundamental period for the frame is estimated from the optimal candidate.
Implementation • This report covers the eSRFD algorithm and its implementation as a program in MATLAB ver 7.2b.
Conclusion • The acoustic correlate of pitch is the fundamental frequency of speech. • Enhanced SRFD (eSRFD) improves the performance of the SRFD and can reduce the occurrence of errors in the extraction of fundamental frequency [1]. • Errors still occur in the results, and how often depends on the kind of speech waveform. • In addition, the results in this project contain more errors than Paul Bagshaw’s results [2] because of problems in turning the eSRFD algorithm from a design into a working program.
References • [1] Bagshaw, Paul Christopher (1994). Automatic prosodic analysis for computer aided pronunciation teaching. The University of Edinburgh. • [2] Bagshaw, Paul C., Hiller, S. M., Jack, Mervyn A. (1993). Enhanced pitch tracking and the processing of F0 contours for computer aided intonation teaching. In Proc. Eurospeech '93, Berlin, volume 2, pages 1003-1006. International Speech Communication Association.