Pitch Estimation by Enhanced Super Resolution determinator

Pitch Estimation by Enhanced Super Resolution determinator, by Sunya Santananchai and Chia-Ho Ling. Objective: estimate the value of the fundamental frequency of speech by using the Enhanced Super Resolution determinator (eSRFD).

Presentation Transcript


  1. Pitch Estimation by Enhanced Super Resolution determinator • By Sunya Santananchai and Chia-Ho Ling

  2. Objective • Estimate the value of the fundamental frequency of speech by using the Enhanced Super Resolution determinator (eSRFD).

  3. Introduction • The fundamental frequency of speech is defined as the rate of glottal pulses generated by the vibration of the vocal folds. • The pitch of speech is the perceptual correlate of fundamental frequency. • The fundamental frequency of speech is important in the prosodic features of stress and intonation.

  4. Fundamental frequency determination algorithms (FDAs) • Determine the fundamental frequency of a speech waveform, that is, analyse the pitch automatically. • The aim is to examine methods of fundamental frequency extraction that use radically different techniques.

  5. The algorithms to determine the fundamental frequency • Cepstrum-based determinator (CFD) (Noll, 1969) • Harmonic product spectrum (HPS) (Schroeder, 1968; Noll, 1970) • Feature-based tracker (FBFT) (Phillips, 1985) • Parallel processing method (PP) (Gold & Rabiner, 1969) • Integrated tracking algorithm (IFTA) (Secrest & Doddington, 1983) • Super resolution determinator (SRFD) (Medan et al., 1991)

  6. Enhanced Super Resolution determinator (eSRFD) • Based on the SRFD method, which uses a waveform similarity metric: the normalized cross-correlation coefficient. • Improves the performance of the SRFD algorithm by reducing the occurrence of errors.

  7. The eSRFD algorithm • The speech waveform is first passed through a low-pass filter.
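
The slide does not give the filter design or cutoff, so the following is only a minimal sketch of what low-pass filtering a sample sequence can look like. It uses a simple moving average, which is an assumption made for illustration and not the filter used in eSRFD; the function name and the `window` parameter are hypothetical.

```python
def moving_average_lowpass(samples, window=4):
    """Smooth a sample sequence with a causal moving average (a crude low-pass filter).

    Assumption: the window length is illustrative only; the slides do not
    state the filter type or cutoff used by eSRFD.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    out = []
    running = 0.0
    for i, s in enumerate(samples):
        running += s
        if i >= window:
            # Drop the sample that just left the averaging window.
            running -= samples[i - window]
        # During warm-up, average over the samples seen so far.
        out.append(running / min(i + 1, window))
    return out


constant = moving_average_lowpass([1.0] * 8, window=4)
alternating = moving_average_lowpass([1.0, -1.0] * 4, window=4)
```

A constant (zero-frequency) input passes through unchanged, while a signal alternating between +1 and -1 is averaged out once the window is full.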

  8. Each frame of filtered sample data is processed by the silence detector. • The signal is analysed frame by frame, at non-overlapping intervals of 6.4 ms. • Each frame contains a set of samples. • The set is divided into 3 consecutive segments.
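
As a rough illustration of the frame interval described above, the sketch below cuts a sample sequence into non-overlapping 6.4 ms frames. The 20 kHz sample rate in the usage line is an assumption chosen for the example (the slides do not state one), and the function name is hypothetical. It does not model the silence detector or the three analysis segments.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=6.4):
    """Split a sample sequence into non-overlapping frames of frame_ms milliseconds."""
    frame_len = int(round(sample_rate_hz * frame_ms / 1000.0))
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    # Drop the trailing partial frame so every frame has the same length.
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


# Assumed sample rate of 20 kHz: 6.4 ms corresponds to 128 samples per frame.
frames = split_into_frames(list(range(1000)), sample_rate_hz=20000)
```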

  9. Analysis segments for the enhanced super resolution determinator

  10. Normalized cross-correlation for a ‘voiced’ frame • If a frame of data is not classified as silent or unvoiced, candidate values for the fundamental period are sought using the first normalized cross-correlation coefficient.
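
The equation from the original slide is not preserved in this transcript. As a hedged sketch, the textbook normalized cross-correlation of two equal-length segments can be computed as below; the exact segment definitions and any sample decimation used by eSRFD are not shown here, and the function name is hypothetical.

```python
import math


def normalized_cross_correlation(x, y):
    """Textbook normalized cross-correlation of two equal-length segments.

    Returns a value in [-1, 1]; 1 means the segments have identical shape
    up to a positive scale factor.
    """
    if len(x) != len(y):
        raise ValueError("segments must have the same length")
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    # An all-zero segment has no defined similarity; report 0 by convention.
    return numerator / denominator if denominator > 0 else 0.0


same_shape = normalized_cross_correlation([1, 2, 3], [2, 4, 6])
orthogonal = normalized_cross_correlation([1, 0], [0, 1])
inverted = normalized_cross_correlation([1, -1], [-1, 1])
```

When the segment length matches the fundamental period, adjacent segments of a voiced waveform look alike and the coefficient approaches 1, which is why peaks in this coefficient mark candidate periods.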

  11. Threshold definition for candidate values • Candidate values of the fundamental period are obtained by locating peaks in the normalized cross-correlation coefficient whose value exceeds a specified threshold.
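
The peak-picking step can be sketched as follows, assuming the coefficient has already been evaluated over a range of trial periods. The threshold of 0.88 and the trial periods in the usage lines are illustrative assumptions, not values taken from the slides, and the function name is hypothetical.

```python
def find_candidates(coeffs, periods, threshold):
    """Return the trial periods at local maxima of coeffs that exceed threshold."""
    candidates = []
    for i in range(1, len(coeffs) - 1):
        is_peak = coeffs[i] > coeffs[i - 1] and coeffs[i] >= coeffs[i + 1]
        if is_peak and coeffs[i] > threshold:
            candidates.append(periods[i])
    return candidates


# Illustrative coefficient values for trial periods of 40..46 samples.
coeffs = [0.1, 0.5, 0.92, 0.6, 0.3, 0.95, 0.7]
periods = list(range(40, 47))
candidates = find_candidates(coeffs, periods, threshold=0.88)
```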

  12. A second normalized cross-correlation coefficient • A frame is classified as ‘voiced’ if its first coefficient exceeds the threshold. • For such a frame, the second normalized cross-correlation coefficient is then determined.

  13. Candidate scores • Candidates whose second coefficient exceeds the threshold are given a score of 2; the others are given a score of 1. • If there are 1 or more candidates with a score of 2 in a frame, then all candidates with a score of 1 are removed from the list of candidates. • If there is only one candidate (with score 1 or 2), that candidate is assumed to be the best estimate of the fundamental period of the frame.
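
The scoring and pruning rules above can be sketched as below. The second coefficient of each candidate is taken as an input because its equation is not preserved in this transcript; the threshold value and the function name are hypothetical.

```python
def score_and_prune(candidates, second_coeffs, threshold):
    """Score each candidate 2 or 1, then drop score-1 candidates if any score 2 exists."""
    scored = [
        (period, 2 if coeff > threshold else 1)
        for period, coeff in zip(candidates, second_coeffs)
    ]
    if any(score == 2 for _, score in scored):
        # Score-2 candidates take priority; discard the weaker ones.
        scored = [(period, score) for period, score in scored if score == 2]
    return scored


mixed = score_and_prune([40, 80, 120], [0.90, 0.50, 0.95], threshold=0.88)
weak_only = score_and_prune([40, 80], [0.50, 0.60], threshold=0.88)
```

If the returned list has a single entry, that candidate is taken as the estimate for the frame.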

  14. Otherwise, an optimal fundamental period is sought from the set of remaining candidates by calculating a coefficient for each candidate. • The coefficient of the first candidate is initially assumed to be the optimal value. • If a subsequent coefficient multiplied by 0.77 exceeds the current optimal value, that subsequent coefficient becomes the optimal value.
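
A minimal sketch of this selection rule follows. The per-candidate coefficients are passed in as plain numbers because the transcript does not preserve how they are computed; the function and parameter names are hypothetical.

```python
def select_optimal(candidates, coeffs, factor=0.77):
    """Pick the candidate period using the 0.77 weighting rule from the slide."""
    best_period, best_coeff = candidates[0], coeffs[0]
    for period, coeff in zip(candidates[1:], coeffs[1:]):
        # A later candidate must win even after being scaled down by `factor`.
        if coeff * factor > best_coeff:
            best_period, best_coeff = period, coeff
    return best_period


later_wins = select_optimal([40, 80], [0.5, 0.9])   # 0.9 * 0.77 = 0.693 > 0.5
first_kept = select_optimal([40, 80], [0.8, 0.9])   # 0.693 < 0.8
```

The weighting favours the first candidate in the list: a later candidate replaces it only when its coefficient is clearly larger.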

  15. If there is only 1 candidate with a score of 1 and no candidate with a score of 2, the status of the frame is reconsidered depending on the state of the previous frame. • If the previous frame is ‘silent’, the current value is held and the decision depends on the next frame. • If the next frame is also ‘silent’, the current frame is considered ‘silent’. • Otherwise, the current frame is considered ‘voiced’ and the held value is taken as a good estimate for the current frame.
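
The decision described above can be sketched as a small function over frame states. The slide does not say what happens when the previous frame is not ‘silent’; the sketch assumes the frame is then accepted as ‘voiced’, which is an assumption. The function name and the state strings are hypothetical.

```python
def resolve_single_weak_candidate(prev_state, next_state):
    """Decide the state of a frame that has one score-1 candidate and no score-2 candidate."""
    if prev_state != "silent":
        # Assumption: not covered by the slide; the frame is accepted as voiced.
        return "voiced"
    # Previous frame was silent: the value is held until the next frame is known.
    if next_state == "silent":
        return "silent"
    return "voiced"


isolated = resolve_single_weak_candidate("silent", "silent")
onset = resolve_single_weak_candidate("silent", "voiced")
```

An isolated weak candidate between two silent frames is discarded, while one at the onset of voicing keeps its held value.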

  16. Modification: applying biasing to the first and second normalized cross-correlation coefficients • Biasing is applied if all of the following conditions are met: • The two previous frames were classified as ‘voiced’. • The value of the previous frame is not being temporarily held. • The value of the previous frame is less than 7/4 times that of its preceding voiced frame, and greater than 5/8 times that value. • The biasing tends to increase the percentage of unvoiced regions of speech being incorrectly classified as ‘voiced’.
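
The three conditions for applying the biasing can be checked as in the sketch below. It only decides whether biasing applies; how the bias itself changes the coefficients is not described in this transcript and is not modelled. All names are hypothetical.

```python
def biasing_applies(prev_two_states, prev_value_held, prev_value, preceding_voiced_value):
    """Return True when all three biasing conditions from the slide hold."""
    both_voiced = all(state == "voiced" for state in prev_two_states)
    lower = (5 / 8) * preceding_voiced_value
    upper = (7 / 4) * preceding_voiced_value
    in_range = lower < prev_value < upper
    return both_voiced and not prev_value_held and in_range


# 5/8 * 110 = 68.75 and 7/4 * 110 = 192.5, so a previous value of 100 is in range.
stable = biasing_applies(("voiced", "voiced"), False, 100.0, 110.0)
jumped = biasing_applies(("voiced", "voiced"), False, 200.0, 100.0)
```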

  17. Calculating the fundamental period • The fundamental period for the frame is then estimated from the selected candidate.

  18. Implementation • This report covers the eSRFD algorithm and its implementation, programmed in MATLAB ver 7.2b following the eSRFD algorithm.

  19. The Result

  20. The Result

  21. Conclusion • The acoustic correlate of pitch is the fundamental frequency of speech. • Enhanced SRFD (eSRFD) is an improvement of the SRFD that can reduce the occurrence of errors in the extraction of fundamental frequency [1]. • Errors still occur in the result, depending on the kind of speech waveform. • In addition, the result in this project contains more errors than Paul Bagshaw’s result [2], because of problems in going from the design to a programmed implementation of the eSRFD algorithm.

  22. References • [1] Bagshaw, Paul Christopher (1994). Automatic prosodic analysis for computer aided pronunciation teaching. The University of Edinburgh. • [2] Bagshaw, Paul C., Hiller, S. M., Jack, Mervyn A. (1993). Enhanced pitch tracking and the processing of f0 contours for computer aided intonation teaching. In Proc. Eurospeech '93, Berlin, volume 2, pages 1003-1006. International Speech Communication Association.
