Pitch Estimation by Enhanced Super Resolution determinator

By Sunya Santananchai and Chia-Ho Ling

Objective • Estimate the fundamental frequency of speech using the Enhanced Super Resolution determinator (eSRFD).

Introduction • The fundamental frequency of speech is defined as the rate of glottal pulses generated by the vibration of the vocal folds. • The pitch of speech is the perceptual correlate of fundamental frequency. • The fundamental frequency of speech is important in the prosodic features of stress and intonation.

Fundamental frequency determination algorithms (FDAs) • Determine the fundamental frequency of a speech waveform, i.e. analyse its pitch automatically. • The aim is to examine methods of fundamental frequency extraction that use radically different techniques.

Algorithms for fundamental frequency determination • Cepstrum-based F0 determinator (CFD) (Noll, 1969) • Harmonic product spectrum (HPS) (Schroeder, 1968; Noll, 1970) • Feature-based F0 tracker (FBFT) (Phillips, 1985) • Parallel processing method (PP) (Gold & Rabiner, 1969) • Integrated F0 tracking algorithm (IFTA) (Secrest & Doddington, 1983) • Super resolution F0 determinator (SRFD) (Medan et al., 1991)

Enhanced Super Resolution determinator (eSRFD) • Based on the SRFD method, which uses a waveform similarity metric: the normalized cross-correlation coefficient. • Improves the performance of the SRFD algorithm by reducing the occurrence of errors.
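The waveform similarity metric can be illustrated with a short Python sketch (the project itself was implemented in MATLAB, which is not reproduced here). The function name `normalized_cross_correlation` and the convention of returning 0.0 for a zero-energy segment are assumptions for illustration, not part of the eSRFD description.

```python
import math


def normalized_cross_correlation(x, y):
    """Normalized cross-correlation coefficient of two equal-length segments.

    The result lies in [-1, 1]; it equals 1 when y is a positive multiple of x.
    Assumed convention: return 0.0 if either segment has zero energy.
    """
    if len(x) != len(y):
        raise ValueError("segments must have equal length")
    num = sum(a * b for a, b in zip(x, y))
    energy_x = sum(a * a for a in x)
    energy_y = sum(b * b for b in y)
    if energy_x == 0.0 or energy_y == 0.0:
        return 0.0
    return num / math.sqrt(energy_x * energy_y)
```

For example, `normalized_cross_correlation([1, 2, 3], [2, 4, 6])` returns 1.0, because the second segment is a scaled copy of the first.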

The eSRFD algorithm • The speech waveform is first passed through a low-pass filter.
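The slides do not state the filter type or cutoff frequency. As a hedged sketch only, the following Python code builds a windowed-sinc (Hamming) FIR low-pass filter; the 1 kHz cutoff in the usage example and the 31-tap length are assumptions, and `lowpass_fir` and `apply_fir` are hypothetical names.

```python
import math


def lowpass_fir(cutoff_hz, fs_hz, num_taps=31):
    """Windowed-sinc (Hamming) low-pass FIR taps, normalized to unit DC gain."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd for a symmetric, linear-phase filter")
    fc = cutoff_hz / fs_hz              # normalized cutoff in cycles per sample
    mid = (num_taps - 1) // 2
    taps = []
    for i in range(num_taps):
        k = i - mid
        # Ideal low-pass impulse response, 2*fc*sinc(2*fc*k)
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def apply_fir(signal, taps):
    """Causal FIR filtering y[n] = sum_k h[k] * x[n - k], with zero initial state."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out
```

Usage: `apply_fir(samples, lowpass_fir(1000, 20000))`. An FIR design is used here only because it is easy to write without external libraries; any suitable low-pass filter could take its place.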

Each frame of filtered sample data is processed by the silence detector. • The signal is analysed frame by frame, in non-overlapping 6.4 ms intervals. • Each frame contains a set of samples, • divided into 3 consecutive segments.
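The framing and segmentation step might look like the following Python sketch. The 6.4 ms non-overlapping frame interval comes from the slides; the silence criterion (mean absolute amplitude below a threshold) and the placement of the three segments are assumptions, and `frame_signal`, `is_silent` and `three_segments` are hypothetical names.

```python
def frame_signal(signal, fs_hz, frame_ms=6.4):
    """Split a signal into non-overlapping frames of frame_ms milliseconds.

    Any incomplete frame at the end is dropped.
    """
    frame_len = int(round(fs_hz * frame_ms / 1000.0))
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def is_silent(frame, threshold):
    """Assumed silence test: mean absolute amplitude below a threshold."""
    return sum(abs(s) for s in frame) / len(frame) < threshold


def three_segments(signal, start, n):
    """Return three consecutive length-n segments (x, y, z) starting at `start`."""
    x = signal[start:start + n]
    y = signal[start + n:start + 2 * n]
    z = signal[start + 2 * n:start + 3 * n]
    return x, y, z
```

At a 10 kHz sampling rate, a 6.4 ms frame holds 64 samples.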

Analysis segments for the enhanced super resolution determinator

Normalized cross-correlation for a ‘voiced’ frame • If a frame of data is not classified as silent or unvoiced, candidate values for the fundamental period are found using the first normalized cross-correlation coefficient.

Threshold for candidate values • Candidate values of the fundamental period are obtained by locating peaks in the first normalized cross-correlation coefficient whose value exceeds a specified threshold.
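A sketch of the candidate search in Python, assuming SRFD-style segments of length n placed either side of an analysis point `centre` and a lag range `n_min` to `n_max`. These names, the segment placement and the local-peak rule are assumptions rather than the exact eSRFD definitions.

```python
import math


def ncc(x, y):
    """Normalized cross-correlation coefficient (0.0 if either segment is all zeros)."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den > 0 else 0.0


def first_ncc_curve(signal, centre, n_min, n_max):
    """For each lag n, correlate the n samples before `centre` with the n samples from `centre` on."""
    return {n: ncc(signal[centre - n:centre], signal[centre:centre + n])
            for n in range(n_min, n_max + 1)}


def find_candidates(curve, threshold):
    """Return lags at interior local peaks of the curve whose value exceeds threshold."""
    lags = sorted(curve)
    candidates = []
    for i in range(1, len(lags) - 1):
        prev_v, v, next_v = curve[lags[i - 1]], curve[lags[i]], curve[lags[i + 1]]
        if v > threshold and v >= prev_v and v > next_v:
            candidates.append(lags[i])
    return candidates
```

For a sinusoid with a period of 20 samples, `find_candidates(first_ncc_curve(sig, 200, 10, 60), 0.8)` yields the lags 20 and 40: the true period and its double, which is why further scoring and selection are needed.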

A second normalized cross-correlation coefficient • A frame with at least one candidate whose coefficient exceeds the threshold is classified as ‘voiced’. • The second normalized cross-correlation coefficient is then determined for each candidate.

Candidate scores • Candidates whose second coefficient exceeds the threshold are given a score of 2; the others are given a score of 1. • If there are one or more candidates with a score of 2 in a frame, all candidates with a score of 1 are removed from the list of candidates. • If there is only one candidate (with score 1 or 2), that candidate is assumed to be the best estimate of the fundamental period for the frame.
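The scoring rule can be written directly from the bullets above. In this Python sketch the second coefficients are passed in as a dictionary keyed by candidate period; how they are computed, and the names `score_candidates` and `single_candidate_estimate`, are assumptions.

```python
def score_candidates(candidates, second_coeff, threshold):
    """Score 2 if the second coefficient exceeds threshold, else 1.

    If any candidate scores 2, all score-1 candidates are removed.
    Returns a list of (period, score) pairs in the original order.
    """
    scored = [(n, 2 if second_coeff[n] > threshold else 1) for n in candidates]
    if any(score == 2 for _, score in scored):
        scored = [(n, s) for n, s in scored if s == 2]
    return scored


def single_candidate_estimate(scored):
    """If exactly one candidate remains, take it as the period; otherwise None."""
    return scored[0][0] if len(scored) == 1 else None
```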

Otherwise, an optimal fundamental period is sought from the set of remaining candidates by calculating a coefficient for each candidate. • The first candidate's coefficient is assumed to be the optimal value. If 0.77 times a subsequent candidate's coefficient exceeds the current optimal value, that subsequent candidate becomes the optimal one.
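A minimal Python sketch of the selection rule, assuming the candidates are ordered by increasing period and their coefficients are supplied in a dictionary (the name `select_optimal` is hypothetical).

```python
def select_optimal(candidates, coeff, factor=0.77):
    """Walk candidates in order; a later one replaces the current optimum
    only if factor * its coefficient exceeds the current optimum's coefficient."""
    best = candidates[0]
    for n in candidates[1:]:
        if factor * coeff[n] > coeff[best]:
            best = n
    return best
```

Because the factor 0.77 is less than 1, a later candidate must be clearly better than the current optimum to replace it. One plausible reading is that this favours shorter periods and so guards against choosing a multiple of the true period.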

If there is only one candidate, with a score of 1, and no candidate with a score of 2, the frame's status is reconsidered depending on the state of the previous frame. • If the previous frame is ‘silent’, the current candidate value is held and the decision depends on the next frame. • If the next frame is also ‘silent’, the current frame is considered ‘silent’. • Otherwise, the current frame is considered ‘voiced’ and the held value is taken as a good estimate for the current frame.
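The hold logic for a frame with a single score-1 candidate can be sketched in Python as follows. The status strings and the function name are hypothetical, and the branch for a non-silent previous frame is an assumption, since the slides only describe the case where the previous frame is silent.

```python
def resolve_single_weak_candidate(prev_status, next_status, candidate_period):
    """Decide (status, period) for a frame whose only candidate has score 1."""
    if prev_status != "silent":
        # Assumption: the slides only cover the case after a silent frame,
        # so otherwise the single candidate is accepted as the estimate.
        return "voiced", candidate_period
    # Previous frame silent: the candidate value is held until the next
    # frame's status is known.
    if next_status == "silent":
        return "silent", None
    return "voiced", candidate_period
```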

Modification: applying biasing • Biasing is applied only if all of the following conditions hold: • The two previous frames were classified as ‘voiced’. • The value of the previous frame is not being temporarily held. • The fundamental period of the previous frame is less than 7/4 times, and greater than 5/8 times, the fundamental period of its preceding voiced frame. • The biasing tends to increase the percentage of unvoiced regions of speech that are incorrectly classified as ‘voiced’.
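The three biasing conditions can be checked with a small Python helper. The `Frame` record and `biasing_applies` are hypothetical names, and the biasing itself is not implemented here because the slides do not preserve which quantities are biased.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    status: str                # "silent", "unvoiced" or "voiced" (hypothetical labels)
    period: Optional[float]    # estimated fundamental period in samples, None if not voiced
    held: bool = False         # True if the period value is temporarily held


def biasing_applies(history):
    """Check the biasing conditions; `history` holds past frames, most recent last."""
    if len(history) < 2:
        return False
    before_prev, prev = history[-2], history[-1]
    # Condition 1: the two previous frames were classified as 'voiced'.
    if before_prev.status != "voiced" or prev.status != "voiced":
        return False
    # Condition 2: the previous frame's value is not being temporarily held.
    if prev.held:
        return False
    # Condition 3: 5/8 * T_before < T_prev < 7/4 * T_before, where the frame
    # before the previous one is the previous frame's preceding voiced frame.
    return 5 / 8 * before_prev.period < prev.period < 7 / 4 * before_prev.period
```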

Calculating the fundamental period • The fundamental period for the frame is then estimated from the selected candidate.
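Whatever refinement is used for the period, the fundamental frequency follows from it as F0 = fs / T0 when T0 is measured in samples and fs is the sampling rate. A Python sketch, using `None` to mark frames with no period (silent or unvoiced); the function names are hypothetical.

```python
def period_to_f0(period_samples, fs_hz):
    """Convert a fundamental period in samples to a fundamental frequency in Hz."""
    return fs_hz / period_samples


def f0_contour(periods, fs_hz):
    """Map per-frame periods to F0 values; None marks silent or unvoiced frames."""
    return [None if p is None else period_to_f0(p, fs_hz) for p in periods]
```

For example, at 16 kHz a period of 80 samples corresponds to an F0 of 200 Hz.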

Implementation • This report covers the eSRFD algorithm and its implementation as a MATLAB (version 7.2b) program that follows the algorithm.

The Result

Conclusion • The acoustic correlate of pitch is the fundamental frequency of speech. • The enhanced SRFD (eSRFD) improves on the performance of the SRFD by reducing the occurrence of errors in the extraction of fundamental frequency [1]. • Errors still occur in the results, depending on the kind of speech waveform. • In addition, the results of this project contain more errors than Paul Bagshaw's results [2], because of problems in designing and implementing the program that follows the eSRFD algorithm.

References • [1] Bagshaw, P. C. (1994). Automatic prosodic analysis for computer aided pronunciation teaching. The University of Edinburgh. • [2] Bagshaw, P. C., Hiller, S. M., and Jack, M. A. (1993). Enhanced pitch tracking and the processing of f0 contours for computer aided intonation teaching. In Proc. Eurospeech '93, Berlin, volume 2, pages 1003-1006. International Speech Communication Association.
