A Spectral-Temporal Method for Pitch Tracking Stephen A. Zahorian*, Princy Dikshit, Hongbing Hu* Department of Electrical and Computer Engineering Old Dominion University, Norfolk, VA 23529, USA. * Currently at Binghamton University 09/17/2006
Outline • Introduction • Algorithm • Algorithm overview • The use of nonlinear processing • Pitch tracking from the spectrum • Experimental evaluation • Conclusion
Introduction • Pitch (the fundamental frequency) applications • Automatic speech recognition (ASR), speech synthesis, speech articulation training aids, etc. • Pitch detection algorithms • “Robust and accurate fundamental frequency estimation based on dominant harmonic components,” Nakatani et al. => High accuracy for noisy speech reported using the harmonic dominance spectrum • “Yet another algorithm for pitch tracking (YAAPT),” Zahorian et al. => Hybrid spectral-temporal processing for pitch tracking
The Use of Nonlinear Processing • Restoration of the missing fundamental in telephone speech • A periodic sound is characterized by the spectrum of its harmonics • A signal with the fundamental missing can be approximated as the sum of its 1st and 2nd harmonics • After squaring and applying trigonometric identities, a term at the fundamental reappears
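The squaring step can be worked through for a simple case. The two-term signal below is an assumption made for illustration: it keeps only two adjacent harmonics, at $2\omega_0$ and $3\omega_0$, with hypothetical amplitudes $A_1, A_2$ and phases $\phi_1, \phi_2$. The slide's own equations were not preserved, so this is a sketch of the idea rather than the authors' exact formulation.

```latex
\begin{align*}
s(t)   &= A_1\cos(2\omega_0 t+\phi_1) + A_2\cos(3\omega_0 t+\phi_2) \\
s^2(t) &= \frac{A_1^2+A_2^2}{2}
          + \frac{A_1^2}{2}\cos(4\omega_0 t+2\phi_1)
          + \frac{A_2^2}{2}\cos(6\omega_0 t+2\phi_2) \\
       &\quad + A_1A_2\cos(5\omega_0 t+\phi_1+\phi_2)
          + A_1A_2\cos(\omega_0 t+\phi_2-\phi_1)
\end{align*}
```

The last term oscillates at $\omega_0$: the product of two adjacent harmonics produces a difference-frequency component at the fundamental, even though $s(t)$ itself contains no energy there.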
Illustration of Nonlinear Processing • The telephone speech signal (top panel) and squared telephone signal (bottom panel) for one frame
Illustration of Nonlinear Processing • The magnitude spectrum for the telephone signal (top panel) and the nonlinearly processed signal (bottom panel)
Spectral Effects from Nonlinear Processing • The missing fundamental in the telephone speech (top panel) is restored in the squared signal (bottom panel)
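The effect shown in these panels can be reproduced numerically. The sketch below is illustrative only: the sampling rate, the 100 Hz fundamental, and the choice of harmonics 3 through 10 as a stand-in for band-limited telephone speech are all assumptions, not values from the slides.

```python
import numpy as np

fs = 8000            # sampling rate in Hz (assumed)
f0 = 100.0           # fundamental frequency in Hz (assumed)
n = 800              # 0.1 s frame, so FFT bins are spaced 10 Hz apart
t = np.arange(n) / fs

# Stand-in for telephone speech: harmonics 3..10 only, so there is
# no energy at the fundamental (100 Hz) or at 200 Hz.
telephone = sum(np.cos(2 * np.pi * h * f0 * t) for h in range(3, 11))

# Nonlinear processing: square the signal.
squared = telephone ** 2


def magnitude_at(signal, freq_hz):
    """Return the FFT magnitude at the bin closest to freq_hz."""
    spectrum = np.abs(np.fft.rfft(signal))
    bin_index = int(round(freq_hz * len(signal) / fs))
    return spectrum[bin_index]


mag_telephone = magnitude_at(telephone, f0)   # essentially zero
mag_squared = magnitude_at(squared, f0)       # large: fundamental restored

print(f"|X(f0)| before squaring: {mag_telephone:.3e}")
print(f"|X(f0)| after squaring:  {mag_squared:.3e}")
```

The frame length is chosen so that every harmonic falls exactly on an FFT bin, which keeps the comparison free of leakage effects; real speech frames would need a window function.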
Pitch Tracking From the Spectrum • The pitch track from the spectrum refines the pitch candidates estimated from the temporal method • To achieve a noise-robust pitch track from the spectrum, an autocorrelation type of function is proposed
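Autocorrelation Type of Function • The function takes into account multiple harmonics • Equation, defined in terms of: the frequency index (k), the spectrum, the number of harmonics (3), and the window length (WL, 20 Hz) • [Figure: spectral windows of length WL at frequency indices k, 2k, 3k, and 4k, multiplied together]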
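One way to read this function is as a windowed sum of products of the spectrum at k, 2k, 3k, and 4k. The sketch below implements that reading; the function name, the window expressed in bins rather than Hz, and the synthetic test spectrum are all hypothetical, and the exact form of the slide's equation is not confirmed by the surviving text.

```python
import numpy as np


def harmonic_correlation(spectrum, num_harmonics=3, window_bins=5):
    """Windowed sum of products of the spectrum at k, 2k, ..., (num_harmonics + 1) * k.

    Assumed form, not taken verbatim from the slides: for each frequency
    index k, the spectrum values at the fundamental and its harmonics are
    multiplied, and the products are summed over a window of offsets
    centered on those positions.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    n = len(spectrum)
    half = window_bins // 2
    result = np.zeros(n)
    for k in range(1, n):
        highest = (num_harmonics + 1) * k + half
        if k - half < 0 or highest >= n:
            continue  # window would fall outside the spectrum
        total = 0.0
        for offset in range(-half, half + 1):
            product = 1.0
            for r in range(1, num_harmonics + 2):
                product *= spectrum[r * k + offset]
            total += product
        result[k] = total
    return result


# Synthetic magnitude spectrum: a low noise floor plus harmonic peaks at
# multiples of bin 20, with amplitudes decaying as 1/m.
spectrum = np.full(400, 0.01)
for m in range(1, 20):
    spectrum[20 * m] = 1.0 / m

shc = harmonic_correlation(spectrum)
best_bin = int(np.argmax(shc))
print("strongest candidate at bin", best_bin)
```

Because the terms are multiplied, a frequency index scores highly only when the fundamental and all of its harmonics are strong at once, which is one plausible reason such a function yields a single prominent peak.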
Peaks in Autocorrelation Type of Function • A very prominent peak is observed in the proposed function
Candidate Insertion to Reduce Pitch Doubling/Halving • If all candidates are larger than a threshold (typically 150 Hz), an additional candidate is inserted at half the frequency of the highest-ranking candidate: P2 (Hz) = P1 (Hz) / 2 • Similar logic is used to reduce pitch halving
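The doubling rule can be sketched in a few lines. The function name and the decision to append the new candidate at the end of the ranked list are assumptions; the slides do not say where the inserted candidate is ranked.

```python
def insert_half_frequency_candidate(candidates_hz, threshold_hz=150.0):
    """Return a copy of the candidate list, possibly with one extra candidate.

    candidates_hz is assumed to be ordered best-first. If every candidate
    lies above threshold_hz, a candidate at half the top-ranked frequency
    is appended (its position in the ranking is an assumption).
    """
    candidates = list(candidates_hz)
    if candidates and all(c > threshold_hz for c in candidates):
        candidates.append(candidates[0] / 2.0)
    return candidates


doubled = insert_half_frequency_candidate([240.0, 360.0])
unchanged = insert_half_frequency_candidate([240.0, 120.0])
print(doubled)
print(unchanged)
```

In the first call both candidates exceed 150 Hz, so a 120 Hz candidate is added; in the second call a candidate below the threshold already exists, so the list is returned as is.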
Experimental Evaluation • Database • Keele pitch extraction database • 5 male and 5 female speakers, about 35 seconds per speaker • High-quality speech and telephone speech • Additive Gaussian noise • Controls (reference pitch) • Control C1: supplied in the Keele database • Control C2: computed from the laryngograph signal with the proposed algorithm
Definition of Error Measures • Gross error • The percentage of frames such that the pitch estimate of the tracker deviates significantly (typically 20%) from the reference pitch (control) • Only evaluated in the voiced sections of the reference
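The gross error measure can be computed as below. This is a minimal sketch: the function name is hypothetical, and it assumes that unvoiced frames are marked with a reference pitch of 0, which is a common convention but is not stated in the slides.

```python
import numpy as np


def gross_error_percent(estimate_hz, reference_hz, tolerance=0.20):
    """Percentage of voiced reference frames whose estimate is off by more than tolerance.

    Assumption: a reference value of 0 marks an unvoiced frame, and such
    frames are excluded from the evaluation.
    """
    estimate = np.asarray(estimate_hz, dtype=float)
    reference = np.asarray(reference_hz, dtype=float)
    voiced = reference > 0
    if not np.any(voiced):
        return 0.0
    deviation = np.abs(estimate[voiced] - reference[voiced]) / reference[voiced]
    return 100.0 * float(np.mean(deviation > tolerance))


reference = [0.0, 100.0, 100.0, 200.0, 200.0]
estimate = [50.0, 105.0, 130.0, 200.0, 90.0]
error = gross_error_percent(estimate, reference)
print(f"gross error: {error:.1f}%")
```

Of the four voiced frames in this example, two deviate by more than 20% (130 Hz against 100 Hz, and 90 Hz against 200 Hz), so the gross error is 50%. The estimate in the unvoiced first frame is ignored.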
Experiment 1 Results • Individual performance of the proposed algorithm • YAAPT*: using control C1 for the spectral pitch track • NCCF: normalized cross correlation function, used as the temporal method in YAAPT
Experiment 2 Results • The results of the new method with various error thresholds
Comparisons • DASH, REPS, YIN: results as reported in “Robust and accurate fundamental frequency estimation ... ,” Nakatani et al. • *: telephone speech simulated with an SRAEN filter
Conclusion • A new pitch-tracking algorithm has been developed which combines multiple information sources to enable accurate, robust F0 tracking • An analysis of errors indicates better performance, for both high-quality and telephone speech, than previously reported for pitch tracking • Acknowledgements • This work was partially supported by JWFC 900