Guest Lecture for ECE492 Computer Audition
Single Pitch Detection
Na Yang, He Ba
nayang@rochester.edu
9/24/2013
Outline
• What is single pitch detection?
• Why is pitch detection important?
• Pitch detection for speech
  • Time domain: autocorrelation, YIN
  • Frequency domain: harmonic summation method
  • Cepstrum domain: cepstrum
• The YIN algorithm
• Pitch detection for music
• Pitch detection in noisy environments
Generation of voiced speech
Figure: lungs (power supply) → vibrating vocal cords (oscillator) → vocal tract (resonator) → voice
Definition of pitch
• Pitch
  • The relative highness or lowness of a tone as perceived by the ear
  • Depends on the number of vibrations per second produced by the vocal cords
• Fundamental frequency (F0) is an objective, physical measure used to estimate pitch, which is itself a perceptual quantity
• Quick facts
  • Human speech: from 40 Hz for low-pitched males to 600 Hz for children or high-pitched females
  • Piano: 27.5 Hz – 4,186 Hz
  • Human hearing range: 20 Hz – 20,000 Hz
A test for your ears! Audio examples: a piano excerpt and a speech utterance
Spectrogram of speech
Figure: spectrogram of a speech utterance with the pitch contour marked. Note that the pitch contour does not always have the strongest energy among all harmonics.
Example of pitch detection
Figure: frames 25 and 65 of the utterance
Example of pitch detection (MATLAB animation)
• The spectrum of frame 65: clear harmonics in this voiced frame. Pitch = 196 Hz
Example of pitch detection (MATLAB animation)
• The spectrum of frame 25: no clear harmonics in this unvoiced frame (fricatives, noise, etc.)
Why is pitch detection important?
• Applications
  • Speech recognition: words that differ only in tone (tonal languages)
  • Emotion recognition: prosodic variations
  • Automatic music transcription
  • Sound transformations: pitch-shifting in sound-editing programs (e.g., Talking Tom Cat)
• Challenges
  • Human speech is not perfectly periodic
  • Pitch detection in noisy environments (mobile applications)
Pitch detection for speech
• The speech utterance is split into frames, and pitch detection is performed on each frame
• How to choose the frame length?
  • Too long: cannot capture the pitch variation (low temporal resolution)
  • Too short: cannot obtain a reliable pitch estimate
Pitch detection for speech
• The frame length is usually chosen to span 2-3 pitch periods
• Example: with 3 pitch periods and a minimum speech pitch of 50 Hz, frame length = 3 × (1/50 Hz) = 0.06 s (60 ms)
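The rule of thumb above translates directly into a frame length in samples. The sketch below is a minimal illustration, assuming a 16 kHz sample rate and a 10 ms hop between frames (neither value comes from the lecture); the function names are hypothetical.

```python
def frame_length_samples(fs, min_pitch_hz=50.0, periods=3):
    """Frame length (samples) spanning `periods` periods of the lowest expected pitch."""
    return round(periods * fs / min_pitch_hz)


def split_into_frames(signal, frame_len, hop):
    """Split a sequence into overlapping frames of frame_len samples, advancing by
    hop samples; a trailing partial frame is dropped."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


fs = 16000                        # assumed sample rate (Hz)
n = frame_length_samples(fs)      # 3 * 16000 / 50 = 960 samples, i.e. 60 ms
hop = fs // 100                   # assumed 10 ms hop = 160 samples
frames = split_into_frames([0.0] * fs, n, hop)   # one second of silence as dummy input
```

Overlapping frames (hop shorter than the frame) keep the pitch track smooth even though each frame is several periods long.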
Pitch detection algorithms
• Algorithms in different domains exploit different properties of the speech signal
  • Frequency domain: harmonics lie at integer multiples of the pitch
  • Cepstrum domain: harmonics are regularly spaced, i.e., the spectrum is periodic
  • Time domain: the time-domain signal is periodic
Pitch detection in the frequency domain
• Routine
  • Break the signal into short frames
  • Multiply each frame by a window
  • Compute the short-time Fourier transform (STFT) of the frame
  • Look for spectral peaks at integer multiples of the pitch
• Examples
  • Harmonic summation method
  • Harmonic Product Spectrum (HPS) [Schroeder 1968, Noll 1969]
Figure: harmonics at integer multiples of the pitch [Cuadra 2001]
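A minimal Harmonic Product Spectrum sketch follows. To stay dependency-free it uses a naive O(N²) DFT with a Hann window instead of an FFT library; the 50 Hz lower search bound and the use of 4 downsampled spectrum copies are illustrative assumptions, and the helper names are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Hann-windowed magnitude spectrum, bins 0..N/2 (naive O(N^2) DFT, stdlib only)."""
    n = len(frame)
    x = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, s in enumerate(frame)]
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def hps_pitch(frame, fs, num_harmonics=4, fmin=50.0):
    """Harmonic Product Spectrum: multiply the spectrum by copies compressed by
    2, 3, ..., so that the harmonics of F0 all line up at the F0 bin."""
    mag = magnitude_spectrum(frame)
    n = len(frame)
    kmin = max(1, int(fmin * n / fs))        # skip DC and bins below fmin
    kmax = len(mag) // num_harmonics         # keep k * num_harmonics inside the spectrum
    best_k, best_val = kmin, -1.0
    for k in range(kmin, kmax):
        product = 1.0
        for r in range(1, num_harmonics + 1):
            product *= mag[k * r]            # energy at the r-th harmonic of bin k
        if product > best_val:
            best_k, best_val = k, product
    return best_k * fs / n                   # bin index -> Hz
```

For a 512-sample frame at 8 kHz containing harmonics of 250 Hz, the product peaks at the 250 Hz bin. Replacing the product with a sum gives a harmonic-sum variant; either way, the frequency resolution is limited to fs/N per bin.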
Pitch detection in the cepstrum domain
• Cepstrum [Childers 1977]
• Definition
  • The inverse Fourier transform of the logarithm of the amplitude spectrum of the signal
  • Figure: spectrum versus frequency (Hz) and cepstrum versus quefrency (s)
• Concept
  • The log amplitude spectrum contains regularly spaced harmonics, so it can be viewed as a periodic signal whose period (in Hz) is the pitch; the cepstrum therefore peaks at a quefrency equal to the pitch period
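The cepstrum idea can be sketched in a few lines: take the log magnitude spectrum, inverse-transform it, and pick the largest peak among quefrencies that correspond to plausible pitch periods. This is a minimal illustration, not a reference implementation: it uses a naive stdlib DFT, a Hann window, a small constant `eps` to avoid log(0), and an assumed 50-400 Hz search range.

```python
import cmath
import math


def cepstrum_pitch(frame, fs, fmin=50.0, fmax=400.0, eps=1e-8):
    """Real-cepstrum pitch estimate (naive O(N^2) transforms, stdlib only)."""
    n = len(frame)
    x = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, s in enumerate(frame)]
    # Log magnitude spectrum over all n bins; eps keeps empty bins finite
    log_mag = [math.log(abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                                for t in range(n))) + eps)
               for k in range(n)]
    q_min = math.ceil(fs / fmax)             # shortest period considered (samples)
    q_max = min(int(fs / fmin), n // 2)      # longest period considered (samples)
    best_q, best_val = q_min, -math.inf
    for q in range(q_min, q_max + 1):
        # Inverse DFT at quefrency q; the log spectrum of a real signal is real
        # and symmetric, so only the cosine (real) part is needed
        c = sum(log_mag[k] * math.cos(2 * math.pi * k * q / n) for k in range(n)) / n
        if c > best_val:
            best_q, best_val = q, c
    return fs / best_q                       # quefrency (samples) -> pitch (Hz)
```

With 15 harmonics of 250 Hz at 8 kHz, the harmonics are 16 bins apart in a 512-point spectrum, so the cepstral peak sits at quefrency 512/16 = 32 samples, i.e. 8000/32 = 250 Hz.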
Pitch detection in the cepstrum domain
• Cepstrum [Childers 1977]
• Voiced/unvoiced classification? Based on whether the ratio of the amplitudes of the two highest cepstral peaks is smaller than a threshold [Rabiner 1976]
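Pitch detection in the time domain
• Autocorrelation function (ACF)
  • Basis: the time-domain signal is periodic
  • A periodic signal correlates strongly with itself when offset by the fundamental period
  • The autocorrelation shows peaks at multiples of the pitch period
• Problem
  • Sensitive to amplitude changes: when the amplitude increases across the analysis window, the peak at a multiple of the period can exceed the peak at the true period, so a higher-order peak is chosen (octave error)
  • The computation does not use only the current frame: the lagged samples x(j + τ) extend beyond it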
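A minimal sketch of the ACF method as described above: compute r(τ) over a fixed window and take the lag with the largest value. The 320-sample window and the 50-400 Hz search range are assumptions for illustration, and `acf_pitch` is a hypothetical name.

```python
import math


def acf_pitch(x, fs, window=320, fmin=50.0, fmax=400.0):
    """Autocorrelation pitch estimate.

    r(tau) = sum_{j < window} x[j] * x[j + tau]; the lag with the largest r(tau)
    in the search range is taken as the pitch period. Because x[j + tau] reaches
    past the window, x must contain at least window + fs/fmin samples.
    """
    tau_min = math.ceil(fs / fmax)       # shortest period considered (samples)
    tau_max = int(fs / fmin)             # longest period considered (samples)
    if len(x) < window + tau_max:
        raise ValueError("x is too short for this window and lag range")
    best_tau, best_r = tau_min, -math.inf
    for tau in range(tau_min, tau_max + 1):
        r = sum(x[j] * x[j + tau] for j in range(window))
        if r > best_r:
            best_tau, best_r = tau, r
    return fs / best_tau
```

On a synthetic 200 Hz signal at 8 kHz (period 40 samples) whose amplitude decays, the largest r(τ) falls at τ = 40. If the amplitude instead grows across the frame, r(τ) at 2, 3, or 4 periods exceeds r(40) and the estimate drops to a subharmonic, which is the octave error described above.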
The YIN pitch detection algorithm
• Step 1: Autocorrelation
• Step 2: Difference function
  • Immune to the amplitude-change problem of the ACF
  • Example: an increase in signal amplitude
  • Problem
    • A strong resonance at the first formant (F1) might produce a series of secondary dips, one of which might be deeper than the dip at the period, so a lower-order dip is chosen
• Dips: ‘yin’, as opposed to ‘yang’
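Steps 1 and 2 above can be written out in the notation of the YIN paper (de Cheveigné and Kawahara, 2002, in the references), with window length W and samples x_j. The last line shows, for a periodic signal p_j with period T scaled by an amplitude envelope a_j, why an amplitude change does not favor a higher-order dip: if a_j changes monotonically, |a_j − a_{j+kT}| grows with k, so the dip at the true period T stays the deepest of its multiples.

```latex
\begin{aligned}
r_t(\tau) &= \sum_{j=t+1}^{t+W} x_j\, x_{j+\tau} \\
d_t(\tau) &= \sum_{j=t+1}^{t+W} \left(x_j - x_{j+\tau}\right)^2
           = r_t(0) + r_{t+\tau}(0) - 2\, r_t(\tau) \\
d_t(kT)   &= \sum_{j=t+1}^{t+W} p_j^2 \left(a_j - a_{j+kT}\right)^2
           \quad \text{if } x_j = a_j\, p_j \text{ and } p_{j+T} = p_j
\end{aligned}
```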
The YIN pitch detection algorithm
• Step 3: Cumulative mean normalized difference function
  • Divide each value of the old (difference) function by its average over shorter-lag values
• Problem
  • May still choose higher-order dips
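Step 3 follows the YIN paper's definition d′(0) = 1 and d′(τ) = d(τ) / ((1/τ) Σ_{j=1..τ} d(j)) for τ > 0. The minimal sketch below computes the step 2 difference function directly and then normalizes it; the function names are hypothetical and the direct O(W·τ) computation is chosen for clarity, not speed.

```python
def difference_function(x, window, tau_max):
    """YIN step 2: d(tau) = sum_{j < window} (x[j] - x[j + tau])**2, tau = 0..tau_max."""
    return [sum((x[j] - x[j + tau]) ** 2 for j in range(window))
            for tau in range(tau_max + 1)]


def cumulative_mean_normalized_difference(d):
    """YIN step 3: d'(0) = 1, and d'(tau) = d(tau) / mean(d(1..tau)) for tau > 0."""
    d_prime = [1.0]
    running_sum = 0.0
    for tau in range(1, len(d)):
        running_sum += d[tau]
        # Guard against an all-zero prefix (e.g. digital silence)
        d_prime.append(d[tau] * tau / running_sum if running_sum > 0 else 1.0)
    return d_prime
```

Unlike d(τ), which is always 0 at τ = 0, d′(τ) starts at 1, so the trivial zero-lag dip can no longer be selected. For a sawtooth with a 40-sample period, d′(40) is exactly 0.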
The YIN pitch detection algorithm
• Step 4: Absolute threshold
  • Prevents choosing higher-order dips
  • How?
    • Select the dips deeper than a threshold
    • Among them, choose the dip with the smallest lag τ
  • The absolute threshold used in YIN is 0.1
Figure: d′(τ) with the threshold at 0.1; the selected dip is the one with the smallest τ
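Step 4 can be sketched as a scan over d′(τ): take the first lag that goes below the threshold, then follow that dip down to its local minimum. The fallback to the global minimum when no dip crosses the threshold follows the YIN paper; the function name and the `tau_min` lower bound are illustrative assumptions.

```python
def absolute_threshold(d_prime, threshold=0.1, tau_min=2):
    """YIN step 4: smallest lag whose dip in d'(tau) goes below `threshold`,
    followed down to the bottom of that dip; if no dip goes below the
    threshold, fall back to the lag of the global minimum."""
    last = len(d_prime) - 1
    for tau in range(tau_min, last):
        if d_prime[tau] < threshold:
            # Walk down to the bottom of this dip
            while tau + 1 < last and d_prime[tau + 1] < d_prime[tau]:
                tau += 1
            return tau
    return min(range(tau_min, last), key=d_prime.__getitem__)
```

For example, with d′ = [1.0, 1.0, 0.8, 0.3, 0.08, 0.05, 0.2, 0.9, 0.6, 0.02, 0.4, 1.0], the function returns τ = 5 even though the dip at τ = 9 is deeper: the smaller lag wins, which is how the threshold avoids higher-order dips.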
The YIN pitch detection algorithm
• Step 5: Parabolic interpolation
  • Each local minimum of d′(τ) and its immediate neighbors are fit by a parabola; the ordinate of the interpolated minimum is used in the dip-selection process, and its abscissa gives the refined period estimate
• Step 6: Best local estimate
  • Find the best pitch estimate (the one with the smallest d′) in the vicinity of the analysis point
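Step 5 reduces to fitting a parabola through three equally spaced points. For samples y(−1), y(0), y(1), the vertex lies at offset (y(−1) − y(1)) / (2(y(−1) − 2y(0) + y(1))), and the parabola's value there is y(0) − (y(−1) − y(1))·offset/4. A minimal sketch (the function name is hypothetical):

```python
def parabolic_min(y_prev, y_mid, y_next):
    """Fit a parabola through (-1, y_prev), (0, y_mid), (1, y_next).

    Returns (offset, value): the abscissa of the vertex relative to the middle
    sample (add it to the integer lag) and the interpolated minimum (ordinate).
    """
    denom = y_prev - 2 * y_mid + y_next
    if denom == 0:                   # flat or degenerate: keep the sample itself
        return 0.0, y_mid
    offset = (y_prev - y_next) / (2 * denom)
    return offset, y_mid - (y_prev - y_next) * offset / 4
```

Applied at a chosen lag τ, the refined period is τ + offset, which gives sub-sample period resolution.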
The YIN pitch detection algorithm
• Parameter sensitivity
  • Integration window length: 1/(40 Hz) = 25 ms
  • Threshold in Step 4: 0.1
  • Cutoff frequency of the initial low-pass filtering of the signal: 1 kHz
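Putting steps 2 to 5 together with the parameter values above (25 ms integration window, threshold 0.1) gives the minimal sketch below. It is an illustration, not the reference YIN implementation: it omits the 1 kHz low-pass prefilter and step 6, and it assumes the lowest pitch of interest is 40 Hz, so the largest lag searched is fs/40. `yin_pitch` is a hypothetical name.

```python
def yin_pitch(x, fs, window_s=0.025, fmin=40.0, threshold=0.1):
    """Minimal YIN sketch (steps 2-5). Returns the pitch in Hz, or None if x is
    shorter than window + fs/fmin samples."""
    window = int(round(window_s * fs))
    tau_max = int(fs / fmin)
    if len(x) < window + tau_max:
        return None
    # Step 2: difference function d(tau)
    d = [sum((x[j] - x[j + tau]) ** 2 for j in range(window)) for tau in range(tau_max + 1)]
    # Step 3: cumulative mean normalized difference d'(tau)
    dp, running = [1.0], 0.0
    for tau in range(1, tau_max + 1):
        running += d[tau]
        dp.append(d[tau] * tau / running if running > 0 else 1.0)
    # Step 4: smallest lag below the absolute threshold, else the global minimum
    tau = next((t for t in range(2, tau_max) if dp[t] < threshold), None)
    if tau is None:
        tau = min(range(2, tau_max), key=dp.__getitem__)
    else:
        while tau + 1 < tau_max and dp[tau + 1] < dp[tau]:
            tau += 1                 # walk down to the bottom of this dip
    # Step 5: parabolic interpolation around the chosen lag
    a, b, c = dp[tau - 1], dp[tau], dp[tau + 1]
    denom = a - 2 * b + c
    shift = (a - c) / (2 * denom) if denom != 0 else 0.0
    return fs / (tau + shift)
```

For example, a sawtooth with a 40-sample period sampled at 8 kHz (true pitch 200 Hz), `yin_pitch([float(t % 40) for t in range(400)], 8000)`, yields an estimate close to 200 Hz.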
Evaluation of a pitch detection algorithm
• Choose speech samples
• Ground truth
  • Manual inspection of the spectrum of each frame
  • Simultaneously recorded laryngograph signals
  • Pitch detected by other popular pitch detection algorithms
• Error measurement metrics
  • Gross Pitch Error (GPE) rate: for speech, a detected pitch that deviates from the ground truth value by more than 10% (or 20%) is counted as an error
  • Fine Pitch Error (FPE): mean squared error
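The GPE rate is simple to compute once detected and ground-truth pitch values are aligned frame by frame. A minimal sketch; skipping frames whose ground truth is 0 (e.g. unvoiced frames) is an assumption of this example, not something stated in the lecture.

```python
def gross_pitch_error_rate(detected, ground_truth, tolerance=0.10):
    """Fraction of frames whose detected pitch deviates from the ground truth by
    more than `tolerance` (0.10 for the 10% criterion, 0.20 for 20%).
    Frames with ground truth <= 0 (assumed to mark unvoiced frames) are skipped."""
    pairs = [(f, g) for f, g in zip(detected, ground_truth) if g > 0]
    if not pairs:
        return 0.0
    errors = sum(1 for f, g in pairs if abs(f - g) > tolerance * g)
    return errors / len(pairs)
```

For detected values [200, 219, 100, 310] Hz against ground truth [200, 200, 200, 300] Hz, only the 100 Hz frame deviates by more than 10%, so the GPE rate is 0.25.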
Pitch detection for music
• Auditory attributes of musical tones: pitch, duration, loudness, and timbre
• Applications
  • Music notation programs that automatically transcribe real performances into scores
  • Query-by-humming music retrieval
• Challenges
  • Pitch generated by tonal musical instruments spans a large range, normally 50-4,000 Hz
  • Overlapping harmonics of musical tones
  • Diverse timbres of different instruments
  • Noise introduced by recording devices or noisy environments
Pitch detection for music
• The Mel scale
  • A perceptual scale of pitches judged by listeners to be equal in distance from one another
• The frequency of note n (piano key numbering, A4 = key 49 = 440 Hz) is f(n) = 2^((n − 49)/12) × 440 Hz
  • What is the frequency of A♯4?
• Error measurement for music pitch detection: half a semitone, i.e., a frequency ratio of 2^(1/24) ≈ 1.03 (about 3%)
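The equal-tempered note frequency and the half-semitone tolerance can be checked numerically. This sketch assumes piano key numbering with A4 = key 49 = 440 Hz, i.e. f(n) = 440 × 2^((n − 49)/12) Hz; the function names are hypothetical.

```python
import math


def piano_key_frequency(n):
    """Equal-tempered frequency (Hz) of piano key n, assuming A4 = key 49 = 440 Hz."""
    return 440.0 * 2 ** ((n - 49) / 12)


def within_half_semitone(detected_hz, reference_hz):
    """True if the detected pitch is within half a semitone (a ratio of
    2**(1/24), about 3%) of the reference, measured on the log-frequency scale."""
    return abs(12 * math.log2(detected_hz / reference_hz)) <= 0.5
```

Under this numbering, A♯4 is key 50, about 466.16 Hz, and keys 1 and 88 give 27.5 Hz and about 4,186 Hz, matching the piano range quoted earlier. A detection of 452 Hz for a 440 Hz note is within half a semitone; 455 Hz is not.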
Pitch detection in noisy environments
• Additive noise
  • Noise amplitudes can be very high
  • The frequencies of noise peaks are not regularly spaced
• Solution
  • The frequencies of the spectral peaks are less affected by noise than their amplitudes
  • Use the ratios of harmonic frequencies
Figure: spectrum of one frame of clean speech and of speech with babble noise at 0 dB SNR
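To make the frequency-ratio idea concrete, here is a simplified, hypothetical sketch; it is not the BaNa algorithm's actual procedure. It assumes the spectral peak frequencies have already been found, treats two peaks whose frequency ratio is within a tolerance of n/m (small integers) as harmonics m and n of the same F0, and returns the median of the implied F0 candidates. `max_harmonic` and the 3% tolerance are illustrative choices.

```python
from statistics import median


def f0_from_peak_ratios(peak_freqs, max_harmonic=5, tol=0.03):
    """Hypothetical illustration: estimate F0 from ratios of spectral peak
    frequencies, which noise perturbs less than peak amplitudes."""
    peaks = sorted(peak_freqs)
    candidates = []
    for i, low in enumerate(peaks):
        for high in peaks[i + 1:]:
            ratio = high / low
            for m in range(1, max_harmonic):              # harmonic number of `low`
                for n in range(m + 1, max_harmonic + 1):  # harmonic number of `high`
                    if abs(ratio - n / m) <= tol * n / m:
                        candidates.append(low / m)        # F0 implied by this pair
    return median(candidates) if candidates else None
```

Peaks at 400, 600, and 1000 Hz all imply 200 Hz. Adding a spurious peak at 730 Hz produces outlier candidates (150 Hz and about 243 Hz), but in this example the median still returns 200 Hz.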
The BaNa pitch detection algorithm
• Step 1: Search for the 5 spectral peaks with the lowest frequencies
• Step 2:
The BaNa pitch detection algorithm
• Step 3 (post-processing): use the Viterbi algorithm to find, across frames, the sequence of pitch candidates that minimizes a cost function
Download our BaNa app for Android! http://www.ece.rochester.edu/projects/wcng/project_bridge.html
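The Viterbi post-processing step can be sketched as dynamic programming over per-frame pitch candidates. BaNa's actual cost function is not reproduced here; this sketch assumes a hypothetical cost made of an optional per-candidate local cost plus a transition cost of `jump_weight` × |log2(f_t / f_{t−1})|, so an octave jump costs 1 and small pitch changes are cheap.

```python
import math


def viterbi_pitch_track(candidates, local_costs=None, jump_weight=1.0):
    """Pick one pitch candidate per frame so that the total (hypothetical) cost
    is minimal: local costs plus jump_weight * |log2(f_t / f_{t-1})| per transition."""
    if local_costs is None:
        local_costs = [[0.0] * len(c) for c in candidates]
    cost = list(local_costs[0])          # best cumulative cost ending at each candidate
    backpointers = []
    for t in range(1, len(candidates)):
        prev = candidates[t - 1]
        new_cost, pointers = [], []
        for j, f in enumerate(candidates[t]):
            steps = [cost[i] + jump_weight * abs(math.log2(f / prev[i]))
                     for i in range(len(prev))]
            best_i = min(range(len(prev)), key=steps.__getitem__)
            new_cost.append(steps[best_i] + local_costs[t][j])
            pointers.append(best_i)
        cost = new_cost
        backpointers.append(pointers)
    # Backtrack from the cheapest candidate in the last frame
    j = min(range(len(cost)), key=cost.__getitem__)
    path = [j]
    for pointers in reversed(backpointers):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[t][i] for t, i in enumerate(path)]
```

With candidates [[200, 400], [100, 205], [210, 420]] Hz, the track 200 → 205 → 210 Hz is chosen because it avoids octave jumps between frames.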
The BaNa pitch detection algorithm
Figure: GPE rate on the LDC database for speech with babble noise at 20 dB, 10 dB, and 0 dB SNR. Detected pitch values deviating more than 10% from the ground truth are counted as errors.
References
• Schroeder, M. R., “Period histogram and product spectrum: New methods for fundamental frequency measurement,” 1968.
• Noll, A. M., “Pitch determination of human speech by the harmonic product spectrum, the harmonic sum spectrum, and a maximum likelihood estimate,” 1969.
• de la Cuadra, P., and Master, A., “Efficient pitch detection techniques for interactive music,” 2001.
• Childers, D. G., Skinner, D. P., and Kemerait, R. C., “The cepstrum: A guide to processing,” 1977.
• Rabiner, L. R., Cheng, M. J., Rosenberg, A. E., and McGonegal, C. A., “A comparative performance study of several pitch detection algorithms,” 1976.
• de Cheveigné, A., and Kawahara, H., “YIN, a fundamental frequency estimator for speech and music,” 2002.
• Ba, H., Yang, N., Demirkol, I., and Heinzelman, W., “BaNa: A hybrid approach for noise resilient pitch detection,” 2012.
Thanks! Q & A