Guest Lecture for ECE492 Computer Audition Single Pitch Detection

Guest Lecture for ECE492 Computer Audition Single Pitch Detection. Na Yang, He Ba nayang@rochester.edu 9/24/2013. Outline. What is single pitch detection? Why is pitch detection important? Pitch detection for speech Time domain: autocorrelation, YIN


Presentation Transcript


  1. Guest Lecture for ECE492 Computer Audition: Single Pitch Detection. Na Yang, He Ba, nayang@rochester.edu, 9/24/2013

  2. Outline • What is single pitch detection? • Why is pitch detection important? • Pitch detection for speech • Time domain: autocorrelation, YIN • Frequency domain: harmonic summation method • Cepstrum domain: Cepstrum • The YIN algorithm • Pitch detection for music • Pitch detection in noisy environment

  3. Generation of voiced speech Voice Vocal tract (resonator) Vibrating vocal cords (oscillator) Lungs (power supply)

  4. Definition of pitch • Pitch • The relative highness or lowness of a tone as perceived by the ear • Depends on the number of vibrations per second produced by the vocal cords • Fundamental frequency (F0) is an objective estimation of pitch • Quick facts • Human speech: from 40 Hz for low-pitched male to 600 Hz for children or high-pitched female • Piano: 27 Hz – 4,186 Hz • Human hearing range: 20 Hz – 20,000 Hz

  5. A test to your ears! A piece of piano A speech utterance

  6. Spectrogram of speech The pitch contour (But the pitch contour does not always have the strongest energy among all harmonics) Spectrogram of a speech utterance

  7. Example of pitch detection frame 25 frame 65

  8. Example of pitch detection MATLAB animation • The spectrum of frame 65 Clear harmonics in voiced frames. Pitch = 196 Hz

  9. Example of pitch detection MATLAB animation • The spectrum of frame 25 No clear harmonics in unvoiced frame (fricatives, noise, etc.)

  10. Why is pitch detection important? • Applications • Speech recognition: homophones with different tones • Emotion recognition: prosodic variations • Automatic music transcription • Sound transformations: pitch-shifting in sound-editing programs (e.g., Talking Tom Cat) • Challenges • Human speech is not perfectly periodic • Pitch detection in noisy environments (mobile applications)

  11. Pitch detection for speech Speech utterance • How to choose the frame length? • Too long • Cannot capture the pitch variation (low temporal resolution) • Too short • Cannot obtain a reliable pitch estimate Pitch detection on each frame

  12. Pitch detection for speech • Usually choose the frame length to be 2-3 pitch periods • Example with 3 pitch periods: minimum speech pitch is 50 Hz, so frame length = 3 × (1 / 50 Hz) = 0.06 s
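The frame-length rule on this slide is simple arithmetic, and a small sketch makes it easy to try other minimum pitches. The function names and the 16 kHz sampling rate below are illustrative assumptions, not values from the lecture.

```python
def frame_length_seconds(min_pitch_hz, periods=3):
    """Frame duration that spans `periods` cycles of the lowest expected pitch."""
    return periods / min_pitch_hz


def frame_length_samples(min_pitch_hz, sample_rate_hz, periods=3):
    """Same frame length expressed in samples (rounded to the nearest sample)."""
    return int(round(periods * sample_rate_hz / min_pitch_hz))


# Slide example: 3 periods of a 50 Hz minimum pitch.
print(frame_length_seconds(50.0))          # 0.06
# Assumed 16 kHz sampling rate (not stated on the slide).
print(frame_length_samples(50.0, 16000))   # 960
```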

  13. Pitch detection algorithms • Algorithms in different domains use different properties of the speech signal • Frequency domain: harmonics are at integer multiples of pitch • Cepstrum domain: harmonics are regularly spaced, i.e., spectrum is periodic • Time domain: time domain signal is periodic

  14. Pitch detection in frequency domain • Routine • Break the signal into small frames • Multiply by a window • Compute short time Fourier transform (STFT) of the frame • Peaks appear at integer multiples of the pitch • E.g.: Harmonic summation method • Harmonic Product Spectrum (HPS) [Schroeder 1968, Noll 1969] Harmonics at integer multiples of pitch [Cuadra 2001]
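The routine above can be sketched for a single frame as follows. This is a minimal illustration of the Harmonic Product Spectrum idea, not the lecturers' implementation: the function name `hps_pitch`, the Hann window, the 4x zero padding, the use of 4 harmonics, and the 50-600 Hz search band are all assumptions made for the example.

```python
import numpy as np


def hps_pitch(frame, sample_rate, num_harmonics=4, fmin=50.0, fmax=600.0):
    """Estimate F0 of one frame with a basic Harmonic Product Spectrum."""
    frame = np.asarray(frame, dtype=float)
    n_fft = 4 * len(frame)                        # zero padding (assumed choice)
    windowed = frame * np.hanning(len(frame))     # multiply by a window
    spectrum = np.abs(np.fft.rfft(windowed, n=n_fft))

    # Multiply the spectrum by copies of itself downsampled by 2, 3, ...
    # Harmonics at integer multiples of F0 then line up at the F0 bin.
    limit = len(spectrum) // num_harmonics
    hps = spectrum[:limit].copy()
    for h in range(2, num_harmonics + 1):
        hps *= spectrum[::h][:limit]

    # Pick the strongest bin inside the allowed pitch range.
    freqs = np.arange(limit) * sample_rate / n_fft
    in_band = (freqs >= fmin) & (freqs <= fmax)
    best_bin = int(np.argmax(np.where(in_band, hps, 0.0)))
    return freqs[best_bin]


# Synthetic voiced-like frame: 5 harmonics of 196 Hz, 60 ms at 16 kHz.
sr = 16000
t = np.arange(960) / sr
tone = sum(np.sin(2 * np.pi * 196 * h * t) / h for h in range(1, 6))
print(hps_pitch(tone, sr))
```

The estimate is quantized to the FFT bin spacing (`sample_rate / n_fft`), so finer resolution needs more zero padding or interpolation around the peak.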

  15. Pitch detection in Cepstrum domain • Cepstrum [Childers 1977] • Definition • The inverse Fourier transform of the logarithmic amplitude spectrum of the signal. • Terminology: spectrum → cepstrum, frequency (Hz) → quefrency (s) • Concept • The log amplitude spectrum contains regularly spaced harmonics, so it can be viewed as a periodic signal whose period is the pitch.
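The definition above translates almost directly into code. The following is a hedged sketch, assuming a Hamming window, a small floor inside the logarithm, and a 50-600 Hz search band; the name `cepstrum_pitch` is made up for the example. It takes the inverse FFT of the log amplitude spectrum and looks for the strongest peak at a quefrency that corresponds to a plausible pitch period.

```python
import numpy as np


def cepstrum_pitch(frame, sample_rate, fmin=50.0, fmax=600.0):
    """Estimate F0 of one frame from the peak of its real cepstrum."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))

    # Real cepstrum: inverse FFT of the log amplitude spectrum.
    log_amplitude = np.log(np.abs(np.fft.rfft(windowed)) + 1e-12)
    cepstrum = np.fft.irfft(log_amplitude)

    # Search only quefrencies (in samples) that map to the allowed pitch range.
    q_min = int(sample_rate / fmax)
    q_max = min(int(sample_rate / fmin), len(cepstrum) // 2)
    peak = q_min + int(np.argmax(cepstrum[q_min:q_max + 1]))
    return sample_rate / peak


# Idealized glottal excitation: one pulse every 80 samples (200 Hz at 16 kHz).
sr = 16000
pulses = np.zeros(960)
pulses[::80] = 1.0
print(cepstrum_pitch(pulses, sr))
```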

  16. Pitch detection in Cepstrum domain • Cepstrum [Childers 1977] Voiced/unvoiced classification? Ratio of the amplitudes of the two highest cepstrum peaks is smaller than a threshold. [Rabiner 1976]

  17. Pitch detection in time domain • Autocorrelation (ACF) • Basis: the time-domain signal is periodic • A periodic signal correlates strongly with itself when offset by the fundamental period • Autocorrelation shows peaks at multiples of pitch period • Problem • Sensitive to peak amplitude changes -> choose a higher-order peak (octave errors) (Not only use the current frame)
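A minimal sketch of the autocorrelation approach is shown below. It assumes a single frame, a 50-600 Hz search range, and a plain (biased, unnormalized) autocorrelation; the function name `acf_pitch` is illustrative. Because every multiple of the period also produces a peak, the sketch simply takes the largest value in the allowed lag range, which is exactly where the octave errors mentioned above can arise on real speech.

```python
import numpy as np


def acf_pitch(frame, sample_rate, fmin=50.0, fmax=600.0):
    """Estimate F0 of one frame from the highest autocorrelation peak."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    n = len(frame)

    # Autocorrelation for lags 0 .. n-1 (second half of the full correlation).
    acf = np.correlate(frame, frame, mode="full")[n - 1:]

    # Restrict the search to lags that correspond to the allowed pitch range.
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), n - 1)
    best_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return sample_rate / best_lag


# Synthetic periodic frame: 3 harmonics of 200 Hz, 60 ms at 16 kHz.
sr = 16000
t = np.arange(960) / sr
tone = sum(np.sin(2 * np.pi * 200 * h * t) / h for h in range(1, 4))
print(acf_pitch(tone, sr))
```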

  18. The YIN pitch detection algorithm • Step 1: Autocorrelation • Step 2: Difference function • Immune to amplitude changes • Example: increase in signal amplitude • Problem • A strong resonance at the first formant F1 might produce a series of secondary dips, one of which might be deeper than the period dip -> choose a lower-order peak Dips: ‘yin’, as opposed to ‘yang’

  19. The YIN pitch detection algorithm • Step 3: Cumulative mean • Normalized difference function • Divide each value of the old by its average over shorter-lag values • Problem • May still choose higher-order peaks

  20. The YIN pitch detection algorithm • Step 4: Absolute threshold • To prevent choosing higher-order peaks • How? • Select dips deeper than a threshold • Choose the dip with the smallest lag τ • The absolute threshold used in YIN is set to 0.1
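Steps 2 to 4 can be sketched together as follows. This follows the description in [de Cheveigné and Kawahara 2002] as summarized on the slides, but it is an illustrative simplification: function names, the 50-600 Hz lag limits, and the fallback to the global minimum when no dip passes the threshold are assumptions, and Steps 5 and 6 are left out.

```python
import numpy as np


def difference_function(frame, max_lag):
    """Step 2: d(tau) = sum_j (x[j] - x[j + tau])^2 over a fixed window."""
    frame = np.asarray(frame, dtype=float)
    window = len(frame) - max_lag
    d = np.zeros(max_lag + 1)
    for tau in range(1, max_lag + 1):
        diff = frame[:window] - frame[tau:tau + window]
        d[tau] = np.dot(diff, diff)
    return d


def cumulative_mean_normalized(d):
    """Step 3: d'(0) = 1, d'(tau) = d(tau) / mean(d[1..tau])."""
    d_prime = np.ones_like(d)
    taus = np.arange(1, len(d))
    running_sum = np.cumsum(d[1:])
    d_prime[1:] = d[1:] * taus / np.maximum(running_sum, 1e-12)
    return d_prime


def absolute_threshold(d_prime, threshold=0.1, min_lag=1):
    """Step 4: first (smallest-lag) dip that goes below the threshold."""
    tau = min_lag
    while tau < len(d_prime):
        if d_prime[tau] < threshold:
            # Walk down to the bottom of this dip.
            while tau + 1 < len(d_prime) and d_prime[tau + 1] < d_prime[tau]:
                tau += 1
            return tau
        tau += 1
    # Assumed fallback: no dip is deep enough, so use the global minimum.
    return min_lag + int(np.argmin(d_prime[min_lag:]))


def yin_pitch(frame, sample_rate, fmin=50.0, fmax=600.0, threshold=0.1):
    """Estimate F0 of one frame using YIN Steps 2-4 only."""
    max_lag = int(sample_rate / fmin)
    d_prime = cumulative_mean_normalized(difference_function(frame, max_lag))
    tau = absolute_threshold(d_prime, threshold, min_lag=int(sample_rate / fmax))
    return sample_rate / tau


# Synthetic periodic frame: 3 harmonics of 200 Hz, 60 ms at 16 kHz.
sr = 16000
t = np.arange(960) / sr
tone = sum(np.sin(2 * np.pi * 200 * h * t) / h for h in range(1, 4))
print(yin_pitch(tone, sr))
```

The frame must be longer than `max_lag` samples so that the comparison window in Step 2 is not empty.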

  21. The YIN pitch detection algorithm • Step 5: Parabolic interpolation • Each local minimum of d'(τ) and its immediate neighbors are fit by a parabola, and the ordinate of the interpolated minimum is used in the dip-selection process. • Step 6: Best local estimate • Find the best pitch estimate (with the smallest d') in the vicinity of the analysis point.
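The parabola fit in Step 5 has a closed form. The sketch below (the helper name `parabolic_minimum` is hypothetical) fits a parabola through a sampled minimum and its two neighbors and returns both the refined lag (abscissa) and the interpolated value (ordinate).

```python
def parabolic_minimum(values, index):
    """Refine a local minimum at `index` using its two neighbors.

    Returns (refined_index, refined_value). At the array edges, or when the
    three points are collinear, the sampled point is returned unchanged.
    """
    if index <= 0 or index >= len(values) - 1:
        return float(index), float(values[index])
    y0, y1, y2 = values[index - 1], values[index], values[index + 1]
    curvature = y0 - 2.0 * y1 + y2
    if curvature == 0.0:
        return float(index), float(y1)
    offset = 0.5 * (y0 - y2) / curvature          # vertex position in [-1, 1]
    value = y1 - 0.25 * (y0 - y2) * offset        # parabola height at the vertex
    return index + offset, value


# Samples of (x - 2.3)^2: the true minimum lies between the samples.
samples = [(x - 2.3) ** 2 for x in range(6)]
print(parabolic_minimum(samples, 2))   # approximately (2.3, 0.0)
```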

  22. The YIN pitch detection algorithm • Parameter sensitivity • Integration window length: 1/40 Hz=25 ms • Threshold in Step 4: 0.1 • Cutoff frequency of the initial low-pass filtering of the signal: 1 kHz

  23. Evaluation of a pitch detection algorithm • Choose speech samples • Ground truth • Manually inspect the spectrum of each frame • Simultaneously recorded laryngograph signals • Detected pitch from other popular pitch detection algorithms • Error measurement metrics • Gross Pitch Error (GPE) rate • For speech, if a detected pitch deviates from the ground truth value by more than 10% (or 20%), it is counted as an error. • Fine Pitch Error (mean square error)
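The GPE rate defined above can be computed in a few lines. This sketch assumes per-frame pitch arrays in Hz and, as a convention chosen for the example only, that a ground-truth value of 0 marks an unvoiced frame that is excluded from scoring.

```python
import numpy as np


def gross_pitch_error_rate(detected_hz, truth_hz, tolerance=0.10):
    """Fraction of voiced frames whose detected pitch is off by more than `tolerance`."""
    detected = np.asarray(detected_hz, dtype=float)
    truth = np.asarray(truth_hz, dtype=float)
    voiced = truth > 0                      # assumed convention: 0 Hz = unvoiced
    if not np.any(voiced):
        return 0.0
    relative_error = np.abs(detected[voiced] - truth[voiced]) / truth[voiced]
    return float(np.mean(relative_error > tolerance))


truth = [100.0, 200.0, 0.0, 150.0]
detected = [105.0, 400.0, 120.0, 100.0]     # 5% off, octave error, (unvoiced), 33% off
print(gross_pitch_error_rate(detected, truth))   # 2 of 3 voiced frames are gross errors
```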

  24. Pitch detection for music • Auditory attributes of musical tones: pitch, duration, loudness, and timbre. • Applications • Music notation programs automatically transcribe real performances into scores • Query-by-humming music retrieval • Challenges • Pitch generated from tonal musical instruments spans a large range, normally 50-4,000 Hz • Overlapped harmonics of musical tones • Diverse timbre from different instruments • Noise introduced by recording device or noisy environment

  25. Pitch detection for music • The Mel-scale • A perceptual scale of pitches judged by listeners to be equal in distance from one another. • The frequency for note n is: The frequency for A♯4? Error measurement for pitch detection for music: half of a semitone, i.e., 3%.
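The note-frequency formula itself did not survive in this transcript. As a stand-in, the sketch below assumes standard 12-tone equal temperament with A4 = 440 Hz, where each semitone multiplies the frequency by 2^(1/12); the slide may have indexed notes differently. Under that assumption it answers the A♯4 question and checks the "half of a semitone, i.e., 3%" tolerance.

```python
def note_frequency(semitones_from_a4, a4_hz=440.0):
    """Equal-temperament frequency of the note `semitones_from_a4` above A4."""
    return a4_hz * 2.0 ** (semitones_from_a4 / 12.0)


# A#4 is one semitone above A4 (assuming A4 = 440 Hz).
print(round(note_frequency(1), 2))                 # 466.16

# Half a semitone expressed as a relative frequency deviation.
half_semitone = 2.0 ** (1.0 / 24.0) - 1.0
print(round(100 * half_semitone, 2))               # 2.93 (percent)
```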

  26. Pitch detection in noisy environment • Additive noise • Amplitudes can be very high. • Peak frequencies are not periodic. • Solution • Frequencies of the spectral peaks are less affected than amplitudes. • Use the ratios of harmonic frequencies. Spectrum of one frame of clean speech and speech with babble noise at 0 dB SNR

  27. The BaNa pitch detection algorithm • Step 1: Search for the 5 peaks with the lowest frequencies • Step 2:

  28. The BaNa pitch detection algorithm • Step 3 (post-processing): use the Viterbi algorithm to find the pitch candidates that minimize the cost function: Download our BaNa app for Android! http://www.ece.rochester.edu/projects/wcng/project_bridge.html

  29. BaNa pitch detection app

  30. The BaNa pitch detection algorithm Speech with babble noise at: 20 dB 10 dB 0 dB GPE rate for the LDC database for speech with babble noise. Detected pitch values deviating by more than 10% from the ground truth are counted as errors.

  31. References • Schroeder, M. R., “Period histogram and product spectrum: New methods for fundamental frequency measurement,” 1968. • Noll, A. M., “Pitch determination of human speech by the harmonic product spectrum, the harmonic sum spectrum, and a maximum likelihood estimate,” 1969. • Cuadra, P. De La and Master, A., “Efficient pitch detection techniques for interactive music,” 2001. • Childers, D. G., Skinner, D.P., and Kemerait, R.C., “The cepstrum: A guide to processing,” 1977. • Rabiner, L. R., Cheng, M. J., Osenberg, A. E., and McGonegal, C. A., “A comparative performance study of several pitch detection algorithms,” 1976. • Cheveigne, A. de, and Kawahara, H., “YIN, a fundamental frequency estimator for speech and music,” 2002. • Ba, H., Yang, N., Demirkol, I., and Heinzelman, W., “BaNa: A hybrid approach for noise resilient pitch detection,” 2012.

  32. Thanks! Q & A