Guest Lecture for ECE492 Computer Audition Single Pitch Detection

Na Yang, He Ba (nayang@rochester.edu), 9/24/2013

Outline • What is single pitch detection? • Why is pitch detection important? • Pitch detection for speech • Time domain: autocorrelation, YIN • Frequency domain: harmonic summation method • Cepstrum domain: Cepstrum • The YIN algorithm • Pitch detection for music • Pitch detection in noisy environment

Generation of voiced speech Voice Vocal tract (resonator) Vibrating vocal cords (oscillator) Lungs (power supply)

Definition of pitch • Pitch • The relative highness or lowness of a tone as perceived by the ear • Depends on the number of vibrations per second produced by the vocal cords • Fundamental frequency (F0) is an objective estimate of pitch • Quick facts • Human speech: from 40 Hz for a low-pitched male voice to 600 Hz for children or a high-pitched female voice • Piano: 27 Hz – 4,186 Hz • Human hearing range: 20 Hz – 20,000 Hz

A test for your ears! • A piece of piano music • A speech utterance

Spectrogram of speech The pitch contour (But the pitch contour does not always have the strongest energy among all harmonics) Spectrogram of a speech utterance

Example of pitch detection frame 25 frame 65

Example of pitch detection MATLAB animation • The spectrum of frame 65 Clear harmonics in voiced frames. Pitch = 196 Hz

Example of pitch detection MATLAB animation • The spectrum of frame 25 No clear harmonics in unvoiced frame (fricatives, noise, etc.)

Why is pitch detection important? • Applications • Speech recognition: homophones with different tones • Emotion recognition: prosodic variations • Automatic music transcription • Sound transformations: pitch-shifting in sound-editing programs (e.g., Talking Tom Cat) • Challenges • Human speech is imperfectly periodic • Pitch detection in noisy environments (mobile applications)

Pitch detection for speech • The speech utterance is split into frames, and pitch detection is run on each frame • How to choose the frame length? • Too long • Cannot capture the pitch variation (low temporal resolution) • Too short • Cannot obtain a reliable pitch estimate

Pitch detection for speech • The frame length is usually chosen to be 2-3 pitch periods • Example with 3 pitch periods: if the minimum speech pitch is 50 Hz, frame length = 3 × (1/50) s = 0.06 s
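The frame-length rule above can be written as a small helper. This is only a sketch: the function names and the 16 kHz sample rate in the usage lines are assumptions for illustration, not values from the lecture.

```python
def frame_length_seconds(min_pitch_hz, num_periods=3):
    """Frame duration that spans num_periods periods of the lowest expected pitch."""
    return num_periods / min_pitch_hz


def frame_length_samples(min_pitch_hz, sample_rate_hz, num_periods=3):
    """Same frame length, expressed in samples at the given sample rate."""
    return round(sample_rate_hz * num_periods / min_pitch_hz)


# Slide example: minimum pitch 50 Hz, 3 periods per frame.
seconds = frame_length_seconds(50.0)
# Assumed sample rate of 16 kHz, for illustration only.
samples = frame_length_samples(50.0, 16000)
print(seconds, samples)
```

A lower minimum pitch (for example the 40 Hz quoted earlier for low-pitched male speech) gives a longer frame, which is the temporal-resolution trade-off described on the previous slide.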

Pitch detection algorithms • Algorithms in different domains use different properties of the speech signal • Frequency domain: harmonics are at integer multiples of pitch • Cepstrum domain: harmonics are regularly spaced, i.e., spectrum is periodic • Time domain: time domain signal is periodic

Pitch detection in frequency domain • Routine • Break the signal into small frames • Multiply each frame by a window • Compute the short-time Fourier transform (STFT) of the frame • Peaks appear at integer multiples of the pitch • E.g.: Harmonic summation method • Harmonic Product Spectrum (HPS) [Schroeder 1968, Noll 1969] • Figure: harmonics at integer multiples of pitch [Cuadra 2001]
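A minimal sketch of the frequency-domain routine for one frame, using the Harmonic Product Spectrum idea: window the frame, take the magnitude spectrum, and pick the bin whose first few harmonics have the largest product. The function names, the Hann window, the choice of 3 harmonics, and the synthetic test signal are all assumptions made for this illustration. A plain DFT is used so that the example needs only the standard library; a real system would use an FFT.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2, via a plain O(N^2) DFT (fine for one short frame)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def hps_pitch(frame, sample_rate, num_harmonics=3, min_hz=50.0):
    """Return the frequency of the bin that maximizes the product of its harmonics."""
    n = len(frame)
    # Hann window to limit spectral leakage.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, x in enumerate(frame)]
    mag = magnitude_spectrum(windowed)
    k_min = max(1, math.ceil(min_hz * n / sample_rate))
    k_max = (len(mag) - 1) // num_harmonics  # highest bin whose harmonics stay in range
    best_k = max(
        range(k_min, k_max + 1),
        key=lambda k: math.prod(mag[h * k] for h in range(1, num_harmonics + 1)),
    )
    return best_k * sample_rate / n


# Synthetic voiced frame: a 200 Hz fundamental that is weaker than its harmonics.
sample_rate = 8000
frame = [
    0.3 * math.sin(2 * math.pi * 200 * t / sample_rate)
    + 1.0 * math.sin(2 * math.pi * 400 * t / sample_rate)
    + 0.8 * math.sin(2 * math.pi * 600 * t / sample_rate)
    for t in range(400)
]
estimate = hps_pitch(frame, sample_rate)
print(estimate)
```

The test signal is chosen so that the strongest spectral peak is the second harmonic, not the fundamental, which mirrors the earlier remark that the pitch does not always carry the strongest energy. Picking the largest single peak would return 400 Hz here; the product over harmonics favors 200 Hz.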

Pitch detection in Cepstrum domain • Cepstrum [Childers 1977] • Definition • The inverse Fourier transform of the logarithmic amplitude spectrum of the signal. • Terminology • spectrum ↔ cepstrum • frequency (Hz) ↔ quefrency (s) • Concept • The log amplitude spectrum contains regularly spaced harmonics, so it can be viewed as a periodic signal whose period (the harmonic spacing) is the pitch.

Pitch detection in Cepstrum domain • Cepstrum [Childers 1977] Voiced/unvoiced classification? Ratio of the amplitudes of the two highest cepstrum peaks is smaller than a threshold. [Rabiner 1976]
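A minimal sketch of cepstrum-based pitch estimation for one frame, following the definition above: take the log amplitude spectrum, transform it back, and look for the strongest peak in the quefrency range that corresponds to plausible pitch values. The function name, the search range of 50 to 500 Hz, the small floor added before the logarithm, and the synthetic signal are assumptions for illustration. The voiced/unvoiced test mentioned above is not implemented here.

```python
import cmath
import math


def cepstrum_pitch(frame, sample_rate, min_hz=50.0, max_hz=500.0, floor=1e-3):
    """Estimate pitch from the peak of the real cepstrum within a quefrency range."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, x in enumerate(frame)]
    # Log amplitude spectrum over all N bins (plain DFT; the floor avoids log(0)).
    log_mag = [
        math.log(abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                         for i, x in enumerate(windowed))) + floor)
        for k in range(n)
    ]
    q_min = max(1, int(sample_rate / max_hz))      # shortest period of interest, in samples
    q_max = min(n // 2, int(sample_rate / min_hz))  # longest period of interest, in samples

    def cepstrum_at(q):
        # The log spectrum of a real signal is symmetric, so the inverse DFT
        # reduces to a cosine sum.
        return sum(v * math.cos(2 * math.pi * k * q / n) for k, v in enumerate(log_mag)) / n

    best_q = max(range(q_min, q_max + 1), key=cepstrum_at)
    return sample_rate / best_q


# Synthetic voiced frame: 200 Hz with many harmonics (amplitude 1/h).
sample_rate = 8000
frame = [
    sum(math.sin(2 * math.pi * 200 * h * t / sample_rate) / h for h in range(1, 20))
    for t in range(400)
]
estimate = cepstrum_pitch(frame, sample_rate)
print(estimate)
```

The quefrency of the peak is a period in samples, so dividing the sample rate by it converts back to Hz. A signal with many harmonics is used because the cepstrum relies on the regular spacing of harmonics in the log spectrum.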

Pitch detection in time domain • Autocorrelation (ACF) • Basis: the time-domain signal is periodic • A periodic signal correlates strongly with itself when offset by the fundamental period • Autocorrelation shows peaks at multiples of pitch period • Problem • Sensitive to peak amplitude changes -> choose a higher-order peak (octave errors) (Not only use the current frame)
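A minimal sketch of autocorrelation-based pitch estimation for one frame: compute the autocorrelation over the lags that correspond to plausible pitch values and take the lag of the largest value. The function names, the 50 to 500 Hz search range, and the synthetic signal are assumptions for illustration.

```python
import math


def autocorrelation(frame, max_lag):
    """r[lag] = sum of frame[i] * frame[i + lag] over the overlapping part of the frame."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def acf_pitch(frame, sample_rate, min_hz=50.0, max_hz=500.0):
    """Pick the lag with the highest autocorrelation inside the allowed period range."""
    lag_min = int(sample_rate / max_hz)
    lag_max = int(sample_rate / min_hz)
    r = autocorrelation(frame, lag_max)
    best_lag = max(range(lag_min, lag_max + 1), key=lambda lag: r[lag])
    return sample_rate / best_lag


# Synthetic voiced frame: 200 Hz plus two harmonics, 60 ms at 8 kHz.
sample_rate = 8000
frame = [
    1.0 * math.sin(2 * math.pi * 200 * t / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 400 * t / sample_rate)
    + 0.3 * math.sin(2 * math.pi * 600 * t / sample_rate)
    for t in range(480)
]
estimate = acf_pitch(frame, sample_rate)
print(estimate)
```

Because the sum shrinks as the lag grows, this form of the autocorrelation favors the first period over its multiples on a steady signal. When the amplitude rises within the frame, a peak at a multiple of the period can become the largest, which is the octave-error problem noted above.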

The YIN pitch detection algorithm • Step 1: Autocorrelation • Step 2: Difference function • Immune to amplitude changes • Example: increase in signal amplitude • Problem • A strong resonance at the first formant F1 might produce a series of secondary dips, one of which might be deeper than the period dip -> choose a lower-order peak Dips: ‘yin’, as opposed to ‘yang’

The YIN pitch detection algorithm • Step 3: Cumulative mean normalized difference function • Divide each value of the difference function from Step 2 by its average over shorter-lag values • Problem • May still choose higher-order dips

The YIN pitch detection algorithm • Step 4: Absolute threshold • To avoid choosing higher-order dips • How? • Select dips deeper than a threshold • Among them, choose the dip with the smallest lag • The absolute threshold used in YIN is set to 0.1
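Steps 2 to 4 can be sketched as follows, after the description in [Cheveigne 2002]: a squared-difference function over lags, its cumulative mean normalization, and selection of the first dip that falls below the absolute threshold. The function names, the fallback to the global minimum when no dip is below the threshold, and the synthetic signal are assumptions for illustration; this is not the reference implementation.

```python
import math


def difference_function(frame, window, max_lag):
    """d[tau] = sum over the window of (x[j] - x[j + tau])^2."""
    return [
        sum((frame[j] - frame[j + tau]) ** 2 for j in range(window))
        for tau in range(max_lag + 1)
    ]


def cumulative_mean_normalized(d):
    """d'[0] = 1; d'[tau] = d[tau] divided by the mean of d[1..tau]."""
    out = [1.0]
    running = 0.0
    for tau in range(1, len(d)):
        running += d[tau]
        out.append(d[tau] * tau / running if running > 0 else 1.0)
    return out


def pick_lag(d_norm, threshold=0.1, min_lag=2):
    """Smallest lag whose dip goes below the threshold, else the global minimum."""
    tau = min_lag
    while tau < len(d_norm):
        if d_norm[tau] < threshold:
            # Walk down to the bottom of this dip.
            while tau + 1 < len(d_norm) and d_norm[tau + 1] < d_norm[tau]:
                tau += 1
            return tau
        tau += 1
    return min(range(min_lag, len(d_norm)), key=lambda t: d_norm[t])


# Synthetic voiced frame: 200 Hz plus two harmonics at 8 kHz.
sample_rate = 8000
frame = [
    1.0 * math.sin(2 * math.pi * 200 * t / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 400 * t / sample_rate)
    + 0.3 * math.sin(2 * math.pi * 600 * t / sample_rate)
    for t in range(400)
]
d = difference_function(frame, window=200, max_lag=160)  # 25 ms window
d_norm = cumulative_mean_normalized(d)
lag = pick_lag(d_norm)
estimate = sample_rate / lag
print(lag, estimate)
```

The frame must hold at least window + max_lag samples, since the difference function reads frame[j + tau]. The normalization starts at 1 for small lags, so the threshold cannot be met at very short lags.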

The YIN pitch detection algorithm • Step 5: Parabolic interpolation • Each local minimum of d′(τ) and its immediate neighbors are fit by a parabola, and the ordinate of the interpolated minimum is used in the dip-selection process. • Step 6: Best local estimate • Find the best pitch estimate (with the smallest d′) in the vicinity of the analysis point.
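The parabolic fit in Step 5 has a closed form for three equally spaced points. The sketch below returns both the interpolated lag and the interpolated depth of the dip; the function name and the test values are assumptions for illustration.

```python
def parabolic_minimum(values, index):
    """Fit a parabola through points index-1, index, index+1; return its vertex (x, y)."""
    if index <= 0 or index >= len(values) - 1:
        return float(index), values[index]  # no neighbor on one side
    y0, y1, y2 = values[index - 1], values[index], values[index + 1]
    denom = y0 - 2.0 * y1 + y2
    if denom == 0:
        return float(index), y1  # the three points are collinear
    offset = 0.5 * (y0 - y2) / denom
    return index + offset, y1 - 0.25 * (y0 - y2) * offset


# A dip whose true minimum lies between integer lags (at 40.3, depth 0.02).
values = [(t - 40.3) ** 2 + 0.02 for t in range(80)]
refined_lag, refined_depth = parabolic_minimum(values, 40)
print(refined_lag, refined_depth)
```

With a sample rate of 8 kHz, a refined lag of 40.3 samples corresponds to about 198.5 Hz instead of the 200 Hz implied by the integer lag 40, which shows why sub-sample refinement matters at higher pitches.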

The YIN pitch detection algorithm • Parameter sensitivity • Integration window length: 1/(40 Hz) = 25 ms • Threshold in Step 4: 0.1 • Cutoff frequency of the initial low-pass filtering of the signal: 1 kHz

Evaluation of a pitch detection algorithm • Choose speech samples • Ground truth • Manually inspect the spectrum of each frame • Simultaneously recorded laryngograph signals • Detected pitch from other popular pitch detection algorithms • Error measurement metrics • Gross Pitch Error (GPE) rate • For speech, a detected pitch that deviates from the ground truth value by more than 10% (or 20%) is counted as an error. • Fine Pitch Error (mean square error)
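The GPE rate can be computed per frame as sketched below. The function name, the convention that unvoiced frames are marked with a ground-truth value of 0 and skipped, and the example numbers are assumptions for illustration.

```python
def gross_pitch_error_rate(detected_hz, truth_hz, tolerance=0.10):
    """Fraction of voiced frames whose detected pitch is off by more than the tolerance."""
    # Assumption: a ground-truth value of 0 marks an unvoiced frame, which is skipped.
    voiced = [(d, t) for d, t in zip(detected_hz, truth_hz) if t > 0]
    if not voiced:
        return 0.0
    errors = sum(1 for d, t in voiced if abs(d - t) > tolerance * t)
    return errors / len(voiced)


# Made-up values: two of the four voiced frames are octave errors.
truth = [100.0, 200.0, 150.0, 0.0, 120.0]
detected = [101.0, 400.0, 149.0, 90.0, 60.0]
rate = gross_pitch_error_rate(detected, truth)
print(rate)
```

Passing tolerance=0.20 gives the looser 20% criterion mentioned above.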

Pitch detection for music • Auditory attributes of musical tones: pitch, duration, loudness, and timbre. • Applications • Music notation programs automatically transcribe real performances into scores • Query-by-humming music retrieval • Challenges • Pitch generated from tonal musical instruments spans a large range, normally 50-4,000 Hz • Overlapped harmonics of musical tones • Diverse timbre from different instruments • Noise introduced by recording device or noisy environment

Pitch detection for music • The Mel-scale • A perceptual scale of pitches judged by listeners to be equal in distance from one another. • The frequency for note n is: • The frequency for A♯4? • Error measurement for pitch detection for music: half of a semitone, i.e., about 3%.
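The note-frequency formula itself is not preserved in this transcript. As a sketch, the usual twelve-tone equal temperament relation can be used, assuming that n is the piano key number and that A4 (key 49) is tuned to 440 Hz. Both the indexing and the function name are assumptions and may differ from the slide.

```python
def note_frequency(key_number, a4_hz=440.0, a4_key=49):
    """Equal-temperament frequency of a piano key (assumes A4 = key 49 = 440 Hz)."""
    return a4_hz * 2.0 ** ((key_number - a4_key) / 12.0)


a_sharp_4 = note_frequency(50)             # one semitone above A4
lowest_key = note_frequency(1)             # A0
highest_key = note_frequency(88)           # C8
half_semitone = 2.0 ** (1.0 / 24.0) - 1.0  # relative width of half a semitone
print(round(a_sharp_4, 2), round(lowest_key, 2), round(highest_key, 2), round(half_semitone, 4))
```

Under these assumptions A♯4 is about 466.16 Hz, the 88 keys span 27.5 Hz to about 4,186 Hz (consistent with the piano range quoted earlier), and half a semitone is a relative deviation of about 2.9%, which is where the 3% tolerance comes from.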

Pitch detection in noisy environment • Additive noise • Amplitudes can be very high. • Peak frequencies are not periodic. • Solution • The frequencies of the spectral peaks are less affected than their amplitudes. • Use the ratios of harmonic frequencies. • Figure: spectrum of one frame of clean speech and of speech with babble noise at 0 dB SNR
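One way to use ratios of peak frequencies is sketched below: for every pair of spectral peaks, find the small-integer ratio closest to their frequency ratio, derive an F0 candidate from it, and let the candidates vote. This is an illustrative sketch of the general idea only, not the BaNa procedure; the function names, the maximum harmonic order of 5, the 2% tolerance, and the 5 Hz voting bins are all assumptions.

```python
from collections import Counter
from itertools import combinations


def f0_candidates_from_ratios(peak_hz, max_order=5, tolerance=0.02):
    """For each pair of peaks, match f_hi/f_lo to a ratio n/m and derive an F0 candidate."""
    candidates = []
    for f_lo, f_hi in combinations(sorted(peak_hz), 2):
        ratio = f_hi / f_lo
        best = None  # (relative error, m, n); the lowest-order match wins ties
        for m in range(1, max_order):
            for n in range(m + 1, max_order + 1):
                err = abs(ratio - n / m) / (n / m)
                if best is None or err < best[0]:
                    best = (err, m, n)
        err, m, n = best
        if err <= tolerance:
            # f_lo is taken as harmonic m and f_hi as harmonic n of the same F0.
            candidates.append(0.5 * (f_lo / m + f_hi / n))
    return candidates


def vote_f0(candidates, bin_hz=5.0):
    """Return the mean of the most populated bin of candidates, or None if empty."""
    if not candidates:
        return None
    counts = Counter(round(c / bin_hz) for c in candidates)
    winner, _ = counts.most_common(1)[0]
    members = [c for c in candidates if round(c / bin_hz) == winner]
    return sum(members) / len(members)


# Five lowest spectral peaks of a 200 Hz voice (made-up values).
peaks = [200.0, 400.0, 600.0, 800.0, 1000.0]
f0 = vote_f0(f0_candidates_from_ratios(peaks))

# The same voice with the fundamental peak missing.
f0_missing = vote_f0(f0_candidates_from_ratios([400.0, 600.0, 800.0, 1000.0]))
print(f0, f0_missing)
```

Only peak frequencies enter the computation, so a loud noise peak cannot win on amplitude alone; it only matters if its frequency happens to form a near-integer ratio with other peaks.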

The BaNa pitch detection algorithm • Step 1: Search for the 5 spectral peaks with the lowest frequencies • Step 2:

The BaNa pitch detection algorithm • Step 3 (post-processing): use the Viterbi algorithm to find the pitch candidates that minimize the cost function: Download our BaNa app for Android! http://www.ece.rochester.edu/projects/wcng/project_bridge.html

BaNa pitch detection app

The BaNa pitch detection algorithm • Speech with babble noise at 20 dB, 10 dB, and 0 dB • GPE rate for the LDC database for speech with babble noise. Detected pitch values that deviate by more than 10% from the ground truth are counted as errors.

References • Schroeder, M. R., “Period histogram and product spectrum: New methods for fundamental frequency measurement,” 1968. • Noll, A. M., “Pitch determination of human speech by the harmonic product spectrum, the harmonic sum spectrum, and a maximum likelihood estimate,” 1969. • Cuadra, P. De La and Master, A., “Efficient pitch detection techniques for interactive music,” 2001. • Childers, D. G., Skinner, D.P., and Kemerait, R.C., “The cepstrum: A guide to processing,” 1977. • Rabiner, L. R., Cheng, M. J., Osenberg, A. E., and McGonegal, C. A., “A comparative performance study of several pitch detection algorithms,” 1976. • Cheveigne, A. de, and Kawahara, H., “YIN, a fundamental frequency estimator for speech and music,” 2002. • Ba, H., Yang, N., Demirkol, I., and Heinzelman, W., “BaNa: A hybrid approach for noise resilient pitch detection,” 2012.

Thanks! Q & A
