Pitch Tracking (音高追蹤) Jyh-Shing Roger Jang (張智星) MIR Lab, Dept of CSIE National Taiwan University jang@mirlab.org http://mirlab.org/jang
Pitch (音高) • Definition of pitch • Fundamental frequency (FF, in Hz): Reciprocal of the fundamental period in a quasi-periodic waveform • Pitch (in semitone): Obtained from the fundamental frequency through a log-based transformation (to be detailed later) • Characteristics of pitch • Noise and unvoiced sounds do not have pitch.
Pitch Tracking (音高追蹤) • Pitch tracking (PT): The process of computing the pitch vector of a given audio segment (對整段音訊求取音高) • Sample applications • Query by singing/humming (哼唱選歌) • Tone recognition for Mandarin (華語的音調辨識) • Intonation scoring for English (英語的音調評分) • Prosody analysis for speech synthesis (語音合成中的韻律分析) • Pitch scaling and duration modification (音高調節與長度改變)
Typical Steps for Pitch Tracking • Pre-processing: filtering, excitation extraction • Main processing: frame blocking, computation of the periodicity detection function (PDF), pitch determination via max/min picking over the PDF • Post-processing: unreliable pitch removal via volume/clarity thresholding, pitch refinement via parabolic interpolation, pitch smoothing via median filters, etc.
Frame Blocking • Sample rate = 16 kHz • Frame size = 512 samples • Frame duration = 512/16000 = 0.032 s = 32 ms • Overlap = 192 samples • Hop size = frame size - overlap = 512 - 192 = 320 samples • Frame rate = 16000/320 = 50 frames/s = pitch rate
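The arithmetic on this slide can be checked with a short Python sketch. The helper names `frame_params` and `block_frames` are illustrative only and are not part of the SAP toolbox; the sketch assumes that an incomplete frame at the end of the signal is simply dropped.

```python
def frame_params(fs, frame_size, overlap):
    """Derive hop size, frame duration, and frame rate from the frame settings."""
    hop_size = frame_size - overlap
    return {
        "hop_size": hop_size,
        "frame_duration_s": frame_size / fs,
        "frame_rate": fs / hop_size,
    }


def block_frames(signal, frame_size, overlap):
    """Split a sequence of samples into overlapping frames (incomplete tail dropped)."""
    hop_size = frame_size - overlap
    last_start = len(signal) - frame_size
    return [signal[i:i + frame_size] for i in range(0, last_start + 1, hop_size)]


# Settings from the slide: 16 kHz, 512-sample frames, 192-sample overlap.
params = frame_params(fs=16000, frame_size=512, overlap=192)
# A dummy "signal" of sample indices, so each frame shows where it starts.
frames = block_frames(list(range(1152)), frame_size=512, overlap=192)
```

With these settings one pitch value is produced per hop, so the pitch rate equals the frame rate.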
Periodicity Detection Functions • Periodicity detection functions (PDF) are used to detect the period of a waveform • Two categories of PDF • Time domain (時域) • ACF (Autocorrelation function) • NSDF (Normalized squared difference function) • AMDF (Average magnitude difference function) • Frequency domain (頻域) • Harmonic product spectrum • Cepstrum
ACF: Auto-correlation Function • Original frame s(t), t = 1~128 • Shifted frame s(t-τ), shown for τ = 30 • acf(30) = inner product of the overlapping part of the two frames • Pitch period: the shift at which the two frames line up again • To play safe, the frame size needs to cover at least two fundamental periods!
ACF: Formula 1 • Assume a frame is represented by s(t), t=0~n-1 • ACF formula (frame shifted to the right by τ): $acf(\tau) = \sum_{t=\tau}^{n-1} s(t)\, s(t-\tau)$
ACF: Formula 2 • Assume a frame is represented by s(t), t=0~n-1 • ACF formula (frame shifted to the left by τ): $acf(\tau) = \sum_{t=0}^{n-1-\tau} s(t)\, s(t+\tau)$
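A minimal Python sketch of the two ACF forms, assuming the usual definition in which acf(τ) is the inner product of the overlapping part of the frame and its shifted copy. The function names are illustrative and do not come from the toolbox scripts. Both forms sum the same products, so they should agree for every shift.

```python
import math


def acf_shift_back(s, tau):
    """Formula 1: sum over t = tau .. n-1 of s(t) * s(t - tau)."""
    n = len(s)
    return sum(s[t] * s[t - tau] for t in range(tau, n))


def acf_shift_forward(s, tau):
    """Formula 2: sum over t = 0 .. n-1-tau of s(t) * s(t + tau)."""
    n = len(s)
    return sum(s[t] * s[t + tau] for t in range(0, n - tau))


def acf(s):
    """ACF curve for all shifts 0 .. n-1 (zero-based)."""
    return [acf_shift_forward(s, tau) for tau in range(len(s))]


# A 128-sample frame holding four periods of a sine (period = 32 samples).
frame = [math.sin(2 * math.pi * t / 32) for t in range(128)]
curve = acf(frame)
```

The overlap shrinks as τ grows, so the peak at one period (48 at τ = 32) is already lower than the value at zero shift (64). This is the tapering that the later ACF variants try to avoid.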
Example of ACF • sunday.wav • Sample rate = 16 kHz • Frame size = 512 (starting from point 9000) • Fundamental frequency • Max of ACF occurs at index 131 (zero-based indexing is assumed, so index 0 is zero shift) • FF = 16000/131 ≈ 122.14 Hz • frame2acf01.m
Locating the Pitch Point • If the range of humans’ FF is [40, 1000] Hz, then the index of the pitch point (PP) must lie in the interval $f_s/1000 \le PP \le f_s/40$, where $f_s$ is the sample rate • frame2acfPitchPoint01.m
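A Python sketch of the range-limited search, assuming zero-based lags so that the fundamental frequency is the sample rate divided by the lag of the pitch point. The names `acf` and `pitch_point` and the default range arguments are illustrative, not the contents of frame2acfPitchPoint01.m.

```python
import math


def acf(s):
    """Plain ACF: inner product of the overlapping part for each shift."""
    n = len(s)
    return [sum(s[t] * s[t + tau] for t in range(n - tau)) for tau in range(n)]


def pitch_point(curve, fs, f_min=40.0, f_max=1000.0):
    """Return (lag, frequency) of the ACF maximum inside the allowed lag range."""
    lo = math.ceil(fs / f_max)                        # shortest admissible period
    hi = min(math.floor(fs / f_min), len(curve) - 1)  # longest admissible period
    pp = max(range(lo, hi + 1), key=lambda tau: curve[tau])
    return pp, fs / pp


# Synthetic test frame: a 125 Hz sine at 16 kHz, i.e. a period of 128 samples.
fs = 16000
frame = [math.sin(2 * math.pi * 125 * t / fs) for t in range(512)]
pp, ff = pitch_point(acf(frame), fs)
```

Restricting the search excludes the trivial maximum at zero shift and any shift that would imply a frequency outside [40, 1000] Hz.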
Locating the Pitch Point (II) • What could go wrong? • Vitas • http://www.youtube.com/watch?v=YjO_VXHxsRw&hd=1 (local short clip) • Whistling • Low-pitch singing/humming requires a big frame size
Example of ACF Based PT • Specs • Sample rate = 11025 Hz • Frame size = 353 points ≈ 32 ms • Overlap = 0 • Frame rate = 11025/353 ≈ 31.23 frames/s • Playback • Original singing • Pitch by ACF • wave2pitchByAcf01.m
Example of ACF Based PT (II) • Specs • The previous script is converted into a function pitchTrackingSimple.m for easy access. • ptByAcf01.m
Demo of ACF-based PT • Real-time display of ACF for pitch tracking • goPtByAcf.mdl under SAP toolbox • Real-time pitch tracking for mic input • goPtByAcf2.mdl under SAP toolbox
ACF Variants to Avoid Tapering • Normalized version • frame2acf02.m • Half-frame shifting • frame2acf03.m
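The two variants can be sketched in Python as follows. The exact behavior of frame2acf02.m and frame2acf03.m is not shown on the slide, so both functions are assumptions: the normalized version is taken to divide by the overlap length n - τ, and half-frame shifting is taken to slide only the first half of the frame, which keeps the overlap length fixed.

```python
import math


def acf_normalized(s, tau):
    """ACF divided by the overlap length n - tau (assumed normalization)."""
    n = len(s)
    return sum(s[t] * s[t + tau] for t in range(n - tau)) / (n - tau)


def acf_half_frame(s, tau):
    """Slide only the first half of the frame; valid for 0 <= tau <= n - n//2."""
    half = len(s) // 2
    return sum(s[t] * s[t + tau] for t in range(half))


# Four periods of a sine with a period of 32 samples.
frame = [math.sin(2 * math.pi * t / 32) for t in range(128)]
```

For this frame both variants give the same value at every multiple of the period, whereas the plain ACF drops from 48 at τ = 32 to 16 at τ = 96.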
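NSDF: ACF Variant with Normalized Range • NSDF: normalized squared difference function • Formula: $nsdf(\tau) = \dfrac{2\sum_{t=0}^{n-1-\tau} s(t)\, s(t+\tau)}{\sum_{t=0}^{n-1-\tau} \left[ s^2(t) + s^2(t+\tau) \right]}$ • A variant of ACF within the range [-1, 1], based on the inequality: $|2xy| \le x^2 + y^2$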
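A Python sketch of NSDF, assuming the common definition in which twice the cross term is divided by the summed energy of the two overlapping parts. The function name `nsdf` is illustrative and the zero-energy guard is an added assumption.

```python
import math


def nsdf(s, tau):
    """2 * sum(s(t) s(t+tau)) / sum(s(t)^2 + s(t+tau)^2) over the overlap."""
    n = len(s)
    cross = sum(s[t] * s[t + tau] for t in range(n - tau))
    energy = sum(s[t] ** 2 + s[t + tau] ** 2 for t in range(n - tau))
    return 2.0 * cross / energy if energy > 0 else 0.0  # silent frame: define as 0


# Four periods of a sine with a period of 32 samples.
frame = [math.sin(2 * math.pi * t / 32) for t in range(128)]
curve = [nsdf(frame, tau) for tau in range(64)]
```

For this perfectly periodic frame the curve reaches 1 at one period (τ = 32) and -1 at half a period (τ = 16). The height at the pitch point is what a later slide calls clarity.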
NSDF Example • frame2nsdf01.m • Clarity: height of the NSDF at the pitch point
AMDF: Average Magnitude Difference Function • Original frame s(i), i = 1~128 • Shifted frame s(i-τ), shown for τ = 30 • amdf(30) = sum of absolute differences over the overlapping part of the two frames • Pitch period: the shift at which the two frames line up again
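A Python sketch of AMDF, assuming the sum of absolute differences over the overlapping part as described on this slide. Unlike ACF, the pitch point is a minimum. The function name and the search range of 8 to 47 samples are illustrative choices.

```python
import math


def amdf(s, tau):
    """Sum over t = tau .. n-1 of |s(t) - s(t - tau)|."""
    n = len(s)
    return sum(abs(s[t] - s[t - tau]) for t in range(tau, n))


# Four periods of a sine with a period of 32 samples.
frame = [math.sin(2 * math.pi * t / 32) for t in range(128)]
curve = [amdf(frame, tau) for tau in range(64)]

# Shift 0 is always a trivial minimum, so the search starts at a positive lag.
pitch_lag = min(range(8, 48), key=lambda tau: curve[tau])
```

AMDF needs only subtractions and absolute values, which makes it attractive when multiplications are expensive.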
Example of AMDF • sunday.wav • Sample rate = 16 kHz • Frame size = 512 (starting from point 9000) • Fundamental frequency • Pitch point occurs at index 131, which is harder to determine • frame2amdf01.m
Example of AMDF to Pitch • sunday.wav • Sample rate = 16 kHz • Frame size = 512 (starting from point 9000) • Fundamental frequency • Pitch point occurs at index 131 (zero-based), which is determined correctly • FF = 16000/131 ≈ 122.14 Hz • frame2amdf4pt01.m
Example of AMDF Based PT • Specs • Sample rate = 11025 Hz • Frame size = 353 points ≈ 32 ms • Overlap = 0 • Frame rate = 11025/353 ≈ 31.23 frames/s • Playback • Original singing • Pitch by AMDF • ptByAmdf01.m
AMDF: Variations to Avoid Tapering • Normalized version • frame2amdf02.m • Half-frame shifting • frame2amdf03.m
Combining ACF and AMDF • Plots for one frame: the frame itself, its ACF, its AMDF, and the ratio ACF/AMDF
Audio Features in Time Domain • Audio features presented in the time domain • Fundamental period • Intensity • Timbre: Waveform within a fundamental period (FP)
Audio Features in Frequency Domain • Energy: Sum of power spectrum • Pitch: Distance between harmonics • Timbre: Smoothed spectrum • Figure labels: energy, pitch frequency, first formant F1, second formant F2
About DFT & FFT • Terminology • DFT: Discrete Fourier transform • FFT: Fast Fourier transform, which is an efficient method for computing DFT • More about DFT
Harmonic Product Spectrum (HPS) • Procedure • Compute the power spectrum of a frame • Eliminate its trend, obtained from 20th-order polynomial fitting (this removes the formants) • Apply exponential weighting to suppress high-frequency harmonics • Downsample and add to enhance the harmonics at the fundamental frequency • Find the max as the pitch point
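The core downsample-and-add step can be sketched in Python as below. This is a simplified illustration, not frame2hps01.m: it skips the polynomial trend removal and the exponential weighting, uses a naive DFT in place of an FFT to stay within the standard library, and assumes the addition is done on the log power spectrum. The names `power_spectrum` and `hps_pitch` are hypothetical.

```python
import cmath
import math


def power_spectrum(frame):
    """One-sided power spectrum via a naive DFT (bins 0 .. n/2)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(acc) ** 2)
    return spec


def hps_pitch(frame, fs, num_harmonics=4, eps=1e-10):
    """Add the log power at bins k, 2k, ..., R*k and pick the best k."""
    log_p = [math.log(v + eps) for v in power_spectrum(frame)]
    max_bin = (len(log_p) - 1) // num_harmonics  # keep R*k inside the spectrum
    best_bin = max(
        range(1, max_bin + 1),
        key=lambda k: sum(log_p[r * k] for r in range(1, num_harmonics + 1)),
    )
    return best_bin, best_bin * fs / len(frame)


# Synthetic frame: 250 Hz fundamental (bin 8) with three weaker harmonics.
fs, n = 8000, 256
frame = [
    sum(0.5 ** (h - 1) * math.sin(2 * math.pi * 8 * h * t / n) for h in range(1, 5))
    for t in range(n)
]
best_bin, f0 = hps_pitch(frame, fs)
```

Only the true fundamental has energy at all of its first four multiples, so its summed score is the largest.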
Example of HPS • frame2hps01.m
Example of PT by HPS • ptByHps01.m
PT by Cepstrum • Formula for cepstrum: $cepstrum = \mathrm{IDFT}\left\{ \log \left| \mathrm{DFT}\{ s(t) \} \right| \right\}$ • Procedure for PT by cepstrum • Compute the power spectrum of a frame. • Eliminate the trend of the power spectrum if necessary. • Take the inverse FFT on the (symmetric) power spectrum. (The result is real, why?) • Find the position of the max to compute the pitch.
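A Python sketch of the procedure, assuming the log power spectrum is used and that no trend removal is needed for the synthetic input. A naive DFT stands in for the FFT so that only the standard library is required. The names `dft`, `cepstrum`, and `pitch_by_cepstrum` are illustrative and are not taken from frame2ceps01.m.

```python
import cmath
import math


def dft(x, sign):
    """Naive DFT (sign = -1) or unscaled inverse DFT (sign = +1)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def cepstrum(frame, eps=1e-10):
    """Inverse DFT of the log power spectrum; eps avoids log(0)."""
    log_power = [math.log(abs(v) ** 2 + eps) for v in dft(frame, -1)]
    n = len(frame)
    # The log power spectrum of a real frame is symmetric, so the result is real.
    return [v.real / n for v in dft(log_power, +1)]


def pitch_by_cepstrum(frame, fs, f_min, f_max):
    """Pick the cepstral maximum among periods allowed by [f_min, f_max]."""
    c = cepstrum(frame)
    lo = math.ceil(fs / f_max)
    hi = min(math.floor(fs / f_min), len(frame) // 2)
    q = max(range(lo, hi + 1), key=lambda i: c[i])
    return q, fs / q


# Ideal pulse train: one pulse every 20 samples, i.e. 400 Hz at 8 kHz.
fs = 8000
frame = [1.0 if t % 20 == 0 else 0.0 for t in range(200)]
# An ideal pulse train has equal cepstral peaks at every multiple of the
# period, so the search range is chosen to bracket a single period.
q, f0 = pitch_by_cepstrum(frame, fs, f_min=270.0, f_max=650.0)
```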
PT by Cepstrum: How Does It Work? • The power spectrum (after trend removal) is close to a sinusoid • Its inverse FFT should therefore be a single pulse only
Example of Cepstrum • frame2ceps01.m
Example of PT by Cepstrum • ptByCeps01.m
Preprocessing for Pitch Tracking • Some commonly used preprocessing for the audio signals before pitch tracking • Pre-filtering the signals • Clipping the signals • SIFT method for the signals
Preprocessing: Pre-filtering • Observation • Range of humans’ pitch: [40, 1000] Hz • Idea • Low-pass the signals with a cutoff frequency between 800 and 1000 Hz • Characteristics • The effect is yet to be verified
Preprocessing: Clipping • Observation • Small signals near zero are likely to cause pitch tracking errors • Idea • Clip the signals • Characteristics • Saves computation on embedded systems • Overall effect is yet to be verified
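The slide does not say which kind of clipping is meant. The Python sketch below assumes center clipping, a classical choice for pitch tracking, with a threshold set to a fraction of the frame's peak amplitude. The function name and the default ratio of 0.3 are illustrative.

```python
def center_clip(frame, ratio=0.3):
    """Zero out samples below the threshold and shift the rest toward zero."""
    threshold = ratio * max((abs(v) for v in frame), default=0.0)
    clipped = []
    for v in frame:
        if v > threshold:
            clipped.append(v - threshold)
        elif v < -threshold:
            clipped.append(v + threshold)
        else:
            clipped.append(0.0)  # small samples near zero are removed
    return clipped


clipped = center_clip([1.0, 0.2, -0.5, -0.1], ratio=0.3)
```

Many samples become exactly zero, so the products in a subsequent ACF can be skipped, which fits the computation saving noted on the slide.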
Preprocessing: SIFT • Observation • Channel effect is likely to cause pitch tracking error • Idea of SIFT (simple inverse filter tracking) • Identify the excitation via LPC • Use the excitation for PDF • Characteristics • Overall effect is yet to be verified
Example of SIFT • siftAcf01.m
Example of PT based on SIFT & ACF • ptBySiftAcf01.m
Postprocessing for Pitch Tracking • Some commonly used postprocessing for pitch tracking • Smoothing to remove abrupt-changing pitch • Interpolation to increase pitch precision
Postprocessing: Smoothing • Smoothing by a median filter • ptWithMedianFilter01.m
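A Python sketch of median smoothing on a pitch vector. The details of ptWithMedianFilter01.m are not shown on the slide, so the window order and the edge handling (windows shrink at both ends) are assumptions.

```python
def median_filter(pitch, order=3):
    """Replace each value by the median of a window centered on it (odd order)."""
    half = order // 2
    smoothed = []
    for i in range(len(pitch)):
        window = sorted(pitch[max(0, i - half): i + half + 1])
        smoothed.append(window[len(window) // 2])  # upper median for even sizes
    return smoothed


# Pitch in semitones with two isolated outliers (72 and 48).
pitch = [60, 60, 72, 60, 61, 61, 48, 61]
smoothed = median_filter(pitch, order=3)
```

An isolated outlier never survives a window of order 3, while a genuine step from 60 to 61 is preserved.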
Postprocessing: Interpolation • Idea • Fit a parabola through the pitch point and its two neighbors to locate the max position with sub-sample precision • ptWithParabolicFit01.m
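A Python sketch of three-point parabolic interpolation, using the standard closed-form vertex of the parabola through the pitch point and its two neighbors. The function name `parabolic_peak` is illustrative and is not taken from ptWithParabolicFit01.m.

```python
def parabolic_peak(curve, i):
    """Return (refined position, refined value) of the extremum near index i."""
    if i <= 0 or i >= len(curve) - 1:
        return float(i), curve[i]  # no neighbor on one side: nothing to fit
    left, mid, right = curve[i - 1], curve[i], curve[i + 1]
    denom = left - 2.0 * mid + right
    if denom == 0:
        return float(i), mid  # the three points lie on a straight line
    delta = 0.5 * (left - right) / denom  # offset of the vertex from index i
    return i + delta, mid - 0.25 * (left - right) * delta


# Samples of a parabola whose true maximum is at 10.3 with value 0.
curve = [-(x - 10.3) ** 2 for x in range(20)]
i = max(range(len(curve)), key=lambda k: curve[k])
position, value = parabolic_peak(curve, i)
```

Dividing the sample rate by the refined position, rather than by the integer index, gives a finer pitch estimate. The same formula applies to a minimum, as with AMDF.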
UPDUDP (1/4) • UPDUDP: Unbroken Pitch Determination Using DP • Goal: To take pitch smoothness into consideration • Cost function: $cost(\mathbf{p}, \theta, m) = \sum_{i=1}^{n} amdf_i(p_i) + \theta \sum_{i=1}^{n-1} |p_i - p_{i+1}|^m$ • $\mathbf{p} = [p_1, \dots, p_n]$: a given path in the AMDF matrix • $n$: Number of frames • $\theta$: Transition penalty • $m$: Exponent of the transition difference • Reference: Jiang-Chun Chen, J.-S. Roger Jang, "TRUES: Tone Recognition Using Extended Segments", ACM Transactions on Asian Language Information Processing, No. 10, Vol. 7, Aug 2008.
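UPDUDP (2/4) • Optimum-value function D(i, j): the minimum cost starting from frame 1 to position (i, j) • Recurrence formula: $D(i, j) = amdf_i(j) + \min_{k} \left\{ D(i-1, k) + \theta\, |k - j|^m \right\}$ for $i = 2, \dots, n$, with transition penalty $\theta$ and exponent $m$ • Initial conditions: $D(1, j) = amdf_1(j)$ • Optimum cost: $\min_{j} D(n, j)$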
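A Python sketch of the dynamic programming step. It assumes the path cost is the sum of the AMDF values along the path plus a transition penalty θ times |difference of neighboring lags|^m, which is what the symbol list on the previous slide suggests. The function name `updudp` and the toy matrix are illustrative.

```python
def updudp(amdf_matrix, theta, m=2):
    """Return (best path of lag indices, its total cost) over an AMDF matrix."""
    n_frames, n_lags = len(amdf_matrix), len(amdf_matrix[0])
    D = [list(amdf_matrix[0])]  # D(1, j) = amdf_1(j)
    back = [[0] * n_lags]       # back-pointers (row 0 is unused)
    for i in range(1, n_frames):
        row, brow = [], []
        for j in range(n_lags):
            # Best predecessor k for position (i, j).
            k = min(range(n_lags), key=lambda k: D[i - 1][k] + theta * abs(k - j) ** m)
            row.append(amdf_matrix[i][j] + D[i - 1][k] + theta * abs(k - j) ** m)
            brow.append(k)
        D.append(row)
        back.append(brow)
    j = min(range(n_lags), key=lambda k: D[-1][k])  # optimum cost: min over j of D(n, j)
    total, path = D[-1][j], [j]
    for i in range(n_frames - 1, 0, -1):  # backtrack to frame 1
        j = back[i][j]
        path.append(j)
    path.reverse()
    return path, total


# Toy AMDF matrix (3 frames x 4 lags); frame 2 has a spurious minimum at lag 3.
toy = [[5, 1, 5, 5], [5, 2, 5, 0], [5, 1, 5, 5]]
greedy_path, greedy_cost = updudp(toy, theta=0)
smooth_path, smooth_cost = updudp(toy, theta=1, m=2)
```

With θ = 0 the path simply follows each frame's minimum and jumps to the outlier. With θ = 1 the jump costs more than it saves, so the path stays on lag 1.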
Example of UPDUDP • A typical example (via AMDF)
Robustness of UPDUDP • Insensitivity in
Another Example of UPDUDP • Example of MATLAB code using UPDUDP (via ACF):
    waveFile='arina_short.wav';
    wObj=waveFile2obj(waveFile);
    ptOpt=ptOptSet(wObj.fs, wObj.nbits, 1);
    pitch=pitchTracking(wObj, ptOpt, 1);
• Result: plot of the tracked pitch
Frequency to Semitone Conversion • Semitone: A music scale based on A440: $semitone = 69 + 12 \log_2(freq/440)$ • Reasonable pitch range: • E2 - C6 • 82 Hz - 1047 Hz (semitone 40 - 84)
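A Python sketch of the conversion, assuming the usual convention in which A440 is semitone 69 and one octave spans 12 semitones. The function names are illustrative.

```python
import math


def freq_to_semitone(freq_hz):
    """Map a frequency in Hz to a (fractional) semitone number, A440 = 69."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


def semitone_to_freq(semitone):
    """Inverse mapping: semitone number back to frequency in Hz."""
    return 440.0 * 2.0 ** ((semitone - 69.0) / 12.0)


low = freq_to_semitone(82.0)     # close to E2
high = freq_to_semitone(1047.0)  # close to C6
```

Because the scale is logarithmic, a fixed error in semitones corresponds to the same musical interval at any frequency, which is why pitch vectors are usually compared in semitones rather than in Hz.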