Query by Pitch Jin Yi and Russell Brennan
Introduction • Input: Sing a snippet of a song • Output: Name of the song, artist, genre etc. • Marketable: Integrate with online music shops • Useful: Provides a quick, easy solution for determining song information
Methodology • Vocal delivery • Subject to sing into microphone • Filtering • Filter noise via ~100 – 800Hz bandpass filter • Pitch Detection • Calculate difference function to determine fundamental frequency • Segmentation • Determine discrete pitches throughout signal
Methodology (continued) • Indexing/Database Building • Calculate ratios of pitches and pitch durations to previous pitches and durations • Create database of known song ratios for comparison • Comparison • Compute second difference function, sliding vocal ratios across database ratio windows • Result • Song with lowest difference
Bandpass Filter • Needed for filtering out noise • Butterworth filter doesn’t have ripple in the passband, unlike the Chebyshev filter
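The passband behaviour of the two filter families can be checked numerically. The following is a minimal Python sketch, assuming normalized 4th order low pass prototypes (cutoff at w = 1) and an illustrative Chebyshev ripple factor of 0.5; neither value comes from the slides, and the actual circuit is an analog bandpass design.

```python
import math

def butterworth_gain(w, order):
    """Magnitude of a normalized Butterworth low pass at frequency w."""
    return 1.0 / math.sqrt(1.0 + w ** (2 * order))

def chebyshev1_gain(w, order, ripple):
    """Magnitude of a normalized Chebyshev type I low pass for 0 <= w <= 1."""
    # Chebyshev polynomial T_n(w) = cos(n * acos(w)) inside the passband
    t = math.cos(order * math.acos(w))
    return 1.0 / math.sqrt(1.0 + (ripple * t) ** 2)

# Sample the passband from w = 0 to w = 1 (the cutoff)
points = [i / 20 for i in range(21)]
butter = [butterworth_gain(w, 4) for w in points]
cheby = [chebyshev1_gain(w, 4, 0.5) for w in points]  # ripple 0.5 is an assumption

# A ripple-free passband never rises as frequency increases
butter_monotonic = all(a >= b for a, b in zip(butter, butter[1:]))
cheby_monotonic = all(a >= b for a, b in zip(cheby, cheby[1:]))

print("Butterworth passband monotonic:", butter_monotonic)
print("Chebyshev passband monotonic:", cheby_monotonic)
```

The Butterworth gain falls smoothly toward the cutoff, while the Chebyshev gain rises and falls inside the passband, which is the ripple the slide refers to.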
Bandpass Filter (as originally intended, 4th order bandpass filter)
First Order Bandpass Filter + First Order Lowpass Filter • The output signal is too low since voltage is dropped across the resistor • [Circuit diagram labels: first order low pass filter, first order bandpass filter]
Inverting Amplifier • Added op-amp • Gain = -R2/R1
Final Circuit For Bandpass Filter • The bandpass filter cuts off the low frequencies but has a long transition band at the high cutoff, so we added two more low pass filters. • [Block diagram labels: microphone, inverting op-amp (two), low pass filter (two), bandpass filter, DSP]
Pitch Detection • Vocal delivery creates a periodic signal in short-time… • It should have a high correlation with itself, when shifted one period
Detect the Period • A squared difference function: • $d_t(\tau) = \sum_{j=1}^{W} (s_j - s_{j-\tau})^2$ • Can detect the offset, $\tau$ = period • The period will be at a minimum of this difference function.
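The period search can be sketched in a few lines of Python. This is an illustration only, not the DSP code: the 8000 Hz sample rate, the 200-sample window, the 200 Hz test tone, and the tolerance used to prefer the shortest lag are all assumptions. The lag range of 10 to 80 samples corresponds to the 800 Hz to 100 Hz band of the filter at that sample rate.

```python
import math

def difference_function(signal, window, max_lag):
    """d(tau) = sum over `window` samples of (s[j] - s[j - tau])^2, tau = 0..max_lag."""
    start = max_lag  # first index where s[j - tau] exists for every lag
    return [
        sum((signal[j] - signal[j - tau]) ** 2 for j in range(start, start + window))
        for tau in range(max_lag + 1)
    ]

def detect_period(signal, window, min_lag, max_lag, tolerance=1e-6):
    """Return the smallest lag whose difference is (nearly) the minimum."""
    d = difference_function(signal, window, max_lag)
    best = min(d[min_lag:max_lag + 1])
    for tau in range(min_lag, max_lag + 1):
        # Multiples of the period are also minima, so take the first one
        if d[tau] <= best + tolerance:
            return tau

sample_rate = 8000   # Hz, assumed
tone = 200           # Hz, test frequency
signal = [math.sin(2 * math.pi * tone * n / sample_rate) for n in range(400)]

period = detect_period(signal, window=200, min_lag=10, max_lag=80)
print("period in samples:", period)
print("fundamental in Hz:", sample_rate / period)
```

Restricting the lag range matters: at a lag of zero the difference is trivially zero, so the search has to start above it.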
Segmentation • Exponentially Weighted Moving Average (EWMA) • EWMA is often used in statistical process control to detect shifts in the mean of a process • The pitch from dsp should be smoothed to detect changes in pitch better • EWMA weights current and past values to create a current estimate of a signal average • A(i)=r * signal(i) + (1-r) * A(i-1)
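The recursion A(i) = r * signal(i) + (1-r) * A(i-1) translates directly into code. A minimal Python sketch follows; seeding the average with the first sample and the weight r = 0.5 are assumptions, since the slide does not give the starting value or the weight used.

```python
def ewma(signal, r):
    """Exponentially weighted moving average: A(i) = r*signal(i) + (1-r)*A(i-1)."""
    averages = [signal[0]]  # assumption: seed the average with the first sample
    for x in signal[1:]:
        averages.append(r * x + (1 - r) * averages[-1])
    return averages

# A pitch track that jumps from 100 Hz to 200 Hz
pitch = [100.0, 100.0, 200.0, 200.0, 200.0]
smoothed = ewma(pitch, 0.5)
print(smoothed)
```

A smaller r weights past values more heavily, so the average reacts more slowly to a jump but is less sensitive to isolated outliers.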
Segmentation (continued) • Use EWMA thusly: • EWMA “smoothes” the signal greatly • Detects shift in pitch by detecting a trend line • A trend of 4 in a row increasing or decreasing indicates a shift in mean • Can we trust the EWMA? • Each trend line becomes a mark
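One way to detect such a trend is to count consecutive rises or falls in the smoothed signal. The Python sketch below is an interpretation, not the authors' code: it assumes "4 in a row" means four consecutive sample-to-sample steps in the same direction, and it places the mark at the sample where the run began.

```python
def trend_marks(smoothed, run_length=4):
    """Return indices where a run of `run_length` same-direction steps began."""
    marks = []
    run = 0         # length of the current run of same-direction steps
    direction = 0   # +1 rising, -1 falling, 0 flat
    for i in range(1, len(smoothed)):
        step = smoothed[i] - smoothed[i - 1]
        d = (step > 0) - (step < 0)
        if d != 0 and d == direction:
            run += 1
        else:
            direction = d
            run = 1 if d != 0 else 0
        if run == run_length:
            # Assumption: mark the sample where the trend started
            marks.append(i - run_length)
    return marks

smoothed = [5, 5, 5, 5, 6, 7, 8, 9, 9, 9]
print(trend_marks(smoothed))
```

Because the mark is only emitted when the run first reaches the required length, a long glide produces one mark rather than one per sample.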
Segmentation (conclusion) • By default, a mark is placed at the first and last samples of the pitch signal • Calculate means of pitch signal within each mark section, i.e. 1-25:26-39:39-50 • If means are reasonably close, consider them one (this happens often) • Ratios of mean(i-1) / mean(i) are used for comparison
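The steps on this slide can be sketched in Python as follows. Several details are assumptions made for illustration: marks are treated as half-open boundaries, "reasonably close" is taken to mean within 5 percent, and merged sections are combined with a simple unweighted average.

```python
def section_means(pitch, marks):
    """Mean pitch of each section pitch[marks[k]:marks[k+1]]."""
    return [sum(pitch[a:b]) / (b - a) for a, b in zip(marks, marks[1:])]

def merge_close(means, tolerance=0.05):
    """Merge neighbouring sections whose means differ by at most `tolerance`."""
    merged = [means[0]]
    for m in means[1:]:
        if abs(m - merged[-1]) <= tolerance * merged[-1]:
            merged[-1] = (merged[-1] + m) / 2  # assumption: unweighted average
        else:
            merged.append(m)
    return merged

def pitch_ratios(means):
    """Ratios mean(i-1) / mean(i) between consecutive sections."""
    return [prev / cur for prev, cur in zip(means, means[1:])]

pitch = [200.0] * 4 + [202.0] * 4 + [100.0] * 4
marks = [0, 4, 8, 12]  # first sample, two trend marks, end of signal

means = section_means(pitch, marks)
merged = merge_close(means)
print(means, merged, pitch_ratios(merged))
```

Working with ratios rather than absolute pitches means a singer does not have to start on the same note as the recording.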
Block Diagram for Calculating the Ratio Index Mark The Pitch EWMA Calculate Ratio Combine Close Pitch Calculate Pitch
Example of calculating the ratio • Marks: 1 1 36 111 168 • Pitches : 214.4 161.4 240.0 • Ratios: 161/214 , 240/161 = .737, 1.52
Algorithm for Finding the Right Song • R1 = Ratio of The Database • R2 = Ratio of The Current Input • Difference = (R1 – R2) ^2
Algorithm for Finding the Right Song (example) • Input ratios (R2): 8 9 0 3 4 2 • Database ratios (R1): 2 3 1 4 5 6 7 8 9 0 3 4 2 … • Slide the input across the database one position at a time and sum (R1 – R2)^2 over each window • First window: (2-8)^2 + (3-9)^2 + (1-0)^2 + (4-3)^2 + (5-4)^2 + (6-2)^2 = 91 • Window sums: 91 138 153 149 121 125 97 0 97 91 64 • Returns the minimum: the comparison result for this song is 0
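The sliding comparison can be sketched in Python using the digits from the example: 8 9 0 3 4 2 as the input and 2 3 1 4 5 6 7 8 9 0 3 4 2 as the database entry. The song names and the second database entry are hypothetical, and the sketch assumes every database entry has at least as many ratios as the input.

```python
def window_differences(database, query):
    """Sum of squared differences for each alignment of query against database."""
    n = len(query)
    return [
        sum((r1 - r2) ** 2 for r1, r2 in zip(database[k:k + n], query))
        for k in range(len(database) - n + 1)
    ]

def song_score(database, query):
    """A song's comparison result is its smallest window difference."""
    return min(window_differences(database, query))

def best_match(songs, query):
    """Return the name of the song with the lowest difference."""
    return min(songs, key=lambda name: song_score(songs[name], query))

songs = {
    "song_a": [2, 3, 1, 4, 5, 6, 7, 8, 9, 0, 3, 4, 2],  # digits from the example
    "song_b": [1, 1, 2, 2, 3, 3, 4, 4],                 # hypothetical entry
}
query = [8, 9, 0, 3, 4, 2]

print(window_differences(songs["song_a"], query))
print(best_match(songs, query))
```

The first window gives 91 and the eighth window lines up exactly, so the score for song_a is 0 and it is returned as the match.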
In Case of a Missing Pitch • The comparison breaks down when a pitch is missed: the two ratios involving that pitch (6/7 and 7/8) collapse into one wrong ratio (6/8), and every later ratio shifts by one position • Correct pitches: 4 2 5 6 7 8 3 • Ratios: 4/2 2/5 5/6 6/7 7/8 8/3 = 2, 0.4, 0.83, 0.86, 0.88, 2.67 • One pitch missing (7): 4 2 5 6 8 3 • Ratios: 4/2 2/5 5/6 6/8 8/3 = 2, 0.4, 0.83, 0.75, 2.67
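The effect of a dropped pitch can be checked numerically. The Python sketch below uses the pitch values from this slide and the mean(i-1) / mean(i) ratio convention; the helper name is hypothetical.

```python
def pitch_ratios(pitches):
    """Ratios of each pitch to the one that follows it."""
    return [prev / cur for prev, cur in zip(pitches, pitches[1:])]

correct = pitch_ratios([4, 2, 5, 6, 7, 8, 3])
missing = pitch_ratios([4, 2, 5, 6, 8, 3])   # the pitch 7 was not detected

# Position of the first ratio that no longer agrees
first_mismatch = next(
    i for i, (a, b) in enumerate(zip(correct, missing)) if a != b
)

print([round(r, 2) for r in correct])
print([round(r, 2) for r in missing])
print("first mismatch at ratio index:", first_mismatch)
```

Everything before the missed pitch still agrees, but from that point on the two sequences differ in both value and length, so a fixed-window squared difference can no longer line them up.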
Build • Band-Pass Filter • Capacitors, inductors and op-amps, as well as resistors • Pitch Detection • TI 54x DSP dev board • Code Composer Studio version 1.2 • Serial Transmission of pitch indexes • Start/Stop signal capabilities
Build (continued) • Pitch index reception/post-processing • Programmed as a standalone application in C++ • Ability to change song database on-the-fly
Testing • Band-Pass Filter • Input a sinusoid and observed the result on the oscilloscope. • Measured the voltage at nodes to debug. • Output had a -0.6 V offset. Getting rid of this offset is not necessary since we are detecting only periodicity.
Testing • Pitch Detector • Most testing done in Matlab environment • Sinusoid, swept sinusoid, noisy sinusoid, harmonic stack with noise, vocal singing • From DSP, serial output and memory dumps • Stepwise expectation verification
Testing Serial Port • Tera Term was used to test output of DSP. • Assembly serial port output function did not work for some reason. We had to use C functions written for ECE 420 • ASCII code was interpreted and found to correspond to correct pitches. Sending characters to the DSP was tested using a DSP on/off technique.
Debugging the Software • In the Unix programming environment, most people use ‘printf’ to debug • In Visual C++ (API), printf cannot be used, so we debugged using popup windows • To view intermediate values or results, we converted floating point numbers and integers to strings for display in the popups.
Testing Segmentation • Segmentation was first tested in Matlab to facilitate quick changes • Test clips of people singing short tunes were used • Parameters such as average weight decay and trend length were adjusted • Finally, the algorithm was integrated into our main executable
Discussion (Successes/Failures) • Vocal extraction (failure) • Missing pitches • At least 5-6 pitches are needed • The program could match some songs almost 90 percent of the time
Recommendations • Missing pitches ( curve-fitting ) • Duration of pitches • “De-esser” • Harmonic search (vocal extraction) • Double pitch output
References • A. de Cheveigné and H. Kawahara. YIN, a fundamental frequency estimator for speech and music. Journal of the Acoustical Society of America, 111(4), 2002. • Mark Hasegawa-Johnson. Audio Engineering Lecture Notes for ECE 403. January 20, 2005. • Robert Morrison, Jason Laska, Douglas Jones. Digital Signal Processing Laboratory. Feb. 21, 2005. http://cnx.rice.edu/content/col10236/latest/ • Alex Spektor, Personal Communication, Summer 2005