Query by Tapping (敲擊選歌). J.-S. Roger Jang (張智星), Multimedia Information Retrieval Lab, CS Dept., Tsing Hua Univ., Taiwan. http://mirlab.org/jang
Query by Tapping • Goal: • Music search based on users' tapping (at note onsets) on a microphone or keyboard • Characteristics: • Only note durations are used for comparison; note pitch is discarded. • A tapped rhythm alone is hard for humans to recognize (unlike query by singing/humming) • Try this…
Query by Tapping • Challenges: • Users are unlikely to tap at the same tempo as the intended song • Users tend to drop notes rather than add extra ones • The database holds about 13,000 songs • Major approach: • A distance measure based on dynamic programming
Feature Extraction via Microphone • Microphone input: • After frame blocking, energy computation, and thresholding:
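The three steps named above (frame blocking, energy computation, thresholding) could be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `detect_onsets`, the frame size, the hop size, and the relative threshold `ratio` are all assumptions chosen for the example.

```python
def detect_onsets(samples, fs, frame_size=256, hop=128, ratio=0.2):
    """Return approximate onset times in seconds.

    All parameter defaults are illustrative assumptions, not values
    taken from the slides.
    """
    # Frame blocking: count how many full frames fit in the signal.
    n_frames = (len(samples) - frame_size) // hop + 1
    if n_frames <= 0:
        return []

    # Energy computation: sum of squared samples in each frame.
    energy = [
        sum(x * x for x in samples[i * hop : i * hop + frame_size])
        for i in range(n_frames)
    ]

    # Thresholding: a frame is "active" if its energy exceeds a
    # fraction of the maximum frame energy (assumed rule).
    threshold = ratio * max(energy)

    onsets = []
    prev_active = False
    for i, e in enumerate(energy):
        active = e > threshold
        # An onset is reported where the signal switches from
        # inactive to active; the time is the start of that frame.
        if active and not prev_active:
            onsets.append(i * hop / fs)
        prev_active = active
    return onsets
```

The inter-onset intervals (IOIs) used in the later slides are then simply the differences between consecutive onset times. Reported times are quantized to the hop size, so a smaller hop gives finer timing at a higher cost.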
Performance Evaluation of Onset Detection • simSequence.m • precision p = 3/6 = 0.5 • recall r = 3/5 = 0.6 • F-measure = 2pr/(p+r) = 0.5455
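The arithmetic on this slide can be reproduced with a short sketch. The matching rule here (one-to-one matching of sorted onsets within a tolerance `tol`) is an assumption for illustration; it is not claimed to be what `simSequence.m` does, and the onset times in the usage example are made up so that 3 of 6 detected onsets match 3 of 5 reference onsets.

```python
def onset_scores(detected, reference, tol=0.05):
    """Precision, recall, and F-measure for detected onset times.

    A detected onset counts as a hit if it lies within `tol` seconds
    of a not-yet-matched reference onset (assumed matching rule).
    """
    detected = sorted(detected)
    reference = sorted(reference)
    hits = 0
    i = j = 0
    while i < len(detected) and j < len(reference):
        diff = detected[i] - reference[j]
        if abs(diff) <= tol:
            hits += 1          # matched pair
            i += 1
            j += 1
        elif diff < 0:
            i += 1             # spurious detection
        else:
            j += 1             # missed reference onset
    p = hits / len(detected) if detected else 0.0
    r = hits / len(reference) if reference else 0.0
    f = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
    return p, r, f


# Hypothetical data: 6 detected onsets, 5 reference onsets, 3 matches.
detected = [0.5, 1.0, 1.3, 2.0, 2.6, 3.0]
reference = [0.52, 1.02, 1.6, 2.02, 3.4]
p, r, f = onset_scores(detected, reference)
print(p, r, round(f, 4))  # 0.5 0.6 0.5455
```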
Similarity Comparison with Songs in Database • A fast method based on IOI (inter-onset interval) ratios • Compute the IOI ratios for both the query and the database IOI vectors • Compute the Euclidean distance between these two ratio vectors
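A minimal sketch of the ratio-based comparison, under two assumptions that the slide does not state: the ratio is taken between each IOI and the one before it, and only the leading part of the database vector (as long as the query) is compared, since the test clips start at the beginning of a song. The function names are hypothetical.

```python
import math


def ioi_ratios(ioi):
    """Ratios of consecutive inter-onset intervals (all IOIs must be > 0)."""
    return [ioi[k + 1] / ioi[k] for k in range(len(ioi) - 1)]


def ratio_distance(query_ioi, db_ioi):
    """Euclidean distance between the two IOI-ratio vectors.

    Assumption: the query is aligned with the start of the song, so only
    the first len(query) ratios of the database entry are compared.
    """
    q = ioi_ratios(query_ioi)
    d = ioi_ratios(db_ioi)
    n = min(len(q), len(d))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q[:n], d[:n])))


# The same rhythm tapped at half speed gives identical ratios,
# so the distance is zero regardless of tempo.
query = [0.5, 0.5, 1.0, 0.25]
song = [0.25, 0.25, 0.5, 0.125, 0.6]
print(ratio_distance(query, song))  # 0.0
```

Ratios cancel a constant tempo factor, which is why this method is fast and tempo-independent. It has no way to absorb a dropped or extra tap, however, because one missing onset shifts every later ratio.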
Music Note Alignment • t: test (input) IOI vector • r: reference IOI vector • Alignment by DP • Normalization • [Figure: elements t(1), t(2), t(3) of t aligned against r(1), r(2), r(3) of r]
Normalization • Normalization to have [equation not preserved in this copy] (Multiplication by 1000 guarantees high resolution in fixed-point computation.)
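Dynamic-programming-based Distance • t: test IOI vector of length m • r: reference IOI vector of length n • Recurrence relation: [equation not preserved in this copy] • [Figure: DP grid with test index i and reference index j; axis labels t(1), t(2), …, t(i-2), t(i-1) and r(1), r(2), …, r(j-2), r(j-1)]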
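The recurrence itself did not survive in this copy, so the sketch below is only one plausible form of a DP distance over IOI vectors, not the authors' formula. It assumes three moves: a one-to-one match, one test IOI matched to the sum of two reference IOIs (a dropped tap), and two test IOIs matched to one reference IOI (an extra tap). It also assumes both vectors are already normalized to a common scale, and that the query may end anywhere in the reference. The name `dp_distance` is hypothetical.

```python
def dp_distance(t, r):
    """Illustrative DP distance between test IOIs t and reference IOIs r.

    D[i][j] is the lowest cost of aligning the first i test IOIs with
    the first j reference IOIs. This recurrence is an assumption.
    """
    inf = float("inf")
    m, n = len(t), len(r)
    D = [[inf] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # One test IOI matches one reference IOI.
            best = D[i - 1][j - 1] + abs(t[i - 1] - r[j - 1])
            if j >= 2:
                # Dropped tap: one test IOI spans two reference IOIs.
                merged_r = r[j - 2] + r[j - 1]
                best = min(best, D[i - 1][j - 2] + abs(t[i - 1] - merged_r))
            if i >= 2:
                # Extra tap: two test IOIs span one reference IOI.
                merged_t = t[i - 2] + t[i - 1]
                best = min(best, D[i - 2][j - 1] + abs(merged_t - r[j - 1]))
            D[i][j] = best
    # The query covers only the start of the song, so it may end at any j.
    return min(D[m][1:]) if n > 0 else inf


# The user skipped one tap, merging two 500-unit notes into one of 1000.
print(dp_distance([500, 1000, 500], [500, 500, 500, 500, 250]))  # 0.0
```

A position-by-position comparison of the same two vectors would report a mismatch at the second note, which is the failure mode that the "users tend to lose notes" challenge describes.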
Experimental Environment • 269 test wave files of tapping clips • 9 contributors (7 males, 2 females) • Clip length: 15 seconds • Wave format: PCM, 11025 Hz, 8 bits, mono • Start position: beginning of a song • Environment • Pentium III 800 MHz, 256 MB RAM • Database • 11,744 MIDI files
Test Results Using Clips of 15 Seconds • Average response time: 3.42 seconds (29.98 notes) • Recognition rates: • Top-1 (top 0.0085%): 15% • Top-10 (top 0.085%): 51% • Top-100 (top 0.85%): 80%
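The percentages in parentheses are the share of the 11,744-song database covered by each top-N list, and a top-N recognition rate is the share of test clips whose intended song is ranked within the first N results. A small sketch of both computations follows; the function names and the `ranks` list are hypothetical, not data from the experiment.

```python
def top_percent(n, db_size):
    """Share of the database (in percent) covered by a top-n list."""
    return 100.0 * n / db_size


def top_n_rate(ranks, n):
    """Fraction of queries whose correct song is ranked n or better."""
    return sum(1 for rank in ranks if rank <= n) / len(ranks)


DB_SIZE = 11744  # number of MIDI files stated in the experiment slide

for n in (1, 10, 100):
    # Prints roughly 0.0085, 0.085, and 0.85 percent.
    print(n, round(top_percent(n, DB_SIZE), 4))

# Hypothetical ranks of the correct song for five test clips.
ranks = [1, 3, 12, 250, 7]
print(top_n_rate(ranks, 1), top_n_rate(ranks, 10), top_n_rate(ranks, 100))
```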
Error Analysis • Error analysis of low-ranked clips • Some users cannot tap consistently throughout the 15 seconds. • Feature extraction is not robust enough to handle noisy input. • Some MIDI files are not faithful renditions of the original tunes. • Users cannot keep up with short consecutive notes.
Recognition Rates w.r.t. Tapping Duration • The Top-100 and Top-1000 curves level off after 10 seconds. • The Top-100 curve does not go up monotonically. • [Figure: recognition rate versus tapping duration, with curves for Top-10, Top-100, and Top-1000]
Demo • No. of MIDI files: 12,982
Partial List of Songs • All I have to do is dream • You are my sunshine • Beautiful Sunday • Do Re Mi • Feelings • A time for us • Love is blue • Let it be me • My way • Love story • More than I can say • Only you • Rain and tears • Rhythm of the rain • Rose Rose I love you • The sound of silence • Unchained melody • We are the world • Yesterday • I just called to say I love you • Close to you • Mr. Lonely • Ben • Hey Jude • Donna Donna • Sealed with a kiss
Potential Applications • Interactive toys • Beat-tracking training and games • Song retrieval in noisy karaoke bars
Conclusions • Our MIR system is the first one with query-by-tapping capability. • Rhythm-based search can be used in conjunction with pitch-contour-based search to achieve a better recognition rate.
Future Work • Search scope expansion • How to retrieve MP3 or CD music directly? • Scale-up by a hierarchical filtering method • How to deal with a database of 100,000 songs? • What if the user taps from anywhere in the middle of a song?