Progressive Filtering and Its Application for Query-by-Singing/Humming J.-S. Roger Jang (張智星) Multimedia Information Retrieval Lab CS Dept., Tsing Hua Univ., Taiwan http://www.cs.nthu.edu.tw/~jang
Recent Publications • Journals • Jiang-Chun Chen, J.-S. Roger Jang, "TRUES: Tone Recognition Using Extended Segments", ACM Transactions on Asian Language Information Processing, 2008. • J.-S. Roger Jang and Hong-Ru Lee, "A General Framework of Progressive Filtering and Its Application to Query by Singing/Humming", IEEE Transactions on Audio, Speech, and Language Processing, Vol. 16, No. 2, pp. 350-358, Feb. 2008. • Conferences • Liang-Yu Chen, Chun-Jen Lee, Jyh-Shing Roger Jang, "Minimum Phone Error Discriminative Training for Mandarin Chinese Speaker Adaptation", Proceedings of INTERSPEECH 2008, Brisbane, Australia, Sept. 2008. • Chao-Ling Hsu, Jyh-Shing Roger Jang, and Te-Lu Tsai, "Separation of Singing Voice from Music Accompaniment with Unvoiced Sounds Reconstruction for Monaural Recordings", Proceedings of 125th AES Convention, San Francisco, USA, Oct. 2008. • Zhi-Sheng Chen, Jia-Min Zen, Jyh-Shing Roger Jang, "Music Annotation and Retrieval System Using Anti-Models", Proceedings of 125th AES Convention, San Francisco, USA, Oct. 2008.
Outline • Problem definition of QBSH • Methods for QBSH • Progressive Filtering • Conclusions
Introduction to QBSH • QBSH: Query by Singing/Humming • Input: singing or humming captured by a microphone • Output: a ranked list of songs retrieved from the song database • Overview • First paper: around 1994 • Extensive studies since 2001 • State of the art: QBSH tasks at ISMIR/MIREX
Challenges in QBSH Systems • Reliable pitch tracking for acoustic input • Input from mobile devices • Input from noisy karaoke boxes • Song database preparation • Audio music vs. MIDI files • Efficient and effective retrieval • Karaoke machine: ~10,000 songs • Internet music search engine: ~500,000,000 songs
Goal and Approach • Goal: retrieve songs effectively within a given response time, e.g., about 5 seconds • Our strategy • Multi-stage progressive filtering • Data-driven design methodology based on dynamic programming (DP)
Approaches to QBSH • Pitch Tracking • Methods for QBSH
A Quick Demo of QBSH • Demo page of MIR lab: • http://mirlab.org/mir_main/demo.htm • Demo of QBSH • http://mirlab.org/Demo/MusicSearch/index.htm
Progressive Filtering • Multi-stage representation • Each stage is a method for QBSH • [Figure: cascade of stage 1 → stage 2 → … → stage i] • s_i: survival rate of stage i • d_i: delay of stage i • n_{i-1}: number of input songs to stage i
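A minimal sketch of the cascade, assuming each stage ranks its surviving candidates by some dissimilarity and keeps the top fraction s_i, so that n_i ≈ n_{i-1} · s_i. The function name `progressive_filter`, the `(distance_fn, s_i)` stage format, and the round-to-nearest rule for the number of survivors are all hypothetical; the distance functions stand in for the actual per-stage QBSH methods.

```python
def progressive_filter(query, songs, stages):
    """Run a multi-stage progressive-filtering cascade (illustrative sketch).

    stages: list of (distance_fn, s_i) pairs. distance_fn(query, song)
    returns a dissimilarity (lower = closer); s_i is the stage's survival
    rate. Returns (survivors, [n_0, n_1, ..., n_m]).
    """
    candidates = list(songs)
    sizes = [len(candidates)]  # n_0 = size of the song database
    for distance, s_i in stages:
        # n_i = n_{i-1} * s_i, rounded to a whole number of songs (assumed rule).
        keep = max(1, round(s_i * len(candidates)))
        # Rank the current survivors with this stage's method, keep the best.
        candidates = sorted(candidates, key=lambda song: distance(query, song))[:keep]
        sizes.append(len(candidates))
    return candidates, sizes
```

With a 100-song toy database and two stages keeping 10% and then 50%, the stage sizes come out as [100, 10, 5]. Cheap methods belong early, where n_{i-1} is large; expensive ones belong late, where few candidates remain.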
Stage Characteristics for Effectiveness • RS curve for stage i: recognition rate r_i(s) as a function of the survival rate s • [Figure: RS curves, x-axis survival rate s (%), y-axis recognition rate (%), running from (0, 0) to (100, 100); a more effective method lies above a less effective one, and random guessing is the diagonal. Example: a top-10% recognition rate of 65%.]
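Because the design is data-driven, an RS curve can be estimated from a design set. A minimal sketch, assuming we already know the 1-based rank of each query's target song under one stage's method: r_i(s) is then the fraction of queries whose target lands in the top s · n songs. The function name `rs_curve` and its inputs are hypothetical. Random guessing gives an expected r(s) = s, the diagonal in the figure.

```python
def rs_curve(target_ranks, n_songs, survival_rates):
    """Empirical RS curve: [(s, recognition rate), ...].

    target_ranks: 1-based rank of the correct song for each design query.
    A query counts as recognized at survival rate s if its target is
    within the top round(s * n_songs) songs.
    """
    points = []
    for s in survival_rates:
        cutoff = max(1, round(s * n_songs))
        hits = sum(1 for rank in target_ranks if rank <= cutoff)
        points.append((s, hits / len(target_ranks)))
    return points
```

For example, target ranks [1, 5, 12, 40, 100] in a 100-song database give recognition rates of 0.4, 0.8 and 1.0 at s = 0.1, 0.5 and 1.0.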
Stage Characteristics for Efficiency • TS curve for stage i: average one-to-one comparison time t_i(s) as a function of the survival rate s • [Figure: TS curves, x-axis survival rate s (%), y-axis average time (ms), for a less efficient and a more efficient method. Example: when s = 10%, the average one-to-one comparison time is 5 ms.]
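A TS curve is only measured at a few survival rates, while a design procedure needs t_i(s) at arbitrary s. A minimal sketch, assuming piecewise-linear interpolation between measured points and that a stage's total time is the number of songs entering it, n_{i-1}, times the average one-to-one comparison time. The helper names and the sample curve are hypothetical; only the 5 ms at s = 10% point comes from the slide.

```python
from bisect import bisect_left


def interp(curve, s):
    """Piecewise-linear interpolation of a sampled curve [(s, value), ...]
    sorted by s; values outside the sampled range are clamped."""
    xs = [x for x, _ in curve]
    k = bisect_left(xs, s)
    if k == 0:
        return curve[0][1]
    if k == len(curve):
        return curve[-1][1]
    (x0, y0), (x1, y1) = curve[k - 1], curve[k]
    return y0 + (y1 - y0) * (s - x0) / (x1 - x0)


def stage_time_ms(n_in, ts_curve, s):
    """Time spent in one stage: n_{i-1} comparisons at t_i(s) ms each."""
    return n_in * interp(ts_curve, s)


# Hypothetical TS samples (s, ms); only (0.10, 5.0) is taken from the slide.
TS = [(0.05, 8.0), (0.10, 5.0), (0.20, 3.0)]
```

For illustration, running such a 5 ms comparison against all 13,320 songs of the database used later would take 13,320 × 5 ms = 66.6 s, far beyond a 5-second budget; hence cheaper stages must shrink the candidate set first.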
Formulation as an Optimization Problem • Maximize the overall recognition rate subject to constraints on response time and ranking-list size, where: • n (= n0): size of the song database • Tmax: maximum allowable response time, e.g., 5 sec. • 10: size of the retrieved ranking list
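The slide's objective and constraint formulas did not survive extraction. One plausible reading, written here under our own assumptions rather than quoted from the slide: the overall recognition rate is the product of the per-stage RS values r_i(s_i) (stages treated as independent), stage i costs n_{i-1} times its average comparison time t_i(s_i), and the final stage leaves at most 10 songs.

```latex
\max_{s_1,\dots,s_m} \; \prod_{i=1}^{m} r_i(s_i)
\quad \text{subject to} \quad
\sum_{i=1}^{m} n_{i-1}\, t_i(s_i) \le T_{\max},
\qquad
n \prod_{i=1}^{m} s_i \le 10,
\qquad
n_i = n_{i-1}\, s_i,\; n_0 = n
```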
DP-based Approach • The original optimization task can be cast as a DP problem: • The optimum-value function R_i(s, t) is the optimum recognition rate at stage i, given a cumulated survival rate s and a cumulated computation time t. • A recurrence for R_i(s, t) can be derived by varying the survival rate of stage i, as follows.
Recurrence for R_i(s, t) • [Figure: cascade of stage 1, …, stage i-1, stage i; d_i: delay of stage i]
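The recurrence itself was lost in extraction. A hedged reconstruction, assuming stage-wise recognition rates multiply and that the delay d_i of stage i equals the number of songs entering it, n s / s_i, times its average comparison time t_i(s_i) from the TS curve. The boundary value R_0(1, 0) = 1 and the final selection rule are also our assumptions.

```latex
R_i(s, t) = \max_{s_i} \; R_{i-1}\!\left(\frac{s}{s_i},\; t - d_i\right) r_i(s_i),
\qquad
d_i = n \, \frac{s}{s_i} \, t_i(s_i),
\qquad
R_0(1, 0) = 1

R^{*} = \max_{(s,\, t):\; n s \le 10,\; t \le T_{\max}} R_m(s, t)
```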
DP-based Approach • Boundary conditions for R_i(s, t) • Optimum recognition rate • We can then backtrack to find the optimum s_1, s_2, …, s_m.
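The DP design can be sketched in code when each stage offers a finite set of candidate survival rates. This is a minimal sketch under stated assumptions, not the authors' implementation: the overall rate is the product of per-stage r_i(s_i); stage i's delay is (songs entering) × t_i(s_i), rounded up to whole milliseconds so that cumulated time is a discrete state; and the final list must hold at most k songs. The function name `design_cascade` and the toy numbers in the test are hypothetical. Survival rates are `Fraction`s so that cumulated products compare exactly as dictionary keys.

```python
import math
from fractions import Fraction


def design_cascade(n0, stages, t_max_ms, k):
    """Hypothetical DP design of a progressive-filtering cascade.

    stages: one dict per stage mapping a candidate survival rate s_i
    (a Fraction) to (r_i(s_i), t_i(s_i)): the recognition rate read off
    the RS curve and the average one-to-one comparison time (ms) read
    off the TS curve. Returns (best rate, [s_1, ..., s_m]) or None.
    """
    # R[(s, t)]: best recognition rate so far with cumulated survival
    # rate s and cumulated computation time t (ms, rounded up).
    R = {(Fraction(1), 0): 1.0}  # assumed boundary condition before stage 1
    parents = []  # one back-pointer table per stage, for backtracking
    for stage in stages:
        new_R, back = {}, {}
        for (s, t), rate in R.items():
            n_in = n0 * s  # number of songs entering this stage
            for s_i, (r_i, t_i) in stage.items():
                new_t = t + math.ceil(n_in * t_i)  # add this stage's delay
                if new_t > t_max_ms:
                    continue  # violates the response-time constraint
                key = (s * s_i, new_t)
                cand = rate * r_i  # assumes stage-wise rates multiply
                if cand > new_R.get(key, -1.0):
                    new_R[key] = cand
                    back[key] = ((s, t), s_i)
        R = new_R
        parents.append(back)
    # Keep only end states that leave at most k songs in the ranking list.
    finals = [(rate, key) for key, rate in R.items() if n0 * key[0] <= k]
    if not finals:
        return None
    best_rate, key = max(finals)
    choices = []
    for back in reversed(parents):  # backtrack from the best end state
        key, s_i = back[key]
        choices.append(s_i)
    return best_rate, choices[::-1]
```

In the toy test, a generous time budget lets the design keep more songs after the cheap first stage (s_1 = 1/5), while a tight budget forces a harsher first cut (s_1 = 1/10) so that the expensive second stage fits.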
Five Stages for Our Study • We chose five stages for the DP-based design method: • Range comparison • Modified edit distance • LS • DTW with down-sampled inputs • DTW
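The last two stages both use DTW, differing only in input resolution. A generic DTW sketch on frame-level pitch sequences, assuming the query and song are already in the same key (key-transposition handling is omitted), an anchored beginning with a free end on the song side, and down-sampling by keeping every k-th frame. These are illustrative choices, not necessarily the exact variants used in the study.

```python
import math


def dtw_anchored(query, song):
    """DTW distance between two pitch sequences, anchored at the start of
    both; the query may end anywhere in the song (free end)."""
    m, n = len(query), len(song)
    # D[i][j]: minimum cost of aligning query[:i+1] with song[:j+1].
    D = [[math.inf] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            cost = abs(query[i] - song[j])
            if i == 0 and j == 0:
                D[i][j] = cost  # anchored beginning
                continue
            best = math.inf
            if i > 0:
                best = min(best, D[i - 1][j])
            if j > 0:
                best = min(best, D[i][j - 1])
            if i > 0 and j > 0:
                best = min(best, D[i - 1][j - 1])
            D[i][j] = cost + best
    return min(D[m - 1])  # free end on the song side


def downsample(seq, factor):
    """Keep every factor-th frame: a cheaper, coarser DTW input."""
    return seq[::factor]
```

Down-sampling both sequences by a factor k cuts the DTW table size by roughly k², trading accuracy for speed, which fits an earlier position in the cascade than full-resolution DTW.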
Corpora • QBSH corpus • 2797 8-second recordings (8 kHz, 8-bit) of 48 children's songs by 118 subjects • 500 recordings for the design set, the remaining 2297 for the test set • Song database • 13,320 songs • Comparison mode • Anchored beginning
Conclusions & Future Work • Conclusions • Advantages: • A scalable meta-method • Feasible for optimizing QBSH systems • Possibly applicable to other multimedia retrieval systems • Disadvantages: • Deriving the RS and TS curves is time-consuming • Future work • More effective/efficient methods for each stage