Progressive Filtering and Its Application for Query-by-Singing/Humming

J.-S. Roger Jang (張智星) • Multimedia Information Retrieval Lab, CS Dept., Tsing Hua Univ., Taiwan • http://www.cs.nthu.edu.tw/~jang

Recent Publications • Journals • Jiang-Chun Chen, J.-S. Roger Jang, "TRUES: Tone Recognition Using Extended Segments", ACM Transactions on Asian Language Information Processing, 2008. • J.-S. Roger Jang and Hong-Ru Lee, "A General Framework of Progressive Filtering and Its Application to Query by Singing/Humming", IEEE Transactions on Audio, Speech, and Language Processing, Vol. 16, No. 2, pp. 350-358, Feb. 2008. • Conferences • Liang-Yu Chen, Chun-Jen Lee, Jyh-Shing Roger Jang, "Minimum Phone Error Discriminative Training for Mandarin Chinese Speaker Adaptation", Proceedings of INTERSPEECH 2008, Brisbane, Australia, Sept. 2008. • Chao-Ling Hsu, Jyh-Shing Roger Jang, and Te-Lu Tsai, "Separation of Singing Voice from Music Accompaniment with Unvoiced Sounds Reconstruction for Monaural Recordings", Proceedings of the 125th AES Convention, San Francisco, USA, Oct. 2008. • Zhi-Sheng Chen, Jia-Min Zen, Jyh-Shing Roger Jang, "Music Annotation and Retrieval System Using Anti-Models", Proceedings of the 125th AES Convention, San Francisco, USA, Oct. 2008.

Outline • Problem definition of QBSH • Methods for QBSH • Progressive Filtering • Conclusions

Introduction to QBSH • QBSH: Query by Singing/Humming • Input: Singing or humming from a microphone • Output: A ranked list retrieved from the song database • Overview • First paper: around 1994 • Extensive studies since 2001 • State of the art: QBSH tasks at ISMIR/MIREX

Challenges in QBSH Systems • Reliable pitch tracking for acoustic input • Input from mobile devices • Input at noisy karaoke box • Song database preparation • Audio music vs. MIDIs • Efficient/effective retrieval • Karaoke machine: ~10,000 songs • Internet music search engine: ~500,000,000 songs

Goal and Approach • Goal: To retrieve songs effectively within a given response time, say 5 seconds or so • Our strategy • Multi-stage progressive filtering • Data-driven design methodology based on dynamic programming (DP)

Approaches to QBSH • Pitch Tracking • Methods for QBSH

A Quick Demo of QBSH • Demo page of MIR lab: • http://mirlab.org/mir_main/demo.htm • Demo of QBSH • http://mirlab.org/Demo/MusicSearch/index.htm

Progressive Filtering • Multi-stage representation • Each stage is a method for QBSH • [Diagram: a cascade of stages: stage 1 → stage 2 → … → stage i → …] • s_i: survival rate for stage i • d_i: delay for stage i • n_{i-1}: number of input songs to stage i
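The cascade described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the pitch-vector representation of a song, and the two toy distance functions are all assumptions made for the example. Each stage ranks its n_{i-1} input songs and passes the best ceil(s_i · n_{i-1}) of them to the next stage.

```python
import math
from typing import Callable, List, Sequence, Tuple

Song = Sequence[float]                      # assumed representation: a pitch vector
Distance = Callable[[Song, Song], float]    # smaller means more similar


def progressive_filter(query: Song,
                       songs: List[Song],
                       stages: List[Tuple[Distance, float]]) -> List[int]:
    """Return the indices of the surviving songs, best match first.

    Each stage is (distance function, survival rate s_i in (0, 1]).
    Stage i scores its n_{i-1} input songs and keeps the best
    ceil(s_i * n_{i-1}) of them (at least one).
    """
    survivors = list(range(len(songs)))
    for distance, survival_rate in stages:
        ranked = sorted(survivors, key=lambda k: distance(query, songs[k]))
        keep = max(1, math.ceil(survival_rate * len(ranked)))
        survivors = ranked[:keep]
    return survivors


def range_distance(a: Song, b: Song) -> float:
    """Toy cheap stage: compare the pitch ranges only."""
    return abs((max(a) - min(a)) - (max(b) - min(b)))


def l1_distance(a: Song, b: Song) -> float:
    """Toy costlier stage: point-by-point comparison over the common length."""
    m = min(len(a), len(b))
    return sum(abs(x - y) for x, y in zip(a[:m], b[:m]))
```

For example, `progressive_filter(query, songs, [(range_distance, 0.5), (l1_distance, 0.5)])` first halves the database with the cheap comparison and then halves the survivors with the costlier one.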

Stage Characteristics for Effectiveness • RS curve for stage i: recognition rate = r_i(s) • [Figure: RS curves. x-axis: survival rate s (%), 0 to 100; y-axis: recognition rate (%), 0 to 100; marked points (0, 0) and (100, 100); curves labeled "more effective method", "less effective method", and "random guess"] • Example: a top-10% recognition rate of 65% means r_i(10%) = 65%
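An RS curve can be estimated from a design set by recording, for each query, the rank that the stage assigns to the correct song. The sketch below assumes that setup; the function name and the rank-list input are hypothetical and are not taken from the slides.

```python
import math
from typing import List, Sequence


def rs_curve(true_ranks: Sequence[int], n_songs: int,
             survival_rates: Sequence[float]) -> List[float]:
    """Recognition rate r(s) for each survival rate s.

    true_ranks[q] is the 1-based rank that the stage gave to the correct
    song for query q. A query counts as recognized at survival rate s
    when its correct song is among the top ceil(s * n_songs) candidates.
    """
    rates = []
    for s in survival_rates:
        # The small epsilon guards against floating-point round-up in s * n_songs.
        cutoff = math.ceil(s * n_songs - 1e-9)
        hits = sum(1 for rank in true_ranks if rank <= cutoff)
        rates.append(hits / len(true_ranks))
    return rates
```

A method that ranks the correct song near the top for most queries yields a curve that rises quickly, which is the "more effective" shape in the figure.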

Stage Characteristics for Efficiency • TS curve for stage i: average time = t_i(s) • [Figure: TS curves. x-axis: survival rate s (%), 0 to 100; y-axis: average time (ms); marked points (0, 0) and (100, 0); curves labeled "less efficient method" and "more efficient method"] • Example: when s = 10%, the average one-to-one comparison time is 5 ms

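Formulation as an Optimization Problem • Maximize the overall recognition rate over the stage survival rates s_1, s_2, …, s_m, subject to constraints on the total response time and on the size of the retrieved list, where • n (= n_0): size of the song database • T_max: maximum allowable response time, say, 5 sec • 10: size of the retrieved ranking list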
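The formulas on this slide did not survive extraction. One plausible way to write the problem, using the symbols defined on the earlier slides, is given below. It rests on two assumptions that the slides do not state: that the overall recognition rate is the product of the per-stage rates, and that stage i costs n_{i-1} one-to-one comparisons plus its delay d_i.

```latex
\max_{s_1,\dots,s_m} \; \prod_{i=1}^{m} r_i(s_i)
\quad \text{subject to} \quad
\sum_{i=1}^{m} \bigl( n_{i-1}\, t_i(s_i) + d_i \bigr) \le T_{\max},
\qquad
n \prod_{i=1}^{m} s_i \le 10,
\qquad
n_i = n_{i-1}\, s_i, \quad n_0 = n .
```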

DP-based Approach • The original optimization task can be cast as dynamic programming (DP): • The optimum-value function R_i(s, t) is the optimum recognition rate at stage i, with a cumulative survival rate s and a cumulative computation time t. • A recurrence for R_i(s, t) can be derived by varying the survival rate of stage i.

Recurrence for R_i(s, t) • d_i: delay of stage i • [Diagram: stage 1 → … → stage i-1 → stage i]
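The recurrence itself was lost with the slide image. A hedged reconstruction follows, assuming that stage i receives n · s / s_i songs (the cumulative survival rate before stage i is s / s_i), that its cost is the number of input songs times t_i(s_i) plus the delay d_i, and that recognition rates multiply across stages. Treat it as a sketch consistent with the definitions on the slides, not as the published formula.

```latex
R_i(s, t) \;=\; \max_{s \,\le\, s_i \,\le\, 1} \;
r_i(s_i)\,
R_{i-1}\!\Bigl( \tfrac{s}{s_i},\;
                t - n\,\tfrac{s}{s_i}\, t_i(s_i) - d_i \Bigr)
```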

DP-based Approach • Boundary conditions for R_i(s, t) • Optimum recognition rate, from which we can backtrack to find the optimum s_1, s_2, …, s_m.
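The DP can be sketched as a forward pass over states (songs still alive, elapsed time), keeping the best recognition rate reached for each state. The code below is an illustration under assumptions: each stage offers a small table of (survival rate, recognition rate, comparison time) options sampled from its RS and TS curves, time is rounded up to whole milliseconds, the boundary condition is a rate of 1 before the first stage, and rates multiply across stages. The name `optimize_stages` and its signature are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# One option for a stage: (survival rate s, recognition rate r(s),
# average one-to-one comparison time t(s) in ms).
Option = Tuple[float, float, float]
State = Tuple[int, int]          # (songs still alive, elapsed time in ms)


def optimize_stages(n_songs: int,
                    stages: List[Tuple[List[Option], float]],
                    t_max_ms: int,
                    list_size: int = 10) -> Tuple[float, List[float]]:
    """Forward DP over (cumulative survivors, cumulative time).

    stages[i] = (options, delay_ms). Returns the best overall recognition
    rate and the per-stage survival rates that achieve it, or (0.0, [])
    when no configuration meets both constraints.
    """
    # Assumed boundary condition: before stage 1 every song is alive,
    # no time has been spent, and the target is surely a candidate.
    best: Dict[State, Tuple[float, List[float]]] = {(n_songs, 0): (1.0, [])}
    for options, delay_ms in stages:
        nxt: Dict[State, Tuple[float, List[float]]] = {}
        for (alive, elapsed), (rate, picks) in best.items():
            for s, r, t in options:
                used = elapsed + math.ceil(alive * t + delay_ms)
                if used > t_max_ms:
                    continue                     # violates the time budget
                state = (max(1, math.ceil(s * alive - 1e-9)), used)
                cand = (rate * r, picks + [s])
                if state not in nxt or cand[0] > nxt[state][0]:
                    nxt[state] = cand            # keep the best rate per state
        best = nxt
    # Only states whose survivor count fits the ranking list are feasible.
    feasible = [v for (alive, _), v in best.items() if alive <= list_size]
    return max(feasible, key=lambda v: v[0], default=(0.0, []))
```

Storing the chosen survival rates alongside each state replaces an explicit backtracking pass; with many stages and fine grids, a separate back-pointer table would use less memory.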

Five Stages for Our Study • We chose five stages for the DP-based design method: • Range comparison • Modified edit distance • Linear scaling (LS) • Dynamic time warping (DTW) with down-sampled inputs • DTW
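To make the last two stages concrete, here is a minimal sketch of textbook DTW on pitch vectors, together with a simple down-sampling helper. It is not the variant used in the study: the local path constraints, the absolute-difference cost, the free end on the song side, and the omission of key transposition are all simplifying assumptions.

```python
from typing import List, Sequence


def dtw_distance(query: Sequence[float], song: Sequence[float]) -> float:
    """DTW cost with an anchored beginning and a free end on the song side.

    query[0] is aligned to song[0]; the whole query must be consumed,
    but it may end at any position of the song.
    """
    inf = float("inf")
    m, n = len(query), len(song)
    # cost[i][j]: best cost of aligning query[:i+1] with song[:j+1]
    cost = [[inf] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            d = abs(query[i] - song[j])
            if i == 0 and j == 0:
                cost[i][j] = d
            else:
                prev = min(
                    cost[i - 1][j] if i > 0 else inf,
                    cost[i][j - 1] if j > 0 else inf,
                    cost[i - 1][j - 1] if i > 0 and j > 0 else inf,
                )
                cost[i][j] = d + prev
    return min(cost[m - 1])


def downsample(pitch: Sequence[float], factor: int) -> List[float]:
    """Keep every `factor`-th sample, giving a cheaper input for DTW."""
    return list(pitch[::factor])
```

Running `dtw_distance(downsample(q, 2), downsample(s, 2))` fills roughly a quarter of the cost table of the full comparison, which is why a down-sampled DTW stage can sit before the full DTW stage in the cascade.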

Corpora • QBSH corpus • 2797 8-second recordings (8 kHz, 8 bits) of 48 children's songs by 118 subjects • 500 for the design set, the rest for the test set • Song database • 13320 songs • Comparison mode • Anchored beginning

RS curves

TS Curves

Optimum Recognition Rate vs. Response Time

Survival Rates vs. Response Time

Conclusions & Future Work • Conclusions • Advantages • A scalable meta-method • Feasible for optimizing QBSH systems • Possibly applicable to other multimedia retrieval systems • Disadvantages • Derivation of RS and TS curves is time-consuming • Future work • More effective/efficient methods for each stage
