Retrieval Methods for QBSH (Query By Singing/Humming)
J.-S. Roger Jang (張智星)
jang@mirlab.org
http://mirlab.org/jang
Multimedia Information Retrieval Lab
CSIE Dept, National Taiwan University
Retrieval Methods for QBSH
• Goal
  • Find the most similar melody in the database
• Challenges
  • Robust pitch tracking for various acoustic inputs
    • Input from mobile devices
    • Input at a noisy karaoke box
  • Comparison methods should be able to deal with…
    • Key variations in users’ input (for instance, due to gender difference)
    • Tempo variations in users’ input
  • Reasonable response time, e.g., 5 seconds
Evaluation of QBSH Methods
• Two categories for evaluating QBSH methods
  • Efficiency: How fast is the system?
    • Can it deal with a music database of size 100K?
  • Effectiveness: How accurate is the system?
    • Top-10 recognition rates for n queries:
      • (1+0+0+1+1…)/n
    • Top-10 mean reciprocal rank for n queries:
      • (1/3+1/inf+1/4+1/2+1/5…)/n
    • True positive and true negative to deal with out-of-vocabulary (OOV) problem
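The two effectiveness measures can be sketched as follows. This is a minimal illustration, assuming each query's result is summarized by the 1-based rank of the correct song, with `None` standing in for "not retrieved" (the 1/inf term on the slide). The function names are hypothetical, not part of any toolbox.

```python
def top_n_recognition_rate(ranks, n=10):
    """Fraction of queries whose correct song appears within the top n.

    ranks: 1-based rank of the correct song for each query,
           or None if it was not retrieved (assumed encoding).
    """
    hits = sum(1 for r in ranks if r is not None and r <= n)
    return hits / len(ranks)


def top_n_mean_reciprocal_rank(ranks, n=10):
    """Mean of 1/rank over all queries; a miss contributes 0 (i.e. 1/inf)."""
    total = sum(1.0 / r for r in ranks if r is not None and r <= n)
    return total / len(ranks)


# Ranks matching the reciprocal-rank example on the slide: 3, inf, 4, 2, 5
example_ranks = [3, None, 4, 2, 5]
rate = top_n_recognition_rate(example_ranks)
mrr = top_n_mean_reciprocal_rank(example_ranks)
```

Mean reciprocal rank rewards a correct answer near the top of the list more than one near position 10, whereas the recognition rate treats every top-10 hit the same.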
Types of QBSH Approaches
• Categories of approaches to QBSH
  • Histogram/statistics-based
  • Note vs. note
    • Edit distance
  • Frame vs. note
    • HMM
  • Frame vs. frame
    • Linear scaling, DTW, recursive alignment
Linear Scaling (LS)
• Concept
  • Scale the query linearly to match the candidates
• Assumption
  • Uniform tempo variation
• Rest handling
  • Cut leading and trailing zeros (silence)
  • All the other zeros (rests) are replaced with the previous non-zero pitch
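The rest-handling rule can be sketched in a few lines. This assumes the pitch vector is a plain list in which 0 marks an unvoiced frame, as the slide implies; `handle_rests` is a hypothetical name.

```python
def handle_rests(pitch):
    """Trim leading/trailing zeros, then fill interior zeros
    with the most recent non-zero pitch."""
    voiced = [i for i, p in enumerate(pitch) if p != 0]
    if not voiced:
        return []  # the whole input is silence
    trimmed = pitch[voiced[0]:voiced[-1] + 1]
    filled = []
    for p in trimmed:
        # trimmed[0] is non-zero, so filled[-1] always exists when p == 0
        filled.append(p if p != 0 else filled[-1])
    return filled


cleaned = handle_rests([0, 0, 60, 60, 0, 0, 62, 0, 64, 0])
```

Filling rests this way keeps every frame comparable under a frame-vs-frame distance, at the cost of discarding the rest durations as a separate cue.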
Linear Scaling
• Scale the query pitch linearly to match the candidates
[Figure: the original input pitch, compressed by 0.5 and 0.75 and stretched by 1.25 and 1.5, compared against the target pitch in the database, with the best match marked.]
Strength and Weakness of LS
• Strength
  • One-shot for dealing with key transposition
  • Efficient and effective
  • Indexing methods available
• Weakness
  • Cannot deal with non-uniform tempo variations
• Typical mapping path
Shorten or Lengthen a Pitch Vector
• Given a pitch vector x of length m, how to shorten or lengthen it to length n?
  • x2=interp1(1:m, x, linspace(1, m, n));
• Examples
  • m=7, n=13
  • m=7, n=9
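For readers without MATLAB, the same resampling can be sketched in plain Python. This is a rough equivalent of the `interp1` call under its default linear interpolation; `resample` is a hypothetical helper name.

```python
def resample(x, n):
    """Linearly interpolate x (length m) onto n evenly spaced points
    that span the whole vector, keeping both endpoints."""
    m = len(x)
    if m == 1 or n == 1:
        return [x[0]] * n
    out = []
    for k in range(n):
        t = k * (m - 1) / (n - 1)   # fractional 0-based position in x
        i = min(int(t), m - 2)      # left neighbour, clamped at the end
        frac = t - i
        out.append(x[i] * (1 - frac) + x[i + 1] * frac)
    return out


x = [60, 62, 64, 65, 67, 69, 71]   # m = 7
longer = resample(x, 13)           # the m=7, n=13 case from the slide
slightly_longer = resample(x, 9)   # the m=7, n=9 case from the slide
```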
Distance Function for LS
• Commonly used distance function for LS
  • Normalized Lp-norm
• Characteristics
  • Usually p=1 or 2 for LS
  • Normalization to get rid of length variations
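The formula on this slide did not survive as text. One common reading of a normalized Lp-norm, assumed here, divides the sum of |x_i - y_i|^p by the vector length n before taking the p-th root, so that vectors of different lengths yield comparable scores. The function name is hypothetical.

```python
def normalized_lp_distance(x, y, p=1):
    """(sum(|x_i - y_i|^p) / n) ** (1/p) for two equal-length vectors.

    The exact normalization used in the original slides is an assumption.
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    return (sum(abs(a - b) ** p for a, b in zip(x, y)) / n) ** (1.0 / p)


d1 = normalized_lp_distance([60, 62, 64], [61, 62, 66], p=1)
d2 = normalized_lp_distance([60, 62, 64], [61, 62, 66], p=2)
```

With this normalization, repeating both vectors end to end leaves the distance unchanged, which is the length-invariance the slide asks for.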
Key Transposition in LS
• How to find the best transposed query that has the smallest distance from the database items:
  • Best transposition
  • In practice…
[Figure: query, database item, and transposed query.]
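The slide's formulas are not recoverable from the text, but the underlying fact is standard: the constant shift k that minimizes the sum of |x_i + k - y_i|^p is the median of the differences y_i - x_i for p=1 and their mean for p=2. The sketch below relies on that fact; whether the original slides use exactly this rule is an assumption, and `best_transposition` is a hypothetical name.

```python
import statistics


def best_transposition(query, target, p=1):
    """Shift (in the same units as the pitch values) to add to the query
    so that its Lp distance to the equal-length target is smallest."""
    diffs = [t - q for q, t in zip(query, target)]
    if p == 1:
        return statistics.median(diffs)   # minimizes the sum of |.|
    if p == 2:
        return sum(diffs) / len(diffs)    # minimizes the sum of squares
    raise ValueError("only p=1 and p=2 are handled in this sketch")


query = [48, 50, 52, 50, 48]
target = [60, 62, 64, 62, 70]             # last frame is an outlier
shift_l1 = best_transposition(query, target, p=1)
shift_l2 = best_transposition(query, target, p=2)
transposed = [q + shift_l1 for q in query]
```

The median is less affected by a few badly tracked frames than the mean, which is one reason to pair it with the L1 norm.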
Example of Linear Scaling via L1 Norm • linScaling01.m
Linear Scaling via L1 and L2 Norm • linScaling02.m
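The linear scaling steps described in the preceding slides can be combined into one small sketch. This is not the linScaling01.m or linScaling02.m script, whose contents are not shown here. It assumes the query is matched against the beginning of each candidate, uses the scaling ratios from the figure plus 1.0, and applies a median shift with a length-normalized L1 distance. All names are hypothetical.

```python
import statistics


def resample(x, n):
    """Linear interpolation of x onto n evenly spaced points."""
    m = len(x)
    if m == 1 or n == 1:
        return [x[0]] * n
    out = []
    for k in range(n):
        t = k * (m - 1) / (n - 1)
        i = min(int(t), m - 2)
        out.append(x[i] * (1 - (t - i)) + x[i + 1] * (t - i))
    return out


def linear_scaling(query, target, ratios=(0.5, 0.75, 1.0, 1.25, 1.5)):
    """Return (best_distance, best_ratio) over all scaling ratios.

    For each ratio the query is resampled, shifted by the median
    difference (key transposition), and compared with the start of
    the target using a length-normalized L1 distance.
    """
    best = (float("inf"), None)
    for r in ratios:
        n = round(len(query) * r)
        if n < 2 or n > len(target):
            continue                      # scaled query would not fit
        scaled = resample(query, n)
        segment = target[:n]
        shift = statistics.median(t - s for s, t in zip(scaled, segment))
        dist = sum(abs(s + shift - t) for s, t in zip(scaled, segment)) / n
        if dist < best[0]:
            best = (dist, r)
    return best


# Candidate: a rising line followed by a held note.
target = [60 + i for i in range(12)] + [71] * 8
# Query: the same rising line sung more slowly and 5 semitones lower.
query = [p - 5 for p in resample(target[:12], 16)]
distance, ratio = linear_scaling(query, target)
```

Running this against every candidate in the database and sorting by `distance` gives a ranked list; the loop over ratios is what makes the method tolerant of uniform tempo differences only.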
DTW (Dynamic Time Warping)
• About DTW
  • DTW introduction
  • DTW for QBSH
    • #1 method for task 2 in QBSH/MIREX 2006
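As a point of reference, textbook DTW between two pitch vectors can be sketched as below. The slide does not say which local path constraints, end-point conditions, or key-transposition handling the QBSH variant uses, so this sketch assumes the basic step pattern (insertion, deletion, match), anchors both ends, and does no transposition. `dtw_distance` is a hypothetical name.

```python
def dtw_distance(query, target):
    """Accumulated |pitch difference| along the cheapest warping path
    from (first, first) to (last, last)."""
    inf = float("inf")
    m, n = len(query), len(target)
    # D[i][j]: cheapest cost of aligning query[:i] with target[:j]
    D = [[inf] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(query[i - 1] - target[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # query frame repeated
                                 D[i][j - 1],      # target frame repeated
                                 D[i - 1][j - 1])  # both advance
    return D[m][n]


same_tune_half_speed = dtw_distance([60, 62, 64], [60, 60, 62, 62, 64, 64])
```

Unlike linear scaling, the warping path may stretch some frames and not others, which is what lets DTW absorb non-uniform tempo variation at a cost of O(mn) per comparison.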
RA (Recursive Alignment)
• Characteristics
  • Combine characteristics of LS & DTW
  • #1 method for task 1 in QBSH/MIREX 2006
• A typical mapping path
Modified Edit Distance
• Note segmentation
• Modified edit distance
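The slide gives no detail on how the edit distance is modified, so the sketch below shows only the standard (Levenshtein) edit distance as a baseline for note-vs-note comparison. Representing each melody by the semitone intervals between consecutive notes, which makes the comparison independent of key, is an assumption made for illustration. Both function names are hypothetical.

```python
def to_intervals(notes):
    """Semitone differences between consecutive note pitches."""
    return [b - a for a, b in zip(notes, notes[1:])]


def edit_distance(a, b):
    """Standard Levenshtein distance with unit costs for insertion,
    deletion, and substitution (not the modified variant)."""
    prev = list(range(len(b) + 1))        # distance from empty prefix of a
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


sung = to_intervals([55, 57, 59, 60, 62])     # query after note segmentation
stored = to_intervals([60, 62, 64, 65, 67])   # same tune, different key
d = edit_distance(sung, stored)
```

A note-level comparison is much cheaper than a frame-level one because sequences are far shorter, but it depends on note segmentation being reliable.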