AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing • Michael A. Casey • Digital Musics, Dartmouth College, Hanover, NH • ASA 156: Statistical Approaches for Analysis of Music and Speech Audio Signals
Scalable Similarity • 8M tracks in commercial collection • PByte of multimedia data • Require passage-level retrieval (~ 2 bars) • Require scalable nearest-neighbor methods
Specificity • Partial track retrieval • Alternate versions: remix, cover, live, album • Task is mid-high specificity
Example: remixing • Original Track • Remix 1 • Remix 2 • Remix 3
Audio Shingles • Shingles provide contextual information about features • Originally used for Internet search engines: Andrei Z. Broder, Steven C. Glassman, Mark S. Manasse, Geoffrey Zweig, “Syntactic Clustering of the Web”, Computer Networks 29(8-13): 1157-1166 (1997) • Related to N-grams: overlapping sequences of features • Applied to the audio domain by Casey and Slaney: Casey, M., Slaney, M., “The Importance of Sequences in Musical Similarity”, in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2006 • A shingle is defined as the concatenation of l frames of m-dimensional features
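The shingle definition above can be sketched in a few lines of Python. This is a minimal illustration, not audioDB's implementation: the function name `make_shingles` and the `hop` parameter are assumptions introduced here, and the toy input stands in for real per-frame features.

```python
from typing import List, Sequence


def make_shingles(frames: Sequence[Sequence[float]], l: int, hop: int = 1) -> List[List[float]]:
    """Concatenate each run of l consecutive m-dimensional frames into one l*m vector.

    `hop` (an assumption, not from the slides) is the step between shingle starts;
    hop=1 gives maximally overlapping shingles.
    """
    if l <= 0 or hop <= 0:
        raise ValueError("l and hop must be positive")
    shingles = []
    for start in range(0, len(frames) - l + 1, hop):
        shingle: List[float] = []
        for frame in frames[start:start + l]:
            shingle.extend(frame)
        shingles.append(shingle)
    return shingles


# Toy input: 5 frames of m = 2 features each (real features would have m = 12 or 20).
frames = [[float(t), float(10 * t)] for t in range(5)]
shingles = make_shingles(frames, l=3)
```

With l = 3 and m = 2, each shingle has M = l·m = 6 dimensions, and consecutive shingles share l - 1 frames.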
Audio Shingle Similarity • Shingles have M dimensions (M = l·m; m = 12, 20; l = 30, 40) • A query shingle is drawn from a query track {Q} • The database of audio tracks is indexed by (n) • A database shingle is drawn from track n • Shingles are normalized to unit vectors, so the squared Euclidean distance between two shingles is determined by their inner product
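For reference, the standard identity for unit vectors can be written as follows. The symbols q (a query shingle) and x (a database shingle) are labels chosen here for illustration; they are not taken from the slides.

```latex
\[
\|q - x\|^2 \;=\; \|q\|^2 + \|x\|^2 - 2\, q \cdot x \;=\; 2 - 2\, q \cdot x,
\qquad \|q\| = \|x\| = 1 .
\]
```

Under this identity, squared distances between normalized shingles lie in [0, 4], and ranking by distance is equivalent to ranking by inner product.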
AudioDB: Shingle Nearest Neighbor Search • Open source: google: “audioDB” • Management of tracks, sequences, salience • Automatic indexing parameters • OMRAS2, Yahoo!, AWAL, CHARM, more… • Web-services interface (SOAP / JSON) • Implementation of LSH for large N ~ 1B • 1-10 ms whole-track retrieval from 1B vectors
Whole-track similarity • Often want to know which tracks are similar • Similarity depends on specificity of task • Distortion / filtering / re-encoding (high) • Remix with new audio material (mid) • Cover song: same song, different artist (mid)
Whole-track resemblance: radius-bounded search • Compute the number of shingle collisions between two tracks • Requires a threshold for considering shingles to be related • Need a way to estimate relatedness (threshold) for the data set
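A brute-force version of the collision count can be sketched as below. It assumes one plausible reading of the slide: a collision is a (query shingle, track shingle) pair whose squared Euclidean distance is at most the threshold. The names `normalize`, `sq_dist`, `count_collisions`, and `x_thresh` as a squared-distance radius are assumptions made for this example.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [c / norm for c in v]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def count_collisions(query_shingles, track_shingles, x_thresh: float) -> int:
    """Count shingle pairs within the radius (assumed here: squared distance <= x_thresh)."""
    return sum(
        1
        for q in query_shingles
        for x in track_shingles
        if sq_dist(q, x) <= x_thresh
    )


# Toy 2-dimensional "shingles"; real shingles would have M = l*m dimensions.
query = [normalize([1.0, 0.0]), normalize([0.0, 1.0])]
track = [normalize([1.0, 0.01]), normalize([-1.0, 0.0])]
tight = count_collisions(query, track, x_thresh=0.1)
loose = count_collisions(query, track, x_thresh=2.5)
```

The toy example shows why the threshold matters: a tight radius counts only the near-identical pair, while a loose radius also counts unrelated pairs. The exhaustive double loop is quadratic in the number of shingles and is only meant to make the definition concrete.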
Distribution of minimum distances • Database: 1.4 million shingles • The left bump is the distribution of minimum distances between 1000 randomly selected query shingles and this database • The right bump is a small sample (1/98,000,000) of the full histogram of all distances
Radius-bounded retrieval performance: cover song (opus task) • Performance depends critically on xthresh, the collision threshold • Want to estimate xthresh automatically from unlabelled data
Order Statistics • Minimum-value distribution is analytic • Estimate the distribution parameters • Substitute into minimum value distribution • Define a threshold in terms of FP rate • This gives an estimate of xthresh
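The steps above can be written out using the standard order-statistics result for a minimum, assuming the distances compared against one query are independent draws from the background distribution. The symbols F (background CDF of squared distances), K (number of database shingles compared), and p (target false positive rate) are labels chosen here, not notation from the slides.

```latex
\[
F_{\min}(x) \;=\; 1 - \bigl(1 - F(x)\bigr)^{K},
\qquad
F_{\min}(x_{\mathrm{thresh}}) = p
\;\Longrightarrow\;
F(x_{\mathrm{thresh}}) \;=\; 1 - (1 - p)^{1/K}.
\]
```

For large K the right-hand side is approximately p/K, so the threshold sits far in the lower tail of the background distribution.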
Estimating xthresh from unlabelled data • Use theoretical statistics • Null Hypothesis: • H0: shingles are drawn from unrelated tracks • Assume elements i.i.d., normally distributed • M dimensional shingles, d effective degrees of freedom: • Squared distance distribution for H0
ML for background distribution • Likelihood for N data points (distances squared) • d = effective degrees of freedom • M = shingle dimensionality
Background distribution parameters • Likelihood for N data points (distances squared) • d = effective degrees of freedom • M = shingle dimensionality
Estimate of xthresh for a given false positive rate
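The whole estimation chain can be sketched end to end under stated assumptions. The background squared distances are modelled as a scaled chi-squared variable with d effective degrees of freedom, following the i.i.d. normal assumption above. For brevity the parameters are fitted by the method of moments rather than the maximum-likelihood fit the slides describe, and the data are synthetic. All function and parameter names are hypothetical.

```python
import math
import random


def chi2_scaled_cdf(x: float, d: float, s: float) -> float:
    """CDF of s * chi-squared(d) at x, using the power series for the
    regularized lower incomplete gamma function P(d/2, x/(2s)).
    Intended for x up to about the mean s*d, where the series converges quickly."""
    if x <= 0.0:
        return 0.0
    a, z = d / 2.0, x / (2.0 * s)
    term, total, k = 1.0, 1.0, 0
    while term > 1e-16 * total and k < 100000:
        k += 1
        term *= z / (a + k)
        total += term
    log_p = a * math.log(z) - z - math.lgamma(a + 1.0) + math.log(total)
    return min(1.0, math.exp(log_p))


def fit_background(sq_dists):
    """Method-of-moments fit (swapped in for ML): mean = s*d, variance = 2*s^2*d."""
    n = len(sq_dists)
    mean = sum(sq_dists) / n
    var = sum((v - mean) ** 2 for v in sq_dists) / (n - 1)
    return 2.0 * mean * mean / var, var / (2.0 * mean)  # (d, s)


def estimate_x_thresh(d: float, s: float, n_compared: int, fp_rate: float) -> float:
    """Solve 1 - (1 - F(x))**n_compared = fp_rate for x by bisection.
    Assumes fp_rate < 0.5 so the root lies between 0 and the background mean."""
    target = -math.expm1(math.log1p(-fp_rate) / n_compared)
    lo, hi = 0.0, s * d
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_scaled_cdf(mid, d, s) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Synthetic background: scaled chi-squared draws standing in for real distances.
random.seed(0)
true_d, true_s = 8, 0.05
background = [
    true_s * sum(random.gauss(0.0, 1.0) ** 2 for _ in range(true_d))
    for _ in range(5000)
]
d_hat, s_hat = fit_background(background)
x_thresh = estimate_x_thresh(d_hat, s_hat, n_compared=1000, fp_rate=0.01)
```

The design choice worth noting is that no labelled data is needed: only a sample of background distances and the number of shingles each query is compared against.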
Unlabelled data experiment • Unlabelled data set • Known to contain: • cover songs (same work, different performer) • Near duplicate recordings (misattribution, encoding) • Estimate background distance distribution • Estimate minimum value distribution • Set xthresh so FP rate is <= 1% • Whole-track retrieval based on shingle collisions
Scaling • Locality sensitive hashing • Trade-off approximate NN for time complexity • 3 to 4 orders of magnitude speed-up • No noticeable degradation in performance for the optimal radius threshold
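To make the trade-off concrete, here is a minimal sketch of one common LSH family for Euclidean distance (quantized random projections). Whether audioDB uses exactly this scheme is not stated in the slides; the class name `RandomProjectionLSH` and its parameters `n_tables`, `n_hashes`, and `width` are assumptions for illustration.

```python
import math
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class RandomProjectionLSH:
    """Several hash tables; each key concatenates quantized random projections."""

    def __init__(self, dim, n_tables=8, n_hashes=4, width=1.0, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.points = []
        self.tables = []
        for _ in range(n_tables):
            funcs = [
                ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
                for _ in range(n_hashes)
            ]
            self.tables.append((funcs, defaultdict(list)))

    def _key(self, funcs, v):
        # h(v) = floor((a . v + b) / width) for each (a, b) in this table
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.width)
            for a, b in funcs
        )

    def add(self, v):
        idx = len(self.points)
        self.points.append(list(v))
        for funcs, buckets in self.tables:
            buckets[self._key(funcs, v)].append(idx)
        return idx

    def query(self, q, radius_sq):
        """Return indices of stored points that share a bucket with q in some table
        and lie within the squared radius. Near points may be missed (approximate)."""
        candidates = set()
        for funcs, buckets in self.tables:
            candidates.update(buckets.get(self._key(funcs, q), ()))
        return sorted(i for i in candidates if sq_dist(self.points[i], q) <= radius_sq)


index = RandomProjectionLSH(dim=3, width=0.5)
near = index.add([1.0, 0.0, 0.0])
far = index.add([-1.0, 0.0, 0.0])
hits = index.query([1.0, 0.0, 0.0], radius_sq=0.01)
```

Only points that share a bucket with the query are checked exactly, which is where the speed-up comes from. The final distance check means results never include points outside the radius, while some points inside it can be missed. The bucket width has to be matched to the search radius, which is one reason an automatic radius estimate is useful.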
Current deployment • Large commercial collections • AWAL ~ 100,000 tracks • Yahoo! 2M+ tracks, related song classifier • AudioDB: open-source, international consortium of developers • Google: “audioDB”
Conclusions • Radius-bounded retrieval model for tracks • Shingles preserve temporal information, high d • Implements mid-to-high specificity search • Optimal radius threshold from order statistics • Null hypothesis: shingles are drawn from unrelated tracks • LSH requires a radius bound, which is estimated automatically • Scales to 1B+ shingles using LSH
Thanks • Malcolm Slaney, Yahoo! Research Inc. • Christophe Rhodes, Goldsmiths, U. of London • Michela Magas, Goldsmiths, U. of London • Funding: EPSRC: EP/E02274X/1