The Problem with Music: Modeling Distance Distributions of Large Music Collections. Prof. Michael Casey, Program in Digital Musics, Dartmouth College, Hanover, NH. Comp. Sci. Colloquium
a.k.a. The Problem with Multimedia: • Music • Music Videos • Videos • Images
Scalable Similarity • 8M tracks in commercial collection • 6B Images on WWW • Require scalable nearest-neighbor methods • Increase scale, decrease search complexity
Example: Remixing / Sampling in Yahoo! Music • Original Track • Remix 1 • Remix 2 • Remix 3
Specificity • Partial document (sub-track) retrieval • Alternate versions: remix, cover, live, album • Task is mid-high specificity
Audio Shingles • Shingles provide contextual information about features • Originally used for Internet search engines: Andrei Z. Broder, Steven C. Glassman, Mark S. Manasse, Geoffrey Zweig, “Syntactic Clustering of the Web”, Computer Networks 29(8-13): 1157-1166 (1997) • Related to N-grams, overlapping sequences of features • Applied to the audio domain by Casey and Slaney: Casey, M., Slaney, M., “The Importance of Sequences in Musical Similarity”, in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2006 • A shingle is defined by concatenating l frames of m-dimensional features
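The concatenation step can be sketched in a few lines of Python. This is only an illustration of the definition on the slide: the function name `make_shingles` and the hop of one frame between consecutive shingles are assumptions, not details taken from the talk.

```python
from typing import List, Sequence


def make_shingles(frames: Sequence[Sequence[float]], l: int) -> List[List[float]]:
    """Concatenate each run of l consecutive frames into one shingle vector."""
    if l <= 0:
        raise ValueError("l must be positive")
    shingles = []
    # Assumed hop of one frame: consecutive shingles overlap by l - 1 frames.
    for start in range(len(frames) - l + 1):
        shingle: List[float] = []
        for frame in frames[start:start + l]:
            shingle.extend(frame)
        shingles.append(shingle)
    return shingles


# Toy input: 4 frames of m = 2 dimensional features, shingled with l = 3,
# so each shingle has M = l * m = 6 dimensions.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
shingles = make_shingles(frames, l=3)
```

With the values quoted in the talk (m = 12 or 20, l = 30 or 40) the same function would produce vectors of several hundred dimensions.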
Audio Shingle Similarity • Shingles have M dimensions (M = l·m; m = 12, 20; l = 30, 40) • A query shingle is drawn from a query track {Q} • The database of audio tracks is indexed by n • A database shingle is drawn from track n • Shingles are normalized to unit vectors, so the squared Euclidean distance between two shingles depends only on their inner product
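The effect of unit normalization can be checked numerically. For any two unit vectors q and x, the squared Euclidean distance equals 2 - 2(q·x); the sketch below verifies this on a toy pair. The helper names and the example vectors are illustrative and do not come from the slides.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy vectors standing in for a query shingle and a database shingle.
q = normalize([3.0, 4.0, 0.0])
x = normalize([0.0, 4.0, 3.0])

d2 = squared_distance(q, x)       # computed directly
identity = 2.0 - 2.0 * dot(q, x)  # computed from the inner product
```

Both values come out as 0.72 here, and in general the squared distance between unit vectors lies between 0 and 4.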
Whole-track similarity • Often want to know which tracks are similar • Similarity depends on specificity of task • Distortion / filtering / re-encoding (high) • Remix with new audio material (mid) • Cover song: same song, different artist (mid)
Whole-track resemblance: radius-bounded search • Compute the number of shingle collisions between two tracks • Requires a threshold for considering shingles to be related • Need a way to estimate relatedness (threshold) for data set
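A brute-force version of the collision count might look like the following. The slide's own formula did not survive extraction, so this sketch makes two assumptions: a collision is any (query shingle, track shingle) pair whose squared Euclidean distance is at most the threshold, and the threshold `x_thresh` is expressed as a squared distance.

```python
from typing import Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def count_collisions(
    query_shingles: Sequence[Sequence[float]],
    track_shingles: Sequence[Sequence[float]],
    x_thresh: float,
) -> int:
    """Count shingle pairs whose squared distance is within x_thresh (assumed definition)."""
    collisions = 0
    for q in query_shingles:
        for x in track_shingles:
            if squared_distance(q, x) <= x_thresh:
                collisions += 1
    return collisions


# Toy unit-length shingles in two dimensions.
query = [[1.0, 0.0], [0.0, 1.0]]
track = [[1.0, 0.0], [0.6, 0.8], [-1.0, 0.0]]
n_collisions = count_collisions(query, track, x_thresh=0.5)
```

Ranking database tracks by this count gives a whole-track resemblance score, but the nested loop is quadratic in the number of shingles, which is the scaling problem the rest of the talk addresses.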
SCALE • Mazurkas: 10,000 tracks • 10-100 ms features • 3 s clips (30 – 300 frames per vector) • 12d – 20d features (360 – 600d vectors) • Yahoo! Music • 6M tracks • 1000 vectors per track • (6M x 1k)^2 search for near neighbours
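The arithmetic behind these figures is worth spelling out. The variable names below are illustrative. The vector sizes assume 30 frames per shingle, which is the case that reproduces the 360 to 600 dimensions quoted on the slide.

```python
# Shingle dimensionality: frames per vector times feature dimensions.
frames_per_vector = 30
dims_low = frames_per_vector * 12   # 360
dims_high = frames_per_vector * 20  # 600

# Size of an exhaustive all-pairs search over the Yahoo! Music figures.
tracks = 6_000_000
vectors_per_track = 1_000
total_vectors = tracks * vectors_per_track  # 6 x 10^9 vectors
pairwise_comparisons = total_vectors ** 2   # 3.6 x 10^19 distance computations
```

At roughly 3.6 x 10^19 distance computations, exhaustive pairwise search is out of reach, which motivates the approximate methods that follow.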
Approximate near neighbors • In many applications we need only near neighbors • We can exploit this by allowing a degree of approximation in retrieval
Curse of dimensionality • [Figure: Pr(dist) plotted against dist. for d=4, d=8 and d=1024]
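The concentration of distances in high dimensions can be reproduced with a small simulation. The sketch below assumes points drawn uniformly from the unit cube, which is a choice made here for illustration and not necessarily the data behind the original figure. It measures the spread of pairwise distances relative to their mean.

```python
import math
import random
import statistics


def relative_spread(d: int, n_pairs: int = 400, seed: int = 0) -> float:
    """Standard deviation of pairwise distances divided by their mean, in d dimensions."""
    rng = random.Random(seed)
    dists = []
    for _ in range(n_pairs):
        # Assumed data model: both points uniform in the d-dimensional unit cube.
        p = [rng.random() for _ in range(d)]
        q = [rng.random() for _ in range(d)]
        dists.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q))))
    return statistics.pstdev(dists) / statistics.mean(dists)


# Same dimensionalities as the figure.
spreads = {d: relative_spread(d) for d in (4, 8, 1024)}
```

The relative spread shrinks as d grows, so in high dimensions nearly all points look about equally far from a query and a distance threshold has to be placed with care.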
Hashing • Types of hashes • String : put Bash vs Bush in different bins • Locality sensitive : close matches in same bin • High-dimensional and probabilistic • Nearest Neighbor implementations • Pair-wise distance computation • 1,000,000,000,000 comparisons in 2M song database • Hash bucket collisions • 1,000,000,000 hash projections
Exact matching via hashing • Audio fingerprinting • Shazam, etc. • Make the feature robust • Use exact matching on integer hash • Find a sequence of hashes to identify specific recording or image • Drawback: only exact matches possible
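The idea of exact matching on an integer hash can be illustrated with a toy index. This is a deliberately simplified sketch and not how any commercial fingerprinting system works: coarse quantization stands in for the robust feature, the step size of 0.5 is arbitrary, and the lookup votes per frame instead of checking the order of the hash sequence.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional, Sequence, Set, Tuple

Key = Tuple[int, ...]


def quantize(frame: Sequence[float], step: float = 0.5) -> Key:
    """Map a feature frame to an integer key; nearby frames usually share a key."""
    return tuple(int(round(x / step)) for x in frame)


def build_index(tracks: Dict[str, Sequence[Sequence[float]]]) -> Dict[Key, Set[str]]:
    """Map each integer key to the set of track ids that contain it."""
    index: Dict[Key, Set[str]] = defaultdict(set)
    for track_id, frames in tracks.items():
        for frame in frames:
            index[quantize(frame)].add(track_id)
    return index


def identify(index: Dict[Key, Set[str]], query_frames: Sequence[Sequence[float]]) -> Optional[str]:
    """Return the track with the most exact key matches, or None if nothing matches."""
    votes: Counter = Counter()
    for frame in query_frames:
        for track_id in index.get(quantize(frame), ()):
            votes[track_id] += 1
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical two-track database and a slightly distorted query from track "a".
tracks = {"a": [[0.1, 1.0], [2.0, 3.1]], "b": [[5.0, 5.0], [6.0, 7.0]]}
index = build_index(tracks)
match = identify(index, [[0.12, 1.02], [2.05, 3.08]])
no_match = identify(index, [[9.0, 9.0]])
```

The drawback named on the slide is visible here: a query frame that quantizes to a different key, however close it is, contributes nothing to the match.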
Locality-Sensitive Hashing (Indyk-Motwani’98) • Hash functions are locality-sensitive if, for a random hash function h and any pair of points p, q, we have: • Pr[h(p)=h(q)] is “high” if p is “close” to q • Pr[h(p)=h(q)] is “low” if p is “far” from q
Random Projections • Random projections estimate distance • Multiple projections improve estimate
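One way to see both bullets at work is to project the difference of two points onto random Gaussian directions. Each squared projection is an unbiased estimate of the squared distance, and averaging over more projections tightens the estimate. The function below is a sketch under that assumption (standard normal projection vectors); the slides do not say which projection scheme was used.

```python
import random
from typing import Sequence


def estimate_squared_distance(
    p: Sequence[float], q: Sequence[float], n_projections: int, seed: int = 0
) -> float:
    """Average of squared projections of (p - q) onto random Gaussian directions."""
    rng = random.Random(seed)
    diff = [a - b for a, b in zip(p, q)]
    total = 0.0
    for _ in range(n_projections):
        # Assumed scheme: each direction has i.i.d. standard normal entries.
        r = [rng.gauss(0.0, 1.0) for _ in diff]
        projection = sum(ri * di for ri, di in zip(r, diff))
        total += projection * projection
    return total / n_projections


p = [1.0, 0.0, 2.0, -1.0]
q = [0.0, 1.0, 0.0, 1.0]
true_d2 = sum((a - b) ** 2 for a, b in zip(p, q))  # 1 + 1 + 4 + 4 = 10
few = estimate_squared_distance(p, q, n_projections=5)
many = estimate_squared_distance(p, q, n_projections=5000)
```

With only a handful of projections the estimate is noisy; with thousands it settles close to the true value of 10.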
h’s are locality-sensitive • Pr[h(p)=h(q)] = (1 - D(p,q)/d)^k • We can vary the probability by changing k • [Figure: Pr plotted against distance for k=1 and k=2]
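The collision probability can be checked empirically with a bit-sampling hash. The sketch assumes binary vectors, with D(p,q) read as Hamming distance and d as the number of dimensions. That is the usual setting for this formula, although the slide does not state it. Each hash reads k coordinates chosen uniformly at random with replacement.

```python
import random
from typing import Callable, Sequence, Tuple


def collision_probability(D: int, d: int, k: int) -> float:
    """Theoretical Pr[h(p) = h(q)] for k sampled coordinates."""
    return (1 - D / d) ** k


def make_hash(d: int, k: int, rng: random.Random) -> Callable[[Sequence[int]], Tuple[int, ...]]:
    """Bit-sampling hash: read k coordinates picked uniformly with replacement."""
    coords = [rng.randrange(d) for _ in range(k)]
    return lambda v: tuple(v[i] for i in coords)


def empirical_collision_rate(
    p: Sequence[int], q: Sequence[int], k: int, trials: int = 20000, seed: int = 0
) -> float:
    """Fraction of random hashes under which p and q land in the same bucket."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        h = make_hash(len(p), k, rng)
        if h(p) == h(q):
            hits += 1
    return hits / trials


# Toy binary vectors with d = 8 that differ in two positions, so D = 2.
p = [0, 1, 1, 0, 1, 0, 0, 1]
q = [0, 1, 0, 0, 1, 0, 1, 1]
theory_k1 = collision_probability(2, 8, 1)  # 0.75
theory_k2 = collision_probability(2, 8, 2)  # 0.5625
observed_k1 = empirical_collision_rate(p, q, k=1)
observed_k2 = empirical_collision_rate(p, q, k=2)
```

Raising k lowers the collision probability for every pair, but it falls much faster for distant pairs than for close ones, which is what makes the buckets selective.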
Distribution of minimum distances • Database: 1.4 million shingles. The left bump is the distribution of minimum distances between 1000 randomly selected query shingles and this database. The right bump is a small sample (1/98 000 000) of the full histogram of all distances.
Radius-bounded retrieval performance: cover song (opus task) • Performance depends critically on xthresh, the collision threshold • Want to estimate xthresh automatically from unlabelled data
Order Statistics • Minimum-value distribution is analytic • Estimate the distribution parameters • Substitute into minimum value distribution • Define a threshold in terms of FP rate • This gives an estimate of xthresh
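The steps above can be sketched generically. If the background distances are i.i.d. with cumulative distribution F, the minimum of N of them has distribution 1 - (1 - F(x))^N, and a threshold follows by solving for the x at which that probability equals the target false-positive rate. The code assumes this reading of "FP rate" and finds the threshold by bisection. The uniform background used in the example is purely illustrative and is not the distribution fitted in the talk.

```python
import math
from typing import Callable


def min_value_cdf(background_cdf_at_x: float, n: int) -> float:
    """P(minimum of n i.i.d. draws <= x) = 1 - (1 - F(x))^n, computed stably."""
    if background_cdf_at_x >= 1.0:
        return 1.0
    return -math.expm1(n * math.log1p(-background_cdf_at_x))


def threshold_for_fp_rate(
    background_cdf: Callable[[float], float],
    n: int,
    fp_rate: float,
    lo: float,
    hi: float,
    iters: int = 200,
) -> float:
    """Bisect for the x at which the minimum-value CDF reaches fp_rate.

    Assumes background_cdf is nondecreasing on [lo, hi].
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if min_value_cdf(background_cdf(mid), n) > fp_rate:
            hi = mid
        else:
            lo = mid
    return lo


def uniform_cdf(x: float) -> float:
    """Illustrative background only: distances uniform on [0, 1]."""
    return min(max(x, 0.0), 1.0)


x_thresh = threshold_for_fp_rate(uniform_cdf, n=1000, fp_rate=0.01, lo=0.0, hi=1.0)
```

Swapping `uniform_cdf` for a distribution fitted to the data would give a threshold for a real collection. The larger the database, the smaller the threshold has to be to hold the same false-positive rate.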
Estimating xthresh from unlabelled data • Use theoretical statistics • Null Hypothesis: • H0: shingles are drawn from unrelated tracks • Assume elements i.i.d., normally distributed • M dimensional shingles, d effective degrees of freedom: • Squared distance distribution for H0
Maximum likelihood (ML) for background distribution • Likelihood for N data points (distances squared) • d = effective degrees of freedom • M = shingle dimensionality
Background distribution parameters • Likelihood for N data points (distances squared) • d = effective degrees of freedom • M = shingle dimensionality
Estimate of xthresh for a given false positive rate
Unlabelled data experiment • Unlabelled data set • Known to contain: • cover songs (same work, different performer) • near-duplicate recordings (misattribution, encoding) • Estimate background distance distribution • Estimate minimum value distribution • Set xthresh so FP rate is <= 1% • Whole-track retrieval based on shingle collisions
Misattributions • Joyce Hatto: 100% of known misattributions in first rank • Sergio Fiorentino • Eleven out of twenty-six Mazurkas performances on another Concert Artists/Fidelio disc, issued under the name of Sergio Fiorentino, are in fact copies of recordings by other artists. This is the first time that such practices have been found in the Concert Artists/Fidelio recordings issued other than under the name of Joyce Hatto, and prompts speculation as to how much more misattributed material remains to be found in the Concert Artists/Fidelio catalogue.