Timbre Similarity Work by Aucouturier & Pachet Rebecca Fiebrink MUMT 611 3 March 2005
Presentation Overview • Pachet & Aucouturier; why timbre similarity? • Basic approach to quantifying timbre and timbre similarity • “Finding songs that sound the same,” 2002 • The CUIDADO project • P & A’s work in context • Practical and theoretical improvements, 2004 • Remaining problems and future work
Who are they? • Sony Computer Science Laboratory (CSL), Paris • François Pachet: Music access and interaction, “interestingness” • Jean-Julien Aucouturier: PhD student • A host of papers on music browsing, genre, metadata, segmentation, …
Why timbre similarity? Electronic Music Distribution (EMD) systems: • Move from mass-market to individualized distribution • Collaborative filtering isn’t sufficient • High-level, perceptually relevant descriptors play complementary / competing role; allow for more interesting music browsing • Makes more sense than “melodic similarity” • Tied to genre, but not too tightly
How to quantify timbre? • High-level descriptor for an entire song or piece • Mel Frequency Cepstral Coefficients (MFCCs) are building blocks • Related to spectral envelope • First few coefficients capture the spectral envelope (timbre); higher-order ones capture finer spectral detail such as pitch • Derive a compact representation of a piece’s MFCC “space” and a way to compare representations for two pieces
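As a reminder of what an MFCC is, here is one common textbook convention; the slides do not say which variant A & P use, so the constants and symbols below (K mel bands, band energies E_k, coefficient index n) are assumptions for illustration only.

```latex
% Mel scale (one widely used approximation), with f in Hz:
m(f) = 2595 \, \log_{10}\!\left(1 + \frac{f}{700}\right)

% Cepstral coefficients from the log energies E_k of K mel-spaced bands:
c_n = \sum_{k=1}^{K} \log(E_k)\,
      \cos\!\left[\frac{\pi n}{K}\left(k - \tfrac{1}{2}\right)\right],
\qquad n = 0, 1, \dots, N-1
```

Under this convention c_0 follows the overall log energy of the frame, and small n describe the broad shape of the spectrum, which is why the low-order coefficients are the ones used for timbre.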
A & P’s implementation (2002) • Find first 8 MFCCs every 50 ms • Model song as mixture of 3 Gaussian densities over all possible MFCCs of length 8 (GMM = “Gaussian mixture model”) • Calculate “distance” between GMMs by sampling • Sample from one GMM, compute likelihood of the samples given the other GMM • Force symmetry and normalize • Use 1000 samples • Store GMM information for each song and calculate similarity matrix
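The sampling-based distance above can be sketched as follows. This is a minimal illustration, not A & P's code: it assumes diagonal covariances, uses a 2-dimensional toy GMM instead of 8 MFCCs, and picks one plausible way to "force symmetry and normalize" (self-likelihoods minus cross-likelihoods, divided by the sample count). The names `log_likelihood`, `sample`, and `timbre_distance` are hypothetical.

```python
import math
import random

# A GMM is a list of (weight, means, variances) tuples with diagonal
# covariance. This representation is an assumption made for the sketch.

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def log_likelihood(gmm, x):
    """Log density of the mixture at x, using log-sum-exp for stability."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def sample(gmm, n, rng):
    """Draw n points: pick a component by weight, then sample from it."""
    weights = [w for w, _, _ in gmm]
    points = []
    for _ in range(n):
        _, mean, var = rng.choices(gmm, weights=weights)[0]
        points.append([rng.gauss(m, math.sqrt(v)) for m, v in zip(mean, var)])
    return points

def timbre_distance(a, b, n=1000, seed=0):
    """Symmetrised, per-sample distance between two GMMs by sampling."""
    rng = random.Random(seed)
    sa, sb = sample(a, n, rng), sample(b, n, rng)

    def total(gmm, points):
        return sum(log_likelihood(gmm, p) for p in points)

    # Self-likelihoods minus cross-likelihoods; near 0 for identical models.
    return (total(a, sa) + total(b, sb) - total(b, sa) - total(a, sb)) / n

# Toy models with made-up parameters (not fitted to any real song).
song_a = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [2.0, 2.0], [1.0, 1.0])]
song_b = [(0.5, [6.0, 6.0], [1.0, 1.0]), (0.5, [8.0, 8.0], [1.0, 1.0])]

same = timbre_distance(song_a, song_a)
different = timbre_distance(song_a, song_b)
```

Because the value depends on random samples, swapping the arguments gives a slightly different number for the same pair; increasing `n` shrinks that gap at the cost of more computation.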
Results of 2002 version • Same artist • Harpsichord pieces: Bach - Wohltemperierte Clavier Fuga II in C minor and Bach – Wohltemperierte Clavier - Praeludium IV in C sharp minor • Trip Hop: Portishead - Mysterons (live) and Portishead - Sour Times • Different artists, same genre • Harpsichord pieces: Bach - Das Wohltemperierte Clavier - Praeludium IV in C sharp minor BWV849 and Couperin – Gavotte • “Woman Rock Singer”: Leah Andreone - It's OK and Meredith Brooks – Bitch • “Interesting” results • “Classical” and “Pop”: Beethoven - Romanze für Violine und Orchester Nr. 2 F-dur op.50 and Beatles - Eleanor Rigby • “Trip Hop” and “Celtic Folk”: Portishead - Mysterons and Alan Stivell - Arvor You (same kind of harpy, theremin-like ambiance)
Evaluating results • No ground truth exists • Similarity is subjective • People don’t hear timbre alone • Survey of 10 people: Is A more like B or C? • Algorithm matches people 80% of time • One view: Divergence from expectation makes it useful
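The "Is A more like B or C?" comparison reduces to counting how often the measure picks the same answer as the listener. A minimal sketch, assuming a hypothetical nested-dict distance table and made-up toy answers (none of these numbers come from the survey):

```python
def agreement_rate(triplets, dist):
    """Fraction of triplets where the measure agrees with the listener.

    Each triplet is (a, b, c, human), where human is "b" or "c": the song
    the listener judged closer to a.
    """
    hits = 0
    for a, b, c, human in triplets:
        machine = "b" if dist[a][b] <= dist[a][c] else "c"
        hits += machine == human
    return hits / len(triplets)

# Toy distances and answers, invented for illustration.
dist = {"A": {"B": 0.2, "C": 0.9}, "D": {"E": 0.7, "F": 0.3}}
triplets = [("A", "B", "C", "b"), ("D", "E", "F", "b")]

rate = agreement_rate(triplets, dist)
```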
Generating “aha!” • Produce interesting matches: when genre and timbre are not correlated • Allow user control over size of “Aha!” exploration
Using the measure: CUIDADO • Content-based Unified Interfaces and Descriptors for Audio and Music Databases available Online • 2001-2003 European research project • “aims at developing a new chain of applications through the use of audio/music content descriptors, in the spirit of the MPEG-7 standard” • design of appropriate description structures • development of extractors for deriving high-level information from audio signals • design and implementation of two applications: the Sound Palette and the Music Browser (From the CUIDADO website)
CUIDADO Music Browser • Client/server architecture for music browsing • Target audience: casual music lover • 17,075 popular music titles with metadata Picture from “The CUIDADO project”
Music Browser Query Panel Picture from “Popular music access”
Using Timbre in the Music Browser • Nearest-neighbor search • “Find me something that sounds like this song” • Allow user control over size of exploration: “Aha” slider: Same artist … Same genre … “interesting” • Playlist generation • Example: 1- Timbre continuity throughout the sequence 2- Genre Cardinality: 30% Rock, 30% Folk, 30% Pop 3- Genre Distribution: the titles of the same genre should be as separated as possible
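A nearest-neighbor query with an adjustable scope could look like the sketch below. The three `scope` values stand in for positions on the "Aha" slider; that mapping, the function name `similar_songs`, the placeholder song ids, and every distance value are assumptions for illustration, not the Music Browser's actual interface.

```python
def similar_songs(query, dist, meta, scope="any", k=3):
    """Return up to k songs closest in timbre to query.

    scope="artist" keeps only the same artist, scope="genre" only the same
    genre, and scope="any" allows cross-genre ("interesting") matches.
    """
    def allowed(song):
        if scope == "artist":
            return meta[song]["artist"] == meta[query]["artist"]
        if scope == "genre":
            return meta[song]["genre"] == meta[query]["genre"]
        return True

    candidates = [s for s in dist[query] if s != query and allowed(s)]
    return sorted(candidates, key=lambda s: dist[query][s])[:k]

# Placeholder catalogue with made-up timbre distances from song "q".
meta = {
    "q": {"artist": "X", "genre": "trip hop"},
    "s1": {"artist": "X", "genre": "trip hop"},
    "s2": {"artist": "Y", "genre": "trip hop"},
    "s3": {"artist": "Z", "genre": "celtic folk"},
}
dist = {"q": {"s1": 0.4, "s2": 0.3, "s3": 0.1}}

narrow = similar_songs("q", dist, meta, scope="artist")
medium = similar_songs("q", dist, meta, scope="genre")
wide = similar_songs("q", dist, meta, scope="any")
```

In this toy data the closest match overall is the cross-genre song, which only surfaces at the widest setting.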
Sample playlist • Arlo Guthrie – City Of New Orleans (Folk/Rock) • Belle & Sebastian – The Boy Done Wrong Again (Rock/Alternative) • Ben Harper – Pleasure & Pain (Pop/Blues) • Joni Mitchell – Borderline (Folk/Pop) • Badly Drawn Boy – Camping Next to Water (Rock/Alternative) • Rolling Stones – You Can’t Always Get What You Want (Pop/Blues) • Nick Drake - One of These Things First (Folk/Pop) • Radiohead - Motion Picture Soundtrack (Rock/Brit) • The Beatles - Mother Nature's Son (Pop/Brit) • Tracy Chapman - Talkin' about a Revolution (Rock/Folk)
Work in Context • Several other researchers also use MFCCs with reasonable results: Baumann 2003, Berenzweig et al. 2002, Foote 1997, Kulesh 2003, Logan and Salomon 2001, … • Pampalk, Dixon, and Widmer 2003 • P & A’s work is relatively accurate • Implementation is relatively slow • Incorporating use of 1st MFCC integrates average dynamic level into results • Hard to compare one group’s work with another’s • Hard to propose future research directions beyond parameter tweaking
Practical & Theoretical Improvements, 2004 • A & P conducted extensive tests varying algorithms and parameters of 2002 system • Can optimal parameter settings be found? • What is the limit on improvement? • Evaluate in the context of CUIDADO Music Browser
Optimal parameter values • Signal sample rate: higher is better • Distance sample rate (number of samples used to compare GMMs): higher is better, but little improvement beyond 1000 • Sampling can perform as well as Earth Mover’s distance (here “EMD” means Earth Mover’s distance, not Electronic Music Distribution) • The number of MFCCs and the number of components in the GMM jointly affect the outcome: • 50 components and 20 MFCCs is optimal • # components can be reduced without hurting performance much • 30 ms is the optimum window size • Adhering to the above guidelines leads to an absolute improvement of 16% in precision • Precision is underestimated: only same-genre matches count as correct
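To see why counting only same-genre matches underestimates precision, here is a minimal sketch of a precision-at-k score with genre as the sole ground truth. The exact metric used in the 2004 study is not given on the slide, so this formulation, the name `precision_at_k`, and the toy distances are assumptions.

```python
def precision_at_k(dist, genres, k):
    """Mean fraction of each song's k nearest neighbours sharing its genre."""
    scores = []
    for query in dist:
        others = (s for s in dist[query] if s != query)
        ranked = sorted(others, key=lambda s: dist[query][s])[:k]
        scores.append(sum(genres[s] == genres[query] for s in ranked) / k)
    return sum(scores) / len(scores)

# Toy symmetric distance table built from made-up pairwise values.
pairs = {
    ("a", "b"): 0.1, ("a", "c"): 0.5, ("a", "d"): 0.9,
    ("b", "c"): 0.2, ("b", "d"): 0.8, ("c", "d"): 0.3,
}
dist = {s: {} for s in "abcd"}
for (x, y), d in pairs.items():
    dist[x][y] = d
    dist[y][x] = d

genres = {"a": "rock", "b": "rock", "c": "folk", "d": "folk"}

score = precision_at_k(dist, genres, k=1)
```

Here song "c" is closest to "b", a rock title. The score counts that as an error even if a listener would accept the match, which is the sense in which precision is a conservative figure.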
Alternative algorithms • Several speech-processing algorithms were tried • Mixed results • No drastic improvements: 2% additional precision at most • HMM instead of GMM offers no improvement
Conclusions of 2004 study • “Ceiling” of 65% precision (conservative estimate) • False positives remain a problem • Jimi Hendrix != Joni Mitchell • Due to “hubs” in nearest-neighbor space • Problems are inherent in approach itself?
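One way to look for "hubs" is to count how often each song appears in other songs' nearest-neighbor lists; a song that shows up far more often than the rest is a likely source of false positives. This counting approach is a common hubness measure rather than something the slide specifies, and the function name and toy distances are assumptions.

```python
from collections import Counter

def k_occurrences(dist, k):
    """Count how many k-nearest-neighbour lists each song appears in."""
    counts = Counter({song: 0 for song in dist})
    for query in dist:
        others = (s for s in dist[query] if s != query)
        ranked = sorted(others, key=lambda s: dist[query][s])[:k]
        counts.update(ranked)
    return counts

# Toy symmetric distance table built from made-up pairwise values.
pairs = {
    ("a", "b"): 0.1, ("a", "c"): 0.5, ("a", "d"): 0.9,
    ("b", "c"): 0.2, ("b", "d"): 0.8, ("c", "d"): 0.3,
}
dist = {s: {} for s in "abcd"}
for (x, y), d in pairs.items():
    dist[x][y] = d
    dist[y][x] = d

counts = k_occurrences(dist, k=1)
hub, hub_count = counts.most_common(1)[0]
```

With k=1 the total count always equals the number of songs, so any song with a count well above 1 is taking neighbor slots away from others.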
Proposals for future work • Address perception of timbre • Some frames are more important than others • Some timbres more salient than others • People assess similarity by choosing “This sounds like X” or “This doesn’t sound like X”
Conclusions • High-level, perceptually based similarity has a place in electronic music distribution • Current systems for timbre similarity have some use • There is still room for new, innovative, and cross-disciplinary work