
Timbre Similarity Work by Aucouturier & Pachet

Rebecca Fiebrink, MUMT 611, 3 March 2005




  1. Timbre Similarity Work by Aucouturier & Pachet • Rebecca Fiebrink • MUMT 611 • 3 March 2005

  2. Presentation Overview • Pachet & Aucouturier; why timbre similarity? • Basic approach to quantifying timbre and timbre similarity • “Finding songs that sound the same,” 2002 • The CUIDADO project • P & A’s work in context • Practical and theoretical improvements, 2004 • Remaining problems and future work

  3. Who are they? • Sony Computer Science Laboratory (CSL), Paris • François Pachet: Music access and interaction, “interestingness” • Jean-Julien Aucouturier: PhD student • A host of papers on music browsing, genre, metadata, segmentation, …

  4. Why timbre similarity? Electronic Music Distribution (EMD) systems: • Move from mass-market to individualized distribution • Collaborative filtering isn’t sufficient • High-level, perceptually relevant descriptors play complementary / competing role; allow for more interesting music browsing • Makes more sense than “melodic similarity” • Tied to genre, but not too tightly

  5. How to quantify timbre? • High-level descriptor for an entire song or piece • Mel Frequency Cepstral Coefficients (MFCCs) are building blocks • Related to spectral envelope • First few coefficients describe the broad spectral envelope (timbre); later ones capture finer detail such as pitch • Derive a compact representation of a piece’s MFCC “space” and a way to compare representations for two pieces
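As a rough illustration of how cepstral coefficients relate to the spectral envelope, the sketch below applies a DCT-II to the log energies of one frame's mel bands, which is the last step of a typical MFCC computation. The function name `cepstral_coefficients` and the eight band energies are made up for illustration; windowing, the FFT, and the mel filterbank are omitted.

```python
import math


def cepstral_coefficients(log_band_energies, n_coeffs):
    """Return the first n_coeffs DCT-II coefficients of the log band energies."""
    n_bands = len(log_band_energies)
    return [
        sum(
            energy * math.cos(math.pi * n * (k + 0.5) / n_bands)
            for k, energy in enumerate(log_band_energies)
        )
        for n in range(n_coeffs)
    ]


# Hypothetical log energies for 8 mel bands of one frame: a smooth downward tilt.
frame = [4.0, 3.5, 3.0, 2.5, 2.0, 1.5, 1.0, 0.5]
mfcc = cepstral_coefficients(frame, 4)

# A perfectly flat spectrum, for comparison.
flat = cepstral_coefficients([2.0] * 8, 4)
```

For the flat spectrum every coefficient except the zeroth is zero. The zeroth coefficient tracks overall level, and the low-order ones respond to the coarse tilt of the envelope.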

  6. A & P’s implementation (2002)
     • Find first 8 MFCCs every 50 ms
     • Model song as mixture of 3 Gaussian densities over the space of 8-dimensional MFCC vectors (GMM = “Gaussian mixture model”)
     • Calculate “distance” between GMMs by sampling
       • Sample from one GMM, compute likelihood of the samples given the other GMM
       • Force symmetry and normalize
       • Use 1000 samples
     • Store GMM information for each song and calculate similarity matrix
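The sampling-based comparison can be sketched in a few lines. This is a minimal illustration, not A & P's code: it assumes diagonal covariances, uses 2-dimensional toy models instead of 8 MFCCs, and symmetrizes by adding both cross-likelihood terms and subtracting them from the self-likelihoods, then dividing by the sample count. The names `sample`, `log_likelihood`, `gmm_distance`, and the three toy "songs" are hypothetical.

```python
import math
import random

# A GMM is a list of (weight, means, variances) components.
# Diagonal covariance is a simplifying assumption of this sketch.


def sample(gmm, n, rng):
    """Draw n points from the mixture."""
    weights = [w for w, _, _ in gmm]
    points = []
    for _ in range(n):
        _, means, variances = rng.choices(gmm, weights=weights)[0]
        points.append([rng.gauss(m, math.sqrt(v)) for m, v in zip(means, variances)])
    return points


def log_likelihood(gmm, points):
    """Sum over points of log p(point | gmm)."""
    total = 0.0
    for x in points:
        comp_logs = []
        for w, means, variances in gmm:
            lp = math.log(w)
            for xi, m, v in zip(x, means, variances):
                lp += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            comp_logs.append(lp)
        top = max(comp_logs)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(c - top) for c in comp_logs))
    return total


def gmm_distance(a, b, n_samples=1000, seed=0):
    """Symmetrized, per-sample-normalized Monte Carlo distance between two GMMs."""
    rng = random.Random(seed)
    sa, sb = sample(a, n_samples, rng), sample(b, n_samples, rng)
    return (
        log_likelihood(a, sa) + log_likelihood(b, sb)
        - log_likelihood(b, sa) - log_likelihood(a, sb)
    ) / n_samples


# Hypothetical toy models standing in for three songs.
song_a = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [3.0, 3.0], [1.0, 1.0])]
song_b = [(0.5, [0.5, 0.0], [1.0, 1.0]), (0.5, [3.0, 3.5], [1.0, 1.0])]
song_c = [(1.0, [10.0, -5.0], [2.0, 2.0])]

d_ab = gmm_distance(song_a, song_b)
d_ac = gmm_distance(song_a, song_c)
```

Because the estimate depends on random draws, it is symmetric only in expectation; fixing the seed keeps the stored similarity matrix reproducible.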

  7. Results of 2002 version
     • Same artist
       • Harpsichord pieces: Bach - Wohltemperierte Clavier Fuga II in C minor and Bach - Wohltemperierte Clavier - Praeludium IV in C sharp minor
       • Trip Hop: Portishead - Mysterons (live) and Portishead - Sour Times
     • Different artists, same genre
       • Harpsichord pieces: Bach - Das Wohltemperierte Clavier - Praeludium IV in C sharp minor BWV849 and Couperin - Gavotte
       • “Woman Rock Singer”: Leah Andreone - It's OK and Meredith Brooks - Bitch
     • “Interesting” results
       • “Classical” and “Pop”: Beethoven - Romanze fur Violine und Orchester Nr. 2 F-dur op.50 and Beatles - Eleanor Rigby
       • “Trip Hop” and “Celtic Folk”: Portishead - Mysterons and Alan Stivell - Arvor You (same kind of harpy theremin-like ambiance)

  8. Evaluating results • No ground truth exists • Similarity is subjective • People don’t hear timbre alone • Survey of 10 people: Is A more like B or C? • Algorithm matches people 80% of the time • One view: Divergence from expectation makes it useful
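The "Is A more like B or C?" comparison can be scored as a simple agreement rate. The sketch below is only an illustration of that bookkeeping: the function name `triplet_agreement`, the song labels, the distances, and the listener answers are all invented and are not the survey data.

```python
def triplet_agreement(dist, judgments):
    """Fraction of (a, b, c, choice) judgments the distance measure agrees with.

    choice is whichever of b or c the listeners judged closer to a.
    """
    hits = 0
    for a, b, c, choice in judgments:
        predicted = b if dist[a][b] < dist[a][c] else c
        hits += predicted == choice
    return hits / len(judgments)


# Invented timbre distances between songs (smaller means more similar).
dist = {
    "s1": {"s2": 0.2, "s3": 0.9, "s4": 0.5},
    "s2": {"s1": 0.2, "s3": 0.7, "s4": 0.4},
}

# Invented listener answers: (a, b, c, song judged closer to a).
judgments = [
    ("s1", "s2", "s3", "s2"),
    ("s1", "s3", "s4", "s4"),
    ("s2", "s3", "s4", "s3"),
    ("s2", "s1", "s3", "s1"),
    ("s1", "s2", "s4", "s2"),
]

agreement = triplet_agreement(dist, judgments)
```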

  9. Generating “aha!” • Produce interesting matches: when genre and timbre are not correlated • Allow user control over size of “Aha!” exploration

  10. Using the measure: CUIDADO • Content-based Unified Interfaces and Descriptors for Audio and Music Databases available Online • 2001-2003 European research project • “aims at developing a new chain of applications through the use of audio/music content descriptors, in the spirit of the MPEG-7 standard” • design of appropriate description structures • development of extractors for deriving high-level information from audio signals • design and implementation of two applications: the Sound Palette and the Music Browser (From the CUIDADO website)

  11. CUIDADO Music Browser • Client/server architecture for music browsing • Target audience: casual music lover • 17,075 popular music titles with metadata (Picture from “The CUIDADO project”)

  12. Music Browser Query Panel (Picture from “Popular music access”)

  13. Using Timbre in the Music Browser
      • Nearest-neighbor search
        • “Find me something that sounds like this song”
        • Allow user control over size of exploration with the “Aha” slider: Same artist … Same genre … “interesting”
      • Playlist generation
        • Example:
          1- Timbre continuity throughout the sequence
          2- Genre Cardinality: 30% Rock, 30% Folk, 30% Pop
          3- Genre Distribution: the titles of the same genre should be as separated as possible
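One way to picture the nearest-neighbor query with a user-controlled exploration size is a filter applied before ranking by timbre distance. The sketch below is an assumption about how such a slider could work, not the Music Browser's implementation: the three `aha` levels, the function name `similar_songs`, and the catalogue entries and distances are all hypothetical.

```python
def similar_songs(query, songs, dist, k=3, aha=0):
    """Return the k nearest songs to query by timbre distance.

    aha=0: no filtering (same-artist matches allowed)
    aha=1: skip songs by the query's artist
    aha=2: also skip songs in the query's genre
    """
    q = songs[query]
    candidates = []
    for title, meta in songs.items():
        if title == query:
            continue
        if aha >= 1 and meta["artist"] == q["artist"]:
            continue
        if aha >= 2 and meta["genre"] == q["genre"]:
            continue
        candidates.append((dist[query][title], title))
    return [title for _, title in sorted(candidates)[:k]]


# Hypothetical catalogue and distances from the query song "q".
songs = {
    "q": {"artist": "X", "genre": "trip hop"},
    "x2": {"artist": "X", "genre": "trip hop"},
    "y1": {"artist": "Y", "genre": "trip hop"},
    "z1": {"artist": "Z", "genre": "celtic folk"},
    "w1": {"artist": "W", "genre": "classical"},
}
dist = {"q": {"x2": 0.1, "y1": 0.3, "z1": 0.4, "w1": 0.9}}

safe = similar_songs("q", songs, dist, k=2, aha=0)
other_artists = similar_songs("q", songs, dist, k=2, aha=1)
surprising = similar_songs("q", songs, dist, k=2, aha=2)
```

Raising `aha` pushes the results away from the obvious same-artist and same-genre matches toward cross-genre ones.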

  14. Sample playlist
      • Arlo Guthrie – City Of New Orleans (Folk/Rock)
      • Belle & Sebastian – The boy done wrong again (Rock/Alternative)
      • Ben Harper – Pleasure & Pain (Pop/Blues)
      • Joni Mitchell – Borderline (Folk/Pop)
      • Badly Drawn Boy – Camping Next to Water (Rock/Alternative)
      • Rolling Stones – You Can’t always get what you want (Pop/Blues)
      • Nick Drake – One of these things first (Folk/Pop)
      • Radiohead – Motion Picture Soundtrack (Rock/Brit)
      • The Beatles – Mother Nature's Son (Pop/Brit)
      • Tracy Chapman – Talkin' about a Revolution (Rock/Folk)

  15. Work in Context
      • Several other researchers also use MFCCs with reasonable results: Baumann 2003, Berenzweig et al. 2002, Foote 1997, Kulesh 2003, Logan and Salomon 2001, …
      • Pampalk, Dixon, and Widmer 2003:
        • P & A’s work is relatively accurate
        • Implementation is relatively slow
        • Incorporating use of 1st MFCC integrates average dynamic level into results
      • Hard to compare one group’s work with another’s
      • Hard to propose future research directions beyond parameter tweaking

  16. Practical & Theoretical Improvements, 2004 • A & P conducted extensive tests varying algorithms and parameters of 2002 system • Can optimal parameter settings be found? • What is the limit on improvement? • Evaluate in the context of CUIDADO Music Browser

  17. Optimal parameter values
      • Signal sample rate: higher is better
      • Distance sample rate (used to compare GMMs): higher is better, but little improvement over 1000
        • Sampling can perform as well as Earth Mover’s distance (EMD)
      • The number of MFCCs and the number of components in the GMM jointly affect the outcome:
        • 50 components and 20 MFCCs is optimal
        • # components can be reduced without hurting performance much
      • 30 ms is optimum window size
      • Adhering to above guidelines leads to an absolute improvement of 16% in precision
        • Precision is underestimated: only same-genre matches count as correct
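To see why a genre-based precision score underestimates quality, consider a plain precision-at-k computed with genre labels as the only ground truth. This is a hedged sketch: the 2004 study's exact measure may differ, and the function name `genre_precision`, the four songs, their genres, and the distances are invented.

```python
def genre_precision(dist, genres, k):
    """Mean fraction of each song's k nearest neighbors that share its genre."""
    scores = []
    for query, row in dist.items():
        nearest = sorted((d, other) for other, d in row.items() if other != query)[:k]
        same = sum(genres[other] == genres[query] for _, other in nearest)
        scores.append(same / k)
    return sum(scores) / len(scores)


# Invented symmetric timbre distances between four songs.
dist = {
    "a": {"b": 0.1, "c": 0.5, "d": 0.2},
    "b": {"a": 0.1, "c": 0.6, "d": 0.7},
    "c": {"a": 0.5, "b": 0.6, "d": 0.3},
    "d": {"a": 0.2, "b": 0.7, "c": 0.3},
}
genres = {"a": "rock", "b": "rock", "c": "jazz", "d": "jazz"}

precision = genre_precision(dist, genres, k=1)
```

Song `d` is closest to `a`, which has a different genre label, so that match counts as an error even if the two really do sound alike.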

  18. Alternative algorithms • Several speech-processing algorithms were tried • Mixed results • No drastic improvements: 2% additional precision at most • HMM instead of GMM offers no improvement

  19. Conclusions of 2004 study • “Ceiling” of 65% precision (conservative estimate) • False positives remain a problem • Jimi Hendrix != Joni Mitchell • Due to “hubs” in nearest-neighbor space • Problems are inherent in approach itself?
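A "hub" in this sense is a song that shows up as a near neighbor of many other songs. One simple way to spot candidates is to count how often each song appears in the other songs' k-nearest-neighbor lists. The sketch below assumes that definition; the function name `neighbor_occurrences` and the distances are hypothetical.

```python
from collections import Counter


def neighbor_occurrences(dist, k):
    """Count how often each song appears in other songs' k-nearest-neighbor lists."""
    counts = Counter({song: 0 for song in dist})
    for query, row in dist.items():
        nearest = sorted((d, other) for other, d in row.items() if other != query)[:k]
        counts.update(other for _, other in nearest)
    return counts


# Invented distances in which "h" sits close to every other song.
dist = {
    "h": {"a": 0.2, "b": 0.3, "c": 0.4},
    "a": {"h": 0.2, "b": 0.8, "c": 0.9},
    "b": {"h": 0.3, "a": 0.8, "c": 0.7},
    "c": {"h": 0.4, "a": 0.9, "b": 0.7},
}

counts = neighbor_occurrences(dist, k=1)
```

Here `h` is the nearest neighbor of all three other songs, so any query returns it regardless of how it actually sounds, which is the kind of false positive the slide describes.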

  20. Proposals for future work • Address perception of timbre • Some frames are more important than others • Some timbres more salient than others • People assess similarity by choosing “This sounds like X” or “This doesn’t sound like X”

  21. Conclusions • High-level, perceptually based similarity has a place in electronic music distribution • Current systems for timbre similarity have some use • There is still room for new, innovative, and cross-disciplinary work
