Convenient MIR systems: vision vs. reality check, research & e-commerce
Stephan Baumann
Agenda
• Personal Profile
• Convenient Music Information Retrieval
  • Multi-modal queries
  • Identification by description
• Multi-facet music similarity
  • Timbre
  • Lyrics
  • Cultural aspects
• Project MPEER: P2P, semantic web and MIR
Research Diary (1991-2003)
• 1991/92 optical music recognition
• 1992/93 online handwriting recognition
• 1993/94 optical music recognition
• 1995/97 document analysis and understanding
• 1996 first look at multimedia IR (S. Pfeiffer)
• 1998/99 spin-off activities with Insiders GmbH
• 2000 freelancing/research for a draft MIR system
• 2001 co-founding Sonicson GmbH
• 2001/03 subjective music similarity (Ph.D., Sep 03)
Desiderata MIR [Huron]
• 1. Access to all of the world's music
• 2. Access via an indexing method
• 3. Fair use (reimbursement to all contributors)
• 4. Open system
• 5. Self-correcting system
• 6. Assurance of privacy and cultural practices
Related Work
• Audio
  • [Blum, Wold], [Pfeiffer], [Foote], [Logan], ...
  • [Scheirer], [Tzanetakis], [Welsh], [Aucouturier], [Peeters], ...
• Cultural
  • [Whitman], [Pachet], [Ellis, Berenzweig]
• Multi-modal MIR
  • [Bainbridge], ...
• Recommendation
  • [Amazon, Moodlogic, MusicGenome, MuBu, MongoMusic], ...
  • [Uitdenbogerd]
• User Models
  • [Chai, Vercoe], [Rolland]
• Music Psychology
  • [Bruhn, Rösing], [Gabrielsson, Västfjäll], ...
• Usability, Convenience
  • [Shneiderman], [Nielsen], ...
Convenience
• Using natural language as input for queries of non-musicians
• Accessing meta data, symbolic and audio layers in one interface
• Evaluation of usability (e.g. eye-tracking + user interviews)
• Acquisition of audio features, symbolic features, meta data and lyrics
• Machine communication by using shared music ontologies (MPEG-7, RDF/S, DAML-S)
Prototype
• Bilingual matching of phonetic ambiguities and misspellings
• Treatment of refinements and negations
• Extraction of musical concepts from natural language queries
• Automatic generation of SQL queries on demand
• Recognition of intention
• Intention-based result presentation
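A minimal sketch of how musical concepts and negations in a natural language query could be turned into a parameterized SQL query. Everything here is an assumption for illustration: the toy vocabularies, the `songs` table with `genre` and `tempo` columns, and the function name `query_to_sql` are hypothetical and are not taken from the prototype.

```python
# Toy vocabularies (hypothetical); a real system would use a music ontology.
GENRES = {"pop", "rock", "soul", "dance"}
TEMPI = {"slow", "fast"}
NEGATIONS = {"not", "no", "without"}


def query_to_sql(text):
    """Map a free-text query to (sql, params) for a hypothetical `songs` table."""
    clauses, params = [], []
    negate = False
    for token in text.lower().split():
        if token in NEGATIONS:
            negate = True  # applies to the next recognized concept
            continue
        if token in GENRES:
            column = "genre"
        elif token in TEMPI:
            column = "tempo"
        else:
            continue  # words outside the toy vocabulary are ignored
        operator = "<>" if negate else "="
        clauses.append(f"{column} {operator} ?")
        params.append(token)
        negate = False
    where = " AND ".join(clauses) if clauses else "1=1"
    return f"SELECT title, artist FROM songs WHERE {where}", params


sql, params = query_to_sql("fast rock but not pop")
print(sql)
print(params)
```

Placeholders (`?`) keep user input out of the SQL string itself, which matters once free text from a web form drives query generation.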
Software Development Lifecycle
• System design philosophy: Google-style
• 1. Collection of user requirements V1
  • Offline
  • 20 Germans, different user segments
• 2. Setup of prototype V1
  • Online refinement of requirements V1 -> introduction of PhoneticMatch
• 3. Collection of user requirements V2
  • Online with prototype V1
  • 100 American native speakers, internet-aware users
• 4. Setup of prototype V2
  • Bilingual phonetic match
  • NLP frontend
  • Audio-based music similarity
• 5. Scaling of phonetic match component for commercial website
www.musicline.de: Convenience's no. 1 hit
• status que -> Status Quo
• golgen earing -> Golden Earring
• Fisher Set -> Fischer Z
• Novospaski Chor -> Novo Spassky Chor
• four none blondes -> 4 Non Blondes
• Matchbox twenty -> Matchbox 20
Statistics:
• 540,000 queries/month
• 400,000 queries for artists/month
• 80,000 fuzzy queries for artists/month
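A minimal sketch of tolerant artist lookup for misspelled queries like those above. It uses character-level similarity from Python's `difflib`, which is a stand-in and not the phonetic matching used on the site; purely phonetic cases such as "Fisher Set" would need a real phonetic component. The catalog and the function name `fuzzy_artist` are assumptions for illustration.

```python
from difflib import get_close_matches

# Tiny illustrative catalog; names taken from the examples on the slide.
CATALOG = [
    "Status Quo",
    "Golden Earring",
    "Fischer Z",
    "Novo Spassky Chor",
    "4 Non Blondes",
    "Matchbox 20",
]


def fuzzy_artist(query, catalog=CATALOG, cutoff=0.6):
    """Return the catalog name closest to `query`, or None if nothing is close."""
    by_key = {name.lower(): name for name in catalog}
    hits = get_close_matches(query.lower(), list(by_key), n=1, cutoff=cutoff)
    return by_key[hits[0]] if hits else None


for q in ("status que", "golgen earing", "four none blondes"):
    print(q, "->", fuzzy_artist(q))
```

The `cutoff` trades recall against false matches; the value 0.6 is simply the `difflib` default and would need tuning on real query logs.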
Multi-facet Music Similarity
• Audio: MFCCs
• Lyrics: TFIDF
• Cultural:
  • Web crawling
  • POS (part of speech)
  • TFIDF
Song Similarity: Audio-based Perception
• Feature Extraction
  • Input segment [30..60] sec
  • 30 ms Hanning windows, log spectrum, Mel scale, inverse Fourier transform
  • 1000 vectors using the first 13 MFCCs
• Representation
  • Intra-song clustering -> song signature [Logan]
  • (Gaussian Mixture Models [Aucouturier])
• Similarity Measure
  • Euclidean distance [Foote]
  • Kullback-Leibler distance [Logan, Aucouturier]
    • (Approximate solutions: sampling [Ellis, Aucouturier])
  • DistMinMean [Ellis]
  • Earth Mover's Distance (EMD) [Logan]
• Different Features & Similarity Measures
  • [Welsh] Tonal histograms, tonal transition, volume, tempo, noise -> Euclidean distance
  • [Rauber & Frühwirth] Psychoacoustic features -> hierarchical SOM
  • [Pfeiffer] A review of MP3-native features
  • ...
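A minimal sketch of a Kullback-Leibler based song distance, under a strong simplifying assumption: each song is summarized by a single Gaussian with diagonal covariance over its MFCC frames, instead of the cluster signatures or Gaussian mixture models named above. In that simplified case the divergence has a closed form, so no sampling is needed. The MFCC frames are assumed to be already extracted; the function names are hypothetical.

```python
import math
from statistics import fmean, pvariance


def fit_diag_gaussian(frames, floor=1e-6):
    """Fit per-dimension mean and variance to a list of feature vectors."""
    dims = list(zip(*frames))  # transpose: one tuple per feature dimension
    means = [fmean(d) for d in dims]
    variances = [max(pvariance(d), floor) for d in dims]  # avoid zero variance
    return means, variances


def kl_diag(p, q):
    """KL(p || q) for two diagonal Gaussians given as (means, variances)."""
    (mp, vp), (mq, vq) = p, q
    return 0.5 * sum(
        math.log(b / a) + (a + (m - n) ** 2) / b - 1.0
        for m, a, n, b in zip(mp, vp, mq, vq)
    )


def symmetric_kl(p, q):
    """Symmetrized divergence, usable as a (non-metric) song distance."""
    return kl_diag(p, q) + kl_diag(q, p)


# Toy 2-dimensional "MFCC" frames for two songs.
song_a = fit_diag_gaussian([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
song_b = fit_diag_gaussian([[1.0, 1.0], [3.0, 2.0], [5.0, 6.0]])
print(symmetric_kl(song_a, song_a), symmetric_kl(song_a, song_b))
```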
Perception of Similar Timbre in Songs: Evaluation?!
• Audio database: 700 MP3s of mainstream music at full length, 40 artists, 70 different genres
• Evaluation: no ground truth (GT) available! Only anecdotal evidence or genre/artist/volume GT
Lyrics: Vector Space Model (TFIDF)
• Representation of a collection of lyrics
  • # of terms k:
  • Song j:
  • Occurrence of term h in collection d(h):
  • Weight of term j in song i:
• Similarity metric
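A minimal sketch of the vector space model for lyrics, assuming the common weighting "term frequency times log(N / document frequency)" and cosine similarity as the metric. The exact weighting variant used in the talk is not given here, so this is one plausible instance and not the author's formula. The toy lyrics and function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(lyrics):
    """Map {song: text} to {song: {term: tf-idf weight}}."""
    counts = {song: Counter(text.lower().split()) for song, text in lyrics.items()}
    n_songs = len(counts)
    # Document frequency: in how many songs each term occurs.
    doc_freq = Counter(term for c in counts.values() for term in c)
    return {
        song: {t: tf * math.log(n_songs / doc_freq[t]) for t, tf in c.items()}
        for song, c in counts.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


vectors = tfidf_vectors({
    "a": "dance tonight dance",
    "b": "come out tonight",
    "c": "father and son",
})
print(cosine(vectors["a"], vectors["b"]), cosine(vectors["a"], vectors["c"]))
```

Songs that share no terms get a similarity of exactly zero, which matches the "Zero Hits" behavior one would expect for lyrics in a language not otherwise present in the collection.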
Song Similarity: Lyrics (1)
Reference Song 112: Lucy pearl - Dance tonight.txt
• Most-relevant terms: toast spend tonight dance money
• 1. Similar Song: Lucy Pearl - you (feat. snoop dogg and Q-tipp).txt
• 2. Similar Song: Phil Collins - Please Come Out Tonight.txt
• 3. Similar Song: Madonna - Into the groove
Reference Song 56: Das Kind Vor Dem Euch.txt - die fantastischen vier
• Most-relevant terms: wollten euch sehn entsetzt selben
• 1. Similar Song: Die fantastischen Vier - Auf Der Flucht.txt
• 2. Similar Song: Freundeskreis - Mit Dir.txt
• 3. Similar Song: Die fantastischen Vier - Populär
Reference Song 145: madonna - Paradise.txt
• Most-relevant terms: remains pas encore fois moi
• Zero hits
Song Similarity: Lyrics (2)
Reference Song 193: Phil Collins - One More Night.txt
• Most-relevant terms: forever wait night cos ooh
• 1. Similar Lyrics: Phil Collins - YOU CAN'T HURRY LOVE.txt
• 2. Similar Lyrics: Phil Collins - Inside Out.txt
• 3. Similar Lyrics: Phil Collins - This must be Love.txt
Reference Song 297: Cat Stevens - Father And Son.txt
• Most-relevant terms: fault decision marry son settle
• 1. Similar Lyrics: Phil Collins - We're Sons Of Our Fathers.txt
• 2. Similar Lyrics: Sheryl Crow - No One Said It Would Be Easy.txt
• 3. Similar Lyrics: George Michael - Father Figure.txt
Visual Evaluation: Similarity (Cosine)
[Figure: cosine similarity shown across the genres HEAVYMETAL, ROCK, POP, SOUL, DANCE; scale from low to high]
[Diagram labels: similarity, ground truth and evaluation]
• Similarity: subjective, context-dependent, "personal taste"
• Ground truth: AMG = experts, P2P = collaborative, web sources, MusicSeer, experiment?
• Methods: vector space model, part of speech + term weighting, clustering, unsupervised classification, supervised learning?, relevance feedback (Rocchio), WEKA suite?
• Evaluation?! [Downie, Uitdenbogerd]
• Listening mode
• Cosine vs. learning a personal classifier
Psychological Factors >> Musical Taste
• Personality >> preferred styles, genres
  • Stability
  • Introversion / Extraversion
  • Aggressive / Passive
• Socio-economics >> preferred styles, genres
• Demographic >> similar users in CF approaches >> recommendations
  • Gender
  • Age
• Situation
  • Mood >> tempo, tonality, beatness, pitch height
  • Listening mode [Huron]
User Model [Chai, Vercoe]
<habit>
  <context>I'm happy
    <tempo>very fast</tempo>
    <genre>pop</genre>
  </context>
  <pfeature>romantic
    <tempo>very slow</tempo>
    <softness>very soft</softness>
    <title>*love*</title>
  </pfeature>
  <context>bedtime
    <pfeature>romantic</pfeature>
  </context>
</habit>
<user>
  <generalbackground>
    <name>John White</name>
    <education>MS</education>
    <citizenship>US</citizenship>
    <birthdate>9/7/1974</birthdate>
    <sex>male</sex>
    <occupation>student</occupation>
  </generalbackground>
</user>
<musicpreferences>
  <genre>classical</genre>
  <genre>blues</genre>
  <genre>rock/pop</genre>
  <composer>Wolfgang Amadeus Mozart</composer>
  <artist>Beatles</artist>
  <sample>
    <title>Yesterday</title>
    <artist>Beatles</artist>
  </sample>
</musicpreferences>
<musicbackground>
  <education>none</education>
  <instrument>piano</instrument>
  <instrument>vocal</instrument>
</musicbackground>
<generalpreferences>
  <color>blue</color>
  <animal>dog</animal>
</generalpreferences>
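A minimal sketch of how a system could read preferences out of such a user model, using Python's standard `xml.etree.ElementTree`. Only the `musicpreferences` fragment from the slide is parsed; the function name `preferred` and the shape of the returned dictionary are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# The <musicpreferences> fragment as shown on the slide.
FRAGMENT = """
<musicpreferences>
  <genre>classical</genre>
  <genre>blues</genre>
  <genre>rock/pop</genre>
  <composer>Wolfgang Amadeus Mozart</composer>
  <artist>Beatles</artist>
  <sample>
    <title>Yesterday</title>
    <artist>Beatles</artist>
  </sample>
</musicpreferences>
"""


def preferred(fragment):
    """Extract genres, artists and sample songs from a musicpreferences element."""
    root = ET.fromstring(fragment.strip())
    return {
        # findall() with a bare tag name matches direct children only.
        "genres": [g.text for g in root.findall("genre")],
        "artists": [a.text for a in root.findall("artist")],
        "samples": [
            (s.findtext("title"), s.findtext("artist"))
            for s in root.findall("sample")
        ],
    }


print(preferred(FRAGMENT))
```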
Multi-facet Music Similarity and Adaptive User Model
• Hard-wired multi-facet similarity [Whitman]
• Weighting of audio vs. cultural description by slider usage [Aucouturier]
• Description Weight Vectors (DWV) [Rolland]
  • Original work for melodic similarity
  • DWV contains a weight for each description in the representation
  • Weight varies with user interaction
    • Explicit user feedback: re-ranking of the system's output
    • Implicit adaptation of weights
• Future Work
  • Apply DWV to multi-facet similarity (audio, lyrics, cultural)
  • Infer initial setting of weights according to psychological factors
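A minimal sketch of the future-work idea: one weight per facet (audio, lyrics, cultural), a weighted combination of the facet similarities, and a weight update after user feedback. The update rule below is an assumption chosen for simplicity and is not Rolland's DWV scheme; the function names and the `rate` parameter are hypothetical.

```python
def combine(facet_sims, weights):
    """Weighted average of per-facet similarities, each in [0, 1]."""
    total = sum(weights.values())
    return sum(weights[f] * facet_sims[f] for f in weights) / total


def adapt(weights, liked_facet_sims, rate=0.5):
    """Shift weight toward facets that rated a user-approved result highly.

    Assumed rule: add rate * similarity to each weight, then renormalize.
    """
    updated = {f: w + rate * liked_facet_sims[f] for f, w in weights.items()}
    norm = sum(updated.values())
    return {f: w / norm for f, w in updated.items()}


weights = {"audio": 1 / 3, "lyrics": 1 / 3, "cultural": 1 / 3}
# Facet similarities between a query song and a result the user liked.
liked = {"audio": 0.9, "lyrics": 0.1, "cultural": 0.5}

print(combine(liked, weights))
weights = adapt(weights, liked)
print(weights)
```

Initial weights could come from the psychological factors mentioned above instead of the uniform setting used here.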
Project MPEER
"In a world of spontaneously federating services, there is no point in having a proprietary service, there is no point in staying out of the directory, there is no point in using an XML protocol that no one understands, there is no point in basing it on a proprietary server, and there is no need to justify the obvious error in following that path."
- Simon Phipps, chief technology evangelist, Sun Microsystems, Inc., 2001
"Bringing the web to its full potential" [Fensel, Bussler]
[Figure: from a centralized / static to a distributed / dynamic web, with increasing formal semantics. WWW: URI, HTML, HTTP. Semantic Web: RDF, RDF(S), DAML, OIL. Web Services: UDDI, WSDL, SOAP. Intelligent Web Services: WSFL -> WSMF, DAML-S.]
MPEER Objectives
• Relate MIR to the Semantic Web activities (W3C)
• Create (composite) Semantic Web Services for MIR
• Explore the P2P computing paradigm (shared resources)
MPEER Architecture
[Figure: architecture diagram with the following components]
• User
• P2P client GUI
• P2P client/server (Jtella/JXTA)
• Audio (MP3)
• Basic features, descriptors: tempo, loudness, timbre
• Classification, music similarity, clustering
• Meta data (XML / MPEG-7 / RDF-S): Title, Artist, Volume, Genre, bpm, Loudness, Timbre, Like, Dislike, SimilarTo, ...
• Semantic Web wrapper
• Web services, e.g.
  • Ontologies, taxonomies
  • CD retailers, EMD
  • MIR services
  • Audio ID
  • Thumbnails
  • ...
MPEER: Composite Web Service
• Service type: "query service"
• Sub type: Semantic Web enabled
• Domain: music
• Supported ontologies: {ontoson, musicbrainz.com, allmusicguide, ..}
• Port types:
  • Identification by audio, similarity by audio, retrieval by partial information
  • Personalized recommendations, playlist generation
  • Music question answering
• Operations/messages of port type "Identification by audio":
  • IF_NOT_MP3(input) -> Convert2MP3(input) -> CalculateMetadata -> ...
• Composite, distributed services (maybe P2P, using users' local content and processing power):
  • (1) MPeer.getEverythingFrom(Prince)
  • (2) WebServiceRepository.discover&select(SpecialArtistService)
  • (3) SpecialArtistService = AllMusicGuide.detailedInfo
  • (4) NegotiateContract(contract1, MPeer, AllMusicGuide)
  • (5) Contract1.StartTransaction(MPeer, AllMusicGuide)
    • (5.1) AllMusicGuide.detailedinfo(Prince)
    • (5.2) ...
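A minimal in-process sketch of steps (1) to (5.1): discover a service by type, agree on a contract, then delegate the call. Everything is a stand-in: the `ServiceRepository` class, the dictionary used as a contract, and the stub provider are hypothetical and involve no real UDDI, SOAP or AllMusicGuide interface.

```python
class ServiceRepository:
    """Toy registry mapping a service type to (provider name, callable)."""

    def __init__(self):
        self._services = {}

    def register(self, service_type, provider, handler):
        self._services.setdefault(service_type, []).append((provider, handler))

    def discover_and_select(self, service_type):
        """Return the first registered provider; a real system would rank them."""
        candidates = self._services.get(service_type)
        if not candidates:
            raise LookupError(f"no provider for {service_type}")
        return candidates[0]


def get_everything_from(artist, repository, client="MPeer"):
    """Steps (2)-(5.1): discover, 'negotiate' a contract, run the delegated call."""
    provider, detailed_info = repository.discover_and_select("SpecialArtistService")
    contract = {"client": client, "provider": provider}  # stand-in for negotiation
    return {"contract": contract, "result": detailed_info(artist)}


# Stub provider standing in for an external artist information service.
def stub_detailed_info(artist):
    return {"artist": artist, "source": "stub"}


repo = ServiceRepository()
repo.register("SpecialArtistService", "AllMusicGuide", stub_detailed_info)
print(get_everything_from("Prince", repo))
```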
Conclusion
• The Web offers potential beyond symbolic or audio-based MIR by reflecting cultural issues
• User-centric MIR systems may benefit from user models and situation-driven adaptation
• The field is too large to be handled by individual institutes
• Composite web services offer a way to collaborate on the topic and possibly to provide holistic, high-quality MIR systems