convenient MIR systems

Convenient MIR systems: vision vs. reality check, research & e-commerce • Stephan Baumann

Agenda • Personal Profile • Convenient Music Information Retrieval • Multi-modal queries • Identification by description • Multi-facet music similarity • Timbre • Lyrics • Cultural aspects • Project MPEER: P2P, semantic web and MIR

Research Diary (1991-2003) • 1991/92 optical music recognition • 1992/93 online handwriting recognition • 1993/94 optical music recognition • 1995/97 document analysis and understanding • 1996 first look at multimedia IR (S. Pfeiffer) • 1998/99 spinoff activities with Insiders GmbH • 2000 freelancing/research for draft MIR system • 2001 co-founding Sonicson GmbH • 2001/03 subjective music similarity (Ph.D., Sep 2003)

Desiderata MIR [Huron] • 1. Access to all of the world’s music • 2. Access via an indexing method • 3. Fair use (reimbursement to all contributors) • 4. Open system • 5. Self-correcting system • 6. Assurance of privacy and cultural practices

MIR Categorization [Futrelle]

Related Work • Audio: • [Blum, Wold], [Pfeiffer], [Foote], [Logan], ... • [Scheirer], [Tzanetakis], [Welsh], [Aucouturier], [Peeters], ... • Cultural: • [Whitman], [Pachet], [Ellis, Berenzweig] • Multi-modal MIR • [Bainbridge], ... • Recommendation • [Amazon, Moodlogic, MusicGenome, MuBu, MongoMusic], ... • [Uitdenbogerd] • User Models • [Chai, Vercoe], [Rolland] • Music Psychology • [Bruhn, Rösing], [Gabrielsson, Västfjäll], ... • Usability, Convenience • [Shneiderman], [Nielsen], ...

Convenience • Using natural language as input for queries by non-musicians • Accessing meta data, symbolic and audio layers in one interface • Evaluation of usability (e.g. eye-tracking + user interviews) • Acquisition of audio features, symbolic features, meta data and lyrics • Machine communication by using shared music ontologies (MPEG-7, RDF/S, DAML-S)

Prototype • Bilingual matching of phonetic ambiguities and misspellings • Treatment of refinements and negations • Extraction of musical concepts from natural language queries • Automatic generation of SQL queries on demand • Recognition of intention • Intention-based result presentation

Software Development Lifecycle • System Design Philosophy: Google-Style • 1. Collection of User req. V1 • Offline • 20 Germans, different user segments • 2. Setup of prototype V1 • Online Refinement of req. V1 -> Introduction of PhoneticMatch • 3. Collection of User req. V2 • Online with prototype V1 • 100 American native speakers, internet-aware users • 4. Setup of prototype V2 • Bilingual phonetic match • NLP frontend • Audio-based music similarity • 5. Scaling of phonetic match component for commercial website

www.musicline.de: Convenience's no. 1 hit • status que -> Status Quo • golgen earing -> Golden Earring • Fisher Set -> Fischer Z • Novospaski Chor -> Novo Spassky Chor • four none blondes -> 4 Non Blondes • Matchbox twenty -> Matchbox 20 • Statistics: • 540,000 queries/month • 400,000 queries for artists/month • 80,000 fuzzy queries for artists/month
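The corrections listed on this slide can be approximated with a very small sketch. The slides do not describe the PhoneticMatch component itself, so the code below swaps in plain string similarity from Python's standard `difflib` module instead of a phonetic algorithm. The catalogue `ARTISTS`, the helper names and the `cutoff` value are assumptions made for illustration only.

```python
import difflib

# Hypothetical artist catalogue; the real musicline.de index is not described in the slides.
ARTISTS = ["Status Quo", "Golden Earring", "Fischer Z", "4 Non Blondes", "Matchbox 20"]


def normalize(name: str) -> str:
    """Lower-case the name and keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def fuzzy_artist(query: str, cutoff: float = 0.6):
    """Return the closest catalogue artist, or None if nothing is similar enough."""
    keys = {normalize(artist): artist for artist in ARTISTS}
    hits = difflib.get_close_matches(normalize(query), list(keys), n=1, cutoff=cutoff)
    return keys[hits[0]] if hits else None


print(fuzzy_artist("status que"))
print(fuzzy_artist("golgen earing"))
```

Pure string similarity will probably not recover cases such as "Fisher Set -> Fischer Z", which depend on how a name sounds; that is presumably where a phonetic, bilingual matcher earns its keep.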

Usability Evaluation: helping text

Multi-facet Music Similarity • Audio: MFCCs • Lyrics: TFIDF • Cultural: • Webcrawling • POS • TFIDF

Song Similarity: Audio-based Perception • Feature Extraction • Input segment [30..60] sec • 30 ms Hanning windows, log spectrum, Mel scale, inverse Fourier transform • 1000 vectors using the first 13 MFCCs • Representation • Intra-song clustering -> Song Signature [Logan] • (Gaussian Mixture Models [Aucouturier]) • Similarity Measure • Euclidean Distance [Foote] • Kullback-Leibler Distance [Logan, Aucouturier] • (Approximate solutions: Sampling [Ellis, Aucouturier]) • DistMinMean [Ellis] • Earth Mover's Distance (EMD) [Logan] • Different Features & Similarity Measures • [Welsh] Tonal histograms, tonal transition, volume, tempo, noise -> Euclidean Distance • [Rauber & Frühwirth] Psychoacoustic Features -> Hierarchical SOM • [Pfeiffer] A review of MP3-native features • ...
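To make the "Similarity Measure" bullets concrete, here is a minimal sketch of two of the listed distances. It is a simplification, not the method of any cited author: each song is summarised by a single diagonal-covariance Gaussian over its feature vectors (the cited work uses cluster signatures or Gaussian mixtures), for which the Kullback-Leibler divergence has a closed form. Feature extraction is out of scope, so the frames are assumed to be precomputed MFCC vectors; all function names are hypothetical.

```python
import math


def fit_gaussian(frames):
    """Fit one diagonal-covariance Gaussian (mean, variance) to a list of feature vectors."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # A small floor keeps every variance strictly positive.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-9) for d in range(dim)]
    return mean, var


def kl_diag(p, q):
    """Closed-form KL(p || q) for two diagonal-covariance Gaussians."""
    (mp, vp), (mq, vq) = p, q
    return 0.5 * sum(
        vp[d] / vq[d] + (mq[d] - mp[d]) ** 2 / vq[d] - 1.0 + math.log(vq[d] / vp[d])
        for d in range(len(mp))
    )


def symmetric_kl(p, q):
    """Symmetrised KL divergence, usable as a song-to-song distance."""
    return kl_diag(p, q) + kl_diag(q, p)


def euclidean(a, b):
    """Euclidean distance between two vectors, e.g. two mean MFCC vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy two-dimensional "MFCC" frames standing in for real audio features.
song_a = fit_gaussian([[0.0, 1.0], [2.0, 3.0]])
song_b = fit_gaussian([[1.0, 1.0], [3.0, 3.0]])
print(symmetric_kl(song_a, song_b), euclidean(song_a[0], song_b[0]))
```

With mixtures instead of single Gaussians no closed form exists, which is why the slide mentions sampling-based approximations and the Earth Mover's Distance.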

Perception of similar Timbre in Songs: Evaluation?!?! • Audio Database: 700 MP3s of mainstream music at full length, 40 artists, 70 different genres • Evaluation: no ground truth (GT) available! Only anecdotal evidence or genre/artist/volume GT

Lyrics: Vector Space Model (TFIDF) • Representation of a collection of lyrics • # of terms k • Song j • Occurrence of term h in collection d(h) • Weight of term j in song i • Similarity metric • (the slide's formulas are not preserved in this transcript)
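Since the slide's formulas did not survive, the following sketch uses the textbook TF-IDF weighting (term frequency times the log of inverse document frequency) together with cosine similarity. This may differ in detail from the weighting on the original slide; the function names and the toy lyrics are made up for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(songs):
    """Map each song title to a sparse {term: weight} vector using tf * log(N / df)."""
    n = len(songs)
    # Document frequency: in how many songs each term occurs.
    df = Counter(term for tokens in songs.values() for term in set(tokens))
    vectors = {}
    for title, tokens in songs.items():
        tf = Counter(tokens)
        vectors[title] = {t: tf[t] * math.log(n / df[t]) for t in tf}
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either has zero length."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy token lists standing in for full lyrics files.
songs = {
    "a": ["dance", "tonight", "money"],
    "b": ["dance", "tonight", "love"],
    "c": ["father", "son"],
}
vecs = tfidf_vectors(songs)
print(cosine(vecs["a"], vecs["b"]), cosine(vecs["a"], vecs["c"]))
```

Two songs with no terms in common score exactly zero under this scheme, so a song whose lyrics share no vocabulary with the rest of the collection returns no neighbours.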

Song Similarity: Lyrics (1)
• Reference Song 112: Lucy Pearl - Dance tonight.txt • Most-relevant terms: toast, spend, tonight, dance, money • 1. Similar Song: Lucy Pearl - you (feat. snoop dogg and Q-tipp).txt • 2. Similar Song: Phil Collins - Please Come Out Tonight.txt • 3. Similar Song: Madonna - Into the groove
• Reference Song 56: Das Kind Vor Dem Euch.txt - Die fantastischen Vier • Most-relevant terms: wollten, euch, sehn, entsetzt, selben • 1. Similar Song: Die fantastischen Vier - Auf Der Flucht.txt • 2. Similar Song: Freundeskreis - Mit Dir.txt • 3. Similar Song: Die fantastischen Vier - Populär
• Reference Song 145: Madonna - Paradise.txt • Most-relevant terms: remains, pas, encore, fois, moi • Zero hits

Song Similarity: Lyrics (2)
• Reference Song 193: Phil Collins - One More Night.txt • Most-relevant terms: forever, wait, night, cos, ooh • 1. Similar Lyrics: Phil Collins - YOU CAN'T HURRY LOVE.txt • 2. Similar Lyrics: Phil Collins - Inside Out.txt • 3. Similar Lyrics: Phil Collins - This must be Love.txt
• Reference Song 297: Cat Stevens - Father And Son.txt • Most-relevant terms: fault, decision, marry, son, settle • 1. Similar Lyrics: Phil Collins - We're Sons Of Our Fathers.txt • 2. Similar Lyrics: Sheryl Crow - No One Said It Would Be Easy.txt • 3. Similar Lyrics: George Michael - Father Figure.txt

Artist Similarity: Cultural Aspects

Web Crawling+PartOfSpeech+TFIDF

Visual Evaluation: Similarity (Cosine) • [Figure: similarity matrix over HEAVYMETAL, ROCK, POP, SOUL, DANCE; scale from high to low]

Recall/Precision against P2P, AMG data • [Diagram labels: A, Ra, R]
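For reference, recall and precision can be computed as sketched below. The sketch assumes the diagram labels follow the common information-retrieval notation, with A the answer set returned by the system, R the relevant set taken from the ground truth (here P2P or AMG data), and Ra their intersection. The function name and the toy artist lists are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) = (|Ra| / |A|, |Ra| / |R|) for two collections of items."""
    answer_set, relevant_set = set(retrieved), set(relevant)
    hits = len(answer_set & relevant_set)  # |Ra|
    precision = hits / len(answer_set) if answer_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Toy data: artists the system returned vs. artists the ground truth lists as similar.
print(precision_recall(["a", "b", "c", "d"], ["a", "c", "e"]))
```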

Rel. Feedback (Rocchio) • subjective • context-dependent • "personal taste" • [Diagram labels: Experiment? • Clustering • P2P = collaborative • Vector Space Model • Unsupervised Classification • Supervised Learning? • Similarity • Ground Truth • AMG = Experts • Web Sources • MusicSeer? • Part Of Speech + Term Weighting • Evaluation?! [Downie, Uitdenbogerd] • Listening mode • WEKA Suite? • Cosine vs. Learning • Personal Classifier]

Psychological Factors >> Musical Taste • Personality >> preferred Styles, Genres • Stability • Introversion / Extraversion • Aggressive / Passive • Socio-economics >> preferred Styles, Genres • Demographic >> similar users in CF approaches >> recommendations • Gender • Age • Situation • Mood >> tempo, tonality, beatness, pitch height • Listening Mode [Huron]

User Model [Chai, Vercoe]
<habit>
  <context>I’m happy
    <tempo>very fast</tempo>
    <genre>pop</genre>
  </context>
  <pfeature>romantic
    <tempo>very slow</tempo>
    <softness>very soft</softness>
    <title>*love*</title>
  </pfeature>
  <context>bedtime
    <pfeature>romantic</pfeature>
  </context>
</habit>
<user>
  <generalbackground>
    <name>John White</name>
    <education>MS</education>
    <citizenship>US</citizenship>
    <birthdate>9/7/1974</birthdate>
    <sex>male</sex>
    <occupation>student</occupation>
  </generalbackground>
</user>
<musicpreferences>
  <genre>classical</genre>
  <genre>blues</genre>
  <genre>rock/pop</genre>
  <composer>Wolfgang Amadeus Mozart</composer>
  <artist>Beatles</artist>
  <sample>
    <title>Yesterday</title>
    <artist>Beatles</artist>
  </sample>
</musicpreferences>
<musicbackground>
  <education>none</education>
  <instrument>piano</instrument>
  <instrument>vocal</instrument>
</musicbackground>
<generalpreferences>
  <color>blue</color>
  <animal>dog</animal>
</generalpreferences>

Multi-facet Music Similarity and Adaptive User Model • Hard-wired multi-facet similarity [Whitman] • Weighting of audio vs. cultural description by slider usage [Aucouturier] • Description Weight Vectors (DWV) [Rolland] • Original work for melodic similarity • DWV contains a weight for each description in the representation • Weight varies with user interaction • Explicit user feedback: re-ranking of the system's output • Implicit adaptation of weights • Future Work • Apply DWV to multi-facet similarity (audio, lyrics, cultural) • Infer initial setting of weights according to psychological factors
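The "Future Work" idea of applying weight vectors to the three facets might look roughly like the sketch below. It is not Rolland's DWV algorithm: the weighted mean, the update rule and the learning rate `rate` are all hypothetical choices made only to show how per-facet weights could drift with user feedback.

```python
def combined_similarity(facets, weights):
    """Weighted mean of per-facet similarities (all values assumed to lie in [0, 1])."""
    total = sum(weights.values())
    return sum(weights[name] * facets[name] for name in weights) / total


def adapt_weights(weights, facets, liked, rate=0.1):
    """Shift weight toward facets that agreed with the user's feedback, then renormalise.

    Hypothetical rule: a facet agrees with a "like" when its similarity is high,
    and with a "dislike" when its similarity is low.
    """
    updated = {}
    for name, weight in weights.items():
        agreement = facets[name] if liked else 1.0 - facets[name]
        updated[name] = max(weight + rate * (agreement - 0.5), 1e-6)
    total = sum(updated.values())
    return {name: weight / total for name, weight in updated.items()}


weights = {"audio": 0.5, "lyrics": 0.25, "cultural": 0.25}
facets = {"audio": 0.8, "lyrics": 0.4, "cultural": 0.6}  # toy similarities for one song pair
print(combined_similarity(facets, weights))
print(adapt_weights(weights, facets, liked=True))
```

Initial weights could be seeded from the psychological factors mentioned on the slide instead of the fixed values used here.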

Project MPEER "In a world of spontaneously federating services, there is no point in having a proprietary service, there is no point in staying out of the directory, there is no point in using an XML protocol that no one understands, there is no point in basing it on a proprietary server, and there is no need to justify the obvious error in following that path." - Simon Phipps, chief technology evangelist, Sun Microsystems, Inc., 2001

“Bringing the web to its full potential” [Fensel, Bussler] • [Diagram: Centralized / Static: WWW (URI, HTML, HTTP) -> Semantic Web (RDF, RDF(S), DAML, OIL) • Distributed / Dynamic: Web Services (UDDI, WSDL, SOAP) -> Intelligent Web Services (WSFL -> WSMF, DAML-S) • Axis: Formal Semantic]
MPEER Objectives • Relate MIR to the Semantic Web activities (W3C) • Create (composite) Semantic Web Services for MIR • Explore the P2P computing paradigm (shared resources)

MPEER Architecture • [Diagram labels: User • P2P Client GUI • P2P Client/Server (Jtella/JXTA) • Audio (MP3) • Basic Features, Descriptors: Tempo, Loudness, Timbre • Classification, Music Similarity, Clustering • Meta Data (XML / MPEG-7 / RDF-S): Title, Artist, Volume, Genre, bpm, Loudness, Timbre, Like, Dislike, SimilarTo, ... • Semantic Web Wrapper • WebService, e.g.: Ontologies, Taxonomies • CD-Retailers, EMD • MIR services • Audio ID • Thumbnails • ...]

MPEER: composite Webservice • Service Type: "query service" • Sub Type: Semantic web enabled • Domain: Music • Supported ontologies: {ontoson, musicbrainz.com, allmusicguide, ..} • Port Types: • Identification by audio, Similarity by audio, Retrieval by partial information • Personalized recommendations, Playlist generation • Music-Question Answering • Operations/Messages of Port Type Identification by audio: • IF_NOT_MP3(input)->Convert2MP3(input)->CalculateMetadata-> ... • Composite, Distributed Services: (maybe P2P, using users' local content & processing power) • (1) MPeer.getEverythingFrom(Prince) • (2) WebServiceRepository.discover&select(SpecialArtistService) • (3) SpecialArtistService=AllMusicGuide.detailedInfo • (4) NegotiateContract(contract1,MPeer,AllMusicGuide) • (5) Contract1.StartTransaction(MPeer,AllMusicGuide) • (5.1) AllMusicGuide.detailedinfo(Prince) • (5.2) ...
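Steps (1) to (5.1) above can be mimicked in a few lines to show the control flow of a composite service. Everything in this sketch is a hypothetical stand-in: the classes, the capability string "artist-info" and the placeholder catalogue are invented for illustration, no real web-service stack (UDDI, WSDL, SOAP) is involved, and contract negotiation is reduced to a direct call.

```python
class ArtistService:
    """Hypothetical stand-in for an external provider of detailed artist information."""

    def __init__(self, name, catalogue):
        self.name = name
        self.catalogue = catalogue

    def detailed_info(self, artist):
        return self.catalogue.get(artist, {})


class Repository:
    """Hypothetical service directory; discovery is a plain lookup by capability."""

    def __init__(self):
        self._services = {}

    def register(self, capability, service):
        self._services.setdefault(capability, []).append(service)

    def discover_and_select(self, capability):
        candidates = self._services.get(capability, [])
        return candidates[0] if candidates else None


def get_everything_from(artist, repository):
    """Step (1): answer a query by delegating to a discovered service."""
    service = repository.discover_and_select("artist-info")  # steps (2) and (3)
    if service is None:
        return {}
    # Steps (4) and (5), contract negotiation and transaction, are reduced to a plain call.
    return {"provider": service.name, "info": service.detailed_info(artist)}  # step (5.1)


repo = Repository()
repo.register("artist-info", ArtistService("demo-provider", {"Prince": {"note": "placeholder"}}))
print(get_everything_from("Prince", repo))
```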

Prototypical P2P Client

OpenSource Tools: Ontology Editor

OpenSource Tools: DataMining, ML

Conclusion • The Web offers potential beyond symbolic or audio-based MIR reflecting cultural issues • User-centric MIR systems may benefit from user models and situation-driven adaptation • The field is too large to be handled by individual institutes • Composite web services offer a way for collaboration on the topic and maybe to provide holistic, high-quality MIR systems
