Music Recommendation: A Data Mining Approach Daniel McEnnis, 2nd year PhD
Overview • High level overview • Toolkit Improvements • Experiments • Evaluation • Algorithms research • Data • Future work
Project Goals • Integrate social information • Make algorithms ‘culturally aware’ • Implement existing algorithms • Systematic evaluation framework
Similarity Algorithms • Create new relations based on some aspect of similarity • 6 different varieties of similarity • Each algorithm can use one of 6 distance functions
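To make the idea of a similarity algorithm concrete, here is a minimal Python sketch. It is not Graph-RAT code: the function names, the dict-based profile format, and the threshold are all assumptions made for illustration. It creates new links between actors whose profiles fall within a chosen distance, with the distance function passed in as a parameter so that another one could be substituted.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Cosine distance between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an empty profile as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def similarity_links(profiles, distance=cosine_distance, threshold=0.5):
    """Return (actor, actor, distance) links for pairs closer than threshold.

    `profiles` maps an actor name to a sparse feature vector (hypothetical
    format). The distance function is pluggable.
    """
    links = []
    for x, y in combinations(sorted(profiles), 2):
        d = distance(profiles[x], profiles[y])
        if d < threshold:
            links.append((x, y, d))
    return links


# Illustrative data only.
profiles = {
    "alice": {"jazz": 3.0, "swing": 1.0},
    "bob": {"jazz": 2.0, "swing": 1.0},
    "carol": {"metal": 5.0},
}
links = similarity_links(profiles)
```

With this toy data only alice and bob are linked, since carol shares no tags with either of them.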
Aggregator Algorithms • Takes data from one set of actors and moves it to another • 6 different varieties • Each variety uses one of 7 aggregator functions • Basic building block of Graph-RAT applications
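The aggregation step can be sketched in Python as follows. This is an illustration under assumptions, not the Graph-RAT API: the `aggregate` function, the (destination, source) link format, and the sample data are hypothetical. Properties attached to one set of actors (songs) are moved across links to another set (users), and the aggregator function that combines the collected values is a parameter.

```python
from collections import defaultdict


def aggregate(links, source_props, combine=sum):
    """Move numeric properties from source actors to destination actors.

    links: iterable of (destination, source) pairs, e.g. (user, song)
    source_props: source -> {property name: value}
    combine: aggregator function applied to the list of values collected
             for each property (sum, max, min, ...)
    """
    collected = defaultdict(lambda: defaultdict(list))
    for dest, src in links:
        for name, value in source_props.get(src, {}).items():
            collected[dest][name].append(value)
    return {
        dest: {name: combine(values) for name, values in props.items()}
        for dest, props in collected.items()
    }


# Illustrative data only: tags on songs, and which user listened to which song.
song_tags = {"s1": {"jazz": 1.0, "swing": 1.0}, "s2": {"jazz": 1.0}}
listens = [("alice", "s1"), ("alice", "s2"), ("bob", "s2")]

user_tags = aggregate(listens, song_tags)            # sum of tag weights
user_tags_max = aggregate(listens, song_tags, max)   # strongest single weight
```

Swapping `sum` for `max` changes only how values are combined, not how they travel along the links, which is one plausible reading of "each variety uses one of 7 aggregator functions".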
Graph Triples Census • Probably a novel algorithm • Proof of correctness completed • Proof of time complexity completed • Literature review in progress
SUCCESS! • Graph-RAT programming language now functioning • Graph-RAT integrates social, cultural, personal, and audio data into algorithms • Includes most commercial algorithms • Contains primitives for existing academic systems • Evaluation is entirely automated
Evaluation Exploration • 9 types of music recommendation • Personalized versus generic • Open query versus targeted query • Dynamic versus static data • New music versus all music
Personalized Radio • Open query with personalized presentation • Static data vs dynamic data • New items prediction vs predict anything
Targeted Search • Not personalized • Similarity queries • Automatically generating targeted lists for a browsing hierarchy • New music vs all music • Static vs dynamic data
Personalized Tag Radio • Create a personalized play list matching a given query • New music vs all music • Static vs dynamic data
Excluded Types • ‘Top 40’ prediction • Rendered obsolete by other types
Existing Algorithms • Item-to-Item collaborative filtering • 7 variations • User-to-user collaborative filtering • 7 variations • Associative mining collaborative filtering • Direct machine learning playlist data • Direct machine learning audio data
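As a reference point for the first item, a common textbook form of item-to-item collaborative filtering can be sketched in Python. The slides do not say which of the 7 variations are implemented, so this is an assumed baseline: cosine similarity over play counts, with hypothetical function names and toy data.

```python
import math
from collections import defaultdict


def item_similarities(plays):
    """Cosine similarity between items, from user -> {item: play count}."""
    dot = defaultdict(float)   # (item_i, item_j) -> co-occurrence dot product
    sq = defaultdict(float)    # item -> squared norm of its play-count vector
    for counts in plays.values():
        for i, ci in counts.items():
            sq[i] += ci * ci
            for j, cj in counts.items():
                if i != j:
                    dot[(i, j)] += ci * cj
    return {
        (i, j): d / math.sqrt(sq[i] * sq[j])
        for (i, j), d in dot.items()
    }


def recommend(user, plays, sims, k=3):
    """Rank items the user has not heard by similarity to items they have."""
    heard = plays[user]
    scores = defaultdict(float)
    for (i, j), s in sims.items():
        if i in heard and j not in heard:
            scores[j] += s * heard[i]
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


# Illustrative play counts only.
plays = {
    "alice": {"a": 2, "b": 1},
    "bob": {"a": 1, "b": 1, "c": 3},
    "carol": {"b": 2, "c": 1},
}
sims = item_similarities(plays)
recs = recommend("alice", plays, sims)
```

User-to-user filtering follows the same pattern with the roles of users and items exchanged.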
Novel Algorithms • Machine learning over profile data • Machine learning over cultural and profile data • Machine learning on different concatenations • Audio • Playlist • Profile • Cultural
Initial Data • LiveJournal • Separating music data is difficult • No tag info or audio content • Not enough musical data • LastFM by User • No audio content • Data cleaning is an issue
Current Data • 40’s Jazz Recordings • 1800 annotated recordings from 70 CDs • Covers nearly all 40’s popular music • LastFM by Song • Retrieves tag and user info by song • Data cleaning on user playcounts needed
Data Cleaning Tags • Polysemy • Synonymy • Disjoint • Hypernymy • Hyponymy • Initial algorithms developed
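The slides do not describe the initial cleaning algorithms, so the following Python sketch is only an assumed illustration of the synonymy case: spelling variants of one tag are folded into a canonical form and their counts merged. The synonym table and function names are hypothetical; a real table would be curated or learned from data.

```python
import re

# Hypothetical synonym table, keyed by the tag with spaces removed.
SYNONYMS = {"hiphop": "hip hop", "rnb": "r&b", "40s": "1940s"}


def normalize_tag(tag):
    """Lowercase, unify separators, then apply the synonym table."""
    t = tag.strip().lower()
    t = re.sub(r"[-_]+", " ", t)   # "hip-hop" and "hip_hop" -> "hip hop"
    t = re.sub(r"\s+", " ", t)     # collapse repeated whitespace
    key = t.replace(" ", "")
    return SYNONYMS.get(key, t)


def merge_tags(tag_counts):
    """Sum counts of tags that normalize to the same canonical tag."""
    merged = {}
    for tag, count in tag_counts.items():
        canon = normalize_tag(tag)
        merged[canon] = merged.get(canon, 0) + count
    return merged


# Illustrative counts only.
cleaned = merge_tags({"Hip-Hop": 4, "hiphop": 2, "hip hop": 1, "Jazz ": 3})
```

Polysemy, hypernymy, and hyponymy need context or a tag hierarchy and are not addressed by a lookup table like this one.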
Future Work: Programming • Radically different programming environment • SQL • LINQ library package in C#
Future Work: Scalability • Distributed SQL database implementation • Just-in-time compilation • Event-based recalculation of algorithm results • Parallel execution of algorithms • Multi-threaded algorithms