Progress Report - Year 2 Extensions of the PhD Symposium Presentation Daniel McEnnis
Overview • Accomplishments • Data set acquisition and cleaning • Theoretical achievements • Graph-RAT improvements
Current Data • 40’s Jazz Recordings • 2000 annotated recordings from 80 CDs • Covers nearly all 40’s popular music • LastFM by Song • Retrieves tag and user info by song • Data cleaning on user playcounts needed
Planned Data Set Acquisition • Explored the DBTunes XML version of MySpace. • Linking with LastFM data designed but not yet implemented. • Provides per-artist audio data for all recent artists.
Theoretical Achievements • Algorithm Literature Review • Theoretical Computer Science journal submission • NZCSRSC conference submission • Recommendation Tasks and Evaluation Metrics
Algorithm Literature • Systematic exploration of theoretical computer science and discrete mathematics. • Discovered a 1973 SIAM paper describing a maximal clique algorithm. • This maximal clique algorithm is the most efficient one found so far.
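To make the maximal clique task concrete, here is a minimal sketch of maximal clique enumeration. It uses the textbook Bron-Kerbosch procedure with pivoting, which is not necessarily the algorithm from the 1973 SIAM paper mentioned on the slide. The function name `maximal_cliques` and the adjacency-dictionary graph format are assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Iterator, Set


def maximal_cliques(adj: Dict[str, Set[str]]) -> Iterator[FrozenSet[str]]:
    """Yield every maximal clique of an undirected graph without self-loops.

    Textbook Bron-Kerbosch with pivoting; `adj` maps each vertex to the
    set of its neighbours (an assumed, illustrative graph format).
    """

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> Iterator[FrozenSet[str]]:
        # r: current clique, p: candidates, x: vertices already processed.
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex with the most neighbours among the candidates,
        # so that fewer branches have to be explored.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())
```

For a triangle a-b-c with a pendant vertex d attached to c, this yields the two cliques {a, b, c} and {c, d}.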
Journal Submission • Submitted Graph Triples Census algorithm. • Proof of correctness • Proof of time complexity • Proof of space complexity • Rediscovery of a 2001 algorithm in Social Networks • Most efficient implementation known
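As a point of reference for what a triples census computes, the following is a brute-force sketch for undirected graphs. It classifies every triple of actors by how many of its three possible links are present. It runs in cubic time and is only an illustration of the task, not the submitted algorithm; the name `triple_census` and the adjacency-dictionary input are assumptions.

```python
from itertools import combinations
from typing import Dict, List, Set


def triple_census(adj: Dict[str, Set[str]]) -> List[int]:
    """Count triples of actors by number of links present (0, 1, 2 or 3).

    Brute force over all triples; `adj` maps each actor to its neighbours
    in an undirected graph (assumed format).
    """
    counts = [0, 0, 0, 0]
    for a, b, c in combinations(sorted(adj), 3):
        # Booleans add up as integers, giving the number of links present.
        links = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
        counts[links] += 1
    return counts
```

The four counts always sum to the number of triples, n choose 3, which gives a quick sanity check on any faster implementation.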
NZCSRSC • Poster at the conference • Written as a short user's guide
Evaluation Exploration • Incorporating cross-validation into relational data. • 9 types of music recommendation • Personalized versus generic • Open query versus targeted query • Dynamic versus static data • New music versus all music
Personalized Radio • Open query with personalized presentation • Static data vs dynamic data • New items prediction vs predict anything
Targeted Search • Not personalized • Similarity queries • Automatically generating targeted lists for a browsing hierarchy • New music vs all music • Static vs dynamic data
Personalized Tag Radio • Create a personalized play list matching a given query • New music vs all music • Static vs dynamic data
Excluded Types • ‘Top 40’ prediction • Rendered obsolete by other types
Cross-Validation in Graphs • Actor removal • Only form currently used • All links to a particular actor are removed • Link removal • Selected links from ground truth are removed • Algorithm evaluated on reproducing missing links
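The two forms of cross-validation above can be sketched as follows. This is a minimal illustration under assumed conventions: links are (source, destination) pairs, and the function names `link_removal_folds` and `actor_removal` are hypothetical, not part of Graph-RAT.

```python
import random
from typing import Iterator, List, Tuple

Link = Tuple[str, str]  # (source actor, destination actor), assumed format


def link_removal_folds(
    links: List[Link], k: int, seed: int = 0
) -> Iterator[Tuple[List[Link], List[Link]]]:
    """Yield (training links, held-out links) pairs for k-fold link removal."""
    shuffled = sorted(links)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatable folds
    folds = [shuffled[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        training = [l for j, fold in enumerate(folds) if j != i for l in fold]
        yield training, held_out


def actor_removal(links: List[Link], actor: str) -> List[Link]:
    """Return the links that remain after removing every link touching `actor`."""
    return [l for l in links if actor not in l]
```

With link removal, an algorithm is trained on the training links and scored on how many held-out links it reproduces; with actor removal, the held-out material is everything attached to one actor.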
Graph-RAT Improvements • Release of 0.4.4 • Finalized Graph-RAT as a relational programming language • Added propositional algorithms • Release of 0.5.0 • New Query Subsystem • Usability enhancements • Space complexity improvements
Aggregators • 8 algorithms with 9 helper functions • Cover each form of propositionalization • Cover mappings between links and properties • Core primitives for Graph-RAT as a programming language.
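One common form of propositionalization is to collapse an actor's links into a single property value. The sketch below shows that idea only; the (source, destination, strength) link format and the name `aggregate_links` are assumptions, and the actual Graph-RAT aggregators may be organized differently.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

WeightedLink = Tuple[str, str, float]  # (source, destination, strength), assumed


def aggregate_links(
    links: Iterable[WeightedLink],
    reducer: Callable[[List[float]], float] = sum,
) -> Dict[str, float]:
    """Map each source actor to one property value computed from its links."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for source, _destination, strength in links:
        grouped[source].append(strength)
    # Apply the reducer (sum, max, len, ...) to each actor's link strengths.
    return {actor: reducer(values) for actor, values in grouped.items()}
```

Swapping the reducer changes the resulting property: `sum` gives a total playcount per user, `len` gives the number of distinct links, and `max` gives the strongest link.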
Similarity • 2 new similarity algorithms • 1 new distance metric
Query Subsystem • 28 primitives for searching in a graph • 10 graph primitives • 7 actor primitives • 7 link primitives • 4 property primitives • Functional - composition to build queries
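Building queries by functional composition can be illustrated with a small sketch. The two primitives and the `compose` helper below are hypothetical stand-ins; they are not the 28 Graph-RAT primitives, and the (relation, source, destination) link format is an assumption.

```python
from typing import Callable, Iterable, Iterator, Tuple

Link = Tuple[str, str, str]  # (relation, source, destination), assumed format
Query = Callable[[Iterable[Link]], Iterator[Link]]


def by_relation(name: str) -> Query:
    """Primitive: keep links of the given relation."""
    return lambda links: (l for l in links if l[0] == name)


def by_source(actor: str) -> Query:
    """Primitive: keep links leaving the given actor."""
    return lambda links: (l for l in links if l[1] == actor)


def compose(*queries: Query) -> Query:
    """Chain primitives left to right into a single query."""

    def run(links: Iterable[Link]) -> Iterator[Link]:
        result: Iterator[Link] = iter(links)
        for query in queries:
            result = query(result)
        return result

    return run
```

A composed query such as `compose(by_relation("listens"), by_source("u1"))` is itself a query, so it can be reused as a building block in larger queries.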
Performance Specs • Queries can return collections or iterators. • Collections • Implemented as references into graphs • Linear in number of references • Iterators • Ordered sequences of objects • Constant in space complexity (excluding Graph ID and AllGraphs)
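The space trade-off between the two return types can be sketched as below. Both functions answer the same hypothetical query (links leaving a given actor); the names and the (source, destination) link format are assumptions for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

Link = Tuple[str, str]  # (source, destination), assumed format


def links_from_collection(links: Iterable[Link], source: str) -> List[Link]:
    """Collection result: holds one reference per matching link."""
    return [l for l in links if l[0] == source]


def links_from_iterator(links: Iterable[Link], source: str) -> Iterator[Link]:
    """Iterator result: produces matches one at a time, in order."""
    for l in links:
        if l[0] == source:
            yield l  # only the current position is kept between calls
```

The iterator version does no work until a result is requested, so a caller that needs only the first few matches never touches the rest of the graph.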
Usability Enhancements • Properties and Metadata • Interface enhancements • Dynamic Loading of Classes • XML scripting support
Properties and Metadata • Properties description • Encapsulates all parameter code • Utilizes Graph-RAT Property objects • Comparison to JavaBeans • New Metadata Model • Parameter model update • Input/Output descriptors update
Interface Updates • Arrays->Lists • graph, link, actor, and property objects • Iterators • All graph operations support iterators
Dynamic Loading • Classes loaded from file at runtime. • Loading controlled by call to loader object • Automatic registering with relevant factories • All factories updated to support dynamic loading • Extend Abstract Factory
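The loading flow described above (load a class from a file at runtime, then register it with a factory) can be sketched in Python with the standard `importlib` machinery. Graph-RAT itself is not shown here: the `AlgorithmFactory` class, the `load_plugin` function, and the convention that each plugin file exposes a `register(factory)` function are all assumptions made for this example.

```python
import importlib.util
from pathlib import Path
from typing import Any, Callable, Dict


class AlgorithmFactory:
    """Hypothetical factory that creates algorithms by registered name."""

    def __init__(self) -> None:
        self._creators: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, creator: Callable[..., Any]) -> None:
        self._creators[name] = creator

    def create(self, name: str, **params: Any) -> Any:
        return self._creators[name](**params)


def load_plugin(path: Path, factory: AlgorithmFactory) -> None:
    """Load a Python file at runtime and let it register itself.

    Assumes the file defines a top-level `register(factory)` function.
    """
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes the plugin file
    module.register(factory)
```

Keeping registration inside the plugin file means the loader does not need to know which factory a new class belongs to.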
XML Scripting support • SAX parser support for all components excepting crawling and parsing • Implemented using the Builder pattern
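The combination of a SAX parser with the Builder pattern can be sketched as follows: the handler reacts to parse events and forwards them to a builder, which assembles the final object. The XML vocabulary (`script`, `algorithm`, a `class` attribute) and the names `PipelineBuilder`, `ScriptHandler`, and `parse_script` are invented for this example and do not describe Graph-RAT's actual schema.

```python
import xml.sax
from typing import Dict, List, Tuple

Step = Tuple[str, Dict[str, str]]  # (algorithm name, parameters), assumed


class PipelineBuilder:
    """Builder: accumulates steps, then hands back the finished pipeline."""

    def __init__(self) -> None:
        self._steps: List[Step] = []

    def add_step(self, name: str, params: Dict[str, str]) -> "PipelineBuilder":
        self._steps.append((name, dict(params)))
        return self

    def build(self) -> List[Step]:
        return list(self._steps)


class ScriptHandler(xml.sax.ContentHandler):
    """SAX handler that forwards each <algorithm> element to the builder."""

    def __init__(self, builder: PipelineBuilder) -> None:
        super().__init__()
        self.builder = builder

    def startElement(self, name, attrs):
        if name == "algorithm":
            # Every attribute except "class" is treated as a parameter.
            params = {k: attrs.getValue(k) for k in attrs.getNames() if k != "class"}
            self.builder.add_step(attrs.getValue("class"), params)


def parse_script(text: str) -> List[Step]:
    """Parse a script document and return the pipeline it describes."""
    builder = PipelineBuilder()
    xml.sax.parseString(text.encode("utf-8"), ScriptHandler(builder))
    return builder.build()
```

Because the handler only talks to the builder, the same builder can also be driven from ordinary code, without any XML involved.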
Core Improvements • 2 cross-validation algorithms • ~20 algorithms with space complexity improvements • Iterators for all graph primitives • Macros for separation of graph data by cross-validation property.
Additional algorithms • 2 new similarity algorithms • 1 new distance metric added • Obsolete algorithms removed
LastFM crawler updates • LastFM upgraded its web services, removing the old version • New version will link to the semantic web • ~20 parsers completed • Still under construction
Planned Future Work • Contingent on arrival of computer • Testing of existing code • Cross-Validation Scheduler • Completion of LastFM Parser • DBTunes (from semantic web) parser • Experiments! • Write Thesis!
Unplanned Future Work • Full semantic web crawler • Incorporating GData protocols • Database backend • Colt-Matrix-Over-Graph adapter • Database-backed Weka instance
Beyond the Horizon • Support for Prolog primitives • Multi-database graph support • Semantic Web graph utilizing the proxy pattern • Support for dynamic updates and dynamic data