Personalisation and Recommendations using Drupal
Drupal Developer Days Barcelona – Kendra Initiative 2012.06.16
• Keywords:
  • Personalisation
  • Recommendations
  • Scalable machine learning
  • Predictions
  • Similarity
  • Data Mining
  • Big Data
  • Trend Spotting
  • Clustering
Kendra Initiative mission
• Foster an Open Distributed Marketplace for Digital Media
• EU funded:
  • P2P-Next: http://www.p2p-next.org
  • SARACEN (Socially Aware, collaboRative, scAlable Coding mEdia distributioN): http://www.saracen-p2p.eu
Deliverables
• Kendra Signpost: metadata interoperability, mapping and transformation
• Smart Filters: portable preferences and filters
• Kendra Social, Kendra Hub: social networking management tools
• Standards work:
  • OpenSocial extension
  • Social API (see the "Abstracting Social Networking functionality in Drupal" sprint)
• Kendra Match: searching and recommendation
Components
• Drupal Recommender API module
• Recommender helper modules
• async_command module
• Apache Mahout or cloud service
• Hadoop cluster (optional)
Industry Examples
• Amazon
• Netflix
• Spotify, Pandora
• Facebook, LinkedIn
• OkCupid
• iTunes: Genius (the App Store, not so much)
Machine learning
• Collaborative filtering (a.k.a. recommender engines)
• Clustering
• Classification
Collaborative Filtering
• Input: preference data
• Output: predictions
• Preference = <uid1, (nid1 or uid2), w1>
  • w1 = signed integer representing the weight of the uid1-nid1 or uid1-uid2 correlation (affinity)
• Prediction = <uid1, (nid1 or uid2), w2>
  • w2 = float representing the strength of the predicted uid1-nid1 or uid1-uid2 correlation
• uid = Drupal user ID, nid = Drupal node ID
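To make the tuple notation concrete, here is a minimal Python sketch of user-based collaborative filtering that turns <uid, nid, w1> preferences into <uid, nid, w2> predictions. It assumes user-item preferences only (the target is always a nid), cosine similarity between users, and a similarity-weighted average of neighbours' weights as w2. The class and function names are hypothetical; this is an illustration of the technique, not the Recommender API's or Mahout's implementation.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List


@dataclass(frozen=True)
class Preference:
    """<uid1, nid1, w1>: w1 is a signed integer affinity."""
    uid: int
    nid: int
    weight: int


@dataclass(frozen=True)
class Prediction:
    """<uid1, nid1, w2>: w2 is a float strength."""
    uid: int
    nid: int
    score: float


def cosine(a: Dict[int, int], b: Dict[int, int]) -> float:
    """Cosine similarity between two sparse preference vectors (nid -> weight)."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    dot = sum(a[k] * b[k] for k in common)
    return dot / (norm_a * norm_b)


def predict(prefs: List[Preference], uid: int) -> List[Prediction]:
    """Predict scores for nodes the user has not yet expressed a preference for."""
    vectors: Dict[int, Dict[int, int]] = {}
    for p in prefs:
        vectors.setdefault(p.uid, {})[p.nid] = p.weight
    mine = vectors.get(uid, {})

    weighted_sum: Dict[int, float] = {}
    sim_sum: Dict[int, float] = {}
    for other, vec in vectors.items():
        if other == uid:
            continue
        sim = cosine(mine, vec)
        if sim <= 0:  # ignore unrelated or dissimilar users
            continue
        for nid, w in vec.items():
            if nid in mine:
                continue  # only predict for unseen nodes
            weighted_sum[nid] = weighted_sum.get(nid, 0.0) + sim * w
            sim_sum[nid] = sim_sum.get(nid, 0.0) + sim

    preds = [Prediction(uid, nid, weighted_sum[nid] / sim_sum[nid]) for nid in weighted_sum]
    return sorted(preds, key=lambda p: p.score, reverse=True)
```

For example, if user 1 liked nodes 10 and 11, and user 2 liked the same two nodes plus node 12 (weight 5), then `predict(prefs, 1)` yields a single prediction for node 12 with a score of about 5.0. Dividing by the summed similarities keeps w2 on the same scale as the input weights.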
Enter Mahout
• Apache Mahout is a scalable machine learning library designed for large data sets
• Became a top-level Apache project in spring 2010
• Grew out of the Apache Lucene project (the basis for Apache Solr)
• Absorbed the Taste collaborative filtering project
Use Cases
• Recommendation mining
• Clustering
• Classification
• Frequent itemset mining
Out-of-box algorithms
• Recommendation:
  • User-based recommender
  • Item-based recommender
  • Slope-One recommender
  • Distributed item-based collaborative filtering
  • Collaborative filtering using parallel matrix factorisation
• Clustering:
  • Canopy Clustering
  • K-Means Clustering
  • Fuzzy K-Means
  • Mean Shift Clustering
  • Dirichlet Process Clustering
  • Latent Dirichlet Allocation
  • Spectral Clustering
  • Minhash Clustering
• Model combination
• Naive Bayes algorithm
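Of the recommenders listed, Slope One is compact enough to sketch in a few lines. Below is a minimal in-memory version of the weighted Slope One scheme: for each pair of items it averages the rating difference across users who rated both, then predicts a rating as the user's existing ratings shifted by those average differences, weighted by how many users co-rated each pair. The function names and data layout are hypothetical; this is an illustration, not Mahout's implementation.

```python
from collections import defaultdict
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def build_deviations(ratings: Ratings):
    """Accumulate pairwise rating differences between items rated by the same user."""
    diff_sum = defaultdict(lambda: defaultdict(float))
    co_count = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for i, r_i in user_ratings.items():
            for j, r_j in user_ratings.items():
                if i != j:
                    diff_sum[i][j] += r_i - r_j
                    co_count[i][j] += 1
    return diff_sum, co_count


def slope_one_predict(user_ratings: Dict[str, float], item: str,
                      diff_sum, co_count) -> Optional[float]:
    """Weighted Slope One prediction of `item` for one user, or None if no overlap."""
    numerator = 0.0
    denominator = 0
    for j, r_j in user_ratings.items():
        count = co_count.get(item, {}).get(j, 0)  # .get avoids growing the defaultdicts
        if j == item or count == 0:
            continue
        mean_dev = diff_sum[item][j] / count  # average of (rating of item - rating of j)
        numerator += (r_j + mean_dev) * count
        denominator += count
    return numerator / denominator if denominator else None
```

With John rating items a/b/c as 5/3/2, Mark rating a/b as 3/4, and Lucy rating b/c as 2/5, the prediction for Lucy on item a is (2 + 0.5) x 2 + (5 + 3) x 1, divided by 3, i.e. 13/3 (about 4.33). The appeal of Slope One for a Drupal site is that the deviation table can be precomputed offline and updated incrementally.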
Hadoop
• Distributes processing across a cluster of machines
• Not trivial to set up
• Not yet supported by Recommender API (issue #1206840)
Recommender API
• Drupal 7 (alpha) and Drupal 6 (beta)
• Can run either on the same server as the Apache web server or on a remote server
• Java helper program (previously PHP)
  • Uses JDBC and the Java Persistence API (JPA)
• Drupal helper modules
Recommender API helper modules
• Browsing History Recommender
• OG Similar Groups module
• Ubercart Products Recommender
• Fivestar Recommender
• Points Voting Recommender
• Flag Recommender
Asynchronous operation
• async_command module
  • Talks to Mahout
  • Typically run via cron
• Results are stored directly in the Drupal database
  • In the Recommender tables
  • Via JDBC
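The cron-driven flow can be sketched as a batch job that replaces a whole set of predictions inside one transaction, so that Drupal page requests only ever read a complete set. This sketch assumes Python's sqlite3 as a stand-in for the Drupal database (the real helper is a Java program writing over JDBC); the `predictions` table, its columns and `app_id` are hypothetical and do not reflect the Recommender API's actual schema.

```python
import sqlite3
from typing import Iterable, List, Tuple


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table; the real Recommender API tables have their own names and columns.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS predictions ("
        " app_id INTEGER, uid INTEGER, nid INTEGER, score REAL)"
    )


def store_predictions(conn: sqlite3.Connection, app_id: int,
                      predictions: Iterable[Tuple[int, int, float]]) -> None:
    """Batch job (e.g. run from cron): swap in a fresh set of predictions atomically."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute("DELETE FROM predictions WHERE app_id = ?", (app_id,))
        conn.executemany(
            "INSERT INTO predictions (app_id, uid, nid, score) VALUES (?, ?, ?, ?)",
            [(app_id, uid, nid, score) for uid, nid, score in predictions],
        )


def top_for_user(conn: sqlite3.Connection, app_id: int, uid: int,
                 limit: int = 5) -> List[Tuple[int, float]]:
    """What a block or view would do at page-request time: a plain read, no model work."""
    cur = conn.execute(
        "SELECT nid, score FROM predictions WHERE app_id = ? AND uid = ?"
        " ORDER BY score DESC LIMIT ?",
        (app_id, uid, limit),
    )
    return cur.fetchall()
```

The design point is the split: the expensive Mahout run happens out of band, and the web tier only performs cheap reads against precomputed rows.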
Hosting Solutions
• Self-hosted: all-in-one (web server, database server, recommender server), which has its pros and cons
• Recommender API Cloud Service (looking for beta testers)
• Amazon Elastic MapReduce (EMR)
Installing Mahout
• Prerequisites:
  • Dedicated VM if possible
  • Linux, Mac OS X Leopard 10.5.6 or later, or Windows (Cygwin)
  • Java JDK 1.6
  • Maven 2.0.11 or higher (maven.apache.org)
Installing Mahout
• Building:
  • Follow the instructions at https://cwiki.apache.org/MAHOUT/buildingmahout.html
  • Use Maven to build the examples
Installing Mahout
• Testing: GroupLens data sets
  • On a single 2GHz server:
    • 100K ratings (1000 users, 1700 items): 9 minutes
    • 1M ratings (6000 users, 4000 items): 12 hours
    • 10M ratings (72,000 users, 10,000 items): fuggedaboutit (impractically long)
  • Using 6 concurrent 2GHz processing units:
    • 100K ratings (1000 users, 1700 items): 2 minutes
    • 1M ratings (6000 users, 4000 items): 2 hours
    • 10M ratings (72,000 users, 10,000 items): 11 days 20 hours
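The timings above can be turned into speedup and parallel-efficiency figures with simple arithmetic (speedup = single-server time / 6-unit time; efficiency = speedup / 6). The sketch below uses only the numbers on this slide; the function and variable names are illustrative.

```python
def speedup(serial: float, parallel: float, workers: int):
    """Return (speedup, parallel efficiency) for two run times in the same unit."""
    s = serial / parallel
    return s, s / workers


MINUTES_PER_HOUR = 60
MINUTES_PER_DAY = 24 * MINUTES_PER_HOUR

# Timings from the slide, in minutes: (single server, 6 processing units)
timings = {
    "100K": (9, 2),
    "1M": (12 * MINUTES_PER_HOUR, 2 * MINUTES_PER_HOUR),
}
for name, (single, six) in timings.items():
    s, eff = speedup(single, six, 6)
    print(f"{name}: speedup {s:.1f}x, efficiency {eff:.0%}")

# Growth on 6 units from 1M to 10M ratings (10x the data)
ten_m_six = 11 * MINUTES_PER_DAY + 20 * MINUTES_PER_HOUR
print(f"1M -> 10M on 6 units: {ten_m_six / timings['1M'][1]:.0f}x longer")
```

This prints a 4.5x speedup (75% efficiency) for 100K ratings, 6.0x (100%) for 1M ratings, and shows that going from 1M to 10M ratings on six units takes 142x longer for 10x the data: run time grows much faster than the data set.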
Installing Recommender API
• See http://drupal.org/node/1207634
• Configuration:
  • sites/all/modules/async_command/config.properties should match the database connection settings in settings.php
• Download and enable async_command
• Check /admin/config/search/recommender/admin
Usage
• Making recommendations:
  • User-user
  • User-item
  • Item-item
• Predictions and similarity scores feed back into Drupal via:
  • Blocks
  • Views
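For the item-item case with boolean data (a user flagged or viewed a node, or did not), a common choice is the Jaccard index, which on sets is the same measure Mahout calls the Tanimoto coefficient. The minimal sketch below computes item-item similarities from a hypothetical uid -> {nid} history and returns the top-N similar nodes, the kind of list a "related content" block or view would display. The names are hypothetical and the all-pairs loop is only suitable for small catalogues.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def item_similarities(history: Dict[int, Set[int]]) -> Dict[Tuple[int, int], float]:
    """Jaccard (Tanimoto) similarity between nodes, from boolean uid -> {nid} data."""
    users_by_item: Dict[int, Set[int]] = {}
    for uid, nids in history.items():
        for nid in nids:
            users_by_item.setdefault(nid, set()).add(uid)

    sims: Dict[Tuple[int, int], float] = {}
    for a, b in combinations(sorted(users_by_item), 2):
        shared = len(users_by_item[a] & users_by_item[b])
        if shared:  # store only pairs with at least one user in common
            sims[(a, b)] = shared / len(users_by_item[a] | users_by_item[b])
    return sims


def similar_items(nid: int, sims: Dict[Tuple[int, int], float],
                  limit: int = 3) -> List[Tuple[int, float]]:
    """Top-N most similar nodes, highest score first, ties broken by nid."""
    scored = []
    for (a, b), s in sims.items():
        if a == nid:
            scored.append((b, s))
        elif b == nid:
            scored.append((a, s))
    scored.sort(key=lambda t: (-t[1], t[0]))
    return scored[:limit]
```

For example, with users 1, 2 and 3 viewing nodes {10, 11}, {10, 11, 12} and {10, 12}, nodes 11 and 12 each score 2/3 against node 10, while node 13 (seen by an unrelated user) has no similar nodes.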
Case study: Data Mining and Recommendations in SARACEN
• SARACEN: http://www.saracen-p2p.eu/
• Feedback loop to measure the subjective quality of the recommendations
• Limited set of data, small user base
• The Recommender API provides an initial set of recommended videos
• The user can then watch a recommended video
• The user’s actions are incorporated into their implicit profile, which feeds back into the Recommender API
• The Recommender API generates new predictions based on the complete set of implicit profile metadata
SARACEN: Prototype
Recommender data sources
• Explicit data:
  • SARACEN account data, including location and language
  • Linked accounts and profiles, e.g. Facebook user profile, “likes”, connections, metadata
• Implicit data:
  • Activity history recorded during the user’s sessions:
    • Searches
    • Shared content
    • Viewed content
    • Albums (media containers)
    • Content ratings
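One way to feed implicit activity into a recommender is to fold each user's events on a node into a single signed-integer preference <uid, nid, w>. The sketch below does that with hypothetical per-action weights (the slides do not say how SARACEN scores activities) and centres 1 to 5 star ratings on zero so that low ratings count against an item. Searches are left out because they are not tied to a single node.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Hypothetical weights: the slides do not say how SARACEN scores each kind of activity.
ACTION_WEIGHTS = {"view": 1, "share": 3, "album_add": 4}

Event = Tuple[int, int, str, Optional[int]]  # (uid, nid, action, stars)


def implicit_preferences(events: Iterable[Event]) -> List[Tuple[int, int, int]]:
    """Fold activity events into signed-integer <uid, nid, w> preferences."""
    totals: Counter = Counter()
    for uid, nid, action, stars in events:
        if action == "rate" and stars is not None:
            # Centre a 1-5 star rating on 0 so low ratings become negative affinity.
            totals[(uid, nid)] += stars - 3
        else:
            # Unknown actions contribute nothing.
            totals[(uid, nid)] += ACTION_WEIGHTS.get(action, 0)
    # Drop pairs whose signals cancel out to zero.
    return sorted((uid, nid, w) for (uid, nid), w in totals.items() if w != 0)
```

For instance, a user who views a video twice and shares it once ends up with w = 5 for that node, while a one-star rating yields w = -2.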
Scalability
• You don’t need Hadoop if:
  • The number of users is orders of magnitude larger than the number of items
  • Users browse anonymously most of the time
  • Few users log in and need personalised recommendations
  • The item churn rate is relatively low
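One way to see why the first condition matters: in a naive all-pairs computation, an item-based recommender compares items with each other, so the number of candidate similarity pairs grows with the square of the item count, whereas a user-based one grows with the square of the user count. The figures below describe a hypothetical site, not SARACEN data.

```python
def pair_count(n: int) -> int:
    """Number of unordered pairs among n entities: n(n-1)/2."""
    return n * (n - 1) // 2


# Hypothetical site: one million users, ten thousand items.
users, items = 1_000_000, 10_000
print(f"item-item pairs: {pair_count(items):,}")  # 49,995,000
print(f"user-user pairs: {pair_count(users):,}")  # 499,999,500,000
```

With users outnumbering items 100 to 1, the item-item table has roughly 10,000 times fewer pairs, which is the kind of workload a single machine can often handle without a Hadoop cluster.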
Worth Considering
• Decreased transparency
• Decreased serendipity
• Sleep deprivation
Resources: Recommender API
• http://drupal.org/project/recommender
• http://recommenderapi.com/cloud
• https://cwiki.apache.org/confluence/display/MAHOUT
Resources: Mahout
• http://mahout.apache.org/
• Mahout in Action, Manning, ISBN 9781935182689: http://www.manning.com/owen/
• Harry Zhang, "The Optimality of Naive Bayes"
• Amazon Elastic MapReduce: http://aws.amazon.com/elasticmapreduce/
Acknowledgements
• Socially Aware, collaboRative, scAlable Coding mEdia distributioN (SARACEN): http://www.saracen-p2p.eu
• Funded within the European Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement 248474
Questions?
• Kendra Initiative
  • @kendra
  • http://www.kendra.org.uk
  • https://github.com/kendrainitiative
• Klokie Grossfeld
  • @klokie
  • klokie@kendra.org.uk
  • http://www.linkedin.com/in/klokie
• Daniel Harris
  • @dahacouk
  • daniel@kendra.org.uk
  • http://www.linkedin.com/in/dahacouk
Thanks
http://barcelona2012.drupaldays.org/abstracting-social-networking-functionality-drupal