Combining Content-based and Collaborative Filtering Gabriela Polčicová Pavol Návrat Department of Computer Science and Engineering, Slovak University of Technology polcicova@dcs.elf.stuba.sk navrat@elf.stuba.sk
Overview • Information Filtering and its Types • Combined Method • Experiment with Information Filtering Methods • Conclusions
Information Filtering (1) • delivery of relevant information to the people who need it • Types of Information Filtering • Content-based - for textual documents • Collaborative - for communities of users • Interests • information about interests - stored in profiles • opinions about documents - expressed as ratings • Ratings {i, j, rij} • for user i and item j, rij is the value of the rating
Information Filtering (2) • Filter • Rated items {user, item, value} - learning interests • Unrated items {user, item} - estimating the value of rating • Recommendations {user, item, estimation} - choosing recommendations
Content-based Filtering (1) • Basic idea • recommending documents based on content and properties of document • Profile • consists of keywords with assigned weights • only documents matching profile are recommended • Recommendations • based on objective measurable properties
Content-based Filtering (2) • Documents rated by the user (documents, ratings) - documents of interest • PROFILE - keywords, phrases with weights • Documents unrated by the user • Documents matching profile => recommended documents
Collaborative Filtering (1) • Basic idea • automating “word of mouth” • leverage opinions of like-minded users while making decisions • Schema • collecting users’ opinions • searching for like-minded users • making recommendations
Collaborative Filtering (2) • Profile of current user • Profiles of users 1, 2, 3, 4, 5 • Documents from like-minded users’ profiles => recommended documents
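Collaborative Filtering (3)
• Similarity measure: Pearson Correlation Coefficient
$$k_{ci} = \frac{\sum_{j \in I_{ci}} (r_{cj} - \bar{r}_c)(r_{ij} - \bar{r}_i)}{\sqrt{\sum_{j \in I_{ci}} (r_{cj} - \bar{r}_c)^2 \, \sum_{j \in I_{ci}} (r_{ij} - \bar{r}_i)^2}}$$
• Recommendations computation: weighted sum of ratings
$$\hat{r}_{cj} = \bar{r}_c + \frac{\sum_{i \in U_{cj}} (r_{ij} - \bar{r}_i)\, k_{ci}}{\sum_{i \in U_{cj}} |k_{ci}|}$$
Here c is the current user, $\bar{r}_c$ and $\bar{r}_i$ are the mean ratings of users c and i, and $\hat{r}_{cj}$ is the estimated rating of item j for user c.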
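The similarity measure and the weighted sum on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the slide does not define the index sets, so the sketch takes I_ci to be the items rated by both users and U_cj to be the other users who rated item j, and it computes each user's mean over all of that user's ratings. The function names and the nested-dictionary layout are hypothetical.

```python
from math import sqrt

def mean_rating(ratings, user):
    """Mean of all ratings given by one user (assumed definition of the mean)."""
    values = ratings[user].values()
    return sum(values) / len(values)

def pearson(ratings, c, i):
    """Coefficient k_ci over the items rated by both users c and i."""
    common = set(ratings[c]) & set(ratings[i])  # assumed meaning of I_ci
    mean_c, mean_i = mean_rating(ratings, c), mean_rating(ratings, i)
    num = sum((ratings[c][j] - mean_c) * (ratings[i][j] - mean_i) for j in common)
    den = sqrt(sum((ratings[c][j] - mean_c) ** 2 for j in common)
               * sum((ratings[i][j] - mean_i) ** 2 for j in common))
    return num / den if den else 0.0

def predict(ratings, c, item):
    """User c's mean plus the k_ci-weighted sum of other users' deviations."""
    num = den = 0.0
    for i in ratings:
        if i == c or item not in ratings[i]:  # assumed meaning of U_cj
            continue
        k = pearson(ratings, c, i)
        num += (ratings[i][item] - mean_rating(ratings, i)) * k
        den += abs(k)
    # No usable neighbour means no estimate (this affects coverage).
    return mean_rating(ratings, c) + num / den if den else None

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "c":  {"A": 4, "B": 2, "C": 5},
    "u1": {"A": 5, "B": 1, "C": 4, "D": 5},
    "u2": {"A": 1, "B": 5, "D": 2},
}
estimate_for_D = predict(ratings, "c", "D")
```

Returning `None` when no correlated neighbour rated the item is a design choice of the sketch; it makes explicit the cases in which the method cannot produce an estimate.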
Combining Content-based and Collaborative Filtering (1) • Computing estimates for missing ratings with the Content-based Filtering method, for each user • Searching for like-minded users • computing coefficient kci between the current and the i-th user (from ratings only) • computing coefficient kci’ between the current and the i-th user (from both ratings and estimates) • New recommendations computation • a weighted sum of ratings and estimates, where ratings are weighted by the coefficients kci and estimates by the coefficients kci’
Datasets for Experiments • Data: • EachMovie - users' ratings for movies www.research.digital.com/SRC/eachmovie/ • IMDB - textual information for CBF (movies' descriptions) www.imdb.com/ • Datasets: • A - ratings from the period up to Mar 1, 1996 (810 ratings from 71 users) • B - ratings from the period up to Mar 15, 1996 (2407 ratings from 131 users) • C - ratings from the period up to Apr 1, 1996 (12290 ratings from 651 users)
EachMovie Data and Constant Method • Constant Method rcj = 5
Experiments with Combination of Content-based and Collaborative Filtering (2) • Dataset • Divide dataset into training set (90%) and test set (10%) • Apply filtering methods and evaluate their performance • Content-based Filtering method (test, training sets) => recommendations • Collaborative Filtering method (test, training sets) => recommendations • Combined Filtering method (test, training sets) => recommendations • Constant method (test set) => recommendations • Evaluation of methods’ performance
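The 90% / 10% division can be illustrated with a short Python sketch. The slide does not say how ratings were assigned to the two sets, so the random shuffle, the fixed seed, and the function name `split_ratings` are assumptions made only for this example.

```python
import random

def split_ratings(triples, test_fraction=0.1, seed=0):
    """Shuffle (user, item, value) triples and split them into training and test sets.

    A random split is an assumption; the slide only gives the 90/10 proportions.
    """
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical toy data: 100 rating triples
triples = [(user, item, 1 + (user + item) % 5) for user in range(10) for item in range(10)]
training_set, test_set = split_ratings(triples)
```

Each method would then learn from `training_set` only and be scored on the ratings held out in `test_set`.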
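Metrics
• Coverage = percentage of items for which the method is able to compute estimates
• Accuracy = $\frac{|R \cap L| + |\bar{R} \cap \bar{L}|}{|L| + |\bar{L}|}$
• Precision = $\frac{|R \cap L|}{|R|}$, Recall = $\frac{|R \cap L|}{|L|}$
• F-measure = $\frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
• NMAE = $\frac{\sum |r_{ij} - \hat{r}_{ij}|}{n \cdot s}$
R - set of recommended items, L - set of liked items; $\bar{R}$ and $\bar{L}$ are their complements, and $\hat{r}_{ij}$ is the estimate of rating $r_{ij}$.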
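The metrics on this slide translate almost directly into set operations. The Python sketch below is an illustration, not the evaluation code used in the experiments. The slide does not define n and s in NMAE; the sketch assumes n is the number of test ratings that received an estimate and s is the width of the rating scale, passed in as the hypothetical parameter `scale`. All function names are likewise hypothetical.

```python
def precision_recall_f(recommended, liked):
    """Precision, recall and F-measure for sets R (recommended) and L (liked)."""
    hits = len(recommended & liked)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(liked) if liked else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

def accuracy(recommended, liked, all_items):
    """Share of items that are recommended and liked, or neither."""
    not_recommended = all_items - recommended
    not_liked = all_items - liked
    return (len(recommended & liked) + len(not_recommended & not_liked)) / len(all_items)

def coverage(test_keys, estimates):
    """Fraction of test (user, item) pairs for which an estimate exists."""
    return sum(1 for key in test_keys if key in estimates) / len(test_keys)

def nmae(actual, estimates, scale):
    """Mean absolute error divided by the rating-scale width (assumed meaning of s)."""
    errors = [abs(actual[k] - estimates[k]) for k in actual if k in estimates]
    return sum(errors) / (len(errors) * scale) if errors else None

# Hypothetical toy data
R, L, items = {1, 2, 3}, {2, 3, 4}, {1, 2, 3, 4, 5, 6}
actual = {("c", "A"): 4, ("c", "B"): 2, ("c", "C"): 3}
estimates = {("c", "A"): 5, ("c", "B"): 2}
scores = precision_recall_f(R, L)
```

Note that |L| plus the size of its complement is simply the number of items, which is why `accuracy` divides by `len(all_items)`.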
Conclusions • Combining content-based and collaborative filtering might help in the initial phase • Future work • Weighting of coefficients • Comparing the method with additional methods
Content-based Filtering - Vector Representation of Documents and Profiles
• Dictionary: D = ( … , computer, … , learning, … , machine, … )
• Document j, represented by TF-IDF weights: Wj = (0, … , 0, 0.5, 0, … , 0, 0.3, 0, … , 0, 0.2, 0, … , 0)
• Profile: $\mathit{profile}_i = \sum_{j=1}^{n} r_j \cdot w_{ij}$
• Similarity: $\mathrm{Sim}(W, \mathit{Profile}) = \frac{W \cdot \mathit{Profile}}{|W| \cdot |\mathit{Profile}|}$
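A minimal Python sketch of the two formulas on this slide follows: the profile as a rating-weighted sum of document vectors, and the normalized dot product used as similarity. It assumes the TF-IDF weights are already computed and stored as sparse dictionaries. The documents, weights, and function names are hypothetical and serve only as an illustration.

```python
from math import sqrt

def build_profile(doc_vectors, user_ratings):
    """profile_i = sum over rated documents j of r_j * w_ij."""
    profile = {}
    for doc, rating in user_ratings.items():
        for term, weight in doc_vectors[doc].items():
            profile[term] = profile.get(term, 0.0) + rating * weight
    return profile

def similarity(doc_vector, profile):
    """Normalized dot product of a document vector and a profile vector."""
    dot = sum(w * profile.get(term, 0.0) for term, w in doc_vector.items())
    norms = (sqrt(sum(w * w for w in doc_vector.values()))
             * sqrt(sum(w * w for w in profile.values())))
    return dot / norms if norms else 0.0

# Hypothetical TF-IDF vectors, stored sparsely as {term: weight}
doc_vectors = {
    "d1": {"computer": 0.5, "learning": 0.3, "machine": 0.2},
    "d2": {"film": 1.0},
    "d3": {"computer": 0.4, "machine": 0.6},
}
profile = build_profile(doc_vectors, {"d1": 5, "d2": 1})
score_d3 = similarity(doc_vectors["d3"], profile)
```

Because the user rated the computing document d1 highly and d2 poorly, the unrated document d3 scores closer to the profile than d2 does.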
Collaborative Filtering - Example A B C D E F G current1 4 5 1 3 5 1 2 21 3 2 5 3 5 1 4 5 4 1 4 2 4 5 2 4 2 5 2
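Combining Content-based and Collaborative Filtering (2)
• Similarity measure: Pearson Correlation Coefficient
$$k'_{ci} = \frac{\sum_{j \in I'_{ci}} (r^{CBF}_{cj} - \bar{r}_c)(r^{CBF}_{ij} - \bar{r}_i)}{\sqrt{\sum_{j \in I'_{ci}} (r^{CBF}_{cj} - \bar{r}_c)^2 \, \sum_{j \in I'_{ci}} (r^{CBF}_{ij} - \bar{r}_i)^2}}$$
• Recommendations computation: weighted sum of ratings and estimates
$$\hat{r}_{cj} = \bar{r}_c + \frac{\sum_{i \in U_{cj}} (r_{ij} - \bar{r}_i)\, k_{ci} + \sum_{i \in U'_{cj}} (r^{CBF}_{ij} - \bar{r}_i)\, k'_{ci}}{\sum_{i \in U_{cj}} |k_{ci}| + \sum_{i \in U'_{cj}} |k'_{ci}|}$$
Here the superscript CBF marks values that include content-based estimates, and primed symbols refer to computations over both ratings and estimates.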
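The combined prediction can be sketched in Python as follows. This is one hedged reading of the slide, not the authors' code. It assumes that U_cj holds the users who actually rated item j, that U'_cj holds the users who have only a content-based estimate for it, that k' is computed over vectors in which missing ratings are filled with CBF estimates, and that user means come from actual ratings only. The function names and the dictionary layout are hypothetical.

```python
from math import sqrt

def pearson(a, b, mean_a, mean_b):
    """Pearson-style coefficient over the items present in both dictionaries."""
    common = set(a) & set(b)
    num = sum((a[j] - mean_a) * (b[j] - mean_b) for j in common)
    den = sqrt(sum((a[j] - mean_a) ** 2 for j in common)
               * sum((b[j] - mean_b) ** 2 for j in common))
    return num / den if den else 0.0

def combined_predict(ratings, estimates, c, item):
    """ratings: user -> {item: rating}; estimates: user -> {item: CBF estimate}."""
    means = {u: sum(r.values()) / len(r) for u, r in ratings.items()}
    # Ratings take precedence; CBF estimates only fill the gaps.
    filled = {u: {**estimates.get(u, {}), **ratings[u]} for u in ratings}
    num = den = 0.0
    for i in ratings:
        if i == c:
            continue
        if item in ratings[i]:                # user in U_cj: real rating, weight k_ci
            k = pearson(ratings[c], ratings[i], means[c], means[i])
            num += (ratings[i][item] - means[i]) * k
            den += abs(k)
        elif item in estimates.get(i, {}):    # user in U'_cj: estimate, weight k'_ci
            k_prime = pearson(filled[c], filled[i], means[c], means[i])
            num += (estimates[i][item] - means[i]) * k_prime
            den += abs(k_prime)
    return means[c] + num / den if den else None

# Hypothetical toy data: u2 has not rated D but has a CBF estimate for it
ratings = {
    "c":  {"A": 4, "B": 2},
    "u1": {"A": 5, "B": 1, "D": 5},
    "u2": {"A": 4, "B": 2},
}
estimates = {"u2": {"D": 4.5}}
with_estimates = combined_predict(ratings, estimates, "c", "D")
ratings_only = combined_predict(ratings, {}, "c", "D")
```

With an empty `estimates` dictionary the function reduces to the purely collaborative weighted sum, so the two results show how much a single content-based estimate shifts the prediction.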
Experiments with Combination of Content-based and Collaborative Filtering (1) • Content-based Filtering Method (CBF) • documents and profiles: vector representation - weighted keywords (TF-IDF) • estimation computation: normalized dot product of document and profile vectors • Collaborative Filtering (CF) • Pearson correlation coefficient • weighted sum of ratings • Combination of CF and CBF • Pearson correlation coefficients • weighted sum of ratings and CBF estimations • Constant Method (rcj = 5)