Collaborative Filtering Rong Jin Dept. of Computer Science and Engineering Michigan State University
Information Filtering • Basic filtering question: will user U like item X? • Two different ways of answering it • Look at what U likes, then characterize X: content-based filtering • Look at who likes X, then characterize U: collaborative filtering
Collaborative Filtering (Resnick et al., 1994) Make recommendation decisions for a specific user based on the judgments of users with similar interests.
A General Strategy (Resnick et al., 1994) • Identify the training users that share interests similar to those of the test user • Predict the ratings of the test user as the average of the ratings by those similar training users
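The two-step strategy above (find similar training users, then average their ratings) can be sketched as a small memory-based predictor. This is an illustrative reconstruction, not the authors' code; the rating matrix, the test user, and the choice of k are made-up example values.

```python
import numpy as np

def pearson_sim(a, b):
    """Pearson correlation over items co-rated by both users (NaN = unrated)."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    if mask.sum() < 2:
        return 0.0
    x, y = a[mask], b[mask]
    sx, sy = x.std(), y.std()
    if sx == 0 or sy == 0:
        return 0.0
    return float(((x - x.mean()) * (y - y.mean())).mean() / (sx * sy))

def predict(ratings, test_user, item, k=2):
    """Predict test_user's rating of `item` as the similarity-weighted
    average of the k most similar training users who rated the item."""
    sims = []
    for u in range(ratings.shape[0]):
        if not np.isnan(ratings[u, item]):
            sims.append((pearson_sim(test_user, ratings[u]), ratings[u, item]))
    sims.sort(reverse=True)
    top = sims[:k]
    num = sum(s * r for s, r in top)
    den = sum(abs(s) for s, r in top)
    return num / den if den > 0 else float(np.nanmean([r for _, r in sims]))

# Toy example (made-up ratings): 3 training users x 4 items.
ratings = np.array([[5, 4, 1, 5],
                    [4, 5, 1, 4],
                    [1, 1, 5, 1]], dtype=float)
alice = np.array([5, 4, 1, np.nan])   # test user; item 3 is unrated
pred = predict(ratings, alice, item=3, k=2)
```

Here `alice` agrees closely with the first two training users, so the prediction for item 3 lands near their high ratings rather than near the dissimilar third user's rating.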
Important Problems in Collaborative Filtering • How to estimate users’ similarity when rating data are sparse? • Most users only rate a few items • How to identify the interests of a test user who only provides ratings for a few items? • Most users are too impatient to rate many items • How to combine collaborative filtering with content filtering? • For movie ratings, both the content information and the user ratings are available
Problem I: How to Estimate Users’ Similarity based on Sparse Rating Data?
Sparse Data Problem(Breese et al., 1998) Most users only rate a small number of items and leave most items unrated
Flexible Mixture Model (FMM) (Si & Jin, 2003) • Cluster training users with similar interests • Cluster items with similar ratings (the slide groups movies into Movie Type I, II, and III) • Introduce rating uncertainty • Unknown ratings are gone! • Cluster both users and items simultaneously
Flexible Mixture Model (FMM) (Si & Jin, 2003) • Graphical model with hidden cluster variables Zu (user class) and Zo (item class), and observed variables U (user), O (item), and R (rating) • An Expectation Maximization (EM) algorithm can be used to identify the clustering structure for both users and items
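Reading off the dependencies stated for the graphical model (user and item each generated from their cluster, rating generated from the pair of clusters), the FMM joint distribution can be sketched as follows; this is a reconstruction from the slide, with m user classes and n item classes:

```latex
P(u, o, r) \;=\; \sum_{z_u = 1}^{m} \sum_{z_o = 1}^{n}
  P(z_u)\, P(z_o)\, P(u \mid z_u)\, P(o \mid z_o)\, P(r \mid z_u, z_o)
```

EM then alternates between computing the posterior over the hidden pair (z_u, z_o) for each observed rating triple and re-estimating the component distributions.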
Rating Variance (Jin et al., 2003a) • The Flexible Mixture Model assumes that users with similar interests give similar ratings to the same items • But different users with similar interests may have different rating habits
Decoupling Model (DM) (Jin et al., 2003b) • Graphical model with hidden variables Zu (user class), Zo (item class), Zpref (whether the user likes the item), and ZR (rating class) • Observed variables: U (user), O (item), and R (rating)
Empirical Studies • EachMovie dataset: • 2000 users and 1682 movie items • Avg. # of rated items per user is 130 • Rating range: 0-5 • Evaluation protocol • 400 training users, and 1600 testing users • Numbers of items rated by a test user: 5, 10, 20 • Evaluation metric: MAE • MAE: mean absolute error between true ratings and predicted ratings • The smaller the MAE, the better the performance
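The MAE metric used throughout the evaluation can be sketched directly:

```python
def mean_absolute_error(true_ratings, predicted_ratings):
    """MAE: mean absolute difference between true and predicted ratings.
    The smaller the MAE, the better the performance."""
    pairs = list(zip(true_ratings, predicted_ratings))
    return sum(abs(t - p) for t, p in pairs) / len(pairs)
```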
Baseline Approaches • Ignore unknown ratings • Vector similarity (Breese et al., 1998) • Fill in unknown ratings for individual users with their average ratings • Personality diagnosis (Pennock et al., 2000) • Pearson correlation coefficient (Resnick et al., 1994) • Only cluster users • Aspect model (Hofmann & Puzicha, 1999)
Summary • The sparse data problem is important to collaborative filtering • Flexible Mixture Model (FMM) is effective • Cluster both users and items simultaneously • Decoupling Model (DM) provides additional improvement for collaborative filtering • Take into account rating variance among users of similar interests
Problem II: How to Identify Users’ Interests Based on a Few Rated Items?
Identify Users’ Interests • To identify the interests of a user, the system needs to ask the user to rate a few items • Given that a user is only willing to rate a few items, which items should be selected to solicit ratings?
Active Learning Approaches (Ross & Zemel, 2002) • Selective sampling • Ask a user to rate the items that are most informative about the user’s interests • A general strategy • Define a loss function that represents the uncertainty in determining the user’s interests • Choose the item whose rating will result in the largest reduction in the loss function
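The general strategy (define a loss, pick the item that most reduces it in expectation) can be sketched with the entropy of the user-class posterior as a hypothetical loss; the class posterior and rating distributions below are toy assumptions for illustration, not the paper's model:

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)

def select_item(class_post, p_rating, candidates):
    """Pick the candidate item whose rating, in expectation, most
    reduces the entropy of the user-class posterior.
    class_post: p(z | D), a list over m user classes.
    p_rating[x][z][r]: p(r | item x, user class z)."""
    best, best_loss = None, float("inf")
    for x in candidates:
        exp_loss = 0.0
        for r in range(len(p_rating[x][0])):
            # Marginal probability of observing rating r on item x.
            pr = sum(class_post[z] * p_rating[x][z][r]
                     for z in range(len(class_post)))
            if pr == 0:
                continue
            # Bayes update of the class posterior after observing (x, r).
            post = [class_post[z] * p_rating[x][z][r] / pr
                    for z in range(len(class_post))]
            exp_loss += pr * entropy(post)
        if exp_loss < best_loss:
            best, best_loss = x, exp_loss
    return best
```

For example, an item whose rating distribution is identical across user classes tells us nothing, while an item rated very differently by different classes sharply concentrates the posterior, so the latter is selected.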
Active Learning Approach (I)(Jin & Si, 2004) • Select the items that have the largest variance in the ratings by the most similar users
Active Learning Approach (II) (Jin & Si, 2004) • Consider all the training users when selecting items • Weight training users by their similarities when computing the “uncertainty” of items
A Bayesian Approach for Active Learning (Jin & Si, 2004) • Flexible Mixture Model: the key is to determine the user class of a test user • Let D be the ratings already provided by test user y: D = {(x1, r1), …, (xk, rk)} • Let θ be the distribution over user classes for test user y estimated from D: θ = {θz = p(z|y) | 1 ≤ z ≤ m}
A Bayesian Approach for Active Learning (Jin & Si, 2004) • If the true user-class distribution θtrue of the test user were given, we would select the item x* whose rating is expected to bring the estimated distribution closest to θtrue • Let θx,r be the distribution over user classes for test user y estimated from D + (x, r) • Take into account the uncertainty in rating prediction
A Bayesian Approach for Active Learning (Jin & Si, 2004) • But in reality, we never know the true user-class distribution θtrue of the test user • Replace θtrue with the posterior distribution p(θ|D) • This captures two types of uncertainty: uncertainty in the user-class distribution and uncertainty in rating prediction
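One plausible way to write the resulting selection criterion, with θ the user-class distribution for the test user, θ_{x,r} its re-estimate after adding the rating (x, r), and L a loss measuring the remaining uncertainty (this is a reconstruction consistent with the slides; the paper's exact loss may differ):

```latex
x^{*} \;=\; \arg\min_{x} \;
  \mathbb{E}_{\theta \sim p(\theta \mid D)}
  \left[ \sum_{r} p(r \mid x, \theta)\, L\!\left(\theta_{x,r}\right) \right]
```

Averaging over both the posterior p(θ|D) and the predictive rating distribution p(r|x, θ) is what accounts for the two kinds of uncertainty at once.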
Computational Issues • Estimating the posterior p(θ|D) is computationally expensive • Computing the expectation over p(θ|D) is also expensive
Approximate Posterior Distribution (Jin & Si, 2004) • Approximate p(θ|D) by a Laplace approximation • Expand the log-likelihood function around its maximum point θ*
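A standard Laplace approximation, sketched here in generic form since the slide does not give the exact expression: expand the log posterior to second order around its maximum θ*,

```latex
\log p(\theta \mid D) \;\approx\; \log p(\theta^{*} \mid D)
  \;-\; \tfrac{1}{2}\, (\theta - \theta^{*})^{\top} H\, (\theta - \theta^{*}),
\qquad
H \;=\; -\nabla^{2}_{\theta} \log p(\theta \mid D)\,\big|_{\theta = \theta^{*}}
```

which amounts to approximating p(θ|D) by a Gaussian with mean θ* and covariance H⁻¹; expectations under a Gaussian are then tractable.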
Compute Expectation (Jin & Si, 2004) • The expectation can be computed analytically under the approximate posterior distribution p(θ|D)
Empirical Studies • EachMovie dataset • 400 training users, and 1600 test users • For each test user • Initially provides 3 rated items • 5 iterations, and 4 items are selected for each iteration • Evaluation metric • Mean Absolute Error (MAE)
Baseline Approaches • The random selection method • Randomly select 4 items per iteration • The model entropy method • Select items that result in the largest reduction in the entropy of the distribution p(θ|D) • Considers only the uncertainty in the model distribution • The prediction entropy method • Select items that result in the largest reduction in the uncertainty of rating prediction • Considers only the uncertainty in rating prediction
Summary • Active learning is effective for identifying users’ interests • It is important to take into account every bit of uncertainty when applying active learning methods
Problem III: How to Combine Collaborative Filtering with Content Filtering?
Linear Combination (Good et al., 1999) • Build separate prediction models from the content information and the collaborative information • Linearly combine their predictions
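A minimal sketch of the linear combination, assuming a single global mixing weight alpha (a hypothetical parameter, e.g. tuned on held-out data; the actual weighting scheme in Good et al. may differ):

```python
def combined_prediction(content_pred, collab_pred, alpha=0.5):
    """Blend a content-based prediction with a collaborative one.
    alpha = 1.0 trusts content only; alpha = 0.0 trusts collaboration only."""
    return alpha * content_pred + (1.0 - alpha) * collab_pred
```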