10_16_10 The Netflix Program:


caitir

Presentation Transcript


  1. 10_16_10 The Netflix Program:

Program flow (from the slide diagram, whose components are mpp-mpred.C, mpp-user.C, user-vote.C, movie-vote.C and prune.C): mpp-mpred.C passes each (Mi, ProbeSup(Mi) = {Ui1, …, Uik}) to mpp-user.C. mpp-user.C loops through ProbeSup and, from the user vote and the movie VOTE, writes Predict(Mi, Uik) to predictions for every Uik ∈ ProbeSup(Mi). The voting programs receive (Mi, Sup(Mi), Uik, Sup(Uik)) and return vote(Mi, Uik) and VOTE(Mi, Uik).

Netflix Classification: use the Rents training table, Rents(MID, UID, Rating, Date), with class label Rating, to classify new (MID, UID, Date) tuples (i.e., predict ratings).

Nearest Neighbor User Voting: uid votes on rating(MID, UID) if it is near enough to UID in its ratings of the movies M = {mid1, ..., midk} (i.e., "near" is based on a User-User correlation over M). Open choices: which User-User correlation (Pearson, Cosine, ?) and which set M = {mid1, …, midk}.

Nearest Neighbor Movie Voting: mid votes on rating(MID, UID) if its ratings by U = {uid1, ..., uidk} are near enough to those of MID (i.e., "near" is based on a Movie-Movie correlation over U). Open choices: which Movie-Movie correlation (Pearson, Cosine, ?) and which set U = {uid1, …, uidk}.

mpp-mpred.C reads PROBE, loops through the pairs (Mi, ProbeSup(Mi)), and passes each to mpp-user.C. mpp-mpred.C can call separate instances of mpp-user.C for many Us in parallel (governed by the number of slots).

mpp-user.C loops through ProbeSup(M), reads the config file, and prints prediction(M, U) to predictions. For user votes, mpp-user.C calls user-vote.C. For movie votes, mpp-user.C calls movie-vote.C.

user-vote.C prunes, then loops through the user voters, V, calculating a V-vote for each. It combines the V-votes and returns the vote. movie-vote.C is similar.

We must loop through the V's (VPHD rather than HPVD) because the HP required by most correlation calculations is impossible using AND/OR/COMP.

Today we will take a close look at the data mining algorithms in movie-vote.C (first the Nearest Neighbor Classification code, then the ARM code, then ???). Similar (dual) code either exists or will exist in user-vote.C. The file movie-vote-full.C contains the ARM attempts, the Boundary-based attempts and the Nearest Neighbor Classification attempts. The file movie-vote-justNN.C contains only the NN attempts (so we will start with that).

A long-term goal: generalize the code away from the Netflix problem and toward a generic data mining system (e.g., for use by the Treeminer Corp. on, say, satellite imagery?).
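To make the Nearest Neighbor Movie Voting idea concrete, here is a minimal sketch. It is not the code in movie-vote.C, which the slides do not show. It assumes an in-memory map in place of the Rents table, Pearson as the Movie-Movie correlation, the full co-support of the two movies as the set U, and a correlation-weighted mean as the way votes are combined. The names `Ratings`, `pearson`, `movie_correlation` and `movie_vote` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for Rents(MID, UID, Rating, Date):
// ratings[mid][uid] = rating. The Date column is ignored in this sketch.
using Ratings = std::map<int, std::map<int, double>>;

// Pearson correlation of two equal-length rating vectors.
// Returns 0 when there are fewer than 2 entries or a vector has no variance.
double pearson(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    if (n < 2) return 0.0;
    double mean_a = 0.0, mean_b = 0.0;
    for (std::size_t i = 0; i < n; ++i) { mean_a += a[i]; mean_b += b[i]; }
    mean_a /= n;
    mean_b /= n;
    double sab = 0.0, saa = 0.0, sbb = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double da = a[i] - mean_a, db = b[i] - mean_b;
        sab += da * db;
        saa += da * da;
        sbb += db * db;
    }
    if (saa == 0.0 || sbb == 0.0) return 0.0;
    return sab / std::sqrt(saa * sbb);
}

// Movie-Movie correlation over U = the users who rated both m and n.
double movie_correlation(const Ratings& r, int m, int n) {
    const auto im = r.find(m);
    const auto in = r.find(n);
    if (im == r.end() || in == r.end()) return 0.0;
    std::vector<double> a, b;
    for (const auto& user_rating : im->second) {
        const auto it = in->second.find(user_rating.first);
        if (it != in->second.end()) {
            a.push_back(user_rating.second);
            b.push_back(it->second);
        }
    }
    return pearson(a, b);
}

// Predict rating(M, U): each movie N rated by U votes with U's rating of N,
// provided its correlation with M is positive and reaches `threshold`.
// Votes are combined as a correlation-weighted mean (an assumption).
// Returns `fallback` when no voter qualifies.
double movie_vote(const Ratings& r, int M, int U, double threshold, double fallback) {
    double num = 0.0, den = 0.0;
    for (const auto& movie : r) {
        if (movie.first == M) continue;
        const auto it = movie.second.find(U);
        if (it == movie.second.end()) continue;      // N is not in Sup(U)
        const double c = movie_correlation(r, M, movie.first);
        if (c < threshold || c <= 0.0) continue;     // reject weak voters
        num += c * it->second;
        den += c;
    }
    return den > 0.0 ? num / den : fallback;
}
```

The dual user-voting sketch would swap the roles of movies and users: correlate UID with each candidate voter over the movies both have rated, and let qualifying voters vote with their rating of MID.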

  2. How does one specify prunings?

mpp-mpred.C specifies the type of prune. There are 3 types: UserPrune, with a full range of possibilities; UserFastPrune, with just PearsonCorrelation pruning; and CommonCoSupportPrune, which orders the users, V, according to the size of their CommonCoSupport with U only (note that this is a correlation of sorts too). The programs involved are mpp-user.C, movie-vote.C, user-vote.C and prune.C.

In a file (named config) there is a section for specifying the parameters for user-voting and a separate section for specifying the parameters for movie-voting. E.g., for movie voting, at the bottom, there are 3 external prunings possible (0 or more can be chosen):
1. an initial pruning of the dimensions to be used (since the dimensions are users, it prunes supM);
2. a pruning of the movie voters, N (in supU);
3. a final pruning of the dimensions (CoSupport(M,N)) for the specific movie voter, N.

Finally, note that internal to user-vote and movie-vote are "internal prunings", in which voters are rejected (during their loop pass) if they fail to meet certain correlation levels. This type of internal pruning is somewhat redundant with the external prunings.

E.g., the parameters for the final prune are specified as below. The slide annotates each parameter; the annotations are paired with the parameters here by name.

[movie_voting Prune_Users_in_CoSupMN]
method = UserCommonCoSupportPrune
leftside = 0 (specify leftside (from Uid) of an ID interval prune of supM)
width = 8000 (specify the width of an ID interval prune of supM)
mstrt = 0, mstrt_mult = 0.0 (specify starting movie (intercept and slope) for N loop)
ustrt = 0, ustrt_mult = 0.0 (specify starting movie (intercept and slope) for V loop)
TSa = -100 (specify PearsonCorr threshold; a=Amal, meaning: use Amal's table lookup)
TSb = -100 (specify PearsonCorr threshold; b=Bill, meaning: use Bill's formula; note that after prior pruning this will have a different value than Amal's)
Tdvp = -1 (threshold for "diff of vectors" population-based std_dev prune)
Tdvs = -1 (threshold for "diff of vectors" sample-based std_dev prune)
Tvdp = -1 (threshold for "vector of diffs" population-based std_dev prune)
Tvds = -1 (threshold for "vector of diffs" sample-based std_dev prune)
TD = -1 (threshold for (Gaussian of) Euclidean distance based prune)
TP = -1 (threshold for (Gaussian of) 1perpendicular distance prune)
PPm = .1 (exponent for (Gaussian of) 1perpendicular distance prune)
TV = -1 (threshold for (Gaussian of) a variation based prune)
TSD = -1 (threshold for std_dev based prune)
Ch = 1 (picks the ordering for the count-based prune below: 1=Amal_Pearson, 2=Bill_Pearson, etc.)
Ct = 2 (threshold for count-based prune)

Note: all thresholds are for similarities, not distances; i.e., when we start with a distance we follow it with the Gaussian to make it a similarity or correlation.
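The note about following a distance with the Gaussian can be illustrated with a short sketch. It is an assumption-laden illustration rather than the code in prune.C: the slides do not give the Gaussian's form or bandwidth, so the standard form exp(-d²/(2σ²)) and the parameter `sigma` are assumptions, and the names `euclidean_distance`, `gaussian_similarity` and `passes_prune` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Euclidean distance between two equal-length rating vectors.
double euclidean_distance(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Gaussian of a distance: 1 at d = 0 and decaying toward 0 as d grows,
// so a small distance becomes a high similarity.
// `sigma` is an assumed bandwidth; the slides do not specify one.
double gaussian_similarity(double d, double sigma) {
    assert(sigma > 0.0);
    return std::exp(-(d * d) / (2.0 * sigma * sigma));
}

// A voter survives the prune when its similarity reaches the threshold.
bool passes_prune(double similarity, double threshold) {
    return similarity >= threshold;
}
```

Under this reading, a Gaussian similarity always lies in [0, 1], so a threshold of -1 (as in the TD, TP and TV settings above) would reject no voter, and a Pearson threshold of -100 would likewise reject none. Whether the actual code treats those values as "prune disabled" is not stated in the slides.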
