Mining customer ratings for product recommendation using the support vector machine and the latent class model
William K. Cheung, James T. Kwok, Martin H. Law, Kwok-Ching Tsui
Intelligent Systems Research Group, BT Laboratories · Hong Kong Baptist University
What is a Recommender System?
[Diagram: the records of other customers (possibly with ratings) feed into a recommender system, which produces recommendations for the current customer.]
Product Recommendation in E-commerce
[Screenshot: products and recommendations on www.amazon.com]
Product Recommendation in E-commerce
[Screenshot: products and recommendations on www.cdnow.com]
Overview
[Diagram: a content-based recommender system matches a personal profile against product content using the Support Vector Machine (SVM); a collaborative recommender system matches a customer's ratings against the records of other customers (possibly with ratings) using the Extended Latent Class Model (ELCM).]
Presentation Outline
• Content-based Recommendation
  • Existing Solutions and Their Limitations
  • Our Proposed Solution - the SVM
• Collaborative Recommendation
  • Existing Solutions and Their Limitations
  • Our Proposed Solution - the Extended LCM
• Experimental Evaluation
• Conclusion and Future Work
Content-based Recommendation
[Diagram: a content-based recommender system matches a personal profile against product descriptions.]
• Matching between the personal profile and the features extracted from product descriptions.
• Assumptions:
  • Customer personal profiles are available.
  • Detailed product descriptions are available, so that a set of representative features can be extracted.
  • Both the profiles and the product descriptions share the same representation.
Some Existing Solutions
• Keyword Matching
  • suffers from the problems of synonymy and polysemy.
• Pattern Classification Approaches
  • f(y) = {f_1(y), f_2(y), ..., f_m(y)}: the set of features for product y
  • a_x(f(y)): the classifier output for customer x's interest in y, obtained via training.
  • Examples of classifiers: Naïve Bayes, k-NN, C4.5 (decision tree)
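To make the pattern-classification view concrete, here is a minimal sketch, assuming scikit-learn and toy data (the movie descriptions and labels are illustrative, not from the paper): a per-customer Naïve Bayes classifier is trained on the descriptions of products the customer has rated, then used to score unseen products.

```python
# A toy content-based recommender: train a per-customer classifier a_x on
# rated product descriptions, then score unseen products.
# Assumptions: scikit-learn is available; all data below is illustrative.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

descriptions = [
    "action thriller car chase",       # products rated by customer x
    "romantic comedy wedding",
    "sci-fi space action adventure",
    "romantic drama tears",
]
liked = [1, 0, 1, 0]                   # 1 = interested, 0 = not interested

vectorizer = CountVectorizer()         # extracts f(y) = {f_1(y), ..., f_m(y)}
F = vectorizer.fit_transform(descriptions)

clf = MultinomialNB().fit(F, liked)    # a_x, learned from x's ratings

new_products = ["space action sequel", "wedding comedy"]
scores = clf.predict_proba(vectorizer.transform(new_products))[:, 1]
for title, s in zip(new_products, scores):
    print(f"{title}: P(interested) = {s:.2f}")
```

Any of the classifiers listed above (k-NN, C4.5) could be swapped in for MultinomialNB in the same pattern.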
Feature Selection Problem
• The performance of content-based recommendation depends heavily on the discriminative power of the selected features.
  • Too few features => useful profiles are hard to learn (the analysis is too shallow).
  • Too many features => the classifier's parameters are hard to estimate with good generalisation performance.
Our Proposed Solution - the use of SVM
• The Support Vector Machine has been shown to achieve good generalisation performance on high-dimensional classification problems, and its training can be framed as solving a quadratic programming problem.
• => one can simply use all extracted features as the input; no feature selection is needed at all.
Support Vector Machine (SVM)
• Intuitively, maximize the margin between classes
• Theoretically sound
  • related to minimizing the VC-dimension under the theory of structural risk minimization
[Figure: two classes separated by a line; the margin is the gap between the closest points of each class.]
Solving for the line
• Computationally, this leads to a quadratic programming problem:
  • maximize a quadratic objective function subject to some linear constraints
  • no local maxima (cf. neural networks)
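For reference, the standard formulation behind these two slides (textbook SVM material, not copied from the slides; t_i ∈ {−1, +1} denotes the class label, chosen here to avoid clashing with the product symbol y used elsewhere in this talk):

```latex
% Maximizing the margin 2/\|w\| is equivalent to the primal QP
\min_{w,b}\ \tfrac{1}{2}\|w\|^2
\quad \text{s.t.} \quad t_i\,(w^\top x_i + b) \ge 1,\quad i = 1,\dots,n.
% Its dual is a QP in the multipliers \alpha_i with only linear constraints:
\max_{\alpha}\ \sum_i \alpha_i
  - \tfrac{1}{2}\sum_{i,j}\alpha_i\alpha_j\, t_i t_j\, x_i^\top x_j
\quad \text{s.t.} \quad \alpha_i \ge 0,\ \ \textstyle\sum_i \alpha_i t_i = 0.
```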
Support Vectors
• The separating line depends only on a small number of training examples: the support vectors, i.e. those with nonzero multipliers α_i in the dual.
Nonlinear Cases
• use another coordinate system such that the "curve" becomes a "line"
Kernels
• Only inner products, φ(x)ᵀφ(y), are involved in the calculation.
• Under certain conditions, there exists a kernel K such that K(x,y) = φ(x)ᵀφ(y)
  • e.g. polynomial of degree d: K(x,y) = (xᵀy + 1)^d
• replace xᵀy in the dual by K(x,y) = φ(x)ᵀφ(y)
Overlapping Cases
• When it is impossible to perfectly separate the two classes, include an error term.
• Instead of maximizing the margin alone, minimize a weighted sum of the training error and 1/margin.
• Again, this involves only quadratic programming.
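Pulling the last few slides together, a minimal scikit-learn sketch (an illustration on synthetic data, not the paper's implementation): kernel="poly" with gamma=1 and coef0=1 gives exactly the polynomial kernel K(x,y) = (xᵀy + 1)^d above, and C weighs the error term of the soft margin.

```python
# Soft-margin SVM with a polynomial kernel, as sketched in the slides.
# Assumptions: scikit-learn available; synthetic data stands in for the
# high-dimensional product-feature vectors used in the paper.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))              # 200 products, 50 features
w_true = rng.normal(size=50)
t = (X @ w_true + 0.5 * rng.normal(size=200) > 0).astype(int)  # noisy labels

# K(x, y) = (x.y + 1)^2; C controls the error/margin trade-off
clf = SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0, C=1.0)
clf.fit(X[:150], t[:150])

print("test accuracy:", clf.score(X[150:], t[150:]))
print("number of support vectors:", clf.n_support_.sum())
```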
Collaborative Recommendation
[Diagram: a collaborative recommender system matches a customer's product ratings against the records of other customers (possibly with ratings).]
• Matching between the customer's ratings and the ratings of others (the word-of-mouth approach).
• Assumptions:
  • Ratings from a reasonably large group of customers are available.
  • Each product has been rated by some of the customers.
  • The customers' ratings overlap to a certain degree.
Some Existing Solutions
• Memory-based Approach
  • Pearson correlation coefficient and its variants
  • suffers from the sparsity and first-rater problems.
• Model-based Approach
  • solves the sparsity problem by incorporating a priori models.
  • e.g., Naïve Bayes classifier, Bayesian network, Latent Class Model
Limitations
• The sparsity problem (lacking sufficient ratings)
• The first-rater problem (encountering new products)

                    y1  y2  y3  y4  y5  y6  y7  y8
  Customer x1        5   -   -   4   -   -   -   -
  Customer x2        -   5   4   -   -   -   -   -
  Customer x3        1   -   4   -   4   -   -   -
  A New Customer xn  5   -   -   -   -   -   -   -
Grouping Preference Ratings - to solve the sparsity problem
[Illustration: the ratings matrix from the previous slide, with customers grouped into Preference Pattern #1 and Preference Pattern #2; within each group, products rated highly by other members are marked "Recommended!" for the new customer xn.]
Integrating Product Contents - to solve the first-rater problem
[Illustration: the same ratings matrix; a product with no ratings yet is linked to a preference pattern through its content and marked "Recommended!" for the customers matching that pattern.]
Our Proposed Solution - the use of LCM
• The latent class model was proposed by Thomas Hofmann et al. (IJCAI'99) for clustering preference ratings, with promising results.
• Limitation: it can only recommend products to customers in the training set.
• We extend the model so that
  a) existing products can be recommended to customers not in the training set;
  b) new products can be recommended to existing customers (not described in the paper).
Latent Class Model
[Diagram: customer X and product Y are observed variables; the preference pattern Z is hidden, with X and Y conditionally independent given Z.]
• Model training: learn P(z), P(x|z) and P(y|z) using the EM algorithm.
• The model is initialized by k-means clustering.
Existing Products to Existing Customers
• Compute the probability that x is interested in y: P(y|x) = Σ_z P(y|z) P(z|x), with P(z|x) ∝ P(z) P(x|z).
• Products can then be sorted according to P(y|x) for recommendation.
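The training slide and this recommendation step, as one compact numpy sketch (illustrative, not the paper's code): the EM updates are the standard aspect-model ones; random initialization stands in for the k-means initialization mentioned on the training slide, and ratings are treated as co-occurrence weights, both simplifying assumptions.

```python
# EM for the latent class (aspect) model: P(x,y) = sum_z P(z) P(x|z) P(y|z).
# Assumptions: random init (the paper uses k-means) and ratings used as
# co-occurrence weights; all data is synthetic.
import numpy as np

rng = np.random.default_rng(0)
N = rng.integers(0, 6, size=(90, 500)).astype(float)  # customers x products
K = 8                                                 # number of patterns z

Pz = np.full(K, 1.0 / K)
Px_z = rng.random((90, K));  Px_z /= Px_z.sum(axis=0)
Py_z = rng.random((500, K)); Py_z /= Py_z.sum(axis=0)

for _ in range(50):
    # E-step: responsibilities P(z|x,y), shape (customers, products, K)
    R = Pz * Px_z[:, None, :] * Py_z[None, :, :]
    R /= R.sum(axis=2, keepdims=True)
    # M-step: re-estimate P(x|z), P(y|z), P(z) from rating-weighted counts
    W = N[:, :, None] * R
    Px_z = W.sum(axis=1); Px_z /= Px_z.sum(axis=0)
    Py_z = W.sum(axis=0); Py_z /= Py_z.sum(axis=0)
    Pz = W.sum(axis=(0, 1)); Pz /= Pz.sum()

# Recommendation: P(y|x) = sum_z P(y|z) P(z|x), with P(z|x) ∝ P(z) P(x|z)
Pz_x = Pz * Px_z; Pz_x /= Pz_x.sum(axis=1, keepdims=True)  # (customers, K)
Py_x = Pz_x @ Py_z.T                                       # (customers, products)
print("top-5 products for customer 0:", np.argsort(-Py_x[0])[:5])
```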
Extension 1: Existing Products to New Customers
• xn is not in the training set, so P(z|xn) is not available.
• It is approximated by the inner product of the pdf of pattern z and the ratings of xn.
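A hedged reading of Extension 1 as code (the slide gives only the inner-product idea; the final normalisation and the random stand-in for a trained P(y|z) are assumptions made to keep the snippet self-contained):

```python
# Extension 1 (sketch): recommend existing products to a new customer x_n.
# P(z|x_n) is taken as the inner product of each pattern's product pdf
# P(.|z) with x_n's rating vector, then normalised (an assumed step).
import numpy as np

rng = np.random.default_rng(1)
Py_z = rng.random((500, 8)); Py_z /= Py_z.sum(axis=0)  # stand-in for trained P(y|z)

r_new = np.zeros(500); r_new[[0, 3]] = [5, 4]          # x_n rated y0 = 5, y3 = 4
Pz_xn = Py_z.T @ r_new                                 # inner product per pattern z
Pz_xn /= Pz_xn.sum()                                   # normalise to get P(z|x_n)
Py_xn = Py_z @ Pz_xn                                   # P(y|x_n), used for ranking
print("top-5 products for x_n:", np.argsort(-Py_xn)[:5])
```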
Extension 2: New Products to Existing Customers
• yn is not in the training set, so P(yn|z) is not available.
• It is estimated from the distance between yn and z in the (content) feature space.
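A hedged reading of Extension 2 (the slide states only that a feature-space distance is used; the Gaussian-style conversion from distance to P(yn|z), the per-pattern centroids, and the random stand-ins are all assumptions):

```python
# Extension 2 (sketch): estimate P(y_n|z) for a brand-new product y_n from its
# distance to each pattern z in the content-feature space, then score existing
# customers. All quantities below are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(2)
K, d = 8, 50
mu = rng.normal(size=(K, d))       # assumed centroid of pattern z's products
f_new = rng.normal(size=d)         # content features f(y_n) of the new product

dist2 = ((mu - f_new) ** 2).sum(axis=1)
Pyn_z = np.exp(-0.5 * dist2)       # closer pattern => larger P(y_n|z)
                                   # (each column P(.|z) would be renormalised
                                   #  after appending y_n to the model)

Pz_x = rng.dirichlet(np.ones(K), size=90)  # stand-in for the trained P(z|x)
scores = Pz_x @ Pyn_z                      # sum_z P(y_n|z) P(z|x) per customer
print("customers most likely to want y_n:", np.argsort(-scores)[:5])
```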
Performance Measures
• Accuracy: the percentage of correct recommendations.
• Recall: the percentage of interesting products that appear in the output list.
• Precision: the percentage of products in the output list that are really interesting to the customer.
• Break-even point: the point where recall = precision.
• Expected utility: high when highly-rated products appear early in the output list (see the sketch below).
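A small sketch of how these measures can be computed. The precision/recall definitions here are the generic top-k ones, and the expected utility follows the half-life formulation of Breese et al. (1998); whether the paper uses exactly these parameterisations is not shown on the slide, so the defaults d=3 and alpha=5 are assumptions.

```python
# Evaluation-metric sketch (illustrative parameterisations, see lead-in).
import numpy as np

def precision_recall(recommended, relevant, k):
    """Top-k precision and recall for one customer."""
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / k, hits / len(relevant)

def expected_utility(ranked_ratings, d=3.0, alpha=5.0):
    """Half-life utility: ratings above the neutral rating d contribute
    more the earlier they appear, with viewing half-life alpha."""
    decay = 2.0 ** (-np.arange(len(ranked_ratings)) / (alpha - 1))
    return float(np.sum(np.maximum(np.asarray(ranked_ratings) - d, 0) * decay))

ranked = [5, 2, 4, 1, 5]            # ratings of products in recommended order
print(expected_utility(ranked))     # high when high ratings come early
print(precision_recall([3, 7, 1, 9], [1, 2, 3], k=3))
```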
Experiment One: Setup (content-based, SVM)
• Product ratings data set: EachMovie (from DEC)
• Product description data set: Internet Movie Database (http://www.imdb.com)
• Size of feature set = 6620, including release date, runtime, language, director, producer, original music, writing credit, ...
• No. of products = 1628
• 5-fold cross-validation: ~1200 for training, the remainder for testing
• No. of customers = 100
Experiment Two: Setup (collaborative, ELCM)
• Ratings data set: EachMovie (from DEC)
• Training: no. of products = 500, no. of customers = 90
• Testing: no. of customers = 10, no. of products = 250
• Size of the product set whose ratings are considered for matching: L ∈ {10, 63, 83, 125, 250}
Conclusion and Future Work
• SVM and ELCM are empirically shown to be promising for content-based and collaborative recommendation, respectively.
• Future work
  • ELCM
    • model enhancement - BiELCM, hierarchical variants, ...
    • scalability of the EM algorithm for ELCM
    • modelling dynamic preference patterns
    • applications to cross-selling?
  • integration of SVM and ELCM for further improvement