Document Recommendation in Social Tagging Services Z. Guan, C. Wang, J. Bu, C. Chen, K. Yang, D. Cai, and X. He Zhejiang University, China WWW 2010 July 22, 2010 Hyunwoo Kim
Contents Introduction Multi-type Interrelated Objects Embedding Experiments Conclusion
Introduction [1/5] • Social tagging services • Allowing users to annotate various online resources with tags • Helping users find and organize online resources • Providing meaningful collaborative semantic data • Recommender systems • Traditional studies focus on user rating data • Social tagging data has become increasingly prevalent • In this paper • The problem of document recommendation using tagging data alone
Introduction [2/5] • Searching in most tagging services • Keyword-based search • The number of returned results is very large • Returning only resources that literally match the given tags • Ignoring semantically related tags • e.g., when searching for "automobile", resources tagged with "car" may not be retrieved
Introduction [3/5] • Differences between tagging data and rating data • Tagging data does not carry users’ explicit preferences for resources • Tagging data: user, tag and resource • Rating data: user and resource, the usual input to collaborative filtering methods
Introduction [4/5] • Multi-type Interrelated Objects Embedding (MIOE) • Annotation relationships between tags and documents • Usage relationships between tags and users • Bookmarking relationships between users and documents • Affinity relationships among documents • Three bipartite graphs and one affinity graph • Optimal semantic space • Preserving the connectivity structure of these graphs • Representing users, tags and documents in the same space if (two objects are strongly connected, i.e., the corresponding edge has a high weight) { the two objects should be mapped close to each other in the space; }
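A minimal sketch of how the three bipartite graphs could be built from raw (user, tag, document) tagging records. The weighting is an assumption: counts for user-tag and tag-document edges, and a 0/1 flag for user-document bookmarks. `build_graphs` and the sample records are hypothetical. The affinity graph W is not built here, since it requires a document similarity measure that the slide leaves open.

```python
def build_graphs(triples, users, tags, docs):
    """Build R_ut (users x tags), R_td (tags x docs), R_ud (users x docs).

    Hypothetical weighting: R_ut[i][j] counts how often user i used tag j,
    R_td[i][j] counts how often tag i was applied to document j, and
    R_ud[i][j] is 1 if user i bookmarked document j, else 0.
    """
    u_idx = {u: i for i, u in enumerate(users)}
    t_idx = {t: i for i, t in enumerate(tags)}
    d_idx = {d: i for i, d in enumerate(docs)}
    R_ut = [[0] * len(tags) for _ in users]
    R_td = [[0] * len(docs) for _ in tags]
    R_ud = [[0] * len(docs) for _ in users]
    for u, t, d in triples:
        R_ut[u_idx[u]][t_idx[t]] += 1      # usage relationship
        R_td[t_idx[t]][d_idx[d]] += 1      # annotation relationship
        R_ud[u_idx[u]][d_idx[d]] = 1       # bookmarking relationship (recorded once)
    return R_ut, R_td, R_ud

# Hypothetical tagging records: (user, tag, document)
triples = [
    ("alice", "car", "d1"),
    ("alice", "auto", "d1"),
    ("bob", "car", "d2"),
    ("bob", "car", "d1"),
]
R_ut, R_td, R_ud = build_graphs(triples, ["alice", "bob"], ["car", "auto"], ["d1", "d2"])
print(R_ut)  # [[1, 1], [2, 0]]
print(R_td)  # [[2, 1], [1, 0]]
print(R_ud)  # [[1, 0], [1, 1]]
```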
Introduction [5/5] • Goal of MIOE • Given a user, the closest documents that she has not yet bookmarked are recommended to her • Naturally capturing the correlations among tags • Applicable to any social tagging data as long as a notion of similarity between resources is defined
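A sketch of the recommendation step, assuming the coordinates of users and documents in the learned space are already available. The coordinates and the `recommend` helper are hypothetical; the distance is the Euclidean distance the deck uses as its metric.

```python
import math

def recommend(user_pos, doc_pos, bookmarked, k):
    """Return the k documents closest to the user in the learned space,
    skipping documents the user has already bookmarked.

    user_pos: coordinates of the user (tuple of floats)
    doc_pos: dict mapping document id -> coordinates in the same space
    bookmarked: set of document ids already bookmarked by this user
    """
    candidates = [d for d in doc_pos if d not in bookmarked]
    candidates.sort(key=lambda d: math.dist(user_pos, doc_pos[d]))
    return candidates[:k]

# Hypothetical 2-D coordinates, as if produced by an embedding step.
docs = {"d1": (0.1, 0.0), "d2": (0.9, 0.8), "d3": (0.2, 0.3), "d4": (0.0, 0.1)}
print(recommend((0.0, 0.0), docs, bookmarked={"d4"}, k=2))  # ['d1', 'd3']
```

Document d4 is the second closest point to the user, but it is skipped because it is already bookmarked.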
Contents Introduction Multi-type Interrelated Objects Embedding Experiments Conclusion
Multi-type Interrelated Objects Embedding [1/7] The basic intuition behind MIOE if (a user u has used a tag t many times) { she has a strong interest in the topic represented by t; } if (t has been applied to a document d many times) { d is strongly related to the topic represented by t; } Then we should recommend such a document d to the user u;
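One way to read this intuition is as a two-step path score: a user-tag weight times a tag-document weight, summed over tags, which is the matrix product R_ut · R_td. The sketch below illustrates that reading only; it is not MIOE itself, and it rewards only documents that share a tag literally, the limitation that the embedding is meant to overcome. The function name and matrix values are hypothetical.

```python
def two_hop_scores(R_ut, R_td):
    """Score[u][d] = sum over tags t of R_ut[u][t] * R_td[t][d].

    A large score means user u uses some tag t often AND t is applied
    to document d often (the slide's intuition as a matrix product).
    """
    n_tags = len(R_td)
    n_docs = len(R_td[0])
    return [
        [sum(row[t] * R_td[t][d] for t in range(n_tags)) for d in range(n_docs)]
        for row in R_ut
    ]

R_ut = [[3, 0], [0, 2]]          # user 0 uses tag 0, user 1 uses tag 1
R_td = [[4, 0, 1], [0, 5, 1]]    # tag 0 on doc 0, tag 1 on doc 1, both on doc 2
print(two_hop_scores(R_ut, R_td))  # [[12, 0, 3], [0, 10, 2]]
```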
MIOE [3/7] - Learning the Optimal Semantic Space [Figure: documents, users and tags plotted as points in one shared space with axes x, y, z] • Representing users, tags and documents in the same space • Two strongly connected objects should be mapped close to each other in the learned space
MIOE [4/7] - Learning the Optimal Semantic Space • The problem • Finding a semantic space for users, tags and documents that best preserves the connectivity structures of the graphs • Annotation, usage, bookmarking and affinity relationships • Given a user, recommending a list of documents in which the user is most likely to be interested M. Belkin et al., “Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering”, Advances in Neural Information Processing Systems 14, 2001 W. Min et al., “Locality Pursuit Embedding”, Pattern Recognition 37, 2004 X. He et al., “Learning a Maximum Margin Subspace for Image Retrieval”, IEEE Transactions on Knowledge and Data Engineering 20, 2008
MIOE [5/7] - Learning the Optimal Semantic Space • Projections* • PCA (Principal Component Analysis) • LPE (Locality Pursuit Embedding) * W. Min et al., “Locality Pursuit Embedding”, Pattern Recognition 37, 2004
MIOE [6/7] - Learning the Optimal Semantic Space [Figure: two points A and B shown in one, two and three dimensions: A(a), B(b); A(a1, a2), B(b1, b2); A(a1, a2, a3), B(b1, b2, b3)] • Distance metric: Euclidean distance
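For reference, the Euclidean distance between A and B in k dimensions (k = 1, 2, 3 in the figure) is given below, with a small worked example. For k = 1 it reduces to |a - b|.

```latex
d(A, B) = \sqrt{\sum_{i=1}^{k} (a_i - b_i)^2},
\qquad
\text{e.g. } A(0, 0),\ B(3, 4):\quad d(A, B) = \sqrt{3^2 + 4^2} = 5.
```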
MIOE [7/7] - Learning the Optimal Semantic Space • In practice • New objects continually join the tagging data • Re-computing the optimal space for each new object is costly • Solution • Approximating the positions of new objects in the learned space with eigenfunctions approximated via the kernel trick* • Re-computing the optimal space periodically * Y. Bengio et al., “Out-of-sample extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering”, Advances in Neural Information Processing Systems 16, 2003
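A much simpler stand-in for the out-of-sample step, labeled as such: place a new object at the edge-weighted mean of the coordinates of the objects it links to. This is not the kernel-based eigenfunction approximation of Bengio et al. cited above, which also uses a data-dependent kernel and the eigenvalues; it only conveys that a new object can be positioned from its neighbors without re-solving the whole problem. `place_new_object` and the coordinates are hypothetical.

```python
def place_new_object(neighbor_weights, positions):
    """Approximate coordinates for a new object from its graph neighbors.

    Hypothetical heuristic (not Bengio et al.'s formula): the new object is
    placed at the edge-weighted mean of its neighbors' coordinates.

    neighbor_weights: dict neighbor id -> edge weight (> 0)
    positions: dict object id -> coordinates (tuple of floats) in the learned space
    """
    total = sum(neighbor_weights.values())
    dim = len(next(iter(positions.values())))
    return tuple(
        sum(w * positions[n][k] for n, w in neighbor_weights.items()) / total
        for k in range(dim)
    )

positions = {"car": (1.0, 0.0), "auto": (0.0, 1.0), "alice": (1.0, 1.0)}
# A new document tagged "car" three times and "auto" once.
print(place_new_object({"car": 3, "auto": 1}, positions))  # (0.75, 0.25)
```

The new document lands closer to "car" because that edge carries more weight; the full space would still be recomputed periodically.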
Contents Introduction Multi-type Interrelated Objects Embedding Experiments Conclusion
Experiments [1/6] • Data sets: Del.icio.us and CiteULike • Compared algorithms • User-CF: a version of the user-based CF algorithm for unary data • Funk-SVD: uses Singular Value Decomposition to approximate the original user-item matrix with a low-rank matrix • TVS: Tag Vector Similarity, representing users and documents in the tag space as TF-IDF tag profile vectors • CVS: Content Vector Similarity, maintaining multiple profile vectors for a user to better capture the user’s interests
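The slide describes the TVS baseline in one line. A minimal sketch under common TF-IDF choices, which are assumptions and not necessarily those used in the paper: term frequency is the raw tag count, idf(t) = log(N / df(t)) over the N documents, and users and documents are compared by cosine similarity. The helper names and the sample tag counts are hypothetical.

```python
import math

def tfidf(counts, idf):
    # Weight each tag count by its idf; tags unseen in any document get 0.
    return {t: c * idf.get(t, 0.0) for t, c in counts.items()}

def cosine(a, b):
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def tvs_rank(user_tags, doc_tags):
    """Rank documents by cosine similarity of TF-IDF tag profile vectors.

    user_tags: dict tag -> how often the user used it
    doc_tags: dict doc id -> (dict tag -> how often it was applied to the doc)
    """
    n = len(doc_tags)
    df = {}
    for tags in doc_tags.values():
        for t in tags:
            df[t] = df.get(t, 0) + 1
    idf = {t: math.log(n / c) for t, c in df.items()}
    user_vec = tfidf(user_tags, idf)
    scores = {d: cosine(user_vec, tfidf(tags, idf)) for d, tags in doc_tags.items()}
    return sorted(scores, key=scores.get, reverse=True)

doc_tags = {"d1": {"car": 2, "red": 1}, "d2": {"car": 1, "fruit": 3}, "d3": {"fruit": 1}}
user_tags = {"car": 1, "red": 1}
print(tvs_rank(user_tags, doc_tags))  # ['d1', 'd2', 'd3']
```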
Experiments [2/6] • Evaluation methodology • 300 users in total • 270 users for training • 30 users for testing • 50% of the bookmarks are used for model construction (training) • The remaining 50% of the bookmarks are used for evaluation (ground truth) • Evaluation metrics • Precision • Mean Average Precision (MAP) • Normalized Discounted Cumulative Gain (NDCG)
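The three metrics can be sketched with binary relevance: a recommended document counts as relevant if it is among the user's held-out bookmarks. The slide does not give cutoffs or exact formulas, so these are common textbook definitions (AP normalized by the number of relevant items, NDCG with a log2 discount), and the sample rankings are hypothetical.

```python
import math

def precision_at_k(ranked, relevant, k):
    return sum(1 for d in ranked[:k] if d in relevant) / k

def average_precision(ranked, relevant):
    # Mean of precision@i over the ranks i where a relevant item appears.
    hits, total = 0, 0.0
    for i, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    # runs: list of (ranked list, relevant set), one per test user
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

def ndcg_at_k(ranked, relevant, k):
    dcg = sum(1.0 / math.log2(i + 1)
              for i, d in enumerate(ranked[:k], start=1) if d in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 1) for i in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0

ranked = ["a", "b", "c", "d"]
relevant = {"a", "c"}
print(precision_at_k(ranked, relevant, 2))   # 0.5
print(average_precision(ranked, relevant))   # (1/1 + 2/3) / 2
print(ndcg_at_k(ranked, relevant, 4))        # 1.5 / (1 + 1/log2(3))
```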
Experiments [6/6] • Case studies • Recommended Web pages • Nearest tags
Contents Introduction Multi-type Interrelated Objects Embedding Experiments Conclusion
Conclusion • Focusing on the problem of document recommendation in social tagging services • Modeling it as a representation learning problem • Proposing a novel semantic space learning algorithm (MIOE) • Learning an optimal semantic space for users, tags and documents by keeping related objects close in the target space • Future work • Examining the tag ambiguity issue, which is harmful to MIOE • Improving MIOE’s scalability to very large datasets
Appendix [1/9] Q(f, g, p): cost function • f: |U|×1 vector for U; f_i is the coordinate of u_i on the line • g: |T|×1 vector for T; g_i is the coordinate of t_i on the line • p: |D|×1 vector for D; p_i is the coordinate of d_i on the line • R_ut, R_td, R_ud: weighted adjacency matrices • W: affinity matrix
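The formula for Q is not reproduced on this slide. A sketch of one natural form consistent with the definitions above, assuming Q sums the edge-weighted squared distances over all four graphs with unit trade-off weights (the `weights` tuple and all names are hypothetical):

```python
def pair_cost(R, x, y):
    """sum_{i,j} R[i][j] * (x[i] - y[j])**2"""
    return sum(R[i][j] * (x[i] - y[j]) ** 2
               for i in range(len(x)) for j in range(len(y)))

def cost_Q(f, g, p, R_ut, R_td, R_ud, W, weights=(1.0, 1.0, 1.0, 1.0)):
    """Assumed form: usage + annotation + bookmarking + affinity terms."""
    a, b, c, e = weights
    return (a * pair_cost(R_ut, f, g)     # users vs tags
            + b * pair_cost(R_td, g, p)   # tags vs documents
            + c * pair_cost(R_ud, f, p)   # users vs documents
            + e * pair_cost(W, p, p))     # documents vs documents

# 1 user, 1 tag, 2 documents (hypothetical)
R_ut, R_td, R_ud = [[2]], [[1, 0]], [[0, 1]]
W = [[0, 1], [1, 0]]
f, g, p = [0.0], [1.0], [1.0, 3.0]
print(cost_Q(f, g, p, R_ut, R_td, R_ud, W))                    # 19.0
print(cost_Q([0.0], [0.0], [0.0, 0.0], R_ut, R_td, R_ud, W))   # 0.0
```

The second call shows why a normalization is needed: placing every object at the same point drives the cost to zero.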
Appendix [2/9] D_ut: diagonal matrix whose (i, i)-th element equals the sum of the i-th row of R_ut • D_tu: diagonal matrix whose (i, i)-th element equals the sum of the i-th column of R_ut
Appendix [3/9] D_td: diagonal matrix whose (i, i)-th element equals the sum of the i-th row of R_td • D_dt: diagonal matrix whose (i, i)-th element equals the sum of the i-th column of R_td • D_ud: diagonal matrix whose (i, i)-th element equals the sum of the i-th row of R_ud • D_du: diagonal matrix whose (i, i)-th element equals the sum of the i-th column of R_ud
Appendix [4/9] Using the graph Laplacian matrix* L = D - W • D: diagonal matrix whose (i, i)-th element equals the sum of the i-th row of W • W: affinity matrix * M. Belkin et al., “Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering”, Advances in Neural Information Processing Systems 14, 2001
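The degree matrices let each squared-distance sum be written in matrix form. Assuming a bipartite term of the form sum_{i,j} R_ij (x_i - y_j)^2 and an affinity term sum_{i,j} W_ij (p_i - p_j)^2 (the slides' own equations are not shown), the identities are x^T D_row x + y^T D_col y - 2 x^T R y and 2 p^T L p with L = D - W. The sketch checks both numerically on random data; all names are hypothetical.

```python
import numpy as np

def pair_cost(R, x, y):
    # Direct form: sum_{i,j} R_ij (x_i - y_j)^2
    return float(np.sum(R * (x[:, None] - y[None, :]) ** 2))

def pair_cost_matrix(R, x, y):
    # Same quantity via diagonal degree matrices:
    # x^T D_row x + y^T D_col y - 2 x^T R y
    D_row = np.diag(R.sum(axis=1))   # e.g. D_ut: row sums of R_ut
    D_col = np.diag(R.sum(axis=0))   # e.g. D_tu: column sums of R_ut
    return float(x @ D_row @ x + y @ D_col @ y - 2 * x @ R @ y)

def laplacian_cost(W, p):
    # sum_{i,j} W_ij (p_i - p_j)^2 = 2 p^T (D - W) p for symmetric W
    L = np.diag(W.sum(axis=1)) - W
    return float(2 * p @ L @ p)

rng = np.random.default_rng(0)
R_ut = rng.random((3, 4))            # hypothetical 3 users x 4 tags
f, g = rng.random(3), rng.random(4)
W = rng.random((5, 5))
W = (W + W.T) / 2                    # the affinity matrix must be symmetric
np.fill_diagonal(W, 0.0)
p = rng.random(5)
print(np.isclose(pair_cost(R_ut, f, g), pair_cost_matrix(R_ut, f, g)))  # True
print(np.isclose(pair_cost(W, p, p), laplacian_cost(W, p)))             # True
```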
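Appendix [5/9] Using the Rayleigh quotient* to remove an arbitrary scaling factor (the cost is quadratic in f, g and p, so without a normalization it could be made arbitrarily small simply by shrinking all coordinates toward zero) * J. Ham et al., “Semisupervised alignment of manifolds”, the Annual Conference on Uncertainty in Artificial Intelligence, 2005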
Appendix [6/9] Using Rayleigh quotient
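A small numerical illustration of why the Rayleigh quotient R(z) = z^T M z / z^T z removes the scaling factor: rescaling z multiplies the raw quadratic form by the square of the factor, but leaves the quotient unchanged. The 3-node affinity matrix is a hypothetical example, and M is taken to be its graph Laplacian.

```python
import numpy as np

def rayleigh_quotient(M, z):
    """R(z) = z^T M z / z^T z, assuming M is symmetric and z is nonzero."""
    return float(z @ M @ z) / float(z @ z)

W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 2.0],
              [0.0, 2.0, 0.0]])
L = np.diag(W.sum(axis=1)) - W       # graph Laplacian L = D - W
z = np.array([1.0, 0.0, -1.0])

print(float(z @ L @ z), float((10 * z) @ L @ (10 * z)))       # 3.0 300.0
print(rayleigh_quotient(L, z), rayleigh_quotient(L, 10 * z))  # 1.5 1.5
```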
Appendix [7/9] • By the Rayleigh-Ritz theorem* • The solution of this optimization problem is given by the eigenvector corresponding to the second smallest eigenvalue of * H. Lütkepohl, “Handbook of Matrices”, Wiley, 1996
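A sketch of the Rayleigh-Ritz step, assuming the matrix in question is the graph Laplacian L = D - W of a connected graph. Its smallest eigenvalue, 0, belongs to the constant vector, a trivial solution that puts every object at the same point, so the useful one-dimensional coordinates come from the eigenvector of the second smallest eigenvalue. The 4-node path graph is a hypothetical example.

```python
import numpy as np

def second_smallest_eigvec(L):
    # numpy.linalg.eigh returns eigenvalues of a symmetric matrix in ascending order
    vals, vecs = np.linalg.eigh(L)
    return vals[1], vecs[:, 1]

# Path graph 0 - 1 - 2 - 3 with unit weights (hypothetical example)
W = np.zeros((4, 4))
for i in range(3):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

lam, z = second_smallest_eigvec(L)
print(lam)  # 2 - sqrt(2), about 0.586
print(z)    # monotone along the path (overall sign is arbitrary)
```

Neighboring nodes receive nearby coordinates and the two ends of the path are farthest apart, which is the connectivity-preserving behavior the slides describe.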
Appendix [8/9] Maximizing the global variance in the target subspace instead of maximizing The variance of f, g and p* * F. R. K. Chung, “Spectral Graph Theory”, American Mathematical Society, 1997
Appendix [9/9] The optimization problem becomes This optimization problem can be solved by finding the generalized eigenvector corresponding to the second smallest eigenvalue of
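A sketch of the generalized eigenvector step under stated assumptions: all four graphs are stacked into one joint graph over [users; tags; documents] with unit trade-off weights, and the variance constraint is taken to be degree-weighted, z^T D z = 1, which leads to L z = λ D z. This reduces to a symmetric problem through y = D^{1/2} z. The construction, names and data are hypothetical, not the paper's exact formulation.

```python
import numpy as np

def joint_adjacency(R_ut, R_td, R_ud, W):
    """Symmetric adjacency of the joint graph over [users; tags; docs]."""
    nu, nt = R_ut.shape
    nd = R_td.shape[1]
    A = np.zeros((nu + nt + nd,) * 2)
    u, t, d = slice(0, nu), slice(nu, nu + nt), slice(nu + nt, nu + nt + nd)
    A[u, t], A[t, u] = R_ut, R_ut.T
    A[t, d], A[d, t] = R_td, R_td.T
    A[u, d], A[d, u] = R_ud, R_ud.T
    A[d, d] = W
    return A

def embed_1d(A):
    """Solve L z = lambda D z and return the second-smallest solution.

    Assumes every object has at least one edge (all degrees > 0) and the
    joint graph is connected, so the smallest solution is the trivial one.
    """
    deg = A.sum(axis=1)
    L = np.diag(deg) - A
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric reformulation: D^{-1/2} L D^{-1/2} y = lambda y, z = D^{-1/2} y
    M = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(M)
    return vals[1], d_inv_sqrt * vecs[:, 1]

R_ut = np.array([[3.0, 0.0], [0.0, 2.0]])               # 2 users x 2 tags
R_td = np.array([[2.0, 0.0, 1.0], [0.0, 2.0, 1.0]])     # 2 tags x 3 docs
R_ud = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])     # 2 users x 3 docs
W = np.array([[0.0, 0.0, 0.5], [0.0, 0.0, 0.0], [0.5, 0.0, 0.0]])

A = joint_adjacency(R_ut, R_td, R_ud, W)
lam, z = embed_1d(A)
users, tags, docs = z[:2], z[2:4], z[4:]
print(lam, users, tags, docs)
```

Using more than one eigenvector (the next smallest ones) would give a multi-dimensional space in the same way.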