A Graph-based Recommender System. Zan Huang, Wingyan Chung, Thian-Huat Ong, Hsinchun Chen Artificial Intelligence Lab The University of Arizona 07/15/2002. Acknowledgement: NSF DLI-2 (IIS-9817473). Agenda. Introduction Literature Review A Graph-based Recommender System
A Graph-based Recommender System Zan Huang, Wingyan Chung, Thian-Huat Ong, Hsinchun Chen Artificial Intelligence Lab The University of Arizona 07/15/2002 Acknowledgement: NSF DLI-2 (IIS-9817473)
Agenda • Introduction • Literature Review • A Graph-based Recommender System • Research Questions • Research Testbed and Experiments • Conclusion and Discussion • Questions and Comments
Introduction Recommender System: From Business Application to Digital Libraries
Information Overload • Product information • User information • Interaction information between users and products • Challenges for both buyers and sellers
Recommender Systems • Automatic recommendation generation • Substantial research interest • PHOAKS (1997), Syskill & Webert (1997), Fab (1997), GroupLens (1998) • Commercial applications • Amazon.com, CDNOW, Drugstore, MovieFinder • Business success (Schafer et al. 2001) • Browser to buyer • Cross-selling • Customer loyalty
Digital Libraries • Information overload • Library content information • User information • Library usage information • Recommender system for Digital Libraries • Efficient knowledge dissemination • User satisfaction
Literature Review Recommender System: System Inputs and Recommendation Approaches
Recommender System • Recommends items to users by predicting a user’s interest in an item from several kinds of information: item data, user data, and interactions between users and items. • Items - documents, web pages, books, movies, restaurants, etc.
System Inputs • User factual data • Demographic information • Item factual data • Structural attribute information • Textual description/content information • Transactional data • Explicit feedback – rating, comments • Implicit feedback – purchase, browsing
Recommendation Approaches • Content-based approach • Based on item factual data • Item neighborhood formation • Machine learning methods • Collaborative filtering approach • Based on user factual data and transactional data • User neighborhood formation • Similarity functions, correlation, clustering • Collaborative filtering association rules (Fu et al. 2000)
Recommendation Approaches (cont.) • Hybrid approach • Combining content-based approach and collaborative filtering approach • Combining recommendation results • (Claypool et al. 1999) • Collaborative filtering augmented by content analysis • (GroupLens, Sarwar et al. 1998, Fab, Balabanovic and Shoham 1997) • Comprehensive models • (Basu et al. 1998)
A Graph-based Recommender System Model and Recommendation Methods
A Two-layer Graph Model • Goal • Comprehensive representation • Support flexible recommendation approaches • A two-layered graph model • User layer – users as nodes, user similarity as links • Item layer – items as nodes, item similarity as links • Inter-layer links – interaction between user and items
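A minimal Python sketch of the two-layer model on this slide. The class name `TwoLayerGraph`, its method names, and the example weights are hypothetical; the slides do not specify a data structure. It assumes undirected links and that user and item identifiers do not collide.

```python
from collections import defaultdict


class TwoLayerGraph:
    """Hypothetical container: user layer, item layer, and inter-layer links."""

    def __init__(self):
        # Each map is node -> {neighbor: weight}.
        self.user_links = defaultdict(dict)   # user layer: user-user similarity
        self.item_links = defaultdict(dict)   # item layer: item-item similarity
        self.inter_links = defaultdict(dict)  # inter-layer: user-item interaction

    def add_user_similarity(self, u1, u2, weight):
        self.user_links[u1][u2] = weight
        self.user_links[u2][u1] = weight

    def add_item_similarity(self, i1, i2, weight):
        self.item_links[i1][i2] = weight
        self.item_links[i2][i1] = weight

    def add_interaction(self, user, item, weight=1.0):
        self.inter_links[user][item] = weight
        self.inter_links[item][user] = weight

    def neighbors(self, node):
        """Merge all three link types into one neighbor -> weight view."""
        merged = {}
        for links in (self.user_links, self.item_links, self.inter_links):
            merged.update(links.get(node, {}))
        return merged


# Illustrative data only (assumed weights).
g = TwoLayerGraph()
g.add_user_similarity("C1", "C2", 0.7)
g.add_item_similarity("B1", "B2", 0.6)
g.add_interaction("C1", "B2", 0.5)
```

Keeping the three link types in separate maps lets a recommendation method choose which links to follow, while `neighbors` gives a combined view for methods that use all of them.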
Model Characteristics • Comprehensiveness • All three types of system inputs • Transformation of feature data into similarity data • Flexibility • Flexible similarity calculation • Multiple types of transactional data • Recommendation as a graph search task • Finding item nodes highly associated with the user nodes • Support different recommendation approaches • Different association calculation methods
Recommendation Approaches • Content-based approach • Starting from item nodes associated with the target user, exploring the item-layer links • Collaborative filtering approach • Starting from the target user node, exploring the user-layer links and inter-layer links • Hybrid approach • Starting with the target user node, exploring all three types of links
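One way to read this slide is that each approach is a choice of starting nodes plus a set of link types to explore. The sketch below encodes that reading; the edge-tuple layout `(a, b, kind, weight)`, the constant names, and both function names are assumptions made for illustration, and inter-layer edges are assumed to be stored as `(user, item, ...)`.

```python
USER, ITEM, INTER = "user-user", "item-item", "user-item"

# Link types explored by each approach, following the slide.
LINKS_BY_APPROACH = {
    "content": {ITEM},
    "collaborative": {USER, INTER},
    "hybrid": {USER, ITEM, INTER},
}


def start_nodes(approach, user, edges):
    """Content-based search starts from the user's own items; the others start from the user."""
    if approach == "content":
        return {b for a, b, kind, _ in edges if kind == INTER and a == user}
    return {user}


def explorable_edges(approach, edges):
    """Keep only the edges whose type the chosen approach is allowed to follow."""
    allowed = LINKS_BY_APPROACH[approach]
    return [edge for edge in edges if edge[2] in allowed]


# Illustrative edges only (assumed weights).
EDGES = [
    ("C1", "C2", USER, 0.7),
    ("C1", "B2", INTER, 0.5),
    ("C2", "B3", INTER, 0.5),
    ("B2", "B1", ITEM, 0.6),
]
```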
Recommendation Methods • Low-degree association • Exploring direct associations • High-degree association • Exploring transitive associations • A simple example • 1-degree association • <C1, B1> = 0 • 2-degree association • <C1, B1> = 0.5 * 0.6 = 0.3 (C1-B2-B1) • 3-degree association • <C1, B1> = 0.3 + 0.21 + 0.12 + 0.28 = 0.91 (C1-B2-B1, C1-C2-B2-B1, C1-B2-B3-B1, C1-C2-B3-B1)
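The arithmetic in the example can be reproduced by summing, over every simple path of at most n links, the product of the link weights on that path. In the sketch below only the weights C1-B2 = 0.5 and B2-B1 = 0.6 and the four path products come from the slide; the remaining weights are assumed values picked so that the products come out to 0.21, 0.12, and 0.28, since the original figure is not available. The function names are hypothetical.

```python
def build_graph(edges):
    """Turn {(a, b): weight} into a symmetric adjacency map."""
    graph = {}
    for (a, b), weight in edges.items():
        graph.setdefault(a, {})[b] = weight
        graph.setdefault(b, {})[a] = weight
    return graph


def association(graph, source, target, max_degree):
    """Sum of weight products over simple paths with at most max_degree links."""
    total = 0.0
    stack = [(source, 1.0, (source,))]
    while stack:
        node, product, path = stack.pop()
        if node == target:
            total += product
            continue
        if len(path) - 1 >= max_degree:
            continue  # path already uses the maximum number of links
        for neighbor, weight in graph[node].items():
            if neighbor not in path:  # simple paths only, no revisits
                stack.append((neighbor, product * weight, path + (neighbor,)))
    return total


# C1-B2 and B2-B1 are from the slide; all other weights are assumptions.
G = build_graph({
    ("C1", "B2"): 0.5, ("B2", "B1"): 0.6,
    ("C1", "C2"): 0.7, ("C2", "B2"): 0.5,
    ("B2", "B3"): 0.3, ("B3", "B1"): 0.8,
    ("C2", "B3"): 0.5,
})

for degree in (1, 2, 3):
    print(degree, round(association(G, "C1", "B1", degree), 2))
```

Enumerating paths explicitly grows quickly with the degree, which is one reason to use a relaxation-style search for high-degree associations instead.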
Recommendation Methods (cont.) • High-degree association recommendation algorithm • High-degree association retrieval in associative retrieval literature • Hopfield Net Spreading Activation (Chen and Ng 1995, Houston et al. 2000) • Item and user nodes as neurons and links as synapses in the Hopfield Net • Parallel relaxation search • Stop when activation values in the network converge • Item nodes with highest activation values as recommendations
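A rough sketch of a spreading-activation loop of this kind: every node is updated in parallel from its neighbors' activations, iteration stops once the total change falls below a threshold, and item nodes are ranked by final activation. This is not the cited authors' exact algorithm. The sigmoid form, the parameters `theta`, `scale`, and `eps`, the decision to clamp the starting user at 1.0, and the example graph are all assumptions.

```python
import math


def spread_activation(graph, start, theta=0.5, scale=0.2, eps=1e-4, max_iter=200):
    """graph: node -> {neighbor: weight}, symmetric; start: set of source nodes."""
    act = {node: (1.0 if node in start else 0.0) for node in graph}
    for _ in range(max_iter):
        new = {}
        for j in graph:
            if j in start:
                new[j] = 1.0  # assumption: source nodes stay fully active
                continue
            net = sum(weight * act[i] for i, weight in graph[j].items())
            new[j] = 1.0 / (1.0 + math.exp((theta - net) / scale))
        change = sum(abs(new[n] - act[n]) for n in graph)
        act = new
        if change < eps:  # activations have converged
            break
    return act


def recommend(graph, user, items, owned, k=3):
    """Rank items the user does not already have by final activation."""
    act = spread_activation(graph, {user})
    candidates = [item for item in items if item not in owned]
    return sorted(candidates, key=lambda item: act[item], reverse=True)[:k]


# Illustrative graph with assumed weights; B4 is an isolated book.
GRAPH = {
    "C1": {"C2": 0.7, "B2": 0.5},
    "C2": {"C1": 0.7, "B2": 0.5, "B3": 0.5},
    "B1": {"B2": 0.6, "B3": 0.8},
    "B2": {"C1": 0.5, "C2": 0.5, "B1": 0.6, "B3": 0.3},
    "B3": {"C2": 0.5, "B2": 0.3, "B1": 0.8},
    "B4": {},
}

print(recommend(GRAPH, "C1", ["B1", "B2", "B3", "B4"], owned={"B2"}))
```

With non-negative weights and a clamped source, the activations in this sketch can only rise from one iteration to the next, so the loop settles rather than oscillating; other weight schemes would need their own convergence check.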
Recommender System Problems • Content-based recommendation • Over-specialization • Collaborative filtering recommendation • Early rater problem • Sparsity problem • Possible solutions • Hybrid recommendation approach • High-degree association recommendation
Research Questions • Does the hybrid recommendation approach achieve higher recommendation quality than the content-based or collaborative filtering approaches? • Does high-degree association recommendation improve recommendation quality?
Research Testbed • An online bookstore • Books.com.tw • One of the biggest online bookstores in Taiwan • Data Set • 2000 Customers • 9695 Books • 18771 Transaction Records • Similarity with a typical digital library environment • Books with description and attributes – Electronic documents in DL • Customer demographic information – DL user demographic information • Customers with purchase history – DL users with browsing or borrowing histories
Implementation Details • Book representation • Book attributes (price, publisher, layout, etc.) • Book content (title, keyword, introduction, etc.) • Chinese key phrase extraction • Mutual Information algorithm (Ong and Chen 1999) • Similarity calculation • Attribute-based similarity • Book content similarity • An asymmetric algorithm based on key phrase vector model (Houston et al. 2000)
Book Sales Transactions [chart not reproduced; axis labels: 2000.1, 2000.2, 2000.3, 2000.4]
Experiment Procedure • Holdout testing • Use half of the purchases (past purchases) to make recommendations. See if they match the other half (future purchases). (Sarwar et al. 1998) • Used 100 randomly selected customers as sample data. • Measurement of recommendation quality
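The holdout procedure can be sketched as follows, assuming each customer's purchases are ordered by time and that quality is measured with precision and recall (the measures reported in the results). The function names and the sample book identifiers are hypothetical, and the split rule for an odd number of purchases is an assumption.

```python
def holdout_split(purchases):
    """Split a time-ordered purchase list into past (first half) and future (rest)."""
    half = len(purchases) // 2
    return purchases[:half], purchases[half:]


def precision_recall(recommended, future):
    """Precision: share of recommendations later bought. Recall: share of future purchases recommended."""
    recommended, future = set(recommended), set(future)
    hits = len(recommended & future)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(future) if future else 0.0
    return precision, recall


# Illustrative data only.
past, future = holdout_split(["b1", "b2", "b3", "b4"])
recommended = ["b3", "b5", "b6", "b7"]  # would come from a recommender fed with `past`
print(precision_recall(recommended, future))
```

Averaging these two numbers over the sampled customers gives one precision and one recall value per recommendation approach, which can then be compared across approaches.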
Hypotheses • Hybrid recommendation approach achieves better performance than content-based recommendation approach • Hybrid recommendation approach achieves better performance than collaborative filtering recommendation approach • Exploring high-degree associations achieves better performance than only exploring low-degree associations.
Experiment Results • Statistical results • Hybrid approach achieved significantly higher precision and recall than content-based (t-test p-value: precision: 0.0058, recall: 0.0000) and collaborative approaches (t-test p-value: precision: 0.0016, recall: 0.0002) • No significant difference between high-degree association and low-degree association methods
Conclusion and Discussion • A generic graph model for recommender systems • Comprehensive data representation • Flexible recommendation approaches • Applicable in Digital Libraries • A hybrid approach improved recommendation quality • No significant improvement was observed for high-degree association methods
Conclusion and Discussion (cont.) • About low precision and recall • The gap between interest and purchase behavior • Online bookstore data might not fully represent users’ interests • High-degree association method • Poor performance might be related to the density of the graph
Recent Development • The relationship between high-degree association recommendation performance and graph density • Implementation of association rule mining under the graph model for different recommendation approaches • Implementation of other associative retrieval algorithms for high-degree association recommendation • Associative Linear Retrieval Model • Leaky Capacity Model Spreading Activation • Branch-and-Bound Spreading Activation
For Project Information http://ai.bpa.arizona.edu zhuang@eller.arizona.edu Acknowledgement NSF DLI-2 #9817473