Kostadin Georgiev, VMware Bulgaria Preslav Nakov , Qatar Computing Research Institute

A non-IID Frameworkfor Collaborative Filtering with Restricted Boltzmann Machines Kostadin Georgiev, VMware Bulgaria PreslavNakov, Qatar Computing Research Institute ICML, June 17, 2013, Atlanta 5. Experiments & Evaluation 1. Overview • We propose a non-IID framework for collaborative filtering • based on Restricted Boltzmann Machines (RBM) • models both user-user and item-item correlations • uses real values in the visible layer (as opposed to multinomial) to model user-item ratings, thus taking advantage of the natural order between them • We further explore the potential of combining the original training data with data generated by the RBM-based model itself in a bootstrapping fashion • Evaluation: on two MovieLens datasets - with 100K and 1M user-item ratings, respectively • Results: Rival the state-of-the-art Data • Evaluation • Two MovieLens datasets: • 100k: • 1,682 movies assigned • 943 users • 100,000 ratings • sparseness: 93.7% • Each rating is between 1 (worst) and 5 (best) • Mean Absolute Error (MAE): • Cross-validation • 5-fold • 80%:20% training:testing data splits • 1M: • 3,952 movies • 6,040 users • 1 million ratings • sparseness: 95.8% - Item-based RBM model outperforms user-based, but not by much. - Hybrid item-based RBM + NB model is relatively insensitive to number of units. 2. User/Item-based RBM Model • The visible layer • represents either all ratings by a user or all rating for an item • units model ratings as real values (vs. multinomial) • noise-free reconstruction is better 3. Non-IID Hybrid RBM Model In the IID case, multinomial visible units are better than real-valued. In the non-IID case: real-valued visible units outperform multinomial. 6. Comparison to Previous Work MovieLens 1M • Remove the IID assumption for the training data • Topology: Unit vij is connected to two independent hidden layers: one user-based and another item-based. • Missing values (ratings): the generatedpredictions are used during testing, but are ignored during training • Training procedure: we average the predictions of the user-based and of the item-based RBM models MovieLens 100k 4. Neighborhood + I-RBM Use the I-RBM predictions from a neighborhood-based algorithm: 7. Conclusion & Future Work However, compute the averages from the original ratings: • Conclusion • proposed a non-IID RBM framework for collaborative filtering • rivals the best previous algorithms, which are more complex • Future work • add an additional layer to model higher-order correlations • add content-based features, e.g., demographic

Kostadin Georgiev, VMware Bulgaria Preslav Nakov , Qatar Computing Research Institute

Kostadin Georgiev, VMware Bulgaria Preslav Nakov , Qatar Computing Research Institute

Presentation Transcript

1. Qatar Research Institutes

INSTITUTE OF COMPUTING TECHNOLOGY

Qatar Environment and Energy Research Institute (QEERI) from Carbon to Creativity

INSTITUTE OF COMPUTING TECHNOLOGY

Research Computing

Institute for Quantum Computing

INSTITUTE OF COMPUTING TECHNOLOGY

Computing Research Institute Dean Advisory Committee

Christo Georgiev NIMH , Bulgaria

Research Computing

Qatar research

NATIONAL POLYTECHNIC INSTITUTE COMPUTING RESEARCH CENTER

NATIONAL POLYTECHNIC INSTITUTE COMPUTING RESEARCH CENTER

NATIONAL POLYTECHNIC INSTITUTE COMPUTING RESEARCH CENTER

Atanas Georgiev Chanev

+ Institute for Nuclear Research and Nuclear Energy, Sofia, Bulgaria

Qatar Environment and Energy Research Institute (QEERI) from Carbon to Creativity

Christo Georgiev NIMH , Bulgaria

BULGARIA Institute for Social Integration Liberal politological institute

Christo Georgiev NIMH, Bulgaria

1V0-61.21 Associate VMware End-User Computing