Prastava
An open source, pure Ruby, generic recommendation system
Submitted by:
• Himanshu Gahlot, MNNIT, Allahabad, India / WING, NUS, Singapore
• Tarun Kumar, IIIT, Allahabad, India / WING, NUS, Singapore
Project Guide: Prof. Min Yen Kan, Assistant Professor, NUS, Singapore
Motivation
• Many websites use recommendation systems built in other languages.
• Why Ruby?
Methods Used
• Collaborative Filtering
• Content Based Filtering
Collaborative Filtering (CF)
• The most commonly adopted technique for building academic and commercial [1] recommender systems.
• Recommendations are made based on the ratings that users have assigned to items.
• Two types:
  - User-based collaborative filtering
  - Item-based collaborative filtering
User Based Collaborative Filtering
• The input is an item × user matrix that holds the rating each user has given to each item.
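As a rough illustration of how such a rating matrix might be used, the sketch below stores it as a nested hash (user to item to rating) and predicts a missing rating as a similarity-weighted average of other users' ratings. The user names, items, ratings, and similarity values are all hypothetical, and the weighted-average rule is one common choice for user-based CF, not necessarily the one Prastava implements.

```ruby
# Predict a rating for `item` as the similarity-weighted average of the
# ratings given by neighbouring users. `ratings` maps user => { item => rating };
# `similarities` maps neighbour => similarity to the target user.
# Both structures are assumptions made for this sketch.
def predict_rating(ratings, similarities, item)
  weighted_sum = 0.0
  weight_total = 0.0
  similarities.each do |neighbour, sim|
    rating = ratings.fetch(neighbour, {})[item]
    # Skip neighbours who have not rated the item or are not positively similar.
    next if rating.nil? || sim <= 0

    weighted_sum += sim * rating
    weight_total += sim
  end
  # nil means no usable neighbour has rated the item.
  weight_total.zero? ? nil : weighted_sum / weight_total
end

# Hypothetical data: missing keys mean "not rated".
ratings = {
  "alice" => { "s1" => 5, "s2" => 3 },
  "bob"   => { "s1" => 4, "s2" => 2, "s3" => 5 },
  "carol" => { "s1" => 1, "s3" => 2 }
}
# Hypothetical, precomputed similarities between alice and the other users.
similarity_to_alice = { "bob" => 0.9, "carol" => 0.1 }

puts predict_rating(ratings, similarity_to_alice, "s3")
```

Here the prediction for alice on "s3" is pulled towards bob's rating because bob is assumed to be far more similar to alice than carol is.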
An example rating matrix in which four users have rated six seasons:
Algorithms for similarity measure
• Cosine Similarity
• Pearson Correlation
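Cosine Similarity
• The cosine similarity between two vectors is defined as:
$$\cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i} A_i B_i}{\sqrt{\sum_{i} A_i^2} \, \sqrt{\sum_{i} B_i^2}}$$
where A and B are the two rating vectors.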
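A minimal Ruby sketch of cosine similarity, using the standard definition (dot product divided by the product of the vector lengths). The method name `cosine_similarity` and the choice to return 0.0 for a zero-length vector are assumptions made for this example, not part of Prastava's API.

```ruby
# Cosine similarity between two equal-length numeric arrays.
def cosine_similarity(a, b)
  raise ArgumentError, "vectors must have the same length" unless a.length == b.length

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |y| y * y })
  # Assumption: treat an all-zero vector as having no similarity to anything.
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

puts cosine_similarity([5, 3, 4], [4, 2, 5])
```

Vectors that point in the same direction score close to 1 regardless of their length, and vectors with no overlapping non-zero entries score 0.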
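Pearson Correlation
• A correlation is a number between -1 and +1 that measures the degree of association between two variables (call them X and Y).
• The Pearson correlation between two vectors X and Y with elements x_i and y_i can be written as:
$$r = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i} (x_i - \bar{x})^2} \, \sqrt{\sum_{i} (y_i - \bar{y})^2}}$$
where \bar{x} and \bar{y} are the means of X and Y.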
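The following is a small sketch of the Pearson correlation in its usual mean-centred form. The method name `pearson_correlation` and the decision to return 0.0 when one vector is constant (where the correlation is undefined) are assumptions for this illustration.

```ruby
# Pearson correlation between two equal-length, non-empty numeric arrays.
def pearson_correlation(x, y)
  unless x.length == y.length && !x.empty?
    raise ArgumentError, "vectors must be non-empty and of equal length"
  end

  n = x.length.to_f
  mean_x = x.sum / n
  mean_y = y.sum / n
  covariance = x.zip(y).sum { |xi, yi| (xi - mean_x) * (yi - mean_y) }
  spread_x = Math.sqrt(x.sum { |xi| (xi - mean_x)**2 })
  spread_y = Math.sqrt(y.sum { |yi| (yi - mean_y)**2 })
  # Assumption: a constant vector has no variation, so report 0.0
  # instead of dividing by zero.
  return 0.0 if spread_x.zero? || spread_y.zero?

  covariance / (spread_x * spread_y)
end

puts pearson_correlation([5, 3, 4, 4], [3, 1, 2, 3])
```

Two users whose ratings rise and fall together get a value near +1, and users with opposite tastes get a value near -1.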
Difference between Cosine Similarity and Pearson Correlation
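One standard way to see the difference: Pearson correlation equals cosine similarity computed after subtracting each vector's mean, so it ignores a user's overall rating level while plain cosine similarity does not. The sketch below illustrates this with two hypothetical users, one who rates harshly and one who rates generously but with the same pattern. The helper names are invented for this example.

```ruby
# Cosine similarity; assumes both vectors are non-zero and of equal length.
def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |y| y * y }))
end

# Subtract the mean from every element.
def center(v)
  mean = v.sum.to_f / v.length
  v.map { |x| x - mean }
end

# Pearson correlation expressed as cosine similarity of mean-centred vectors.
# Assumes neither vector is constant (otherwise the result is NaN).
def pearson(a, b)
  cosine(center(a), center(b))
end

harsh    = [1, 2, 3] # hypothetical ratings from a strict user
generous = [3, 4, 5] # same pattern, shifted up by two points

puts cosine(harsh, generous)  # below 1: the offset changes the angle
puts pearson(harsh, generous) # approximately 1: the offset is removed
```

In a rating setting this means Pearson correlation tends to treat the two users as having the same taste, whereas cosine similarity penalises the difference in their rating scales.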
Content Based Filtering
• If the content of items is available, then when one item is selected we can recommend another based on the similarity of their content.
• The Term Frequency – Inverse Document Frequency (TF-IDF) algorithm is used to find the similarity between documents.
• TF = (frequency of the term in a document) / (sum of the frequencies of all terms in the same document)
• IDF = log [(total number of documents) / (number of documents that contain the term + 1)]
• TF-IDF = TF * IDF
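The three formulas above can be sketched in Ruby as follows. Documents are assumed to be non-empty arrays of already-tokenised terms, the logarithm is assumed to be the natural log (the slide does not state a base), and the method names are hypothetical.

```ruby
# TF: occurrences of the term divided by the total number of terms in the document.
def term_frequency(term, doc)
  doc.count(term).to_f / doc.length
end

# IDF: log of (number of documents / (documents containing the term + 1)).
# Natural log is an assumption; the source does not give a base.
def inverse_document_frequency(term, docs)
  containing = docs.count { |doc| doc.include?(term) }
  Math.log(docs.length.to_f / (containing + 1))
end

def tf_idf(term, doc, docs)
  term_frequency(term, doc) * inverse_document_frequency(term, docs)
end

# Hypothetical, pre-tokenised documents.
docs = [
  %w[dragon knight dragon],
  %w[knight castle],
  %w[castle kitchen],
  %w[kitchen bread]
]

puts tf_idf("dragon", docs[0], docs)
```

Because of the "+ 1" in the denominator, a term that occurs in all but one document gets an IDF of 0 and a term that occurs in every document gets a slightly negative IDF, so very common terms contribute little or nothing to the similarity.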
Method for calculating similarity between documents
• All the stop words are first removed.
• The remaining terms are then reduced to their root form.
• TF-IDF values are calculated for each unique term and stored in a vector.
• Such vectors are produced for all documents.
• The similarity between two documents can then be calculated as the cosine similarity between their vectors.
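The steps above can be sketched end to end as below. This is only an illustration under several assumptions: the stop-word list is a tiny hypothetical one, `naive_stem` is a crude suffix stripper standing in for a real stemming algorithm, every document is assumed to keep at least one term after stop-word removal, and all method names are invented for the example.

```ruby
# Hypothetical stop-word list; a real system would use a much fuller one.
def stop_words
  %w[a an and the is of to in on]
end

# Crude stand-in for stemming: strips a few common English suffixes.
def naive_stem(word)
  word.sub(/(ing|ed|s)\z/, "")
end

# Lower-case, split into words, drop stop words, reduce to root form.
def tokenize(text)
  text.downcase.scan(/[a-z]+/)
      .reject { |word| stop_words.include?(word) }
      .map { |word| naive_stem(word) }
end

# One TF-IDF vector per document, all over the same shared vocabulary.
def tf_idf_vectors(texts)
  docs = texts.map { |text| tokenize(text) }
  vocabulary = docs.flatten.uniq
  docs.map do |doc|
    vocabulary.map do |term|
      tf = doc.count(term).to_f / doc.length
      df = docs.count { |other| other.include?(term) }
      tf * Math.log(docs.length.to_f / (df + 1))
    end
  end
end

def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norms = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |y| y * y })
  norms.zero? ? 0.0 : dot / norms
end

# Hypothetical documents.
texts = [
  "Dragons and knights fighting",
  "A knight fights the dragon",
  "Cooking pasta in the kitchen",
  "Baking bread"
]
vectors = tf_idf_vectors(texts)

puts cosine(vectors[0], vectors[1]) # same root terms, so close to 1
puts cosine(vectors[0], vectors[2]) # no shared terms, so 0
```

The first two texts reduce to the same root terms and so score as near-identical, while the third shares no terms with the first and scores 0.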
Screen Shot • screenshots of working code….