Prastava

Prastava: An open-source, pure-Ruby generic recommendation system

Submitted by: • Himanshu Gahlot, MNNIT, Allahabad, India / WING, NUS, Singapore • Tarun Kumar, IIIT, Allahabad, India / WING, NUS, Singapore Project Guide: Prof. Min Yen Kan, Assistant Professor, NUS, Singapore

Motivation • Many websites use recommendation systems built in other languages. • Why Ruby?

Methods Used • Collaborative Filtering • Content Based Filtering

Collaborative Filtering (CF) • The most commonly adopted technique for building academic and commercial [1] recommender systems. • Makes recommendations based on the ratings that users have assigned to items. • Two types: user-based collaborative filtering and item-based collaborative filtering.

The CF Engine

User Based Collaborative Filtering • The input is an item × user matrix in which each entry is the rating a user has given to an item.

An example rating matrix: four users have rated six seasons.
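A minimal sketch of how user-based CF could predict a missing rating from such a matrix. The ratings below are invented for illustration (the original matrix is not reproduced here), and the names `RATINGS`, `user_similarity` and `predict` are hypothetical, not Prastava's API. Similarity is taken to be cosine similarity over the items both users have rated, and the prediction is a similarity-weighted average; both are common choices rather than the confirmed design.

```ruby
# Hypothetical ratings: user => { season => rating on a 1..5 scale }.
RATINGS = {
  "u1" => { "s1" => 5, "s2" => 3, "s3" => 4 },
  "u2" => { "s1" => 4, "s2" => 3, "s3" => 5, "s4" => 4, "s5" => 2 },
  "u3" => { "s1" => 1, "s2" => 5, "s4" => 2, "s6" => 4 },
  "u4" => { "s3" => 2, "s5" => 5, "s6" => 3 }
}

# Cosine similarity restricted to the items both users have rated.
def user_similarity(a, b)
  shared = a.keys & b.keys
  return 0.0 if shared.empty?
  dot = shared.sum { |i| a[i] * b[i] }
  norm_a = Math.sqrt(shared.sum { |i| a[i]**2 })
  norm_b = Math.sqrt(shared.sum { |i| b[i]**2 })
  dot / (norm_a * norm_b)
end

# Predict a rating as the similarity-weighted average of the ratings
# given to the item by the other users. Returns nil if nobody rated it.
def predict(ratings, user, item)
  num = 0.0
  den = 0.0
  ratings.each do |other, their|
    next if other == user || !their.key?(item)
    sim = user_similarity(ratings[user], their)
    num += sim * their[item]
    den += sim.abs
  end
  den.zero? ? nil : num / den
end

puts predict(RATINGS, "u1", "s4")
```

Here `u1` has not rated `s4`, so the prediction blends the ratings of `u2` and `u3`, weighted towards `u2` because that user's past ratings are closer to those of `u1`.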

Item Based Collaborative Filtering

An example of item based CF
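As a rough illustration of the item-based variant, the sketch below compares items by the ratings they received and ranks the other items by similarity to a chosen one. The data and the names `ITEM_RATINGS`, `item_similarity` and `similar_items` are hypothetical, and cosine similarity is assumed as the measure.

```ruby
# Hypothetical ratings: item => { user => rating on a 1..5 scale }.
ITEM_RATINGS = {
  "s1" => { "u1" => 5, "u2" => 4, "u3" => 1 },
  "s2" => { "u1" => 4, "u2" => 5, "u3" => 2 },
  "s3" => { "u1" => 1, "u2" => 2, "u3" => 5 }
}

# Cosine similarity restricted to the users who rated both items.
def item_similarity(a, b)
  shared = a.keys & b.keys
  return 0.0 if shared.empty?
  dot = shared.sum { |u| a[u] * b[u] }
  norm_a = Math.sqrt(shared.sum { |u| a[u]**2 })
  norm_b = Math.sqrt(shared.sum { |u| b[u]**2 })
  dot / (norm_a * norm_b)
end

# All other items as [name, similarity] pairs, most similar first.
def similar_items(items, target)
  items.reject { |name, _| name == target }
       .map { |name, col| [name, item_similarity(items[target], col)] }
       .sort_by { |_, sim| -sim }
end

similar_items(ITEM_RATINGS, "s1").each { |name, sim| puts "#{name}: #{sim.round(3)}" }
```

With this data, `s2` ranks above `s3` for someone who liked `s1`, because `s1` and `s2` were rated alike by the same users.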

Algorithms for similarity measure • Cosine Similarity • Pearson Correlation

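Cosine Similarity • Cosine similarity between two vectors can be defined as:

$$\cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}$$

where A and B are the two rating vectors.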
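The definition of cosine similarity can be sketched in plain Ruby as follows. The function name `cosine_similarity` is hypothetical, and returning 0.0 for a zero-length vector is an assumed convention, since the ratio is undefined in that case.

```ruby
# Cosine similarity between two equal-length numeric vectors.
def cosine_similarity(a, b)
  raise ArgumentError, "vectors must have the same length" unless a.length == b.length
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |y| y * y })
  # Assumed convention: similarity with an all-zero vector is 0.0.
  return 0.0 if norm_a.zero? || norm_b.zero?
  dot / (norm_a * norm_b)
end

puts cosine_similarity([5, 3, 4], [4, 3, 5])
```

Vectors pointing in the same direction score close to 1 whatever their magnitude, and orthogonal vectors score 0.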

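Pearson Correlation • A correlation is a number between -1 and +1 that measures the degree of association between two variables (call them X and Y). • The formula for the Pearson correlation between two vectors X and Y with elements x_i and y_i is:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

where \bar{x} and \bar{y} are the means of X and Y.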
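A small Ruby sketch of the Pearson correlation between two rating vectors. The name `pearson` is hypothetical, and returning 0.0 when either vector is constant (zero variance) is an assumed convention, since the coefficient is undefined there.

```ruby
# Pearson correlation between two equal-length numeric vectors.
def pearson(x, y)
  raise ArgumentError, "vectors must be non-empty and of equal length" unless x.length == y.length && !x.empty?
  n = x.length.to_f
  mean_x = x.sum / n
  mean_y = y.sum / n
  cov = x.zip(y).sum { |xi, yi| (xi - mean_x) * (yi - mean_y) }
  var_x = x.sum { |xi| (xi - mean_x)**2 }
  var_y = y.sum { |yi| (yi - mean_y)**2 }
  # Assumed convention: a constant vector has no defined correlation.
  return 0.0 if var_x.zero? || var_y.zero?
  cov / Math.sqrt(var_x * var_y)
end

puts pearson([1, 2, 3], [11, 12, 13])
```

Because each vector is centred on its own mean, a user who rates everything a fixed amount higher than another user still correlates perfectly with them.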

Difference between Cosine Similarity and Pearson Correlation

Content Based Filtering • If the content of the items is available, then once a user selects an item we can recommend other items with similar content. • The Term Frequency – Inverse Document Frequency (TF-IDF) algorithm is used to measure the similarity between documents. • TF = (frequency of the term in a document) / (sum of the frequencies of all terms in the same document) • IDF = log [(total number of documents) / (number of documents that contain the term + 1)] • TF-IDF = TF * IDF
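The three formulas translate almost directly into Ruby. In this sketch a document is an array of term strings, the function names are hypothetical, and the natural logarithm is assumed because the slide does not state the base.

```ruby
# TF: occurrences of the term divided by the total number of terms in the doc.
def tf(term, doc)
  return 0.0 if doc.empty?
  doc.count(term).to_f / doc.length
end

# IDF: log of (number of docs) / (number of docs containing the term + 1).
# Natural log is an assumption; the slide does not state the base.
def idf(term, docs)
  containing = docs.count { |d| d.include?(term) }
  Math.log(docs.length.to_f / (containing + 1))
end

def tf_idf(term, doc, docs)
  tf(term, doc) * idf(term, docs)
end

docs = [%w[ruby gem ruby], %w[python egg], %w[go module], %w[rust crate]]
puts tf_idf("ruby", docs[0], docs)
```

Note that with the +1 in the denominator, a term that appears in all but one document gets an IDF of zero, and a term that appears in every document gets a negative one.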

Method for calculating similarity between documents • All the stop words are first removed. • The remaining terms are then reduced to their root form. • TF-IDF values are calculated for each unique term and stored in a vector. • Such vectors are produced for all documents. • The similarity between two documents can then be calculated as the cosine similarity between their vectors.
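The steps above can be sketched end to end as follows. Everything named here is hypothetical: the stop-word list is a tiny sample, `stem` is a crude suffix stripper standing in for a real stemming algorithm, and the natural logarithm is assumed for IDF.

```ruby
# A tiny sample stop-word list, for illustration only.
STOP_WORDS = %w[a an and are for in is of on the to with].freeze

# Crude suffix stripping; a stand-in for a real stemmer.
def stem(word)
  word.sub(/(ing|ed|s)\z/, "")
end

# Lowercase, split into words, drop stop words, reduce to root form.
def tokenize(text)
  text.downcase.scan(/[a-z]+/).reject { |w| STOP_WORDS.include?(w) }.map { |w| stem(w) }
end

# One TF-IDF vector per text, over the sorted vocabulary of unique terms.
def tfidf_vectors(texts)
  docs = texts.map { |t| tokenize(t) }
  vocab = docs.flatten.uniq.sort
  docs.map do |doc|
    vocab.map do |term|
      next 0.0 if doc.empty?
      tf = doc.count(term).to_f / doc.length
      df = docs.count { |d| d.include?(term) }
      tf * Math.log(docs.length.to_f / (df + 1))
    end
  end
end

def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |y| y * y })
  norm.zero? ? 0.0 : dot / norm
end

texts = [
  "The system recommends movies to users",
  "Recommending movies for a user",
  "Cooking pasta with tomatoes",
  "Gardening tips for beginners"
]
vectors = tfidf_vectors(texts)
puts cosine(vectors[0], vectors[1])
puts cosine(vectors[0], vectors[2])
```

The first two texts share the roots "recommend", "movie" and "user", so they score higher than the first and third, which share none. On very small collections the +1 in the IDF denominator can drive shared terms to zero weight, which is why the example uses four documents.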

Screen Shot • screenshots of working code….

CVS on

Thank You
