A Classification Approach for Movie Recommender System

A Classification Approach for Movie Recommender System Advisor: Prof. 黃三益 Students: M964020007 黃于珊, M964020011 李界寬, M964020022 程尚文

Agenda • Introduction • Motivation and background • Determination of data set • The Data Mining Procedure • Conclusion and Limitation

INTRODUCTION 1. MOTIVATION AND BACKGROUND 2. DETERMINATION OF DATA SET

Motivation and background • The dataset comes from GroupLens • (Research lab in the Department of Computer Science and Engineering at the University of Minnesota; http://www.grouplens.org/) • Online movie recommender system: MovieLens (http://www.movielens.org/) • After signing up, members rate several randomly selected movies; the site then recommends five movies, each with a prediction of how much the user will like it. • We all love movies • Find the rule

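Determination of data set • We use one of the two datasets that MovieLens currently provides. • It contains 1682 movies, 943 users, and 100,000 ratings in total. • The sample is large enough for us to build and test a model properly.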

The Data Mining Procedure 1. DATA MINING PROCEDURE: 10 STEPS 2. CONCLUSION AND LIMITATION

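Step 1. Translate the business problem into a data mining problem • There are a great many movies and movie genres; how can a viewer quickly find the ones that match their preferences? • Movie recommender system • Shorten search time • Find the Rule • Which movie genres are preferred by each age group, occupation, gender, etc. • Potential customers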

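Step 2. Select appropriate data • Online movie recommender system: MovieLens • (Research lab in the Department of Computer Science and Engineering at the University of Minnesota; http://www.grouplens.org/) • The data comes from the movie ratings given by members of the site, together with those members' personal information • The dataset provided contains 1682 movies, 943 users, and 100,000 ratings in total.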

Step 3. Get to know the data (1/2) • This data has been cleaned up; the following users were removed: • users who had fewer than 20 ratings • users who did not have complete demographic information

Step 3. Get to know the data (2/2)

Step 4. Create a model set • Data Source • MovieLens (The GroupLens Research Project at the University of Minnesota) • Data Characteristics: • 100,000 ratings (1-5) from 943 users on 1682 movies • Each user has rated at least 20 movies • Collected over the seven-month period from September 19th, 1997 through April 22nd, 1998 • With complete demographic information
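
A minimal sketch of how a model set like this might be assembled by joining rating rows with user demographics. It assumes the publicly documented MovieLens 100K file layout (tab-separated ratings: user id, item id, rating, timestamp; pipe-separated users: user id, age, gender, occupation, zip code). The sample rows and the helper name `build_model_set` are made up for illustration and are not the authors' code.

```python
import csv
import io

# Made-up sample rows in the assumed MovieLens 100K layout (not real data).
RATINGS_TEXT = "1\t10\t4\t881250949\n2\t10\t3\t891717742\n1\t11\t5\t881251753\n"
USERS_TEXT = "1|24|M|student|00001\n2|35|F|engineer|00002\n"


def build_model_set(ratings_text, users_text):
    """Join each rating with the rater's age, gender and occupation."""
    users = {}
    for user_id, age, gender, occupation, _zip in csv.reader(
        io.StringIO(users_text), delimiter="|"
    ):
        users[user_id] = {"age": int(age), "gender": gender, "occupation": occupation}

    rows = []
    for user_id, item_id, rating, _timestamp in csv.reader(
        io.StringIO(ratings_text), delimiter="\t"
    ):
        if user_id not in users:  # skip ratings without demographic information
            continue
        rows.append(
            {"user_id": user_id, "item_id": item_id, "rating": int(rating), **users[user_id]}
        )
    return rows


model_set = build_model_set(RATINGS_TEXT, USERS_TEXT)
for row in model_set:
    print(row)
```

Each resulting row carries one rating plus the demographic attributes of the user who gave it, which is the shape a classifier needs as input.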

Step 5. Fix problems with the data • Variables with too many values • Movie kind • Occupation • We do not consider variables such as ZipCode and rate

Step 6. Transform data to bring information to the surface • We skip this step because transforming the data into different formats would not help here

Step 7. Build models • Data mining tool: • Weka Explorer 3.4.12 • Classifier • Decision tree methods • using C4.5 algorithm • Performs well on both accuracy and speed
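
C4.5 selects each split by gain ratio (information gain divided by split information). The following is a minimal sketch of that criterion for categorical attributes. It is not Weka's implementation, it omits pruning, numeric attributes and missing values, and the rows, attribute names and the `genre` target are made-up assumptions for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    """Gain ratio obtained by splitting `rows` on a categorical `attribute`."""
    labels = [r[target] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])

    total = len(rows)
    # Expected entropy after the split, weighted by partition size.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises attributes with many small partitions.
    split_info = entropy([r[attribute] for r in rows])
    return info_gain / split_info if split_info > 0 else 0.0


# Made-up rows: gender separates the target perfectly, occupation not at all.
rows = [
    {"gender": "M", "occupation": "student", "genre": "Action"},
    {"gender": "M", "occupation": "engineer", "genre": "Action"},
    {"gender": "F", "occupation": "student", "genre": "Drama"},
    {"gender": "F", "occupation": "engineer", "genre": "Drama"},
]

best = max(["gender", "occupation"], key=lambda a: gain_ratio(rows, a, "genre"))
print("best split attribute:", best)
```

A tree builder would apply this choice recursively to each partition until the partitions are pure or too small to split.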

Weka: the software

Step 8. Assess Model • Confusion Matrix
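
As a rough illustration of what a confusion matrix records, the sketch below counts, for each actual class (rows), how many instances were assigned to each predicted class (columns). The labels are made up and are not the results of this study.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Return (labels, matrix); matrix[i][j] counts instances whose actual
    class is labels[i] and whose predicted class is labels[j]."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix


# Made-up labels, for illustration only.
actual = ["Action", "Action", "Drama", "Drama", "Comedy", "Comedy"]
predicted = ["Action", "Drama", "Drama", "Drama", "Comedy", "Action"]

labels, matrix = confusion_matrix(actual, predicted)
correct = sum(matrix[i][i] for i in range(len(labels)))
accuracy = correct / len(actual)

print(labels)
for row in matrix:
    print(row)
print("accuracy:", accuracy)
```

The diagonal holds the correctly classified instances, so overall accuracy is the diagonal sum divided by the number of instances.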

Step 8. Assess Model • Detailed Accuracy
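
Per-class accuracy figures of the kind a tool such as Weka reports (TP rate, FP rate, precision, recall, F-measure) can be derived from a confusion matrix. The sketch below shows the arithmetic on a made-up matrix with actual classes as rows. The numbers are illustrative assumptions, not the study's results.

```python
def per_class_metrics(labels, matrix):
    """Compute TP rate (recall), FP rate, precision and F-measure per class
    from a confusion matrix whose rows are actual classes."""
    total = sum(sum(row) for row in matrix)
    metrics = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                  # actual class i, predicted otherwise
        fp = sum(row[i] for row in matrix) - tp   # predicted class i, actually otherwise
        tn = total - tp - fn - fp
        recall = tp / (tp + fn) if tp + fn else 0.0
        fp_rate = fp / (fp + tn) if fp + tn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {
            "tp_rate": recall,
            "fp_rate": fp_rate,
            "precision": precision,
            "f_measure": f_measure,
        }
    return metrics


# Made-up confusion matrix, for illustration only.
labels = ["Action", "Comedy", "Drama"]
matrix = [[1, 0, 1], [1, 1, 0], [0, 0, 2]]

metrics = per_class_metrics(labels, matrix)
for label in labels:
    print(label, metrics[label])
```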

Step 8. Assess Model • Other Information

Step 8. Assess Model • Decision Tree • Number of Leaves: 118 • Size of the tree: 216

Step 9. Deploy Model • The model is difficult to deploy because • Computing resources are insufficient • It is difficult to implement

Conclusion and Limitation • Classification Approach: C4.5 → Decision Tree • Data Set: 35,130 records • Limitation • Our hardware and software could not support mining more data to find more interesting and more complete rules.

Thanks For Your Attention.
