A Classification Approach for Movie Recommender System. Advisor: Professor 黃三益. Students: M964020007 黃于珊, M964020011 李界寬, M964020022 程尚文
Agenda • Introduction • Motivation and background • Determination of data set • The Data Mining Procedure • Conclusion and Limitation
INTRODUCTION 1. MOTIVATION AND BACKGROUND 2. DETERMINATION OF DATA SET
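Motivation and background • The dataset comes from GroupLens • (Research lab in the Department of Computer Science and Engineering at the University of Minnesota; http://www.grouplens.org/) • Online movie recommender system: MovieLens (http://www.movielens.org/) • After signing up and rating a number of randomly selected movies, a member receives five movie recommendations from the site, each with a prediction of how much the user will like that movie. • We all love movies • Find the rule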
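Determination of data set • We use one of the two datasets that MovieLens currently provides. • It contains 1,682 movies, 943 users, and 100,000 ratings in total. • This sample is large enough for us to build and test a model properly.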
The Data Mining Procedure 1. DATA MINING PROCEDURE: 10 STEPS 2. CONCLUSION AND LIMITATION
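Step 1. Translate the business problem into a data mining problem • There are a great many movies across many genres. How can a viewer quickly find the ones that match their own preferences? • Movie recommender system • Shorten search time • Find the rule • Which genres of movies are preferred by age, occupation, gender, and so on • Potential customers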
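Step 2. Select appropriate data • Online movie recommender system: MovieLens • (Research lab in the Department of Computer Science and Engineering at the University of Minnesota; http://www.grouplens.org/) • The data consists of movie ratings given by members who joined the site, together with those members' personal information • The dataset provided contains 1,682 movies, 943 users, and 100,000 ratings in total.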
Step 3. Get to know the data (1/2) • The data has already been cleaned up. The following users were removed: • users who had fewer than 20 ratings • users who did not have complete demographic information
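The cleaning rule above can be checked with a short sketch. It assumes the ratings are available as (user_id, item_id, rating) tuples. The sample rows and the `users_below_threshold` helper are hypothetical and are not part of the MovieLens distribution.

```python
from collections import Counter

MIN_RATINGS = 20  # threshold stated for the cleaned dataset


def users_below_threshold(ratings, min_ratings=MIN_RATINGS):
    """Return the sorted user ids that have fewer than `min_ratings` ratings."""
    counts = Counter(user_id for user_id, _item_id, _rating in ratings)
    return sorted(user for user, n in counts.items() if n < min_ratings)


# Hypothetical sample: user 1 rated 20 movies, user 2 rated only 3.
sample = [(1, item, 4) for item in range(1, 21)]
sample += [(2, item, 3) for item in range(1, 4)]

print(users_below_threshold(sample))
```

On a dataset that has really been cleaned this way, the returned list should be empty.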
Step 4. Create a model set • Data Source • MovieLens (The GroupLens Research Project at the University of Minnesota) • Data Characteristics: • 100,000 ratings (1-5) from 943 users on 1,682 movies • Each user has rated at least 20 movies • Collected over a seven-month period from September 19th, 1997 through April 22nd, 1998 • Every user has complete demographic information
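One way to assemble such a model set is to join each rating with the demographics of the user who gave it. The sketch below is an assumption about the workflow, not the authors' procedure. It also assumes the publicly documented layout of the 100,000-rating MovieLens files (tab-separated `user id, item id, rating, timestamp` for ratings, and pipe-separated `user id|age|gender|occupation|zip code` for users). The sample rows are made up.

```python
import csv
import io

# Hypothetical in-memory stand-ins for the ratings file and the user file.
RATINGS_TSV = "1\t10\t5\t880000000\n2\t10\t3\t880000100\n1\t11\t4\t880000200\n"
USERS_PSV = "1|24|M|student|00001\n2|53|F|writer|00002\n"


def load_users(text):
    """Map user id -> demographic fields (the zip code is read but not kept)."""
    users = {}
    for user_id, age, gender, occupation, _zip in csv.reader(io.StringIO(text), delimiter="|"):
        users[int(user_id)] = {"age": int(age), "gender": gender, "occupation": occupation}
    return users


def build_model_set(ratings_text, users_text):
    """Join every rating with its user's demographics into one flat record."""
    users = load_users(users_text)
    rows = []
    for user_id, item_id, rating, _timestamp in csv.reader(io.StringIO(ratings_text), delimiter="\t"):
        demographics = users.get(int(user_id))
        if demographics is None:
            continue  # skip ratings from users without demographic data
        rows.append({"user_id": int(user_id), "item_id": int(item_id),
                     "rating": int(rating), **demographics})
    return rows


model_set = build_model_set(RATINGS_TSV, USERS_PSV)
for row in model_set:
    print(row)
```

Each resulting record carries one rating plus the age, gender, and occupation of the rater, which is the shape a classifier needs.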
Step 5. Fix problems with the data • Variables with too many values • Movie kind (genre) • Occupation • We do not consider variables such as zip code and rating
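The slides do not say how the number of values was reduced. One common option, shown here purely as an assumption, is to keep the most frequent values and merge the rest into a single "other" bucket. The `collapse_rare_values` helper and the sample occupations are hypothetical.

```python
from collections import Counter


def collapse_rare_values(values, keep=3, other_label="other"):
    """Keep the `keep` most frequent values; replace every other value with `other_label`."""
    frequent = {value for value, _count in Counter(values).most_common(keep)}
    return [value if value in frequent else other_label for value in values]


# Hypothetical occupation column with a long tail of rare values.
occupations = ["student", "student", "student", "engineer", "engineer", "writer", "doctor"]

print(collapse_rare_values(occupations, keep=2))
```

Fewer distinct values per attribute keep a multiway decision tree from splitting into many tiny branches.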
Step 6. Transform data to bring information to the surface • We skip this step because transforming the data into other formats would not add useful information
Step 7. Build models • Data mining tool: • Weka Explorer 3.4.12 • Classifier • Decision tree methods • using C4.5 algorithm • Performs well on both accuracy and speed
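C4.5 chooses the attribute to split on by gain ratio, that is, information gain divided by the split information of the attribute. The sketch below illustrates that criterion for categorical attributes only. It is not Weka's implementation and leaves out pruning, continuous attributes, and missing values. The attribute names and rows are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    """Information gain of `attribute` with respect to `target`, divided by split information."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[attribute]].append(row[target])
    total = len(rows)
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    info_gain = entropy([row[target] for row in rows]) - remainder
    split_info = entropy([row[attribute] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy rows: gender separates the classes perfectly, age_group not at all.
rows = [
    {"gender": "M", "age_group": "young", "genre": "action"},
    {"gender": "M", "age_group": "old", "genre": "action"},
    {"gender": "F", "age_group": "young", "genre": "drama"},
    {"gender": "F", "age_group": "old", "genre": "drama"},
]

for attribute in ("gender", "age_group"):
    print(attribute, gain_ratio(rows, attribute, "genre"))
```

The attribute with the highest gain ratio becomes the split at the current node, and the procedure repeats on each branch.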
Step 8. Assess Model • Confusion Matrix
Step 8. Assess Model • Detailed Accuracy
Step 8. Assess Model • Other Information
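The figures from these assessment slides did not survive, so the sketch below only illustrates how a confusion matrix and per-class accuracy measures (precision, recall, F-measure) can be derived from actual and predicted labels. The labels are hypothetical and do not reproduce the authors' results.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) pair of classes occurs."""
    return Counter(zip(actual, predicted))


def per_class_metrics(actual, predicted):
    """Precision, recall, and F-measure for every class."""
    matrix = confusion_matrix(actual, predicted)
    classes = sorted(set(actual) | set(predicted))
    metrics = {}
    for c in classes:
        tp = matrix[(c, c)]
        fp = sum(matrix[(a, c)] for a in classes if a != c)
        fn = sum(matrix[(c, p)] for p in classes if p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
        metrics[c] = {"precision": precision, "recall": recall, "f_measure": f_measure}
    return metrics


def accuracy(actual, predicted):
    """Fraction of instances whose predicted class equals the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical labels for five test instances.
actual = ["drama", "drama", "action", "action", "comedy"]
predicted = ["drama", "action", "action", "action", "drama"]

print(accuracy(actual, predicted))
for label, values in per_class_metrics(actual, predicted).items():
    print(label, values)
```

Reading the per-class values next to the overall accuracy shows which classes the tree confuses, which a single accuracy figure hides.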
Step 8. Assess Model • Decision Tree • Number of leaves: 118 • Size of the tree: 216
Step 9. Deploy Model • The model is difficult to deploy because • Computing resources are insufficient • It is difficult to implement
Conclusion and Limitation • Classification approach: C4.5 → Decision Tree • Data set: 35,130 records • Limitation • Our hardware and software could not handle mining more data, which kept us from finding more interesting and more complete rules.