KDD CUP 2007

KDD CUP 2007 Neural Network HW2 Group 14 Yu Szu-Hsien (M9609208) Ciou Yun-Rong (M9608305)

How? (method & system) Group 14 HW 2

1. Make into a matrix
• By analyzing the film types that each customer has rated, we can predict that customer’s ratings for other films of the same type.
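The matrix idea above can be sketched as follows. This is only a minimal illustration, not the group's actual code: the sample ratings, the `movie_type` lookup, and the function names `build_matrix` and `predict_by_type` are all assumptions made for the example.

```python
# Hypothetical sample data: (customer_id, movie_id, rating on a 1-5 scale).
ratings = [
    ("c1", "m1", 5), ("c1", "m2", 4),
    ("c2", "m1", 3), ("c2", "m3", 2),
]
# Hypothetical movie-to-type lookup; the slides do not list the actual types.
movie_type = {"m1": "action", "m2": "action", "m3": "drama", "m4": "action"}


def build_matrix(triples):
    """Return (customers, movies, matrix); unrated cells hold None."""
    customers = sorted({c for c, _, _ in triples})
    movies = sorted({m for _, m, _ in triples})
    row = {c: i for i, c in enumerate(customers)}
    col = {m: j for j, m in enumerate(movies)}
    matrix = [[None] * len(movies) for _ in customers]
    for c, m, r in triples:
        matrix[row[c]][col[m]] = r
    return customers, movies, matrix


def predict_by_type(triples, types, customer, movie):
    """Predict a rating as the customer's mean rating on other films of the same type."""
    same_type = [
        r for c, m, r in triples
        if c == customer and m != movie and types.get(m) == types.get(movie)
    ]
    # Return None when the customer has rated no film of that type.
    return sum(same_type) / len(same_type) if same_type else None


customers, movies, matrix = build_matrix(ratings)
prediction = predict_by_type(ratings, movie_type, "c1", "m4")
```

A per-type mean is about the simplest predictor that fits the slide's description; a trained model would replace `predict_by_type` while the matrix layout stays the same.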

2. The characteristics of the problem
• The problem is based on data from an enormous database.
• Each customer’s rating series reflects that customer’s personality, preferences, and rating time intervals.
• For every movie, we can count how many customers rated it at different times and treat the counts as a time series.
• For every customer, we can record what that customer rated over time and treat the record as a time series.

Methods → How to find similar films and similar users?
• Similarity measures
• Poisson regression
• Clustering analysis
• Association rules
• Random forests
• Collaborative filtering (group filtering or social filtering)
• Singular value decomposition (SVD)
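As one illustration of the "similarity measures" item, the sketch below computes cosine similarity between users over the movies both have rated. The slides do not say which measure the group used, so the choice of cosine similarity, the sample users, and the function names are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two {movie: rating} dicts over co-rated movies."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = math.sqrt(sum(a[m] ** 2 for m in common))
    norm_b = math.sqrt(sum(b[m] ** 2 for m in common))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(target, users):
    """Return the name of the user most similar to `target` (excluding itself)."""
    scores = {
        name: cosine_similarity(users[target], rated)
        for name, rated in users.items()
        if name != target
    }
    return max(scores, key=scores.get)


# Hypothetical users and ratings, for illustration only.
users = {
    "alice": {"m1": 5, "m2": 1},
    "bob": {"m1": 4, "m2": 2},
    "carol": {"m1": 1, "m2": 5},
}
nearest = most_similar("alice", users)
```

The same function finds similar films if the dicts are keyed by user instead of by movie. In a collaborative filtering setup, the ratings of the nearest users would then be combined to predict the target user's missing ratings.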

System
• <Weka> (data mining software in Java): multilayer perceptron (MLP)
• <MATLAB> (the language of technical computing): backpropagation
• <MS SQL 2005> (a comprehensive, integrated data management and analysis software): clustering

Result (training & test set)

Difficulty confronted
• “Out of memory!!”: the dataset is too large.
• The dataset does not provide enough features.
• Which features are really valuable?
• Which algorithm should be used?

Training & Test set
• Downsize the dataset: group the records by their features (using SQL) → sample from the groups for training
• Make the sampled dataset into a matrix
• Train in the tools: Weka, MATLAB
• Evaluate the accuracy by RMSE
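The downsizing and evaluation steps above can be sketched as follows. The group did the grouping in SQL, so this Python version is only an assumed equivalent; the record layout, the `per_group` parameter, and the function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def stratified_sample(records, key, per_group, seed=0):
    """Group records by key(record), then draw up to per_group records from each group."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    sample = []
    for name in sorted(groups):
        members = groups[name]
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Hypothetical records: (movie_id, movie_type, rating).
records = [
    ("m1", "action", 5), ("m2", "action", 4), ("m3", "action", 3),
    ("m4", "drama", 2), ("m5", "drama", 1),
]
training = stratified_sample(records, key=lambda rec: rec[1], per_group=2)
error = rmse([1, 5], [3, 3])
```

Sampling the same number of records from every group keeps small groups represented in the training set, which plain random sampling over the whole table would not guarantee.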

The Sketch

SQL Server

MATLAB (1/2)

MATLAB (2/2) (# Training Data = 10040, Test Data = 42)

Weka (# Training Data = 118, Test Data = 13)

Analysis (why)

Analysis
• <Weka>
  • We regard the data as a matrix of movies and users
  • Defect: the matrix is enormous. Solution: classify the movies or users first
  • Lowest error rate: multilayer perceptron
  • Number of neurons and number of training iterations
• <MATLAB>
  • Not enough features (only one feature, about movie classification)
  • We will find more features describing the dependence between movies and customers (using SVD)
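The SVD step proposed above is future work in the slides, so the following is only a sketch of the underlying idea. It approximates the largest singular value of a small rating matrix and its singular vectors by power iteration, using only the standard library. The function name, the sample matrix, and the choice of power iteration instead of a full SVD routine are all assumptions.

```python
import math


def top_singular_triplet(a, iterations=200):
    """Approximate the largest singular value of `a` and its singular vectors.

    Runs power iteration on A^T A; `a` is a list of equal-length rows.
    """
    n_rows, n_cols = len(a), len(a[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols  # arbitrary unit start vector
    for _ in range(iterations):
        # u = A v, then w = A^T u, so w = (A^T A) v.
        u = [sum(a[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        w = [sum(a[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    u = [sum(a[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    if sigma > 0.0:
        u = [x / sigma for x in u]
    return sigma, u, v


# Hypothetical 3 customers x 2 movies rating matrix with no missing entries.
ratings = [[5.0, 4.0], [4.0, 5.0], [1.0, 1.0]]
sigma, customer_factor, movie_factor = top_singular_triplet(ratings)
```

Here `customer_factor` and `movie_factor` give one latent value per customer and per movie, which could serve as an extra input feature for the neural network. Real rating matrices are sparse, so missing entries would need to be filled or handled before a step like this, and a library SVD routine would normally be used to get more than one factor.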

Thank You!
