KDD CUP 2007 Neural Network HW2 Group 14: Yu Szu-Hsien (M9609208), Ciou Yun-Rong (M9608305)
How? (method & system) Group 14 HW 2
1. Make the data into a matrix • By analyzing the film types that a customer has rated, we can predict that customer's ratings for other films of the same type.
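The idea on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the toy ratings, the `film_type` map, and the helper names `build_matrix` and `predict_by_type` are hypothetical and not taken from the slides, and the same-type mean is just one simple way to use film types, not necessarily the group's method.

```python
from collections import defaultdict

# Hypothetical toy data: (customer, film, rating) triples and a film -> type map.
ratings = [
    ("c1", "film_a", 5), ("c1", "film_b", 4), ("c1", "film_c", 1),
    ("c2", "film_a", 2), ("c2", "film_d", 3),
]
film_type = {"film_a": "action", "film_b": "action", "film_c": "drama",
             "film_d": "drama", "film_e": "action"}

def build_matrix(ratings):
    """Return a sparse matrix as nested dicts: matrix[customer][film] = rating."""
    matrix = defaultdict(dict)
    for customer, film, rating in ratings:
        matrix[customer][film] = rating
    return matrix

def predict_by_type(matrix, film_type, customer, film):
    """Predict a rating as the customer's mean rating over films of the same type."""
    target = film_type[film]
    same_type = [r for f, r in matrix[customer].items() if film_type[f] == target]
    if not same_type:
        return None  # the customer has not rated any film of this type
    return sum(same_type) / len(same_type)

matrix = build_matrix(ratings)
print(predict_by_type(matrix, film_type, "c1", "film_e"))
```

Storing the matrix as nested dicts keeps only the ratings that exist, which matters when most customer/film cells are empty.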
2. The characteristics of the problem • The problem is based on data from an enormous database. • Each customer's rating series reflects that customer's personality, preferences, and time intervals. • For every movie, we can compile statistics on how many customers rated it at different times and treat them as a time series. • For every customer, we can compile statistics on what that user rated and treat them as a time series.
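As a rough sketch of the per-movie and per-customer statistics described above, the counts can be bucketed by month. The record layout `(customer, movie, ISO date)`, the monthly bucket, and the function name `counts_per_month` are assumptions made for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical rating log: (customer, movie, ISO date) records.
log = [
    ("c1", "m1", "2005-01-14"), ("c2", "m1", "2005-01-30"),
    ("c3", "m1", "2005-03-02"), ("c1", "m2", "2005-02-11"),
]

def counts_per_month(log, key_index):
    """Count records per (key, YYYY-MM); key_index 0 = customer, 1 = movie."""
    series = defaultdict(Counter)
    for record in log:
        month = record[2][:7]  # keep the "YYYY-MM" prefix of the date
        series[record[key_index]][month] += 1
    # Sort by month so each entry reads as a time series.
    return {key: sorted(counter.items()) for key, counter in series.items()}

movie_series = counts_per_month(log, 1)
customer_series = counts_per_month(log, 0)
print(movie_series["m1"])
```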
Methods → How to find similar films and similar users? • Similarity measures • Poisson regression • Clustering analysis • Association rules • Random forests • Collaborative filtering (group filtering or social filtering) • Singular value decomposition (SVD)
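To make the first item concrete, here is a minimal sketch of one common similarity measure, cosine similarity between two users' rating vectors. The slides do not say which measure was used, so this choice, the toy `users` data, and the helper names are assumptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two sparse rating dicts (film -> rating).
    Films missing from a dict count as 0."""
    shared = set(a) & set(b)
    dot = sum(a[f] * b[f] for f in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is similar to nothing
    return dot / (norm_a * norm_b)

def most_similar(target, others):
    """Return the name of the user whose ratings are most similar to target."""
    return max(others, key=lambda name: cosine_similarity(target, others[name]))

# Hypothetical toy profiles.
users = {
    "u1": {"film_a": 5, "film_b": 4},
    "u2": {"film_a": 4, "film_b": 5},
    "u3": {"film_c": 5},
}
others = {name: profile for name, profile in users.items() if name != "u1"}
print(most_similar(users["u1"], others))
```

The same function works for film-to-film similarity if each dict maps users to ratings instead.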
System • <Weka>: multilayer perceptron (MLP); data mining software in Java • <MATLAB>: backpropagation; the language of technical computing • <MS SQL 2005>: clustering; comprehensive, integrated data management and analysis software
Result (training & test set)
Difficulties encountered • “Out of memory!!”: the dataset is too large. • The dataset does not have enough features. • Which features are valuable and really needed? • Which algorithm should be used?
Training & Test set • Downsize the dataset: group the records by their features (using SQL), then sample from the groups for training • Turn the sampled dataset into a matrix • Train with the tools: Weka, MATLAB • Evaluate accuracy by RMSE
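The downsizing and evaluation steps above can be sketched as follows. The slides did the grouping in SQL. This Python version is a stand-in for illustration, and `sample_by_group`, its `per_group` parameter, and the toy records are hypothetical. RMSE is the square root of the mean squared difference between predicted and actual ratings.

```python
import math
import random
from collections import defaultdict

def sample_by_group(records, group_of, per_group, seed=0):
    """Group records with group_of(record), then draw up to per_group from each group."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    groups = defaultdict(list)
    for record in records:
        groups[group_of(record)].append(record)
    sample = []
    for members in groups.values():
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample

def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))

# Hypothetical records: (movie id, movie type).
records = [("m%d" % i, "action" if i % 2 else "drama") for i in range(10)]
train = sample_by_group(records, group_of=lambda r: r[1], per_group=2)
print(len(train))
print(rmse([3.0, 4.0], [3.0, 2.0]))
```

Sampling the same number of records from every group keeps small groups represented in the reduced training set.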
The Sketch
SQL Server
MATLAB (1/2)
MATLAB (2/2) (# Training Data = 10040, Test Data = 42)
Weka (# Training Data = 118, Test Data = 13)
Analysis (why)
Analysis • <Weka> • We treat the data as a matrix of movies and users • Defect: the matrix is enormous. Solution: classify the movies or users first • Lowest error rate: multilayer perceptron • Number of neurons & number of training iterations • <MATLAB> • Not enough features (only one feature, for movie classification) • We will look for more features that describe the dependence between movies and customers (using SVD)
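As a hedged illustration of how SVD can expose a feature shared by movies and customers, the sketch below approximates only the largest singular value and its two vectors by power iteration. It is not the group's MATLAB code. The function name `top_singular_triplet` and the toy matrix are assumptions, and a full SVD routine would normally be used in practice.

```python
import math

def top_singular_triplet(matrix, iterations=100):
    """Approximate the largest singular value and its vectors by power
    iteration on A^T A. matrix is a list of rows (users) over columns (movies).
    Assumes the start vector is not orthogonal to the leading singular vector."""
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(cols)] * cols  # unit-length start vector
    for _ in range(iterations):
        # u = A v
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        # w = A^T u
        w = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # all-zero matrix: nothing to extract
        v = [x / norm for x in w]
    u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    if sigma > 0:
        u = [x / sigma for x in u]
    return sigma, u, v

# Hypothetical user x movie ratings.
ratings = [[5, 4, 1], [4, 5, 1], [1, 1, 5]]
sigma, user_feature, movie_feature = top_singular_triplet(ratings)
print(sigma)
```

Here `user_feature` holds one latent value per user and `movie_feature` one per movie, which could serve as the extra input features the slide asks for.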
Thank You!