
KDD CUP 2007

KDD CUP 2007. Group 16. Student Number: M9615002, Name: Po-Jui Sue. Student Number: M9615083, Name: Shun-Jhong Niou. SYSTEM AND METHOD. We use Microsoft Windows XP Service Pack 2, an AMD Athlon(tm) 64 Processor 3500+ and 1 GB of RAM as our test platform.



  1. KDD CUP 2007 • Group 16 • Student Number: M9615002, Name: Po-Jui Sue • Student Number: M9615083, Name: Shun-Jhong Niou

  2. SYSTEM AND METHOD • We use Microsoft Windows XP Service Pack 2, an AMD Athlon(tm) 64 Processor 3500+ and 1 GB of RAM as our test platform. • First, we delete the date attribute, because we think the date is not as important as the movie ID and the customer ID. • The training data set is too large, so we take 20 data sets from each file. We use a feed-forward back-propagation network with the traingdm training function.
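The traingdm function named above trains by gradient descent with momentum. The following is a minimal Python sketch of that general idea, not the authors' MATLAB code: the function names, the learning rate `lr`, the momentum constant `mc`, and the toy quadratic objective are all assumptions chosen for illustration, and the toolbox's exact update formula may differ.

```python
def momentum_step(weights, grads, prev_delta, lr=0.1, mc=0.9):
    """One gradient-descent-with-momentum update over a flat list of weights.

    Each weight change blends the previous change (scaled by mc) with a
    step against the current gradient (scaled by lr).
    """
    delta = [mc * d - lr * g for d, g in zip(prev_delta, grads)]
    new_weights = [w + d for w, d in zip(weights, delta)]
    return new_weights, delta


def minimize_quadratic(target=3.0, steps=300):
    """Toy usage (hypothetical): minimize (w - target)^2 starting from w = 0."""
    weights, delta = [0.0], [0.0]
    for _ in range(steps):
        grads = [2.0 * (w - target) for w in weights]  # derivative of (w - target)^2
        weights, delta = momentum_step(weights, grads, delta)
    return weights[0]
```

Calling `minimize_quadratic()` moves the single weight toward 3.0; the momentum term lets earlier steps keep contributing, which is the behavior that distinguishes this update from plain gradient descent.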

  3. The target format that we use is a probability matrix.
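One plausible reading of a "probability matrix" target is one row per training example with a 1 in the column of the true rating and 0 elsewhere. The Python sketch below illustrates that reading only; the column order (1, 2, 3, 4, 5, 0) is an assumption taken from the order in which the slides report results, and the function names are hypothetical.

```python
# Assumed column order, matching how the slides list their outputs.
RATING_CLASSES = [1, 2, 3, 4, 5, 0]


def to_probability_row(rating):
    """Encode one rating as a row with 1.0 at its class and 0.0 elsewhere."""
    if rating not in RATING_CLASSES:
        raise ValueError(f"unexpected rating: {rating!r}")
    return [1.0 if rating == c else 0.0 for c in RATING_CLASSES]


def to_probability_matrix(ratings):
    """Encode a list of ratings as a list of rows (one row per example)."""
    return [to_probability_row(r) for r in ratings]
```

With this encoding, each of the six network outputs can be read as the estimated probability of one rating class.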

  4. RESULT • The experimental data are predicted as rating 4: the output for rating 1 is 0.0065, 2 is 0.5534, 3 is 0.0132, 4 is 0.6126, 5 is 0.1717 and 0 is 0.0198. But most of the correct answers are rating 0. • To address this problem, we add some answer files to the training data set. • The output is better: rating 1 is 0.1136, 2 is 0.0117, 3 is 0.8042, 4 is 0.084, 5 is 0.993 and 0 is 0.7571.
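The first set of outputs is consistent with choosing the rating whose output value is largest. The Python sketch below shows that selection rule on the numbers reported for the first run; treating the prediction as an arg-max is an assumption, and `predicted_rating` is a hypothetical helper name.

```python
def predicted_rating(outputs):
    """Return the rating whose network output is largest."""
    return max(outputs, key=outputs.get)


# Outputs reported for the first run, keyed by rating.
first_run = {1: 0.0065, 2: 0.5534, 3: 0.0132, 4: 0.6126, 5: 0.1717, 0: 0.0198}

print(predicted_rating(first_run))  # prints 4
```

Note that the six values sum to more than 1, so they behave like independent per-class scores rather than a normalized distribution.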

  5. We also try using only the answer files to train and simulate: rating 1 is 0.0027, 2 is 0.0075, 3 is 0.0229, 4 is 0.0268, 5 is 0.018 and 0 is 0.922. • We then change the training function to try to get a better result. • trainbfg: rating 1 is 0.002, 2 is 0.0173, 3 is 0.0691, 4 is 0.1436, 5 is 0.045 and 0 is 0. • traincgb: rating 1 is 0.0058, 2 is 0.1456, 3 is 0.021, 4 is 0.1667, 5 is 0.0003 and 0 is 0.0156. • traincgf: rating 1 is 0.097, 2 is 0.1461, 3 is 0.2737, 4 is 0.2351, 5 is 0.1535 and 0 is 0.

  6. traingp: rating 1 is 0.0096, 2 is 0.0057, 3 is 0.0068, 4 is 0.0227, 5 is 0.2307 and 0 is 0.0009. • traingd: rating 1 is 0.1136, 2 is 0.0117, 3 is 0.7995, 4 is 0.0843, 5 is 0.993 and 0 is 0.7491. • traingda: rating 1 is 0.0173, 2 is 0.035, 3 is 0.3254, 4 is 0.763, 5 is 0.0011 and 0 is 0.1676. • traingdx: rating 1 is 0.0014, 2 is 0.1462, 3 is 0.0794, 4 is 0.0001, 5 is 0.999 and 0 is 0.1254. • trainoss: rating 1 is 0.0969, 2 is 0.1466, 3 is 0.2736, 4 is 0.2351, 5 is 0.1536 and 0 is 0.0943.

  7. ANALYSIS • The trainscg training function is the best: it gives different probabilities for different data sets and better answers than the other training functions. • We then try changing the transfer function and the number of hidden nodes.

  8. Transfer function

     Rating   log      purelin   tansig
     1        0.0866   1         0.1136
     2        0.0004   1         0.0117
     3        0.9967   1         0.8042
     4        0.01     0         0.084
     5        0.9803   0         0.993
     0        0.9965   0         0.7571

     The output method that uses tansig in layer 1 and logsig in layer 2 is the best in this case.
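For reference, the three transfer functions compared here have simple closed forms: logsig maps any input into (0, 1), tansig maps it into (-1, 1), and purelin leaves it unchanged. The Python sketch below writes them out and wires them into a two-layer forward pass with tansig in layer 1 and logsig in layer 2. It is an illustration, not the authors' MATLAB setup: the `forward` helper and its weight and bias arguments are hypothetical.

```python
import math


def logsig(n):
    """Log-sigmoid: 1 / (1 + e^-n), output in (0, 1)."""
    # Note: math.exp overflows for very large negative n (below about -709).
    return 1.0 / (1.0 + math.exp(-n))


def tansig(n):
    """Hyperbolic tangent sigmoid: 2 / (1 + e^-2n) - 1, output in (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0


def purelin(n):
    """Linear transfer function: returns its input unchanged."""
    return n


def forward(x, w1, b1, w2, b2):
    """Two-layer forward pass: tansig hidden layer, logsig output layer.

    w1 and w2 are lists of weight rows (one row per neuron); b1 and b2
    are the matching bias lists. All values are hypothetical inputs.
    """
    hidden = [tansig(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return [logsig(sum(w * h for w, h in zip(row, hidden)) + b)
            for row, b in zip(w2, b2)]
```

Because logsig keeps every output strictly between 0 and 1, it suits a target made of 0/1 class indicators, whereas purelin outputs are unbounded.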

  9. ANALYSIS

     15network   5network   1network
     0.4112      0          0
     0.0853      0.0044     1
     0.009       0.0885     0
     0.1748      0.7549     0
     0.577       0.6614     0.0012
     0.0001      0.9623     0.4846

     These results show that more nodes do not ensure a better solution and fewer nodes do not ensure a poorer one, but there is a best number of nodes in this case.

  10. We think the range of the data sets is too wide, and we use only some of the training data sets for training. • This can cause prediction errors and makes the data difficult to learn.
