Machine Learning Contest



  1. Machine Learning Contest Team2 정하용, JC Bazin (제이시), 강찬구

  2. Contents • Introduction • Machine Learning Algorithms • Neural Network • Support Vector Machine • Maximum Entropy Model • Cross Validation • Feature Selection • Voting • Conclusion

  3. Introduction • In this team project, we were asked to develop a program that predicts whether a person's income is greater than 50K per year. • The objective of the contest: a good result..!!

  4. Machine Learning Algorithms • For this project, we used three different learning algorithms: • Neural Network • Support Vector Machine • Maximum Entropy Model

  5. Neural network • We used the MATLAB Neural Network toolbox with the feed-forward back-propagation algorithm. • Format Transformation • The toolbox accepts only numeric input. • Each categorical value is transformed into an integer: its position in the attribute's list of possible values. • E.g., Race: (White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black), so “Asian-Pac-Islander” is assigned 2. • The “?” (unknown information) is assigned -1. • All other data values are positive.
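A minimal sketch of this encoding; the encode helper is hypothetical, and only the Race value list from the slide is shown:

```python
# Minimal sketch of the integer encoding described above.
# Only the Race attribute list is shown; the real conversion covers all attributes.
RACE_VALUES = ["White", "Asian-Pac-Islander", "Amer-Indian-Eskimo", "Other", "Black"]

def encode(value, value_list):
    """Map a categorical value to its 1-based position; '?' (unknown) becomes -1."""
    if value == "?":
        return -1
    return value_list.index(value) + 1

print(encode("Asian-Pac-Islander", RACE_VALUES))  # -> 2
print(encode("?", RACE_VALUES))                   # -> -1
```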

  6. Neural network • Parameters • The number of neurons in the hidden layer may be: • equal to the number of neurons in the input layer (Wierenga and Kluytmans, 1994), • equal to 75% of the number of neurons in the input layer (Venugopal and Baets, 1994), • equal to the square root of the product of the numbers of neurons in the input and output layers (Shepard, 1990). • The activation function was chosen by trying the three most common ones: logsig, tansig and purelin. • Result • Precision = 80.34% • Configuration of the NN: • number of hidden layers: 1 • number of neurons in the hidden layer: 3 • number of neurons in the input layer: 14 • number of neurons in the output layer: 1 • activation functions: tansig and purelin • epochs: 1000
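A quick sketch of the three sizing heuristics evaluated for this network's 14 inputs and 1 output; the chosen value of 3 is closest to the square-root rule:

```python
import math

def hidden_layer_sizes(n_inputs, n_outputs):
    """The three heuristics cited above, evaluated for 14 inputs and 1 output."""
    return {
        "equal to input layer": n_inputs,                                  # 14
        "75% of input layer": round(0.75 * n_inputs),                      # 10
        "sqrt(inputs * outputs)": round(math.sqrt(n_inputs * n_outputs)),  # 4
    }

print(hidden_layer_sizes(14, 1))
```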

  7. Support Vector Machine • LIBSVM (Library for Support Vector Machines) • Format Transformation • The format of the training and testing data files is: • <label> <index1>:<value1> <index2>:<value2> ... • A conversion step turns the original data set into this required format: • it translates the original format into a new one, • reordering fields and mapping attribute strings to numbers. • Old format • 50, Private, 138179, Assoc-acdm, 12, Married-civ-spouse, Craft-repair, Husband, White, Male, 0, 1902, 40, United-States, >50K • New format (directly applicable to svm-train and svm-predict) • 0 1:50 2:0 3:138179 4:5 5:12 6:0 7:1 8:2 9:0 10:1 11:0 12:1902 13:40 14:0
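A hedged sketch of such a converter in Python. The category-to-integer maps and the label coding below are illustrative stand-ins, not the team's exact tables; unmapped categorical values fall back to -1 in this sketch:

```python
# Sketch of the CSV-to-LIBSVM conversion. The category maps are partial,
# assumed stand-ins; the real tables map every categorical attribute value.
CATEGORY_MAPS = {
    1: {"Private": 0, "Self-emp-not-inc": 1},   # workclass (partial, assumed)
    13: {"United-States": 0, "Mexico": 1},      # native-country (partial, assumed)
}
LABELS = {">50K": 0, "<=50K": 1}                # label coding inferred from the slide

def to_libsvm(csv_line):
    fields = [f.strip() for f in csv_line.split(",")]
    parts = [str(LABELS[fields[-1]])]           # class label comes first
    for i, value in enumerate(fields[:-1]):
        if not value.lstrip("-").isdigit():     # categorical: map string to integer
            value = CATEGORY_MAPS.get(i, {}).get(value, -1)
        parts.append(f"{i + 1}:{value}")        # LIBSVM indices are 1-based
    return " ".join(parts)

print(to_libsvm("50, Private, 138179, Assoc-acdm, 12, Married-civ-spouse, "
                "Craft-repair, Husband, White, Male, 0, 1902, 40, United-States, >50K"))
```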

  8. Support Vector Machine • Parameters • SVM type = C-SVC • Parameter C = 1 • Kernel function = radial basis function • Degree in kernel function = 3 • Gamma in kernel function = 1/k (k = number of attributes) • Coefficient0 in kernel function = 0 • Epsilon in loss function = 0.1 • Tolerance of termination criterion = 0.001 • Shrinking = 1 • Per-class weight (parameter C of class i = weight*C): weight = 1 • Results • Precision = 76.43%
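These values match LIBSVM's defaults. As a sketch, an equivalent training run through LIBSVM's bundled Python bindings (svmutil) might look like this; the file names are assumptions:

```python
# Sketch using LIBSVM's bundled Python bindings (svmutil); file names are assumed.
from svmutil import svm_read_problem, svm_train, svm_predict

y_train, x_train = svm_read_problem("train.txt")
y_test, x_test = svm_read_problem("test.txt")

# C-SVC (-s 0), RBF kernel (-t 2), C = 1, gamma = 1/k (1/14 for 14 attributes),
# termination tolerance 0.001 (-e), shrinking on (-h 1): all LIBSVM defaults.
model = svm_train(y_train, x_train, "-s 0 -t 2 -c 1 -g 0.0714 -e 0.001 -h 1")
predictions, accuracy, _ = svm_predict(y_test, x_test, model)
```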

  9. Maximum Entropy Model • Language: Java • Library: OpenNLP MaxEnt-2.2.0 • Parameters (iteration counts for the different training methods) • GIS = 1201 • IIS = 923 • Steepest ascent = 212 • Conjugate gradient (FR) = 74 • Conjugate gradient (PRP) = 63 • Limited-memory variable metric = 70 • Results • Precision = 81.56%

  10. Cross Validation • Why is it needed? • If we do something to improve performance (voting, feature selection, etc.), • how can we know which variant is better than another? • What about training on all the training data and then testing on that same data? • That is not sound, because the test set then contains all the answers. • Cross Validation • Set aside some fraction of the known data and use it to test the prediction performance of a hypothesis induced from the remaining data.
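A minimal sketch of the 10-fold version used in the experiments (32,000 examples split into folds of 3,200); train and model.predict are hypothetical stand-ins for any of the three learners:

```python
# Minimal sketch of 10-fold cross validation over a list of (x, y) examples.
# train(data) -> model and model.predict(x) are assumed, hypothetical interfaces.
def cross_validate(data, train, k=10):
    fold_size = len(data) // k          # 32000 examples -> folds of 3200
    correct = 0
    for i in range(k):
        test = data[i * fold_size:(i + 1) * fold_size]
        rest = data[:i * fold_size] + data[(i + 1) * fold_size:]
        model = train(rest)             # induce hypothesis from the remaining data
        correct += sum(model.predict(x) == y for x, y in test)
    return correct / (k * fold_size)    # e.g. 26098/32000 = 0.8156
```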

  11. Feature Selection • If we use more and more features, do we get higher precision? • Some features help to make a decision, but some features don't. • Moreover, some features can actively disturb the decision. • If we drop such bad features, we can get better performance and shorter training time (see the sketch below).
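One simple way to find such bad features is an ablation search: drop one feature at a time and keep the removal that helps most. A sketch, assuming a hypothetical evaluate(features) that runs the 10-fold procedure above on the given feature subset:

```python
# Ablation-style feature search. evaluate(features) is assumed to return the
# cross-validated precision using only the given feature indices.
def best_single_removal(all_features, evaluate):
    """Try dropping each feature and keep the removal that helps most."""
    best_subset, best_score = all_features, evaluate(all_features)  # e.g. 0.8156
    for f in all_features:
        subset = [g for g in all_features if g != f]
        score = evaluate(subset)        # e.g. 0.8695 without the 3rd feature
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score
```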

  12. Feature Selection • Experiments on MEM • Using all features • Precision: 81.56% • Using only 1 feature • Precision: 76.05% (baseline) • If we always answer “<=50K”, we also get 76.05%..!! • Using 5 features • Precision: 74.2%, 81.5%, 82.9% ... • Using all features except the 3rd feature • Precision: 86.95% (best feature set) • Improvement: 5.4% • Precision using all training data: 87.32%

  Sample 10-fold run (MEM using only the 3rd feature) • Partial results: 2448/3200 = 0.765, 2445/3200 = 0.7640625, 2418/3200 = 0.755625, 2453/3200 = 0.7665625, 2423/3200 = 0.7571875, 2450/3200 = 0.765625, 2410/3200 = 0.753125, 2445/3200 = 0.7640625, 2424/3200 = 0.7575, 2422/3200 = 0.756875 • Last result: 24338/32000 = 0.760586268320885

  Feature-subset precisions:
  Baseline : 24338/32000 = 0.7606
  All : 26098/32000 = 0.8156
  …
  11,12,13 : 26582/32000 = 0.8307
  4,11,12,13 : 26694/32000 = 0.8342
  2,4,7,11,12,13 : 26840/32000 = 0.8388
  …
  4,6,11,12,13 : 27477/32000 = 0.8587
  4,8,11,12,13 : 27491/32000 = 0.8591
  …
  4,6,8,10,11,12,13 : 27516/32000 = 0.8599
  2,4,6,7,8,10,11,12,13 : 27709/32000 = 0.8659
  1,2,4,6,7,8,10,11,12,13 : 27788/32000 = 0.8684
  …
  All except 3rd : 27823/32000 = 0.8695

  13. Voting • What should we do when different learners give us different results? • Voting by democracy (simple majority) • Weighted voting • Precision of the 3 learners • MEM : 27942/32000 = 87.32% • NN : 25708/32000 = 80.34% • SVM : 24458/32000 = 76.43% • Precision of voting by democracy • 27382/32000 = 85.57%
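A minimal sketch of both schemes over 0/1 predictions; mem, nn, svm and the precision weights are stand-ins taken from the numbers above:

```python
# Majority ("democracy") voting: the label at least two of the three learners agree on.
def majority_vote(mem, nn, svm):
    return [1 if a + b + c >= 2 else 0 for a, b, c in zip(mem, nn, svm)]

# Weighted variant: each learner's vote counts in proportion to its weight.
def weighted_vote(predictions, weights):
    results = []
    for votes in zip(*predictions):
        score = sum(w for v, w in zip(votes, weights) if v == 1)
        results.append(1 if score >= sum(weights) / 2 else 0)
    return results

# Example: weight each learner by its measured precision.
# vote = weighted_vote([mem, nn, svm], [0.8732, 0.8034, 0.7643])
```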

  14. Conclusion • We got the best result using MEM alone • Precision: 27942/32000 = 87.32% • Why? • We could not apply the best feature set to NN and SVM in time • Precision of SVM using the best feature set: 29036/32000 = 90.74% • We did not run voting experiments with these improved learners
