Machine Learning Contest • Team 2: 정하용, JC Bazin (제이시), 강찬구
Contents • Introduction • Machine Learning Algorithms • Neural Network • Support Vector Machine • Maximum Entropy Model • Feature Selection • Voting • Conclusion
Introduction • In this team project, we were asked to develop a program that predicts whether a person's income exceeds 50K per year. • Objective of the contest: the best possible prediction accuracy..!!
Machine Learning Algorithms • For this project, we used three different learning algorithms: • Neural Network • Support Vector Machine • Maximum Entropy Model
Neural network • We used the MATLAB NN toolbox with the feed-forward back-propagation algorithm. • Format Transformation • Only numbers are allowed as inputs, so the data are transformed into integers by assigning to each value its position in the attribute's list of possible values (see the sketch below). • E.g. Race: (White, Asian-Pac-Islander, Amer-Indian-Eskimo, other, Black), so 2 is assigned to "Asian-Pac-Islander". • Unknown values ("?") are mapped to -1 • All other encoded values are positive.
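The mapping described above can be illustrated with a short Python sketch (illustrative only: the actual preprocessing was done for the MATLAB NN toolbox, and the names RACE_VALUES and encode are hypothetical):

```python
# Illustrative sketch of the categorical-to-integer mapping described above
# (not the original MATLAB preprocessing; names are hypothetical).
RACE_VALUES = ["White", "Asian-Pac-Islander", "Amer-Indian-Eskimo", "other", "Black"]

def encode(value, value_list):
    """Return the 1-based position of value in value_list, or -1 for unknown ('?')."""
    if value == "?":
        return -1
    return value_list.index(value) + 1

print(encode("Asian-Pac-Islander", RACE_VALUES))  # -> 2
print(encode("?", RACE_VALUES))                   # -> -1
```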
Neural network • Parameters • The number of neurons in the hidden layer may be: • equal to the number of neurons in the input layer (Wierenga and Kluytmans, 1994), • equal to 75% of the number of neurons in the input layer (Venugopal and Baets, 1994), • equal to the square root of the product of the numbers of neurons in the input and output layers (Shepard, 1990). • The activation function was chosen by trying the three most common ones: logsig, tansig and purelin. • Configuration of the NN: 1 hidden layer; 3 neurons in the hidden layer; 14 neurons in the input layer; 1 neuron in the output layer; activation functions tansig and purelin; 1000 epochs. • Result • Precision = 80.34%
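A rough Python approximation of this configuration is sketched below (an assumption for illustration only: the project used the MATLAB NN toolbox, and scikit-learn's MLPClassifier does not reproduce the tansig/purelin pairing exactly):

```python
# Sketch only: approximates the slide's configuration with scikit-learn,
# which was NOT the toolkit used in the project (the MATLAB NN toolbox was).
from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(
    hidden_layer_sizes=(3,),  # 1 hidden layer, 3 neurons (close to sqrt(14 * 1))
    activation="tanh",        # nearest analogue of MATLAB's tansig
    max_iter=1000,            # 1000 training epochs
)
# X_train: 14 integer-encoded features per example; y_train: the >50K / <=50K label.
# clf.fit(X_train, y_train)
# precision = clf.score(X_test, y_test)
```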
Support Vector Machine • LIBSVM (Library for Support Vector Machines) • Format Transformation • The format of each line of the training and testing data files is: • <label> <index1>:<value1> <index2>:<value2> ... • To turn the original data set into the required format, a converter translates each record into this new format, reordering the fields and mapping categorical attributes to numbers (see the sketch below). • Old format • 50, Private, 138179, Assoc-acdm, 12, Married-civ-spouse, Craft-repair, Husband, White, Male, 0, 1902, 40, United-States, >50K • New format (directly usable by svm-train and svm-predict) • 0 1:50 2:0 3:138179 4:5 5:12 6:0 7:1 8:2 9:0 10:1 11:0 12:1902 13:40 14:0
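A minimal Python sketch of this conversion is shown below (hypothetical helper; the categorical-value mappings used in the project are not reproduced here, and the label encoding simply follows the example above, where ">50K" became 0):

```python
# Sketch of the old-format -> LIBSVM-format conversion described above.
# categorical_maps is a placeholder: {attribute index: {string value: number}}.
def to_libsvm_line(record, categorical_maps):
    """record: one comma-separated line of the original data set."""
    *attrs, label = [field.strip() for field in record.split(",")]
    parts = ["0" if label == ">50K" else "1"]          # label encoding as in the example
    for i, value in enumerate(attrs, start=1):
        if i in categorical_maps:                      # map categorical strings to numbers
            value = categorical_maps[i].get(value, -1)
        parts.append(f"{i}:{value}")
    return " ".join(parts)                             # e.g. "0 1:50 2:0 3:138179 ..."
```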
Support Vector Machine • Parameters • SVM type = C-SVC • Parameter C = 1 • Kernel function = radial basis function • Degree in kernel function = 3 • Gamma in kernel function = 1/k • Coefficient0 in kernel function = 0 • Epsilon in loss function = 0.1 • Tolerance of termination criterion = 0.001 • Shrinking = 1 • Weight of parameter C for class i (weight*C) = 1 • Results • Precision = 76.43%
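These parameters correspond to LIBSVM options; the sketch below shows one way to train and predict with LIBSVM's Python interface (an assumption: the project may have used the svm-train/svm-predict tools directly, and the file names are placeholders):

```python
# Sketch using LIBSVM's Python interface; the option string mirrors the
# parameters listed above (gamma = 1/k with k = 14 features -> about 0.0714).
from libsvm.svmutil import svm_read_problem, svm_train, svm_predict

y_train, x_train = svm_read_problem("adult.train.libsvm")  # placeholder file names
y_test, x_test = svm_read_problem("adult.test.libsvm")

# -s 0: C-SVC, -c 1: C, -t 2: RBF kernel, -d 3: degree, -g: gamma, -r 0: coef0,
# -p 0.1: epsilon in loss function, -e 0.001: tolerance, -h 1: shrinking
model = svm_train(y_train, x_train, "-s 0 -c 1 -t 2 -d 3 -g 0.0714 -r 0 -p 0.1 -e 0.001 -h 1")
p_labels, p_acc, p_vals = svm_predict(y_test, x_test, model)  # p_acc[0] is the accuracy
```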
Maximum Entropy Model • Language: Java • Library: OpenNLP MaxEnt-2.2.0 • Parameters • GIS = 1201 • IIS = 923 • Steepest ascent = 212 • Conjugate gradient (FR) = 74 • Conjugate gradient (PRP) = 63 • Limited-memory variable metric = 70 • Results • Precision = 81.56%
Cross Validation • Why is it needed? • If we do something to improve performance (voting, feature selection, etc.), how can we know which variant is better than another? • What about training on all the training data and then testing on that same data? • That is not sufficient, because the test data would already contain all the answers. • Cross Validation: set aside some fraction of the known data and use it to test the prediction performance of a hypothesis induced from the remaining data (see the 10-fold sketch below).
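A minimal sketch of the 10-fold scheme described above (the `train` and `model.predict` calls are placeholders for whichever learner is being evaluated; the fold size of 3,200 matches the per-fold counts on the feature-selection slide):

```python
# Minimal 10-fold cross-validation sketch; `train` and `model.predict`
# are placeholders for whichever learner (MEM, NN, SVM) is being evaluated.
# `examples` is assumed to be a list of (features, label) pairs.
def cross_validate(examples, train, k=10):
    fold_size = len(examples) // k                    # 32000 / 10 = 3200
    correct = 0
    for i in range(k):
        test = examples[i * fold_size:(i + 1) * fold_size]
        train_set = examples[:i * fold_size] + examples[(i + 1) * fold_size:]
        model = train(train_set)                      # train on the remaining 9 folds
        correct += sum(1 for x, y in test if model.predict(x) == y)
    return correct / (k * fold_size)                  # overall precision, e.g. 24338/32000
```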
Feature Selection • Does using more and more features always give higher precision? • Some features help to make the decision, but others do not. • Moreover, some features can actually disturb the decision. • If we drop such bad features, we can get better performance and shorter training time (see the subset-search sketch below).
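A sketch of the kind of subset search implied by the experiments on the next slide (illustrative only; `evaluate` stands for a cross-validation run of the learner on the given feature subset, as in the function above):

```python
# Sketch of leave-one-feature-out selection: try dropping each feature in turn
# and keep the subset with the best cross-validation precision.
def select_features(all_features, evaluate):
    """evaluate(feature_subset) -> cross-validation precision (placeholder)."""
    best_subset, best_precision = list(all_features), evaluate(all_features)
    for dropped in all_features:
        subset = [f for f in all_features if f != dropped]
        precision = evaluate(subset)                 # e.g. "all except 3rd" -> 0.8695
        if precision > best_precision:
            best_subset, best_precision = subset, precision
    return best_subset, best_precision
```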
Feature Selection • Experiments on MEM (10-fold cross validation) • Using all features • Precision: 81.56% • Using only 1 feature • Precision: 76.05% (baseline) • If we always answer "<=50K", we also get 76.05%..!! • Using 5 features • Precision: 74.2%, 81.5%, 82.9% ... • Using all features except the 3rd feature • Precision: 86.95% (best feature set) • Improvement: 5.4% • Precision using all training data: 87.32%
Per-fold output of MEM using only the 3rd feature:
2448/3200 = 0.7650, 2445/3200 = 0.7641, 2418/3200 = 0.7556, 2453/3200 = 0.7666, 2423/3200 = 0.7572, 2450/3200 = 0.7656, 2410/3200 = 0.7531, 2445/3200 = 0.7641, 2424/3200 = 0.7575, 2422/3200 = 0.7569
Last result: 24338/32000 = 0.7606
Cross-validation precision for selected feature subsets:
Baseline: 24338/32000 = 0.7606
All: 26098/32000 = 0.8156
…
11,12,13: 26582/32000 = 0.8307
4,11,12,13: 26694/32000 = 0.8342
2,4,7,11,12,13: 26840/32000 = 0.8388
…
4,6,11,12,13: 27477/32000 = 0.8587
4,8,11,12,13: 27491/32000 = 0.8591
…
4,6,8,10,11,12,13: 27516/32000 = 0.8599
2,4,6,7,8,10,11,12,13: 27709/32000 = 0.8659
1,2,4,6,7,8,10,11,12,13: 27788/32000 = 0.8684
…
All except 3rd: 27823/32000 = 0.8695
Voting • What should we do when different learners give different results? • Voting by democracy (simple majority vote) • Weighted voting • Precision of the 3 learners • MEM: 27942/32000 = 87.32% • NN: 25708/32000 = 80.34% • SVM: 24458/32000 = 76.43% • Precision of voting by democracy • 27382/32000 = 85.57% • A sketch of both schemes follows below.
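Both voting schemes can be sketched in a few lines of Python (illustrative; using each learner's precision as its weight is an assumption about how weighted voting would be configured, not something reported on this slide):

```python
# Majority ("democracy") voting and weighted voting over the three learners.
from collections import Counter

def majority_vote(predictions):
    """predictions: one label per learner, e.g. the MEM, NN and SVM outputs."""
    return Counter(predictions).most_common(1)[0][0]

def weighted_vote(predictions, weights):
    """Each learner's label counts with its weight (e.g. its precision)."""
    scores = Counter()
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return scores.most_common(1)[0][0]

print(majority_vote([">50K", "<=50K", "<=50K"]))                             # -> <=50K
print(weighted_vote([">50K", "<=50K", "<=50K"], [0.8732, 0.8034, 0.7643]))   # -> <=50K
```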
Conclusion • We obtained the best result using MEM alone • Precision: 27942/32000 = 87.32% • Why? • We could not use the best feature set with NN and SVM • (Precision of SVM using the best feature set: 29036/32000 = 90.74%) • We did not run enough experiments on voting (e.g. weighted voting)