Machine Learning Contest • Team 2: 정하용, JC Bazin (제이시), 강찬구
Contents • Introduction • Machine Learning Algorithms • Neural Network • Support Vector Machine • Maximum Entropy Model • Feature Selection • Voting • Conclusion
Introduction • In this team project, we were asked to develop a program that predicts whether a person's income exceeds 50K per year. • Objective of the contest: the best possible prediction accuracy..!!
Machine Learning Algorithms • For this project, we used three different learning algorithms: • Neural Network • Support Vector Machine • Maximum Entropy Model
Neural network • We used the MATLAB NN toolbox with the feed-forward back-propagation algorithm. • Format Transformation • Only numbers are allowed as inputs, so the data are transformed into integers by assigning to each value its position in the attribute's list of possible values (see the sketch below). • E.g. Race: (White, Asian-Pac-Islander, Amer-Indian-Eskimo, other, Black), so 2 is assigned to "Asian-Pac-Islander". • Unknown values ("?") are mapped to -1 • All other encoded values are positive.
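The mapping described above can be illustrated with a short Python sketch (illustrative only: the actual preprocessing was done for the MATLAB NN toolbox, and the names RACE_VALUES and encode are hypothetical):

```python
# Illustrative sketch of the categorical-to-integer mapping described above
# (not the original MATLAB preprocessing; names are hypothetical).
RACE_VALUES = ["White", "Asian-Pac-Islander", "Amer-Indian-Eskimo", "other", "Black"]

def encode(value, value_list):
    """Return the 1-based position of value in value_list, or -1 for unknown ('?')."""
    if value == "?":
        return -1
    return value_list.index(value) + 1

print(encode("Asian-Pac-Islander", RACE_VALUES))  # -> 2
print(encode("?", RACE_VALUES))                   # -> -1
```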
Neural network • Parameters • The number of neurons in the hidden layer may be: • equal to the number of neurons in the input layer (Wierenga and Kluytmans, 1994), • equal to 75% of the number of neurons in the input layer (Venugopal and Baets, 1994), • equal to the square root of the product of the numbers of neurons in the input and output layers (Shepard, 1990). • The activation function was chosen by trying the three most common ones: logsig, tansig and purelin. • Configuration of the NN: 1 hidden layer; 3 neurons in the hidden layer; 14 neurons in the input layer; 1 neuron in the output layer; activation functions tansig and purelin; 1000 epochs. • Result • Precision = 80.34%
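A rough Python approximation of this configuration is sketched below (an assumption for illustration only: the project used the MATLAB NN toolbox, and scikit-learn's MLPClassifier does not reproduce the tansig/purelin pairing exactly):

```python
# Sketch only: approximates the slide's configuration with scikit-learn,
# which was NOT the toolkit used in the project (the MATLAB NN toolbox was).
from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(
    hidden_layer_sizes=(3,),  # 1 hidden layer, 3 neurons (close to sqrt(14 * 1))
    activation="tanh",        # nearest analogue of MATLAB's tansig
    max_iter=1000,            # 1000 training epochs
)
# X_train: 14 integer-encoded features per example; y_train: the >50K / <=50K label.
# clf.fit(X_train, y_train)
# precision = clf.score(X_test, y_test)
```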
Support Vector Machine • LIBSVM (Library for Support Vector Machines) • Format Transformation • The format of each line of the training and testing data files is: • <label> <index1>:<value1> <index2>:<value2> ... • To turn the original data set into the required format, a converter translates each record into this new format, reordering the fields and mapping categorical attributes to numbers (see the sketch below). • Old format • 50, Private, 138179, Assoc-acdm, 12, Married-civ-spouse, Craft-repair, Husband, White, Male, 0, 1902, 40, United-States, >50K • New format (directly usable by svm-train and svm-predict) • 0 1:50 2:0 3:138179 4:5 5:12 6:0 7:1 8:2 9:0 10:1 11:0 12:1902 13:40 14:0
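A minimal Python sketch of this conversion is shown below (hypothetical helper; the categorical-value mappings used in the project are not reproduced here, and the label encoding simply follows the example above, where ">50K" became 0):

```python
# Sketch of the old-format -> LIBSVM-format conversion described above.
# categorical_maps is a placeholder: {attribute index: {string value: number}}.
def to_libsvm_line(record, categorical_maps):
    """record: one comma-separated line of the original data set."""
    *attrs, label = [field.strip() for field in record.split(",")]
    parts = ["0" if label == ">50K" else "1"]          # label encoding as in the example
    for i, value in enumerate(attrs, start=1):
        if i in categorical_maps:                      # map categorical strings to numbers
            value = categorical_maps[i].get(value, -1)
        parts.append(f"{i}:{value}")
    return " ".join(parts)                             # e.g. "0 1:50 2:0 3:138179 ..."
```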
Support Vector Machine • Parameters • SVM type = C-SVC • Parameter C = 1 • Kernel function = radial basis function • Degree in kernel function = 3 • Gamma in kernel function = 1/k • Coefficient0 in kernel function = 0 • Epsilon in loss function = 0.1 • Tolerance of termination criterion = 0.001 • Shrinking = 1 • Weight of parameter C for class i (weight*C) = 1 • Results • Precision = 76.43%
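These parameters correspond to LIBSVM options; the sketch below shows one way to train and predict with LIBSVM's Python interface (an assumption: the project may have used the svm-train/svm-predict tools directly, and the file names are placeholders):

```python
# Sketch using LIBSVM's Python interface; the option string mirrors the
# parameters listed above (gamma = 1/k with k = 14 features -> about 0.0714).
from libsvm.svmutil import svm_read_problem, svm_train, svm_predict

y_train, x_train = svm_read_problem("adult.train.libsvm")  # placeholder file names
y_test, x_test = svm_read_problem("adult.test.libsvm")

# -s 0: C-SVC, -c 1: C, -t 2: RBF kernel, -d 3: degree, -g: gamma, -r 0: coef0,
# -p 0.1: epsilon in loss function, -e 0.001: tolerance, -h 1: shrinking
model = svm_train(y_train, x_train, "-s 0 -c 1 -t 2 -d 3 -g 0.0714 -r 0 -p 0.1 -e 0.001 -h 1")
p_labels, p_acc, p_vals = svm_predict(y_test, x_test, model)  # p_acc[0] is the accuracy
```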
Maximum Entropy Model • Language: Java • Library: OpenNLP MaxEnt-2.2.0 • Parameters • GIS = 1201 • IIS = 923 • Steepest ascent = 212 • Conjugate gradient (FR) = 74 • Conjugate gradient (PRP) = 63 • Limited-memory variable metric = 70 • Results • Precision = 81.56%
Cross Validation • Why is it needed? • If we do something to improve performance (voting, feature selection, etc.), how can we know which variant is better than another? • What about training on all the training data and then testing on that same data? • That is not sufficient, because the test data would already contain all the answers. • Cross Validation: set aside some fraction of the known data and use it to test the prediction performance of a hypothesis induced from the remaining data (see the 10-fold sketch below).
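A minimal sketch of the 10-fold scheme described above (the `train` and `model.predict` calls are placeholders for whichever learner is being evaluated; the fold size of 3,200 matches the per-fold counts on the feature-selection slide):

```python
# Minimal 10-fold cross-validation sketch; `train` and `model.predict`
# are placeholders for whichever learner (MEM, NN, SVM) is being evaluated.
# `examples` is assumed to be a list of (features, label) pairs.
def cross_validate(examples, train, k=10):
    fold_size = len(examples) // k                    # 32000 / 10 = 3200
    correct = 0
    for i in range(k):
        test = examples[i * fold_size:(i + 1) * fold_size]
        train_set = examples[:i * fold_size] + examples[(i + 1) * fold_size:]
        model = train(train_set)                      # train on the remaining 9 folds
        correct += sum(1 for x, y in test if model.predict(x) == y)
    return correct / (k * fold_size)                  # overall precision, e.g. 24338/32000
```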
Feature Selection • Does using more and more features always give higher precision? • Some features help to make the decision, but others do not. • Moreover, some features can actually disturb the decision. • If we drop such bad features, we can get better performance and shorter training time (see the subset-search sketch below).
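A sketch of the kind of subset search implied by the experiments on the next slide (illustrative only; `evaluate` stands for a cross-validation run of the learner on the given feature subset, as in the function above):

```python
# Sketch of leave-one-feature-out selection: try dropping each feature in turn
# and keep the subset with the best cross-validation precision.
def select_features(all_features, evaluate):
    """evaluate(feature_subset) -> cross-validation precision (placeholder)."""
    best_subset, best_precision = list(all_features), evaluate(all_features)
    for dropped in all_features:
        subset = [f for f in all_features if f != dropped]
        precision = evaluate(subset)                 # e.g. "all except 3rd" -> 0.8695
        if precision > best_precision:
            best_subset, best_precision = subset, precision
    return best_subset, best_precision
```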
Feature Selection • Experiments on MEM (10-fold cross validation) • Using all features • Precision: 81.56% • Using only 1 feature • Precision: 76.05% (baseline) • If we always answer "<=50K", we also get 76.05%..!! • Using 5 features • Precision: 74.2%, 81.5%, 82.9% ... • Using all features except the 3rd feature • Precision: 86.95% (best feature set) • Improvement: 5.4% • Precision using all training data: 87.32%
Per-fold output of MEM using only the 3rd feature:
2448/3200 = 0.7650, 2445/3200 = 0.7641, 2418/3200 = 0.7556, 2453/3200 = 0.7666, 2423/3200 = 0.7572, 2450/3200 = 0.7656, 2410/3200 = 0.7531, 2445/3200 = 0.7641, 2424/3200 = 0.7575, 2422/3200 = 0.7569
Last result: 24338/32000 = 0.7606
Cross-validation precision for selected feature subsets:
Baseline: 24338/32000 = 0.7606
All: 26098/32000 = 0.8156
…
11,12,13: 26582/32000 = 0.8307
4,11,12,13: 26694/32000 = 0.8342
2,4,7,11,12,13: 26840/32000 = 0.8388
…
4,6,11,12,13: 27477/32000 = 0.8587
4,8,11,12,13: 27491/32000 = 0.8591
…
4,6,8,10,11,12,13: 27516/32000 = 0.8599
2,4,6,7,8,10,11,12,13: 27709/32000 = 0.8659
1,2,4,6,7,8,10,11,12,13: 27788/32000 = 0.8684
…
All except 3rd: 27823/32000 = 0.8695
Voting • What should we do when different learners give different results? • Voting by democracy (simple majority vote) • Weighted voting • Precision of the 3 learners • MEM: 27942/32000 = 87.32% • NN: 25708/32000 = 80.34% • SVM: 24458/32000 = 76.43% • Precision of voting by democracy • 27382/32000 = 85.57% • A sketch of both schemes follows below.
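Both voting schemes can be sketched in a few lines of Python (illustrative; using each learner's precision as its weight is an assumption about how weighted voting would be configured, not something reported on this slide):

```python
# Majority ("democracy") voting and weighted voting over the three learners.
from collections import Counter

def majority_vote(predictions):
    """predictions: one label per learner, e.g. the MEM, NN and SVM outputs."""
    return Counter(predictions).most_common(1)[0][0]

def weighted_vote(predictions, weights):
    """Each learner's label counts with its weight (e.g. its precision)."""
    scores = Counter()
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return scores.most_common(1)[0][0]

print(majority_vote([">50K", "<=50K", "<=50K"]))                             # -> <=50K
print(weighted_vote([">50K", "<=50K", "<=50K"], [0.8732, 0.8034, 0.7643]))   # -> <=50K
```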
Conclusion • We obtained the best result using MEM alone • Precision: 27942/32000 = 87.32% • Why? • We could not use the best feature set with NN and SVM • (Precision of SVM using the best feature set: 29036/32000 = 90.74%) • We did not run enough experiments on voting (e.g. weighted voting)