University of Zurich
Filter + Support Vector Machine for NIPS 2003 Challenge
Jiwen Li
Department of Informatics

The NIPS 2003 challenge was organized to find feature extraction algorithms that improve learning machines' performance on five well-chosen datasets. In this project we show that, using single filter methods, learning performance can match or outperform that of the best entries in NIPS 2003. We mainly used TP, S2N, and Relief as filter methods, with a support vector machine as the classifier. The experimental results show that this simple combination was sufficient to achieve a dimensionality reduction comparable to what the winners obtained, while keeping learning performance similar to theirs.

Support Vector Machine
Support Vector Machines are learning machines that can perform binary classification and regression estimation tasks. Two results make this approach successful:
- The decision function depends only on the support vectors.
- A kernel function non-linearly maps the n-dimensional input space into a high-dimensional feature space.
(A small MATLAB illustration of both points appears below, after the filter sketches.)

Filter Methods
The T-statistic compares the means of the two Gaussian distributions in projection onto a given variable. The T-statistic follows a Student distribution with m1 + m2 - 2 degrees of freedom, where m1 and m2 are the numbers of examples in each distribution, and s1 and s2 are the standard deviations of the distributions estimated from the available examples.

The S2N coefficient measures the ratio of the "signal" (the difference between the mean values of the two classes) to the "noise" (the within-class standard deviation).

Relief is based on feature weighting: it estimates how well the value of a given feature helps to distinguish between instances that are near each other. For a randomly selected sample x, two nearest neighbors are found: xs from the same class (the nearest hit) and xd from a different class (the nearest miss). The Relief relevance index is then increased by a small amount proportional to the difference |X(x) − X(xd)| and decreased by a small amount proportional to |X(x) − X(xs)|. After a large number of iterations this index captures local correlations between feature values and their ability to help discriminate vectors from different classes. Plain-MATLAB sketches of these three scores follow.
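As a concrete illustration, the T-statistic can be computed per feature as in the sketch below. This is a minimal plain-MATLAB rendering of the pooled-variance form implied by the m1 + m2 - 2 degrees of freedom above, not the code used in the experiments; the matrix layout and the +1/-1 label encoding are assumptions.

% Hedged sketch of the T-statistic filter score. Assumes X is an m-by-n
% data matrix and y an m-by-1 label vector with values +1/-1.
function t = t_score(X, y)
    Xp = X(y == +1, :);   Xn = X(y == -1, :);
    m1 = size(Xp, 1);     m2 = size(Xn, 1);
    s1 = std(Xp, 0, 1);   s2 = std(Xn, 0, 1);   % per-feature standard deviations
    % Pooled standard deviation; under the two-Gaussian assumption the score
    % has a Student distribution with m1 + m2 - 2 degrees of freedom.
    sp = sqrt(((m1 - 1) * s1.^2 + (m2 - 1) * s2.^2) / (m1 + m2 - 2));
    t  = abs(mean(Xp, 1) - mean(Xn, 1)) ./ (sp * sqrt(1/m1 + 1/m2));
end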
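The S2N coefficient is even shorter under the same assumed conventions:

% Hedged sketch of the S2N (signal-to-noise) filter score.
function s = s2n_score(X, y)
    Xp = X(y == +1, :);   Xn = X(y == -1, :);
    signal = abs(mean(Xp, 1) - mean(Xn, 1));   % between-class mean difference
    noise  = std(Xp, 0, 1) + std(Xn, 0, 1);    % sum of within-class standard deviations
    s = signal ./ noise;
end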
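Basic Relief, as described above, can be sketched with one nearest hit and one nearest miss per iteration, using L1 feature differences. This is a simplification; the relief object in CLOP may differ in details such as the number of neighbors.

% Hedged sketch of basic Relief; not the CLOP implementation.
function w = relief_score(X, y, n_iter)
    [m, n] = size(X);
    w = zeros(1, n);
    for it = 1:n_iter
        i  = randi(m);                      % pick a random sample x
        xi = X(i, :);
        d  = sum(abs(X - xi), 2);           % L1 distance from x to every sample
        d(i) = Inf;                         % exclude x itself
        hit  = find(y == y(i));             % same-class candidates
        miss = find(y ~= y(i));             % different-class candidates
        [~, ih] = min(d(hit));    xs = X(hit(ih),  :);   % nearest hit xs
        [~, im] = min(d(miss));   xd = X(miss(im), :);   % nearest miss xd
        % Reward features that separate the classes locally, penalize the rest.
        w = w + abs(xi - xd) - abs(xi - xs);
    end
    w = w / n_iter;                         % average over iterations
end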
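Referring back to the Support Vector Machine section: both points can be seen on toy data with MATLAB's built-in fitcsvm, used here only as a stand-in for CLOP's svc (an assumption; the original experiments used CLOP).

% Hedged SVM illustration using fitcsvm as a stand-in for CLOP's svc.
X = [randn(50, 2) + 2; randn(50, 2) - 2];   % toy two-class data
y = [ones(50, 1); -ones(50, 1)];
mdl = fitcsvm(X, y, 'KernelFunction', 'polynomial', 'PolynomialOrder', 3);
% 1) The decision function depends only on the support vectors,
%    typically a small subset of the 100 training points:
n_sv = size(mdl.SupportVectors, 1)
% 2) The polynomial kernel maps inputs non-linearly into a feature space
%    where a separating hyperplane is sought:
labels = predict(mdl, [0 0; 3 3])           % classify two new points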
NIPS 2003 Datasets
Dataset     Task                  Features   Training   Validation   Test
ARCENE      cancer diagnosis         10000        100          100    700
GISETTE     digit recognition         5000       6000         1000   6500
DEXTER      text categorization      20000        300          300   2000
DOROTHEA    drug discovery          100000        800          350    800
MADELON     artificial data            500       2000          600   1800

Experiment Methods
ARCENE:
my_svc   = svc({'coef0=2', 'degree=3', 'gamma=0', 'shrinkage=0.1'});
my_model = chain({relief('f_max=1400'), normalize, my_svc});
with 5-fold cross-validation.

GISETTE:
my_svc   = svc({'coef0=1', 'degree=5', 'gamma=0', 'shrinkage=1'});
my_model = chain({normalize, match_filter(pca_bank('f_max=50')), my_svc});
with 5-fold cross-validation.

DEXTER:
my_svc   = svc({'coef0=1', 'degree=1', 'gamma=0', 'shrinkage=0.5'});
my_model = chain({s2n('f_max=4500'), normalize, my_svc});
with 5-fold cross-validation.

DOROTHEA:
my_model = chain({TP('f_max=15000'), normalize, relief('f_max=700'), naive, bias});

MADELON:
my_svc   = svc({'coef0=1', 'degree=0', 'gamma=1', 'shrinkage=1'});
my_model = chain({probe(relief, {'p_num=2000', 'f_max=15'}), standardize, my_svc});
with 10-fold cross-validation.
(A sketch of how such chains are trained and cross-validated appears below.)

Experiment Results
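As referenced in the Experiment Methods above, the chains are trained and applied through the Spider-style interface on which CLOP is built. The sketch below is a hedged reconstruction from the CLOP/Spider documentation, not the author's original script; the data wrapping and the cv syntax in particular are assumptions.

% Hedged sketch of driving a CLOP/Spider pipeline (DEXTER setup as example).
D = data(X_train, Y_train);                 % wrap matrices in a Spider data object
my_svc   = svc({'coef0=1', 'degree=1', 'gamma=0', 'shrinkage=0.5'});
my_model = chain({s2n('f_max=4500'), normalize, my_svc});
[train_out, trained] = train(my_model, D);  % fit filter, normalization, SVM in sequence
test_out = test(trained, data(X_test, Y_test));   % apply the trained chain to new data
cv_model = cv(my_model, {'folds=5'});       % 5-fold cross-validation wrapper
[cv_out, cv_trained] = train(cv_model, D);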