
Participation in the NIPS 2003 Challenge Theodor Mader ETH Zurich, tmader@student.ethz.ch

SUMMARY
In the feature extraction course of the winter semester 2005/2006, the students had to participate in the NIPS 2003 feature selection challenge. They experimented with different feature selection and classification methods in order to achieve high rankings.

BACKGROUND
Feature selection is a vital part of any classification procedure, so it is important to understand common procedures and algorithms and to see how they perform on real-world data. The students had the chance to apply the knowledge acquired in class to real-world datasets, learning about the strengths and weaknesses of the individual approaches.

DATASETS
Five datasets were provided for the experiments:
• ARCENE: cancer diagnosis
• DEXTER: text categorization
• DOROTHEA: drug discovery
• GISETTE: digit recognition
• MADELON: artificial data
All problems were two-class problems.

METHODS
The students were provided with a Matlab machine learning package containing a variety of classification and feature extraction algorithms, such as support vector machines, the random probe method, and others. The results were ranked according to the balanced error rate (BER), i.e. the average of the error rates on the two classes, so that both classes count equally regardless of their number of examples. For this ranking, a test set was used that was not available to the students.

ARCENE: cancer diagnosis
• 10000 features, 100 training examples, 100 validation examples
• Very small number of training examples; merging the training and validation sets improved performance significantly.
• We empirically found that the probe method performed better than the signal-to-noise ratio.
• Optimal number of features found by cross-validation:
Submission:
  my_classif=svc({'coef0=1', 'degree=3', 'gamma=0', 'shrinkage=0.1'})
  my_model=chain({probe(relief,{'p_num=2000', 'f_max=2400'}), normalize, my_classif})

DEXTER: text categorization
• 20000 features, 300 training examples, 300 validation examples
• Also a very small number of training examples, so we merged the training and validation sets.
• Initial experiments with principal component analysis showed no improvement.
• We stuck to the baseline model and optimized the shrinkage and the number of features.
Our final submission:
• Feature selection by signal-to-noise ratio (1210 features kept)
• Normalization of features
• Support vector classifier with a linear kernel

DOROTHEA: drug discovery
• 100000 features, 800 training examples, 350 validation examples
• Our results indicate that feature selection by the TP statistic was already sufficient; the relief criterion did not give much improvement.
• We optimized the number of features in order to beat the baseline model.
Submission:
  my_model=chain({TP('f_max=1400'), naive, bias})

GISETTE: digit recognition
• 5000 features, 6000 training examples, 1000 validation examples
• Garbage features were easily removed by feature selection according to the signal-to-noise ratio (see the signal-to-noise sketch after the dataset panels).
• Applying a Gaussian blur to the input images brought significant improvement.
• Shrinkage could further improve the results.
  my_classif=svc({'coef0=1', 'degree=4', 'gamma=0', 'shrinkage=0.5'});
  my_model=chain({convolve('dim1=5','dim2=5'), normalize, my_classif})

MADELON: artificial data
• 500 features, 2000 training examples, 600 validation examples
• The relief method performed extremely well: with only 20 features the classification error could be reduced to below 7%.
• Shrinkage proved to be nearly as important as the kernel width for the Gaussian kernel.
  my_classif=svc({'coef0=1', 'degree=0', 'gamma=0.5', 'shrinkage=0.3'})
  my_model=chain({probe(relief,{'f_max=20', 'pval_max=0'}), standardize, my_classif})
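To make the signal-to-noise-ratio ranking mentioned for DEXTER and GISETTE concrete, here is a minimal sketch in plain Matlab. It is not the challenge package's own s2n object; the function name s2n_select and the variables X, y and k are illustrative. It follows the standard Golub-style criterion: each feature is scored by the distance between the two class means divided by the sum of the class standard deviations, and the k best-scored features are kept.

  function [idx, score] = s2n_select(X, y, k)
      % X: n-by-d data matrix, y: n-by-1 labels in {-1, +1}, k: number of features to keep.
      Xp = X(y == +1, :);                       % examples of the positive class
      Xn = X(y == -1, :);                       % examples of the negative class
      mu_p = mean(Xp, 1);   mu_n = mean(Xn, 1);
      sd_p = std(Xp, 0, 1); sd_n = std(Xn, 0, 1);
      score = abs(mu_p - mu_n) ./ (sd_p + sd_n + eps);  % eps guards against zero spread
      [~, order] = sort(score, 'descend');
      idx = order(1:k);                         % indices of the k top-ranked features
  end

Training the classifier on X(:, idx) instead of X is what the "features kept" numbers above refer to.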
CONCLUSIONS
Participating in the NIPS 2003 challenge was a very interesting opportunity. It could be seen clearly that in most cases there was no need for very complicated classifiers; simple techniques, applied correctly and combined with powerful feature selection methods, were faster and more successful. The importance of feature selection was thus demonstrated very clearly: using only a fraction of the original features improved both classification results and speed significantly.

RESULTS (BER in %)
[The results table from the poster is not reproduced in this transcript.]
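For reference, the balanced error rate used in the ranking can be computed as in the following minimal Matlab sketch (plain Matlab, not part of the challenge package; balanced_error_rate, y_true and y_pred are illustrative names):

  function ber = balanced_error_rate(y_true, y_pred)
      % y_true, y_pred: vectors of class labels in {-1, +1}.
      err_pos = mean(y_pred(y_true == +1) ~= +1);   % error rate on the positive class
      err_neg = mean(y_pred(y_true == -1) ~= -1);   % error rate on the negative class
      ber = 0.5 * (err_pos + err_neg);              % both classes count equally
  end

Unlike the plain error rate, a classifier that always predicts the majority class scores a BER of 50%, which matters for strongly unbalanced datasets such as DOROTHEA.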
