Bioinformatics Challenge
• Learning in very high dimensions with very few samples
• Colon cancer dataset: 2,000 genes vs. 62 samples
• Acute leukemia dataset: 7,129 genes vs. 72 samples
• Feature selection will be needed
Feature Selection Approach
• Filter model
  • Weight score approach
• Wrapper model
  • 1-norm SVM
  • IRSVM
Feature Selection – Filter Model Using Weight Score Approach
[Figure: Feature 1, Feature 2, Feature 3]
Filter Model – Weight Score Approach
• Weight score: $w_j = \dfrac{\mu_j^{+} - \mu_j^{-}}{\sigma_j^{+} + \sigma_j^{-}}$, where $\mu_j^{\pm}$ and $\sigma_j^{\pm}$ are the mean and standard deviation of feature $j$ over the training examples of the positive or negative class.
• $w_j$ is defined as the ratio between the difference of the mean expression levels and the sum of the standard deviations in the two classes.
• Select the genes with the largest $|w_j|$ as the top features.
• The weight score is calculated with information about a single feature only.
• Highly linearly correlated (redundant) features might be selected by this approach.
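The weight score described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names `weight_scores` and `top_features`, the use of the population standard deviation, the small epsilon guard, and ranking by absolute value are all assumptions made for the example.

```python
import numpy as np

def weight_scores(X, y):
    """Per-feature score: (mean+ - mean-) / (std+ + std-).

    X: (n_samples, n_features) expression matrix; y: labels in {+1, -1}.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pos, neg = X[y == 1], X[y == -1]
    mean_diff = pos.mean(axis=0) - neg.mean(axis=0)
    std_sum = pos.std(axis=0) + neg.std(axis=0)
    # Assumption: a tiny epsilon guards against constant features.
    return mean_diff / (std_sum + 1e-12)

def top_features(X, y, k):
    """Indices of the k features with the largest absolute score."""
    scores = weight_scores(X, y)
    return np.argsort(-np.abs(scores))[:k]

# Toy data: feature 0 separates the classes, feature 1 does not.
X_toy = np.array([[5.0, 1.0], [6.0, 2.0], [1.0, 1.0], [2.0, 2.0]])
y_toy = np.array([1, 1, -1, -1])
scores_toy = weight_scores(X_toy, y_toy)
selected_toy = top_features(X_toy, y_toy, k=1)
```

Because each score looks at one feature at a time, two nearly identical genes would receive nearly identical scores and both could land in the top list.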
1-Norm SVM (Different Measure of Margin)
• 1-Norm SVM: [equation]
• Equivalent to: [equation]
• Good for feature selection!
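For reference, one common way to write the 1-norm SVM and its equivalent linear program is sketched below. The notation is an assumption for illustration (it is not taken from the slide): $A$ is the $m \times n$ data matrix, $D$ is the diagonal matrix of $\pm 1$ labels, $e$ is a vector of ones, $\xi$ is the slack vector, and $C > 0$ is the trade-off parameter.

```latex
% 1-norm SVM (assumed notation, see lead-in)
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \|w\|_1 + C\, e^\top \xi \\
\text{s.t.}\quad & D(Aw + e b) + \xi \ge e,\quad \xi \ge 0
\end{aligned}

% Equivalent linear program: bound each |w_j| by a new variable s_j
\begin{aligned}
\min_{s,\,w,\,b,\,\xi}\quad & e^\top s + C\, e^\top \xi \\
\text{s.t.}\quad & D(Aw + e b) + \xi \ge e,\quad -s \le w \le s,\quad \xi \ge 0
\end{aligned}
```

At an optimum $s_j = |w_j|$, so the two problems have the same minimizers in $w$. The 1-norm penalty tends to drive many $w_j$ exactly to zero, which is why the features with nonzero weights can be read off as the selected ones.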
Clustering Process: Feature Selection & Initial Cluster Centers
• 6 out of 31 features selected by a linear SVM: mean of area, standard error of area, worst area, worst texture, worst perimeter and tumor size
Reduced Support Vector Machine
(i) Choose a random subset matrix of the entire data matrix
(ii) Solve the following problem by Newton's method: min [equation]
(iii) The nonlinear classifier is defined by the optimal solution in step (ii):
Nonlinear classifier: [equation]
• Using […] gives lousy results!
Reduced Set: Plays the Most Important Role in RSVM
• It is natural to raise two questions:
• Is there a way to choose the reduced set other than random selection so that RSVM will have better performance?
• Is there a mechanism to determine the size of the reduced set automatically or dynamically?
Reduced Set Selection According to the Data Scatter in Input Space
• Choose the reduced set randomly, but only keep the points in the reduced set that are more than a certain minimal distance apart
• These points are expected to be a representative sample
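The input-space selection rule above can be sketched as a greedy filter. This is only an illustration under stated assumptions: the function name `scatter_reduced_set`, the Euclidean distance, and the parameters `n_candidates`, `min_dist` and `seed` are hypothetical choices, not taken from the slides.

```python
import numpy as np

def scatter_reduced_set(X, n_candidates, min_dist, seed=0):
    """Draw random candidates; keep a candidate only if it is farther
    than min_dist from every point kept so far."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    size = min(n_candidates, len(X))
    candidates = rng.choice(len(X), size=size, replace=False)
    kept = []
    for idx in candidates:
        if all(np.linalg.norm(X[idx] - X[j]) > min_dist for j in kept):
            kept.append(int(idx))
    return kept

# Toy data: two tight clusters of three points each.
X_toy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
kept_toy = scatter_reduced_set(X_toy, n_candidates=6, min_dist=1.0)
```

On the toy data every point is a candidate, so exactly one point per cluster survives, whichever order the candidates are drawn in.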
A Better Way: According to the Data Scatter in Feature Space
• An example is given as follows: training data analogous to the XOR problem
[Figure: training points numbered 1 to 12]
Mapping to Feature Space
• Map the input data via a nonlinear mapping: [equation]
• Equivalent to a polynomial kernel with degree 2: [equation]
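To make the equivalence concrete, the sketch below checks numerically that an explicit degree-2 map and a degree-2 polynomial kernel give the same inner product. The specific map $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ and the homogeneous kernel $(x^\top z)^2$ are assumptions chosen for illustration; the slide's own mapping may differ.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-D input (assumed form)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

def poly2_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: (x . z)^2."""
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, -2.0])
z = np.array([3.0, 0.5])
lhs = float(np.dot(phi(x), phi(z)))  # inner product in feature space
rhs = poly2_kernel(x, z)             # kernel evaluated in input space
```

Both quantities agree, so distances and scatter in the feature space can be computed from kernel values without forming the mapped points explicitly.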
Data Points in the Feature Space
[Figure: the numbered data points plotted in the feature space]
Experiment Result
[Figure: training points numbered 1 to 12]
Mathematical Observations: Another Reason for IRSVM
• In SVMs, the nonlinear separating surface is: [equation]
• In RSVMs, the nonlinear separating surface is a linear combination of a set of kernel functions
• If the kernel functions are very similar, the hypothesis space spanned by these kernel functions will be very limited
Incremental Reduced SVMs: The Strength of Weak Ties
• Start with a very small reduced set, then add a new data point only when its kernel vector is dissimilar to the current function set
• This point contributes the most extra information for generating the separating surface
• Repeat until several successive points cannot be added
• The strength of weak ties (….)
How to Measure the Dissimilarity? Solving Least Squares Problems
• The criterion for adding a point into the reduced set is: the distance from its kernel vector to the column space of the current reduced kernel matrix is greater than a threshold
• This distance can be determined by solving a least squares problem
• It has a unique solution, and the distance is the norm of the resulting residual
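The distance computation can be sketched with NumPy's least squares solver. The function name `distance_to_column_space` and the variable names are hypothetical; the residual norm of the least squares fit is the distance from the vector to the column space.

```python
import numpy as np

def distance_to_column_space(K, k):
    """Distance from vector k to the column space of matrix K.

    Solves min_beta ||K @ beta - k||_2 and returns the residual norm.
    """
    K = np.asarray(K, dtype=float)
    k = np.asarray(k, dtype=float)
    beta, *_ = np.linalg.lstsq(K, k, rcond=None)
    return float(np.linalg.norm(k - K @ beta))

# Toy check: the columns of K span the first two coordinates of R^3.
K_toy = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
d_outside = distance_to_column_space(K_toy, [1.0, 2.0, 3.0])
d_inside = distance_to_column_space(K_toy, [1.0, 2.0, 0.0])
```

A vector already in the span has distance zero and would not be added, while a vector with a large component outside the span passes the threshold test.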
IRSVM Algorithm pseudo-code (sequential version)
1  Randomly choose two data points from the training data as the initial reduced set
2  Compute the reduced kernel matrix
3  For each data point not in the reduced set
4      Compute its kernel vector
5      Compute the distance from the kernel vector
6          to the column space of the current reduced kernel matrix
7      If its distance exceeds a certain threshold
8          Add this point into the reduced set and form the new reduced kernel matrix
9  Until several successive failures happen in line 7
10 Solve the QP problem of nonlinear SVMs with the obtained reduced kernel
11 A new data point is classified by the separating surface
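Lines 1 to 9 of the pseudo-code can be sketched as follows. This is an illustrative reading, not the authors' implementation: the Gaussian kernel, the parameter names `gamma`, `threshold` and `patience` (the number of successive failures that stops the loop), and the function names are assumptions. The QP solve of line 10 is left out.

```python
import numpy as np

def gaussian_kernel(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2) (assumed kernel choice)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def irsvm_reduced_set(X, gamma=1.0, threshold=0.5, patience=5, seed=0):
    """Build the reduced set incrementally (pseudo-code lines 1 to 9)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    reduced = [int(i) for i in order[:2]]                 # line 1
    K = gaussian_kernel(X, X[reduced], gamma)             # line 2
    failures = 0
    for i in order[2:]:                                   # line 3
        k = gaussian_kernel(X, X[[i]], gamma)[:, 0]       # line 4
        beta, *_ = np.linalg.lstsq(K, k, rcond=None)      # lines 5-6
        dist = np.linalg.norm(k - K @ beta)
        if dist > threshold:                              # line 7
            reduced.append(int(i))                        # line 8
            K = np.column_stack([K, k])
            failures = 0
        else:
            failures += 1
            if failures >= patience:                      # line 9
                break
    return reduced, K

# Toy data: three far-apart locations, each repeated four times.
locs = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X_toy = np.repeat(locs, 4, axis=0)
reduced_toy, K_toy = irsvm_reduced_set(X_toy, patience=12)
```

On the toy data, duplicates of an already selected point have a kernel vector that lies in the current column space and are rejected, so the reduced set ends up covering each of the three locations.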
Wrapper Model – IRSVM
Find a linear classifier:
I. Randomly choose a very small feature subset from the input features as the initial feature reduced set.
II. Select a feature vector not in the current feature reduced set and compute the distance between this vector and the space spanned by the current feature reduced set.
III. If the distance is larger than a given gap, add this feature vector to the feature reduced set.
IV. Repeat steps II and III until no feature can be added to the current feature reduced set.
V. The features in the resulting feature reduced set are the final result of feature selection.
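Steps I to V can be sketched on the columns of a data matrix. This is a hedged illustration only: the function name `incremental_feature_selection`, the parameters `gap`, `n_init` and `seed`, and the use of Euclidean distance via least squares are assumptions, and the final linear classifier is not trained here.

```python
import numpy as np

def incremental_feature_selection(X, gap, n_init=1, seed=0):
    """Select feature columns that are far from the span of those already chosen."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    order = rng.permutation(X.shape[1])
    selected = [int(j) for j in order[:n_init]]           # step I
    for j in order[n_init:]:                              # step II
        basis = X[:, selected]
        coef, *_ = np.linalg.lstsq(basis, X[:, j], rcond=None)
        dist = np.linalg.norm(X[:, j] - basis @ coef)
        if dist > gap:                                    # step III
            selected.append(int(j))
    return sorted(selected)                               # step V

# Toy data: columns 2 and 3 are linear combinations of columns 0 and 1.
f0 = np.array([1.0, 0.0, 0.0, 0.0])
f1 = np.array([0.0, 1.0, 0.0, 0.0])
X_toy = np.column_stack([f0, f1, f0 + f1, 2.0 * f0])
selected_toy = incremental_feature_selection(X_toy, gap=0.1)
```

A single pass is enough for step IV in this sketch: the span only grows, so the distance of a rejected feature can only shrink and it would be rejected again on any later pass.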