This outline provides an introduction to the P-tree technology, followed by our approach using PKNN/LSVM for gene expression analysis. We discuss podium KNN, weight optimization, improving accuracy by LSVM, and conduct a performance study.
PKNN/LSVM Approach for Microarray Gene Expression Analysis (P-trees technology is patented by NDSU)
OUTLINE • Introduction • P-tree Technology • Our Approach • PODIUM KNN • Weight Optimization • Improving Accuracy by LSVM • Performance Study • Conclusion
PODIUM KNN • Dissimilarity measurement: $F(X,Y)=\sum_i w_i\, d(x_i,y_i)$, where $d(x_i,y_i)=|x_i-y_i|$ (weighted Manhattan distance) • Stage 1. Finding neighbors • Stage 2. Podium votes
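The two stages can be sketched in a few lines of Python. This is a minimal illustration, not the P-tree based implementation the slides refer to: the function names, the parameters `k` and `sigma`, and in particular the Gaussian vote weighting used for the "podium" are assumptions, since the slide does not specify the podium function.

```python
import math
from collections import defaultdict

def weighted_manhattan(x, y, w):
    """F(X, Y) = sum_i w_i * |x_i - y_i|."""
    return sum(wi * abs(xi - yi) for wi, xi, yi in zip(w, x, y))

def podium_knn(train, labels, query, w, k=3, sigma=1.0):
    """Classify `query` by distance-weighted votes of its k nearest neighbors."""
    # Stage 1: find the k nearest neighbors under the weighted distance.
    scored = sorted(
        ((weighted_manhattan(x, query, w), label) for x, label in zip(train, labels)),
        key=lambda pair: pair[0],
    )[:k]
    # Stage 2: each neighbor votes with a weight that decays with distance.
    # ASSUMPTION: a Gaussian podium function; the slides do not name one.
    votes = defaultdict(float)
    for dist, label in scored:
        votes[label] += math.exp(-(dist * dist) / (2 * sigma * sigma))
    return max(votes, key=votes.get)
```

With equal weights `w = (1, 1)` this reduces to distance-weighted KNN under plain Manhattan distance; the per-dimension weights are what the weight optimization step tunes.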
Optimizing Weights • A genetic algorithm, as introduced by Goldberg (1989), is a randomized search and optimization technique capable of searching for optimal solutions. • Step1. Partition weight space • Step2. Evaluation/Selection: 10-fold cross validation (Figure: a population of bit strings of the form 1010 1110 … 1010 passing through an evaluation step.)
Optimizing Weights (cont.) • Step3. Reproduction • Step4. Mutation • Step5. Go back to Step2 until a stop condition is reached. (Figure: bit strings of the form 1010 1110 … 1010 passing through reproduction and mutation.)
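The five steps can be sketched as a small genetic algorithm over bit-string encoded weights. This is an illustrative sketch only: the 4-bit resolution per weight, truncation selection, one-point crossover, elitism, the fixed generation budget, and all names and default parameters are assumptions. The `fitness` callable stands in for the 10-fold cross-validation accuracy named on the slide.

```python
import random

BITS = 4  # bits per weight (ASSUMPTION: mirrors the 4-bit groups "1010 1110 ...")

def decode(chrom, n_weights):
    """Map a list of bits to n_weights values in [0, 1]."""
    top = 2 ** BITS - 1
    return [
        int("".join(map(str, chrom[i * BITS:(i + 1) * BITS])), 2) / top
        for i in range(n_weights)
    ]

def optimize_weights(fitness, n_weights, pop_size=20, generations=30,
                     mutation_rate=0.02, seed=0):
    """Return the best weight vector found; `fitness` maps weights to a score."""
    rng = random.Random(seed)
    length = n_weights * BITS
    # Step 1: partition (discretize) the weight space into bit strings.
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    # Step 5: loop until the stop condition (here, a fixed generation budget).
    for _ in range(generations):
        # Step 2: evaluate every candidate and select the better half.
        ranked = sorted(pop, key=lambda c: fitness(decode(c, n_weights)), reverse=True)
        parents = ranked[:pop_size // 2]
        children = [ranked[0][:]]  # elitism: keep the current best unchanged
        while len(children) < pop_size:
            # Step 3: reproduction by one-point crossover of two parents.
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)
            child = a[:cut] + b[cut:]
            # Step 4: mutation flips each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        pop = children
    best = max(pop, key=lambda c: fitness(decode(c, n_weights)))
    return decode(best, n_weights)
```

In practice `fitness` would wrap a cross-validated classifier run with the candidate weights, which makes each evaluation expensive; that cost is why population size and generation count matter.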
Optimal Knn/LSVM • Why LSVM: A lesson from KddCup02 (Figure labels: Class 1, Class 2, optimal boundary, optimal margin.)
Optimal Knn/LSVM (cont.) • EIN-ring membership • C: component • R: radius • Support vector pair • Boundary Sentry • Boundary hyperplane • Step1. Finding support vector pairs • Step2. Fitting boundary hyperplane (Figure: scatter plot of + and - samples with marked points * and #.)
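A heavily simplified sketch of the two steps follows. It does not use EIN-rings or P-trees: it assumes a single support vector pair, taken as the closest pair of opposite-class samples under Euclidean distance, and fits the boundary as the perpendicular bisector of that pair. Both choices and all function names are assumptions made for illustration, not the method described on the slide.

```python
def closest_opposite_pair(pos, neg):
    """Step 1 (simplified): the closest (+, -) pair by squared Euclidean distance."""
    return min(
        ((p, n) for p in pos for n in neg),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(*pair)),
    )

def bisector_hyperplane(p, n):
    """Step 2 (simplified): hyperplane w.x + b = 0 through the midpoint of p and n,
    with normal vector p - n, so the positive side contains p."""
    w = [a - b for a, b in zip(p, n)]
    mid = [(a + b) / 2 for a, b in zip(p, n)]
    b = -sum(wi * mi for wi, mi in zip(w, mid))
    return w, b

def classify(x, w, b):
    """Return '+' if x lies on the positive side of the hyperplane, else '-'."""
    return "+" if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else "-"
```

For example, with positives `[(2, 2), (3, 3)]` and negatives `[(0, 0), (-1, 0)]`, the pair is `((2, 2), (0, 0))` and the fitted boundary is `2x + 2y - 4 = 0`.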
Optimal Knn/LSVM (cont.) • Robust for data sets with noise (Figure labels: Class 1, Class 2, optimal boundary, optimal margin.)
Implementation • Models Structure Design (Diagram labels: data; DCI Model; GA Model; Basic P-trees; w1,w2,…,wd; Cuboids Model; gw1,…,(wi,wj),…,gwk; Sorting w.t. avg(gw); HOBBit/EINring; EINring Formulation; Execution using PDM; PDM Model.)
Data Sets of Bioinformatics • DS1. Leukemia data, size 6817x72 (http://llmpp.nih.gov/lymphoma/) • DS2. Colon cancer data (Alon 1999), size 2000x62 • DS3. NCI60, size 1376x60 • DS4. Yeast sporulation data set (Chu et al. 1998), time series data (http://cmgm.stanford.edu/pbrown/sporulation/)
Performance Study • Accuracy Comparison
Performance Study (cont.) • Influence of noise
Performance Study (cont.) • Influence of GA parameters