Molecular Classification of Cancer Christopher Davis Mark Fleharty Class discovery and class prediction by gene expression
Introduction
• Clinical applications of computational molecular biology
• Class prediction
• Class discovery
Topics of Discussion
• Acute Leukemia
  • AML
  • ALL
• DNA Microarrays
• Data mining methods
Acute Leukemia
• Different types
  • Acute myeloid leukemia (AML)
  • Acute lymphoblastic leukemia (ALL)
• Importance of correct diagnosis
  • Maximize efficacy
  • Minimize toxicity
• Morphological vs. molecular characteristics
DNA Microarrays
• Hybridization of mRNAs onto chips with complementary strands of DNA
• What they tell us
  • How much a gene is expressed
  • When genes are expressed
  • Where genes are expressed
  • Under what conditions they are expressed
Gene Expression Example
• mRNAs are indicators of gene expression
• Yeast – Wine
  • Anaerobic
  • Alcohol
• Yeast – Bread
  • Aerobic
  • CO2
Data Mining
• Correlation Weighting Methods
• Self-Organizing Maps
• K-means
• PCA (Principal Component Analysis)
Correlation Weighting Methods
• The magnitude of each vote depends on the gene’s expression level in the new sample and on its correlation with the class distinction
Pearson’s “r” Correlation
• Takes values in the continuous interval from –1 to +1
• +1 if two genes are perfectly positively correlated
• –1 if two genes are perfectly negatively correlated
• 0 if there is no linear correlation
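As a minimal sketch of the coefficient described on this slide, the function below computes Pearson’s r for two expression profiles using only the standard library. The function name `pearson_r` and the three gene profiles are hypothetical, chosen purely for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    # Sum of products of deviations, divided by the product of their norms.
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / norm


# Hypothetical expression levels of three genes across five samples.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.0, 4.0, 6.0, 8.0, 10.0]  # rises in step with gene_a
gene_c = [5.0, 4.0, 3.0, 2.0, 1.0]   # falls as gene_a rises
gene_d = [4.0, 1.0, 0.0, 1.0, 4.0]   # no linear relationship with gene_a

for name, gene in [("b", gene_b), ("c", gene_c), ("d", gene_d)]:
    print(f"r(a, {name}) = {pearson_r(gene_a, gene)}")
```

Note that `gene_d` depends on `gene_a` in a clearly structured (U-shaped) way yet has no linear correlation with it, which is why r near 0 should be read as "no linear relationship" rather than "no relationship".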
Example: r = 0.8
Idealized AML/ALL Gene
High Correlation With Idealized Gene
Allow genes to “vote”
• Sort genes by the strength of their correlation (this list is often informative)
• Genes cast weighted votes based on their correlation with the idealized gene and how strongly they are expressed in the patient
• Votes are summed and, based on a predetermined threshold, the patient is classified as AML, ALL, or inconclusive
• Prediction strength
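The voting steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not necessarily the scheme used in the work the slides summarize: each gene’s weight is taken to be its Pearson correlation with an idealized gene (1 for AML samples, 0 for ALL samples), its vote is the weight times the distance of the patient’s expression level from the midpoint of the two class means, and prediction strength is the winning margin divided by the total vote. The gene names, data, and the 0.3 threshold are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / norm if norm else 0.0


def train_voters(expression, labels, n_genes=2):
    """Keep the n_genes most strongly correlated with the idealized gene."""
    ideal = [1.0 if lab == "AML" else 0.0 for lab in labels]
    voters = []
    for gene, levels in expression.items():
        weight = pearson_r(levels, ideal)  # > 0 means higher in AML
        aml = [v for v, lab in zip(levels, labels) if lab == "AML"]
        all_ = [v for v, lab in zip(levels, labels) if lab == "ALL"]
        midpoint = (sum(aml) / len(aml) + sum(all_) / len(all_)) / 2
        voters.append((gene, weight, midpoint))
    voters.sort(key=lambda v: abs(v[1]), reverse=True)
    return voters[:n_genes]


def classify(sample, voters, threshold=0.3):
    """Sum the weighted votes; return (label, prediction strength)."""
    votes = {"AML": 0.0, "ALL": 0.0}
    for gene, weight, midpoint in voters:
        vote = weight * (sample[gene] - midpoint)
        votes["AML" if vote > 0 else "ALL"] += abs(vote)
    total = votes["AML"] + votes["ALL"]
    if total == 0:
        return "Inconclusive", 0.0
    strength = abs(votes["AML"] - votes["ALL"]) / total
    winner = max(votes, key=votes.get)
    return (winner if strength >= threshold else "Inconclusive"), strength


# Hypothetical training data: three AML and three ALL samples.
labels = ["AML", "AML", "AML", "ALL", "ALL", "ALL"]
expression = {
    "gene_up_in_aml": [9.0, 8.0, 10.0, 2.0, 1.0, 3.0],
    "gene_up_in_all": [1.0, 2.0, 1.5, 8.0, 9.0, 7.5],
    "gene_uninformative": [5.0, 4.0, 6.0, 5.0, 6.0, 4.0],
}
voters = train_voters(expression, labels)

clear_patient = {"gene_up_in_aml": 9.5, "gene_up_in_all": 1.0}
mixed_patient = {"gene_up_in_aml": 9.5, "gene_up_in_all": 8.5}
print(classify(clear_patient, voters))
print(classify(mixed_patient, voters))
```

The second patient has both marker genes highly expressed, so the two votes nearly cancel and the low prediction strength leads to an inconclusive call instead of a forced diagnosis.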
Self-Organizing Maps
• Method for unsupervised learning that reduces high-dimensional data to low-dimensional data
• Based on a grid of artificial neurons
• Each grid location has a weight vector
Self-Organizing Maps Continued
• The node whose weight vector is closest to the input vector is chosen, and its weights are adjusted closer to the input vector
• This node’s neighbors are also adjusted toward the input vector, by an amount that falls off according to some decay function
• Process all vectors and repeat until stable
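The update rule on these two slides might be sketched like this for a one-dimensional grid of nodes. Several choices here are assumptions the slides do not specify: the nodes are initialized by spreading them evenly between the per-dimension minimum and maximum of the data, the neighborhood decay is Gaussian in grid distance, the learning rate and radius shrink linearly, and training runs for a fixed number of passes instead of testing for stability. All names and data are hypothetical.

```python
import math


def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to input x."""
    return min(range(len(weights)), key=lambda j: math.dist(weights[j], x))


def train_som(data, n_nodes=2, epochs=50, lr0=0.5, radius0=1.0):
    """Train a 1-D self-organizing map; returns one weight vector per node."""
    dim = len(data[0])
    lo = [min(p[d] for p in data) for d in range(dim)]
    hi = [max(p[d] for p in data) for d in range(dim)]
    # Assumed initialization: nodes spaced evenly from lo to hi.
    weights = [
        [lo[d] + (hi[d] - lo[d]) * j / max(n_nodes - 1, 1) for d in range(dim)]
        for j in range(n_nodes)
    ]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                     # learning rate decays
        radius = max(radius0 * (1.0 - frac), 1e-3)  # neighborhood shrinks
        for x in data:
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Neighbors of the winner move too, but less with grid distance.
                influence = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * influence * (x[d] - w[d])
    return weights


# Hypothetical 2-D samples forming two well-separated groups.
samples = [
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 9.0], [9.0, 10.0],
]
som = train_som(samples)
clusters = [best_matching_unit(som, s) for s in samples]
print(som)
print(clusters)
```

After training, each sample is assigned to its closest node, which is how the map yields candidate classes.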
Use SOM to discover classes
• SOM is used to find the class members used to train the predictors
• Predictors are tested on a new set of samples with known classification
• If cross-validation succeeds and the prediction strength is good, the cluster discovery and prediction are considered good
• Iterate if you want to find finer classes
K-Means
• Dataset is randomly partitioned into K clusters
• For each data point, calculate the distance from the point to each cluster’s center: if it is closest to its current cluster, leave it there; otherwise move it to the closest cluster
• Repeat until stable
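A minimal sketch of the procedure on this slide is given below, assuming Euclidean distance and taking each cluster’s center to be the mean of its members. The optional `initial_labels` argument is a hypothetical addition that makes a run reproducible; when it is omitted the points are partitioned randomly, as the slide describes.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def k_means(points, k, initial_labels=None, seed=0, max_iter=100):
    """Return (labels, centers) once no point changes cluster."""
    rng = random.Random(seed)
    if initial_labels is not None:
        labels = list(initial_labels)
    else:
        labels = [rng.randrange(k) for _ in points]  # random partition
    centers = [None] * k
    for _ in range(max_iter):
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
            elif centers[c] is None:
                # Cluster started empty: seed its center from a data point.
                centers[c] = list(rng.choice(points))
        # Move every point to the cluster with the closest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # stable: nothing moved
            break
        labels = new_labels
    return labels, centers


# Hypothetical 2-D samples forming two well-separated groups.
points = [
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 9.0], [9.0, 10.0],
]
# Start from a deliberately mixed partition to show points being moved.
labels, centers = k_means(points, k=2, initial_labels=[0, 1, 0, 1, 0, 1])
print(labels)
print(centers)
```

The result depends on the starting partition, so in practice the procedure is often run from several random partitions and the tightest clustering kept.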
Principal Components Analysis
• A transform that chooses a new coordinate system for the data set such that the greatest variance lies along the first axis (the first principal component), the second greatest variance along the second axis, etc.
• Can be used to reduce dimensionality by eliminating later principal components
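To make the idea concrete, here is a minimal sketch that finds only the first principal component, using power iteration on the sample covariance matrix. Power iteration is a choice made here for a dependency-free example (a numerical library’s eigendecomposition or SVD would be the usual route), and it assumes the starting vector is not orthogonal to the leading component. The function names and data are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix and mean-centered copy of the data."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dim)]
    centered = [[r[d] - means[d] for d in range(dim)] for r in rows]
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]
    return cov, centered


def first_principal_component(rows, iterations=200):
    """Return (direction, variance along it, 1-D scores of each row)."""
    cov, centered = covariance_matrix(rows)
    dim = len(cov)
    v = [1.0 / math.sqrt(dim)] * dim  # assumed starting vector
    for _ in range(iterations):
        # Repeatedly applying the covariance matrix pulls v toward the
        # direction of greatest variance.
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    variance = sum(v[i] * cov[i][j] * v[j]
                   for i in range(dim) for j in range(dim))
    scores = [sum(c[d] * v[d] for d in range(dim)) for c in centered]
    return v, variance, scores


# Hypothetical 2-D data lying exactly on a line with slope 1/2.
rows = [[2.0, 1.0], [4.0, 2.0], [6.0, 3.0], [8.0, 4.0]]
direction, variance, scores = first_principal_component(rows)
total_variance = sum(covariance_matrix(rows)[0][i][i] for i in range(2))
print(direction, variance, total_variance)
print(scores)
```

Because this toy data lies on a line, the first component carries all of the variance, so replacing each two-dimensional point with its single score loses nothing. With real expression data the later components carry some variance, and dropping them trades a little information for fewer dimensions.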
What This Means
• Diagnostic tools
  • Use in diagnosis of other diseases
  • Look for toxins in environment
• Decoding regulatory networks
  • Use of time-sensitive data
  • Use of stress data
• Drug discovery
• New classifications of disease
Future Work
• Algorithm research
• How do we gear experiments to maximize the amount of information we get?