
Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring

T.R. Golub et al., Science 286, 531 (1999).


Presentation Transcript


  1. Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. T.R. Golub et al., Science 286, 531 (1999)

  2. Introduction • Why is identification of cancer class (tumor subtype) important? • Cancers of identical grade can have widely variable clinical courses (e.g. acute lymphoblastic leukemia, ALL, or acute myeloid leukemia, AML). • Traditional methods: • Morphological appearance. • Enzyme-based histochemical analyses. • Immunophenotyping. • Cytogenetic analysis.

  3. Topics of Discussion • Class Prediction (supervised learning). • Class Discovery (unsupervised learning).

  4. Class Prediction • How could one use an initial collection of samples belonging to known classes to create a class predictor? • Identify informative genes via neighborhood analysis. • Combine their predictions by weighted vote.

  5. Neighborhood Analysis • Why do we want to start with informative genes? • To be readily applied in a clinical setting. • Highly instructive

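  6. Neighborhood Analysis • v(g) = (e1, e2, ..., en): expression levels of gene g across the n samples. • c = (c1, c2, ..., cn): idealized expression pattern for the class distinction, with ci = 1 or 0 according to whether sample i belongs to class 1 or class 2. • Compute the correlation between v(g) and c. • Possible measures: Euclidean distance, Pearson correlation coefficient. • Measure used: P(g,c) = [µ1(g) - µ2(g)]/[σ1(g) + σ2(g)], where µk(g) and σk(g) are the mean and standard deviation of the expression of gene g in class k.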
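The score P(g,c) on this slide can be computed for all genes at once. The following is a minimal sketch, not the authors' code: the function name `signal_to_noise`, the genes-by-samples array layout, the class labels 1 and 2, the toy data, and the use of the sample standard deviation (`ddof=1`) are all assumptions made for illustration.

```python
import numpy as np

def signal_to_noise(expr, labels):
    """Return P(g, c) = (mu1 - mu2) / (sd1 + sd2) for every gene.

    expr   : array of shape (n_genes, n_samples), hypothetical layout
    labels : array of length n_samples holding 1 or 2 (assumed coding)
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    class1 = expr[:, labels == 1]
    class2 = expr[:, labels == 2]
    mu1, mu2 = class1.mean(axis=1), class2.mean(axis=1)
    # ddof=1 (sample standard deviation) is an assumption; the slide
    # does not say which estimator is used.
    sd1 = class1.std(axis=1, ddof=1)
    sd2 = class2.std(axis=1, ddof=1)
    return (mu1 - mu2) / (sd1 + sd2)

# Toy data: three genes measured in four samples (two per class).
expr = np.array([
    [9.0, 11.0, 1.0, 3.0],   # high in class 1
    [2.0, 4.0, 8.0, 10.0],   # high in class 2
    [5.0, 7.0, 5.0, 7.0],    # same in both classes
])
labels = np.array([1, 1, 2, 2])
scores = signal_to_noise(expr, labels)
```

A large positive score marks a gene that is higher in class 1, a large negative score a gene that is higher in class 2, and a score near zero an uninformative gene.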

  7. Neighborhood Analysis

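  8. Class Predictor via Gene Voting • Parameters (ag, bg) are defined for each informative gene. • ag = P(g,c) • bg = [µ1(g) + µ2(g)]/2 • vg = ag(xg - bg), where xg is the expression level of gene g in the new sample. • V1 = ∑ |vg| over genes with vg > 0 • V2 = ∑ |vg| over genes with vg < 0 • PS = (Vwin - Vlose)/(Vwin + Vlose), where Vwin is the larger and Vlose the smaller of V1 and V2. • The sample is assigned to the winning class if PS > threshold; otherwise the call is uncertain.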
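The voting rule on this slide can be sketched in a few lines. This is an illustrative sketch only: the function name `weighted_vote`, the return format, the toy parameter values, and the default threshold of 0.3 are assumptions, since the slide does not give a threshold value.

```python
import numpy as np

def weighted_vote(x, a, b, threshold=0.3):
    """Classify one sample by weighted gene voting.

    x : expression of each informative gene in the new sample
    a : per-gene weights a_g = P(g, c)
    b : per-gene decision boundaries b_g = (mu1 + mu2) / 2
    threshold : prediction-strength cutoff (0.3 is an assumed default)
    Returns (call, prediction_strength).
    """
    x, a, b = (np.asarray(v, dtype=float) for v in (x, a, b))
    votes = a * (x - b)                  # v_g = a_g (x_g - b_g)
    v1 = votes[votes > 0].sum()          # total vote for class 1
    v2 = -votes[votes < 0].sum()         # total vote for class 2
    total = v1 + v2
    if total == 0:
        return "uncertain", 0.0
    ps = (max(v1, v2) - min(v1, v2)) / total
    if ps <= threshold:
        return "uncertain", ps
    return ("class 1" if v1 > v2 else "class 2"), ps

# Hypothetical parameters for three informative genes.
a = [2.0, -1.5, 0.5]
b = [6.0, 6.0, 5.0]

clear_call, clear_ps = weighted_vote([9.0, 3.0, 5.5], a, b)
weak_call, weak_ps = weighted_vote([7.0, 7.0, 5.0], a, b)
```

In the first sample every gene votes for class 1, so the prediction strength is 1. In the second the votes nearly cancel (2.0 against 1.5), the prediction strength falls below the cutoff, and the sample is left uncalled.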

  9. Class Predictor via Gene Voting

  10. Data • Initial samples: 38 bone marrow samples (27 ALL, 11 AML) obtained at the time of diagnosis. • Independent samples: 34 leukemia samples, consisting of 24 bone marrow and 10 peripheral blood samples (20 ALL, 14 AML).

  11. Neighborhood Analysis

  12. Validation of Gene Voting • Initial samples: 36 of the 38 samples were assigned as either AML or ALL and two were uncertain. All 36 predictions agree with the clinical diagnosis. • Independent samples: 29 of the 34 samples were strongly predicted, with 100% accuracy.

  13. Validation of Gene Voting

  14. Class Discovery • Can cancer classes be discovered automatically based on gene expression? • Cluster tumors by gene expression • Determine whether the putative classes produced are meaningful.

  15. Cluster tumors • Self-organizing map (SOM) • Mathematical cluster analysis for recognizing and classifying features in complex, multidimensional data (similar to the k-means approach). • Choose a geometry of “nodes”. • Map the nodes into k-dimensional space, initially at random. • Iteratively adjust the nodes.

  16. Adjusting the nodes • Randomly select a data point P. • Move the nodes in the direction of P. • The closest node Np is moved the most. • Other nodes are moved depending on their distance from Np in the initial geometry.
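A single adjustment step as described on this slide might look like the sketch below. The function name `som_step`, the Gaussian neighborhood function, and the `learning_rate` and `radius` values are assumptions chosen for illustration; the slide only states that the closest node moves the most and that the others move according to their distance from it in the initial geometry.

```python
import numpy as np

def som_step(nodes, grid, p, learning_rate=0.5, radius=1.0):
    """Move every node toward data point p and return (new_nodes, winner).

    nodes : (m, k) current node positions in data space
    grid  : (m, d) fixed node coordinates in the initial geometry
    p     : (k,) the randomly selected data point
    """
    nodes = np.asarray(nodes, dtype=float)
    grid = np.asarray(grid, dtype=float)
    p = np.asarray(p, dtype=float)
    # The node closest to p in data space is the winner Np.
    winner = int(np.argmin(np.linalg.norm(nodes - p, axis=1)))
    # Distance of each node from the winner, measured on the grid.
    grid_dist = np.linalg.norm(grid - grid[winner], axis=1)
    # Gaussian falloff (an assumption): 1 for the winner, smaller for
    # nodes farther away in the geometry.
    influence = np.exp(-(grid_dist ** 2) / (2.0 * radius ** 2))
    # Each node covers a fraction learning_rate * influence of its gap to p.
    new_nodes = nodes + learning_rate * influence[:, None] * (p - nodes)
    return new_nodes, winner

# Toy example: a 2x1 geometry with two nodes in 2-dimensional data space.
grid = np.array([[0.0], [1.0]])
nodes = np.array([[0.0, 0.0], [10.0, 10.0]])
p = np.array([1.0, 1.0])
new_nodes, winner = som_step(nodes, grid, p)
```

Here the winner covers half of its gap to P, while its grid neighbor covers a smaller fraction of its own gap. Training repeats this step over many randomly drawn points, typically while shrinking the learning rate.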

  17. SOM

  18. Validation of SOM • Prediction based on cluster A1 and A2: • 24/25 of the ALL samples from initial dataset were clustered in group A1 • 10/13 of the AML samples from initial dataset were clustered in group A2

  19. Validation of SOM • How could one evaluate the putative clusters if the “right” answer were not known? • Assumption: class discovery could be tested by class prediction. • Testing of the assumption: • Construct predictors based on clusters A1 and A2. • Construct predictors based on random clusters.
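One way to make this test concrete is to score a predictor by leave-one-out cross-validation, once with the putative cluster labels and once with randomly shuffled labels. The sketch below is an assumption-laden illustration rather than the procedure used in the paper: the function name `loo_accuracy`, the synthetic data, the number of genes used, and the small constant added to the denominator are all hypothetical choices.

```python
import numpy as np

def loo_accuracy(expr, labels, n_genes=5):
    """Leave-one-out accuracy of a signal-to-noise weighted-vote predictor.

    expr   : (n_genes_total, n_samples) expression matrix
    labels : candidate class labels (1 or 2), e.g. cluster assignments
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    n = expr.shape[1]
    correct = 0
    for i in range(n):
        keep = np.arange(n) != i                 # hold out sample i
        train, y = expr[:, keep], labels[keep]
        c1, c2 = train[:, y == 1], train[:, y == 2]
        mu1, mu2 = c1.mean(axis=1), c2.mean(axis=1)
        sd = c1.std(axis=1, ddof=1) + c2.std(axis=1, ddof=1)
        a = (mu1 - mu2) / (sd + 1e-9)            # 1e-9 avoids division by zero
        b = (mu1 + mu2) / 2.0
        top = np.argsort(-np.abs(a))[:n_genes]   # most informative genes
        vote = np.sum(a[top] * (expr[top, i] - b[top]))
        pred = 1 if vote > 0 else 2
        correct += int(pred == labels[i])
    return correct / n

# Synthetic data (hypothetical): 20 genes, 12 samples, and the first
# five genes are strongly elevated in class 1.
rng = np.random.default_rng(0)
true_labels = np.array([1] * 6 + [2] * 6)
expr = rng.normal(size=(20, 12))
expr[:5, true_labels == 1] += 10.0
random_labels = rng.permutation(true_labels)

acc_true = loo_accuracy(expr, true_labels)
acc_random = loo_accuracy(expr, random_labels)
```

If the putative classes reflect real structure, a predictor built on them should do well on held-out samples, whereas a predictor built on random labels generally should not.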

  20. Validation of SOM • Predictors based on clusters A1 and A2 yield 34 accurate predictions, one error, and three uncertain calls.

  21. Validation of SOM

  22. Searching for Finer Classes • Use SOM to divide the initial samples into four clusters (denoted B1 to B4). • B1 corresponds to AML, B2 to T-lineage ALL, and B3 and B4 to B-lineage ALL.
