Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES. Whitehead Institute/Massachusetts Institute of Technology Center for Genome Research, Cambridge, MA 02139, USA. golub@genome.wi.mit.edu

Contents • Background; • Objective; • Methods; • Results; • Conclusion.

Background: Cancer Classification • Cancer classification is central to cancer treatment; • Traditional cancer classification methods: by site (ICD-9/10); by morphology (ICD-O, etc.); • Limitations of morphological classification: tumors of similar histopathological appearance can have significantly different clinical courses and responses to therapy; • Further subdivision of morphologically similar tumors can be made at the molecular level; • Traditionally, cancer classification has relied on specific biological insights, rather than on systematic and unbiased approaches;

Background: Cancer Classification (Continued) • Cancer classification can be divided into two challenges: class discovery and class prediction. • Class discovery refers to defining previously unrecognized tumor subtypes. • Class prediction refers to the assignment of particular tumor samples to already-defined classes.

Background: Leukemia • Acute leukemia: variability in clinical outcome and subtle differences in nuclear morphology; • Subtypes: acute lymphoblastic leukemia (ALL) or acute myeloid leukemia (AML); • ALL subcategories: T-lineage ALL and B-lineage ALL; • Particular subtypes of acute leukemia have been found to be associated with specific chromosomal translocations; • No single test is currently sufficient to establish the diagnosis; instead, a combination of tests (morphology, histochemistry, immunophenotyping, etc.) is used; • Although usually accurate, leukemia classification remains imperfect and errors do occur;

Objective • To develop a more systematic approach to cancer classification based on the simultaneous expression monitoring of thousands of genes using DNA microarrays, with leukemia as a test case;

Method: Biological Samples • Primary samples: 38 bone marrow samples (27 ALL, 11 AML) obtained from acute leukemia patients at the time of diagnosis; • Independent samples: 34 leukemia samples (24 bone marrow and 10 peripheral blood samples);

Method: Microarray • RNA prepared from cells was hybridized to high-density oligonucleotide Affymetrix microarrays containing probes for 6817 human genes; • Samples were subjected to a priori quality control standards regarding the amount of labeled RNA and the quality of the scanned microarray image.

Statistical Method: • "Neighborhood analysis" (Fig. 1A): Briefly, one defines an "idealized expression pattern" corresponding to a gene that is uniformly high in one class and uniformly low in the other. One tests whether there is an unusually high density of genes "nearby" (or similar to) this idealized pattern, as compared to equivalent random patterns.
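A minimal Python sketch of the neighborhood analysis described above. It assumes the signal-to-noise score P(g, c) = (mu1 - mu2) / (sigma1 + sigma2) that the original paper uses as its correlation measure (the slide does not state it), and it approximates the "equivalent random patterns" by shuffling the class labels. The function names, the cutoff, the use of the sample standard deviation, and the toy matrix are illustrative assumptions, not the authors' code or data.

```python
import random
from statistics import mean, stdev

def signal_to_noise(values, labels):
    """Score one gene: (mu1 - mu2) / (sigma1 + sigma2); 1 and 0 mark the two classes."""
    class1 = [v for v, l in zip(values, labels) if l == 1]
    class0 = [v for v, l in zip(values, labels) if l == 0]
    spread = stdev(class1) + stdev(class0)
    return (mean(class1) - mean(class0)) / spread if spread else 0.0

def count_neighbors(expr, labels, cutoff):
    """Count genes whose score is at least `cutoff`, i.e. near the idealized pattern."""
    return sum(1 for gene in expr if signal_to_noise(gene, labels) >= cutoff)

def neighborhood_analysis(expr, labels, cutoff, n_perm=200, seed=0):
    """Compare the observed neighbor count with counts under shuffled labels."""
    rng = random.Random(seed)
    observed = count_neighbors(expr, labels, cutoff)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_neighbors(expr, shuffled, cutoff) >= observed:
            hits += 1
    # Fraction of random patterns with at least as many neighbors as the real one.
    return observed, hits / n_perm

# Hypothetical toy matrix: 4 genes (rows) x 8 samples (columns).
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_expr = [
    [9.0, 10.0, 11.0, 10.0, 1.0, 2.0, 1.5, 2.5],  # high in class 1
    [8.5, 9.5, 10.5, 9.0, 2.0, 1.0, 2.5, 1.5],    # high in class 1
    [5.0, 4.0, 6.0, 5.5, 5.2, 4.8, 5.9, 4.1],     # uninformative
    [3.0, 7.0, 4.0, 6.0, 5.0, 3.5, 6.5, 4.5],     # uninformative
]
observed, p_value = neighborhood_analysis(toy_expr, toy_labels, cutoff=2.0)
```

On this toy matrix only the first two genes lie in the neighborhood of the idealized pattern, and random label shuffles rarely reproduce that density, which is the kind of excess the analysis looks for.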

Statistical Methods (Continued) Development of class predictor • The predictor uses a fixed subset of "informative genes" chosen based on their correlation with the class distinction and makes a prediction on the basis of the expression level of these genes in a new sample; • Each informative gene casts a "weighted vote" for one of the classes, with the magnitude of each vote dependent on the expression level in the new sample and the degree of that gene's correlation with the class distinction (Fig. 1B); • The votes were summed to determine the winning class, as well as a "prediction strength" (PS), which is a measure of the margin of victory that ranges from 0 to 1; • The sample was assigned to the winning class if PS exceeded a predetermined threshold, and was otherwise considered uncertain. On the basis of previous analysis, a threshold of 0.3 was used.
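The weighted-voting scheme above can be sketched as follows. The formulas are taken from the original paper rather than from the slide and should be read as assumptions here: each gene's vote is v = a(x - b), with weight a = (mu1 - mu2) / (sigma1 + sigma2) and boundary b = (mu1 + mu2) / 2, and PS = (V_win - V_lose) / (V_win + V_lose). The function names, the 1/0 class coding, and the toy numbers are hypothetical.

```python
from statistics import mean, stdev

def train_gene(values, labels):
    """Return (a, b) for one informative gene: vote weight and decision boundary."""
    class1 = [v for v, l in zip(values, labels) if l == 1]
    class0 = [v for v, l in zip(values, labels) if l == 0]
    a = (mean(class1) - mean(class0)) / (stdev(class1) + stdev(class0))
    b = (mean(class1) + mean(class0)) / 2
    return a, b

def predict(params, sample, threshold=0.3):
    """Sum the weighted votes; return (class or None if uncertain, PS)."""
    votes1 = votes0 = 0.0
    for (a, b), x in zip(params, sample):
        vote = a * (x - b)      # positive votes favor class 1, negative class 0
        if vote > 0:
            votes1 += vote
        else:
            votes0 += -vote
    total = votes1 + votes0
    ps = abs(votes1 - votes0) / total if total else 0.0
    winner = 1 if votes1 > votes0 else 0
    return (winner if ps > threshold else None), ps

# Hypothetical training data: 2 informative genes x 8 samples.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
genes = [
    [9.0, 10.0, 11.0, 10.0, 1.0, 2.0, 1.5, 2.5],
    [8.5, 9.5, 10.5, 9.0, 2.0, 1.0, 2.5, 1.5],
]
params = [train_gene(g, labels) for g in genes]

clear_call = predict(params, [10.0, 9.0])   # both genes vote for class 1
split_call = predict(params, [6.5, 5.0])    # genes disagree, so PS is low
```

When every gene votes the same way the losing total is zero and PS is 1; when the genes disagree the margin shrinks and the sample falls below the 0.3 threshold, which is how "uncertain" calls arise.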

Statistical Methods (continued) Validity testing of class predictors • Two-step procedure: (1). The accuracy of the predictors was first tested by cross-validation on the initial data set. Briefly, one withholds a sample, builds a predictor based only on the remaining samples, and predicts the class of the withheld sample. The process is repeated for each sample, and the cumulative error rate is calculated; (2). One then builds a final predictor based on the initial data set and assesses its accuracy on an independent set of samples.
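Step (1), leave-one-out cross-validation, can be sketched generically. The loop below takes any `build` and `predict` pair; the nearest-centroid predictor used to exercise it is a deliberately simple stand-in and is not the weighted-voting predictor of the paper. All names and data are hypothetical.

```python
def leave_one_out(samples, labels, build, predict):
    """Withhold each sample in turn, rebuild the predictor, and score the call."""
    errors = uncertain = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = build(train_x, train_y)   # built only from the remaining samples
        call = predict(model, samples[i])
        if call is None:
            uncertain += 1
        elif call != labels[i]:
            errors += 1
    return errors, uncertain

def build_centroids(samples, labels):
    """Stand-in predictor: one mean expression vector per class."""
    centroids = {}
    for cls in set(labels):
        members = [s for s, l in zip(samples, labels) if l == cls]
        centroids[cls] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def predict_nearest(centroids, sample):
    """Assign the sample to the class with the closest centroid."""
    def dist2(center):
        return sum((a - b) ** 2 for a, b in zip(center, sample))
    return min(centroids, key=lambda cls: dist2(centroids[cls]))

# Hypothetical toy data: 6 samples x 2 genes.
toy_samples = [[1.0, 2.0], [2.0, 1.0], [1.5, 1.5], [9.0, 10.0], [10.0, 9.0], [9.5, 9.5]]
toy_labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]
cv_result = leave_one_out(toy_samples, toy_labels, build_centroids, predict_nearest)
```

The point of rebuilding inside the loop is that any step that looks at class labels, including the choice of informative genes, must be repeated without the withheld sample; otherwise the error estimate is optimistic.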

Statistical Methods (continued) Clustering methods for class discovery • Self-organizing maps (SOMs) technique: The user specifies the number of clusters to be identified. The SOM finds an optimal set of "centroids" around which the data points appear to aggregate. It then partitions the data set, with each centroid defining a cluster consisting of the data points nearest to it.
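A minimal sketch of a SOM with a one-dimensional grid of nodes, written from the general description above. The learning-rate and neighborhood schedules, the function names, and the toy data are assumptions chosen for illustration; this is not the software or the settings used in the study.

```python
import math
import random

def nearest(nodes, x):
    """Index of the centroid closest to sample x (squared Euclidean distance)."""
    return min(range(len(nodes)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(nodes[j], x)))

def train_som(data, n_nodes=2, n_iter=1000, seed=0):
    """Fit n_nodes centroids arranged on a line; returns the centroid list."""
    rng = random.Random(seed)
    nodes = [list(rng.choice(data)) for _ in range(n_nodes)]
    for t in range(n_iter):
        frac = t / n_iter
        rate = 0.5 * (1 - frac)            # learning rate decays toward 0
        radius = 1.0 * (1 - frac) + 0.01   # neighborhood shrinks over time
        x = rng.choice(data)
        win = nearest(nodes, x)
        for j, node in enumerate(nodes):
            # Grid neighbors of the winner are pulled along, but less strongly.
            h = math.exp(-((j - win) ** 2) / (2 * radius ** 2))
            for k in range(len(node)):
                node[k] += rate * h * (x[k] - node[k])
    return nodes

# Hypothetical toy data: two well-separated groups of 4 samples x 2 genes.
toy_data = [
    [1.0, 1.2], [0.8, 1.0], [1.2, 0.9], [1.1, 1.1],
    [9.0, 9.2], [8.8, 9.1], [9.2, 8.9], [9.1, 9.0],
]
centroids = train_som(toy_data)
cluster_of = [nearest(centroids, x) for x in toy_data]  # partition step
```

The final line is the partition described in the slide: each sample joins the cluster of its nearest centroid.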

Results Class prediction: (1). Were there genes whose expression pattern was strongly correlated with the class distinction to be predicted? • For the 38 acute leukemia samples, neighborhood analysis showed that roughly 1100 genes were more highly correlated with the AML-ALL class distinction than would be expected by chance (Fig. 2). This suggested that classification could indeed be based on expression data.

Results (2). How to use a collection of known samples to create a "class predictor" capable of assigning a new sample to one of two classes? • The set of informative genes used in the predictor was chosen to be the 50 genes most closely correlated with the AML-ALL distinction in the known samples.

Results (3). How to test the validity of class predictors? • Cross-validation tests: The 50-gene predictor assigned 36 of the 38 samples as either AML or ALL and the remaining two as uncertain (PS < 0.3). All 36 predictions agreed with the patients' clinical diagnosis; • Independent test: The 50-gene predictor was applied to an independent collection of 34 leukemia samples. The predictor assigned 29 of the 34 samples, and all 29 assignments were correct (100% accuracy); • Prediction strength: median PS = 0.77 in cross-validation and 0.73 in the independent test (Fig. 3A).

Results (3). How to test the validity of class predictors (continued)? • The average prediction strength was lower for samples from one laboratory that used a very different protocol for sample preparation; sample preparation should therefore be standardized in any clinical implementation.

Results (4). How many genes should be included in the class predictor? • The choice to use 50 informative genes in the predictor was somewhat arbitrary: the number was well within the total number of genes strongly correlated with the class distinction, seemed large enough to be robust against noise, and was small enough to be readily applied in a clinical setting. • The results were insensitive to the particular choice: predictors based on 10 to 200 genes were all found to be 100% accurate, reflecting the strong correlation of genes with the AML-ALL distinction.

Results (5). The list of informative genes used in the AML versus ALL predictor was highly instructive (Fig. 3B). • Some genes, including CD11c, CD33, and MB-1, encode cell surface proteins useful in distinguishing lymphoid from myeloid lineage cells. • Others provide new markers of acute leukemia subtype. For example, the leptin receptor, originally identified through its role in weight regulation, showed high relative expression in AML. • Together, these data suggest that genes useful for cancer class prediction may also provide insight into cancer pathogenesis and pharmacology.

Results (6). The methodology of class prediction can be applied to any measurable distinction among tumors. Importantly, such distinctions could concern a future clinical outcome. • Ability to predict response to chemotherapy: tested among the 15 adult AML patients who had been treated and for whom long-term clinical follow-up was available. No strong multigene expression signature correlated with clinical outcome was found, although this could reflect the relatively small sample size.

Results Class discovery • If the AML-ALL distinction were not already known, could it have been discovered simply on the basis of gene expression?

Results Two-cluster analysis (1). Cluster tumors by gene expression: • A two-cluster SOM was applied to automatically group the 38 initial leukemia samples into two classes on the basis of the expression pattern of all 6817 genes.

Results (2). Determine whether the putative classes produced are meaningful. • The clusters were first evaluated by comparing them to the known AML-ALL classes (Fig. 4A). Class A1 contained mostly ALL (24 of 25 samples) and class A2 contained mostly AML (10 of 13 samples). The SOM was thus quite effective at automatically discovering the two types of leukemia.
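The counts above can be laid out as a small contingency table to check their arithmetic. The minority counts (1 AML in A1, 3 ALL in A2) are inferred here by subtraction from the stated totals, and the "agreement" figure is an illustrative summary, not a statistic reported in the study.

```python
# Cluster-by-diagnosis table; minority counts are inferred by subtraction.
clusters = {
    "A1": {"ALL": 24, "AML": 25 - 24},
    "A2": {"ALL": 13 - 10, "AML": 10},
}

# Column totals should match the 27 ALL and 11 AML primary samples.
totals = {
    cls: sum(counts[cls] for counts in clusters.values())
    for cls in ("ALL", "AML")
}

# Fraction of samples that fall in the cluster dominated by their own class.
n_samples = sum(sum(counts.values()) for counts in clusters.values())
n_majority = sum(max(counts.values()) for counts in clusters.values())
agreement = n_majority / n_samples   # 34 of 38 samples
```

The column totals recover the 27 ALL and 11 AML samples of the primary data set, and 34 of the 38 samples (about 89%) sit in the cluster dominated by their own class.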

Results • How could one evaluate such putative clusters if the "right" answer were not already known? Class discovery could be tested by class prediction: if putative classes reflect true structure, then a class predictor based on these classes should perform well.

Results • To test this hypothesis, the clusters A1 and A2 were evaluated: (a). We constructed predictors to assign new samples as "type A1" or "type A2."

Results (b). Cross-validation: • Predictors that used a wide range of different numbers of informative genes performed well; • The cross-validation thus not only showed high accuracy, but actually refined the SOM-defined classes: with one exception, the samples accurately classified in cross-validation were those that the SOM had correctly divided into ALL and AML;

Results (c). Independent test: • The median PS was 0.61, and 74% of samples were above threshold (Fig. 4B). High prediction strengths indicate that the structure seen in the initial data set is also seen in the independent data set.

Results (d). Same analyses with random clusters: Such clusters consistently yielded predictors with poor accuracy in cross-validation and low prediction strength on the independent data set (Fig. 4B). • On the basis of such analysis, the A1-A2 distinction can be seen to be meaningful, rather than simply a statistical artifact of the initial data set. The results thus show that the AML-ALL distinction could have been automatically discovered and confirmed without previous biological knowledge.

Results Multiple cluster analysis (1). A four-cluster SOM divided the samples into four clusters (B1 to B4), which largely corresponded to AML, T-lineage ALL, B-lineage ALL, and B-lineage ALL, respectively (Fig. 4C). The four-cluster SOM thus divided the samples along another key biological distinction. (2). These classes were evaluated by constructing class predictors. The four classes could be distinguished from one another, with the exception of B3 versus B4 (Fig. 4D).

Results Multiple cluster analysis (continued) • The prediction tests thus confirmed the distinctions corresponding to AML, B-ALL, and T-ALL, and suggested that it may be appropriate to merge classes B3 and B4, composed primarily of B-lineage ALL.

Conclusion Class Prediction • The study described techniques for class prediction, whereby samples can be automatically assigned to already-recognized classes; • These class predictors could be adapted to a clinical setting, with appropriate steps to standardize the protocol for sample preparation; • Such a test would supplement rather than replace existing leukemia diagnostics;

Conclusion Class Prediction (continued): • Class predictors can be constructed for known pathological categories and provide diagnostic confirmation or clarify unusual cases. • The technique of class prediction can be applied to distinctions relating to future clinical outcome, such as drug response or survival. • Class prediction provides an unbiased, general approach to constructing such prognostic tests.

Conclusion Class Discovery • In principle, the class discovery techniques described here can be used to identify fundamental subtypes of any cancer. • In general, such studies will require careful experimental design to avoid potential experimental artifacts, especially in the case of solid tumors.

Conclusion Class Discovery (continued) • Various approaches could be used to avoid such artifacts; • Class discovery methods could also be used to search for fundamental mechanisms that cut across distinct types of cancers.
