Biology-Driven Clustering of Microarray Data K.R. Coombes, K.A. Baggerly, D.N. Stivers, J. Wang, D. Gold, H.G. Sung, and S.J. Lee Applications to the NCI60 Data Set
Introduction • Microarray data is more than a large, unstructured matrix. • We already know many genes important for studying cancer through their involvement in specific biological processes • We also know that reproducible chromosomal abnormalities play an important role in cancer • Need analytical methods that use biological information early
Methods • First, updated the annotations of the genes on the microarray • Performed separate analyses: (1) using genes on individual chromosomes; (2) using genes involved in different biological processes • Developed ways to assess how well each set of genes classified samples
Quality of Annotations • Problem: • I.M.A.G.E. clone IDs and GenBank accession numbers are archival • UniGene clusters, gene names, descriptions, functions, etc., are changeable • Solution: • Download latest UniGene (build 137) and LocusLink to update annotations
How many genes on the array have good annotations? Only trust the 7478 spots where the UniGene clusters match.
How do we determine the functions of genes? • UniGene -> LocusLink -> GeneOntology • GeneOntology is a structured, hierarchical vocabulary to describe gene functions in three broad areas: • biological process (why) • molecular function (what) • cellular component (where)
Data Preprocessing • Remove spots with poor annotations and spots with median intensity below the 97th percentile of empty spots • Normalize each array so the median log ratio between channels is zero (equivalently, a median ratio of one) • Center each gene so its mean log ratio across experiments is zero • Use (1 - correlation)/2 as the distance metric
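The normalization, centering, and distance steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: it assumes the data are a list of rows (genes) by columns (arrays) of log ratios, that spot filtering has already been done, and that normalization is a simple shift of each array's median. The function names and the `target` parameter (defaulting to 0.0) are hypothetical.

```python
from math import sqrt
from statistics import mean, median


def normalize_arrays(log_ratios, target=0.0):
    """Shift each array (column) so its median log ratio equals `target`.

    `target` is an assumed parameter; set it to whatever value the
    normalization is supposed to produce.
    """
    n_arrays = len(log_ratios[0])
    col_medians = [median(row[j] for row in log_ratios) for j in range(n_arrays)]
    return [[row[j] - col_medians[j] + target for j in range(n_arrays)]
            for row in log_ratios]


def center_genes(log_ratios):
    """Subtract each gene's (row's) mean so it averages zero across experiments."""
    centered = []
    for row in log_ratios:
        row_mean = mean(row)
        centered.append([x - row_mean for x in row])
    return centered


def correlation_distance(x, y):
    """Return (1 - Pearson correlation) / 2, which lies between 0 and 1."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (1.0 - sxy / sqrt(sxx * syy)) / 2.0


def sample_distances(log_ratios):
    """Pairwise distances between arrays (columns) of a genes-by-arrays matrix."""
    columns = list(zip(*log_ratios))
    return [[correlation_distance(a, b) for b in columns] for a in columns]
```

Dividing by two maps perfectly correlated profiles to a distance of 0 and perfectly anti-correlated profiles to 1, so the heights on a dendrogram built from this metric never exceed 1.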
How well does a set of genes distinguish types of cancer? • Three methods for assessment: • Qualitative (PCA, MDS) • Quantitative (PCA + ANOVA) • Semi-quantitative (Grading Dendrograms)
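The slide does not spell out how PCA and ANOVA are combined. One plausible reading is that the samples' scores on a principal component are tested for differences between cancer types with a one-way ANOVA. The sketch below covers only the ANOVA half under that assumption; the principal component scores are taken as given, and the function name is hypothetical.

```python
from collections import defaultdict


def one_way_anova_f(scores, labels):
    """One-way ANOVA F statistic for `scores` grouped by `labels`.

    `scores` would be, for example, each sample's coordinate on one
    principal component; `labels` gives each sample's cancer type.
    """
    groups = defaultdict(list)
    for score, label in zip(scores, labels):
        groups[label].append(score)

    n = len(scores)           # number of samples
    k = len(groups)           # number of cancer types
    grand_mean = sum(scores) / n

    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in values)

    if ss_within == 0.0:
        return float("inf")   # groups are perfectly separated
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

A large F value means the component separates the cancer types well relative to the spread within each type.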
[Figure: dendrogram of the NCI60 cell lines; vertical axis from 0.0 to 0.6]
How good is a dendrogram? • A = cluster contains all and only one kind of cancer • B = all, with extras • C = all except one • D = all except one, with extras • E = all except two • F = all except two, with extras
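The grading scheme can be made mechanical. The sketch below is one possible reading, not the authors' procedure: it represents a dendrogram as nested pairs of leaf names, grades every subtree against a cancer type, and reports the best grade found. The slide does not say how many extras are tolerated, so `max_extras` is a hypothetical cutoff; without one, the root cluster would earn a B for every cancer type. The sample names in the usage note are made up.

```python
def grade_cluster(cluster, cancer_type, type_of, max_extras=2):
    """Grade one cluster (a set of sample names) for one cancer type.

    Returns "A" to "F" following the slide's scheme, or None if the cluster
    misses more than two samples of the type, contains none of them, or
    holds more than `max_extras` samples of other types (assumed cutoff).
    """
    members = [s for s, t in type_of.items() if t == cancer_type]
    present = sum(1 for s in members if s in cluster)
    missing = len(members) - present
    extras = sum(1 for s in cluster if type_of[s] != cancer_type)
    if present == 0 or missing > 2 or extras > max_extras:
        return None
    grades = {(0, False): "A", (0, True): "B",
              (1, False): "C", (1, True): "D",
              (2, False): "E", (2, True): "F"}
    return grades[(missing, extras > 0)]


def subtree_leaf_sets(tree):
    """List the leaf set of every subtree; a tree is a name or a (left, right) pair."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    left = subtree_leaf_sets(tree[0])
    right = subtree_leaf_sets(tree[1])
    # The last entry of each list is that child's complete leaf set.
    return left + right + [left[-1] | right[-1]]


def grade_dendrogram(tree, cancer_type, type_of, max_extras=2):
    """Best (alphabetically lowest) grade over all clusters in the dendrogram."""
    grades = [grade_cluster(c, cancer_type, type_of, max_extras)
              for c in subtree_leaf_sets(tree)]
    grades = [g for g in grades if g is not None]
    return min(grades) if grades else None
```

For example, with `type_of = {"colon.a": "colon", "colon.b": "colon", "colon.c": "colon", "renal.a": "renal", "renal.b": "renal", "renal.c": "renal", "cns.a": "cns"}` and `tree = ((("colon.a", "colon.b"), "colon.c"), (("renal.a", "renal.b"), ("renal.c", "cns.a")))`, the colon samples form a clean cluster of their own, while the smallest cluster holding all three renal samples also contains `cns.a`.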
Heterogeneity of different types of cancer • Some cancers (colon, leukemia) are fairly easy to distinguish from others • Some (breast, lung) are so heterogeneous as to be almost impossible to distinguish • Some chromosomes (1, 2, 6, 7, 9, 12, 17) can distinguish many cancers. • Some (16, 21) are essentially random
[Figures: two dendrograms of the NCI60 cell lines; vertical axis from 0.0 to 0.6]
Can cancers be distinguished by genes of one function? • Table for functional categories looks a lot like the table for chromosomes • Some biological process categories (signal transduction, cell proliferation, cell cycle, protein metabolism) can distinguish many types of cancer • Others (apoptosis, energy pathways) cannot
[Figures: four dendrograms of the NCI60 cell lines; vertical axis from 0.0 to 0.6 on the first three and from 0.0 to 0.8 on the fourth]
Conclusions (I) • Multiple views into the data provide substantial insight into differences in cancer types and gene sets. • Cancer types differ greatly in their degree of heterogeneity, ranging from homogeneous (colon, leukemia) through moderately heterogeneous (renal, melanoma) to extremely heterogeneous (breast and lung).
Conclusions (II) • Homogeneous cancers exhibit strong identifying signals across most views of the data. • There are large differences in the ability of genes on different chromosomes, or involved in different biological processes, to distinguish cancer types.
Supplementary Material • Complete results of each analysis by chromosome and by function are available on our web site: http://www.mdanderson.org/depts/cancergenomics