TCGA Glioblastoma Pilot Project: Initial Report June 2008

TCGA Glioblastoma Pilot Project: Initial Report June 2008

Genomic Alterations copy-number alterations translocations gene expression coding mutations methylation NCI-NHGRI Partnership Cancer Genomics: Ultimate Goal Create comprehensive public catalog of all genomic alterations present at significant frequency for all major cancer types NCAB Report Feb 2005 Integrate

NCI-NHGRI Partnership • TCGA Pilot Project Launched, 4Q2006 • Cancers: • Glioblastoma multiforme • Squamous cell lung cancer • Ovarian cancer (serous cystadenomacarcinoma) • Goal: • Assemble high-quality samples of each type • • Characterize tumor genome by various approaches • • Rapidlyshare data with scientific community • • Compareand improve technologies • • Integrate and analyze data to illuminate genetic basis of cancers

Key questions posed at start of project Can samples of adequate quality and quantity be assembled? Can high-quality, high-throughput data be generated with current platforms? How sensitive, specific and comparable are current platforms? How can diverse data sets be integrated -- and what can be learned from integration? Can recurrent events be distinguished from random background noise? Can we identify new genes associated with cancer types? Can we identify new subtypes of cancer? Does new knowledge suggest therapeutic implications? Can a network project drive technology progress in cancer?

TCGA Components

GBM Samples: Strict Criteria (1) >80% tumor cell content (2) <40% necrosis (3) Matched normal DNA sample (4) Clinical annotation • Appropriate informed consent Current collection > 200 high quality tumors

Cross-platform Data Integration, Comparison TCGA GBM: Center Overview Glioblastoma samples Broad/ DFCI Harvard LBNL JHU/USC Stanford UNC Sequencing Broad, WU, Baylor MSKCC SNP 6.0 Copy Number aCGH Copy Number Exon Array RNA Expression aCGH Copy Number GoldenGate Methylation Infinium Copy Number HTA RNA Expression 2 color arrays RNA, miRNA Expression PCR >ABI Somatic Mutations

Cancer genomes are complex Individual cancer genomes Integrate across many samples New analytical methods needed

Presentations Cameron Brennan GBM and Genome Characterization Stephen BaylinThe GBM Epigenome Rick Wilson Identifying mutations in GBM and application of next gen sequencing technologies Charles PerouThe Challenge of Integrative Analysis Eric Lander Summary and Discussion

Statistical significance 208 97 26 24 144 27 Many fewer events Highly concordant Includes all known genes Copy-number alterations: Discordance in initial studies Event counting n=178 n=37 n=141 Little overlap in early studies Include only some known genes

Copy-number alterations: ~30 in GBM Deletions Amplifications

Total across S samples total bases Coding mutations: Assessing statistical significance Random mutations: 2 x 10-6 per base 0.3% per typical coding region Cancer-related mutations: Aim to detect 3-5% per typical coding region

Significantly mutated genes in GBM600 genes x 86 GBM (non-hypermutated)

ProNeural Normal-like EGFR Mesenchymal Integrated analysis defines four subtypes in GBM • Copy-number alteration • RNA Expression • DNA sequencing • Methylation

Pathway Analysis in GBM EGFR ERBB2 PDGFRA MET mutation in 7% amplification In 13% amplification in 4% mutation, amplification in 45% mutation, homozygous deletion in 17% mutation, homozygous deletion in 36% NF-1 RAS PI-3KClass I PTEN mutation in 2% mutation in 15% AKT amplification in 2% RTK/RAS/PI-3Ksignaling network 86% Proliferation Survival Translation FOXO mutation in 2% P53signaling86% CDKN2A (P16/INK4A) CDKN2B CDKN2C Activated oncogenes homozygous deletion in 51% homozygous deletion in 47% homozygous deletion in 2% homozygousdeletion in 49% CDKN2A (ARF) amplification in 17% amplification in 1% CDK4 CCND2 CDK6 amplification in 14% MDM2 amplification in 2% MDM4 RB1 amplification in 6% homozygous deletion, mutation in 11% TP53 RB signaling77% mutation, homozygous deletion in 35% G1/S progression Senescence Apoptosis

Next-generation sequencing technology ABI 3730XL 454 Solexa, ABI SOLiD + others Read length Read number Total bases 30+bp reads ~40 M / run 8-12 Gb ~200bp 400 K / run 400 Mb ~700 bp ~100/run 70 K Costs decreasing rapidly

Key questions posed at start of project Can samples of adequate quality and quantity be assembled? Can high-quality, high-throughput data be generated with current platforms? How sensitive, specific and comparable are current platforms? How can diverse data sets be integrated -- and what can be learned from integration? Can recurrent events be distinguished from random background noise? Can we identify new genes associated with cancer types? Can we identify new subtypes of cancer? Does new knowledge suggest therapeutic implications? Can a network project drive technology progress in cancer?

TCGA Glioblastoma Pilot Project: Initial Report June 2008