The Cancer Genome Atlas Project January 24, 2008
Program
• Goal: find genomic alterations that cause cancer (mutations, copy-number alterations (CNA), methylation, …)
• Pilot project
  • $100M (NCI/NHGRI)
  • 3 years
  • 3 diseases
    • brain (glioblastoma multiforme)
    • lung (squamous)
    • ovarian (serous cystadenocarcinoma)
Organization
• Biospecimen Core Resource (BCR)
• Genome Sequencing Centers (GSCs): 3
• Cancer Genome Characterization Centers (CGCCs): 7
• Data Coordinating Center (DCC)
• Project Team (NCI/NHGRI)
• Steering Committee (NCI/NHGRI & PIs)
• External Scientific Committee
• Working Groups
URLs
• project site: http://cancergenome.nih.gov
• gforge: http://gforge.nci.nih.gov (search for TCGA)
• data: http://tcga-data.nci.nih.gov
• portal: http://tcga-portal.nci.nih.gov (coming)
Data Levels
• raw
  • low-level data for a single sample, not normalized (e.g., trace file, .cel file)
• processed
  • single-sample, normalized & interpreted (e.g., mutation call, amplification call for a locus, .snp, .chp)
• segmented (not applicable to mutation & expression data)
  • single-sample aggregation of loci into regions (e.g., amplification call for a region of a sample)
• summary finding (aka “region of interest”)
  • cross-sample findings (e.g., minimal common region of amplification across a sample set)
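To make the step from processed to segmented to summary-finding data concrete, here is a toy Python sketch for amplification calls. It is not how the TCGA centers compute these levels. It assumes hypothetical inputs (one sample's per-locus amplification calls as sorted `(position, is_amplified)` pairs), treats a region as a run of consecutive amplified loci, and uses a strict intersection across samples as a deliberately naive stand-in for the "minimal common region". The helper names `segment`, `intersect` and `common_regions` are made up for illustration, and raw data is not modeled.

```python
"""Toy illustration of processed -> segmented -> summary-finding data levels.

Hypothetical inputs: each sample's processed data is a list of
(position, is_amplified) calls sorted by position. A region is a run of
consecutive amplified loci; the cross-sample summary is the strict
intersection of all samples' regions (a naive stand-in, not a TCGA method).
"""
from typing import List, Optional, Tuple

Region = Tuple[int, int]  # inclusive (start, end) positions
Call = Tuple[int, bool]  # (position, is_amplified) for one locus


def segment(calls: List[Call]) -> List[Region]:
    """Segmented level: merge runs of consecutive amplified loci into regions."""
    regions: List[Region] = []
    start: Optional[int] = None
    last: Optional[int] = None
    for pos, amplified in calls:
        if amplified:
            if start is None:
                start = pos  # open a new region
            last = pos
        elif start is not None:
            regions.append((start, last))  # an unamplified locus closes the region
            start = None
    if start is not None:
        regions.append((start, last))  # region runs to the final locus
    return regions


def intersect(a: List[Region], b: List[Region]) -> List[Region]:
    """Overlap of two sorted lists of non-overlapping inclusive regions."""
    out: List[Region] = []
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo <= hi:
            out.append((lo, hi))
        # advance whichever region ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def common_regions(samples: List[List[Region]]) -> List[Region]:
    """Summary-finding level: regions amplified in every sample of the set."""
    if not samples:
        return []
    result = samples[0]
    for regions in samples[1:]:
        result = intersect(result, regions)
    return result


if __name__ == "__main__":
    a = segment([(100, False), (200, True), (300, True), (400, True), (500, False)])
    b = segment([(100, True), (200, True), (300, True), (400, False), (500, False)])
    print(a)  # [(200, 400)]
    print(b)  # [(100, 300)]
    print(common_regions([a, b]))  # [(200, 300)]
```

Requiring every sample to share a region is far stricter than typical cross-sample analyses, which weigh how often and how strongly a region is altered; the strict intersection is used here only because it keeps the segmented-to-summary step easy to follow.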
Flow
• BCR
  • check pathology, quality/quantity
  • extract analytes
  • prepare data file
[Flow diagram components: Tissue Source (MD Anderson, Henry Ford, …), BCR, GSC, CGCC, “tracking database”, DCC, Bulk Download, caTissue Core, caArray, caIntegrator, NCBI Trace Archive; arrow labels: sample data; DNA; DNA, mRNA; WGA]
Data Formats
• BCR
  • XML (tags are Common Data Elements, CDEs)
  • images
• GSC
  • called mutations (Genboree LFF format)
  • linking table
    • sample-trace-target
• CGCC
  • MAGE-TAB
    • IDF: Investigation Description Format
    • SDRF: Sample and Data Relationship Format
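MAGE-TAB's SDRF is a tab-delimited table in which each row traces a sample through to its data files. The Python sketch below collects the data files listed for each sample. The choice of `Extract Name` as the sample column, the example rows and file names, and the helper name `data_files_by_sample` are assumptions for illustration; actual TCGA SDRFs may use different or additional columns.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def data_files_by_sample(sdrf: TextIO, sample_column: str = "Extract Name") -> Dict[str, List[str]]:
    """Map each value in `sample_column` to the data files named on its rows.

    The SDRF is read positionally with csv.reader rather than csv.DictReader,
    because MAGE-TAB allows repeated column headers (e.g. several
    "Protocol REF" columns) and DictReader would keep only the last one.
    Samples whose rows name no data file are omitted from the result.
    """
    reader = csv.reader(sdrf, delimiter="\t")
    header = next(reader, None)
    if header is None:
        return {}  # empty file
    header = [h.strip() for h in header]
    sample_idx = header.index(sample_column)  # raises ValueError if absent
    # raw and derived data columns, e.g. "Array Data File", "Derived Array Data File"
    file_idx = [i for i, h in enumerate(header) if h.endswith("Data File")]
    files: Dict[str, List[str]] = defaultdict(list)
    for row in reader:
        if len(row) <= sample_idx:
            continue  # blank or truncated line
        sample = row[sample_idx].strip()
        for i in file_idx:
            if i < len(row) and row[i].strip():
                files[sample].append(row[i].strip())
    return dict(files)


if __name__ == "__main__":
    # Hypothetical two-sample SDRF; names and files are made up.
    example = (
        "Source Name\tExtract Name\tArray Data File\tDerived Array Data File\n"
        "patient1\tsample1-DNA\tsample1.cel\tsample1.chp\n"
        "patient2\tsample2-DNA\tsample2.cel\tsample2.chp\n"
    )
    print(data_files_by_sample(io.StringIO(example)))
    # {'sample1-DNA': ['sample1.cel', 'sample1.chp'], 'sample2-DNA': ['sample2.cel', 'sample2.chp']}
```

Reading by column position rather than by header name is the main design choice here: it keeps every repeated column, at the cost of having to look indices up from the header row.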
Where Does/Will the Data Go?
• ftp site (now with a simple web wrapper: “portal #1”)
• “tracking database”
• repositories with caBIG APIs
  • caArray
  • caTissue Core
  • caIntegrator
  • NCIA
• NCBI trace archive
• a richer “portal #2”
  • more convenient download capability
  • filtering datasets by clinical information
  • summary-level data
  • genome browser view
  • gene info page
  • pathway visualization
  • etc.