The Cancer Genome Atlas Project

The Cancer Genome Atlas Project. January 24, 2008. Program goal: find genomic alterations that cause cancer (mutations, CNA, methylation, …). Pilot project: $100M (NCI/NHGRI), 3 years, 3 diseases: brain (glioblastoma multiforme), lung (squamous), ovarian (serous cystadenocarcinoma).

zpetersen

Presentation Transcript


  1. The Cancer Genome Atlas Project January 24, 2008

  2. Program
     • Goal: find genomic alterations that cause cancer (mutations, CNA, methylation, …)
     • Pilot project
       • $100M (NCI/NHGRI)
       • 3 years
       • 3 diseases
         • brain (glioblastoma multiforme)
         • lung (squamous)
         • ovarian (serous cystadenocarcinoma)

  3. Organization
     • Biospecimen Core Resource (BCR)
     • Genome Sequencing Centers (GSCs) (3)
     • Cancer Genome Characterization Centers (CGCCs) (7)
     • Data Coordinating Center (DCC)
     • Project Team (NCI/NHGRI)
     • Steering Committee (NCI/NHGRI & PIs)
     • External Scientific Committee
     • Working Groups

  4. PIs

  5. URLs
     • project site: http://cancergenome.nih.gov
     • gforge: http://gforge.nci.nih.gov (search for TCGA)
     • data: http://tcga-data.nci.nih.gov
     • portal: http://tcga-portal.nci.nih.gov [coming]

  6. Data Types

  7. Data Levels
     • raw
       • low-level data for a single sample, not normalized (e.g., trace file, .cel file)
     • processed
       • single-sample, normalized & interpreted (e.g., mutation call, amplification call for a locus, .snp, .chp)
     • segmented (n/a for mutation & expression)
       • single-sample, aggregation of loci into regions (e.g., amplification call for a region of a sample)
     • summary finding (aka “region of interest”)
       • cross-sample findings (e.g., minimal common region of amplification across a sample set)
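The four data levels on this slide can be sketched as a small lookup structure. This is only an illustration, not part of the presentation: the names `DataLevel`, `is_single_sample`, and `levels_for`, and the data-type labels passed in, are hypothetical.

```python
from enum import Enum


class DataLevel(Enum):
    """The four levels named on the slide (identifiers are illustrative)."""
    RAW = "raw"
    PROCESSED = "processed"
    SEGMENTED = "segmented"
    SUMMARY_FINDING = "summary finding"


# Per the slide, segmentation is n/a for mutation and expression data.
# The string labels for data types are an assumption made for this sketch.
SEGMENTATION_NOT_APPLICABLE = {"mutation", "expression"}


def is_single_sample(level):
    """Raw, processed and segmented data describe one sample;
    summary findings are cross-sample."""
    return level is not DataLevel.SUMMARY_FINDING


def levels_for(data_type):
    """Return the levels that apply to a data type, in slide order."""
    return [
        level
        for level in DataLevel
        if not (level is DataLevel.SEGMENTED
                and data_type in SEGMENTATION_NOT_APPLICABLE)
    ]


print([level.value for level in levels_for("mutation")])
```

Modelling the levels as an enumeration keeps the "n/a for mutation & expression" rule in one place rather than scattered across callers.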

  8. Flow
     • BCR
       • check pathology, quality/quantity
       • extract analytes
       • prepare data file
     [Flow diagram; labels: Tissue Source (MD Anderson, Henry Ford, …), sample, data, DNA, mRNA, WGA, CGCC, GSC, DCC, “tracking database”, Bulk Download, caTissue Core, caArray, caIntegrator, NCBI Trace Archive]

  9. Data Formats
     • BCR
       • XML (tags are CDEs)
       • images
     • GSC
       • Called mutations (Genboree LFF format)
       • Linking table
         • sample-trace-target
     • CGCC
       • MAGE-TAB
         • IDF: Investigation Definition Format
         • SDRF: Sample and Data Relationship Format
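As an illustration of the CGCC format, a MAGE-TAB SDRF file is tab-delimited with a header row, so it can be read with a standard CSV parser. The sketch below is not from the presentation: the function names `read_sdrf` and `column_values` are hypothetical and the sample rows are made up. Rows are kept as (column, value) pairs rather than dictionaries because SDRF column headings may repeat.

```python
import csv
import io


def read_sdrf(text):
    """Parse tab-delimited SDRF text into rows of (column, value) pairs."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    return [list(zip(header, row)) for row in reader if row]


def column_values(rows, name):
    """Collect every value found under the given column heading."""
    return [value for row in rows for column, value in row if column == name]


# Made-up example content; real files carry many more columns.
SAMPLE_SDRF = (
    "Source Name\tExtract Name\tArray Data File\n"
    "sample-A\textract-A\tsample-A.cel\n"
    "sample-B\textract-B\tsample-B.cel\n"
)

rows = read_sdrf(SAMPLE_SDRF)
print(column_values(rows, "Array Data File"))  # ['sample-A.cel', 'sample-B.cel']
```

Reading the file this way lets each sample be traced from its source name to the data files derived from it, which is the relationship the SDRF exists to record.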

  10. Where Does/Will the Data Go?
     • ftp site (now with a simple web wrapper: “portal #1”)
     • “tracking database”
     • repositories with caBIG APIs
       • caArray
       • caTissue CORE
       • caIntegrator
       • NCIA
     • NCBI trace archive
     • a richer “portal #2”
       • more convenient download capability
       • filtering datasets by clinical information
       • summary level data
       • genome browser view
       • gene info page
       • visualization on pathways
       • etc.
