
Cancer Genome Anatomy Project (CGAP)

Cancer Genome Anatomy Project (CGAP). Carl Schaefer NCICB Jamboree February 25, 2005. A Little History. CGAP initiated in 1996 goal: profile cell’s passage from normal to malignant method: sequencing ESTs By 1999, three components to the web site: Tumor Gene Index expression variation


Presentation Transcript


  1. Cancer Genome Anatomy Project (CGAP) Carl Schaefer NCICB Jamboree February 25, 2005

  2. A Little History • CGAP initiated in 1996 • goal: profile cell’s passage from normal to malignant • method: sequencing ESTs • By 1999, three components to the web site: • Tumor Gene Index • expression variation • cDNA libraries & clones • Cancer Chromosome Aberration Project • chromosomal structural/copy number variation • BAC clones • Gene Annotation Initiative • nucleotide variation • primers

  3. The 2000 Re-design • 2000: site moved from NCBI to NCI • Re-design • organize around biology instead of projects • tie data more closely together (cross-links) • new sources of expression data • bring in functional information (Gene Ontology and pathway annotations) • Current usage (per month): • 200,000 pages • 8,000 unique visitors • 5 GB data • Who: researchers (molecular biology)

  4. pre-CORE Architecture • Apache/Zope • Oracle (plain vanilla sql) • a few flat files for efficiency • some Python • some C & R (P-values and correlations) • a lot of Perl (>30,000 lines runtime, CGAP-proper) • CGI plus some custom distributed processing

  5. Example Scenario 1 • For two (pools of) biosamples • find a set of differentially expressed genes • then look for functional coherence in the set

  6. Polyak breast cancer SAGE libraries (Cancer Cell. 2004 Jul;6(1):17-32)

  7. Leucocytes IDC vs. DCIS

  8. Leucocytes A = IDC B = DCIS

  9. IDC > DCIS

  10. Over-Represented Biological Process Terms • In 77-gene set IDC > DCIS • In 72-gene set DCIS > IDC
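The "functional coherence" step in Scenario 1 can be sketched as a term over-representation test. The slides do not say which statistic CGAP uses; the sketch below assumes a one-sided hypergeometric test, a common choice for this kind of analysis. The function names, the gene identifiers, and the shape of the `annotations` mapping are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N
    background genes, K of which carry the annotation term."""
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def over_represented_terms(gene_set, annotations, background):
    """Score every term by how surprising its overlap with gene_set is.

    annotations: dict mapping term -> set of genes annotated with it
    background:  set of all genes that could have been selected
    Returns (term, overlap, p_value) tuples, smallest p-value first.
    """
    gene_set = set(gene_set) & set(background)
    results = []
    for term, genes in annotations.items():
        annotated = set(genes) & set(background)
        overlap = len(annotated & gene_set)
        if overlap == 0:
            continue
        p = hypergeom_upper_tail(overlap, len(gene_set),
                                 len(annotated), len(background))
        results.append((term, overlap, p))
    return sorted(results, key=lambda r: r[2])


# Toy data (hypothetical gene and term names).
background = {f"g{i}" for i in range(10)}
annotations = {"term_a": {"g0", "g1", "g2"}, "term_b": {"g0", "g5", "g6", "g7"}}
ranked = over_represented_terms({"g0", "g1", "g2"}, annotations, background)
```

With many terms tested at once, the raw p-values would normally need a multiple-testing correction, which this sketch omits.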

  11. Example Scenario 2 • For a gene of interest (e.g. AREG) • find other genes that are similar in some dimension, e.g. expression OMIM: “… amphiregulin (AREG) inhibits growth of certain human tumor cells and stimulates proliferation of human fibroblasts and other normal and tumor cells. … striking homology to the epidermal growth factor (EGF; 131530) family of proteins. Amphiregulin binds to the EGF receptor but not as well as EGF does.”

  12. What looks like AREG, using SAGE data

  13. What looks like AREG, using NCI60 (Novartis) data
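The "what looks like AREG" query in Scenario 2 can be sketched as a ranking of genes by the similarity of their expression profiles to the query gene. The slides mention correlations but do not name the measure; Pearson correlation is assumed here, and the profile values and the names `GENE_A`, `GENE_B`, and `GENE_C` are made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no correlation signal
    return sxy / sqrt(sxx * syy)


def most_similar(query, profiles, top=5):
    """Return up to `top` (gene, r) pairs most correlated with `query`.

    profiles: dict mapping gene -> list of expression values, one per
    library or cell line, in the same order for every gene.
    """
    q = profiles[query]
    scored = [(pearson(q, p), g) for g, p in profiles.items() if g != query]
    scored.sort(reverse=True)
    return [(g, r) for r, g in scored[:top]]


# Toy profiles (hypothetical values).
profiles = {
    "AREG": [1.0, 2.0, 3.0, 4.0],
    "GENE_A": [2.0, 4.0, 6.0, 8.0],
    "GENE_B": [4.0, 3.0, 2.0, 1.0],
    "GENE_C": [1.0, 1.0, 1.0, 1.0],
}
neighbours = most_similar("AREG", profiles, top=2)
```

The same function applies to SAGE tag counts or array intensities, provided each gene's values are listed over the same samples in the same order.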

  14. The World Has Moved On • Genomic view • Genomic measurement: CGH; DK • Expression measurement: arrays; SAGE • Extra-central-dogma stuff: RNAi • Functional info • Proteins • … constantly playing catch-up; stay tuned

  15. Pathway Interaction Database

  16. Proteins in Action • Proteins interact with DNA [regulation] • Transcription factors • Proteins interact with proteins [signaling] • Complex formation • Post-translational modifications • Phosphorylation, acetylation • Cleaving off subunits • Proteins interact with small molecules [metabolism] • Proteins move around [translocation] • E.g., in/out of the nucleus

  17. Pathways • Interactions aggregate/cooperate in cellular processes, e.g. • Energy production • Cell cycle • Apoptosis • Telomere maintenance • Cell migration • Angiogenesis

  18. A metabolic pathway: citrate cycle

  19. A signaling & regulatory pathway: cell cycle (G1/S)

  20. Describing the Biology • Pictures • Vocabularies (e.g. Gene Ontology) • Networks (directed graphs) • But also… • Physical molecular structure • Stoichiometry • Reaction rates • Simulation models This is our interest

  21. Examples of Questions (1 of 2) • What downstream interactions could be affected, directly or indirectly, by a mutation in a particular protein or by a change in the abundance of a particular protein? • How many parallel, independent paths are known to lead to the same event (e.g. activation of a particular protein)? • What anomalies (mutation, over-expression, under-expression) might theoretically result in a failure of the DNA repair mechanism? Might these same anomalies disrupt other processes?

  22. Examples of Questions (2 of 2) • Loss of heterozygosity of a region within 17q21 has been detected in 30% of primary breast tumors. What cellular processes would be most directly affected by a loss of function of genes in this region? Are any candidate genes in the region closely connected to each other in pathway networks? • … questions about cause/effect networks
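The first question on slide 21 (what could be affected downstream of a given protein) amounts to reachability in a directed graph. A minimal sketch, assuming the network is available as an adjacency mapping from each node to its successors; the node labels are placeholders, not real pathway content.

```python
from collections import deque


def downstream(graph, start):
    """Return every node reachable from `start` by following edges.

    graph: dict mapping node -> iterable of successor nodes. Nodes may
    be molecules or events; the traversal treats them uniformly.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


# Toy network (hypothetical): molecules A, B, C, D and events e1, e2.
toy = {
    "A": ["e1"],
    "e1": ["B"],
    "B": ["e2"],
    "D": ["e2"],
    "e2": ["C"],
}
affected_by_a = downstream(toy, "A")
affected_by_d = downstream(toy, "D")
```

Breadth-first search visits each node and edge once, so the cost is linear in the size of the network. Counting parallel, independent paths (the second question) is a different problem and is not covered by this sketch.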

  23. Prototype Pathway Interaction Database • Collection of interactions • can be composed into networks for pre-defined or novel pathways • Not a collection of graphics • but have tools that support visualization • Not an encyclopedia • Not a proposal for a new controlled vocabulary • At present, no reaction concentrations, rates • Available in two versions • research-oriented: http://cmap.nci.nih.gov/PW • caBIO-like: caPathway (bottom of page http://ncicb.nci.nih.gov/core/caBIO)

  24. Representation (1 of 2) • Pathway: directed graph • node: molecule or event or condition • edge: role of molecule/condition in an event • interaction: event & its connected molecules/conditions • Molecule type: • protein | complex | compound | rna • or families (e.g. EC_2.7.7.15 includes PCYT1A, PCYT1B) • Event type: • reaction | modification | transcription | translocation • or any GO BP type

  25. Representation (2 of 2) • Condition type: • any GO BP type • Role type: • input | output | agent | inhibitor • or any GO MF type • Molecule location • any GO CC type • specified at point of use • Posttranslational modification • abstract terms (e.g. “active”) • specified at point of use
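The representation on slides 24 and 25 can be sketched as a small data model. This is an illustration only, not the database's actual schema: the class and field names are hypothetical, the sketch accepts only the base type lists from the slides (not the GO-derived types), and the edge direction (outputs point away from the event, all other roles point into it) is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MOLECULE_TYPES = {"protein", "complex", "compound", "rna"}
EVENT_TYPES = {"reaction", "modification", "transcription", "translocation"}
ROLE_TYPES = {"input", "output", "agent", "inhibitor"}


@dataclass(frozen=True)
class Molecule:
    name: str
    mol_type: str

    def __post_init__(self):
        if self.mol_type not in MOLECULE_TYPES:
            raise ValueError(f"unknown molecule type: {self.mol_type}")


@dataclass(frozen=True)
class Participant:
    """A molecule in a role; location and state are set at point of use."""
    molecule: Molecule
    role: str
    location: Optional[str] = None  # e.g. a GO cellular component term
    state: Optional[str] = None     # abstract term such as "active"

    def __post_init__(self):
        if self.role not in ROLE_TYPES:
            raise ValueError(f"unknown role type: {self.role}")


@dataclass
class Interaction:
    """An event together with its connected molecules."""
    event_id: str
    event_type: str
    participants: List[Participant] = field(default_factory=list)

    def __post_init__(self):
        if self.event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.event_type}")

    def edges(self):
        """Directed (source, target, role) edges for this interaction."""
        result = []
        for p in self.participants:
            if p.role == "output":
                result.append((self.event_id, p.molecule.name, p.role))
            else:
                result.append((p.molecule.name, self.event_id, p.role))
        return result


# Hypothetical example: an agent activates HRAS.
hras = Molecule("HRAS", "protein")
kinase = Molecule("KINASE_X", "protein")  # placeholder name
activation = Interaction("e1", "modification", [
    Participant(hras, "input"),
    Participant(kinase, "agent"),
    Participant(hras, "output", state="active"),
])
```

Keeping location and state on the participant rather than on the molecule mirrors the "specified at point of use" rule: the same molecule record can appear inactive in one interaction and active in another.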

  26. Current Contents (all human) • BioCarta: 259 pathways, 3064 interactions • KEGG: 85 pathways, 4207 interactions

  27. Research-Version Software

  28. caPathway Primary Domain Model

  29. Some Applications • Find a network that connects a set of molecules • Visualize (via “dot” pgm) a novel or predefined pathway network • Overlay expression information • Propagate values (boolean net interpretation) • Compute consistency of pre-defined networks with expression data • Explore higher-level units of phenotype profiling • Find potentially “active” interactions from expression data
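The "propagate values (boolean net interpretation)" application can be sketched as follows. The slides do not give the update rule, so the one used here is an assumption: an interaction fires when all of its inputs and agents are present and none of its inhibitors is, and a fired interaction makes its outputs present. Fired interactions are never retracted, which is a simplification. All names are hypothetical.

```python
def propagate(interactions, present):
    """Propagate presence through a network until nothing changes.

    interactions: dict mapping interaction id -> dict with optional
                  keys "inputs", "agents", "inhibitors", "outputs"
    present:      iterable of molecule names initially present
    Returns (active_interaction_ids, all_present_molecules).
    """
    present = set(present)
    active = set()
    changed = True
    while changed:
        changed = False
        # Decide which interactions fire from the current state only,
        # so the result does not depend on dictionary order.
        newly_active = []
        for name, ia in interactions.items():
            if name in active:
                continue
            needed = set(ia.get("inputs", ())) | set(ia.get("agents", ()))
            blocked = any(m in present for m in ia.get("inhibitors", ()))
            if needed <= present and not blocked:
                newly_active.append(name)
        for name in newly_active:
            active.add(name)
            present |= set(interactions[name].get("outputs", ()))
            changed = True
    return active, present


# Toy network (hypothetical names): K activates A; active A yields B+;
# K also inhibits an alternative use of A.
net = {
    "i1": {"inputs": ["A"], "agents": ["K"], "outputs": ["A+"]},
    "i2": {"inputs": ["A+"], "outputs": ["B+"]},
    "i3": {"inputs": ["A"], "inhibitors": ["K"], "outputs": ["C"]},
}
active_with_k, present_with_k = propagate(net, {"A", "K"})
active_without_k, present_without_k = propagate(net, {"A"})
```

Seeding `present` with the genes expressed in a sample gives one way to read "potentially active interactions from expression data"; comparing the active sets of two samples then reduces to set intersection and difference.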

  30. Example • Gene expression data (SAGE) for closely related biosamples: • mutant: SAGE_Brain_glioblastoma_CL_H54+EGFRvIII • control: SAGE_Brain_glioblastoma_CL_H54+LacZ • Reference: Mutant epidermal growth factor receptor up-regulates molecular effectors of tumor invasion. Cancer Res. 2002 Jun 15;62(12):3335-9 • Compute “active” interactions (across the whole signaling network) for each sample • Identify interactions unique to one sample • 419 common • 152 unique to mutant EGFR sample • 237 unique to control

  31. Two interactions unique to the EGFR mutant sample • HRAS, an oncogene, is expressed in both libraries, but is predicted to be activated (HRAS+) only in the mutant library

  32. Long Term Plans • Data Sources • Improved curation – exploratory contract with NPG • New data sources: Reactome (via caBIG), BIND • Data Representation • Adopt standard caBIG external representation -- BioPAX • Still to do: cell types; process time/phase • Need to do better: post-translational modifications

  33. Proteomics Technologies Initiative

  34. Goals • Ultimate goal: • find protein biomarkers of cancer • reliable early detection/diagnosis • Immediate questions: • can current proteomics technologies support cancer marker discovery in sera from mouse models? • are mouse models suitable for discovering cancer markers applicable in humans?

  35. The Program • Office of Technology and Industrial Relations (OTIR) • Two consortia • “Western”: Fred Hutchinson Cancer Research Center • PI: Martin McIntosh • “Eastern”: University of Michigan • PI: Samir Hanash • $13.4 M; through SAIC-Frederick; 2 years • Press Release: • http://www.cancer.gov/newscenter/pressreleases/ProteomicBiomarkerAwards

  36. Approaches • Mass spec profiling (MS1) to identify m/z peaks that discriminate normal from disease • Tandem mass spec (MS2) for peptide/protein identification of discriminating peaks • Validation of markers with affinity-based methods (e.g. antibody arrays)
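The MS1 profiling step (finding m/z peaks that discriminate normal from disease) can be sketched as a per-peak two-group comparison. The slides do not say how discrimination is scored; Welch's t-statistic is assumed here purely for illustration, and the peak positions and intensities are invented. Peak alignment across spectra is taken as already done.

```python
from math import copysign, inf, sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples (each with >= 2 values)."""
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / se


def rank_peaks(normal, disease):
    """Rank m/z peaks by how strongly they separate the two groups.

    normal, disease: dict mapping m/z value -> list of intensities,
    one per sample. Only peaks present in both groups are scored.
    Returns (mz, t) pairs sorted by |t|, largest first.
    """
    scores = {mz: welch_t(disease[mz], normal[mz])
              for mz in normal if mz in disease}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Toy intensities (hypothetical values).
normal = {1000.5: [1.0, 1.1, 0.9], 2000.0: [5.0, 5.2, 4.8]}
disease = {1000.5: [3.0, 3.1, 2.9], 2000.0: [5.1, 4.9, 5.0]}
ranking = rank_peaks(normal, disease)
```

Peaks at the top of the ranking would be the candidates passed on to tandem mass spec (MS2) for identification.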

  37. Products • Lots of data; reference proteomics datasets for mouse models • current estimate: 2TB per month • All shared data described in common data elements registered in caDSR and vocabulary registered in EVS • Open-source mass spec analysis pipeline and data repository • NCICB to be the long-term host of the data • While not a caBIG-funded project, will influence/become the caBIG proteomics repository

  38. Acknowledgements • CGAP • Denise Hise • Kotien Wu • Susan Greenhut (CGAP site look) • Pathways • Denise Hise • Elden Santos • Sandhya Xirasagar • Josh Phillips (caPathway) • Sol Efroni
