
Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers

Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers. Sampsa Hautaniemi, DTech, Systems Biology Laboratory, Institute of Biomedicine, Genome-Scale Biology Research Program, Centre of Excellence in Cancer Genetics, Faculty of Medicine, University of Helsinki.


Presentation Transcript


  1. Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers. Sampsa Hautaniemi, DTech, Systems Biology Laboratory, Institute of Biomedicine, Genome-Scale Biology Research Program, Centre of Excellence in Cancer Genetics, Faculty of Medicine, University of Helsinki

  2. Table of Contents • The essence of systems biology: Iteration and collaboration. • Iteration in ovarian cancer. • The essence of systems biology II: Multi-level data. • Multi-levelity of breast cancer. • The essence of systems biology III: Computation. • Anduril computational framework & glioblastoma multiforme.

  3. Systems Biology: Iteration (adapted from a slide by Peter Sorger)

  4. Ovarian Cancer • Epithelial ovarian cancer is the fifth most frequent cause of female cancer deaths, with an overall 5-year survival rate below 50%. • The standard chemotherapy for high-grade serous ovarian cancer (HGS-OvCa) is a platinum-taxane combination. • The majority of patients relapse within 18 months. • There are no clinically applicable methods to predict the prognostic outcome, or even to identify the patients unresponsive to current therapies.

  5. Aims of the HGS-OvCa Study • To identify poor-response and good-response subtypes of HGS-OvCa. • To report biomarkers that make it possible to identify whether an HGS-OvCa patient responds to platinum treatment. • We developed a computational method that integrates transcriptomics and clinical data in the subtype-finding step. • We used transcriptomics and clinical data from 184 HGS-OvCa patients treated with platinum and taxane, taken from the TCGA repository.

  6. Three Subtypes of HGS-OvCa Chen et al. In preparation.

  7. Validation, validation, validation • We also used an independent prospective HGS-OvCa cohort of 29 patients. • Data measured with qRT-PCR. Chen et al. In preparation.

  8. Pathway Analysis • Our pathway analysis (too) identified TR3 as a potential driver for platinum resistance.

  9. TR3 Inhibition with Two Drugs • We identified two signaling pathway regulators for TR3 and their associated inhibitors. • The use of the two inhibitors should render the HGS-OvCa cells sensitive to platinum. AKT inh + AKT inh + ERK5 inh Chen et al. In preparation.

  10. eAtlas of Pathology Systems Biology II: Multi-level Data • While cancer cells are clearly visible, the exact molecular causes are still unknown. • Cancer samples need to be studied at multiple levels.

  11. Multiple Levels of Data • Genetics • Transcriptome • Proteomics • Epigenetics • Clinical • 100 samples lead to ~200 million data points.

  12. Multiple-level Data: Estrogen Receptor Nuclear receptor: Estrogen receptor • Gene regulation • Transcription factor • Non-genomic action • Genomic action

  13. Why Is This Important? • Estrogen receptor is the most important clinical variable in determining how to treat a breast cancer patient. • There are several anti-cancer drugs targeting the estrogen receptor pathway. • It is currently unknown which tumors do not respond to therapy. • Finding genes that respond to estrogen receptor stimulus may give clues as to which genes are important in resistance to ER inhibition. Hugo Simberg: Garden of Death

  14. Data • We used chromatin immunoprecipitation combined with massively parallel sequencing (ChIP-seq) to determine genome-wide occupancy (eight time points) after estradiol stimulation in the MCF-7 breast cancer cell line for: • Estrogen receptor α • RNA polymerase II • Histone marks (H3K4me3, H2A.Z) • These experiments resulted in >2.0 billion data points for the initial analysis.

  15. SYNERGY database • The SYNERGY database is available and fully operational. • http://csblsynergy.fimm.fi/

  16. Finding ER Responsive Genes

  17. Results • We identified 777 estrogen receptor early responding genes. • Interestingly, the major estrogen receptor related changes in cells were due to non-genomic action.
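
The slides do not state how the early responding genes were called. As a rough illustration only, the sketch below flags genes whose occupancy signal at the first post-stimulus time points changes strongly relative to the unstimulated baseline. The function name, the gene labels, the log2 fold-change threshold, and the pseudocount are all hypothetical assumptions, not the method used in the study.

```python
from math import log2


def early_responders(occupancy, early_points=2, min_log2_fc=1.0, pseudocount=1.0):
    """Flag genes with a strong early change in occupancy signal.

    occupancy: {gene: [signal at t0, t1, ..., tn]}, where t0 is the
    unstimulated baseline. All thresholds are illustrative assumptions.
    Returns {gene: largest early log2 fold change (signed)}.
    """
    hits = {}
    for gene, series in occupancy.items():
        base = series[0] + pseudocount
        # log2 fold change versus baseline at the first few time points
        changes = [log2((x + pseudocount) / base)
                   for x in series[1:1 + early_points]]
        if not changes:
            continue  # no post-stimulus measurements for this gene
        peak = max(changes, key=abs)
        if abs(peak) >= min_log2_fc:
            hits[gene] = peak
    return hits


# Hypothetical toy data: three genes, baseline plus three time points.
toy = {
    "GENE_A": [10, 40, 50, 45],   # early increase
    "GENE_B": [10, 11, 12, 30],   # late increase only
    "GENE_C": [40, 9, 10, 12],    # early decrease
}
responders = early_responders(toy)
```

With this toy input, only genes that change within the first two time points are reported; a gene that responds late is left out, which is one simple way to read "early responding".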

  18. Results • Next we searched for genes associated with survival in a cohort of 150 ER+/HER2-/postmenopausal breast cancer patients from The Cancer Genome Atlas (TCGA). • Based on Kaplan-Meier analysis we identified 23 genes with survival p<0.05. • The gene most strongly associated with survival was ATAD3B.
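
For readers unfamiliar with Kaplan-Meier analysis, here is a minimal pure-Python sketch of the product-limit estimator together with a two-group log-rank test. It assumes that patients are split into two groups (for example, high versus low expression of a gene) and that the p-value comes from a log-rank test; the slides do not say how groups were formed or which test was used, so both are assumptions, and the function names are hypothetical.

```python
from math import erfc, sqrt


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up times; events: 1 = death observed, 0 = censored.
    Returns [(t, S(t))] at each distinct time with at least one death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # collect all patients sharing this follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def logrank_p(times_a, events_a, times_b, events_b):
    """Two-group log-rank test p-value (chi-square, 1 degree of freedom)."""
    times_a, times_b = list(times_a), list(times_b)
    events_a, events_b = list(events_a), list(events_b)
    event_times = sorted({t for t, e in zip(times_a + times_b,
                                            events_a + events_b) if e})
    obs_minus_exp = 0.0
    var = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)   # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)   # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0
    chi2 = obs_minus_exp ** 2 / var
    # survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return erfc(sqrt(chi2 / 2))
```

Screening many genes this way means running one test per gene, so a list of genes with p<0.05 is a starting point for follow-up rather than a final result.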

  19. Kaplan-Meier for ATAD3B

  20. Intermission • Pol2 activity is a much better way than mRNA of searching for genes responsive to a cue. • In deep sequencing, the sequencing depth is important (with our 200 million short-read Pol2 data, we found many ER responsive genes not found in 20 million short-read GRO-seq data). • How to systematically analyze multi-level data?

  21. Multi-level Cancer Research Requires Computational Methods • Storing the data and computing power are the first (but relatively small) hurdles. • Analysis of large-scale, heterogeneous data is much more challenging than single genomics or proteomics data analysis. • There is a need for computational infrastructure. • Writing an analysis program fast without proper infrastructure will lead to delays and errors in larger projects.

  22. Infrastructure: Anduril • Anduril is a computational framework to integrate large-scale and heterogeneous data, knowledge in bio-databases and analysis tools. • The main design principles are: • Modular pipeline analysis approach • Scalable • Open source, thorough documentation • http://www.anduril.org/ • A method written in any programming language that is executable from the command prompt can be included. • Automatically produces a result PDF and a website containing the results.
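
The modular pipeline idea can be illustrated with a small generic sketch: each component is a function that declares which upstream components it depends on, and a runner executes them in dependency order. This is not Anduril's API or syntax; `run_pipeline` and the toy component names are hypothetical, and the sketch only assumes Python 3.9+ for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter


def run_pipeline(components):
    """Run components in dependency order.

    components: {name: (function, [names of upstream components])}.
    Each function receives the results of its upstream components as
    positional arguments, in the order they are listed.
    """
    graph = {name: deps for name, (_, deps) in components.items()}
    results = {}
    # static_order() yields each component after all of its dependencies
    for name in TopologicalSorter(graph).static_order():
        func, deps = components[name]
        results[name] = func(*(results[d] for d in deps))
    return results


# Hypothetical toy pipeline: load values, filter them, then summarize.
toy_pipeline = {
    "load": (lambda: [3.0, 0.5, 8.0, 1.5], []),
    "filter": (lambda xs: [x for x in xs if x >= 1.0], ["load"]),
    "summary": (lambda xs: {"n": len(xs), "max": max(xs)}, ["filter"]),
}
results = run_pipeline(toy_pipeline)
```

Keeping each step behind a small, explicit interface is what lets a component be swapped, for example for a wrapper around a command-line tool, without touching the rest of the pipeline.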

  23. Complex Pipelines Are Fragile

  24. Glioblastoma Multiforme (GBM) • Glioblastoma multiforme (GBM) is one of the deadliest cancers. • The Cancer Genome Atlas (TCGA) has published data from >500 GBM patients: • comparative genomic hybridization arrays • single nucleotide polymorphism arrays • exon and gene expression arrays • microRNA arrays • methylation arrays • clinical data • Which genes or genetic regions have an effect on survival?

  25. GBM Results in Anduril Website

  26. Latest on moesin in GBM

  27. (Sequence) Component Libraries • Over 400 Anduril components already available. • Pipelines: • ChIP-seq (EMBO J 2011, Cancer Res 2012, ...) • RNA-seq (not published) • miRNA-seq (not published) • DNA methylation-seq (not published) • Whole-genome sequence & exome-sequence (not published) • Image analysis (manuscript)

  28. Summary • Characterization of a complex disease first requires identifying the key variables. • This requires integrating data from multiple levels, an iterative mode of research, and collaboration. • Multi-level data integration requires computational infrastructure and data-intensive computing. • We have developed Anduril to organize large-scale data analysis projects (imaging, deep sequencing, database usage, conversions, etc.). • The need for computational infrastructure is particularly evident when analyzing deep sequencing data. • All our methods are (will be) freely available. http://research.med.helsinki.fi/gsb/hautaniemi/software.html

  29. Acknowledgements • Systems Biology Lab • Funding: Academy of Finland; Finnish Cancer Organizations; Sigrid Jusélius Foundation; EU FP7; ERA-NET SysBio+; Biocenter Finland; Biocentrum Helsinki • Collaborators: Olli Carpén; Henk Stunnenberg; George Reid; Jukka Westermarck
