
Modeling Biological Systems and Analyzing Large-Scale Data Sets



Presentation Transcript


  1. Modeling Biological Systems and Analyzing Large-Scale Data Sets. Ilya Shmulevich

  2. TCGA Data Types

  3. TCGA Research Network

  4. Heterogeneous data

  5. Clinical variables contributing to tumor aggressiveness. Nature 487, 330-337, 2012. Vesteinn Thorsson

  6. FBXW7. Vesteinn Thorsson

  7. Nature 487, 330-337, 2012. Vesteinn Thorsson

  8. Nature 487, 330-337, 2012. Vesteinn Thorsson, Dick Kreisberg

  9. Web-based Apps: http://explorer.cancerregulome.org

  10. The Regulome Explorer (explorer.cancerregulome.org) is an interactive web application that allows the user to explore multivariate relationships in data. Richard Kreisberg, Jake Lin, Timo Erkkilä, Sheila Reynolds

  11. explorer.cancerregulome.org

  12. RF-ACE is a multivariate statistical inference method based on ensembles of decision trees that seeks to uncover significant associations between features in the input data matrix. Timo Erkkilä, Sheila Reynolds, Kari Torkkola

  13. RF-ACE has high predictive power and is resistant to over-fitting.
      Computational challenges:
      • mixed data types: continuous, discrete, and categorical
      • tens of thousands of features × tens or hundreds of samples
      • non-linear, noisy, and multivariate relationships
      • correlated features
      • missing data
      RF-ACE features:
      • handles mixed variable types
      • does not require imputation of missing values
      • random subsampling rather than combinatorial search
      • statistical testing removes redundant features
      • "importance" p-value for each candidate predictor
      • fast, portable implementation in C++
      http://code.google.com/p/rf-ace/
      Timo Erkkilä
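
The "importance p-value" idea on the slide above can be illustrated with a small sketch. This is not the RF-ACE implementation: RF-ACE scores features with decision-tree ensembles, whereas the sketch below swaps in absolute Pearson correlation as a stand-in importance score (an assumption made to keep the example dependency-free). The function names and the `n_contrasts` and `seed` parameters are hypothetical. What it shows is the general shape of an artificial-contrast test: shuffle a feature to break its association with the target, and report how often the shuffled copy scores at least as high as the real one.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def importance(feature, target):
    """Stand-in importance score (assumption): absolute correlation.

    RF-ACE itself derives importance from tree ensembles.
    """
    return abs(pearson(feature, target))


def contrast_p_value(feature, target, n_contrasts=199, seed=0):
    """Empirical p-value of a feature's importance against shuffled copies."""
    rng = random.Random(seed)
    observed = importance(feature, target)
    shuffled = list(feature)
    hits = 0
    for _ in range(n_contrasts):
        # Artificial contrast: same values, association with target broken.
        rng.shuffle(shuffled)
        if importance(shuffled, target) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_contrasts + 1)
```

A feature that tracks the target closely should receive a small p-value, while an uninformative feature (for example a constant column) should not.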

  14. Google I/O keynote presentation, June 27, 2012: 600,000 cores

  15. A multilevel pan-cancer view: from genes to hallmarks Theo Knijnenburg

  16. Mutational investment

  17. Billions of Associations! explorer.cancerregulome.org

  18. Motivating questions
      • Repurposing
        • Which existing cancer drugs may be therapeutic in which other cancers?
        • Which inhibitors with no current cancer indications may be therapeutic in certain cancers?
      • Opportunity
        • TCGA primary tumor data may serve as the basis for guided investigation of these open questions

  19. Guiding principle
      • The direct protein target for most inhibitors is not the sensitizing aberrated protein itself
        • e.g., AKT1 inhibitors are most effective against cell lines with PTEN mutations
      Song et al. (2012)

  20. Proof of concept: Associations between drug targets (e.g., AKT1) and sensitizing aberrations (e.g., PTEN) are also evident in TCGA.
      • PTEN mutations in UCEC (genespot.org)
      • AKT1 protein expression related to PTEN mutation in UCEC (cancerregulome.org)
      Plot axes: AKT1 RPPA protein expression; PTEN mutation status

  21. Proof of concept: Associations between drug targets (e.g., AKT1) and sensitizing aberrations (e.g., PTEN) are also evident in TCGA.
      Figure: association for drug target : sensitizing aberration pairs

  22. Approach
      • Create large heterogeneous graph of associations from TCGA data, literature, databases, …
        [Billions of edges, terabytes of data]
      • Query on Cray YarcData uRiKA graph analytics appliance
        • No locality of reference, graphs hard to partition
        [Minutes rather than hours per query]
      • Identify aberrated gene → target → drug relationships for drugs with and without known efficacy in cancer
      Diagram: genomic aberration (TP53 mutation); synthetic lethal protein targets (…, ATR, CHEK1, PAK3, PLK1, SGK2, …, WEE1); candidate compounds
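
The aberrated gene → target → drug relationship named on this slide amounts to a two-hop path search in a graph. A minimal sketch follows, assuming the graph is reduced to two in-memory edge lists; the function name `find_paths` and the list names are hypothetical, and the two example chains are the ones shown later in this deck (PTEN with PIK3R1 and Wortmannin, TP53 with ABCG2 and Nelfinavir). The actual system runs such queries over billions of edges on a graph appliance, which this toy code does not attempt to model.

```python
from collections import defaultdict

# Toy edge lists (illustrative only), taken from example results in this deck.
ABERRATION_TO_TARGET = [("PTEN", "PIK3R1"), ("TP53", "ABCG2")]
TARGET_TO_DRUG = [("PIK3R1", "Wortmannin"), ("ABCG2", "Nelfinavir")]


def find_paths(aberration_target_edges, target_drug_edges):
    """Return (aberration, target, drug) triples joined on the shared target."""
    drugs_by_target = defaultdict(list)
    for target, drug in target_drug_edges:
        drugs_by_target[target].append(drug)

    paths = []
    for aberration, target in aberration_target_edges:
        # Targets with no known compound simply contribute no path.
        for drug in drugs_by_target.get(target, []):
            paths.append((aberration, target, drug))
    return paths


if __name__ == "__main__":
    for aberration, target, drug in find_paths(ABERRATION_TO_TARGET, TARGET_TO_DRUG):
        print(f"{aberration} -> {target} -> {drug}")
```

Indexing drugs by target first keeps the join linear in the number of edges, which is the usual choice when both edge lists fit in memory.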

  23. Integrating multiple data sources into a (big) graph
      Diagram labels: Genomic aberrations; Therapeutic targets; Candidate inhibitors; RNAi

  24. Graph Data Model: Resource Description Framework (RDF)
      Diagram labels recovered from the figure:
      • Blank nodes: _:blankGeneNMD, _:blankDrugGeneNMD, _:blankPairwise
      • Edge labels: <http://www.systemsbiology.net/nmd#combocount>, <http://www.systemsbiology.net/nmd#nmd>, <http://www.systemsbiology.net/nmd#term1>, <http://www.systemsbiology.net/nmd#term2>, <http://www.systemsbiology.net/feature#label>, <http://www.systemsbiology.net/feature#source>, <http://www.systemsbiology.net/association#feature1>, <http://www.systemsbiology.net/association#feature2>, <http://www.systemsbiology.net/association#dataset>, <http://www.systemsbiology.net/association#correlation>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
      • Resources: <http://www.systemsbiology.net/brca2>, <http://www.systemsbiology.net/tp53>, <http://www.systemsbiology.net/tp53y_n_somatic>, <http://www.systemsbiology.net/gata3gexp>, <http://www.systemsbiology.net/biotin>, <http://www.systemsbiology.net/Drug>
      • Literal values: 1, 0.2515, 1.1628, -0.511, gnab, gexp, brca_05nov
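
To make the RDF data model concrete, here is a minimal sketch that writes one pairwise association as N-Triples lines, using only the Python standard library. The namespace prefix and the predicate names (`feature1`, `feature2`, `correlation`, `dataset`) appear on the slide; which feature fills which slot, and the pairing of the value -0.511 with these two features, are assumptions made for illustration, since the figure layout did not survive extraction. The function `association_triples` is hypothetical.

```python
ASSOC = "http://www.systemsbiology.net/association#"
BASE = "http://www.systemsbiology.net/"


def uri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value):
    """Render a plain string literal with backslashes and quotes escaped."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def association_triples(node_id, feature1, feature2, correlation, dataset):
    """Describe one pairwise association as triples on a blank node."""
    subject = f"_:{node_id}"
    return [
        f"{subject} {uri(ASSOC + 'feature1')} {uri(BASE + feature1)} .",
        f"{subject} {uri(ASSOC + 'feature2')} {uri(BASE + feature2)} .",
        f"{subject} {uri(ASSOC + 'correlation')} {literal(correlation)} .",
        f"{subject} {uri(ASSOC + 'dataset')} {literal(dataset)} .",
    ]


if __name__ == "__main__":
    # Assumed pairing of slide labels, for illustration only.
    for line in association_triples(
        "blankPairwise", "tp53y_n_somatic", "gata3gexp", -0.511, "brca_05nov"
    ):
        print(line)
```

Using a blank node as the subject lets each association carry several properties (both features, the statistic, the dataset) without minting a new IRI for every pair.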

  25. Example SPARQL Query
      Diagram labels: Literature; Seed Gene List; Small Molecules; Associated Genes; Cancer Type; TCGA; cancer.gov approved drugs; Database
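
The query text itself is not in the transcript, so the following is only a sketch of what a SPARQL basic graph pattern does, written as a tiny in-memory matcher in Python rather than as real SPARQL. Terms starting with `?` are variables; each pattern is matched against every triple, and bindings must stay consistent across patterns. The predicate names `associatedWith`, `inhibits`, and `binds` are hypothetical, and the toy triples reuse gene and compound names from example results in this deck.

```python
def match_pattern(pattern, triple, bindings):
    """Try to extend bindings so that pattern matches triple; None on failure."""
    extended = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable binds on first use and must agree afterwards.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def query(triples, patterns):
    """Evaluate a conjunction of triple patterns (a basic graph pattern)."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Toy graph; predicate names are hypothetical.
TRIPLES = [
    ("PTEN", "associatedWith", "PIK3R1"),
    ("Wortmannin", "inhibits", "PIK3R1"),
    ("TP53", "associatedWith", "ABCG2"),
    ("Nelfinavir", "binds", "ABCG2"),
]

if __name__ == "__main__":
    # Which compounds inhibit a gene associated with PTEN?
    print(query(TRIPLES, [("PTEN", "associatedWith", "?target"),
                          ("?drug", "inhibits", "?target")]))
```

This nested-loop evaluation is quadratic and only suitable for toy data; a real triple store relies on indexes and query planning instead.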

  26. Example Result: PTEN associations in UCEC
      Genomic aberrations: PTEN
      Candidate targets: ASRGL1, ESR1, GLYATL2, PLIN3, HADH, NT5E, PIK3R3, GABRE, PGR, FBP1, SMPD3, GRIN1, PIK3R1, RARG, AADAT, CACNA2D2, SST, SRD5A1, B4GALT1, ADRA1B, KCNJ12, RYR1, SLC6A14, RETSAT, FAAH, SRR, NQO1, CEACAM1, KCNK6, ACADS, CRAT, ELOVL4, FOLH1, ALDH1A3, SORD, ASS1, NADSYN1, PRNP, NDUFA11, KCNH2, CPS1, SLC22A5, HMGCR, ALDH18A1, PARS2, GLS, B4GALT4, ACACB, SLC38A3, GSR, OAZ3, TCN1, SLC1A1, SMPD4, BHMT2, HSD17B4, GRIK5, GLDC, PPIB, PIPOX, ADA, SCN3B, S100A1, PLG, SLC1A4, CBS, GLRB, ACVR1B, SLC6A2
      Candidate inhibitors: Acepromazine, Acitretin, Adapalene, Adenine, Adenosine monophosphate, Adenosine triphosphate, Adinazolam, Alfuzosin, Alitretinoin, Allylestrenol, Alpha-Linolenic Acid, Alprazolam, Alteplase, Aminocaproic Acid, Amiodarone, Amitriptyline, Amoxapine, Amsacrine, Anistreplase, Aprotinin, Arcitumomab, Aripiprazole, Astemizole, Atomoxetine, Atorvastatin, Bepridil, Biotin, Bromazepam, Bromocriptine, Bupropion, Caffeine, Capromab, Carglumic acid, Carmustine, Carvedilol, Chlordiazepoxide, Chlorotrianisene, Chlorpheniramine, Chlorpromazine, Cinolazepam, Cisapride, Clobazam, Clomifene, Clomipramine, Clonazepam, Clorazepate, Clotiazepam, Clozapine, Cocaine, Conjugated Estrogens, Cysteamine, Danazol, Dantrolene, Dapiprazole, Debrisoquin, Desipramine, Desogestrel, Desvenlafaxine, Dexmethylphenidate, Dextroamphetamine, Diazepam, Dicumarol, Dienestrol, Diethylpropion, Diethylstilbestrol, Dipyridamole, Dofetilide, Doxazosin, Doxepin, Drospirenone, Droxidopa, Duloxetine, Dutasteride, Dydrogesterone, Ephedra, Ephedrine, Epinephrine, Ergotamine, Escitalopram, Estazolam, Estradiol, Estramustine, Estriol, Estrone, Estropipate, Ethinyl Estradiol, Ethynodiol Diacetate, Etonogestrel, Felodipine, Finasteride, Fludiazepam, Fluoxymesterone, Flurazepam, Fluticasone Propionate, Fluvastatin, Fulvestrant, Gabapentin, Galsulfase, Ginkgo biloba, Glutathione, Glycine, Guanadrel Sulfate, Guanethidine, Halazepam, Halofantrine, Hydroxocobalamin, Ibutilide, Idursulfase, Imipramine, Isoproterenol, Isradipine, Ketazolam, Labetalol, L-Alanine, L-Arginine, L-Asparagine, L-Aspartic Acid, L-Carnitine, L-Citrulline, L-Cysteine, Levonordefrin, Levonorgestrel, L-Glutamic Acid, L-Histidine, Lindane, Lisdexamfetamine, L-Methionine, Lorazepam, L-Ornithine, Lovastatin, L-Proline, L-Serine, Maprotiline, Mazindol, Medroxyprogesterone, Megestrol, Melatonin, Menadione, Meperidine, Mestranol, Methamphetamine, Methotrimeprazine, Methoxamine, Methylphenidate, Mianserin, Miconazole, Midazolam, Midodrine, Mifepristone, Milnacipran, Modafinil, N-Acetyl-D-glucosamine, NADH, Naloxone, Nefazodone, Nicardipine, Nitrazepam, Nitrendipine, Norelgestromin, Norepinephrine, Norethindrone, Norgestimate, Nortriptyline, Olanzapine, Olopatadine, Orphenadrine, Oxazepam, Paliperidone, Paroxetine, Pentostatin, Pentoxifylline, Pergolide, Phendimetrazine, Phenmetrazine, Phentermine, Phenylephrine, Phosphatidylserine, Pimozide, Pravastatin, Prazepam, Prazosin, Progesterone, Promazine, Propafenone, Propericiazine, Propiomazine, Protriptyline, Pseudoephedrine, Pyridoxine, Quazepam, Quetiapine, Quinestrol, Quinidine, Raloxifene, Reboxetine, Reteplase, Risperidone, Rosuvastatin, S-Adenosylmethionine, Sertindole, Sibutramine, Simvastatin, Sotalol, Streptokinase, Suramin, Tamoxifen, Tamsulosin, Tazarotene, Temazepam, Tenecteplase, Terazosin, Terfenadine, Tetracycline, Thiopental, Thioproperazine, Thioridazine, Toremifene, Tramadol, Tranexamic Acid, Tretinoin, Triazolam, Trilostane, Trimipramine, Urokinase, Venlafaxine, Verapamil, Vitamin A, Xylometazoline, Ziprasidone, Zonisamide

  27. Example Result: PTEN associations in UCEC
      Genomic aberrations: PTEN
      Candidate targets: PIK3R1/PIK3CA
      Candidate inhibitors: Wortmannin
      Plot axes: PIK3R1 gene expression; PTEN mutation status
      Structure: PDB id 3hhm

  28. Repurposing existing cancer drugs in other cancers
      Diagram labels: Genomic aberrations; Candidate targets; Candidate inhibitors; Existing cancer indication; Target; Cancer Drug A; New cancer indication

  29. Example Result
      • TP53 is frequently mutated in most tumor types
      • ABCG2, also known as Breast Cancer Resistance Protein (BCRP), is associated with TP53 mutation in TCGA breast cancer data
      • Nelfinavir, an HIV protease inhibitor, also binds ABCG2 and many other proteins
      • High-throughput cell line screening of breast cancer cells recently identified Nelfinavir as a selective inhibitor. "It can be brought to HER2-breast cancer treatment trials with the same dosage regimen as that used among HIV patients." [Shim et al. JNCI 2012]

  30. Understanding behavior of massive multicellular systems: BioCellion
      Image sources: http://www.sjrcd.org/soilhealth/soilagg.html; http://www.theregister.co.uk; EMBO Rep. 2004 May; 5(5): 470–476; http://www.webmd.com
      Ductal Carcinoma model: Nicholas Flann, Utah State Univ.

  31. Acknowledgments: Brady Bernard, Ryan Bressler, Andrea Eakin, Timo Erkkilä, Lisa Iype, Seunghwa Kang, Theo Knijnenburg, Roger Kramer, Richard Kreisberg, Kalle Leinonen, Jake Lin, Yuexin Liu, Michael Miller, Sheila Reynolds, Hector Rovira, Vesteinn Thorsson, Da Yang, Wei Zhang
