Modeling Biological Systems and Analyzing Large-Scale Data Sets. Ilya Shmulevich. TCGA Data Types. TCGA Research Network. Heterogeneous data. Clinical variables contributing to tumor aggressiveness. Nature, 487, 330-337, 2012. Vesteinn Thorsson. FBXW7.
Modeling Biological Systems and Analyzing Large-Scale Data Sets. Ilya Shmulevich
Clinical variables contributing to tumor aggressiveness. Nature, 487, 330-337, 2012. Vesteinn Thorsson
FBXW7. Vesteinn Thorsson
Nature, 487, 330-337, 2012. Vesteinn Thorsson
Nature, 487, 330-337, 2012. Vesteinn Thorsson, Dick Kreisberg
Web-based Apps http://explorer.cancerregulome.org
The Regulome Explorer is an interactive web application that allows the user to explore multivariate relationships in data (explorer.cancerregulome.org). Richard Kreisberg, Jake Lin, Timo Erkkilä, Sheila Reynolds
RF-ACE is a multivariate statistical inference method based on ensembles of decision trees that seeks to uncover significant associations between features in the input data matrix. Timo Erkkilä, Sheila Reynolds, Kari Torkkola
RF-ACE has high predictive power and is resistant to over-fitting.
• Computational challenges:
  • mixed data types: continuous, discrete, and categorical
  • tens of thousands of features x tens or hundreds of samples
  • non-linear, noisy, and multivariate relationships
  • correlated features
  • missing data
• RF-ACE features:
  • handles mixed variable types
  • does not require imputation of missing values
  • random subsampling rather than combinatorial search
  • statistical testing removes redundant features
  • “importance” p-value for each candidate predictor
  • fast, portable implementation in C++
http://code.google.com/p/rf-ace/
Timo Erkkilä
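The idea of an “importance” p-value for each candidate predictor can be illustrated with a small permutation sketch. This is not RF-ACE's C++ implementation: a plain absolute correlation stands in for a forest-based importance score, and the names `abs_correlation` and `contrast_p_value` are hypothetical. Each feature is compared against shuffled "contrast" copies of itself, and samples with a missing value (`None`) are skipped rather than imputed.

```python
import random
import statistics


def abs_correlation(xs, ys):
    """Absolute Pearson correlation (assumed stand-in for a forest importance score)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no information
    return abs(sxy) / (sxx * syy) ** 0.5


def contrast_p_value(feature, target, n_contrasts=200, seed=0):
    """Fraction of shuffled contrast copies that score at least as well as the feature."""
    rng = random.Random(seed)
    # Skip samples where the feature is missing instead of imputing a value.
    pairs = [(f, t) for f, t in zip(feature, target) if f is not None]
    xs = [f for f, _ in pairs]
    ys = [t for _, t in pairs]
    observed = abs_correlation(xs, ys)
    exceed = 0
    for _ in range(n_contrasts):
        contrast = xs[:]
        rng.shuffle(contrast)  # shuffling breaks any real link to the target
        if abs_correlation(contrast, ys) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (exceed + 1) / (n_contrasts + 1)
```

A feature that tracks the target receives a small p-value, while a constant or unrelated feature does not, so a threshold on this value can be used to drop uninformative candidates.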
Google I/O keynote presentation, June 27, 2012: 600,000 cores
A multilevel pan-cancer view: from genes to hallmarks Theo Knijnenburg
Billions of Associations! explorer.cancerregulome.org
Motivating questions
• Repurposing
  • Which existing cancer drugs may be therapeutic in which other cancers?
  • Which inhibitors with no current cancer indications may be therapeutic in certain cancers?
• Opportunity
  • TCGA primary tumor data may serve as the basis for guided investigation of these open questions
Guiding principle
• The direct protein target for most inhibitors is not the sensitizing aberrated protein itself
  • e.g., AKT1 inhibitors are most effective against cell lines with PTEN mutations
Song et al. (2012)
Proof of concept: Associations between drug targets (e.g., AKT1) and sensitizing aberrations (e.g., PTEN) are also evident in TCGA
[Figure: PTEN mutations in UCEC (genespot.org)]
[Figure: AKT1 protein expression related to PTEN mutation in UCEC (cancerregulome.org); axes: AKT1 RPPA protein expression, PTEN mutation status]
Proof of concept: Associations between drug targets (e.g., AKT1) and sensitizing aberrations (e.g., PTEN) are also evident in TCGA
[Figure: association of drug target : sensitizing aberration pairs]
Approach
• Create large heterogeneous graph of associations from TCGA data, literature, databases, …
• [Billions of edges, terabytes of data]
• Query on Cray YarcData uRiKA graph analytics appliance
• No locality of reference, graphs hard to partition
• [Minutes rather than hours per query]
• Identify aberrated gene → target → drug relationships for drugs with and without known efficacy in cancer
[Figure: Genomic aberration: TP53 mutation; Synthetic lethal protein targets: …, ATR, CHEK1, PAK3, PLK1, SGK2, …, WEE1; Candidate compounds]
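The aberrated gene → target → drug lookup can be sketched as a two-hop traversal over a small in-memory graph. This is only an illustration of the query shape, not the appliance's interface: the class `AssociationGraph`, the edge labels `associated_with` and `targeted_by`, and the helper names are hypothetical, and the toy edges simply reuse gene and compound names that appear elsewhere in these slides.

```python
from collections import defaultdict


class AssociationGraph:
    """Tiny labeled graph: (subject, edge label) -> set of objects."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, subject, label, obj):
        self._edges[(subject, label)].add(obj)

    def objects(self, subject, label):
        # .get avoids creating empty entries for unknown keys
        return sorted(self._edges.get((subject, label), ()))


def aberration_to_drugs(graph, aberration):
    """Follow aberration -> target -> drug and return every complete path."""
    paths = []
    for target in graph.objects(aberration, "associated_with"):
        for drug in graph.objects(target, "targeted_by"):
            paths.append((aberration, target, drug))
    return paths


def build_toy_graph():
    """Toy edges for illustration only; not real query results."""
    g = AssociationGraph()
    g.add("PTEN mutation", "associated_with", "PIK3R1")
    g.add("PIK3R1", "targeted_by", "Wortmannin")
    g.add("TP53 mutation", "associated_with", "ABCG2")
    g.add("ABCG2", "targeted_by", "Nelfinavir")
    g.add("TP53 mutation", "associated_with", "WEE1")  # no drug edge in the toy data
    return g


if __name__ == "__main__":
    for path in aberration_to_drugs(build_toy_graph(), "TP53 mutation"):
        print(" -> ".join(path))
```

Targets without a drug edge (WEE1 in the toy data) drop out of the result, which mirrors the goal of keeping only complete aberration, target, drug chains.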
Integrating multiple data sources into a (big) graph
[Figure labels: Genomic aberrations, Therapeutic targets, Candidate inhibitors, RNAi]
Graph Data Model: Resource Description Framework (RDF)
[Figure: RDF graph; labels recovered from the figure are listed below]
URIs:
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.systemsbiology.net/nmd#nmd>
<http://www.systemsbiology.net/nmd#combocount>
<http://www.systemsbiology.net/nmd#term1>
<http://www.systemsbiology.net/nmd#term2>
<http://www.systemsbiology.net/feature#label>
<http://www.systemsbiology.net/feature#source>
<http://www.systemsbiology.net/association#feature1>
<http://www.systemsbiology.net/association#feature2>
<http://www.systemsbiology.net/association#dataset>
<http://www.systemsbiology.net/association#correlation>
<http://www.systemsbiology.net/brca2>
<http://www.systemsbiology.net/tp53>
<http://www.systemsbiology.net/tp53y_n_somatic>
<http://www.systemsbiology.net/gata3gexp>
<http://www.systemsbiology.net/biotin>
<http://www.systemsbiology.net/Drug>
Blank nodes: _:blankGeneNMD, _:blankDrugGeneNMD, _:blankPairwise
Literals: 1, 0.2515, 1.1628, -0.511, gnab, gexp, brca_05nov
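As a rough illustration of the RDF model, a pairwise association can be written as a handful of subject, predicate, object triples hanging off one blank node and serialized in N-Triples form. The URIs and values are taken from the figure labels, but which labels belong to the same triple is an assumption here, since the figure layout is not recoverable; the helper names `uri`, `literal`, `pairwise_association`, and `to_ntriples` are hypothetical, and literals are emitted as plain strings rather than typed numbers.

```python
NS = "http://www.systemsbiology.net/"


def uri(path):
    """Wrap a path under the assumed namespace as an N-Triples IRI."""
    return f"<{NS}{path}>"


def literal(value):
    """Plain string literal with backslashes and quotes escaped."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def pairwise_association(feature1, feature2, correlation, dataset,
                         node="_:blankPairwise"):
    """Triples for one association; the grouping of labels is an assumption."""
    return [
        (node, uri("association#feature1"), uri(feature1)),
        (node, uri("association#feature2"), uri(feature2)),
        (node, uri("association#correlation"), literal(correlation)),
        (node, uri("association#dataset"), literal(dataset)),
    ]


def to_ntriples(triples):
    """One 'subject predicate object .' statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    triples = pairwise_association(
        "tp53y_n_somatic", "gata3gexp", -0.511, "brca_05nov"
    )
    print(to_ntriples(triples))
```

Keeping every association as a small star of triples around a blank node means new data sources can be added by emitting more triples, without changing a fixed table schema.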
Example SPARQL Query
[Figure labels: Literature, Seed Gene List, Small Molecules, Associated Genes, Cancer Type, TCGA, cancer.gov approved drugs, Database]
Example Result: PTEN associations in UCEC
Genomic aberrations: PTEN
Candidate targets: ASRGL1, ESR1, GLYATL2, PLIN3, HADH, NT5E, PIK3R3, GABRE, PGR, FBP1, SMPD3, GRIN1, PIK3R1, RARG, AADAT, CACNA2D2, SST, SRD5A1, B4GALT1, ADRA1B, KCNJ12, RYR1, SLC6A14, RETSAT, FAAH, SRR, NQO1, CEACAM1, KCNK6, ACADS, CRAT, ELOVL4, FOLH1, ALDH1A3, SORD, ASS1, NADSYN1, PRNP, NDUFA11, KCNH2, CPS1, SLC22A5, HMGCR, ALDH18A1, PARS2, GLS, B4GALT4, ACACB, SLC38A3, GSR, OAZ3, TCN1, SLC1A1, SMPD4, BHMT2, HSD17B4, GRIK5, GLDC, PPIB, PIPOX, ADA, SCN3B, S100A1, PLG, SLC1A4, CBS, GLRB, ACVR1B, SLC6A2
Candidate inhibitors: Acepromazine, Acitretin, Adapalene, Adenine, Adenosine monophosphate, Adenosine triphosphate, Adinazolam, Alfuzosin, Alitretinoin, Allylestrenol, Alpha-Linolenic Acid, Alprazolam, Alteplase, Aminocaproic Acid, Amiodarone, Amitriptyline, Amoxapine, Amsacrine, Anistreplase, Aprotinin, Arcitumomab, Aripiprazole, Astemizole, Atomoxetine, Atorvastatin, Bepridil, Biotin, Bromazepam, Bromocriptine, Bupropion, Caffeine, Capromab, Carglumic acid, Carmustine, Carvedilol, Chlordiazepoxide, Chlorotrianisene, Chlorpheniramine, Chlorpromazine, Cinolazepam, Cisapride, Clobazam, Clomifene, Clomipramine, Clonazepam, Clorazepate, Clotiazepam, Clozapine, Cocaine, Conjugated Estrogens, Cysteamine, Danazol, Dantrolene, Dapiprazole, Debrisoquin, Desipramine, Desogestrel, Desvenlafaxine, Dexmethylphenidate, Dextroamphetamine, Diazepam, Dicumarol, Dienestrol, Diethylpropion, Diethylstilbestrol, Dipyridamole, Dofetilide, Doxazosin, Doxepin, Drospirenone, Droxidopa, Duloxetine, Dutasteride, Dydrogesterone, Ephedra, Ephedrine, Epinephrine, Ergotamine, Escitalopram, Estazolam, Estradiol, Estramustine, Estriol, Estrone, Estropipate, Ethinyl Estradiol, Ethynodiol Diacetate, Etonogestrel, Felodipine, Finasteride, Fludiazepam, Fluoxymesterone, Flurazepam, Fluticasone Propionate, Fluvastatin, Fulvestrant, Gabapentin, Galsulfase, Ginkgo biloba, Glutathione, Glycine, Guanadrel Sulfate, Guanethidine, Halazepam, Halofantrine, Hydroxocobalamin, Ibutilide, Idursulfase, Imipramine, Isoproterenol, Isradipine, Ketazolam, Labetalol, L-Alanine, L-Arginine, L-Asparagine, L-Aspartic Acid, L-Carnitine, L-Citrulline, L-Cysteine, Levonordefrin, Levonorgestrel, L-Glutamic Acid, L-Histidine, Lindane, Lisdexamfetamine, L-Methionine, Lorazepam, L-Ornithine, Lovastatin, L-Proline, L-Serine, Maprotiline, Mazindol, Medroxyprogesterone, Megestrol, Melatonin, Menadione, Meperidine, Mestranol, Methamphetamine, Methotrimeprazine, Methoxamine, Methylphenidate, Mianserin, Miconazole, Midazolam, Midodrine, Mifepristone, Milnacipran, Modafinil, N-Acetyl-D-glucosamine, NADH, Naloxone, Nefazodone, Nicardipine, Nitrazepam, Nitrendipine, Norelgestromin, Norepinephrine, Norethindrone, Norgestimate, Nortriptyline, Olanzapine, Olopatadine, Orphenadrine, Oxazepam, Paliperidone, Paroxetine, Pentostatin, Pentoxifylline, Pergolide, Phendimetrazine, Phenmetrazine, Phentermine, Phenylephrine, Phosphatidylserine, Pimozide, Pravastatin, Prazepam, Prazosin, Progesterone, Promazine, Propafenone, Propericiazine, Propiomazine, Protriptyline, Pseudoephedrine, Pyridoxine, Quazepam, Quetiapine, Quinestrol, Quinidine, Raloxifene, Reboxetine, Reteplase, Risperidone, Rosuvastatin, S-Adenosylmethionine, Sertindole, Sibutramine, Simvastatin, Sotalol, Streptokinase, Suramin, Tamoxifen, Tamsulosin, Tazarotene, Temazepam, Tenecteplase, Terazosin, Terfenadine, Tetracycline, Thiopental, Thioproperazine, Thioridazine, Toremifene, Tramadol, Tranexamic Acid, Tretinoin, Triazolam, Trilostane, Trimipramine, Urokinase, Venlafaxine, Verapamil, Vitamin A, Xylometazoline, Ziprasidone, Zonisamide
Example Result: PTEN associations in UCEC
Genomic aberrations: PTEN
Candidate targets: PIK3R1/PIK3CA
Candidate inhibitors: Wortmannin
[Figure: PIK3R1 gene expression by PTEN mutation status; structure PDB id 3hhm]
Repurposing existing cancer drugs in other cancers
[Figure labels: Genomic aberrations, Candidate targets, Candidate inhibitors, Existing cancer indication, Target, Cancer Drug A, New cancer indication]
Example Result
• TP53 is frequently mutated in most tumor types
• ABCG2, also known as Breast Cancer Resistance Protein (BCRP), is associated with TP53 mutation in TCGA breast cancer data
• Nelfinavir, an HIV protease inhibitor, also binds ABCG2 and many other proteins
• High-throughput cell line screening of breast cancer cells recently identified Nelfinavir as a selective inhibitor. “It can be brought to HER2-breast cancer treatment trials with the same dosage regimen as that used among HIV patients.” [Shim et al. JNCI 2012]
Understanding behavior of massive multicellular systems: BioCellion
Source: http://www.sjrcd.org/soilhealth/soilagg.html
Source: http://www.theregister.co.uk
Source: EMBO Rep. 2004 May; 5(5): 470–476.
Source: http://www.webmd.com
Ductal carcinoma model: Nicholas Flann, Utah State Univ.
Acknowledgments
Brady Bernard, Ryan Bressler, Andrea Eakin, Timo Erkkilä, Lisa Iype, Seunghwa Kang, Theo Knijnenburg, Roger Kramer, Richard Kreisberg, Kalle Leinonen, Jake Lin, Yuexin Liu, Michael Miller, Sheila Reynolds, Hector Rovira, Vesteinn Thorsson, Da Yang, Wei Zhang