Toxicological Relationships Between Proteins Obtained From a Molecular Spam Filter. Florian Nigsch & John Mitchell. F. Nigsch, et al., J. Chem. Inf. Model., 48, 306-318 (2008) F. Nigsch, et al., Toxicology and Applied Pharmacology, 231, 225-234 (2008)
Toxicological Relationships Between Proteins Obtained From a Molecular Spam Filter Florian Nigsch & John Mitchell F. Nigsch, et al., J. Chem. Inf. Model., 48, 306-318 (2008) F. Nigsch, et al., Toxicology and Applied Pharmacology, 231, 225-234 (2008) F. Nigsch, et al., J. Chem. Inf. Model., 48, 2313-2325 (2008)
Toxicological Relationships Between Proteins Obtained From a Molecular Spam Filter Florian Nigsch & John Mitchell Now at Novartis Institutes, Boston
Toxicological Relationships Between Proteins Obtained From a Molecular Spam Filter Florian Nigsch & John Mitchell Soon moving to University of St Andrews
Spam • Unsolicited (commercial) email • Approx. 90% of all email traffic is spam • Where are the legitimate messages? • Filtering
Analogy to Drug Discovery • Huge number of possible candidates • Virtual screening to help in selection process
Properties of Drugs • High affinity to protein target • Soluble • Permeable • Absorbable • High bioavailability • Specific rate of metabolism • Renal/hepatic clearance? • Volume of distribution? • Low toxicity • Plasma protein binding? • Blood-Brain-Barrier penetration? • Dosage (once/twice daily?) • Synthetic accessibility • Formulation (important in development)
Multiobjective Optimisation • Synthetic accessibility • Bioactivity • Solubility • Toxicity • Permeability • Metabolism • Huge number of candidates …
Multiobjective Optimisation • Synthetic accessibility • Bioactivity • Solubility • Toxicity • Permeability • Metabolism • Huge number of candidates … most of which are useless! [Figure labels: Drug, USELESS]
Winnow Algorithm • Invented in the late 1980s by Nick Littlestone to learn Boolean functions • Name from the verb “to winnow” • High-dimensional input data • Natural Language Processing (NLP), text classification, bioinformatics • Different varieties (regularised, Sparse Network of Winnows - SNoW, …) • Error-driven, linear threshold, online algorithm
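The "error-driven, linear threshold, online" description can be made concrete with a minimal sketch of one common Winnow variant (multiplicative promotion and demotion). The promotion factor `alpha = 2`, the threshold `theta = n/2`, the function names and the toy target are illustrative assumptions, not the settings used in the cited papers.

```python
from itertools import product


def winnow_predict(weights, theta, x):
    """Linear threshold: predict 1 if the summed weight of active features reaches theta."""
    score = sum(w for w, xi in zip(weights, x) if xi)
    return 1 if score >= theta else 0


def winnow_train(examples, n_features, alpha=2.0, max_epochs=50):
    """Online, error-driven training: weights change only when a prediction is wrong."""
    weights = [1.0] * n_features      # all weights start at 1
    theta = n_features / 2.0          # assumed threshold choice
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in examples:
            if winnow_predict(weights, theta, x) == y:
                continue              # correct prediction: no update
            mistakes += 1
            # Missed positive: promote active features; false alarm: demote them.
            factor = alpha if y == 1 else 1.0 / alpha
            for i, xi in enumerate(x):
                if xi:
                    weights[i] *= factor
        if mistakes == 0:             # a clean pass means the data are fitted
            break
    return weights, theta


# Hypothetical toy task: the Boolean function "x[0] OR x[2]" over 6 binary features.
N_FEATURES = 6
examples = [(x, int(x[0] or x[2])) for x in product((0, 1), repeat=N_FEATURES)]
weights, theta = winnow_train(examples, N_FEATURES)
print(weights)
```

Because updates are multiplicative and touch only active features, irrelevant features are quickly down-weighted, which is why the method suits sparse, high-dimensional inputs.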
Feature Space - Chemical Space • Each molecule is a feature vector m = (f1, f2, …, fn) • Feature spaces of high dimensionality [Figure: target classes COX2, CDK2, CDK1 and DHFR shown in feature space with axes f1, f2, f3]
Combinations of Features Combinations of molecular features to account for synergies.
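The slide does not say how features are combined. One simple possibility, shown here purely as an assumption, is to add every pairwise conjunction of the features present in a molecule, so that a linear model can weight two features occurring together. The function name and the `&` naming scheme are hypothetical.

```python
from itertools import combinations


def with_pair_features(features):
    """Return the single features plus one combined feature per unordered pair."""
    singles = sorted(set(features))
    pairs = [f"{a}&{b}" for a, b in combinations(singles, 2)]
    return singles + pairs


expanded = with_pair_features(["f3", "f1", "f2"])
print(expanded)
```

For n single features this adds n(n-1)/2 pair features, so the feature space grows quickly; that is tolerable for algorithms built for high-dimensional input.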
Features of Molecules Based on circular fingerprints
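As a rough illustration of the circular-fingerprint idea, the sketch below grows an environment around each atom one bond at a time and records every environment as a feature. It is a simplified toy (element symbols only, no bond orders, hydrogens, or hashing) and not the fingerprint implementation used in the cited work; the function name and string encoding are assumptions.

```python
def circular_features(atoms, bonds, radius=2):
    """Collect atom-centred environments of increasing radius as string features."""
    neighbours = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    current = list(atoms)             # radius 0: the atom itself
    features = set(current)
    for _ in range(radius):
        updated = []
        for i in range(len(atoms)):
            # Describe atom i by its own label plus its neighbours' sorted labels.
            env = sorted(current[j] for j in neighbours[i])
            updated.append(f"{current[i]}({','.join(env)})")
        current = updated
        features.update(current)
    return features


# Heavy-atom graph of ethanol, C-C-O (toy input).
ethanol = circular_features(["C", "C", "O"], [(0, 1), (1, 2)], radius=1)
print(sorted(ethanol))
```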
Workflow For predicting protein targets
Protein Target Prediction • Which protein does a given molecule bind to? • Virtual Screening • Multiple endpoint drugs - polypharmacology • New targets for existing drugs • Prediction of adverse drug reactions (ADR) • Computational toxicology
Predicted Protein Targets • Selection of 233 classes from the MDL Drug Data Report • ~90,000 molecules • 15 independent 50%/50% splits into training/test set
Predicted Protein Targets Cumulative probability of correct prediction within the three top-ranking predictions: 82.1% (±0.5%)
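The evaluation protocol (repeated 50%/50% splits, success counted when the true class is among the three top-ranking predictions) can be sketched as follows. The helper names, the seeds and the toy rankings are assumptions for illustration; no real data or results are reproduced.

```python
import random


def half_split(items, seed):
    """Shuffle with a fixed seed and cut into equal training and test halves."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def top_k_accuracy(ranked_predictions, true_classes, k=3):
    """Fraction of molecules whose true class is among the k top-ranking predictions."""
    hits = sum(1 for ranked, truth in zip(ranked_predictions, true_classes)
               if truth in ranked[:k])
    return hits / len(true_classes)


# 15 independent splits of a toy set of 10 molecule identifiers.
molecules = list(range(10))
splits = [half_split(molecules, seed) for seed in range(15)]

# Toy ranked predictions (best first) for three molecules.
ranked = [
    ["COX2", "CDK2", "DHFR", "CDK1"],
    ["CDK1", "DHFR", "COX2", "CDK2"],
    ["DHFR", "CDK1", "CDK2", "COX2"],
]
truths = ["CDK2", "CDK2", "DHFR"]
print(top_k_accuracy(ranked, truths, k=3))
```

Averaging `top_k_accuracy` over the 15 test halves and reporting the spread gives a figure of the same form as the one quoted on the slide.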
Computational Toxicology • Model for target prediction • Annotated library of toxic molecules: MDL Toxicity database, ~150,000 molecules • Standardisation • MySQL database • For each molecule we predict the likely target • Correlations between predicted protein targets and known toxicity codes: Canonical (23), Full (490)
Toxicological Relationships Outline (1) • Protein target prediction allows us to link (predictively) 150,000 toxic organic molecules to 233 specific protein targets • Each target is treated as a single protein, although it may be a set of related proteins • Toxicological databases link (experimentally) these 150,000 molecules to 23 toxicity classes • Combining these two sources of data matches the 233 proteins with the 23 toxicity classes
Toxicological Relationships Outline (2) • For each protein target, we have a profile of association with the 23 toxicity classes • Proteins with similar profiles are clustered together • We demonstrate that these clusters of proteins can be physiologically meaningful.
Predictions Obtained • Target prediction: the highest-ranking class is the predicted protein target (protein code j) • Toxicity codes i of the molecule, e.g. L70 - Changes in liver weight < Liver; Y07 - Hepatic microsomal oxidase < Enzyme inhibition; M30 - Other changes < Kidney, Ureter, and Bladder; L30 - Other changes < Liver • Result matrix R = (rij), toxicity codes by protein targets; rij is incremented for each prediction
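Filling the result matrix can be sketched as a simple counting loop: for each molecule, the entry for every one of its toxicity codes i and its predicted protein target j is incremented. The sparse dictionary layout and the records below are assumptions; the pairings of codes and targets are invented toy data, not results from the study.

```python
from collections import defaultdict


def build_result_matrix(records):
    """records: iterable of (toxicity codes of a molecule, predicted protein target)."""
    r = defaultdict(int)              # sparse matrix keyed by (toxicity code i, protein j)
    for tox_codes, protein in records:
        for code in tox_codes:
            r[(code, protein)] += 1   # r_ij incremented for each prediction
    return r


# Invented toy records: (known toxicity codes, highest-ranking predicted target).
records = [
    (["L70", "Y07"], "protein_A"),
    (["L70", "M30"], "protein_A"),
    (["L30"], "protein_B"),
]
R = build_result_matrix(records)
print(dict(R))
```

Each column of the matrix (one protein, all toxicity codes) is then that protein's toxicity profile.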
Toxicity Annotations • Full toxicity codes (490), e.g. Y41: Glycolytic < Metabolism (intermediary) < Biochemical • Canonical toxicity codes (23)
Proteins by Toxicity • Cardiac - G: Kainic acid receptor, Adrenergic alpha2, Phosphodiesterase III, cAMP Phosphodiesterase, O6-Alkylguanine-DNA alkyltransferase • Vascular - H: Angiotensin II AT2, Dopamine (D2), Bombesin, Adrenergic alpha2, 5-HT antagonist
Top 5 Proteins by Toxicity • 68 distinct proteins for 23 toxicity classes, i.e., 3.0 proteins per canonical toxicity code • Proteins and their connectivities: Lanosterol 14alpha-Methyl Demethylase (5), Glucose-6-phosphate Translocase (4), IL-6 (4), Benzodiazepine Antagonist (3), Kainic Acid Receptor (3)
Clustering of Toxicity Classes Clustering of toxicity classes: based on predicted protein associations from the result matrix
Correlation Between Toxicity Classes Correlations between toxicity classes: 23 by 23 correlation matrix
Correlation Between Proteins Correlations between proteins: 233 by 233 correlation matrix
Correlation Between Proteins Correlations between proteins: 233 by 233 correlation matrix Cluster 1 (proteins 6-11)
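A protein-by-protein correlation matrix can be obtained by correlating the toxicity profiles of every pair of proteins. The sketch below assumes the Pearson coefficient, which the slides do not name explicitly, and uses invented toy profiles over four toxicity classes instead of the 23 used in the talk.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (undefined for constant profiles)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(profiles):
    """profiles: protein name -> list of counts, one per toxicity class."""
    names = list(profiles)
    return {(p, q): pearson(profiles[p], profiles[q]) for p in names for q in names}


# Invented toy profiles (counts per toxicity class).
profiles = {
    "protein_A": [1, 2, 3, 4],
    "protein_B": [2, 4, 6, 8],
    "protein_C": [4, 3, 2, 1],
}
corr = correlation_matrix(profiles)
print(corr[("protein_A", "protein_B")], corr[("protein_A", "protein_C")])
```

Proteins whose profiles correlate strongly end up in the same block of the matrix, which is what the clusters on the following slides pick out.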
We will look at two specific clusters, which are called Cluster 1 and Cluster 4.
Cluster 1 (proteins 6-11) • Carbonic Anhydrase Inhibitor • Estrogen Receptor Modulator • LHRH Agonist • Aromatase Inhibitor • Cysteine Protease Inhibitor • DHFR Inhibitor • Within-cluster correlation (without auto-correlation) r = 0.95
Cluster 1 • Carbonic Anhydrase Inhibitor • Estrogen Receptor Modulator • LHRH Agonist • Aromatase Inhibitor • Cysteine Protease Inhibitor • DHFR Inhibitor • Within-cluster correlation (without auto-correlation) r = 0.95 • Proteins involved in breast cancer
Cluster 1 Proteins involved in breast cancer
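One plausible reading of "within-cluster correlation (without auto-correlation)" is the mean of the pairwise correlations between cluster members, leaving out the diagonal entries, which are always 1. The sketch below follows that reading; the averaging rule, the function name and the toy matrix are assumptions.

```python
def within_cluster_correlation(corr, members):
    """Mean off-diagonal correlation among the given row/column indices."""
    values = [corr[a][b] for a in members for b in members if a != b]
    return sum(values) / len(values)


# Invented 4 x 4 correlation matrix; indices 0-2 form the toy cluster.
toy_corr = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.8, 0.7, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]
r_within = within_cluster_correlation(toy_corr, [0, 1, 2])
print(round(r_within, 3))
```

Excluding the diagonal matters for small clusters: with six members, the six self-correlations of 1 would otherwise inflate the average.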