1 / 84

Cheminformatics Strategies for anticipating Good and Bad side effects:

Cheminformatics Strategies for anticipating Good and Bad side effects: Prediction of Multiple CYP Metabolic Sites and Off-target Interactions Curt M. Breneman*, Jed Zaretzki and Sourav Das Genentech, March 25, 2010. Presentation Outline.

guy-glover
Download Presentation

Cheminformatics Strategies for anticipating Good and Bad side effects:

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Cheminformatics Strategies for anticipating Good and Bad side effects: Prediction of Multiple CYP Metabolic Sites and Off-target Interactions Curt M. Breneman*, Jed Zaretzki and Sourav Das Genentech, March 25, 2010

  2. Presentation Outline • Part I. Metabolic regioselectivity models for nine CYP450 isozymes using RS_Predictor • Part II. Property-Encoded Shape Distributions (PESD) for Comparing Protein Binding Sites and Predicting Off-target Interactions

  3. Part I. Metabolic regioselectivity models for nine CYP450 isozymes using RS_Predictor

  4. Overview of Part I • Motivation • Identify the problem • Methods • Datasets • Results • Conclusions

  5. Motivation: Why is this important? • Cytochrome P450s account for approximately 90% of phase I metabolic reactions of all marketed drugs • Prediction of metabolic sites on lead candidates empowers medicinal chemists to: • modify labile sites of lead candidates in order to increase bioavailability without changing efficacy • perform pro-drug design • Identify and block potential metabolites with undesired PK behavior • Reliable in silico identification of metabolic liabilities early in the drug discovery process would allow early triage or modification of unsuitable lead compounds

  6. Motivation: What’s come before • Reactivity-Based Models – ligand only • QSAR-based regioselectivity models using a random forest algorithm (Sheridan et al., 2007) • AM1 Semi-empirical calculations (Singh et al., 2003) used to estimate the energy necessary to abstract a hydrogen atom from a substrate • Recognition-Based Models – ligand and enzymatic structure • MetaSite reactivity and recognition-based application (Cruciani et al., 2005) utilizing GRID molecular interaction fields (Goodford et al., 1985) • Docking algorithms, Dock (Ewing et al., 2001), Glide (Friesner et al., 2004), and GLUE (Zamora et al., 2006)

  7. Identifying the Problem: A racing metaphor FINISH Race 1 Race 4 Race 2 Race 3 OXIDIZE Example: Lidocaine

  8. New Methods • RS-Predictor - A specialized QSAR using Multiple-Instance Ranking (MIRank) and hierarchical electronic descriptors • SMARTCyp - A 2D method using DFT transition state calculations on molecular fragments to create energy rules representing site reactivity

  9. N H H H H H H H C H H H H H H H C H H H C H H H H C H H H C H C C C C C C N C Lidocaine Lidocaine Metabolophore 1 Metabolophore 2 Metabolophore 4 Metabolophore 6 Metabolophore 5 Metabolophore 3 Base Atom Descriptors Metabolophore 7 Metabolophore 8 Metabolophore 5 Atom Descriptors Green group designates the experimentally determined site of metabolism • QC Atom Based - 112 • AM1 charge • Hydrophobic moment • Fukui reactivity • QC Atom Pair Based - 280 • σ − σ bond order • Electronic resonance • Coulomb interaction • Topological Descriptors - 148 • Hydrogen bond count • Span • Ring information • Rotatable bonds • Physical environment • Distribution of atom types at 1, 2 , 3 • and 4 bonds away from base atom

  10. Group 1 Group 2 Group 1 H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H7 H8 H H H H H H H H H H Group 4 Group 3 Group 2 Group 2 Group 3 Trend Identification using Multiple-Instance Ranking (MIRank) Molecule 2 Molecule 1 Group 1 Group 3 Group 4 Group 4 Molecule 3 Molecule 4 Group 3 Group 2 Group 5 Group 1 Group 6

  11. Group 1 Group 2 Group 1 H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H7 H8 H H H H H H H H H H Group 4 Group 2 Group 3 Group 2 Group 3 Multiple Instance Ranking (Bergeron et al., IEEE PAMI) Molecule 2 Molecule 1 MIRank identifies descriptor- based trends present in each molecule Trends are then combined to produce a single global ranking model of metabolic regioselectivity Group 1 Group 3 Group 4 Group 4 Molecule 3 Molecule 4 Group 3 Group 2 Group 5 Group 1 Group 6

  12. SMARTCyp:AFragment Reactivity Approach (Rydberg et al., ACS Medicinal Chemistry Letters, 2010)

  13. SMARTCyp (Rydberg et al., ACS Medicinal Chemistry Letters, 2010)

  14. Datasets Prior to this work, few public datasets of P450 substrates with experimental responses existed - (Sheridan et al., 2007) • 3A4 - 324 compounds • 2D6 - 132 compounds • 2C9 - 101 compounds We have expanded these three datasets and created new datasets for nine isozymes:

  15. Common CYP P450-mediated pathways Desulfuration

  16. Observed and Potential SOMs of 459 3A4 substrates broken down by reaction pathway number of observed SOM that follow specified pathway number of observed SOM C-sp3 Hydroxylation Aromatic Ring Hydroxylation Non-Aromatic Hydroxylation = O-dealkylation N-dealkylation Sulfur(II) Oxidation Sulfur(IV) Oxidation Desulfuration C-sp2 Hydroxylation Aldehyde Oxidation Alcohol Oxidation Group C = Nitrogen based reactions (purple) Nitrogen Hydroxylation Group B = Csp2 Reactions (dark blue) N-oxide formation Group A = Sulfur based reactions (light blue) Nitro-group Reduction Dehalogenation number of potential SOM capable of specified pathway number of potential SOM Other

  17. Pathway preferences (major column) by Isozyme (minor column) 2E1 1A2 2C19 2C19 2B6 2C9 3A4 2D6 2C8 2A6 3A4 2E1 2C9 2B6 2A6 2C8 Group B (Csp2 reactions) Group C (Nitrogens) Other Group A (Sulfurs) Oxygen Dealkylation Non-Aromatic Hydroxylation Nitrogen Dealkylation Aromatic Ring Hydroxylation C-sp3 Hydroxylation

  18. Pathway propensity of 3A4 metabolism compared to pathway prediction quality of RS-Predictor number of observed SOM that follow specified pathway number of observed SOM % Observed SOM predicted 1st % Non-Observed SOM predicted 1st % Observed SOM predicted 1st or 2nd % Non-Observed SOM predicted 1st or 2nd number of potential SOM capable of specified pathway number of potential SOM % Non-Observed SOM predicted 1st, 2nd or 3rd % Observed SOM predicted 1st, 2nd or 3rd » » » % of 459 molecules correctly predicted C-sp3 Hydroxylation Aromatic Ring Hydroxylation Non-Aromatic Hydroxylation Oxygen Dealkyl ation Nitrogen Dealkylation Group A (Sulfurs) Group B (Csp2 reactions) Group C (Nitrogens) Other Overall

  19. 3A4 - Results (459 Compounds) C-sp3 Hydroxylation Aromatic Ring Hydroxylation Non-Aromatic Hydroxylation Oxygen Dealkylation Nitrogen Dealkylation Group A (Sulfurs) Group C (Nitrogens) Group B (Csp2 reactions) Other Overall

  20. 3A4 - Results (394 Compounds) Overall Group B (Csp2 reactions) Group C (Nitorgens) Other C-sp3 Hydroxylation Aromatic Ring Hydroxylation Non-Aromatic Hydroxylation Oxygen Dealkylation Nitrogen Dealkylation Group A (Sulfurs)

  21. Overall Results

  22. RS-Predictor outperforms SMARTCyp by > 5% 2D6 and 2C19 are involved in psychotropic drug metabolism

  23. SMARTCyp outperforms RS-Predictor Current hypothesis is that SMARTCyp performs best on small molecules. 2E1 database contains a significant number of small compounds

  24. Additional Results: Private data • Blind predictions were made on a set of 20 proprietary compounds provided by a partnering pharmaceutical company • Predictions were made using models developed from the public literature • RS-Predictor found the experimental sites of metabolism within the top two rankings in 85% of the blind test compounds, and identified the correct region of metabolism in 100% of the cases

  25. Take home message • RS_Predictor utilizes customized descriptors and exploits a novel machine learning framework to address a difficult problem with limited experimental data • We have compiled and will release an extensive set of curated public P450 metabolic site data across nine isozymes, facilitating future research and applications

  26. Part II. Property-Encoded Shape Distributions (PESD) for Comparing Protein Binding Sites and predicting Off-target Interactions

  27. Motivation • Discovery of low molecular weight somatostatin receptor subtype 5 (hSST5R) antagonists • Astemizole as the lead structure • Astemizole’s original target was H1, a histamine receptor • H1 has binding site amino-acid composition similar to hSST5R’s Martin, R. E.; Green, L. G.; Guba, W.; Kratochwil, N.; Christ, A. J. Med. Chem.2007, 50, 6291-6294.

  28. Motivation • Repositioning entacapone : discovery of safe chemical compounds with the potential to treat MDR-TB and XDR-TB • original target human catechol-O-methyltransferase (COMT) • Binding site similar: COMT and M. tuberculosis enoyl-acyl carrier protein reductase (InhA) • Ligand docked and experimentally validated: activity MIC99 = 260 µM Kinnings, S. L.; Liu, N.; Buchmeier, N.; Tonge, P. J.; Xie, L.; Bourne, P. E. PLoS Comput. Biol.2009, 5, e1000423.

  29. Motivation • Side-effects: good and bad • Gleevec and Sutent acting on several targets • Permax and Dostinex activating 5-HT2B serotonin receptors in addition to dopamine receptors, causing valvular heart disease Frantz, S. Nature2005, 437, 942-943. Keiser et al. Nature2009, 462, 175-181.

  30. Binding Site Representation 1btn 1b55 (IP binding) (IP binding) Low sequence conservation

  31. Binding Site Representation 1btn 1b55 (IP binding) (IP binding) The EP mapped surfaces are similar

  32. Binding Site Representation • Molecular surfaces: sets of adjacent triangles mapped with property values at each vertex • A MOE binding site surface typically has 8000 to 12000 triangles • Rigorous comparison involves matching each triangle: use of a clique detection algorithm that is computationally expensive (NP-hard) • Slow for high-throughput similarity detection and global similarity search Kinoshita, K.; Nakamura, H. Protein Sci.2003, 12, 1589-1595.

  33. Property-Encoded Shape Distributions ActiveLP EP H1 H2 • Conversion of property distribution on surfaces to a string of numbers or signatures H1, H2, etc. • Similarity between two binding sites is simply similarity between two signatures Das, S.; Kokardekar, A.; Breneman, C. M. J. Chem. Inf. Model.2009, 49, 2863-2872.

  34. Property-Encoded Shape Distributions • Large number of randomly selected pairs of points from the surface for convergence and binned by distance & property combinations M1 d1 M2 Osada R, Funkhouser T, Chazelle B, Dobkin D. Shape Distributions. ACM Trans. Graph. 2002, 21, 807-832

  35. Property-Encoded Shape Distributions • Large number of randomly selected pairs of points from the surface for convergence and binned by distance & property combinations M3 d2 M4 Osada R, Funkhouser T, Chazelle B, Dobkin D. Shape Distributions. ACM Trans. Graph. 2002, 21, 807-832

  36. Property-Encoded Shape Distributions • Large number of randomly selected pairs of points from the surface for convergence and binned by distance & property combinations Mp d Mq d M1M2, M3M4, etc

  37. Property-Encoded Shape Distributions • Recall MOE surfaces are triangulated and color coded (representative of property and its magnitude) at each vertex • To choose a surface point in an unbiased way: • The selected point is assigned the property magnitude of its nearest vertex • Store triangles as array of cumulative areas • Randomly choose a value x between 0 and total area • For the triangle in the array having x within bounds of its cumulative area, calculate a point, such that:

  38. Property-Encoded Shape Distributions • Signature comparison by chi-squared distance • Final distance score weighted sum of EP and ActiveLP distance • 10 to 15 seconds for query signature computation, >100,000 sites can be screened in ~5 minutes ; Weight

  39. How to determine the optimum weights? • Binding sites from 40 different proteins bound to 4 different types of ligands – ATP, NADP, steroid, heme (Morris et al.Bioinformatics2005, 21, 2347-2355) • Weight parameter for ActiveLP distance systematically varied by 0.1 unit • L1 and L2 metrics also tested for distance L1: r=1 L2: r=2

  40. How to determine the optimum weights? Best Clustering of Binding Sites obtained Higher Accuracy than PocketMatch (Yeturu et al. BMC Bioinformatics, 2008, 9, 543-559)

  41. Virtual Screening with PESD Binding Site Query Database Sorted List of matches Binding Sites Screened on the PDBbind data set and the FINDSITE data set Wang et al.J. Med. Chem. 2004, 47, 2977-2980. Brylinski, M.; Skolnick, J. Proc. Natl. Acad. Sci. U.S.A. 2008, 105, 129-134.

  42. Overall accuracy: screening the PDBbind set Ability of PESD to return a binding site with the same E.C. numbers in the top ranks of matched sites.

  43. Case Studies 1b55 1btn (IP binding) (IP binding) Low similarity of amino-acids

  44. Case Studies Fructose-1,6-bisphosphatase Flavoprotein Low similarity of amino-acids

  45. Case Studies 2jav 1cdk (cAMP dependent Kinase) (Nek2 Kinase) SU11248 (Sunitinib, Sutent) ROCS ligand similarity 0.488 : Shape 0.660 : Combo Cl ANP SU11652 Ligands dissimilar

  46. Overlap of inhibitor binding site with ATP binding site in Nek2 SU11652 ( Nek2 - 2jav)

  47. Overlap of inhibitor binding site with ATP binding site in Nek2 SU11652 ( Nek2 - 2jav) ANP (1cdk)

  48. Overlap of inhibitor binding site with ATP binding site in Nek2 SU11652 ( Nek2 - 2jav) ATG (Nek2 - 2w5b) ANP (1cdk)

  49. Case Studies 1aer 1isi BST-1 Pseudomonas Aeruginosa exotoxin Ligand sub-structural similarity from binding site similarity

  50. Case Studies 1bq4 2fvv Human diphosphoinositol polyphosphate phosphohydrolase 1 Phosphoglycerate mutase Ligand sub-structural similarity from binding site similarity

More Related