Chemoinformatics approaches to virtual screening and in silico design

Chemoinformatics approaches to virtual screening and in silico design. Alexandre Varnek Laboratoire d’Infochimie, Université de Strasbourg http://infochim.u-strasbg.fr/. Strasbourg. Paris. Laboratory of Chemoinformatics. Master on Chemoinformatics (since 2002). Chemoinformatics:


Presentation Transcript


  1. Chemoinformatics approaches to virtual screening and in silico design Alexandre Varnek Laboratoire d’Infochimie, Université de Strasbourg http://infochim.u-strasbg.fr/

  2. Strasbourg Paris

  3. Laboratory of Chemoinformatics Master on Chemoinformatics (since 2002)

  4. Chemoinformatics: a new discipline combining several "old" fields: chemical databases, QSAR, virtual screening, in silico design, ...

  5. OUTLOOK • Needs for chemoinformatics • Fundamentals of chemoinformatics • Some applications

  6. Chemoinformatics: why

  7. Amount of information • many millions of compounds and reactions • many millions of publications. Chemical databases: storage, organization and search of experimental data

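  8. [Figure: growth of chemical database contents between May 2009 and September 2010. Values shown on the slide: 54,984,228; 62,105,511; 39,804,330; 43,995,234; 281,474; 831,886; increments +7 M, +2 M, +22 M.]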

  9. Problem: Flood of Information • > 54 million compounds • > 5 million new compounds / year • 800,000 publications / year => can anyone read 4,000 publications / day? Chemical information should be well organized and searchable.

  10. Problem: Not Enough Information • > 54,000,000 chemical compounds • > 500,000 3D structures in the Cambridge Crystallographic File (about 1 % of all compounds) • 230,000 infrared spectra in the largest database, Bio-Rad (0.4 % of all compounds). What about physico-chemical and biological properties? The goal of chemoinformatics is to develop predictive approaches and tools.

  11. Chemoinformatics as a modeling discipline

  12. Chemoinformatics as a modeling discipline
      • What structure do I need for a certain property? → structure-activity relationships
      • How do I make this structure? → synthesis design
      • What is the product of my reaction? → reaction prediction, structure elucidation

  13. Theoretical chemistry: Quantum Chemistry, Force Field Molecular Modelling, Chemoinformatics. Compared by: molecular model, basic concepts, major applications, learning approaches.

  14. Molecular model • Quantum Chemistry: electrons and nuclei • Force Field Molecular Modelling: atoms and bonds • Chemoinformatics: molecular graph, descriptor vector

  15. Basic mathematical approaches • Quantum Chemistry: Schrödinger equation, HF, DFT, ... • Force Field Molecular Modelling: classical mechanics, statistical mechanics • Chemoinformatics: graph theory, statistical learning theory

  16. Basic concepts • Quantum Chemistry: wave/particle dualism • Force Field Molecular Modelling: classical mechanics • Chemoinformatics: chemical space

  17. Chemical space = objects + metrics • Objects: molecular graphs; descriptor vectors {Di} = f(structure) • Metrics: graph hierarchy; similarity measures

  18. Navigation in Chemical Space: topological space of chemical structures
      • Relationships between the objects: hierarchical scaffold-tree approach; structural mutation rules; network-like similarity graphs; combinatorial analog graphs; ...
      • Rational organisation of structural data
      • Exploration of the chemical space
      • Identification of new objects (e.g., active scaffolds, R-group combinations, etc.)

  19. Navigation in Chemical Space: vectorial space defined by molecular descriptors. Relationships between the objects: in this space, each molecule is represented as a vector, and the metric is defined by similarity measures. • In properly selected spaces, neighboring molecules possess similar properties. • Different databases can be compared. • Compound subsets for screening can be rationally selected.
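
      The idea of a metric defined by a similarity measure can be illustrated with the Tanimoto coefficient, a measure commonly used for binary fingerprints. The sketch below is only an illustration: the slide does not name a particular measure, and the fingerprints, molecule names and bit indices are made up.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty fingerprints are identical
    return len(a & b) / len(a | b)


# Hypothetical fingerprints: each set lists the indices of the bits that are set.
library = {
    "mol_A": {1, 4, 7, 9},
    "mol_B": {1, 4, 12},
    "mol_C": {2, 3, 15},
}
query = {1, 4, 7, 9, 12}

# Rank the library by decreasing similarity to the query molecule.
ranked = sorted(library, key=lambda name: tanimoto(query, library[name]), reverse=True)
for name in ranked:
    print(name, round(tanimoto(query, library[name]), 2))
```

      Ranking a database by similarity to a known active compound is the simplest form of similarity search: the nearest neighbours of the query in the chosen space are the first candidates for screening.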

  20. Example: Hansch Analysis. Biological Activity = f(physicochemical parameters) + constant
      log(1/C) = a (log P)^2 + b log P + ρσ + δEs + const
      • Physicochemical parameters can be broadly classified into three general types: electronic (σ), steric (Es), hydrophobic (log P)
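
      A Hansch-type equation is easy to evaluate once its coefficients are known. The sketch below reads the electronic term as ρσ and the steric term as δEs, which is an interpretation of the slide's equation; all coefficient values are invented for illustration and do not come from any fitted model.

```python
def hansch_log_inv_c(log_p, sigma, e_s, a, b, rho, delta, const):
    """log(1/C) = a*(log P)^2 + b*log P + rho*sigma + delta*Es + const."""
    return a * log_p ** 2 + b * log_p + rho * sigma + delta * e_s + const


def optimal_log_p(a, b):
    """log P at which the parabolic hydrophobic term is maximal (requires a < 0)."""
    if a >= 0:
        raise ValueError("the parabola has a maximum only when a < 0")
    return -b / (2 * a)


# Hypothetical coefficients, chosen only to make the arithmetic easy to follow.
coeffs = dict(a=-0.5, b=2.0, rho=1.0, delta=0.5, const=3.0)

best_log_p = optimal_log_p(coeffs["a"], coeffs["b"])
activity = hansch_log_inv_c(log_p=best_log_p, sigma=0.2, e_s=-1.0, **coeffs)
print(best_log_p, round(activity, 2))
```

      With a negative coefficient a, activity first rises and then falls with hydrophobicity, so the model has an optimal log P at -b / (2a).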

  21. Molecular Descriptors
      • Constitutional (mol. weight, the number of S, N or O atoms, ...)
      • Topological (Randic index, informational content, ...)
      • Geometrical (molecular size, distances between functional groups, ...)
      • Electrostatic (electrostatic potential, charges, ...)
      • Charged Partial Surface Area
      • Quantum-chemical (energies of molecular orbitals, reactivity indices, ...)
      • Thermodynamic (heat of formation, logP, ...)
      • Fragments (sequences of atoms and bonds, augmented atoms, ...)
      More than 4000 types of descriptors are known.
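
      Constitutional descriptors are the simplest to compute, since they need only the molecular formula. The sketch below is a minimal illustration, assuming a flat formula string without brackets or charges and a small hand-written table of standard atomic masses; the function names are hypothetical.

```python
import re

# Standard atomic masses (g/mol) for the few elements used in this example.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def atom_counts(formula):
    """Count atoms in a flat formula such as 'C2H6O' (no brackets, no charges)."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def constitutional_descriptors(formula):
    """Return a small descriptor vector computed from the formula alone."""
    counts = atom_counts(formula)
    return {
        "mol_weight": sum(ATOMIC_MASS[s] * n for s, n in counts.items()),
        "n_SNO_atoms": sum(counts.get(s, 0) for s in ("S", "N", "O")),
        "n_atoms": sum(counts.values()),
    }


print(constitutional_descriptors("C2H6O"))  # ethanol
```

      Descriptors of the other classes (topological, geometrical, quantum-chemical) need the molecular graph or a 3D structure and are normally computed with dedicated software.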

  22. Learning approach • Quantum Chemistry: deductive >> inductive • Force Field Molecular Modelling: deductive ≈ inductive • Chemoinformatics: deductive << inductive

  23. Learning approach • In chemoinformatics the logic of learning is not based on existing physical theories. Chemoinformatics considers the world too complex to be a priori described by any set of rules. Thus, the rules (models) in chemoinformatics are not explicitly taken from rigorous physical models, but learned inductively from the data.

  24. Chemoinformatics: From Data to Knowledge [Diagram: measurement or calculation → data → (context) → information → (generalization) → knowledge; the arrows are labelled "inductive learning" and "deductive learning"]

  25. Models • In chemoinformatics, a model is a set of rules or a mathematical equation linking a given property (activity) with the molecular structure: PROPERTY = f(structure) • Two main types of models: binary classification (SAR) and regression (QSAR)
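
      The difference between the two model types can be shown on a toy data set with a single descriptor. The sketch below fits a least-squares line (a regression, QSAR-style model) and then derives an active/inactive label by thresholding the prediction. The data, the threshold and the thresholding shortcut are all assumptions made for illustration; real SAR classifiers are usually trained directly on class labels.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


# Hypothetical training set: one descriptor value and one measured property per molecule.
descriptor = [1.0, 2.0, 3.0, 4.0]
measured = [2.1, 3.9, 6.2, 7.8]
slope, intercept = fit_line(descriptor, measured)


def predict_property(d):
    """Regression (QSAR): return a continuous property value."""
    return slope * d + intercept


def predict_class(d, threshold=5.0):
    """Binary classification (SAR): True means 'active' under the assumed threshold."""
    return predict_property(d) >= threshold


print(round(predict_property(3.0), 2), predict_class(3.0), predict_class(2.0))
```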

  26. Organic chemistry: an exercise in "intuitive" chemoinformatics

  27. Extraction of rules from the data The Markovnikov Rule:  When a Brønsted acid, HX, adds to an unsymmetrically substituted double bond, the acidic hydrogen of the acid bonds to that carbon of the double bond that has the greater number of hydrogen atoms already attached to it.
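
      A rule of this kind is already a small predictive model: it maps a structural feature (hydrogen counts on the two carbons) to an outcome (where the hydrogen adds). The sketch below encodes the rule as stated on the slide; the function name and the convention of returning None for a symmetric double bond are illustrative choices.

```python
def markovnikov_h_site(h_on_c1, h_on_c2):
    """Return the carbon (1 or 2) of the C=C bond that receives H from HX.

    The hydrogen goes to the carbon that already carries more hydrogens;
    X goes to the other carbon. Returns None when the rule cannot
    discriminate because both carbons carry the same number of hydrogens.
    """
    if h_on_c1 == h_on_c2:
        return None
    return 1 if h_on_c1 > h_on_c2 else 2


# Propene, CH2=CH-CH3: C1 carries 2 H, C2 carries 1 H.
h_site = markovnikov_h_site(2, 1)
x_site = 2 if h_site == 1 else 1
print(f"H adds to C{h_site}, X adds to C{x_site}")  # X on C2 gives the 2-halopropane
```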

  28. Major applications
      • Algorithms for organising and searching data: fingerprints, graph theory, similarity measures
      • Machine-learning approaches: MLR, Decision Trees, Artificial Neural Networks, Support Vector Machines, ...
      • Application areas: chemical databases, structure-activity models, virtual screening, in silico design

  29. Chemoinformatics: some applications

  30. Dmitry Mendeleev (1834 – 1907), discoverer of the Periodic Table and an early "chemoinformatician" • Russian chemist who arranged the 63 known elements into a periodic table based on atomic mass, which he published in Principles of Chemistry in 1869. Mendeleev left space for new elements and predicted three then-undiscovered elements, later found as Ga (1875), Sc (1879) and Ge (1886).

  31. Periodic Table: chemical properties of the elements vary gradually along the two axes

  32. [Diagram: target protein + large libraries of molecules → virtual screening (computations) → small library of selected hits → high-throughput screening (experiment) → hit]

  33. Virtual screening is indispensable for analysing the huge number of protein-ligand combinations • Human proteome: 84,000 peptides • Chemical universe: > 50 M compounds are currently available; 10^60 druglike molecules could be synthesised. Virtual screening must be very fast and efficient!
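
      The scale of the problem follows directly from the two numbers on the slide. The short calculation below multiplies them and converts the result into computing time; the throughput of 1,000 protein-ligand pairs per second is an assumed figure used only to make the point, not a benchmark of any real method.

```python
N_PROTEINS = 84_000        # human proteome size quoted on the slide
N_COMPOUNDS = 50_000_000   # currently available compounds quoted on the slide

n_pairs = N_PROTEINS * N_COMPOUNDS

PAIRS_PER_SECOND = 1_000   # assumption: hypothetical screening throughput
SECONDS_PER_YEAR = 365 * 24 * 3600

years = n_pairs / PAIRS_PER_SECOND / SECONDS_PER_YEAR
print(f"{n_pairs:.1e} protein-ligand pairs, about {years:.0f} years at the assumed rate")
```

      Even at this optimistic rate, exhaustively evaluating every pair would take more than a century on a single processor, which is why fast filters are applied before any expensive calculation.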

  34. Virtual screening "funnel": CHEMICAL DATABASE (~10^6 – 10^9 molecules) → Filters → Similarity search → Pharmacophore models → (Q)SAR → Docking → VIRTUAL SCREENING HITS (~10^1 – 10^3 molecules); compounds rejected along the way: INACTIVES
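
      The funnel can be sketched as a chain of increasingly selective stages, each of which discards part of the library. In the example below the library records, the stage thresholds (molecular weight ≤ 500, similarity ≥ 0.7) and the precomputed model predictions are all hypothetical; pharmacophore and docking stages are left out for brevity.

```python
def run_funnel(library, stages):
    """Apply the stages in order; return the survivors and a per-stage count report."""
    survivors = list(library)
    report = [("library", len(survivors))]
    for name, keep in stages:
        survivors = [mol for mol in survivors if keep(mol)]
        report.append((name, len(survivors)))
    return survivors, report


# Hypothetical library: every value below is made up for illustration.
library = [
    {"id": "m1", "mol_weight": 320.0, "similarity": 0.82, "predicted_active": True},
    {"id": "m2", "mol_weight": 710.0, "similarity": 0.90, "predicted_active": True},
    {"id": "m3", "mol_weight": 280.0, "similarity": 0.35, "predicted_active": True},
    {"id": "m4", "mol_weight": 450.0, "similarity": 0.75, "predicted_active": False},
    {"id": "m5", "mol_weight": 390.0, "similarity": 0.71, "predicted_active": True},
]

# Cheap stages come first, so the costlier ones see fewer molecules.
stages = [
    ("filter", lambda mol: mol["mol_weight"] <= 500.0),
    ("similarity search", lambda mol: mol["similarity"] >= 0.7),
    ("(Q)SAR", lambda mol: mol["predicted_active"]),
]

hits, report = run_funnel(library, stages)
for stage_name, count in report:
    print(f"{stage_name}: {count} molecules")
```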

  35. REACH regulation • The European Union adopted the Regulation on the Registration, Evaluation, Authorisation and Restriction of Chemicals (the "REACH Regulation"), which entered into force on June 1, 2007. • REACH requires information on physico-chemical, toxicological and eco-toxicological parameters for chemicals produced in quantities above 1 tonne per year. • More than 30,000 compounds must be tested. The total cost estimated by the EU Commission over an 11-15 year period is €2.8 - €5.2 bn. No Data, No Market!

  36. Chemoinformatics tools in SciFinder: predictions of > 20 physico-chemical properties and NMR spectra for each individual compound

  37. Drug design

  38. Virtual screening: success stories & drugs
      • "Virtual screening - what does it give us?" Herbert Koppen (Boehringer, Germany), Current Opinion Drug Discovery & Dev. (2009) 12: 397-407
      • "From virtuality to reality" Ulrich Rester (Bayer, Germany), Current Opinion Drug Discovery & Dev. (2008) 11: 559-568
      • "What has virtual screening ever done for drug discovery?" David E Clark (Argenta Discovery Ltd, UK), Expert Opinion on Drug Discovery (2008) 8: 841-851

  39. In silico screening: success stories & drugs
      • On the market: tirofiban (1999), trade name Aggrastat, from Merck; GP IIb/IIIa antagonist (myocardial infarction; it is an anticoagulant)
      • (2S)-2-(butylsulfonylamino)-3-[4-[4-(4-piperidyl)butoxy]phenyl]propanoic acid (mol. mass: 440.6 g/mol)
      • PK data: bioavailability: IV (intravenous) only; half-life: 2 hours
      • Combined with heparin and aspirin, but with numerous precautions
      http://www.bioscience.ws/encyclopedia/

  40. Materials design

  41. Ionic Liquids. Ionic liquids are composed of large organic cations [structures shown on the slide] and anions: PF6-, Cl-, BF4-, CF3SO3-, [(CF3SO2)2N]-

  42. Ionic Liquids. Large organic cations [structures shown on the slide]; anions: PF6-, Cl-, BF4-, CF3SO3-, [(CF3SO2)2N]-. There exist 10^18 combinations of ions that could lead to useful ionic liquids.

  43. Viscosity predictions for 23 new ILs (Solvionics company). None of these ionic liquids was used to build the model.

  44. Ionic liquid viscosity: experimental validation of the neural network models. [Plot: predicted (pred) vs experimental (exp) viscosity; RMSE = 73 cP] • The prediction error (~70 cP) is similar to the "noise" in the experimental data used to train the model. G. Marcou, I. Billard, A. Ouadi and A. Varnek, submitted
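
      The RMSE quoted on this slide is the root-mean-square difference between predicted and experimental values. The sketch below shows how it is computed; the viscosity values are invented for illustration and are not the data behind the reported 73 cP.

```python
import math


def rmse(experimental, predicted):
    """Root-mean-square error between paired experimental and predicted values."""
    if len(experimental) != len(predicted) or not experimental:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(e - p) ** 2 for e, p in zip(experimental, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical viscosities in cP.
exp_cp = [100.0, 200.0, 300.0, 400.0]
pred_cp = [130.0, 170.0, 330.0, 370.0]

print(f"RMSE = {rmse(exp_cp, pred_cp):.1f} cP")
```

      Because the errors are squared before averaging, a few large misses weigh more heavily on the RMSE than many small ones.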

  45. Metabolites prediction

  46. Prediction of aromatic hydroxylation sites for human CYP1A2 substrates. [Scheme: aromatic hydroxylation by CYP1A2; potential hydroxylation sites are marked with "?"] Method: SVM + descriptors derived from condensed graphs of reaction. The model predicts the hydroxylation products correctly with a probability of ≈80% (see poster of C. Muller).

  47. Reaction conditions

  48. Search for optimal reaction conditions. Reaction query: [substrate structure] + H2. Potential products of the reaction: A, B, C. Compound A is the target.

  49. Experimental validation [Reaction scheme with labels: Sub, + H2, A]. Source: A. Varnek, in "Chemoinformatics and Computational Chemical Biology", J. Bajorath, Ed., Springer, 2010
