Advanced Bioinformatics Lecture 7: Computer-aided lead identification

Advanced Bioinformatics Lecture 7: Computer-aided lead identification ZHU FENG zhufeng@cqu.edu.cn http://idrb.cqu.edu.cn/ Innovative Drug Research Centre in CQU 创新药物研究与生物信息学实验室

Table of Content Schematic of DOCKing Pharmacophore-based docking INVDOCK Strategy Ligand-based drug design Classification of drugs by SVM 2

T Complex Receptor Ligand What is docking? Given two molecules find their correct association + = Computationally predict the structures of protein-ligand complexes from their conformations and orientations. The orientation that maximizes the interaction reveals the most accurate structure of the complex. 3

General protein–ligand binding • Ligand • Molecule that binds with a protein • Protein active site(s) • Allosteric binding • Competitive binding • Function of binding interaction • Natural and artificial 4

Docking strategy PDB file Surface Representation Patch Detection Matching Patches Scoring & Filtering Candidate complexes 5

Schematic of docking methodology • the target binding site is filled with site points • distances between atoms in a molecule are matched to that of site points • a transformation matrix is calculated for an orientation • the molecule is docked into the binding site, and the fit of that conformer is scored 6

Design of HIV-1 protease inhibitor Step 1: creation of spheres to fit a cavity 7

Design of HIV-1 protease inhibitor Step 2: place a ligand to match the position of spheres 8

Design of HIV-1 protease inhibitor Step 3: check chemical complementarity 9

Scoring in ligand-protein docking Potential energy description 10

Some techniques • Surface representation, that efficiently represents the docking surface and identifies the regions of interest • Connolly surface • Lenhoff technique etc. Dense MS surface (Connolly) Sparse surface (Shuo Lin et al.) 11

Connolly surface • Each atomic sphere is given the van der Waals radius of the atom • Rolling a Probe Sphere over the Van der Waals surface leads to the Solvent Reentrant Surface or Connolly surface 12

Lenhoff technique • Computes a “complementary” surface for the receptor instead of the Connolly surface, i.e. computes possible positions for the atom centers of the ligand Atom centers of the ligand van der Waals surface 13

Pharmacophore-based docking Basic idea • Appropriate spatial disposition of a small number of functional groups in a molecule is sufficient for achieving a desired biological effect. • The ensemble formation will be guided by these functional groups 14

6.7 4.2-4.7 4.8 5.2 5.1-7.1 3-D representation of a protein binding site Distances between binding groups in Angstroms and the type of interaction is searchable 15

Pharmacophore Fingerprint • Appropriate spatial disposition of a small number of functional groups in a molecule is sufficient for achieving a desired biological effect. • The ensemble formation will be guided by these functional groups 16

Schematic of PhDOCK methodology DOCK PhDOCK 17

Advantages and disadvantages of PhDOCK • Advantages: speed increase due to (1) rapid elimination of ligands containing functional groups which would interfere with binding. (2) speed increase over docking of individual molecules. (3) more information pertaining to the entire molecule is retained (no rigid portions). (4) Chemical matching and critical clusters are encouraged. • Disadvantages: (1) complex queries are extremely slow. (2) the majority of the information contained in the target structure is not considered during the search. 18

INVDOCK Strategy Existing methods Given a protein, find putative binding ligands from chemical database Given Lock, find Key Forward lead identification Science 1992; 257:1078 INVDOCK methods Given a ligand, find putative protein targets from protein database Given Key, find Lock Backward MOA prediction Proteins 1999; 36:1 19

INVDOCK Test on Drug Target Prediction Anticancer Drug Tamoxifen PDB Id Protein Experimental Findings 1a25 Protein Kinase CSecondary Target 1a52 Estrogen Receptor Drug Target 1bhs 17 beta HSD dehydragenase Inhibitor 1bld bFGF Factor Inhibitor 1cpt Cytochrome P450-TERP Metabolism 1dmo Calmodulin Secondary Target Proteins. 1999; 36:1 Tamoxifen is a famous anticancer drug for treatment ofbreast cancer. It was approved by FDA in1998 as the 1st cancer preventivedrug. 30 million people are expected to use it. 20

INVDOCK Test on Drug Target Prediction Drug Toxicity Targets (J. Mol. Graph. Mod. 2001, 20, 199) 21

Results of docking studies The docked (blue) and crystal (yellow) structure of ligands in some PDB ligand-protein complexes. The PDB Id of each structure is shown. 22

Dataset and Testing Results Protein-Proteincases from protein-protein docking benchmark: Enzyme-inhibitor – 22 cases Antibody-antigen – 16 cases Protein-DNAdocking: 2 unbound-bound cases Protein-drugdocking: tens of bound cases (Estrogen receptor, HIV protease, COX) Performance: Several minutes for large protein molecules and seconds for small drug molecules on standard PC computer. Estrogen receptor Estradiol molecule from complex Docking solution DNA Endonuclease Docking solution Estrogen receptor with estradiol (1A52). RMSD 0.9Å, rank 1, running time: 11 seconds Endonuclease I-PpoI (1EVX) with DNA (1A73). RMSD 0.87Å, rank 2 23

Classification of Drugs by SVM A drug is classified as either belong (+) or not belong (-) to a class Drug class: inhibitor of a protein, BBB penetrating, genotoxic, etc. Protein class: enzyme EC3.4 family, DNA-binding, etc. By screening against all classes, the property of a drug or the function of a protein can be identified Class-1 SVM - Drug belongs to class-2 Class-2 SVM + Drug …… - Class-n SVM - 24

Classification of drugs by SVM What is SVM? • Support vector machines, a machine learning method based on artificial intelligence, learning by examples, statistical learning, classify objects into one of the two classes. Advantages of SVM: • Diversity of class members (no racial discrimination). • Use of structure-derived physico-chemical features as basis for drug classification (no structure-similarity required in the algorithm). 25

Artificial Intelligence (AI) 26

Machine learning method Inductive learning (example-based learning) 27

Machine learning method Feature vectors A = (1, 1, 1) B = (0, 1, 1) C = (1, 1, 1) D = (0, 1, 1) E = (0, 0, 0) F = (1, 0, 1) 28

Z Input space F B A E Y X Machine learning method Feature vectors in input space Feature vector A=(1, 1, 1) B=(0, 1, 1) C=(1, 1, 1) D=(0, 1, 1) E=(0, 0, 0) F=(1, 0, 1) 29

Drug family members Border New border Drug family members Nonmembers Nonmembers Project to a higher dimensional space SVM Method 30

New border Support vector Support vector Protein family members Nonmembers SVM Method 31

Support vector Protein family members Nonmembers New border Support vector SVM Method 32

Best Linear Separator? 33

Find closest points in convex hulls d c 34

Plane bisect closest points d c 35

Best Linear SeparatorSupporting plane method Maximize distance Between two parallel supporting planes Distance = “Margin” = 36

Best Linear SeparatorSupporting plane method 37

SVM Method Border line is nonlinear 38

Non-linear transformation: use of kernel function SVM Method 39

SVM Method 40

SVM Method 41

SVM Method 42

SVM Method 43

SVM Method 44

SVM for classification of drugs How to represent a drug? • Each structure represented by specific feature vector assembled from structural, physico-chemical properties • Simple molecular properties (molecular weight, no. of rotatable bonds etc. 18 in total) • Molecular Connectivity and shape (28 in total) • Electro-topological state polarity (84 in total) • Quantum chemical properties (electric charge, polaritability etc. 13 in total) • Geometrical properties (molecular size vector, van der Waals volume, molecular surface etc. 16 in total) J. Chem. Inf. Comput. Sci. 44,1630 (2004) J. Chem. Inf. Comput. Sci. 44, 1497 (2004) Toxicol. Sci. 79,170 (2004) 45

SVM-based drug design and property prediction software Your drug structure Chemical Structure Chemical Structure Drug Option two Which class your drug belongs to? Option one Send structure to classifier Input structure through internet SVM classifier for every Drug class Computer loaded with SVMProt Drug designed or property predicted Identified classes Input structure on local machine 46

SVM drug prediction results • Protein inhibitor/activator/substrate prediction • 86% of the 129 estrogen receptor activators and 84% of 101 non-activators correctly predicted. • 81% of 116 P-glycoprotein substrates and 79% of 85 non-substrates correctly predicted • Drug toxicity prediction • 97% of 102 TdP+ and 84% of 243 TdP- agents correctly predicted • 73% of 229 genotoxic and 93% of 631 non-genotoxic agents correctly predicted • Pharmacokinetics prediction • 95% of 276 BBB+ and 82% of 139 BBB- agents correctly predicted • 90% of 131 human intestine absorption and 80% of 65 non-absoption agents correctly predicted. 47

Any questions? Thank you! 48

Advanced Bioinformatics Lecture 7: Computer-aided lead identification