Virtual Screening at the post-genomic era
Dr. Didier ROGNAN
Bioinformatic Group, UMR CNRS 7081, Illkirch, France
didier.rognan@pharma.u-strasbg.fr

Virtual screening: Definition
• Searching electronic databases (2D, 3D) for molecules fitting:
  • a pharmacophore
  • an active site
Walters et al., Drug Discovery Today 1998, 3, 160-178
Schneider et al., Drug Discovery Today 2002, 7, 64-70

Importance of virtual screening
• Scientific reasons
  • Increasing number of interesting macromolecular targets (500 → 10,000)
  • Increasing number of protein 3-D structures (X-ray, NMR)
  • Better knowledge of protein-ligand interactions
  • Development of chemo- and bioinformatic methods
  • Increasing computing facilities
• Economic reasons
  • High cost of high-throughput screening (HTS): 0.2 €/molecule
  • Increase the ratio (# of active molecules (hits)) / (# of tested molecules)
• Applications
  • Identifying the very first ligands of orphan targets
  • Identifying/optimizing new chemical scaffolds

Protein-based virtual screening
Inputs: Database (3-D) and Target (3-D!)
1. Orientation ("docking") → Target-Ligand Complex
2. Evaluation ("scoring") → Hit list

Mol #      ΔGbind
11121      -44.51
222        -42.21
3563       -41.50
6578       -40.31
25639      -40.28
...        ...
100000      22.54
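
The scoring step above ends in a ranked hit list. A minimal Python sketch of that final ranking, using the ΔGbind values shown on the slide; the `top_hits` helper and its `n` parameter are illustrative names and do not come from any docking program.

```python
# Scores (Mol # -> predicted ΔGbind) copied from the slide's table.
scores = {
    11121: -44.51,
    222: -42.21,
    3563: -41.50,
    6578: -40.31,
    25639: -40.28,
    100000: 22.54,
}

def top_hits(scores, n):
    """Return the n molecules with the most negative predicted ΔGbind."""
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return [mol for mol, _ in ranked[:n]]

hit_list = top_hits(scores, 3)
```

The most negative predicted binding free energies come first, so the molecule with the positive score (100000) lands at the bottom of the list.
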

Docking
• Goal
  • Quickly find (1-2 min/molecule)
    • the orientation of the ligand in the active site
    • the protein-bound conformation
• Methods
  • Orientation
    • Surface complementarity
    • Complementarity of intermolecular interactions
  • Conformational freedom
    • Incremental construction
    • Conformational sampling (Monte Carlo, genetic algorithms, simulated annealing)
Abagyan et al., Curr. Opin. Chem. Biol. 2001, 5, 375-382

Docking: Orientation
Surface-based orientation (e.g. DOCK)
1. 3D structure
2. Molecular surface (active site)
3. Filling the surface with overlapping spheres
4. Matching sphere centers with atoms

Docking: Orientation
• Interaction-based orientation (e.g. FlexX)
  • Statistical rules for locating ligand atoms
  • Overall placement of a base fragment by triangulation
http://cartan.gmd.de/flexx

Docking: Ligand flexibility
- by preselecting several conformers per molecule
- by incremental construction
Diagram labels: Ligand; fragment decomposition; base fragment; reading preferred torsion values; adding the 1st peripheral fragment; selecting the « best »; adding the 2nd peripheral fragment; termination

Docking: Ligand flexibility
- by a genetic algorithm (e.g. Gold)
Chromosome = Ligand (orientation, conformation)
Gene: x, y, z coords., torsion angles, orientation, …
Cycle: Initial population → Selection of parents → Genetic operators → Selection of children → New population → Convergence test
Parameters: size, # of evolutions, survival rate, crossing-over rate, mutation rate
Selection example (parent, score): A 2.5; B 5.0; C 1.5; D 1.0 (shown in the order D, C, A, B)
Crossing over (bit strings shown): 100110010, 010010011, 100110011, 010011010
Mutation (bit strings shown): 100110010, 100101010
http://www.ccdc.cam.ac.uk/prods/gold/
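
The cycle on this slide (initial population, selection of parents, genetic operators, new population, convergence test) can be sketched as a toy genetic algorithm in Python. This is an illustration only and does not reproduce Gold's actual encoding or operators: the chromosome is a plain bit string, the `fitness` function (bits matching a target string) is a stand-in for a docking score, and all parameter names and default values are assumptions.

```python
import random

def fitness(chromosome, target):
    """Toy score: number of bits matching a target string (stand-in for a docking score)."""
    return sum(a == b for a, b in zip(chromosome, target))

def crossover(p1, p2, rng):
    """One-point crossing over: swap the tails of two parents."""
    point = rng.randrange(1, len(p1))
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in chromosome
    )

def evolve(target, pop_size=20, generations=200, crossover_rate=0.8,
           mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    n = len(target)
    # Initial population of random chromosomes.
    population = ["".join(rng.choice("01") for _ in range(n)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: fitness(c, target), reverse=True)
        if population[0] == target:            # convergence test
            break
        parents = population[: pop_size // 2]  # selection of parents (50% survival)
        children = []
        while len(children) < pop_size - len(parents):
            p1, p2 = rng.sample(parents, 2)
            if rng.random() < crossover_rate:
                c1, c2 = crossover(p1, p2, rng)
            else:
                c1, c2 = p1, p2
            children += [mutate(c1, mutation_rate, rng), mutate(c2, mutation_rate, rng)]
        # New population: surviving parents plus their children.
        population = parents + children[: pop_size - len(parents)]
    return max(population, key=lambda c: fitness(c, target))

best = evolve("100110010")
```

In a real docking program the genes would encode ligand position, orientation and torsion angles, and the fitness would be the scoring function evaluated on the decoded pose.
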

Docking Accuracy
Analysing 100 high-resolution PDB complexes
Finding a reliable pose out of a set of 30-50 solutions is feasible!
Paul, N. and Rognan, D., Proteins, in press

Docking Accuracy
Analysing 100 high-resolution PDB complexes
Ranking the most reliable solution at the top of the list is still an issue!
Paul, N. and Rognan, D., Proteins, in press

Source of Docking Errors
• Nature of the active site (flat vs. cavity)
• Missed influence of water
• Ligand flexibility
• Inaccuracy of the scoring function
• Unusual binding mode/interactions
• Inadequate set of protein coordinates
• Wrong atom typing
Scale shown on the slide: Impossible, Difficult, Easy

Scoring
Figure: accuracy (error, kJ/mol; axis ticks 2 and 10) versus # of molecules (axis ticks 2, 1000 and 100,000)
• Thermodynamic methods: FEP, TI (2)
• Force fields (10-100)
• QSAR, 3D-QSAR (100-1,000)
• Empirical scoring functions (>100,000)

Scoring
• First-principle methods:
  • sum of physically meaningful terms
• Regression-based free energy approximations:
  • sum of regression-weighted terms
• Potentials of mean force:
  • distance-dependent atom pair-weighted Helmholtz free energies
Gohlke et al., Curr. Opin. Struct. Biol. 2001, 11, 231-235
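
Empirical scoring function: Fresno
Terms: constant, H-bond term, lipophilic term, rotational term, buried-polar repulsive term, desolvation term

$$
g_1(\Delta r) =
\begin{cases}
1 & \text{if } \Delta r \le 0.25\ \text{Å} \\
1 - (\Delta r - 0.25)/0.4 & \text{if } 0.25\ \text{Å} < \Delta r \le 0.65\ \text{Å} \\
0 & \text{if } \Delta r > 0.65\ \text{Å}
\end{cases}
$$

$$
g_2(\Delta\alpha) =
\begin{cases}
1 & \text{if } \Delta\alpha \le 30^\circ \\
1 - (\Delta\alpha - 30)/50 & \text{if } 30^\circ < \Delta\alpha \le 80^\circ \\
0 & \text{if } \Delta\alpha > 80^\circ
\end{cases}
$$

$$
f(r) =
\begin{cases}
1 & \text{if } r \le R_1 \\
1 - (r - R_1)/3.0 & \text{if } R_1 < r \le R_2 \\
0 & \text{if } r > R_2
\end{cases}
$$

Rognan et al. (1999) J. Med. Chem., 42, 4650-4658.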
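
The three piecewise functions on this slide share one shape: 1 below a lower bound, 0 above an upper bound, and a linear ramp in between. A minimal Python sketch under stated assumptions: Δr and Δα are read here as deviations from an ideal hydrogen-bond distance (Å) and angle (degrees), and R2 is taken as R1 + 3.0 Å so that f(r) reaches 0 exactly at R2. Neither reading is spelled out on the slide, and the helper name `ramp` is illustrative.

```python
def ramp(x, lower, width):
    """Return 1 up to `lower`, 0 beyond `lower + width`, linear in between."""
    if x <= lower:
        return 1.0
    if x <= lower + width:
        return 1.0 - (x - lower) / width
    return 0.0

def g1(delta_r):
    """Distance weight; delta_r in Å (assumed deviation from the ideal H-bond length)."""
    return ramp(delta_r, 0.25, 0.4)

def g2(delta_alpha):
    """Angle weight; delta_alpha in degrees (assumed deviation from the ideal H-bond angle)."""
    return ramp(delta_alpha, 30.0, 50.0)

def f(r, r1):
    """Contact weight for distance r; assumes R2 = R1 + 3.0 Å."""
    return ramp(r, r1, 3.0)

# Example: a hydrogen bond 0.45 Å and 55° away from ideal geometry.
weights = (g1(0.45), g2(55.0))
```

Both example weights come out at about 0.5, halfway down their respective ramps.
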

Scoring Accuracy
• Current accuracy: 5-10 kJ/mol (1-2 pK units)
• Weak point of all docking programs
• Entropic contributions are difficult to handle!
• Workaround: use of consensus scoring functions
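
The equivalence between 5-10 kJ/mol and 1-2 pK units follows from ΔG = -RT ln K: one pK unit corresponds to RT ln 10. A small Python check, assuming a temperature of 298.15 K (the slide does not state one); the function names are illustrative.

```python
import math

R = 8.314462618e-3  # molar gas constant in kJ/(mol*K)

def kj_per_pk_unit(temperature=298.15):
    """Free-energy change (kJ/mol) equivalent to one pK unit: RT ln 10."""
    return R * temperature * math.log(10)

def error_in_pk_units(error_kj, temperature=298.15):
    """Convert a free-energy error in kJ/mol into pK units."""
    return error_kj / kj_per_pk_unit(temperature)

low = error_in_pk_units(5.0)
high = error_in_pk_units(10.0)
```

At 298.15 K one pK unit is about 5.7 kJ/mol, so the quoted 5-10 kJ/mol error is roughly 0.9 to 1.8 pK units, in line with the rounded figure on the slide.
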

Library set-up
Full database (2-D; Isis/Base)
• Filtering
  • Chemical reactivity
  • Pharmacokinetics
  • Drug-likeness
Filtered database
• 2D → 3D
• Hydrogens
• Ionisation
3-D Database
Example entry (line notation, shown with its fingerprint): C[1](=C(C(=CS@1(=O)=O)SC[9]:C:C:C(:C:C:@9)Br)C[16]:C:C:C:C:C@16)N
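
The filtering step can be sketched in Python as a pass over precomputed descriptors. The slide names no specific drug-likeness rule, so Lipinski's rule of five (at most one violation allowed) is used here purely as a commonly cited example. The descriptor keys, the `reactive` flag and the three library entries are hypothetical, with values invented for illustration.

```python
def rule_of_five_violations(mol):
    """Count Lipinski rule-of-five violations from precomputed descriptors."""
    return sum([
        mol["mol_weight"] > 500,   # molecular weight, g/mol
        mol["logp"] > 5,           # octanol-water partition coefficient
        mol["h_donors"] > 5,       # hydrogen-bond donors
        mol["h_acceptors"] > 10,   # hydrogen-bond acceptors
    ])

def passes_filters(mol, max_violations=1):
    """Keep molecules that are not flagged reactive and look drug-like."""
    return not mol["reactive"] and rule_of_five_violations(mol) <= max_violations

# Hypothetical entries; descriptor values are invented for illustration.
library = [
    {"name": "mol_A", "mol_weight": 320.4, "logp": 2.1,
     "h_donors": 2, "h_acceptors": 5, "reactive": False},
    {"name": "mol_B", "mol_weight": 712.9, "logp": 6.3,
     "h_donors": 4, "h_acceptors": 9, "reactive": False},
    {"name": "mol_C", "mol_weight": 250.3, "logp": 1.0,
     "h_donors": 1, "h_acceptors": 3, "reactive": True},
]

filtered = [mol["name"] for mol in library if passes_filters(mol)]
```

Only `mol_A` survives: `mol_B` breaks two rule-of-five limits and `mol_C` is flagged as reactive.
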

Applications
High-resolution X-ray structures (enzymes)

Target         Ligands       Base        Hit rate   Reference
CD4-gp120      inhibitors    150,000      9.7 %     Li et al., PNAS (1997)
gp41           inhibitors    20,000      12.5 %     Debnath et al., J. Med. Chem. (1999)
FT             inhibitors    219,000     19.0 %     Perola et al., J. Med. Chem. (2000)
kinesin        inhibitors    20,000      12.5 %     Hopkins et al., Biochemistry (2000)
HIV1 Tar-Tat   inhibitors    153,000     25.0 %     Filikov et al., JCAMD (2000)
Bcl-2          inhibitors    207,000     20.0 %     Enyedi et al., J. Med. Chem. (2001)
HCA-II         inhibitors    90,000      61.0 %     Grüneberg et al., Angew. (2001)
RAR            agonists      250,000      6.6 %     Schapira et al., BMC Struct. Biol. (2001)
TPI            inhibitors    108,000     20.0 %     Joubert et al., Proteins (2001)
ERα            antagonists   1,500,000   72.0 %     Schapira et al., IBM Sys. J. (2001)

FT: farnesyltransferase, HCA: human carbonic anhydrase, RAR: retinoic acid receptor, ER: estrogen receptor, TPI: triosephosphate isomerase, PEP: phosphoenolpyruvate

Conclusions
• What is possible?
  • Discriminating true hits from random ligands
  • Enriching a reduced library by a factor of 20
  • Retrieving about 50% of all true hits
  • Prioritizing ligands for synthesis and experimental screening
  • Using virtual screening for lead finding
• What remains to improve?
  • Predicting the exact orientation
  • Predicting the absolute binding free energy
  • Discriminating true hits from "similar inactives"
  • Catching all hits
  • Using virtual screening for lead optimization
  • Throughput (100K mols/day → 1M/day?)
  • Pre- and post-processing of vHTS
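
The "factor of 20" and "about 50% of all true hits" figures correspond to two standard screening metrics, enrichment factor and recall. A short Python sketch of both; the library size, number of actives and selection size below are hypothetical numbers chosen only to reproduce those two figures, and the function names are illustrative.

```python
def hit_rate(n_hits, n_tested):
    """Fraction of tested molecules that are active."""
    return n_hits / n_tested

def enrichment_factor(hits_selected, n_selected, hits_total, n_total):
    """Hit rate in the selected subset divided by the hit rate of the whole library."""
    return (hits_selected * n_total) / (n_selected * hits_total)

def recall(hits_selected, hits_total):
    """Fraction of all true hits retrieved by the selection."""
    return hits_selected / hits_total

# Hypothetical screen: 10,000 molecules containing 100 actives;
# the top-ranked 250 molecules contain 50 of them.
n_total, hits_total = 10_000, 100
n_selected, hits_selected = 250, 50

ef = enrichment_factor(hits_selected, n_selected, hits_total, n_total)
rec = recall(hits_selected, hits_total)
```

With these numbers the hit rate rises from 1% in the full library to 20% in the selection, giving an enrichment factor of 20 at a recall of 50%.
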

Virtual screening at the genomic scale
GPCRs of the human genome (GPCR-Gen): Primary Sequence → 3-D Model
e-Libraries: "Bioinfo" (350,000), "RCPG" (30,000), "Endo" (2,000)
vHTS → virtual Hits → Validation → True Hits → Optimisation
Optimisation: Selectivity (vs. enzymes: PDB library; vs. GPCRs: GPCR library), Affinity, ADME/Tox
Available analogues, Focussed Libraries

Virtual screening: Tomorrow
Virtual library: 10^12 molecules
• ADME/Tox → 10^9
• 2-D Similarity → 10^7
• 3-D Conformations → 10^7 (10^8 conformations)
• 3-D Similarity → 10^5 (10^6 conformations)
• Docking → 10^4
• Scoring → 10^3
• Expt. validation → 100 true hits
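
The funnel on this slide can be read as a sequence of fold reductions. A brief Python sketch, assuming that each count is the number of molecules remaining after the named stage; that pairing of stages and counts is a reconstruction of the slide layout and not something the slide states explicitly.

```python
# (stage, molecules remaining after the stage); pairing assumed from the slide layout.
funnel = [
    ("virtual library", 10**12),
    ("ADME/Tox", 10**9),
    ("2-D similarity", 10**7),
    ("3-D conformations", 10**7),
    ("3-D similarity", 10**5),
    ("docking", 10**4),
    ("scoring", 10**3),
    ("experimental validation", 10**2),
]

def fold_reductions(stages):
    """Return (stage, fold reduction) for every stage after the first."""
    return [
        (name, before // after)
        for (_, before), (name, after) in zip(stages, stages[1:])
    ]

reductions = fold_reductions(funnel)
overall = funnel[0][1] // funnel[-1][1]
```

Under this reading, the cheap early filters (ADME/Tox and 2-D similarity) remove far more molecules per stage than docking or scoring, which only see about 10^5 molecules.
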