DockCrunch and Beyond... The future of receptor-based virtual screening
Bohdan Waszkowycz, Tim Perkins & Jin Li
Protherics Molecular Design Ltd, Macclesfield, UK
Outline
• Structure-based virtual screening
  • an achievable (and possibly useful) tool for drug discovery
  • the DockCrunch validation study
• Protherics’ experience since DockCrunch
  • methods: making VS a routine task
  • analysis: getting the most from your data
  • the future (and beyond)
Virtual Screening
[Diagram labels: receptor structure; molecular docking; compound collections; virtual libraries; computational screening; targeted selection; screen smaller focused libraries]
Why Use Molecular Docking?
• Most detailed representation of binding site
  • overcomes simplifications of pharmacophores
  • identify both conservative and novel solutions
  • impetus for de novo design/optimisation
• Broad range of analyses applicable
  • diverse scoring/selection criteria
• Quality/throughput of available methods
  • good enough, despite technical limitations
DockCrunch
• Validation study for large-scale virtual screening
  • flexible ligand/rigid receptor docking
  • PRO_LEADS docking code using ChemScore scoring function
  • 1.1M druglike ACD-SC compounds
  • dock versus oestrogen receptor (agonist and antagonist structures)
  • collaboration with SGI
DockedEnergy Profiles
[Figure: DockedEnergy distributions; panels: agonist receptor, antagonist receptor]
• Achieve good separation in terms of predicted binding affinity
DockCrunch Results
• Demonstrated technical feasibility
  • 1.1M cpds docked in 6 days on a 64-processor Origin
  • implemented automated pre- and post-processing
• Demonstrated potential for lead identification
  • successful discrimination of seeded known hits
  • activity for 21 out of 37 assayed compounds
  • ER binding affinities down to Ki = 7 nM
  • novel non-steroidal chemistries
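The feasibility figures above can be turned into a rough per-docking cost. The sketch below is a back-of-envelope check only: it assumes the 6 days covered docking against both ER structures (2 × 1.1M dockings) and that all 64 processors were fully used throughout, neither of which the slide states.

```python
# Figures quoted on the slide.
compounds = 1_100_000
processors = 64
days = 6
assayed, active = 37, 21

# Total processor-minutes, ASSUMING 100% utilisation for the whole run.
cpu_minutes = processors * days * 24 * 60

# ASSUMPTION: every compound was docked to both the agonist and
# antagonist structures within the same 6-day run.
dockings = compounds * 2
minutes_per_docking = cpu_minutes / dockings

hit_rate = active / assayed

print(f"{minutes_per_docking:.2f} processor-minutes per docking")
print(f"assay hit rate: {hit_rate:.0%}")
```

If the 6 days covered only one receptor structure, the per-docking figure would simply double.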
Since DockCrunch...
• VS established as a routine CAMD task:
  • 2.2M structures docked in DockCrunch
  • 1.5M docked versus in-house target
  • 2.5M docked to date in external contracts
    • project 1: 0.25M Dec 2000
    • project 2: 0.25M Jan 2001
    • project 3: 1M Feb 2001
    • project 4: 1M March-April 2001
    • project 5: 0.5M to do in May...
  • diverse targets/databases/project objectives
Virtual Screening within Prometheus
• Inputs: virtual databases, commercial databases, receptor structure
• Database preparation: e.g. salt removal, protonation
• Database pre-filtering: select drug-like profile
• Receptor-ligand docking: predict binding mode/affinity
• Analysis: graphical browsing, subset selection
PRO_LEADS Docking
• Tabu search + extended ChemScore function
  • robust prediction of binding free energy
  • 85% success rate achieved across diverse test set
• Pre-calculated grids for energies/neighbour lists
  • defines extent of binding site
  • automatically/graphically defined
• Selection of PRO_LEADS docking protocol
  • use standard protocol across all receptors
  • specific constraints or modified energy terms available if desired
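For readers unfamiliar with tabu search, the idea can be sketched on a toy problem. This is a generic textbook-style version, not the PRO_LEADS implementation: the "pose" is just an integer grid point, the scoring function is a made-up quadratic standing in for a docking energy, and refinements such as aspiration criteria are left out.

```python
from collections import deque

def tabu_search(score, start, neighbours, n_steps=50, tabu_size=10):
    """Generic tabu search: always move to the best non-tabu neighbour,
    even if it scores worse, and remember recent states to avoid cycling."""
    current = best = start
    best_score = score(start)
    tabu = deque([start], maxlen=tabu_size)  # short-term memory of visited states
    for _ in range(n_steps):
        candidates = [n for n in neighbours(current) if n not in tabu]
        if not candidates:
            break  # every neighbour is tabu; stop early
        current = min(candidates, key=score)
        tabu.append(current)
        current_score = score(current)
        if current_score < best_score:
            best, best_score = current, current_score
    return best, best_score

# Hypothetical stand-ins for a pose and its energy (lower = better).
def toy_energy(pose):
    x, y = pose
    return (x - 3) ** 2 + (y + 2) ** 2

def grid_moves(pose):
    x, y = pose
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

best_pose, best_energy = tabu_search(toy_energy, (0, 0), grid_moves)
print(best_pose, best_energy)
```

The search keeps moving after it reaches the minimum, which is what lets tabu search escape local minima on rougher landscapes; the best state seen is tracked separately and returned.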
Example of Grid Definition
[Figure: cAMP-dependent kinase (1YDS), contact surface coloured by lipophilicity]
Docking Throughput
• Standard protocols take 1–5 mins/ligand
  • e.g. typical VS run at ~4 min for 3M tabu steps
  • 250k cpds/week on 100 processor Linux cluster (VA Linux 750MHz PIII)
• PLUNDER script for parallelization
  • automatic processing of ligand batches
  • balances processor workload
  • works across heterogeneous architectures
  • supplies running time statistics
  • handles hardware failures
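The quoted throughput follows directly from the per-ligand time, and workload balancing across processors can be illustrated with a simple greedy scheme. The sketch below is not the PLUNDER script: `assign_batches` and the batch timings are hypothetical, and the longest-first heuristic is just one common way to balance load.

```python
import heapq

# Throughput check using the slide's figures (assumes full utilisation).
minutes_per_ligand = 4
processors = 100
minutes_per_week = 7 * 24 * 60
ligands_per_week = processors * minutes_per_week // minutes_per_ligand
print(ligands_per_week)  # close to the quoted ~250k cpds/week

def assign_batches(batch_minutes, n_workers):
    """Greedy longest-first balancing: hand each batch to the worker
    with the smallest accumulated load so far."""
    heap = [(0.0, w) for w in range(n_workers)]  # (load, worker id)
    heapq.heapify(heap)
    plan = {w: [] for w in range(n_workers)}
    longest_first = sorted(enumerate(batch_minutes), key=lambda b: -b[1])
    for batch_id, minutes in longest_first:
        load, worker = heapq.heappop(heap)
        plan[worker].append(batch_id)
        heapq.heappush(heap, (load + minutes, worker))
    loads = {w: sum(batch_minutes[b] for b in plan[w]) for w in plan}
    return plan, loads

# Hypothetical estimated run times (minutes) for six ligand batches.
plan, loads = assign_batches([50, 40, 30, 20, 10, 10], n_workers=2)
print(plan, loads)
```

A production scheduler would more likely hand out batches dynamically as workers finish, which also copes with heterogeneous hardware and failed nodes; the static plan here only shows the balancing idea.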
Data Analysis and Subset Selection
• Intrinsic problems of scoring functions:
  • cannot parameterize all critical interactions
  • try to take account of induced fit effects
  • calibrated only versus good binders
  • ignore co-operativity in binding
• When applied to random datasets:
  • predicted affinity typically normally distributed
  • overestimates binding affinity of random set
Conclusion: energy alone is not ideal for subset selection
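Because predicted affinities for a random set are roughly normally distributed, one simple way to put a ligand's score in context is a z-score against that background. The sketch below uses made-up energies and is only an illustration of the idea, not a method described in the talk.

```python
import statistics

# Hypothetical docked energies for a random background set (lower = better).
random_energies = [-20.0, -22.0, -18.0, -21.0, -19.0]

background_mean = statistics.mean(random_energies)
background_sd = statistics.pstdev(random_energies)  # population standard deviation

def z_score(energy):
    """Number of standard deviations from the random-set mean.
    Strongly negative values mean unusually favourable scores."""
    return (energy - background_mean) / background_sd

candidate_z = z_score(-27.0)
print(f"z = {candidate_z:.2f}")
```

A z-score says how unusual a score is, not whether the ligand binds, so it does not remove the scoring-function problems listed above.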
Achieving Better Selection
• Need to supplement scoring function
  • consensus scoring schemes
• Explore more fundamental descriptors of receptor:ligand complementarity
  • capture characteristics of diverse receptor types
  • assess deficiencies of existing scoring functions
  • use as simple filters or as pseudo energy terms
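Consensus scoring can take many forms; a minimal rank-averaging version is sketched here. The ligand names and scores are invented, both scoring functions are assumed to be "lower is better", and the slides do not say which consensus scheme was actually used.

```python
def rank_positions(scores):
    """Map ligand -> rank (1 = best), assuming lower score is better."""
    ordered = sorted(scores, key=scores.get)
    return {ligand: i + 1 for i, ligand in enumerate(ordered)}

def consensus_rank(score_tables):
    """Average each ligand's rank across several scoring functions."""
    ranks = [rank_positions(table) for table in score_tables]
    ligands = ranks[0].keys()
    return {lig: sum(r[lig] for r in ranks) / len(ranks) for lig in ligands}

# Hypothetical scores from two different scoring functions.
scores_a = {"lig1": -40.0, "lig2": -35.0, "lig3": -20.0, "lig4": -38.0}
scores_b = {"lig1": -8.8, "lig2": -9.0, "lig3": -5.0, "lig4": -8.5}

consensus = consensus_rank([scores_a, scores_b])
for ligand in sorted(consensus, key=consensus.get):
    print(ligand, consensus[ligand])
```

Averaging ranks rather than raw scores avoids having to put different scoring functions on a common energy scale.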
Enrichment Rates
Effect of different selection criteria for ER set for recovery of seeded compounds
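Recovery of seeded compounds is usually summarised as an enrichment factor: the hit rate in the selected subset divided by the hit rate in the whole database. The sketch below uses a made-up ranking, not the ER data from the talk.

```python
def enrichment_factor(ranked_ids, seeded_ids, fraction):
    """Enrichment of seeded compounds in the top `fraction` of a ranked list,
    relative to picking the same number of compounds at random."""
    n_total = len(ranked_ids)
    n_selected = max(1, round(n_total * fraction))
    hits_selected = sum(1 for cid in ranked_ids[:n_selected] if cid in seeded_ids)
    # (hits_selected / n_selected) / (n_seeded / n_total), rearranged
    return (hits_selected * n_total) / (n_selected * len(seeded_ids))

# Hypothetical example: 1000 compounds already sorted best-first by some
# selection criterion, with 5 seeded known actives among them.
ranked_ids = list(range(1000))
seeded_ids = {0, 3, 7, 50, 400}

ef_top1 = enrichment_factor(ranked_ids, seeded_ids, fraction=0.01)
ef_top10 = enrichment_factor(ranked_ids, seeded_ids, fraction=0.10)
print(ef_top1, ef_top10)
```

Comparing this number across selection criteria (energy alone, complementarity filters, combinations) is one way to read the kind of comparison the slide describes.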
Requirements for Analysis Package
• VS generates huge data output
  • want to be able to browse through entire dataset
• Real-time navigation of large datasets
  • graphing property distributions
  • selections based on property filters
  • browsing of 3D models within selections
  • initiating additional property calculations
  • data transformations
  • writing subset/reports
Approach to Analysis
• 1. Preliminary exploration
  • browse property distributions
  • comparisons with known ligands
• 2. Initial elimination of poor structures
  • DockedEnergy, component energies
  • DE corrected for size/functionality
  • receptor:ligand steric complementarity
  • polar/lipophilic surface complementarity
Approach to Analysis
• 3. Further filtering to define focused subsets
  • tighter 2D property filters
  • clustering by 2D chemistry
  • presence of key 3D binding interactions
    • specific H-bonds, specific lipo contacts, pocket occupancy, volume overlap with reference ligand/fragment, etc.
  • similarity/diversity of 3D binding mode
    • 3D similarity descriptors
  • final ranking by DockedEnergy or hybrid energy/complementarity scoring function
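The staged approach above can be sketched as a small filter cascade. Everything specific here is an assumption for illustration: the `DockedLigand` fields, the per-heavy-atom size correction, and the threshold values are hypothetical and are not taken from the Prometheus software.

```python
from dataclasses import dataclass

@dataclass
class DockedLigand:
    name: str
    docked_energy: float     # lower = better predicted binding (arbitrary units)
    heavy_atoms: int
    complementarity: float   # hypothetical 0..1 steric/surface complementarity
    has_key_hbond: bool      # hypothetical flag for a required 3D interaction

def select_subset(ligands, max_energy_per_atom=-1.0, min_complementarity=0.6):
    # Stage 2: drop poor structures using a size-corrected energy
    # (ASSUMED here to be energy per heavy atom) and complementarity.
    stage2 = [
        lig for lig in ligands
        if lig.docked_energy / lig.heavy_atoms <= max_energy_per_atom
        and lig.complementarity >= min_complementarity
    ]
    # Stage 3: keep only poses that show the key 3D binding interaction.
    stage3 = [lig for lig in stage2 if lig.has_key_hbond]
    # Final ranking by DockedEnergy, most favourable first.
    return sorted(stage3, key=lambda lig: lig.docked_energy)

# Hypothetical docking results.
ligands = [
    DockedLigand("A", -30.0, 20, 0.8, True),
    DockedLigand("B", -45.0, 50, 0.9, True),   # good raw energy, poor per atom
    DockedLigand("C", -28.0, 18, 0.4, True),   # poor complementarity
    DockedLigand("D", -35.0, 25, 0.7, True),
    DockedLigand("E", -33.0, 22, 0.9, False),  # lacks the key interaction
]

selected_names = [lig.name for lig in select_subset(ligands)]
print(selected_names)
```

Ligand B shows why a size correction matters: it has the best raw DockedEnergy in the set but is removed once the energy is normalised by size.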
Addressing More Difficult Cases - COX2
Knowns show clustering in property space despite modest DockedEnergy
Improvements in Docking Function
• original docking function: some misdocked knowns
• new docking function: more consistent docking, +ve shift in random energies
Comparison of filters in subset selection
• Initial filtering to ~10%
  • energy filters
  • complementarity
  • 2D properties
• Selection of final ~1% subset
  • 3D structural features
  • preferred binding motifs
  • 2D/3D diversity
[Figure: Venn diagram of filter overlap. 87% pass 2D filters; 37% pass energy filters; 22% pass complementarity filters. Region values: 43%, 22%, 2%, 12%, 1%, 9%, 0%]
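The percentages on this slide appear to come from a three-set Venn diagram. The sketch below checks one possible reading of them. The assignment of each number to a region is an assumption (the extracted text does not say which value belongs where); it was chosen because it reproduces the quoted totals to within rounding.

```python
# Region percentages from the slide, with an ASSUMED assignment to regions.
# Keys are the sets of filters that compounds in that region pass.
regions = {
    frozenset({"2D"}): 43,
    frozenset({"2D", "energy"}): 22,
    frozenset({"energy"}): 2,
    frozenset({"2D", "energy", "complementarity"}): 12,
    frozenset({"energy", "complementarity"}): 1,
    frozenset({"2D", "complementarity"}): 9,
    frozenset({"complementarity"}): 0,
}

def total_passing(filter_name):
    """Sum every region that lies inside the given filter's circle."""
    return sum(pct for members, pct in regions.items() if filter_name in members)

totals = {name: total_passing(name) for name in ("2D", "energy", "complementarity")}
pass_all = regions[frozenset({"2D", "energy", "complementarity"})]
print(totals, pass_all)
```

Under this reading the energy and complementarity totals match the slide (37% and 22%), the 2D total comes to 86% against the quoted 87%, and the 12% passing all three filters lines up with "initial filtering to ~10%".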
Conclusions
• Established VS as a routine CAMD task
  • focused software development
  • achieved success in drug discovery projects
• VS is more than a black box
  • data mining is worthwhile
  • explore receptor-ligand complementarity to achieve good subset selection and point towards better scoring functions
Future Directions for VS
• Exploit expanding computing resource
  • improved docking/scoring functions
  • improved receptor representations
• Broader application of VS
  • evaluation of drugability of early targets
  • screening of very large virtual libraries
  • routine screening across protein families
  • DMPK issues
Acknowledgements
Tim Perkins, Martin Harrison, Richard Sykes, Carol Baxter, Richard Hall, Chris Murray, David Frenkel, Jin Li, David Sheppard
Thanks to: SGI, MSI, MDL, VA Linux
http://www.protherics.com/crunch/