
Know More Before You Score: An Analysis of Structure-Based Virtual Screening Protocols


maddox


Presentation Transcript


  1. Know More Before You Score: An Analysis of Structure-Based Virtual Screening Protocols
     • Structure-Based Virtual Screening (SBVS) is a proven technique for lead discovery
     • Still many areas for improvement
     • Efforts generally focussed on scoring function
     • Often with little consideration of the assumptions underpinning SBVS
     • Here we consider a number of these processes in detail from the perspective of our primary SBVS tool (DOCK)
       • Ligand conformational search protocols
       • Varying site point definitions
       • Alteration of DOCK variables that directly affect sampling
     • Determine their impact on hit enrichment and search speed
     • Analyze implications for future research

  2. Ligand Flexibility Studies: Strategy
     • SBVS CPU intensive
     • Conformational searching of ligand clearly important
     • Sampling limited to allow search completion in reasonable time frame
     • Test required to compare different conformational sampling methods
     • Ability to reproduce bioactive conformation tested
       • 145 ligands from a 1995 analysis of PDB complexes (Gschwend UCSF unpublished)
       • 30 compound subset chosen for analysis - selection based on visual and numerical inspection of diversity in ligand flexibility and functionality
     • Relatively small sample of molecules used, many peptidic in nature
     • Peptidic moieties are among the better parameterized systems, so this is in some ways a best case scenario

  3. Ligand Flexibility Studies: Procedure
     • Multiple sampling techniques chosen: Catalyst-best / Catalyst-fast / Confort / Omega / DOCK
     • Variety of sampling levels
     • Starting from Concord structure, conformers generated and superimposed onto PDB ligand conformation
     • Lowest heavy atom RMS to the PDB ligand conformation used as quality measure
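The quality measure on this slide can be sketched as follows. This is a minimal illustration, not the code used in the study: it assumes the conformers have already been superimposed onto the PDB ligand, that atoms are listed in the same order in every coordinate set, and that hydrogens are identified by the element symbol "H". The function names are hypothetical.

```python
import math


def heavy_atom_rmsd(elements, coords_a, coords_b):
    """RMS deviation over non-hydrogen atoms of two pre-aligned coordinate sets.

    Assumption: both coordinate lists follow the atom order given in `elements`.
    """
    pairs = [(a, b) for el, a, b in zip(elements, coords_a, coords_b) if el != "H"]
    if not pairs:
        raise ValueError("no heavy atoms to compare")
    total = sum((a[i] - b[i]) ** 2 for a, b in pairs for i in range(3))
    return math.sqrt(total / len(pairs))


def best_conformer(elements, reference, conformers):
    """Return (index, rmsd) of the conformer closest to the reference ligand."""
    scored = [
        (heavy_atom_rmsd(elements, conf, reference), i)
        for i, conf in enumerate(conformers)
    ]
    rmsd, index = min(scored)
    return index, rmsd
```

For each ligand, `best_conformer` would be called once per sampling method and level, and the returned RMS values compared across methods.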

  4. Ligand Flexibility Studies: Search Settings Employed
     • Dock - conformation_cutoff_factor=3/5/10, clash_overlap=0.7 (times vdW radius for clash overlap), with customized rules for bond increment settings
     • Confort - Rough (0.10 kcal) convergence, diverse conformer selection, boat ring search on - sampling at 5/10 confs per single bond + 500 max
     • Catalyst - Best/Fast default settings - sampling at 5/10 confs per single bond + 100 max
     • Omega - Defaults + RMS_CUTOFF=1.0, GP_ENERGY_WINDOW=5.0, sampling at 100 max
     • In addition, Concord generated and Sybyl minimized ligand x-ray structures also analyzed as “controls”
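One possible reading of "5/10 confs per single bond + N max" is a conformer budget that grows with the number of rotatable single bonds and is capped at a maximum. The sketch below encodes that reading only; the slide does not spell out the rule, so the formula, the function name and the one-conformer floor for rigid ligands are all assumptions.

```python
def conformer_budget(n_rotatable_bonds, confs_per_bond, max_confs):
    """Number of conformers to request under the assumed 'per bond, capped' rule."""
    if n_rotatable_bonds < 0 or confs_per_bond < 1 or max_confs < 1:
        raise ValueError("bond count must be >= 0; per-bond and max values >= 1")
    # A rigid ligand still needs its single input conformation.
    return max(1, min(confs_per_bond * n_rotatable_bonds, max_confs))
```

Under this reading, a ligand with 4 rotatable bonds at 5 confs per bond would get 20 conformers, while a ligand with 30 rotatable bonds would hit the 100 or 500 cap.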

  5. Ligand Flexibility Results: Overall Performance - RMS / Rank

  6. Ligand Flexibility Results: Performance vs Flexibility

  7. Ligand Flexibility Results: The Pain Gain Ratio
     • Does extra noise introduced to scoring functions outweigh this improvement?
     • Is it worth the extra CPU?

  8. Ligand Flexibility Results: Visual Analysis
     Panels: RMS=0.90, RMS=0.65
     • Even at lower RMS, deviation in hydrogen positions an issue
     • As RMS rises (0.9) we begin to see more significant deviations in heavy atom positions - large enough to possibly prove troublesome to standard force fields

  9. Ligand Flexibility Results: Visual Analysis
     Panels: RMS=2.19, RMS=1.55
     • As RMS rises further, hydrogen bond mapping begins to partially break down
     • Significant deviation begins to be seen although general shape complementarity is still reasonable
     • DOCKing tricky, pharmacophore searches possible with loose tolerances, although site point vector definitions (DISCO / Catalyst) a no-no

  10. Ligand Flexibility: Conclusions
     • At current sampling levels used in virtual screening
       • Rough search techniques perform comparably to more exhaustive methods
       • Dock performs quite well, and Fast does slightly better than comparable Best run
     • Results highlight the need for “forgiving” scoring functions and pharmacophore constraint tolerances (especially for flexible molecules)
       • Generating function directly from crystal structure data may not be optimum
       • Use the conformation closest to the biologically relevant structure with chosen sampling technique
     • May be better to ignore more flexible molecules when possible (~>8 bonds)
     • Analysis of more extensive data set might provide basis for determining if optimum sampling settings exist (Best/Omega/Confort)
       • Coarseness of poling values for example
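The suggestion to set aside the more flexible molecules could be applied as a simple pre-filter before docking. The sketch below is illustrative only: it assumes that "bonds" on the slide means rotatable bonds, that the count is already available for each ligand, and that the cutoff is inclusive at 8. The `Ligand` type and `drop_flexible` function are hypothetical names.

```python
from typing import NamedTuple


class Ligand(NamedTuple):
    name: str
    rotatable_bonds: int  # assumed to be precomputed by a cheminformatics tool


def drop_flexible(ligands, max_rotatable_bonds=8):
    """Keep only ligands at or below the flexibility cutoff (default from the slide's ~8)."""
    return [lig for lig in ligands if lig.rotatable_bonds <= max_rotatable_bonds]
```

The cutoff is left as a parameter because the slide gives it only approximately.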

  11. Structure-Based Search Protocols: An Analysis of DOCK
     • Working within current DOCK paradigm, what search protocols provide optimum search criterion?
       • Site point definitions
       • Alteration of sampling variables
       • Different scoring grids
     • Comparisons illustrated for 5 test systems with diverse active data sets
     • Analysis based on ranking within list that includes ~10000 “noise” compounds
       • “Random” selection within bounds of size and flexibility distribution seen in in-house database
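Ranking actives within a list padded with "noise" compounds is commonly summarized as an enrichment factor. The sketch below shows one standard way to compute it; the slides do not state which statistic was used, and the convention that a lower score is better is an assumption, as are the function and argument names.

```python
def enrichment_factor(scores, actives, top_n):
    """Ratio of actives found in the top_n ranks to the number expected by chance.

    scores:  dict mapping compound name -> docking score (assumed: lower is better)
    actives: set of compound names known to be active
    """
    ranked = sorted(scores, key=lambda name: scores[name])
    top_n = min(top_n, len(ranked))
    total_actives = sum(1 for name in ranked if name in actives)
    if top_n == 0 or total_actives == 0:
        raise ValueError("need at least one ranked compound and one active")
    hits_in_top = sum(1 for name in ranked[:top_n] if name in actives)
    expected_by_chance = total_actives * top_n / len(ranked)
    return hits_in_top / expected_by_chance
```

A value of 1.0 means the ranking is no better than random selection from the combined list of actives and noise compounds.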

  12. Structure-Based Search Protocols: DOCK variables
     • Contains many variables that affect performance
     • Ligand sampling within the site being the primary variant
       nodes                        3/4
       distance_tolerance           0.5/1.0
       distance_minimum             3.0
       bump_filter                  4
       conformation_cutoff_factor   5
       clash_overlap                0.7
       maximum_orientations         500/5000
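The slash-separated values above define a small grid of sampling protocols. The sketch below enumerates every combination as a plain dictionary. It is an illustration only: the slides do not say that all combinations were run, and the output is not a DOCK input file. The names `VARIED`, `FIXED` and `protocol_variants` are hypothetical.

```python
from itertools import product

# Values taken from the slide; parameters listed with two values are varied.
VARIED = {
    "nodes": [3, 4],
    "distance_tolerance": [0.5, 1.0],
    "maximum_orientations": [500, 5000],
}
FIXED = {
    "distance_minimum": 3.0,
    "bump_filter": 4,
    "conformation_cutoff_factor": 5,
    "clash_overlap": 0.7,
}


def protocol_variants(varied=VARIED, fixed=FIXED):
    """Yield one settings dict per combination of the varied parameters."""
    names = list(varied)
    for values in product(*(varied[name] for name in names)):
        settings = dict(fixed)
        settings.update(zip(names, values))
        yield settings
```

With the values shown this yields 2 x 2 x 2 = 8 candidate protocols, each of which could then be benchmarked against the same active and noise sets.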

  13. Structure-Based Search Protocols: DOCK and pharmacophoric constraints
     • It is possible to assign fairly sophisticated pharmacophoric (henceforth also known as chemical) definitions
     • Current types: heavy atom, donor, acceptor, hydrophobe, aromatic, aromatic_hydrophobic, acid, base, donor_and_acceptor, special (e.g. metal chelator)

       name acid
       # deprotonated carboxyl
       definition O.co2 ( C )
       # tetrazole
       definition N.pl3 ( H ) ( N.2 ( N.2 ( N.2 ( C.2 ) ) ) )
       definition N.pl3 ( H ) ( N.2 ( N.2 ( C.2 ( N.2 ) ) ) )
       definition N.2 ( N.2 ( N.2 ( C.2 ( N.pl3 ( H ) ) ) ) )
       definition N.2 ( N.2 ( C.2 ( N.pl3 ( H ) ( N.2 ) ) ) )
       definition N.2 ( C.2 ( N.2 ( N.pl3 ( H ) ( N.2 ) ) ) )
       definition N.2 ( N.2 ( C.2 ( N.2 ( N.pl3 ( H ) ) ) ) )
       definition N.2 ( N.pl3 ( H ) ( N.2 ( N.2 ( C.2 ) ) ) )
       # acyl sulphonamide
       definition N.am ( S ( 2 O.2 ) ) ( C.2 ( O.2 ) )
       definition O.2 ( C.2 ( N.am ( H ) ( S ( 2 O.2 ) ) ) )
       definition O.2 ( S ( O.2 ) ( N.am ( H ) ( C.2 ( O.2 ) ) ) )

  14. Structure-Based Search Protocols: Site Points Used in Kinase Search
     • Region 1 ( + 4): acceptor / donor
     • Region 2: Hydrophobic + 2 donors
     • Region 3: Hydrophobic / Any heavy atom

  15. Structure-Based Search Protocols: Test Sets and Site Points Used
     • Sphgen used to generate site points for “generic” DOCK searches
     • Pharmacophore points derived from a mixture of non-data set bound ligands and in-house programs that process GRID maps and Connolly surfaces (plus plenty of human intervention)
     • Active data sets broken down into chemotypes to prevent the problem of common analogue bias - an under-appreciated issue in all validations

  16. Results - fatty acid binding protein 1
     No. of hits after 7 chemotypes located by at least one search (500 compounds processed from 28 actives / 8 chemotypes)
     • Missing chemotype a citrazinate - not covered in chemical definitions - easy to fix - another advantage over electrostatics

  17. Results - Overall: Compounds processed for 50% Chemotype Coverage for All Systems
     • Addition of critical region constraint alone worsens results
       • 500 orientations per conformer too few for search - leads to premature termination of docking analysis for many ligands
     • Generic searches with addition of conformational flexibility little improvement relative to rigid search
       • signal to noise issues
     • Adding chemical in addition to critical constraints provides best balance for sampling parameters
       • still required reasonable tolerances and forgiving scoring function for optimum results
     • Rigid conformer screens perform quite well in generic search mode
       • One system contains predominantly rigid chemotypes, two others require a predominantly extended conformation for binding
       • On addition of critical and chemical constraints, inability of rigid search to adapt to more exacting requirements severely compromises results
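The "compounds processed for 50% chemotype coverage" measure can be sketched as a walk down the ranked hit list that counts distinct chemotypes rather than individual actives, which is what guards against common analogue bias. The exact bookkeeping used in the study is not given on the slides; the rounding rule (ceiling) and all names below are assumptions.

```python
import math


def compounds_for_coverage(ranked_names, chemotype_of, coverage=0.5):
    """Number of ranked compounds processed before `coverage` of chemotypes is found.

    ranked_names: compound names, best-scoring first
    chemotype_of: dict mapping each active's name -> its chemotype label
    Returns None if the requested coverage is never reached.
    """
    all_chemotypes = set(chemotype_of.values())
    if not all_chemotypes:
        raise ValueError("no chemotypes defined")
    needed = max(1, math.ceil(coverage * len(all_chemotypes)))
    found = set()
    for processed, name in enumerate(ranked_names, start=1):
        chemotype = chemotype_of.get(name)  # None for noise compounds
        if chemotype is not None:
            found.add(chemotype)
            if len(found) >= needed:
                return processed
    return None
```

Several analogues from one chemotype near the top of the list count only once here, so a protocol cannot score well simply by retrieving one heavily represented series.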

  18. Results: Sample Hit Rate Comparisons
     • Kinase sites tend to be highly mobile
       • Forgiving DOCK scoring function more appropriate
     • Fatty acid active site deep and fairly rigid
       • Prometheus at least comparable performance to DOCK even with more simplistic constraints

  19. Results: Sample Hit Rate Comparisons
     • Illustrates how addition of constraints can allow performance of simplistic scoring functions to surpass those deemed more sophisticated

  20. Results: Sample Hit Rate Comparisons
     • Removing highly flexible molecules from the search reduces the noise at the top of the hit list
     • In a database of 250000, the top 100 becomes top 2500
     • Could be crucial when only small data sets can be assayed
     • Smaller molecules generally make better leads

  21. Conclusions: The hypothesis hypothesis
     • Sampling choices have a profound effect on SBVS results
     • For maximum impact with current methodology, scoring functions should either
       • Be designed/utilized with these limitations in mind
         • Forgiving / targeted at less flexible molecules
       • Improve results by such a high degree that additional sampling (and CPU) is warranted
     • In the mean time, utility of pharmacophoric hypotheses {critical region(s) with pharmacophoric constraints} is clear
       • Better results faster
       • Less sensitivity to model coarseness
       • Allows constraints exploiting known structural biology
     • Key to optimum use is balancing constraints and tolerances to ensure sufficient sampling
       • Benchmarking with known ligands one way to do this

  22. Acknowledgements
     • Thank you to my BMS CADD colleagues
