Structure-Based Virtual Screening: New methods, Old Problems and “Ancient” Solutions

Structure-Based Virtual Screening: New methods, Old Problems and “Ancient” Solutions
• Structure-Based Virtual Screening (SVS) is a proven technique for lead discovery
• Still significant room for improvement
• Efforts generally focused on the creation of novel scoring functions
• In this presentation:
  • Present a novel technique for scoring function development
  • Highlight problems encountered
  • Illustrate the potential of pharmacophore constraints to mitigate some of these issues
  • Analyze implications for current and future SVS technology

Scoring Functions: Development Within an SVS Algorithm Framework
• We are interested in the top-ranking molecules from SVS
• The absolute value of the score itself is unimportant
• Alternative strategy: design the function around optimization of the active molecule's rank
• Accomplished using docking data from selected SVS algorithms
• No binding data required
• The complexed ligand should top the rank list
• Allows metrics that describe reasons for lack of binding
  • High-scoring docked inactives
• The effect of docking algorithm limitations can be better understood
• Optimization takes place within the same framework that will be used for SVS calculations
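The rank-based objective described above can be sketched in a few lines. This is a minimal illustration, not the author's code. It assumes each target site supplies the scores of all docked orientations plus the index of the active orientation. It also assumes a higher score ranks higher; DOCK-style energies, where lower is better, would need the sign flipped. Ties are counted against the active, which is a conservative choice made for this sketch.

```python
from typing import Sequence

def active_rank(scores: Sequence[float], active_index: int) -> int:
    """1-based rank of the active orientation (higher score ranks first).
    Decoys that tie with the active are placed ahead of it (pessimistic)."""
    active_score = scores[active_index]
    ahead = sum(1 for i, s in enumerate(scores)
                if i != active_index and s >= active_score)
    return ahead + 1

def mean_active_rank(targets) -> float:
    """targets: list of (scores, active_index) pairs, one per target site."""
    ranks = [active_rank(scores, idx) for scores, idx in targets]
    return sum(ranks) / len(ranks)

# Hypothetical example: two target sites, active orientation at index 0 in each
targets = [
    ([9.1, 7.5, 9.8, 3.2], 0),  # one decoy outscores the active -> rank 2
    ([5.0, 4.9, 1.0], 0),       # active tops the list -> rank 1
]
print(mean_active_rank(targets))  # 1.5
```

Minimizing `mean_active_rank` over a set of targets is the quantity a weight optimizer would work on. No binding affinities enter the calculation, only the position of the known binder in each list.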

Data Set Selection: Filtering
• Initial set of ~300 complexes extracted from the work of Böhm, Keske and Dixon
  • Gschwend et al. J. Mol. Recognit. 1996, 9, 175-186.
  • Böhm. J. Comput.-Aided Mol. Design 1994, 8, 243-256.
• To ensure diversity and quality, filters were applied:
  • Discard complexes with more than 50 heavy atoms, resolution > 2.5 Å, covalently bound ligands, or incompletely modeled structures
• The data are weighted towards specific targets, with many close analogues
  • If a general scoring function is required, these need to be filtered
• Initial efforts focused on removing all repeat targets
  • This proved too drastic: multiple complexes of the same target were kept as long as the ligand represented a unique chemotype (no analogues)
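The filtering rules above can be expressed as a single pass over complex records. The `Complex` fields, the complex identifiers and the `target`/`chemotype` labels are hypothetical. In practice, assigning chemotypes took manual judgment. The sketch keeps the first complex it sees for each (target, chemotype) pair, so sorting the input by resolution first would favor the best structures.

```python
from dataclasses import dataclass

@dataclass
class Complex:
    complex_id: str          # hypothetical identifier
    ligand_heavy_atoms: int
    resolution: float        # Å
    covalent: bool           # ligand covalently bound to the protein
    complete_model: bool     # False if the structure is incompletely modeled
    target: str              # hypothetical target label
    chemotype: str           # hypothetical chemotype label (e.g. a scaffold key)

def passes_quality_filters(c: Complex) -> bool:
    """Apply the size, resolution, covalency and completeness filters."""
    return (c.ligand_heavy_atoms <= 50 and c.resolution <= 2.5
            and not c.covalent and c.complete_model)

def filter_complexes(complexes):
    """Keep quality complexes, allowing repeat targets only for new chemotypes."""
    kept, seen = [], set()
    for c in complexes:
        if not passes_quality_filters(c):
            continue
        key = (c.target, c.chemotype)
        if key in seen:  # close analogue of a ligand already kept for this target
            continue
        seen.add(key)
        kept.append(c)
    return kept

complexes = [
    Complex("cpx1", 30, 2.0, False, True, "T1", "A"),
    Complex("cpx2", 28, 1.8, False, True, "T1", "A"),  # analogue of cpx1: dropped
    Complex("cpx3", 35, 2.2, False, True, "T1", "B"),  # same target, new chemotype: kept
    Complex("cpx4", 60, 1.5, False, True, "T2", "C"),  # > 50 heavy atoms: dropped
    Complex("cpx5", 25, 2.8, False, True, "T3", "D"),  # resolution > 2.5 Å: dropped
]
print([c.complex_id for c in filter_complexes(complexes)])  # ['cpx1', 'cpx3']
```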

Data Set Selection: Unexpected Oddities
• Odd interactions brought about by extreme crystallization conditions
  • 1rnt: acidic crystallization conditions (pH 5.0) produce an unusual protonation state
• Multiple points of crystal contact with symmetry-related molecules
  • 4gr1 makes more interactions with symmetry-related protein molecules than with the deposited structure
  • An extreme case, but the problem is significant enough for inclusion in Relibase: http://www.ccdc.cam.ac.uk/news/14_12_01.html

Scoring Function Development: Data Set Selection, the Final Tally
• Once all filters were applied, > 75% of complexes had been removed
• Highlights significant problems in generating a clean data set
  • An under-appreciated problem in scoring function development
• Need to analyze and exploit all available PDB data, including the most recently deposited structures
  • Requires much manual intervention
  • Poster 110 by Sadowski et al.
  • Poster 251 by Fenu et al.
• Final selection: 20 training set and 10 test set complexes

Scoring Function Development: Basic Strategy
• DOCK (4.0) the ligand data set (“active” ligand + molecular noise) into each active site
• Feed all docked orientations into the metric generator
• Use a GA and the stored metric data to simultaneously optimize the rank of the “active” orientations in each target site

Scoring Function Development: GA Implementation
• The GA optimizes the average rank of the “active” orientations within a data set of docked molecules and targets, using a linear scoring function:
  Score = a*metric1 + b*metric2 + ...
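A minimal GA of this kind might look like the sketch below. Only the linear score and the mean-rank objective come from the slides. The truncation selection, uniform crossover, Gaussian mutation, elitism, parameter values and synthetic metric data are illustrative assumptions, not details of the TEC/GA implementation. Elitism guarantees that the best mean rank never gets worse from one generation to the next.

```python
import random

def mean_active_rank(weights, targets):
    """Average 1-based rank of the active orientation over all target sites.

    targets: list of (metric_rows, active_index); each row holds the metric
    values of one docked orientation. Score = sum(w_i * metric_i), and a
    higher score ranks first (a convention assumed for this sketch)."""
    total = 0
    for rows, active in targets:
        scores = [sum(w * m for w, m in zip(weights, row)) for row in rows]
        a = scores[active]
        # decoys scoring at least as well as the active push it down the list
        total += 1 + sum(1 for i, s in enumerate(scores) if i != active and s >= a)
    return total / len(targets)

def optimize_weights(targets, n_metrics, pop_size=40, generations=50,
                     mutation_sd=0.3, seed=0):
    """Evolve metric weights (a, b, ...) that minimize the mean active rank."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(n_metrics)]
           for _ in range(pop_size)]
    history = []  # best mean rank in each generation
    for _ in range(generations):
        ranked = sorted(pop, key=lambda w: mean_active_rank(w, targets))
        best = ranked[0]
        history.append(mean_active_rank(best, targets))
        parents = ranked[: pop_size // 2]       # truncation selection
        children = [best[:]]                    # elitism: best survives unchanged
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            child = [rng.choice(genes) for genes in zip(p1, p2)]      # uniform crossover
            child = [g + rng.gauss(0.0, mutation_sd) for g in child]  # Gaussian mutation
            children.append(child)
        pop = children
    best = min(pop, key=lambda w: mean_active_rank(w, targets))
    return best, mean_active_rank(best, targets), history

# Synthetic data: 3 target sites x 100 orientations x 3 metrics; the active
# orientation (index 0) is made to stand out in metric 0 only.
data_rng = random.Random(42)
targets = []
for _ in range(3):
    rows = [[data_rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(100)]
    rows[0][0] += 3.0
    targets.append((rows, 0))
weights, fitness, history = optimize_weights(targets, n_metrics=3)
print(f"mean active rank: {history[0]} -> {fitness}")
```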

Scoring Function Development: Tests Run
• Three primary experiments undertaken:
  • Crystallographic ligand orientation (CLO) study: optimize the rank of the crystallographic orientation
  • Closest docked orientation (CDO) study: replace the CLO with the orientation closest to it produced on re-DOCKing the ligand binding conformer into the target site
  • Comparison of results with the standard DOCK scoring functions (contact / force field)
Figure: “Typical” CDO orientation compared with the CLO binding mode for 7est (heavy-atom RMS = 1.56 Å)
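CDO selection can be sketched as picking the docked pose with the lowest heavy-atom RMS deviation from the crystallographic pose. The sketch applies the 2.0 Å cutoff used in the CDO test. It assumes atoms are matched by index (same ligand, same atom order) and ignores symmetry-equivalent atoms. RMSD is computed in place without superposition, since docked and crystal poses share the receptor's coordinate frame. The function names are hypothetical.

```python
import math

def heavy_atom_rmsd(pose_a, pose_b):
    """In-place RMSD between two poses of one ligand, atoms matched by index."""
    if len(pose_a) != len(pose_b):
        raise ValueError("poses must have the same number of heavy atoms")
    sq = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
             for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b))
    return math.sqrt(sq / len(pose_a))

def closest_docked_orientation(crystal_pose, docked_poses, cutoff=2.0):
    """Return (index, rmsd) of the docked pose closest to the crystal pose,
    or None if no pose lies within `cutoff` Å (such ligands were removed)."""
    best = min(range(len(docked_poses)),
               key=lambda i: heavy_atom_rmsd(crystal_pose, docked_poses[i]))
    rmsd = heavy_atom_rmsd(crystal_pose, docked_poses[best])
    return (best, rmsd) if rmsd <= cutoff else None

crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
docked = [
    [(3.0, 0.0, 0.0), (4.5, 0.0, 0.0)],  # shifted 3 Å along x
    [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0)],  # shifted 1 Å along y
]
print(closest_docked_orientation(crystal, docked))  # (1, 1.0)
```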

Scoring Function Development: CLO Test
• High ranking of the CLO in both training and test sets (4/24000 to 37/22000)
• The clash descriptor scores highly
  • CLASH weights against ligand-protein bumps
  • Bumps are rare for the CLO but more common in DOCK orientations
  • Effectively acts as an indicator variable
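A clash descriptor that behaves like an indicator variable can be sketched as a count of ligand-protein heavy-atom contacts closer than a distance threshold. The 2.5 Å threshold, the plain coordinate lists and the function names are assumptions for illustration. A practical descriptor might instead scale the threshold by atom-type van der Waals radii.

```python
import math

def clash_count(ligand, protein, min_dist=2.5):
    """Number of ligand-protein heavy-atom pairs closer than min_dist Å.
    The 2.5 Å default is a hypothetical threshold for this sketch."""
    return sum(1 for a in ligand for b in protein if math.dist(a, b) < min_dist)

def clash_indicator(ligand, protein, min_dist=2.5):
    """1 if the pose contains any bump, else 0."""
    return int(clash_count(ligand, protein, min_dist) > 0)

ligand = [(0.0, 0.0, 0.0)]
protein = [(2.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(clash_count(ligand, protein), clash_indicator(ligand, protein))  # 1 1
```

Because crystallographic poses almost never contain such bumps, a weight on this term can separate CLOs from docked decoys almost by itself. The effect weakens once docked (CDO) poses replace the CLO.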

Scoring Function Development: CDO Test
• 4 training set compounds and 1 test set compound could not be docked within 2.0 Å RMS of the CLO
  • Removed from analysis
• Test results look less impressive, owing to docking inaccuracies
  • H-bond network breakdown
• The importance of the clash term drops significantly, as the CDO, unlike the CLO, often contains bumps

Scoring Function Development: Average Test Set Rank Comparisons
• CDO orientations in the CLO scoring function: average rank = 19337
• CLO orientations in the CDO scoring function: average rank = 75
• CDO performance is more robust, owing to reduced sensitivity to steric clashes
• CDO orientations with the DOCK contact score: average rank = 2690
• CDO orientations with the DOCK force field score: average rank = 16518
  • The all-atom model and 1/r^12 repulsion term are oversensitive to clashes
  • The contact score's user-controlled steric clash penalty permits sensitivity control
• Comparing the CDO scoring function with the contact score shows a slight improvement: average ranks = 2476 / 2690
  • H-bond/electrostatic terms add some additional resolution

Scoring Function Development: Conclusions
• Results highlight potential pitfalls in scoring function design
• More robust data sets required (c**p in, c**p out)
• Performance on crystallographic (Xtal) data is not necessarily representative of real-world SVS
  • CLO scoring function
• High-resolution descriptors are not always compatible with the binding modes of 1.0-2.0 Å accuracy often seen at current sampling levels
  • H-bond network breakdown even with near-hit binding modes
• Need to consider alternative scoring metrics
  • Lower-resolution descriptors / non-binding event measures
• Scoring and sampling are not separable problems
  • To take scoring functions to the next level, we need to focus on SVS technology with more exhaustive sampling paradigms
  • Additional CPU is essential: distributed (grid-based) computing

Exploiting an Old Trick: SVS and Pharmacophore Constraints
• Another major scoring function failing: inability to differentiate H-bond/salt bridge strengths
  • H bonds are often measured by presence or absence
  • Salt bridges, despite their importance, are often ignored
• SVS searches are generally undertaken with a binding hypothesis in mind
  • Exploitation of known target structural biology
  • Scoring functions often struggle to incorporate such information
• Pharmacophore constraints provide an alternative, sampling-based paradigm to mitigate these issues

Pharmacophoric Constraints: DOCK Chemical Matching and Critical Regions
http://www.cmpharm.ucsf.edu/kuntz/dock4/html/Manual.47.html#pgfId=20180
• DOCK permits creation of user-defined pharmacophore elements
• When combined with critical regions, DOCK can simultaneously undertake thousands of binding-site-constrained pharmacophore searches
Sample acid site point definitions:
  # acyl sulphonamide definition
  O.2 ( C.2 ( N.am ( H ) ( S ( 2 O.2 ) ) ) )
  # deprotonated carboxyl definition
  O.co2 ( C )
In-house DOCK pharmacophore types: heavy atom, donor, acceptor, hydrophobe, aromatic, aromatic_hydrophobic, acid, base, donor_and_acceptor, special (e.g. metal chelator)
Figure: sample kinase site definition (Regions 1 + 2: acceptor / donor; Region 3: hydrophobic)
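The idea behind a binding-site pharmacophore constraint can be illustrated with a post-hoc filter. A pose is accepted only if every site point is matched, within some tolerance, by a ligand atom of a compatible pharmacophore type. This is a simplification of ours, not DOCK's implementation. DOCK enforces chemical matching and critical regions while generating orientations, which is where the speed-up comes from; this sketch only checks finished poses. The `SitePoint` class, the 1.5 Å tolerance, the region-to-type assignment and the coordinates are hypothetical. The type names follow the in-house list above.

```python
import math
from dataclasses import dataclass

@dataclass
class SitePoint:
    center: tuple             # (x, y, z) in Å
    allowed_types: frozenset  # pharmacophore types that may occupy this point
    radius: float = 1.5       # hypothetical matching tolerance in Å

def satisfies_constraints(pose, site_points):
    """pose: list of ((x, y, z), set_of_types), one entry per ligand atom.
    True only if every site point is matched by a compatible ligand atom."""
    for sp in site_points:
        matched = any(
            bool(types & sp.allowed_types) and math.dist(xyz, sp.center) <= sp.radius
            for xyz, types in pose
        )
        if not matched:
            return False
    return True

# Hypothetical kinase-like site: regions 1 and 2 polar, region 3 hydrophobic
sites = [
    SitePoint((0.0, 0.0, 0.0), frozenset({"acceptor", "donor_and_acceptor"})),
    SitePoint((5.0, 0.0, 0.0), frozenset({"donor", "donor_and_acceptor"})),
    SitePoint((0.0, 6.0, 0.0), frozenset({"hydrophobe", "aromatic_hydrophobic"}), radius=2.0),
]
pose = [((0.5, 0.0, 0.0), {"acceptor"}),
        ((5.0, 1.0, 0.0), {"donor"}),
        ((0.0, 5.0, 0.0), {"aromatic_hydrophobic"})]
print(satisfies_constraints(pose, sites))  # True
```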

Pharmacophoric Constraints: Comparison Test Sets
• 5 targets analyzed
• ~10000 noise molecules plus the active compound data set docked into each active site
• Enrichment analysis based on chemotypes rather than headline hit rate, to prevent active analogue bias
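One way to make enrichment chemotype-based is to count distinct active chemotypes, rather than individual actives, recovered in the top fraction of the ranked list. The slides do not give the exact formula used. The definition below is an assumption for illustration: the fraction of active chemotypes found, divided by the fraction of the database screened. The compound labels are also assumptions. A conventional per-compound enrichment is included for comparison.

```python
def chemotype_enrichment(ranked_ids, chemotype_of, fraction):
    """ranked_ids: all screened compound ids, best-scoring first.
    chemotype_of: dict mapping each active id to its chemotype label.
    Returns (active chemotypes found in top fraction / all active chemotypes) / fraction."""
    n_top = max(1, round(len(ranked_ids) * fraction))
    found = {chemotype_of[c] for c in ranked_ids[:n_top] if c in chemotype_of}
    return (len(found) / len(set(chemotype_of.values()))) / fraction

def hit_enrichment(ranked_ids, chemotype_of, fraction):
    """Conventional per-compound enrichment, shown for comparison."""
    n_top = max(1, round(len(ranked_ids) * fraction))
    hits = sum(1 for c in ranked_ids[:n_top] if c in chemotype_of)
    return (hits / len(chemotype_of)) / fraction

# Hypothetical list of 100 compounds: two analogues of chemotype X rank at the top,
# and the single chemotype-Y active ranks 51st.
chemotype_of = {"a1": "X", "a2": "X", "a3": "Y"}
ranked = (["a1", "a2"] + [f"n{i}" for i in range(48)] + ["a3"]
          + [f"n{i}" for i in range(48, 97)])
print(hit_enrichment(ranked, chemotype_of, 0.1))        # ~6.67, inflated by the analogue
print(chemotype_enrichment(ranked, chemotype_of, 0.1))  # 5.0
```

Finding a second analogue of an already-found series raises the per-compound figure but not the chemotype figure. This is the active analogue bias the chemotype-based analysis is meant to avoid.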

Averaged Chemotype Enrichments
• Constrained contact search enrichment stands out
• Force field performance is limited by the over-sensitivity to steric clashes noted earlier

Searches Across Different SVS Paradigms: Kinase Pocket
• Performance improves as the scoring function is simplified
  • Prometheus in particular is led astray by spurious H bonds
• Flexible site / inactivated form
  • Challenging target
• Constrained contact score performs best
  • Constraints could not be implemented in Prometheus and GOLD

Pharmacophoric Constraints: Conclusions
• Pharmacophores offer numerous attractive features in SVS
  • Improved hit rates
  • Binding orientations constrained by user hypotheses (known structural biology) to biologically relevant regions of space
  • For algorithms such as DOCK, large increases in search speed (typically 1-2 orders of magnitude)
• Simple scoring functions still have a role to play in SVS
  • More tolerant of errors in binding mode and of limitations in active site resolution

Acknowledgements
• Thank you to:
  • GA scoring function design: Ryan Smith, Dan Gschwend, Andrew Leach, Rod Hubbard
  • Pharmacophore searching: Tim Perkins, Dan Cheney, Doree Sitkoff, John Tokarski, Yi Li, Jonathan Mason
  • All my other BMS colleagues past and present
• TEC / GA source available to all interested parties: andrew.good@bms.com

Searches Across Different SVS Paradigms: Generic vs Constrained (*) Searches
• FAB protein 2: well-defined rigid pocket
  • Good SVS target
• All methods perform well
  • High percentage of chemotypes found
• In all cases, the constrained search outperforms its generic equivalent
