DATA MINING FOR SMALL MOLECULE ALLOSTERIC INHIBITORS Douglas R. Houston, University of Edinburgh
Introduction to Virtual Screening • Virtual High-Throughput Screening • Attempts to simulate the assay plate well in silico • Models the interaction between a small molecule and its protein receptor • Pros - Fast and cheap (compared to HTS) • Cons - Cannot be carried out without certain input data
Requirements for vHTS • DATA • Receptor and/or ligand structure(s) - both are best • If no ligand is present in the structure, the active site must be located • Chemical database - which compounds? • SOFTWARE • Massive choice, but almost everything available has been independently benchmarked • HARDWARE • A Linux cluster has become the most cost-effective option • WORKFLOW • The optimal choice is the most difficult to determine
CODASS • COmbining Docking And Similarity Searching • Utilises both major classes of in silico ligand discovery: • Structure-based methods • Docking • Ligand-based methods • Topology • Pharmacophoric • Descriptor
Requirements for vHTS • DATA • Receptor and/or ligand structure(s) - both are best • If no ligand is present in the structure, the active site must be located • Chemical database - which compounds?
CODASS data • Receptor structure • Ligand structure • Ligand binding conformation • Compound database
EDULISS Compound Database • An estimated 10⁶⁰ “drug-like” molecules are possible • The total number of compounds synthesised to date is estimated at ~50m • Combining the databases of several compound suppliers gives a total of ~5m compounds • Are so many compounds necessary? • Only a small proportion of chemical space consists of compounds that: • Can bind to biological targets • Are likely to be orally bioavailable • Are consistent with good pharmacokinetics • Will not give rise to toxicity
Enriching EDULISS • Rule of n • Lipinski rule of 5 • Oprea rule of 4 • Astex rule of 3 • All filter according to the same physicochemical properties, e.g. • H-bond donors (NH, OH) • H-bond acceptors (Heteroatoms) • cLogP • MW
Enriching EDULISS • Selected filters: Oprea “lead-like” • H-bond acceptors < 9 • H-bond donors < 6 • MW 200–480 • cLogP -4 to 4.2 • cLogS > -5 • Rotatable bonds < 11 • 1,000,000 “virtual compounds” can be screened in ~10 days with current hardware • (Figure: the filters reduce the library from 5,000,000 to 1,000,000 compounds)
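As a rough illustration of how such a property filter works, the sketch below applies the lead-like thresholds listed above to descriptors that are assumed to have been calculated already (for example by a cheminformatics toolkit). The `Descriptors` container, the compound names and their values are hypothetical, and treating the MW and cLogP range ends as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Descriptors:
    """Precomputed descriptors for one compound (hypothetical container)."""
    name: str
    hba: int        # H-bond acceptors
    hbd: int        # H-bond donors
    mw: float       # molecular weight (Da)
    clogp: float    # calculated logP
    clogs: float    # calculated log solubility
    rot_bonds: int  # rotatable bonds


def is_lead_like(d: Descriptors) -> bool:
    """Apply the Oprea lead-like thresholds from the slide; range ends assumed inclusive."""
    return (
        d.hba < 9
        and d.hbd < 6
        and 200 <= d.mw <= 480
        and -4 <= d.clogp <= 4.2
        and d.clogs > -5
        and d.rot_bonds < 11
    )


def filter_library(library: List[Descriptors]) -> List[Descriptors]:
    """Keep only the compounds that pass every threshold."""
    return [d for d in library if is_lead_like(d)]


if __name__ == "__main__":
    # Hypothetical compounds; descriptor values are illustrative only.
    library = [
        Descriptors("cmpd_A", hba=4, hbd=2, mw=320.4, clogp=2.1, clogs=-3.2, rot_bonds=5),
        Descriptors("cmpd_B", hba=10, hbd=1, mw=450.0, clogp=3.0, clogs=-4.0, rot_bonds=6),  # too many acceptors
        Descriptors("cmpd_C", hba=3, hbd=1, mw=520.7, clogp=4.8, clogs=-6.1, rot_bonds=12),  # fails MW, cLogP, cLogS, rotors
    ]
    print([d.name for d in filter_library(library)])  # ['cmpd_A']
```

Because every threshold is a cheap comparison, a filter of this kind can be run over millions of precomputed descriptor records before any docking is attempted.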
Requirements for vHTS • Hardware • A Linux cluster has become the most cost-effective option • Software • Massive choice, but almost everything available has been independently benchmarked
CODASS Hardware and Software • Hardware • 114-core Linux cluster • Software • Docking/scoring • LIDAEUS • Autodock • Vina • Scoring • X-Score • DSX • Pharmacophoric searching • UFSRAT • ROCS • Topological searching • Wiener index
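The Wiener index listed under topological searching is the sum of the shortest-path (bond-count) distances between every pair of heavy atoms in the hydrogen-suppressed molecular graph. A minimal stdlib sketch follows; the adjacency-list input format is an assumption, and how CODASS turns the index into a similarity measure is not described here.

```python
from collections import deque
from typing import Dict, List


def wiener_index(adjacency: Dict[int, List[int]]) -> int:
    """Sum of topological distances over all unordered pairs of atoms.

    `adjacency` maps each heavy-atom index to its bonded neighbours in a
    connected, hydrogen-suppressed molecular graph (assumed input format).
    """
    total = 0
    for source in adjacency:
        # Breadth-first search gives bond-count distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            atom = queue.popleft()
            for nbr in adjacency[atom]:
                if nbr not in dist:
                    dist[nbr] = dist[atom] + 1
                    queue.append(nbr)
        total += sum(dist.values())
    # Each pair was counted twice, once from each end.
    return total // 2


butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(wiener_index(butane), wiener_index(isobutane))  # 10 9
```

The more branched isomer gets the smaller index, which is why a single number of this kind can serve as a compact topological descriptor.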
Simulating protein-ligand interaction • Two separate problems: • Docking • Can the docking program find the correct ligand binding pose? • Scoring • Can the program’s scoring algorithm judge correctly which ligands bind more tightly?
Can docking software predict ligand binding conformation? • The PDBbind database contains 3,214 crystal structures of proteins in complex with ligands of known affinity • 228 high-quality structures drawn from these form the “core set” • PDBbind core set • 65 protein targets, each in complex with a high-, a medium- and a low-affinity ligand • An additional 33 protein targets, each in complex with one ligand
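The size of the core set follows directly from its composition, and the pose-prediction success rates on the next slide are fractions of that total:

$$N_{\text{core}} = 65 \times 3 + 33 \times 1 = 195 + 33 = 228, \qquad \frac{122}{228} \approx 0.54, \qquad \frac{141}{228} \approx 0.62$$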
Can docking software predict ligand binding conformation? • Results: • In 122 cases, Autodock's prediction of the binding pose was correct (RMSD < 2.0 Å) - 54% • In 141 cases, Vina's prediction of the binding pose was correct (RMSD < 2.0 Å) - 62% • In those cases where the Autodock and Vina predictions were within 2 Å RMSD of each other, both predictions were correct in 82% of cases • Conclusion: Only compounds for which the binding pose predictions of Autodock and Vina agree should be considered • “PoseMatch” can be introduced into the vHTS workflow • Figures: Structure of RNA-dependent RNA polymerase in complex with a thiophene-based non-nucleoside inhibitor (PDB ID: 2D3U); Structure of O-GlcNAcase in complex with the inhibitor NButGT (PDB ID: 2VVS)
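The PoseMatch criterion can be sketched as an RMSD check between the two programs' poses of the same compound. The version below assumes that both poses list the same heavy atoms in the same order and share the receptor's coordinate frame, so no superposition or symmetry correction is applied; the function names are hypothetical and this is not the actual PoseMatch code.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def pose_rmsd(pose_a: Sequence[Coord], pose_b: Sequence[Coord]) -> float:
    """RMSD (Å) between two poses of the same ligand in the same receptor frame.

    Assumes identical atom ordering; no superposition or symmetry correction.
    """
    if not pose_a or len(pose_a) != len(pose_b):
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(pose_a, pose_b))
    return math.sqrt(squared / len(pose_a))


def poses_agree(pose_autodock: Sequence[Coord], pose_vina: Sequence[Coord],
                cutoff: float = 2.0) -> bool:
    """PoseMatch-style consensus (hypothetical helper): are the poses within `cutoff` Å RMSD?"""
    return pose_rmsd(pose_autodock, pose_vina) < cutoff


# Toy three-atom poses: the second is shifted by 1 Å, the third by 3 Å.
pose_1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
pose_2 = [(x, y, z + 1.0) for x, y, z in pose_1]
pose_3 = [(x, y, z + 3.0) for x, y, z in pose_1]
print(pose_rmsd(pose_1, pose_2), poses_agree(pose_1, pose_2), poses_agree(pose_1, pose_3))
# 1.0 True False
```

In a workflow, a check like `poses_agree` would be applied to each compound's top-ranked Autodock and Vina poses, passing only the agreeing compounds on to rescoring.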
Can docking/scoring software predict ligand binding affinity? • Vina and Autodock both attempt to predict affinity • X-Score and Drugscore are standalone scoring algorithms • How well do the predictions made by these programs correlate with measured affinities? • Correct docking poses were ranked by the various algorithms • Predicted rankings were compared with the known ranking using Spearman's rank correlation coefficient (ρ)
Can docking/scoring software predict ligand binding affinity? • ρ = 1: perfectly correlated • ρ = 0: uncorrelated • ρ = -1: perfectly anticorrelated • Results: • Autodock: 0.57 • Vina: 0.66 • X-Score: 0.71 • DSX: 0.64
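Spearman's ρ is the Pearson correlation computed on ranks rather than raw values, with tied values sharing the mean of their ranks. The stdlib sketch below implements that definition (in practice `scipy.stats.spearmanr` does the same job); the affinity and score values in the usage lines are invented for illustration and are not taken from the benchmark. Scores for which lower means tighter binding (e.g. energies) should be negated before comparison with measured affinities such as pKd.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # positions start..end hold ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least two values")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when all values in a sequence are tied")
    return cov / (sx * sy)


# Invented example: measured pKd for four complexes and one program's predicted scores.
measured = [4.1, 5.3, 6.8, 8.2]
predicted = [3.9, 6.0, 5.5, 7.7]
print(round(spearman_rho(measured, predicted), 2))  # 0.8
```

Here the program swaps the order of the two middle complexes, which is enough to pull ρ down from 1 to 0.8.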
Can docking/scoring software predict ligand binding affinity? • Results • Autodock: 0.57 • Vina: 0.66 • X-Score: 0.74 • DSX: 0.64 • Comparative assessment of scoring functions on a diverse test set. Cheng T, et al. J Chem Inf Model. 2009 Apr;49(4):1079-93.
Requirements for vHTS • Workflow • The optimal choice is the most difficult to determine
Which workflow is optimal? • Quality of results is dependent on: • Which programs are selected • The order in which the programs are used • Program selection and sequence depends on the starting data available • Receptor structure? • Ligands? • Ligand binding location? • Ligand binding conformation?
Example 1 - Ligands known but no target structure • Ligand-based methods only • EDULISS: searchable database of 5m commercially available 3D compounds • Generate conformers • CombiSearch: a battery of similarity search algorithms creates a database tailored to your target • Cluster • Visual analysis • Purchase & test
Example 2 - Target structure available but no ligands • Structure-based methods only • EDULISS: searchable database of 5m commercially available 3D compounds • Generate conformers • LIDAEUS: parallelised rigid-body docking; screen millions of compounds in hours • Vina: parallelised fast flexible docking • Autodock: parallelised rigorous flexible docking that models electrostatics • PoseMatch: binding modes are compared - matches are 90% likely to be correct • X-Score + DSX: docked poses are graded using the best scoring algorithms currently available • Visual analysis • Purchase & test • Numbers shown on the slide: 5m compounds, 25m conformers, 0.5m, 100k, 5k and 100 • Also shown: CombiSearch - a battery of topological and pharmacophoric search algorithms creates a database tailored to your target
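The ordering principle in these workflows is a funnel: the cheapest method sees the whole library and each more expensive method only scores the survivors of the previous stage. The sketch below captures that control flow with top-k selection at each stage; the scoring functions are toy placeholders standing in for LIDAEUS, Vina and Autodock (no real program is called), and the `keep` sizes are illustrative rather than the values used by CODASS.

```python
from typing import Callable, List, Sequence, Tuple

# (stage label, scoring function where higher = better, number of survivors to keep)
Stage = Tuple[str, Callable[[str], float], int]


def run_funnel(library: Sequence[str], stages: Sequence[Stage]) -> List[str]:
    """Run stages in order; each scores only the previous stage's survivors and keeps its top-k."""
    survivors = list(library)
    for label, score, keep in stages:
        survivors = sorted(survivors, key=score, reverse=True)[:keep]
        print(f"{label}: {len(survivors)} compounds kept")
    return survivors


# Toy placeholders standing in for real docking/scoring programs (nothing is actually docked).
def cheap_score(cpd: str) -> float:
    return -(int(cpd[1:]) % 37)


def flexible_score(cpd: str) -> float:
    return int(cpd[1:]) % 11


def rescore(cpd: str) -> float:
    return int(cpd[1:])


library = [f"c{i}" for i in range(1000)]
stages = [
    ("rigid-body docking (placeholder)", cheap_score, 100),
    ("flexible docking (placeholder)", flexible_score, 20),
    ("rescoring (placeholder)", rescore, 5),
]
hits = run_funnel(library, stages)
```

Putting the cheapest method first keeps the slower, more rigorous methods from ever seeing the bulk of the library.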
Example 3 - Target structure and ligands available • Both ligand- and structure-based methods • EDULISS: searchable database of 5m commercially available 3D compounds • Generate conformers • LIDAEUS: parallelised rigid-body docking; screen millions of compounds in hours • Vina: parallelised fast flexible docking • Autodock: parallelised rigorous flexible docking that models electrostatics • PoseMatch: binding modes are compared - matches are 90% likely to be correct • X-Score + DSX: docked poses are graded using the best scoring algorithms currently available • LIGAND STRUCTURES: pharmacophore search (100k) and topological search • Visual analysis • Purchase & test • Numbers shown on the slide: 5m compounds, 25m conformers, 0.5m, 100k, 5k and 100
Example 4 - Target structure available and ligand binding conformation known • Both ligand- and structure-based methods • EDULISS: searchable database of 5m commercially available 3D compounds • Generate conformers • LIDAEUS: parallelised rigid-body docking; screen millions of compounds in hours • Vina: parallelised fast flexible docking • Autodock: parallelised rigorous flexible docking that models electrostatics • PoseMatch: binding modes are compared - matches are 82% likely to be correct • X-Score + DSX: docked poses are graded using the best scoring algorithms currently available • LIGAND CONFORMATIONS: pharmacophore search (100k) • Visual analysis • Purchase & test • Numbers shown on the slide: 5m compounds, 25m conformers, 0.5m, 100k, 5k and 100
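For the shape/pharmacophoric search step, UFSRAT (listed among the CODASS tools) is an atom-typed extension of Ultrafast Shape Recognition (USR), which summarises a 3D conformer by the distribution of atomic distances from four reference points: the centroid, the atom closest to it, the atom farthest from it, and the atom farthest from that one. The sketch below implements plain USR, not UFSRAT. Implementations differ in how the second and third moments are normalised; this one assumes the standard deviation and the signed cube root of the third central moment.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def _moments(dists: List[float]) -> List[float]:
    """Mean, standard deviation and signed cube root of the third central moment."""
    n = len(dists)
    mean = sum(dists) / n
    var = sum((d - mean) ** 2 for d in dists) / n
    third = sum((d - mean) ** 3 for d in dists) / n
    skew = math.copysign(abs(third) ** (1 / 3), third)
    return [mean, math.sqrt(var), skew]


def usr_descriptor(coords: Sequence[Coord]) -> List[float]:
    """12-number USR-style shape descriptor for one 3D conformer."""
    n = len(coords)
    ctd = tuple(sum(c[k] for c in coords) / n for k in range(3))  # centroid
    cst = min(coords, key=lambda c: math.dist(c, ctd))  # atom closest to centroid
    fct = max(coords, key=lambda c: math.dist(c, ctd))  # atom farthest from centroid
    ftf = max(coords, key=lambda c: math.dist(c, fct))  # atom farthest from fct
    descriptor: List[float] = []
    for ref in (ctd, cst, fct, ftf):
        descriptor.extend(_moments([math.dist(c, ref) for c in coords]))
    return descriptor


def usr_similarity(d1: Sequence[float], d2: Sequence[float]) -> float:
    """1.0 for identical descriptors, falling towards 0 as the shapes diverge."""
    return 1.0 / (1.0 + sum(abs(a - b) for a, b in zip(d1, d2)) / 12.0)


linear = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
print(usr_similarity(usr_descriptor(linear), usr_descriptor(square)))
```

Because the descriptor uses only inter-point distances, it is unaffected by translating or rotating the conformer, so no alignment step is needed before comparison.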
CODASS vHTS against PYK • Human PYK is inactivated by T3 or phenylalanine • The crystal structure shows that Phe binds in a pocket distinct from the known active and effector sites
PYK “receptor” structure available, plus structure and binding conformation of the Phe ligand • EDULISS: searchable database of 5m commercially available 3D compounds • Generate conformers • LIDAEUS: parallelised rigid-body docking; screen millions of compounds in hours • Vina: parallelised fast flexible docking • Autodock: parallelised rigorous flexible docking that models electrostatics • PoseMatch: binding modes are compared - matches are 90% likely to be correct • X-Score + DSX: docked poses are graded using the best scoring algorithms currently available • Pharmacophore search (100k) • Visual analysis • 1000-2000 “virtual hits” • Purchase & test • Numbers shown on the slide: 5m compounds, 25m conformers, 0.5m, 100k, 5k and 100
CODASS vHTS against PYK • Creating a post-screening pharmacophoric filter
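The slide does not spell out how the post-screening pharmacophoric filter is built. One common form, sketched below, keeps a docked pose only if, for every pharmacophore point, some ligand atom of the matching feature type lies within a distance tolerance of that point. The feature types, positions and tolerances are hypothetical placeholders; they could, for example, be derived from a reference binding mode such as that of Phe in the allosteric pocket.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


@dataclass
class Feature:
    """One pharmacophore point (hypothetical): feature type, position in the pocket, tolerance in Å."""
    kind: str
    centre: Coord
    tolerance: float


@dataclass
class PoseAtom:
    """A ligand atom from a docked pose, annotated with its feature type."""
    kind: str
    xyz: Coord


def satisfies(pose: Sequence[PoseAtom], features: Sequence[Feature]) -> bool:
    """True if every feature is matched by a pose atom of the same type within tolerance."""
    return all(
        any(a.kind == f.kind and math.dist(a.xyz, f.centre) <= f.tolerance for a in pose)
        for f in features
    )


def pharmacophore_filter(docked: Dict[str, Sequence[PoseAtom]],
                         features: Sequence[Feature]) -> List[str]:
    """Names of compounds whose docked pose satisfies all pharmacophore features."""
    return [name for name, pose in docked.items() if satisfies(pose, features)]


features = [Feature("acceptor", (0.0, 0.0, 0.0), 1.0), Feature("aromatic", (4.0, 0.0, 0.0), 1.5)]
docked = {
    "ok": [PoseAtom("acceptor", (0.5, 0.0, 0.0)), PoseAtom("aromatic", (4.0, 1.0, 0.0))],
    "bad": [PoseAtom("acceptor", (2.0, 0.0, 0.0)), PoseAtom("aromatic", (4.0, 0.0, 0.0))],
}
print(pharmacophore_filter(docked, features))  # ['ok']
```

Applied after docking rather than before, a filter like this checks where the ligand actually sits in the pocket, not just whether it has the right groups somewhere.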
CODASS vHTS against PYK Summary: • 5 million commercially available compounds were docked into a known allosteric pocket of PYK • Lists of predicted binders were collated by merging the results of multiple affinity prediction methods • 20-30 compounds will be acquired and tested in an assay • What hit rate can we expect?
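How the prediction lists were merged is not specified. Rank averaging is one simple consensus scheme: each method ranks the compounds it scored, and compounds are re-sorted by their mean rank. The sketch assumes scores are oriented so that higher means tighter predicted binding (energies would be negated first); the method names and values are hypothetical.

```python
from typing import Dict, List, Tuple


def consensus_rank(scores: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Merge per-method scores into one list ordered by mean rank (1 = best).

    `scores` maps method name -> {compound: score}, higher = tighter predicted
    binding. Only compounds scored by every method are ranked.
    """
    common = set.intersection(*(set(per_method) for per_method in scores.values()))
    mean_rank = {cpd: 0.0 for cpd in common}
    for per_method in scores.values():
        # Best score first; ties broken alphabetically so the result is deterministic.
        ordered = sorted(common, key=lambda cpd: (-per_method[cpd], cpd))
        for rank, cpd in enumerate(ordered, start=1):
            mean_rank[cpd] += rank / len(scores)
    return sorted(mean_rank.items(), key=lambda item: (item[1], item[0]))


# Hypothetical scores; compound D is dropped because method_b did not score it.
scores = {
    "method_a": {"A": 7.1, "B": 6.2, "C": 5.0, "D": 4.0},
    "method_b": {"A": 3.0, "B": 4.5, "C": 1.0},
}
print(consensus_rank(scores))  # [('A', 1.5), ('B', 1.5), ('C', 3.0)]
```

Working with ranks rather than raw scores sidesteps the fact that different scoring functions report values on incompatible scales.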
CODASS hit rates • Virtual screening success rates using in-house tools: • Average hit rate: 49% • Compare with the 1% and 5% “industry standard” hit rates for HTS and vHTS, respectively
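As a back-of-envelope illustration only: if the in-house average hit rate p carried over to the 20-30 compounds to be tested (an assumption, since hit rates vary from target to target), the expected number of hits E = p·n would be:

$$E = p\,n: \qquad 0.49 \times 20 \approx 10, \quad 0.49 \times 30 \approx 15 \qquad \text{versus} \qquad 0.05 \times 30 = 1.5, \quad 0.01 \times 30 = 0.3$$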
Conclusion • CODASS can offer a cost-effective alternative to High-Throughput Screening for inhibitor discovery • CODASS is applicable to any target enzyme for which the structure is known
Acknowledgements • Dr. Hugh Morgan • Dr. Steve Shave • Dr. Paul Taylor • Prof. Malcolm Walkinshaw