Surflex: Fully Automatic Flexible Molecular Docking Using a Molecular Similarity-Based Search Engine. Ajay N. Jain UCSF Cancer Research Institute and Comprehensive Cancer Center, University of California Presentation by Susan Tang CS 379a January 23, 2006.
Protein-Ligand Docking Overview
Goal
- To predict how well a given set of ligands will bind to a protein structure
- To predict the structure of bound protein-ligand complexes
Components
- Search method: explores the different ways a ligand can interact/fit with the protein
- Scoring function: assigns a quantitative value to each ligand/protein fit
Protein-Ligand Docking Overview
Criteria
1) Docking accuracy: ability to find a conformation + alignment (pose) of a ligand in the protein that is close to the experimentally determined structure
2) Scoring accuracy: ability to rank a correct pose of a molecule higher than an incorrect one
3) Screening utility: ability to pick out the true ligands from a set that also contains non-binding molecules, while keeping false positives low
4) Speed: how fast the algorithm can screen a library of ligands
Surflex: A new docking methodology
• Combines Hammerhead’s empirical scoring function with a molecular similarity method to generate putative poses of ligand fragments
• Like Hammerhead, Surflex has one mode that uses an incremental construction search approach. Surflex also has a second mode, a whole molecule approach, that is faster and more accurate
• Surflex is designed primarily as a screening tool for small molecule libraries
Surflex: Computational Design
• Protomol Generation: first create an idealized active site ligand from the protein structure of interest
Input: (a) protein structure (b) list of residues that identify the protein active site
Output: a protomol, the target to which potential ligands or ligand fragments are aligned based on molecular similarity
Procedure:
1) Molecular fragments are placed into the protein binding site in multiple positions
2) Each placement is optimized for interaction with the protein
3) High-scoring, nonredundant fragments are selected
4) The selected fragments form the protomol
Surflex: Computational Design • Protomol for streptavidin compared with the native pose of biotin (green) • The bond being pointed to is broken by Surflex to make fragments of biotin for docking.
Surflex: Computational Design
• Docking: ligands are docked into the protein to optimize the scoring function
Input: (a) protein structure, (b) protomol, (c) ligand(s)
Output: the optimized poses of the docked ligands along with their corresponding scores
Procedure:
1) Divide the input ligand into 1-10 molecular fragments
2) Search the conformations of each fragment
3) Align each conformation of each fragment to the protomol to get poses with maximum molecular similarity to the protomol
4) Score the aligned fragments and keep those with the highest score and minimal protein interpenetration
5) Construct the full ligand molecule from the aligned fragments using either the incremental construction approach or the whole molecule approach
6) Refine the conformation and alignment of the highest-scoring poses
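The step that keeps high-scoring fragment poses with minimal protein interpenetration can be sketched as a filter followed by a ranking. This is only a toy illustration: the slides do not say how Surflex combines the two criteria, so the `FragmentPose` class, the `max_penetration` cutoff, and the `keep` count are all hypothetical, and a higher score is assumed to be better.

```python
from dataclasses import dataclass


@dataclass
class FragmentPose:
    """Hypothetical record for one aligned fragment pose."""
    fragment_id: int
    score: float        # assumed: higher means a better fit
    penetration: float  # assumed: protein interpenetration, lower is better


def select_poses(poses, max_penetration, keep):
    """Drop poses that penetrate the protein too much, then keep the top scorers.

    Both `max_penetration` and `keep` are illustrative parameters, not
    values taken from Surflex.
    """
    allowed = [p for p in poses if p.penetration <= max_penetration]
    allowed.sort(key=lambda p: p.score, reverse=True)
    return allowed[:keep]


if __name__ == "__main__":
    candidates = [
        FragmentPose(0, score=6.1, penetration=0.2),
        FragmentPose(1, score=7.4, penetration=3.5),  # best score, but clashes
        FragmentPose(2, score=5.0, penetration=0.1),
        FragmentPose(3, score=5.8, penetration=0.4),
    ]
    for pose in select_poses(candidates, max_penetration=1.0, keep=2):
        print(pose)
```

The kept poses would then feed the construction step (incremental or whole molecule), which this sketch does not attempt to model.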
Surflex: Computational Design
Incremental Construction vs. Whole Molecule Algorithm
Incremental Construction
- makes the strong assumption that maximizing the similarity of tiny fragments to the protomol will generate good poses
Whole Molecule Algorithm
- bypasses the strong independence assumption made in incremental construction
- “dead” pieces are carried along with the “live” piece during the conformation search
- when creating putative poses against the protomol, the “dead” pieces, in their arbitrary initial conformation, are carried into the molecular similarity computation
- poses with the worst protein interpenetration are eliminated
- the remaining poses are scored on the basis of individual fragments
- a recursive search yields whole molecules that consist of fragments selected from different docked poses
- these whole molecules score well in total, over all fragments
Surflex: Computational Design • Illustrates the process of docking biotin to streptavidin (blue) • Gray indicates the “live” fragment • Magenta indicates the “dead” fragment • Green lines show the result of merging the two well-docked fragments at the atoms indicated by yellow circles • The merged pose closely follows the parent fragments’ original configurations
Surflex: Evaluation
• Evaluation of the reliability and accuracy of dockings
- Comparison with experimental results on 81 protein/ligand pairs
- The pairs were selected to represent structural diversity
• Evaluation of Surflex’s utility as a screening tool
- Performed on 2 protein targets (thymidine kinase and estrogen receptor)
- Competing docking methods (GOLD, DOCK, FlexX) were tested side by side on the same data set for comparison
• Evaluation of Surflex’s docking speed
- Investigates the relationship between docking time and the number of rotatable bonds
Surflex: Evaluation
Data Set Construction
134 protein-ligand complexes* → filter → 81 protein-ligand complexes
Filtering Criteria:
• 15 or fewer rotatable bonds: most small molecules have <= 15 rotatable bonds
• No covalent attachments between ligand and protein: Surflex’s scoring function was developed strictly on noncovalent complexes
• Ligands with no obvious errors in structure: it is undesirable to modify an existing protein-ligand complex prior to testing
* data set used for the GOLD docking program
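The three filtering criteria can be written as a single predicate. The sketch below is a minimal illustration, not the procedure actually used to build the 81-complex set: the `Complex` record and its field names are hypothetical, and "obvious errors in structure" is reduced to a boolean flag that would in practice come from manual inspection.

```python
from dataclasses import dataclass

MAX_ROTATABLE_BONDS = 15  # cutoff stated on the slide


@dataclass
class Complex:
    """Hypothetical description of one protein-ligand complex."""
    name: str
    rotatable_bonds: int
    covalently_bound: bool   # True if the ligand is covalently attached
    ligand_has_errors: bool  # True if the ligand structure has obvious errors


def passes_filter(c: Complex) -> bool:
    """Apply the three criteria: flexibility, noncovalent binding, clean ligand."""
    return (
        c.rotatable_bonds <= MAX_ROTATABLE_BONDS
        and not c.covalently_bound
        and not c.ligand_has_errors
    )


def filter_complexes(complexes):
    """Return only the complexes that satisfy every criterion."""
    return [c for c in complexes if passes_filter(c)]


if __name__ == "__main__":
    # Made-up entries purely to exercise the filter.
    candidates = [
        Complex("example_a", 5, False, False),
        Complex("example_b", 22, False, False),   # too flexible
        Complex("example_c", 8, True, False),     # covalent attachment
        Complex("example_d", 15, False, False),   # exactly at the cutoff
    ]
    print([c.name for c in filter_complexes(candidates)])
```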
Surflex: Evaluation
Results
1) Evaluation of the reliability and accuracy of dockings
Describes how thorough the search procedure is and to what extent the scoring function can recognize good dockings
• Surflex returned a pose within 2.5 angstroms RMSD of the experimental structure in 94% of cases
• The BEST-scoring pose returned by Surflex was within 2.5 angstroms in 86% of cases
• With a single docking from a random initial pose, the chance of finding a correct or nearly correct pose averages ~70%
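The accuracy figures above rest on the root-mean-square deviation between a docked pose and the experimental ligand pose. A minimal sketch of that measure follows, assuming both poses list the same atoms in the same order and are already in the protein's coordinate frame (so no superposition is done); it also ignores ligand symmetry, which a real evaluation may need to handle. The function names are illustrative.

```python
import math


def rmsd(pose_a, pose_b):
    """RMSD in angstroms between two poses given as lists of (x, y, z) tuples.

    Assumes identical atom ordering and a shared coordinate frame.
    """
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b)
    )
    return math.sqrt(squared / len(pose_a))


def fraction_within(rmsds, cutoff=2.5):
    """Fraction of docking cases whose RMSD is within the cutoff (2.5 A on the slide)."""
    if not rmsds:
        raise ValueError("need at least one RMSD value")
    return sum(r <= cutoff for r in rmsds) / len(rmsds)


if __name__ == "__main__":
    docked = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    crystal = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
    print(rmsd(docked, crystal))                     # every atom is shifted by 1 A
    print(fraction_within([1.0, 3.0, 2.5, 4.0]))     # made-up RMSD values
```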
Surflex: Evaluation
Results
2) Evaluation of Surflex’s utility as a screening tool
Tests the ability of the program to detect true positives against a background of random molecules (sensitivity vs. specificity)
• Surflex had a true positive (TP) rate of > 80% at a false positive (FP) rate of < 1%
• Surflex had the best performance (lowest FP rate for a given TP rate) of the individual and combined methods assayed
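A figure such as "TP rate at a given FP rate" can be computed from two score lists, one for known ligands and one for the background molecules. The sketch below is one simple way to do it, assuming higher scores mean better predicted binding and that a compound counts as selected only if it scores strictly above the threshold; the function name and example scores are made up for illustration.

```python
def tp_rate_at_fp_rate(active_scores, decoy_scores, max_fp_rate):
    """True positive rate at the most permissive threshold whose
    false positive rate does not exceed `max_fp_rate`.

    Assumes higher score = better, and selection means score > threshold.
    """
    if not active_scores or not decoy_scores:
        raise ValueError("need at least one active and one decoy score")
    decoys_sorted = sorted(decoy_scores, reverse=True)
    # Number of decoys we may let through without exceeding the FP budget.
    allowed = int(max_fp_rate * len(decoys_sorted))
    if allowed >= len(decoys_sorted):
        return 1.0  # every compound may be selected
    # The first decoy that must be rejected sets the threshold.
    threshold = decoys_sorted[allowed]
    hits = sum(score > threshold for score in active_scores)
    return hits / len(active_scores)


if __name__ == "__main__":
    actives = [9.0, 8.0, 3.0]      # made-up scores for known ligands
    decoys = [7.0, 5.0, 2.0, 1.0]  # made-up scores for random molecules
    print(tp_rate_at_fp_rate(actives, decoys, max_fp_rate=0.25))
```

Sweeping `max_fp_rate` from 0 to 1 traces out the sensitivity vs. specificity curve used to compare methods.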
Surflex: Evaluation
Results
3) Evaluation of Surflex’s docking speed
Docking speed becomes very important when screening large compound libraries.
• Surflex’s docking time was approximately linear in the number of rotatable bonds
• Rigid molecules took a few seconds, and each additional rotatable bond added ~10 seconds
• Surflex had a mean running time of 44 seconds over the 81 protein-ligand complexes in the test set used earlier
• Docking speed ranges from 50-100 seconds per molecule for FlexX, DOCK, and GOLD (Surflex’s speed is comparable)
• Quantitative comparison across methods is difficult due to differences in hardware and methodology
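The roughly linear timing behavior can be turned into a back-of-the-envelope estimator. In the sketch below, the ~10 seconds per rotatable bond comes from the slide, while `base_seconds=3.0` is only an assumed stand-in for "a few seconds"; the function names are illustrative, and real timings depend on hardware and settings.

```python
def estimated_docking_seconds(rotatable_bonds, base_seconds=3.0, seconds_per_bond=10.0):
    """Linear estimate of docking time for one molecule.

    base_seconds is an assumption standing in for "a few seconds" for a
    rigid molecule; seconds_per_bond is the ~10 s figure from the slide.
    """
    if rotatable_bonds < 0:
        raise ValueError("rotatable_bonds must be non-negative")
    return base_seconds + seconds_per_bond * rotatable_bonds


def estimated_library_hours(rotatable_bond_counts, **kwargs):
    """Total estimated time, in hours, to dock a library one molecule at a time."""
    total = sum(estimated_docking_seconds(n, **kwargs) for n in rotatable_bond_counts)
    return total / 3600.0


if __name__ == "__main__":
    print(estimated_docking_seconds(0))   # rigid molecule
    print(estimated_docking_seconds(6))   # moderately flexible molecule
    print(estimated_library_hours([4] * 1000))
```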
Conclusions
• Surflex marks a step forward in flexible molecular docking programs
• Compared to the best docking methods available, Surflex is:
- as fast
- as accurate in terms of docked ligand RMSD
- much more accurate in terms of scoring
• Assaying the top-scoring 1% of compounds in the screening library should yield a large proportion of true positives
• Potential areas of improvement
- scoring and penetration terms should be combined into a single score
- scoring function should include training on non-binding ligands (negative examples)
- effect of nonbonded self-interactions within ligands should be accounted for explicitly
- allow a degree of protein flexibility (side chain movement)