Discrimination of near-native structures by clustering docked conformations and the selection of the optimal radius
D. Kozakov¹, K. H. Clodfelter², C. J. Camacho¹˒³, and S. Vajda¹˒²
¹ Department of Biomedical Engineering
² Program in Bioinformatics, Boston University
³ Current address: University of Pittsburgh
Why do we need clustering?
• Rigid-body docking methods sample a large set of conformations that uniformly cover the energy landscape
• Energy scoring functions alone are not enough to discriminate near-native structures:
  • unbound crystal structure conformations are not the same as those in solvent
  • solvation effects are difficult to estimate
• In such cases, the distribution of sampled conformations carries more information than any single conformation
What does clustering mean for docking?
• Low-energy conformations below a given threshold will cluster
• Clusters are representative of the energy minima
• The cluster in the native funnel should be the most populated
How to describe the clustering property?
• Δ characterizes the ratio of intra- to inter-cluster elements
• Δ = 1: data set is well separated
• Δ = 0: no clustering
• Δ > Δn: distribution carries cluster size information
• Optimal Radius (OR): first minimum with the largest Δ
Clustering Procedure
• The element with the maximum number of neighbors is chosen as the cluster centre.
• All elements within the optimal radius are included in the cluster.
• These elements are excluded, and the procedure repeats until all points are exhausted.
• The elements are redistributed to their closest cluster centre.
• The clusters are ranked by size.
• Clusters with fewer than 10 elements are ignored.
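The procedure above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `cluster_conformations`, the use of a precomputed pairwise distance matrix, and the tie-breaking rule (the first element with the maximum neighbor count wins) are all assumptions made for the example.

```python
import numpy as np

def cluster_conformations(dist, radius, min_size=10):
    """Greedy clustering sketch (hypothetical helper, not from the slides).

    dist     : (n, n) symmetric matrix of pairwise distances, e.g. RMSD in Å
    radius   : clustering radius (the optimal radius, OR)
    min_size : clusters with fewer elements are ignored (10 on the slide)
    Returns a list of (centre_index, member_indices), largest cluster first.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    remaining = np.ones(n, dtype=bool)
    centres = []

    # Pick the element with the most neighbours, remove everything within
    # the radius of it, and repeat until no points are left.
    while remaining.any():
        within = (dist <= radius) & remaining[None, :]
        counts = within.sum(axis=1)
        counts[~remaining] = -1          # already-assigned points cannot be centres
        centre = int(np.argmax(counts))  # assumption: ties go to the lowest index
        centres.append(centre)
        remaining &= ~(dist[centre] <= radius)

    # Redistribute every element to its closest cluster centre.
    centres_arr = np.array(centres)
    nearest = centres_arr[np.argmin(dist[:, centres_arr], axis=1)]
    clusters = [(c, np.flatnonzero(nearest == c).tolist()) for c in centres]

    # Rank by size and drop clusters below the minimum size.
    clusters.sort(key=lambda item: len(item[1]), reverse=True)
    return [(c, members) for c, members in clusters if len(members) >= min_size]
```

In practice `dist` would hold the pairwise RMSD between docked ligand conformations and `radius` would be the optimal radius chosen from the Δ profile.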
Application to Docking
• Rigid-body methods uniformly sample the placement of the ligand around a fixed receptor
• The best conformations are chosen based on shape complementarity and a simple energy score
• The total set of conformations considered is 2,000-20,000 in size
• We choose the N conformations with the lowest desolvation (ACP) energy and the 3N conformations with the lowest electrostatic energy (N = 50-500)
• A distance of 6-9 Å is the characteristic size of attractors from these potentials
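The selection step can be illustrated with a short sketch. The slide does not say how the two selections are combined, so merging them into a single set with duplicates removed is an assumption here, as are the function name `select_conformations` and the array inputs.

```python
import numpy as np

def select_conformations(desolvation, electrostatics, n):
    """Pick the conformations to cluster (hypothetical helper).

    desolvation, electrostatics : per-conformation energies, same ordering
    n                           : N on the slide (50-500)
    Returns sorted indices of the N lowest-desolvation and 3N
    lowest-electrostatics conformations; merging by union is an assumption.
    """
    desolvation = np.asarray(desolvation, dtype=float)
    electrostatics = np.asarray(electrostatics, dtype=float)
    lowest_desolvation = np.argsort(desolvation)[:n]
    lowest_electrostatics = np.argsort(electrostatics)[:3 * n]
    return np.union1d(lowest_desolvation, lowest_electrostatics)
```

The returned indices would then be used to build the pairwise distance matrix that the clustering procedure operates on.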
What do docking histograms look like?
• The OR measure is a property of the sampled energy landscape
Results
• Tested on the benchmark set of protein complexes
• A hit is the rank of the first (best-ranked) cluster whose center lies within 10 Å RMSD of the native bound conformation
• “Biggest cluster = native funnel” is supported
• Clusters serve as starting points for further refinement
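The hit definition can be written down as a small helper. This is only a sketch under stated assumptions: the name `hit_rank` is hypothetical, the input is assumed to be the RMSD of each cluster center from the native bound conformation listed in cluster-rank order (largest cluster first), and the 10 Å cutoff is treated as inclusive.

```python
def hit_rank(centre_rmsds, cutoff=10.0):
    """Return the 1-based rank of the first cluster whose centre is within
    `cutoff` Å RMSD of the native conformation, or None if there is no hit.

    centre_rmsds : RMSD values ordered by cluster rank (assumed input format)
    """
    for rank, rmsd in enumerate(centre_rmsds, start=1):
        if rmsd <= cutoff:  # assumption: the cutoff is inclusive
            return rank
    return None
```

A hit rank of 1 corresponds to the case where the most populated cluster lies in the native funnel.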
Acknowledgments
• Sandor Vajda
• Carlos Camacho