Explore different methods for aligning protein structures, including translation, rotation, and scaling. Learn how to measure alignment errors using distance functions like RMSD and bottleneck metric. Discover algorithms for finding optimal transformations and subsets of points for maximal congruence. Understand the quality assessment of alignment solutions through computational complexity analysis.
Protein Structure Alignment
• Human Hemoglobin alpha-chain (pdb:1jebA) vs. Human Myoglobin (pdb:2mm1): Sequence id: 27%, Structural id: 90%
• Another example, G-Proteins (1c1y:A, 1kk1:A6-200): Sequence id: 18%, Structural id: 72%
Transformations • Translation • Translation and Rotation • Rigid Motion (Euclidean Trans.) • Translation, Rotation + Scaling
Inexact Alignment. Simple case: two closely related proteins with the same number of amino acids. Assume the transformation T is given. Question: how do we measure the alignment error?
Distance Functions
• Two point sets: $A=\{a_i\},\ i=1,\dots,n$ and $B=\{b_j\},\ j=1,\dots,m$
• Pairwise correspondence: $(a_{k_1},b_{t_1}),\ (a_{k_2},b_{t_2}),\ \dots,\ (a_{k_N},b_{t_N})$
(1) Exact matching: $\|a_{k_i}-b_{t_i}\|=0$ for all $i$
(2) Bottleneck: $\max_i \|a_{k_i}-b_{t_i}\|$
(3) RMSD (Root Mean Square Distance): $\sqrt{\frac{1}{N}\sum_{i=1}^{N}\|a_{k_i}-b_{t_i}\|^2}$
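The bottleneck and RMSD measures above can be sketched in a few lines of Python. This is a minimal illustration, assuming the correspondence is given as a list of index pairs (k, t) into A and B; the function names `pair_distances`, `bottleneck` and `rmsd` are hypothetical and not taken from the slides.

```python
import math

def pair_distances(A, B, correspondence):
    """Euclidean distance ||A[k] - B[t]|| for each matched pair (k, t)."""
    return [math.dist(A[k], B[t]) for k, t in correspondence]

def bottleneck(A, B, correspondence):
    """Largest distance over all matched pairs."""
    return max(pair_distances(A, B, correspondence))

def rmsd(A, B, correspondence):
    """Square root of the mean squared distance over the N matched pairs."""
    d = pair_distances(A, B, correspondence)
    return math.sqrt(sum(x * x for x in d) / len(d))

# Toy data (assumed for illustration): B has one extra, unmatched point.
A = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
B = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (5.0, 5.0, 5.0)]
pairs = [(0, 0), (1, 1), (2, 2)]

print(bottleneck(A, B, pairs))  # worst single pair
print(rmsd(A, B, pairs))        # average behaviour over all pairs
```

On this toy input only the first pair is displaced (by 1), so the bottleneck value reports that worst case while the RMSD averages it over the three pairs.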
Correspondence is Unknown. Given two configurations of points in three-dimensional space, find those rotations and translations of one of the point sets which produce "large" superimpositions of corresponding 3-D points.
Largest Common Point Set (LCP) problem. Given e > 0 and two point sets A and B, find a transformation T and equally sized subsets A' (a subset of A) and B' (a subset of B) of maximal cardinality such that dist(A', T(B')) ≤ e.
• Bottleneck metric: optimal solution in $O(n^{32.5})$ (C. Ambuhl et al. 2000)
• RMSD metric: open problem
A 3-D reference frame can be uniquely defined by the ordered vertices of a non-degenerate triangle p1 p2 p3
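One possible way to build such a frame is sketched below. The convention is an assumption made for illustration, since the slide does not fix one: origin at p1, x axis along p1→p2, z axis normal to the triangle plane, and y = z × x. The helper names are hypothetical.

```python
import math

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def normalize(u):
    n = math.sqrt(dot(u, u))
    if n < 1e-12:
        # Coincident or collinear vertices give no unique frame.
        raise ValueError("degenerate triangle")
    return tuple(a / n for a in u)

def triangle_frame(p1, p2, p3):
    """Right-handed orthonormal frame from the ordered vertices p1, p2, p3."""
    x = normalize(sub(p2, p1))
    z = normalize(cross(x, sub(p3, p1)))
    y = cross(z, x)
    return p1, (x, y, z)

def to_local(point, frame):
    """Coordinates of `point` expressed in the triangle's frame."""
    origin, axes = frame
    d = sub(point, origin)
    return tuple(dot(d, axis) for axis in axes)

frame = triangle_frame((1, 1, 1), (3, 1, 1), (1, 4, 1))
print(to_local((1, 4, 1), frame))  # p3 lies in the local x-y plane
```

Expressing both molecules in the frames of two matched triangles is one way to obtain the rigid transformation that superimposes them. Because the vertices are ordered, swapping p2 and p3 yields a different frame.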
Structure Alignment (Straightforward Algorithm)
• For each pair of triplets, one from each molecule, that define 'almost' congruent triangles, compute the rigid transformation that superimposes them.
• Count the number of aligned point pairs. How? Maximum bipartite matching (bottleneck metric).
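The counting step can be illustrated as follows. This is a sketch under stated assumptions: the second point set has already been transformed, two points may be paired when their distance is at most a threshold `eps`, and the matching uses a simple augmenting-path search rather than a faster Hopcroft-Karp-style algorithm. The name `count_aligned_pairs` is hypothetical.

```python
import math

def count_aligned_pairs(A, B, eps):
    """Size of a maximum matching in the bipartite graph that links
    A[i] and B[j] whenever ||A[i] - B[j]|| <= eps."""
    neighbors = [[j for j, b in enumerate(B) if math.dist(a, b) <= eps]
                 for a in A]
    match_of_b = [None] * len(B)  # index into A matched to each B[j]

    def try_augment(i, visited):
        # Look for a free partner for A[i], re-routing earlier matches if needed.
        for j in neighbors[i]:
            if j in visited:
                continue
            visited.add(j)
            if match_of_b[j] is None or try_augment(match_of_b[j], visited):
                match_of_b[j] = i
                return True
        return False

    return sum(try_augment(i, set()) for i in range(len(A)))

# Toy data (assumed): a greedy pairing would give A[0] the point B[0]
# and leave A[1] unmatched; the augmenting step re-routes A[0] to B[1].
A = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
B = [(0.5, 0.0, 0.0), (1.2, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(count_aligned_pairs(A, B, eps=0.6))
```

Each point takes part in at most one pair, which is why a matching is used here instead of simply counting all close pairs.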
Complexity: $O(n^3 m^3) \cdot O(nm\sqrt{m+n})$. Can we say something about the quality of the final solution? YES! If there is an LCP of size L with error e, then the alignment method detects an LCP of size ≥ L with error 8e (M.T. Goodrich et al. 1994).
Superposition: best least squares (RMSD, Root Mean Square Deviation)
Given two sets of 3-D points $P=\{p_i\}$, $Q=\{q_i\}$, $i=1,\dots,n$:
$\mathrm{rmsd}(P,Q)=\sqrt{\sum_i |p_i-q_i|^2 / n}$
Find a 3-D rigid transformation $T^*$ such that:
$\mathrm{rmsd}(T^*(P),Q)=\min_T \sqrt{\sum_i |T(p_i)-q_i|^2 / n}$
A closed-form solution exists for this task. It can be computed in O(n) time.
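The slide does not say which closed form is meant. One well-known option is the SVD-based Kabsch method, sketched here with NumPy. The function name `superpose` and the test data are assumptions made for illustration.

```python
import numpy as np

def superpose(P, Q):
    """Return (R, t, rmsd) where the rigid motion p -> R @ p + t
    minimizes the RMSD between the transformed P and Q (Kabsch method)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    # 3x3 covariance of the centered point sets: one pass over the n points.
    H = (P - p_mean).T @ (Q - q_mean)
    U, _, Vt = np.linalg.svd(H)
    # Force det(R) = +1 so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    diff = P @ R.T + t - Q
    return R, t, float(np.sqrt((diff ** 2).sum() / len(P)))

# Toy check (assumed data): rotate P by 90 degrees about z, then translate.
P = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
R_true = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([5.0, -2.0, 1.0])
Q = P @ R_true.T + t_true

R, t, err = superpose(P, Q)
print(np.round(R, 6), np.round(t, 6), err)
```

Only the centroids and the 3×3 covariance matrix depend on n; the SVD is applied to a fixed-size matrix, which is consistent with the linear running time stated above.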
4-helix bundle 2cbl:A 1f4n:A 1rhg:A 1b3q
Sequence Order Independent Alignment: 2cbl:A, 1f4n, 1rhg:A, 1b3q. [Figure: aligned segments in chain A and chain B, labeled by residue number.]
The C2 domain calcium-binding motif • E. A. Nalefski and J. J. Falke, The C2 domain calcium-binding motif: Structural and functional diversity, Protein Sci 1996 5: 2375-2390
TRAF-Immunoglobulin Ensemble • Ensemble: 8 proteins from 2 folds. • Core: sandwich of 6 strands • Runtime: 21 seconds [Figure legend: helices; strands]