Protein Structure Alignment
• Human Hemoglobin alpha-chain (pdb:1jebA) and Human Myoglobin (pdb:2mm1). Sequence id: 27%, Structural id: 90%
• Another example: G-Proteins 1c1y:A and 1kk1:A6-200. Sequence id: 18%, Structural id: 72%
Transformations
• Translation
• Translation and Rotation: Rigid Motion (Euclidean Trans.)
• Translation, Rotation + Scaling
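The transformation classes listed above can be sketched in a few lines. This is a minimal illustration, not code from the slides: the function names are hypothetical, and the rotation is restricted to the z-axis only to keep the example short (a general rotation would use a full 3x3 matrix).

```python
import math

def translate(point, shift):
    """Translation: add a fixed shift vector to the point."""
    return tuple(p + d for p, d in zip(point, shift))

def rotate_z(point, angle):
    """Rotation about the z-axis by `angle` radians (illustrative special case)."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def rigid_motion(point, angle, shift):
    """Rigid (Euclidean) motion: rotation followed by translation."""
    return translate(rotate_z(point, angle), shift)

def similarity(point, angle, shift, scale):
    """Rotation, uniform scaling, then translation."""
    rotated = rotate_z(point, angle)
    return translate(tuple(scale * c for c in rotated), shift)
```

A rigid motion preserves every pairwise distance, while adding a scale factor `s` multiplies every distance by `s`. That is why rigid motions are the natural class for superimposing two structures.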
Inexact Alignment. Simple case: two closely related proteins with the same number of amino acids. Assume the transformation T is given. Question: how do we measure the alignment error?
Distance Functions
• Two point sets: $A=\{a_i\}$, $i=1,\dots,n$, and $B=\{b_j\}$, $j=1,\dots,m$
• Pairwise correspondence: $(a_{k_1},b_{t_1}), (a_{k_2},b_{t_2}), \dots, (a_{k_N},b_{t_N})$
(1) Exact matching: $\|a_{k_i} - b_{t_i}\| = 0$
(2) Bottleneck: $\max_i \|a_{k_i} - b_{t_i}\|$
(3) RMSD (Root Mean Square Distance): $\sqrt{\sum_i \|a_{k_i} - b_{t_i}\|^2 / N}$
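The three distance functions can be written directly from their definitions. This is a minimal sketch: the function names and the representation of the correspondence as a list of index pairs `(k, t)` are assumptions made for illustration.

```python
import math

def pair_distances(A, B, pairs):
    """Distances ||a_k - b_t|| for every (k, t) index pair of the correspondence."""
    return [math.dist(A[k], B[t]) for k, t in pairs]

def is_exact_match(A, B, pairs):
    """(1) Exact matching: every paired distance is zero."""
    return all(d == 0.0 for d in pair_distances(A, B, pairs))

def bottleneck(A, B, pairs):
    """(2) Bottleneck: the largest paired distance."""
    return max(pair_distances(A, B, pairs))

def rmsd(A, B, pairs):
    """(3) RMSD: square root of the mean squared paired distance."""
    d = pair_distances(A, B, pairs)
    return math.sqrt(sum(x * x for x in d) / len(d))

# Example: three pairs, only the middle one is displaced (by 1 along z).
A = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
B = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (0.0, 1.0, 0.0), (5.0, 5.0, 5.0)]
pairs = [(0, 0), (1, 1), (2, 2)]
```

With this data the bottleneck distance is 1 while the RMSD is $\sqrt{1/3}$: the bottleneck reports the worst pair, whereas RMSD averages a single outlier over all pairs.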
Correspondence is Unknown. Given two configurations of points in three-dimensional space, find the rotations and translations of one of the point sets that produce “large” superimpositions of corresponding 3-D points.
Largest Common Point Set (LCP) problem
Given $\varepsilon > 0$ and two point sets A and B, find a transformation T and equally sized subsets $A' \subseteq A$ and $B' \subseteq B$ of maximal cardinality such that $\mathrm{dist}(A', T(B')) \le \varepsilon$.
• Bottleneck metric: optimal solution in $O(n^{32.5})$ (C. Ambühl et al. 2000)
• RMSD metric: open problem
A 3-D reference frame can be uniquely defined by the ordered vertices of a non-degenerate triangle p1 p2 p3
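One way to build such a frame is sketched below. The specific construction is an assumption chosen for illustration (origin at p1, x-axis along p1→p2, z-axis along the triangle normal); the slides do not say which convention is used, and all function names are hypothetical.

```python
import math

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def unit(u):
    length = math.sqrt(dot(u, u))
    return tuple(a / length for a in u)

def triangle_frame(p1, p2, p3, eps=1e-9):
    """Right-handed orthonormal frame (origin, (ex, ey, ez)) from an ordered triangle."""
    edge = sub(p2, p1)
    normal = cross(edge, sub(p3, p1))
    if math.sqrt(dot(normal, normal)) < eps:
        raise ValueError("degenerate triangle: vertices are (nearly) collinear")
    ex = unit(edge)        # x-axis along the first edge
    ez = unit(normal)      # z-axis perpendicular to the triangle plane
    ey = cross(ez, ex)     # y-axis completes the right-handed frame
    return p1, (ex, ey, ez)

def to_local(point, frame):
    """Coordinates of `point` expressed in the given frame."""
    origin, axes = frame
    d = sub(point, origin)
    return tuple(dot(d, e) for e in axes)
```

Coordinates expressed in this frame do not change when the whole molecule is moved rigidly, which is what makes triangle frames useful for comparing two structures. The vertex order matters: permuting p1, p2, p3 gives a different frame.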
Structure Alignment (Straightforward Algorithm)
• For each pair of triplets, one from each molecule, that define ‘almost’ congruent triangles, compute the rigid transformation that superimposes them.
• Count the number of aligned point pairs. How? Maximal bipartite matching (bottleneck metric).
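The counting step can be sketched as a bipartite matching problem: connect point a_i to an already transformed point b_j whenever their distance is at most a threshold, then find a largest matching. The sketch below uses a simple augmenting-path matching, which is an assumption made for brevity and not necessarily the matching algorithm the slides have in mind. The names `count_aligned_pairs`, `B_transformed` and `eps` are hypothetical.

```python
import math

def count_aligned_pairs(A, B_transformed, eps):
    """Size of a maximum matching between A and B_transformed,
    where a_i and b_j may be paired only if ||a_i - b_j|| <= eps."""
    # Candidate partners of each a_i under the bottleneck threshold eps.
    neighbours = [
        [j for j, b in enumerate(B_transformed) if math.dist(a, b) <= eps]
        for a in A
    ]
    match_of_b = [None] * len(B_transformed)  # index of the a matched to each b

    def try_augment(i, visited):
        """Try to match a_i, re-routing earlier matches along an augmenting path."""
        for j in neighbours[i]:
            if j in visited:
                continue
            visited.add(j)
            if match_of_b[j] is None or try_augment(match_of_b[j], visited):
                match_of_b[j] = i
                return True
        return False

    return sum(1 for i in range(len(A)) if try_augment(i, set()))
```

A greedy nearest-neighbour pairing can undercount: if a_0 takes the only partner available to a_1, the augmenting path moves a_0 to another candidate so that both points end up matched. The recursion is fine for small point sets; large inputs would call for an iterative or faster matching algorithm.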
Complexity: $O(n^3 m^3) \cdot O(nm\sqrt{m+n})$.
Can we say something about the quality of the final solution? YES! If there is an LCP of size L with error $\varepsilon$, then the alignment method detects an LCP of size ≥ L with error $8\varepsilon$ (M. T. Goodrich et al. 1994).
Superposition: best least squares (RMSD, Root Mean Square Deviation)
Given two sets of 3-D points $P=\{p_i\}$, $Q=\{q_i\}$, $i=1,\dots,n$:
$\mathrm{rmsd}(P,Q) = \sqrt{\sum_i |p_i - q_i|^2 / n}$
Find a 3-D rigid transformation $T^*$ such that:
$\mathrm{rmsd}(T^*(P), Q) = \min_T \sqrt{\sum_i |T(p_i) - q_i|^2 / n}$
A closed-form solution exists for this task. It can be computed in O(n) time.
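The slide does not say which closed form is meant. One well-known option is the SVD-based solution usually attributed to Kabsch, and the sketch below states it under that assumption. The symbols $\bar p$, $\bar q$, $H$, $U$, $\Sigma$, $V$, $d$, $R$ and $t$ are introduced here for illustration and do not appear in the slides.

```latex
\begin{align*}
\bar p &= \tfrac{1}{n}\sum_{i=1}^{n} p_i, \qquad
\bar q = \tfrac{1}{n}\sum_{i=1}^{n} q_i
  && \text{(centroids)} \\
H &= \sum_{i=1}^{n} (p_i - \bar p)\,(q_i - \bar q)^{\top}
  && \text{($3 \times 3$ covariance matrix)} \\
H &= U \Sigma V^{\top}
  && \text{(singular value decomposition)} \\
d &= \operatorname{sign}\bigl(\det(V U^{\top})\bigr)
  && \text{(guards against a reflection)} \\
R &= V \operatorname{diag}(1, 1, d)\, U^{\top}, \qquad
t = \bar q - R\,\bar p \\
T^{*}(p) &= R\,p + t
\end{align*}
```

Computing the centroids and $H$ takes one pass over the $n$ point pairs each, and the SVD is of a fixed $3 \times 3$ matrix, so the total cost is linear in $n$, consistent with the O(n) bound stated above.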
4-helix bundle 2cbl:A 1f4n:A 1rhg:A 1b3q
Sequence Order Independent Alignment [Figure: aligned helix residue ranges for 2cbl:A, 1f4n, 1rhg:A and 1b3q, chains A and B]
The C2 domain calcium-binding motif • E. A. Nalefski and J. J. Falke. The C2 domain calcium-binding motif: Structural and functional diversity. Protein Sci 1996 5: 2375-2390
TRAF-Immunoglobulin Ensemble • Ensemble: 8 proteins from 2 folds • Core: sandwich of 6 strands • Runtime: 21 seconds [Figure: label "E-strand"; legend distinguishes helices and strands]