A New Algorithm for Protein Folding in the HP Model. Alantha Newman. Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms, pp. 876-884, 2002. Created by: Chia-Chang Wang. Date: Feb. 25, 2005.
Abstract • We consider the problem of protein folding in the HP model on the two-dimensional square lattice. This problem is combinatorially equivalent to folding a string of 0's and 1's so that the string forms a self-avoiding walk on the lattice and the number of adjacent pairs of 1's is maximized. We present a linear-time 1/3-approximation algorithm for this problem, improving on the previous best approximation factor of 1/4. The approximation guarantee of this algorithm is based on an upper bound used in all previous papers that address this problem.
Some Notations • 1) S = [s1, s2, s3, …, sn]. • 2) Odd-1: a 1 at an odd index. Even-1: a 1 at an even index. • 3) For every substring s = [sj, sj+1, …, sk] of S: O[s] is the number of odd-1’s in s, and E[s] is the number of even-1’s in s.
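As a small illustration of this notation, O[s] and E[s] can be computed in one pass. This is a minimal Python sketch, not taken from the paper; the function name `count_odd_even` and the `start` parameter (the index in S of the first character of s) are hypothetical, and indices are 1-based to match the slides.

```python
def count_odd_even(s: str, start: int = 1) -> tuple[int, int]:
    """Return (O[s], E[s]) for a 0/1 string s.

    `start` is the 1-based index in S of the first character of s,
    so that parity is always measured against positions in S.
    """
    odd = even = 0
    for index, char in enumerate(s, start=start):
        if char == "1":
            if index % 2 == 1:
                odd += 1   # a 1 at an odd index is an odd-1
            else:
                even += 1  # a 1 at an even index is an even-1
    return odd, even
```

For the string 110101101011 used later in the slides, this returns four odd-1's and four even-1's.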
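Assumptions • 1) The length of S is even. If not, we can add an extra 0 at the beginning or end of the string. • 2) The number of odd-1’s equals the number of even-1’s. If one kind appears more often than the other, we can arbitrarily change surplus 1’s of that kind to 0’s; every contact in the modified string is still a contact in the original string.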
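The two assumptions can be enforced by a short preprocessing step. The sketch below is one possible way to do it, under stated assumptions: the name `normalize` is hypothetical, the padding 0 is placed at the end, and the surplus 1's that get zeroed are simply the leftmost ones (the slides allow any choice).

```python
def normalize(s: str) -> str:
    """Pad s to even length, then zero out surplus 1's of the majority parity."""
    if len(s) % 2 == 1:
        s = s + "0"  # padding at the end leaves every original index unchanged
    chars = list(s)
    # Positions (0-based) of odd-1's and even-1's; parity uses 1-based indices.
    odd = [i for i, c in enumerate(chars) if c == "1" and (i + 1) % 2 == 1]
    even = [i for i, c in enumerate(chars) if c == "1" and (i + 1) % 2 == 0]
    surplus = odd if len(odd) > len(even) else even
    for i in surplus[: abs(len(odd) - len(even))]:
        chars[i] = "0"  # arbitrary choice: drop the leftmost surplus 1's
    return "".join(chars)
```

A string that already satisfies both assumptions is returned unchanged.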
Some Lemmas and Proofs • On the square lattice, an even-1 can only be adjacent to an odd-1, and vice versa. (Figure: square lattice whose points are labeled e and o in a checkerboard pattern around a 1.)
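The parity claim follows because every step of a lattice walk changes x + y by exactly one, so the parity of x + y alternates with the index. The following Python sketch checks this on a walk; the encoding of a walk as a string of U/D/L/R steps and both function names are assumptions made for illustration.

```python
def positions(moves: str) -> list[tuple[int, int]]:
    """Lattice points visited by a walk given as a string of U/D/L/R steps."""
    step = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}
    x, y = 0, 0
    points = [(x, y)]
    for move in moves:
        dx, dy = step[move]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


def adjacent_pairs_have_opposite_parity(moves: str) -> bool:
    """True if every two lattice-adjacent elements have indices of opposite parity."""
    pts = positions(moves)
    for i, (xi, yi) in enumerate(pts):
        for j, (xj, yj) in enumerate(pts):
            is_adjacent = abs(xi - xj) + abs(yi - yj) == 1
            if is_adjacent and (i - j) % 2 == 0:
                return False
    return True
```

The quadratic loop is only for clarity; it is a check of the claim, not part of the folding algorithm.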
Some Lemmas and Proofs (Cont.) Lemma 2.1: Let L be the closed loop of S (made by joining the endpoints). Any folding of L gives a folding of S with at least as many contacts. • Proof: breaking a folding of L at the joined edge gives a folding of the string, and every contact of the loop folding is still a contact.
Some Lemmas and Proofs (Cont.) Lemma 2.2: If a loop L has E[L] = O[L] (an equal number of odd-1’s and even-1’s), then there is an element si such that, going around L in one direction from si, every traversed substring satisfies O[si, si+1, …] >= E[si, si+1, …].
Some Lemmas and Proofs (Cont.) Proof 2.2: Let S = s1, s2, …, sn and define f(i) = O[s1…si] - E[s1…si]. The point p at which f(i) attains its minimum can be found in linear time. • Taking sp+1 as the starting point of loop L gives a new function f’(i) = f(p+i) - f(p) (indices taken cyclically) that is non-negative. (Reasoning: p is the point at which the even-1’s exceed the odd-1’s by the largest amount; starting right after it subtracts the minimum f(p) from every value, so no value drops below zero.)
Some Lemmas and Proofs (Cont.) Proof 2.2: For example, for the string 110101101011 we get p = 6. • (E[L] = O[L] => f(0) = f(n) = 0) • p can always be chosen even: • Assume p is odd, so p+1 is even. If sp+1 = 1, it is an even-1 and f(p+1) < f(p), contradicting the minimality of f(p). • If sp+1 = 0, then f(p+1) = f(p) and p+1 is even, so pick p+1 as p.
Some Lemmas and Proofs (Cont.) Define the new string s’ = s’1, s’2, …, s’n by cyclic rotation, with s’1 = sp+1. (This is why p must be even: the rotation then does not switch odd-1’s and even-1’s.) For the example, s’ = 101011110101. • Going from left to right: f(i) >= 0. • Going from right to left: f(i) <= 0.
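The search for p and the cyclic rotation can be sketched in Python as follows. This is an illustrative sketch rather than the paper's code: the function names are hypothetical, the input is assumed to have even length with O[S] = E[S], and the search is restricted to even indices, which the argument above shows is always possible.

```python
def prefix_differences(s: str) -> list[int]:
    """Return [f(1), ..., f(n)] with f(i) = O[s1..si] - E[s1..si]."""
    values, f = [], 0
    for i, char in enumerate(s, start=1):
        if char == "1":
            f += 1 if i % 2 == 1 else -1
        values.append(f)
    return values


def find_rotation_point(s: str) -> int:
    """Return the smallest even p (0 <= p < n) that minimizes f(p)."""
    best_p, best_f = 0, 0  # f(0) = 0
    for i, f in enumerate(prefix_differences(s), start=1):
        if i % 2 == 0 and i < len(s) and f < best_f:
            best_p, best_f = i, f
    return best_p


def rotate(s: str, p: int) -> str:
    """Cyclic rotation so that the new string starts at s_{p+1}."""
    return s[p:] + s[:p]
```

On the example string 110101101011 this gives p = 6 and the rotated string 101011110101, whose prefix differences are all non-negative. Both passes are linear in the length of the string.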
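Additional Notations • Bo(i): the substring following the (i-1)-th odd-1 up to and including the i-th odd-1. • Be(i): the substring following the (i-1)-th even-1 up to and including the i-th even-1. (Figure: the example S = 11010110100011 with the point p and the blocks Bo(2), Bo(3) and Be(1) marked.)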
The Algorithm • Starting point: (figure: lattice points labeled p-2, p-1, p, p+1). There are 4 cases: (a) |Be(i)| = 2 and |Bo(j)| = 2: i <- i+2, j <- j+2 (there are 3 contacts)
The Algorithm (Cont.) (b) |Be(i)| > 2 and |Bo(j)| > 2: i <- i+2, j <- j+2 (there are 3 contacts)
The Algorithm (Cont.) (c) |Be(i)| > 2 and |Bo(j)| = 2: i <- i+1, j <- j+2 (there are 2 contacts)
The Algorithm (Cont.) (d) |Be(i)| = 2 and |Bo(j)| > 2: i <- i+2, j <- j+1 (there are 2 contacts)
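The four cases amount to a small decision table on the two block lengths. The Python sketch below only selects the case and reports the index updates and contact count listed on the slides; it does not place anything on the lattice. The function name `classify` and the tuple layout are hypothetical.

```python
def classify(be_len: int, bo_len: int) -> tuple[str, int, int, int]:
    """Return (case, i_step, j_step, contacts) for |Be(i)| = be_len, |Bo(j)| = bo_len."""
    if be_len < 2 or bo_len < 2:
        # The slides only list block lengths of 2 or more.
        raise ValueError("block lengths below 2 are not covered by the four cases")
    if be_len == 2 and bo_len == 2:
        return ("a", 2, 2, 3)
    if be_len > 2 and bo_len > 2:
        return ("b", 2, 2, 3)
    if be_len > 2 and bo_len == 2:
        return ("c", 1, 2, 2)
    return ("d", 2, 1, 2)  # remaining case: be_len == 2 and bo_len > 2
```

For instance, `classify(4, 2)` selects case (c), which advances i by 1 and j by 2 and yields 2 contacts.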
The Algorithm (Cont.) Contacts Count: • Cases (a), (b): 3 contacts for every 2 odd-1’s. • A case (c) paired with a case (d): 4 contacts for every 3 odd-1’s. • Unpaired case (c): 1 contact for every odd-1. • * Cases (a), (b) and paired (c)-(d) together: at least 4 contacts for every 3 odd-1’s.
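Analysis Theorem: The algorithm finds at least M/3 contacts, i.e. it is a 1/3-approximation. (M = 2 min{ O[S], E[S] } = 2 O[S] = 2 E[S] is the upper bound on the number of contacts, up to an additive constant.) • The ‘K’ Assumption: • Assume there are k more case (c) folds than case (d) folds. • If not, we count the contacts of even-1’s instead. • (* Therefore: E[p-2, p-3, … i*] = O[p+1, p+2, … j*] - k, because each case (c) fold advances j by 2 but i by only 1, and each case (d) fold does the reverse.) • O[p+1, p+2, … j*] = number of odd-1’s in contacts, O[p-2, p-3, … i*] = number of odd-1’s not necessarily in contacts.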
Analysis (Cont.) • O[S] = O[p+1, p+2, … j*] + O[p-2, p-3, … i*] • By Lemma 2.2 (cyclic move): O[p-2, p-3, … i*] <= E[p-2, p-3, … i*]
Analysis (Cont.) • By Lemma 2.2 (cyclic move): O[p-2, p-3, … i*] <= E[p-2, p-3, … i*], hence O[S] <= O[p+1, p+2, … j*] + E[p-2, p-3, … i*] • By the ‘K’ Assumption: E[p-2, p-3, … i*] = O[p+1, p+2, … j*] - k
Analysis (Cont.) • Substituting the ‘K’ Assumption E[p-2, p-3, … i*] = O[p+1, p+2, … j*] - k into O[S] <= O[p+1, p+2, … j*] + E[p-2, p-3, … i*] gives: O[S] <= 2 O[p+1, p+2, … j*] - k
Analysis (Cont.) • Rearranging O[S] <= 2 O[p+1, p+2, … j*] - k: O[p+1, p+2, … j*] >= O[S]/2 + k/2 • Total Contacts: the 2k odd-1’s in the k unpaired case (c) folds give 1 contact each, and the remaining odd-1’s give at least 4 contacts for every 3, so Contacts >= (4/3) (O[p+1, p+2, … j*] - 2k) + 2k >= (4/3) (O[S]/2 - 3k/2) + 2k = 2 O[S]/3, which is one third of the upper bound 2 O[S] (up to an additive constant).
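The arithmetic in the last step can be checked with exact fractions. This Python sketch plugs the smallest value allowed for O[p+1, p+2, … j*], namely (O[S] + k)/2, into the contact count and confirms that k cancels; the function name `contact_lower_bound` is hypothetical.

```python
from fractions import Fraction


def contact_lower_bound(odd_total: int, k: int) -> Fraction:
    """Contact bound when O[p+1..j*] takes its minimum value (O[S] + k) / 2."""
    odd_in_contacts = Fraction(odd_total + k, 2)
    # 4 contacts per 3 odd-1's outside the unpaired (c) folds,
    # plus 1 contact for each of the 2k odd-1's inside them.
    return Fraction(4, 3) * (odd_in_contacts - 2 * k) + 2 * k
```

For every choice of O[S] and k the result equals 2 O[S]/3, so the bound does not depend on k.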
Analysis (Cont.) • Time Complexity: • Point p can be found in O(n) time. • Going over the string in both directions and folding it takes time proportional to the total length of the string, O(n). • The total time complexity of the algorithm is O(n).