Regular Expression Constrained Sequence Alignment
Abdullah N. Arslan
Assistant Professor, Computer Science Department
Outline
• Sequence alignment
• Common framework
• DP solution
• Why constrained?
• RE constrained sequence alignment
• Algorithm
• Concluding remarks
Dynamic Programming Solution
• H_{i,j}: maximum score achieved at (i, j), with H_{i,j} = 0 whenever i = 0 or j = 0
• H_{n,m} is computed in O(nm) time and O(m) space
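The recurrence itself is not preserved in this transcript, so the following is only a minimal sketch of a row-by-row DP that keeps the boundary condition stated above (H = 0 when i = 0 or j = 0) and uses O(m) space. The scoring parameters `match`, `mismatch`, and `gap` are assumptions chosen for illustration, not values from the talk.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return H[n][m] using two rows of length m + 1.

    Boundary cells H[i][0] and H[0][j] are 0, as stated on the slide.
    The scoring values are illustrative assumptions.
    """
    prev = [0] * (len(b) + 1)              # row i - 1
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)          # row i, curr[0] stays 0
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + s,   # align a[i-1] with b[j-1]
                          prev[j] + gap,     # a[i-1] against a gap
                          curr[j - 1] + gap) # b[j-1] against a gap
        prev = curr
    return prev[len(b)]
```

Only the previous row is kept, which is where the O(m) space bound comes from; recovering the alignment itself would need extra bookkeeping.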
DP Solution: Local Alignment
• H_{i,j}: similarity score achieved at (i, j), where S_{i,j} = 0 whenever i = 0 or j = 0
• max H_{i,j} is computed in O(nm) time and O(m) space
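As a hedged illustration of the local variant, the sketch below clamps every cell at 0 and reports the maximum cell value, in the usual Smith-Waterman style. The scoring parameters are assumptions for the example, not values from the talk.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return max over (i, j) of H[i][j] with linear gap penalties.

    Scores are illustrative assumptions; only two rows are stored.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0,                  # start a new local alignment
                          prev[j - 1] + s,
                          prev[j] + gap,
                          curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```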
Dynamic Programming Formulation: Affine Gap Penalties
• Penalty for a gap of length k is a + (k-1)b
• S_{i,j} = F_{i,j} = E_{i,j} = 0 when i = 0 or j = 0
• max H_{i,j} is computed in O(nm) time and O(m) space
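The affine-gap recurrences are missing from this transcript. A common way to realize the penalty a + (k-1)b is the three-matrix scheme sketched below, where E and F hold the best scores of alignments ending in a gap. This is an assumption about the intended formulation, and the parameter names `gap_open` (a) and `gap_extend` (b) and all default values are hypothetical.

```python
def local_affine_score(x, y, match=2, mismatch=-1, gap_open=3, gap_extend=1):
    """Local alignment score where a gap of length k costs
    gap_open + (k - 1) * gap_extend. Boundary values are 0, as on the slide.
    """
    m = len(y)
    h_prev = [0] * (m + 1)   # H for row i - 1
    f_prev = [0] * (m + 1)   # F for row i - 1 (gap in y, vertical move)
    best = 0
    for i in range(1, len(x) + 1):
        h = [0] * (m + 1)
        f = [0] * (m + 1)
        e = 0                # E[i][0] (gap in x, horizontal move)
        for j in range(1, m + 1):
            e = max(h[j - 1] - gap_open, e - gap_extend)
            f[j] = max(h_prev[j] - gap_open, f_prev[j] - gap_extend)
            s = match if x[i - 1] == y[j - 1] else mismatch
            h[j] = max(0, e, f[j], h_prev[j - 1] + s)
            best = max(best, h[j])
        h_prev, f_prev = h, f
    return best
```

With the defaults, aligning "AAAGGG" to "AAATTGGG" scores six matches (12) minus one gap of length 2 (3 + 1), giving 8.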
The Definition of the Constrained LCS Problem
• The constrained LCS (CLCS) problem
  • Given strings S1, S2, and P
  • Find an LCS of S1 and S2 such that P is a subsequence of this LCS
• Motivation:
  • Computing the homology of two biological sequences that have a specific part in common
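The definition above can be made concrete with a three-index table L[i][j][k]: the length of the longest common subsequence of the first i characters of S1 and the first j characters of S2 that contains the first k characters of P as a subsequence. The sketch below is one standard way to fill such a table in O(nmr) time; it is an illustration and not necessarily the exact formulation of any cited paper. The function name `clcs_length` is hypothetical.

```python
NEG_INF = float("-inf")

def clcs_length(s1, s2, p):
    """Length of the longest common subsequence of s1 and s2 that has p
    as a subsequence, or -inf if no such common subsequence exists."""
    n, m, r = len(s1), len(s2), len(p)
    # L[i][j][k] = -inf marks "no valid subsequence".
    L = [[[NEG_INF] * (r + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            L[i][j][0] = 0          # empty constraint; refined below for i, j > 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            for k in range(r + 1):
                if s1[i - 1] == s2[j - 1]:
                    best = L[i - 1][j - 1][k] + 1
                    if k > 0 and s1[i - 1] == p[k - 1]:
                        # the matched character also covers p[k-1]
                        best = max(best, L[i - 1][j - 1][k - 1] + 1)
                else:
                    best = max(L[i - 1][j][k], L[i][j - 1][k])
                L[i][j][k] = best
    return L[n][m][r]
```

For example, "xaby" and "abyx" have the unconstrained LCS "aby", but requiring P = "x" leaves only "x" itself.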
Constrained Sequence Alignment Problems
• Constrained LCS
  • Tsai 2003, O(n²m²r) time
  • Chin et al. 2004, Arslan and Egecioglu 2004
    • O(nmr) time
• Edit-distance constrained sequence alignment
  • Arslan and Egecioglu 2004, O(dnmr)
• Regular-expression constrained sequence alignment
  • Motivation:
    • Comet and Henry, 2002
    • PROSITE patterns
  • This paper
PROSITE patterns as constraints
• PROSITE patterns are
  • Regular expressions with no Kleene closure
• PROSITE database
  • e.g. [GA]-X(4)-G-K-[ST]
  • ATP/GTP-binding site motif A (P-loop) (PS00017)
• Comet and Henry reward alignments
• Regular expression constrained sequence alignment
  • Find a maximal alignment that includes a given RE
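To show why such patterns are plain regular expressions, the sketch below translates a small subset of PROSITE syntax into Python `re` syntax: single residues, `x` wildcards, `[...]` alternatives, `{...}` exclusions, and `(n)` or `(n,m)` repetitions. The helper name `prosite_to_regex` is hypothetical, and anchors and other PROSITE features are deliberately not handled.

```python
import re

# One pattern element: a residue, [set], or {excluded set}, plus optional (n) or (n,m).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern such as '[GA]-X(4)-G-K-[ST]'
    into an equivalent Python regular expression string."""
    parts = []
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError("unsupported pattern element: " + element)
        core, lo, hi = m.groups()
        if core in ("x", "X"):
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except these
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(core)
    return "".join(parts)

p_loop = prosite_to_regex("[GA]-X(4)-G-K-[ST]")
hit = re.search(p_loop, "MMGAAAAGKSLL")      # made-up sequence for illustration
```

Because no element repeats without bound, the resulting expression matches only strings of bounded length, which is the "no Kleene closure" property noted above.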
Some Details of Automata Construction
• Build an NFA N equivalent to a given RE R
• Construct from N a new N×N automaton
  • Moves on edit operations
  • (or equivalently on alignment columns)
• States have weights
• Interested in the weights of the final states after the alignment is complete
Weighted Automaton
• Initial weights are −∞
• Weight of (q0,q0) is initially 0
• Update new maximum scores at reachable states
• Weights remain −∞ in unreachable states
• What are the maximum weights at the final states?
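The weight update can be sketched for a single, already chosen alignment. In the sketch below a state of the product automaton is a pair (p, q); an alignment column (x, y) advances p on x and q on y, and a gap symbol "-" leaves the corresponding component in place. States missing from the dictionary stand for unreachable states. The transition table `delta`, the toy pattern "GK", and the column scores are all assumptions for illustration; the full algorithm would run such updates inside the DP over all alignments rather than along one fixed alignment.

```python
NEG_INF = float("-inf")

def product_step(weights, delta, column, score):
    """Advance every reachable pair state on one alignment column and
    keep the maximum weight per target state.

    weights: dict mapping (p, q) to its best score so far
    delta:   dict mapping (state, symbol) to a tuple of next states
    """
    x, y = column
    new_weights = {}
    for (p, q), w in weights.items():
        next_p = delta.get((p, x), ()) if x != "-" else (p,)
        next_q = delta.get((q, y), ()) if y != "-" else (q,)
        for p2 in next_p:
            for q2 in next_q:
                candidate = w + score
                if candidate > new_weights.get((p2, q2), NEG_INF):
                    new_weights[(p2, q2)] = candidate
    return new_weights

# Toy NFA for the pattern "GK": 0 --G--> 1 --K--> 2 (state 2 is final).
delta = {(0, "G"): (1,), (1, "K"): (2,)}

weights = {(0, 0): 0}                       # only (q0, q0) starts reachable
for column, score in [(("G", "G"), 2), (("K", "K"), 2)]:
    weights = product_step(weights, delta, column, score)
final_weight = weights.get((2, 2), NEG_INF)  # both components in the final state
```

If the second column were ("K", "-") instead, only the first component would reach its final state, and the pair (2, 2) would stay unreachable.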
Complexity
• Simulate the automata based on the DP solution
• Each step requires examining the transition functions
• Maintain a list of active (reachable) states
• Update state weights as alignments are formed
• Automaton M_{i,j} has the optimum weights
CONCLUSION
• Introduced the regular expression constrained sequence alignment problem
• Presented an algorithm for the problem
• Future work
  • Generalization of the problem for
    • Multiple sequence alignment
    • Multiple regular expressions as a constraint