A memory-efficient algorithm for multiple sequence alignment with constraints
Chin Lung Lu and Yen Pin Huang
National Chiao Tung University, Taiwan, Republic of China
Bioinformatics, Vol. 21, no. 1, 2005
Yutu Liu -- CPSC 689 Algorithmic Techniques for Biology, Spring 2005
Motivation
• Incorporate biological structure and consensus information into sequence alignment
• Memory efficient
Problem Formulation -- Constraints
• What is multiple sequence alignment with constraints?
• Constraints: conserved sites of a protein or DNA/RNA family
• No overlapping between them
[Figure: example gapped alignments of the DNA sequences A T C T C G C T and T G C A T A T]
Problem Formulation -- Constraints
• Hamming distance
• Approximately appears
[Figure: example on the sequence T G C A T A T, with labels G, A and the value 0.5]
Problem Formulation -- Constraints
• Given S = {s1, s2, …, sx} and string T = {t1, t2, …, tk}, for … T approximately appears in L
[Figure: alignment with band L and sub-band L'; Subseq(S2, L') is marked; example strings include T G C A T and T G C C C]
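The "approximately appears" test on the constraint slides can be sketched as a Hamming-distance check. This is a minimal sketch under an assumption: a pattern approximately appears in an equal-length segment when the Hamming distance is at most `eps` times the pattern length. The function names and the `eps` parameter are illustrative and are not taken from the slides.

```python
def hamming(u: str, v: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(u) != len(v):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(1 for x, y in zip(u, v) if x != y)


def approximately_appears(pattern: str, segment: str, eps: float) -> bool:
    """Assumed criterion: distance <= eps * len(pattern) (hypothetical threshold)."""
    if len(segment) != len(pattern):
        return False
    return hamming(pattern, segment) <= eps * len(pattern)
```

With the pattern `TGCAT` and the segment `TGCCC`, two positions differ, so the check passes for `eps = 0.5` and fails for `eps = 0.2`.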
Problem Formulation
• Constrained Multiple Sequence Alignment (CMSA)
• Optimal Sum-of-Pair Score
[Figure: sequences S1, S2, S3 with constraints C1, C2, C3; CPSA]
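The sum-of-pair score named on this slide adds up, over every pair of rows in the alignment, the column-by-column pairwise scores. The sketch below illustrates that definition; the scoring values (match 1, mismatch -1, gap -2, gap against gap 0) are assumptions chosen for the example, not values from the paper.

```python
from itertools import combinations


def pair_score(x: str, y: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score one aligned column of two rows (assumed scoring scheme)."""
    if x == "-" and y == "-":
        return 0
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def sum_of_pairs(rows: list[str]) -> int:
    """Sum the pairwise column scores over all pairs of alignment rows."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all alignment rows must have the same length")
    return sum(
        pair_score(x, y)
        for r1, r2 in combinations(rows, 2)
        for x, y in zip(r1, r2)
    )
```

For the three rows `AT-C`, `ATGC`, and `A-GC`, the three pairwise scores are 1, -2, and 1, giving a sum-of-pair score of 0 under these assumed values.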
CMSA
• Pick two sequences
• Find the CPSA
• Use it as a kernel to progressively align more sequences
[1] Progressive Multiple Alignment with Constraints, Gene Myers et al.
[2] MuSiC: A Tool for Multiple Sequence Alignment with Constraints, Yin Te Tsai, Chin Lung Lu, Ching Ta Yu, Yen Pin Huang
Algorithm Overview
• Divide-and-Conquer
• Find recursive relationship
[Figure: cell M(i,j), for ai and bj, and cell M(i-1,j-1), for ai-1 and bj-1]
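Because M(i,j) depends only on the current and the previous row, an alignment score can be computed while keeping two rows in memory. That observation is the usual starting point for divide-and-conquer (Hirschberg-style) alignment. The sketch below shows the generic unconstrained case and is not the authors' algorithm; the scoring values are assumptions.

```python
def alignment_score_linear_space(a: str, b: str, match: int = 1,
                                 mismatch: int = -1, gap: int = -2) -> int:
    """Global alignment score using two rows instead of the full table."""
    prev = [j * gap for j in range(len(b) + 1)]  # row i-1, starting with row 0
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)          # row i, column 0 is all gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[len(b)]
```

A divide-and-conquer method would run this pass forward on one half of a sequence and backward on the other half to find where the optimal path crosses the middle, then recurse on the two subproblems.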
Notation
[Figure: constraints C1, …, Ck, …, Cγ]
Alignment Score
[Figure: sequences A and B with constraints C1, C2, ..., Ck]
Alignment Score -- Substitution
[Figure: ai in A aligned with bj in B; constraints C1, C2, ..., Ck]
Alignment Score -- Deletion
[Figure: ai in A aligned with a gap (--) in B; constraints C1, C2, ..., Ck]
Alignment Score -- Insertion
[Figure: a gap (--) in A aligned with bj in B; constraints C1, C2, ..., Ck]
Semi-Constrained Alignment
[Figure: sequences A and B with constraints C1, C2, ..., Ck-1, Ck; offset h marked]
Recurrence of Scores
[Figure: constraint Ck; column labels ai-1, bj, and gaps (--)]
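[Figure: lattice of points (i, j, k) with axes Sequence A, Sequence B, and Constraints, running from (0, 0, 0) to (m, n, γ); point (i, j, k) is shown with neighbors (i-1, j-1, k), (i-1, j, k), and (i, j-1, k)]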
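The lattice of points (i, j, k) from (0, 0, 0) to (m, n, γ) suggests a three-index score table. The sketch below is one common way to write a constrained pairwise alignment recurrence over such a table, offered as an assumption because the exact recurrence is not recoverable from the slides. Here constraint k is satisfied when the substrings of A and B ending at i and j are each within `eps` times the constraint length of Ck in Hamming distance and are aligned to each other without gaps. The names `sigma`, `cpsa_score`, and `eps`, and the scoring values, are hypothetical.

```python
NEG = float("-inf")  # marks lattice points that cannot be reached


def sigma(x: str, y: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Assumed column score: substitution, deletion, or insertion."""
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def hamming(u: str, v: str) -> int:
    return sum(1 for x, y in zip(u, v) if x != y)


def cpsa_score(a: str, b: str, constraints: list[str], eps: float = 0.0) -> float:
    """Best score of aligning a and b with all constraints satisfied in order."""
    m, n, g = len(a), len(b), len(constraints)
    M = [[[NEG] * (g + 1) for _ in range(n + 1)] for _ in range(m + 1)]
    M[0][0][0] = 0
    for i in range(m + 1):
        for j in range(n + 1):
            for k in range(g + 1):
                if i == 0 and j == 0 and k == 0:
                    continue
                best = NEG
                if i > 0 and j > 0:  # from (i-1, j-1, k): substitution
                    best = max(best, M[i - 1][j - 1][k] + sigma(a[i - 1], b[j - 1]))
                if i > 0:            # from (i-1, j, k): deletion
                    best = max(best, M[i - 1][j][k] + sigma(a[i - 1], "-"))
                if j > 0:            # from (i, j-1, k): insertion
                    best = max(best, M[i][j - 1][k] + sigma("-", b[j - 1]))
                if k > 0:            # close constraint k with a gap-free band
                    c = constraints[k - 1]
                    lam = len(c)
                    if i >= lam and j >= lam:
                        sa, sb = a[i - lam:i], b[j - lam:j]
                        if hamming(sa, c) <= eps * lam and hamming(sb, c) <= eps * lam:
                            band = sum(sigma(x, y) for x, y in zip(sa, sb))
                            best = max(best, M[i - lam][j - lam][k - 1] + band)
                M[i][j][k] = best
    return M[m][n][g]
```

This version stores the full table, so it needs memory proportional to m · n · γ. Reducing that footprint is the stated goal of the memory-efficient algorithm.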
Assignment Email: alinux@tamu.edu
[Figure: constraints C1, …, Ck, …, Cγ, with Ck split at offset h into pref(Ck, h) and suff(Ck, λk - h)]
Discussion
• No proof that the constraints are consistent
• Optimally aligning pairs of subsequences may prevent the overall alignment from being optimal
Discussion http://genome.life.nctu.edu.tw:8080/MUSICME/index.html
Reference
• Efficient Constrained Multiple Sequence Alignment with Performance Guarantee. Francis Y.L. Chin, N.L. Ho, T.W. Lam, Prudence W.H. Wong, M.Y. Chan
• Divide-and-conquer multiple alignment with segment-based constraints. Michael Sammeth, Burkhard Morgenstern, Jens Stoye
• Multiple sequence alignment with the divide-and-conquer method. Jens Stoye
• MuSiC: A Tool for Multiple Sequence Alignment with Constraints. Yin Te Tsai, Chin Lung Lu, Ching Ta Yu, Yen Pin Huang