Block Alignment: An Approach for Multiple Sequence Alignment Containing Clusters • Advisor: Professor R. C. T. Lee • Speaker: B. W. Xiao • 2004/06/04 • CSIE NCNU
Multiple Sequence Alignment • Input: k sequences over the alphabet {a, g, c, t} • Output: an alignment A of these sequences (gaps allowed) • Example input: attgcc, ttacgg, aatgga, tatcgt, cgatag
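To make the output condition concrete, here is a minimal Python sketch of what it means for A to be an alignment of the inputs. It assumes gaps are written as `-` and follows the usual convention that no column consists only of gaps; `is_valid_alignment` is a hypothetical helper, not something taken from the slides.

```python
from typing import List

GAP = "-"


def is_valid_alignment(seqs: List[str], alignment: List[str]) -> bool:
    """Check that `alignment` is a multiple sequence alignment of `seqs`:
    one row per input sequence, all rows of equal length, each row equal to
    its sequence once gaps are removed, and no column made only of gaps."""
    if not alignment or len(alignment) != len(seqs):
        return False
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        return False
    # Removing the gaps from each row must give back the original sequence.
    if any(row.replace(GAP, "") != s for row, s in zip(alignment, seqs)):
        return False
    # An all-gap column carries no information and is conventionally disallowed.
    return all(any(row[j] != GAP for row in alignment) for j in range(width))
```

For instance, `attgcc-` over `-ttacgg` is accepted as an alignment of the first two example sequences.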
Progressive Methods • Multiple sequence alignment under the sum-of-pairs score is NP-hard (Wang and Jiang 1994). • 2-approximation by Gusfield (1991) • Input: k sequences • Output: an alignment of the k sequences with performance ratio less than 2 • Idea: perform a series of pairwise alignments and combine them into a multiple sequence alignment.
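The progressive idea can be sketched with the center-star method, which is how Gusfield's 2-approximation is usually described: choose the sequence whose total distance to the others is smallest, align every other sequence to it, and merge those pairwise alignments while keeping every gap ever inserted into the center ("once a gap, always a gap"). This minimal Python sketch assumes unit edit costs (match 0, mismatch 1, gap 1); with a distance that satisfies the triangle inequality, this construction has sum-of-pairs cost at most 2 - 2/k times the optimum, which is where the "less than 2" ratio comes from. The function names are hypothetical.

```python
from typing import List, Tuple

GAP = "-"


def align_pair(s: str, t: str) -> Tuple[int, str, str]:
    """Unit-cost global alignment; returns (edit distance, gapped s, gapped t)."""
    n, m = len(s), len(t)
    d = [[i + j if i == 0 or j == 0 else 0 for j in range(m + 1)] for i in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j - 1] + (s[i - 1] != t[j - 1]),
                          d[i - 1][j] + 1,
                          d[i][j - 1] + 1)
    a: List[str] = []
    b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:  # trace back one optimal path
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (s[i - 1] != t[j - 1]):
            a.append(s[i - 1])
            b.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            a.append(s[i - 1])
            b.append(GAP)
            i -= 1
        else:
            a.append(GAP)
            b.append(t[j - 1])
            j -= 1
    return d[n][m], "".join(reversed(a)), "".join(reversed(b))


def center_star(seqs: List[str]) -> List[str]:
    """Align all sequences around the one with the smallest total distance."""
    k = len(seqs)
    c = min(range(k), key=lambda i: sum(align_pair(seqs[i], seqs[j])[0] for j in range(k)))
    rows = {c: seqs[c]}  # row index -> gapped row; rows[c] is the gapped center
    for j in range(k):
        if j == c:
            continue
        _, c_aln, s_aln = align_pair(seqs[c], seqs[j])
        old = rows[c]
        new_rows = {r: [] for r in rows}
        new_row: List[str] = []
        p = q = 0
        while p < len(old) or q < len(c_aln):
            old_gap = p < len(old) and old[p] == GAP
            pair_gap = q < len(c_aln) and c_aln[q] == GAP
            if old_gap and not pair_gap:
                # Column already in the MSA but absent from this pairwise alignment.
                for r in rows:
                    new_rows[r].append(rows[r][p])
                new_row.append(GAP)
                p += 1
            elif pair_gap and not old_gap:
                # New gap in the center: add a column that is a gap in every existing row.
                for r in rows:
                    new_rows[r].append(GAP)
                new_row.append(s_aln[q])
                q += 1
            else:
                # Same center position (or a gap in both): the columns line up.
                for r in rows:
                    new_rows[r].append(rows[r][p])
                new_row.append(s_aln[q])
                p, q = p + 1, q + 1
        rows = {r: "".join(chars) for r, chars in new_rows.items()}
        rows[j] = "".join(new_row)
    return [rows[i] for i in range(k)]
```

Calling `center_star(["attgcc", "ttacgg", "aatgga", "tatcgt", "cgatag"])` returns five equal-length gapped rows, one per input sequence.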
Remarks • Progressive methods always work on sequences, and they always build the multiple sequence alignment by adding gaps. • Gusfield's 2-approximation does not handle sequences containing clusters well. • Can we align more than two sequences at once in a short time?
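Data Structure of Block Alignment • We use a matrix to represent a sequence or an alignment. • Given an alignment of k sequences, we can represent it by a matrix with one row per aligned sequence (gaps included) and one column per alignment position.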
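A matrix in this sense can be held as a tuple of equal-length row strings. This Python sketch is an assumed representation (the class name `Block` and its methods are hypothetical): a single sequence becomes a 1 x n block, and a column vector is read top to bottom, which is the unit the column scoring works on.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Block:
    """A k x n character matrix: one row per (gapped) sequence,
    one column per alignment position."""
    rows: Tuple[str, ...]

    def __post_init__(self) -> None:
        # Every row must have the same length, otherwise columns are undefined.
        if not self.rows or len({len(r) for r in self.rows}) != 1:
            raise ValueError("rows must be non-empty and of equal length")

    @classmethod
    def from_sequence(cls, seq: str) -> "Block":
        """A plain sequence is a block with a single row."""
        return cls((seq,))

    @property
    def height(self) -> int:
        return len(self.rows)

    @property
    def width(self) -> int:
        return len(self.rows[0])

    def column(self, j: int) -> Tuple[str, ...]:
        """Column vector j, read top to bottom."""
        return tuple(row[j] for row in self.rows)
```

For example, `Block(("at-gc", "atcgc")).column(2)` is `("-", "c")`.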
Aligning Matrices • From now on, we consider a set of matrices, each representing a sequence or an alignment. • We align two matrices using the same idea as pairwise alignment. • The two matrices to be aligned each represent a sequence or an alignment.
Scoring Columns • In pairwise alignment, we align two characters; in block alignment, we align two column vectors. • Let P and Q be two column vectors, where P is a column of the first matrix and Q is a column of the second.
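The slides' actual score function is not reproduced in this transcript, so the following Python sketch assumes a common choice: score a pair of columns by summing a character score over every pair (p, q) with p from P and q from Q. The values (match +1, mismatch -1, character against gap -2, gap against gap 0) are hypothetical placeholders.

```python
from typing import Sequence

GAP = "-"

# Hypothetical character scores; not the presentation's own values.
MATCH, MISMATCH, GAP_PENALTY = 1, -1, -2


def char_score(x: str, y: str) -> int:
    if x == GAP and y == GAP:
        return 0  # two gaps: no cost
    if x == GAP or y == GAP:
        return GAP_PENALTY
    return MATCH if x == y else MISMATCH


def column_score(P: Sequence[str], Q: Sequence[str]) -> int:
    """Score of placing column P next to column Q:
    every character of P is scored against every character of Q."""
    return sum(char_score(p, q) for p in P for q in Q)
```

Aligning a column against a gap is then `column_score(P, GAP * len(Q))`. Only the cross pairs between P and Q are scored, because pairs inside P (or inside Q) were fixed when that block was built and contribute the same amount whatever P is aligned to.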
Aligning Columns
Recurrence Formula
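The recurrence itself is not reproduced here. Following the statement that matrices are aligned with the same idea as pairwise alignment, one natural reading is a Needleman-Wunsch recurrence whose "characters" are columns: V[i][j] = max(V[i-1][j-1] + s(A_i, B_j), V[i-1][j] + s(A_i, gap), V[i][j-1] + s(gap, B_j)), where A_i and B_j are the i-th and j-th columns. This Python sketch assumes that reading together with hypothetical character scores; `align_blocks` is an illustrative name.

```python
from typing import List, Sequence, Tuple

GAP = "-"


def char_score(x: str, y: str) -> int:
    """Hypothetical scores: match +1, mismatch -1, character vs gap -2, gap vs gap 0."""
    if x == GAP and y == GAP:
        return 0
    if x == GAP or y == GAP:
        return -2
    return 1 if x == y else -1


def column_score(P: Sequence[str], Q: Sequence[str]) -> int:
    return sum(char_score(p, q) for p in P for q in Q)


def align_blocks(A: List[str], B: List[str]) -> Tuple[int, List[str]]:
    """Needleman-Wunsch over columns. A and B are blocks (lists of equal-length
    rows); returns the best score and the merged block (A's rows, then B's)."""
    cols_a, cols_b = list(zip(*A)), list(zip(*B))
    m, n = len(cols_a), len(cols_b)
    gap_a, gap_b = (GAP,) * len(A), (GAP,) * len(B)  # all-gap columns of each height
    V = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        V[i][0] = V[i - 1][0] + column_score(cols_a[i - 1], gap_b)
    for j in range(1, n + 1):
        V[0][j] = V[0][j - 1] + column_score(gap_a, cols_b[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            V[i][j] = max(
                V[i - 1][j - 1] + column_score(cols_a[i - 1], cols_b[j - 1]),  # column vs column
                V[i - 1][j] + column_score(cols_a[i - 1], gap_b),  # A column vs gap column
                V[i][j - 1] + column_score(gap_a, cols_b[j - 1]),  # gap column vs B column
            )
    out: List[Tuple[str, ...]] = []
    i, j = m, n
    while i > 0 or j > 0:  # trace back one optimal path
        if i > 0 and j > 0 and V[i][j] == V[i - 1][j - 1] + column_score(cols_a[i - 1], cols_b[j - 1]):
            out.append(cols_a[i - 1] + cols_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and V[i][j] == V[i - 1][j] + column_score(cols_a[i - 1], gap_b):
            out.append(cols_a[i - 1] + gap_b)
            i -= 1
        else:
            out.append(gap_a + cols_b[j - 1])
            j -= 1
    out.reverse()
    rows = ["".join(col[r] for col in out) for r in range(len(A) + len(B))]
    return V[m][n], rows
```

Under these scores, `align_blocks(["acgt"], ["agt"])` returns `(1, ["acgt", "a-gt"])`; passing that two-row result back in as A merges it with a third sequence in the same way.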
The Algorithm Based on Block Alignment • Input: k sequences • Output: an alignment • Step 1: Initialize every sequence as a block. • Step 2: Merge the two nearest blocks. • Step 3: Repeat Step 2 until only one block remains.
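Steps 1 to 3 describe a greedy agglomerative loop. The slides do not say how "nearest" is measured, so this Python sketch leaves both the distance and the merge operation as parameters. In block alignment the blocks would be alignment matrices and `merge` would be the column-wise alignment of two matrices; the toy run below uses plain numbers only to show the control flow. `merge_nearest_until_one` is a hypothetical name.

```python
from itertools import combinations
from typing import Callable, List, Tuple, TypeVar

B = TypeVar("B")


def merge_nearest_until_one(
    items: List[B],
    distance: Callable[[B, B], float],
    merge: Callable[[B, B], B],
) -> Tuple[B, List[Tuple[B, B]]]:
    """Step 1: every item starts as its own block.
    Step 2: merge the two blocks with the smallest distance.
    Step 3: repeat Step 2 until a single block remains.
    Returns the final block and the merged pairs, in order."""
    if not items:
        raise ValueError("need at least one block")
    blocks = list(items)
    history: List[Tuple[B, B]] = []
    while len(blocks) > 1:
        i, j = min(
            combinations(range(len(blocks)), 2),
            key=lambda ij: distance(blocks[ij[0]], blocks[ij[1]]),
        )
        a, b = blocks[i], blocks[j]
        history.append((a, b))
        del blocks[j]  # j > i, so deleting j first keeps index i valid
        del blocks[i]
        blocks.append(merge(a, b))
    return blocks[0], history


if __name__ == "__main__":
    def mean(block):
        return sum(block) / len(block)

    # Toy blocks: tuples of numbers; "nearest" means closest block means.
    final, steps = merge_nearest_until_one(
        [(1,), (3,), (10,), (11,)],
        distance=lambda a, b: abs(mean(a) - mean(b)),
        merge=lambda a, b: a + b,
    )
    print(steps)  # [((10,), (11,)), ((1,), (3,)), ((10, 11), (1, 3))]
```

Scanning all pairs each round costs O(b^2) distance evaluations for b remaining blocks; caching distances between unchanged blocks would be the obvious refinement.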
Given S1 = atttaagggc, S2 = aattaagggc, S3 = atttacgggc, S4 = cccttaacg, S5 = cccataacg • The following is the corresponding graph. [Figure: weighted graph over S1 to S5; not reproduced.]
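The graph's edge weights are lost with the figure, and the slides do not say which distance produced them. As a hypothetical stand-in, this Python sketch computes the unit-cost edit (Levenshtein) distance between every pair of the five example sequences.

```python
from itertools import combinations


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance with unit costs, keeping one DP row at a time."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j - 1] + (a != b),  # match or substitution
                           prev[j] + 1,             # delete a
                           cur[j - 1] + 1))         # insert b
        prev = cur
    return prev[-1]


SEQS = {
    "S1": "atttaagggc",
    "S2": "aattaagggc",
    "S3": "atttacgggc",
    "S4": "cccttaacg",
    "S5": "cccataacg",
}

# Distance for every unordered pair, keyed like ("S1", "S2").
DIST = {(x, y): edit_distance(SEQS[x], SEQS[y]) for x, y in combinations(SEQS, 2)}

if __name__ == "__main__":
    for (x, y), d in sorted(DIST.items(), key=lambda kv: kv[1]):
        print(x, y, d)
```

Under this assumed distance, S1 and S2, S1 and S3, and S4 and S5 are each at distance 1 and S2 and S3 at distance 2, while every pair mixing {S1, S2, S3} with {S4, S5} is farther apart, which fits the two clusters the example appears to be built around.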
Experimental Results • We generated ten data sets; each set has ten sequences of length about 500 that form two clusters.
Experimental Results • We generated four data sets; each set has nine sequences that form three clusters.
Experimental Results • We tested block alignment and the 2-approximation on ten DNA sequences from 5 hepatitis B viruses and 5 hepatitis C viruses. • We also tested seven sequences from 3 dogs and 4 wolves.
Discussion and Future Work • Other score functions could be used for evaluation. • Other strategies for merging blocks could be tried. • The program could be extended to align protein sequences, replacing our score function with a PAM matrix.
Thank you