130 likes | 210 Views
A New Approach for Alignment of Multiple Proteins. Adam Hebdon Zhang, Xu, Kahveci, Tamer, 2006, “A New Approach for Alignment of Multiple Proteins”, Pacific Symposium on Biocomputing, 11:339-350. Why Do We Need A New Approach?.
E N D
A New Approach for Alignment of Multiple Proteins Adam Hebdon Zhang, Xu, Kahveci, Tamer, 2006, “A New Approach for Alignment of Multiple Proteins”, Pacific Symposium on Biocomputing, 11:339-350.
Why Do We Need A New Approach? • Multiple Sequence Alignment one of most fundamental problems of Bioinformatics • Used to make predictions, relation of protein sequences, etc. • More difficult when sequences are dissimilar • Traditional alignment methods add sequences one by one. • Progressive - Anchor-based • Iterative - Probabilistic • Number of sequences significantly affects quality of resulting alignment.
Why Do We Need A New Approach? • Progressive • Never more than two sequences simultaneously aligned • Allows alignments of any size • Iterative • Start with initial alignment • Repeatedly refine alignment through series of iterations until no improvements made
Why Do We Need A New Approach? • Anchor-Based • Use local motifs as anchors for alignment • Unaligned segments aligned using other methods • MAFFT, Align-m, L-align, Mavid, PRRP, etc. • Probabilistic • Analyze known multiple alignments & pre-compute substitution probabilities • Maximize alignment for given sequences
Horizontal Sequence Alignment (HSA) • All proteins are considered at once and aligned simultaneously • Particularly accurate in “twilight zone” where other methods are not. • Twilight zone = % of Identities below 25% • HAS performs similar to traditional methods of high Identity percentage
Horizontal Sequence Alignment (HSA) • Construct Initial Directed Graph from AA Sequence, Secondary Structure, etc. • Group Vertices Based on Residue Type • Insert Gap Vertices to Align Similar Vertices Topologically • Determine Alignment from High Scoring Cliques with Sliding Window • Adjust Gap Vertices of Initial Alignment
Structure of Graph Sequences & Associated Secondary Structure Vertices Representing Secondary Structure & Gaps Horizontal Graph of Sequence & Color of Different Protein
HSA Step 1: • Directed edge corresponds to consecutive AA in each protein • Undirected edge between vertices whose substitution score is sufficient
HSA Step 2: • Group Fragments Most Likely To Be Aligned Together • totalScore = typeScore – positionPenalty – lengthPenalty
HSA Step 3: • Update Graph with Gaps so like vertices are close topologically
HSA Step 4: • Place Window over all sequences where w = number of vertices to cover • Ex: w = 3, window covers first 3 vertices of each sequence • For Each Clique, align letters of the clique and find the next best clique • Clique = complete sub-graph consisting of one vertex of each color (1 column of multiple alignment) • Slide window down & find next clique that: • Doesn’t conflict with previous clique - 1 letter next to letter in previous clique
HSA Step 5: • Move gaps found inside fragment of alpha-helix or beta-sheet outside the fragment • Final alignment obtained from mapping each vertex in final graph back to original residue
Observations of HSA • Identity = 0 – 20% • HSA method outperforms traditional alignment methods significantly • Identity = 20 – 40% • HSA method is comparable to traditional alignment methods • Identity = 40- 100 % • Little improvement can be made from traditional alignment • HSA performs slightly under traditional methods