Class 5: Multiple Sequence Alignment
Multiple sequence alignment

VTISCTGSSSNIGAG-NHVKWYQQLPG
VTISCTGTSSNIGS--ITVNWYQQLPG
LRLSCSSSGFIFSS--YAMYWVRQAPG
LSLTCTVSGTSFDD--YYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG--
ATLVCLISDFYPGA--VTVAWKADS--
AALGCLVKDYFPEP--VTVSWNSG---
VSLTCLVKGFYPSD--IAVEWESNG--

Homologous residues are aligned together in columns
• Homologous: in the structural and evolutionary sense
Ideally, each column of aligned residues occupies similar 3D structural positions
Multiple alignment: why?
• Identify sequences that belong to a family
  • Family: a collection of homologous sequences with similar sequence, 3D structure, function, or evolutionary history
• Find features that are conserved across the whole family
  • Highly conserved regions, core structural elements
The relation between the divergence of sequence and structure [Durbin p. 137, redrawn from data in Chothia and Lesk (1986)]
Scoring a multiple alignment (1)
Important features of a multiple alignment:
• Some positions are more conserved than others, so scoring should be position-specific
• Sequences are not independent (they are related by a phylogenetic tree)
Ideally, we would specify a complete model of molecular sequence evolution
Scoring a multiple alignment (2)
Unfortunately, there is not enough data …
Assumption (1): the columns of the alignment are statistically independent.
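Under assumption (1), the alignment score separates into per-column terms. In the notation of Durbin et al. (assumed here, since the slide's own formula did not survive), with $m_i$ the $i$-th column of alignment $m$ and $G$ a function that scores the gaps:

```latex
S(m) = G + \sum_{i} S(m_i)
```

The scoring schemes that follow (minimum entropy, sum of pairs) are different choices for the column score $S(m_i)$.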
Minimum entropy
Assumption (2): symbols within a column are independent
Column score: an entropy measure
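A minimal sketch of the minimum-entropy column score, assuming the form given by Durbin et al.: $S(m_i) = -\sum_a c_{ia} \log p_{ia}$ with $p_{ia} = c_{ia} / \sum_{a'} c_{ia'}$, where $c_{ia}$ counts residue $a$ in column $i$. Skipping gap characters and using the natural log are assumptions of this sketch; lower scores mean more conserved columns.

```python
import math
from collections import Counter


def column_entropy_score(column: str) -> float:
    """S(m_i) = -sum_a c_ia * log(p_ia); a fully conserved column scores 0."""
    # Assumption: gaps ('-') are ignored here and left to the separate gap term G.
    counts = Counter(ch for ch in column if ch != "-")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(c * math.log(c / total) for c in counts.values())


def alignment_entropy_score(rows: list[str]) -> float:
    """Sum of column scores, relying on the column-independence assumption."""
    return sum(column_entropy_score("".join(col)) for col in zip(*rows))
```

For example, the fifth column of the alignment on the first slide is all `C`, so `column_entropy_score("CCCCCCCC")` is 0, while an evenly split column such as `"CCGG"` scores 4·log 2.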
Sum of pairs (SP)
Columns are scored by a "sum of pairs" function, using a substitution scoring matrix
Note:
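A minimal sketch of SP column scoring: $S(m_i) = \sum_{k<l} s(m_i^k, m_i^l)$ over all unordered pairs of sequences $k < l$. The +1/-1 match/mismatch scores and the residue-vs-gap score of -2 are hypothetical stand-ins for a real substitution matrix such as BLOSUM62; scoring gap-vs-gap as 0 follows the usual linear-gap convention.

```python
from itertools import combinations

GAP = -2  # hypothetical residue-vs-gap score (linear gap cost)


def pair_score(a: str, b: str) -> int:
    """Hypothetical toy substitution score s(a, b)."""
    if a == "-" and b == "-":
        return 0  # gap aligned to gap contributes nothing
    if a == "-" or b == "-":
        return GAP
    return 1 if a == b else -1


def sp_column_score(column: str) -> int:
    """Sum of s(a, b) over all unordered pairs of entries in one column."""
    return sum(pair_score(a, b) for a, b in combinations(column, 2))


def sp_alignment_score(rows: list[str]) -> int:
    """Total SP score, summing independent column scores."""
    return sum(sp_column_score("".join(col)) for col in zip(*rows))
```

A column of N identical residues scores N(N-1)/2 times s(a, a), so a single divergent residue changes N-1 of the pair terms at once.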
Multidimensional DP
Complexity (N sequences of length L):
• Space: O(L^N)
• Time: O(2^N L^N)
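A minimal sketch of multidimensional DP for the optimal SP score, assuming linear gap costs and hypothetical toy scores (+1 match, -1 mismatch, -2 residue vs gap). It returns only the score (no traceback). The lattice has one cell per tuple of prefix lengths, and each cell looks back at 2^N - 1 predecessors (every non-empty choice of which sequences consume a residue), which is where the cost quoted above comes from.

```python
from functools import lru_cache
from itertools import combinations, product


def nd_sp_align_score(seqs, match=1, mismatch=-1, gap=-2):
    """Optimal sum-of-pairs score of N sequences by N-dimensional DP."""
    n = len(seqs)
    # Each move says, per sequence, whether it consumes a residue (1) or gets a gap (0).
    moves = [m for m in product((0, 1), repeat=n) if any(m)]

    def column_score(col):
        total = 0
        for a, b in combinations(col, 2):
            if a == "-" and b == "-":
                continue
            if a == "-" or b == "-":
                total += gap
            else:
                total += match if a == b else mismatch
        return total

    @lru_cache(maxsize=None)
    def best(idx):
        # idx[k] = number of residues of seqs[k] already placed in columns
        if all(i == 0 for i in idx):
            return 0
        result = float("-inf")
        for m in moves:
            prev = tuple(i - d for i, d in zip(idx, m))
            if min(prev) < 0:
                continue
            col = tuple(seqs[k][prev[k]] if m[k] else "-" for k in range(n))
            result = max(result, best(prev) + column_score(col))
        return result

    return best(tuple(len(s) for s in seqs))
```

Even for a handful of short sequences the number of cells grows quickly, which motivates the MSA program's bounding trick and the progressive heuristics below.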
MSA (i) [Carrillo and Lipman, 1988]
MSA (iii) Algorithm sketch
Progressive alignment methods (i)
Basic idea: construct a succession of pairwise (PW) alignments
Variations:
• Order of the PW alignments
• A single growing alignment, or subfamilies aligned separately and then merged
• Alignment and scoring procedure
Progressive alignment methods (ii)
Most important heuristic: align the most similar pairs first.
Many algorithms build a "guide tree":
• Leaves: sequences
• Interior nodes: alignments
• Root: complete multiple alignment
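A minimal sketch of building a guide tree by hierarchical clustering. UPGMA (average linkage) is used here only as one simple choice; it is not necessarily the clustering method of any particular program. Distances are assumed to be given as a dict keyed by `frozenset` pairs, and the tree is returned as nested tuples (leaves are sequence names, interior nodes are pairs).

```python
from itertools import combinations


def upgma_guide_tree(names, dist):
    """Build a rooted guide tree by average-linkage (UPGMA) clustering.

    names: sequence labels (the leaves)
    dist:  {frozenset({x, y}): distance} for every pair of leaves
    """
    d = dict(dist)
    sizes = {name: 1 for name in names}  # cluster (subtree) -> number of leaves
    while len(sizes) > 1:
        # Merge the two closest clusters: "align the most similar pairs first".
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        na, nb = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        # Distance from the new cluster is the size-weighted average of its parts.
        for c in sizes:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
    (root,) = sizes
    return root
```

Reading the result bottom-up gives the alignment order: each interior node is aligned once both of its children have been aligned, and the root holds the complete multiple alignment.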
Feng-Doolittle (1987)
• Calculate all pairwise distances from pairwise alignment scores
• Construct a guide tree using hierarchical clustering
• When aligning a sequence to a group, the highest-scoring pairwise alignment between the sequence and any member of the group determines the alignment
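A minimal sketch of turning alignment scores into distances, assuming the Feng-Doolittle conversion as given by Durbin et al.: $D = -\log S_{eff}$ with $S_{eff} = (S_{obs} - S_{rand}) / (S_{max} - S_{rand})$, where $S_{max}$ is the average of the two self-alignment scores and $S_{rand}$ is estimated from alignments of shuffled sequences. The `align_score` callable is a hypothetical placeholder for any pairwise alignment scorer, and returning infinity when $S_{eff} \le 0$ is an assumption of this sketch.

```python
import math
import random


def feng_doolittle_distance(s_obs: float, s_max: float, s_rand: float) -> float:
    """D = -log(S_eff), S_eff = (S_obs - S_rand) / (S_max - S_rand)."""
    s_eff = (s_obs - s_rand) / (s_max - s_rand)
    if s_eff <= 0:
        return math.inf  # assumption: no better than random -> maximally distant
    return -math.log(s_eff)


def estimate_s_rand(a: str, b: str, align_score, trials: int = 100, seed: int = 0) -> float:
    """Average alignment score of shuffled copies of a and b.

    align_score is a hypothetical pairwise scorer: (str, str) -> float.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sa = "".join(rng.sample(a, len(a)))  # shuffle preserves composition
        sb = "".join(rng.sample(b, len(b)))
        total += align_score(sa, sb)
    return total / trials
```

Identical sequences (S_obs = S_max) get distance 0, and the distance grows as the observed score falls toward what shuffled sequences achieve.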
Profile alignment
• Use profiles for group-to-sequence and group-to-group alignments
• CLUSTALW (Thompson et al., 1994):
  • Similar to Feng-Doolittle, but uses profile alignment methods
  • Numerous heuristics
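A minimal sketch of how a profile alignment can score one column of group X against one column of group Y: the unweighted average of all cross-group pair scores. The toy substitution scores are hypothetical, and real programs such as CLUSTALW layer sequence weights and other heuristics on top of this idea. A single sequence is just a one-row group, so the same function covers group-to-sequence alignment.

```python
from itertools import product


def pair_score(a: str, b: str) -> int:
    """Hypothetical toy scores: +1 match, -1 mismatch, -2 residue vs gap, 0 gap vs gap."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return -2
    return 1 if a == b else -1


def profile_column_score(col_x: str, col_y: str) -> float:
    """Average pair score over every (residue in col_x, residue in col_y)."""
    pairs = list(product(col_x, col_y))
    return sum(pair_score(a, b) for a, b in pairs) / len(pairs)
```

Plugging this column score into an ordinary Needleman-Wunsch recurrence, with columns in place of single residues, gives group-to-group alignment.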
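Iterative refinement
• Addresses the "frozen" sub-alignment problem: in progressive alignment, errors made early can never be corrected later
• Iteratively realign sequences or groups to a profile of the rest
• Barton and Sternberg (1987):
  • Align the two most similar sequences
  • Align the current profile to the most similar remaining sequence, repeating until all sequences are included
  • Remove each sequence in turn and realign it to the profile of the rest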
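A minimal sketch of the last Barton-Sternberg step: remove each sequence in turn, drop columns that become all-gap, and realign it to a profile of the others with a global DP. It starts from an existing alignment, uses hypothetical toy scores, scores a residue against a profile column by the average pair score, and runs a fixed number of passes (an assumption; a convergence test on the alignment score would also work). All sequences are assumed non-empty.

```python
def pair_score(a: str, b: str) -> int:
    """Hypothetical toy scores: +1 match, -1 mismatch, -2 residue vs gap, 0 gap vs gap."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return -2
    return 1 if a == b else -1


def col_score(ch: str, col: tuple) -> float:
    """Average pair score of one character against a profile column."""
    return sum(pair_score(ch, b) for b in col) / len(col)


def align_seq_to_profile(seq: str, rows: list) -> list:
    """Global DP of an ungapped sequence to aligned rows; returns [seq] + rows, realigned."""
    cols = list(zip(*rows))
    gap_col = ("-",) * len(rows)  # a new column where only seq has a residue
    n, m = len(seq), len(cols)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] + col_score(seq[i - 1], gap_col)
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] + col_score("-", cols[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + col_score(seq[i - 1], cols[j - 1]),  # residue in column j
                F[i - 1][j] + col_score(seq[i - 1], gap_col),          # residue in a new column
                F[i][j - 1] + col_score("-", cols[j - 1]),             # gap in seq
            )
    out_seq, out_cols = [], []
    i, j = n, m
    while i > 0 or j > 0:  # traceback from the bottom-right cell
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + col_score(seq[i - 1], cols[j - 1]):
            out_seq.append(seq[i - 1])
            out_cols.append(cols[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + col_score(seq[i - 1], gap_col):
            out_seq.append(seq[i - 1])
            out_cols.append(gap_col)
            i -= 1
        else:
            out_seq.append("-")
            out_cols.append(cols[j - 1])
            j -= 1
    out_seq.reverse()
    out_cols.reverse()
    aligned = ["".join(col[r] for col in out_cols) for r in range(len(rows))]
    return ["".join(out_seq)] + aligned


def refine(rows: list, passes: int = 2) -> list:
    """Remove each sequence in turn and realign it to a profile of the others."""
    rows = list(rows)
    for _ in range(passes):
        for k in range(len(rows)):
            seq = rows[k].replace("-", "")
            rest = rows[:k] + rows[k + 1:]
            # Drop columns that are all gaps once row k is removed.
            kept = [c for c in zip(*rest) if any(ch != "-" for ch in c)]
            rest = ["".join(r) for r in zip(*kept)]
            new = align_seq_to_profile(seq, rest)
            rows = new[1:k + 1] + [new[0]] + new[k + 1:]
    return rows
```

For example, starting from `["ACGT", "AC-T", "A-CT"]`, the inconsistently placed gap in the third row is moved so that both shorter sequences end up as `AC-T`.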