Pairwise alignment: now we know how to do it. How do we get a multiple alignment (three or more sequences)? Multiple alignment suffers a much greater combinatorial explosion than pairwise alignment. Multi-dimensional dynamic programming (Murata et al. 1985).
Pairwise alignment
• Now we know how to do it.
• How do we get a multiple alignment (three or more sequences)?
• Multiple alignment: much greater combinatorial explosion than with pairwise alignment.
Simultaneous multiple alignment
Multi-dimensional dynamic programming
MSA (Lipman et al., 1989, PNAS 86, 4412)
• extremely slow and memory intensive
• up to 8-9 sequences of ~250 residues
DCA (Stoye et al., 1997, CABIOS 13, 625)
• still very slow
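To make the combinatorial explosion concrete, the sketch below counts the cells of the full dynamic-programming lattice and the predecessor cells each one depends on. It assumes the plain textbook formulation with a simple linear gap model (one lattice dimension per sequence); the function names are illustrative and are not taken from MSA or DCA, which use additional tricks to prune this space.

```python
def dp_lattice_size(lengths):
    """Number of cells in the full N-dimensional DP lattice."""
    cells = 1
    for n in lengths:
        cells *= n + 1  # +1 per dimension for the leading "all gaps so far" layer
    return cells


def predecessors_per_cell(num_sequences):
    """Each interior cell can be reached from 2^N - 1 neighbouring cells:
    every non-empty subset of sequences may advance by one residue."""
    return 2 ** num_sequences - 1


if __name__ == "__main__":
    for n in (2, 3, 5, 8):
        print(n, dp_lattice_size([250] * n), predecessors_per_cell(n))
```

Running it for 2, 3, 5 and 8 sequences of 250 residues shows why simultaneous alignment stops being practical after a handful of sequences.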
Alternative multiple alignment methods
• Biopat (first method ever)
• MULTAL (Taylor 1987)
• DIALIGN (Morgenstern 1996)
• PRRP (Gotoh 1996)
• Clustal (Thompson Higgins Gibson 1994)
• Praline (Heringa 1999)
• T-Coffee (Notredame 2000)
• HMMER (Eddy 1998) [Hidden Markov Models]
• SAGA (Notredame 1996) [Genetic algorithms]
Progressive multiple alignment: general principles
[Figure: sequences 1-5 are compared pairwise (Score 1-2, Score 1-3, …, Score 4-5) → Scores → Similarity matrix 5×5 → Scores to distances → Guide tree → Multiple alignment. The figure also marks iteration possibilities.]
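The first steps of this pipeline (pairwise scores, similarity matrix, scores to distances) can be sketched as follows. This is a minimal illustration, not the scoring of any particular program: the match/mismatch/gap values and the normalisation d = 1 - s(i,j) / min(s(i,i), s(j,j)) are assumptions chosen for simplicity.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def similarity_matrix(seqs):
    """All-against-all pairwise scores (N x N, symmetric)."""
    return [[nw_score(a, b) for b in seqs] for a in seqs]


def to_distances(sim):
    """Turn scores into distances; the normalisation is an assumed convention."""
    n = len(sim)
    return [[1 - sim[i][j] / min(sim[i][i], sim[j][j]) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    toy = ["ACDWY", "ACDWY", "ACWY"]  # made-up sequences
    print(to_distances(similarity_matrix(toy)))
```

The resulting distance matrix is what a tree-building method would take as input to produce the guide tree.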
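General progressive multiple alignment technique (follow generated tree)
[Figure: rooted guide tree over sequences 1-5 with a distance axis d; the alignment blocks shown while following the tree are 1 3, then 1 3 2 5, then 1 3 2 5 4 at the root.]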
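Following the tree means aligning the closest groups first and working towards the root. The sketch below derives such a join order from a distance matrix using simple average-linkage clustering (UPGMA-style). That choice is an assumption made to keep the example short; it is a stand-in for whatever tree-building method a real program uses. The toy distances are made up.

```python
def guide_tree_order(dist):
    """Return the list of (group_a, group_b) joins, closest groups first."""
    clusters = {i: (i,) for i in range(len(dist))}

    def avg(a, b):
        # Average distance between all members of two groups.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    merges = []
    while len(clusters) > 1:
        keys = sorted(clusters)
        pairs = [(p, q) for k, p in enumerate(keys) for q in keys[k + 1:]]
        x, y = min(pairs, key=lambda pq: avg(clusters[pq[0]], clusters[pq[1]]))
        merges.append((clusters[x], clusters[y]))
        clusters[x] = clusters[x] + clusters[y]  # merged group replaces x
        del clusters[y]
    return merges


if __name__ == "__main__":
    toy = [
        [0.0, 0.5, 0.1, 0.9, 0.5],
        [0.5, 0.0, 0.5, 0.9, 0.2],
        [0.1, 0.5, 0.0, 0.9, 0.5],
        [0.9, 0.9, 0.9, 0.0, 0.9],
        [0.5, 0.2, 0.5, 0.9, 0.0],
    ]
    for a, b in guide_tree_order(toy):
        print("align", a, "with", b)
```

Each join corresponds to one pairwise alignment step: sequence-sequence for two leaves, profile-sequence or profile-profile once groups are involved.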
Progressive multiple alignment
Problem:
• Accuracy is very important
• Errors are propagated into the progressive steps
• “Once a gap, always a gap” (Feng & Doolittle, 1987)
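Multiple alignment profiles (Gribskov et al. 1987)
[Figure: profile column for alignment position i with residue frequencies A 0.3, C 0.1, D 0, …, W 0.3, Y 0.3, followed by gap penalties 1.0 and 0.5]
• Position dependent gap penalties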
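A profile of this kind can be derived column by column from an existing alignment. The sketch below only computes per-column symbol frequencies from a made-up alignment; it treats the gap character as one more symbol and does not attempt the position-dependent gap penalties, both of which are simplifying assumptions.

```python
from collections import Counter


def build_profile(alignment):
    """Return one {symbol: frequency} dict per alignment column."""
    n_rows = len(alignment)
    profile = []
    for column in zip(*alignment):  # iterate over columns, not rows
        counts = Counter(column)
        profile.append({sym: c / n_rows for sym, c in counts.items()})
    return profile


if __name__ == "__main__":
    toy_alignment = ["ACD", "ACE", "AWD", "-CD"]  # made-up aligned sequences
    for i, column in enumerate(build_profile(toy_alignment)):
        print(i, column)
```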
Profile-sequence alignment
[Figure: a sequence aligned against a profile; the profile axis is labelled with the amino acids A, C, D, …, V, W, Y]
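One simple way to align a sequence to a profile is to run ordinary global dynamic programming, scoring each residue against a profile column as the frequency-weighted average of substitution scores. The sketch below assumes exactly that; the toy substitution function (2 for identity, -1 otherwise) and the fixed gap penalty of -2 are placeholders, and a real profile would carry position-dependent gap penalties instead.

```python
def toy_subst(a, b):
    """Placeholder substitution score; a real run would use e.g. a PAM/BLOSUM table."""
    return 2 if a == b else -1


def residue_vs_column(residue, column, subst=toy_subst):
    """Frequency-weighted average score of one residue against one profile column."""
    return sum(freq * subst(residue, aa) for aa, freq in column.items())


def profile_sequence_score(profile, seq, gap=-2):
    """Global alignment score of a profile (list of columns) against a sequence."""
    prev = [j * gap for j in range(len(seq) + 1)]
    for i, column in enumerate(profile, start=1):
        curr = [i * gap]
        for j, residue in enumerate(seq, start=1):
            curr.append(max(prev[j - 1] + residue_vs_column(residue, column),
                            prev[j] + gap,        # profile column against a gap
                            curr[j - 1] + gap))   # residue against a gap
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    toy_profile = [{"A": 1.0}, {"C": 0.5, "W": 0.5}, {"D": 1.0}]  # made-up
    print(profile_sequence_score(toy_profile, "ACD"))
```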
Profile-profile alignment
[Figure: two profiles aligned against each other; each profile axis is labelled with the amino acids A, C, D, …, Y]
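When both sides are profiles, the cell score has to compare two frequency columns. A common and simple choice, assumed here, is the expected substitution score over all residue pairs, weighted by the two columns' frequencies. Other column-comparison schemes exist, and the toy substitution function is again a placeholder.

```python
def toy_subst(a, b):
    """Placeholder substitution score (2 for identity, -1 otherwise)."""
    return 2 if a == b else -1


def column_vs_column(col_a, col_b, subst=toy_subst):
    """Expected substitution score between two profile columns."""
    return sum(fa * fb * subst(a, b)
               for a, fa in col_a.items()
               for b, fb in col_b.items())


if __name__ == "__main__":
    print(column_vs_column({"A": 0.5, "C": 0.5}, {"A": 1.0}))
```

Plugging this function in as the match score of a standard dynamic-programming recursion gives a basic profile-profile alignment.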
Clustal, ClustalW, ClustalX
• CLUSTAL W/X (Thompson et al., 1994) uses the Neighbour Joining (NJ) algorithm (Saitou and Nei, 1987), widely used in phylogenetic analysis, to construct the guide tree.
• Sequence blocks are represented by profiles, in which the individual sequences are additionally weighted according to the branch lengths in the NJ tree.
• Further carefully crafted heuristics include:
  (i) local gap penalties
  (ii) automatic selection of the amino acid substitution matrix
  (iii) automatic gap penalty adjustment
  (iv) a mechanism to delay alignment of sequences that appear to be distant at the time they are considered
• CLUSTAL (W/X) does not allow iteration (compare the iterative approaches of Hogeweg and Hesper, 1984; Corpet, 1988; Gotoh, 1996; Heringa, 1999, 2002).
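The branch-length weighting can be illustrated with a small sketch. It assumes the commonly described scheme in which a sequence's weight is its path length from leaf to root, with every shared branch divided equally among the sequences below it. The tree representation (nested lists of `(child, branch_length)` pairs) and the toy tree are invented for the example and are not Clustal's own data structures.

```python
def leaves(node):
    """All leaf names under a node; a leaf is represented by its name (str)."""
    if isinstance(node, str):
        return [node]
    return [name for child, _ in node for name in leaves(child)]


def branch_weights(node, inherited=0.0, weights=None):
    """Weight = sum over the path to the root of branch_length / leaves_sharing_it."""
    if weights is None:
        weights = {}
    if isinstance(node, str):
        weights[node] = inherited
        return weights
    for child, length in node:
        share = length / len(leaves(child))  # split shared branch equally
        branch_weights(child, inherited + share, weights)
    return weights


if __name__ == "__main__":
    # Hypothetical rooted tree: (s1, s2) are close relatives, s3 is isolated.
    toy_tree = [([("s1", 0.1), ("s2", 0.1)], 0.2), ("s3", 0.4)]
    print(branch_weights(toy_tree))
```

Closely related sequences share most of their path and so receive lower weights than an isolated sequence, which keeps a densely sampled subfamily from dominating the profile.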
Strategies for multiple sequence alignment
• Profile pre-processing
• Secondary structure-induced alignment
• Globalised local alignment
• Matrix extension
Objective: try to avoid (early) errors
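Pre-profile generation
[Figure: pairwise scores (Score 1-2, Score 1-3, …, Score 4-5) are filtered by a cut-off to give pre-alignments, one per sequence, with that sequence as the master and the other sequences aligned to it (shown for sequences 1, 2 and 5). Each pre-alignment is converted into a pre-profile (A C D . . Y).]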
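The selection step can be sketched as follows: for every sequence, collect the other sequences whose pairwise score with it reaches the cut-off. The score matrix and cut-off value are made up, and the function name is illustrative; building the actual pre-alignment and pre-profile from each selected set is not shown.

```python
def preprofile_members(scores, cutoff):
    """For each sequence i, list i first, then every j scoring >= cutoff against i."""
    n = len(scores)
    return {
        i: [i] + [j for j in range(n) if j != i and scores[i][j] >= cutoff]
        for i in range(n)
    }


if __name__ == "__main__":
    toy_scores = [          # made-up symmetric pairwise scores
        [100, 80, 30, 60],
        [80, 100, 20, 70],
        [30, 20, 100, 10],
        [60, 70, 10, 100],
    ]
    print(preprofile_members(toy_scores, cutoff=50))
```

A sequence with no partner above the cut-off, such as the third one in the toy data, ends up with a pre-profile built from itself alone.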
Protein structure hierarchical levels
• PRIMARY STRUCTURE (amino acid sequence): VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
• SECONDARY STRUCTURE (helices, strands)
• TERTIARY STRUCTURE (fold)
• QUATERNARY STRUCTURE (oligomers)
Globalised local alignment
1. Local (Smith-Waterman, SW) alignment, using the exchange matrix M and the gap penalties Po,e
2. Global (Needleman-Wunsch, NW) alignment (no M or Po,e)
1 + 2 = double dynamic programming
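Matrix extension – T-Coffee
[Figure: the six pairwise alignments of four sequences (1-2, 1-3, 1-4, 2-3, 2-4, 3-4) that are combined by matrix extension]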
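The idea of matrix (library) extension can be sketched as follows: the weight of a residue pair from two sequences is reinforced by every third-sequence residue that is aligned to both, each such path contributing the smaller of its two weights. This is a simplified reading of the scheme in Notredame et al. (2000). The data layout (residues as `(sequence, position)` tuples, a dict of pair weights) and the toy weights are assumptions for illustration, not T-Coffee's internal format.

```python
from collections import defaultdict


def extend_library(library):
    """library: {((seqA, posA), (seqB, posB)): weight} for aligned residue pairs.
    Returns {frozenset({x, y}): extended_weight}."""
    partners = defaultdict(dict)  # symmetric lookup: residue -> {residue: weight}
    for (x, y), weight in library.items():
        partners[x][y] = weight
        partners[y][x] = weight

    extended = defaultdict(float)
    for (x, y), weight in library.items():
        extended[frozenset((x, y))] += weight  # direct pairwise evidence

    # Every residue z acts as a bridge between any two of its partners.
    for z, linked in partners.items():
        items = list(linked.items())
        for i, (x, wx) in enumerate(items):
            for y, wy in items[i + 1:]:
                if x[0] != y[0]:  # only pair residues from different sequences
                    extended[frozenset((x, y))] += min(wx, wy)
    return dict(extended)


if __name__ == "__main__":
    a, b, c = ("A", 0), ("B", 0), ("C", 0)  # one residue from each of three sequences
    toy_library = {(a, b): 88, (a, c): 77, (c, b): 100}  # made-up weights
    for pair, weight in extend_library(toy_library).items():
        print(sorted(pair), weight)
```

Pairs that are consistent with many other pairwise alignments accumulate high weights, which is what steers the later progressive alignment away from early errors.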
Summary
• Weighting schemes simulating simultaneous multiple alignment
  • Profile pre-processing (global/local)
  • Matrix extension (well balanced scheme)
• Smoothing alignment signals
  • Globalised local alignment
• Using additional information
  • Secondary structure driven alignment
• Schemes strike balance between speed and sensitivity
References
• Heringa, J. (1999) Two strategies for sequence comparison: profile-preprocessed and secondary structure-induced multiple alignment. Comput. Chem., 23, 341-364.
• Notredame, C., Higgins, D.G., Heringa, J. (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol., 302, 205-217.
• Heringa, J. (2002) Local weighting schemes for protein multiple sequence alignment. Comput. Chem., 26(5), 459-477.