Methods course: Multiple sequence alignment and Reconstruction of phylogenetic trees
Burkhard Morgenstern, Fabian Schreiber
Göttingen, October/November 2007

Tools for multiple sequence alignment
Multiple alignment is the basis of (almost) all methods for sequence analysis in bioinformatics.
Tools for multiple sequence alignment
T Y I M R E A Q Y E
T C I V M R E A Y E

Tools for multiple sequence alignment
T Y I - M R E A Q Y E
T C I V M R E A - Y E

Tools for multiple sequence alignment
T Y I M R E A Q Y E
T C I V M R E A Y E
Y I M Q E V Q Q E
Y I A M R E Q Y E

Tools for multiple sequence alignment
T Y I - M R E A Q Y E
T C I V M R E A - Y E
Y - I - M Q E V Q Q E
Y - I A M R E - Q Y E

Tools for multiple sequence alignment
T Y I - M R E A Q Y E
T C I V M R E A - Y E
- Y I - M Q E V Q Q E
Y - I A M R E - Q Y E
Astronomical number of possible alignments!

Tools for multiple sequence alignment
T Y I - M R E A Q Y E
T C I V - M R E A Y E
- Y I - M Q E V Q Q E
Y - I A M R E - Q Y E
Astronomical number of possible alignments!

Tools for multiple sequence alignment
T Y I - M R E A Q Y E
T C I V M R E A - Y E
- Y I - M Q E V Q Q E
Y - I A M R E - Q Y E
Which one is the best?
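How astronomical? A small sketch, assuming one common counting convention: an alignment of two sequences is a series of columns, each pairing two residues or a residue with a gap, with no all-gap columns. Under that convention the number of alignments of sequences of lengths m and n follows the recursion below (the Delannoy numbers). `count_alignments` is a hypothetical helper written only for illustration.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_alignments(m: int, n: int) -> int:
    """Number of pairwise alignments of sequences of lengths m and n.

    Each column aligns residue/residue, residue/gap or gap/residue;
    all-gap columns are not allowed.
    """
    if m == 0 or n == 0:
        return 1  # the remaining residues can only be placed against gaps
    return (count_alignments(m - 1, n - 1)  # last column: residue/residue
            + count_alignments(m - 1, n)    # last column: residue/gap
            + count_alignments(m, n - 1))   # last column: gap/residue

# TYIMREAQYE and TCIVMREAYE both have 10 residues
print(count_alignments(10, 10))  # 8097453
```

Two sequences of only ten residues already admit more than eight million alignments under this convention, and every additional sequence multiplies the possibilities further.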
Tools for multiple sequence alignment
Questions in the development of alignment programs:
(1) What is a good alignment? → objective function ("score")
(2) How do we find a good alignment? → optimization algorithm
Tools for multiple sequence alignment
• What is a biologically good alignment?
Tools for multiple sequence alignment
Criteria for alignment quality:
• 3D structure: align residues at corresponding positions in the 3D structure of the protein!
• Evolution: align residues that share a common ancestor!
Tools for multiple sequence alignment
T Y I - M R E A Q Y E
T C I V M - R E A Y E
- Y I - M Q E V Q Q E
- Y I A M R E - Q Y E
An alignment is a hypothesis about sequence evolution: search for the most plausible hypothesis!

Tools for multiple sequence alignment
T Y I - M R E A Q Y E
T C I V - M R E A Y E
- Y I - M Q E V Q Q E
- Y I A M R E - Q Y E
An alignment is a hypothesis about sequence evolution: search for the most plausible hypothesis!
Tools for multiple sequence alignment
Compute for amino acids a and b:
• the probability p_{a,b} of a substitution a → b (or b → a),
• the frequency q_a of a.
Define a similarity score s(a,b) based on p_{a,b} and q_a.
Result: a similarity matrix (substitution matrix), e.g. PAM (Dayhoff matrix), BLOSUM, …
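A common way to turn p_{a,b} and q_a into a score is the log-odds ratio s(a,b) = log(p_{a,b} / (q_a q_b)): positive when a and b are aligned in related sequences more often than expected by chance, negative when less often. A minimal sketch, assuming base-2 logarithms scaled by 2 (half-bit units, the unit in which BLOSUM62 is reported) and rounding to integers; the input numbers are invented for illustration.

```python
import math

def log_odds_score(p_ab: float, q_a: float, q_b: float, scale: float = 2.0) -> int:
    """Similarity score s(a,b) = round(scale * log2(p_ab / (q_a * q_b))).

    p_ab: probability of seeing a aligned with b in related sequences
    q_a, q_b: background frequencies of a and b
    scale=2.0 expresses the score in half-bit units.
    """
    return round(scale * math.log2(p_ab / (q_a * q_b)))

# Invented frequencies, not taken from any published matrix:
print(log_odds_score(0.02, 0.05, 0.05))   # 6: pair seen 8x more often than by chance
print(log_odds_score(0.001, 0.05, 0.05))  # -3: pair seen less often than by chance
```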
Tools for multiple sequence alignment
Traditional objective functions: define the score of an alignment as
• the sum of the individual similarity scores s(a,b) of aligned amino acid residues,
• minus a gap penalty g for each gap position in the alignment.
The optimal alignment can be calculated for two sequences, but in practice not for more than 8 sequences.
Example:
T Y W I V
T - - L V
Score = s(T,T) + s(I,L) + s(V,V) - 2g
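The scoring scheme and the "optimal alignment for two sequences" claim can be sketched together. `pair_score` scores two aligned rows exactly as in the example (one penalty g per gap position); `sum_of_pairs` applies it to every pair of rows, which is the usual way to extend the score to more than two sequences (the slide does not name it, so treat that as an assumption). `needleman_wunsch` is the textbook dynamic-programming algorithm for the optimal global alignment of two sequences. The residue scores (+2 for identity, -1 otherwise) and g = 2 are hypothetical, not a real substitution matrix.

```python
from itertools import combinations

def toy_score(a: str, b: str) -> int:
    """Hypothetical similarity score: +2 for identical residues, -1 otherwise."""
    return 2 if a == b else -1

def pair_score(r1: str, r2: str, s, g: int) -> int:
    """Sum of s(a,b) over aligned residues, minus g per gap position.
    Columns in which both rows have a gap are skipped."""
    total = 0
    for a, b in zip(r1, r2):
        if a == "-" and b == "-":
            continue
        total += -g if "-" in (a, b) else s(a, b)
    return total

def sum_of_pairs(rows, s, g: int) -> int:
    """Multiple-alignment score: pair_score summed over all pairs of rows."""
    return sum(pair_score(r1, r2, s, g) for r1, r2 in combinations(rows, 2))

def needleman_wunsch(x: str, y: str, s, g: int):
    """Optimal global alignment of x and y under the same linear gap penalty."""
    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]  # F[i][j]: best score for x[:i] vs y[:j]
    for i in range(1, m + 1):
        F[i][0] = F[i - 1][0] - g
    for j in range(1, n + 1):
        F[0][j] = F[0][j - 1] - g
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # x_i with y_j
                          F[i - 1][j] - g,                          # x_i against a gap
                          F[i][j - 1] - g)                          # y_j against a gap
    # Trace back one optimal path from the bottom-right cell
    ax, ay, i, j = [], [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] - g:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return F[m][n], "".join(reversed(ax)), "".join(reversed(ay))

# The example above with the toy scores and g = 2: 2 + (-1) + 2 - 2*2
print(pair_score("TYWIV", "T--LV", toy_score, 2))  # -1

score, a1, a2 = needleman_wunsch("TYIMREAQYE", "TCIVMREAYE", toy_score, 2)
print(score)
print(a1)
print(a2)
```

The same recursion extends to k sequences, but the table then has (L+1)^k cells and each cell has 2^k - 1 possible predecessors, which is why exact multiple alignment becomes impractical beyond a handful of sequences.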
Tools for multiple sequence alignment
Most commonly used heuristic for multiple alignment: progressive alignment (mid-1980s).
Idea:
• calculate the multiple alignment as a series of pairwise alignments of sequences and profiles,
• use a guide tree to determine the order of the pairwise alignments.
Progressive alignment
WCEAQTKNGQGWVPSNYITPVN
WWRLNDKEGYVPRNLLGLYP
AVVIQDNSDIKVVPKAKIIRD
YAVESEAHPGSFQPVAALERIN
WLNYNETTGERGDFPGTYVEYIGRKKISP
[Figure: guide tree]
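The guide tree is built from pairwise distances between the sequences. CLUSTAL W uses the neighbour-joining method for this; the sketch below substitutes the simpler UPGMA (average-linkage) clustering and only records the order in which sequences and groups would be aligned. The names s1 to s5 (the five sequences, top to bottom) and all distances are invented for illustration.

```python
from itertools import combinations

def upgma_merge_order(names, dist):
    """Order in which clusters are joined by UPGMA (average-linkage) clustering.

    names: sequence names; dist: {frozenset({a, b}): distance between a and b}.
    Returns one (cluster1, cluster2) pair per merge step.
    """
    clusters = {name: [name] for name in names}  # cluster label -> member sequences

    def cluster_dist(c1, c2):
        # UPGMA: mean distance over all pairs of members of the two clusters
        pairs = [(a, b) for a in clusters[c1] for b in clusters[c2]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(sorted(clusters), 2), key=lambda p: cluster_dist(*p))
        clusters[f"({c1},{c2})"] = clusters.pop(c1) + clusters.pop(c2)
        merges.append((c1, c2))
    return merges

# Invented distances between the five sequences
raw = {("s1", "s2"): 0.2, ("s1", "s3"): 0.6, ("s1", "s4"): 0.7, ("s1", "s5"): 0.8,
       ("s2", "s3"): 0.6, ("s2", "s4"): 0.7, ("s2", "s5"): 0.8,
       ("s3", "s4"): 0.3, ("s3", "s5"): 0.8, ("s4", "s5"): 0.8}
dist = {frozenset(k): v for k, v in raw.items()}
order = upgma_merge_order(["s1", "s2", "s3", "s4", "s5"], dist)
print(order)
# [('s1', 's2'), ('s3', 's4'), ('(s1,s2)', '(s3,s4)'), ('((s1,s2),(s3,s4))', 's5')]
```

Each merge step corresponds to one pairwise alignment in the progressive procedure: first two single sequences, later a sequence against a profile or two profiles against each other.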
Progressive alignment
WCEAQTKNGQGWVPSNYITPVN
WW--RLNDKEGYVPRNLLGLYP-
AVVIQDNSDIKVVP--KAKIIRD
YAVESEASFQPVAALERIN
WLNYNEERGDFPGTYVEYIGRKKISP
Profile alignment: "once a gap, always a gap"

Progressive alignment
WCEAQTKNGQGWVPSNYITPVN
WW--RLNDKEGYVPRNLLGLYP-
AVVIQDNSDIKVVP--KAKIIRD
YAVESEASVQ--PVAALERIN------
WLN-YNEERGDFPGTYVEYIGRKKISP
Profile alignment: "once a gap, always a gap"

Progressive alignment
WCEAQTKNGQGWVPSNYITPVN-
WW--RLNDKEGYVPRNLLGLYP-
AVVIQDNSDIKVVP--KAKIIRD
YAVESEASVQ--PVAALERIN------
WLN-YNEERGDFPGTYVEYIGRKKISP
Profile alignment: "once a gap, always a gap"

Progressive alignment
WCEAQTKNGQGWVPSNYITPVN--------
WW--RLNDKEGYVPRNLLGLYP--------
AVVIQDNSDIKVVP--KAKIIRD-------
YAVESEA---SVQ--PVAALERIN------
WLN-YNE---ERGDFPGTYVEYIGRKKISP
Profile alignment: "once a gap, always a gap"
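"Once a gap, always a gap" follows from how profiles are merged: when two profiles are aligned, a gap can only be inserted as an entire column, so gaps placed in earlier steps are never removed. A minimal profile-profile alignment sketch, assuming a simple average-of-pairs column score (pairs involving a gap score 0), the hypothetical `toy_score` (+2/-1) and a linear gap penalty g = 2. CLUSTAL W's actual profile alignment is more elaborate (it uses sequence weighting and position-specific gap penalties).

```python
def toy_score(a: str, b: str) -> int:
    """Hypothetical residue score: +2 for identity, -1 otherwise."""
    return 2 if a == b else -1

def align_profiles(A, B, s, g):
    """Align profile A with profile B (each a list of equal-length gapped rows).

    New gaps are only inserted as whole columns, so gaps already present
    in A or B are kept ("once a gap, always a gap").
    """
    colsA = [[row[i] for row in A] for i in range(len(A[0]))]
    colsB = [[row[j] for row in B] for j in range(len(B[0]))]
    m, n = len(colsA), len(colsB)

    def col_score(ca, cb):
        # Average over all residue pairs; pairs involving a gap score 0 (assumption)
        pair_scores = [0 if "-" in (a, b) else s(a, b) for a in ca for b in cb]
        return sum(pair_scores) / len(pair_scores)

    # Needleman-Wunsch over columns instead of single residues
    F = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = F[i - 1][0] - g
    for j in range(1, n + 1):
        F[0][j] = F[0][j - 1] - g
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(F[i - 1][j - 1] + col_score(colsA[i - 1], colsB[j - 1]),
                          F[i - 1][j] - g,
                          F[i][j - 1] - g)

    # Traceback: each step emits one column of the merged alignment
    merged_cols, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + col_score(colsA[i - 1], colsB[j - 1]):
            merged_cols.append(colsA[i - 1] + colsB[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] - g:
            merged_cols.append(colsA[i - 1] + ["-"] * len(B)); i -= 1  # gap column in B
        else:
            merged_cols.append(["-"] * len(A) + colsB[j - 1]); j -= 1  # gap column in A
    merged_cols.reverse()
    return ["".join(col[k] for col in merged_cols) for k in range(len(A) + len(B))]

# A two-sequence profile from the slides, plus one further sequence
A = ["WW--RLNDKEGYVPRNLLGLYP-", "AVVIQDNSDIKVVP--KAKIIRD"]
B = ["WCEAQTKNGQGWVPSNYITPVN"]
merged = align_profiles(A, B, toy_score, 2)
for row in merged:
    print(row)
```

Deleting the all-gap columns from the first two output rows gives back A unchanged: the earlier alignment is frozen, which is exactly the property the slide warns about.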
CLUSTAL W
Most important software program: CLUSTAL W, J. Thompson, T. Gibson, D. Higgins (1994, Nucleic Acids Res.), with 22,327 citations in the literature (Oct 2007)!
Tools for multiple sequence alignment
Problems with the traditional approach:
• Results depend on the gap penalty.
• The heuristic guide tree determines the alignment, while the alignment is in turn used for phylogeny reconstruction.
• The algorithm produces global alignments.
Tools for multiple sequence alignment
Problems with the traditional approach:
But many sequence families share only local similarity, e.g. the sequences share one conserved motif.
Local sequence alignment
EYENS
ERYENS
ERYAS
Find a common motif in the sequences; ignore the rest.

Local sequence alignment
E-YENS
ERYENS
ERYA-S
Find a common motif in the sequences; ignore the rest.

Local sequence alignment
E-YENS
ERYENS
ERYA-S
Find a common motif in the sequences; ignore the rest: local alignment.
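For two sequences, the standard dynamic-programming solution to local alignment is the Smith-Waterman algorithm: the recursion resembles global alignment, but cell scores never drop below zero (a new local alignment may start anywhere) and the traceback starts at the best-scoring cell. A minimal sketch with a linear gap penalty; the scores (+2 identity, -1 mismatch, g = 2) and the unrelated flanking residues added around the slide's motif are hypothetical.

```python
def toy_score(a: str, b: str) -> int:
    """Hypothetical residue score: +2 for identity, -1 otherwise."""
    return 2 if a == b else -1

def smith_waterman(x: str, y: str, s, g: int):
    """Best local alignment of x and y (linear gap penalty g per gap position)."""
    m, n = len(x), len(y)
    H = [[0] * (n + 1) for _ in range(m + 1)]  # H[i][j]: best local score ending at x_i, y_j
    best, bi, bj = 0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            H[i][j] = max(0,                                   # start a new local alignment
                          H[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          H[i - 1][j] - g,
                          H[i][j - 1] - g)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Trace back from the best cell until the score drops to zero
    ax, ay, i, j = [], [], bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] - g:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))

# Motif-bearing sequences with unrelated flanks: the flanks are ignored
score, a1, a2 = smith_waterman("PPPEYENSGGG", "WWERYENSKK", toy_score, 2)
print(score, a1, a2)  # 8 YENS YENS
```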
Gibbs Motif Sampler
Local multiple alignment without gaps, e.g. Gibbs sampling: C.E. Lawrence et al. (1993, Science)
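The idea of Gibbs sampling for gap-free local multiple alignment: keep one motif start per sequence; repeatedly remove one sequence, build a position-specific frequency model from the motifs of all the others, and re-sample that sequence's start in proportion to how well each window fits the model. A minimal site-sampler sketch in that spirit, assuming exactly one motif occurrence per sequence, a fixed width w, pseudocount 1 and a uniform background; the published method includes refinements not shown here, and the DNA test data with the planted motif TTGACA are invented.

```python
import random

def gibbs_motif_sampler(seqs, w, iterations=500, alphabet="ACGT", seed=0):
    """Gap-free local multiple alignment: pick one motif start per sequence."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(seq) - w + 1) for seq in seqs]
    background = 1.0 / len(alphabet)
    for _ in range(iterations):
        k = rng.randrange(len(seqs))  # sequence whose motif position is re-sampled
        # Position-specific letter counts from the current motifs of all OTHER sequences
        counts = [{a: 1.0 for a in alphabet} for _ in range(w)]
        for idx, (seq, st) in enumerate(zip(seqs, starts)):
            if idx != k:
                for p in range(w):
                    counts[p][seq[st + p]] += 1
        total = (len(seqs) - 1) + len(alphabet)  # observations + pseudocounts per column
        # Weight of each candidate start: motif probability / background probability
        seq = seqs[k]
        weights = []
        for st in range(len(seq) - w + 1):
            ratio = 1.0
            for p in range(w):
                ratio *= (counts[p][seq[st + p]] / total) / background
            weights.append(ratio)
        starts[k] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts

# Invented data: six random DNA sequences, each with TTGACA planted once
data_rng = random.Random(1)
seqs = []
for _ in range(6):
    letters = [data_rng.choice("ACGT") for _ in range(30)]
    pos = data_rng.randrange(30 - 6 + 1)
    letters[pos:pos + 6] = "TTGACA"
    seqs.append("".join(letters))

starts = gibbs_motif_sampler(seqs, w=6)
for seq, st in zip(seqs, starts):
    print(st, seq[st:st + 6])
```

Being stochastic, such a sampler can settle on a shifted or spurious motif; running it from several seeds and keeping the best-scoring result is a simple safeguard.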
Traditional alignment approaches: either global or local methods!

New question: sequence families with multiple local similarities
Neither local nor global methods are applicable.

New question: sequence families with multiple local similarities
Alignment is possible if the order of the local similarities is conserved.
The DIALIGN approach
Morgenstern, Dress, Werner (1996, Proc. Natl. Acad. Sci.)
• Combination of global and local methods
• Assemble the multiple alignment from gap-free local pairwise alignments ("fragments")
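A fragment is a pair of equal-length segments, one from each of two sequences, aligned without gaps. A minimal sketch that enumerates fragments of a fixed length with at least a given number of identities; the fixed length and the identity threshold are simplifying assumptions, since DIALIGN considers fragments of varying length and weights each one by how unlikely its similarity would be by chance.

```python
def find_fragments(x: str, y: str, length: int, min_matches: int):
    """Gap-free fragments of a fixed length with at least min_matches identities.

    Returns (i, j, length, matches) tuples: x[i:i+length] aligned with y[j:j+length].
    """
    fragments = []
    for i in range(len(x) - length + 1):
        for j in range(len(y) - length + 1):
            matches = sum(a == b for a, b in zip(x[i:i + length], y[j:j + length]))
            if matches >= min_matches:
                fragments.append((i, j, length, matches))
    return fragments

print(find_fragments("ACGTAC", "TTACGT", 4, 4))  # [(0, 2, 4, 4)]

# First two DNA sequences from the next slides: length-6 fragments with >= 5 identities
print(find_fragments("atctaatagttaaactcccccgtgcttag",
                     "cagtgcgtgtattactaacggttcaatcgcg", 6, 5))
```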
The DIALIGN approach
atctaatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaagagtatcacccctgaattgaataa
The DIALIGN approach
atc------taatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaagagtatcacccctgaattgaataa

The DIALIGN approach
atc------taatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaa--gagtatcacccctgaattgaataa

The DIALIGN approach
atc------taatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaa--gagtatcacc----------cctgaattgaataa

The DIALIGN approach
atc------taatagttaaactcccccgtgc-ttag
cagtgcgtgtattactaac----------gg-ttcaatcgcg
caaa--gagtatcacc----------cctgaattgaataa

The DIALIGN approach
atc------taatagttaaactcccccgtgc-ttag
cagtgcgtgtattactaac----------gg-ttcaatcgcg
caaa--gagtatcacc----------cctgaattgaataa
Consistency!
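For two sequences, "consistency" means that the chosen fragments can all be part of one alignment: each fragment must lie entirely before the next one in both sequences (no overlaps, no crossing). Picking the highest-weight consistent set is then a chaining problem that dynamic programming solves. A minimal sketch for the pairwise case only, with fragments as (i, j, length, weight) tuples whose values are invented; for multiple sequences, consistency must hold across all pairs of sequences at once, which this sketch does not handle.

```python
def precedes(f, g) -> bool:
    """True if fragment f ends before fragment g starts, in BOTH sequences."""
    i1, j1, len1 = f[:3]
    i2, j2 = g[0], g[1]
    return i1 + len1 <= i2 and j1 + len1 <= j2

def best_consistent_chain(fragments, weight):
    """Highest-weight set of mutually consistent fragments (same order, no overlap)."""
    if not fragments:
        return []
    frags = sorted(fragments)          # ordered by start position in the first sequence
    best = [weight(f) for f in frags]  # best[k]: best chain ending with frags[k]
    prev = [None] * len(frags)
    for k in range(len(frags)):
        for m in range(k):
            if precedes(frags[m], frags[k]) and best[m] + weight(frags[k]) > best[k]:
                best[k] = best[m] + weight(frags[k])
                prev[k] = m
    k = max(range(len(frags)), key=lambda idx: best[idx])
    chain = []
    while k is not None:
        chain.append(frags[k])
        k = prev[k]
    return chain[::-1]

# Invented fragments (i, j, length, weight); (2, 8, 3, 3) conflicts with the first two
fragments = [(0, 0, 3, 3), (5, 6, 4, 4), (2, 8, 3, 3), (10, 11, 2, 2)]
chain = best_consistent_chain(fragments, weight=lambda f: f[3])
print(chain)  # [(0, 0, 3, 3), (5, 6, 4, 4), (10, 11, 2, 2)]
```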
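The DIALIGN approach
atc------TAATAGTTAaactccccCGTGC-TTag
cagtgcGTGTATTACTAAc----------GG-TTCAATcgcg
caaa--GAGTATCAcc----------CCTGaaTTGAATaa
Upper-case letters: residues aligned as part of fragments; lower-case letters: residues that are not aligned.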
The DIALIGN approach
Advantages of the segment-based approach:
• The program can produce both global and local alignments!
• Sequence families can be aligned that cannot be aligned with standard methods.