Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics
Multiple Sequence Alignment • One amino acid sequence plays coy; a pair of homologous sequences whisper; many aligned sequences shout out loud. • Very informative
Definition • A global alignment of a set of sequences is obtained by • inserting into each sequence gap characters • so that • the resulting sequences are of the same length • and so that • no “column” has only gap characters
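The two conditions in the definition can be checked mechanically. The following is a minimal sketch, assuming the alignment is given as a list of already-gapped strings and that "-" is the gap character; the function name is illustrative only.

```python
def is_global_alignment(rows, gap="-"):
    """Return True if the gapped rows satisfy the definition above."""
    if not rows:
        return False
    # Condition 1: all gapped sequences have the same length.
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        return False
    # Condition 2: no column consists only of gap characters.
    for column in zip(*rows):
        if all(ch == gap for ch in column):
            return False
    return True


print(is_global_alignment(["AC-GT", "A-CGT", "ACCG-"]))  # valid alignment
print(is_global_alignment(["A-C", "A-G"]))               # middle column is all gaps
```

Removing the gap characters from each row should give back the original input sequences, which is a useful extra sanity check.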
Use of alignments • High sequence similarity usually implies significant structural and/or functional similarity. The converse need not hold. • Homologous proteins (sharing a common ancestor) can differ significantly over large parts of their sequences, yet still retain common 2D patterns, 3D patterns, or a common active site or binding site. • Comparing several sequences in a family can reveal what the family has in common. A feature shared by many sequences can be significant even when the same feature shared by only two sequences would not be. • Multiple alignment can be used to derive evolutionary history.
Use of alignments • Predict features of aligned objects • conserved positions • structurally/functionally important • patterns of hydrophobicity/hydrophilicity • secondary structure elements • “gappy” regions • loops/variable regions
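Conserved positions and "gappy" regions can be read off an alignment column by column. The sketch below is one simple way to do this, assuming a toy alignment and arbitrary thresholds (fully conserved, or at least half gaps); neither the data nor the thresholds come from the slides.

```python
from collections import Counter


def column_stats(rows, gap="-"):
    """For each column return (conservation, gap_fraction).

    conservation = share of rows carrying the most common residue,
    gap_fraction = share of rows carrying a gap.
    """
    stats = []
    for column in zip(*rows):
        residues = [ch for ch in column if ch != gap]
        gap_fraction = 1 - len(residues) / len(column)
        if residues:
            top_count = Counter(residues).most_common(1)[0][1]
            conservation = top_count / len(column)
        else:
            conservation = 0.0
        stats.append((conservation, gap_fraction))
    return stats


# Toy alignment (made up for illustration).
alignment = ["MKV-LA", "MKI-LA", "MRVGLA"]
for i, (cons, gaps) in enumerate(column_stats(alignment)):
    if cons == 1.0:
        label = "conserved"
    elif gaps >= 0.5:
        label = "gappy (possible loop/variable region)"
    else:
        label = "variable"
    print(i, label)
```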
(Figure: alignment with three regions each marked "Loop?")
Use of Alignments- make patterns/profiles • Can make a profile or a pattern that can be used to match against a sequence database and identify new family members • Profiles/patterns can be used to predict family membership of new sequences • Databases of profiles/patterns • PROSITE • PFAM • PRINTS • ...
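To illustrate how a profile can be derived from an alignment and matched against a new sequence, here is a minimal sketch. It assumes a simple per-column frequency profile with a uniform pseudocount and a uniform background of 1/20; real profile databases use more elaborate models, and all names and data below are hypothetical.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(rows, gap="-", pseudocount=1.0):
    """Per-column residue frequencies, smoothed with a pseudocount."""
    profile = []
    for column in zip(*rows):
        counts = Counter(ch for ch in column if ch != gap)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


def log_odds_score(profile, segment, background=1 / 20):
    """Score an ungapped segment (standard residues only) against the profile."""
    return sum(
        math.log(col[aa] / background) for col, aa in zip(profile, segment)
    )


# Toy family alignment (made up for illustration).
family = ["MKVLA", "MKILA", "MRVLA"]
profile = build_profile(family)
print(log_odds_score(profile, "MKVLA"))  # resembles the family
print(log_odds_score(profile, "GGGGG"))  # does not
```

A candidate sequence that scores well above unrelated sequences would be a plausible new family member under this toy model.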
Prosite: Motifs for classification (Figure: a protein sequence is matched against Prosite pattern 1, Prosite pattern 2, ..., Prosite pattern n, corresponding to Family 1, Family 2, ..., Family n. Motif types shown: regular expression, pattern, profile.)
Pattern from alignment [FYL]-x-[LIVMC]-[KR]-W-x-[GDNR]-[FYWLE]-x(5,6)-[ST]-W-[ES]-[PSTDN]-x(3)-[LIVMC]
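A PROSITE-style pattern like the one above maps almost directly onto a regular expression. The converter below is a small sketch covering the common syntax elements (x, [..], {..}, repeat counts, and the < and > anchors); it is not the official PROSITE tooling, and the test sequence is made up.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # {P} means "any residue except P"  ->  [^P]
        element = element.replace("{", "[^").replace("}", "]")
        # x(5,6) means "5 to 6 of any residue"  ->  .{5,6}
        element = element.replace("(", "{").replace(")", "}")
        element = element.replace("x", ".")
        # < and > anchor the pattern at the N- and C-terminus.
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)


pattern = ("[FYL]-x-[LIVMC]-[KR]-W-x-[GDNR]-[FYWLE]-x(5,6)-"
           "[ST]-W-[ES]-[PSTDN]-x(3)-[LIVMC]")
regex = prosite_to_regex(pattern)
print(regex)

# Hypothetical sequence constructed to satisfy the pattern.
print(bool(re.search(regex, "FALKWAGFAAAAASWEPAAAL")))
```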
Alignment problem Given a set of sequences, produce a multiple alignment which corresponds as well as possible to the biological relationships between the corresponding bio-molecules
For homologous proteins • Two residues should be aligned (on top of each other) • if they are homologous (evolved from the same residue in a common ancestor protein) • if they are structurally equivalent
Automatic approach • Need a way of scoring alignments • fitness function which for an alignment quantifies its “goodness” • Need an algorithm for finding alignments with good scores • Not all methods provide a scoring function for the final alignment!
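One widely used fitness function is the sum-of-pairs score, which adds up a pairwise score over every pair of rows in every column. The slides do not say which scoring function is intended, so the sketch below is only an example; the match, mismatch, and gap values are arbitrary assumptions.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap_penalty=-2, gap="-"):
    """Score one pair of aligned symbols (values are illustrative)."""
    if a == gap and b == gap:
        return 0            # gap against gap carries no information
    if a == gap or b == gap:
        return gap_penalty  # residue against gap
    return match if a == b else mismatch


def sum_of_pairs(rows):
    """Sum pair_score over all row pairs in all columns."""
    return sum(
        pair_score(a, b)
        for column in zip(*rows)
        for a, b in combinations(column, 2)
    )


print(sum_of_pairs(["AC-", "ACG", "A-G"]))
```

With a function like this, two candidate alignments of the same sequences can be compared by their scores, which is what the search algorithm needs.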
Analysis of fitness function • One can test whether the alignments that are optimal under a given fitness function correspond well to the biological relationships between the sequences • For example, when the structures of (some of) the proteins are known.
Align by use of dynamic programming • Dynamic programming finds the best alignment of k sequences under a given scoring scheme • For two sequences there are three different column types • For three sequences there are seven different column types (x means an amino acid, - a blank):
Sequence1 x - x x - - x
Sequence2 x x - x - x -
Sequence3 x x x - x - x
• Time complexity of O(n^k) (sequence lengths = n)
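The column-type counts and the growth of the dynamic programming table can be reproduced with a few lines of code. The sketch below enumerates column types for k sequences, counts table cells, and includes a two-sequence alignment score in which the three column types appear as the three predecessor cells. The scoring values and function names are assumptions for illustration.

```python
from itertools import product


def column_types(k):
    """All columns over {x, -} for k sequences, excluding the all-gap column."""
    return [c for c in product("x-", repeat=k) if any(s == "x" for s in c)]


def dp_table_cells(n, k):
    """Cells in a full DP table for k sequences of length n."""
    return (n + 1) ** k


def align_score_two(s, t, match=1, mismatch=-1, gap=-2):
    """Best global alignment score for two sequences (k = 2)."""
    rows, cols = len(s) + 1, len(t) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            # The three column types: (x, x), (x, -), (-, x).
            both = score[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            only_s = score[i - 1][j] + gap
            only_t = score[i][j - 1] + gap
            score[i][j] = max(both, only_s, only_t)
    return score[-1][-1]


print(len(column_types(2)), len(column_types(3)))  # 3 and 7, as on the slide
print(dp_table_cells(100, 2), dp_table_cells(100, 3))
print(align_score_two("ACGT", "AGT"))
```

In general there are 2^k - 1 column types, so each cell of the k-dimensional table looks at that many predecessors, which adds to the cost on top of the n^k cells.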
Use of dynamic programming • Dynamic programming finds best alignment of k sequences given scoring scheme