Biological Motivation for Multiple Sequence Alignment
Rhys Price Jones, Anne R. Haake
Multiple Alignment
• What is Multiple Alignment? An Example:
VTISCTGSSSNIGAG-NHVKWYQQLPG
VTISCTGTSSNIGS--ITVNWYQQLPG
LRLSCSSSGFIFSS--YAMYWVRQAPG
LSLTCTVSGTSFDD--YYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG--
ATLVCLISDFYPGA--VTVAWKADS--
AALGCLVKDYFPEP--VTVSWNSG---
VSLTCLVKGFYPSD--IAVEWESNG--
Why do Multiple Alignment?
• Look for functional homology
• Consensus sequences
  • DNA level: transcription factor binding sites; PCR priming sites
  • Protein level: conserved sequence regions
    • that correspond to active sites
    • that are helpful in predicting 3-D structure
    • that are helpful in identification of protein family members
• Overall, useful in designing experiments to test and modify the function of specific proteins
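One way to make "conserved sequence regions" concrete is to scan the columns of the example alignment. The sketch below is only an illustration: the function names (`column_profile`, `conserved_columns`, `consensus`) and the `min_fraction` threshold are assumptions made here, not part of the original slides.

```python
from collections import Counter

# The example alignment from the slides; '-' marks a gap.
ALIGNMENT = [
    "VTISCTGSSSNIGAG-NHVKWYQQLPG",
    "VTISCTGTSSNIGS--ITVNWYQQLPG",
    "LRLSCSSSGFIFSS--YAMYWVRQAPG",
    "LSLTCTVSGTSFDD--YYSTWVRQPPG",
    "PEVTCVVVDVSHEDPQVKFNWYVDG--",
    "ATLVCLISDFYPGA--VTVAWKADS--",
    "AALGCLVKDYFPEP--VTVSWNSG---",
    "VSLTCLVKGFYPSD--IAVEWESNG--",
]


def column_profile(alignment):
    """Yield (index, Counter of non-gap residues) for each alignment column."""
    for i, column in enumerate(zip(*alignment)):
        yield i, Counter(res for res in column if res != "-")


def conserved_columns(alignment):
    """Return 0-based indices of gap-free columns holding a single residue."""
    n = len(alignment)
    return [
        i
        for i, counts in column_profile(alignment)
        if len(counts) == 1 and sum(counts.values()) == n
    ]


def consensus(alignment, min_fraction=0.5):
    """Most common residue per column; '.' when fewer than min_fraction of rows agree.

    min_fraction is an illustrative threshold, not a standard value.
    """
    n = len(alignment)
    out = []
    for _, counts in column_profile(alignment):
        residue, count = counts.most_common(1)[0] if counts else ("-", 0)
        out.append(residue if count / n >= min_fraction else ".")
    return "".join(out)


if __name__ == "__main__":
    print(conserved_columns(ALIGNMENT))
    print(consensus(ALIGNMENT))
```

On this alignment, `conserved_columns` reports columns 4 and 20 (0-based): the C and the W shared by all eight sequences.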
Why do Multiple Alignment?
• Pairwise comparisons may miss important functional resemblance
• Consider the case where sequences are not very similar
• Simultaneous comparisons can be more powerful
Example
VTISCTGSSSNIGAG-NHVKWYQQLPG
VTISCTGTSSNIGS--ITVNWYQQLPG
LRLSCSSSGFIFSS--YAMYWVRQAPG
LSLTCTVSGTSFDD--YYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG--
ATLVCLISDFYPGA--VTVAWKADS--
AALGCLVKDYFPEP--VTVSWNSG---
VSLTCLVKGFYPSD--IAVEWESNG--
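To see why a pairwise comparison can miss a resemblance that the multiple alignment exposes, pairwise identity can be computed for rows of the example. This is a minimal sketch: `percent_identity` and `identity_table` are hypothetical helper names, and identity is defined here over columns where neither row has a gap, which is one of several conventions in use.

```python
from itertools import combinations

# The example alignment from the slides; '-' marks a gap.
ALIGNMENT = [
    "VTISCTGSSSNIGAG-NHVKWYQQLPG",
    "VTISCTGTSSNIGS--ITVNWYQQLPG",
    "LRLSCSSSGFIFSS--YAMYWVRQAPG",
    "LSLTCTVSGTSFDD--YYSTWVRQPPG",
    "PEVTCVVVDVSHEDPQVKFNWYVDG--",
    "ATLVCLISDFYPGA--VTVAWKADS--",
    "AALGCLVKDYFPEP--VTVSWNSG---",
    "VSLTCLVKGFYPSD--IAVEWESNG--",
]


def percent_identity(a, b):
    """Fraction of identical residues over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def identity_table(alignment):
    """Map each pair of row indices (i, j), i < j, to its pairwise identity."""
    return {
        (i, j): percent_identity(alignment[i], alignment[j])
        for i, j in combinations(range(len(alignment)), 2)
    }


if __name__ == "__main__":
    for (i, j), ident in sorted(identity_table(ALIGNMENT).items()):
        print(f"rows {i + 1} and {j + 1}: {ident:.1%}")
```

Under this definition the first two rows are 80% identical, while the first and fifth rows agree at only 3 of 24 compared positions (12.5%). Two of those three matches are the C and W that every row of the alignment shares, a pattern that is easy to overlook when sequences are compared only two at a time.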
Why do Multiple Alignment?
• Look for evolutionary history of sequences
• Can infer some relatedness
• Need to look at longer protein fragments to make phylogenetic observations that are statistically significant
• A later topic…
Why do Multiple Alignment?
• What is another use of multiple alignment that we discussed when learning about scoring pairwise alignment?
• Creation of substitution matrices!
• e.g. BLOSUM
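The raw ingredient of a substitution matrix is a tally of which residues appear together in the same alignment column. The sketch below shows only that counting step, under simplifying assumptions: the real BLOSUM procedure works on ungapped blocks, clusters similar sequences, and converts the counts to log-odds scores, none of which is attempted here. The function name `count_aligned_pairs` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_aligned_pairs(alignment):
    """Count unordered residue pairs observed within each alignment column.

    Gaps ('-') are skipped. Keys are sorted 2-tuples such as ('A', 'G').
    """
    counts = Counter()
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        for a, b in combinations(residues, 2):
            counts[tuple(sorted((a, b)))] += 1
    return counts


if __name__ == "__main__":
    toy = ["AC", "AC", "GC"]  # toy alignment, invented for illustration
    for pair, n in sorted(count_aligned_pairs(toy).items()):
        print(pair, n)
```

For the toy alignment, column 1 contributes one A/A pair and two A/G pairs, and column 2 contributes three C/C pairs.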
Multiple Alignment: A Challenging Problem
Why?
• Methods used in pairwise alignment are not practical: they consume too much time and memory
• Consider:
  • Alignment of 2 sequences
  • Using the Needleman-Wunsch algorithm: a 2-dimensional matrix
  • Problem size grows as N^2, where N = length of each sequence
• Now, consider:
  • Alignment of multiple sequences
  • Matrix is multi-dimensional and computation grows as N^m, where m is the number of sequences
An example
• Align a 100-nucleotide sequence from each of 5 species
• N^m = 100^5 = 10,000,000,000 operations
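The arithmetic can be checked with a few lines of Python. This is a rough sketch that follows the slides' N^m estimate; the function names are hypothetical. A full dynamic-programming table would strictly have (N+1)^m cells, and each cell in the m-dimensional recurrence looks at up to 2^m - 1 neighbouring cells, so the true cost is somewhat higher than the cell count alone.

```python
def dp_cells(n, m):
    """Cells in an m-dimensional DP table for m sequences of length n (N^m estimate)."""
    return n ** m


def predecessors_per_cell(m):
    """Neighbouring cells each cell depends on in the m-dimensional recurrence."""
    return 2 ** m - 1


if __name__ == "__main__":
    for m in (2, 3, 5):
        print(f"m={m}: {dp_cells(100, m):,} cells, "
              f"{predecessors_per_cell(m)} predecessors per cell")
```

For two sequences of length 100 this gives 10,000 cells; for five sequences it gives the 10,000,000,000 quoted on the slide.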
It is not realistic to do a global alignment with an algorithm like Needleman-Wunsch (NW) for more than 3 sequences
• Heuristics are adopted in order to improve efficiency
• Assumptions must be made
• These assumptions introduce the potential for misalignment
Challenges of Multiple Alignment
Need a thorough understanding of the biological problem to:
• decide which assumptions should be made
• evaluate the outcome of the alignment