
Multiple Sequence Alignment


thanos




Presentation Transcript


  1. Multiple Sequence Alignment ClustalW TCoffee Ka, Ks, and Ka/Ks Anchored alignment

  2. ClustalW • http://www.ebi.ac.uk/clustalw/

  3. ClustalW • Paste your sequences • Multiple sequence alignment options • Submit

  4. Exercise • HomoloGene is a system for automated detection of homologs among annotated genes of several completely sequenced eukaryotic genomes. • Download the FASTA sequences of HomoloGene:5276 and align them with ClustalW

  5. Download protein sequences

  6. Result • Alignment • Guide tree
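The guide tree is what drives progressive alignment: ClustalW computes a distance for every pair of sequences, builds a tree from those distances (neighbor-joining by default) and then aligns sequences and groups of sequences in the order given by the tree. The sketch below illustrates only the tree-building step. It swaps in the simpler UPGMA (average-linkage) clustering and uses p-distances between toy sequences that are already the same length; the sequences and function names are hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, ignoring columns where either sequence has a gap."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        return 1.0  # no comparable sites: treat as maximally distant
    return sum(x != y for x, y in sites) / len(sites)


def upgma_guide_tree(seqs: dict[str, str]):
    """Build a guide tree (nested tuples of names) by UPGMA average linkage."""
    names = list(seqs)
    clusters = [(name, 1) for name in names]  # (subtree, number of leaves)
    d = [[p_distance(seqs[a], seqs[b]) for b in names] for a in names]
    while len(clusters) > 1:
        n = len(clusters)
        # Closest pair of clusters (ties broken by input order)
        i, j = min(((i, j) for i in range(n) for j in range(i + 1, n)),
                   key=lambda ij: d[ij[0]][ij[1]])
        (tree_i, size_i), (tree_j, size_j) = clusters[i], clusters[j]
        keep = [k for k in range(n) if k not in (i, j)]
        # Average-linkage distance from the merged cluster to each remaining cluster
        merged = [(size_i * d[i][k] + size_j * d[j][k]) / (size_i + size_j) for k in keep]
        d = [[d[a][b] for b in keep] + [merged[x]] for x, a in enumerate(keep)]
        d.append(merged + [0.0])
        clusters = [clusters[k] for k in keep] + [((tree_i, tree_j), size_i + size_j)]
    return clusters[0][0]


seqs = {"A": "ACGTACGTAC", "B": "ACGTACGTAA", "C": "TTGTACCTGA"}
print(upgma_guide_tree(seqs))  # ('C', ('A', 'B'))
```

A and B differ at one site in ten, so they are joined first; the alignment would then align A with B before adding C.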

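  7. TCoffee http://tcoffee.crg.cat/ TCoffee computes its multiple alignment by combining a collection of smaller alignments: it builds a library of pairwise (global and local) alignments between all the sequences and then assembles the multiple alignment that agrees best with that library.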
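In T-Coffee's primary library, each pair of residues aligned in some pairwise alignment gets a weight (the percent identity of that pairwise alignment). The library is then extended for consistency: the weight for aligning residue x of sequence A with residue y of sequence B grows by, for every third sequence C, the smaller of the weights linking x to a residue z of C and z to y. A minimal sketch of that extension step on a hand-written toy library; the data layout and weights are assumptions for illustration.

```python
from collections import defaultdict

# A residue is identified by (sequence name, 0-based position)
Residue = tuple[str, int]


def extend_library(primary: dict[tuple[Residue, Residue], float]) -> dict[tuple[Residue, Residue], float]:
    """Consistency-based library extension in the style of T-Coffee.

    `primary` maps a pair of residues from two different sequences to a weight.
    """
    neighbours: dict[Residue, dict[Residue, float]] = defaultdict(dict)
    for (x, y), w in primary.items():
        neighbours[x][y] = w
        neighbours[y][x] = w

    extended: dict[tuple[Residue, Residue], float] = defaultdict(float)
    # Direct support: the primary weight itself
    for x, partners in neighbours.items():
        for y, w in partners.items():
            extended[(x, y)] += w
    # Indirect support: x aligned to z and z aligned to y, with z in a third sequence
    for z, partners in neighbours.items():
        for x, w_xz in partners.items():
            for y, w_zy in partners.items():
                if x[0] != y[0] and z[0] not in (x[0], y[0]):
                    extended[(x, y)] += min(w_xz, w_zy)
    return dict(extended)


primary = {
    (("A", 0), ("B", 0)): 50.0,
    (("A", 0), ("C", 0)): 80.0,
    (("C", 0), ("B", 0)): 60.0,
    (("A", 2), ("C", 2)): 30.0,
    (("C", 2), ("B", 2)): 70.0,
}
lib = extend_library(primary)
print(lib[(("A", 0), ("B", 0))])  # 110.0: 50 direct + min(80, 60) through C
print(lib[(("A", 2), ("B", 2))])  # 30.0: supported only through C
```

The aligner then scores candidate alignments with these extended weights instead of a substitution matrix, so evidence from all sequences informs each pairwise step.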

  8. Alignment at the DNA level based on an alignment at the protein level • The 18-kDa protein plays an important role in the fertilization of several abalone species • Build a multiple sequence alignment using the following sequences

  9. Sequences
>gi|604533|gb|AAC37231.1| fertilization protein
MRSLVLLCVLLMAICAADKKTSVSKENEAAMKVAMMKFLDMKAGVFKEIIEDMGYPITPPQWTTLLYYNR
ERLIEFCRSFLALSKKIILLGGNKLNKANFARMGRILGWKSQWAVRQRQWGMVRVSRRHTSTAIAKRIVA
MKVADLPCN
>gi|604531|gb|AAC37233.1| fertilization protein
MRFLLLLCVLMGAVSQAVCRKRPNVWGKIVVKEKNKAAMKIGFMEYLDAKLVKFKRHWLVGANWKLQKFE
TDEMRYLAIKRLIKVCHGYTIWSQRLIMLKYRPLNEKYFKKVGRYLAWRNYLIVFRMWIGVLKKNLKRSE
ITKPMQKLLDTKDGELPCPVRKIHG
>gi|604529|gb|AAC37232.1| fertilization protein
MRSLVLLCVLMAVGCVAFDDVVVSRQEQSYVQRGMVNFLDEEMHKLVKRFRDMRWNLGPGFVFLLKKVNR
ERMMRYCMDYARYSKKILQLKHLPVNKKTLTKMGRFVGYRNYGVIRELYADVFRDVQGFRGPKMTAAMRK
YSSKDPGTFPCKNEKRRG
>gi|604527|gb|AAC37230.1| fertilization protein
MRSLVLLCVLLMAICAADKKTTVSKENAAAMKIAMIKFLDARAGKFKKRVENMGYPITPPQWTTLLYYNR
QRLMEWCHTYVEFSKKIILMGGNKLNKKNFTRMGRIIGWKNQWVLKRRQWEMVRVMRRYKSTAIAKKIVA
MKVADLPCN
>gi|604525|gb|AAC37229.1| fertilization protein
MRSLVLLCVLLMAICAADKKSTVSKENAAAMKVAMIKFLDSRTDRFKKRIEKIGYPITPPQYTTLLYYNR
ERLMDWCHNYVEVSKKIILLGGNKLNKKNFARMGRIIGWKNQWILKRRQWHMVRVMRRYKASAIAKKIVA
MKVADLPCN

  10. Choose TCoffee Regular, paste the sequences in the data box, and press submit

  11. Download formats • Guide tree

  12. Codon Alignment • To study selection patterns, you need the corresponding DNA (codon) alignment • Using PROTOGENE (Protein-to-Gene) in TCoffee, the amino-acid alignment is transformed into a codon alignment. The procedure uses tBLASTn to retrieve the nucleotide sequence that codes for each protein.
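Once the coding DNA of each protein is known, turning the protein alignment into a codon alignment is mechanical: each amino acid is replaced by its codon and each protein gap by a three-base gap. A minimal sketch of that threading step, assuming the coding sequence starts at the first codon of exactly this protein; the function name is hypothetical and this is not PROTOGENE's own code.

```python
def thread_codons(aligned_protein: str, cds: str) -> str:
    """Turn one row of a protein alignment into the matching codon-alignment row.

    Assumes `cds` is the coding sequence of exactly this protein, starting at
    its first codon; extra trailing codons (e.g. the stop codon) are ignored.
    """
    codons = [cds[i:i + 3].upper() for i in range(0, len(cds) - 2, 3)]
    n_residues = len(aligned_protein.replace("-", ""))
    if len(codons) < n_residues:
        raise ValueError("coding sequence is shorter than the protein")
    row, k = [], 0
    for aa in aligned_protein:
        if aa == "-":
            row.append("---")  # a protein gap becomes a gap of one whole codon
        else:
            row.append(codons[k])
            k += 1
    return "".join(row)


print(thread_codons("MRS-L", "ATGAGGTCTTTGTAG"))  # ATGAGGTCT---TTG
```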

  13. PROTOGENE (in TCoffee) is time-consuming. Submit your email address and the results will be emailed to you. • PROTOGENE may return more than one DNA sequence for a given protein sequence. For your homework assignment, choose one sequence for each species.

  14. (Result) Codon alignment
>gi|604533|gb|AAC37231.1|_G_L36554 _S_ AAC37231 _DESC_ fertilization protein MATCHES_ON Haliotisassimilis fertilization protein mRNA, complete cds
ATGAGGTCTTTGGTGCTTCTCTGTGTTTTGCTGATGGCAATATGTGCGGCGGAC------
------------------AAAAAAACCTCGGTCTCGAAGGAAAATGAAGCCGCAATGAAG
GTAGCGATGATGAAGTTTTTGGATATGAAGGCGGGTGTATTCAAAGAAATC---ATTGAG
GATATGGGATATCCAATAACCCCTCCGCAATGGACAACTCTACTGTACTACAACAGAGAG
AGATTGATTGAATTTTGCCGTTCCTTCCTTGCATTGTCCAAAAAGATTATATTGCTGGGA
GGTAACAAATTAAATAAGGCGAACTTCGCTAGGATGGGTCGAATCCTTGGCTGGAAAAGC
CAGTGGGCTGTGAGACAGAGGCAATGGGGGATGGTCAGA---------GTGTCGAGGCGC
CATACAAGTACTGCAATAGCTAAAAGGATCGTCGCCATGAAAGTTGCTGACCTACCCTGT
AAC------------------TAG
>gi|604531|gb|AAC37233.1|_G_L36590 _S_ AAC37233 _DESC_ fertilization protein MATCHES_ON Haliotiscorrugata fertilization protein mRNA, complete cds
ATGAGGTTTTTGCTGCTTCTCTGTGTTTTGATGGGGGCAGTATCTCAGGCAGTATGCAGA
AAAAGACCTAATGTCTGGGGGAAAATCGTGGTCAAGGAGAAAAATAAAGCCGCAATGAAG
ATAGGGTTTATGGAATATTTGGATGCAAAGTTGGTAAAGTTTAAAAGGCACTGGCTTGTT
GGAGCCAATTGGAAACTTCAAAAATTTGAAACGGATGAAATGAGATACCTCGCCATAAAG
AGACTGATAAAAGTTTGCCATGGATACACTATTTGGTCCCAACGACTAATAATGTTAAAA
TATCGACCATTGAATGAGAAATACTTCAAAAAGGTGGGTCGATACCTTGCCTGGCGAAAC
TACCTCATAGTTTTTCGGATGTGGATCGGCGTTTTG------AAGAAAAATCTTAAAAGA
TCGGAAATAACGAAACCCATGCAAAAACTCCTCGACACAAAGGATGGTGAGTTGCCCTGC
CCTGTTAGAAAGATACATGGATAA
>gi|604529|gb|AAC37232.1|_G_L36589 _S_ AAC37232 _DESC_ fertilization protein MATCHES_ON Haliotisfulgens fertilization protein mRNA, complete cds
ATGAGGTCTTTGGTGCTTCTCTGTGTTTTGATGGCGGTAGGATGTGTGGCGTTT------
------------------GATGATGTGGTGGTCTCAAGGCAAGAGCAATCTTATGTGCAG
AGAGGGATGGTCAACTTTTTGGATGAAGAAATGCATAAACTGGTTAAACGG---TTTAGA
GATATGCGATGGAATTTAGGGCCAGGCTTTGTATTCCTTCTAAAAAAAGTCAACAGAGAG
AGAATGATGCGCTACTGCATGGATTACGCCAGATATTCCAAAAAGATTTTACAGCTAAAA
CATCTTCCAGTAAATAAGAAGACCCTCACTAAAATGGGTAGATTCGTTGGATATCGAAAC
TATGGGGTCATCAGGGAGTTGTACGCCGACGTATTCAGAGACGTTCAAGGATTTAGGGGG
CCTAAAATGACTGCAGCCATGAGGAAGTACAGCAGCAAGGATCCTGGTACATTTCCTTGC
AAGAACGAGAAACGCCGCGGATGA
>gi|604527|gb|AAC37230.1|_G_L36553 _S_ AAC37230 _DESC_ fertilization protein MATCHES_ON Haliotissorenseni fertilization protein mRNA, complete cds
ATGAGGTCTTTGGTGCTTCTCTGTGTTTTGCTGATGGCAATATGTGCGGCGGAC------
------------------AAAAAAACCACGGTCTCGAAGGAAAATGCAGCCGCAATGAAG
ATAGCTATGATAAAGTTTTTGGATGCGAGGGCGGGTAAATTCAAAAAACGC---GTTGAG
AATATGGGATATCCAATAACCCCTCCGCAATGGACAACTCTACTATACTACAACAGACAG
AGATTGATGGAATGGTGCCATACCTACGTTGAATTTTCCAAAAAGATTATATTGATGGGA
GGTAACAAATTAAATAAGAAGAACTTCACTAGGATGGGTCGAATCATTGGCTGGAAAAAC
CAGTGGGTTTTGAAAAGGAGGCAATGGGAGATGGTCAGA---------GTGATGAGGCGC
TATAAAAGTACTGCAATAGCTAAAAAGATCGTCGCCATGAAAGTTGCTGACCTACCCTGT
AAC------------------TAG
>gi|604525|gb|AAC37229.1|_G_L36552 _S_ AAC37229 _DESC_ fertilization protein MATCHES_ON Haliotisrufescens fertilization protein mRNA, complete cds
ATGAGGTCTTTGGTGCTTCTCTGTGTTTTGCTGATGGCAATATGTGCGGCGGAC------
------------------AAAAAATCCACGGTCTCGAAGGAAAATGCAGCCGCAATGAAG
GTAGCGATGATAAAGTTTTTGGATTCGAGGACGGATAGATTCAAAAAACGC---ATTGAG
AAGATTGGATATCCAATAACCCCTCCGCAATATACAACTCTACTATACTACAACAGAGAG
AGATTGATGGATTGGTGCCATAACTACGTTGAAGTATCCAAAAAGATTATATTGTTGGGA
GGTAACAAATTAAATAAGAAGAACTTCGCTAGGATGGGTCGAATCATTGGCTGGAAAAAC
CAGTGGATTTTGAAAAGGAGGCAATGGCACATGGTCAGA---------GTGATGAGGCGC
TATAAAGCTTCTGCAATAGCTAAAAAGATCGTCGCCATGAAAGTTGCTGACCTACCCTGT
AAC------------------TAG
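Because PROTOGENE may return more than one DNA sequence per protein, it is worth checking that each row of the codon alignment translates back to the intended protein. A minimal sketch, assuming the standard genetic code; gap codons ('---') translate to '-', stop codons to '*' and unrecognized codons to 'X'. The function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code; codons are ordered TTT, TTC, TTA, TTG, TCT, ..., GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_codon_row(row: str) -> str:
    """Translate one gapped row of a codon alignment, codon by codon."""
    if len(row) % 3:
        raise ValueError("row length is not a multiple of 3")
    out = []
    for i in range(0, len(row), 3):
        codon = row[i:i + 3].upper()
        if codon == "---":
            out.append("-")
        else:
            out.append(CODON_TABLE.get(codon, "X"))  # X for ambiguous or partial codons
    return "".join(out)


def matches_protein(row: str, protein: str) -> bool:
    """True if the ungapped translation equals the protein (ignoring a final stop)."""
    return translate_codon_row(row).replace("-", "").rstrip("*") == protein.replace("-", "")


print(translate_codon_row("ATGAGGTCTTTGGTGCTTCTCTGTGTTTTGCTGATGGCAATATGTGCGGCGGAC------"))
# MRSLVLLCVLLMAICAAD--
```

The printed translation is the first line of the AAC37231 row, and it matches the start of the AAC37231.1 protein sequence.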

  15. SNAP - Ds/Dn Calculation Tool http://hcv.lanl.gov/content/sequence/SNAP/SNAP.html Calculates synonymous and nonsynonymous substitution rates from codon alignments using the method of Nei and Gojobori (1986).
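The Nei-Gojobori method counts, for each codon, how many of its sites are synonymous (the fraction of possible point mutations that leave the amino acid unchanged) and nonsynonymous; it counts the observed synonymous and nonsynonymous differences between two sequences, averaging over mutational pathways when a codon differs at more than one position; and it divides differences by sites to get pS and pN. A minimal sketch for one pair of sequences that applies the Jukes-Cantor correction. Implementations differ in how they treat mutations to stop codons, so its numbers need not match SNAP's exactly; the function names are hypothetical.

```python
from itertools import permutations
from math import log

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def synonymous_sites(codon: str) -> float:
    """Sum over positions of the fraction of point mutations that keep the amino acid.
    Mutations to a stop codon count as nonsynonymous in this sketch."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def codon_differences(c1: str, c2: str) -> tuple[float, float]:
    """(synonymous, nonsynonymous) differences averaged over all mutational
    pathways from c1 to c2 that avoid intermediate stop codons."""
    diff_positions = [i for i in range(3) if c1[i] != c2[i]]
    total_s = total_n = 0.0
    n_paths = 0
    for order in permutations(diff_positions):
        current, s, n, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                valid = False  # pathway passes through a stop codon
                break
            if CODE[nxt] == CODE[current]:
                s += 1
            else:
                n += 1
            current = nxt
        if valid:
            total_s, total_n, n_paths = total_s + s, total_n + n, n_paths + 1
    if n_paths == 0:
        # Simplifying assumption: if every pathway hits a stop, count all as nonsynonymous
        return 0.0, float(len(diff_positions))
    return total_s / n_paths, total_n / n_paths


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction for multiple hits; returns inf when p >= 0.75."""
    return float("inf") if p >= 0.75 else -0.75 * log(1 - 4 * p / 3)


def nei_gojobori(seq1: str, seq2: str) -> tuple[float, float]:
    """Return (dN, dS) for two aligned coding sequences."""
    S = N = Sd = Nd = 0.0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        # Skip codons containing gaps or ambiguity codes, and stop codons
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        mean_s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S += mean_s
        N += 3 - mean_s
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    if S == 0 or N == 0:
        raise ValueError("no comparable synonymous or nonsynonymous sites")
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)


print(codon_differences("TTA", "CTG"))  # (2.0, 0.0): both pathways stay Leu
```

A dN/dS ratio above 1 is the usual signature of positive selection.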

  16. Input codon alignment • Select output statistics

  17. SNAP - Ds/Dn Calculation Tool Conclusion: dN exceeds dS (dN/dS > 1), the signature of positive selection, in six of the ten pairwise comparisons. Swanson and Vacquier (1998) reached the same conclusion.

  18. Distmat http://emboss.bioinformatics.nl/cgi-bin/emboss/distmat Distmat calculates the evolutionary distance between every pair of sequences in a multiple alignment. The distances are expressed as the number of substitutions per 100 nucleotides or the number of replacements per 100 amino acids.
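The simplest distance is the uncorrected proportion p of differing sites; correction methods such as Jukes-Cantor, d = -3/4 ln(1 - 4p/3), also account for multiple substitutions at the same site. Distmat offers its own choice of correction methods; this minimal sketch computes only the uncorrected and Jukes-Cantor distances and scales them to substitutions per 100 nucleotides. Skipping gapped columns is an assumption of the sketch, not necessarily what distmat does, and the function names are hypothetical.

```python
from itertools import combinations
from math import log


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites over columns where neither sequence has a gap."""
    sites = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    return sum(x != y for x, y in sites) / len(sites)


def jukes_cantor(p: float) -> float:
    """Correct an observed proportion of differences for multiple substitutions."""
    return -0.75 * log(1 - 4 * p / 3)


def distance_matrix(seqs: dict[str, str], corrected: bool = True) -> dict[tuple[str, str], float]:
    """Pairwise distances scaled to substitutions per 100 nucleotides."""
    result = {}
    for a, b in combinations(seqs, 2):
        p = p_distance(seqs[a], seqs[b])
        result[(a, b)] = 100 * (jukes_cantor(p) if corrected else p)
    return result


seqs = {"x": "ACGTACGTAC", "y": "ACGTACGTAA"}
print(distance_matrix(seqs, corrected=False))  # {('x', 'y'): 10.0}
print(distance_matrix(seqs))                   # about 10.73 after correction
```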

  19. Distmat • Feed the DNA alignment of the 18-kDa protein into distmat. • Calculate the distances between the sequences separately for codon positions 1 and 2, and for codon position 3. • Are the results in agreement with those from the dN/dS analysis?
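Most substitutions at the third codon position are synonymous, while nearly every change at the second position and most changes at the first alter the amino acid. The position-3 distance therefore roughly tracks dS, and the positions 1+2 distance roughly tracks dN. A minimal sketch of splitting aligned coding sequences by codon position before computing an uncorrected distance; the toy sequences and function names are hypothetical.

```python
def codon_positions(row: str, positions: set[int]) -> str:
    """Keep only the alignment columns at the given codon positions (1, 2 and/or 3)."""
    return "".join(ch for i, ch in enumerate(row) if i % 3 + 1 in positions)


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites over columns where neither sequence has a gap."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in sites) / len(sites)


a, b = "CTTAAAGCA", "CTCAAGGCG"
for label, pos in (("positions 1+2", {1, 2}), ("position 3", {3})):
    d = p_distance(codon_positions(a, pos), codon_positions(b, pos))
    print(label, d)
# positions 1+2 0.0
# position 3 1.0
```

All three differences in this toy pair (CTT/CTC, AAA/AAG, GCA/GCG) are synonymous third-position changes, so only the position-3 distance is non-zero.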

  20. Distmat

  21. Distmat

  22. Anchored multiple-sequence alignment with DIALIGN http://dialign.gobics.de/anchor/submission.php User manual: http://dialign.gobics.de/anchor/manual

  23. Align the following sequences (use the file dalign_sequences.txt):
>seq1
WKKNADAPKRAMTSFMKAAY
>seq2
WNLDTNSPEEKQAYIQLAKDDRIRYD
>seq3
WRMDSNQKNPDSNNPKAAYNKGDANAPK

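  24. Results • DIALIGN builds alignments from fragments: gap-free pairs of similar segments (diagonals) are combined into the alignment, so no gap penalty is used.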
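DIALIGN weights each gap-free segment pair by how unlikely its similarity would be by chance and then combines a consistent set of them into the alignment. The toy sketch below only enumerates fixed-length diagonals between two of the exercise sequences and scores them by the number of identical residues. That scoring is a deliberate simplification in place of DIALIGN's probability-based weights, and the function name is hypothetical.

```python
def diagonals(a: str, b: str, length: int, min_matches: int):
    """Yield (start_a, start_b, matches) for gap-free segment pairs of a fixed
    length with at least `min_matches` identical residues (1-based starts)."""
    for i in range(len(a) - length + 1):
        for j in range(len(b) - length + 1):
            matches = sum(x == y for x, y in zip(a[i:i + length], b[j:j + length]))
            if matches >= min_matches:
                yield i + 1, j + 1, matches


seq1 = "WKKNADAPKRAMTSFMKAAY"
seq3 = "WRMDSNQKNPDSNNPKAAYNKGDANAPK"
print(list(diagonals(seq1, seq3, 4, 4)))  # [(17, 16, 4)]: the shared KAAY segment
```

Lowering `min_matches` admits weaker, partially matching diagonals; a real fragment-based aligner must then decide which overlapping fragments are mutually consistent.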

  25. Results • Numbers below the alignment give a rough measure of the degree of local similarity among the sequences at each position; higher numbers indicate stronger similarity.

  26. Anchored alignment • Now, let us assume that the user has some expert knowledge concerning a certain domain that is present in all the input sequences • The domains marked in red in the three sequences are thought to be homologous to one another
>seq1
WKKNADAPKRAMTSFMKAAY
>seq2
WNLDTNSPEEKQAYIQLAKDDRIRYD
>seq3
WRMDSNQKNPDSNNPKAAYNKGDANAPK

  27. Therefore, the user wants to define this domain as an anchor and align the rest of the sequences automatically. • To do this, define a set of anchor points; each anchor point is a pair of equal-length segments from two of the input sequences.

  28. Each anchor point is specified by five values: • first sequence involved • second sequence involved • start of anchor in first sequence • start of anchor in second sequence • length of anchor
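The five values of an anchor point can be checked before submission by extracting the two segments they describe. A minimal sketch, assuming one whitespace-separated line per anchor in the order listed above; the DIALIGN user manual should be consulted for the exact file format the server expects (for example, whether an additional column is required). The class name and the example anchor (the KAAY motif shared by seq1 and seq3) are hypothetical and are not the domain marked on the slide.

```python
from dataclasses import dataclass


@dataclass
class Anchor:
    """One anchor point: an equal-length segment pair from two input sequences."""
    seq_a: int    # first sequence involved (1-based index in the input file)
    seq_b: int    # second sequence involved
    start_a: int  # start of anchor in first sequence (1-based)
    start_b: int  # start of anchor in second sequence (1-based)
    length: int   # length of anchor, the same in both sequences

    def segments(self, seqs: list[str]) -> tuple[str, str]:
        """Return the two anchored segments, checking that they fit in the sequences."""
        a = seqs[self.seq_a - 1][self.start_a - 1:self.start_a - 1 + self.length]
        b = seqs[self.seq_b - 1][self.start_b - 1:self.start_b - 1 + self.length]
        if len(a) != self.length or len(b) != self.length:
            raise ValueError("anchor extends past the end of a sequence")
        return a, b

    def to_line(self) -> str:
        """Format the five values as one whitespace-separated line."""
        return f"{self.seq_a} {self.seq_b} {self.start_a} {self.start_b} {self.length}"


seqs = ["WKKNADAPKRAMTSFMKAAY", "WNLDTNSPEEKQAYIQLAKDDRIRYD", "WRMDSNQKNPDSNNPKAAYNKGDANAPK"]
anchor = Anchor(1, 3, 17, 16, 4)  # hypothetical example: the shared KAAY motif
print(anchor.segments(seqs))  # ('KAAY', 'KAAY')
print(anchor.to_line())       # 1 3 17 16 4
```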

  29. Results • The specified domain is aligned and the remainder of the sequences is aligned automatically respecting the constraints given by the anchor points:

  30. Guidance/HoT

  31.
>seq1
WKKNADAPKRAMTSFMKAAY
>seq2
WNLDTNSPEEKQAYIQLAKDDRIRYD
>seq3
WRMDSNQKNPDSNNPKAAYNKGDANAPK
>seq4
WRMDSNQKNPNNPKAAYNKGDANAPK
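HoT (Heads-or-Tails) assesses how robust an alignment is. The sequences are aligned as given (heads) and again with every sequence reversed (tails), and residue pairs that are aligned the same way in both runs are treated as reliable. In these sequences, seq4 is seq3 with two residues (DS) deleted, so the placement of that gap may differ between the two runs. A minimal sketch of the comparison step, assuming both alignments come from the same aligner and keep the sequences in the same row order; the function names and toy alignments are hypothetical, and the servers' own scores include further refinements.

```python
def aligned_pairs(alignment: list[str]) -> set[tuple[int, int, int, int]]:
    """All (row_i, residue_i, row_j, residue_j) pairs that share a column."""
    # Map every cell to the index of the residue it holds (None for gaps)
    indices = []
    for row in alignment:
        k, row_idx = 0, []
        for ch in row:
            if ch == "-":
                row_idx.append(None)
            else:
                row_idx.append(k)
                k += 1
        indices.append(row_idx)
    pairs = set()
    for col in range(len(alignment[0])):
        for i in range(len(alignment)):
            for j in range(i + 1, len(alignment)):
                ri, rj = indices[i][col], indices[j][col]
                if ri is not None and rj is not None:
                    pairs.add((i, ri, j, rj))
    return pairs


def heads_or_tails_score(heads: list[str], tails: list[str]) -> float:
    """Fraction of residue pairs in `heads` that are also aligned in `tails`.

    `heads` aligns the original sequences; `tails` aligns the reversed
    sequences, exactly as the aligner returned it.
    """
    tails_restored = [row[::-1] for row in tails]  # back to original residue order
    h, t = aligned_pairs(heads), aligned_pairs(tails_restored)
    return len(h & t) / len(h)


heads = ["ACGT", "A-GT"]
tails = ["TGCA", "T-GA"]  # alignment of the reversed sequences TGCA and TGA
print(heads_or_tails_score(heads, tails))  # 0.6666666666666666 (2 of 3 pairs agree)
```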
