
Outline



  1. Outline
     • How to build a phylogenetic tree:
       • Sequence alignment
       • Tree reconstruction
     • How to read a tree
     • ClustalX software
       • clustalx.exe
       • njplotWIN95.exe
     • Phylogenetic marker genes
       • 16S rRNA
     • Sequence format
       • FastA
     • Demonstration
       • BLAST search
       • Alignment & tree reconstruction
       • Identification of unknown sequence
     Falk Warnecke, Microbial Ecology Program, JGI, FWarnecke@lbl.gov

  2. Darwin and Haeckel [Figures: Darwin, 1837; Haeckel, 1866]

  3. Darwin and Haeckel [Figure: Darwin, 1837]

  4. Sequence alignment
     Correct alignment: homologous bases will stand one below the other in a column!

     Unaligned (* = conserved positions):
                 ***     *****
     Sequence A: GTAACGTGATACG
     Sequence B: GTACGTCAATACG

  5. Sequence alignment
     Correct alignment: homologous bases will stand one below the other in a column!

     Unaligned (* = conserved positions):
                 ***     *****
     Sequence A: GTAACGTGATACG
     Sequence B: GTACGTCAATACG

     Aligned:
                 *** *** * ****
     Sequence A: GTAACGTGA-TACG
     Sequence B: GTA-CGTCAATACG

  6. Sequence alignment
     Correct alignment: homologous bases will stand one below the other in a column!

     Unaligned (* = conserved positions):
                 ***     *****
     Sequence A: GTAACGTGATACG
     Sequence B: GTACGTCAATACG

     Aligned:
                 *** *** * ****
     Sequence A: GTAACGTGA-TACG
     Sequence B: GTA-CGTCAATACG

     • Exchanged base / substitution: column 8 (G in A, C in B)
     • Insertion / deletion: the gap columns 4 and 10, marked "-"
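The asterisk row and the two kinds of non-matching columns on this slide can be derived mechanically from a pair of sequences that is already aligned. The following is a minimal Python sketch; `classify_columns` is a hypothetical helper written for illustration and is not part of ClustalX.

```python
def classify_columns(aligned_a, aligned_b, gap="-"):
    """Label every column of a pairwise alignment.

    Returns (marks, kinds): `marks` is the asterisk row used on the slide,
    `kinds` holds one label per column.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    marks, kinds = [], []
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            kinds.append("indel")          # insertion / deletion
            marks.append(" ")
        elif a == b:
            kinds.append("conserved")      # identical base in both sequences
            marks.append("*")
        else:
            kinds.append("substitution")   # exchanged base
            marks.append(" ")
    return "".join(marks), kinds


# The aligned pair from the slide.
marks, kinds = classify_columns("GTAACGTGA-TACG", "GTA-CGTCAATACG")
print(marks)
print({k: kinds.count(k) for k in ("conserved", "substitution", "indel")})
```

Running the same function on the unaligned pair shows why the alignment matters: without the two gaps, only the first three and the last five columns match.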

  7. Phylogenetic tree reconstruction

     Aligned sequences (* = conserved positions):
                       ****
     Sequence A: AAGGTTCCAC
     Sequence B: AAAATTCCAC
     Sequence C: AACCCCCCAC
     Sequence D: GGTTAACCAC
     Sequence E: GGTTGGCCAC

     Distance matrix:
          A     B     C     D
     A
     B   0.2
     C   0.4   0.4
     D   0.6   0.6   0.6
     E   0.6   0.6   0.6   0.2

     Distance matrix tree: [Figure: tree of sequences A, B, C, D, E; axis: genetic divergence, 0.5 to 0.0]
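The distance matrix on this slide can be reproduced by counting, for each pair, the fraction of aligned positions that differ (2 of 10 for A and B gives 0.2). The sketch below does that and then groups the sequences with UPGMA (average linkage). UPGMA is chosen here only because it is short to write down; it is an assumption of this example, not the method named on the slide, and ClustalX itself builds neighbor-joining trees. The function names are hypothetical.

```python
from itertools import combinations

# The five aligned sequences from the slide.
SEQS = {
    "A": "AAGGTTCCAC",
    "B": "AAAATTCCAC",
    "C": "AACCCCCCAC",
    "D": "GGTTAACCAC",
    "E": "GGTTGGCCAC",
}


def p_distance(s1, s2):
    """Fraction of aligned positions at which two sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to equal length")
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def distance_matrix(seqs):
    """Map each unordered pair of names to its p-distance."""
    return {
        frozenset((x, y)): p_distance(seqs[x], seqs[y])
        for x, y in combinations(seqs, 2)
    }


def upgma(seqs):
    """Average-linkage clustering; returns the topology as nested tuples."""
    dist = distance_matrix(seqs)
    size = {name: 1 for name in seqs}  # cluster -> number of leaves
    while len(size) > 1:
        # Join the two clusters with the smallest distance.
        x, y = min(combinations(size, 2), key=lambda p: dist[frozenset(p)])
        merged = (x, y)
        for z in size:
            if z in (x, y):
                continue
            # Size-weighted mean distance from the new cluster to z.
            dist[frozenset((merged, z))] = (
                size[x] * dist[frozenset((x, z))]
                + size[y] * dist[frozenset((y, z))]
            ) / (size[x] + size[y])
        size[merged] = size[x] + size[y]
        del size[x], size[y]
    return next(iter(size))


if __name__ == "__main__":
    for pair, d in distance_matrix(SEQS).items():
        print("-".join(sorted(pair)), d)
    print(upgma(SEQS))
```

For these five sequences the clustering joins A with B and D with E first (distance 0.2 each), then attaches C to the A/B pair (0.4), and finally joins the two groups (0.6).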

  8. ClustalX
     Free software for sequence alignment and tree calculation
     Download from: ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalX (clustalx1.83.zip)

  9. Phylogenetic marker genes - prerequisites
     • Presence in all organisms
     • Functional constancy
     • Complexity
     • Size (information content)
     • Conserved and variable structure elements
     • Comprehensive database
     Examples:
     • 16S rRNA
     • 23S rRNA
     • Elongation factors
       • EF-Tu
       • EF-G
     • ATP synthase
     • RecA
     • Hsp60
     • RNA polymerase
     • Gyrase

  10. Ribosomal RNA as a phylogenetic marker gene
     • Advantages:
       • Ubiquitous distribution
       • Functional constancy
       • Large size (information content)
       • Conserved and highly variable structural elements
       • Comprehensive databases available
       • (No lateral gene transfer)
       • Good target for FISH!
     • Disadvantages:
       • No continuous sequence change
       • Multiple genes/operons
       • Different species with identical 16S rRNAs
       • One base change needs nearly one million years

  11. 16S Ribosomal RNA as a phylogenetic marker gene
     Ribosome subunits (70S):
     • 30S: 16S rRNA, 21 proteins
     • 50S: 23S rRNA, 5S rRNA, 34 proteins
     [Figure: Escherichia coli 16S rRNA, primary and secondary structure]

  12. Sequence format: Fasta
     16S rRNA sequence, 1441 nt:

>AMD_unknown_sequence
TCCGGTTGATCCTGCCGGCGGCCACTGCTATCAAGTTCCGACTAAGCCAT
GCGAGTCAAGGTATCGTAAGATGCCGGCACACTGCTCAGTAACACGTGGA
TAATCTAACCTTGAGTAAGGGATAACTTCGGGAAACTGAAGGTAATACCT
TATAATTGCTTAAAACTGGAATGTTTTTGCAATAAAAGTTACGACGCTCA
AGGATGAGTCTGCGACCTATCAGGTAGTAGGTGGTGTAATGGACCACCTA
GCCTCAGACGGGTACGGGCCCTGGGAGGGGTAGCCCGGAGATGGACTCTG
AGACATAAGTCCAGGCCCTACGGGGCGCAGCAGGCGCGAACACTGTGCAA
TGCGCGAAAGCGCGACACGGGGAGCTTGAGTGTCTTGGCATAGCCAAGAC
TTTTCTCATTCCTAAAAAGCATGAGGAATAAGTGCTGGGTAAGACGGGTG
CCAGCCGCCGCGGTAACACCCGCAGCACGAGTAGTGGTCACTTTTATTGA
GCCTAAAGCGTTCGTAGCCGGTTTTGTAAATCTTCAGATAAAGCCTGAAG
CTTAACTCCAGAAAGTCTGAAGAGACTGCAAGACTTGAGATCGGGTGAGG
TTAAACGTACTTTCAGGGTAGGGGTAAAATCCTGTAATCCCGGAAGGACG
ACCAGTGGCGAAAGCGTTTAACTAGAACGAATCTGACGGTAAGGAACGAA
GGCTAGGGTAGCAAACCGGATTAGATACCCGGGTAGTCCTAGCTGTAAAC
ATTGCCCATTTGATGTTGCTTTTCCGTTGAGGGAAGGCAGTGTCGGAGCG
AAGGTGTTAAATGGGCCGCTTGGGAAGTATGGTCGCAAGACTGAAACTTA
AAGGAATTGGCGGGGGAGCACCGCAACGGGAGGAATGTGCGGTTTAATTG
GATTCAACGCCGGAAAACTCACCGGGAACGACCTGTGCATGAGAGTCAAC
CTGACGAGCTTACTCGATAGCAGGAGAGGTGGTGCATGGCCGTCGTCAGC
TCGTACCGTAGGGCGTTCACTTAAGTGTGATAACGAGCGAGACCCACATC
TTTAATTGCAAATGTATATGAGAATATGCATGCACTTTAGAGAAACCGCC
AGCGCTAAGCTGGAGGAAGGAGTGGTCGACGGCAGGTCAGTACGCCCCGA
ATTTCCCGGGCTACACGCGCATTACAAAGAACGGGACAATACGTTGCAAC
CTCGAAAGAGGAAGCTAATCGCGAAACCCGTCCATAGTTAGGATTGAGGG
CTGTAACTCGCCCTCATGAATCTGGATTCCGTAGTAATCGCGGGTCAACA
ACCCGCGGTGAACATGCCCCTGCTCCTTGCACACACCGCCCGTCAAACCA
TCCGAGTTGGTGTTGGATGAGGTTTAATTCGAGAGGGTTAAATCAAATCT
GATGTCGGTGAGGAGGGTTAAGTCGTAACAAGGTATCCGTA
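In a Fasta file, a line starting with ">" names a sequence and all following lines up to the next ">" hold its bases. A minimal Python reader for that layout might look like the sketch below; `parse_fasta` is a hypothetical helper name, and the example text contains only the first 100 nt of the sequence on this slide.

```python
def parse_fasta(text):
    """Parse Fasta-formatted text into a dict of {name: sequence}."""
    records = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()       # header line: text after ">"
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[name].append(line)    # sequence may span several lines
    return {n: "".join(parts) for n, parts in records.items()}


# First two lines (100 nt) of the slide's sequence, used as sample input.
example = """>AMD_unknown_sequence
TCCGGTTGATCCTGCCGGCGGCCACTGCTATCAAGTTCCGACTAAGCCAT
GCGAGTCAAGGTATCGTAAGATGCCGGCACACTGCTCAGTAACACGTGGA
"""

for name, seq in parse_fasta(example).items():
    print(name, len(seq), "nt")
```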

  13. Demonstration: Identify unknown organism
     • Starting point: 16S rRNA sequence of an unknown organism
     • Strategy:
       • Retrieve closely related reference sequences from GenBank via BLAST
       • Compile sequences in Fasta format in one text file
       • Do sequence alignment and tree reconstruction using ClustalX
       • Identify organism

  14. Demonstration: 16S rRNA sequence of unknown organism (>AMD_unknown_sequence; the same sequence as on slide 12)

  15. Demonstration: BLAST search for closely related reference sequences

  16. Demonstration: download and compile text file

>AMD_unknown_sequence
TCCGGTTGATCCTGCCGGCGGCCACTGCTATCAAGTTCCGACTAAGCCAT ...
>Ferroplasma_acidiphilum
CTCGCTCGCCCATCYGGTTGATCCTGCCGGCGGCCACTGCTATCAAGTTC ...
>Ferroplasma_cyprexacervatum
TTCTGGTTNGATCCTGCCGGGCGGCCACTGCTATCAAGTTCCGACTAAGC ...
>Thermoplasma_volcanium
CGGTCACTGCTATCAGGTTCCGACTAAGCCATGCAAGTCACGGGGCCGTA ...
>Picrophilus_oshimae
ATTCTGGTTGATCCCGGCGGCGGCCACTGCTATCAAGTTCCGACTAAGCC ...
>AMD_F.acidarmanus_TypI
TCCGGTTGATCCTGCCGGCGGCCACTGCTATCAAGTTCCGACTAAGCCAT ...
>AMD_F.acidarmanus_TypII
TCCGGTTGATCCTGCCGGCGGCCACCGCTATCAAGTTCCGACTAAGCCAT ...
>AMD_Thermoplasmatales
GGTTGATCCTGCCGGCGGCTACTGCTATCAGGTTTCGACTAAGCCATGCG ...
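Compiling the unknown sequence and the downloaded references into one text file amounts to writing a ">" header line followed by the wrapped sequence for each entry. The sketch below shows one way to do this in Python. `format_fasta` is a hypothetical helper, the 50-character line width simply mirrors the slides, and the two sample records hold only the first 50 nt shown on the slide (the remainder is elided in the source).

```python
def format_fasta(records, width=50):
    """Render (name, sequence) pairs as Fasta text, wrapped at `width` bases."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        # Cut the sequence into fixed-width chunks, one per line.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Truncated sample records: only the 50 nt visible on the slide.
records = [
    ("AMD_unknown_sequence",
     "TCCGGTTGATCCTGCCGGCGGCCACTGCTATCAAGTTCCGACTAAGCCAT"),
    ("Ferroplasma_acidiphilum",
     "CTCGCTCGCCCATCYGGTTGATCCTGCCGGCGGCCACTGCTATCAAGTTC"),
]

print(format_fasta(records), end="")
```

To produce the input file for the alignment step, the returned string can be written to a plain text file with Python's built-in `open(..., "w")`.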

  17. Demonstration: sequence alignment and tree reconstruction
