Outline
• How to build a phylogenetic tree:
  • Sequence alignment
  • Tree reconstruction
• How to read a tree
• ClustalX software
  • clustalx.exe
  • njplotWIN95.exe
• Phylogenetic marker genes
  • 16S rRNA
• Sequence format
  • FastA
• Demonstration
  • BLAST search
  • Alignment & tree reconstruction
  • Identification of unknown sequence
Falk Warnecke, Microbial Ecology Program, JGI, FWarnecke@lbl.gov
Darwin and Haeckel
[Figures: Darwin, 1837; Haeckel, 1866]
Sequence alignment
Correct alignment: homologous bases stand one below the other in a column!

Unaligned (* = conserved positions):
Conserved:  ***     *****
Sequence A: GTAACGTGATACG
Sequence B: GTACGTCAATACG

Aligned:
Conserved:  *** *** * ****
Sequence A: GTAACGTGA-TACG
Sequence B: GTA-CGTCAATACG

• Mismatched column (G/C): exchanged base / substitution
• Gap (-): insertion / deletion
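The column-by-column comparison on this slide can be sketched in a few lines of Python. This is only an illustration and not part of ClustalX; the function names `conservation_line` and `classify_columns` are made up for this example, and `-` is assumed to be the gap character.

```python
def conservation_line(seq_a: str, seq_b: str, gap: str = "-") -> str:
    """Return a marker line with '*' under every column where both
    sequences carry the same base, and a space everywhere else."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return "".join(
        "*" if a == b and a != gap else " "
        for a, b in zip(seq_a, seq_b)
    )


def classify_columns(seq_a: str, seq_b: str, gap: str = "-") -> dict:
    """Count conserved columns, substitutions and insertions/deletions."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    counts = {"conserved": 0, "substitution": 0, "indel": 0}
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            counts["indel"] += 1          # gap in one sequence
        elif a == b:
            counts["conserved"] += 1      # identical bases
        else:
            counts["substitution"] += 1   # exchanged base
    return counts


if __name__ == "__main__":
    aligned_a = "GTAACGTGA-TACG"
    aligned_b = "GTA-CGTCAATACG"
    print(conservation_line(aligned_a, aligned_b))
    print(aligned_a)
    print(aligned_b)
    print(classify_columns(aligned_a, aligned_b))
```

Applied to the two sequences from the slide, inserting the two gaps raises the number of matching columns from 8 (unaligned) to 11 (aligned).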
Phylogenetic tree reconstruction

Conserved:        ****
Sequence A: AAGGTTCCAC
Sequence B: AAAATTCCAC
Sequence C: AACCCCCCAC
Sequence D: GGTTAACCAC
Sequence E: GGTTGGCCAC

Distance matrix:
     A    B    C    D
B   0.2
C   0.4  0.4
D   0.6  0.6  0.6
E   0.6  0.6  0.6  0.2

[Figure: distance matrix tree with leaves A, B, C, D, E; axis: genetic divergence, 0.5 to 0.0]
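The distance matrix on this slide is consistent with the fraction of differing positions between each pair of aligned sequences (often called the p-distance). A minimal Python sketch, assuming that definition; the names `p_distance` and `distance_matrix` are illustrative only, and ClustalX may apply corrections that are not shown here.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of alignment columns in which the two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)


def distance_matrix(seqs: dict) -> dict:
    """Pairwise distances, keyed by (name_1, name_2) in input order."""
    return {
        (x, y): p_distance(seqs[x], seqs[y])
        for x, y in combinations(seqs, 2)
    }


if __name__ == "__main__":
    sequences = {
        "A": "AAGGTTCCAC",
        "B": "AAAATTCCAC",
        "C": "AACCCCCCAC",
        "D": "GGTTAACCAC",
        "E": "GGTTGGCCAC",
    }
    for (x, y), d in distance_matrix(sequences).items():
        print(f"{x}-{y}: {d:.1f}")
```

For example, A and B differ in 2 of 10 positions, giving 0.2, while D and E differ from A, B and C in 6 of 10 positions, giving 0.6.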
ClustalX
Free software for sequence alignment and tree calculation
Download from: ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalX (clustalx1.83.zip)
Phylogenetic marker genes - prerequisites
• Presence in all organisms
• Functional constancy
• Complexity
• Size (information content)
• Conserved and variable structure elements
• Comprehensive database

Examples:
• 16S rRNA
• 23S rRNA
• Elongation factors
  • EF-Tu
  • EF-G
• ATP synthase
• RecA
• Hsp60
• RNA polymerase
• Gyrase
Ribosomal RNA as a phylogenetic marker gene
• Advantages:
  • Ubiquitous distribution
  • Functional constancy
  • Large size (information content)
  • Conserved and highly variable structural elements
  • Comprehensive databases available
  • (No lateral gene transfer)
  • Good target for FISH!
• Disadvantages:
  • No continuous sequence change
  • Multiple genes/operons
  • Different species with identical 16S rRNAs
  • One base change needs nearly one million years
16S ribosomal RNA as a phylogenetic marker gene
Ribosome subunits (Escherichia coli), 70S ribosome:
• 30S subunit: 16S rRNA, 21 proteins
• 50S subunit: 23S rRNA, 5S rRNA, 34 proteins
[Figure: Escherichia coli 16S rRNA, primary and secondary structure]
Sequence format: Fasta
>AMD_unknown_sequence
TCCGGTTGATCCTGCCGGCGGCCACTGCTATCAAGTTCCGACTAAGCCAT
GCGAGTCAAGGTATCGTAAGATGCCGGCACACTGCTCAGTAACACGTGGA
TAATCTAACCTTGAGTAAGGGATAACTTCGGGAAACTGAAGGTAATACCT
TATAATTGCTTAAAACTGGAATGTTTTTGCAATAAAAGTTACGACGCTCA
AGGATGAGTCTGCGACCTATCAGGTAGTAGGTGGTGTAATGGACCACCTA
GCCTCAGACGGGTACGGGCCCTGGGAGGGGTAGCCCGGAGATGGACTCTG
AGACATAAGTCCAGGCCCTACGGGGCGCAGCAGGCGCGAACACTGTGCAA
TGCGCGAAAGCGCGACACGGGGAGCTTGAGTGTCTTGGCATAGCCAAGAC
TTTTCTCATTCCTAAAAAGCATGAGGAATAAGTGCTGGGTAAGACGGGTG
CCAGCCGCCGCGGTAACACCCGCAGCACGAGTAGTGGTCACTTTTATTGA
GCCTAAAGCGTTCGTAGCCGGTTTTGTAAATCTTCAGATAAAGCCTGAAG
CTTAACTCCAGAAAGTCTGAAGAGACTGCAAGACTTGAGATCGGGTGAGG
TTAAACGTACTTTCAGGGTAGGGGTAAAATCCTGTAATCCCGGAAGGACG
ACCAGTGGCGAAAGCGTTTAACTAGAACGAATCTGACGGTAAGGAACGAA
GGCTAGGGTAGCAAACCGGATTAGATACCCGGGTAGTCCTAGCTGTAAAC
ATTGCCCATTTGATGTTGCTTTTCCGTTGAGGGAAGGCAGTGTCGGAGCG
AAGGTGTTAAATGGGCCGCTTGGGAAGTATGGTCGCAAGACTGAAACTTA
AAGGAATTGGCGGGGGAGCACCGCAACGGGAGGAATGTGCGGTTTAATTG
GATTCAACGCCGGAAAACTCACCGGGAACGACCTGTGCATGAGAGTCAAC
CTGACGAGCTTACTCGATAGCAGGAGAGGTGGTGCATGGCCGTCGTCAGC
TCGTACCGTAGGGCGTTCACTTAAGTGTGATAACGAGCGAGACCCACATC
TTTAATTGCAAATGTATATGAGAATATGCATGCACTTTAGAGAAACCGCC
AGCGCTAAGCTGGAGGAAGGAGTGGTCGACGGCAGGTCAGTACGCCCCGA
ATTTCCCGGGCTACACGCGCATTACAAAGAACGGGACAATACGTTGCAAC
CTCGAAAGAGGAAGCTAATCGCGAAACCCGTCCATAGTTAGGATTGAGGG
CTGTAACTCGCCCTCATGAATCTGGATTCCGTAGTAATCGCGGGTCAACA
ACCCGCGGTGAACATGCCCCTGCTCCTTGCACACACCGCCCGTCAAACCA
TCCGAGTTGGTGTTGGATGAGGTTTAATTCGAGAGGGTTAAATCAAATCT
GATGTCGGTGAGGAGGGTTAAGTCGTAACAAGGTATCCGTA

16S rRNA sequence, 1441 nt
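The Fasta layout shown here (a header line starting with `>`, followed by one or more sequence lines) is simple enough to read with a few lines of Python. This is a minimal sketch, not a full parser; the function name `parse_fasta` and the short example records are made up for illustration.

```python
def parse_fasta(text: str) -> dict:
    """Read Fasta-formatted text into a {name: sequence} dictionary.

    A line starting with '>' opens a new record; all following
    non-empty lines are joined to form that record's sequence.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()       # header without the '>'
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before first '>' header")
        else:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


if __name__ == "__main__":
    example = ">seq1\nACGT\nTTGA\n>seq2\nGG\n"   # hypothetical records
    for name, seq in parse_fasta(example).items():
        print(name, len(seq), "nt")
```

Checking `len(seq)` after parsing is a quick way to confirm that a downloaded sequence has the expected length, such as the 1441 nt stated on the slide.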
Demonstration: Identify unknown organism
• Starting point: 16S rRNA sequence of an unknown organism
• Strategy:
  • Retrieve closely related reference sequences from GenBank via BLAST
  • Compile sequences in Fasta format in one text file
  • Do sequence alignment and tree reconstruction using ClustalX
  • Identify organism
Demonstration: 16S rRNA sequence of unknown organism
>AMD_unknown_sequence
TCCGGTTGATCCTGCCGGCGGCCACTGCTATCAAGTTCCGACTAAGCCAT
... (same 1441 nt sequence as on the Fasta format slide)
Demonstration: BLAST search for closely related reference sequences
Demonstration: download and compile text file
>AMD_unknown_sequence
TCCGGTTGATCCTGCCGGCGGCCACTGCTATCAAGTTCCGACTAAGCCAT
...
>Ferroplasma_acidiphilum
CTCGCTCGCCCATCYGGTTGATCCTGCCGGCGGCCACTGCTATCAAGTTC
...
>Ferroplasma_cyprexacervatum
TTCTGGTTNGATCCTGCCGGGCGGCCACTGCTATCAAGTTCCGACTAAGC
...
>Thermoplasma_volcanium
CGGTCACTGCTATCAGGTTCCGACTAAGCCATGCAAGTCACGGGGCCGTA
...
>Picrophilus_oshimae
ATTCTGGTTGATCCCGGCGGCGGCCACTGCTATCAAGTTCCGACTAAGCC
...
>AMD_F.acidarmanus_TypI
TCCGGTTGATCCTGCCGGCGGCCACTGCTATCAAGTTCCGACTAAGCCAT
...
>AMD_F.acidarmanus_TypII
TCCGGTTGATCCTGCCGGCGGCCACCGCTATCAAGTTCCGACTAAGCCAT
...
>AMD_Thermoplasmatales
GGTTGATCCTGCCGGCGGCTACTGCTATCAGGTTTCGACTAAGCCATGCG
...
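Compiling the downloaded sequences into one text file can be done by hand in a text editor. As a rough sketch of the same step in Python, the hypothetical helper `format_fasta` below writes a list of (name, sequence) pairs as one multi-record Fasta text. The 50-character line width mirrors the slides, and rejecting duplicate names is an assumption made here so that every leaf in the later tree stays distinguishable.

```python
def format_fasta(records, width: int = 50) -> str:
    """Format (name, sequence) pairs as one multi-record Fasta text.

    Each record gets a '>' header line, followed by the sequence
    split into lines of at most `width` characters.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    lines = []
    seen = set()
    for name, seq in records:
        if name in seen:
            raise ValueError(f"duplicate sequence name: {name}")
        seen.add(name)
        lines.append(f">{name}")
        # slice the sequence into fixed-width chunks
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical short records; real input would be the unknown
    # sequence plus the reference sequences retrieved via BLAST.
    demo = [("unknown", "ACGTACGTAC"), ("reference_1", "ACGTTCGTAC")]
    print(format_fasta(demo, width=5), end="")
```

The resulting text can be saved to a single file and loaded into ClustalX for alignment.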