Constructing Phylogenetic Trees. Gloria Rendon NCSA July 2008. Molecular Evolution. In the past, much of this work was done by making observations of anatomy and physiology and with comparisons in fossil records.
Constructing Phylogenetic Trees Gloria Rendon NCSA July 2008
Molecular Evolution • In the past, much of the work of inferring evolutionary relationships was done by observing anatomy and physiology and by comparing fossil records. • More recently, techniques have been developed in molecular biology for performing such evolutionary comparisons at the molecular level.
Molecular Evolution • One of the newest quantitative methods is to compare the nucleotide (or amino acid) sequence of a particular segment of DNA common to all the organisms included in the study. • The organisms that show the greatest number of sequence differences are considered to have diverged from their common ancestor (following separate evolutionary paths) the longest time ago.
Molecular Evolution. What to compare? • Not all segments of DNA or of genes are equally suitable for conducting evolutionary studies among organisms because different sequences accumulate changes at different rates: • Proteins (or protein coding DNAs) are constrained by natural selection - better for studying very distant relationships • Some sequences are highly variable (rRNA spacer regions, immunoglobulin genes), while others are highly conserved (actin, rRNA coding regions) • Different regions within a single gene can evolve at different rates (conserved vs. variable domains)
Molecular Evolution. What to compare? Phylogenetic tree depicting the three primary kingdoms. Carl Woese [1]. To reconstruct this tree, 16S (prokaryotes) or 18S (eukaryotes) ribosomal RNA genes from these 13 different organisms were compared
Molecular Evolution. What to compare? Phylogenetic tree of primates according to Lahn [2]. To reconstruct this tree, the microcephalin gene, which is involved in regulating brain size, was compared across these six organisms.
Suppose we have a set of FIVE protein sequences; they should be homologous to one another AND must come from different organisms.
Step 1. Collect the sequences.
>seq_org1
VNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKF
>seq_org2
VNFKLLSHCLLVTLACHLPTEFTPAVHASLDKF
>seq_org3
ENFKLLTNVLVCVLAHHFGRFFTPPVHAAYQKF
>seq_org4
ENFKLLTNVLVCVLAVHFGKFFTPPVHAAYQKF
>seq_org5
DNFKLLSEMIIQVLASHHPPCFTPDVHGMMVKF
In the interest of time, we have already copied these sequences to the Biology Workbench. Normally, we would need to run a BLAST search to get to this point.
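For readers who want to work with the five sequences outside the Biology Workbench, here is a minimal sketch of reading FASTA-formatted text into a dictionary. The helper name `parse_fasta` is hypothetical and is not part of the Workbench; the third record is named seq_org3 here, to match the alignment shown in Step 2.

```python
# Minimal sketch: parse FASTA text into {name: sequence}.
# `parse_fasta` is a hypothetical helper written for this example.
fasta_text = """\
>seq_org1
VNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKF
>seq_org2
VNFKLLSHCLLVTLACHLPTEFTPAVHASLDKF
>seq_org3
ENFKLLTNVLVCVLAHHFGRFFTPPVHAAYQKF
>seq_org4
ENFKLLTNVLVCVLAVHFGKFFTPPVHAAYQKF
>seq_org5
DNFKLLSEMIIQVLASHHPPCFTPDVHGMMVKF
"""


def parse_fasta(text):
    """Return a dict mapping each record name to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: the name is the first word after '>'.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            # Sequence line: append to the current record.
            records[name] += line
    return records


sequences = parse_fasta(fasta_text)
for name, seq in sequences.items():
    print(name, len(seq))
```

All five sequences have the same length (33 residues), which is why they can be lined up column by column without inserting gaps.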
Step 1. Collect the sequences
• Open the internet browser to this page: http://bsw-uiuc.net/moodle/
• Click on the link "interactive lessons" located in the left side panel.
• [a new page opens]
• Click on the panel labeled Reference to close that panel
• Select the Biology Workbench by clicking on the tab "BW" in the panel below the word Workbench
• [the Biology Workbench home page is displayed]
• Click on the link "To enter the Biology Workbench"
• [a popup window opens and asks for account information]
• Type the following account information: Userid: Beckman_oh Password: user
• [the main page of the Biology Workbench is displayed]
• Click on the "Session Tools" button
• [the Session Tools page is displayed]
• Select the session called 'phylogeny_101'
• From the window box select 'Resume session'
• Click on 'Run'
• [the session called 'phylogeny_101' becomes the current session]
• Select 'Protein Tools'
Step 2. Align all sequences
• To estimate when those organisms may have diverged from a common ancestor, we need to compare how different the sequences are from each other. How do we do that? One way is to align them; that is, to match the positions of the residues that have not changed and ALIGN them in the same column, as illustrated below.
Seq_org1 VNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKF
Seq_org2 VNFKLLSHCLLVTLACHLPTEFTPAVHASLDKF
Seq_org3 ENFKLLTNVLVCVLAHHFGRFFTPPVHAAYQKF
Seq_org4 ENFKLLTNVLVCVLAVHFGKFFTPPVHAAYQKF
Seq_org5 DNFKLLSEMIIQVLASHHPPCFTPDVHGMMVKF
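As a small illustration of what "aligned in the same column" means, the sketch below marks every column in which all five residues are identical with an asterisk. The marker line is only an assumption about how one might display conserved columns; it is not the output of the Biology Workbench.

```python
# Sketch: mark fully conserved columns of the (gap-free) alignment with '*'.
aligned = {
    "Seq_org1": "VNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKF",
    "Seq_org2": "VNFKLLSHCLLVTLACHLPTEFTPAVHASLDKF",
    "Seq_org3": "ENFKLLTNVLVCVLAHHFGRFFTPPVHAAYQKF",
    "Seq_org4": "ENFKLLTNVLVCVLAVHFGKFFTPPVHAAYQKF",
    "Seq_org5": "DNFKLLSEMIIQVLASHHPPCFTPDVHGMMVKF",
}

rows = list(aligned.values())

# zip(*rows) walks the alignment column by column;
# a column is conserved when it contains a single distinct residue.
marks = "".join("*" if len(set(column)) == 1 else " " for column in zip(*rows))

for name, seq in aligned.items():
    print(f"{name:<9}{seq}")
print(f"{'':<9}{marks}")
```

Columns without an asterisk are the positions where at least one organism carries a different residue; those are the positions counted in the next step.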
Step 2. Align all sequences with BW
To align the sequences automatically, do the following:
• From the box click on "Select all sequences" and then on the button 'ok'
• [a new window appears that asks for additional parameters to run the ClustalW application]
• Do not change anything on this screen; that is, accept all the default values
• Click on 'Submit' to run the ClustalW application
• [after a few moments, the results page will be displayed]
• Compare the results of the Biology Workbench with the alignment we included here.
• DO NOT CLOSE THIS PAGE YET
Step 3. Count • To estimate the evolutionary distance, we use the alignment from the previous step to count the number of differing residues between every pair of sequences. • To make the actual tree construction a little easier, we also sort the pairs from smallest difference to largest difference.
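The counting step can be sketched as follows, assuming the gap-free alignment from Step 2. The function name `count_differences` is hypothetical; it simply counts mismatched columns between two aligned sequences, and the pairs are then sorted from smallest to largest count.

```python
from itertools import combinations

# Sketch: count differing residues for every pair of aligned sequences.
aligned = {
    "Seq_org1": "VNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKF",
    "Seq_org2": "VNFKLLSHCLLVTLACHLPTEFTPAVHASLDKF",
    "Seq_org3": "ENFKLLTNVLVCVLAHHFGRFFTPPVHAAYQKF",
    "Seq_org4": "ENFKLLTNVLVCVLAVHFGKFFTPPVHAAYQKF",
    "Seq_org5": "DNFKLLSEMIIQVLASHHPPCFTPDVHGMMVKF",
}


def count_differences(seq_a, seq_b):
    """Number of columns in which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


# One (count, name_a, name_b) tuple per pair, sorted from smallest count up.
pair_counts = sorted(
    (count_differences(aligned[a], aligned[b]), a, b)
    for a, b in combinations(aligned, 2)
)

for count, a, b in pair_counts:
    print(f"{a} vs {b}: {count} differences")
```

With five sequences there are ten pairs. The two smallest counts belong to the pairs Seq_org1/Seq_org2 and Seq_org3/Seq_org4, which differ at only two positions each.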
Step 4. Score • To the counts of the previous step we need to apply different weights because not all substitutions are equally likely. • The weights are contained in substitution matrices like BLOSUM and PAM. • There are substitution matrices for nucleotide residues and separate substitution matrices for amino acid residues.
Scoring the Alignment. A simple example. Notice: the operation is the element-wise (entry-by-entry) product of these two matrices, not matrix multiplication. Each product is placed in the corresponding cell of the result.
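To make the distinction concrete, here is a small sketch that contrasts the element-wise product with ordinary matrix multiplication. The numbers in `counts` and `weights` are made up for illustration only; they are not taken from BLOSUM, PAM, or the slide's example.

```python
# Sketch with hypothetical 2x2 matrices (values are illustrative only).
counts = [[3, 1],
          [1, 2]]    # how often each kind of residue pairing was observed
weights = [[2, -1],
           [-1, 2]]  # weight assigned to each kind of pairing

# Element-wise product: each cell is counts[i][j] * weights[i][j].
scored = [
    [c * w for c, w in zip(count_row, weight_row)]
    for count_row, weight_row in zip(counts, weights)
]
total_score = sum(sum(row) for row in scored)

# Ordinary matrix multiplication, shown only for contrast.
size = len(counts)
matrix_product = [
    [sum(counts[i][k] * weights[k][j] for k in range(size)) for j in range(size)]
    for i in range(size)
]

print("element-wise:", scored, "total:", total_score)
print("matrix product:", matrix_product)
```

The element-wise result is [[6, -1], [-1, 4]] with a total of 8, while the matrix product is [[5, -1], [0, 3]]; only the first keeps each weighted count in the cell it came from.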
Step 4. Score • ClustalW will perform the count and the scoring automatically. Just scroll down the page of results and you will see this segment:
Step 5. Build the tree • We need to build a binary tree with those five sequences, that is, a tree in which each node can have up to two branches and the sequences sit on the leaves (tips) of the tree.
Tree topologies • The number of possible trees that can be constructed with n sequences grows very rapidly with n (faster than exponentially). • For example, with five sequences, we have at least these three different possibilities:
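How fast the number of trees grows can be checked with a short calculation. The sketch below uses the standard counting result for labeled, strictly bifurcating trees: (2n-3)!! rooted topologies and (2n-5)!! unrooted topologies for n sequences, where !! is the product of odd numbers. The function names are hypothetical.

```python
# Sketch: count labeled, strictly bifurcating tree topologies for n leaves.
def count_rooted_trees(n):
    """(2n-3)!! = 1 * 3 * 5 * ... * (2n-3), defined here for n >= 2."""
    if n < 2:
        raise ValueError("need at least two sequences")
    total = 1
    for odd in range(3, 2 * n - 2, 2):  # 3, 5, ..., 2n-3
        total *= odd
    return total


def count_unrooted_trees(n):
    """(2n-5)!!, which equals the rooted count for n-1 leaves (n >= 3)."""
    if n < 3:
        raise ValueError("need at least three sequences")
    return count_rooted_trees(n - 1)


for n in (3, 4, 5, 10):
    print(n, count_rooted_trees(n), count_unrooted_trees(n))
```

For five sequences this gives 105 rooted and 15 unrooted topologies; for ten sequences the rooted count is already 34,459,425, which is why trees are built from the scores rather than by trying every possibility.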
Basic Procedure for building trees • Start with TWO sequences and add the rest of the sequences one at a time. • Each new sequence becomes a leaf of the tree (meaning that nothing further can be attached at that point). • Choose the place carefully, taking into account the score information we obtained in the previous step. • Sequences 3 and 4 are closest; therefore, they should stem from the same tree branch. • Sequences 1 and 2 are also closer to each other than to any other sequence and should stem from the same node. • These two branches [1-2, 3-4] are closer to each other than either is to Sequence 5. • Sequence 5 seems to be the outlier, or outgroup.
Procedure... illustrated
• Start with 2 sequences, for instance seq_org3 and seq_org4
• Add one sequence, seq_org1
• Add another sequence, seq_org2
• Add the last sequence, seq_org5
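The idea of grouping the closest sequences first can be sketched with a simple average-linkage clustering on raw mismatch counts. This is a swapped-in technique chosen for brevity: it is not the hand procedure above and not necessarily what ClustalW does internally, and the function names are hypothetical. On raw counts, the last grouping decision for these five sequences is a tie, which this sketch resolves by list order; weighted scores may behave differently.

```python
from itertools import combinations

# Sketch: average-linkage clustering on raw mismatch counts (illustrative only).
seqs = {
    "seq_org1": "VNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKF",
    "seq_org2": "VNFKLLSHCLLVTLACHLPTEFTPAVHASLDKF",
    "seq_org3": "ENFKLLTNVLVCVLAHHFGRFFTPPVHAAYQKF",
    "seq_org4": "ENFKLLTNVLVCVLAVHFGKFFTPPVHAAYQKF",
    "seq_org5": "DNFKLLSEMIIQVLASHHPPCFTPDVHGMMVKF",
}


def mismatches(a, b):
    """Number of aligned columns in which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def average_distance(cluster_a, cluster_b):
    """Mean mismatch count over all cross-cluster pairs of sequences."""
    pairs = [(a, b) for a in cluster_a[1] for b in cluster_b[1]]
    return sum(mismatches(seqs[a], seqs[b]) for a, b in pairs) / len(pairs)


def build_tree(names):
    # Each cluster is (subtree, member names); a leaf is just its name.
    clusters = [(name, [name]) for name in names]
    while len(clusters) > 1:
        # Find the closest pair of clusters (first one wins on ties).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = (
            (clusters[i][0], clusters[j][0]),
            clusters[i][1] + clusters[j][1],
        )
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = [merged] + rest
    return clusters[0][0]


tree = build_tree(list(seqs))
print(tree)
```

The nested tuple groups seq_org1 with seq_org2 and seq_org3 with seq_org4, joins those two pairs, and leaves seq_org5 as the last branch to be attached.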
Building the tree with BW • ClustalW will calculate a dendrogram automatically. Just scroll down the results page to see the drawing of the tree. • Is your tree similar to the tree built by ClustalW? Are the two trees identical? How many clades are there in the tree?
Now what? • Additional exercises: • The three primary kingdoms. Dr. C. Woese's paper (more on building phylogenies) • Human Lineage. Dr. Lahn's paper (more on building phylogenies) • CSI 101 (with the hepatitis gene) • Protein complexes (ion channel-scorpion toxin) • Lessons in the Molecular Science Student Workbench • Authoring environment to build lessons in the Molecular Science Student Workbench