Tutorial 5 Phylogenetic Trees
Multiple Sequence Alignment – When? • More than two sequences • DNA • Protein • Find evolutionary relations: homology, phylogenetic tree • Detect motifs
[Figure with labels A, C, D, B]
Unaligned sequences:
GTCGTAGTCGGCTCGAC
GTCTAGCGAGCGTGAT
GCGAAGAGGCGAGC
GCCGTCGCGTCGTAAC
Aligned sequences:
GTCGTAGTCG-GC-TCGAC
GTC-TAG-CGAGCGT-GAT
GC-GAAG-AG-GCG-AG-C
GCCGTCG-CG-TCGTA-AC
Multiple Sequence Alignment – How? • Dynamic programming: finds the optimal alignment, but its running time is exponential in the number of sequences • Progressive alignment (also known as the hierarchical, incremental, or tree method): a heuristic, but efficient
Progressive Alignment
[Figure: alignment built up in steps labeled 1, 2, 3]
TGTTAAC
TGT-AAC
TGT--AC
ATGT--C
ATGTGGC
Progressive Alignment • Note: not all distance matrices can be embedded in a tree without error.
Neighbor Joining Algorithm • Constructs an unrooted guide tree from a distance matrix
Neighbor Joining Algorithm • 1. Calculate all pairwise distances. • 2. Find two nodes i and j such that the relative distance between them is minimal. • 3. Remove the rows and columns of i and j. • 4. Add a new row and column k (the parent of i and j), and compute the distance from k to every remaining node. • 5. Continue until two nodes remain, then connect them with an edge.
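The steps above can be sketched in Python as follows. This is a minimal illustration, not the tutorial's own code: the function name `neighbor_joining`, the `frozenset`-keyed distance dictionary, the generated parent labels `N0`, `N1`, ... and the four-leaf input matrix are all assumptions made for the example. The relative distance is taken to be the commonly used form M(i,j) = D(i,j) - u(i) - u(j), where u(i) is the summed distance from i to all other nodes divided by n - 2.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Return (merges, distances) for an unrooted neighbor-joining tree.

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b
           (step 1, the pairwise distances, is assumed to be done already).
    """
    nodes = list(names)
    d = dict(dist)
    merges = []

    def D(a, b):
        return d[frozenset((a, b))]

    while len(nodes) > 2:
        n = len(nodes)
        # u[x]: total distance from x to every other node, divided by n - 2
        u = {x: sum(D(x, y) for y in nodes if y != x) / (n - 2) for x in nodes}

        # Step 2: pick the pair with the minimal relative distance
        i, j = min(combinations(nodes, 2),
                   key=lambda p: D(p[0], p[1]) - u[p[0]] - u[p[1]])

        # Step 4: distances from the new parent k to every remaining node
        k = f"N{len(merges)}"
        for m in nodes:
            if m != i and m != j:
                d[frozenset((k, m))] = (D(i, m) + D(j, m) - D(i, j)) / 2

        # Step 3: drop i and j from the node list, keep k
        nodes = [x for x in nodes if x != i and x != j] + [k]
        merges.append((i, j, k))

    # Step 5: two nodes remain; they are connected by a single edge
    merges.append((nodes[0], nodes[1], None))
    return merges, d


# Hypothetical additive distances for a tree in which A,B and C,D are neighbors
pairs = {("A", "B"): 13, ("A", "C"): 19, ("A", "D"): 24,
         ("B", "C"): 12, ("B", "D"): 17, ("C", "D"): 15}
dist = {frozenset(p): v for p, v in pairs.items()}
merges, d = neighbor_joining(["A", "B", "C", "D"], dist)
print(merges)
```

In this made-up matrix the first merge joins A and B, even though B and C are the closest pair in the raw distances. Ties in the relative distance are broken by taking the first pair encountered, which is a choice of this sketch.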
Step 1. Calculate all pairwise distances • A, B, …, E are the tree nodes; each letter represents a sequence. • How can we measure the distance between sequences?
Step 1. Calculate all pairwise distances: distance between sequences • Euclidean distance: given a multiple sequence alignment, score each aligned position of a pair of sequences, sum the scores over all positions, and take the square root. • The score increases as the dissimilarity between residues increases.
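As a rough sketch of the distance described here: the slide does not say which per-position score is used, so the code below assumes a hypothetical score of 0 for identical characters and 1 otherwise (a gap counts as a mismatch). The function names `position_score` and `alignment_distance` are illustrative only.

```python
import math


def position_score(a, b):
    # Hypothetical scoring: 0 for identical characters, 1 for any difference
    return 0 if a == b else 1


def alignment_distance(s, t):
    """Square root of the summed per-position scores of two aligned sequences."""
    if len(s) != len(t):
        raise ValueError("aligned sequences must have equal length")
    return math.sqrt(sum(position_score(a, b) for a, b in zip(s, t)))


# Two rows taken from the progressive alignment example
print(alignment_distance("TGTTAAC", "TGT-AAC"))  # 1.0 (one differing position)
```

A real scoring scheme would typically grade substitutions by how dissimilar the residues are; any such score can be swapped in for `position_score`.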
Step 2. Two nodes with minimal relative distance: the problem • Naive idea: to find neighboring leaves, simply select the closest pair of leaves. • Wrong!
Step 2. Two nodes with minimal relative distance: the problem • The closest leaves aren't necessarily neighbors. • i and j are neighbors, but dij = 13 > djk = 12, so j is closer to k than to its own neighbor i.
Step 2. Two nodes with minimal relative distance: the solution • Find a pair of leaves that are close to each other but far from the other leaves. • This is called the "relative distance".
Step 2. Two nodes with minimal relative distance • The relative distance between i and j (Mij) is computed from: • the distance between i and j from the distance table (Dij) • the distance of i, and likewise of j, from all other sequences • the number of leaves (= sequences) left in the tree
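The formula itself did not survive on this slide. The sketch below assumes the commonly used neighbor-joining form M(i,j) = D(i,j) - u(i) - u(j), with u(i) equal to the sum of distances from i to all other leaves divided by n - 2. The four-leaf matrix is invented so that it reproduces the dij = 13, djk = 12 situation from the problem slide; the fourth leaf `l` and all other numbers are hypothetical.

```python
from itertools import combinations

# Hypothetical distances: i,j are neighbors and k,l are neighbors in the true tree
D = {("i", "j"): 13, ("i", "k"): 19, ("i", "l"): 24,
     ("j", "k"): 12, ("j", "l"): 17, ("k", "l"): 15}
leaves = ["i", "j", "k", "l"]
n = len(leaves)


def dist(a, b):
    return D[(a, b)] if (a, b) in D else D[(b, a)]


# u[x]: distance of x from all other leaves, divided by n - 2 (assumed form)
u = {x: sum(dist(x, y) for y in leaves if y != x) / (n - 2) for x in leaves}

# Relative distance for every pair
M = {(a, b): dist(a, b) - u[a] - u[b] for a, b in combinations(leaves, 2)}

closest = min(D, key=D.get)                       # naive choice: smallest raw distance
lowest = min(M.values())
best = [p for p, v in M.items() if v == lowest]   # pairs with minimal relative distance

print("closest raw pair:", closest)
print("minimal relative distance:", best)
```

The raw distances pick (j, k), which are not neighbors, while the relative distance is minimal for exactly the two true neighbor pairs, (i, j) and (k, l).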
Step 2. Two nodes with minimal relative distance • Original distance matrix:
Step 2. Two nodes with minimal relative distance • The relative distance table: A, B is the pair with the minimal Mij value. • The Mij table is used only to choose the pair to join (lowest value), not to calculate the distances.
Steps 3+4. Remove i, j and add k to the matrix • The distance from k to any other leaf m can be computed as: Dkm = (Dim + Djm – Dij)/2 • Compress i and j into k, then iterate the algorithm on the rest of the tree.
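A short worked example may help show why the update formula recovers the distance from the parent. The branch lengths below are hypothetical, and `parent_distance` is just an illustrative name. If i and j hang from k by branches of length a and b, and m lies at distance x from k, then Dim + Djm = (a + x) + (b + x) and Dij = a + b, so the difference is 2x.

```python
def parent_distance(d_im, d_jm, d_ij):
    """Dkm = (Dim + Djm - Dij) / 2 for the parent k of i and j."""
    return (d_im + d_jm - d_ij) / 2


# Hypothetical branch lengths: i-k = 10, j-k = 3, k-m = 9
a, b, x = 10, 3, 9
d_ij = a + b      # 13
d_im = a + x      # 19
d_jm = b + x      # 12

print(parent_distance(d_im, d_jm, d_ij))  # 9.0, the k-m distance
```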
Steps 3+4. Remove i, j and add k to the matrix Now we’ll calculate the distance from X to all other nodes.
Step 5. Continue until two nodes remain • The final tree: [Figure: tree with leaves A, B, C, D, E and internal nodes X, Y, Z; edge labels 5, 9, 20, 6, 12, 4, 10] • What's missing?
Distance of a neighboring pair to their parent node • After defining A, B as neighbors and k as their parent node, how do we compute the distances d(A,k) and d(B,k)?
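The slide leaves this question open. One standard answer in neighbor joining, given here as an assumption rather than as the tutorial's own formula, is d(A,k) = (D(A,B) + u(A) - u(B)) / 2 and d(B,k) = D(A,B) - d(A,k), where u is the summed distance to all other leaves divided by n - 2. The function name `branch_lengths` and the numbers are hypothetical.

```python
def branch_lengths(d_ab, total_a, total_b, n):
    """Branch lengths from neighbors A and B to their parent k.

    total_a, total_b: sum of distances from A (resp. B) to all other leaves.
    n: number of leaves currently in the tree.
    """
    u_a = total_a / (n - 2)
    u_b = total_b / (n - 2)
    d_ak = (d_ab + u_a - u_b) / 2
    return d_ak, d_ab - d_ak


# Hypothetical four-leaf matrix: A's distances are 13, 19, 24; B's are 13, 12, 17
d_ak, d_bk = branch_lengths(13, 13 + 19 + 24, 13 + 12 + 17, n=4)
print(d_ak, d_bk)  # 10.0 3.0
```

The two lengths always add up to D(A,B); the leaf that is farther from the rest of the tree on average gets the longer branch.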