Phylogenetic Tree Generation Brandon Andrews CS6030
Topics • What is a phylogenetic tree? • Goals in a phylogenetic tree generator • Distance-based method • Fitch-Margoliash Method • Example • Verification • Demo
What is a phylogenetic tree? • Also known as an evolutionary tree • Attempts to map the genetic similarity of organisms into a tree, where longer branches indicate more dissimilarity • [Figure: example tree with leaves A, B and C] • B and C are similar • A and B are more similar than A and C, which are separated by a longer distance
Goals in a phylogenetic tree generator • Given the sequences and their calculated or known dissimilarities, construct a tree that correctly maps this data • Naïve method: generate every possible tree and grade its quality
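To show why the naïve method is impractical, the short sketch below counts the distinct unrooted binary tree shapes for n labelled sequences using the standard formula (2n - 5)!!. Each shape would still need its branch lengths fitted before it could be graded. The function name is illustrative only.

```python
def count_unrooted_trees(n: int) -> int:
    """Number of unrooted binary trees with n labelled leaves, (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (the empty product is 1 for n = 3).
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


for n in (5, 10, 20):
    print(n, count_unrooted_trees(n))
```

The five sequences in the later example have only 15 shapes, but ten sequences already give 2,027,025, and the count keeps growing faster than exponentially.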
Distance-based method • Take a distance matrix that stores the distance from every sequence to every other sequence • Construct a tree whose branch lengths preserve these distances • Most methods do not preserve the distances exactly
Fitch-Margoliash Method • Clustering algorithm that works bottom-up to create an unrooted tree • Weights are used to help lower the error rate for long paths
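The weighting usually associated with this method is Fitch and Margoliash's least-squares criterion. It divides each squared error by the square of the observed distance, so errors on long paths, whose distance estimates are least reliable, count for less. The sketch below scores a candidate tree that way. It shows the criterion only, not the clustering procedure in the following slides, and the function name and toy numbers are hypothetical.

```python
def fm_weighted_error(observed: dict, fitted: dict) -> float:
    """Sum over pairs of ((D_ij - d_ij) / D_ij) ** 2.

    observed: pair -> distance from the input matrix (D_ij, assumed > 0)
    fitted:   pair -> path length between the same pair in the tree (d_ij)
    """
    return sum(((observed[p] - fitted[p]) / observed[p]) ** 2 for p in observed)


# Hypothetical numbers: both pairs are off by 1, but the long pair costs far less.
observed = {("A", "B"): 10.0, ("A", "C"): 40.0}
fitted = {("A", "B"): 11.0, ("A", "C"): 41.0}
print(fm_weighted_error(observed, fitted))
```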
Example • Calculate a distance matrix • Hamming distance can be used, but a better dissimilarity function is advised
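A minimal sketch of building a Hamming distance matrix follows. It assumes the sequences are already aligned to equal length, because Hamming distance is undefined otherwise. The toy sequences and function names are hypothetical and are not the data behind the example.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length (aligned) sequences")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(seqs: dict) -> dict:
    """Map each unordered pair of names (a frozenset) to their Hamming distance."""
    return {frozenset((x, y)): hamming(seqs[x], seqs[y])
            for x, y in combinations(seqs, 2)}


# Hypothetical toy sequences.
SEQS = {"A": "ACGTACGT", "B": "ACGTTCGT", "C": "TCGAACGA"}
matrix = distance_matrix(SEQS)
print(matrix[frozenset(("A", "C"))])  # 3
```

Storing each pair once, keyed by a frozenset, keeps the matrix symmetric by construction.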
Steps • Add all the sequences to an array of nodes and mark them as leaves • Select the closest pair of nodes by scanning the distance matrix • Those two nodes, in our example D and E, make up two of the branches in a 3-branch calculation that finds the branch lengths • [Figure: D and E joined by branches d and e to a new node, which has branch abc to the group A, B, C] • dist(ABC, D) is the average distance from ABC to D • dist(ABC, E) is the average distance from ABC to E • d = (dist(D, E) + (dist(ABC, D) - dist(ABC, E))) / 2; • e = dist(D, E) - d; • abc = dist(ABC, D) - d;
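The three-branch calculation above can be sketched as a small function. The name `three_point` and its parameter names are illustrative. The inputs are made-up values chosen so that the two averages differ by -2, as they do on the next slide.

```python
def three_point(d_ij: float, rest_i: float, rest_j: float):
    """Branch lengths when the closest pair i, j is joined against the rest.

    d_ij:   dist(i, j), e.g. dist(D, E)
    rest_i: average distance from the remaining nodes to i, e.g. dist(ABC, D)
    rest_j: average distance from the remaining nodes to j, e.g. dist(ABC, E)
    Returns (branch to i, branch to j, branch to the rest).
    """
    to_i = (d_ij + (rest_i - rest_j)) / 2
    to_j = d_ij - to_i
    to_rest = rest_i - to_i
    return to_i, to_j, to_rest


# Illustrative inputs: dist(D, E) = 10 and averages that differ by -2.
print(three_point(10, 30, 32))  # (4.0, 6.0, 26.0)
```

Only the difference of the two averages affects d and e, which is why the slide's d = 4 and e = 6 reappear here.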
Steps Continued • dist(ABC, D) and dist(ABC, E) • Calculate each by averaging the distances from A, B, and C to D (or to E) • d = (10 + (32.6… - 34.6…)) / 2 = 4 • e = 10 - 4 = 6 • abc = 32.6… - 4 = 28.6…
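The distance matrix itself is not reproduced in these slides. The Verification slide reports an exact match, so the original entries must equal the path lengths on the final tree (for example, A to D is 10 + 20 + 5 + 4 = 39). The sketch below uses that reconstruction, which is not the author's data, to show the averaging step. It reproduces the slide's 32.6… and 34.6…. The helper name is hypothetical.

```python
# Reconstructed example matrix; "AB" etc. become frozenset({"A", "B"}).
M = {frozenset(p): v for p, v in [
    ("AB", 22), ("AC", 39), ("AD", 39), ("AE", 41), ("BC", 41),
    ("BD", 41), ("BE", 43), ("CD", 18), ("CE", 20), ("DE", 10),
]}


def group_distance(group: str, target: str) -> float:
    """Plain average of the distances from each member of group to target."""
    return sum(M[frozenset((g, target))] for g in group) / len(group)


abc_d = group_distance("ABC", "D")  # (39 + 41 + 18) / 3, about 32.67
abc_e = group_distance("ABC", "E")  # (41 + 43 + 20) / 3, about 34.67
d = (M[frozenset("DE")] + (abc_d - abc_e)) / 2
print(abc_d, abc_e, d)
```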
[Figure: D (4) and E (6) joined to a new node with a branch of 28.6… to A, B, C] • Now we can create a new node with distance 28.6… and set D and E to their respective distances • Since D and E are leaves, their distances are kept. If they weren't, the average of their child distances would be subtracted, as seen later
Steps Continued • The final step in this iteration is to recalculate the nodes and the distance matrix • The new merged node DE is appended to the end of the nodes array, and D and E are removed • The distance matrix is updated the same way: D and E are removed and DE is added [Figure: updated distance matrix]
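The slides do not spell out how the merged node's row is filled in. The values quoted later, dist(C, DE) = 19 (the mean of 18 and 20) and dist(CDE, A) = 39.5, are consistent with averaging the two merged rows, so this sketch assumes that rule. The matrix is reconstructed from the example's final tree rather than taken from the slides. `merge` is a hypothetical helper.

```python
# Reconstructed example matrix; single-letter leaf names, so "AB" -> {"A", "B"}.
M = {frozenset(p): v for p, v in [
    ("AB", 22), ("AC", 39), ("AD", 39), ("AE", 41), ("BC", 41),
    ("BD", 41), ("BE", 43), ("CD", 18), ("CE", 20), ("DE", 10),
]}


def merge(matrix: dict, i: str, j: str) -> dict:
    """Replace nodes i and j by the merged node i + j.

    matrix maps frozenset({x, y}) -> distance. The merged row is the mean
    of the two old rows (an assumption inferred from the slides' numbers).
    """
    names = set().union(*matrix)
    merged = i + j
    # Keep every entry that involves neither i nor j.
    out = {pair: v for pair, v in matrix.items() if i not in pair and j not in pair}
    for k in names - {i, j}:
        out[frozenset((merged, k))] = (matrix[frozenset((i, k))]
                                       + matrix[frozenset((j, k))]) / 2
    return out


m1 = merge(M, "D", "E")     # first iteration
m2 = merge(m1, "C", "DE")   # second iteration
print(m1[frozenset(("C", "DE"))], m2[frozenset(("A", "CDE"))])
```

Applying the rule twice yields the 39.5 and 41.5 used in the last iteration.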
Steps Continued • Look at the new distance matrix and find the closest pair: C and DE • [Figure: C and DE joined by branches c and de to a new node, which has branch ab to the group A, B] • dist(AB, C) is the average distance from AB to C • dist(AB, DE) is the average distance from AB to DE • c = (dist(C, DE) + (dist(AB, C) - dist(AB, DE))) / 2; • de = dist(C, DE) - c; • ab = dist(AB, C) - c; • Now there is a special step: C is a leaf, so it gets the calculated distance • DE is not a leaf, so we need to subtract the average child distance of DE from its calculated distance
Merging A and B to calculate the average distance to C and DE. • dist(AB, C) • dist(AB, DE)
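Steps Continued • Average child distance example [Figure: example tree with branch lengths 1 to 6] • Recursively average, over each node's branches, the branch length plus the average child distance of the subtree below it • ((5 + ((2 + (4 + 6) / 2) + 3) / 2) + 1) / 2 = 5.5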
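The recursive average can be sketched as below. A subtree is represented, hypothetically, as a list of (branch length, child subtree) pairs, with an empty list for a leaf. The example tree is one reading of how the slide's formula nests: a top node with branches 5 and 1, a middle node with branches 2 and 3, and a bottom node with leaf branches 4 and 6.

```python
def average_distance(children) -> float:
    """AverageDistance of a node: the mean, over its branches, of the branch
    length plus the AverageDistance of the subtree below it (0 for a leaf).

    children: list of (branch_length, child_children) pairs; [] is a leaf.
    """
    if not children:
        return 0.0
    return sum(length + average_distance(sub) for length, sub in children) / len(children)


LEAF = []
DE = [(4, LEAF), (6, LEAF)]        # D and E under DE
MIDDLE = [(2, DE), (3, LEAF)]      # hypothetical reading of the slide's figure
TOP = [(5, MIDDLE), (1, LEAF)]

print(average_distance(DE))                     # 5.0
print(average_distance(TOP))                    # 5.5
print(average_distance([(9, LEAF), (5, DE)]))   # 9.5, i.e. CDE later on
```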
Steps Continued • So for DE, which has two child nodes, we need to subtract the average of its children's distances • Since both children of DE are leaves, this is: • (4 + 6) / 2 = 5 • So now we calculate c, de, and ab: • c = (dist(C, DE) + (dist(AB, C) - dist(AB, DE))) / 2 = (19 + (40 - 41)) / 2 = 9 • de = dist(C, DE) - c - AverageDistance(DE) = 19 - 9 - (4 + 6) / 2 = 5 • ab = dist(AB, C) - c = 40 - 9 = 31 • Notice that the new value of de replaces whatever distance DE had before (28.6… from the first iteration)
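The corrected three-branch calculation can be sketched as follows. `join_lengths` and its parameters are hypothetical names. The slides only show the branch to the rest being computed against a leaf (ab = dist(AB, C) - c). For a non-leaf, this sketch assumes the branch to the rest uses the uncorrected value.

```python
def join_lengths(d_ij, rest_i, rest_j, avg_i=0.0, avg_j=0.0):
    """Three-branch calculation with the non-leaf correction.

    avg_i, avg_j: AverageDistance of nodes i and j (0 for leaves).
    Returns (branch to i, branch to j, branch to the rest).
    """
    raw_i = (d_ij + (rest_i - rest_j)) / 2
    raw_j = d_ij - raw_i
    # Assumption: the rest's branch is measured from the uncorrected raw_i.
    rest = rest_i - raw_i
    return raw_i - avg_i, raw_j - avg_j, rest


# Second iteration: i = C (leaf), j = DE with AverageDistance(DE) = 5.
c, de, ab = join_lengths(19, 40, 41, avg_j=5)
print(c, de, ab)  # 9.0 5.0 31.0
```

For the final join on A and B, `join_lengths(22, 39.5, 41.5)` gives (10.0, 12.0, 29.5). Subtracting AverageDistance(CDE) = 9.5 from the last value yields the final branch of 20.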
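Steps Continued • With the new node added: [Figure: C (9) and DE (5) joined at a new node that has a branch of 31 to A, B; D (4) and E (6) under DE] • Recalculated distance matrix: [Figure: distance matrix after merging C and DE]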
Steps Continued • As before, choose the next closest nodes by looking at the distance matrix • A and B are chosen • [Figure: A and B joined by branches a and b to a new node, which has branch cde to CDE] • dist(CDE, A) is the average distance from CDE to A • dist(CDE, B) is the average distance from CDE to B • a = (dist(A, B) + (dist(CDE, A) - dist(CDE, B))) / 2 = 10 • b = dist(A, B) - a = 12 • cde = dist(CDE, A) - a = 29.5 • a and b are kept as calculated since A and B are leaves, but notice we're linking two trees at cde, so we need a special step to subtract the average child distance of CDE
[Figure: the branch cde linking the A, B subtree (A 10, B 12) to CDE (C 9, DE 5; D 4, E 6) shortens from 29.5 to 20] • So cde = 29.5 - AverageDistance(CDE) • 29.5 - ((5 + (4 + 6) / 2) + 9) / 2 = 29.5 - 9.5 = 20
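Steps Continued [Figure: final rooted tree; the root has a branch of 10 to the A, B node (A 10, B 12) and a branch of 10 to the C, DE node (C 9, DE 5; under DE, D 4 and E 6)] • So we have a completely defined unrooted tree. How do we root it? • Take the last branch calculated (cde = 20) and divide it by two, placing the root at its midpoint (10 + 10)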
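Putting the steps together, here is a compact sketch of the procedure as these slides describe it. It is not the demo's code. Assumptions, all hypothetical:
- The input matrix is reconstructed from the final tree, since the Verification slide reports an exact match.
- A merged node's row is the mean of the two merged rows.
- The rest's branch uses the uncorrected value.
- In the final join, AverageDistance of the remaining node is subtracted.
- Ties go to the first pair found.

Intermediate "rest" branches (such as the 28.6…) are not stored, since the slides note they are replaced later.

```python
from itertools import combinations

# Example matrix reconstructed from the final tree (hypothetical input).
PAIRS = {("A", "B"): 22, ("A", "C"): 39, ("A", "D"): 39, ("A", "E"): 41,
         ("B", "C"): 41, ("B", "D"): 41, ("B", "E"): 43,
         ("C", "D"): 18, ("C", "E"): 20, ("D", "E"): 10}


class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.length = 0.0  # length of the branch above this node

    def average_distance(self):
        """Mean of (child branch + child's own average); 0 for leaves."""
        if not self.children:
            return 0.0
        return sum(c.length + c.average_distance() for c in self.children) / len(self.children)


def fitch_margoliash(pairs):
    dist = {frozenset(p): float(v) for p, v in pairs.items()}
    nodes = [Node(n) for n in sorted({n for p in pairs for n in p})]
    if len(nodes) < 3:
        raise ValueError("need at least three sequences")

    def d(x, y):
        return dist[frozenset((x.name, y.name))]

    while True:
        # Closest pair in the current matrix (the first one wins on ties).
        i, j = min(combinations(nodes, 2), key=lambda p: d(*p))
        rest = [k for k in nodes if k is not i and k is not j]
        rest_i = sum(d(k, i) for k in rest) / len(rest)
        rest_j = sum(d(k, j) for k in rest) / len(rest)
        raw_i = (d(i, j) + (rest_i - rest_j)) / 2
        raw_j = d(i, j) - raw_i
        # Non-leaf nodes have their average child distance subtracted.
        i.length = raw_i - i.average_distance()
        j.length = raw_j - j.average_distance()
        if len(rest) == 1:
            # Final join: the last branch links the pair to the remaining tree.
            other = rest[0]
            last = rest_i - raw_i - other.average_distance()
            break
        merged = Node(i.name + j.name, (i, j))
        for k in rest:
            # Assumed update rule: the merged row is the mean of the two old rows.
            dist[frozenset((merged.name, k.name))] = (d(i, k) + d(j, k)) / 2
        nodes = rest + [merged]

    # Root the tree at the midpoint of the last branch.
    pair = Node(i.name + j.name, (i, j))
    pair.length = other.length = last / 2
    return Node("root", (pair, other))


def branch_lengths(node, out=None):
    """Map every non-root node name to the length of the branch above it."""
    out = {} if out is None else out
    for child in node.children:
        out[child.name] = child.length
        branch_lengths(child, out)
    return out


tree = fitch_margoliash(PAIRS)
print({name: round(v, 3) for name, v in branch_lengths(tree).items()})
```

On this input the sketch reproduces the slides' tree: A 10, B 12, C 9, D 4, E 6, DE 5, and the last branch of 20 split into two branches of 10 around the root.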
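Verification • Original: [Figure: original distance matrix] • From the generated tree: [Figure: distance matrix measured on the generated tree] • Exact match • This is rare; usually the two differ by a small amount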
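The check can be sketched by measuring every pairwise path length on the generated tree and comparing it with the input matrix. The slides' matrices are not reproduced here, so the "original" below is the example matrix reconstructed from the final tree. The check therefore returns 0 by construction; with real data it would usually be a small positive number. All names are hypothetical.

```python
# Final rooted tree from the example: child -> (parent, branch length).
PARENT = {"A": ("AB", 10), "B": ("AB", 12), "AB": ("root", 10),
          "C": ("CDE", 9), "DE": ("CDE", 5), "CDE": ("root", 10),
          "D": ("DE", 4), "E": ("DE", 6)}

# Reconstructed original matrix.
ORIGINAL = {("A", "B"): 22, ("A", "C"): 39, ("A", "D"): 39, ("A", "E"): 41,
            ("B", "C"): 41, ("B", "D"): 41, ("B", "E"): 43,
            ("C", "D"): 18, ("C", "E"): 20, ("D", "E"): 10}


def distances_to_ancestors(node):
    """Distance from node to itself and to each of its ancestors."""
    out, total = {node: 0}, 0
    while node in PARENT:
        node, length = PARENT[node]
        total += length
        out[node] = total
    return out


def tree_distance(x, y):
    up_x, up_y = distances_to_ancestors(x), distances_to_ancestors(y)
    # With non-negative branch lengths the lowest common ancestor gives the minimum.
    return min(up_x[a] + up_y[a] for a in up_x if a in up_y)


def max_abs_error(original):
    return max(abs(tree_distance(x, y) - v) for (x, y), v in original.items())


print(max_abs_error(ORIGINAL))  # 0
```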
Demo http://sirisian.com/javascript/CS6030Project.html
Conclusion • Distance-based methods such as the Fitch-Margoliash method quickly produce very accurate trees when given an accurate distance matrix