A Study on Measuring Distance between Two Trees. Advisor: Prof. 阮夙姿. Presenter: 林陳輝.
Outline • Introduction • Problem definition • Related work • The metric and algorithms • Mixture distance • Basic algorithm • The modified algorithm • Mixture-matching distance • Conclusions and Future work CSIE, National Chi Nan University
Introduction • Evolutionary trees • Comparing trees • Comparing trees is not easy [Figure: a phylogenetic tree, from Wikipedia]
Mixture tree • [Figure: a mixture tree drawn against time and taxa] • S.-C. Chen and B. G. Lindsay, "Building Mixture Trees from Binary Sequence Data," Biometrika, 2006.
Problem definition • Each leaf is associated with a taxon • Every internal node carries a time parameter • [Figure: an example tree with leaves A to H and internal nodes v1 to v7, each labeled with its time parameter]
Related work • Path difference metric: $d_p(T_1, T_2) = \lVert d(T_1) - d(T_2) \rVert_2$, where $d(T_i)$ is the vector of distances between all pairs of leaves of $T_i$. • M. A. Steel and D. Penny, "Distributions of Tree Comparison Metrics: Some New Results," Syst. Biol. 42(2):126-141, 1993.
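A minimal Python sketch of the path difference metric, assuming unweighted rooted trees written as nested tuples of leaf names and taking the distance between two leaves to be the number of edges on the path joining them. The helper names (`root_paths`, `path_length`, `path_difference`) are hypothetical.

```python
import itertools
import math


def root_paths(tree, prefix=()):
    """Map each leaf name to the child indices leading from the root to it."""
    if isinstance(tree, str):
        return {tree: prefix}
    paths = {}
    for i, child in enumerate(tree):
        paths.update(root_paths(child, prefix + (i,)))
    return paths


def path_length(p, q):
    """Edges between two distinct leaves: both depths minus twice the LCA depth."""
    common = 0
    for a, b in zip(p, q):
        if a != b:
            break
        common += 1
    return len(p) + len(q) - 2 * common


def path_difference(t1, t2):
    """Euclidean norm of d(T1) - d(T2) over all unordered leaf pairs."""
    p1, p2 = root_paths(t1), root_paths(t2)
    if p1.keys() != p2.keys():
        raise ValueError("trees must have the same leaf set")
    total = 0
    for x, y in itertools.combinations(sorted(p1), 2):
        diff = path_length(p1[x], p1[y]) - path_length(p2[x], p2[y])
        total += diff * diff
    return math.sqrt(total)


t1 = ("A", ("B", ("C", "D")))
t2 = (("A", "B"), ("C", "D"))
print(path_difference(t1, t2))  # sqrt(3), about 1.732
```

Three leaf pairs (AB, BC, BD) change their path length by one edge, which gives $\sqrt{3}$.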
Related work • Nodal metric • In full binary trees, the complexity is $O(n^3)$. • In complete binary trees, the complexity is $O(n^2 \log n)$. • John Bluis and Dong-Guk Shin, "Nodal Distance Algorithm: Calculating a Phylogenetic Tree Comparison Metric," Proc. of the 3rd IEEE Symposium on BioInformatics and BioEngineering, pp. 87-94, 2003.
Related work • Matching distance • P. W. Diaconis and S. P. Holmes, "Matchings and Phylogenetic Trees," Proc. Natl. Acad. Sci. USA, Vol. 95, No. 25, pp. 14600-14602, 1998. • An algorithm for the matching distance • G. Valiente, "A Fast Algorithmic Technique for Comparing Large Phylogenetic Trees," SPIRE, pp. 370-375, 2005.
Matching representation • [Figure: a rooted binary tree whose leaves are labeled 1 to 6 and whose internal nodes are labeled 7 to 11, with 11 at the root] • Sibling pairs: {1,2} {5,6} {3,7} {4,8} {9,10}
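The labeling behind this representation can be sketched in Python. The sketch assumes the Diaconis and Holmes convention as we read it: leaves are the integers 1 to n, and internal nodes receive n + 1, n + 2, ... one at a time, always choosing next, among the nodes whose children are all labeled, the one with the smallest child label. The function name `matching` and the nested-tuple tree encoding are hypothetical; the tree below is one tree whose sibling pairs are exactly those listed on this slide.

```python
def matching(tree):
    """Return the sibling pairs of a rooted binary tree as a set of frozensets.

    Leaves are the distinct integers 1..n; internal nodes are 2-tuples.
    Assumed rule: label internal nodes n+1, n+2, ... in order, each time
    picking, among nodes whose children are labeled, the one whose
    smallest child label is minimal.
    """
    labels = {}   # node -> integer label (leaf ints map to themselves)
    pending = []  # internal nodes still waiting for a label

    def visit(node):
        if isinstance(node, int):
            labels[node] = node
            return 1
        pending.append(node)
        return sum(visit(child) for child in node)

    next_label = visit(tree) + 1  # n + 1
    pairs = set()
    while pending:
        ready = [v for v in pending if all(c in labels for c in v)]
        v = min(ready, key=lambda u: min(labels[c] for c in u))
        labels[v] = next_label
        next_label += 1
        pairs.add(frozenset(labels[c] for c in v))
        pending.remove(v)
    return pairs


tree = ((4, (3, (1, 2))), (5, 6))
print(sorted(sorted(p) for p in matching(tree)))
# [[1, 2], [3, 7], [4, 8], [5, 6], [9, 10]]
```

The root is labeled last (2n - 1) and belongs to no pair, so a tree with n leaves yields n - 1 sibling pairs.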
Matching distance • T1: {1,2} {5,6} {3,7} {4,8} {9,10} • T2: {1,3} {4,6} {2,7} {5,8} {9,10} • [Figure: T1 and T2, each with leaves 1 to 6 and internal nodes 7 to 11] • The distance is 2
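Assuming the matching distance counts the minimum number of transpositions that turn one matching into the other, it equals the number of pairs minus the number of cycles in the union of the two matchings (each vertex has one edge from each matching, so the union splits into alternating cycles). A short Python sketch, with a hypothetical function name, reproduces the distance of 2 stated for this example.

```python
def matching_distance(m1, m2):
    """Pairs minus cycles in the union of two perfect matchings on the same labels."""
    first, second = {}, {}
    for a, b in m1:
        first[a], first[b] = b, a
    for a, b in m2:
        second[a], second[b] = b, a
    seen, cycles = set(), 0
    for start in first:
        if start in seen:
            continue
        cycles += 1
        node, use_first = start, True
        while node not in seen:  # alternate m1 and m2 edges around one cycle
            seen.add(node)
            node = first[node] if use_first else second[node]
            use_first = not use_first
    return len(m1) - cycles


m1 = [(1, 2), (5, 6), (3, 7), (4, 8), (9, 10)]
m2 = [(1, 3), (4, 6), (2, 7), (5, 8), (9, 10)]
print(matching_distance(m1, m2))  # 2
```

Here the union has three cycles, {1, 2, 7, 3}, {5, 6, 4, 8} and {9, 10}, and five pairs, so the distance is 5 - 3 = 2.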
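Mixture distance and algorithms • Definition: $D(T_1, T_2) = \sum_{\{x, y\}} \lvert p_{T_1}(x, y) - p_{T_2}(x, y) \rvert$, summed over all unordered pairs of distinct leaves • $p_{T_i}(x, y)$ is the time parameter of the LCA of leaves x and y in $T_i$ • [Figure: two example trees on leaves A, B, C, D with internal nodes v1, v2, v3]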
Distance conditions • The distance from an object to itself is zero. • The distance from A to B equals the distance from B to A. • The triangle inequality holds. (J. Felsenstein, Inferring Phylogenies. Sunderland, MA: Sinauer Associates, 2004.)
[Figure: T1 (root time 8) and T2 (root time 9) on leaves A, B, C, D] • AB: |8 - 1| = 7 • AC: |8 - 9| = 1 • AD: |8 - 9| = 1 • BC: |4 - 9| = 5 • BD: |4 - 9| = 5 • CD: |1 - 3| = 2 • Distance = 7 + 1 + 1 + 5 + 5 + 2 = 21
Algorithm • A direct computation visits all C(n, 2) leaf pairs • Algorithmic idea: grouping the leaf pairs by their LCA • Setting: full binary trees
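A minimal Python sketch of the definition, assuming each tree is written as nested tuples `(time, left, right)` with leaf names as strings (a hypothetical encoding). Every pair (x, y) with x under the left child and y under the right child of a node has that node as its LCA, so one pass records $p_T(x, y)$ for all C(n, 2) pairs. The trees are reconstructed from the pair values of the four-leaf example; the function names are hypothetical.

```python
from itertools import product


def leaves(node, out=None):
    """Collect leaf names from left to right."""
    out = [] if out is None else out
    if isinstance(node, str):
        out.append(node)
    else:
        leaves(node[1], out)
        leaves(node[2], out)
    return out


def lca_times(node, table=None):
    """Map each unordered leaf pair {x, y} to the time parameter of LCA(x, y)."""
    table = {} if table is None else table
    if not isinstance(node, str):
        time, left, right = node
        for x, y in product(leaves(left), leaves(right)):
            table[frozenset((x, y))] = time
        lca_times(left, table)
        lca_times(right, table)
    return table


def mixture_distance(t1, t2):
    """Sum of |pT1(x, y) - pT2(x, y)| over all unordered leaf pairs."""
    p1, p2 = lca_times(t1), lca_times(t2)
    if p1.keys() != p2.keys():
        raise ValueError("trees must have the same leaf set")
    return sum(abs(p1[pair] - p2[pair]) for pair in p1)


T1 = (8, "A", (4, "B", (1, "C", "D")))
T2 = (9, (1, "A", "B"), (3, "C", "D"))
print(mixture_distance(T1, T2))  # 21
```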
Algorithm • [Figure: two trees T1 and T2 on leaves A to H, each with internal nodes v1 to v7 labeled with their time parameters (root time 9 in both)]
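Step for v1 of T1 (time 9): leaves are colored red and green, and each T2 node u contributes $\lvert p_{T_1}(v_1) - p_{T_2}(u) \rvert$ times the number of red-green leaf pairs that meet at u:
• $\lvert p_{T_1}(v_1) - p_{T_2}(v_7) \rvert \times (0 \times 0 + 1 \times 1) = \lvert 9 - 5 \rvert \times 1 = 4$
• $\lvert p_{T_1}(v_1) - p_{T_2}(v_6) \rvert \times (1 \times 1 + 0 \times 0) = \lvert 9 - 4 \rvert \times 1 = 5$
• $\lvert p_{T_1}(v_1) - p_{T_2}(v_3) \rvert \times (1 \times 1 + 1 \times 1) = \lvert 9 - 8 \rvert \times 2 = 2$
• [Figure: T1 and T2, with the (Red, Green) counts of each T2 node; v1: (2, 2), v2: (1, 1), v3: (1, 1)]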
Step for v2 of T1 (time 7):
• $\lvert p_{T_1}(v_2) - p_{T_2}(v_2) \rvert \times (2 \times 0 + 0 \times 0) = \lvert 7 - 6 \rvert \times 0 = 0$
• $\lvert p_{T_1}(v_2) - p_{T_2}(v_3) \rvert \times (0 \times 1 + 0 \times 1) = \lvert 7 - 8 \rvert \times 0 = 0$
• $\lvert p_{T_1}(v_2) - p_{T_2}(v_1) \rvert \times (2 \times 2 + 0 \times 0) = \lvert 7 - 9 \rvert \times 4 = 8$
• [Figure: T1 and T2, with the (Red, Green) counts of each T2 node; v1: (2, 2), v2: (2, 0), v3: (0, 2), and several lower nodes at (0, 0)]
Complexity analysis • For each internal node of T1, coloring all leaves takes O(n). • Counting its contribution to the distance in T2 takes O(n). • T1 has n - 1 internal nodes, so the time complexity is $O(n^2)$.
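A Python sketch of the basic algorithm, assuming trees are nested tuples `(time, left, right)` with leaf names as strings (a hypothetical encoding). For each internal node v of T1, the leaves under its left child are colored red and those under its right child green; a red-green pair then has v as its LCA in T1. One post-order pass over T2 adds $\lvert p_{T_1}(v) - p_{T_2}(u) \rvert \times (\text{red}(a)\,\text{green}(b) + \text{green}(a)\,\text{red}(b))$ for every T2 node u with children a and b. The four-leaf trees are reconstructed from the earlier example, whose distance is 21; all names are hypothetical.

```python
def leaves(node, out=None):
    out = [] if out is None else out
    if isinstance(node, str):
        out.append(node)
    else:
        leaves(node[1], out)
        leaves(node[2], out)
    return out


def internal_nodes(node):
    if not isinstance(node, str):
        yield node
        yield from internal_nodes(node[1])
        yield from internal_nodes(node[2])


def color_pass(u, color, t_v):
    """Post-order pass over T2; returns (red, green, contribution) below u."""
    if isinstance(u, str):
        c = color.get(u)  # None for leaves outside the subtree of v
        return int(c == "red"), int(c == "green"), 0
    t_u, a, b = u
    ra, ga, sa = color_pass(a, color, t_v)
    rb, gb, sb = color_pass(b, color, t_v)
    pairs = ra * gb + ga * rb  # red-green leaf pairs whose LCA in T2 is u
    return ra + rb, ga + gb, sa + sb + abs(t_v - t_u) * pairs


def mixture_distance_coloring(t1, t2):
    total = 0
    for t_v, left, right in internal_nodes(t1):
        color = {x: "red" for x in leaves(left)}
        color.update({x: "green" for x in leaves(right)})
        total += color_pass(t2, color, t_v)[2]
    return total


T1 = (8, "A", (4, "B", (1, "C", "D")))
T2 = (9, (1, "A", "B"), (3, "C", "D"))
print(mixture_distance_coloring(T1, T2))  # 21
```

The three internal nodes of T1 contribute 9, 10 and 2. Each one costs a coloring plus a single pass over T2, matching the per-node O(n) cost above.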
The modified algorithm • Speeds up the basic algorithm • Observation: much of the color information in T2 is empty, i.e. many T2 nodes have no colored leaves below them
[Figure: the v2 step of the basic algorithm again, highlighting the T2 nodes with counts Red:0 Green:0, i.e. empty color information that the basic algorithm still visits]
[Figure: T2 (leaves B, H, A, G, C, D, F, E) next to a reduced copy of T2 that keeps only the leaves A, B, C, D and the internal nodes v1 (time 9), v3 (time 8) and v4 (time 1)]
The modified algorithm • Finding the LCA in constant time after O(n) preprocessing • M. A. Bender and M. Farach-Colton, "The LCA Problem Revisited," Proc. LATIN, 2000 • The 2-way merge problem • R. C. T. Lee, S. S. Tseng, R. C. Chang and Y. T. Tsai, Introduction to the Design and Analysis of Algorithms. McGraw-Hill Education, 2005
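For illustration, the Python sketch below answers LCA queries in constant time using an Euler tour and a sparse table. This is a simpler, swapped-in technique: its preprocessing is O(n log n), not the O(n) of the Bender and Farach-Colton method cited here. The class name and the children-dictionary input are hypothetical; the example tree has the shape of T1 in the four-leaf example, with assumed node names.

```python
class EulerLCA:
    """Constant-time LCA queries via an Euler tour and a sparse table.

    Preprocessing is O(n log n); Bender and Farach-Colton reduce it to O(n).
    """

    def __init__(self, children, root):
        self.euler, self.depth, self.first = [], [], {}
        self._tour(children, root, 0)
        m = len(self.euler)
        # table[k][i] = position of the shallowest node in euler[i : i + 2**k]
        self.table = [list(range(m))]
        k = 1
        while (1 << k) <= m:
            prev, half = self.table[-1], 1 << (k - 1)
            self.table.append([self._shallower(prev[i], prev[i + half])
                               for i in range(m - (1 << k) + 1)])
            k += 1

    def _tour(self, children, node, d):
        self.first.setdefault(node, len(self.euler))
        self.euler.append(node)
        self.depth.append(d)
        for child in children.get(node, ()):
            self._tour(children, child, d + 1)
            self.euler.append(node)  # return to node after each child
            self.depth.append(d)

    def _shallower(self, i, j):
        return i if self.depth[i] <= self.depth[j] else j

    def query(self, u, v):
        i, j = sorted((self.first[u], self.first[v]))
        k = (j - i + 1).bit_length() - 1  # largest power of two within the range
        best = self._shallower(self.table[k][i], self.table[k][j - (1 << k) + 1])
        return self.euler[best]


children = {"v1": ["A", "v2"], "v2": ["B", "v3"], "v3": ["C", "D"]}
lca = EulerLCA(children, "v1")
print(lca.query("C", "D"), lca.query("B", "D"), lca.query("A", "C"))  # v3 v2 v1
```

The LCA of u and v is the shallowest node visited between their first occurrences in the tour, and two overlapping power-of-two windows cover that range.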
[Figure: T1 and T2 on leaves A to H; the nodes of T2 are numbered in postorder (leaves 1, 2, 4, 5, 8, 9, 11, 12; internal nodes 3, 6, 7, 10, 13, 14, 15), and each leaf of T1 carries the number of the same leaf in T2]
One step in a reconstructed subtree: $\lvert 1 - 2 \rvert \times (1 \times 1 + 0 \times 0) = 1$ • [Figure: T1 and T2 with leaves numbered 1, 2, 4, 5, 8, 9, 11, 12, and the sorted leaf-number lists (1, 2), (4, 9), (5, 8), (11, 12)]
[Figure: T1, T2 and the reconstructed subtree of T2 for leaves H, A, B, G (numbers 1, 2, 11, 12)] • The sorted lists of the two children, (1, 2, 11, 12) and (4, 5, 8, 9), are merged into (1, 2, 4, 5, 8, 9, 11, 12) • $\lvert 9 - 7 \rvert \times (2 \times 2 + 0 \times 0) = 8$
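The merge in this step is the standard two-way merge of sorted lists, linear in their total length. A minimal Python sketch with a hypothetical function name, applied to the two lists of this step (the standard library's `heapq.merge` does the same lazily):

```python
def two_way_merge(a, b):
    """Merge two ascending lists in O(len(a) + len(b)) time."""
    merged = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    merged.extend(a[i:])  # at most one of these two tails is non-empty
    merged.extend(b[j:])
    return merged


print(two_way_merge([1, 2, 11, 12], [4, 5, 8, 9]))  # [1, 2, 4, 5, 8, 9, 11, 12]
```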
Complexity analysis • Reconstructing the subtree takes linear time. • Counting the distance in the reconstructed subtree takes O(m), where m is the number of leaves below the current node of T1. • The height of a complete binary tree is O(log n), and the nodes on one level of T1 together cover n leaves, so each level costs O(n). • The total complexity is O(n log n) for complete binary trees.
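A Python sketch of the reconstruction idea, under stated assumptions: trees are nested tuples `(time, left, right)` with string leaves; T2's nodes are numbered in preorder, which orders the leaves the same way as postorder numbers; the colored leaves are sorted directly instead of being merged up from the children; and LCA queries climb parent pointers instead of using an O(1) method, so this sketch does not reach the O(n log n) bound. The reduced subtree is rebuilt by adding the LCAs of adjacent leaves in preorder; in that sorted node list, the parent of each node is its LCA with the previous node. All names are hypothetical.

```python
def leaves(node, out=None):
    out = [] if out is None else out
    if isinstance(node, str):
        out.append(node)
    else:
        leaves(node[1], out)
        leaves(node[2], out)
    return out


def index_tree(t2):
    """Number T2's nodes in preorder; return parent, depth, time and leaf ids."""
    parent, depth, time, leaf_id = {}, {}, {}, {}

    def visit(node, par, d):
        nid = len(depth)
        parent[nid], depth[nid] = par, d
        if isinstance(node, str):
            leaf_id[node] = nid
        else:
            time[nid] = node[0]
            visit(node[1], nid, d + 1)
            visit(node[2], nid, d + 1)

    visit(t2, None, 0)
    return parent, depth, time, leaf_id


def lca(u, v, parent, depth):
    """Naive LCA by climbing parent pointers (stand-in for an O(1) query)."""
    while depth[u] > depth[v]:
        u = parent[u]
    while depth[v] > depth[u]:
        v = parent[v]
    while u != v:
        u, v = parent[u], parent[v]
    return u


def contribution(red, green, t_v, parent, depth, time, leaf_id):
    """Distance added by one T1 node v, computed on T2 reduced to v's leaves."""
    color = {leaf_id[x]: "red" for x in red}
    color.update({leaf_id[x]: "green" for x in green})
    order = sorted(color)  # colored leaves in T2 preorder
    nodes = sorted(set(order) | {lca(a, b, parent, depth) for a, b in zip(order, order[1:])})
    children = {u: [] for u in nodes}
    for a, b in zip(nodes, nodes[1:]):
        children[lca(a, b, parent, depth)].append(b)  # parent of b in reduced tree

    def count(u):
        if not children[u]:  # a colored leaf
            return int(color[u] == "red"), int(color[u] == "green"), 0
        r = g = s = 0
        for c in children[u]:
            rc, gc, sc = count(c)
            s += sc + abs(t_v - time[u]) * (r * gc + g * rc)
            r, g = r + rc, g + gc
        return r, g, s

    return count(nodes[0])[2]


def mixture_distance_reduced(t1, t2):
    parent, depth, time, leaf_id = index_tree(t2)
    total, stack = 0, [t1]
    while stack:
        node = stack.pop()
        if not isinstance(node, str):
            t_v, left, right = node
            total += contribution(leaves(left), leaves(right), t_v,
                                  parent, depth, time, leaf_id)
            stack += [left, right]
    return total


T1 = (8, "A", (4, "B", (1, "C", "D")))
T2 = (9, (1, "A", "B"), (3, "C", "D"))
print(mixture_distance_reduced(T1, T2))  # 21
```

The work per T1 node depends only on its own colored leaves and their LCAs, never on the empty parts of T2.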
Mixture-matching distance • Distance = $1 - \frac{\min(P_{T_1}, P_{T_2})}{\max(P_{T_1}, P_{T_2})} + i$ • i is the matching distance between T1 and T2. • $P_{T_m}$ denotes the product of all time parameters in $T_m$.
[Figure: T1 and T2 on leaves A to H, with time parameters and matching labels 1 to 15] • T1: {1, 2} {3, 4} {5, 6} {7, 8} {9, 10} {11, 12} {13, 14} • T2: {1, 2} {3, 6} {4, 5} {7, 8} {9, 12} {10, 11} {13, 14} • Here i = 2, $P_{T_1} = 9 \cdot 7 \cdot 8 \cdot 5 \cdot 3 \cdot 2 \cdot 4 = 60480$ and $P_{T_2} = 9 \cdot 6 \cdot 8 \cdot 5 \cdot 3 \cdot 1 \cdot 4 = 25920$ • Distance = 1 - (25920 / 60480) + 2 ≈ 2.571
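A Python sketch of this example, under two assumptions: the matching distance i is the number of pairs minus the number of cycles in the union of the two matchings, and the time term is $1 - \min(P_{T_1}, P_{T_2}) / \max(P_{T_1}, P_{T_2})$, which agrees with 1 - 25920/60480 here. The function names are hypothetical; the matchings and time parameters are read off this slide.

```python
import math


def matching_distance(m1, m2):
    """Pairs minus cycles in the union of two perfect matchings on the same labels."""
    first, second = {}, {}
    for a, b in m1:
        first[a], first[b] = b, a
    for a, b in m2:
        second[a], second[b] = b, a
    seen, cycles = set(), 0
    for start in first:
        if start in seen:
            continue
        cycles += 1
        node, use_first = start, True
        while node not in seen:  # alternate m1 and m2 edges around one cycle
            seen.add(node)
            node = first[node] if use_first else second[node]
            use_first = not use_first
    return len(m1) - cycles


def mixture_matching_distance(m1, m2, times1, times2):
    """Assumed form: 1 - min(P1, P2) / max(P1, P2) + i."""
    p1, p2 = math.prod(times1), math.prod(times2)
    return 1 - min(p1, p2) / max(p1, p2) + matching_distance(m1, m2)


m1 = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10), (11, 12), (13, 14)]
m2 = [(1, 2), (3, 6), (4, 5), (7, 8), (9, 12), (10, 11), (13, 14)]
times1 = [9, 7, 8, 5, 3, 2, 4]  # product 60480
times2 = [9, 6, 8, 5, 3, 1, 4]  # product 25920
print(round(mixture_matching_distance(m1, m2, times1, times2), 3))  # 2.571
```

The ratio term stays in [0, 1) for positive time parameters, so the integer part of the result is the matching distance and the fractional part reflects the time parameters.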
[Figure: a scale of the distance from 0 to ∞, marked "The same" at 0, "No different leaves" at 1, and "i transpositions" at i] • The time complexity is O(n).
Conclusions
Future work • Improve the time complexity • Extend to k-ary trees • Add mutation points