Computing Phylogenetic Roots with Bounded Degrees and Errors is NP-complete
Tatsuie Tsukiji (speaker), Zhi-Zhong Chen
Tokyo Denki University
The phylogenetic kth root problem (PRk)
(k ≧ 2 is a fixed constant.)
Given: a graph G. Each vertex represents an extant species. Each edge corresponds to similarity in evolutionary characteristics.
Output: a tree T, called a "phylogeny", such that
  Leaves of T = vertices of G.
  The degree of each internal node of T is at least 3.
  Two vertices are adjacent in G iff their distance in T is at most k.
[Figure: a graph G on vertices a to j and a phylogeny T for PR4, where two vertices are adjacent in G iff their distance in T is at most 4.]
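The three conditions on T can be checked mechanically. The following is a minimal sketch, not the authors' algorithm: it only verifies a candidate tree, it does not find one. The adjacency-dict representation, the function names `bfs_distances` and `is_phylogenetic_root`, and the small example tree are assumptions made for illustration.

```python
from collections import deque


def bfs_distances(tree, source):
    """Return hop distances from `source` to every node of `tree`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in tree[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_phylogenetic_root(tree, vertices, edges, k):
    """Check whether `tree` is a kth phylogenetic root of G = (vertices, edges).

    `tree` is an adjacency dict of a connected tree (hypothetical format).
    """
    # Condition 1: leaves of T = vertices of G.
    leaves = {v for v, nbrs in tree.items() if len(nbrs) == 1}
    if leaves != set(vertices):
        return False
    # Condition 2: every internal node has degree at least 3,
    # so no node may have degree exactly 2.
    if any(len(nbrs) == 2 for nbrs in tree.values()):
        return False
    # Condition 3: u, v adjacent in G iff their distance in T is at most k.
    graph_edges = {frozenset(e) for e in edges}
    ordered = sorted(leaves)
    for i, u in enumerate(ordered):
        dist = bfs_distances(tree, u)
        for v in ordered[i + 1:]:
            if (dist[v] <= k) != (frozenset((u, v)) in graph_edges):
                return False
    return True


# Illustrative tree: internal nodes x and y, leaves a, b under x and c, d under y.
tree = {
    "a": ["x"], "b": ["x"], "c": ["y"], "d": ["y"],
    "x": ["a", "b", "y"], "y": ["c", "d", "x"],
}
edges = [("a", "b"), ("c", "d")]
print(is_phylogenetic_root(tree, ["a", "b", "c", "d"], edges, 2))  # True
print(is_phylogenetic_root(tree, ["a", "b", "c", "d"], edges, 3))  # False
```

In the example, leaves under the same internal node are at distance 2 and all other leaf pairs are at distance 3, so the tree is a 2nd root of the two-edge graph but not a 3rd root.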
What is known about PRk?
PRk is solvable in polynomial time for k = 2, 3, 4.
The complexity of PRk for k > 4 is still unknown.
ΔPRk: a natural special case of PRk where the output phylogeny has maximum degree Δ.
ΔPRk can be solved in linear time.
The closest phylogenetic kth root problem (CPRk)
Given: a graph G = (V, E).
Output: a phylogeny T that minimizes the number of errors |T^k ⊕ E|, where T^k is the set of leaf pairs at distance at most k in T and T^k ⊕ E = (T^k − E) ∪ (E − T^k).
An optimization problem.
[Figure: a phylogeny T, its 3rd power T^3, and a graph G = (V, E), with the error edges T^3 − E and E − T^3 marked; |T^3 ⊕ E| = 4.]
Motivation: G is derived from some similarity data, which are usually inexact in practice.
CPR2 has been studied extensively. (See correlation clustering papers in FOCS and STOC.)
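The error count in CPRk is the size of the symmetric difference between the edge set of G and the set of leaf pairs within distance k in T. The sketch below evaluates that objective for one given tree; it does not search for the minimizing tree, which is the hard part. The adjacency-dict format and the names `tree_distance`, `phylogenetic_power` and `error_count` are assumptions for illustration.

```python
from collections import deque
from itertools import combinations


def tree_distance(tree, u, v):
    """Hop distance between nodes u and v in `tree` (adjacency dict), via BFS."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return dist[x]
        for y in tree[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    raise ValueError("tree is not connected")


def phylogenetic_power(tree, k):
    """Return T^k: the set of leaf pairs whose distance in T is at most k."""
    leaves = sorted(v for v, nbrs in tree.items() if len(nbrs) == 1)
    return {
        frozenset((u, v))
        for u, v in combinations(leaves, 2)
        if tree_distance(tree, u, v) <= k
    }


def error_count(tree, edges, k):
    """Return |T^k (+) E|, the number of pairs on which T^k and G disagree."""
    graph_edges = {frozenset(e) for e in edges}
    return len(phylogenetic_power(tree, k) ^ graph_edges)


# Illustrative input: G is the path a-b-c-d; T groups {a, b} and {c, d}.
tree = {
    "a": ["x"], "b": ["x"], "c": ["y"], "d": ["y"],
    "x": ["a", "b", "y"], "y": ["c", "d", "x"],
}
path_edges = [("a", "b"), ("b", "c"), ("c", "d")]
print(error_count(tree, path_edges, 2))  # 1: only the edge b-c is missing from T^2
print(error_count(tree, path_edges, 3))  # 3: T^3 is complete, G lacks three pairs
```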
Results
Known results:
  PRk is solvable in polynomial time for k = 2, 3, 4.
  ΔPRk can be solved in linear time.
  CPRk is NP-hard for any fixed k ≧ 2.
New result:
  ΔCPRk is NP-complete for any fixed k ≧ 3 and Δ ≧ 3.
NP-completeness: CPRk (unbounded degree)
1. CPR2 = Correlation Clustering
   Correlation Clustering: minimize #(inner non-edges) + #(outer edges) of G.
   [Figure: G on vertices a to h partitioned into clusters, each cluster a clique of T^2.]
2. CPR2 ≦ CPRk
   [Figure: a gadget in G built from cliques around vertices a and b, and the corresponding T.]
   If |T^3 ⊕ E(gadget)| < the clique size, then d_T(a, b) = 3.
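The correlation clustering objective in item 1 counts the vertex pairs on which a partition and G disagree: non-adjacent pairs inside a cluster plus adjacent pairs across clusters. A minimal sketch of that cost function follows; the name `clustering_cost` and the input format are assumptions for illustration, and the code only scores a given partition rather than optimizing over partitions.

```python
from itertools import combinations


def clustering_cost(vertices, edges, clusters):
    """Return #(inner non-edges) + #(outer edges) for a partition of `vertices`."""
    edge_set = {frozenset(e) for e in edges}
    # Map each vertex to the index of the cluster that contains it.
    cluster_of = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    cost = 0
    for u, v in combinations(vertices, 2):
        same_cluster = cluster_of[u] == cluster_of[v]
        adjacent = frozenset((u, v)) in edge_set
        # A pair is an error when clustering and adjacency disagree.
        if same_cluster != adjacent:
            cost += 1
    return cost


# Illustrative input: G is the path a-b-c-d.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d")]
print(clustering_cost(vertices, edges, [{"a", "b"}, {"c", "d"}]))  # 1 (outer edge b-c)
print(clustering_cost(vertices, edges, [{"a", "b", "c"}, {"d"}]))  # 2 (non-edge a-c, outer edge c-d)
```

On the path example, the partition {a, b}, {c, d} has the same cost as the error count of a tree whose 2nd power consists of exactly those two cliques, which is the correspondence the slide states.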
NP-completeness: 3CPR3
Reduction from Hamiltonian Path (HP) of graphs with maximum degree 3.
G has HP iff ∃T with |T^3 ⊕ E(G')| ≦ #(degree-3 vertices)/2.
  error = 0 at degree-2 vertices of G
  error = ½ at degree-3 vertices of G
  error ≧ ½ at degree-3 vertices of G
[Figure: G, G', and T, with per-vertex error values 0, ½, and 1 marked.]
NP-completeness: 3CPR3 ≦ 3CPR5
7-clique = (5,1,2)-core graph
If |T^5 ⊕ E(7-clique)| ≦ 2, then T is one of the two trees shown [figure: two trees joined by "or"].
∃1 degree-2 internal node: the port.
Distance( , ) = 1 and Distance( , ) ≧ 2 (the arguments are icons in the figure).
Pad distance 1 at every vertex of G.
NP-completeness: 3CPR3 ≦ 3CPR5
[Figure: G, G', and lifted G, with a 7-clique attached at the i-port, and T.]
∃T |T^5 ⊕ E(G')| ≦ #(degree-3 vertices)/2
∃T |T^3 at lifted G ⊕ E(lifted G)| ≦ #(degree-3 vertices)/2
Core graph: 3CPR3 ≦ 3CPR7
(7,1,2)-core graph
If |T^7 ⊕ E(11-clique)| ≦ 2, then T is the tree shown [figure], with ∃1 port.
Distance( , ) = 1 and Distance( , ) ≧ 2 (the arguments are icons in the figure).
Pad distance 1.
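Core graph: 3CPR3 ≦ 3CPR7
(7,2,2)-core graph
[Figure: a phylogeny of the 5-clique combined with copies of a phylogeny of the 11-clique.]
If |T^7 ⊕ E((the obtained tree)^7)| ≦ 2, then T is the tree shown [figure], with ∃1 port.
Distance( , ) = 2 and Distance( , ) ≧ 3 (the arguments are icons in the figure).
Pad distance 2.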
Summary and Open Problems
Phylogenetic kth power:
  open: the complexity of PRk for k > 4?
  ΔPRk ∈ P
  CPRk is NP-hard.
  new: ΔCPRk is NP-hard.
Tree kth power:
  TRk, ΔTRk ∈ P
  CTRk is NP-hard.
  open: is ΔCTRk NP-hard?
[Figure: Tree 3rd power and Phylogenetic 3rd power.]