On the Efficiency of the Hamming C-Centerstring Problems
Amihood Amir, Liam Roditty, Jessica Ficler, Oren Sar Shalom
Motivation: the Conference Location Problem
Consensus String Problem Input: points in space. Output: find a point whose maximum distance from all the input points is smallest.
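To make the definition concrete for strings under the Hamming distance, here is a brute-force sketch in Python. It is our own illustration, not an algorithm from the talk. The function names are hypothetical, and the search is exponential in the string length l, so it only suits toy inputs.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def consensus_brute_force(strings: list[str]) -> tuple[str, int]:
    """Return a center minimizing the maximum Hamming distance, and that radius.

    Exhaustive search, exponential in the length l: only for toy inputs.
    """
    l = len(strings[0])
    # At each column, only symbols that occur there are worth trying: any other
    # symbol mismatches every input string, so swapping it for an occurring
    # symbol never increases any distance.
    columns = [sorted({s[j] for s in strings}) for j in range(l)]
    best_center, best_radius = "", l + 1
    for cand in product(*columns):
        center = "".join(cand)
        radius = max(hamming(center, s) for s in strings)
        if radius < best_radius:
            best_center, best_radius = center, radius
    return best_center, best_radius


print(consensus_brute_force(["0011", "0101", "1001"]))  # ('0001', 1)
```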
History: • Frances and Litman [1997]: • the problem is NP-complete even for binary alphabets • Therefore, 3 directions: • solutions for small k (the number of input strings) • fixed-parameter tractability • approximation algorithms.
History: Solution for small k: Gramm, Niedermeier, and Rossmanith [2001] (k = 3); Boucher, Brown, and Durocher [2008] (k = 4, binary); A., Landau, Na, Park, Park, and Sim [2009] (k = 3, radius and distance-sum optimization); A., Paryenty, and Roditty [2012] (k = 5, binary; l 2 for all k: l k)
History: Fixed-parameter tractability for all parameters: Fixed l: Ben-Dor, Lancia, Perone, and Ravi [1997]. Fixed k: Gramm, Niedermeier, and Rossmanith [2003]. Fixed d: Stojanovic, Berman, Gumucio, Hardison, and Miller [1997]; Lanctot, Li, Ma, Wang, and Zhang [1999]; Sze, Lu, and Chen [2004].
History: Approximations: PTAS: Li, Ma, and Wang [2002], not practical. Rounded LP: Ben-Dor, Lancia, Perone, and Ravi [1997], with a large number of variables: |Σ|·l. Chimani, Woste, and Böcker [2011]: can be reduced to |Σ|·(l−1). A., Paryenty, and Roditty [2011]: |T(S)|·|Σ| (T(S) = the set of column types).
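For reference, the usual integer program for the consensus problem on k strings s_1, …, s_k of length l uses one 0/1 variable x_{j,a} per position j and symbol a ∈ Σ. That is where the |Σ|·l variable count comes from. The rounded-LP approach relaxes x_{j,a} ∈ {0,1} to 0 ≤ x_{j,a} ≤ 1 and then rounds the fractional solution. This is a sketch of the standard formulation as we understand it, not necessarily the exact program used in the cited papers.

```latex
\begin{aligned}
\min\; & d \\
\text{s.t.}\; & \sum_{a \in \Sigma} x_{j,a} = 1 && j = 1,\dots,l \\
& \sum_{j=1}^{l} \bigl(1 - x_{j,\,s_i[j]}\bigr) \le d && i = 1,\dots,k \\
& x_{j,a} \in \{0,1\} && j = 1,\dots,l,\; a \in \Sigma
\end{aligned}
```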
Another Motivation: Clustering. The C-Centerstring problem • Input: • points in space • a number c • an objective function f • Output: divide the points into c sets such that, for the c consensus strings c1, c2, …, cc, f(c1, c2, …, cc) is maximized (or minimized).
Three types of objective functions: • Let HRC (Hamming Radius Clustering) be the consensus string problem defined before. • 1. c-HRC: partition into c sets, each of which has a center within radius d of all its strings. • 2. c-HRLC: partition into c sets, each of which has a center within radius d, but each center must be one of the input strings. • 3. c-HRSC: partition into c sets, each of which has a center, such that the sum of the radii does not exceed d.
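The three objectives differ only in how the per-cluster radii are combined and where the centers may come from. The Python checker below is a minimal sketch that assumes a candidate solution is given as one cluster label per string plus one center per cluster. That representation and the function names are hypothetical, and the example strings are made up (they are not the ones from the slides).

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_radii(strings, labels, centers):
    """Radius of each cluster: the largest distance from its center to a member.

    labels[i] is the cluster index of strings[i]; centers[j] is the center of
    cluster j. A cluster with no members gets radius 0.
    """
    radii = [0] * len(centers)
    for s, j in zip(strings, labels):
        radii[j] = max(radii[j], hamming(centers[j], s))
    return radii


def is_c_hrc(strings, labels, centers, d):
    # c-HRC: every cluster has radius at most d.
    return max(cluster_radii(strings, labels, centers)) <= d


def is_c_hrlc(strings, labels, centers, d):
    # c-HRLC: as c-HRC, but every center must itself be one of the input strings.
    return all(c in strings for c in centers) and is_c_hrc(strings, labels, centers, d)


def is_c_hrsc(strings, labels, centers, d):
    # c-HRSC: radii may differ, but their sum must not exceed d.
    return sum(cluster_radii(strings, labels, centers)) <= d


S = ["000", "001", "110", "111"]
labels = [0, 0, 1, 1]
print(is_c_hrc(S, labels, ["000", "110"], 1))   # True
print(is_c_hrsc(S, labels, ["000", "110"], 1))  # False: radii are 1 and 1
print(is_c_hrlc(S, labels, ["010", "110"], 2))  # False: "010" is not an input string
```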
The Hamming radius c-clustering problem (c-HRC) Example: for the following strings and d = 1, we show that they belong to 2-HRC.
The Hamming radius local c-clustering problem (c-HRLC) Example: for the following strings and d = 2, we show that they belong to 2-HRLC. Do they belong to 2-HRLC when d = 1?
The Hamming radius c-clustering sum problem (c-HRSC) Example: for the following strings and d = 2, we show that they belong to 2-HRSC.
In this Paper: We consider parameterized complexity and approximations. Small k is not very meaningful in the context of clustering.
Theorem: HRC, HRLC and HRSC can be solved in polynomial time for fixed k. • If k ≤ c, each input string can be its own center, so d = 0. • Otherwise c < k, and there are c^k < k^k ways to partition the k strings into c sets. • For each set, find the consensus center in polynomial time (possible because k is fixed). • The partition that gives the best result is the optimal solution.
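The case analysis above can be sketched in Python. The sketch enumerates the c^k assignments exactly as in the proof. For brevity, however, it computes each cluster's consensus by brute force over column symbols, which is exponential in l. The theorem instead relies on a consensus algorithm that is polynomial for fixed k, and this sketch does not reproduce that algorithm. The function names and example strings are hypothetical.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def column_candidates(group):
    """All strings built from symbols occurring in each column of the group.

    Brute-force stand-in for the polynomial-time fixed-k consensus step;
    this enumeration is exponential in the length l.
    """
    cols = [sorted({s[j] for s in group}) for j in range(len(group[0]))]
    return ["".join(p) for p in product(*cols)]


def group_radius(group, candidates):
    """Smallest radius achievable for the group using one of the candidates."""
    return min(max(hamming(c, s) for s in group) for c in candidates)


def solve_fixed_k(strings, c, variant):
    """Optimal value for 'HRC' or 'HRLC' (max radius) or 'HRSC' (sum of radii)."""
    k = len(strings)
    if k <= c:
        return 0  # every string can be its own center
    best = None
    # Try all c**k assignments of the k strings to c clusters (c < k here).
    for labels in product(range(c), repeat=k):
        groups = [[s for s, lab in zip(strings, labels) if lab == j] for j in range(c)]
        radii = [
            # HRLC centers must be input strings; otherwise search all columns.
            group_radius(g, strings if variant == "HRLC" else column_candidates(g))
            for g in groups
            if g  # an empty cluster contributes nothing
        ]
        value = sum(radii) if variant == "HRSC" else max(radii)
        if best is None or value < best:
            best = value
    return best


S = ["000", "001", "110", "111"]
print([solve_fixed_k(S, 2, v) for v in ("HRC", "HRLC", "HRSC")])  # [1, 1, 2]
```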
Theorem: HRC is NP-complete even if the radius is fixed to d = 1 and the alphabet is binary. • By reduction from Vertex Cover for triangle-free graphs. • Our input: • G: a triangle-free graph • t: the size of the vertex cover
The construction: the c parameter is t and the distance parameter d is 1. Encode each edge as a bit string of length |V| in which the bits of the edge's two endpoints are set.
[Figures: the construction on an example graph with vertices 1 to 7]
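The forward direction of this reduction can be checked mechanically. The Python sketch below builds the edge strings as the slide describes. The mapping from a vertex cover to centers is our assumption about how the proof goes: each cover vertex w contributes one unit string e_w, which is within distance 1 of every edge string touching w. The converse direction, where triangle-freeness is used, is not shown. The function names and example graph are hypothetical, and vertices are 0-indexed here.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def edge_strings(n, edges):
    """Encode each edge (u, v) of a graph on vertices 0..n-1 as a length-n
    bit string in which exactly the bits of u and v are set."""
    out = []
    for u, v in edges:
        bits = ["0"] * n
        bits[u] = bits[v] = "1"
        out.append("".join(bits))
    return out


def centers_from_cover(n, cover):
    """Assumed forward mapping: cover vertex w becomes the unit string e_w."""
    return ["".join("1" if i == w else "0" for i in range(n)) for w in cover]


def within_radius(strings, centers, d=1):
    """True if every string is within Hamming distance d of some center."""
    return all(any(hamming(s, c) <= d for c in centers) for s in strings)


# A 4-cycle 0-1-2-3-0 (triangle-free); {0, 2} is a vertex cover of size t = 2.
n, edges = 4, [(0, 1), (1, 2), (2, 3), (3, 0)]
S = edge_strings(n, edges)
print(S)  # ['1100', '0110', '0011', '1001']
print(within_radius(S, centers_from_cover(n, [0, 2])))  # True
print(within_radius(S, centers_from_cover(n, [0, 1])))  # False: edge (2, 3) is uncovered
```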
Theorem: HRLC is NP-complete even if the length is fixed to l = 2. • We prove this by reduction from Minimum Maximal Matching for bipartite graphs. • Our input: • G: a bipartite graph • t: the size of a smallest set of edges that is a maximal matching
[Figure: a maximal matching and a minimum maximal matching]
The construction: the c parameter is t and the distance parameter d is 1.
[Figures: the construction on an example graph with vertices 1 to 5]
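Below is a small Python check of the forward direction. The slides only show edge strings such as [6,2]. We assume that each edge of the bipartite graph becomes the length-2 string [left endpoint, right endpoint], with vertex names as symbols. We also assume that a maximal matching of size t supplies the t centers, which are input strings, as HRLC requires. Both assumptions, the function names, and the example graph are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def edge_string(edge):
    """Assumed encoding: edge (left, right) becomes the length-2 string [left, right].
    Positions keep the two sides apart, so two distinct edge strings are at
    distance 1 exactly when the edges share an endpoint."""
    left, right = edge
    return (left, right)


def is_maximal_matching(edges, matching):
    """No two chosen edges share an endpoint, and every edge touches a chosen one."""
    lefts = [u for u, _ in matching]
    rights = [v for _, v in matching]
    disjoint = len(set(lefts)) == len(lefts) and len(set(rights)) == len(rights)
    dominating = all(any(u == a or v == b for a, b in matching) for u, v in edges)
    return set(matching) <= set(edges) and disjoint and dominating


def local_clustering_ok(edges, centers, d=1):
    """Centers are input (edge) strings; every edge string is within d of one."""
    strings = [edge_string(e) for e in edges]
    cs = [edge_string(e) for e in centers]
    return all(c in strings for c in cs) and all(
        any(hamming(s, c) <= d for c in cs) for s in strings
    )


# Path 1-4-2-5-3 with left side {1, 2, 3} and right side {4, 5}.
edges = [(1, 4), (2, 4), (2, 5), (3, 5)]
M = [(1, 4), (2, 5)]
print(is_maximal_matching(edges, M), local_clustering_ok(edges, M))  # True True
print(local_clustering_ok(edges, [(2, 5)]))  # False: (1, 4) is at distance 2
```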
Move the strings [6,2] and [5,2] if there are centers beginning with 5 or 6. Change the center to one of the remaining strings. We keep going until no two centers share a common symbol!
Approximation Algorithms • 1. A linear-time 4-approximation for the 2-HRSC problem. • 2. A polynomial-time 3-approximation for the 2-HRSC problem. • 3. A special-case PTAS: compute the clusters and run a 1-HRC approximation on each cluster.
Lemma [Figure: clusters that are pairwise more than 2d apart]
Proof [Figure: a cluster and its center]
If we had a representative from each cluster, we could assign the rest of the strings to the appropriate clusters. • Now use a known approximation algorithm for 1-HRC to find the consensus string of each cluster.
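Here is a hedged Python sketch of this step, based on our reading of the figure. Each cluster has radius at most d, and strings from different clusters are more than 2d apart. A string therefore lies within 2d of its own cluster's representative and of no other. Representatives are found by trying every c-subset of the input, which is polynomial for fixed c. The talk does not say which 1-HRC approximation is used. As a stand-in, we pick the best input string of each cluster, which is the textbook 2-approximation. All names and the example strings are hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_to_representatives(strings, reps, d):
    """Put each string in the cluster of the unique representative within 2d.

    Returns None when some string is within 2d of no representative or of
    several, i.e. when this guess is inconsistent with the separation assumption.
    """
    clusters = [[] for _ in reps]
    for s in strings:
        close = [j for j, r in enumerate(reps) if hamming(s, r) <= 2 * d]
        if len(close) != 1:
            return None
        clusters[close[0]].append(s)
    return clusters


def two_approx_center(cluster):
    """Stand-in 1-HRC approximation: the best input string of the cluster.
    By the triangle inequality its radius is at most twice the optimum."""
    return min(cluster, key=lambda c: max(hamming(c, s) for s in cluster))


def cluster_by_guessing(strings, c, d):
    """Try each c-subset of the input as representatives (at most n**c guesses)."""
    for reps in combinations(strings, c):
        clusters = assign_to_representatives(strings, reps, d)
        if clusters is not None:
            return [(two_approx_center(g), g) for g in clusters]
    return None


S = ["0000", "0001", "1111", "1110"]
print(cluster_by_guessing(S, 2, 1))
# [('0000', ['0000', '0001']), ('1111', ['1111', '1110'])]
```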
Lemma [Figure: two clusters and their c-centers, more than 4d apart]
Proof [Figure: a chain of distances, each at most d]
Future work • 1. We presented a heuristic algorithm that did very well in practice: what is its approximation ratio? • 2. There are some gaps in the parameterized complexity table: • a. What happens in the HRLC/HRSC cases for fixed d? • b. What happens in the HRC/HRSC cases for fixed l? • 3. Is there a PTAS for c-HRC? • 4. Can we approximate c-HRC using LP? SDP?