
On the Efficiency of the Hamming C-Centerstring Problems

On the Efficiency of the Hamming C- Centerstring Problems. Amihood Amir Liam Roditty Jessica Ficler Oren Sar Shalom. Motivation – the Conference Location Problem. Consensus String Problem. Input: points in space.


Presentation Transcript


  1. On the Efficiency of the Hamming C-Centerstring Problems. Amihood Amir, Liam Roditty, Jessica Ficler, Oren Sar Shalom

  2. Motivation – the Conference Location Problem

  3. Consensus String Problem. Input: points in space. Output: a point whose maximum distance from the input points is smallest.

  4. Hamming Distance
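The slide body did not survive, so as a reminder: the Hamming distance between two equal-length strings is the number of positions where they differ. A minimal sketch (the function name `hamming` is ours, not from the slides):

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


print(hamming("karolin", "kathrin"))  # 3
```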

  5. Consensus String Problem (1-HRC)
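To make the 1-HRC objective concrete, here is a brute-force sketch that tries every candidate center over a given alphabet and keeps the one with the smallest maximum Hamming distance. This is only an illustration for tiny inputs (it enumerates |Σ|^l candidates), not one of the algorithms cited in the talk; the names `consensus_brute_force` and `alphabet` are assumptions.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def consensus_brute_force(strings, alphabet):
    """Return (center, radius) minimizing the maximum Hamming distance
    from the center to every input string. Exponential in string length."""
    length = len(strings[0])
    best_center, best_radius = None, length + 1
    for cand in product(alphabet, repeat=length):
        center = "".join(cand)
        radius = max(hamming(center, s) for s in strings)
        if radius < best_radius:
            best_center, best_radius = center, radius
    return best_center, best_radius


print(consensus_brute_force(["000", "011", "101"], "01"))  # ('001', 1)
```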

  6. History:
     • Frances and Litman [1997]: the problem is NP-complete even for binary alphabets.
     • Therefore, three directions:
       • Solution for small k.
       • Fixed-parameter tractability.
       • Approximation algorithms.

  7. History: Solution for small k:
     • Gramm, Niedermeier, and Rossmanith [2001] (k = 3)
     • Boucher, Brown, and Durocher [2008] (k = 4, binary)
     • A., Landau, Na, Park, Park, and Sim [2009] (k = 3, radius and distance-sum optimization)
     • A., Paryenty, and Roditty [2012] (k = 5, binary: l^2; for all k: l^k)

  8. History: Fixed-Parameter Tractability for all Parameters:
     • Fixed l: Ben-Dor, Lancia, Perone, and Ravi [1997]
     • Fixed k: Gramm, Niedermeier, and Rossmanith [2003]
     • Fixed d: Stojanovic, Berman, Gumucio, Hardison, and Miller [1997]; Lanctot, Li, Ma, Wang, and Zhang [1999]; Sze, Lu, and Chen [2004]

  9. History: Approximations:
     • PTAS: Li, Ma, and Wang [2002]. Not practical.
     • Rounded LP: Ben-Dor, Lancia, Perone, and Ravi [1997]. Large number of variables: |Σ|·l
     • Chimani, Woste, and Bocker [2011]: can be reduced to |Σ|·(l-1)
     • A., Paryenty, and Roditty [2011]: |T(S)|·|Σ| (T(S) = set of column types)

  10. Another Motivation – Clustering. The C-CenterStrings problem
      • Input:
        • Points in space
        • Number c
        • Objective function f
      • Output: Divide the points into c sets such that, for the c consensus strings c_1, c_2, …, c_c, the value f(c_1, c_2, …, c_c) is maximum/minimum.

  11. Three Types of Objective Functions: Let HRC (Hamming Radius Clustering) be the consensus string problem defined before.
      1. c-HRC: partition into c sets, each of which has a center with radius d.
      2. c-HRLC: partition into c sets, each of which has a center with radius d, but the center is part of the input set.
      3. c-HRSC: partition into c sets, each of which has a center, and the sum of the radii does not exceed d.
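The three definitions differ only in the condition a candidate solution must meet. The sketch below checks a given partition and list of centers against each condition; it verifies a proposed solution and does not search for one. All function names are ours, and we read "part of the input set" as "equal to some input string".

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def radius(center, cluster):
    """Largest Hamming distance from `center` to any string in `cluster`."""
    return max((hamming(center, s) for s in cluster), default=0)


def is_hrc(clusters, centers, d):
    """c-HRC: every cluster lies within radius d of its center."""
    return all(radius(c, cl) <= d for c, cl in zip(centers, clusters))


def is_hrlc(clusters, centers, d):
    """c-HRLC: as c-HRC, and every center is one of the input strings."""
    inputs = {s for cl in clusters for s in cl}
    return is_hrc(clusters, centers, d) and all(c in inputs for c in centers)


def is_hrsc(clusters, centers, d):
    """c-HRSC: the radii of all clusters sum to at most d."""
    return sum(radius(c, cl) for c, cl in zip(centers, clusters)) <= d


clusters = [["000", "001"], ["111", "110"]]
print(is_hrc(clusters, ["000", "111"], 1))   # True
print(is_hrlc(clusters, ["010", "111"], 2))  # False: "010" is not an input string
print(is_hrsc(clusters, ["000", "111"], 1))  # False: the radii sum to 2
```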

  12. The Hamming radius c-clustering problem (c-HRC) Example: For the following strings and d=1, we show it belongs to 2-HRC.

  13. The Hamming radius local c-clustering problem (c-HRLC) Example: For the following strings and d=2, we show it belongs to 2-HRLC. Does it belong to 2-HRLC when d=1?

  14. The Hamming radius c-clustering sum problem (c-HRSC) Example: For the following strings and d=2, we show it belongs to 2-HRSC.

  15. In this Paper: We consider parameterized complexity and approximations. Small k is not too meaningful in the context of clustering.

  16. C-CenterString Parameterized Complexity

  17. Theorem: HRC, HRLC and HRSC can be solved in polynomial time for fixed k.
      • If k ≤ c, the input strings can be assigned to c centers with d = 0.
      • Otherwise c < k. There are c^k < k^k options for partitioning k strings into c sets.
        - For each set, find the consensus center in polynomial time.
        - The partition that gives the best result is the optimal solution.
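The enumeration in the theorem can be sketched as follows for the c-HRC objective. One caveat: where the slide says to find each consensus center in polynomial time, this sketch swaps in a naive search over all |Σ|^l candidate centers, so it illustrates the c^k enumeration only and is not polynomial. The names `hrc_exhaustive` and `best_radius` are assumptions.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def best_radius(cluster, alphabet):
    """Smallest achievable radius for one cluster (naive stand-in for the
    consensus-string subroutine the slide refers to)."""
    length = len(cluster[0])
    return min(
        max(hamming(cand, s) for s in cluster)
        for cand in product(alphabet, repeat=length)
    )


def hrc_exhaustive(strings, c, alphabet):
    """Smallest d such that `strings` can be split into at most c clusters,
    each within radius d of some center."""
    k = len(strings)
    if k <= c:
        return 0  # every string can be its own center
    best = None
    # Each of the c^k labelings assigns string i to cluster labels[i].
    for labels in product(range(c), repeat=k):
        clusters = [
            [s for s, g in zip(strings, labels) if g == i] for i in range(c)
        ]
        worst = max(best_radius(cl, alphabet) for cl in clusters if cl)
        if best is None or worst < best:
            best = worst
    return best


strings = ["000", "001", "111", "110"]
print(hrc_exhaustive(strings, 2, "01"))  # 1
```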

  18. C-CenterString Parameterized Complexity

  19. Theorem: HRC is NP-complete even if the radius is fixed to d = 1.
      • d = 1 and the alphabet is binary.
      • By reduction from Vertex Cover for Triangle-Free Graphs.
      • Our input:
        • G: a triangle-free graph
        • t: the size of the vertex-cover set

  20. The construction: the c parameter is t and the distance parameter d is 1. Encode edges as bit strings of length |V|: set the bits of the two endpoints of the edge.
      [Figure: graph with nodes labeled 1 to 7]
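The edge encoding can be sketched as below. The sketch also illustrates one direction of the argument as we read it: the unit vector of a vertex is within Hamming distance 1 of every edge string incident to that vertex, so a vertex cover of size t yields t centers of radius 1. The function names and the 1-based vertex numbering are assumptions, and the converse direction (where triangle-freeness matters) is not shown.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def encode_edges(num_vertices, edges):
    """Map each edge (u, v), vertices numbered from 1, to a bit string of
    length |V| with exactly the bits of u and v set."""
    strings = []
    for u, v in edges:
        bits = ["0"] * num_vertices
        bits[u - 1] = "1"
        bits[v - 1] = "1"
        strings.append("".join(bits))
    return strings


def vertex_center(num_vertices, v):
    """Unit vector with only the bit of vertex v set."""
    return "".join("1" if i == v - 1 else "0" for i in range(num_vertices))


edge_strings = encode_edges(4, [(1, 2), (2, 3)])
print(edge_strings)  # ['1100', '0110']
center = vertex_center(4, 2)
print([hamming(center, s) for s in edge_strings])  # [1, 1]
```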

  21. [Figure: nodes labeled 1 to 7]

  22. [Figure: nodes labeled 1 to 3]

  23. C-CenterString Parameterized Complexity

  24. Theorem: HRLC is NP-complete even if the length is fixed to l = 2.
      • We prove this by reduction from Minimum Maximal Matching for bipartite graphs.
      • Our input:
        • G: a bipartite graph
        • t: the size of a minimum maximal matching
      [Figure panels: Minimum Maximal Matching; Maximal Matching]

  25. The construction: the c parameter is t and the distance parameter d is 1.
      [Figure: graph with nodes labeled 1 to 5]
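The figure for this construction did not survive, so the following is only our reading, inferred from strings such as [6,2] and [5,2] on a later slide: each edge of the bipartite graph becomes a length-2 string (left endpoint, right endpoint). Under that assumption, two edge strings are within Hamming distance 1 exactly when the edges share an endpoint, so a set of input strings used as centers with d = 1 covers every string precisely when the chosen edges touch every other edge. The helper `covered_by` is hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def covered_by(centers, strings, d=1):
    """True if every string is within distance d of at least one center."""
    return all(any(hamming(c, s) <= d for c in centers) for s in strings)


# Assumed encoding: path 1-2-3-4 with left side {1, 3} and right side {2, 4};
# each edge is written as (left endpoint, right endpoint).
edges = [(1, 2), (3, 2), (3, 4)]

print(covered_by([(3, 2)], edges))  # True: (3, 2) touches both other edges
print(covered_by([(1, 2)], edges))  # False: (3, 4) shares no endpoint with (1, 2)
```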

  26. [Figure: nodes labeled 1 to 5]

  27. Move strings [6,2] and [5,2] if there are centers that begin with 5 or 6. Change the center to one of the remaining strings. We keep going until no two centers share a symbol.

  28. Approximation Algorithms
      1. A linear-time 4-approximation for the 2-HRSC problem.
      2. A polynomial-time 3-approximation for the 2-HRSC problem.
      3. Special-case PTAS: compute the clusters, then run a 1-HRC approximation on each cluster.

  29. Lemma [Figure: distances labeled > 2d]

  30. Proof [Figure: label "center"]

  31. If we had a representative from each cluster, we could associate the rest of the strings with the appropriate group.
      • Now use a known approximation algorithm for 1-HRC to find the consensus string of each cluster.
      [Figure: distances labeled > 2d]
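The assignment step can be sketched as a nearest-representative rule. The separation condition itself is only visible as the "> 2d" labels of the lost figure, so treat it as an assumption here: if strings from different clusters are more than 2d apart, while two strings in the same cluster are at most 2d apart (both lie within d of their center), then the nearest representative identifies the right cluster. The per-cluster 1-HRC approximation is not shown, and the function name is ours.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_to_representatives(strings, representatives):
    """Group each string with its nearest representative (ties go to the
    representative listed first)."""
    clusters = {rep: [] for rep in representatives}
    for s in strings:
        nearest = min(representatives, key=lambda r: hamming(r, s))
        clusters[nearest].append(s)
    return clusters


strings = ["0000", "0001", "1111", "1110"]
print(assign_to_representatives(strings, ["0000", "1111"]))
# {'0000': ['0000', '0001'], '1111': ['1111', '1110']}
```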

  32. Lemma [Figure: two clusters, each with its c-center, at distance > 4d]

  33. Proof [Figure: distances labeled ≤ d]

  34. Polynomial time approximation algorithm for 2-HRSC problem

  35. Future work
      1. We presented a heuristic algorithm that did very well in practice. What is its approximation ratio?
      2. There are some gaps in the parameterized complexity table:
         a. What happens in the HRLC/HRSC cases for fixed d?
         b. What happens in the HRC/HRSC cases for fixed l?
      3. Is there a PTAS for c-HRC?
      4. Can we approximate c-HRC using LP? SDP?
