Fixed-Parameter Algorithms for CLOSEST STRING and Related Problems. Algorithmica(2003) Jens Gramm, Rolf Niedermeier, Peter Rossmanith. Outline. Introduction Preliminaries Linear-Time solution for constant d Related Problems Linear-Time solution for fixed k Conclusion.
Fixed-Parameter Algorithms for CLOSEST STRING and Related Problems Algorithmica(2003) Jens Gramm, Rolf Niedermeier, Peter Rossmanith
Outline • Introduction • Preliminaries • Linear-Time solution for constant d • Related Problems • Linear-Time solution for fixed k • Conclusion
Intro : Problem Definition • Input: Strings s1, s2, …, sk over alphabet Σ, each of length L, and a nonnegative integer d. • Question: Is there a string s of length L such that dH(s, si) ≤ d for all i = 1,…,k? • dH(s1, s2) = |{i | s1[i] ≠ s2[i]}| is the Hamming distance, defined for |s1| = |s2|
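The definition above can be checked directly. The following is a minimal sketch, not code from the paper; the function names `hamming` and `is_center` are chosen here for illustration only.

```python
def hamming(a: str, b: str) -> int:
    """Hamming distance d_H: number of positions where a and b differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def is_center(s: str, strings: list[str], d: int) -> bool:
    """True iff d_H(s, s_i) <= d for every input string s_i."""
    return all(hamming(s, t) <= d for t in strings)
```

Verifying a given candidate s takes O(kL) time; the hard part of CLOSEST STRING is finding s.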
NP-completeness • CLOSEST STRING is NP-complete • d is usually small in biological applications • This paper: an O(kL + kd·d^d) algorithm • PTAS by Li et al.
Extended problems • d-MISMATCH • DISTINGUISHING STRING SELECTION • DISTINGUISHING SUBSTRING SELECTION
Preliminaries • Given a set of strings S = {s1,…,sk}, each of length L • s is an optimal center string iff there is no s' such that max_{i=1,…,k} dH(s', si) < max_{i=1,…,k} dH(s, si) • s is an optimal median string iff there is no s' such that Σ_{i=1,…,k} dH(s', si) < Σ_{i=1,…,k} dH(s, si)
Given a set of k strings of length L, think of the set as a k × L matrix • Optimal median string: • a c c a
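Because the sum of Hamming distances decomposes column by column, an optimal median string can be read off the k × L matrix by taking a most frequent symbol in each column. The sketch below illustrates this; the function name and the three input strings in the test are made up for illustration and are not the matrix shown on the slide.

```python
from collections import Counter


def median_string(strings: list[str]) -> str:
    """Pick a most frequent symbol per column of the k x L matrix.

    Ties are broken by first occurrence in the column, so the result is
    one optimal median string, not necessarily the only one.
    """
    # zip(*strings) iterates over the columns of the matrix.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*strings))
```

This column-wise majority rule does not give an optimal center string in general, which is why CLOSEST STRING needs the search described on the later slides.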
Main idea • Search! • Fixed-parameter tractability • Reduction to problem kernel
LEMMA 1. Given a set of strings S = {s1,…,sk}, each of length L, and a permutation σ: {1,…,L} → {1,…,L}. Then s is an optimal center string for {s1,…,sk} iff σ(s) is an optimal center string for {σ(s1), σ(s2), …, σ(sk)}
LEMMA 2. To compute an optimal center string, it is sufficient to solve a normalized and reordered instance. From this, the solution of the original instance can be derived in linear time
LEMMA 3. A CLOSEST STRING instance with arbitrary alphabet Σ, |Σ| > k, is isomorphic to a CLOSEST STRING instance with alphabet Σ', |Σ'| = k. • By normalization
LEMMA 4. Given a CLOSEST STRING instance s1,…,sk of length L and d. If the resulting k × L matrix has more than kd dirty columns, then there is no string s with max_{i=1,…,k} dH(s, si) ≤ d • A column is dirty iff it contains at least two different symbols from alphabet Σ • By the pigeonhole principle
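Lemma 4 suggests a simple preprocessing step: find the dirty columns, reject the instance if there are more than kd of them, and otherwise keep only those columns. The following is a minimal sketch under that reading; the names `dirty_columns` and `kernel` are illustrative assumptions, and positions are 0-based.

```python
def dirty_columns(strings: list[str]) -> list[int]:
    """0-based positions of columns holding at least two different symbols."""
    return [p for p, col in enumerate(zip(*strings)) if len(set(col)) > 1]


def kernel(strings: list[str], d: int):
    """Restrict the instance to its dirty columns.

    Returns None when Lemma 4 rules out any solution (more than k*d dirty
    columns); otherwise returns the k strings reduced to the dirty columns.
    """
    k = len(strings)
    dirty = dirty_columns(strings)
    if len(dirty) > k * d:
        return None
    return ["".join(s[p] for p in dirty) for s in strings]
```

In a clean column every input string carries the same symbol, so a center string can use that symbol there at no cost; only the at most kd dirty columns need to be searched.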
A Linear-Time solution for constant d • Bounded search tree algorithm • LEMMA 5. Given a set of strings S = {s1,…,sk} and a positive integer d. If there are i, j ∈ {1,…,k} with dH(si, sj) > 2d, then there is no string s with max_{i=1,…,k} dH(s, si) ≤ d
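Lemma 5 follows from the triangle inequality, dH(si, sj) ≤ dH(si, s) + dH(s, sj) ≤ 2d, and gives a cheap rejection test. A minimal sketch, with an illustrative function name:

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions where a and b differ (equal lengths assumed)."""
    return sum(x != y for x, y in zip(a, b))


def passes_pairwise_check(strings: list[str], d: int) -> bool:
    """False means some pair is more than 2d apart, so no center string exists.

    True only means the instance was not rejected by Lemma 5; a solution
    may still fail to exist.
    """
    return all(hamming(a, b) <= 2 * d for a, b in combinations(strings, 2))
```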
Theorem 1. Given a set of strings S = {s1,…,sk} and an integer d, Algorithm D determines in O(kL + kd·d^d) time whether there is a string s with max_{i=1,…,k} dH(s, si) ≤ d. • By Lemma 4, the input instance is reduced to size O(kd) in O(kL) time • Search tree depth = d; Time(D0+D1+D2+D3) = O(kd), achieved by building a table containing the distances of the candidate string to all other given strings
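Correctness • Show only the correctness of the first branching step • Suppose s1 is not a solution but there exists a center string s with dH(s, si) ≤ d for all i • Pick si with dH(s1, si) > d and a set P ⊆ {p | s1[p] ≠ si[p]} with |P| = d+1 • Goal: show that P_{s1≠s=si} := {p ∈ P | s1[p] ≠ s[p] = si[p]} is nonempty • P = P_{s1≠s=si} ∪ P_{s≠si} (disjoint), where P_{s≠si} := {p ∈ P | s[p] ≠ si[p]} and |P_{s≠si}| ≤ d • So |P_{s1≠s=si}| ≥ 1, and d+1 subcases are sufficient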
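The bounded search tree can be sketched as follows. This is an illustrative reading of the steps the slides refer to as D0 to D3, not the paper's exact pseudocode: it starts from s1, and whenever some si is more than d away it branches on d+1 positions where the candidate and si differ, moving the candidate one symbol closer to si. It recomputes all distances at every node instead of maintaining the distance table mentioned in Theorem 1, so it does not reach the stated running time. The names `closest_string`, `search`, and `budget` are assumptions made here.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions where a and b differ (equal lengths assumed)."""
    return sum(x != y for x, y in zip(a, b))


def closest_string(strings: list[str], d: int):
    """Return a string within distance d of every input string, or None."""

    def search(s: str, budget: int):
        # D0: no modifications left to spend.
        if budget < 0:
            return None
        dists = [hamming(s, t) for t in strings]
        # D1: some string is too far to be repaired with the remaining budget.
        if any(x > d + budget for x in dists):
            return None
        # D2: the candidate already is a center string.
        if all(x <= d for x in dists):
            return s
        # D3: pick a string that is too far and branch on d+1 differing positions.
        far = next(t for t, x in zip(strings, dists) if x > d)
        positions = [p for p in range(len(s)) if s[p] != far[p]][: d + 1]
        for p in positions:
            found = search(s[:p] + far[p] + s[p + 1:], budget - 1)
            if found is not None:
                return found
        return None

    return search(strings[0], d)
```

The tree has depth at most d and branching factor d+1, which is where the d^d factor in the running time comes from.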
Related Problems • d-MISMATCH problem • s_{i,p,L} denotes the length-L substring of a given string si starting at position p (n is the length of the given strings) • Question: Is there a string s of length L and a position p with 1 ≤ p ≤ n−L+1 such that dH(s, s_{i,p,L}) ≤ d for all i? • Stojanovic et al. give a linear-time algorithm for 1-MISMATCH • Theorem 2. d-MISMATCH is solvable in O(kL + (n−L)·kd·d^d) time, which is O(nk) for fixed d • Naively: O(n·(kL + kd·d^d)) • Maintain a queue of dirty columns • Considering only the first L columns, we can build a FIFO queue in O(kL) time • Update at each subsequent position in O(k) time
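The queue of dirty columns can be sketched as a sliding window over the k × n matrix. The code below only covers the window-filtering part (which start positions survive the "at most kd dirty columns" test of Lemma 4); each surviving window would still have to be handed to the search tree. The function name `candidate_windows` and the 0-based positions are assumptions made for illustration.

```python
from collections import deque


def candidate_windows(strings: list[str], L: int, d: int):
    """Yield (start, dirty_positions) for windows with at most k*d dirty columns.

    start is the 0-based first column of a length-L window; dirty_positions
    lists the dirty columns inside that window.
    """
    k, n = len(strings), len(strings[0])
    queue = deque()  # dirty columns of the current window, oldest first
    for col in range(n):
        # Checking whether the new column is dirty costs O(k).
        if len({s[col] for s in strings}) > 1:
            queue.append(col)
        start = col - L + 1
        if start < 0:
            continue  # the first window is not complete yet
        # Drop the column that slid out of the window on the left.
        while queue and queue[0] < start:
            queue.popleft()
        if len(queue) <= k * d:
            yield start, list(queue)
```

Each column enters and leaves the queue at most once, so the window bookkeeping stays within O(k) per position as stated on the slide.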
DSS problem • DISTINGUISHING STRING SELECTION • Given S = {s1,…,sk1} and S' = {s'1,…,s'k2}, all strings of the same length L, and d1, d2 ≥ 0, is there a string s such that dH(s, si) ≤ d1 for all si ∈ S and dH(s, s'j) ≥ L−d2 for all s'j ∈ S'? • LEMMA 6. Given two sets of strings S1 = {s1,…,sk1} and S2 = {s'1,…,s'k2} and positive d1, d2. If there are i ∈ {1,…,k1} and j ∈ {1,…,k2} with dH(si, s'j) < L−(d1+d2), then there is no string s satisfying both max_{i=1,…,k1} dH(s, si) ≤ d1 and min_{j=1,…,k2} dH(s, s'j) ≥ L−d2 • Proof idea: dH(s, s'j) ≤ dH(s, si) + dH(si, s'j) < d1 + L − (d1+d2) = L−d2
A Linear-Time Solution for Fixed k • Is CLOSEST STRING fixed-parameter tractable with respect to k? • Use integer linear programming (ILP) • Lenstra: ILP with a fixed number of variables can be solved in linear time (exponential space)
CLOSEST STRING in ILP • Column types for k strings • For k=3: (a,a,a)^T, (a,a,b)^T, (a,b,a)^T, (b,a,a)^T, (a,b,c)^T • |column types| = B(k) ≤ k!, where B(k) is the k-th Bell number • Variables x_{t,φ}, with t a column type and φ ∈ Σ • x_{t,φ} = number of columns of type t whose corresponding character in the desired solution string of CLOSEST STRING is set to φ • B(k)·k variables needed • Minimize • φ_{t,i} denotes the alphabet symbol at the ith entry of column type t
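To make "column type" concrete: two columns have the same type when they show the same pattern of equal and different symbols, so each column can be normalized by renaming symbols in order of first appearance. The sketch below counts the types that occur in an instance and computes B(k) with the Bell triangle. It does not build or solve the ILP, and the function names are chosen here for illustration.

```python
from collections import Counter


def column_type(col) -> tuple:
    """Rename symbols by first appearance, e.g. ('c','g','c') -> (0, 1, 0)."""
    mapping = {}
    return tuple(mapping.setdefault(c, len(mapping)) for c in col)


def count_column_types(strings: list[str]) -> Counter:
    """How many columns of the k x L matrix fall into each column type."""
    return Counter(column_type(col) for col in zip(*strings))


def bell(k: int) -> int:
    """k-th Bell number via the Bell triangle, for k >= 1."""
    row = [1]
    for _ in range(k - 1):
        nxt = [row[-1]]  # each row starts with the last entry of the previous row
        for x in row:
            nxt.append(nxt[-1] + x)
        row = nxt
    return row[-1]
```

For k = 3 this yields the five patterns listed on the slide, and the number of ILP variables, B(k)·k, depends only on k and not on L.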
Conclusion • Fixed parameter tractability for CLOSEST STRING in d, k • Improve previous work in d-MISMATCH • DSS • CLOSEST SUBSTRING ?