Better Approximations for the Minimum Common Integer Partition Problem. David Woodruff. MIT and Tsinghua University. Approx 2006.
Minimum Common Integer Partition • $X = \{x_1, \ldots, x_r\}$ and $Y = \{y_1, \ldots, y_s\}$ are multisets of positive integers, with $r \ge s$ • Consider a partition of X into s subsets $B_1, \ldots, B_s$ • If there exist $B_1, \ldots, B_s$ with $\sum_{b \in B_i} b = y_i$ for all i, then X is an integer partition of Y. Think of X as a refinement of Y • k-MCIP problem: given $Y_1, \ldots, Y_k$, find a smallest multiset X that is an integer partition of each of $Y_1, \ldots, Y_k$ • Let $m = \sum_{i=1}^{k} |Y_i|$. Efficiency is measured in terms of m.
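The definition can be checked mechanically on small inputs. The Python sketch below (the slides contain no code; the helper name `is_integer_partition` is hypothetical) uses plain backtracking to decide whether X is an integer partition of Y. It is exponential in the worst case and is meant only for tiny illustrative examples.

```python
def is_integer_partition(x, y):
    """Return True if multiset x splits into blocks B_1..B_s with sum(B_i) == y[i].

    Hypothetical helper: plain backtracking, exponential in the worst case,
    intended only for small examples.
    """
    if sum(x) != sum(y):
        return False
    items = sorted(x, reverse=True)  # place large elements first to prune early
    remaining = list(y)              # capacity still unfilled in each block

    def place(idx):
        if idx == len(items):
            # Every element is placed; equal totals mean every block is exactly full.
            return all(cap == 0 for cap in remaining)
        tried = set()
        for b, cap in enumerate(remaining):
            # Blocks with the same remaining capacity are interchangeable, so try one.
            if cap >= items[idx] and cap not in tried:
                tried.add(cap)
                remaining[b] -= items[idx]
                if place(idx + 1):
                    return True
                remaining[b] += items[idx]
        return False

    return place(0)


# X = {1, 1, 2, 3} refines Y = {2, 2, 3} via blocks {1, 1}, {2}, {3}.
print(is_integer_partition([1, 1, 2, 3], [2, 2, 3]))  # True
print(is_integer_partition([2, 2, 3], [1, 1, 5]))     # False
```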
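MCIP Example • $Y_1 = \{2, 2, 3\}$, $Y_2 = \{1, 1, 5\}$ • Claim: $\{1, 1, 2, 3\}$ = k-MCIP($Y_1$, $Y_2$) • Proof: Partition 1: {1, 1}, {2}, {3} (sums 2, 2, 3) • Partition 2: {1}, {1}, {2, 3} (sums 1, 1, 5) • So {1, 1, 2, 3} is an integer partition of both $Y_1$ and $Y_2$ • Any integer partition of both $Y_1$ and $Y_2$ has size at least 4: an integer partition of $Y_1$ has at least 3 elements, and one with exactly 3 must equal $Y_1 = \{2, 2, 3\}$, which has no 1 to cover the 1s of $Y_2$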
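To double-check a claim like this on tiny inputs, one can scan candidate multisets in order of increasing size. This is a hypothetical brute-force sketch (`brute_force_mcip` and `integer_partitions` are our names); it enumerates every integer partition of the common sum, so it is only feasible when that sum is very small. It repeats the backtracking check so the block runs on its own.

```python
def is_integer_partition(x, y):
    """Backtracking check that multiset x splits into blocks with sums y[0], y[1], ..."""
    if sum(x) != sum(y):
        return False
    items = sorted(x, reverse=True)
    remaining = list(y)

    def place(idx):
        if idx == len(items):
            return all(cap == 0 for cap in remaining)
        tried = set()
        for b, cap in enumerate(remaining):
            if cap >= items[idx] and cap not in tried:
                tried.add(cap)
                remaining[b] -= items[idx]
                if place(idx + 1):
                    return True
                remaining[b] += items[idx]
        return False

    return place(0)


def integer_partitions(n, largest=None):
    """Yield every partition of n as a non-increasing list."""
    largest = n if largest is None else largest
    if n == 0:
        yield []
        return
    for first in range(min(n, largest), 0, -1):
        for rest in integer_partitions(n - first, first):
            yield [first] + rest


def brute_force_mcip(*multisets):
    """Smallest multiset that is an integer partition of every input (tiny inputs only)."""
    total = sum(multisets[0])
    if any(sum(y) != total for y in multisets):
        return None  # unequal sums: no common integer partition exists
    # Stable sort by length, so the first hit is a minimum-size solution.
    for x in sorted(integer_partitions(total), key=len):
        if all(is_integer_partition(x, y) for y in multisets):
            return x
    return None


print(sorted(brute_force_mcip([2, 2, 3], [1, 1, 5])))  # [1, 1, 2, 3]
```

On the two Applications examples that follow, the same search finds solutions of size 5 and 9, matching the MCIPs listed there.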
Applications • Human: AA-AA-AAAA-AAA gives {2, 2, 4, 3} • Monkey: AAA-AAAAA-AA-A gives {3, 5, 2, 1} • MCIP = {2, 3, 1, 2, 3} • Since |MCIP| is small, humans and monkeys are similar (this measure has been proposed in practice [Jiang, et al])
Applications • Human: AA-AA-AAAA-AAA gives {2, 2, 4, 3} • Mouse: A-A-A-A-AA-A-AA-A-A gives {1, 1, 1, 1, 2, 1, 2, 1, 1} • MCIP = {1, 1, 1, 1, 1, 1, 1, 2, 2} • Since |MCIP| is large, humans and mice are not similar
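The multisets on these two slides appear to be the lengths of the dash-separated runs in each string. A minimal sketch of that encoding, assuming "-" marks a cut point (the helper `fragment_lengths` is hypothetical, not from the source):

```python
def fragment_lengths(sequence):
    """Lengths of the '-'-separated fragments, e.g. 'AA-AA-AAAA-AAA' -> [2, 2, 4, 3]."""
    return [len(fragment) for fragment in sequence.split("-")]


human = fragment_lengths("AA-AA-AAAA-AAA")
monkey = fragment_lengths("AAA-AAAAA-AA-A")
mouse = fragment_lengths("A-A-A-A-AA-A-AA-A-A")
# A common integer partition can exist only if the totals agree.
print(human, monkey, mouse, sum(human) == sum(monkey) == sum(mouse))
```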
Applications • DNA fingerprint assembly • Oligonucleotide Fingerprinting Ribosomal Genes Project [Valinsky, et al] • Goal is to identify microbial organisms • Uses MCIP as a subroutine, with $k \approx 2^8$ and $m \approx 2^{12}$ [Jiang] • Other possible applications: clustering? scheduling?
Previous Work • k-MCIP problem: given $Y_1, \ldots, Y_k$, find a smallest multiset that is an integer partition of each of $Y_1, \ldots, Y_k$ • [CLLJ] NP-hard (via Maximum Set Packing) • APX-hard for every $k \ge 2$ (via Maximum 3-Dimensional Matching with Bounded Degree)
Previous Work • [CLLJ] upper bounds • (5/4)-approximation for k = 2 • Problem: running time on the order of $m^9$ (with $m \approx 2^{12}$ in practice) • $(k - 1/3)$-approximation in general • Problems: (1) large ratio; (2) unknown whether there is a tight instance
Our Contributions • $0.614k + o(k)$ approximation • $O(m \log k)$ time • Extremely easy to implement • If $Y_1, \ldots, Y_k$ are disjoint, a $(k+1)/2$ approximation • We show that the [CLLJ] $(k - 1/3)$-approximation algorithm is actually a $(k - 1/2)$-approximation, and that this is tight
Algorithm Overview • Let A be an algorithm for 2-MCIP. We build an algorithm B for k-MCIP • Choose a random set partition of {1, …, k} into pairs of integers • For each pair (i, j) in the partition, let $A_{i,j} = A(Y_i, Y_j)$ • If the partition consists of the single pair (1, 2), output $A_{1,2}$; otherwise recurse on the multisets $A_{i,j}$, one for each pair (i, j) in the partition
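2-MCIP Algorithm • What is the algorithm for 2-MCIP? • Greedy algorithm (the two multisets must have equal sums, as any common integer partition requires): choose two integers, one from $Y_1$ and one from $Y_2$; take the minimum; subtract the minimum from both integers and append it to the output; remove all 0s; repeat until both multisets are empty • $|\mathrm{Greedy}(Y_1, Y_2)| < |Y_1| + |Y_2|$, since every step removes at least one integer and the last step removes one from each multiset • Generalization: $\mathrm{Greedy}(Y_1, \ldots, Y_k)$ picks one integer from each multiset at every step, and $|\mathrm{Greedy}(Y_1, \ldots, Y_k)| \le \sum_{i=1}^{k} |Y_i| = m$ • [Figure: step-by-step run of Greedy on $Y_1$ and $Y_2$, building the output one minimum at a time]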
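A minimal Python sketch of Greedy, already generalized to k multisets as on the slide. The slide leaves open which integers to pick; as an assumption, this sketch always takes the last remaining element of each list. Because each round removes at least one integer and the final round empties every list at once, the output has at most $m - k + 1$ elements.

```python
def greedy(*multisets):
    """Greedy common integer partition of k multisets with equal sums.

    Each round takes one integer from every multiset (here: the last one),
    appends their minimum to the output, subtracts it, and drops zeros.
    """
    if not multisets or len({sum(y) for y in multisets}) != 1:
        raise ValueError("need at least one multiset, all with the same sum")
    stacks = [list(y) for y in multisets]
    output = []
    # Every round subtracts the same amount from each multiset, so the sums
    # stay equal and all stacks become empty in the same round.
    while stacks[0]:
        low = min(stack[-1] for stack in stacks)
        output.append(low)
        for stack in stacks:
            stack[-1] -= low
            if stack[-1] == 0:
                stack.pop()  # "remove all 0s"
    return output


print(greedy([2, 2, 3], [1, 1, 5]))  # [3, 2, 1, 1]
```

Picking from the end of a list keeps each round O(k); any other choice rule would still produce a valid common integer partition, possibly of a different size.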
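Better 2-MCIP Algorithm • CommonElements algorithm for 2-MCIP of $Y_1, Y_2$: • $T \leftarrow \emptyset$. While there is an integer x common to $Y_1$ and $Y_2$: $T \leftarrow T \cup \{x\}$, $Y_1 \leftarrow Y_1 \setminus \{x\}$, $Y_2 \leftarrow Y_2 \setminus \{x\}$ (removing one copy of x from each) • Output $T \cup \mathrm{Greedy}(Y_1, Y_2)$ • Let $c_{1,2}$ be the number of common integers of $Y_1$ and $Y_2$, counted with multiplicity • $|\mathrm{CommonElements}(Y_1, Y_2)| \le (|Y_1| + |Y_2| - 2c_{1,2}) + c_{1,2} = |Y_1| + |Y_2| - c_{1,2}$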
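A sketch of CommonElements in Python. Instead of the while loop, it removes the whole multiset intersection at once with `collections.Counter` (`&` keeps the minimum multiplicity, `-` drops non-positive counts); this removes exactly the copies the loop would remove. The two-multiset Greedy is repeated so the block runs on its own, again assuming it takes the last element of each list.

```python
from collections import Counter


def greedy(y1, y2):
    """Two-multiset Greedy (as on the 2-MCIP slide); requires equal sums."""
    if sum(y1) != sum(y2):
        raise ValueError("multisets must have the same sum")
    s1, s2, out = list(y1), list(y2), []
    while s1:
        low = min(s1[-1], s2[-1])
        out.append(low)
        s1[-1] -= low
        s2[-1] -= low
        if s1[-1] == 0:
            s1.pop()
        if s2[-1] == 0:
            s2.pop()
    return out


def common_elements(y1, y2):
    """CommonElements: shared integers become singleton blocks; Greedy does the rest."""
    c1, c2 = Counter(y1), Counter(y2)
    shared = c1 & c2  # multiset intersection; c_{1,2} is its total count
    rest1 = list((c1 - shared).elements())
    rest2 = list((c2 - shared).elements())
    return list(shared.elements()) + greedy(rest1, rest2)


print(sorted(common_elements([2, 2, 4, 3], [3, 5, 2, 1])))  # [1, 2, 2, 3, 3]
```

On the human/monkey example from the Applications slide, this reproduces the MCIP {2, 3, 1, 2, 3}.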
Algorithm Recap • Choose a random set partition of {1, …, k} into pairs of integers • For each pair (i, j) in the partition, let $A_{i,j} = \mathrm{CommonElements}(Y_i, Y_j)$ • If the partition consists of the single pair (1, 2), output $A_{1,2}$; otherwise recurse on the multisets $A_{i,j}$, one for each pair (i, j) in the partition
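Putting the pieces together, here is a sketch of the randomized pairing algorithm. The slides do not say what happens when the number of multisets is odd; purely as an assumption, this sketch carries the unpaired multiset to the next round unchanged. The output refines every input because refinement is transitive. The name `pairing_mcip` and the `rng` parameter are ours.

```python
import random
from collections import Counter


def greedy(y1, y2):
    """Two-multiset Greedy; requires equal sums."""
    if sum(y1) != sum(y2):
        raise ValueError("multisets must have the same sum")
    s1, s2, out = list(y1), list(y2), []
    while s1:
        low = min(s1[-1], s2[-1])
        out.append(low)
        s1[-1] -= low
        s2[-1] -= low
        if s1[-1] == 0:
            s1.pop()
        if s2[-1] == 0:
            s2.pop()
    return out


def common_elements(y1, y2):
    """Shared integers become singleton blocks; Greedy handles the remainder."""
    c1, c2 = Counter(y1), Counter(y2)
    shared = c1 & c2
    rest1 = list((c1 - shared).elements())
    rest2 = list((c2 - shared).elements())
    return list(shared.elements()) + greedy(rest1, rest2)


def pairing_mcip(multisets, rng=None):
    """Randomly pair the multisets, merge each pair with CommonElements, repeat.

    Each round roughly halves the number of multisets, so about log2(k) rounds.
    """
    rng = rng or random.Random()
    current = [list(y) for y in multisets]
    while len(current) > 1:
        order = list(range(len(current)))
        rng.shuffle(order)  # a uniformly random pairing of the current multisets
        merged = [
            common_elements(current[order[i]], current[order[i + 1]])
            for i in range(0, len(order) - 1, 2)
        ]
        if len(order) % 2 == 1:
            merged.append(current[order[-1]])  # assumption: the odd one out waits
        current = merged
    return current[0]


ys = [[1, 4, 3, 1, 1], [5, 2, 1, 1, 1], [2, 3, 1, 3, 1]]
result = pairing_mcip(ys, random.Random(0))
print(len(result), sum(result))
```

Passing a seeded `random.Random` makes runs reproducible; the analysis on the next slides concerns the expected output size over the random pairings.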
Analysis • Lower-bound the size of an optimal solution as a function of the frequencies of different integers • Find the expected output size of our algorithm as a function of the same frequencies • Divide these two to get a worst-case (expected) approximation ratio • Derandomize using conditional expectations
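Frequency of Integers • Define the r-redundancy Red(r) to capture integer frequencies • Example input: $Y_1 = \{1, 4, 3, 1, 1\}$, $Y_2 = \{5, 2, 1, 1, 1\}$, $Y_3 = \{2, 3, 1, 3, 1\}$ • Consider r disjoint multisets $A_1, \ldots, A_r$ of input elements such that: 1. each $A_i$ contains at most one element of each input multiset; 2. each $A_i$ contains only one distinct integer • $\mathrm{Red}(r) = \max_{A_1, \ldots, A_r} \sum_{i=1}^{r} |A_i|$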
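A sketch for computing Red(r), assuming condition 1 means that each $A_i$ contains at most one element from each input multiset (the reading consistent with the lower-bound argument). For a single integer v, the l-th group can use one copy of v from every input that holds at least l copies, so the possible group sizes for v are the numbers of inputs with at least 1, 2, ... copies. These sizes are non-increasing, so taking the r largest sizes over all integers gives the maximum. The function names are hypothetical.

```python
from collections import Counter


def group_sizes(multisets):
    """Sizes of the best single-integer groups, largest first.

    Assumes each group may take at most one element from each input multiset.
    For integer v, the l-th group uses one copy of v from every input that
    holds at least l copies of v.
    """
    counters = [Counter(y) for y in multisets]
    sizes = []
    for v in set().union(*counters):
        copies = [c[v] for c in counters]  # Counter returns 0 for missing keys
        for level in range(1, max(copies) + 1):
            sizes.append(sum(1 for n in copies if n >= level))
    return sorted(sizes, reverse=True)


def redundancy(multisets, r):
    """Red(r): total size of the best r disjoint groups."""
    return sum(group_sizes(multisets)[:r])


ys = [[1, 4, 3, 1, 1], [5, 2, 1, 1, 1], [2, 3, 1, 3, 1]]
print([redundancy(ys, r) for r in range(1, 9)])  # [3, 6, 8, 10, 12, 13, 14, 15]
```

For the example, the integer 1 yields groups of sizes 3, 3, 2 (three copies in $Y_1$ and $Y_2$, two in $Y_3$), and Red reaches m = 15 once r covers every group.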
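Lower Bound • Let opt be the size of an optimal k-MCIP • Form a bipartite graph whose left vertices are the m elements of $Y_1, Y_2, \ldots, Y_k$ and whose right vertices are the opt elements of the k-MCIP; a left vertex is joined to the elements partitioning it • There are opt right vertices, each of degree k (each lies in exactly one block of each of the k partitions) • A degree-1 left vertex equals its single neighbor, and each right vertex has at most one such neighbor per input multiset, so grouping the degree-1 left vertices by neighbor gives at most opt single-integer groups with at most one element per input multiset; hence there are at most Red(opt) degree-1 left vertices • So $\#\text{edges} \ge 1 \cdot \mathrm{Red}(\mathrm{opt}) + 2\,(m - \mathrm{Red}(\mathrm{opt}))$ • But $\#\text{edges}$ is exactly $k \cdot \mathrm{opt}$ • So $k \cdot \mathrm{opt} \ge 2m - \mathrm{Red}(\mathrm{opt})$ • [Figure: bipartite graph between the elements of $Y_1, \ldots, Y_k$ and the elements of the k-MCIP]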
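Since $k t + \mathrm{Red}(t)$ strictly increases with t, the inequality $k \cdot \mathrm{opt} \ge 2m - \mathrm{Red}(\mathrm{opt})$ gives a concrete lower bound: opt is at least the smallest t with $k t + \mathrm{Red}(t) \ge 2m$. A hypothetical sketch (`red_lower_bound` is our name) that recomputes the redundancy group sizes, assuming each group takes at most one element from each input multiset:

```python
from collections import Counter


def group_sizes(multisets):
    """Best single-integer group sizes, largest first (at most one element per input)."""
    counters = [Counter(y) for y in multisets]
    sizes = []
    for v in set().union(*counters):
        copies = [c[v] for c in counters]
        for level in range(1, max(copies) + 1):
            sizes.append(sum(1 for n in copies if n >= level))
    return sorted(sizes, reverse=True)


def red_lower_bound(multisets):
    """Smallest t with k*t + Red(t) >= 2m, which is a lower bound on opt."""
    k = len(multisets)
    m = sum(len(y) for y in multisets)
    sizes = group_sizes(multisets)
    red = 0
    for t in range(1, m + 1):
        if t <= len(sizes):
            red += sizes[t - 1]  # now red == Red(t)
        if k * t + red >= 2 * m:
            return t
    return m


print(red_lower_bound([[2, 2, 3], [1, 1, 5]]))        # 4
print(red_lower_bound([[2, 2, 4, 3], [3, 5, 2, 1]]))  # 5
```

On these two examples the bound equals the optimum sizes (4 and 5) from the MCIP Example and Applications slides.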
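Example • Our bound is $k \cdot \mathrm{opt} \ge 2m - \mathrm{Red}(\mathrm{opt})$ • If the input multisets are disjoint (no integer appears in two of them), Red(opt) = opt, so $k \cdot \mathrm{opt} \ge 2m - \mathrm{opt}$, i.e., $\mathrm{opt} \ge 2m/(k+1)$ • The trivial greedy algorithm has output size at most m • So the greedy algorithm is an $m/\mathrm{opt} \le (k+1)/2$ approximation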
Upper Bound • In a recursive call on multisets $Y_a$ and $Y_b$, we are interested in the number $c_{a,b}$ of common elements of $Y_a$ and $Y_b$, since $|\mathrm{CommonElements}(Y_a, Y_b)| \le |Y_a| + |Y_b| - c_{a,b}$ • Since we choose a random pairing of the input multisets, we can bound the expected number of common elements as a function of Red(opt) • Linearity of expectation and some calculus bound the expected number of common elements encountered over all recursive calls, in terms of Red(opt) • Combining this with the lower bound in terms of Red(opt) gives the overall ratio
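Upper Bound • Each of the $O(\log k)$ recursive calls (one round of pairing each) can be implemented in $O(m)$ time, and the total size of the multisets never grows from one round to the next, so the running time is $O(m \log k)$ • In fact, the proof shows that only 3 recursive calls are necessary to get a $0.614k + o(k)$ approximation • This allows derandomization using conditional expectations in $O(m \cdot \mathrm{poly}(k))$ time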
Conclusions and Future Work • $0.614k + o(k)$ approximation in $O(m \log k)$ time • Improved analysis of the previous best algorithm, showing that its ratio is exactly $k - 1/2$ • The upper bound uses our notion of redundancy • The matching lower bound uses an adversarial argument • The best known hardness bound is only a constant (APX-hardness), so there is a huge gap
Another Example • Consider the algorithm that repeatedly removes an integer common to all k input multisets and then runs a greedy algorithm on the remaining multisets [CLLJ06] • Suppose r common integers are removed. Then the output size is at most $(m - rk) + r$ • But $\mathrm{Red}(\mathrm{opt}) \le rk + (\mathrm{opt} - r)(k - 1)$ • Our bound is $k \cdot \mathrm{opt} \ge 2m - \mathrm{Red}(\mathrm{opt})$ • This implies $\mathrm{opt} \ge (2m - r)/(2k - 1)$, and so $(m - rk + r)/\mathrm{opt} \le (2k - 1)(m - rk + r)/(2m - r) \le k - 1/2$, where the last step uses $k \ge 2$ • Using an adversarial argument, one can show this is tight
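A sketch of the procedure analyzed on this slide, assuming "a greedy algorithm" means the k-way Greedy from the 2-MCIP slide (again taking the last element of each list). Removing common integers one at a time until none is left in all k multisets removes exactly their multiset intersection, so the sketch removes it in one step and reports the count r. The function names are hypothetical; this illustrates the analyzed procedure and is not the [CLLJ06] implementation.

```python
from collections import Counter


def greedy(*multisets):
    """k-way Greedy: repeatedly emit the minimum of the last elements; equal sums required."""
    if not multisets or len({sum(y) for y in multisets}) != 1:
        raise ValueError("need at least one multiset, all with the same sum")
    stacks = [list(y) for y in multisets]
    out = []
    while stacks[0]:
        low = min(s[-1] for s in stacks)
        out.append(low)
        for s in stacks:
            s[-1] -= low
            if s[-1] == 0:
                s.pop()
    return out


def remove_common_then_greedy(multisets):
    """Strip integers common to all k multisets, then run Greedy on the rest.

    Returns (output, r) where r is the number of common integers removed.
    """
    counters = [Counter(y) for y in multisets]
    common = counters[0]
    for c in counters[1:]:
        common = common & c  # builds a new Counter; counters[0] is not mutated
    rests = [list((c - common).elements()) for c in counters]
    r = sum(common.values())
    return list(common.elements()) + greedy(*rests), r


ys = [[1, 4, 3, 1, 1], [5, 2, 1, 1, 1], [2, 3, 1, 3, 1]]
out, r = remove_common_then_greedy(ys)
k, m = len(ys), sum(len(y) for y in ys)
print(r, len(out) <= (m - r * k) + r)  # 2 True
```

Here two copies of 1 are common to all three inputs, so r = 2 and the slide's bound $(m - rk) + r$ evaluates to 11.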