The Communication and Streaming Complexity of Computing the Longest Common and Increasing Subsequences
Xiaoming Sun (Tsinghua University), David Woodruff (MIT)
The Problem
• Stream of elements $a_1, \ldots, a_n \in \Sigma$
• The algorithm is given one pass over the stream
• Problem: compute the longest increasing subsequence (LIS)
• Example: for the stream 4 3 7 3 1 1 0, an answer is (3, 7)
Previous Work
• Let k be the length of the LIS of the stream
• There exists an algorithm that computes the LIS with $O(k^2 \log |\Sigma|)$ space [LNVZ05]
• Trivial $\Omega(k)$ lower bound
• Our first result: improve both bounds to a tight $\Theta(k^2 \log(|\Sigma|/k))$
Our Lower Bound
Reduction from the indexing function:
• Alice holds $x \in \{0,1\}^n$; Bob holds $i \in [n] = \{1, 2, \ldots, n\}$
• Bob must answer: what is $x_i$?
• Randomized 1-way communication complexity is $\Omega(n)$
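• Alice holds $x \in \{0,1\}^n$; Bob holds $i \in [n] = \{1, 2, \ldots, n\}$; Bob must answer: what is $x_i$?
• Alice constructs a stream A; Bob constructs a stream B
1. From LIS(A, B), the LIS of the stream A followed by B, Bob can get $x_i$
2. |LIS(A, B)| = k, where k is an input parameter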
Alice: $x \in \{0,1\}^n$
[Figure: value versus position in stream; the blocks $A_{k-1}, \ldots, A_2, A_1$ appear left to right at decreasing value levels]
• Alice uses x to create k-1 increasing sequences $A_1, \ldots, A_{k-1}$
• For each j, $A_j$ has length j
• Each bit of x is encoded in some sequence $A_j$
• Every element in $A_{k-1}$ is larger than every element in $A_{k-2}$, every element in $A_{k-2}$ is larger than every element in $A_{k-3}$, etc.
• Set $A = A_{k-1}, \ldots, A_2, A_1$
Bob: $i \in [n]$
[Figure: value versus position in stream; B follows A in the stream, at a value level between $A_j$ and $A_{j+1}$]
• Bob uses i to identify $A_j$, the sequence encoding $x_i$
• Bob creates an increasing sequence B of length k-j
• Every element in B is greater than every element of $A_r$ if $r \le j$, and less than every element of $A_r$ if $r > j$
[Figure: Alice holds $x \in \{0,1\}^n$ and sends $A = A_{k-1}, \ldots, A_2, A_1$; Bob holds $i \in [n]$ and appends B, which sits between $A_j$ and $A_{j+1}$ in value]
• LIS(A, B) = $A_j$, B, and |LIS(A, B)| = j + (k-j) = k
• But $x_i$ is encoded in $A_j$, so Bob recovers $x_i$
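The shape of this reduction can be checked on a toy instance. The following is a minimal sketch, not the construction from the talk: the helper names (`lis`, `alice_stream`, `bob_stream`), the block width, and the choice to place each $A_j$ in the lower half of value block j and B in the upper half are all assumptions made for illustration, and the encoding of the bits of x into the random sequences is not modeled.

```python
import random

def lis(seq):
    """Return one longest strictly increasing subsequence (simple O(n^2) DP)."""
    n = len(seq)
    if n == 0:
        return []
    best = [1] * n      # best[t]: length of the longest increasing subsequence ending at t
    prev = [-1] * n     # back-pointers for reconstruction
    for t in range(n):
        for s in range(t):
            if seq[s] < seq[t] and best[s] + 1 > best[t]:
                best[t] = best[s] + 1
                prev[t] = s
    t = max(range(n), key=best.__getitem__)
    out = []
    while t != -1:
        out.append(seq[t])
        t = prev[t]
    return out[::-1]

def alice_stream(k, width, rng):
    """Hypothetical layout: A_j is a random increasing sequence of length j taken
    from the lower half of value block j. Stream order is A_{k-1}, ..., A_1."""
    blocks = {j: sorted(rng.sample(range(j * width, j * width + width // 2), j))
              for j in range(1, k)}
    stream = [v for j in range(k - 1, 0, -1) for v in blocks[j]]
    return blocks, stream

def bob_stream(k, width, j):
    """k-j increasing values from the upper half of block j: above A_r for r <= j,
    below A_r for r > j."""
    start = j * width + width // 2
    return list(range(start, start + (k - j)))

rng = random.Random(0)
k, width = 6, 12                      # width // 2 >= k - 1 leaves room in each half-block
blocks, A = alice_stream(k, width, rng)
for j in range(1, k):
    B = bob_stream(k, width, j)
    assert lis(A + B) == blocks[j] + B   # the LIS is exactly A_j followed by B
```

In this layout the LIS of A followed by B is unique, because an increasing subsequence can touch only one block $A_r$ and can be extended by B only when $r \le j$.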
• Thus, any streaming algorithm must use $\Omega(n)$ space
• But what is n? We need to construct k-1 increasing sequences that are different for different $x \in \{0,1\}^n$
• Assume $|\Sigma|$ is large. Divide $\Sigma$ into k-1 blocks of size $|\Sigma|/(k-1)$
• Let $A_j$ be a random increasing sequence of length j in block j
• The space to represent $A_j$ is $\Omega(k \log(|\Sigma|/k))$ for j > k/2
• Set $n = \Omega(k^2 \log(|\Sigma|/k))$
Our Upper Bound
• When processing the stream, keep lists A[1], A[2], …, A[k]
• A[j] is an increasing subsequence of length j in the stream seen so far, with minimal last element
• Let L[1], L[2], …, L[k] be the last elements of A[1], A[2], …, A[k]
• To process item x, find i for which L[i] < x < L[i+1], and replace A[i+1] with A[i], x
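The update rule above can be sketched in a few lines. This is an illustrative version, not the authors' code: the function name `streaming_lis` and the explicit cap `k` are assumptions, the lists are stored uncompressed, and `bisect_left` is used so that an item equal to L[i+1] simply rewrites A[i+1] with the same last element.

```python
from bisect import bisect_left

def streaming_lis(stream, k):
    """One pass over the stream; returns an increasing subsequence of maximum
    length, capped at k (hypothetical parameter bounding the number of lists)."""
    lists = []   # lists[j-1] plays the role of A[j]
    lasts = []   # lasts[j-1] plays the role of L[j]; always strictly increasing
    for x in stream:
        i = bisect_left(lasts, x)          # number of j with L[j] < x
        if i == k:
            continue                       # would need a list longer than k
        candidate = (lists[i - 1] if i > 0 else []) + [x]   # A[i], x
        if i == len(lists):
            lists.append(candidate)        # first subsequence of this length
            lasts.append(x)
        else:
            lists[i] = candidate           # smaller (or equal) last element
            lasts[i] = x
    return lists[-1] if lists else []

print(streaming_lis([4, 3, 7, 3, 1, 1, 0], k=7))   # prints [3, 7]
```

On the stream 4 3 7 3 1 1 0 this returns (3, 7), matching the example on the problem slide.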
• So we have k arrays A[1], …, A[k], each of length at most k
• Naively, this takes $O(k^2 \log |\Sigma|)$ space
• But each A[i] is increasing, so we can compress the list by storing differences
• Total space is $O(k^2 \log(|\Sigma|/k))$
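The saving from storing differences can be seen with a short calculation. This is a sketch under assumptions not stated on the slide: the alphabet is identified with $\{1, \ldots, |\Sigma|\}$, each gap $d_i$ is stored with a self-delimiting code of $O(1 + \log d_i)$ bits, and $k \le |\Sigma|/e$, so that $j \log(|\Sigma|/j)$ is increasing for $j \le k$.

```latex
\begin{align*}
A[j] &= (a_1 < a_2 < \dots < a_j), \qquad d_1 = a_1,\quad d_i = a_i - a_{i-1}\ (i \ge 2),\\
\sum_{i=1}^{j} d_i &= a_j \le |\Sigma|,\\
\sum_{i=1}^{j} \log d_i &\le j \log\Big(\frac{1}{j}\sum_{i=1}^{j} d_i\Big)
  \le j \log\frac{|\Sigma|}{j} \le k \log\frac{|\Sigma|}{k}
  \quad \text{(concavity of } \log\text{)},\\
\text{total bits} &= \sum_{j=1}^{k} O\Big(j + \sum_{i=1}^{j} \log d_i\Big)
  = O\Big(k^2 + k^2 \log\frac{|\Sigma|}{k}\Big)
  = O\Big(k^2 \log\frac{|\Sigma|}{k}\Big).
\end{align*}
```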
This talk
• First result: a tight space bound for the LIS problem
• Second result: tight bounds for longest common subsequence (LCS)
LCS Bounds
• Problem: Alice has a permutation $\sigma$ of [N], Bob has a permutation $\pi$ of [N]. Decide if $|LCS(\sigma, \pi)| \ge k$
• Previous space bound: $\Omega(k)$ [LNVZ05]
• Our space bound: $\Omega(N)$ for $3 \le k \le N/2$ (holds for randomized O(1)-pass algorithms)
LCS Bounds
• Why can we only prove $\Omega(N)$ for $3 \le k \le N/2$?
• If k = 2, the problem reduces to an equality test
• If k is large, there are at most $O(N^{2(N-k)})$ permutations with $|LCS(\sigma, \pi)| > k$, so just use an equality test with error $O(1/N^{2(N-k)})$
Our Lower Bound
• Padding lemma: if for k = 3 the randomized communication complexity is $\Omega(N)$, then it is $\Omega(N)$ for all $k \le N/2$
• Proof: just pad each of the inputs by some common subsequence of length k-3
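The padding step can be illustrated on a small example. The sketch below makes one assumption the slide leaves open: the common subsequence consists of k-3 fresh symbols N+1, …, N+k-3 appended in the same order to both permutations, which raises the LCS length by exactly k-3. The helper names `lcs_length` and `pad` are hypothetical.

```python
def lcs_length(a, b):
    """Classic O(|a| * |b|) dynamic program for the LCS length."""
    prev = [0] * (len(b) + 1)
    for u in a:
        cur = [0]
        for t, v in enumerate(b, 1):
            cur.append(prev[t - 1] + 1 if u == v else max(prev[t], cur[t - 1]))
        prev = cur
    return prev[-1]

def pad(perm, k):
    """Append the k-3 fresh symbols N+1, ..., N+k-3 in increasing order."""
    n = len(perm)
    return list(perm) + list(range(n + 1, n + k - 2))

sigma = [1, 3, 2, 4, 5, 6]
pi = [6, 5, 4, 1, 3, 2]
base = lcs_length(sigma, pi)                       # LCS of the k = 3 instance
padded = lcs_length(pad(sigma, 5), pad(pi, 5))     # same instance padded for k = 5
assert padded == base + (5 - 3)
```

Because the fresh symbols come after everything else in both inputs, the k = 3 question on the original pair becomes the general-k question on the padded pair.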
• It remains to show high complexity for k = 3
• We reduce from disjointness: Alice holds $x \in \{0,1\}^n$, Bob holds $y \in \{0,1\}^n$. Is there an i such that $x_i = y_i = 1$?
• Randomized multi-way communication complexity is $\Omega(n)$
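• Alice holds $x \in \{0,1\}^{N/3}$, Bob holds $y \in \{0,1\}^{N/3}$. Is there an i such that $x_i = y_i = 1$?
• Alice constructs $\sigma$; Bob constructs $\pi$
• Want $|LCS(\sigma, \pi)| \ge 3$ iff x and y intersect, that is, iff there is an i with $x_i = y_i = 1$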
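Alice: $x \in \{0,1\}^{N/3}$, $\sigma = \sigma_1, \sigma_2, \ldots, \sigma_{N/3}$
• Divide 1, …, N into N/3 groups $G_1 = (1, 2, 3)$, $G_2 = (4, 5, 6)$, …, $G_{N/3} = (N-2, N-1, N)$
• Use x to choose $\sigma_1, \ldots, \sigma_{N/3}$, where $\sigma_i$ acts on $G_i = (m+1, m+2, m+3)$
• If $x_i = 0$, $\sigma_i(m+1, m+2, m+3) = (m+1, m+2, m+3)$
• If $x_i = 1$, $\sigma_i(m+1, m+2, m+3) = (m+1, m+3, m+2)$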
Bob: $y \in \{0,1\}^{N/3}$, $\pi = \pi_{N/3}, \ldots, \pi_1$
• Divide 1, …, N into N/3 groups $G_1 = (1, 2, 3)$, $G_2 = (4, 5, 6)$, …, $G_{N/3} = (N-2, N-1, N)$
• Use y to choose $\pi_1, \ldots, \pi_{N/3}$, where $\pi_i$ acts on $G_i = (m+1, m+2, m+3)$
• If $y_i = 0$, $\pi_i(m+1, m+2, m+3) = (m+3, m+2, m+1)$
• If $y_i = 1$, $\pi_i(m+1, m+2, m+3) = (m+1, m+3, m+2)$
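[Figure: $\sigma$ lists the blocks $\sigma_1(G_1), \sigma_2(G_2), \ldots, \sigma_{N/3}(G_{N/3})$ in that order, while $\pi$ lists $\pi_{N/3}(G_{N/3}), \ldots, \pi_2(G_2), \pi_1(G_1)$, so the groups appear in opposite orders]
• Claim: $|LCS(\sigma, \pi)| \le 3$. Proof: use the fact that $LCS(\sigma, \pi)$ intersects at most one $G_i$
• Claim: $|LCS(\sigma, \pi)| = 3$ iff there is some i with $x_i = y_i = 1$. Proof: use the way we defined $\sigma_i$ and $\pi_i$
• Thus, we can decide disjointness, so $\Omega(N)$ communication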
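Both claims can be checked exhaustively for small N. The following is a minimal sketch of the construction from the two preceding slides; the function names (`alice_perm`, `bob_perm`, `check_all`) are hypothetical, and the brute-force check over all pairs (x, y) is only feasible for a handful of groups.

```python
from itertools import product

def lcs_length(a, b):
    """Classic O(|a| * |b|) dynamic program for the LCS length."""
    prev = [0] * (len(b) + 1)
    for u in a:
        cur = [0]
        for t, v in enumerate(b, 1):
            cur.append(prev[t - 1] + 1 if u == v else max(prev[t], cur[t - 1]))
        prev = cur
    return prev[-1]

def alice_perm(x):
    """sigma: groups G_1, ..., G_{N/3} in order; group i is permuted according to x_i."""
    out = []
    for i, bit in enumerate(x):
        m = 3 * i
        out += [m + 1, m + 3, m + 2] if bit else [m + 1, m + 2, m + 3]
    return out

def bob_perm(y):
    """pi: groups in reverse order G_{N/3}, ..., G_1; group i is permuted according to y_i."""
    out = []
    for i in reversed(range(len(y))):
        m = 3 * i
        out += [m + 1, m + 3, m + 2] if y[i] else [m + 3, m + 2, m + 1]
    return out

def check_all(groups):
    """Verify |LCS| <= 3, with equality iff some i has x_i = y_i = 1."""
    for x in product([0, 1], repeat=groups):
        for y in product([0, 1], repeat=groups):
            length = lcs_length(alice_perm(x), bob_perm(y))
            intersect = any(a and b for a, b in zip(x, y))
            if length > 3 or (length == 3) != intersect:
                return False
    return True

assert check_all(3)   # all 64 input pairs for N = 9
```

Within a single group the four cases give LCS lengths 1, 2, 2 and 3, and only $x_i = y_i = 1$ produces identical blocks.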
Other results
• Tight space bounds for computing the LIS length
• Generalization to approximate LIS and LCS. Still many gaps here
• Example: for approximating the LIS length, we have $\Omega(1/\epsilon)$ and $O(k \log |\Sigma|)$. Recent work [GJKK07] has shown $O(\sqrt{N/\epsilon} \log |\Sigma|)$, but there is still a large gap
Conclusion
• First result: a tight bound for the LIS
• Second result: an $\Omega(N)$ space bound for the LCS k-decision problem for $3 \le k \le N/2$
• Other results for approximation problems
• Another open question: extend our lower bound for LIS to the randomized multi-round setting