Lower Bounds on Streaming Algorithms for Approximating the Length of the Longest Increasing Subsequence. Anna Gal (UT Austin), Parikshit Gopalan (U. Washington & UT Austin)
Data Stream Model of Computation [Figure: input stream X1 X2 X3 … Xn read in a single pass into a small storage] • Single pass. • Small storage space and small update time. • Surprisingly powerful [Alon-Matias-Szegedy, …]
Estimating Sortedness on Data Streams Cannot sort efficiently in small space. Can we tell whether the data needs to be sorted? • [Ajtai-Jayram-Kumar-Sivakumar, Gupta-Zane, Cormode-Muthukrishnan-Sahinalp, Liben-Nowell-Vee-Zhu, Woodruff-Sun, G.-Jayram-Kumar-Sivakumar] • Measuring sortedness: • Length of the longest increasing subsequence. • Ulam/edit distance. • Inversion/Kendall tau distance.
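The gap between "sorting" and "telling whether sorting is needed" can be made concrete with a small Python sketch (the function names are hypothetical, not from the talk). Checking whether a stream is already sorted takes one pass and constant memory; the inversion count below, which equals the Kendall tau distance to sorted order when the values are distinct, is a brute-force offline baseline that keeps the whole input in memory.

```python
from typing import Iterable, Sequence


def needs_sorting(stream: Iterable[float]) -> bool:
    """One pass, O(1) extra memory: True iff some element is smaller than its predecessor."""
    prev = None
    for x in stream:
        if prev is not None and x < prev:
            return True  # found an out-of-order adjacent pair
        prev = x
    return False


def kendall_tau_to_sorted(seq: Sequence[float]) -> int:
    """Count inversions: pairs i < j with seq[i] > seq[j].

    For distinct values this is the Kendall tau distance to sorted order.
    Brute-force O(n^2) reference that holds the whole input in memory,
    so it is an offline baseline, not a streaming algorithm.
    """
    n = len(seq)
    return sum(1 for i in range(n) for j in range(i + 1, n) if seq[i] > seq[j])


sigma = [5, 7, 8, 1, 4, 2, 10, 3, 6, 9]
print(needs_sorting(iter(sigma)))    # True
print(kendall_tau_to_sorted(sigma))  # 19
```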
Longest Increasing Subsequence LIS(σ): length of the longest increasing subsequence of σ. Example: σ = 5 7 8 1 4 2 10 3 6 9, where LIS(σ) = 5 (e.g. 1 2 3 6 9). Studied in statistics, biology, computer science … [Gusfield, Pevzner, Aldous-Diaconis, …]
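To pin down the definition, here is a minimal offline Python sketch of LIS(σ) using the textbook O(n²) dynamic program, run on the example sequence. It also computes the Ulam distance to sorted order, assuming σ is a permutation (distinct values): the minimum number of "remove and reinsert" moves needed to sort σ is n − LIS(σ). The function names are illustrative only.

```python
from typing import Sequence


def lis_length(seq: Sequence[float]) -> int:
    """Length of the longest strictly increasing subsequence, via O(n^2) dynamic programming."""
    if not seq:
        return 0
    # best[i] = length of the longest increasing subsequence that ends at seq[i]
    best = [1] * len(seq)
    for i in range(len(seq)):
        for j in range(i):
            if seq[j] < seq[i]:
                best[i] = max(best[i], best[j] + 1)
    return max(best)


def ulam_to_sorted(perm: Sequence[int]) -> int:
    """Ulam distance from a permutation to sorted order: elements outside some LIS must move."""
    return len(perm) - lis_length(perm)


sigma = [5, 7, 8, 1, 4, 2, 10, 3, 6, 9]
print(lis_length(sigma))      # 5, e.g. 1 2 3 6 9
print(ulam_to_sorted(sigma))  # 5
```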
Prior Work • Exact computation of LIS(σ): • Patience Sorting [Ross, Mallows]: O(n)-space, 1-pass streaming algorithm. • Ω(n) space lower bound [G.-Jayram-Krauthgamer-Kumar’07, Woodruff-Sun’07]. • Approximating LIS(σ): • Deterministic, O(√(n/ε)) space, (1 + ε)-approximation [G.-Jayram-Krauthgamer-Kumar’07]. Conjecture [GJKK]: every 1-pass deterministic algorithm that gives a 1.1-approximation to LIS(σ) requires Ω(√n) space.
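The exact one-pass algorithm cited here can be sketched in a few lines of Python. This is the common "pile tops" formulation of patience sorting, offered as an illustration rather than as the authors' code; `patience_lis` is a hypothetical name. The memory it keeps equals the LIS length seen so far, which matches the O(n) worst case on the slide.

```python
from bisect import bisect_left
from typing import Iterable, List


def patience_lis(stream: Iterable[float]) -> int:
    """One-pass LIS length via patience sorting, keeping only the pile tops.

    tops[k] is the smallest possible last element of an increasing subsequence
    of length k + 1 seen so far. tops stays sorted, so each update is a binary
    search. Memory is len(tops) = current LIS length, i.e. O(n) in the worst case.
    """
    tops: List[float] = []
    for x in stream:
        k = bisect_left(tops, x)  # first pile whose top is >= x (strict increase)
        if k == len(tops):
            tops.append(x)  # x extends the longest subsequence found so far
        else:
            tops[k] = x  # x is a smaller ending value for length k + 1
    return len(tops)


print(patience_lis(iter([5, 7, 8, 1, 4, 2, 10, 3, 6, 9])))  # 5
```

Using `bisect_left` (rather than `bisect_right`) makes equal values replace each other, so repeated values never lengthen the subsequence. That matches a strictly increasing LIS.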
Our Results Thm: Any deterministic O(1)-pass algorithm that gives a (1 + ε)-approximation to the LIS requires space Ω(√(n/ε)). • Tight bounds in n and ε. • Proof via a direct-sum approach. • Direct sum for maximum communication in the private messages model. • Separation between communication models.
A Communication Problem Consider the following problem. [Figure: example player blocks 1 2 3.2 4.2 | 1.8 2.9 3.7 4.9 | 1.6 2.8 3.5 4.6] • t players, t numbers each. • Goal: approximate the length of the LIS. • It is enough to show a lower bound of Ω(t) on the maximum message size (there are n = t² numbers, so this gives Ω(√n) space).
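Why a bound on the maximum message suffices: a one-pass streaming algorithm can be run as a one-way protocol. Player Pi feeds its own block into the algorithm and hands the algorithm's memory to Pi+1, so an algorithm using s units of memory gives a protocol whose longest message has size s. The Python sketch below illustrates this, assuming exact patience sorting as the streaming algorithm purely for illustration and reusing the three example blocks from the figure (the original figure may show more players). `run_block` and `one_way_protocol` are hypothetical names, and message size is counted in stored numbers, not bits.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def run_block(state: List[float], block: Sequence[float]) -> List[float]:
    """One player's step: continue patience sorting on its own block of the stream."""
    tops = list(state)  # copy the received message so players do not share memory
    for x in block:
        k = bisect_left(tops, x)
        if k == len(tops):
            tops.append(x)
        else:
            tops[k] = x
    return tops


def one_way_protocol(blocks: Sequence[Sequence[float]]) -> Tuple[int, List[int]]:
    """Simulate a 1-pass streaming algorithm as a one-way protocol P1 -> P2 -> ... -> Pt.

    Each player runs the algorithm on its block and forwards the memory contents;
    the last player outputs the answer. Returns the LIS length and the size
    (in stored numbers) of each message that is actually sent.
    """
    state: List[float] = []
    message_sizes: List[int] = []
    for i, block in enumerate(blocks):
        state = run_block(state, block)
        if i < len(blocks) - 1:
            message_sizes.append(len(state))  # message = current memory state
    return len(state), message_sizes


blocks = [[1, 2, 3.2, 4.2], [1.8, 2.9, 3.7, 4.9], [1.6, 2.8, 3.5, 4.6]]
print(one_way_protocol(blocks))  # (5, [4, 5])
```

Read in the other direction, if every protocol must send some message of size Ω(t), then every such streaming algorithm needs Ω(t) memory.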
A Communication Problem [GJKK]: Consider the following decision problem. [Figure: players P1, P2, …, Pt] • No instances: all columns non-increasing. • Yes instances: some column increasing.
Direct Sum Paradigm Primitive Problem: compute p(x_1, y_1), where one player holds x_1 and the other holds y_1.
Direct Sum Paradigm Direct Sum Problem: compute ∨_i p(x_i, y_i), where one player holds x_1, …, x_n and the other holds y_1, …, y_n. We can run n copies of the protocol for p. Direct-Sum Question: Is this the best possible? Examples: Set-Disjointness, Inner Product, … Techniques for proving direct-sum theorems: [KN, CKSW, BJKS, SS, …]
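The "run n copies" upper bound can be illustrated with a toy two-player example in Python. It assumes the primitive p(x, y) = x AND y on single bits, solved by a trivial protocol in which the first player sends x (1 bit), and combines the n answers with OR. That OR of ANDs asks whether the sets {i : x_i = 1} and {i : y_i = 1} intersect, which is the complement of Set-Disjointness. Names are hypothetical. The direct-sum question asks whether the resulting n-fold cost can be beaten.

```python
from typing import Callable, List, Sequence, Tuple

Bit = int


def primitive_and(x: Bit, y: Bit) -> Tuple[Bit, int]:
    """Trivial protocol for p(x, y) = x AND y: the first player sends x (1 bit)."""
    message = x
    return message & y, 1  # (answer, bits communicated)


def direct_sum_or(xs: Sequence[Bit], ys: Sequence[Bit],
                  protocol: Callable[[Bit, Bit], Tuple[Bit, int]]) -> Tuple[Bit, int]:
    """Run n independent copies of the primitive protocol and OR the answers."""
    answers: List[Bit] = []
    total_bits = 0
    for x, y in zip(xs, ys):
        ans, bits = protocol(x, y)
        answers.append(ans)
        total_bits += bits
    return int(any(answers)), total_bits


print(direct_sum_or([1, 0, 1, 0], [0, 1, 1, 0], primitive_and))  # (1, 4): sets intersect
print(direct_sum_or([1, 0, 0, 0], [0, 1, 1, 0], primitive_and))  # (0, 4): sets disjoint
```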
Primitive Problem [Figure: Yes and No instances for players P1, P2, …, Pt]
Direct Sum of Primitive Problems [Figure: players P1, P2, …, Pt] • No: all primitive instances are No instances. • Yes: one primitive instance is a Yes instance.
[GG] An Easier Problem [Figure: Yes and No instances] Hope: some player must distinguish between many No instances.
BlackBoard Model of One-Way Communication • Players speak in order. • Every message seen by all. • Last player outputs answer.
Problem is Easy in the BlackBoard Model [Figure: No and Yes instances] BlackBoard protocol with max. communication 2 log(m).
Private Messages Model • Messages seen by the next player only. • Suffices for the streaming lower bound (a streaming algorithm passes its memory state only to the next player). • Requires non-standard techniques.
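The only difference between the two models is what each player is allowed to see, which a short Python simulation can make explicit. This is a sketch, not the paper's formalism: `run_one_way` and the message rule `own_input` are hypothetical. With the same rule ("write down your own input"), the last player sees every earlier message on the blackboard but only its predecessor's message in the private messages model. In that model, any earlier information a later player needs must be carried forward inside the messages themselves.

```python
from typing import Callable, List, Sequence, Tuple

# A player's message rule: (own input, messages it is allowed to see) -> message
MessageFn = Callable[[int, List[str]], str]


def run_one_way(inputs: Sequence[int], fns: Sequence[MessageFn],
                blackboard: bool) -> Tuple[List[str], List[List[str]]]:
    """Players P1..Pt speak once, in order.

    blackboard=True: each player sees every earlier message (BlackBoard model).
    blackboard=False: each player sees only the previous player's message
    (private messages model). Returns (messages sent, what each player saw).
    """
    messages: List[str] = []
    views: List[List[str]] = []
    for x, f in zip(inputs, fns):
        visible = list(messages) if blackboard else messages[-1:]
        views.append(visible)
        messages.append(f(x, visible))
    return messages, views


def own_input(x: int, seen: List[str]) -> str:
    """Hypothetical rule: every player just writes down its own input."""
    return str(x)


inputs = [5, 7, 8]
_, bb_views = run_one_way(inputs, [own_input] * 3, blackboard=True)
_, pm_views = run_one_way(inputs, [own_input] * 3, blackboard=False)
print(bb_views[-1])  # ['5', '7']: last player sees everything written so far
print(pm_views[-1])  # ['7']: last player sees only its predecessor's message
```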
Private Messages Model [Figure: Yes and No instances] Strong lower bound for maximum communication in the private messages model. Thm: Any deterministic O(1)-pass algorithm that gives a (1 + ε)-approximation to the LIS requires space Ω(√(n/ε)). Separation between blackboard and private messages.
Proof Outline • Step 1: Primitive problem (one round). • Step 2: Direct-sum problem (one round). • Step 3: Multi-round protocols.
Primitive Problem [Figure: Yes and No instances for players P1, P2, …, Pt] Alphabet of size m > t. Yes case: LIS(σ) > t/2. Easy: a bound of ≈ (log m)/t on max communication. Thm: Max communication is at least log(m/t).
Lower Bound for Primitive Problem [Figure: constant input a a a a; prefixes a…a and x_1…x_i] P_i's message is specified by the prefix x_1…x_i. M_i(a): prefixes where P_i sends the same message as a…a. q_i(a): length of the longest increasing subsequence (IS) in M_i(a) ending below a.
Lower Bound for Primitive Problem Properties of q_i(a): • Monotone: x_1…x_i ∈ M_i(a) ⇒ x_1…x_i a ∈ M_{i+1}(a). • Bounded by t/2 (by correctness). [Figure: q_i(a) plotted against i]
Lower Bound for Primitive Problem Map each a to the first i such that q_{i-1}(a) = q_i(a); such an i exists because q_i(a) is monotone in i and bounded by t/2. Some i occurs at least m/t times (pigeonhole over t values). [Figure: q_i(a) plotted against i]
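Lower Bound for Primitive Problem [Figure: players P_{i-1} and P_i; for the m/t symbols a, b, c, … mapped to the same i, prefixes a…a, b…b, c…c alongside prefixes x_1 < … < x_{i-1} = a, y_1 < … < y_{i-1} = b, z_1 < … < z_{i-1} = c] Claim: P_{i-1} must distinguish a…a from b…b from c…c.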
Lower Bound for Primitive Problem [Figure: players P_{i-1} and P_i on inputs a…a b, x_1…x_{i-1} b, y_1…y_{i-1} b, b…b b] x_1 < … < x_{i-1} = a < b, but q_i(b) = i − 1. Contradiction. Hence P_{i-1} must distinguish a…a from b…b from c…c. This gives a log(m/t) lower bound.
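The counting behind the log(m/t) bound can be written out as a short derivation. This is a sketch assembled from the definitions on these slides; the map name f and the index i* are introduced here only for illustration, and q_0(a) ≥ 0 is taken as the base case since q is a length.

```latex
\begin{align*}
f(a) &= \min\{\, i \in \{1,\dots,t\} : q_{i-1}(a) = q_i(a) \,\}, \qquad a \in \{1,\dots,m\},\\
&\text{(exists: otherwise } q_t(a) \ge q_0(a) + t \ge t > t/2\text{, contradicting the bound)}\\
\exists\, i^{*}:\ \lvert f^{-1}(i^{*}) \rvert &\ge m/t \qquad \text{(pigeonhole over the $t$ values of $i$)}\\
\text{max message length of } P_{i^{*}-1} &\ge \log_2 \lvert f^{-1}(i^{*}) \rvert \ge \log_2 (m/t).
\end{align*}
```

The last line uses the claim that P_{i*-1} must send pairwise distinct messages on a…a, b…b, c…c, … for all symbols in f^{-1}(i*).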
Lower Bound for General Problem [Figure: every player holds a_1…a_t] M_i(a_1…a_t): i × t prefixes where P_i sends the same message as (a_1…a_t)^i. q_{i,j}(a_1…a_t): length of the longest IS in column j ending at or before a_j.
Lower Bound for General Problem [Figure: inputs a_1…a_t] Part II: Show that P_{i-1} distinguishes between the inputs in a set I of ≈ (m/t)^t inputs. This gives a lower bound of log|I| ≈ t log(m/t).
Lower Bound for Many Rounds [Figure: every player holds a_1…a_t] Part I: Messages sent by P_i in round 2 and beyond depend on the entire input, so the definition of M_i(a_1…a_t) must change.
Lower Bound for Many Rounds Part II: Reduce to a 2-player protocol involving P_{i-1} and P_t. Thm: Any deterministic O(1)-pass algorithm that gives a (1 + ε)-approximation to the LIS requires space Ω(√(n/ε)).
Conclusions • Exact computation of LIS(σ): • Patience Sorting [Ross, Mallows]: O(n)-space, 1-pass streaming algorithm. • Ω(n) space lower bound [G.-Jayram-Krauthgamer-Kumar, Woodruff-Sun]. • Approximating LIS(σ): • O(√(n/ε))-space, deterministic 1-pass algorithm [G.-Jayram-Krauthgamer-Kumar]. • This paper: the bound is tight for deterministic O(1)-pass algorithms. • [Ergun-Jowhari’08]: different proof.
Randomized Complexity of LIS Problem: Is there a randomized streaming algorithm that approximates the LIS using space o(√n)? • [Woodruff-Sun]: Ω(log m) lower bound. • [Chakrabarti]: randomized private-messages protocol for the direct-sum problem. Thank You!