Estimating the longest increasing sequence in polylogarithmic time

C. Seshadhri (Sandia National Labs). Joint work with Michael Saks (Rutgers University).

Presentation Transcript


  1. Estimating the longest increasing sequence in polylogarithmic time. C. Seshadhri (Sandia National Labs). Joint work with Michael Saks (Rutgers University). Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

  2. The problem • Given an array f:[n] → N, find (the length of) the Longest Increasing Subsequence (LIS) • Rather self-explanatory • By now, a textbook dynamic programming problem • [CLRS 01] Chapter 15.4 (Longest Common Subsequence), Starred Problem 15.4-6 • [Schensted 61, Fredman 75] O(n log n) algorithm [Figure: example array]
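For reference, the O(n log n) bound on this slide can be reached with the standard "patience sorting" technique. The sketch below assumes "increasing" means strictly increasing; the function name `lis_length` and the example array are illustrative and not taken from the talk.

```python
from bisect import bisect_left

def lis_length(f):
    """Length of the longest strictly increasing subsequence of f."""
    # tails[k] is the smallest possible last value of an increasing
    # subsequence of length k + 1 seen so far; tails stays sorted.
    tails = []
    for x in f:
        pos = bisect_left(tails, x)   # first tail that is >= x
        if pos == len(tails):
            tails.append(x)           # x extends the longest subsequence
        else:
            tails[pos] = x            # x is a better (smaller) tail
    return len(tails)

# Illustrative array (an assumption, not necessarily the slide's array).
example = [4, 24, 10, 9, 15, 17, 20, 18, 4, 19, 3]
print(lis_length(example))
```

This reads every entry once, which is exactly what the rest of the talk tries to avoid.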

  3. Too much to read • Array f is extremely large, so we can’t read all of it • What can we say about the LIS length if we see very little? • |LIS| = LIS length • Read only poly(log n) positions • Obviously randomized [Figure: the algorithm reads a few positions (5 7 4 8 9 2) and reports “|LIS| is in range [0.4n, 0.6n]”]

  4. Uniform sample says nothing • Choose a uniform random sample of poly(log n) size • |LIS| = n/2, but a random sample is almost always increasing • So it is not really that easy to learn about |LIS|… [Figure: array 2 1 4 3 6 5 8 7 10 9]
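A small experiment makes the point concrete. This is a sketch under assumptions: the array is the zigzag 2 1 4 3 6 5 … suggested by the figure (even n), and the sample size 20 and the helper names are arbitrary choices for illustration.

```python
import random

def zigzag(n):
    """Array 2 1 4 3 6 5 ... of even length n (adjacent pairs swapped)."""
    out = []
    for i in range(1, n, 2):
        out += [i + 1, i]
    return out

def sample_is_increasing(f, k, rng):
    """Read k uniformly random positions; are the values increasing in position order?"""
    positions = sorted(rng.sample(range(len(f)), k))
    values = [f[p] for p in positions]
    return all(a < b for a, b in zip(values, values[1:]))

n = 1_000_000
f = zigzag(n)            # |LIS| = n/2: at most one entry from each swapped pair
rng = random.Random(0)
trials = 200
increasing = sum(sample_is_increasing(f, 20, rng) for _ in range(trials))
print(increasing, "of", trials, "samples looked sorted")
```

A sample fails to look sorted only when it happens to contain both members of a swapped pair, which is rare for a small sample of a large array.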

  5. Our result • We want the range to be small [Figure: the algorithm reports a range on the axis from 1 to n that contains |LIS|]

  6. Our result • We want the range to be small • [This work] For any (constant) δ > 0, the algorithm gives an additive δn approximation to |LIS| • Running time is (1/δ)^(1/δ) · (log n)^c [Figure: a range of width δn on the axis from 1 to n that contains |LIS|]

  7. Our result • We want the range to be small • [This work] For any (constant) δ > 0, the algorithm gives an additive δn approximation to |LIS| • Running time is (1/δ)^(1/δ) · (log n)^c • [Ailon Chazelle Liu S 03] [Parnas Ron Rubinfeld 03] Previous best: δ = ½ [Callout: Ad alert!] [Figure: a range of width δn on the axis from 1 to n, with n/2 marked]

  8. Our result • We want the range to be small • [This work] For any (constant) δ > 0, the algorithm gives an additive δn approximation to |LIS| • Running time is (1/δ)^(1/δ) · (log n)^c • We get a (1+δ)-approx to the distance to monotonicity • The previous best was a factor of 2 [Callout: Ad alert!] [Figure: a range of width δn on the axis from 1 to n, with n/2 marked]

  9. Prelims: the array in space [Figure: the array entries plotted as points in the plane]

  10. Prelims: the array in space • Input is points in the plane, given as an array • (LIS is the longest chain in the partial order) [Figure labels: Violation; Increasing sequence]

  11. A hard example • The decision for a point depends on small-scale properties of “far away” portions [Figure: two block constructions with side lengths k and 3k, labeled “10 points in each”; one has |LIS| = 4k, the other |LIS| = 2k]

  12. A hard example • Random samples in neighborhoods of points are identical! • “Can we really estimate LIS in polylog time?” • Is it time for some heavy work? • I mean, time for lbs (lower bounds). [Figure: block construction with side lengths k and 3k]

  13. Outline (or lack thereof) • Will I show proofs? • No • Will I show the algorithm? • Maybe • I will try to demonstrate the main insight • By a series of thought experiments

  14. The dynamic program • The closest LIS point to the left of n/2 gives a “splitter” • Find the LIS in each blue region. Piece together! • So we break up the original problem into subproblems [Figure labels: Closest LIS point to left; Splitter; n/2]

  15. The dynamic program • But we don’t know the right splitter. • So try all possible! Only n different choices • Choose the one that gives the largest sum of LIS’s • max_S (|LIS-below-S| + |LIS-above-S|) [Figure labels: S; n/2]
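The "try every splitter" recursion can be written out directly. The sketch below is one reading of the slide, under assumptions: the cut is always at the middle position, a splitter at height S keeps values ≤ S on the left and values > S on the right, and "increasing" means strictly increasing. The name `lis_by_splitters` is made up for illustration.

```python
from functools import lru_cache

def lis_by_splitters(f):
    """|LIS| of f via max over splitters of (LIS below-left + LIS above-right)."""
    if not f:
        return 0
    values = sorted(set(f))
    # Candidate splitter heights: one below everything, then every value.
    heights = [values[0] - 1] + values

    @lru_cache(maxsize=None)
    def best(lo, hi, a, b):
        """LIS using positions lo..hi-1 and values v with heights[a] < v <= heights[b]."""
        if hi <= lo or b <= a:
            return 0
        if hi - lo == 1:
            return 1 if heights[a] < f[lo] <= heights[b] else 0
        mid = (lo + hi) // 2
        # Splitter index s: left half keeps values <= heights[s],
        # right half keeps values > heights[s].
        return max(best(lo, mid, a, s) + best(mid, hi, s, b)
                   for s in range(a, b + 1))

    return best(0, len(f), 0, len(heights) - 1)

print(lis_by_splitters([2, 1, 4, 3, 6, 5]))
```

This is far slower than the O(n log n) method; its only purpose here is to show the box-by-box structure that the sublinear algorithm imitates.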

  16. The dynamic program • If you know the LIS in all small boxes, you can build the LIS for bigger boxes • Not the most efficient DP… • So our sublinear algo will mimic this process [Figure label: n/2]

  17. The IP [Figure: is this point on the LIS? The LIS is in the blue region. Labels: Splitter; n/2. Dialogue: “Where is the splitter?” “It is there.”]

  18. The IP [Figure: this point is NOT on the LIS; the LIS is in the blue region. Label: n/2. Dialogue: “Where is the splitter?” “It is there.”]

  19. The IP [Figure: cuts at n/2 and 3n/4. Dialogue: “I wish we knew the splitter in that region.” “It is there.”]

  20. The IP [Figure: cuts at n/2, 3n/4 and 5n/8. Dialogue: “I think I know what will happen next.” “You’re lucky I’m here.”]

  23. The interactive protocol • If point stays in blue region till very end, then it is good (on LIS). Otherwise, bad. • This takes (log n) steps, with the help of the wizard

  24. The interactive protocol • If point stays in blue region till very end, then it is good (on LIS). Otherwise, bad. • This takes (log n) steps, with the help of the wizard • If we could simulate the wizard…

  25. The interactive protocol • If point stays in blue region till very end, then it is good (on LIS). Otherwise, bad. • This takes (log n) steps, with the help of the wizard • If we could simulate the wizard… [Callout: “What?? If you could simulate the wizard, you know the LIS!”]
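The protocol can be simulated when the wizard is handed a full LIS, which is of course cheating in the sublinear setting. The sketch below assumes the cut is always the middle of the current interval, that the wizard answers with the value of the closest LIS point strictly left of the cut, and that "increasing" is strict. The names `one_lis`, `make_wizard` and `stays_in_blue` are hypothetical.

```python
from bisect import bisect_left

def one_lis(f):
    """Positions of one longest strictly increasing subsequence of f."""
    tails, tail_pos, prev = [], [], [None] * len(f)
    for i, x in enumerate(f):
        k = bisect_left(tails, x)
        prev[i] = tail_pos[k - 1] if k > 0 else None
        if k == len(tails):
            tails.append(x)
            tail_pos.append(i)
        else:
            tails[k] = x
            tail_pos[k] = i
    out, i = [], (tail_pos[-1] if tail_pos else None)
    while i is not None:          # walk the back-pointers
        out.append(i)
        i = prev[i]
    return out[::-1]

def make_wizard(f, lis_positions):
    """Wizard that knows one LIS and reports the splitter height at any cut."""
    def splitter(mid):
        k = bisect_left(lis_positions, mid)   # number of LIS points left of mid
        return f[lis_positions[k - 1]] if k > 0 else float("-inf")
    return splitter

def stays_in_blue(f, i, splitter):
    """Halve the interval around position i, asking the wizard at every cut."""
    lo, hi = 0, len(f)
    while hi - lo > 1:            # about log2(n) rounds
        mid = (lo + hi) // 2
        s = splitter(mid)
        if i < mid:
            if f[i] > s:          # left of the cut but above the splitter
                return False
            hi = mid
        else:
            if f[i] <= s:         # right of the cut but not above the splitter
                return False
            lo = mid
    return True

f = [4, 24, 10, 9, 15, 17, 20, 18, 4, 19, 3]   # illustrative array
lis = one_lis(f)
wizard = make_wizard(f, lis)
good = [i for i in range(len(f)) if stays_in_blue(f, i, wizard)]
print(good == lis)
```

With this particular wizard, the points that survive every round are exactly the points of the wizard's LIS; each query costs one wizard call per round.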

  26. Find a splitter • Finding the splitter may be hard, so try for approximate versions…? • But how do we determine the number of LIS points? [Figure: if very few LIS points lie outside blue, this is not a bad splitter. Label: n/2]

  27. Find a splitter • Conservative splitter: total no. of points outside blue < μn • If μ < 1/(100 log n), being conservative is good enough [Figure labels: Conservative splitter; n/2]

  28. Easy to check • Count the fraction of the sample outside blue • poly(log n) samples check this accurately [Figure label: n/2]
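The check on this slide amounts to a plain sampling estimate. In the sketch below the blue region for a splitter of height s at cut n/2 is assumed to be "left half with value ≤ s, right half with value > s"; the function names and the sample count are illustrative assumptions.

```python
import random

def fraction_outside_blue(f, s, samples, rng):
    """Estimate the fraction of points outside blue for splitter height s at cut n/2."""
    n = len(f)
    mid = n // 2
    outside = 0
    for _ in range(samples):
        i = rng.randrange(n)                      # one uniformly random position
        in_blue = (f[i] <= s) if i < mid else (f[i] > s)
        outside += not in_blue
    return outside / samples

def looks_conservative(f, s, mu, samples, rng):
    """Accept s as a conservative splitter if the sampled fraction is below mu."""
    return fraction_outside_blue(f, s, samples, rng) < mu

rng = random.Random(0)
sorted_array = list(range(1000))
print(looks_conservative(sorted_array, sorted_array[499], 0.01, 2000, rng))
```

Because μ is about 1/log n, the sample has to be large enough to resolve fractions of that size, which is where the poly(log n) sample count comes from.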

  29. Getting a conservative splitter • We can sample (log n) different candidates and check which of them is conservative • What if no conservative splitter exists? [Figure label: n/2]

  30. A liberal paradise • Choose any line: the no. of points outside is at least μn • So we know that |LIS| < (1-μ)n • Leads to the next idea. Boosting approximations! • Given a δ-approx to LIS, can we improve it to δ’? [Figure label: n/2]

  31. Boosting approximations • Run the δn-approx on the points in each box (n1 and n2 points) • No. of points outside is at least μn • Take the sum of outputs as the total LIS estimate • |LIS| = |LIS1| + |LIS2|, Est = Est1 + Est2 • |Est1 – LIS1| < δn1, |Est2 – LIS2| < δn2 • So |Est – LIS| < δ(n1 + n2) • n1 + n2 < (1-μ)n, so |Est – LIS| < δ(1-μ)n! [Figure labels: Real splitter; n1; n2; n/2]

  32. Putting it together • Check if each is a conservative splitter • If it is, we’ve found the right subproblems • Otherwise… [Figure labels: Conservative splitter?; n/2]

  33. Putting it together • Run the δn-approx on the points in each box • One of these is “close enough” to the real splitter • Est(S) = Left-Est(S) + Right-Est(S) [Figure labels: S; n/2]

  34. Putting it together • Run the δn-approx on the points in each box • One of these is “close enough” to the real splitter • Est(S) = Left-Est(S) + Right-Est(S) • Final Estimate = max_S Est(S) • Looks like a great idea! • We go from δn to δ(1-μ)n. Recur to keep improving the approximation [Figure labels: S; n/2]

  35. It fails, miserably • δ_0 = δ_1(1-μ): as we go up each level, the approx gets better by a factor of (1-μ). • So to get δ_0 = ¼, how many levels are needed? • ¼ = ½ (1-μ)^t, so t ≈ 1/μ • We have running time at least 2^(1/μ). So, μ needs to be > 1/log log n. [Figure: recursion tree of Alg calls of depth 1/μ, with approximation δ_0 at the root, then δ_1, δ_2, …, down to ½ at the leaves]
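The level count can be spelled out. The derivation below is a sketch that assumes μ is small and that every level makes at least two recursive calls (as the binary tree in the figure suggests); constants are not tracked.

```latex
\[
\tfrac14 = \tfrac12\,(1-\mu)^t
\;\Longleftrightarrow\;
(1-\mu)^t = \tfrac12
\;\Longleftrightarrow\;
t = \frac{\ln 2}{\ln\frac{1}{1-\mu}} \approx \frac{\ln 2}{\mu}
\qquad (\mu \text{ small}),
\]
\[
\text{number of calls} \;\ge\; 2^{t} = 2^{\Theta(1/\mu)},
\qquad
2^{\Theta(1/\mu)} \le (\log n)^{O(1)}
\;\Longleftrightarrow\;
\frac{1}{\mu} = O(\log\log n)
\;\Longleftrightarrow\;
\mu = \Omega\!\left(\frac{1}{\log\log n}\right).
\]
```

So a polylogarithmic running time forces μ to be at least on the order of 1/log log n, which is the constraint quoted on the slide.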

  36. Find a splitter • Conservative splitter: total no. of points outside blue < μn • If μ < 1/(100 log n), being conservative is good enough [Figure labels: Conservative splitter; n/2]

  37. The basic dichotomy • We find a splitter: continue the IP (the “Interactive Protocol” phase) • Cannot find a splitter: the “Dynamic Programming” phase • For IP, we need μ < 1/log n • μn is the error in each “level” of IP • For DP, we need μ > 1/log log n • (1-μ) is the decrease in approximation

  38. The basic dichotomy • We find a splitter: continue the IP (the “Interactive Protocol” phase) • Cannot find a splitter: the “Dynamic Programming” phase • For IP, we need μ < 1/log n • μn is the error in each “level” of IP • For DP, we need μ > 1/log log n • (1-μ) is the decrease in approximation [Figure labels: Strengthen; Weaken]

  39. Reducing to smaller DP! • Run the δ-approx on all poly(log n) such boxes to get an LIS estimate inside each box [Figure: grid of boxes of side n/(log n)]

  40. Reducing to smaller DP! • Run the δ-approx on all poly(log n) such boxes • Use a dynamic program to find the chain with the largest sum of estimates • Longest path in a DAG • Can solve in poly(log n) time [Figure: a chain of boxes in a grid of boxes of side n/(log n)]
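The chain-finding step is a small weighted longest-path computation. The sketch below rests on assumptions that the slides do not spell out: the estimates are stored in a grid `est[r][c]` (value-row r, position-column c), and a chain is a set of boxes that is strictly increasing in both row and column. The name `best_chain` is made up.

```python
def best_chain(est):
    """Largest total estimate over chains of boxes increasing in both row and column."""
    best = [[0] * len(row) for row in est]   # best[r][c]: best chain ending at box (r, c)
    answer = 0
    for r in range(len(est)):
        for c in range(len(est[r])):
            # Best chain that ends strictly below and strictly to the left.
            prev = max((best[pr][pc]
                        for pr in range(r)
                        for pc in range(min(c, len(est[pr])))),
                       default=0)
            best[r][c] = est[r][c] + prev
            answer = max(answer, best[r][c])
    return answer

# Hypothetical 3 x 3 grid of per-box LIS estimates.
estimates = [[3, 0, 0],
             [0, 2, 5],
             [0, 4, 1]]
print(best_chain(estimates))
```

For a k × k grid this simple version takes O(k^4) steps, which is still poly(log n) when k is poly(log n); prefix maxima would bring it down to O(k^2).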

  41. Dichotomy theorem • Either it is easy to find the right subproblems • OR one can go from a δ-approx to a (δ-δ²)-approx by a (log n)-sized DP

  42. The algorithm, in one slide • We find a splitter: continue the IP • Cannot find a splitter: make poly(log n) calls to the δ-approx and solve a DP of poly(log n) size • Overall running time becomes (log n)^(1/δ) • *&^#$% miracle that the math works out

  43. The even better version • Don’t exactly solve this dynamic program! • Use our sublinear algo to approximately solve it in (log log n) time. Then do it recursively… • It’s painful • It’s all Greek: α β γ δ ε ζ λ μ ξ • We had ν, but got rid of it

  44. What next? • Sublinear dynamic programming! • We get (1/δ)^(1/δ) · (log n)^c time. Can we get (log n)/δ time? • Would be extremely cool. Completely optimal • Applications for other dynamic programs? • How does one find the “right” subproblems in sublinear time? • Generalize the dichotomy • Longest common subsequence/edit distance…?

  45. Ask and you shall know…
