
Distributed (Local) Monotonicity Reconstruction

Presentation Transcript


  1. Distributed (Local) Monotonicity Reconstruction Michael Saks, Rutgers University; C. Seshadhri, Princeton University (now IBM Almaden)

  2. Overview • Introduce a new class of algorithmic problems: Distributed Property Reconstruction (extending the framework of program self-correction, robust property testing, and locally decodable codes) • A solution for the property Monotonicity

  3. Data Sets Data set = function f : Γ → V Γ = finite index set V = value set In this talk, Γ = [n]^d = {1,…,n}^d V = nonnegative integers f = d-dimensional array of nonnegative integers

  4. Distance between two data sets For f,g with common domain Γ: dist(f,g) = fraction of domain where f(x) ≠ g(x)
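
A minimal Python sketch of this distance, assuming the data sets are given as functions over an explicit finite domain (the names dist and domain are our own):

```python
def dist(f, g, domain):
    """Fraction of the domain where f and g disagree:
    dist(f, g) = |{x in domain : f(x) != g(x)}| / |domain|."""
    return sum(1 for x in domain if f(x) != g(x)) / len(domain)
```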

  5. Properties of data sets Focus of this talk: • Monotone: nondecreasing along every line (Order preserving) When d=1, monotone = sorted

  6. Some algorithmic problems for P Given data set f (as input): • Recognition: Does f satisfy P? • Robust testing: (Define ε(f) = min{ dist(f,g) : g satisfies P }.) For some 0 ≤ ε1 < ε2 < 1, output either • ε(f) > ε1: f is far from P, or • ε(f) < ε2: f is close to P (If ε1 < ε(f) ≤ ε2 then either answer is acceptable)

  7. Property Reconstruction Setting: Given f • We expect f to satisfy P (e.g. we run algorithms on f that assume P) • but f may not satisfy P Reconstruction problem for P: Given data set f, produce data set g that • satisfies P • is close to f: dist(f,g) is not much bigger than ε(f)

  8. What does it mean to produce g? • Offline computation Input: function table for f Output: function table for g

  9. Distributed monotonicity reconstruction Want algorithm A that, on input x, computes g(x) • may query f(y) for any y • has access to a short random string s and is otherwise deterministic.

  10. Distributed Property Reconstruction Goal: WHP (with probability close to 1, over choices of the random string s): • g has property P • dist(g,f) = O(ε(f)) • Each A(x) runs quickly; in particular, it reads f(y) for only a small number of y.

  11. Distributed Property Reconstruction Precursors: • Online Data Reconstruction Model (Ailon-Chazelle-Liu-Seshadhri) [ACCL] • Locally Decodable Codes and Program Self-Correction (Blum-Luby-Rubinfeld; Rubinfeld-Sudan; etc.) • Graph Coloring (Goldreich-Goldwasser-Ron) • Monotonicity Testing (Dodis-Goldreich-Lehman-Raskhodnikova-Ron-Samorodnitsky; Goldreich-Goldwasser-Lehman-Ron-Samorodnitsky; Fischer; Fischer-Lehman-Newman-Raskhodnikova-Rubinfeld-Samorodnitsky; Ergun-Kannan-Kumar-Rubinfeld-Vishwanathan; etc.) • Tolerant Property Testing (Parnas-Ron-Rubinfeld)

  12. Example: Local Decoding of Codes Data set f = boolean string of length n Property = f is a codeword of a given error-correcting code C Reconstruction = decoding to a close codeword Distributed reconstruction = local decoding

  13. Key issue: making answers consistent • For an error-correcting code, we can assume • the input f decodes to a unique g. • The set of positions that need to be corrected is determined by f. • For a general property, • many different g (even exponentially many) that are close to f may have the property • We want to ensure that A produces one of them.

  14. An example • Monotonicity with input array: 1,…,100, 111,…,120, 101,…,110, 121,…,200. Two different monotone arrays are equally close to this input: one obtained by rewriting the block holding 111,…,120 and one obtained by rewriting the block holding 101,…,110; A must commit to one of them.

  15. Monotonicity Reconstruction: d=1 f is a linear array of length n First attempt at distributed reconstruction: A(x) looks at f(x) and f(x-1) If f(x) ≥ f(x-1), then g(x) = f(x) Otherwise, we have a non-monotonicity: g(x) = max{ f(x), f(x-1) }
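
A Python sketch of this first attempt (0-indexed, f given as a list). The comment records why it fails, which is what motivates the second attempt on the next slide:

```python
def g_first_attempt(f, x):
    """Look only at f(x) and f(x-1); repair a local inversion by taking the max.
    This does NOT guarantee a monotone g: on f = [5, 1, 1] it produces
    g = [5, 5, 1], which still decreases at the end."""
    if x == 0 or f[x] >= f[x - 1]:
        return f[x]
    return max(f[x], f[x - 1])
```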

  16. Monotonicity Reconstruction: d=1 Second attempt Set g(x) = max{ f(1), f(2),…, f(x) } g is monotone but • A(x) requires time Ω(x) • dist(g,f) may be much larger than ε(f)
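
The second attempt as a Python sketch, with the two drawbacks from the slide noted in comments:

```python
def g_prefix_max(f):
    """Second attempt: g(x) = max(f(1), ..., f(x)).
    g is always monotone, but:
      - answering a single query g(x) from scratch needs Omega(x) reads of f;
      - dist(f, g) can be far larger than eps(f): on f = [n, 1, 2, ..., n-1]
        we have eps(f) = 1/n, yet this g changes all but one position."""
    g, best = [], float("-inf")
    for value in f:
        best = max(best, value)
        g.append(best)
    return g
```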

  17. Our results (for general d) A distributed monotonicity reconstruction algorithm for general dimension d such that: • Time to compute g(x) is (log n)^O(d) • dist(f,g) ≤ C1(d)·ε(f) • The shared random string s has size (d log n)^O(1) (Builds on prior results on monotonicity testing and online monotonicity reconstruction.)

  18. Which array values should be changed? A subset S of Γ is f-monotone if f restricted to S is monotone. For each x in Γ, A(x) must: • Decide whether g(x) = f(x) • If not, then determine g(x) Preserved = { x : g(x) = f(x) } Corrected = { x : g(x) ≠ f(x) } In particular, Preserved must be f-monotone

  19. Identifying Preserved • The partition (Preserved, Corrected) must satisfy: • Preserved is f-monotone • |Corrected|/|Γ| = O(ε(f)) Preliminary algorithmic problem:

  20. Classification problem • Classify each y in Γ as Green or Red • Green is f-monotone • Red has size O(ε(f)|Γ|) Need subroutine Classify(y).

  21. A sufficient condition for f-monotonicity A pair (x,y) in Γ × Γ is a violation if x < y and f(x) > f(y). To guarantee that Green is f-monotone, Red should hit all violations: for every violation (x,y), at least one of x, y is Red

  22. Classify: 1-dimensional case d=1: Γ = {1,…,n}, f is a linear array. For x in Γ and a subinterval J of Γ: violations(x,J) = |{ y in J : the pair {x,y} is a violation }|
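
A direct Python sketch of violations(x, J); it counts the violating partners of x on both sides of x within J, which is the count used by the lemma on the next slide:

```python
def violations(f, x, J):
    """Number of y in the interval J such that the pair {x, y} is a violation,
    i.e. the smaller of the two indices carries the strictly larger value."""
    return sum(1 for y in J
               if (x < y and f[x] > f[y]) or (y < x and f[y] > f[x]))
```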

  23. Constructing a large f-monotone set The set Bad: x is in Bad if for some interval J containing x, violations(x,J) ≥ |J|/2 Lemma. 1) Good = Γ \ Bad is f-monotone 2) |Bad| ≤ 4ε(f)|Γ|. Proof idea: • If (x,y) is a violation, then every z in [x,y] forms a violation with x or with y, so one of x, y is Bad for the interval [x,y].
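
A brute-force check of the Bad condition, using the violations sketch above; it scans all O(n^2) intervals and is only meant to pin down the definition, not to be efficient:

```python
def is_bad(f, x):
    """x is Bad iff violations(x, J) >= |J|/2 for some interval J containing x."""
    n = len(f)
    for lo in range(x + 1):
        for hi in range(x, n):
            J = range(lo, hi + 1)
            if violations(f, x, J) >= len(J) / 2:
                return True
    return False
```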

  24. Lemma. • Good = Γ \ Bad is f-monotone • |Bad| ≤ 4ε(f)|Γ|. So we’d like to take: Green = Good, Red = Bad

  25. How do we compute Good? • To test whether y is in Good: For each interval J containing y, check violations(y,J) < |J|/2 Difficulties • There are Ω(n) intervals J containing y • For each J, computing violations(y,J) takes time Ω(|J|).

  26. Speeding up the computation • Estimate violations(y,J) by random sampling; a sample of size polylog(n) is sufficient; violations*(y,J) denotes the estimate • Compute violations*(y,J) only for a carefully chosen set of test intervals
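
A sketch of the sampling estimate violations*(y, J). The fixed sample size below is an arbitrary illustrative constant; the algorithm uses polylog(n) samples together with standard concentration bounds:

```python
import random

def violations_estimate(f, x, J, samples=200, rng=random):
    """violations*(x, J): estimate violations(x, J) from `samples` random
    points of J, rescaled to the length of J."""
    picks = [rng.choice(J) for _ in range(samples)]
    hits = sum(1 for y in picks
               if (x < y and f[x] > f[y]) or (y < x and f[y] > f[x]))
    return hits / samples * len(J)
```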

  27. The Test Set T Assume n = |Γ| = 2^k There are k layers of intervals; layer j consists of 2^(k-j+1) − 1 intervals of size 2^j
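
One concrete way to realize such a test set in Python: in layer j the intervals of length 2^j start at multiples of 2^(j-1), so consecutive intervals overlap by half and layer j has exactly 2^(k-j+1) − 1 intervals. (The half-overlap layout is our assumption; the slide does not spell out the exact construction.)

```python
def test_intervals(n, y):
    """All intervals of the test set T that contain y, for n = 2**k."""
    k = n.bit_length() - 1
    out = []
    for j in range(1, k + 1):
        size, step = 1 << j, 1 << (j - 1)
        for start in range(0, n - size + 1, step):
            if start <= y < start + size:
                out.append(range(start, start + size))
    return out
```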

  28. Subroutine Classify To classify y: If violations*(y,J) < 0.1|J| for each J in T containing y, then y is Green; else y is Red
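
Putting the two previous sketches together gives a hedged version of Classify; the 0.1 threshold is from the slide, while the sample sizes and interval layout are our illustrative choices:

```python
def classify(f, y):
    """Return 'Green' unless some test interval containing y looks at least
    10% violated with respect to y (n is assumed to be a power of two)."""
    n = len(f)
    for J in test_intervals(n, y):
        if violations_estimate(f, y, J) >= 0.1 * len(J):
            return "Red"
    return "Green"
```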

  29. Where are we? We have a subroutine Classify • On input x, Classify outputs Green or Red • Runs in time polylog(n) • WHP: • Green is f-monotone • |Red| ≤ 20 ε(f)|Γ|

  30. Defining g(x) for Red x • The natural way to define g(x) is: Green(x) = { y : y ≤ x and y is Green } g(x) = max{ f(y) : y in Green(x) } = f(max Green(x)) • In particular, this gives g(x) = f(x) for Green x

  31. Computing m(x) Here m(x) = max Green(x), so g(x) = f(m(x)). We can search back from x to find the first Green point, but this is inefficient if x is preceded by a long Red stretch. [Figure: m(x) far to the left of x.]

  32. Approximating m(x)? [Figure: m*(x) to the left of x.] Pick a random Sample(x) of points less than x • Density inversely proportional to distance from x • Size is polylog(n) Green*(x) = { y : y in Sample(x), y is Green } m*(x) = max{ y in Green*(x) }
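
A sketch of Sample(x) and of m*(x), reusing the classify sketch from slide 28. Taking a few points at every dyadic distance scale is one simple way to get density inversely proportional to the distance from x; the constant 4 per scale, and the fallback to x when no sampled point is Green, are our simplifications:

```python
import random

def sample_below(x, rng=random):
    """Random points y < x, denser near x: a few points at each dyadic
    distance scale [d, 2d), giving O(log x) scales and a polylog-size set."""
    points = set()
    d = 1
    while d <= x:
        for _ in range(4):                     # 4 points per scale: arbitrary
            points.add(x - rng.randrange(d, min(2 * d, x + 1)))
        d *= 2
    return sorted(points)

def m_star(f, x, rng=random):
    """m*(x) = max Green point in Sample(x); fall back to x itself if the
    sample happens to contain no Green point."""
    green = [y for y in sample_below(x, rng) if classify(f, y) == "Green"]
    return max(green) if green else x
```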

  33. Is m*(x) good enough? [Figure: m*(x) ≤ y ≤ x.] • Suppose y is Green and m*(x) ≤ y ≤ x • Since y is Green: g(y) = f(y), while g(x) = f(m*(x)), which may be smaller than f(y) = g(y) • So g is not monotone

  34. Is m*(x) good enough? • To ensure monotonicity we need: x < y implies m*(x) < m*(y) • This requires relaxing the requirement that m*(z) = z for all Green z

  35. Thinning out Green*(x) Plan: eliminate certain unsafe points from Green*(x) Roughly, y is unsafe for x if some interval beginning with y and containing x has a high density of Red points (so for some z > x there is a non-trivial chance that Sample(z) has no Green points ≥ y). [Figure: y < x < z.]

  36. Thinning out Green*(x) Green*(x) = { y : y in Sample(x), y is Green } m*(x) = max{ y in Green*(x) } Green^(x) = { y : y in Green*(x), y is safe for x } m^(x) = max{ y in Green^(x) } (Hiding: the efficient implementation of Green^(x))

  37. Redefining Green^(x) WHP: • if x ≤ y, then m^(x) ≤ m^(y) • |{ x : m^(x) ≠ x }| = O(ε(f)|Γ|).

  38. Summary of the 1-dimensional case • Classify points as Green and Red • Few Red points • f restricted to Green is monotone • For each x, choose Sample(x) • size polylog(n) • all points less than x • density inversely proportional to distance from x • Green^(x) = sampled Green points that are safe for x • m^(x) is the maximum of Green^(x) • Output g(x) = f(m^(x))

  39. Dimension greater than 1 For x < y (in the coordinatewise order), we want g(x) ≤ g(y) [Figure: points x ≤ y in the d-dimensional grid.]

  40. Red/Green Classification • Extend the Red/Green classification to higher dimensions: • f restricted to Green is Monotone • Red is small Straightforward (mostly) extension of 1-dimensional case

  41. Given the Red/Green classification In the one-dimensional case, Green^(x) = sampled Green points safe for x, and g(x) = f(max{ y : y in Green^(x) }).

  42. The Green points below x [Figure: the Green points dominated by x and their maxima.] • The set of Green maxima could be very large • Sparse random sampling will only roughly capture the frontier • Finding an appropriate definition of unsafe points is much harder than in the one-dimensional case

  43. Further work • The g produced by our algorithm differs from f on at most C(d)·ε(f)·|Γ| points • Our C(d) is exp(d²). • What should C(d) be? (Guess: C(d) = exp(d)) • Distributed reconstruction for other interesting properties? • (Reconstructing expanders: Kale, Peres, Seshadhri, FOCS 2008)
