430 likes | 542 Views
Distributed (Local) Monotonicity Reconstruction. Michael Saks Rutgers University C. Seshadhri Princeton University (Now IBM Almaden). Overview. Introduce a new class of algorithmic problems: Distributed Property Reconstruction (extending framework of program self-correction,
E N D
Distributed (Local) Monotonicity Reconstruction Michael Saks Rutgers University C. Seshadhri Princeton University (Now IBM Almaden)
Overview • Introduce a new class of algorithmic problems: Distributed Property Reconstruction (extending framework of program self-correction, robust property testing locally decodable codes) • A solution for the property Monotonicity
Data Sets Data set = function f : Γ V Γ = finite index set V = value set In this talk, Γ = [n]d= {1,…,n}d V = nonnegative integers f = d-dimensional array of nonnegative integers
Distance between two data sets For f,g with common domain Γ: dist(f,g) = fraction of domain where f(x) ≠ g(x)
Properties of data sets Focus of this talk: • Monotone: nondecreasing along every line (Order preserving) When d=1, monotone = sorted
Some Algorithmic problems for P Given data set f (as input): • Recognition: Does f satisfy P? • RobustTesting: (Define ε(f)= min{ dist(f,g) : g satisfies P}) For some 0 ≤ ε1 < ε2 < 1, output either • ε(f) > ε1 :fis far fromP • ε(f) <ε2: fis close toP (If ε1 < ε(f) ≤ ε2 then can decide either)
Property Reconstruction Setting: Given f • We expectf to satisfy P (e.g. we run algorithms on f that assume P) • but f may not satisfy P Reconstruction problem for P: Given data set f, produce data set g that • satisfiesP • is close tof: d(f,g) is not much bigger than ε(f)
What does it mean to produce g? • Offline computation Input: function table for f Output: function table for g
Distributed monotonicity reconstruction Want algorithm A that on input x, computes g(x) • may query f(y) for any y • has access to a short random strings and is otherwise deterministic.
Distributed Property Reconstruction Goal: WHP (with probability close to 1) (over choices of random string s): • g has property P • d(g,f) = O( ε(f)) • Each A(x) runs quickly in particular only reads f(y) for a small number of y.
Distributed Property Reconstruction Precursors: • Online Data Reconstruction Model (Ailon-Chazelle-Liu-Seshadhri)[ACCL] • Locally Decodable Codes and Program self-correction(Blum-Luby-Rubinfeld; Rubinfeld-Sudan; etc ) • Graph Coloring (Goldreich-Goldwasser-Ron) • Monotonicity Testing (Dodis-Goldreich- Lehman-Raskhodnikova-Ron-Samorodnitsky; Goldreich-Goldwasser- Lehman-Ron-Samorodnitsky;Fischer;Fischer-Lehman-Newman-Raskhodnikova-Rubinfeld-Samorodnitsky;Ergun-Kannan-Kumar-Rubinfeld-Vishwanathan; etc) • Tolerant Property Testing(Parnas, Ron, Rubinfeld)
Example: Local Decoding of Codes Data set f = boolean string of length n Property = is a Code word of a given error correcting codeC Reconstruction = Decoding to a close code word Distributed reconstruction = Local decoding
Key issue: making answers consistent • For error correcting code, can assume • input fdecodes to a uniqueg. • The set of positions that need to be corrected is determined by f. • For general property, • many differentg (even exponentially many) that are close to f may have the property • We want to ensure that A produces one of them.
An example • Monotonicity with input array: 1,….,100, 111,…,120,101,…,110,121,…,200.
Monotonicity Reconstruction: d=1 f is a linear array of length n First attempt at distributed reconstruction: A(x) looks at f(x)and f(x-1) If f(x)≥ f(x-1), then g(x) = f(x) Otherwise, we have a non-monotonicity g(x) = max { f(x), f(x-1) }
Monotonicity Reconstruction: d=1 Second attempt Set g(x) = max{ f(1), f(2),…, f(x) } g is monotone but • A(x) requires time Ω(x) • dist(g,f) may be much larger than ε(f)
Our results (for general d ) A distributed monotonicity reconstruction algorithm for general dimensiond such that: • Time to compute g(x) is (log n)O(d) • dist(f,g) = C1(D) (f) • Shared random string s has size (d log n)O(1) (Builds on prior results on monotonicity testing and online monotonicity reconstruction.)
Which array values should be changed? A subset S of Γ is f-monotone if f restricted to S is monotone. For each x in Γ, A(x) must: • Decide whether g(x) = f(x) • If not , then determine g(x) Preserved = { x : g(x) = f(x) } Corrected = { x: g(x) ≠ f(x) } In particular, Preserved must be f-monotone
Identifying Preserved • The partition (Preserved, Corrected) must satisfy: • Preserved is f-monotone • |Corrected|/|Γ| = O(ε(f)) Preliminary algorithmic problem:
Classification problem • Classify each y in Γ as Green or Red • Green is f - monotone • Red has size O(ε(f)|Γ|) Need subroutine Classify(y).
A sufficient condition for f-monotonicity A pair (x,y) in Γ× Γ is a violation if x < y and f(x) > f(y) To guarantee that Green is f - monotone: Red should hit all violations: For every violation (x,y) at least one of x,y is Red
Classify: 1-dimensional case d=1: Γ={1,…,n} f is a linear array. For x in Γ, and subinterval J of Γ: violations(x,J)=|{y in J : (x,y) is a violation}|
Constructing a large f-monotone set The set Bad: x in Bad if for some interval J containing x |violations(x,J)|≥|J|/2 Lemma. 1)Good=Γ - Bad is f-monotone 2)|Bad| ≤ 4 ε(f)|Γ| . Proof: • If x,y are a violation then one of them is Bad for the interval [x,y].
Lemma. • Good=Γ \ Bad is f-monotone • |Bad| ≤ 4 ε(f)|Γ| . So we’d like to take: Green=GoodRed = Bad
How do we compute Good? • To test whether y in Good: For each interval J containing y, check violations(y,J)< |J|/2 Difficulties • There are (n) intervals J containing y • For each J, computing violations(y,J) takes time (|J|) .
Speeding up the computation • Estimate violations(y,J) by random sampling sample size polylog(n) is sufficient violations* (y,J) denotes the estimate • Compute violations* (y,J) only for a carefully chosen set of test intervals
The Test Set T Assume n=|Γ|=2k k layers of intervals Layer j consists of 2k-j+1-1 intervals of size 2j
Subroutine classify To classify y If for each J in T containing y violations*(y,J) < .1 |J| then y isGreen else y isRed
Where are we? We have a subroutine Classify • On input x, • Classify outputs Green or Red • Runs in time polylog(n) • WHP • Green is f-monotone • |Red| ≤ 20ε(f)|Γ|
Defining g(x) for Red x • The natural way to define g(x) is: Green(x) = { y : y≤x and yGreen} g(x) = max{f(y) : y inGreen(x))} = f(max{Green(x)}) • In particular, this gives g(x) = f(x) for Green x
Computing m(x) Can search back from x to find first Green Inefficient if x is preceded by a long Red stretch m(x) x
Approximating m(x)? x m*(x) Pick random Sample(x) of points less than x • Density inversely proportional to distance from x • Size is polylog(n) Green*(x) = { y: y in Sample(x) , yGreen} m*(x) = max {y in Green* (x)}
x m*(x) Is m*(x) good enough? y • Suppose y is Green and m*(x)≤ y ≤x • Since y is Green: g(y) = f(y) and g(x) = f(m*(x)) < f(y) = g(y) • g is not monotone
x y m*(x) Is m*(x) good enough? • To ensure monotonicity we need: x < y implies m*(x) < m*(y) • Requires relaxing the requirement: for all Greenz, m*(z) = z
Thinning out Green* (x) Plan: Eliminate certain unsafe points from Green*(x) Roughly, y is unsafefor x if for some z > x Some interval beginning with y and containing x has a high density of Reds. (There is a non-trivial chance that Sample(z) has no Green points ≥ y.) y x z
Thinning out Green* (x) Green*(x) = { y: y in Sample(x) , yGreen} m*(x) = max {y in Green* (x)} Green^(x) = { y: y in Green* (x) , y safe for x} m^(x) = max {y in Green^ (x)} (Hiding: Efficient implementation of Green^(x))
Redefining Green^(x) WHP • if x≤ y, then m^(x) ≤ m^(y) • {x: m^(x) ≠ x} is O(ε(f) |Γ|).
Summary of 1-dimensional case • Classify points as Green and Red • Few Red points • f restricted to Green is f-monotone • For each x, choose Sample(x) • size polylog(n) • All points less than x • Density inversely proportional to distance from x • Green^ (x) from Sample(x) that aresafe for x • m^(x) is the maximum ofGreen^(x) • Output g(x)=f(m^(x))
Dimension greater than 1 For x < y, want g(x) < g(y) y x
Red/Green Classification • Extend the Red/Green classification to higher dimensions: • f restricted to Green is Monotone • Red is small Straightforward (mostly) extension of 1-dimensional case
Given Red/Green classification In the one-dimensional case, Green^(x) = sampledGreenpoints safe for x g(x) = f(max {y : y in Green^ (x) } .
The Green points below x x 0 1 • Set of Green maxima could be very large • Sparse Random Sampling will only roughly capture the frontier • Finding an appropriate definition of unsafe points is much harder than in the one dimensional case
Further work • The g produced by our algorithm has d(g,f)≤ C(d)ε(f)|Γ| • Our C(d) is exp(d2) . • What should C(d) be? (Guess: C(d) = exp(d) ) • Distributed reconstruction for other interesting properties? • (Reconstructing expanders, Kale,Peres, Seshadhri, FOCS 08)