Reconstruction of Depth-3 circuits. Amir Shpilka Technion. Based on work with Zohar Karnin (Technion). Plan of talk. Background Problem definition Depth-3 circuits Results Proof idea: Structural theorem for zero depth-3 circuits Reconstruction of Depth 3 circuits.
Reconstruction of Depth-3 circuits Amir Shpilka Technion Based on work with Zohar Karnin (Technion)
Plan of talk • Background • Problem definition • Depth-3 circuits • Results • Proof idea: • Structural theorem for zero depth-3 circuits • Reconstruction of Depth 3 circuits
Reconstruction of arithmetic circuits • Input: Black-Box arithmetic circuit C, over a finite field F, computing a polynomial f(x1,...,xn) • Goal: Find a small circuit for f, using few queries • Motivation: natural problem, algebraic analog of learning • Caveat: queries from F or an extension field of F
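Depth-3 circuits: ΣΠΣ(k) circuits • Depth-3 = sums of products of linear functions • Li,j = Σ_{t=1...n} at·xt + a0 • Mi = Π_{j=1...di} Li,j • C = Σ_{i=1...k} Mi • [Figure: a depth-3 circuit with a top + gate of fan-in k, multiplication gates below it, and linear functions such as L1,1 with coefficients a1,...,an, a0 over the inputs x1,...,xn, 1]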
Depth-3 circuits: ΣΠΣ(k) circuits • Li,j = Σ_{t=1...n} at·xt + a0 • Mi = Π_{j=1...di} Li,j • C = Σ_{i=1...k} Mi • [Figure: top + gate of fan-in k over multiplication gates M1,...,Mk; each Mi multiplies linear functions Li,1,...,Li,d over the inputs x1,...,xn, 1] • Alternative view: shifted sparse polynomials • g = y1^5·y2^3···ym^7 + ... (k monomials) • Replace each variable with a linear function in {xi}
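The definition above (a sum of k products of linear functions) can be made concrete with a small evaluator. This is only a sketch: the nested-list representation, the names eval_linear, eval_gate and eval_circuit, and the prime P = 101 are assumptions chosen for illustration, not part of the talk.

```python
# Hypothetical representation (an assumption of this sketch):
#   linear function = (coeffs, const), meaning sum_t coeffs[t] * x[t] + const
#   multiplication gate = list of linear functions
#   depth-3 circuit = list of gates; top fan-in k = len(circuit)
P = 101  # small prime field F_P, chosen only for illustration


def eval_linear(lin, x):
    """Evaluate one linear function L_{i,j} at the point x, modulo P."""
    coeffs, const = lin
    return (sum(a * xi for a, xi in zip(coeffs, x)) + const) % P


def eval_gate(gate, x):
    """Evaluate a multiplication gate M_i = product of its linear functions."""
    prod = 1
    for lin in gate:
        prod = prod * eval_linear(lin, x) % P
    return prod


def eval_circuit(circuit, x):
    """Evaluate C = sum of its multiplication gates."""
    return sum(eval_gate(gate, x) for gate in circuit) % P


# C = (x1 + 1)(x2 + 2) + x1 * x2, a circuit with top fan-in k = 2
example = [
    [((1, 0), 1), ((0, 1), 2)],
    [((1, 0), 0), ((0, 1), 0)],
]
value = eval_circuit(example, (3, 4))  # (3 + 1)(4 + 2) + 3 * 4
```

A reconstruction algorithm only ever sees something like eval_circuit as a black box; the nested list is exactly what it has to recover.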
Why study depth-3 circuits? • Easiest model for which lower bounds are difficult (only Ω(n^2) over C) • Depth-4 circuits almost equivalent to general circuits [Agrawal Vinay] • Exponential lower bounds for depth-4 circuits imply exponential lower bounds for general circuits • Polynomial time black-box polynomial identity testing for depth-4 implies derandomization of identity testing for general circuits • Understanding depth-3 is important
Known results for depth-3 circuits • Lower bounds: • Exponential lower bounds over finite fields [Grigoriev Karpinski, Grigoriev Razborov]. • Quadratic lower bounds over R, C [S Wigderson]. • Zero testing (only for ΣΠΣ(k), with k = O(1)): • Polynomial time when the circuit is given to us [Kayal Saxena]. • Quasi-polynomial time in the black-box model [Karnin S]. • Recall: for depth-2 circuits everything is known. Depth-4 is closely related to the general case.
Our results • Reconstruction of ΣΠΣ(k) circuits: quasi-polynomial time algorithm • Reconstruction of read-k depth-3 circuits (every variable appears in at most k linear functions): polynomial time algorithm • Corollary: polynomial time reconstruction of multilinear ΣΠΣ(k) circuits
Comparison to previous results • Poly-time reconstruction of sparse polynomials (depth-2 circuits) [Ben-Or Tiwari], [Grigoriev Karpinski Singer], ..., [Klivans Spielman] • More generally: randomized reconstruction of polynomials whose "ordered" partial derivatives span a low dimensional space [Beimel Bergadano Bshouty Kushilevitz Varricchio], [Klivans S] • Reconstruction of Read-Once arith. formulas: • Poly-time randomized [Hancock Hellerstein], [Hancock Hellerstein Bshouty], [Bshouty Bshouty]. • Sub-exponential deterministic [S Volkovich] • Reconstruction of a circuit class C ⇒ ZPEXP^RP ⊄ C [Fortnow Klivans]
Proof technique • Proof combines and extends several previous works: • Theorem on structure of zero ΣΠΣ(k) circuits [Dvir S] • Black-box zero testing of ΣΠΣ(k) circuits [Karnin S] • Reconstruction of ΣΠΣ(2) circuits [S] • Today: first give background on depth-3 circuits, then ΣΠΣ(2) circuits and finally (hopefully) cover ΣΠΣ(k) circuits
What's next: • Structural theorem for zero ΣΠΣ(k) • Reconstruction of ΣΠΣ(2) • Reconstruction of ΣΠΣ(k)
More on depth-3 circuits • Depth-3 = sum of products of linear functions • Li,j = Σ_{t=1...n} at·xt + a0 • Mi = Π_{j=1...di} Li,j • C = Σ_{i=1...k} Mi • Mi = multiplication gate = product of lin. functions • deg(C) = max_{i=1...k} deg(Mi) • gcd(C) = greatest common divisor of mult. gates = g.c.d.(M1,M2,...,Mk) • Note: gcd(C) = product of linear functions • Simplification: sim(C) := C/gcd(C), also depth-3 • Main def: rank(C) = dimension of span of linear functions in sim(C)
Example • C = x^2·(y+x)·(z-x-y) + x·y·(z-x-y) - 2(z-x-y)·x^2 • M1 = x^2·(y+x)·(z-x-y) • M2 = x·y·(z-x-y) • M3 = -2(z-x-y)·x^2 • deg(C) = 4 • gcd(C) = x·(z-x-y) • sim(C) = C/gcd(C) = x·(y+x) + y - 2x • rank(C) = dim(span{x, y+x, y, -2x}) = 2 • Note: without removing gcd, rank is 3. • Why define rank this way?
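The two rank values in the example can be checked mechanically by writing each linear function as a coefficient vector in the order (x, y, z) and running Gaussian elimination. This is a sketch over the rationals; the helper name rank and the vector encoding are assumptions made for illustration.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of coefficient vectors, by Gaussian elimination over Q."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    r = 0  # number of pivots found so far
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear this column in every other row.
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


# Coefficient vectors in the order (x, y, z)
x = (1, 0, 0)
y = (0, 1, 0)
y_plus_x = (1, 1, 0)
minus_2x = (-2, 0, 0)
z_minus_x_minus_y = (-1, -1, 1)

sim_functions = [x, y_plus_x, y, minus_2x]           # linear functions in sim(C)
all_functions = sim_functions + [z_minus_x_minus_y]  # before removing gcd(C)

rank_sim = rank(sim_functions)  # rank(C) as defined on the slide
rank_all = rank(all_functions)  # rank if the gcd is not removed
```

The gap between the two values comes only from the common factor (z-x-y), which is what dividing by gcd(C) removes.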
Zero depth-3 circuits • C is zero: if C computes the zero polynomial • C is minimal: if no proper subset of multiplication gates sums to zero • Structural theorem: if a degree d ΣΠΣ(k) circuit C is minimal and zero then rank(C) = O(log(d)^{k-2}) • Note: rank of an arbitrary ΣΠΣ(k) circuit can be n
What is it good for? • Black-Box polynomial identity testing of (k) circuits in quasi-polynomial time [Karnin S] • Implies uniqueness: Corollary: If f is computed by a minimal (k) circuit C of rank (log(d)2k-2), then C is the unique(k) circuit for f. • Will play an important role later
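The uniqueness corollary can be traced back to the structural theorem in three lines. This is a sketch only: it ignores the minimality bookkeeping (sub-circuits of C − C' that cancel on their own) and the constants hidden in the O(·) and Ω(·); the name C' for a second circuit computing f is introduced here for illustration.

```latex
% Sketch: C and C' are two \Sigma\Pi\Sigma(k) circuits of degree d computing f.
\begin{align*}
  C - C' &\equiv 0
    && \text{a zero } \Sigma\Pi\Sigma(2k) \text{ circuit} \\
  \operatorname{rank}(C - C') &= O\!\left(\log(d)^{2k-2}\right)
    && \text{structural theorem with top fan-in } 2k \text{ (if minimal)} \\
  \operatorname{rank}(C) &= \Omega\!\left(\log(d)^{2k-2}\right)
    && \text{assumption of the corollary}
\end{align*}
% For a large enough hidden constant the last two lines are incompatible,
% so the difference cannot be a minimal zero circuit of high rank:
% the gates of C and C' must cancel against each other.
```

The exponent 2k-2 is the structural theorem's k-2 with k replaced by 2k, which is why the threshold for ΣΠΣ(2) is log(d)^2.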
What's next: • Structural theorem for zero ΣΠΣ(k) • Reconstruction of ΣΠΣ(2) • Reconstruction of ΣΠΣ(k)
Reconstruction of ΣΠΣ(2) • Input: Black-Box holding a ΣΠΣ(2) circuit • C = M1 + M2 = L1(X)·L2(X)···Ld(X) + L'1(X)·L'2(X)···L'd(X) • Goal: Reconstruct C using a few queries. • Two different cases: • C is of low rank (i.e. rank(C) ≤ log(d)^2) • C is of high rank (i.e. rank(C) ≥ log(d)^2)
High rank case • High level idea: if C = M1 = L1(X)·L2(X)···Ld(X) then reconstruction = factoring. E.g. can use [Kaltofen] • Problem: C = M1 + M2 • Idea: eliminate M2 by an appropriate restriction to a co-dim 1 subspace (i.e. make L'1 vanish) • Problems: • How do we find such a subspace? • How do we reconstruct M1? (we only have its restriction)
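The elimination idea can be seen on a toy instance: once the black box is restricted to the hyperplane where one linear function of M2 vanishes, its values coincide with those of M1. In this sketch the gates m1 and m2, the field size P and the parametrization of the hyperplane are all assumptions made up for illustration; a real learner does not know L'1 and has to search for it.

```python
import random

P = 10007  # assumed prime field size, for illustration only


def m1(x):
    """M1 = (x1 + x2 + 1)(x3 + 2)."""
    return (x[0] + x[1] + 1) * (x[2] + 2) % P


def m2(x):
    """M2 = (x1 - x3)(x2 + 5); the function to kill is L'1 = x1 - x3."""
    return (x[0] - x[2]) * (x[1] + 5) % P


def black_box(x):
    """The only access the learner has: C = M1 + M2."""
    return (m1(x) + m2(x)) % P


def restrict_to_hyperplane(f):
    """Restrict f to the co-dim 1 subspace {x1 = x3}, parametrized by (u, v)."""
    return lambda u, v: f((u, v, u))


restricted = restrict_to_hyperplane(black_box)
rng = random.Random(0)
samples = [(rng.randrange(P), rng.randrange(P)) for _ in range(50)]

# On the hyperplane the black box agrees with M1, so factoring it exposes
# the (restricted) linear functions of M1.
agrees_on_hyperplane = all(restricted(u, v) == m1((u, v, u)) for u, v in samples)
# Off the hyperplane the two generally differ.
differs_elsewhere = black_box((1, 0, 0)) != m1((1, 0, 0))
```

The second problem on the slide is visible here too: factoring the restricted black box yields only M1 restricted to {x1 = x3}, not M1 itself.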
High rank case cont. • Input: Black-Box holding a ΣΠΣ(2) circuit C = M1 + M2 = L1(X)·L2(X)···Ld(X) + L'1(X)·L'2(X)···L'd(X) • Goal: Reconstruct C using a few queries. • Idea: eliminate M2 by restriction to a subspace (i.e. make L'1 vanish) • Problems: How to find such a subspace? How do we reconstruct M1? • Basic approach: First learn C|V for a low dimensional V. Then "lift" C|V to C. • Intuition: If V is of low dimension then we can use brute-force search to eliminate M2. • Problems: • Computing M1|V. Solution: eliminate M2 in many ways... • Lifting C|V to C. Requires high rank; solve using the structural theorem
High rank case cont. • C = M1 + M2 = L1(X)·L2(X)···Ld(X) + L'1(X)·L'2(X)···L'd(X) • High level algorithm: • 1. Restrict the circuit to a random subspace V. • 2. Guess linearly independent linear functions L'1,...,L't from M2|V • 3. Restrict further to Vi = V|_{L'i=0} • 4. Learn M1|Vi by factoring • 5. Glue the different factors together • 6. Lift the circuit found in step 5
Gluing different factors • The Problem: we want to find N = Π_{i=1...d} Li • Input: N1 = N|_{x1=0}, ..., Nt = N|_{xt=0}, for large t • We want to reconstruct the coefficient matrix from its deck of column-deleted sub-matrices: • L1 = a1,1·x1 + ... + a1,t·xt + a1,0 • L2 = a2,1·x1 + ... + a2,t·xt + a2,0 • ... • Ld = ad,1·x1 + ... + ad,t·xt + ad,0
Gluing different factors • The Problem: we want to find N = Π_{i=1...d} Li • Input: N1 = N|_{x1=0}, ..., Nt = N|_{xt=0}, for large t • Idea: find L in N1 and L' in N2 that agree on coordinates 3,4,... and glue them together • Problem: maybe many such L'
Gluing different factors • The Problem: we want to find N = Π_{i=1...d} Li • Input: N1 = N|_{x1=0}, ..., Nt = N|_{xt=0} • Idea: find L ∈ N1, L' ∈ N2 that agree on coordinates 3,4,5,...,t and glue them • Problem: maybe many such L' • Look at (**01) • Hard to tell which of the 4 values is missing (0001)
Gluing different factors • The Problem: we want to find N = Π_{i=1...d} Li • Input: N1 = N|_{x1=0}, ..., Nt = N|_{xt=0} • Idea: find L ∈ N1, L' ∈ N2 that agree on coordinates 3,4,5,...,t and glue them • Problem: maybe many such L' • Idea: find L for which there is a unique L' • Problem: why does such an L exist? • Proof: isoperimetric inequality/information theory/lower bounds for locally-decodable-codes...
Gluing different factors • The Problem: we want to find N = Π_{i=1...d} Li • Input: N1 = N|_{x1=0}, ..., Nt = N|_{xt=0} • Idea: find L ∈ N1, L' ∈ N2 that agree on coordinates 3,4,5,...,t and glue them • Problem: maybe many such L' • Idea: find L for which there is a unique L' • Problem: why does such an L exist? • Claim: If there is no such L then the rows give a subset of {0,1}^t with too many edges (isoperimetric ineq.)
Back to gluing different factors • The Problem: we want to find N = Π_{i=1...d} Li • Input: N1 = N|_{x1=0}, ..., Nt = N|_{xt=0} • Idea: find L ∈ N1, L' ∈ N2 that agree on coord. 3,4,5,...,t and glue them • Problem: maybe many such L' • Idea: find L for which there is a unique L' • Problem: why does such an L exist? • Claim: if there is no such L then the set of rows has too many edges • Proof: Consider L. If ∀i ∃Li ≠ L in Ni agreeing on all other coordinates then L has a neighbor in the i'th coordinate. If t is large then we have too many edges
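The "too many edges" step can be spelled out as a counting argument. This is a sketch under the slide's simplification that the d rows form a subset S of the hypercube {0,1}^t; the symbols S and e(S) are introduced here for illustration, and the general case over a field F needs the more careful argument the talk alludes to.

```latex
% S \subseteq \{0,1\}^t: the set of rows, so |S| \le d.
% e(S): number of pairs of rows in S that differ in exactly one coordinate.
\begin{align*}
  e(S) &\ge \tfrac{1}{2}\, t\, |S|
    && \text{if every row has a neighbor in every coordinate} \\
  e(S) &\le \tfrac{1}{2}\, |S| \log_2 |S|
    && \text{edge-isoperimetric inequality on the hypercube} \\
  \Longrightarrow\quad t &\le \log_2 |S| \le \log_2 d
\end{align*}
% Hence for t > \log_2 d some row L has a coordinate i with no neighbor,
% i.e. no other row agrees with L on all coordinates except i.
```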
Lifting the circuit • So far: reconstructed M1|V. Implies reconstruction of C|V. Need to lift C|V. • Idea: Learn C on many low-dimensional subspaces • Let Vi = span{V, ei}. • Find C|Vi for i=1...n. • Glue the circuits together. • Problem: Maybe the circuits cannot be glued (i.e. many different equivalent ΣΠΣ(2) circuits) • Structural Theorem implies we can glue (in the high rank case)
Lifting the circuit • Idea: Assume we can learn C|V for low dim. V. • Let Vi = span{V, ei}. • Find C|Vi for i=1...n. • Glue the circuits together. • Problem: Maybe we cannot glue (e.g. many different equivalent circuits) • Claim: If rank(C' := C|V) ≥ log(d)^2 then C' is unique • Proof: Assume C' ≡ C'' as polynomials. Then C' - C'' ≡ 0. By the structural theorem rank(C' - C'') < log(d)^2, a contradiction. • Corollary: ∀i, (C|Vi)|V ≡ C|V. Glue together linear functions that look the same on V. • Fact: we succeed w.h.p. over the choice of V
Low rank case • C = M1 + M2 = L1(X)·L2(X)···Ld(X) + L'1(X)·L'2(X)···L'd(X) • dim(span{L1,...,Ld,L'1,...,L'd}) ≤ log(d)^2 • Observation: C can be written as a polynomial in log(d)^2 linear functions. • Reconstruction idea: find those log(d)^2 linear functions, and then do interpolation to find C. • Problem: Finding the relevant linear functions.
Low rank case cont. • C = M1 + M2 = L1(X)·L2(X)···Ld(X) + L'1(X)·L'2(X)···L'd(X) • dim(span{L1,...,Ld,L'1,...,L'd}) ≤ log(d)^2 • Observation: C is a polynomial in log(d)^2 linear functions. • Reconstruction idea: find those log(d)^2 lin. functions, and interpolate C • Problem: Finding the relevant linear functions. • Algorithm sketch: • Pick a random subspace V of dimension 2·log(d)^2. • By brute force, find a basis {Li}_{i=1...r} (over V) for C|V • Find a polynomial Q s.t. C|V = Q(L1,...,Lr) • Fact: ∃! {L~i}_{i=1...r} over F^n s.t. C = Q(L~1,...,L~r) and L~i|V = Li • Lift {Li}_{i=1...r} to this basis {L~i}_{i=1...r} for C over F^n
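The "find Q by interpolation" step is easiest to see in the extreme case r = 1, where C = Q(L) for a single known linear function L and Q is univariate. The sketch below uses plain Lagrange interpolation over a prime field; the hidden circuit, the field size P and the helper lagrange_interpolate are assumptions for illustration, and the real algorithm needs multivariate interpolation in about log(d)^2 linear functions.

```python
P = 10007  # assumed prime field, for illustration only


def lagrange_interpolate(points, p=P):
    """Coefficients (low degree first) of the unique polynomial of degree
    < len(points) passing through the given (t, value) pairs, modulo p."""
    n = len(points)
    coeffs = [0] * n
    for i, (ti, vi) in enumerate(points):
        basis = [1]  # running product of (T - tj) over j != i
        denom = 1    # running product of (ti - tj) over j != i
        for j, (tj, _) in enumerate(points):
            if j == i:
                continue
            new = [0] * (len(basis) + 1)
            for k, c in enumerate(basis):
                new[k] = (new[k] - tj * c) % p      # constant part of (T - tj)
                new[k + 1] = (new[k + 1] + c) % p   # T part of (T - tj)
            basis = new
            denom = denom * (ti - tj) % p
        scale = vi * pow(denom, -1, p) % p  # modular inverse (Python 3.8+)
        for k, c in enumerate(basis):
            coeffs[k] = (coeffs[k] + scale * c) % p
    return coeffs


def black_box(x1, x2):
    """Hidden rank-1 circuit: C = L^3 + 2 L (L + 1) with L = x1 + 3 x2."""
    L = (x1 + 3 * x2) % P
    return (pow(L, 3, P) + 2 * L * (L + 1)) % P


d = 3
# Query points on which L takes the values 0, 1, ..., d: x = (t, 0) gives L = t.
points = [(t, black_box(t, 0)) for t in range(d + 1)]
Q = lagrange_interpolate(points)  # coefficients of Q(T) = T^3 + 2 T^2 + 2 T
```

With Q in hand, C is recovered as Q(L); the hard part, as the slide says, is finding L in the first place.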
Reconstruction algorithm • Restrict to a random subspace • Guess high rank or low rank: • high rank: learn one multiplication gate by looking at many restrictions to co-dim 1 subspaces, factoring the restricted circuit and gluing; then find the second gate by factoring • low rank: guess a basis and reconstruct • Lift the dimension by 1: • high rank: reconstruct using uniqueness • low rank: guess the lift of the basis • Verify using identity testing
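The final verification step compares the reconstructed circuit with the black box. The sketch below swaps in the standard randomized Schwartz-Zippel test rather than the deterministic black-box identity test of [Karnin S] that the talk relies on; the function probably_equal, the field size P and the example polynomials are assumptions for illustration.

```python
import random

P = 2**31 - 1  # assumed large prime field, for illustration only


def probably_equal(f, g, n_vars, trials=20, seed=0):
    """Randomized black-box check that f and g agree as polynomials.

    By the Schwartz-Zippel lemma, two distinct polynomials of total degree d
    agree at a uniformly random point of F_P^n with probability at most d / P,
    so agreement on every trial is strong evidence of equality.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.randrange(P) for _ in range(n_vars)]
        if f(x) % P != g(x) % P:
            return False  # found a witness point: the polynomials differ
    return True


# Black box: (x1 + x2)(x1 - x2) + x3^2
target = lambda x: ((x[0] + x[1]) * (x[0] - x[1]) + x[2] * x[2]) % P
# A correct reconstruction, written differently: x1^2 - x2^2 + x3^2
good_guess = lambda x: (x[0] * x[0] - x[1] * x[1] + x[2] * x[2]) % P
# A wrong reconstruction: x1^2 + x2^2 + x3^2
bad_guess = lambda x: (x[0] * x[0] + x[1] * x[1] + x[2] * x[2]) % P

accepts_good = probably_equal(target, good_guess, n_vars=3)
accepts_bad = probably_equal(target, bad_guess, n_vars=3)
```

A False answer is always correct, while a True answer can be wrong with small probability, which is why the guessing steps of the algorithm can safely be filtered this way.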
What's next: • Structural theorem for zero ΣΠΣ(k) • Black Box PIT for ΣΠΣ(k) • Reconstruction of ΣΠΣ(k)
Higher values of k • Bad news: the work so far is just a warm up... • How to generalize to ΣΠΣ(k)? • Two possible problems: • For k=2 we had different algorithms for low rank and high rank • A ΣΠΣ(k) circuit may be the sum of low rank and high rank circuits • In the case of high rank we "singled out" one gate • How do we single out a gate when there are many gates? Is it possible? • Need to understand the algorithm better
New ideas • Canonical circuits: • We define a distance function for multiplication gates • Cluster "close by" multiplication gates • A cluster has low rank (after removing g.c.d.) • Theorem: every ΣΠΣ(k) circuit can be written uniquely as a sum of clusters. • Note: a ΣΠΣ(2) circuit is either low rank (one cluster) or high rank (two clusters) • Isolation lemma: can find many restrictions that eliminate all clusters but one.
Canonical circuits • Notation: C|I is the sum of the gates in I • Clustering lemma: if C is ΣΠΣ(k) then ∃ a partition I1 ⊔ I2 ⊔ ... ⊔ Im = [k] s.t. • rank(C|Ij) ≤ log(d)^{a(k)} • ∀ l ≠ j: dist(C|Il, C|Ij) := rank(C|Il + C|Ij) ≥ log(d)^{A(k)} • Intuitively: A cluster is more "robust" than a multiplication gate • Theorem: If C = C1 + ... + Cr and C = C'1 + ... + C't are two clustered representations of f, then ∃ a permutation π s.t. Ci = C'π(i) (as polynomials) • Corollary: ∃ a unique canonical representation for C
More on canonical circuits • Def: V is D-rank-preserving for C if: • no two linearly independent linear functions in C are linearly dependent on V • rank(C|I restricted to V) ≥ min{rank(C|I), D} • Intuitively: linear functions remain as independent as possible • Theorem: If V is (log(d)^{2r-2})-rank-preserving for C and C = C1 + ... + Cr is the canonical circuit for C then C|V = C1|V + ... + Cr|V is the (unique) canonical circuit for C|V • Corollary: if we reconstruct the restrictions of the clusters to V, {Ci|V}, and lift each cluster separately then we get C. • Note: this is similar in nature to the previous algorithm.
ΣΠΣ(2) revisited • Algorithm has the following form: • restrict to a (log(d)^2)-rank-preserving subspace V • reconstruct the canonical circuit of C|V: • one cluster if the rank is low • two clusters if the rank is high • Lift each cluster separately to F^n • We shall generalize this view of the algorithm • Need to show how to learn a cluster
Isolation lemma • Theorem: ∃ many "high"-dimensional subspaces Vi ⊂ V on which all but one cluster vanish. • Proof: main technical difficulty of the paper (generalizes the main lemma of the structural theorem) • Theorem: Given restrictions of a cluster {C1|Vi} there exists an efficient gluing algorithm that outputs C1|V. • Theorem: Lifting is possible due to uniqueness • Corollary: If we can find those subspaces then we can learn C. • Question: how to find such subspaces?
Separating the clusters • Assume C|V is on poly(log n) variables. • Question: how to single out a cluster in C? • Claim: ∃ many "high"-dimensional subspaces on which all but one cluster vanish. • Claim: If we can find those subspaces then we can learn the special cluster • Corollary: If we can find such a subspace then we can learn C. • Question: how to find such a subspace? • Answer: Go over all possible subspaces. Requires exp(poly(log n)) time • Question: how to verify that we have a cluster? • Answer: be patient and wait till the end...
The reconstruction algorithm • Let V be a random poly(log(n)) dimensional subspace. Consider C|V • Guess subspaces V1,...,Vt' ⊆ V • Assume C|Vi is a uni-cluster circuit • Learn C|Vi by low-rank reconstruction • Glue {C|Vi} with the gluing algorithm, to get C1|V • Recursively learn C'|V = C|V - C1|V • Verify correctness by Black-Box identity testing • Lift each cluster, separately, to C
Final Remarks • We can make the above algorithm deterministic (i.e. we can find a rank-preserving subspace efficiently). • Can we break the O(1) (or actually o(n)) barrier on the number of multiplication gates? (both for identity testing and reconstruction)
A concrete open problem • Tightness of the structural thm: is it true that if C ≡ 0 is a simple and minimal ΣΠΣ(3) circuit then rank(C) = O(1)? • (over characteristic zero!) • Namely, if Π_i Ai(x) + Π_i Bi(x) + Π_i Ci(x) = 0, with no g.c.d., then rank{Ai,Bj,Cl} = O(1)? • If true then we get a Black-Box PIT in poly. time for ΣΠΣ(k) (over char. 0) • If true over a finite field, then it implies poly time reconstruction.