Adaptive annealing: a near-optimal connection between sampling and counting. Daniel Štefankovič (University of Rochester), Santosh Vempala, Eric Vigoda (Georgia Tech)
Counting: independent sets, spanning trees, matchings, perfect matchings, k-colorings.
(Approx) counting ⇐ sampling: Valleau, Card '72 (physical chemistry), Babai '79 (for matchings and colorings), Jerrum, Valiant, V. Vazirani '86. The outcome of the JVV reduction: random variables X_1, X_2, ..., X_t such that 1) E[X_1 X_2 ... X_t] = "WANTED", and 2) the X_i are easy to estimate: the squared coefficient of variation (SCV) V[X_i]/E[X_i]^2 = O(1).
(Approx) counting ⇐ sampling. Given random variables X_1, ..., X_t with 1) E[X_1 X_2 ... X_t] = "WANTED" and 2) V[X_i]/E[X_i]^2 = O(1): Theorem (Dyer–Frieze '91): O(t^2/ε^2) samples (O(t/ε^2) from each X_i) give a (1 ± ε)-estimator of "WANTED" with probability ≥ 3/4.
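Below is a minimal sketch (my own illustration, not from the talk) of the Dyer–Frieze product estimator: each E[X_i] is estimated by an empirical mean over O(t/ε^2) samples, and the t means are multiplied. The toy samplers are uniform on [0,1], so the true product is 0.5^10 and each SCV is 1/3 = O(1).

```python
# Sketch of the Dyer-Frieze product estimator: estimate each E[X_i] by an
# empirical mean over O(t/eps^2) samples and multiply the t means.
import random

def product_estimator(samplers, eps):
    """samplers: list of zero-argument functions, each drawing one X_i >= 0."""
    t = len(samplers)
    m = int(4 * t / eps ** 2)            # O(t/eps^2) samples per variable
    est = 1.0
    for draw in samplers:
        est *= sum(draw() for _ in range(m)) / m
    return est

random.seed(0)
samplers = [random.random for _ in range(10)]   # E[X_i] = 1/2, SCV = 1/3
print(product_estimator(samplers, eps=0.2))     # close to 0.5**10
```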
JVV for independent sets. GOAL: given a graph G, estimate the number of independent sets of G. Key identity: 1/(# independent sets) = P(S = ∅) for a uniformly random independent set S of G.
JVV for independent sets, using P(A∩B) = P(A) P(B|A): order the vertices v_1, ..., v_n; then P(S = ∅) = P(v_1 ∉ S) P(v_2 ∉ S | v_1 ∉ S) ... = E[X_1] E[X_2] E[X_3] E[X_4] ..., where X_i is the indicator that v_i ∉ S given v_1, ..., v_{i-1} ∉ S. Each X_i ∈ [0,1] with E[X_i] ≥ ½, so V[X_i]/E[X_i]^2 = O(1).
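A brute-force check (my own construction) of this telescoping on a concrete graph: for a path on 4 vertices, the product of the conditional probabilities equals 1/#IS, and every factor is at least ½.

```python
# Exact check of the JVV telescoping for independent sets:
# 1/#IS(G) = prod_i P(v_i not in S | v_1..v_{i-1} not in S).
from itertools import combinations

def independent_sets(vertices, edges):
    for r in range(len(vertices) + 1):
        for s in combinations(vertices, r):
            if not any(u in s and v in s for u, v in edges):
                yield set(s)

V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3)]            # path on 4 vertices, 8 independent sets

prod = 1.0
for i, v in enumerate(V):
    # condition on v_1, ..., v_{i-1} all lying outside S
    cond = [s for s in independent_sets(V, E) if not (s & set(V[:i]))]
    p = sum(1 for s in cond if v not in s) / len(cond)   # E[X_i]
    assert p >= 0.5                      # each factor is at least 1/2
    prod *= p

num_is = sum(1 for _ in independent_sets(V, E))
print(prod, 1 / num_is)                 # both are 1/8 for this graph
```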
JVV: if we have a sampler oracle (input: a graph G; output: a uniformly random independent set of G), then there is an FPRAS using O(n^2) samples.
ŠVV: if instead we have a sampler oracle for the Gibbs distribution (input: a graph G and an inverse temperature β; output: a set drawn from the gas-model Gibbs distribution at β), then there is an FPRAS using O*(n) samples.
Application – independent sets. O*(|V|) samples suffice for counting. Cost per sample (Vigoda '01, Dyer–Greenhill '01): time O*(|V|) for graphs of maximum degree ≤ 4. Total running time: O*(|V|^2).
Other applications (total running time): matchings O*(n^2 m) (using Jerrum, Sinclair '89); spin systems, e.g. the Ising model, O*(n^2) for β < β_C (using Marinelli, Olivieri '95); k-colorings O*(n^2) for k > 2Δ (using Jerrum '95).
easy = hot; hard = cold
Hamiltonian (figure: configurations arranged by energy level, H = 4, 2, 1, 0)
Big set Ω, Hamiltonian H: Ω → {0,...,n}. Goal: estimate |H^{-1}(0)|. Plan: |H^{-1}(0)| = |Ω| · E[X_1] ... E[X_t].
Distributions between hot and cold: β = inverse temperature; β = 0 is hot: uniform on Ω; β = ∞ is cold: uniform on H^{-1}(0); μ_β(x) ∝ exp(−βH(x)) (Gibbs distributions).
Distributions between hot and cold: μ_β(x) ∝ exp(−βH(x)), i.e., μ_β(x) = exp(−βH(x)) / Z(β). The normalizing factor is the partition function Z(β) = Σ_x exp(−βH(x)).
Partition function Z(β) = Σ_x exp(−βH(x)). Have: Z(0) = |Ω|. Want: Z(∞) = |H^{-1}(0)|.
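A toy check (mine) that Z(β) interpolates between the two counts, on independent sets of a small graph: Ω = all subsets of V, H(x) = number of edges inside x, so H^{-1}(0) is exactly the family of independent sets.

```python
# Z(0) = |Omega| and Z(beta) -> |H^{-1}(0)| as beta -> infinity.
from itertools import combinations
from math import exp

V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3)]

def H(x):                                # number of violated edges
    return sum(1 for u, v in E if u in x and v in x)

omega = [set(s) for r in range(len(V) + 1) for s in combinations(V, r)]

def Z(beta):
    return sum(exp(-beta * H(x)) for x in omega)

print(Z(0))                              # 16 = 2^|V| = |Omega|
print(Z(50))                             # ~8 = number of independent sets
```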
Assumption: we have a sampler oracle for μ_β(x) = exp(−βH(x)) / Z(β) (input: graph G and inverse temperature β; output: a subset of V drawn from μ_β). Given a sample W from μ_β, let X = exp(H(W)(β − β')). Then we can obtain the following ratio: E[X] = Σ_s μ_β(s) X(s) = Z(β') / Z(β).
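The identity E[X] = Z(β')/Z(β) can be verified exactly on a tiny instance; here the state space is summarized by toy counts a_k = |H^{-1}(k)| of my choosing.

```python
# Exact verification of E_beta[X] = Z(beta')/Z(beta)
# for X = exp(H(W)(beta - beta')), W ~ mu_beta.
from math import exp

a = [1, 4, 3]                            # toy counts a_k = |H^{-1}(k)|

def Z(b):
    return sum(ak * exp(-b * k) for k, ak in enumerate(a))

beta, beta2 = 0.5, 0.8
# E[X] = sum_k P(H(W)=k) * e^{k(beta-beta2)}, with P(H(W)=k) = a_k e^{-beta k}/Z(beta)
EX = sum(ak * exp(-beta * k) / Z(beta) * exp(k * (beta - beta2))
         for k, ak in enumerate(a))
print(EX, Z(beta2) / Z(beta))            # the two numbers agree
```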
Our goal restated. Partition function: Z(β) = Σ_x exp(−βH(x)). Goal: estimate Z(∞) = |H^{-1}(0)|. Telescope: Z(∞) = Z(0) · (Z(β_1)/Z(β_0)) (Z(β_2)/Z(β_1)) ... (Z(β_t)/Z(β_{t-1})), where β_0 = 0 < β_1 < β_2 < ... < β_t = ∞.
Our goal restated. Cooling schedule: β_0 = 0 < β_1 < β_2 < ... < β_t = ∞. How to choose the cooling schedule? Minimize its length t while keeping every step easy: E[X_i] = Z(β_i)/Z(β_{i-1}) with V[X_i]/E[X_i]^2 = O(1).
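A numerical illustration (toy a_k of my choosing) of the telescoping product and of the per-step quantity E[X_i^2]/E[X_i]^2 = Z(2β_i − β_{i-1}) Z(β_{i-1}) / Z(β_i)^2 that the schedule must keep bounded (the SCV is this ratio minus 1).

```python
from math import exp, inf

a = [1, 4, 6, 4, 1]                      # toy a_k, so A = Z(0) = 16 and a_0 = 1

def Z(b):
    if b == inf:
        return a[0]
    return sum(ak * exp(-b * k) for k, ak in enumerate(a))

schedule = [0.0, 0.5, 1.0, 2.0, 4.0, inf]
prod = Z(0)
for prev, cur in zip(schedule, schedule[1:]):
    prod *= Z(cur) / Z(prev)
    if cur != inf:                        # E[X^2]/E[X]^2 for this step
        ratio = Z(2 * cur - prev) * Z(prev) / Z(cur) ** 2
        print(f"step {prev} -> {cur}: E[X^2]/E[X]^2 = {ratio:.3f}")
print(prod, a[0])                         # telescoping recovers Z(inf) = a_0
```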
Parameters: A and n. Z(β) = Σ_x exp(−βH(x)) = Σ_{k=0}^n a_k e^{−βk}, where a_k = |H^{-1}(k)|; Z(0) = A; H: Ω → {0,...,n}.
Parameters: Z(0) = A, H: Ω → {0,...,n}.

                      A        n
independent sets      2^|V|    |E|
matchings             |V|!     |V|
perfect matchings     |V|!     |V|
k-colorings           k^|V|    |E|
Previous cooling schedules (Z(0) = A, H: {0,...,n}, β_0 = 0 < β_1 < ... < β_t = ∞). "Safe steps": β → β + 1/n; β → β(1 + 1/ln A); β = ln A → β = ∞ (Bezáková, Štefankovič, Vigoda, V. Vazirani '06). These yield cooling schedules of length O(n ln A) and O((ln n)(ln A)) (Bezáková, Štefankovič, Vigoda, V. Vazirani '06).
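A sketch of a fixed schedule in this spirit, under my reading of the safe steps (one additive step 0 → 1/n, multiplicative steps up to ln A, then the jump to ∞); the cutoffs are illustrative, not the paper's exact constants.

```python
from math import log, inf

def safe_schedule(n, A):
    """Fixed schedule of length O((ln n)(ln A)) built from 'safe steps'."""
    lnA = log(A)
    schedule, beta = [0.0], 1.0 / n      # one additive step: 0 -> 1/n
    while beta < lnA:                    # multiplicative safe steps
        schedule.append(beta)
        beta *= 1.0 + 1.0 / lnA
    schedule += [lnA, inf]               # final safe step: ln A -> infinity
    return schedule

s = safe_schedule(n=100, A=2.0 ** 100)
print(len(s))    # a few hundred steps, vs ~n ln A for additive-only steps
```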
No better fixed schedule is possible (Z(0) = A, H: {0,...,n}): a single schedule that works for all of the partition functions Z_a(β) = (A/(1+a)^n)(1 + a e^{−β})^n (with a ∈ [0, A^{1/n} − 1]) must have LENGTH Ω((ln n)(ln A)).
Parameters: Z(0) = A, H: {0,...,n}. Our main result: we can get an adaptive schedule of length O*((ln A)^{1/2}). Previously: non-adaptive schedules of length Ω*(ln A).
Existential part. Lemma: for every partition function there exists a cooling schedule of length O*((ln A)^{1/2}).
Express the SCV using the partition function (going from β to β'): for W ~ μ_β and X = exp(H(W)(β − β')), E[X] = Z(β')/Z(β), and C = E[X^2]/E[X]^2 = Z(2β'−β) Z(β) / Z(β')^2.
Proof: with f(β) = ln Z(β), the condition C = E[X^2]/E[X]^2 = Z(2β'−β) Z(β) / Z(β')^2 reads (f(β) + f(2β'−β))/2 − f(β') ≤ C' := (ln C)/2, a constraint on the convex function f at β', the midpoint of β and 2β'−β.
Properties of f(β) = ln Z(β): f is decreasing, f is convex, f'(0) ≥ −n, f(0) = ln A. Proof idea: in every step of the schedule, either f or ln |f'| must change a lot, and f can decrease by at most ln A in total while ln |f'| has range about ln(n ln A); the lemma below makes the trade-off precise.
Lemma: a convex, decreasing f: [a,b] → R can be "approximated" using √( (f(a) − f(b)) · ln( f'(a)/f'(b) ) ) segments.
Technicality: getting to 2β'−β. The SCV condition involves Z at the point 2β_{i+1} − β_i, which can overshoot the later schedule points β_{i+1}, β_{i+2}, β_{i+3}, ... (figure: positions of β_i, β_{i+1}, β_{i+2}, β_{i+3} and 2β'−β on the β-axis). Handling this costs ln ln A extra steps.
Existential → algorithmic: not only does an adaptive schedule of length O*((ln A)^{1/2}) exist, we can also construct one.
Algorithmic construction. Our main result: using a sampler oracle for μ_β(x) = exp(−βH(x))/Z(β), we can construct a cooling schedule of length ≤ 38 (ln A)^{1/2} (ln ln A)(ln n), with total number of oracle calls ≤ 10^7 (ln A)(ln ln A + ln n)^7 ln(1/δ).
Algorithmic construction: let β be the current inverse temperature. Ideally we move to a β' such that B_1 ≤ E[X^2]/E[X]^2 ≤ B_2, where E[X] = Z(β')/Z(β). The upper bound B_2 keeps X "easy to estimate"; the lower bound B_1 > 1 ensures we make progress. Since E[X^2]/E[X]^2 = Z(2β'−β) Z(β) / Z(β')^2, we need to construct a "feeler" for this ratio; a naive feeler can be bad.
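A simplified sketch of one adaptive step, not the paper's algorithm: the exact partition function of a toy instance stands in for the sampling-based feeler, and a binary search finds β' with the ratio just below B_2 (staying above B_1 > 1 is what makes each step a unit of progress).

```python
from math import exp, inf

a = [1] + [100] * 20                     # toy a_k with a_0 = 1

def Z(b):
    return sum(ak * exp(-b * k) for k, ak in enumerate(a))

def feeler(beta, beta2):                 # = E[X^2]/E[X]^2 for the step beta -> beta2
    return Z(2 * beta2 - beta) * Z(beta) / Z(beta2) ** 2

def adaptive_schedule(B2=4.0, beta_max=50.0):
    schedule, beta = [0.0], 0.0
    while Z(beta) > 1.001 * a[0]:        # until the measure is nearly frozen
        lo, hi = beta, beta_max
        for _ in range(50):              # binary search; feeler is monotone in beta'
            mid = (lo + hi) / 2
            if feeler(beta, mid) > B2:
                hi = mid                 # too big a step
            else:
                lo = mid
        beta = lo
        schedule.append(beta)
    schedule.append(inf)
    return schedule

print(adaptive_schedule())               # a short schedule, then the jump to inf
```

Monotonicity of the feeler in β' follows from the convexity of f(β) = ln Z(β), which is what makes the binary search valid.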
Rough estimator for Z(β)/Z(β'). Since Z(β) = Σ_{k=0}^n a_k e^{−βk}, for W ~ μ_β we have P(H(W)=k) = a_k e^{−βk} / Z(β), and for U ~ μ_{β'} we have P(H(U)=k) = a_k e^{−β'k} / Z(β'). If the event H = k is likely at both β and β', this yields a rough estimator: e^{k(β'−β)} · P(H(U)=k) / P(H(W)=k) = Z(β)/Z(β').
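A Monte-Carlo sketch of this rough estimator on a toy instance (my choice of a_k; H(W) is sampled directly from its exact law rather than via a Gibbs sampler).

```python
import random
from math import exp

random.seed(1)
a = [1, 10, 40, 20, 5]                   # toy a_k = |H^{-1}(k)|

def Z(b):
    return sum(ak * exp(-b * k) for k, ak in enumerate(a))

def sample_H(b, m):
    """Draw m copies of H(W) for W ~ mu_b, using the exact weights a_k e^{-bk}."""
    w = [ak * exp(-b * k) for k, ak in enumerate(a)]
    return random.choices(range(len(a)), weights=w, k=m)

beta, beta2, k, m = 0.2, 0.4, 1, 200_000
pW = sample_H(beta, m).count(k) / m      # empirical P(H(W)=k)
pU = sample_H(beta2, m).count(k) / m     # empirical P(H(U)=k)
print(exp(k * (beta2 - beta)) * pU / pW, Z(beta) / Z(beta2))   # close
```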
Rough estimator over an interval: for W ~ μ_β, P(H(W) ∈ [c,d]) = Σ_{k=c}^d a_k e^{−βk} / Z(β). If |β − β'| · |d − c| ≤ 1, then e^{c(β'−β)} · P(H(U) ∈ [c,d]) / P(H(W) ∈ [c,d]) approximates Z(β)/Z(β') within a factor of e, i.e., it lies between (1/e) · Z(β)/Z(β') and e · Z(β)/Z(β'). We also need P(H(U) ∈ [c,d]) and P(H(W) ∈ [c,d]) to be large.
Split {0,1,...,n} into h ≈ 4(ln n)(ln A) intervals [0], [1], [2], ..., [c, c(1+1/ln A)], ... For any inverse temperature β there exists an interval I with P(H(W) ∈ I) ≥ 1/(8h). We say that I is HEAVY for β.
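A sketch under my own reading of the interval structure (singletons up to about ln A, then geometric blocks of ratio 1 + 1/ln A): build the split and find a heavy interval by pigeonhole.

```python
from math import exp, log

def make_intervals(n, A):
    """Singletons up to ~ln A, then geometric blocks of ratio (1 + 1/ln A)."""
    lnA = log(A)
    cut = int(lnA) + 1
    intervals = [(k, k) for k in range(min(cut, n + 1))]
    c = cut
    while c <= n:
        d = min(n, int(c * (1 + 1 / lnA)))
        intervals.append((c, d))
        c = d + 1
    return intervals

def heavy_interval(a, beta, intervals):
    """Return the interval carrying the most mass under mu_beta (pigeonhole: >= 1/h)."""
    Zb = sum(ak * exp(-beta * k) for k, ak in enumerate(a))
    probs = [sum(a[k] * exp(-beta * k) for k in range(c, d + 1)) / Zb
             for c, d in intervals]
    i = max(range(len(intervals)), key=probs.__getitem__)
    return intervals[i], probs[i]

a = [1] * 51                             # toy a_k with n = 50
ivs = make_intervals(n=50, A=sum(a))
print(len(ivs), heavy_interval(a, 0.3, ivs))
```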