Towards Efficient Sampling: Exploiting Random Walk Strategy

Wei Wei, Jordan Erenrich, and Bart Selman

Motivations • Recent years have seen tremendous improvements in SAT solving: from formulas with up to 300 variables (1992) to formulas with one million variables. • Various techniques exist for answering “does a satisfying assignment exist for a formula?” • But there are harder questions to be answered: “how many satisfying assignments does a formula have?” or, closely related, “can we sample from the satisfying assignments of a formula?”

Complexity • SAT is NP-complete. 2-SAT is solvable in linear time. • Counting assignments (even for 2cnf) is #P-complete, and is NP-hard to approximate (Valiant, 1979). • Approximate counting and sampling are equivalent if the problem is “downward self-reducible”.

Challenge • Can we extend SAT techniques to solve harder counting/sampling problems? • Such an extension would lead us to a wide range of new applications. [Diagram labels: SAT testing, logic inference, counting/sampling, probabilistic reasoning]

Standard Methods for Sampling - MCMC • Based on setting up a Markov chain with a predefined stationary distribution. • Draw samples from the stationary distribution by running the Markov chain for sufficiently long. • Problem: for interesting problems, the Markov chain takes exponential time to converge to its stationary distribution.

Simulated Annealing • Simulated annealing (SA) uses the Boltzmann distribution as its stationary distribution. • At low temperature, the distribution concentrates around minimum-energy states. • For satisfiability problems, each satisfying assignment (with cost 0) gets the same probability. • Again, reaching such a stationary distribution takes exponential time for interesting problems (shown in a later slide).
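A minimal sketch of one fixed-temperature SA (Metropolis) move on a CNF formula may make the slide above concrete. The clause encoding (lists of signed integers, DIMACS style), the cost function (number of unsatisfied clauses), and the names `num_unsat` and `sa_step` are assumptions made for illustration; they are not taken from the talk.

```python
import math
import random

def num_unsat(clauses, assign):
    """Cost of an assignment: the number of clauses with no true literal."""
    return sum(
        1 for clause in clauses
        if not any(assign[abs(lit)] == (lit > 0) for lit in clause)
    )

def sa_step(clauses, assign, temperature, rng):
    """One Metropolis move at a fixed temperature > 0 (modifies assign in place)."""
    var = rng.choice(sorted(assign))          # propose a uniformly random variable
    before = num_unsat(clauses, assign)
    assign[var] = not assign[var]             # tentatively flip it
    delta = num_unsat(clauses, assign) - before
    # Uphill moves are accepted with probability exp(-delta / T); otherwise undo.
    if delta > 0 and rng.random() >= math.exp(-delta / temperature):
        assign[var] = not assign[var]
    return assign
```

Because the proposal is symmetric and the acceptance rule is the Metropolis rule, the chain's stationary distribution is proportional to exp(-cost / T), so all zero-cost assignments receive equal probability.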

Standard Methods for Counting • Current solution counting procedures extend DPLL methods with component analysis. • Two counting procedures are available: relsat (Bayardo and Pehoushek, 2000) and cachet (Sang, Beame, and Kautz, 2004). Both count the exact number of solutions.

Question: Can state-of-the-art local search procedures be used for SAT sampling/counting (as alternatives to standard Markov chain Monte Carlo and DPLL methods)? Yes, as shown in this talk.

Our approach: biased random walk • Biased random walk = greedy bias + pure random walk. Example: WalkSat (Selman et al., 1994), effective on SAT. • Can we use it to sample from the solution space? • Does WalkSat reach all solutions? • How uniform is the sampling?
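The "greedy bias + pure random walk" idea can be sketched as follows. This is a simplified WalkSat-style loop, not the published implementation: the greedy move here minimizes the number of unsatisfied clauses after the flip, and the `noise` parameter, the clause encoding (signed integers), and all function names are assumptions for illustration.

```python
import random

def is_sat(clause, assign):
    """True if at least one literal of the clause is true under assign."""
    return any(assign[abs(lit)] == (lit > 0) for lit in clause)

def cost_after_flip(clauses, assign, var):
    """Number of unsatisfied clauses if var were flipped (assign is restored)."""
    assign[var] = not assign[var]
    cost = sum(1 for c in clauses if not is_sat(c, assign))
    assign[var] = not assign[var]
    return cost

def walksat(clauses, n_vars, noise=0.5, max_flips=10_000, seed=None):
    """Return a satisfying assignment, or None if none is found in max_flips."""
    rng = random.Random(seed)
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    for _ in range(max_flips):
        unsat = [c for c in clauses if not is_sat(c, assign)]
        if not unsat:
            return assign
        clause = rng.choice(unsat)            # focus on one violated clause
        if rng.random() < noise:
            var = abs(rng.choice(clause))     # pure random walk move
        else:
            # Greedy move: the variable in the clause whose flip leaves
            # the fewest unsatisfied clauses.
            var = min((abs(lit) for lit in clause),
                      key=lambda v: cost_after_flip(clauses, assign, v))
        assign[var] = not assign[var]
    return None
```

Running `walksat` repeatedly with different seeds and tallying the returned solutions is one way to examine how uniformly such a procedure covers the solution space.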

[Figure: WalkSat. Axis: Hamming distance. Annotations: "visited 500,000 times", "visited 60 times".]

Probability Ranges in Different Domains

Improving the Uniformity of Sampling • WalkSat: nonergodic; quickly reaches sinks. • SA: ergodic; slow convergence. • WalkSat + SA = SampleSat: ergodic; does not satisfy DBC (the detailed balance condition). • SampleSat: with probability p, the algorithm makes a biased random walk move; with probability 1-p, it makes an SA (simulated annealing) move.
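The mixing rule on this slide can be sketched in a few lines. Several details below are assumptions rather than the authors' implementation: the random walk move is reduced to its pure random part (a random variable from a random unsatisfied clause), the SA move is a fixed-temperature Metropolis move, the step falls back to an SA move when no clause is unsatisfied, and the names `samplesat_step` and `solution_visits` are hypothetical.

```python
import math
import random
from collections import Counter

def unsat_clauses(clauses, assign):
    """Clauses with no true literal under assign."""
    return [c for c in clauses
            if not any(assign[abs(lit)] == (lit > 0) for lit in c)]

def samplesat_step(clauses, assign, p, temperature, rng):
    """With probability p make a random walk move, otherwise an SA move."""
    unsat = unsat_clauses(clauses, assign)
    if unsat and rng.random() < p:
        # Random walk move: only possible while some clause is unsatisfied.
        var = abs(rng.choice(rng.choice(unsat)))
        assign[var] = not assign[var]
        return
    # SA move: Metropolis acceptance on the number of unsatisfied clauses.
    var = rng.choice(sorted(assign))
    assign[var] = not assign[var]
    delta = len(unsat_clauses(clauses, assign)) - len(unsat)
    if delta > 0 and rng.random() >= math.exp(-delta / temperature):
        assign[var] = not assign[var]         # reject the uphill move

def solution_visits(clauses, n_vars, p=0.5, temperature=0.1,
                    steps=30_000, seed=None):
    """Count how often each satisfying assignment is visited."""
    rng = random.Random(seed)
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    visits = Counter()
    for _ in range(steps):
        samplesat_step(clauses, assign, p, temperature, rng)
        if not unsat_clauses(clauses, assign):
            visits[tuple(assign[v] for v in sorted(assign))] += 1
    return visits
```

For example, `solution_visits([[1, 2]], 2, seed=0)` tallies visits to the three solutions of the single clause (x1 or x2); comparing the counts gives a rough picture of how uniform the sampling is on a toy formula.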

Comparison Between WalkSat and SampleSat [Figure with two panels: WalkSat and SampleSat]

[Figure: SampleSat. Axis: Hamming distance.]

Analysis

Property of F* • Proposition 1: SA with fixed temperature takes exponential time to find a solution of F*. • This shows that even for some simple formulas in 2cnf, SA cannot reach a solution in polynomial time.

Analysis, cont. • Proposition 2: Pure RW (random walk) reaches this solution with exponentially small probability.

SampleSat • In the SampleSat algorithm, we can divide the search into two stages. Before SampleSat reaches its first solution, it behaves like WalkSat.

SampleSat, cont. • After reaching the solution, the random walk component is turned off because all clauses are satisfied, so SampleSat behaves like SA. • Proposition 3: SA at zero temperature samples all solutions within a cluster uniformly. • This two-stage model explains why SampleSat samples more uniformly than random walk algorithms alone.

Verification on Larger Formulas - ApproxCount • Small formulas -> figures, solution frequencies. How do we verify on large formulas? ApproxCount. • ApproxCount approximates the number of solutions of Boolean formulas, based on the SampleSat algorithm. • Besides using it to justify the accuracy of our sampling approach, ApproxCount is interesting in its own right.

Algorithm • The algorithm works as follows (Jerrum, Valiant, and Vazirani, 1986): 1. Pick a variable X in the current formula. 2. Draw K samples from the solution space. 3. Set variable X to its most sampled value t; the multiplier for X is K/#(X=t). Note 1 ≤ multiplier ≤ 2. 4. Repeat steps 1-3 until all variables are set. 5. The number of solutions of the original formula is the product of all multipliers.
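The steps above can be sketched as a short program. To keep the example self-contained, the sampler is a brute-force uniform sampler over enumerated solutions, which is only a stand-in for SampleSat and works on tiny formulas only; the function names, the variable order, and the default K are likewise assumptions.

```python
import itertools
import random

def solutions(clauses, variables, fixed):
    """All satisfying assignments that agree with `fixed` (brute force)."""
    free = [v for v in variables if v not in fixed]
    found = []
    for bits in itertools.product([False, True], repeat=len(free)):
        assign = {**fixed, **dict(zip(free, bits))}
        if all(any(assign[abs(lit)] == (lit > 0) for lit in c) for c in clauses):
            found.append(assign)
    return found

def approx_count(clauses, n_vars, k=2000, seed=None):
    """Estimate the number of solutions as a product of per-variable multipliers."""
    rng = random.Random(seed)
    variables = list(range(1, n_vars + 1))
    fixed = {}
    estimate = 1.0
    for x in variables:                        # step 1: pick a variable
        pool = solutions(clauses, variables, fixed)
        if not pool:
            return 0.0                         # unsatisfiable formula
        samples = [rng.choice(pool) for _ in range(k)]   # step 2: K samples
        n_true = sum(s[x] for s in samples)
        t = 2 * n_true >= k                    # step 3: most sampled value
        hits = n_true if t else k - n_true
        estimate *= k / hits                   # multiplier lies in [1, 2]
        fixed[x] = t                           # fix X = t and continue
    return estimate
```

For the formula (x1 or x2) and (x3 or x4), which has 9 solutions, `approx_count([[1, 2], [3, 4]], 4, seed=0)` should return an estimate near 9. Each multiplier carries sampling error, and the errors multiply across variables.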

Accumulation of Errors

Within the Capacity of Exact Counters • We compare the results of ApproxCount with those of the exact counters.

And beyond … • We developed a family of formulas whose solutions are hard to count. • The formulas are based on SAT encodings of the following combinatorial problem. • Given n different items, choose from them a list (order matters) of m items (m ≤ n). Let P(n,m) denote the number of different lists that can be constructed. Then P(n,m) = n!/(n-m)!
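As a quick check of the formula, the sketch below computes P(n,m) directly and compares it with a brute-force enumeration of ordered lists for a small case. The function name `P` mirrors the slide's notation; the code is illustrative and not part of the original experiments.

```python
import math
from itertools import permutations

def P(n, m):
    """Number of ordered lists of m distinct items chosen from n items."""
    return math.factorial(n) // math.factorial(n - m)

# Brute-force check on a small case: ordered lists of 3 items out of 5.
assert P(5, 3) == len(list(permutations(range(5), 3)))

print(P(20, 10))  # 670442572800
```

The analytic value gives a ground truth against which an approximate count for an encoded instance can be compared.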

Hard Instances • The encoding of P(20,10) has only 200 variables, but neither cachet nor relsat was able to count it in 5 days in our experiments. • On the other hand, ApproxCount is able to finish in 2 hours, and it estimates the number of solutions of even larger instances.

Summary • Small formulas -> complete analysis of the search space • Larger formulas -> comparison of ApproxCount results with those of exact counting procedures • Harder formulas -> handcrafted formulas, compared with analytic results

Conclusion and Future Work • Our results show a good opportunity to extend SAT solvers into algorithms for sampling and counting tasks. • Next step: use our methods in probabilistic reasoning and Bayesian inference domains.
