1 / 101

Master Class on Experimental Study of Algorithms

Explore the scientific use of experimentation in studying constrained problems as natural phenomena in CP/CPOAIR. Discover the progress, uniqueness, and core scientific questions of this community.

royall
Download Presentation

Master Class on Experimental Study of Algorithms

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Master Class on Experimental Study of Algorithms Scientific Use of Experimentation Carla P. Gomes Cornell University CPAIOR Bologna , Italy 2010

  2. Motivation: Viewing constrained problems as “natural” phenomena Adapted from: Talk on the Future of CP, CP2006 “Science of Constraints” Gomes and Selman, CP Letters, Vol1, 2007

  3. Progress in the last 20 years • CP/CPAIOR - tremendous progress in the last 20 years • CP has found a niche in which its techniques excel • Highly combinatorial problems; • CP solvers have moved into the real-world arena • CP/CPAIOR communities have developed sophisticated techniques by bridging across different disciplines e.g.: • Global constraints (use of network flow algorithms, dynamic programming, automata, etc) • Hybrid approaches (integration with OR) • Modeling language (e.g., OPL, COMET,ZINC)

  4. Progress in the last 10/20 years CP/CPAIOR community has attained a “critical mass” Quality and reputation of CP conferences and journals has grown steadily

  5. Question to the CP/CPAIOR Community What makes CP/CPAIOR unique, different from OR and algorithms? • Techniques? Declarative? • Applications? Uniqueness of CP/CPAIOR Community

  6. Related issue Engineering vs. Academia Industry vs. Academia Similar issues happen in OR – OR is known for its techniques and applications (ORIE) but not really as a major academic and scientific player – is that what we want for CP/CPAIOR?

  7. Question to the CP/CPAIOR community What are the core scientific questions that we are pursuing?

  8. Core Scientific Questions • E.g. Theoretical Computer Science – understanding the fundamental limits of computation with emphasis on space and time tradeoffs; • E.g. Artificial Intelligence – understanding, modeling, and replication of human intelligence; • Machine Learning – understanding the processes behind human learning and scientific discovery • Astronomy – origins of universe

  9. Core Question for the CP/CPAIOR Community Understanding, explaining, and exploiting constraint structures as they occur in the real-world.

  10. Traditional Algorithmic Design (CS)

  11. Traditional Algorithmic Design (CS) Traditional algorithmic design: • Driven by worst-case or (formal) average case analysis • Effective approach since early 60s but shortcomings are becoming more apparent: • Worst-case too pessimistic. Any NP-complete problem: 2^n (can’t do any better if P=/=NP; “end of alg. design”) • Average-case: Requires concrete probability distribution on instances. Quite likely infeasible to get a match with “real-world” problem distribution. • Computational tasks are studied • as if they were aformal mathematical object.

  12. A different perspective:vieiwng constrained problems as “natural” phenomena

  13. Viewing constrained problems as “natural” phenomena • Study computational constrained problems as naturally occurring objects / phenomena. That is, View algorithm design as a problem of the natural sciences instead of “only” a mathematical problem. • Advantage: Constrained Problems are worst-case NP-complete but (real-world) structurelikely allows for practically effective and scalable solution strategies. • Let’s make this more concrete.

  14. Viewing constrained problems as “natural” phenomena Central question: • Understanding, Exploiting, and Explaining Constrained Structures as Occurring in Real-World – not only as abstract mathematical objects but also as “natural” phenomena Methodology: • Scientific Method – • observe the phenomena; collect data; perform experimentation • develop formal models and theories to explain the phenomena and formulate hypotheses; • check the validity of the model in real-world problems (experimentation, validation, refutation) Goal: Develop solution strategies exploiting structural properties as found in the real world Key: Scientific Use of Experimentation to understand constrained structures

  15. Worst • Case • Complexity • Random • Instances • Structured • Problems Typical Case Complexity • Principled • Experimentation • Formal • Models Viewing constrained problems as “natural” phenomena • This approach is key to understanding the gap betweentheory and practice and avoiding worst-case and (formal) average-case road block.

  16. Viewing constrained problems as “natural” phenomena Scientific Use of Experimentation applied to the study constrained problems has led to the discovery of interesting computational phenomena.

  17. Viewing constrained problems as “natural” phenomena [Apologies for bias towards my own work.] • Examples: • Phase transition phenomena(empirical start followed by rigorous mathematical modeling; became active subfield with mathematicians, physicists, and computer scientists.) • Heavy-tailed phenomena in backtrack search(empirical start followed by mathematical modeling; led to randomization and rapid restarts (used in SAT/ CP solvers) and backdoors; dialog with different communities: pervasiveness of heavy-tails in real-world settings in economics, science, and engineering.

  18. Viewing constrained problems as “natural” phenomena • Examples, cont.: • Backbone and backdoor variables initially a formal notion to explain heavy-tails, validated empirically; boosts understanding of CSP/SAT solvers on large real-world problems; explain why randomization is so effective. • XOR Streamlining Inspired by a real-world problem, motivated by a need for a general approach and by the intriguing theoretical properties of XOR constraints; led to knew model counting and sampling strategies; boosts CSP search on real-world instances.

  19. Viewing constrained problems as “natural” phenomena • Examples, cont.: • Small World Phenomena initially an empirical notion, formalized mathematically recently - shown to be pervasive across several natural and engineered constrained structures - explains interesting phenomena in complex structures: from social interactions to outages of the power grid to computational phenomena Small world phenomenon Pioneered science of networks [Watts & Strogatz]

  20. Random • Instances • Formal • Models • Real-World • Problems • Principled • Experimentation Viewing constrained problems as “natural” phenomena • It’s unlikely that pure mathematical thinking / modeling would have led us to the discovery of these phenomena. • In fact, to understand real-world constrained problems (which are everywhere!), we argue that the empirical study of phenomena followed by rigorous models and analysis is a sine qua non for the advancement of the field. • Scientific methodology sets our community apart from standard approach towards algorithm development / optimization in CS & OR. And, we believe this methodology will be very fruitful over the next decade(s). With the increase of compute power and availability lots of data this is a very promising approach!!!!

  21. Big Picture of Topics Covered in this talk Part I • Understanding computational complexity beyond worst-case complexity • Benchmarks: The role of Random Distributions Random SAT • Typical Case Analysis vs. Worst Case Complexity analysis – phase transition phenomena • Part II • Understanding runtime distributions of complete search methods • Heavy and Fat-Tailed Phenomena in combinatorial search and Restart strategies • Understanding tractable sub-structure • Backdoors and Tractable sub-structure • Formal Models of Heavy-tails and Backdoors • Performance of current state-of-the art solvers on real-world structured problems exploiting backdoors

  22. I - Understanding computational complexity beyond worst-case complexity

  23. Standard Algorithmic Approach: Too Pessimistic. Ideal Approach, but… What distribution? Computational Complexity How does an algorithm scale? Alternative: study “Typical case complexity” across range of values for critical parameters.

  24. I - Understanding computational complexity beyond worst-case complexityRandom Sat

  25. Motivation: An example of Combinatorial Search Satisfiability: Prototypical hard combinatorial search and reasoning problem. First problem to be shown to be NP-Complete. (Cook 1971) • Satifiability (SAT): Given a formula in propositional calculus, is there a model • (i.e., an assignment of True or False to its variables) making it true? • ( ab c )  ( b c)  ( a  c)

  26. Surprising “power” of SAT for encoding real-world highly combinatorial problems. Satisfiability From academically interesting to practically relevant.

  27. Significant progress in SATISFIABILITY Solving From 100 variables, 200 constraints (early 90’s) to instances with millions of variables and millions of constraints in the last 15 years. Technology Transitions: Hardware and Software Verification, Planning, Scheduling, Optimal Control, Protocol Design, Routing, Multi-agent systems, E-Commerce (E-auctions and electronic trading agents), etc.

  28. SAT Complexity • NP-Complete - worst-case complexity • (2n possible assignments) • “Average” Case Complexity (I) - Constant Probability Model– Goldberg 79; Goldberg et al 82 • N variables; L clauses • p - fixed probability of a variable in a clause (literals: 0.5 +/-) (i.e., average clause length is pN) • Eliminate empty and unit clauses Key problem: easy distribution; on average, this SAT model can be easily solved - O(n2) Franco 86; Franco and Ho 88

  29. Hard satisfiability problems • Consider random 3-CNF sentences. e.g., • (D B  C)  (B A C)  (C B  E)  (E D  B)  (B  E C) m = number of clauses n = number of symbols

  30. Typical Case Analysis SAT Complexity • “Average” Case Complexity (II) • Fixed-clause Length Model – Random K-SATFranco 86; • N variables; L clauses; K number of literals per clause • Randomly choose a set of K variables per clause (literals: 0.5 +/-) • Expected time – O(2n) Can we provide a finer characterization beyond worst-case results?

  31. Typical-Case Complexity • Typical-case complexity: a more detailed picture • Characterization of the spectrum of hardness of instances as we vary certain interesting instance parameters • e.g. for SAT: clause-to-variable ratio. • Are some regimes easier than others? • What about a majority of the instances?

  32. Typical Case Analysis:3 SATAll clauses have 3 literals Median Runtime Selman et al. 92,96

  33. Median • Hard problems seem to cluster near m/n = 4.3 (critical point)

  34. Location of Threshold? Empirical: 4.25 Mitchell, Selman, and Levesque ’92, Crawford ’93. Exact Location of Threshold? Surprisingly challenging problem…. Tremendous interactions with other communities OR, Physics, Mathematics

  35. Linear time results --- Random 3-SAT • Random walk up to ratio 1.36 (Alekhnovich and Ben Sasson 03). • empirically up to 2.5 • Davis Putnam (DP) up to 3.42 (Kaporis et al. ’02) empirically up to 3.6 • exponential, ratio 4.0 and up (Achlioptas and Beame ’02) • approx. 400 vars at phase transition • GSAT up till ratio 3.92 (Selman et al. ’92, Zecchina et al. ‘02) • approx. 1,000 vars at phase transition • Walksat up till ratio 4.1 (empirical, Selman et al. ’93) • approx. 100,000 vars at phase transition • Survey propagation (SP) up till 4.2 • (empirical, Mezard, Parisi, Zecchina ’02) • approx. 1,000,000 vars near phase transition • Unsat phase: little algorithmic progress. • Exponential resolution lower-bound (Chvatal and Szemeredi 1988)

  36. Phase transitions (as expected…) • Computational properties (surprise…) • (Monasson, Zecchina, Kirkpatrick, Selman, Troyansky 1999.)

  37. Phase transition Random Walk DP DP’ GSAT Walksat SP Random 3-SAT a Linear time algs.

  38. 4.643 4.506 4.596 4.762 5.081 4.601 5.19 Random Walk DP DP’ GSAT Walksat SP Random 3-SAT c Linear time algs. Upper bounds by combinatorial arguments (’92 – ’05)

  39. Location of Threshold • Surprisingly challenging problem ... • Current rigorously proved results: • 3SAT threshold lies between 3.42 and 4.506. • Motwani et al. 1994; Broder et al. 1992; • Frieze and Suen 1996; Dubois 1990, 1997; • Kirousis et al. 1995; Friedgut 1997; • Archlioptas et al. 1999; • Beame, Karp, Pitassi, and Saks 1998; • Impagliazzo and Paturi 1999; Bollobas, • Borgs, Chayes, Han Kim, and • Wilson1999; Achlioptas, Beame and • Molloy 2001; Frieze 2001; Zecchina et al. 2002; • Kirousis et al. 2004; Gomes and Selman, Nature ’05; • Achlioptas et al. Nature ’05; and ongoing… Empirical: 4.25 --- Mitchell, Selman, and Levesque ’92, Crawford ’93.

  40. Phase Transition for 2+p-SAT We have good approximations for location of thresholds.

  41. Computational Cost: 2+p-SATTractable substructure can dominate! > 40% 3-SAT --- exponential scaling Mixing 2-SAT (tractable) & 3-SAT (intractable) clauses. Medium cost <= 40% 3-SAT --- linear scaling Num variables (Monasson et al. 99; Achlioptas ‘00)

  42. Results for 2+p-SAT • p < = 0.4 --- model behaves as 2-SAT • search proc. “sees” only binary constraints • smooth, continuous phase transition (2nd order) • p > 0.4 --- behaves as 3-SAT (exponential scaling) • abrupt, discontinuous transition (1st order) • Note: problem is NP-complete for any p > 0.

  43. Key Observation • In a worst-case intractable problem --- such as 2+p-SAT --- • having a sufficient amount of tractable problem substructure • (possibly hidden) can lead to provably poly-time --- in fact • linear --- average case behavior. • Conjecture: Our world may be “friendly enough” to make • many typical reasoning tasks poly-time --- challenging the • conventional worst-case complexity view in CS. • Next: • Capturing hidden problem structure. • (Gomes et al. 03, 04)

  44. Density of States (DOS) Ermon, Gomes, Selman, CP 2010 Given a SAT formula F with m clauses (or constraints) The density of a state E , counts the number of variable assignments or configurations that violate exactly E clauses or constraints, for all values of E. . The density of states is a very detailed characterization of the configuration space associated to a formula. concept borrowed from satistical physics. In particular, n(0) is the number of configurations violating 0 clauses  number of solutions. The lowest value of E with a non-zero density (i.e. minE{n(E) > 0}) is the solution of the corresponding MAX-SAT problem. Computing DOS is at least as hard as model counting - #P

  45. New approach – MCMCFLATSAT Ermon, Gomes, Selman, CP 2010 • Algorithm inspired by recent work in Statistical Physics community for density of states (e.g., Ising and Potts model) – Flat Histogram method • Key idea – • if we perform a random walk in the configuration space {0, 1}n such that the probability of visiting a given energy level E is inversely proportional to the density of state E (n(E)), then a flat histogram is generated for the energy distribution of the states visited.

  46. Random Formulas – 3-SATPhase Transitions Novel phase transitions for g(i), i>0 Standard Phase Transition g(0) Fraction of formulas with at most i violated clauses Ermon, Gomes, Selman, CP 2010

  47. Phase transition and combinatorial problems is an active research area with fruitful interactions between computer science, physics (approaches from statistical mechanics), and mathematics (combinatorics / random structures). Also, a close interaction between experimental and theoretical work. (With experimental findings quite often confirmed by formal analysis within months to a few years.) Finally, relevance to applications via algorithmic advances and notion of “critically constrained problems.”

  48. I - Understanding computational complexity beyond worst-case complexity More Structured Instances

  49. Quasigroup or Latin Square (Order 4) A Quasigroup or Latin Square is an n-by-n matrix such that each row and column is a permutation of the same n colors The Quasigroup or Latin Square Completion Problem (QCP): 68% holes Quasigroups or Latin Squares:An Abstraction for Real World Applications Gomes and Selman 97

  50. Constraint Network of Latin Square Cells of table  Nodes Edges connect nodes in the same row/column (n – order of latin square)

More Related