SAT and CSP competitions & benchmark libraries: some lessons learnt? Toby Walsh, NICTA & UNSW, Sydney, Australia
Outline • Benchmark libraries • Founding CSPLib.org • Competitions • SAT competition judge • TPTP competition judge • …
Why? • Why did I set up CSPLib.org? • I needed problems against which to benchmark my latest inference techniques • Zebra and random problems don’t cut it! • I thought it would help unify and advance the CP community
Random problems • +ve • Easy to generate • Hard (if chosen from the phase transition) • Impossible to cheat • If you can solve 1000-variable random 3-SAT problems at l/n = 4.2, I’ll be impressed
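Since “easy to generate” is the whole point, here is a minimal sketch of a fixed-clause-length random 3-SAT generator in Python. It is illustrative only, not from the talk; the function name and default ratio are my own choices.

```python
import random

def random_3sat(n_vars, ratio=4.2, seed=None):
    """Generate round(ratio * n_vars) random 3-SAT clauses over n_vars
    variables: each clause picks 3 distinct variables and negates each
    with probability 1/2 (the fixed-clause-length model)."""
    rng = random.Random(seed)
    clauses = []
    for _ in range(round(ratio * n_vars)):
        trio = rng.sample(range(1, n_vars + 1), 3)
        clauses.append([v if rng.random() < 0.5 else -v for v in trio])
    return clauses

# Instances at l/n around 4.2 lie near the satisfiability phase
# transition, where random 3-SAT is empirically hardest.
print(random_3sat(5, seed=0))
```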
Random problems • -ve • Lack the structure found in real-world problems • Unrepresentative • E.g. random 3-SAT instances either have many solutions or none • Different methods work well on them • Random SAT: forward-looking algorithms • Industrial SAT: backward-looking algorithms
Why? • Thesis: every mature field has a benchmark library • Deduction started in 1960s • TPTP set up in 1993 • SAT started in 1960s • SAT DIMACS challenge in 1992 • SATLib set up in 1999 • CP started in 1970s • CSPLib set up in 1998
Why? • Thesis: every mature field has a benchmark library • Spatial and temporal reasoning started in the early 80s (or before?) • It’s been approximately 30 years, so it’s about time you guys set one up!
Benchmark libraries • CSPLib.org • Over 35k unique visitors • Still not everything I’d want it to be • But the state of the art for experimentation is now much better than it was • I haven’t seen a zebra for a very long time
An ideal library • Desiderata taken from: • CSPLib: a benchmark library for constraints, Proc. CP-99
An ideal library • Location • On the web and easy to find • TPTP.org • CSPLib.org • SATLib.org • QBFLib.org • … • http://elib.zib.de/pub/mp-testdata/tsp/tsplib/tsplib.html • http://mat.gsia.cmu.edu/COLOR/instances.html
An ideal library • Easy to use • Tools to make benchmarking as painless as possible • tptp2X, … • Diverse • To help prevent over-fitting
An ideal library • Large • Growing continuously • Again helps to prevent over-fitting • Extensible • To new problems or domains
An ideal library • Complete • One stop for your problems • Topical • For instance, it should report current best solutions found
An ideal library • Independent • Not tied to a particular solver or proprietary input language • Mix of difficulties • Hard and easy problems • Solved and open problems • With perhaps even a difficulty index?
An ideal library • Accurate • It should be trusted • Used • A valued resource for the community
Problem format • Lo-tech or hi-tech?
Lo-tech formats • DIMACS format used in SATLib:
c a simple example
p cnf 3 2
1 -1 0
1 2 3 0
This represents: x v -x, and x or y or z
Lo-tech formats • DIMACS format used in SATLib • +ve • All programming languages can read integers! • Small amount of extensibility built in (e.g. QBF) • -ve • Larger extensions are problematic (e.g. beyond CNF to arbitrary Boolean circuits)
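To illustrate just how little machinery the format demands, here is a minimal DIMACS CNF reader in Python; this is my own sketch, not a competition tool.

```python
def parse_dimacs(text):
    """Parse DIMACS CNF: 'c' lines are comments, the 'p cnf V C' line
    declares counts, and clauses are zero-terminated integer lists."""
    n_vars = n_clauses = 0
    clauses, current = [], []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] == 'c':
            continue                      # blank line or comment
        if tokens[0] == 'p':
            n_vars, n_clauses = int(tokens[2]), int(tokens[3])
            continue
        for lit in map(int, tokens):
            if lit == 0:
                clauses.append(current)   # 0 terminates a clause
                current = []
            else:
                current.append(lit)
    assert len(clauses) == n_clauses
    return n_vars, clauses

print(parse_dimacs("c a simple example\np cnf 3 2\n1 -1 0\n1 2 3 0\n"))
# -> (3, [[1, -1], [1, 2, 3]])
```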
Hi-tech formats • CP competition:
<instance>
  <presentation name="4-queens" description="This problem involves placing 4 queens on a chessboard" nbSolutions="at least 1" format="XCSP1.1 (XML CSP Representation 1.1)" />
  <domains nbDomains="1">
    <domain name="dom0" nbValues="4" values="1..4" />
  </domains>
  <variables nbVariables="4">
    <variable name="X0" domain="dom0" />
    …
  </variables>
  <relations nbRelations="3">
    <relation name="rel0" domain="dom0 dom0" nbConflicts="10" conflicts="(1,1)(1,2)(2,1)(2,2)(2,3)(3,2)(3,3)(3,4)(4,3)(4,4)" />
    …
  </relations>
  <constraints nbConstraints="6">
    <constraint name="C0" scope="X0 X1" relation="rel0" />
    …
  </constraints>
</instance>
Hi-tech formats • XML • +ve • Easy to extend • Parsing tools can be provided • -ve • Complex and verbose • Computers can parse terse structures easily
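For comparison with the DIMACS reader above, here is a sketch of reading this kind of XML with Python’s standard library; the element and attribute names follow the 4-queens fragment above, but the reader itself is my own illustration, not a competition parser.

```python
import xml.etree.ElementTree as ET

# Minimal reader for an XCSP 1.1-style fragment; a real parser would
# also handle domains, relations, global constraints, etc.
xcsp = """<instance>
  <variables nbVariables="2">
    <variable name="X0" domain="dom0"/>
    <variable name="X1" domain="dom0"/>
  </variables>
  <constraints nbConstraints="1">
    <constraint name="C0" scope="X0 X1" relation="rel0"/>
  </constraints>
</instance>"""

root = ET.fromstring(xcsp)
variables = [v.get("name") for v in root.iter("variable")]
constraints = [(c.get("name"), c.get("scope").split())
               for c in root.iter("constraint")]
print(variables)    # ['X0', 'X1']
print(constraints)  # [('C0', ['X0', 'X1'])]
```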
No-tech formats • CSPLib • Problems are specified in natural language • There was no agreement at that time on an input language • One focus was on how you model a problem • Today there is more consensus on modelling languages like Zinc
No-tech formats • CSPLib • Problems are specified in natural language • But you can still provide in one place • Input data • Results • Code • Parsers …
Getting problems • Submit them yourself • Initially, you must do this so the library has some critical mass the first time people look at it • But it becomes tiresome and unrepresentative to do so continually • Ask at every talk • Tried this for several years but it (almost) never worked
Getting problems • Need some incentive • Offer money? • Price of entry for the competition? • If you have a competition, users will submit problems that their solver is good at?
Libraries + Competitions • You can have a library without a competition • But you can’t have a competition without a library
Libraries + Competitions • Libraries then competition • TPTP then CASC • Easy and safe! • Libraries and competition • Planning • RoboCup • …
Increasing complexity • Constraints • 1st year, binary extensional • 2nd year, limited number of globals • 3rd year, unlimited • Planning • Increasing complexity • Time, metrics, uncertainty, …
Benefits • Gets ideas implemented • Rewards engineering • Progress needs both science and engineering! • Puts it all together
Benefits • Draws attention to important low-level implementation issues • In SAT: • Watched literals • VSIDS • …
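As a concrete picture of why watched literals matter, here is a compact sketch of two-watched-literal unit propagation in Python, the scheme popularized by Chaff and MiniSAT. It is a didactic sketch, assuming every clause has at least two literals; the names are my own, and VSIDS (decaying variable-activity scores bumped at conflicts) is not shown.

```python
from collections import defaultdict

def lit_value(lit, assign):
    """Truth value of a literal under a partial assignment (None = unassigned)."""
    v = assign.get(abs(lit))
    return None if v is None else (v if lit > 0 else not v)

class Watches:
    """Two-watched-literal bookkeeping: each clause watches two of its
    literals, and a clause is inspected only when a watched literal
    becomes false."""
    def __init__(self, clauses):
        self.clauses = clauses
        self.watched = {i: c[:2] for i, c in enumerate(clauses)}
        self.watchers = defaultdict(set)   # literal -> indices of watching clauses
        for i, (a, b) in self.watched.items():
            self.watchers[a].add(i)
            self.watchers[b].add(i)

    def on_false(self, lit, assign):
        """Call when literal `lit` becomes false. Returns (implied unit
        literals, index of a conflicting clause or None)."""
        units = []
        for i in list(self.watchers[lit]):
            clause, (a, b) = self.clauses[i], self.watched[i]
            other = b if a == lit else a
            for l in clause:               # look for a non-false replacement watch
                if l != lit and l != other and lit_value(l, assign) is not False:
                    self.watched[i] = [other, l]
                    self.watchers[lit].discard(i)
                    self.watchers[l].add(i)
                    break
            else:                          # no replacement: unit or conflict
                v = lit_value(other, assign)
                if v is None:
                    units.append(other)
                elif v is False:
                    return units, i
        return units, None

w = Watches([[1, 2, 3]])
assign = {2: False}
print(w.on_false(2, assign))   # ([], None): the watch silently moves to literal 3
assign[3] = False
print(w.on_false(3, assign))   # ([1], None): clause has become unit, forcing x1
```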
Benefits • Witness the progress in SAT • 1985, 10s vars • 1995, 100s vars • 2005, 1000s vars • … • Not just Moore’s law at play!
Pitfalls • Competitions require lots of work • Organizers get limited (academic) reward • One remedy is also to organize competition special issues of journals
Pitfalls • Competitions encourage incremental improvements • Don’t hold them too often! • The field may get stuck in a local minimum • E.g. HMMs for speech recognition • Give out a best new solver prize?
The Chaff story • Industrial problems, SAT & UNSAT instances • 2008, 1st MiniSAT (son of zChaff) • 2007, 1st RSAT (son of MiniSAT) • 2006, 1st MiniSAT • 2005, 1st SatELite GTI (MiniSAT+preprocessor) • 2004, 1st zChaff (Forklift from 2003 was better) • 2003, 1st Forklift • 2002, 1st zChaff
Other issues • Manpower • Organizers • One is not enough? • Judges • All rules need interpretation • Compute power • Find a friendly cluster
Other issues • Multiple tracks • SAT/UNSAT • Random/industrial/crafted • … • Certified/uncertified
Other issues • Hold problems back if possible • Release some problems so competitors can ensure solver compliance • But hold most back so the competition is blind!
Other issues • Multiple phases • Too many solvers for all to compete with long timeouts • First phase to test correctness • Second phase to throw out the slow solvers (which cost you many timeouts) • Third phase to differentiate between the better solvers
Other issues • Reward function • <#completed, average time, …> • Solution purse + speed purse • Points for each problem divided between those solvers that solve it • Getting buy-in from competitors • It will (and should) evolve over time!
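To make the purse idea concrete, here is a simplified sketch in the spirit of the purse-based scoring used around the 2005 SAT competition: each problem carries a solution purse split equally among the solvers that solve it, plus a speed purse split in proportion to a speed factor. The purse sizes, the speed-factor formula, and all names here are illustrative assumptions, not the official rules.

```python
def purse_scores(times, solution_purse=1000.0, speed_purse=1000.0,
                 timeout=1200.0):
    """times: {solver: runtime in seconds} for one problem; solvers
    that timed out are simply absent. Returns {solver: score}."""
    if not times:
        return {}  # nobody solved it: the purse is not awarded
    # Solution purse: split equally among all solvers that solved it.
    scores = {s: solution_purse / len(times) for s in times}
    # Speed purse: split in proportion to a speed factor (faster => larger).
    speed = {s: timeout / (1.0 + t) for s, t in times.items()}
    total = sum(speed.values())
    for s in times:
        scores[s] += speed_purse * speed[s] / total
    return scores

# Example: two solvers finish; a third timed out and gets nothing.
print(purse_scores({"solverA": 10.0, "solverB": 600.0}))
```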
Other issues • Prizes • Give out many! • Good for people’s CVs • Good motivator for future years
Other issues • Open or closed source? • Open to share progress • Closed to get the best • Enter last year’s winner as a condition of entry? • To see that progress is being made!
Other issues • Smallest unsolved problem • Give a prize! • Timing • Run during the conference • Creates a buzz so people enter next year • Get a slot in the program to discuss results • Get a slot at the banquet to give out prizes
Conclusions • Benchmark libraries • When an area is several decades old, why wouldn’t you have one? • Competitions • Designed well, held not too frequently, & with buy-in from the community, why wouldn’t you?
Questions • Disagreements • Other opinions • Different experiences • …