Data Structures for SAT Solvers The 2-Literal Representation

Gábor Kusper (gkusper@aries.ektf.hu), Eszterházy Károly College, Eger, Hungary

Boolean Satisfiability (SAT) • Find a truth assignment that satisfies a Boolean formula, or prove that none exists • Well-known NP-complete problem

Outline • Notation • Data structures used by SAT solvers • Literal matrix (Scherzo) • Adjacency lists (GRASP, …) • Head/tail lists (SATO) • Watched literals (Chaff) • New data structure: • 2-Literal Matrix

Notation • φ = (a + c)(b + c)(¬a + ¬b + ¬c) • Positive literal: a • Negative literal: ¬a • Clause: (a + c) • The whole formula is in Conjunctive Normal Form (CNF)

Literal & Clause Classification • φ = (a + ¬b)(¬a + b + ¬c)(a + c + d)(¬a + ¬b + ¬c) • a assigned 0, b assigned 1, c and d unassigned • (a + ¬b): unsat • (¬a + b + ¬c): satisfied • (a + c + d): unresolved • (¬a + ¬b + ¬c): satisfied
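The classification above can be reproduced with a few lines of Java. This is only a minimal sketch: the integer encoding of literals (+v for a variable, -v for its negation, with a=1, b=2, c=3, d=4) and the names `ClauseStatusSketch` and `classify` are assumptions made for illustration, not part of the slides.

```java
import java.util.List;
import java.util.Map;

final class ClauseStatusSketch {
    enum Status { SATISFIED, UNSATISFIED, UNRESOLVED }

    // Literals are non-zero ints (assumed encoding): +v is variable v, -v its negation.
    // Variables missing from the map are unassigned.
    static Status classify(List<Integer> clause, Map<Integer, Boolean> assignment) {
        int unassigned = 0;
        for (int lit : clause) {
            Boolean value = assignment.get(Math.abs(lit));
            if (value == null) {
                unassigned++;                 // literal is still free
            } else if (value == (lit > 0)) {
                return Status.SATISFIED;      // one true literal satisfies the clause
            }
        }
        return unassigned == 0 ? Status.UNSATISFIED : Status.UNRESOLVED;
    }

    public static void main(String[] args) {
        Map<Integer, Boolean> assignment = Map.of(1, false, 2, true); // a = 0, b = 1
        List<List<Integer>> formula = List.of(
                List.of(1, -2), List.of(-1, 2, -3), List.of(1, 3, 4), List.of(-1, -2, -3));
        for (List<Integer> clause : formula) {
            System.out.println(clause + " -> " + classify(clause, assignment));
        }
        // prints UNSATISFIED, SATISFIED, UNRESOLVED, SATISFIED (one per line, after each clause)
    }
}
```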

Additional Definitions • Resolution • Example: clause 1 = (¬a + b + c), clause 2 = (a + b + d) • Resolution on a: res(1, 2, a) = (b + c + d) • Unit Propagation • An unresolved clause is unit if it has exactly one unassigned literal • A unit clause has exactly one option for being satisfied • Example: φ = (a + c)(b + c)(¬a + ¬b + ¬c); with a and b assigned 1, the clause (¬a + ¬b + ¬c) is unit and c must be set to 0 • Boolean Constraint Propagation (BCP): iterated application of unit propagation
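The resolution example can be checked mechanically. The following is a minimal sketch, assuming clauses are stored as sets of signed integers (a=1, b=2, c=3, d=4; a negative number is a negated variable); the class and method names are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

final class ResolutionSketch {
    // Resolves c1 and c2 on variable v: one clause must contain +v, the other -v.
    // The resolvent is the union of both clauses without the two clashing literals.
    static Set<Integer> resolve(Set<Integer> c1, Set<Integer> c2, int v) {
        boolean clash = (c1.contains(v) && c2.contains(-v))
                     || (c1.contains(-v) && c2.contains(v));
        if (!clash) {
            throw new IllegalArgumentException("clauses do not clash on variable " + v);
        }
        Set<Integer> resolvent = new TreeSet<>(c1);
        resolvent.addAll(c2);
        resolvent.remove(v);
        resolvent.remove(-v);
        return resolvent;
    }

    public static void main(String[] args) {
        Set<Integer> clause1 = Set.of(-1, 2, 3);  // (¬a + b + c)
        Set<Integer> clause2 = Set.of(1, 2, 4);   // (a + b + d)
        System.out.println(resolve(clause1, clause2, 1)); // prints [2, 3, 4], i.e. (b + c + d)
    }
}
```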

Data Structures • Literal matrix (Scherzo) • View CNF formula as a matrix, where the rows denote the clauses and the columns the variables • 2-Literal matrix, NEW • Adjacency lists (most SAT solvers) • Counter-based state maintenance • Keep counters of sat, unsat and unassigned (free) literals for each clause • Lazy data structures • Head/Tail lists (SATO) • Watched literals (Chaff)

State-of-the-art SAT Solvers • MiniSAT solver: http://www.cs.chalmers.se/Cs/Research/FormalMethods/MiniSat/ • Java SAT solver: http://www.sat4j.org/ • A paper about data structures: "Efficient data structures for backtrack search SAT solvers", Inês Lynce and João Marques-Silva

Literal Matrix • View CNF formula as a matrix, where the rows denote the clauses and the columns the variables • Assigned variables result in unsat literals • Satisfied literals result in sat clauses • Each clause is an array of bits • Each clause contains counters of sat, unsat and unassigned literals • Used in the past in Binate Covering algorithms • E.g.: Scherzo, by Coudert et al., DAC’95 and DAC’96

1-Literal Matrix Representation • The Literal Matrix can also be called the 1-Literal Matrix • We encode combinations of 1-clauses; each 1-clause corresponds to a bit • Matrix entries: 01: -, 10: + • 1-clauses: 01: a, 10: ā • The representation: 00 sat, 01 a, 10 ā, 11 unsat

1-Literal Matrix • φ = (a + ¬b)(¬a + b + ¬c)(a + c + d)(¬a + ¬b + ¬c)

clause          initial    a assigned 0    a assigned 0, b assigned 1
                a b c d    a b c d         a b c d
a + ¬b          + - x x    x - x x         x x x x
¬a + b + ¬c     - + - x    sat             sat
a + c + d       + x + +    x x + +         x x + +
¬a + ¬b + ¬c    - - - x    sat             sat
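The effect of the two assignments on the matrix can be sketched as follows. This illustration keeps the matrix as characters ('+', '-', 'x') rather than bits, and the class name and the integer literal encoding (a=1, b=2, c=3, d=4) are assumptions, not the author's implementation.

```java
import java.util.Arrays;

final class LiteralMatrixSketch {
    final char[][] cells;  // '+', '-' or 'x' per clause (row) and variable (column)
    final boolean[] sat;   // true once the clause in that row is satisfied

    LiteralMatrixSketch(int[][] clauses, int numVars) {
        cells = new char[clauses.length][numVars];
        sat = new boolean[clauses.length];
        for (int r = 0; r < clauses.length; r++) {
            Arrays.fill(cells[r], 'x');
            for (int lit : clauses[r]) {
                cells[r][Math.abs(lit) - 1] = lit > 0 ? '+' : '-';
            }
        }
    }

    // Assigns variable var (1-based). Every row is visited, so the cost grows with |C|.
    void assign(int var, boolean value) {
        char satisfying = value ? '+' : '-';
        for (int r = 0; r < cells.length; r++) {
            char cell = cells[r][var - 1];
            if (sat[r] || cell == 'x') continue;
            if (cell == satisfying) sat[r] = true;   // clause becomes sat
            else cells[r][var - 1] = 'x';            // literal becomes unsat
        }
    }

    String row(int r) {
        return sat[r] ? "sat" : new String(cells[r]);
    }

    public static void main(String[] args) {
        int[][] phi = { {1, -2}, {-1, 2, -3}, {1, 3, 4}, {-1, -2, -3} };
        LiteralMatrixSketch m = new LiteralMatrixSketch(phi, 4);
        m.assign(1, false);  // a assigned 0
        m.assign(2, true);   // b assigned 1
        for (int r = 0; r < phi.length; r++) System.out.println(m.row(r));
        // prints: xxxx, sat, xx++, sat (one per line)
    }
}
```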

Definition of k-clause • A k-clause has k literals. • Example: φ = (a + c)(b + c)(¬a + ¬b + ¬c) • The only 3-clause in this formula is: • (¬a + ¬b + ¬c) • The 2-clauses in this formula are: • (a + c) • (b + c) • There is no unit, i.e., 1-clause, in this example.

2-Literal Matrix Representation • We encode combinations of 2-clauses. Each 2-clause corresponds to a bit: 1000: a+e, 0100: a+ē, 0010: ā+e, 0001: ā+ē • A code stands for the conjunction of the 2-clauses whose bits are set • Can code every Boolean function of two variables • The representation:

0 0000 sat      8 1000 a+e
1 0001 ā+ē      9 1001 a⊕e
2 0010 ā+e      A 1010 e
3 0011 ā        B 1011 āe
4 0100 a+ē      C 1100 a
5 0101 ē        D 1101 aē
6 0110 a≡e      E 1110 ae
7 0111 āē       F 1111 unsat
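One way to read the 4-bit codes is sketched below. It assumes that each bit stands for one 2-clause over the pair (a, e) and that a code denotes the conjunction of the clauses whose bits are set; this reading is an assumption, chosen because it is consistent with 0000 being sat (no constraint), 1111 being unsat (all four 2-clauses together are contradictory) and 1100 being the unit a. All names in the code are hypothetical.

```java
final class TwoLiteralCodeSketch {
    // Assumed bit masks for the four 2-clauses over the variable pair (a, e).
    static final int A_E = 0b1000, A_NE = 0b0100, NA_E = 0b0010, NA_NE = 0b0001;

    // Evaluates a code as the conjunction of the 2-clauses whose bits are set.
    static boolean holds(int code, boolean a, boolean e) {
        if ((code & A_E) != 0 && !(a || e)) return false;
        if ((code & A_NE) != 0 && !(a || !e)) return false;
        if ((code & NA_E) != 0 && !(!a || e)) return false;
        if ((code & NA_NE) != 0 && !(!a || !e)) return false;
        return true;
    }

    // Truth table of a code for (a, e) = (1,1), (1,0), (0,1), (0,0), in that order.
    static String truthTable(int code) {
        StringBuilder sb = new StringBuilder();
        boolean[] values = {true, false};
        for (boolean a : values) {
            for (boolean e : values) {
                sb.append(holds(code, a, e) ? '1' : '0');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(truthTable(0b0000)); // 1111: always true (sat)
        System.out.println(truthTable(0b1000)); // 1110: a or e
        System.out.println(truthTable(0b1100)); // 1100: a
        System.out.println(truthTable(0b1110)); // 1000: a and e
        System.out.println(truthTable(0b1111)); // 0000: never true (unsat)
    }
}
```

Under this reading, setting more bits always means a stronger constraint, which is why 1110 carries more information than the unit 1100 and 1000 carries less.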

2-Literal Matrix • Columns are reordered to a c b d and grouped into the pairs (a, c) and (b, d); the rows are then reordered • Cell codes: ++ 1000, +- 0100, -+ 0010, -- 0001, xx 1111

clause          a b c d    a c  b d
a + ¬b          + - x x    + x  - x
¬a + ¬b + ¬c    - - - x    - -  - x
¬a + b + ¬c     - + - x    - -  + x
a + c + d       + x + +    + +  x +

• Encoded cells shown on the slide: 1101 0011 0001 1111 0100 0110

2-Literal Matrix • Columns: a c b d • Cell codes: ++ 1000, +- 0100, -+ 0010, -- 0001, xx 1111 • Encoded cells: 1101 0011 0001 1111 0100 0110 • After the assignment: 1100 0011 0000 1111 0100 0110 • Slide labels: (a + c) assigned 1; a assigned 1

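Unit Propagation

    // Requires: import java.util.BitSet;
    // Propagates unitToProp into the cell of this clause at the given column.
    public void unitPropagation(int column, BitSet unitToProp) {
        // A cell equal to unSatLit has no effective literal left.
        if (nLiterals[column].equals(unSatLit))
            return;
        // If every bit of the cell is also set in unitToProp, the clause is subsumed.
        BitSet clone = (BitSet) nLiterals[column].clone();
        clone.and(unitToProp);
        if (clone.equals(nLiterals[column]))
            subsumed = true;
        // Merge the propagated bits into the cell.
        nLiterals[column].or(unitToProp);
        // The cell became unsat, so the clause lost one effective literal.
        if (nLiterals[column].equals(unSatLit))
            numberOfEffectiveLiterals--;
    }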
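The method above uses fields that the slide does not show. A minimal sketch of a surrounding class could look like this; the field types, the constructor, the helper `bits` and the bit order (leftmost character stored at bit index 0) are assumptions, and the cell codes in the example (1000 for a+c, 1010 for d alone in the pair (b, d)) follow an assumed reading of the encoding table rather than anything the slides state explicitly.

```java
import java.util.BitSet;

final class TwoLiteralClauseSketch {
    static final BitSet UNSAT_LIT = bits("1111");  // cell without any effective literal
    final BitSet[] nLiterals;                      // one 4-bit cell per variable pair
    boolean subsumed = false;                      // set once the clause is satisfied
    int numberOfEffectiveLiterals = 0;             // cells that are not yet 1111

    TwoLiteralClauseSketch(String... cells) {
        nLiterals = new BitSet[cells.length];
        for (int i = 0; i < cells.length; i++) {
            nLiterals[i] = bits(cells[i]);
            if (!nLiterals[i].equals(UNSAT_LIT)) numberOfEffectiveLiterals++;
        }
    }

    // Parses a string such as "1100"; the leftmost character goes to bit index 0.
    static BitSet bits(String s) {
        BitSet b = new BitSet(s.length());
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '1') b.set(i);
        }
        return b;
    }

    // Same logic as the method on the slide.
    void unitPropagation(int column, BitSet unitToProp) {
        if (nLiterals[column].equals(UNSAT_LIT)) return;
        BitSet clone = (BitSet) nLiterals[column].clone();
        clone.and(unitToProp);
        if (clone.equals(nLiterals[column])) subsumed = true;
        nLiterals[column].or(unitToProp);
        if (nLiterals[column].equals(UNSAT_LIT)) numberOfEffectiveLiterals--;
    }

    public static void main(String[] args) {
        // Clause a + c + d over the pairs (a, c) and (b, d).
        TwoLiteralClauseSketch clause = new TwoLiteralClauseSketch("1000", "1010");
        clause.unitPropagation(0, bits("0011"));  // propagate ā: cell becomes 1011
        clause.unitPropagation(0, bits("0101"));  // propagate c̄: cell becomes 1111
        System.out.println(clause.numberOfEffectiveLiterals); // prints 1
        clause.unitPropagation(1, bits("1010"));  // propagate d: clause is subsumed
        System.out.println(clause.subsumed);      // prints true
    }
}
```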

n-Literal Matrix Representation • We encode combinations of n-clauses; each n-clause corresponds to a bit. • It can code every Boolean function of n variables. • We need 2^n bits for each group of n variables. • The 1-literal and the 2-literal matrix have the same size: 2 bits for one variable, 4 bits for two variables.

1-Literal vs. 2-Literal Matrix • 1-Literal Matrix: • Advantages: • Easy to implement • Unit propagation results either in a sat clause or in an unsat literal • Disadvantages: • Wasteful: in 4 bits we store only 9 different states

1-Literal vs. 2-Literal Matrix • 2-Literal Matrix: • Advantages: • Economical: in 4 bits we store 15 different states • One can propagate more (1110) or less (1000) information at once than with a normal unit (1100) • Disadvantages: • Unit propagation by a 2-literal does not necessarily result in a sat clause or an unsat literal

Standard CNF Representation • Adjacency list representation: • Each clause contains: • A list of literals • Counter of sat, unsat and unassigned (free) literals • Each variable x keeps a list with all clauses with literals on x • Number of references kept in variables equals total number of literals, |L| • Used in some SAT solvers: • GRASP • rel-sat (some versions) • POSIT • etc.
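A counter-based adjacency list structure along these lines can be sketched as follows. The class name, the split into positive and negative occurrence lists, and the integer literal encoding (a=1, b=2, c=3, d=4) are illustrative assumptions, not the layout used by GRASP or any other solver named above.

```java
import java.util.ArrayList;
import java.util.List;

final class CounterClausesSketch {
    final int[][] clauses;     // literals as signed ints: +v or -v
    final int[] satCount;      // per clause: number of satisfied literals
    final int[] unsatCount;    // per clause: number of falsified literals
    final List<List<Integer>> pos = new ArrayList<>();  // per variable: clauses with +v
    final List<List<Integer>> neg = new ArrayList<>();  // per variable: clauses with -v

    CounterClausesSketch(int[][] clauses, int numVars) {
        this.clauses = clauses;
        satCount = new int[clauses.length];
        unsatCount = new int[clauses.length];
        for (int v = 0; v <= numVars; v++) {
            pos.add(new ArrayList<>());
            neg.add(new ArrayList<>());
        }
        for (int c = 0; c < clauses.length; c++) {
            for (int lit : clauses[c]) {
                (lit > 0 ? pos : neg).get(Math.abs(lit)).add(c);
            }
        }
    }

    // Assigning (delta = +1) and unassigning (delta = -1) touch every clause mentioning var.
    private void update(int var, boolean value, int delta) {
        for (int c : (value ? pos : neg).get(var)) satCount[c] += delta;
        for (int c : (value ? neg : pos).get(var)) unsatCount[c] += delta;
    }

    void assign(int var, boolean value)   { update(var, value, +1); }
    void unassign(int var, boolean value) { update(var, value, -1); }

    int free(int c)        { return clauses[c].length - satCount[c] - unsatCount[c]; }
    boolean isSat(int c)   { return satCount[c] > 0; }
    boolean isUnsat(int c) { return satCount[c] == 0 && free(c) == 0; }
    boolean isUnit(int c)  { return satCount[c] == 0 && free(c) == 1; }

    // Total number of clause references kept in variables, i.e. |L|.
    int references() {
        int total = 0;
        for (int v = 0; v < pos.size(); v++) total += pos.get(v).size() + neg.get(v).size();
        return total;
    }

    public static void main(String[] args) {
        int[][] phi = { {1, -2}, {-1, 2, -3}, {1, 3, 4}, {-1, -2, -3} };
        CounterClausesSketch s = new CounterClausesSketch(phi, 4);
        s.assign(1, false);                  // a assigned 0
        System.out.println(s.isUnit(0));     // prints true: only ¬b is left in (a + ¬b)
        s.assign(2, true);                   // b assigned 1
        System.out.println(s.isUnsat(0));    // prints true
        System.out.println(s.references());  // prints 11
    }
}
```

Because the counters are updated symmetrically, backtracking is just a call to `unassign` with the same value, at the same cost as the assignment.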

Lazy Data Structures • Head/Tail Lists • Each clause contains a list of literals • Each unresolved clause is only referenced in two unassigned variables (but possibly in several assigned variables) • Each time a variable is assigned, referenced clauses either become unit, sat, unsat or a new reference becomes associated with another of the clause’s unassigned variables • Unit and unsat clauses can then be identified in constant time • Clause can be declared unit/unsat by inspection of two references • When backtracking, previous references are recovered • Knowledge of the order of literal assignments is maintained and it is essential

Examples of Lazy Structures • [Figure legend: clause literals; unassigned literal; satisfied literal; unsatisfied literal; literal assigned at search decision depth d, written @d; literal references kept in variables] • Largest number of literal references in variables: |L| • Smallest number of literal references in variables: 2|C|

Head/Tail Lists • [Figure: positions of the head (H) and tail (T) references in a clause whose literals are assigned at decision depths @1 to @5; panels: Unit clause, Backtracking]

Lazy Data Structures • Watched Literals • Each unresolved clause is only referenced in two unassigned variables (and not in any assigned variables) • Each time a variable is assigned, referenced clauses either become unit, sat, unsat or, of the two clause references, one becomes associated with another of the clause’s unassigned variables • Unit and unsat clauses can only be identified in linear time • Must visit all literals to confirm that clause is unit or unsat • When backtracking, do nothing • Knowledge of the order of literal assignments in clause is not (and cannot be) maintained
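The watched-literal scheme can be sketched in a few dozen lines. This is a simplified illustration and not Chaff's actual implementation: it assumes every clause has at least two literals, keeps the two watched literals at array positions 0 and 1 of each clause, and leaves it to the caller to assign the implied literals. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class WatchedLiteralsSketch {
    final int[][] clauses;  // reordered in place; positions 0 and 1 are the watched literals
    final int[] value;      // per variable: 0 unassigned, +1 true, -1 false
    final Map<Integer, List<Integer>> watchers = new HashMap<>(); // literal -> watching clauses

    WatchedLiteralsSketch(int[][] clauses, int numVars) {
        this.clauses = clauses;
        this.value = new int[numVars + 1];
        for (int c = 0; c < clauses.length; c++) {
            watch(clauses[c][0], c);
            watch(clauses[c][1], c);
        }
    }

    private void watch(int lit, int c) {
        watchers.computeIfAbsent(lit, k -> new ArrayList<>()).add(c);
    }

    private boolean isTrue(int lit)  { return value[Math.abs(lit)] == (lit > 0 ? 1 : -1); }
    private boolean isFalse(int lit) { return value[Math.abs(lit)] == (lit > 0 ? -1 : 1); }

    // Makes lit true. Returns the literals implied by clauses that became unit,
    // or null if some clause became unsat. Only clauses watching -lit are visited.
    List<Integer> assign(int lit) {
        value[Math.abs(lit)] = lit > 0 ? 1 : -1;
        List<Integer> implied = new ArrayList<>();
        List<Integer> kept = new ArrayList<>();
        boolean conflict = false;
        for (int c : watchers.getOrDefault(-lit, new ArrayList<>())) {
            int[] cl = clauses[c];
            if (cl[0] == -lit) { cl[0] = cl[1]; cl[1] = -lit; }  // falsified watch at index 1
            int replacement = -1;
            if (!isTrue(cl[0])) {
                // Linear scan over the remaining literals for one that is not false.
                for (int i = 2; i < cl.length && replacement < 0; i++) {
                    if (!isFalse(cl[i])) replacement = i;
                }
            }
            if (replacement >= 0) {
                cl[1] = cl[replacement];
                cl[replacement] = -lit;
                watch(cl[1], c);                 // move the reference to another variable
            } else {
                kept.add(c);
                if (isFalse(cl[0])) conflict = true;            // clause is unsat
                else if (!isTrue(cl[0])) implied.add(cl[0]);    // clause is unit
            }
        }
        watchers.put(-lit, kept);
        return conflict ? null : implied;
    }

    // Backtracking only clears the value; the watches are left untouched.
    void unassign(int var) { value[var] = 0; }

    public static void main(String[] args) {
        int[][] phi = { {1, -2}, {-1, 2, -3}, {1, 3, 4}, {-1, -2, -3} };
        WatchedLiteralsSketch s = new WatchedLiteralsSketch(phi, 4);
        System.out.println(s.assign(-1)); // a = 0 makes (a + ¬b) unit: prints [-2]
    }
}
```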

Watched Literals • [Figure: positions of the two watched-literal (W) references in a clause whose literals are assigned at decision depths @1 to @5; panels: Unit clause, Backtracking]

HT vs. WL • Head/Tail Lists: • Advantages: • Order relation between the two (H and T) references • More efficient identification of unit and unsat clauses • When one reference attempts to visit the other, clause is either unit or unsat • Better accuracy in characterizing the dynamic size of clauses • Disadvantages: • Larger overhead during backtracking • Worst-case number of references for each clause equals number of literals • Total (worst-case): |L| • Similar to adjacency lists in the worst-case

HT vs. WL • Watched Literals (WL): • Advantages: • Smaller overhead • Constant number (2) of references for each clause • Total (worst-case): 2|C| • Twice the number of clauses, and |C| << |L| • Disadvantages: • Lack of order relation between the two (W) references • Identification of new unit or unsat clauses is always linear in clause size • Worse accuracy in characterizing the dynamic size of clauses

Matrix vs. Lazy Data Structures • Matrix data structures: • Each clause is an array of bits • Lazy data structures: • Each clause is a list of literals • Matrix data structures: • Advantages: • Can identify not only unit clauses but also binary and ternary ones • Disadvantages: • Space is also needed for literals that do not occur in a clause • Unit propagation takes O(|C|) time • Backtracking takes O(|C|) time

Matrix vs. Lazy Data Structures • Lazy data structures: • Advantages: • Unit propagation takes O(|P| + |N|) time • |P| + |N| <= |C| • Disadvantages: • The current size of a clause is not known, so only unit clauses can be identified
