Constraints on Strings

Constraints on Strings 600.325/425 Declarative Methods - J. Eisner

What’s a constraint, again?
• Unary constraint: a set of allowed values for X.
• Binary constraint: a set of allowed value pairs (X, Y).
• Infinite sets? Sure … infinite subsets of (pairs of) integers, reals, …
• How about soft constraints?

What’s a constraint on strings?
• Hard constraint:
  • Does string S match pattern P? (Is it in the set?)
  • A description of a set of strings
  • Like a constraint … how?
  • S is a variable whose domain is the set of all strings!
  • So P can be regarded as a unary constraint: let’s write P(S).
• Soft constraint:
  • How well does string S fit pattern P?
  • A function mapping each string to a score / weight / cost.
  • Like a soft constraint …

What is a pattern?
• What operations would you expect for combining these string constraints?
• If P is a pattern, then so is ~P
  • ~P matches exactly the strings that P doesn’t
• If P and Q are both patterns, then so is P & Q
• If P and Q are both patterns, then so is P | Q
• Wow, we can build up boolean formulas!
• Does this allow us to encode SAT?
  • How?
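One way to see the SAT connection, as a minimal sketch: treat an assignment to n boolean variables as an n-character string of 0s and 1s, and let the pattern for variable i match exactly the strings whose i-th character is 1. The helper names (`lit`, `neg`, `conj`, `disj`, `var`, `satisfying_strings`) are hypothetical, patterns are modeled as Python predicates rather than compiled automata, and the search is brute force.

```python
import re
from itertools import product

def lit(regex):
    """Wrap a standard regexp as a unary constraint P(S) on strings."""
    compiled = re.compile(regex)
    return lambda s: compiled.fullmatch(s) is not None

def neg(p):        # ~P: matches exactly the strings that P doesn't
    return lambda s: not p(s)

def conj(p, q):    # P & Q
    return lambda s: p(s) and q(s)

def disj(p, q):    # P | Q
    return lambda s: p(s) or q(s)

def var(i, n):
    """Pattern over n-bit strings saying that bit i (0-based) is 1."""
    return lit("[01]{%d}1[01]{%d}" % (i, n - i - 1))

# The formula (x0 | x1) & (~x0 | x2) & (~x1 | ~x2) as one combined pattern.
n = 3
x0, x1, x2 = (var(i, n) for i in range(n))
formula = conj(conj(disj(x0, x1), disj(neg(x0), x2)),
               disj(neg(x1), neg(x2)))

def satisfying_strings(pattern, n):
    """Brute force: try every n-bit string and keep those that match."""
    candidates = ("".join(bits) for bits in product("01", repeat=n))
    return [s for s in candidates if pattern(s)]

print(satisfying_strings(formula, n))
```

The formula is satisfiable exactly when the combined pattern matches at least one string, so asking whether a boolean combination of patterns is non-empty is at least as hard as SAT.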

More about the relation to constraints
• By building complicated patterns from simple ones, we are building up complicated constraints!
• That is also allowed in ECLiPSe:
    alldiff3(X,Y,Z) :- X #\= Y, Y #\= Z, X #\= Z.
    between(X,Y,Z) :- X #< Y, Y #< Z.   % either this
    between(X,Y,Z) :- X #> Y, Y #> Z.   % ... or this
• Or, as a single clause:
    between(X,Y,Z) :- (X #< Y, Y #< Z) or (X #> Y, Y #> Z).
• Now we can use “alldiff3” and “between” as new constraints
• Hang on, patterns are only unary constraints. Generalize?

What is a pattern?
• Binary constraint (relation):
  • What are all the possible translations of string S?
  • A description of a set of string pairs (S,T)
  • Like a binary constraint: let’s write P(S,T)
• We can also do n-ary constraints more generally, but most current solvers don’t allow them
• Fuzzy case: How strongly is string S related to each T? Which one is it most strongly related to?
• Ok, so what’s new here? Why does it matter that they’re string variables?

Some Pattern Operators

  ~          complementation          ~P
  &          intersection             P & Q
  |          union                    P | Q
  (none)     concatenation            PQ
  *          iteration (0 or more)    P*
  +          iteration (1 or more)    P+
  -          difference               P - Q
  \          char complement          \P (equiv. to ? - P)

Which of these can be treated as syntactic sugar? That is, which of these can we get rid of?
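One possible answer to the syntactic-sugar question, written as identities. This assumes ? stands for any single character (so ?* is the set of all strings); other choices of primitive operators are equally valid.

```latex
\begin{align*}
P \,\&\, Q   &= \mathord{\sim}(\mathord{\sim}P \mid \mathord{\sim}Q) && \text{(De Morgan)} \\
P - Q        &= P \,\&\, \mathord{\sim}Q \\
\mathord{\sim}P &= \;?^{*} - P && \text{(so either $\sim$ or $-$ can be the primitive)} \\
P^{+}        &= P\,P^{*} \\
\backslash P &= \;? - P
\end{align*}
```

Under these identities, union, concatenation, iteration (0 or more) and one of complementation or difference are enough to express the rest.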

More Pattern Operators

  .x.    crossproduct               P .x. Q
  .o.    composition                P .o. Q
  .u     upper (input) language     P.u    “domain”
  .l     lower (output) language    P.l    “range”
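To make these four operators concrete, here is a minimal sketch that models a binary constraint as a finite Python set of (upper, lower) string pairs. Real regular relations are usually infinite and are represented by transducers, so this only illustrates the definitions; the function names are hypothetical.

```python
def crossproduct(p, q):
    """P .x. Q: relate every string of P to every string of Q."""
    return {(s, t) for s in p for t in q}

def compose(r1, r2):
    """R1 .o. R2: relate s to u whenever some t has (s, t) in R1 and (t, u) in R2."""
    return {(s, u) for (s, t) in r1 for (t2, u) in r2 if t == t2}

def upper(r):
    """R.u: the input language ("domain")."""
    return {s for s, _ in r}

def lower(r):
    """R.l: the output language ("range")."""
    return {t for _, t in r}

r1 = crossproduct({"a", "b"}, {"x"})      # {("a", "x"), ("b", "x")}
r2 = {("x", "1"), ("y", "2")}
r3 = compose(r1, r2)                      # {("a", "1"), ("b", "1")}
print(sorted(r3), sorted(upper(r2)), sorted(lower(r3)))
```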

The language of “regular expressions”
• A variable S has infinitely many possible values if its type is “string” or “real”
  • So to specify a constraint on S, it is not enough to list the possible values
  • Language for simple constraints on reals: linear equations
  • Language for simple constraints on strings: regular expressions
• Regular expression language
  • You probably know the standard form of regular expressions
  • Standard regexp is a unary constraint (“X must match a*b(c|d)*”)
  • Basic operators: union “|”, concatenation, closure “*”
• But the language has been extended in various ways:
  • soft constraints (specifies costs)
  • binary constraints (over pairs of string variables)
  • n-ary constraints (over n string variables)

Regular expressions ↔ finite-state automata
• Given a regexp that specifies a constraint, you can build an FSA that efficiently determines whether a given string satisfies the constraint.
• Given an FSA, you can find an equivalent regexp.
  • So the “compiled” form of the little language can be converted back to the source form.
• Conclusion: Anything you can do with regexps, you can do with FSAs, and vice-versa.

Given a regular expression …
• Make a parse tree for it
• Build up the FSA from the bottom up
• Example: (ab|c)*(bb*a)
    concat
      closure
        union
          concat: a, b
          c
      concat
        b
        concat
          closure: b
          a
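The bottom-up construction can be sketched as follows. This is a Thompson-style construction with epsilon arcs, which is one standard way to do it and not necessarily the exact construction drawn on the slides. The parse tree is written by hand as nested tuples (no regexp parser is included), and the class and function names are hypothetical.

```python
from itertools import count

class NFA:
    def __init__(self):
        self.arcs = {}            # state -> list of (label, target); label None = epsilon
        self._ids = count()

    def new_state(self):
        s = next(self._ids)
        self.arcs[s] = []
        return s

    def add_arc(self, src, label, dst):
        self.arcs[src].append((label, dst))

def build(tree, nfa):
    """Return (start, final) of a fragment for `tree`, built from the leaves up."""
    if isinstance(tree, str):                       # leaf: a single symbol
        s, f = nfa.new_state(), nfa.new_state()
        nfa.add_arc(s, tree, f)
        return s, f
    op = tree[0]
    if op == "concat":                              # glue fragment 1 to fragment 2
        s1, f1 = build(tree[1], nfa)
        s2, f2 = build(tree[2], nfa)
        nfa.add_arc(f1, None, s2)
        return s1, f2
    if op == "union":                               # new start and final around both
        s1, f1 = build(tree[1], nfa)
        s2, f2 = build(tree[2], nfa)
        s, f = nfa.new_state(), nfa.new_state()
        for a, b in ((s, s1), (s, s2), (f1, f), (f2, f)):
            nfa.add_arc(a, None, b)
        return s, f
    if op == "closure":                             # new start and final, plus a loop
        s1, f1 = build(tree[1], nfa)
        s, f = nfa.new_state(), nfa.new_state()
        for a, b in ((s, s1), (s, f), (f1, s1), (f1, f)):
            nfa.add_arc(a, None, b)
        return s, f
    raise ValueError("unknown operator: %r" % (op,))

def eps_closure(nfa, states):
    """All states reachable from `states` using only epsilon arcs."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for label, dst in nfa.arcs[q]:
            if label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(nfa, start, final, string):
    current = eps_closure(nfa, {start})
    for ch in string:
        moved = {dst for q in current for label, dst in nfa.arcs[q] if label == ch}
        current = eps_closure(nfa, moved)
    return final in current

# Parse tree for (ab|c)*(bb*a)
TREE = ("concat",
        ("closure", ("union", ("concat", "a", "b"), "c")),
        ("concat", "b", ("concat", ("closure", "b"), "a")))
nfa = NFA()
start, final = build(TREE, nfa)
print(accepts(nfa, start, final, "abcbba"))
```

Each operator adds at most two states and four arcs, so the machine grows linearly with the size of the parse tree.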

Concatenation (of soft constraints)
[Figure: two weighted automata and the automaton for their concatenation.]
Example thanks to M. Mohri.

Union
[Figure: two weighted automata combined (+) into one automaton for their union, with arcs labeled eps/0.8, eps/0 and eps/0.3.]
Example thanks to M. Mohri.

Closure (also illustrates binary constraints)
[Figure: a weighted automaton and its closure (*).]
• Why add new start state 4?
• Why not just make state 0 final?
Example thanks to M. Mohri.

Complementation
• M represents a constraint on strings
• We’d like to represent ~M (i.e., a constraint that says that the string must not be accepted by M)
• Just change M’s final states to non-final and vice-versa
• Only works if every string takes you to exactly one state in M (final or non-final). So M must be both deterministic and complete. Any M can be put in this form.
Example thanks to M. Mohri.
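A minimal sketch of the final-state flip, assuming M is already deterministic (determinization is not shown). The machine is a hypothetical tuple (states, delta, start, finals); the added sink state is what makes it complete.

```python
def complete(dfa, alphabet):
    """Add a non-final sink so every (state, symbol) pair has exactly one arc."""
    states, delta, start, finals = dfa
    sink = "sink"                  # hypothetical name, assumed not to clash with M's states
    new_states = set(states) | {sink}
    new_delta = dict(delta)
    for q in new_states:
        for a in alphabet:
            new_delta.setdefault((q, a), sink)
    return new_states, new_delta, start, set(finals)

def complement(dfa, alphabet):
    """~M: complete the machine, then swap final and non-final states."""
    states, delta, start, finals = complete(dfa, alphabet)
    return states, delta, start, states - finals

def accepts(dfa, string):
    """Run a complete DFA; every string ends in exactly one state."""
    states, delta, start, finals = dfa
    q = start
    for ch in string:
        q = delta[(q, ch)]
    return q in finals

# M accepts a*b over the alphabet {a, b}; it is deterministic but not complete.
M = ({0, 1}, {(0, "a"): 0, (0, "b"): 1}, 0, {1})
M_complete = complete(M, "ab")
NOT_M = complement(M, "ab")
print(accepts(M_complete, "aab"), accepts(NOT_M, "aab"))
```

Without the sink, a string such as "aba" would fall off the machine: M rejects it, and flipping final states alone would leave ~M rejecting it too.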

Intersection
[Figure: a first weighted automaton (arcs fat/0.5, pig/0.3, eats/0, sleeps/0.6; final state 2/0.8) & a second weighted automaton (arcs fat/0.2, pig/0.4, eats/0.6, sleeps/1.3; final state 2/0.5) = a product automaton with states 0,0; 0,1; 1,1; 2,0/0.8; 2,2/1.3 and arcs fat/0.7, pig/0.7, eats/0.6, sleeps/1.9.]
• Paths 0→0→1→2 and 0→1→1→0 both accept fat pig eats. So must the new machine: along path 0,0 → 0,1 → 1,1 → 2,0.
• Paths 0→0 and 0→1 both accept fat. So must the new machine: along path 0,0 → 0,1 (arc fat/0.7).
• Paths 0→1 and 1→1 both accept pig. So must the new machine: along path 0,1 → 1,1 (arc pig/0.7).
• Paths 1→2 and 1→2 both accept sleeps. So must the new machine: along path 1,1 → 2,2 (arc sleeps/1.9).
• Finally, arc eats/0.6 takes the new machine from 1,1 to 2,0.
Example adapted from M. Mohri.

Intersection
• Why is intersection guaranteed to terminate?
• How big a machine might be produced by intersection?
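Both questions can be explored with a small product construction. The sketch below assumes weights are costs that add along a path and across the two machines, as the arc weights in the example suggest. The two machines are one reading of the example figure (in particular, state 0 of the second machine is taken to be final with weight 0), and the representation (arcs, start, finals) is hypothetical.

```python
def intersect(m1, m2):
    """Product construction for weighted acceptors whose weights are additive costs."""
    arcs1, start1, finals1 = m1
    arcs2, start2, finals2 = m2
    start = (start1, start2)
    arcs, finals = {}, {}
    agenda, seen = [start], {start}
    while agenda:                          # each pair of states is processed at most once
        state = agenda.pop()
        q1, q2 = state
        if q1 in finals1 and q2 in finals2:
            finals[state] = finals1[q1] + finals2[q2]
        for label1, dst1, w1 in arcs1.get(q1, []):
            for label2, dst2, w2 in arcs2.get(q2, []):
                if label1 == label2:       # both machines must read the same symbol
                    dst = (dst1, dst2)
                    arcs.setdefault(state, []).append((label1, dst, w1 + w2))
                    if dst not in seen:
                        seen.add(dst)
                        agenda.append(dst)
    return arcs, start, finals

def cost(machine, words):
    """Cost of accepting `words`, or None; assumes one arc per (state, label)."""
    arcs, state, finals = machine
    total = 0.0
    for w in words:
        for label, dst, weight in arcs.get(state, []):
            if label == w:
                state, total = dst, total + weight
                break
        else:
            return None                    # no arc for this word
    return total + finals[state] if state in finals else None

A = ({0: [("fat", 0, 0.5), ("pig", 1, 0.3)],
      1: [("eats", 2, 0.0), ("sleeps", 2, 0.6)]}, 0, {2: 0.8})
B = ({0: [("fat", 1, 0.2)],
      1: [("pig", 1, 0.4), ("eats", 0, 0.6), ("sleeps", 2, 1.3)]}, 0, {0: 0.0, 2: 0.5})
C = intersect(A, B)
print(cost(C, ["fat", "pig", "eats"]))
```

Only reachable pairs are ever created, and there are at most |Q1| × |Q2| pairs, so the loop terminates and the result has at most that many states.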

Given an FSA …
• Find a regular expression describing all paths from initial state 1 to final state 5.
[Figure: FSA with states 1 to 5 and initial state 1; arcs are added from slide to slide.]
• Paths from 1 to 5: e12 ((e23 e33* e35) | e24 e45)
• With arc e34 added: e12 ((e23 e33* (e35 | e34 e45)) | e24 e45)
• With arc e43 added as well: e12 ( (e23 (e33 | e34 e43)* (e35 | e34 e45)) | (e24 (e43 e33* e34)* (e45 | e43 e33* e35)) )
[Figure: another FSA on states 1 to 5.]
• Paths from 1 to 5: ???

Let’s do a simpler variant first …
[Figure: graphs on states 1 to 5 with initial state 1.]
• Does there exist any path from initial state 1 to final state 5?
• More generally, transitive closure problem: For each A, B, does there exist any path from A to B?
• If there’s a way to get from 1 to 3 and from 3 to 5, then there’s a way to get from 1 to 5.
• Hmm … should I look for a 1→3 path first in hopes of using it to build a 1→5 path? Or vice-versa?
• Option #1: Gradually build up longer paths (length-1, length-2, length-3 …)
  • How do we deal with cycles?
• Option #2 (less obvious): Gradually allow paths of higher and higher order, where a path’s order is the number of the highest vertex that the path goes through.
• Both have O(n^3) runtime.
• But option #2 allows more flexible handling of cycles. We’ll need that when we return to our FSA problem.
Slide thanks to R. Tamassia & M. Goodrich (modified).

Floyd-Warshall transitive closure algorithm
• Option #2 (less obvious): Gradually allow paths of higher and higher order, where a path’s order is the number of the highest vertex that the path goes through.
  • What are the paths of order 0?
  • What are the paths of order 1?
  • What are the paths of order 2?
  • How big can a path’s order be?
  • What are the paths of order 5?
• Definition: $p^k_{ij}$ = true iff there is an i→j path of order ≤ k.
• Define $p^0$: For each i, j, set $p^0_{ij}$ = true iff there is an i→j edge.
• For k = 1, 2, …, n, define $p^k$:
  • For each i, j, set $p^k_{ij} = p^{k-1}_{ij} \lor (p^{k-1}_{ik} \land p^{k-1}_{kj})$
  • The second term is new: it goes i → k → j, where each half uses only vertices numbered 1, …, k-1, so the whole path still uses only vertices numbered 1, …, k.
• Return $p^n$ (e.g., what is $p^n_{1n}$?)
Parts of slide thanks to R. Tamassia & M. Goodrich.
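The recurrence translates almost line for line into code. A minimal sketch, with vertices numbered 1..n as in the definition; the edge list in the example is hypothetical.

```python
def transitive_closure(n, edges):
    """Return p with p[i][j] True iff some path i -> j exists (vertices 1..n)."""
    p = [[False] * (n + 1) for _ in range(n + 1)]   # row and column 0 are unused
    for i, j in edges:
        p[i][j] = True                              # p^0: direct edges only
    for k in range(1, n + 1):                       # now also allow paths through vertex k
        prev = [row[:] for row in p]                # p^(k-1)
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                p[i][j] = prev[i][j] or (prev[i][k] and prev[k][j])
    return p

# Hypothetical graph: 1->2, 2->3, 2->4, 3->3 (self-loop), 3->5, 4->5
edges = [(1, 2), (2, 3), (2, 4), (3, 3), (3, 5), (4, 5)]
p = transitive_closure(5, edges)
print(p[1][5], p[5][1])
```

The three nested loops over k, i, j give the O(n^3) runtime. Copying the table at each step keeps the code close to the definition; updating in place is also known to give the same answer.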

Floyd-Warshall Example
[Figure: graph on vertices v1, …, v6.]
Floyd-Warshall: k=1 (computes $p^1$ from $p^0$)
[Figure: the same graph at step k=1.]
Slides thanks to R. Tamassia & M. Goodrich (modified).

Regular expression version (Kleene/Tarjan)
• Find a regular expression describing all paths from initial state 1 to final state 5.
[Figure: FSA with states 1 to 5 and initial state 1.]
• Paths from 1 to 5: e12 ((e23 e33* (e35 | e34 e45)) | e24 e45)
• With arc e43 added: e12 ( (e23 (e33 | e34 e43)* (e35 | e34 e45)) | (e24 (e43 e33* e34)* (e45 | e43 e33* e35)) )
[Figure: another FSA on states 1 to 5.]
• Paths from 1 to 5: ???

Regular expression version (Kleene/Tarjan)
• If there’s a way to get from 1 to 3 and from 3 to 5, then there’s a way to get from 1 to 5.
• Definition: $p^k_{ij}$ = regular expression describing all i→j paths that have order ≤ k.
• Define $p^0$: For each i, j, set $p^0_{ij} = e_{ij}$ if that edge exists, else $\emptyset$.
• For k = 1, 2, …, n, define $p^k$:
  • For each i, j, set $p^k_{ij} = p^{k-1}_{ij} \mid \left(p^{k-1}_{ik}\,(p^{k-1}_{kk})^{*}\,p^{k-1}_{kj}\right)$ (a regexp using all three of union, concat, closure!)
  • The second term is new: it goes from i to k, loops at k any number of times, then goes from k to j. Each piece uses only vertices numbered 1, …, k-1, so the whole path still uses only vertices numbered 1, …, k.
• Return $p^n$ (e.g., what is $p^n_{1n}$?)
Parts of slide thanks to R. Tamassia & M. Goodrich.
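The same triple loop works with regular expressions in place of booleans. In this sketch regexps are plain Python strings, `None` stands for the empty set, and no simplification is attempted, so the output is correct but verbose. The arc labels a to h and the helper names are hypothetical.

```python
import re

def kleene(n, edges):
    """edges maps (i, j) to a regexp string for arc i -> j; vertices are 1..n."""
    EMPTY = None                                   # the empty set of paths

    def union(a, b):
        if a is EMPTY:
            return b
        if b is EMPTY:
            return a
        return "(%s|%s)" % (a, b)

    def concat(*parts):
        if any(part is EMPTY for part in parts):   # concatenating with the empty set
            return EMPTY
        return "".join(parts)

    def star(a):
        return "" if a is EMPTY else "(%s)*" % a   # closure of the empty set is epsilon

    r = [[edges.get((i, j), EMPTY) for j in range(n + 1)] for i in range(n + 1)]
    for k in range(1, n + 1):
        prev = [row[:] for row in r]               # p^(k-1)
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                through_k = concat(prev[i][k], star(prev[k][k]), prev[k][j])
                r[i][j] = union(prev[i][j], through_k)
    return r

# Hypothetical labels: one distinct letter per arc of a 5-state machine.
edges = {(1, 2): "a", (2, 3): "b", (3, 3): "c", (3, 5): "d",
         (2, 4): "e", (4, 5): "f", (3, 4): "g", (4, 3): "h"}
paths_1_to_5 = kleene(5, edges)[1][5]
print(paths_1_to_5)
print(bool(re.fullmatch(paths_1_to_5, "aehcd")))
```

Because every arc here has its own letter, matching a string against the result is the same as asking whether that sequence of arcs is a path from state 1 to state 5.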

Regular expression version (Kleene/Tarjan)
• What if the arcs have labels? Just substitute them in:
[Figure: the same FSA with a label (such as a, b, c or aa) on each arc, and each expression below rewritten with every e_ij replaced by its arc label.]
• Paths from 1 to 5: e12 ((e23 e33* (e35 | e34 e45)) | e24 e45)
• With arc e43 added: e12 ( (e23 (e33 | e34 e43)* (e35 | e34 e45)) | (e24 (e43 e33* e34)* (e45 | e43 e33* e35)) )

Regular languages as points in a high-dimensional space
• Instead of dimensions x^2, y^2, xy, etc., every possible string is a dimension and its coefficient is the coordinate (often 0).
• abc → abc
• abc:2 → 2abc (weighted)
• ab|ac → ab + ac
• a(b|c) → ab + ac
• a(b|(c:2)) → ab + 2ac
• ab*c → ac + abc + abbc + abbbc + …
• a(b:2)*c → ac + 2abc + 4abbc + 8abbbc + …

Regular languages as points in a high-dimensional space
• Suppose P, Q are two regular languages represented as these “formal power series.”
• What is the sum P+Q?
  • Union!
  • We double-count …
• What is the product PQ?
  • Concatenation!
• What is the Hadamard product P ⊙ Q?
  • (i.e., the dot product before you sum: x ⊙ y = (x1y1, x2y2, …))
  • Intersection!
• What is 1/(1-P)?
  • * closure!
• Could we use these techniques to classify strings using kernel SVMs?
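These four operations can be tried out on truncated series. In this sketch a series is a Python dict from string to coefficient, everything longer than `max_len` is dropped, and `closure` assumes P gives no weight to the empty string; all function names are hypothetical.

```python
from collections import defaultdict

def add(p, q):
    """P + Q: union; a string in both gets both coefficients (double-counting)."""
    out = defaultdict(float)
    for series in (p, q):
        for s, w in series.items():
            out[s] += w
    return dict(out)

def multiply(p, q, max_len):
    """PQ: concatenation, keeping only strings of length <= max_len."""
    out = defaultdict(float)
    for s, w in p.items():
        for t, v in q.items():
            if len(s) + len(t) <= max_len:
                out[s + t] += w * v
    return dict(out)

def hadamard(p, q):
    """Coordinate-wise product: intersection."""
    return {s: p[s] * q[s] for s in p if s in q}

def closure(p, max_len):
    """1/(1-P) = 1 + P + PP + ..., truncated; assumes P has no empty-string term."""
    result = {"": 1.0}
    power = {"": 1.0}
    for _ in range(max_len):
        power = multiply(power, p, max_len)
        result = add(result, power)
    return result

# a(b:2)*c, truncated to strings of length <= 5
MAX_LEN = 5
series = multiply(multiply({"a": 1.0}, closure({"b": 2.0}, MAX_LEN), MAX_LEN),
                  {"c": 1.0}, MAX_LEN)
print(series)
```

The result reproduces the first terms of ac + 2abc + 4abbc + 8abbbc + …, with the geometric series 1 + P + PP + … playing the role of 1/(1-P).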

Function from strings to ...

                      Acceptors (FSAs)           Transducers (FSTs)
  Unweighted          {false, true}              strings
    example arcs      a, c, e                    a:x, c:z, e:y
  Weighted            numbers                    (string, num) pairs
    example arcs      a/.5, c/.7, e/.5, .3       a:x/.5, c:z/.7, e:y/.5, .3
