Learn about constraints on strings and how to use declarative methods to define patterns, operations, and constraints on strings. Explore regular expressions and their equivalence to finite-state automata.
Constraints on Strings 600.325/425 Declarative Methods - J. Eisner
What’s a constraint, again?
• Unary constraint: a set of allowed values (of X)
• Binary constraint: a set of allowed value pairs (X, Y)
• Infinite sets? Sure … infinite subsets of (pairs of) integers, reals, …
• How about soft constraints?
What’s a constraint on strings?
• Hard constraint:
  • Does string S match pattern P? (Is it in the set?)
  • A description of a set of strings
  • Like a constraint … how?
    • S is a variable whose domain is the set of all strings!
    • So P can be regarded as a unary constraint: let’s write P(S).
• Soft constraint:
  • How well does string S fit pattern P?
  • A function mapping each string to a score / weight / cost.
  • Like a soft constraint …
What is a pattern?
• What operations would you expect for combining these string constraints?
• If P is a pattern, then so is ~P
  • ~P matches exactly the strings that P doesn’t
• If P and Q are both patterns, then so is P & Q
• If P and Q are both patterns, then so is P | Q
• Wow, we can build up boolean formulas!
  • Does this allow us to encode SAT?
  • How?
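The three boolean operators above can be sketched in Python by treating each pattern as a membership test on strings. This is only an illustration: the helper names `pattern`, `complement`, `intersect`, and `union` are hypothetical, and the sketch checks one string at a time rather than building a new automaton.

```python
import re

def pattern(regex):
    """Turn a regular expression into a unary constraint P(S) on strings."""
    compiled = re.compile(regex)
    return lambda s: compiled.fullmatch(s) is not None

def complement(p):
    """~P: matches exactly the strings that P doesn't."""
    return lambda s: not p(s)

def intersect(p, q):
    """P & Q: matches strings that satisfy both constraints."""
    return lambda s: p(s) and q(s)

def union(p, q):
    """P | Q: matches strings that satisfy at least one constraint."""
    return lambda s: p(s) or q(s)

# Example constraints (assumed for illustration only).
starts_with_a = pattern(r"a.*")
ends_with_b = pattern(r".*b")
a_but_not_b = intersect(starts_with_a, complement(ends_with_b))
either = union(starts_with_a, ends_with_b)
```

Here `a_but_not_b("abc")` holds while `a_but_not_b("ab")` does not, because the second string is ruled out by the complemented constraint.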
More about the relation to constraints
• By building complicated patterns from simple ones, we are building up complicated constraints!
• That is also allowed in ECLiPSe:
    alldiff3(X,Y,Z) :- X #\= Y, Y #\= Z, X #\= Z.
    between(X,Y,Z) :- X #< Y, Y #< Z. % either this
    between(X,Y,Z) :- X #> Y, Y #> Z. % ... or this
• Now we can use “alldiff3” and “between” as new constraints
• The two “between” clauses can also be written as one:
    between(X,Y,Z) :- (X #< Y, Y #< Z) or (X #> Y, Y #> Z).
• Hang on, patterns are only unary constraints. Generalize?
What is a pattern?
• Binary constraint (relation):
  • What are all the possible translations of string S?
  • A description of a set of string pairs (S,T)
  • Like a binary constraint: let’s write P(S,T)
• We can also do n-ary constraints more generally, but most current solvers don’t allow them
• Fuzzy case: How strongly is string S related to each T? Which one is it most strongly related to?
• Ok, so what’s new here? Why does it matter that they’re string variables?
Some Pattern Operators
  ~    complementation          ~P
  &    intersection             P & Q
  |    union                    P | Q
       concatenation            PQ
  *    iteration (0 or more)    P*
  +    iteration (1 or more)    P+
  -    difference               P - Q
  \    char complement          \P (equiv. to ?-P)
Which of these can be treated as syntactic sugar? That is, which of these can we get rid of?
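One possible answer to the syntactic-sugar question, written out as identities. These are standard set-theoretic facts rather than something stated on the slide; the last line just restates the slide's own note about char complement, where ? stands for any single character.

```latex
\begin{align*}
P - Q &= P \mathbin{\&} {\sim}Q \\
P^{+} &= P\,P^{*} \\
P \mathbin{\&} Q &= {\sim}\bigl({\sim}P \mid {\sim}Q\bigr) \\
\backslash P &= {?} - P
\end{align*}
```

Under these identities, difference, one-or-more iteration, intersection, and char complement could all be expressed with complementation, union, concatenation, and closure.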
More Pattern Operators
  .x.   crossproduct               P .x. Q
  .o.   composition                P .o. Q
  .u    upper (input) language     P.u    “domain”
  .l    lower (output) language    P.l    “range”
The language of “regular expressions”
• A variable S has infinitely many possible values if its type is “string” or “real”
  • So to specify a constraint on S, it is not enough to list the possible values
  • Language for simple constraints on reals: linear equations
  • Language for simple constraints on strings: regular expressions
• Regular expression language
  • You probably know the standard form of regular expressions
  • A standard regexp is a unary constraint (“X must match a*b(c|d)*”)
  • Basic operators: union “|”, concatenation, closure “*”
• But the language has been extended in various ways:
  • soft constraints (specifies costs)
  • binary constraints (over pairs of string variables)
  • n-ary constraints (over n string variables)
Regular expressions and finite-state automata
• Given a regexp that specifies a constraint, you can build an FSA that efficiently determines whether a given string satisfies the constraint.
• Given an FSA, you can find an equivalent regexp.
  • So the “compiled” form of the little language can be converted back to the source form.
• Conclusion: Anything you can do with regexps, you can do with FSAs, and vice-versa.
Given a regular expression …
• Make a parse tree for it
• Build up the FSA from the bottom up
• Example: (ab|c)*(bb*a)
[Figure: parse tree concat(closure(union(concat(a, b), c)), concat(b, concat(closure(b), a)))]
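The bottom-up construction can be sketched as follows. This is a minimal illustration in the style of Thompson's construction, which the slide does not name: the parse tree is given as nested tuples, each subtree becomes a small machine with one start and one final state, and the machines are glued together with epsilon arcs (label `None`). The names `build`, `eps_closure`, and `accepts` are hypothetical.

```python
import itertools

_ids = itertools.count()  # source of fresh state numbers

def build(tree, arcs):
    """Add arcs (src, label, dst) for this subtree; return its (start, final) states."""
    if isinstance(tree, str):                 # leaf: a single symbol
        s, f = next(_ids), next(_ids)
        arcs.append((s, tree, f))
        return s, f
    op, *kids = tree
    parts = [build(kid, arcs) for kid in kids]
    if op == "concat":
        (s1, f1), (s2, f2) = parts
        arcs.append((f1, None, s2))           # finish the first part, then start the second
        return s1, f2
    s, f = next(_ids), next(_ids)             # union and closure get fresh endpoints
    if op == "union":
        for si, fi in parts:
            arcs.append((s, None, si))
            arcs.append((fi, None, f))
        return s, f
    if op == "closure":
        (s1, f1), = parts
        arcs.extend([(s, None, s1), (f1, None, s1), (f1, None, f), (s, None, f)])
        return s, f
    raise ValueError(f"unknown operator {op!r}")

def eps_closure(states, arcs):
    """All states reachable from the given states by epsilon arcs alone."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in arcs:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(tree, string):
    """Build the machine for the parse tree and run it on the string."""
    arcs = []
    start, final = build(tree, arcs)
    current = eps_closure({start}, arcs)
    for ch in string:
        moved = {dst for src, label, dst in arcs if src in current and label == ch}
        current = eps_closure(moved, arcs)
    return final in current

# Parse tree for the slide's example (ab|c)*(bb*a).
TREE = ("concat",
        ("closure", ("union", ("concat", "a", "b"), "c")),
        ("concat", "b", ("concat", ("closure", "b"), "a")))
```

With this tree, `accepts(TREE, "cabbba")` holds (c, then ab, then bba), while `accepts(TREE, "ab")` does not, since the required final b…a part is missing.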
Concatenation (of soft constraints)
[Figure: two weighted automata and the automaton for their concatenation; example thanks to M. Mohri]
Union
[Figure: two weighted automata and the automaton for their union; example thanks to M. Mohri]
Union
[Figure: the same union, built by adding epsilon arcs eps/0.8, eps/0, eps/0.3; example thanks to M. Mohri]
Closure (also illustrates binary constraints)
[Figure: a weighted transducer and its closure; example thanks to M. Mohri]
• Why add new start state 4?
• Why not just make state 0 final?
Complementation
• M represents a constraint on strings
• We’d like to represent ~M (i.e., a constraint that says that the string must not be accepted by M)
• Just change M’s final states to non-final and vice-versa
• Only works if every string takes you to exactly one state in M (final or non-final). So M must be both deterministic and complete. Any M can be put in this form.
example thanks to M. Mohri
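The recipe above can be sketched in Python for an unweighted machine that is already deterministic. The representation (a tuple of states, alphabet, transition dict, start state, final states) and the sink-state name are assumptions made for this example; determinizing a nondeterministic M is not shown.

```python
def complement(states, alphabet, delta, start, finals, sink="SINK"):
    """Complement a deterministic FSA; delta maps (state, symbol) to the next state."""
    states = set(states)
    delta = dict(delta)
    # Step 1: make the machine complete by sending every missing
    # transition to a non-final sink state (assumed not to clash with a real state).
    if any((q, a) not in delta for q in states for a in alphabet):
        states.add(sink)
        for q in states:
            for a in alphabet:
                delta.setdefault((q, a), sink)
    # Step 2: every string now reaches exactly one state, so swapping
    # final and non-final states complements the language.
    return states, alphabet, delta, start, states - set(finals)

def accepts(machine, string):
    """Run a deterministic (possibly incomplete) machine on a string."""
    states, alphabet, delta, start, finals = machine
    q = start
    for ch in string:
        if (q, ch) not in delta:
            return False
        q = delta[(q, ch)]
    return q in finals

# M accepts ab*: an 'a' followed by any number of 'b's.
M = ({0, 1}, "ab", {(0, "a"): 1, (1, "b"): 1}, 0, {1})
NOT_M = complement(*M)
```

Skipping step 1 would give the wrong answer here: a string such as "ba" falls off the incomplete machine, so it would be rejected both before and after flipping the final states.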
Intersection
[Figure: weighted machine A (states 0, 1, 2; arcs fat/0.5, pig/0.3, eats/0, sleeps/0.6; final state 2/0.8) & weighted machine B (states 0, 1, 2; arcs pig/0.4, fat/0.2, sleeps/1.3, eats/0.6; final state 2/0.5) = product machine (states 0,0; 0,1; 1,1; 2,0/0.8; 2,2/1.3; arcs fat/0.7, pig/0.7, eats/0.6, sleeps/1.9); example adapted from M. Mohri]
• Paths 0→0→1→2 and 0→1→1→0 both accept “fat pig eats”. So must the new machine: along path 0,0 → 0,1 → 1,1 → 2,0.
• Paths 0→0 and 0→1 both accept “fat”. So must the new machine: along path 0,0 → 0,1 (arc fat/0.7).
• Paths 0→1 and 1→1 both accept “pig”. So must the new machine: along path 0,1 → 1,1 (arc pig/0.7).
• Paths 1→2 and 1→2 both accept “sleeps”. So must the new machine: along path 1,1 → 2,2 (arc sleeps/1.9).
• The arc eats/0.6 from 1,1 to 2,0 completes the new machine.
Intersection
• Why is intersection guaranteed to terminate?
• How big a machine might be produced by intersection?
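The product construction behind intersection can be sketched as below. The sketch assumes deterministic weighted acceptors whose weights are costs that add, and the two example machines are one reading of the pig example's arcs, so treat their exact shape as an assumption. The function name `intersect` and the data layout are hypothetical.

```python
from collections import deque

def intersect(a, b):
    """Each machine is (start, arcs, finals) with arcs[state] = {symbol: (next, weight)}
    and finals = {state: stopping_weight}. Returns the product machine."""
    (start_a, arcs_a, fin_a), (start_b, arcs_b, fin_b) = a, b
    start = (start_a, start_b)
    arcs, finals = {}, {}
    queue, seen = deque([start]), {start}
    while queue:
        state = queue.popleft()
        qa, qb = state
        arcs[state] = {}
        if qa in fin_a and qb in fin_b:       # final only if both components are final
            finals[state] = fin_a[qa] + fin_b[qb]
        out_a, out_b = arcs_a.get(qa, {}), arcs_b.get(qb, {})
        for sym in out_a.keys() & out_b.keys():   # both machines must read the symbol
            (next_a, w_a), (next_b, w_b) = out_a[sym], out_b[sym]
            nxt = (next_a, next_b)
            arcs[state][sym] = (nxt, w_a + w_b)
            if nxt not in seen:               # each pair is explored at most once
                seen.add(nxt)
                queue.append(nxt)
    return start, arcs, finals

# Assumed reading of the two machines in the pig example.
A = (0, {0: {"fat": (0, 0.5), "pig": (1, 0.3)},
         1: {"eats": (2, 0.0), "sleeps": (2, 0.6)}}, {2: 0.8})
B = (0, {0: {"fat": (1, 0.2)},
         1: {"pig": (1, 0.4), "eats": (0, 0.6), "sleeps": (2, 1.3)}}, {0: 0.0, 2: 0.5})
START, ARCS, FINALS = intersect(A, B)
```

Because every state of the result is a pair (state of A, state of B) and each pair is enqueued at most once, the loop must stop, and the result has at most |A| × |B| states. Only pairs reachable from the start pair are built, which is why this example yields five states rather than nine.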
Given an FSA …
Find a regular expression describing all paths from initial state 1 to final state 5.
[Figure: an FSA on states 1–5 that gains edges from one slide to the next]
• Paths from 1 to 5: e12 ((e23 e33* e35) | e24 e45)
• With an edge from 3 to 4 added: e12 ((e23 e33* (e35 | e34 e45)) | e24 e45)
• With an edge from 4 to 3 added as well: e12 ( (e23 (e33 | e34 e43 )* (e35 | e34 e45)) | (e24 (e43 e33* e34 )* (e45 | e43 e35)))
• For a still more tangled FSA: Paths from 1 to 5: ???
Let’s do a simpler variant first …
• Does there exist any path from initial state 1 to final state 5?
• More generally, transitive closure problem: For each i, j, does there exist any nontrivial path from i to j?
• If there’s a way to get from 1 to 3 and from 3 to 5, then there’s a way to get from 1 to 5.
• Hmm … should I look for a 1→3 path first in hopes of using it to build a 1→5 path? Or vice-versa?
[Figure: example graphs on states 1–5]
slide thanks to R. Tamassia & M. Goodrich (modified)
Let’s do a simpler variant first …
If there’s a way to get from 1 to 3 and from 3 to 5, then there’s a way to get from 1 to 5. Hmm … should I look for a 1→3 path first in hopes of using it to build a 1→5 path? Or vice-versa?
• Option #1 (Bellman-Ford): Gradually build up longer paths (length-1, length-2, length-3 …)
  • How do we deal with cycles?
• Option #2 (Floyd-Warshall, less obvious): Gradually allow paths of higher and higher order, where a path’s order is the number of the highest vertex that the path goes through.
• Both have O(n³) runtime.
• But option #2 allows more flexible handling of cycles. We’ll need that when we return to our FSA problem.
Floyd-Warshall transitive closure algorithm
If there’s a way to get from 1 to 3 and from 3 to 5, then there’s a way to get from 1 to 5. Hmm … should I look for a 1→3 path first in hopes of using it to build a 1→5 path? Or vice-versa?
• Option #2 (less obvious): Gradually allow paths of higher and higher order, where a path’s order is the number of the highest vertex that the path goes through.
  • What are the paths of order 0?
  • What are the paths of order 1?
  • What are the paths of order 2?
  • How big can a path’s order be?
  • What are the paths of order 5?
Floyd-Warshall transitive closure algorithm
If there’s a way to get from 1 to 3 and from 3 to 5, then there’s a way to get from 1 to 5.
• Option #2 (less obvious): Gradually allow paths of higher and higher order, where a path’s order is the number of the highest vertex that the path goes through.
• Definition: $p^{k}_{ij}$ = true iff there is an i→j path of order ≤ k, that is, a path whose intermediate vertices are all numbered 1,…,k.
• Define $p^{0}$: For each i, j, set $p^{0}_{ij}$ = true iff ∃ an i→j edge.
• For k = 1, 2, …, n, define $p^{k}$:
  • For each i, j, set $p^{k}_{ij} = p^{k-1}_{ij} \lor (p^{k-1}_{ik} \land p^{k-1}_{kj})$
• Return $p^{n}$ (e.g., what is $p^{n}_{1n}$?)
[Figure: an i→k path and a k→j path, each using only vertices numbered 1,…,k-1, join into a new i→j path that still uses only vertices numbered 1,…,k]
parts of slide thanks to R. Tamassia & M. Goodrich
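The recurrence can be sketched directly in Python. The edge list below is an assumption: it is read off the edge names (e12, e23, …) that appear in the earlier path expressions for the five-state FSA. The table is updated in place, which is safe for this boolean version because row k and column k do not change during round k.

```python
def transitive_closure(n, edges):
    """Return p with p[i][j] True iff there is a nontrivial path from i to j.
    Vertices are numbered 1..n; index 0 is unused."""
    p = [[False] * (n + 1) for _ in range(n + 1)]
    for i, j in edges:                       # p^0: direct edges only
        p[i][j] = True
    for k in range(1, n + 1):                # allow vertex k as an intermediate stop
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                p[i][j] = p[i][j] or (p[i][k] and p[k][j])
    return p

# Assumed edges of the five-state example.
EDGES = [(1, 2), (2, 3), (2, 4), (3, 3), (3, 4), (3, 5), (4, 3), (4, 5)]
P = transitive_closure(5, EDGES)
```

Here `P[1][5]` answers the slide's question about a path from state 1 to state 5, and the three nested loops over n vertices give the O(n³) runtime.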
Floyd-Warshall Example
[Figures: the example graph (vertices v1, v2, …) redrawn after each round k = 1, 2, …, 7, where round k computes $p^{k}$ from $p^{k-1}$]
slide thanks to R. Tamassia & M. Goodrich (modified)
Regular expression version (Kleene/Tarjan)
Find a regular expression describing all nontrivial paths from initial state 1 to final state 5.
[Figure: the FSAs on states 1–5 from before]
• Paths from 1 to 5: e12 ((e23 e33* (e35 | e34 e45)) | e24 e45)
• Paths from 1 to 5: e12 ( (e23 (e33 | e34 e43 )* (e35 | e34 e45)) | (e24 (e43 e33* e34 )* (e45 | e43 e35)))
• For the more tangled FSA: Paths from 1 to 5: ???
Regular expression version (Kleene/Tarjan)
If there’s a way to get from 1 to 3 and from 3 to 5, then there’s a way to get from 1 to 5.
• Definition: $p^{k}_{ij}$ = regular expression describing all i→j paths of order ≤ k, that is, paths whose intermediate vertices are all numbered 1,…,k.
• Define $p^{0}$: For each i, j, set $p^{0}_{ij} = e_{ij}$ if ∃ an edge i→j, else ∅.
• For k = 1, 2, …, n, define $p^{k}$:
  • For each i, j, set $p^{k}_{ij} = p^{k-1}_{ij} \mid \bigl(p^{k-1}_{ik}\,(p^{k-1}_{kk})^{*}\,p^{k-1}_{kj}\bigr)$ (a regexp using all three of union, concat, closure!)
• Return $p^{n}$ (e.g., what is $p^{n}_{1n}$?)
[Figure: an i→k path and a k→j path, each using only vertices numbered 1,…,k-1, join into a new i→j path that still uses only vertices numbered 1,…,k]
parts of slide thanks to R. Tamassia & M. Goodrich
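A minimal sketch of this recurrence in Python, producing expressions as strings. Several things here are assumptions for illustration: `None` stands for the empty set, each edge carries a distinct one-character label (chosen arbitrarily below) so that the result can be checked with Python's `re` module, and the edge set is the one read off the earlier path expressions. No simplification is attempted, so the output is correct but verbose.

```python
import re

def union(r, s):
    """Regexp for 'r or s'; None stands for the empty set of paths."""
    if r is None:
        return s
    if s is None:
        return r
    return r if r == s else f"({r}|{s})"

def concat(*parts):
    """Regexp for the parts in sequence; the empty set if any part is empty."""
    if any(part is None for part in parts):
        return None
    return "".join(parts)

def star(r):
    """Zero or more repetitions; the star of the empty set is the empty string."""
    return "" if not r else f"({r})*"

def path_regexps(n, edges):
    """edges maps (i, j) to a one-character label, for vertices 1..n (index 0 unused)."""
    p = [[edges.get((i, j)) for j in range(n + 1)] for i in range(n + 1)]
    for k in range(1, n + 1):
        prev = p                              # prev plays the role of p^{k-1}
        p = [[union(prev[i][j],
                    concat(prev[i][k], star(prev[k][k]), prev[k][j]))
              for j in range(n + 1)] for i in range(n + 1)]
    return p

# Hypothetical labels: one distinct letter per edge of the assumed graph.
LABELS = {(1, 2): "a", (2, 3): "b", (2, 4): "c", (3, 3): "d",
          (3, 4): "e", (3, 5): "f", (4, 3): "g", (4, 5): "h"}
REGEXP = path_regexps(5, LABELS)[1][5]
```

For example, `re.fullmatch(REGEXP, "acgdegf")` succeeds, matching the path 1→2→4→3→3→4→3→5, while `re.fullmatch(REGEXP, "ab")` fails because that path stops at state 3. Unlike the boolean version, a fresh table is built in each round so that every entry is computed strictly from $p^{k-1}$.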
Regular expression version (Kleene/Tarjan)
What if the arcs have labels? Just substitute them in:
[Figure: the same FSAs on states 1–5, with arcs labeled a, b, c, aa; each edge name in the expressions below is replaced by its arc label]
• Paths from 1 to 5: e12 ( (e23 (e33 | e34 e43 )* (e35 | e34 e45)) | (e24 (e43 e33* e34 )* (e45 | e43 e35)))
• Paths from 1 to 5: e12 ((e23 e33* (e35 | e34 e45)) | e24 e45)
Regular languages as points in a high-dimensional space
Instead of dimensions x², y², xy, etc., every possible string is a dimension and its coefficient is the coordinate (often 0).
  regexp          formal power series
• abc             abc
• abc:2           2abc (weighted)
• ab|ac           ab + ac
• a(b|c)          ab + ac
• a(b|(c:2))      ab + 2ac
• ab*c            ac + abc + abbc + abbbc + …
• a(b:2)*c        ac + 2abc + 4abbc + 8abbbc + …
Regular languages as points in a high-dimensional space
• Suppose P, Q are two regular languages represented as these “formal power series.”
• What is the sum P+Q?
  • Union!
  • We double-count …
• What is the product PQ?
  • Concatenation!
• What is the Hadamard product P ⊙ Q?
  • (i.e., the dot product before you sum: x ⊙ y = (x1y1, x2y2, …))
  • Intersection!
• What is 1/(1-P)?
  • * closure!
• Could we use these techniques to classify strings using kernel SVMs?
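These operations can be tried out on finite pieces of such series. In the sketch below a series is a dict from strings to integer coefficients; the function names are hypothetical, and because 1/(1-P) = 1 + P + PP + … is infinite, `star` truncates it to strings of at most `max_len` characters.

```python
def add(p, q):
    """Sum of two series: union, with coefficients of shared strings added."""
    result = dict(p)
    for s, w in q.items():
        result[s] = result.get(s, 0) + w
    return result

def mul(p, q):
    """Product of two series: concatenate strings, multiply coefficients."""
    result = {}
    for s, w in p.items():
        for t, v in q.items():
            result[s + t] = result.get(s + t, 0) + w * v
    return result

def hadamard(p, q):
    """Coordinate-wise product: only strings present in both series survive."""
    return {s: w * q[s] for s, w in p.items() if s in q}

def star(p, max_len):
    """1 + P + PP + ..., keeping only strings of length <= max_len."""
    if "" in p:
        raise ValueError("the series diverges if P contains the empty string")
    result, power = {"": 1}, {"": 1}
    while power:
        power = {s: w for s, w in mul(power, p).items() if len(s) <= max_len}
        result = add(result, power)
    return result

# a(b|(c:2)) and a truncated a(b:2)*c from the table of examples.
SERIES_1 = mul({"a": 1}, add({"b": 1}, {"c": 2}))
SERIES_2 = mul(mul({"a": 1}, star({"b": 2}, 2)), {"c": 1})
```

`SERIES_1` comes out as ab + 2ac, and `SERIES_2` as ac + 2abc + 4abbc, the first three terms of the slide's series. Adding a series to itself shows the double-counting: `add({"ab": 1}, {"ab": 1})` gives coefficient 2 for ab rather than a plain union.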
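Function from strings to ...
               Acceptors (FSAs)    Transducers (FSTs)
  Unweighted   {false, true}       strings
  Weighted     numbers             (string, num) pairs
[Figure: example machines for each cell. Unweighted acceptor arcs a, c, e; unweighted transducer arcs a:x, c:z, e:y; weighted acceptor arcs a/.5, c/.7, e/.5 with stopping weight .3; weighted transducer arcs a:x/.5, c:z/.7, e:y/.5 with stopping weight .3]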