
Constraints on Strings

Learn about constraints on strings and how to use declarative methods to define patterns, operations, and constraints on strings. Explore regular expressions and their equivalence to finite-state automata.


Presentation Transcript


  1. Constraints on Strings 600.325/425 Declarative Methods - J. Eisner

  2. What’s a constraint, again?
     • Unary constraint: a set of allowed values (for a variable X)
     • Binary constraint: a set of allowed value pairs (for variables X, Y)
     • Infinite sets? Sure … infinite subsets of (pairs of) integers, reals, …
     • How about soft constraints?

  3. What’s a constraint on strings?
     • Hard constraint:
       • Does string S match pattern P? (Is it in the set?)
       • A description of a set of strings
       • Like a constraint … how?
       • S is a variable whose domain is the set of all strings!
       • So P can be regarded as a unary constraint: let’s write P(S).
     • Soft constraint:
       • How well does string S fit pattern P?
       • A function mapping each string to a score / weight / cost.
       • Like a soft constraint …

  4. What is a pattern?
     • What operations would you expect for combining these string constraints?
     • If P is a pattern, then so is ~P
       • ~P matches exactly the strings that P doesn’t
     • If P and Q are both patterns, then so is P & Q
     • If P and Q are both patterns, then so is P | Q
     • Wow, we can build up boolean formulas!
     • Does this allow us to encode SAT?
       • How?
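The boolean operators on this slide can be sketched in a few lines of Python. This is only an illustration, not the course's implementation: the `Pattern` class and the two example patterns are hypothetical names introduced here, and a pattern is modeled simply as a membership test on strings.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Pattern:
    """A hard unary constraint on strings: matches(s) says whether s is allowed."""
    matches: Callable[[str], bool]

    def __invert__(self) -> "Pattern":          # ~P: exactly the strings P rejects
        return Pattern(lambda s: not self.matches(s))

    def __and__(self, other: "Pattern") -> "Pattern":   # P & Q: intersection
        return Pattern(lambda s: self.matches(s) and other.matches(s))

    def __or__(self, other: "Pattern") -> "Pattern":    # P | Q: union
        return Pattern(lambda s: self.matches(s) or other.matches(s))


# Hypothetical example patterns (not from the slides).
starts_with_a = Pattern(lambda s: s.startswith("a"))
even_length = Pattern(lambda s: len(s) % 2 == 0)
is_empty = Pattern(lambda s: s == "")

# A boolean formula over patterns: (starts with "a" AND odd length) OR empty.
combined = (starts_with_a & ~even_length) | is_empty
```

Here `combined.matches("abc")` holds while `combined.matches("ab")` does not. Representing a pattern as an opaque predicate makes the boolean operators trivial, but it gives up the ability to inspect or compile the pattern, which is what the automaton constructions later in the deck provide.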

  5. More about the relation to constraints
     • By building complicated patterns from simple ones, we are building up complicated constraints!
     • That is also allowed in ECLiPSe:
       • alldiff3(X,Y,Z) :- X #\= Y, Y #\= Z, X #\= Z.
       • between(X,Y,Z) :- X #< Y, Y #< Z. % either this
       • between(X,Y,Z) :- X #> Y, Y #> Z. % ... or this
       • As a single clause: between(X,Y,Z) :- (X #< Y, Y #< Z) or (X #> Y, Y #> Z).
     • Now we can use “alldiff3” and “between” as new constraints
     • Hang on, patterns are only unary constraints. Generalize?

  6. What is a pattern?
     • Binary constraint (relation):
       • What are all the possible translations of string S?
       • A description of a set of string pairs (S,T)
       • Like a binary constraint: let’s write P(S,T)
       • We can also do n-ary constraints more generally, but most current solvers don’t allow them
     • Fuzzy case: How strongly is string S related to each T? Which one is it most strongly related to?
     • Ok, so what’s new here? Why does it matter that they’re string variables?

  7. Some Pattern Operators
     Operator          Meaning                  Example
     ~                 complementation          ~P
     &                 intersection             P & Q
     |                 union                    P | Q
     (juxtaposition)   concatenation            PQ
     *                 iteration (0 or more)    P*
     +                 iteration (1 or more)    P+
     -                 difference               P - Q
     \                 char complement          \P (equiv. to ?-P)
     Which of these can be treated as syntactic sugar? That is, which of these can we get rid of?

  8. More Pattern Operators
     Operator   Meaning                    Example
     .x.        crossproduct               P .x. Q
     .o.        composition                P .o. Q
     .u         upper (input) language     P.u (“domain”)
     .l         lower (output) language    P.l (“range”)

  9. The language of “regular expressions”
     • A variable S has infinitely many possible values if its type is “string” or “real”
       • So to specify a constraint on S, it is not enough to list the possible values
       • Language for simple constraints on reals: linear equations
       • Language for simple constraints on strings: regular expressions
     • Regular expression language
       • You probably know the standard form of regular expressions
       • A standard regexp is a unary constraint (“X must match a*b(c|d)*”)
       • Basic operators: union “|”, concatenation, closure “*”
       • But the language has been extended in various ways:
         • soft constraints (specifying costs)
         • binary constraints (over pairs of string variables)
         • n-ary constraints (over n string variables)
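As a small illustration of a standard regexp acting as a unary constraint, the slide's example “X must match a*b(c|d)*” can be checked with Python's built-in `re` module. The function name `satisfies` is a hypothetical helper introduced here; `re.fullmatch` is used because the constraint concerns the whole string, not a substring.

```python
import re

# The slide's example constraint: X must match a*b(c|d)*
PATTERN = re.compile(r"a*b(c|d)*")


def satisfies(s: str) -> bool:
    """True iff the entire string s is in the set described by the pattern."""
    return PATTERN.fullmatch(s) is not None


examples = {s: satisfies(s) for s in ["aabcd", "b", "aa", "abca"]}
```

The strings "aabcd" and "b" satisfy the constraint; "aa" (no b) and "abca" (an a after the b) do not.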

  10. Regular expressions ⇔ finite-state automata
      • Given a regexp that specifies a constraint, you can build an FSA that efficiently determines whether a given string satisfies the constraint.
      • Given an FSA, you can find an equivalent regexp.
        • So the “compiled” form of the little language can be converted back to the source form.
      • Conclusion: Anything you can do with regexps, you can do with FSAs, and vice-versa.

  11. Given a regular expression …
      • Make a parse tree for it
      • Build up the FSA from the bottom up
      • Example: (ab|c)*(bb*a)
      [figure: parse tree of the example, with concat at the root, over closure(union(concat(a, b), c)) and the concat subtree for b b* a]

  12. Concatenation (of soft constraints) [figure: two weighted automata and the automaton for their concatenation] (example thanks to M. Mohri)

  13. Union [figure: two weighted automata and the automaton for their union] (example thanks to M. Mohri)

  14. Union [figure: the union automaton, with arcs labeled eps/0.8, eps/0, eps/0.3] (example thanks to M. Mohri)

  15. Closure (also illustrates binary constraints) [figure: an automaton and its closure] Why add new start state 4? Why not just make state 0 final? (example thanks to M. Mohri)

  16. Complementation
      • M represents a constraint on strings
      • We’d like to represent ~M (i.e., a constraint that says that the string must not be accepted by M)
      • Just change M’s final states to non-final and vice-versa
      • Only works if every string takes you to exactly one state in M (final or non-final). So M must be both deterministic and complete. Any M can be put in this form.
      (example thanks to M. Mohri)
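A minimal sketch of this recipe, assuming the input is already a deterministic automaton given as a transition dictionary (the representation, the function names, and the sink-state name "SINK" are all assumptions made for illustration). Missing transitions are first routed to a non-final sink so the machine is complete; only then are final and non-final states swapped.

```python
def accepts(start, finals, delta, s):
    """Run a complete DFA on s; delta maps (state, symbol) -> state."""
    state = start
    for ch in s:
        state = delta[(state, ch)]
    return state in finals


def complement(states, alphabet, start, finals, delta, sink="SINK"):
    """Return a complete DFA accepting exactly the strings the input DFA rejects.

    Assumes the input is deterministic and that `sink` is not already a state name.
    """
    all_states = set(states) | {sink}
    full = dict(delta)
    # Step 1: make the machine complete by sending missing arcs to the sink.
    for q in all_states:
        for ch in alphabet:
            full.setdefault((q, ch), sink)
    # Step 2: swap final and non-final states (the sink becomes final).
    return all_states, set(alphabet), start, all_states - set(finals), full


# Hypothetical example: a DFA for a*b over the alphabet {a, b}.
M = ({0, 1}, {"a", "b"}, 0, {1}, {(0, "a"): 0, (0, "b"): 1})
not_M = complement(*M)
```

Skipping step 1 would be a bug: a string such as "aba" falls off the incomplete machine, so it would be rejected both by M and by the naively flipped machine.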

  17. Intersection [figure: two weighted automata and their intersection. First machine: arcs fat/0.5, pig/0.3, eats/0, sleeps/0.6; final state 2/0.8. Second machine: arcs pig/0.4, fat/0.2, sleeps/1.3, eats/0.6; final state 2/0.5. Result: states 0,0; 0,1; 1,1; 2,0/0.8; 2,2/1.3; arcs fat/0.7, pig/0.7, eats/0.6, sleeps/1.9] (example adapted from M. Mohri)

  18. Intersection: Paths 0→0→1→2 and 0→1→1→0 both accept “fat pig eats”. So must the new machine: along path 0,0 → 0,1 → 1,1 → 2,0. (example adapted from M. Mohri)

  19. Intersection: Paths 0→0 and 0→1 both accept “fat”. So must the new machine: along path 0,0 → 0,1 (arc fat/0.7).

  20. Intersection: Paths 0→1 and 1→1 both accept “pig”. So must the new machine: along path 0,1 → 1,1 (arc pig/0.7).

  21. Intersection: Paths 1→2 and 1→2 both accept “sleeps”. So must the new machine: along path 1,1 → 2,2 (arc sleeps/1.9).

  22. Intersection [figure: the completed machine, now including the arc eats/0.6]

  23. Intersection
      • Why is intersection guaranteed to terminate?
      • How big a machine might be produced by intersection?
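The product construction behind the intersection slides can be sketched as follows. The two machines are a reconstruction from the arc labels and paths quoted in the transcript, so their exact topology (including treating state 0 of the second machine as final with weight 0) is an assumption; the dictionary representation and the names `intersect` and `cost` are likewise hypothetical. Weights are treated as costs that add along a path, matching the sums on the slides (for example 0.5 + 0.2 = 0.7 for “fat”).

```python
from collections import deque

# Each machine: start state, arcs[(state, word)] = (next_state, weight), finals[state] = weight.
A = {"start": 0,
     "arcs": {(0, "fat"): (0, 0.5), (0, "pig"): (1, 0.3),
              (1, "eats"): (2, 0.0), (1, "sleeps"): (2, 0.6)},
     "finals": {2: 0.8}}
B = {"start": 0,
     "arcs": {(0, "fat"): (1, 0.2), (1, "pig"): (1, 0.4),
              (1, "eats"): (0, 0.6), (1, "sleeps"): (2, 1.3)},
     "finals": {0: 0.0, 2: 0.5}}


def intersect(a, b):
    """Product construction for deterministic weighted acceptors; weights add."""
    start = (a["start"], b["start"])
    arcs, finals = {}, {}
    queue, seen = deque([start]), {start}
    while queue:                      # only pair states reachable from the start
        state = queue.popleft()
        qa, qb = state
        if qa in a["finals"] and qb in b["finals"]:
            finals[state] = a["finals"][qa] + b["finals"][qb]
        for (src, word), (dst_a, wa) in a["arcs"].items():
            if src != qa or (qb, word) not in b["arcs"]:
                continue              # both machines must be able to read the word
            dst_b, wb = b["arcs"][(qb, word)]
            nxt = (dst_a, dst_b)
            arcs[(state, word)] = (nxt, wa + wb)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return {"start": start, "arcs": arcs, "finals": finals}


def cost(m, words):
    """Total cost of accepting the word sequence, or None if it is rejected."""
    state, total = m["start"], 0.0
    for w in words:
        if (state, w) not in m["arcs"]:
            return None
        state, weight = m["arcs"][(state, w)]
        total += weight
    return total + m["finals"][state] if state in m["finals"] else None


C = intersect(A, B)
```

Every state of the result is a pair of original states, so at most |states of A| × |states of B| pairs can ever be queued; that bound is why the loop terminates and also how large the output can get.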

  24. Given a regular expression …
      • Make a parse tree for it
      • Build up the FSA from the bottom up
      • Example: (ab|c)*(bb*a)
      [figure: parse tree of the example, with concat at the root, over closure(union(concat(a, b), c)) and the concat subtree for b b* a]

  25. Given an FSA … Find a regular expression describing all paths from initial state 1 to final state 5. [figure: FSA with states 1–5]
      Paths from 1 to 5: e12 ((e23 e33* e35) | e24 e45)

  26. Given an FSA … [figure: the same FSA with more arcs]
      Paths from 1 to 5: e12 ((e23 e33* (e35 | e34 e45)) | e24 e45)

  27. Given an FSA … [figure: the same FSA with more arcs]
      Paths from 1 to 5: e12 ( (e23 (e33 | e34 e43 )* (e35 | e34 e45)) | (e24 (e43 e33* e34 )* (e45 | e43 e35)))

  28. Given an FSA … [figure: the same FSA with still more arcs]
      Paths from 1 to 5: ???

  29. Let’s do a simpler variant first …
      • Does there exist any path from initial state 1 to final state 5?
      • If there’s a way to get from 1 to 3 and from 3 to 5, then there’s a way to get from 1 to 5.
      • More generally, the transitive closure problem: for each i, j, does there exist any nontrivial path from i to j?
      [figure: FSA with states 1–5] (slide thanks to R. Tamassia & M. Goodrich, modified)

  30. Let’s do a simpler variant first …
      • Hmm … should I look for a 1→3 path first in hopes of using it to build a 1→5 path? Or vice-versa?
      [figure: the FSA and two ways of splitting a 1→5 path]

  31. Let’s do a simpler variant first …
      • Option #1 (Bellman-Ford): Gradually build up longer paths (length-1, length-2, length-3 …)
        • How do we deal with cycles?
      • Option #2 (Floyd-Warshall, less obvious): Gradually allow paths of higher and higher order, where a path’s order is the number of the highest vertex that the path goes through.
      • Both have $O(n^3)$ runtime.
      • But option #2 allows more flexible handling of cycles. We’ll need that when we return to our FSA problem.

  32. Floyd-Warshall transitive closure algorithm
      • Option #2 (less obvious): Gradually allow paths of higher and higher order, where a path’s order is the number of the highest vertex that the path goes through.
        • What are the paths of order 0?
        • What are the paths of order 1?
        • What are the paths of order 2?
        • How big can a path’s order be?
        • What are the paths of order 5?

  33. Floyd-Warshall transitive closure algorithm
      Definition: $p^{k}_{ij}$ = true iff there is an i→j path of order ≤ k.
      • Define $p^{0}$: for each i, j, set $p^{0}_{ij}$ = true iff ∃ an i→j edge.
      • For k = 1, 2, …, n, define $p^{k}$:

  34. Floyd-Warshall transitive closure algorithm
      Definition: $p^{k}_{ij}$ = true iff there is an i→j path of order ≤ k.
      • Define $p^{0}$: for each i, j, set $p^{0}_{ij}$ = true iff ∃ an i→j edge.
      • For k = 1, 2, …, n, define $p^{k}$:
        • For each i, j, set $p^{k}_{ij} = p^{k-1}_{ij} \lor (p^{k-1}_{ik} \land p^{k-1}_{kj})$
      • Return $p^{n}$ (e.g., what is $p^{n}_{1n}$?)
      [figure: an i→k path and a k→j path, each using only vertices numbered 1,…,k-1, joined into a new i→j path that still uses only vertices numbered 1,…,k] (parts of slide thanks to R. Tamassia & M. Goodrich)
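The recurrence can be sketched directly in Python. The function name and the example edge list are assumptions: the edges are taken from the edge variables (e12, e23, e33, …) that appear in the earlier path expressions, not from a figure in this transcript. Vertices are numbered 1..n, so index 0 of each row is left unused.

```python
def transitive_closure(n, edges):
    """Floyd-Warshall transitive closure on vertices 1..n.

    Returns p with p[i][j] True iff there is a nontrivial path from i to j.
    """
    # p^0: only direct edges (paths with no intermediate vertices).
    p = [[False] * (n + 1) for _ in range(n + 1)]
    for i, j in edges:
        p[i][j] = True
    for k in range(1, n + 1):
        prev = [row[:] for row in p]  # snapshot of p^{k-1}
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                # Either an old path, or an i->k path followed by a k->j path.
                p[i][j] = prev[i][j] or (prev[i][k] and prev[k][j])
    return p


# Hypothetical example graph built from the edge names used earlier in the deck.
EDGES = [(1, 2), (2, 3), (3, 3), (3, 5), (2, 4), (4, 5), (3, 4), (4, 3)]
closure = transitive_closure(5, EDGES)
```

With three nested loops over n vertices the runtime is cubic, as the deck states. The explicit snapshot `prev` mirrors the $p^{k-1}$ superscript; the usual in-place variant gives the same answer for this boolean problem but hides the correspondence with the definition.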

  35. Floyd-Warshall Example [figure: example directed graph on vertices v1, v2, …] (slides 35–42 thanks to R. Tamassia & M. Goodrich, modified)

  36. Floyd-Warshall: k=1 (computes p1 from p0) [figure]

  37. Floyd-Warshall: k=2 (computes p2 from p1) [figure]

  38. Floyd-Warshall: k=3 (computes p3 from p2) [figure]

  39. Floyd-Warshall: k=4 (computes p4 from p3) [figure]

  40. Floyd-Warshall: k=5 (computes p5 from p4) [figure]

  41. Floyd-Warshall: k=6 (computes p6 from p5) [figure]

  42. Floyd-Warshall: k=7 (computes p7 from p6) [figure]

  43. Regular expression version (Kleene/Tarjan)
      Find a regular expression describing all nontrivial paths from initial state 1 to final state 5. [figure: FSA with states 1–5]
      Paths from 1 to 5: e12 ((e23 e33* (e35 | e34 e45)) | e24 e45)
      Paths from 1 to 5: e12 ( (e23 (e33 | e34 e43 )* (e35 | e34 e45)) | (e24 (e43 e33* e34 )* (e45 | e43 e35)))

  44. Regular expression version (Kleene/Tarjan)
      Find a regular expression describing all paths from initial state 1 to final state 5. [figure: the same FSA with still more arcs]
      Paths from 1 to 5: ???

  45. Regular expression version (Kleene/Tarjan)
      Definition: $p^{k}_{ij}$ = regular expression describing all i→j paths that have order ≤ k.
      • Define $p^{0}$: for each i, j, set $p^{0}_{ij} = e_{ij}$ if ∃ an edge i→j, else ∅.
      • For k = 1, 2, …, n, define $p^{k}$:
        • For each i, j, set $p^{k}_{ij} = p^{k-1}_{ij} \mid (p^{k-1}_{ik}\,(p^{k-1}_{kk})^{*}\,p^{k-1}_{kj})$ (a regexp using all three of union, concat, closure!)
      • Return $p^{n}$ (e.g., what is $p^{n}_{1n}$?)
      [figure: an i→k path and a k→j path, each using only vertices numbered 1,…,k-1, joined into a new i→j path that still uses only vertices numbered 1,…,k] (parts of slide thanks to R. Tamassia & M. Goodrich)
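A sketch of this recurrence that builds the regular expressions as plain strings. All names here are assumptions for illustration: `EMPTY` stands for ∅, the empty string stands for ε, and edge variables are written e12, e23, and so on as in the deck. No simplification is attempted, so the output is correct but can be much longer than a hand-written expression.

```python
EMPTY = None  # stands for the empty set (no paths at all)


def union(r, s):
    """r | s, where the union with the empty set is the other operand."""
    if r is EMPTY:
        return s
    if s is EMPTY:
        return r
    return f"({r}|{s})"


def concat(*parts):
    """Concatenation; anything concatenated with the empty set is the empty set."""
    if any(p is EMPTY for p in parts):
        return EMPTY
    return " ".join(p for p in parts if p)  # "" is epsilon and is dropped


def star(r):
    """Closure; the closure of the empty set is epsilon ("")."""
    return "" if r is EMPTY else f"({r})*"


def path_regex(n, edges):
    """Kleene's algorithm on vertices 1..n; p[i][j] describes all nontrivial i->j paths."""
    p = [[EMPTY] * (n + 1) for _ in range(n + 1)]
    for i, j in edges:
        p[i][j] = f"e{i}{j}"          # p^0: the edge variable, if the edge exists
    for k in range(1, n + 1):
        prev = [row[:] for row in p]  # snapshot of p^{k-1}
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                via_k = concat(prev[i][k], star(prev[k][k]), prev[k][j])
                p[i][j] = union(prev[i][j], via_k)
    return p


# Hypothetical three-state example: 1 -> 2, a self-loop on 2, and 2 -> 3.
small = path_regex(3, [(1, 2), (2, 2), (2, 3)])
```

For this small graph, `small[1][3]` is the string `e12 (e22)* e23`, and `small[3][1]` is `EMPTY` because nothing leaves state 3. If the arcs carry labels, substituting each label for its edge variable yields a regular expression over the alphabet.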

  46. Regular expression version (Kleene/Tarjan)
      What if the arcs have labels? [figure: the same FSA with labeled arcs; labels include a, b, c, aa]
      Paths from 1 to 5: e12 ((e23 e33* (e35 | e34 e45)) | e24 e45)
      Paths from 1 to 5: e12 ( (e23 (e33 | e34 e43 )* (e35 | e34 e45)) | (e24 (e43 e33* e34 )* (e45 | e43 e35)))

  47. Regular expression version (Kleene/Tarjan)
      What if the arcs have labels? Just substitute them in. [figure: the same two path expressions, with each edge variable annotated with its arc label]

  48. Regular languages as points in a high-dimensional space
      Instead of dimensions x², y², xy, etc., every possible string is a dimension, and its coefficient is the coordinate (often 0).
      • abc → abc
      • abc:2 → 2abc (weighted)
      • ab|ac → ab + ac
      • a(b|c) → ab + ac
      • a(b|(c:2)) → ab + 2ac
      • ab*c → ac + abc + abbc + abbbc + …
      • a(b:2)*c → ac + 2abc + 4abbc + 8abbbc + …

  49. Regular languages as points in a high-dimensional space
      • Suppose P, Q are two regular languages represented as these “formal power series.”
      • What is the sum P+Q?
        • Union!
        • We double-count …
      • What is the product PQ?
        • Concatenation!
      • What is the Hadamard product P ⊙ Q?
        • (i.e., the dot product before you sum: $x \odot y = (x_1 y_1, x_2 y_2, \dots)$)
        • Intersection!
      • What is 1/(1-P)?
        • * closure!
      • Could we use these techniques to classify strings using kernel SVMs?
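These operations can be tried out on finite series, modeled here as Python dictionaries from strings to coefficients. This is only a sketch: the function names are assumptions, and since 1/(1-P) = 1 + P + P² + … is an infinite series, the closure below is truncated at a caller-chosen power.

```python
def plus(p, q):
    """Sum of two series: union, with shared strings double-counted."""
    out = dict(p)
    for s, c in q.items():
        out[s] = out.get(s, 0) + c
    return out


def times(p, q):
    """Product of two series: concatenation, coefficients multiply."""
    out = {}
    for s, c in p.items():
        for t, d in q.items():
            out[s + t] = out.get(s + t, 0) + c * d
    return out


def hadamard(p, q):
    """Coordinate-wise product: intersection (strings present in both)."""
    return {s: c * q[s] for s, c in p.items() if s in q}


def closure(p, max_power):
    """Truncated 1/(1-P): 1 + P + P^2 + ... + P^max_power, where 1 is the empty string."""
    total, term = {"": 1}, {"": 1}
    for _ in range(max_power):
        term = times(term, p)
        total = plus(total, term)
    return total


# The slide's examples, as far as a finite truncation allows.
a_b_or_c2 = times({"a": 1}, plus({"b": 1}, {"c": 2}))               # a(b|(c:2))
a_b2star_c = times(times({"a": 1}, closure({"b": 2}, 3)), {"c": 1})  # a(b:2)*c, up to bbb
```

The first value works out to ab + 2ac and the second to ac + 2abc + 4abbc + 8abbbc, matching the slide's listed terms up to the truncation point.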

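  50. Function from strings to ...
                    Acceptors (FSAs)    Transducers (FSTs)
      Unweighted    {false, true}       strings
      Weighted      numbers             (string, num) pairs
      [figure: four example machines. Unweighted acceptor: arcs a, c, e. Unweighted transducer: arcs a:x, c:z, e:y. Weighted acceptor: arcs a/.5, c/.7, e/.5, and weight .3. Weighted transducer: arcs a:x/.5, c:z/.7, e:y/.5, and weight .3]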
