Claus Brabrand (ITU Copenhagen) & Jakob G. Thomsen (Aarhus University)

Typed and Unambiguous Pattern Matching on Strings using Regular Expressions. Claus Brabrand (ITU Copenhagen) & Jakob G. Thomsen (Aarhus University). DANSAS 2010 (In proc. of PPDP 2010). [http://xkcd.com/208/].

Presentation Transcript


  1. Typed and Unambiguous Pattern Matching on Strings using Regular Expressions. Claus Brabrand (ITU Copenhagen) & Jakob G. Thomsen (Aarhus University). DANSAS 2010 (In proc. of PPDP 2010) [http://xkcd.com/208/]

  2. Main Message: For regular expressions: • Pattern matching • Precise syntax-directed ambiguity analysis • Typed mapping into a target language

  3. Introduction & Motivation • Parsing dynamic input is a ubiquitous problem • URLs: http://www.cs.au.dk/index.php?id=141&view=details (protocol, host, path, query string) • Log files: 13/02/2010 66.249.65.107 get /support.html; 20/02/2010 42.116.32.64 post /search.html • The solution is pattern matching (yielding a list of key-value pairs)

  4. Example • Example (date): <day= [0-9]{1,2} > "/" <month= [0-9]{1,2} > "/" <year= [0-9]{4} > (the plain regular expression [0-9]{1,2} "/" [0-9]{1,2} "/" [0-9]{4} with recordings added) • Matching against the string "26/06/1992" • yields: day = 26, month = 06, year = 1992
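To try the date example by hand, here is a minimal sketch that uses Java's built-in java.util.regex named groups as a stand-in for the recordings. It is not the authors' tool, and the class name DateMatch is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateMatch {
    // Named groups (?<name>...) play the role of the recordings <day= ...>, <month= ...>, <year= ...>
    static final Pattern DATE =
        Pattern.compile("(?<day>[0-9]{1,2})/(?<month>[0-9]{1,2})/(?<year>[0-9]{4})");

    public static void main(String[] args) {
        Matcher m = DATE.matcher("26/06/1992");
        if (m.matches()) {  // the whole string must match, not just a substring
            System.out.println("day = " + m.group("day"));      // day = 26
            System.out.println("month = " + m.group("month"));  // month = 06
            System.out.println("year = " + m.group("year"));    // year = 1992
        }
    }
}
```

Every value comes back as a String; the talk's type mapping addresses exactly that.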

  5. Example • Example (date): <day= [0-9]{1,2} > "/" <month= [0-9]{1,2} > "/" <year= [0-9]{4} > • Without the separators: <day= [0-9]{1,2} > <month= [0-9]{1,2} > <year= [0-9]{4} > • String "2082010": • day = 2 and month = 08 (i.e., 2nd of August) • day = 20 and month = 8 (i.e., 20th of August)
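A standard backtracking matcher shows why this is a problem. It returns one of the two readings and gives no sign that the other exists. The following sketch assumes java.util.regex (not the authors' tool) and uses a hypothetical class SilentChoice. Greedy quantifiers make day try two digits first, and the matcher only backtracks on month.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SilentChoice {
    // The date pattern without "/" separators
    static final Pattern NO_SEP =
        Pattern.compile("(?<day>[0-9]{1,2})(?<month>[0-9]{1,2})(?<year>[0-9]{4})");

    static String describe(String s) {
        Matcher m = NO_SEP.matcher(s);
        if (!m.matches()) return "no match";
        return "day = " + m.group("day") + ", month = " + m.group("month")
            + ", year = " + m.group("year");
    }

    public static void main(String[] args) {
        // Only one parse is reported; "day = 2, month = 08" is silently dropped
        System.out.println(describe("2082010"));  // day = 20, month = 8, year = 2010
    }
}
```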

  6. Why regular expressions? • Expressive (enough) • Declarative • Decidable properties • Well known

  7. Outline • Our setup • Regular Expressions: • The Recording Construction • Ambiguity: • Disambiguation • Type Mapping • Conclusion

  8. Our setup • url.rex (e.g., <URL = [a-z]*>; ...) is compiled by our tool into URL.java • Foo.java (import URL; class Foo { ... }) uses the generated class • URL.java and Foo.java are compiled by javac into URL.class and Foo.class

  9. Outline • Our setup • Regular Expressions: • The Recording Construction • Ambiguity: • Disambiguation • Type Mapping • Conclusion

  10. Regular Expressions • Syntax: • Semantics: where: • $L_1 L_2$ is concatenation, i.e., $\{\omega_1 \omega_2 \mid \omega_1 \in L_1, \omega_2 \in L_2\}$ • $L^* = \bigcup_{i \ge 0} L^i$, where $L^0 = \{\varepsilon\}$ and $L^i = L\,L^{i-1}$ • Usual extensions: • Any character "." as $c_1 | c_2 | \dots | c_n$, $c_i \in \Sigma$ • Character ranges "[a-z]" as a|b|...|z • Repetitions "R{2,3}" as RR|RRR
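The set definitions above can be executed directly if every language is cut off at a maximum string length, which keeps $L^*$ finite. The sketch below works under that assumption and uses hypothetical names (RegexSemantics, Re, lang). It covers only the core constructors and builds the "R{2,3}" extension by hand as RR|RRR.

```java
import java.util.Set;
import java.util.TreeSet;

public class RegexSemantics {
    interface Re {}
    record Chr(char c) implements Re {}      // a single character c
    record Eps() implements Re {}            // the empty string
    record Alt(Re l, Re r) implements Re {}  // R1 | R2
    record Cat(Re l, Re r) implements Re {}  // R1 R2
    record Star(Re r) implements Re {}       // R*

    // L1 L2 = { w1 w2 | w1 in L1, w2 in L2 }, keeping only strings of length <= max
    static Set<String> concat(Set<String> l1, Set<String> l2, int max) {
        Set<String> out = new TreeSet<>();
        for (String w1 : l1)
            for (String w2 : l2)
                if (w1.length() + w2.length() <= max) out.add(w1 + w2);
        return out;
    }

    // L(R) restricted to strings of length <= max
    static Set<String> lang(Re re, int max) {
        Set<String> out = new TreeSet<>();
        if (re instanceof Chr c) {
            if (max >= 1) out.add(String.valueOf(c.c()));
        } else if (re instanceof Eps) {
            out.add("");
        } else if (re instanceof Alt a) {
            out.addAll(lang(a.l(), max));
            out.addAll(lang(a.r(), max));
        } else if (re instanceof Cat k) {
            out.addAll(concat(lang(k.l(), max), lang(k.r(), max), max));
        } else if (re instanceof Star s) {
            // L* = union of L^i with L^0 = {eps} and L^i = L L^(i-1); once a power adds
            // no new string, no later power can either, so the loop stops there
            Set<String> base = lang(s.r(), max);
            Set<String> power = new TreeSet<>(Set.of(""));
            while (out.addAll(power)) power = concat(base, power, max);
        }
        return out;
    }

    public static void main(String[] args) {
        Re a = new Chr('a');
        Re a23 = new Alt(new Cat(a, a), new Cat(a, new Cat(a, a)));      // a{2,3} as aa|aaa
        System.out.println(lang(a23, 5));                                 // [aa, aaa]
        System.out.println(lang(new Star(new Alt(a, new Chr('b'))), 2));  // [, a, aa, ab, b, ba, bb]
    }
}
```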

  11. Recording • Syntax: <x = R> • "x" is a recording identifier (it "remembers" the substring it matches) • Semantics: • Example (simplified emails): <user= [a-z]+ > "@" <domain= [a-z]+ ("." [a-z]+)* > • Matching against the string "obama@whitehouse.gov" yields: user = "obama" & domain = "whitehouse.gov" • Related: "x as R" in XDuce; "x::R" in CDuce; and "x@R" in Scala and HaRP
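Here is a rough analogue of the email recording. It uses Java's named groups as the recording mechanism, and the class name EmailMatch is hypothetical. The non-capturing group (?:...) stops the repeated "." part from becoming a group of its own, so only the two recordings are exposed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailMatch {
    // <user= [a-z]+ > "@" <domain= [a-z]+ ("." [a-z]+)* >
    static final Pattern EMAIL =
        Pattern.compile("(?<user>[a-z]+)@(?<domain>[a-z]+(?:\\.[a-z]+)*)");

    public static void main(String[] args) {
        Matcher m = EMAIL.matcher("obama@whitehouse.gov");
        if (m.matches()) {
            System.out.println("user = " + m.group("user"));      // user = obama
            System.out.println("domain = " + m.group("domain"));  // domain = whitehouse.gov
        }
    }
}
```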

  12. Outline • Our setup • Regular Expressions: • The Recording Construction • Ambiguity: • Disambiguation • Type Mapping • Conclusion

  13. Ambiguity • Example from before: <day= [0-9]{1,2} > <month= [0-9]{1,2} > matched on the string "208" gives rise to: • day = 2 and month = 08 (i.e., 2nd of August) • day = 20 and month = 8 (i.e., 20th of August) • Multiple ways of matching ⇒ ambiguous • Problem: concatenation
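The concatenation problem becomes concrete if you list every split point of the input. This sketch uses a hypothetical class ConcatSplits and calls java.util.regex only to test whether each half belongs to its language. It finds both readings of "208". It only counts splits of a single concatenation, and it assumes each half is itself unambiguous.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ConcatSplits {
    // Every split w = w1 w2 with w1 in L(r1) and w2 in L(r2);
    // more than one split means r1 r2 is ambiguous on w
    static List<String[]> splits(String r1, String r2, String w) {
        Pattern p1 = Pattern.compile(r1);
        Pattern p2 = Pattern.compile(r2);
        List<String[]> out = new ArrayList<>();
        for (int i = 0; i <= w.length(); i++) {
            String w1 = w.substring(0, i);
            String w2 = w.substring(i);
            if (p1.matcher(w1).matches() && p2.matcher(w2).matches()) out.add(new String[] {w1, w2});
        }
        return out;
    }

    public static void main(String[] args) {
        for (String[] s : splits("[0-9]{1,2}", "[0-9]{1,2}", "208"))
            System.out.println("day = " + s[0] + ", month = " + s[1]);
        // day = 2, month = 08
        // day = 20, month = 8
    }
}
```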

  14. Ambiguity analysis NB: sound & complete! • Theorem: • R unambiguous iff … • Related work: [Brabrand+Giegerich+Møller'09]: similar approach for context-free grammars; [Book+Even+Greibach+Ott'71] and [Hosoya'03] for XDuce, but indirectly via NFA, not directly (syntax-directed).

  15. Outline • Our setup • Regular Expressions: • The Recording Construction • Ambiguity: • Disambiguation • Type Mapping • Conclusion

  16. Disambiguation • Example: <foo= a > | <bar= a* > is ambiguous on the string "a" • 1) Manual rewriting: always possible :-) but tedious :-(, error-prone :-(, and not structure-preserving :-( ; e.g., the example is rewritten to <foo= a > | <bar= ε | aaa* > • 2) Restriction: R1 - R2 with L(R1 - R2) = L(R1) \ L(R2); e.g., using restriction, <foo= a > | <bar= a* - a > • 3) Disambiguators: three basic operators, choice: '|L', '|R'; concat: 'L', 'R'; star: '*L', '*R'; e.g., <foo= a > |L <bar= a* > needs no rewriting • 4) Default disambiguation: concat, choice, and star are all left-biased by default! (Our tool does this.) • Related work: [Vansummeren'06], but with global, not local, disambiguation
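Options 2) and 4) can be approximated with plain Java regexes. This is only a sketch. Restriction is checked as two membership tests. Java's ordered alternation tries the left branch first and backtracks only if the whole-string match fails, so for this example it behaves like the left-biased |L. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Disambiguation {
    // Restriction R1 - R2 as a membership test: w in L(R1) and w not in L(R2)
    static boolean inRestriction(String r1, String r2, String w) {
        return Pattern.matches(r1, w) && !Pattern.matches(r2, w);
    }

    // Alternatives of <foo= a > | <bar= a* - a > that accept w (disjoint, so at most one)
    static List<String> restricted(String w) {
        List<String> hits = new ArrayList<>();
        if (Pattern.matches("a", w)) hits.add("foo");
        if (inRestriction("a*", "a", w)) hits.add("bar");
        return hits;
    }

    // <foo= a > |L <bar= a* > approximated by Java's left-first alternation
    static String leftBiased(String w) {
        Matcher m = Pattern.compile("(?<foo>a)|(?<bar>a*)").matcher(w);
        if (!m.matches()) return null;
        return m.group("foo") != null ? "foo" : "bar";
    }

    public static void main(String[] args) {
        System.out.println(restricted("a"));   // [foo]
        System.out.println(restricted("aa"));  // [bar]
        System.out.println(leftBiased("a"));   // foo
        System.out.println(leftBiased("aa"));  // bar
    }
}
```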

  17. Outline • Our setup • Regular Expressions: • The Recording Construction • Ambiguity: • Disambiguation • Type Mapping • Conclusion

  18. Type Mapping • Our date example: <day= [0-9]{2} > "/" <month= [0-9]{2} > "/" <year= [0-9]{4} > • Type of the recordings day, month, and year? • Strings (⇒ many type casts) • Infer the type

  19. Type Mapping • A recording has three type components: • a linguistic type (the language of the recording; maps to String, int, float, etc.) • a structural type (nested recordings; maps to (nested) classes) • a type modifier (maps to lists) • Related work: exact type inference in XDuce & CDuce (soundness + completeness proof in [Vansummeren'06]), but not for stand-alone and non-intrusive usage (Java)

  20. Type Mapping • Example: Person = <name= [a-z]+ > " (" <age= [0-9]+ > ")" • compiled (our tool) into: class Person { /* auto-generated */ String name; int age; static Person match(String s) { ... } public String toString() { ... } } • Usage: String s = "obama (48)"; Person p = Person.match(s); print(p.name + " is " + p.age + "y old");
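For comparison, here is a hand-written stand-in for what the generated Person class might contain. It is built on java.util.regex and is an assumption throughout. The slide elides the tool's actual generated code, so only the field names, the match signature, and the usage come from the slide.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Person {
    String name;
    int age;

    // Person = <name= [a-z]+ > " (" <age= [0-9]+ > ")"
    private static final Pattern P = Pattern.compile("(?<name>[a-z]+) \\((?<age>[0-9]+)\\)");

    static Person match(String s) {
        Matcher m = P.matcher(s);
        if (!m.matches()) throw new IllegalArgumentException("no match: " + s);
        Person p = new Person();
        p.name = m.group("name");
        p.age = Integer.parseInt(m.group("age"));  // [0-9]+ only matches digits
        return p;
    }

    @Override
    public String toString() { return name + " (" + age + ")"; }

    public static void main(String[] args) {
        Person p = Person.match("obama (48)");
        System.out.println(p.name + " is " + p.age + "y old");  // obama is 48y old
    }
}
```

The int type for age is justified because [0-9]+ only matches digit strings. A very long digit string would still overflow int, and this sketch does not handle that case.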

  21. Conclusion: Regular expressions are alive and well. This paper: • Used for pattern matching • Precise ambiguity analysis • Type mapping • Future work: improve performance, subtype of recordings • "Trade (excess) expressivity for safety + simplicity" • Thank you. Questions?

  22. Abstract Syntax Trees (ASTs)

  23. R R'  T T'   = Ambiguity • Definition: • Rambiguousiff T,T'ASTR: T  T'  ||T|| = ||T'|| • where ||||: AST * (the flattening) is:

  24. Characterization of Ambiguity NB: sound & complete! • Theorem: • R unambiguous iff … • $R^* = \varepsilon \mid R R^*$

  25. Type Inference • Type inference: • $R : (L, S)$