
Notes on Python Regular Expressions and parser generators (by D. Parson)


  1. Notes on Python Regular Expressions and parser generators (by D. Parson) • These are the Python supplements to the author’s slides for Chapter 1 and Section 2.1. • http://faculty.kutztown.edu/parson/spring2010/CSC310Spring2010.html links to the author’s slides, which are password protected; log in with the K.U. Windows login and password for your student account.

  2. Regular Expressions in Python • The re module is covered in the optional Python text. • http://docs.python.org/library/re.html • An RE is a pattern in the form of a string. • compile(pattern [, flags]) compiles an RE into a reusable pattern object (conceptually, a finite automaton). • The return value can be used by other functions. • Flags are for case, multiline, and meta-character options. • search(pattern, string [, flags]) searches string for the first match of pattern. • match(pattern, string [, flags]) checks for a match only at string’s beginning. • Both return a MatchObject or None.
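
The difference between search and match is easiest to see side by side. The following is a minimal sketch in Python 3 syntax (the slides themselves use Python 2); the pattern and sample string are illustrative choices, not taken from the slides.

```python
import re

# Compile once and reuse; re.IGNORECASE is one of the case-related flags.
pattern = re.compile(r"b.d", re.IGNORECASE)

text = "aBcde"  # illustrative input

found = pattern.search(text)     # scans the string for the first match
anchored = pattern.match(text)   # only tries to match at position 0

print(found.group(0), found.span())
print(anchored)
```

Here search finds the match in the middle of the string, while match returns None because the string does not begin with a match.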

  3. Regular Expressions in Python • split(pattern, string [, maxsplit = 0]) splits string at each occurrence of pattern. • Returns a list of strings. • sub(pattern, repl, string [, count = 0]) performs substitutions of repl for pattern occurrences. • String and sequence operations are related. • http://docs.python.org/library/string.html • >>> s = "abcde" • >>> dir(s) • ['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getslice__', '__gt__', '__hash__', '__init__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_formatter_field_name_split', '_formatter_parser', 'capitalize', 'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
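
The optional maxsplit and count arguments limit how many occurrences are processed. A small sketch in Python 3 syntax, with a sample string chosen for illustration, also shows the related str.split method from the string operations listed above:

```python
import re

record = "abc:cd:e:f"  # illustrative input

all_fields = re.split(":", record)               # split at every colon
two_splits = re.split(":", record, maxsplit=2)   # stop after two splits
all_replaced = re.sub(":", "-", record)          # replace every colon
one_replaced = re.sub(":", "-", record, count=1) # replace only the first
plain_split = record.split(":")                  # str method, no regex needed

print(all_fields)
print(two_splits)
print(all_replaced)
print(one_replaced)
print(plain_split)
```

For a fixed separator such as a single colon, the string method gives the same result as re.split; the re version is needed only when the separator is a pattern.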

  4. Python Regular Expression Examples • >>> from re import * • >>> m1 = search('a+z*(b.d)', 'abcdefghi') • >>> m1 • <_sre.SRE_Match object at 0x11c520> • >>> m1.groups() • ('bcd',) • >>> m1.start() • 0 • >>> m1.end() • 4 • >>> m1.start(0) • 0 • >>> m1.start(1) • 1 # Group 0 is the entire match, 1 is the first parenthesized subexpression, etc.
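
The same match can be inspected by group number. This sketch repeats the slide's pattern and string in Python 3 syntax and adds group() and span(), which return the matched text and the (start, end) pair for a given group:

```python
import re

m1 = re.search(r"a+z*(b.d)", "abcdefghi")

whole = m1.group(0)       # text of the entire match
first = m1.group(1)       # text of the first parenthesized subexpression
whole_span = m1.span(0)   # (start, end) of the entire match
first_span = m1.span(1)   # (start, end) of group 1

print(whole, whole_span)
print(first, first_span)
```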

  5. Learn the major Meta-characters! • Text – verbatim text • . – any character except newline • ^ – matches start of the string (anchor) • $ – matches end of the string (anchor) • * – Kleene star, 0 or more subpattern repetitions • + – Kleene plus, 1 or more subpattern repetitions • ? – optional, 0 or 1 subpattern occurrence • | – alternation, either left or right subpattern • () – group a subexpression inside parentheses • \ – escape a meta-character (make it normal) • [set of chars] – any one character in the set • [^set of chars] – any one character not in the set
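
One way to learn the meta-characters is to check each against a string that should match and one that should not. The table below is a sketch in Python 3 syntax; every pattern and sample string in it is an illustrative choice rather than part of the slides.

```python
import re

# (pattern, text, whether re.search should find a match)
checks = [
    (r"^abc", "abcdef", True),     # ^ anchors at the start
    (r"^abc", "xabc", False),
    (r"def$", "abcdef", True),     # $ anchors at the end
    (r"ab*c", "ac", True),         # * allows zero b's
    (r"ab+c", "ac", False),        # + needs at least one b
    (r"ab?c", "abbc", False),      # ? allows at most one b
    (r"cat|dog", "hotdog", True),  # | accepts either side
    (r"a.c", "a\nc", False),       # . does not match newline
    (r"a\.c", "abc", False),       # escaped . is a literal dot
    (r"a\.c", "a.c", True),
    (r"[aeiou]", "xyz", False),    # set of allowed characters
    (r"[^aeiou]", "aei", False),   # set of excluded characters
]

results = [(re.search(p, t) is not None) for p, t, _ in checks]
for (pattern, text, expected), got in zip(checks, results):
    print(repr(pattern), repr(text), got, "ok" if got == expected else "MISMATCH")
```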

  6. More Python RE Examples • >>> m2 = search('a+z*(b.d)', 'Abcde') • >>> m2 • >>> print m2 • None • >>> split(':', "abc:cd:e:f") • ['abc', 'cd', 'e', 'f'] • >>> split('[:]', "abc:cd:e:f") • ['abc', 'cd', 'e', 'f'] • >>> split('[^:]', "abc:cd:e:f") • ['', '', '', ':', '', ':', ':', '']

  7. More Python RE Examples (sub) • >>> sub('a([^b]+)b', 'A\\1B', 'a123b45ab67a9b aab') • 'A123B45ab67A9B AaB' • The parenthesized subexpression matches one or more occurrences of anything except for b. • The matched substring of the first parenthesized subexpression is group 1. • The replacement pattern \1 says “insert group 1 at this point.” • The effect is to replace the enclosing a and b with A and B while re-inserting the characters between them. • The substring ab is left unchanged because nothing lies between its a and b, and the subexpression needs at least one character.
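
The doubled backslash in 'A\\1B' is only Python string escaping. As a sketch in Python 3 syntax, the same substitution can be written with a raw string, or with the explicit \g<1> form of the group reference, and all three spellings produce the same result:

```python
import re

text = "a123b45ab67a9b aab"

escaped = re.sub("a([^b]+)b", "A\\1B", text)       # as on the slide
raw = re.sub(r"a([^b]+)b", r"A\1B", text)          # raw strings, single backslash
explicit = re.sub(r"a([^b]+)b", r"A\g<1>B", text)  # unambiguous group reference

print(escaped)
print(raw == escaped, explicit == escaped)
```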

  8. Finite State Automata • A regular expression compiler translates a regular expression into a finite state automaton. • This could be a linked data structure or code. It looks like a graph whose edges are the matching steps needed for the regular expression. • There are nondeterministic and deterministic flavors. • (a|b)c+d is a simple example expression. • [Figure: state diagram for (a|b)c+d with states start, s1, s2, s3, s4, and accept, and transitions labeled a, b, c, d, and ε.]
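
A deterministic automaton for (a|b)c+d can be written directly as a transition table. The sketch below is hand-built for illustration in Python 3 syntax; its state names are hypothetical and do not reproduce the states of the slide's figure, and it is not how the re module represents patterns internally.

```python
import re

# Transition table: state -> {input character -> next state}.
# State names are hypothetical labels chosen for readability.
DFA = {
    "start": {"a": "seen_ab", "b": "seen_ab"},  # (a|b)
    "seen_ab": {"c": "seen_c"},                 # first c of c+
    "seen_c": {"c": "seen_c", "d": "accept"},   # more c's, then d
    "accept": {},                               # no moves after d
}

def accepts(text):
    """Return True if the whole of text is accepted by the table above."""
    state = "start"
    for ch in text:
        state = DFA[state].get(ch)
        if state is None:       # no transition: reject
            return False
    return state == "accept"

samples = ["acd", "bcccd", "cd", "ad", "abcd", "acdd", ""]
for s in samples:
    # Compare against the re module matching the entire string.
    print(repr(s), accepts(s), re.fullmatch(r"(a|b)c+d", s) is not None)
```

Each input character causes exactly one table lookup, which is what makes the deterministic flavor attractive for scanning.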

  9. Lookahead-1 parser types. • LL(1) and LR(1) grammars require a parser to get at most 1 look-ahead terminal from the scanner. • LL(1) cannot handle left-recursive grammar productions. It can handle other recursion. • LR(1) and its variants can handle left, right and nested recursion; left recursion is the most efficient for them because it keeps the parse stack shallow. • A generated parser is essentially a deterministic finite state automaton that uses a stack to keep track of nested syntactic structures (a pushdown automaton). • This topic is covered exhaustively in compiler design.
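
To see why left recursion is a problem for LL(1), consider the hypothetical production expr -> expr '-' NUM | NUM: a top-down parser would call itself on expr before consuming any input and never terminate. The usual fix is to rewrite the production as a loop, expr -> NUM (('+' | '-') NUM)*. The sketch below, in Python 3 syntax, is a hand-written parser for that rewritten grammar, not the output of a parser generator; the grammar, tokenize, and parse_expr are all illustrative names.

```python
import re

def tokenize(text):
    """Split text into number and operator tokens (illustrative scanner)."""
    return re.findall(r"\d+|[-+]", text)

def parse_expr(tokens):
    """Parse and evaluate expr -> NUM (('+' | '-') NUM)* left to right."""
    if not tokens or not tokens[0].isdigit():
        raise SyntaxError("expected a number")
    value = int(tokens[0])
    pos = 1
    while pos < len(tokens):
        op = tokens[pos]                      # the single look-ahead token
        if op not in ("+", "-"):
            raise SyntaxError("expected + or -")
        if pos + 1 >= len(tokens) or not tokens[pos + 1].isdigit():
            raise SyntaxError("expected a number after operator")
        operand = int(tokens[pos + 1])
        value = value + operand if op == "+" else value - operand
        pos += 2
    return value

print(parse_expr(tokenize("10 - 3 - 2")))
print(parse_expr(tokenize("1 + 2 - 4")))
```

The loop keeps subtraction left-associative, so the rewritten grammar gives the same meaning the left-recursive production was meant to express.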

  10. Parser generators in Python. • YAPPS2 is an LL(1) parser generator. • http://theory.stanford.edu/~amitp/yapps/ • http://pypi.python.org/pypi/Yapps2 • PLY is a Python LALR(1) parser generator (LALR(1) grammars are a subset of LR(1)), the counterpart of UNIX YACC and GNU Bison, which generate parsers written in C. • http://www.dabeaz.com/ply/ • Both produce executable Python parsers from grammar rules written alongside stylized Python code.
