
Functional Design and Programming

Functional Design and Programming. Lecture 9: Lexical analysis and parsing. Literature. Paulson, chap. 9: Lexical analysis (9.1) Functional parsing (9.2-9.4). Exercises. Paulson, chap. 9: 9.1-9.2 9.3-9.6, 9.8 Write a parser for XML elements (see home page). Parsing/Unparsing.



  1. Functional Design and Programming Lecture 9: Lexical analysis and parsing

  2. Literature • Paulson, chap. 9: • Lexical analysis (9.1) • Functional parsing (9.2-9.4)

  3. Exercises • Paulson, chap. 9: • 9.1-9.2 • 9.3-9.6, 9.8 • Write a parser for XML elements (see home page)

  4. Parsing/Unparsing • Purpose: Encoding/decoding structured data into flat (string) representations • Reasons: • Data read (and written) using operating system routines (“read 25 bytes from file XYZ”). • Need for universal format for all kinds of data; e.g., to allow editing with text editor.

  5. Language processor architecture • character stream: “<H1 > My title</ H1>” • scanner produces token stream: [LANGLE, ID “H1”, RANGLE, ID “ My title”, LSLASH, ID “ H1”, RANGLE] • parser produces abstract syntax tree: element with stag “H1”, contents “ My title”, etag “H1” • transformer(s) produce abstract syntax tree: contents “MY TITLE” • unparser produces character stream: “<H1> MY TITLE </H1>”

  6. Lexical analysis (Scanning, lexing, tokenizing) • Purpose: Turning a character stream into a stream of tokens. • Reasons: • Making parsing easier by taking care of ‘low-level’ concerns such as eliminating whitespace. • Efficient preprocessing and compression of input to parser. • Unbounded lookahead into input stream (in contrast to most parsers) • Well-founded theoretical basis and tool support (regular expressions and finite state machines).
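The scanning step described above can be sketched in Standard ML as follows. This is a minimal illustration, not the course's or Paulson's scanner: the token constructors `KEY`, `ID` and `NUM` and the helper `span` are assumptions made for this example, and every character that is neither whitespace, a digit nor a letter is simply treated as a one-character keyword.

```sml
(* Hypothetical token type; the constructor names are assumptions. *)
datatype token = KEY of string | ID of string | NUM of int

(* Split a list into its longest prefix satisfying p and the remainder. *)
fun span p [] = ([], [])
  | span p (x :: xs) =
      if p x then let val (ys, zs) = span p xs in (x :: ys, zs) end
      else ([], x :: xs)

(* scan : char list -> token list
   Drops whitespace and groups digits and letters into single tokens. *)
fun scan [] = []
  | scan (c :: cs) =
      if Char.isSpace c then scan cs
      else if Char.isDigit c then
        let val (ds, rest) = span Char.isDigit (c :: cs)
        in NUM (valOf (Int.fromString (implode ds))) :: scan rest end
      else if Char.isAlpha c then
        let val (ls, rest) = span Char.isAlphaNum (c :: cs)
        in ID (implode ls) :: scan rest end
      else KEY (str c) :: scan cs   (* any other character: one-character keyword *)

val tokens = scan (explode "x + y / 15 - x * x")
```

The parser then never sees whitespace, and multi-character lexemes such as `15` arrive as a single token.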

  7. Context-free Grammars (CFGs) • A context-free grammar G describes a language (set of strings) • G = (T, N, P, S) where • T: set of terminal symbols • N: set of nonterminal symbols • P: set of productions • S: start symbol (a particular nonterminal symbol)

  8. CFGs: Example
     T = { +, -, *, /, (, ), Var, Const }
     N = { Exp, Term, Factor }
     S = Exp
     Exp ::= Exp + Term | Exp - Term | Term
     Term ::= Term * Factor | Term / Factor | Factor
     Factor ::= Var | Const | ( Exp )

  9. CFGs: Example... • Input: “x + y / 15 - x * x” • Token stream: [Var, +, Var, /, Const, -, Var, *, Var] • [Figure: parse tree with interior nodes Exp, Term and Factor over the token stream]

  10. Parsing • Purpose: Turning a stream of tokens into the tree structure described by a grammar • Reasons: • Checking that input is well-formed (according to given grammar) • Producing parse tree or abstract syntax tree to recover tree structure in input • Processing parse tree according to grammar

  11. Parsing combinators • Idea: For each terminal or nonterminal M there is a function: • fM : token list -> T * token list (= T phrase) • such that fM takes elements from its argument until it has reduced the elements to M • and then produces a value of type T for it.

  12. Parsing primitives • Terminals: • Var: string phrase • Const: int phrase • $: string -> string phrase (for keywords)

  13. Parsing primitives... • Parsing combinators: • empty: ('a list) phrase • ||: 'a phrase * 'a phrase -> 'a phrase • --: 'a phrase * 'b phrase -> ('a * 'b) phrase • >>: 'a phrase * ('a -> 'b) -> 'b phrase • Derived combinators: • repeat: 'a phrase -> 'a list phrase • $--: 'a phrase * 'b phrase -> 'b phrase • --$: 'a phrase * 'b phrase -> 'a phrase

  14. Parsing precedences
      infix 6 $-- --$
      infix 5 --
      infix 3 >>
      infix 0 ||
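One possible implementation of the primitives and combinators with the types and precedences listed above is sketched below. It is an assumption-laden sketch in the spirit of Paulson's chapter 9, not a copy of his code: the token type, the exception `SyntaxErr` and the choice to signal failure by raising an exception are all choices made for this example.

```sml
(* Hypothetical token type and failure exception (assumptions). *)
datatype token = KEY of string | ID of string | NUM of int
exception SyntaxErr of string

(* 'a phrase: consume a prefix of the tokens, return a value and the rest. *)
type 'a phrase = token list -> 'a * token list

infix 6 $-- --$
infix 5 --
infix 3 >>
infix 0 ||

(* Terminals *)
fun Var (ID x :: toks) = (x, toks)
  | Var _ = raise SyntaxErr "variable expected"
fun Const (NUM n :: toks) = (n, toks)
  | Const _ = raise SyntaxErr "constant expected"
fun $ k (KEY k' :: toks) =
      if k = k' then (k, toks) else raise SyntaxErr ("expected " ^ k)
  | $ k _ = raise SyntaxErr ("expected " ^ k)

(* Parsing combinators *)
fun empty toks = ([], toks)                                   (* consume nothing *)
fun (p || q) toks = p toks handle SyntaxErr _ => q toks       (* alternative *)
fun (p -- q) toks =                                           (* sequence *)
  let val (x, toks1) = p toks
      val (y, toks2) = q toks1
  in ((x, y), toks2) end
fun (p >> f) toks =                                           (* apply f to result *)
  let val (x, toks1) = p toks in (f x, toks1) end

(* Derived combinators *)
fun repeat p toks = (p -- repeat p >> (op ::) || empty) toks  (* zero or more *)
fun p $-- q = p -- q >> (fn (_, y) => y)                      (* keep right result *)
fun p --$ q = p -- q >> (fn (x, _) => x)                      (* keep left result *)

(* Usage: a '+'-separated list of variables. *)
val (names, rest) =
  (Var -- repeat ($"+" $-- Var) >> (op ::))
    [ID "a", KEY "+", ID "b", KEY "+", ID "c"]
```

Because `$--` and `--$` bind tighter than `--`, which binds tighter than `>>` and `||`, the usage line needs parentheses only around the argument of `repeat` and around the whole phrase.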

  15. Problems with combinatory parsers • Left-recursion: • Problem: Left-recursive grammars make parsers go into an infinite loop. • Remedy: Transform grammar to eliminate left-recursion • Mutual recursion: • Problem (SML-specific!): Cannot use val-declaration and combinator applications only. • Remedy: Use fun-declarations for mutually recursive parts of a grammar

  16. Parsing problems...
      Example grammar is left-recursive:
        Exp ::= Exp ‘+’ Term | Exp ‘-’ Term | Term
        Term ::= Term ‘*’ Factor | Term ‘/’ Factor | Factor
        Factor ::= Var | Const | ‘(’ Exp ‘)’
      Eliminate left-recursion:
        Binop1 ::= ‘+’ | ‘-’
        Binop2 ::= ‘*’ | ‘/’
        Factor ::= Var | Const | ‘(’ Exp ‘)’
        Term ::= Factor (Binop2 Factor)*
        Exp ::= Term (Binop1 Term)*

  17. Data type for abstract syntax trees
      type binop = string
      datatype expAST = EXP of termAST * (binop * termAST) list
      and termAST = TERM of factorAST * (binop * factorAST) list
      and factorAST = VAR of string | CONST of int | PARENEXP of expAST

  18. Parser: example (first try)
      val binop1 = $"+" || $"-"
      val binop2 = $"*" || $"/"
      val factor = Var >> VAR
                || Const >> CONST
                || $"(" $-- exp --$ $")" >> PARENEXP
      val term = factor -- repeat (binop2 -- factor) >> TERM
      val exp = term -- repeat (binop1 -- term) >> EXP
      PROBLEM: Doesn't work! These definitions are intended to be mutually recursive, but are not: the declaration of factor refers to exp before exp has been declared.

  19. Parser: example (second try)
      val binop1 = $"+" || $"-"
      val binop2 = $"*" || $"/"
      fun factor toks = (   Var >> VAR
                         || Const >> CONST
                         || $"(" $-- exp --$ $")" >> PARENEXP ) toks
      and term toks = (factor -- repeat (binop2 -- factor) >> TERM) toks
      and exp toks = (term -- repeat (binop1 -- term) >> EXP) toks
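The second try can be assembled into a complete, self-contained program roughly as follows. The combinator definitions are repeated here only so that the block runs on its own; they, the token type and the exception `SyntaxErr` are assumptions for this sketch rather than the lecture's actual library. The result of each phrase is wrapped in the constructors `PARENEXP`, `TERM` and `EXP` so that the parser returns the abstract syntax tree type from slide 17.

```sml
datatype token = KEY of string | ID of string | NUM of int   (* assumed token type *)
exception SyntaxErr of string

infix 6 $-- --$
infix 5 --
infix 3 >>
infix 0 ||

fun Var (ID x :: toks) = (x, toks)
  | Var _ = raise SyntaxErr "variable expected"
fun Const (NUM n :: toks) = (n, toks)
  | Const _ = raise SyntaxErr "constant expected"
fun $ k (KEY k' :: toks) =
      if k = k' then (k, toks) else raise SyntaxErr ("expected " ^ k)
  | $ k _ = raise SyntaxErr ("expected " ^ k)

fun empty toks = ([], toks)
fun (p || q) toks = p toks handle SyntaxErr _ => q toks
fun (p -- q) toks =
  let val (x, toks1) = p toks
      val (y, toks2) = q toks1
  in ((x, y), toks2) end
fun (p >> f) toks = let val (x, toks1) = p toks in (f x, toks1) end
fun repeat p toks = (p -- repeat p >> (op ::) || empty) toks
fun p $-- q = p -- q >> (fn (_, y) => y)
fun p --$ q = p -- q >> (fn (x, _) => x)

(* Abstract syntax trees, as on slide 17 *)
type binop = string
datatype expAST = EXP of termAST * (binop * termAST) list
and termAST = TERM of factorAST * (binop * factorAST) list
and factorAST = VAR of string | CONST of int | PARENEXP of expAST

val binop1 = $"+" || $"-"
val binop2 = $"*" || $"/"

(* fun ... and ... makes the three phrase parsers mutually recursive;
   the explicit toks argument delays evaluation of the combinator expression. *)
fun factor toks =
      (   Var >> VAR
       || Const >> CONST
       || $"(" $-- exp --$ $")" >> PARENEXP ) toks
and term toks = (factor -- repeat (binop2 -- factor) >> TERM) toks
and exp toks = (term -- repeat (binop1 -- term) >> EXP) toks

(* Token stream for "x + y / 15 - x * x" *)
val (tree, rest) =
  exp [ID "x", KEY "+", ID "y", KEY "/", NUM 15, KEY "-", ID "x", KEY "*", ID "x"]
```

A caller would normally also check that `rest` is empty, since `exp` succeeds as soon as it has parsed some prefix of the token stream.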

  20. Operator precedence parsing (overview) • When processing operator expressions, a parser has to decide whether to reduce (stop the current phrase parser and return its result) or shift (continue the current phrase parse). • Operator precedence parsing: Associate a precedence (binding strength) with each operator, remember the precedence of the last operator processed, and determine whether to reduce or shift depending on the precedence of the next operator. • See Paulson, pp. 364-366
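The shift/reduce decision can be illustrated with a small precedence-climbing function. This is a sketch under stated assumptions and is not the code from Paulson's pages: the precedence table `prec`, the `tree` type and the restriction of operands to single identifiers or numbers are all invented for this example, and all operators are treated as left-associative.

```sml
datatype token = KEY of string | ID of string | NUM of int   (* assumed token type *)
datatype tree = LEAF of token | NODE of string * tree * tree
exception SyntaxErr of string

(* Hypothetical precedence table: a higher number binds tighter;
   anything that is not a known operator gets ~1. *)
fun prec "+" = 1
  | prec "-" = 1
  | prec "*" = 2
  | prec "/" = 2
  | prec _ = ~1

(* In this sketch an operand is a single identifier or number. *)
fun operand (ID x :: toks) = (LEAF (ID x), toks)
  | operand (NUM n :: toks) = (LEAF (NUM n), toks)
  | operand _ = raise SyntaxErr "operand expected"

(* climb minPrec (left, toks): extend the tree 'left' for as long as the
   next operator binds at least as tightly as minPrec. *)
fun climb minPrec (left, KEY k :: toks) =
      if prec k < minPrec then
        (left, KEY k :: toks)                      (* reduce: return what we have *)
      else
        let val (right, toks1) = operand toks      (* shift: read the next operand *)
            (* tighter operators to the right claim that operand first *)
            val (right', toks2) = climb (prec k + 1) (right, toks1)
        in climb minPrec (NODE (k, left, right'), toks2) end
  | climb _ (left, toks) = (left, toks)

fun parseExp toks = climb 0 (operand toks)

(* Token stream for "x + y / 15 - x * x" *)
val (tree, rest) =
  parseExp [ID "x", KEY "+", ID "y", KEY "/", NUM 15, KEY "-", ID "x", KEY "*", ID "x"]
```

Here the precedence of the operator just processed is carried in the `minPrec` argument, and comparing it with `prec k` for the next operator decides between reducing and shifting.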

  21. Backtracking parsing (overview) • There may be more than one way of parsing an expression. • Backtracking parsing: Construct a lazy list of all possible parses of a token stream. Continue the parse with the first of those and try to find a complete parse for the whole token stream; if that fails, backtrack to the second in the list and repeat. • See Paulson, pp. 366-367
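The idea can be sketched with a hand-rolled lazy list. Everything below is an assumption made for illustration (the `seq` type, the helper names, and the use of characters as tokens); it is not Paulson's sequence library. Each parser returns the lazy sequence of all its possible results, and `firstComplete` walks that sequence until it finds a parse that consumes the whole input.

```sml
(* Lazy list: the tail is computed only when it is forced. *)
datatype 'a seq = Nil | Cons of 'a * (unit -> 'a seq)

fun append (Nil, ys) = ys ()
  | append (Cons (x, xf), ys) = Cons (x, fn () => append (xf (), ys))
fun flatMap f Nil = Nil
  | flatMap f (Cons (x, xf)) = append (f x, fn () => flatMap f (xf ()))

(* A parser maps the input to a lazy sequence of (result, remaining input). *)
fun succeed v inp = Cons ((v, inp), fn () => Nil)
fun lit c (x :: xs) = if x = c then succeed c xs else Nil
  | lit c [] = Nil

infix 5 --
infix 3 >>
infix 0 ||
fun (p || q) inp = append (p inp, fn () => q inp)        (* all parses of p, then of q *)
fun (p -- q) inp =
  flatMap (fn (x, rest) =>
             flatMap (fn (y, rest') => succeed (x, y) rest') (q rest))
          (p inp)
fun (p >> f) inp = flatMap (fn (x, rest) => succeed (f x) rest) (p inp)

(* many p: the longest match comes first, shorter ones follow on backtracking. *)
fun many p inp = (p -- many p >> (op ::) || succeed []) inp

(* First parse that consumes all input, if there is one. *)
fun firstComplete Nil = NONE
  | firstComplete (Cons ((v, []), _)) = SOME v
  | firstComplete (Cons (_, xf)) = firstComplete (xf ())

(* S ::= 'a'* 'a' 'b'  -- a greedy 'a'* alone would swallow every 'a'. *)
val s = many (lit #"a") -- lit #"a" -- lit #"b"
val result = firstComplete (s (explode "aaab"))
```

On the input `aaab` the first attempt lets `many` take all three `a`s and then fails at the following `lit #"a"`; the next element of the lazy list gives `many` only two, which leads to a complete parse.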

  22. Recursive-descent parsing (overview) • Write one parser for each grammatical category (as in combinatory parsing) • Process the token stream as in combinatory parsers, except for alternatives. • Process alternatives as follows: • Look at next token (first token of remaining token stream). • Choose phrase parser on the basis of that token.
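A hand-written recursive-descent parser for the expression grammar might look like the sketch below. The token type, the simplified `ast` type with a single `BIN` node, and the helper functions `termRest` and `expRest` are assumptions introduced for this example. Each alternative is chosen by pattern matching on the next token, so nothing is ever re-parsed.

```sml
datatype token = KEY of string | ID of string | NUM of int   (* assumed token type *)
datatype ast = VAR of string | CONST of int | BIN of string * ast * ast
exception SyntaxErr of string

(* Factor ::= Var | Const | ( Exp ): the first token selects the alternative. *)
fun factor (ID x :: toks) = (VAR x, toks)
  | factor (NUM n :: toks) = (CONST n, toks)
  | factor (KEY "(" :: toks) =
      (case exp toks of
         (e, KEY ")" :: toks1) => (e, toks1)
       | _ => raise SyntaxErr "expected )")
  | factor _ = raise SyntaxErr "factor expected"

(* Term ::= Factor (Binop2 Factor)* *)
and term toks = termRest (factor toks)
and termRest (left, KEY k :: toks) =
      if k = "*" orelse k = "/" then
        let val (right, toks1) = factor toks
        in termRest (BIN (k, left, right), toks1) end
      else (left, KEY k :: toks)        (* next token does not continue a term *)
  | termRest res = res

(* Exp ::= Term (Binop1 Term)* *)
and exp toks = expRest (term toks)
and expRest (left, KEY k :: toks) =
      if k = "+" orelse k = "-" then
        let val (right, toks1) = term toks
        in expRest (BIN (k, left, right), toks1) end
      else (left, KEY k :: toks)        (* next token does not continue an expression *)
  | expRest res = res

(* Token stream for "(x + 2) * y" *)
val (tree, rest) =
  exp [KEY "(", ID "x", KEY "+", NUM 2, KEY ")", KEY "*", ID "y"]
```

Compared with the combinator version, the `||` alternatives have been replaced by clauses that inspect one token of lookahead, which is only possible because each alternative of this grammar starts with a different token.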

  23. LL-parsing and LR-parsing (overview) • Use tools to generate parsers from grammar specifications. • The tool produces a table that guides a push-down automaton through parsing actions (“shift”, “reduce”) • LL-parsing: Predictive (basically recursive descent parsing in table-driven form) • LR-parsing (incl. SLR- and LALR-parsing): (Virtual) parallel execution of phrase parsers. • Problems: Lookahead bounded in practice, at times unwieldy.
