1 / 19

Parsers and Grammar

Parsers and Grammar. Categories of Grammar Rules. Declarations or definitions . AttributeDeclaration ::= [ final ] [ static ] [ access ] datatype [ = expression ] { , datatype [ = expression ] } ; access ::= ' public ' | ' protected ' | ' private ' Statements.

Download Presentation

Parsers and Grammar

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Parsers and Grammar

  2. Categories of Grammar Rules • Declarationsordefinitions. AttributeDeclaration::= [ final ] [ static ][ access ] datatype [= expression ]{ , datatype [= expression ] } ; access::= ' public ' | ' protected ' | ' private ' • Statements. • assignment, if, for, while, do_while • Expressions, such as the examples in these slides. • Structuressuch as statement blocks, methods, and entire classes. StatementBlock ::='{' { Statement; }'}'

  3. Parsing Algorithms (1) • Broadly divided into LL and LR. • LL algorithms match input directly to left-side symbols, then choose a right-side production that matches the tokens. This is top-down parsing • LR algorithms try to match tokens to the right-side productions, then replace groups of tokens with the left-side nonterminal. They continue until the entire input has been "reduced" to the start symbol • LALR (look-ahead LR) are a special case of LR; they require a few restrictions to the LR case • Reference: Sebesta, section 4.3 - 4.5.

  4. Parsing Algorithms (2) • Look ahead: • algorithms must look at next token(s) to decide between alternate productions for current tokens • LALR(1) means LALR with 1 token look-ahead • LL(1) means LL with 1 token look-ahead • LL algorithms are simpler and easier to visualize. • LR algorithms are more powerful: can parse some grammars that LL cannot, such as left recursion. • yacc, bison, and CUP generate LALR(1) parsers • Recursive-descent is a useful LL algorithm that "every computer professional should know" [Louden].

  5. Top-down (LL) Parsing Example For the input: z = (2*x + 5)*y - 7; tokens:ID = ( NUMBER * ID + NUMBER ) * ID - NUMBER ; Grammar rules (as before): assignment => ID = expression ; expression => expression + term | expression - term | term term => term * factor | term / factor | factor factor => ( expression ) | ID | NUMBER

  6. Top-down Parsing Example (2) The top-down parser tries to match input to left sides. ID = ( NUMBER * ID + NUMBER )* ID - NUMBER ; assignment ID = expression ID =expression-term; ID =term -term; ID =term * factor -term; ID =factor * factor -term; ID = ( expression * factor -term; ID = ( expression +term ) * factor -term; ID = ( term +term ) * factor -term; ID = ( term * factor +term )* factor -term; ID = ( factor * ID+factor )* factor-term; ID = (NUMBER * ID + NUMBER)*factor-term; ID = (NUMBER * ID + NUMBER)* ID-factor; ID = (NUMBER * ID + NUMBER)* ID - ID ;

  7. Top-down Parsing Example (3) • Problem in example:we had to look ahead many tokens in order to know which production to use. • This isn't necessary provided that we know the grammar is parsable using LL (top-down) methods. • There are conditions on the grammar that we can test to verify this. (see: The Parsing Problem) • Later we will study the recursive-descent algorithm which does top-down parsing with minimal look-ahead.

  8. Bottoms-up (LR) Parsing Example (1) tokens: ID = ( NUMBER * ID + NUMBER ) * ID - NUMBER ; parser: ID ... read (shift) first token factor ... reduce factor = ... shift FAIL: Can't match any rules (reduce) Backtrack and try again ID = ( NUMBER ... shift ID = ( factor ... reduce ID = ( term* ... sh/reduce ID = ( term * ID ... shift ID = ( term * factor ... reduce ID = ( term ... reduce ID = ( term+ ... shift ID = ( expression + NUMBER ... reduce/sh ID = ( expression + factor ... reduce ID = ( expression + term ... reduce Action

  9. Bottoms-up Parsing Example (2) tokens: ID = ( NUMBER * ID + NUMBER ) * ID -NUMBER; input: ID = ( expression ... reduce ID = ( expression) ... shift ID = factor ... reduce ID = factor* ... shift ID = term * ID ... reduce/sh ID = term * factor ... reduce ID = term ... reduce ID = term - ... shift ID = expression - ... reduce ID = expression - NUMBER ... shift ID = expression - factor ... reduce ID = expression -term ... reduce ID = expression ; shift assignment reduce SUCCESS!! Start Symbol

  10. Bottoms-up Parsing Example (3) • LR parsing processes the input stream from the Left and tries to match the input to the Right side of a production. • When something matches, it reduces the expression to a left side non-terminal symbol. • Repeat the process until the entire input stream is matched. • This could potentially be an O(n3) task, but Knuth and others devised a table-based algorithm that is O(n).

  11. The Parsing Problem

  12. The Parsing Problem • Top-down parsers must decide which production to use based on the current symbol, and perhaps "peeking" at the next symbol (or two...). • Predictive parser: a parser that bases its actions on the next available token (called single symbol look-ahead). • Two conditions are necessary: [see Louden, p. 108-110]

  13. The Parsing Problem (cont.) Condition 1: the ability to choose between multiple alternatives, such as:A1 | 2|... | n • define First() = set of all tokens that can be the first token for any production cascade that produces symbol  • then a predictive parser can be used for rule A if: First(1)  First(2)...  First(n) is empty. Condition 2: the ability of the parser to detect presence of an optional element, such as A[b]. • Can the parser detect for certain when bis present?

  14. The Parsing Problem (cont.) Example: listexpr[list]. How do we know that list isn't part of expr? • define Follow(  ) = set of all tokens that can follow the non-terminal  some production. Use a special symbol ($) to represent the end of input if  can be the end of input. • Example: Follow( factor ) = { +, -, *, /, ), $ } while Follow( term ) = { *, /, ), $ } • then a predictive parser can detect the presence of optional symbol b if First(b)  Follow(b) is empty.

  15. Review and Thought Questions

  16. Lexics vs. Syntax vs. Semantics • Division between lexical and syntactic structure is not fixed: • number can be a token or defined by a grammar rule. • Implementation can often decide: • scanners are faster • parsers are more flexible • error checking of number format as regex is simpler • Division between syntax and semantics is not fixed: • we could define separate rules for IntegerNumber and FloatingPtNumber , IntegerTerm, FloatingPtTerm, ... in order to specify which mixed-mode operations are allowed. • or specify as part of semantics

  17. Numbers: Scan or Parse? We can construct numbers from digits using the scanner or parser. Which is easier / better ? • Scanner: Define numbers as tokens: number: [-]\d+ • Parser: grammar rules define numbers (digits are tokens): number=>'-' unsignednumber | unsignednumber unsignednumber => unsignednumber digit|digit digit=>0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |9

  18. Is Java 'Class' grammar context-free? • A class may have static and instance attributes. • An inner class or local class have same syntax as top-level class, but: • may not contain static members (except static constants) • inner class may access outer class using OuterClass.this • local class cannot be "public" • Does this means the syntax for a class depends on context?

  19. Alternative operator notation • Some languages use prefix notation: operator comes first expr=>+exprexpr|*exprexpr | NUMBER • Examples: *+ 2 3 4 means (2 + 3) * 4 + 2 * 3 4 means 2 + (3 * 4) • Using prefix notation, we don't have to worry about precedence of different operators in BNF rules !

More Related