Parsing
Goals of Parsing • Check the input for syntactic accuracy • Return appropriate error messages • Recover if possible • Produce, or at least traverse, a complete parse tree • Parse tree (or trace) is basis for translation
Top-down Parsers • Parse tree is built from the root down to the leaves • Builds parse tree in preorder • Corresponds to a leftmost derivation • Parsing decision problem: choosing correct rule • Two most common algorithms: • Recursive Descent – implemented in code • Table driven implementation • Both are LL algorithms (left-to-right scan, left-most derivation)
Bottom-up • Parse tree is built from the leaves up to the root • Builds the parse in the reverse order of a rightmost derivation • Requires finding a handle, that is, the RHS that is correct to reduce at this point • Most common algorithms are LR (left-to-right scan, rightmost derivation in reverse)
Complexity • The most general parsing algorithms work for any context-free grammar • They are complicated and inefficient: O(n^3) in the length of the input • Practical parsers trade generality for efficiency • Parsers in commercial compilers accept only a subset of grammars, but run in O(n) time
Recursive Descent • Parser is made up of a collection of subprograms • One for each non-terminal • Subprogram responsible for generating the parse tree rooted at the given non-terminal • Pulls tokens from the tokenizer, and leaves the first token not a part of its rule in nextToken • If multiple rules associated with the current non-terminal, first a determination of the correct rule must be made
Function Factor
// <factor> -> id | ( <expr> )
void factor() {
    if (nextToken == ID_CODE)
        lex();
    else if (nextToken == LEFT_PAREN_CODE) {
        lex();
        expr();
        if (nextToken == RIGHT_PAREN_CODE)
            lex();
        else
            error();
    }
    else
        error(); /* Neither RHS matches */
}
<ifstmt> ::= if ( <boolexpr> ) <stmt> [else <stmt>]
void ifstmt() {
    if (nextToken != IF_CODE)
        error();
    else {
        lex();
        if (nextToken != LEFT_PAREN)
            error();
        else {
            lex();
            boolexpr();
            if (nextToken != RIGHT_PAREN)
                error();
            else {
                lex();
                statement();
                if (nextToken == ELSE_CODE) {
                    lex();
                    statement();
                }
            }
        }
    }
}
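The routines above depend on lex(), nextToken, error() and expr(), which the slides do not show. The following is a minimal sketch of how they might fit together, not the original implementation. It assumes a hypothetical one-character tokenizer (any single letter is an id), an error() that only counts errors, and an assumed rule <expr> -> <factor> { + <factor> }. The parse() entry point is also an illustrative name.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Token codes: names follow the slide, values are arbitrary. */
enum { ID_CODE, PLUS_CODE, LEFT_PAREN_CODE, RIGHT_PAREN_CODE, END_CODE, BAD_CODE };

static const char *input;  /* remaining input (hypothetical tokenizer state) */
static int nextToken;      /* current lookahead token */
static int errorCount;     /* number of syntax errors seen so far */

/* lex: skip blanks and load the next token into nextToken. */
static void lex(void) {
    while (*input == ' ') input++;
    char c = *input;
    if (c == '\0') { nextToken = END_CODE; return; }  /* stay on the terminator */
    input++;
    if (isalpha((unsigned char)c)) nextToken = ID_CODE;
    else if (c == '+') nextToken = PLUS_CODE;
    else if (c == '(') nextToken = LEFT_PAREN_CODE;
    else if (c == ')') nextToken = RIGHT_PAREN_CODE;
    else nextToken = BAD_CODE;
}

/* error: this sketch only records that an error happened. */
static void error(void) { errorCount++; }

static void expr(void);

/* <factor> -> id | ( <expr> ) */
static void factor(void) {
    if (nextToken == ID_CODE)
        lex();
    else if (nextToken == LEFT_PAREN_CODE) {
        lex();
        expr();
        if (nextToken == RIGHT_PAREN_CODE) lex();
        else error();
    } else
        error();  /* neither RHS matches */
}

/* <expr> -> <factor> { + <factor> }   (assumed rule, not from the slides) */
static void expr(void) {
    factor();
    while (nextToken == PLUS_CODE) {
        lex();
        factor();
    }
}

/* parse: returns 1 if text is exactly one error-free <expr>, else 0. */
int parse(const char *text) {
    assert(text != NULL);
    input = text;
    errorCount = 0;
    lex();   /* prime the lookahead, as each subprogram expects */
    expr();
    return errorCount == 0 && nextToken == END_CODE;
}
```

Calling parse("(a+b)+c") accepts the input, while parse("(a") is rejected because factor() finds no closing parenthesis. Each subprogram leaves the first token that is not part of its rule in nextToken, which is why parse() checks for END_CODE at the end.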
Grammar Restrictions • Left-recursion is a problem • A ::= A + B • Parsing would never terminate! • In some cases, left-recursion can be eliminated by refactoring the grammar • E ::= E + T | T • E ::= T E’ • E’ ::= + T E’ | ε
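A subprogram written directly from E ::= E + T would call itself before consuming any input, so it would recurse forever. The sketch below follows the refactored rules E ::= T E’ and E’ ::= + T E’ | ε instead. It assumes, for illustration only, that T is a single lowercase letter; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

static const char *cursor;  /* next unread input character */
static int ok;              /* cleared when a syntax error is found */

/* T ::= a single lowercase letter (assumption made for this sketch) */
static void T(void) {
    if (*cursor >= 'a' && *cursor <= 'z') cursor++;
    else ok = 0;
}

/* E' ::= + T E' | ε */
static void Eprime(void) {
    if (*cursor == '+') {  /* a '+' is consumed before recursing, so this terminates */
        cursor++;
        T();
        Eprime();
    }
    /* otherwise take the ε alternative and consume nothing */
}

/* E ::= T E' */
static void E(void) {
    T();
    Eprime();
}

/* accepts: returns 1 if s is derivable from E, else 0. */
int accepts(const char *s) {
    assert(s != NULL);
    cursor = s;
    ok = 1;
    E();
    return ok && *cursor == '\0';
}
```

Every recursive call to Eprime() happens only after a '+' has been consumed, which is what the left-recursive form could not guarantee.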
Grammar restrictions continued • Ability to choose the correct production based on a single next token • The pairwise disjointness test indicates whether or not this choice can be made • The test passes if, for each non-terminal, the FIRST sets of its alternative right-hand sides are pairwise disjoint
Grammar 1: A ::= aB | bAb | Bb    B ::= cB | d
FIRST sets of the A alternatives: {a} {b} {c, d} • Disjoint, so recursive descent parsable
Grammar 2: A ::= aB | Bab    B ::= aB | b
FIRST sets of the A alternatives: {a} {a, b} • Not disjoint, so not recursive descent parsable
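The pairwise disjointness check itself is small once the FIRST sets are known. The sketch below does not compute FIRST sets; it takes them as given (here, the hand-computed sets from the two grammars above) and assumes terminals are lowercase letters so that each set fits in a bitmask. The function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* first_mask: encode a FIRST set such as "cd" as one bit per lowercase terminal. */
unsigned long first_mask(const char *terminals) {
    assert(terminals != NULL);
    unsigned long mask = 0;
    for (; *terminals; terminals++)
        if (*terminals >= 'a' && *terminals <= 'z')
            mask |= 1UL << (*terminals - 'a');
    return mask;
}

/* pairwise_disjoint: returns 1 if no terminal appears in two of the n
 * FIRST sets, 0 otherwise. Checking each set against the union of the
 * earlier ones is equivalent to checking every pair. */
int pairwise_disjoint(const unsigned long *first, size_t n) {
    assert(first != NULL || n == 0);
    unsigned long seen = 0;
    for (size_t i = 0; i < n; i++) {
        if (seen & first[i]) return 0;  /* overlaps an earlier alternative */
        seen |= first[i];
    }
    return 1;
}
```

For the first grammar, the masks for {a}, {b} and {c, d} pass the test. For the second, {a} and {a, b} share the terminal a, so the test fails and a single token of lookahead cannot select the production.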
Table driven parsers • Encode production choice in a table • Rows indicate current top of the stack • Columns for each input token • Entry in matrix gives production number • Preferred for large grammars • Algorithm is fixed • Only table size grows
Expression Grammar Example
S ::= A $
A ::= i = E ;
E ::= T E’
E’ ::= ε | AO T E’
AO ::= + | -
T ::= F T’
T’ ::= ε | MO F T’
MO ::= * | /
F ::= F’ P
F’ ::= ε | UO
UO ::= - | !
P ::= i | l | ( E )
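To illustrate how a table-driven top-down parser uses its table, here is a minimal sketch for a cut-down version of the expression grammar: only E, E’ and AO are kept, and T is reduced to the single token i (an assumption made to keep the table small). The production numbering, the table layout and the name ll1_parse are all illustrative.

```c
#include <assert.h>
#include <stddef.h>

enum { T_I, T_PLUS, T_MINUS, T_END, NUM_T,        /* terminals: i + - $ */
       N_E = NUM_T, N_EP, N_AO, N_T };            /* non-terminals: E E' AO T */

typedef struct { int len; int sym[3]; } Production;

static const Production PROD[] = {
    {0, {0}},                  /* 0: unused; table entry 0 means "error" */
    {2, {N_T, N_EP}},          /* 1: E  ::= T E'     */
    {3, {N_AO, N_T, N_EP}},    /* 2: E' ::= AO T E'  */
    {0, {0}},                  /* 3: E' ::= ε        */
    {1, {T_PLUS}},             /* 4: AO ::= +        */
    {1, {T_MINUS}},            /* 5: AO ::= -        */
    {1, {T_I}},                /* 6: T  ::= i  (simplifying assumption) */
};

/* Rows: non-terminal on top of the stack. Columns: next input token. */
static const int TABLE[4][NUM_T] = {
    /*          i  +  -  $ */
    /* E  */ {  1, 0, 0, 0 },
    /* E' */ {  0, 2, 2, 3 },
    /* AO */ {  0, 4, 5, 0 },
    /* T  */ {  6, 0, 0, 0 },
};

static int token_of(char c) {
    switch (c) {
    case 'i':  return T_I;
    case '+':  return T_PLUS;
    case '-':  return T_MINUS;
    case '\0': return T_END;   /* end of string plays the role of $ */
    default:   return -1;
    }
}

/* ll1_parse: returns 1 if s is accepted, 0 on a syntax error. */
int ll1_parse(const char *s) {
    assert(s != NULL);
    int stack[64];
    int top = 0;
    size_t pos = 0;
    stack[top++] = T_END;
    stack[top++] = N_E;
    while (top > 0) {
        int x = stack[--top];
        int a = token_of(s[pos]);
        if (a < 0) return 0;
        if (x < NUM_T) {                   /* terminal: must match the input */
            if (x != a) return 0;
            if (a == T_END) return 1;      /* matched $: accept */
            pos++;
        } else {                           /* non-terminal: consult the table */
            int p = TABLE[x - NUM_T][a];
            if (p == 0) return 0;
            for (int k = PROD[p].len - 1; k >= 0; k--) {
                assert(top < 64);
                stack[top++] = PROD[p].sym[k];  /* push RHS in reverse */
            }
        }
    }
    return 0;
}
```

The loop in ll1_parse() never changes when the grammar does; only PROD and TABLE grow, which is the reason the slides give for preferring this approach on large grammars.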
Bottom-up Parsing • Often called shift-reduce algorithms • Integral piece of every bottom-up parser is a stack • Shift moves the next input token onto the stack • Reduce replaces a RHS on the top of the stack with the corresponding LHS • Most bottom-up parsing algorithms are variations of the LR process • Originally designed by Donald Knuth • Relatively small program and a parsing table
Advantages of LR Parsers • Work for nearly all grammars that describe programming languages. • Work on a larger class of grammars than other bottom-up algorithms, but are as efficient as any other bottom-up parser. • Can detect syntax errors as soon as it is possible to do so in a left-to-right scan. • The LR class of grammars is a superset of the class parsable by LL parsers
Disadvantage • For anything but very small grammars, it is difficult to produce the parsing table by hand • But this is exactly what tools like yacc and bison can do for us automatically! • Original version was computationally intensive (both in terms of time and memory) • Variations developed: • Fewer computer resources required • Not as general
Key Insight • A bottom-up parser can use the entire history of the parse, up to the current point, to make parsing decisions • There are only a finite and relatively small number of different parse situations that could have occurred, so the history can be stored in a parser state, on the parse stack
Parser Configuration • Made up of both the stack and the remaining input • For each state on the stack, there is an associated grammar symbol • E.g. $(S_0 X_1 S_1 X_2 S_2 \dots X_m S_m,\ a_i a_{i+1} \dots a_n \$)$ where $S_i$ indicates a state, and $X_i$ indicates a grammar symbol • Initial configuration: $(S_0,\ a_1 \dots a_n \$)$
Table driven bottom up parsing • Table has two components: • ACTION table • Specifies the action of the parser, given the parser state and the next token • Rows are state names • Columns are terminals • GOTO table • Specifies state to put in the stack after a reduce operation • Rows are state names • Columns are non-terminals
Parser actions • If ACTION[$S_m$, $a_i$] = Shift S, the next configuration is: $(S_0 X_1 S_1 X_2 S_2 \dots X_m S_m a_i S,\ a_{i+1} \dots a_n \$)$ • If ACTION[$S_m$, $a_i$] = Reduce $A \to \beta$ and S = GOTO[$S_{m-r}$, A], where r = the length of $\beta$, the next configuration is $(S_0 X_1 S_1 X_2 S_2 \dots X_{m-r} S_{m-r} A S,\ a_i a_{i+1} \dots a_n \$)$ • If ACTION[$S_m$, $a_i$] = Accept, the parse is complete and no errors were found. • If ACTION[$S_m$, $a_i$] = Error, the parser calls an error-handling routine.
Example LR Parsing Table • 1. E ::= E + T • 2. E ::= T • 3. T ::= T * F • 4. T ::= F • 5. F ::= ( E ) • 6. F ::= id
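The four parser actions can be sketched as a short driver loop over an ACTION and a GOTO table. The entries below are the usual textbook SLR(1) table for these six rules, supplied here as an assumption rather than taken from the slide. The encoding (positive = shift to that state, negative = reduce by that rule, 0 = error), the use of the letter i for the id token, and the name lr_parse are illustrative choices. Only states are kept on the stack, since each state determines its grammar symbol.

```c
#include <assert.h>
#include <stddef.h>

enum { ID, PLUS, STAR, LP, RP, END, NUM_TOK };  /* terminals: id + * ( ) $ */
enum { NT_E, NT_T, NT_F, NUM_NT };              /* non-terminals */
#define ACC 99                                  /* accept marker */

static const int ACTION[12][NUM_TOK] = {
    /*          id   +   *   (   )    $ */
    /*  0 */ {   5,  0,  0,  4,  0,   0 },
    /*  1 */ {   0,  6,  0,  0,  0, ACC },
    /*  2 */ {   0, -2,  7,  0, -2,  -2 },
    /*  3 */ {   0, -4, -4,  0, -4,  -4 },
    /*  4 */ {   5,  0,  0,  4,  0,   0 },
    /*  5 */ {   0, -6, -6,  0, -6,  -6 },
    /*  6 */ {   5,  0,  0,  4,  0,   0 },
    /*  7 */ {   5,  0,  0,  4,  0,   0 },
    /*  8 */ {   0,  6,  0,  0, 11,   0 },
    /*  9 */ {   0, -1,  7,  0, -1,  -1 },
    /* 10 */ {   0, -3, -3,  0, -3,  -3 },
    /* 11 */ {   0, -5, -5,  0, -5,  -5 },
};

static const int GOTO[12][NUM_NT] = {
    /*        E  T  F */
    /*  0 */ { 1, 2, 3 },  /*  1 */ { 0, 0, 0 },  /*  2 */ { 0, 0, 0 },
    /*  3 */ { 0, 0, 0 },  /*  4 */ { 8, 2, 3 },  /*  5 */ { 0, 0, 0 },
    /*  6 */ { 0, 9, 3 },  /*  7 */ { 0, 0, 10 }, /*  8 */ { 0, 0, 0 },
    /*  9 */ { 0, 0, 0 },  /* 10 */ { 0, 0, 0 },  /* 11 */ { 0, 0, 0 },
};

/* Per rule number 1..6: left-hand side and length of the right-hand side. */
static const int LHS[7]  = { 0, NT_E, NT_E, NT_T, NT_T, NT_F, NT_F };
static const int RLEN[7] = { 0, 3, 1, 3, 1, 3, 1 };

static int token_of(char c) {
    switch (c) {
    case 'i':  return ID;    /* 'i' stands for an id token (assumption) */
    case '+':  return PLUS;
    case '*':  return STAR;
    case '(':  return LP;
    case ')':  return RP;
    case '\0': return END;   /* end of string plays the role of $ */
    default:   return -1;
    }
}

/* lr_parse: returns 1 on Accept, 0 on Error. */
int lr_parse(const char *s) {
    assert(s != NULL);
    int stack[128];
    int top = 0;
    stack[0] = 0;                              /* initial state S0 */
    for (;;) {
        int a = token_of(*s);
        if (a < 0) return 0;
        int act = ACTION[stack[top]][a];
        if (act == ACC) return 1;
        if (act > 0) {                         /* Shift: push state, consume token */
            if (top + 1 >= 128) return 0;      /* nesting too deep for this sketch */
            stack[++top] = act;
            s++;
        } else if (act < 0) {                  /* Reduce A -> beta */
            int rule = -act;
            top -= RLEN[rule];                 /* pop r = |beta| states */
            int next = GOTO[stack[top]][LHS[rule]];
            stack[++top] = next;               /* push GOTO[S(m-r), A] */
        } else {
            return 0;                          /* Error */
        }
    }
}
```

On the input i+i*i the driver shifts i, reduces it through F and T to E, shifts +, and defers reducing E + T until the * and the final i have been handled, so multiplication binds tighter than addition.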