Chap. 5, Top-Down Parsing. J. H. Wang, Mar. 29, 2011
Outline • Overview • LL(k) Grammars • Recursive-Descent LL(1) Parsers • Table-Driven LL(1) Parsers • Obtaining LL(1) Grammars • A Non-LL(1) Language • Properties of LL(1) Parsers • Parse Table Representation • Syntactic Error Recovery and Repair
Overview • Two forms of top-down parsers • Recursive-descent parsers • Table-driven LL parsers: LL(k), explained later • Compiler compilers (or parser generators) • With a CFG as the language's definition, parsers can be constructed automatically • A language revision, update, or extension can be handled by regenerating the parser from the revised grammar • The grammar is proved unambiguous if parser construction succeeds
Top-Down Parsing • Top-down • Grows the parse tree from the root to the leaves • Predictive • Must predict which production rule to apply • LL(k) • Scans the input left to right, produces a leftmost derivation, uses k symbols of lookahead • Recursive descent • Can be implemented by a set of mutually recursive procedures
LL(k) Grammars • Recall from Chap. 2 • A parsing procedure for each nonterminal A • The procedure is responsible for accomplishing one step of derivation by applying one of A's productions • The production is chosen by inspecting the next k tokens • The Predict set of a production for A is the set of tokens that trigger that production • The Predict set is determined primarily by the production's right-hand side (RHS)
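We need a strategy for choosing productions • $\mathrm{Predict}_k(p)$: the set of length-$k$ token strings that predict the application of rule $p$ • Input string: $\alpha a \beta \in \Sigma^{*}$ • Derivation so far: $S \Rightarrow^{*}_{lm} \alpha A y_1 \ldots y_n$ • $P = \{\, p \in \mathrm{ProductionsFor}(A) \mid a \in \mathrm{Predict}(p) \,\}$ • $P$ is the empty set: syntax error • $P$ contains more than one production: nondeterminism • $P$ contains exactly one production: apply it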
How to Compute Predict(p) • To predict production $p$: $A \rightarrow X_1 \ldots X_m$, $m \ge 0$ • The set of terminal symbols that are first produced in some derivation from $X_1 \ldots X_m$ • If $X_1 \ldots X_m$ can derive $\lambda$, also those terminal symbols that can follow $A$ • (Fig. 5.1)
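The two ingredients above (the terminals that can begin the RHS, plus the terminals that can follow A when the RHS can vanish) can be sketched as a fixed-point computation. This is a minimal sketch and not the algorithm of Fig. 5.1: the grammar `GRAMMAR`, the nonterminal name `Etail`, and all function names are assumptions made for illustration.

```python
from collections import defaultdict

# Illustrative grammar (an assumption, not taken from the slides).
# Each production is (LHS, RHS); an empty RHS stands for lambda.
GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("T", "Etail")),
    ("Etail", ("+", "T", "Etail")),
    ("Etail", ()),
    ("T", ("id",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def compute_nullable(grammar):
    """Nonterminals that can derive lambda."""
    nullable, changed = set(), True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            if lhs not in nullable and all(x in nullable for x in rhs):
                nullable.add(lhs)
                changed = True
    return nullable


def first_of_string(symbols, first, nullable):
    """First set of a symbol string, and whether the string derives lambda."""
    result = set()
    for x in symbols:
        if x not in NONTERMINALS:      # a terminal ends the scan
            result.add(x)
            return result, False
        result |= first[x]
        if x not in nullable:
            return result, False
    return result, True


def compute_first(grammar, nullable):
    first, changed = defaultdict(set), True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            new, _ = first_of_string(rhs, first, nullable)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def compute_follow(grammar, first, nullable):
    follow, changed = defaultdict(set), True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            for i, x in enumerate(rhs):
                if x not in NONTERMINALS:
                    continue
                new, rest_vanishes = first_of_string(rhs[i + 1:], first, nullable)
                if rest_vanishes:      # whatever follows lhs can follow x
                    new = new | follow[lhs]
                if not new <= follow[x]:
                    follow[x] |= new
                    changed = True
    return follow


def predict_sets(grammar):
    """Predict set of each production, in grammar order."""
    nullable = compute_nullable(grammar)
    first = compute_first(grammar, nullable)
    follow = compute_follow(grammar, first, nullable)
    result = []
    for lhs, rhs in grammar:
        predicted, derives_lambda = first_of_string(rhs, first, nullable)
        if derives_lambda:
            predicted = predicted | follow[lhs]
        result.append(predicted)
    return result
```

For this grammar, `predict_sets(GRAMMAR)` gives `{"+"}` for `Etail → + T Etail` and `{")", "$"}` for `Etail → λ`: the second set comes entirely from the terminals that can follow `Etail`.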
For an LL(1) grammar, the productions for each nonterminal A must have disjoint predict sets • Not all CFGs are LL(1) • More lookahead may be needed: LL(k), k>1 • A more powerful parsing method may be required (Chap. 6) • The grammar may be ambiguous
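The disjointness requirement can be checked mechanically once the predict sets are known. The sketch below assumes the predict sets have already been computed and are supplied as a dictionary; the function name `ll1_conflicts` and the sample productions are hypothetical.

```python
from itertools import combinations


def ll1_conflicts(predict):
    """Return (production, production, shared tokens) for every violating pair.

    `predict` maps a production, written as (lhs, rhs_tuple), to its predict set.
    """
    conflicts = []
    for (p, set_p), (q, set_q) in combinations(predict.items(), 2):
        shared = set_p & set_q
        if p[0] == q[0] and shared:    # same nonterminal, overlapping sets
            conflicts.append((p, q, shared))
    return conflicts


# Hypothetical predict sets for a statement grammar with a common prefix.
sample = {
    ("Stmt", ("if", "Expr", "then", "StmtList", "endif")): {"if"},
    ("Stmt", ("if", "Expr", "then", "StmtList", "else", "StmtList", "endif")): {"if"},
    ("Stmt", ("other",)): {"other"},
}
conflicts = ll1_conflicts(sample)
```

An empty result means the grammar is LL(1) with respect to the given predict sets. Here the two `if` productions are reported as one conflicting pair.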
(Figure: pseudocode for procedures S and MATCH, which use PEEK, ADVANCE, and ERROR)
Recursive-Descent LL(1) Parsers • Input: token stream ts • PEEK(): examines the next input token without advancing the input • ADVANCE(): advances the input by one token • To construct a recursive-descent parser • We write a separate procedure for each nonterminal A • For each production pi, we process each symbol of the RHS X1…Xm in order • Terminal symbol: MATCH(ts, Xi) • Nonterminal symbol: call Xi(ts)
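The construction above can be sketched for a small expression grammar. Everything named here is an assumption for illustration: the grammar (S → E $, E → T Etail, Etail → + T Etail | λ, T → id | ( E )), the class `TokenStream`, and the `parse_*` procedures. The lookahead tests in each procedure are the predict sets of that grammar's productions, worked out by hand.

```python
class ParseError(Exception):
    """Raised where the slides' ERROR routine would be called."""


class TokenStream:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks end of input
        self.pos = 0

    def peek(self):
        """Examine the next token without consuming it."""
        return self.tokens[self.pos]

    def advance(self):
        """Consume one token."""
        self.pos += 1


def match(ts, expected):
    if ts.peek() != expected:
        raise ParseError(f"expected {expected!r}, saw {ts.peek()!r}")
    ts.advance()


def parse_S(ts):                      # S -> E $
    if ts.peek() in ("id", "("):
        parse_E(ts)
        match(ts, "$")
    else:
        raise ParseError(f"unexpected {ts.peek()!r} in S")


def parse_E(ts):                      # E -> T Etail
    if ts.peek() in ("id", "("):
        parse_T(ts)
        parse_Etail(ts)
    else:
        raise ParseError(f"unexpected {ts.peek()!r} in E")


def parse_Etail(ts):
    if ts.peek() == "+":              # Etail -> + T Etail
        match(ts, "+")
        parse_T(ts)
        parse_Etail(ts)
    elif ts.peek() in (")", "$"):     # Etail -> lambda
        pass
    else:
        raise ParseError(f"unexpected {ts.peek()!r} in Etail")


def parse_T(ts):
    if ts.peek() == "id":             # T -> id
        match(ts, "id")
    elif ts.peek() == "(":            # T -> ( E )
        match(ts, "(")
        parse_E(ts)
        match(ts, ")")
    else:
        raise ParseError(f"unexpected {ts.peek()!r} in T")


def accepts(tokens):
    try:
        parse_S(TokenStream(tokens))
        return True
    except ParseError:
        return False
```

With these definitions, `accepts(["id", "+", "(", "id", ")"])` is true, while `accepts(["id", "+"])` is false because `parse_T` sees the end marker where it needs `id` or `(`.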
Table-Driven LL(1) Parsers • Creating recursive-descent parsers can be automated, but the generated parsers have drawbacks • Size of the parser code • Inefficiency: overhead of method calls and returns • To create table-driven parsers, we use an explicit stack to simulate the actions performed by MATCH() and by calls to the nonterminals' procedures • Terminal symbol: MATCH • Nonterminal symbol: table lookup • (Fig. 5.8)
(Figure: pseudocode for the table-driven parser, with procedures PARSER and APPLY using PUSH, POP, PEEK, MATCH, and ERROR)
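The stack-based simulation can be sketched as a short driver loop. This is a minimal sketch under stated assumptions, not a transcription of Fig. 5.8: the grammar, the hand-written `TABLE`, and the function name `ll1_parse` are all illustrative.

```python
# Illustrative grammar; the index in this list is the production number.
PRODUCTIONS = [
    ("S", ("E", "$")),                # 0
    ("E", ("T", "Etail")),            # 1
    ("Etail", ("+", "T", "Etail")),   # 2
    ("Etail", ()),                    # 3  (lambda)
    ("T", ("id",)),                   # 4
    ("T", ("(", "E", ")")),           # 5
]
NONTERMINALS = {lhs for lhs, _ in PRODUCTIONS}

# Hand-written parse table: (nonterminal, lookahead) -> production number.
TABLE = {
    ("S", "id"): 0, ("S", "("): 0,
    ("E", "id"): 1, ("E", "("): 1,
    ("Etail", "+"): 2, ("Etail", ")"): 3, ("Etail", "$"): 3,
    ("T", "id"): 4, ("T", "("): 5,
}


def ll1_parse(tokens, start="S"):
    """Return the production numbers applied, in leftmost-derivation order."""
    ts = list(tokens) + ["$"]
    pos = 0
    stack = [start]
    applied = []
    while stack:
        top = stack.pop()
        lookahead = ts[pos] if pos < len(ts) else None
        if top in NONTERMINALS:
            rule = TABLE.get((top, lookahead))
            if rule is None:
                raise ValueError(f"syntax error: {lookahead!r} cannot start {top}")
            applied.append(rule)
            # Push the RHS reversed so its first symbol ends up on top.
            stack.extend(reversed(PRODUCTIONS[rule][1]))
        elif top == lookahead:
            pos += 1                  # terminal on top of stack: match it
        else:
            raise ValueError(f"syntax error: expected {top!r}, saw {lookahead!r}")
    return applied
```

Each token is matched once and each stack symbol is popped once, so the loop's work grows with the length of the derivation. For the input `id + id` the driver applies productions 0, 1, 4, 2, 4, 3 in that order.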
How to Build the LL(1) Parse Table • The table is indexed by the top-of-stack (TOS) symbol and the next input token • Row: nonterminal symbol • Column: next input token • Entry: the production to apply, or an error entry • (Fig. 5.9)
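Filling the table from predict sets can be sketched as follows: production p for nonterminal A is entered at row A, column a for every token a in Predict(p). The grammar, the hand-computed `PREDICT` list, and the function name `build_ll1_table` are assumptions for illustration, and this is not necessarily the procedure of Fig. 5.9.

```python
# Illustrative grammar and hand-computed predict sets (assumptions).
PRODUCTIONS = [
    ("S", ("E", "$")),
    ("E", ("T", "Etail")),
    ("Etail", ("+", "T", "Etail")),
    ("Etail", ()),
    ("T", ("id",)),
    ("T", ("(", "E", ")")),
]
PREDICT = [
    {"id", "("},
    {"id", "("},
    {"+"},
    {")", "$"},
    {"id"},
    {"("},
]


def build_ll1_table(productions, predict):
    """Map (nonterminal, token) to a production number; absent keys mean error."""
    table = {}
    for number, (lhs, _rhs) in enumerate(productions):
        for token in predict[number]:
            key = (lhs, token)
            if key in table:
                # Two productions of one nonterminal share a predicting token.
                raise ValueError(
                    f"not LL(1): productions {table[key]} and {number} "
                    f"both predicted by {token!r} for {lhs}"
                )
            table[key] = number
    return table


table = build_ll1_table(PRODUCTIONS, PREDICT)
```

A dictionary keeps only the non-error entries, so a missing key plays the role of a blank (error) cell in the two-dimensional table.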
Obtaining LL(1) Grammars • It’s easy to violate the requirement of a unique prediction for each combination of nonterminal and lookahead symbols • Common prefixes • Left recursion
Common Prefixes • Two productions for the same nonterminal begin with the same string of grammar symbols • Ex. (Fig. 5.12): not LL(k) • Factoring transformation (Fig. 5.13) • Ex. (Fig. 5.14)
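One round of the factoring transformation can be sketched as below: alternatives that start with the same symbol are replaced by their longest common prefix followed by a fresh nonterminal that derives the differing tails. The function names, the fresh-name scheme, and the sample `Stmt` productions are assumptions; this is not claimed to be the algorithm of Fig. 5.13.

```python
from collections import defaultdict


def common_prefix(rhss):
    """Longest prefix shared by all the given right-hand sides."""
    prefix = []
    for symbols in zip(*rhss):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return tuple(prefix)


def left_factor(lhs, alternatives, fresh_name):
    """Factor one nonterminal's alternatives once.

    Returns a dict mapping each nonterminal to its list of RHS tuples.
    """
    groups = defaultdict(list)
    for rhs in alternatives:
        groups[rhs[:1]].append(rhs)          # group by first symbol
    result = {lhs: []}
    counter = 0
    for first, group in groups.items():
        if len(group) == 1 or first == ():   # nothing to factor
            result[lhs].extend(group)
            continue
        prefix = common_prefix(group)
        counter += 1
        new_nt = f"{fresh_name}{counter}"    # hypothetical fresh nonterminal
        result[lhs].append(prefix + (new_nt,))
        result[new_nt] = [rhs[len(prefix):] for rhs in group]
    return result


factored = left_factor(
    "Stmt",
    [
        ("if", "Expr", "then", "StmtList", "endif"),
        ("if", "Expr", "then", "StmtList", "else", "StmtList", "endif"),
        ("other",),
    ],
    "StmtTail",
)
```

After factoring, `Stmt` has the alternatives `if Expr then StmtList StmtTail1` and `other`, and `StmtTail1` chooses between `endif` and `else StmtList endif` with one token of lookahead. A new nonterminal's alternatives may still share prefixes among themselves, so the round may need to be repeated.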
Left Recursion • A production is left recursive if its LHS symbol is also the first symbol of its RHS • E.g. $StmtList \rightarrow StmtList\ ;\ Stmt$ • $A \rightarrow A\alpha \mid \beta$ • (Fig. 5.15 & Fig. 5.16)
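A common way to remove immediate left recursion rewrites A → Aα | β as A → β A', A' → α A' | λ. The sketch below implements that standard rewrite, which is not necessarily the exact transformation shown in Fig. 5.16; the function name and the tail nonterminal `StmtListTail` are hypothetical.

```python
def remove_immediate_left_recursion(lhs, alternatives, tail_name):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | lambda.

    `alternatives` is a list of RHS tuples; an empty tuple stands for lambda.
    Returns a dict mapping each nonterminal to its list of RHS tuples.
    """
    alphas = [rhs[1:] for rhs in alternatives if rhs[:1] == (lhs,)]
    betas = [rhs for rhs in alternatives if rhs[:1] != (lhs,)]
    if not alphas:                      # no left recursion: nothing to do
        return {lhs: list(alternatives)}
    if not betas:
        raise ValueError(f"{lhs} has only left-recursive alternatives")
    return {
        lhs: [beta + (tail_name,) for beta in betas],
        tail_name: [alpha + (tail_name,) for alpha in alphas] + [()],
    }


rewritten = remove_immediate_left_recursion(
    "StmtList",
    [("StmtList", ";", "Stmt"), ("Stmt",)],
    "StmtListTail",
)
```

For the `StmtList` example this yields `StmtList → Stmt StmtListTail` and `StmtListTail → ; Stmt StmtListTail | λ`, which derive the same statement lists without left recursion.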
A Non-LL(1) Language • Almost all common programming language constructs can be specified by LL(1) grammars • One exception: if-then-else (the dangling else problem) • Can be resolved by mandating that each else is matched to its closest unmatched then • (Fig. 5.17)
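Ambiguous (Chap. 6) • E.g. if expr then if expr then other else other • if expr then { if expr then other else other } • if expr then { if expr then other } else other • -> at least two distinct parses • Dangling bracket language (DBL) • $\mathrm{DBL} = \{\, [^{i}\, ]^{j} \mid i \ge j \ge 0 \,\}$ • if expr then Stmt corresponds to [ (opening bracket) • else Stmt corresponds to ] (optional closing bracket)
Fig. 5.18(a) • $S \rightarrow [\ S\ CL \mid \lambda$ • $CL \rightarrow\ ] \mid \lambda$ • E.g. [[] has two distinct parses • Fig. 5.18(b) • $S \rightarrow [\ S \mid T$ • $T \rightarrow [\ T\ ] \mid \lambda$
It's not LL(k) • $[ \in \mathrm{Predict}(S \rightarrow [\ S)$ and $[ \in \mathrm{Predict}(S \rightarrow T)$ • $[[ \in \mathrm{Predict}_2(S \rightarrow [\ S)$ and $[[ \in \mathrm{Predict}_2(S \rightarrow T)$ • … • $[^{k} \in \mathrm{Predict}_k(S \rightarrow [\ S)$ and $[^{k} \in \mathrm{Predict}_k(S \rightarrow T)$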
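The difference between the two grammars can be made concrete by counting parse trees. The sketch below hard-codes the productions S → [ S CL | λ, CL → ] | λ for (a) and S → [ S | T, T → [ T ] | λ for (b); the function names are assumptions, and the counting recursion is written by hand for these two grammars rather than being a general parser.

```python
from functools import lru_cache


def count_parses_a(w):
    """Number of parse trees for w under S -> [ S CL | lambda, CL -> ] | lambda."""
    @lru_cache(maxsize=None)
    def count_S(i, j):                      # parses of w[i:j] from S
        total = 1 if i == j else 0          # S -> lambda
        if i < j and w[i] == "[":
            total += count_S(i + 1, j)      # S -> [ S CL with CL -> lambda
            if j - 1 > i and w[j - 1] == "]":
                total += count_S(i + 1, j - 1)   # S -> [ S CL with CL -> ]
        return total
    return count_S(0, len(w))


def count_parses_b(w):
    """Number of parse trees for w under S -> [ S | T, T -> [ T ] | lambda."""
    @lru_cache(maxsize=None)
    def count_T(i, j):                      # parses of w[i:j] from T
        if i == j:
            return 1                        # T -> lambda
        if j - i >= 2 and w[i] == "[" and w[j - 1] == "]":
            return count_T(i + 1, j - 1)    # T -> [ T ]
        return 0

    @lru_cache(maxsize=None)
    def count_S(i, j):                      # parses of w[i:j] from S
        total = count_T(i, j)               # S -> T
        if i < j and w[i] == "[":
            total += count_S(i + 1, j)      # S -> [ S
        return total
    return count_S(0, len(w))
```

For the string `[[]`, `count_parses_a` returns 2, because the single `]` can close either bracket, while `count_parses_b` returns 1. Both functions return 0 for `]`, which is not in the language.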
Properties of LL(1) Parsers • A correct, leftmost parse is constructed • All LL(1) grammars are unambiguous • All table-driven LL(1) parsers operate in linear time and space with respect to the length of the parsed input