Chapter 4. Syntax Analysis (1)
Figure: application of a production A → ω in a derivation step, taking sentential form i to sentential form i+1.
Formal grammars (1/3) • Example: Let G1 have N = {A, B, C}, T = {a, b, c} and the set of productions
Σ → A        CB → BC
A → aABC     bB → bb
A → abC      bC → bc     cC → cc
The reader should convince himself that the word aᵏbᵏcᵏ is in L(G1) for all k ≥ 1 and that only these words are in L(G1). That is, L(G1) = { aᵏbᵏcᵏ | k ≥ 1 }.
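As a worked illustration (added here, using the productions as reconstructed above), the word a²b²c² is derived in G1 by
Σ ⇒ A ⇒ aABC ⇒ aabCBC ⇒ aabBCC ⇒ aabbCC ⇒ aabbcC ⇒ aabbcc.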
Formal grammars (2/3) • Example: Grammar G2 is a modification of G1:
G2: Σ → A        CB → BC
    A → aABC     bB → bb
    A → abC      bC → b
The reader may verify that L(G2) = { aᵏbᵏ | k ≥ 1 }. Note that the last rule, bC → b, erases all the C's from the derivation, and that only this production removes the nonterminal C from sentential forms.
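For instance (an added illustration), a²b² is derived in G2 by Σ ⇒ A ⇒ aABC ⇒ aabCBC ⇒ aabBCC ⇒ aabbCC ⇒ aabbC ⇒ aabb, where the final two steps use bC → b to erase the C's.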
Formal grammars (3/3) • Example: A simpler grammar that generates { aᵏbᵏ | k ≥ 1 } is the grammar G3:
G3: Σ → S     S → aSb     S → ab
A derivation of a³b³ is S ⇒ aSb ⇒ aaSbb ⇒ aaabbb. The reader may verify that L(G3) = { aᵏbᵏ | k ≥ 1 }.
The four types of formal grammars (figure): contracting, noncontracting, context-free, and regular.
Context-Sensitive Grammars (Type1), Unrestricted Grammars (Type0) • Definition: A context-sensitive grammar G = (N, T, P, Σ) is a formal grammar in which all productions are of the form φAψ → φωψ, with ω ≠ λ. The grammar may also contain the production Σ → λ. If G is a context-sensitive (type 1) grammar, then L(G) is a context-sensitive (type 1) language.
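For instance (an added note), the production bB → bb of G1 above has this form, with φ = b, A = B, ψ = λ, and ω = b.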
Context-Free Grammars (Type2) • Definition: A context-free grammar G = (N, T, P, Σ) is a formal grammar in which all productions are of the form A → ω, where A ∈ N ∪ {Σ} and ω ∈ (N ∪ T)* − {λ}. The grammar may also contain the production Σ → λ. If G is a context-free (type 2) grammar, then L(G) is a context-free (type 2) language.
Regular Grammars (Type3) (1/2) • Definition: A production of the form A → aB or A → a, where A ∈ N ∪ {Σ}, B ∈ N, and a ∈ T, is called a right linear production. A production of the form A → Ba or A → a (with the same conditions on A, B, and a) is a left linear production. A formal grammar is right linear if it contains only right linear productions, and is left linear if it contains only left linear productions; in either case the grammar may also contain the production Σ → λ. Left and right linear grammars are also known as regular grammars. If G is a regular (type 3) grammar, then L(G) is a regular (type 3) language.
Regular Grammars (Type3) (2/2) • Example: A left linear grammar G1 and a right linear grammar G2 have productions as follows:
G1: Σ → B1   Σ → 1   A → B1   B → A0   A → 1
G2: Σ → 1B   Σ → 1   A → 1B   B → 0A   A → 1
The reader may verify that L(G1) = (10)*1 = 1(01)* = L(G2).
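As an informal check (an added sketch, not part of the slides), the following Python snippet enumerates the short sentences derivable from the right linear productions Σ → 1B | 1, B → 0A, A → 1B | 1 and confirms that they are exactly the strings matched by (10)*1, each of which also matches 1(01)*:

```python
import itertools
import re

# Right linear productions from the example above (S stands in for Σ):
#   S -> 1B | 1,   B -> 0A,   A -> 1B | 1
PRODS = {'S': ['1B', '1'], 'B': ['0A'], 'A': ['1B', '1']}
MAX_LEN = 7

def sentences(max_len):
    """Expand sentential forms until only terminals remain (length-bounded)."""
    done, todo = set(), ['S']
    while todo:
        form = todo.pop()
        nt = next((c for c in form if c in PRODS), None)
        if nt is None:                      # no nonterminal left: a sentence
            done.add(form)
        elif len(form) <= max_len + 1:      # prune forms that grew too long
            todo.extend(form.replace(nt, rhs, 1) for rhs in PRODS[nt])
    return {w for w in done if len(w) <= max_len}

derived = sentences(MAX_LEN)
by_regex = {w for n in range(1, MAX_LEN + 1)
              for w in (''.join(bits) for bits in itertools.product('01', repeat=n))
              if re.fullmatch(r'(10)*1', w)}

print(derived == by_regex)                               # True
print(all(re.fullmatch(r'1(01)*', w) for w in derived))  # True
```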
Ambiguity (1/2) • Example: Consider the context-free grammar G with productions Σ → S, S → SS, S → ab. The sentence ababab has derivations that correspond to different tree diagrams (two leftmost derivations are shown below). The grammar G is ambiguous with respect to the sentence ababab: if the tree diagrams were used as the basis for assigning meaning to the derived string, mistaken interpretation could result.
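For instance (an added illustration), two distinct leftmost derivations of ababab in G are
S ⇒ SS ⇒ SSS ⇒ abSS ⇒ ababS ⇒ ababab and S ⇒ SS ⇒ abS ⇒ abSS ⇒ ababS ⇒ ababab,
and they correspond to different parse trees.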
Ambiguity (2/2) • Definition: A context-free grammar is ambiguous if and only if it generates some sentence by two or more distinct leftmost derivations.
Syntax Error Handling (1/2) • Probable Errors • lexical, such as misspelling an identifier, keyword, or operator • syntactic, such as an arithmetic expression with unbalanced parentheses • semantic, such as an operator applied to an incompatible operand • logical, such as an infinitely recursive call
Syntax Error Handling (2/2) • The error handler in a parser has simple-to-state goals: • It should report the presence of errors clearly and accurately. • It should recover from each error quickly enough to be able to detect subsequent errors. • It should not significantly slow down the processing of correct programs.
Error-Recovery Strategies • panic mode • phrase level • error productions • global correction
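As an illustration of the panic-mode strategy listed above (an added sketch with hypothetical tokens and a hypothetical synchronizing set, not code from the text), the parser discards input symbols until a synchronizing token appears and then resumes:

```python
# Hypothetical sketch of panic-mode recovery: on an error, skip tokens until
# a synchronizing token (here ';', '}', or the end marker '$') is found.
SYNCHRONIZING = {';', '}', '$'}

def skip_to_synchronizing(tokens, pos):
    """Return the index of the next synchronizing token at or after pos."""
    while pos < len(tokens) and tokens[pos] not in SYNCHRONIZING:
        pos += 1                      # discard erroneous input
    return pos

tokens = ['x', '=', '3', '+', '+', '*', ';', 'y', '=', '4', ';', '$']
error_at = 4                          # suppose the parser detects an error here
resume_at = skip_to_synchronizing(tokens, error_at)
print(tokens[resume_at:])             # parsing resumes at ';'
```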
Example 4.2 • The grammar with the following productions defines simple arithmetic expressions.
Notational Conventions (1/2) 1. These symbols are terminals: i) Lower-case letters early in the alphabet such as a, b, c. ii) Operator symbols such as +, -, etc. iii) Punctuation symbols such as parentheses, comma, etc. iv) The digits 0, 1, . . . , 9. v) Boldface strings such as id or if. 2. These symbols are nonterminals: i) Upper-case letters early in the alphabet such as A, B, C. ii) The letter S, which, when it appears, is usually the start symbol. iii) Lower-case italic names such as expr or stmt. 3. Upper-case letters late in the alphabet, such as X, Y, Z, represent grammar symbols, that is, either nonterminals or terminals.
Notational Conventions (2/2) 4. Lower-case letters late in the alphabet, chiefly u, v, . . . , z, represent strings of terminals. 5. Lower-case Greek letters, α, β, γ, for example, represent strings of grammar symbols. Thus, a generic production could be written as A → α, indicating that there is a single nonterminal A on the left of the arrow (the left side of the production) and a string of grammar symbols α to the right of the arrow (the right side of the production). 6. If A → α₁, A → α₂, . . . , A → αₖ are all productions with A on the left (we call them A-productions), we may write A → α₁ | α₂ | . . . | αₖ. We call α₁, α₂, . . . , αₖ the alternatives for A. 7. Unless otherwise stated, the left side of the first production is the start symbol.
Derivations • We say that αAβ ⇒ αγβ if A → γ is a production and α and β are arbitrary strings of grammar symbols. If α₁ ⇒ α₂ ⇒ . . . ⇒ αₙ, we say α₁ derives αₙ. The symbol ⇒ means "derives in one step". Often we wish to say "derives in zero or more steps"; for this purpose we can use the symbol ⇒*. Thus, 1. α ⇒* α for any string α, and 2. if α ⇒* β and β ⇒ γ, then α ⇒* γ.
Fig. 4.3. Building the parse tree from derivation (4.4): E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
Elimination of Left Recursion • No matter how many A-productions there are, we can eliminate immediate left recursion from them by the following technique. First, we group the A-productions as A → Aα₁ | Aα₂ | . . . | Aαₘ | β₁ | β₂ | . . . | βₙ, where no β begins with an A. Then, we replace the A-productions by A → β₁A' | β₂A' | . . . | βₙA' and A' → α₁A' | α₂A' | . . . | αₘA' | ε.
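For instance (an added illustration), applying this to the left-recursive productions E → E + T | T, where α₁ = + T and β₁ = T, gives E → T E' and E' → + T E' | ε.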
Left Factoring • In general, if A → αβ₁ | αβ₂ are two A-productions, and the input begins with a nonempty string derived from α, we do not know whether to expand A to αβ₁ or to αβ₂. However, we may defer the decision by expanding A to αA'. Then, after seeing the input derived from α, we expand A' to β₁ or to β₂. That is, left-factored, the original productions become A → αA' and A' → β₁ | β₂ (a small worked instance is given below). • Example 4.12. The language L2 = { aⁿbᵐcⁿdᵐ | n ≥ 1 and m ≥ 1 }
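For example (an added illustration of left factoring, not from the slide), the productions A → ab | ac share the prefix α = a, so they left-factor into A → aA' and A' → b | c.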
Fig. 4.9. Steps in top-down parse.
Fig. 4.10. Transition diagrams for grammar (4.11).
Fig. 4.11. Simplified transition diagrams.
Fig. 4.12. Simplified transition diagrams for arithmetic expressions.
Nonrecursive Predictive Parsing • The parser acts on X, the symbol on top of the stack, and a, the current input symbol, as follows: 1. If X = a = $, the parser halts and announces successful completion of parsing. 2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol. 3. If X is a nonterminal, the program consults entry M[X, a] of the parsing table M. This entry will be either an X-production of the grammar or an error entry. If, for example, M[X, a] = {X → UVW}, the parser replaces X on top of the stack by WVU (with U on top). As output, we shall assume that the parser just prints the production used; any other code could be executed here. If M[X, a] = error, the parser calls an error recovery routine.
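A minimal sketch of this loop follows (added here; the grammar S → aSb | c and its table M are illustrative stand-ins, not the slides' grammar 4.13):

```python
# Table-driven predictive parsing loop, following cases 1-3 above.
# Illustrative LL(1) grammar:  S -> a S b | c   over terminals a, b, c.
PARSE_TABLE = {                        # M[(nonterminal, lookahead)] -> right-hand side
    ('S', 'a'): ['a', 'S', 'b'],
    ('S', 'c'): ['c'],
}
NONTERMINALS = {'S'}

def predictive_parse(tokens, start='S'):
    """Run the stack-based predictive parser; print each production used."""
    input_ = list(tokens) + ['$']      # end marker
    stack = ['$', start]               # start symbol on top of $
    i = 0
    while True:
        X, a = stack[-1], input_[i]
        if X == '$' and a == '$':      # case 1: accept
            print('accept')
            return True
        if X not in NONTERMINALS:      # case 2: X is a terminal
            if X == a:
                stack.pop()
                i += 1                 # advance the input pointer
            else:
                raise SyntaxError(f'expected {X!r}, saw {a!r}')
        else:                          # case 3: consult entry M[X, a]
            rhs = PARSE_TABLE.get((X, a))
            if rhs is None:
                raise SyntaxError(f'no entry M[{X}, {a}]')
            print(f'{X} -> {" ".join(rhs)}')
            stack.pop()
            stack.extend(reversed(rhs))  # push the RHS with its leftmost symbol on top

predictive_parse(list('aacbb'))        # prints S -> a S b, S -> a S b, S -> c, accept
```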
Fig. 4.16. Moves made by predictive parser on input id + id * id.
Fig. 4.17. Parsing table M for grammar (4.13).
Fig. 4.18. Synchronizing tokens added to parsing table of Fig. 4.15.
Fig. 4.19. Parsing and error recovery moves made by predictive parser.