Chapter 4 Top-down Syntax Analysis
Zhang Jing, Yu SiLiang
College of Computer Science & Technology, Harbin Engineering University
Syntax analysis is a very important part of a compiler; its position in the compiler is shown in Figure 4.1. Its task is to analyze the grammatical structure of the input and determine whether a string can be recognized by a grammar. There are two types of syntax analysis: top-down syntax analysis and bottom-up syntax analysis. This chapter introduces top-down syntax analysis; bottom-up syntax analysis is described in Chapter 5. Top-down syntax analysis can be viewed as an attempt to find a leftmost derivation for an input string.
zhangjing@hrbeu.edu.cn
Equivalently, it can be viewed as an attempt to construct a parsing tree for the input, starting from the root and creating the nodes of the parsing tree in preorder. Before considering top-down syntax analysis, we must address two problems: the first is eliminating left recursion, and the second is avoiding backtracking. We then discuss two top-down parsing techniques: recursive-descent parsing and the LL(1) method.
Practical limitations of grammars
• Grammars used in compiling a program are usually subject to some restrictions: for example, there should be no rules U::=ε and no derivations U ⇒+ U. In this section, we introduce some methods for obtaining an equivalent grammar that satisfies these restrictions from an old one.
• (1) There is no rule U::=U, because such a rule has no meaning and easily makes the grammar ambiguous. For example, S::=0S1|01|S should be replaced by S::=0S1|01.
(2) There are no useless rules. That is, every non-terminal U in the grammar must satisfy both of the following conditions:
(a) U is reachable from the start symbol Z, i.e., Z ⇒* xUy;
(b) a terminal string can be derived from U, i.e., U ⇒+ t (t ∈ VT+).
For example, consider the grammar
S::=AB|CA
A::=a
B::=CB|AB
C::=cB|b
D::=aD|d
Here D::=aD|d is a useless rule: D does not appear in any other rule, so it cannot be reached from S (it is a non-reachable symbol). B::=CB|AB is also useless, because every alternative of B contains B again, so no terminal string can be derived from B.
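The two conditions in (2) can be checked mechanically. The following Python sketch is one possible way to do it, not the authors' method; the grammar representation (a dict from each non-terminal to a list of alternatives, each alternative a tuple of symbols, with every other symbol treated as a terminal) and all function names are hypothetical. It first removes non-terminals that violate condition (b), then those that violate condition (a), and runs on the grammar above.

```python
# Hypothetical representation: non-terminal -> list of alternatives; each alternative is a
# tuple of symbols, and any symbol that is not a key of the dict is a terminal.
Grammar = dict[str, list[tuple[str, ...]]]


def productive(g: Grammar) -> set[str]:
    """Non-terminals U with U =>+ t for some terminal string t (condition (b))."""
    found: set[str] = set()
    changed = True
    while changed:
        changed = False
        for lhs, alts in g.items():
            if lhs not in found and any(
                all(s not in g or s in found for s in alt) for alt in alts
            ):
                found.add(lhs)
                changed = True
    return found


def reachable(g: Grammar, start: str) -> set[str]:
    """Non-terminals U with Z =>* xUy (condition (a))."""
    seen, todo = {start}, [start]
    while todo:
        for alt in g.get(todo.pop(), []):
            for s in alt:
                if s in g and s not in seen:
                    seen.add(s)
                    todo.append(s)
    return seen


def remove_useless(g: Grammar, start: str) -> Grammar:
    good = productive(g)
    # Step 1: drop non-productive non-terminals and every alternative that mentions one.
    g1 = {
        lhs: [alt for alt in alts if all(s not in g or s in good for s in alt)]
        for lhs, alts in g.items()
        if lhs in good
    }
    # Step 2: drop non-terminals that are not reachable from the start symbol.
    keep = reachable(g1, start)
    return {lhs: alts for lhs, alts in g1.items() if lhs in keep}


g = {
    "S": [("A", "B"), ("C", "A")],
    "A": [("a",)],
    "B": [("C", "B"), ("A", "B")],
    "C": [("c", "B"), ("b",)],
    "D": [("a", "D"), ("d",)],
}
print(remove_useless(g, "S"))
```

It prints {'S': [('C', 'A')], 'A': [('a',)], 'C': [('b',)]}. Besides the D and B rules named above, it also drops S::=AB and C::=cB, since they mention B and therefore can never produce a terminal string.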
(3) There are no direct left-recursive rules U::=U…, such as E::=E+T.
The grammar G[U]: U::=Ux|y should be changed to
U::=y{x}
or
U::=yU’
U’::=xU’|ε
In general, if a grammar contains
U::=Ux1|Ux2|…|Uxm|y1|…|yn
then the equivalent grammar without left recursion is
U::=y1U’|y2U’|…|ynU’
U’::=x1U’|x2U’|…|xmU’|ε
Example 4.1
S ::= S A | A
A ::= S B | B | ( S ) | ( )
B ::= [ S ] | [ ]
Substituting the alternatives of A into the rules for S gives:
S ::= S S B | S B | S ( S ) | S ( )
S ::= S B | B | ( S ) | ( )
B ::= [ S ] | [ ]
Eliminating the direct left recursion of S then gives:
S ::= B S’ | ( S ) S’ | ( ) S’
S’ ::= S B S’ | B S’ | ( S ) S’ | ( ) S’ | ε
B ::= [ S ] | [ ]
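As a sketch of the general transformation in (3), the hypothetical Python function below removes the direct left recursion of one non-terminal. It uses an assumed dict-based grammar representation in which () stands for ε, and it assumes the name U’ is not already in use. Applied to the substituted grammar of Example 4.1, it reproduces the final grammar given there.

```python
# Hypothetical representation: non-terminal -> list of alternatives (tuples of symbols);
# the empty tuple () stands for ε.
Grammar = dict[str, list[tuple[str, ...]]]


def remove_direct_left_recursion(g: Grammar, u: str) -> Grammar:
    """U ::= U x1 | ... | U xm | y1 | ... | yn  becomes
    U ::= y1 U' | ... | yn U'   and   U' ::= x1 U' | ... | xm U' | ε."""
    xs = [alt[1:] for alt in g[u] if alt[:1] == (u,)]   # the x_i parts
    ys = [alt for alt in g[u] if alt[:1] != (u,)]       # the y_j alternatives
    if not xs:
        return g                                        # U is not directly left-recursive
    u2 = u + "'"                                        # assumes U' is not already used
    result = dict(g)
    result[u] = [y + (u2,) for y in ys]
    result[u2] = [x + (u2,) for x in xs] + [()]
    return result


# Example 4.1 after substituting A into S (the duplicated S B is listed once).
g = {
    "S": [("S", "S", "B"), ("S", "B"), ("S", "(", "S", ")"), ("S", "(", ")"),
          ("B",), ("(", "S", ")"), ("(", ")")],
    "B": [("[", "S", "]"), ("[", "]")],
}
for lhs, alts in remove_direct_left_recursion(g, "S").items():
    print(lhs, "::=", " | ".join(" ".join(a) or "ε" for a in alts))
```

The loop prints S ::= B S' | ( S ) S' | ( ) S', then the unchanged B rule, then S' ::= S B S' | B S' | ( S ) S' | ( ) S' | ε.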
(4) There is no rule U::=ε.
To eliminate U’::=ε in, for example,
U::=yU’
U’::=xU’|ε
we can write the grammar as
U::=yU’|y
U’::=xU’|x
• (5) There is no indirect recursion U ⇒+ U…, such as T::=E+T, E::=T. To eliminate indirect recursion, we substitute the right-hand sides of earlier rules into later rules, working from the first rule to the last, and then eliminate any direct left recursion that results.
Example 4.2 G[S]:
S::=Aa|b
A::=Ac|Sd|e
Because S::=Aa|b, the S in the rule A::=Sd can be replaced by its alternatives, and the grammar becomes
S::=Aa|b
A::=Ac|Aad|bd|e
The second rule is now directly left-recursive, so it can be rewritten as
S::=Aa|b
A::=bdA’|eA’
A’::=cA’|adA’|ε
• Rewriting the grammar again to eliminate ε from the last rule gives
S::=Aa|b
A::=bdA’|eA’|bd|e
A’::=cA’|adA’|c|ad
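The ε-removal used in (4) and in this last step can be generalized: every alternative gets extra variants in which nullable symbols are omitted, and then the ε alternatives are dropped. The Python sketch below is an illustration under assumptions, not the text's own procedure: it uses a hypothetical dict-based grammar (with () for ε) and assumes the start symbol itself does not need to derive ε. On the grammar of Example 4.2 it produces the same alternatives as above, though not in the same order.

```python
from itertools import product

# Hypothetical representation: non-terminal -> list of alternatives; () stands for ε.
Grammar = dict[str, list[tuple[str, ...]]]


def nullable(g: Grammar) -> set[str]:
    """Non-terminals that can derive ε."""
    found: set[str] = set()
    changed = True
    while changed:
        changed = False
        for lhs, alts in g.items():
            if lhs not in found and any(all(s in found for s in alt) for alt in alts):
                found.add(lhs)
                changed = True
    return found


def remove_epsilon_rules(g: Grammar) -> Grammar:
    """Add every variant of each alternative with nullable symbols omitted,
    then drop the ε alternatives."""
    null = nullable(g)
    result: Grammar = {}
    for lhs, alts in g.items():
        new_alts: list[tuple[str, ...]] = []
        for alt in alts:
            # A nullable symbol may be kept or omitted; any other symbol is always kept.
            options = [((s,), ()) if s in null else ((s,),) for s in alt]
            for pick in product(*options):
                variant = tuple(s for part in pick for s in part)
                if variant and variant not in new_alts:   # skip ε and duplicates
                    new_alts.append(variant)
        result[lhs] = new_alts
    return result


# Example 4.2 after eliminating the direct left recursion of A.
g = {
    "S": [("A", "a"), ("b",)],
    "A": [("b", "d", "A'"), ("e", "A'")],
    "A'": [("c", "A'"), ("a", "d", "A'"), ()],
}
for lhs, alts in remove_epsilon_rules(g).items():
    print(lhs, "::=", " | ".join(" ".join(a) for a in alts))
```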
Algorithm for eliminating indirect recursion.
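The algorithm figure is not reproduced here. As a stand-in, the sketch below follows the standard textbook ordering algorithm, which may differ in detail from the original figure: order the non-terminals, substitute the alternatives of each earlier non-terminal wherever it appears first in a later rule, and remove direct left recursion after each step. Like the textbook version, it assumes there are no rules U::=U and no ε-rules; the grammar representation and function names are hypothetical. With the order S, A it reproduces Example 4.2 before the ε-removal step.

```python
# Hypothetical representation: non-terminal -> list of alternatives; () stands for ε.
Grammar = dict[str, list[tuple[str, ...]]]


def remove_direct(g: Grammar, u: str) -> Grammar:
    """U ::= U x1 | ... | y1 | ...  ->  U ::= y1 U' | ...,  U' ::= x1 U' | ... | ε."""
    xs = [alt[1:] for alt in g[u] if alt[:1] == (u,)]
    ys = [alt for alt in g[u] if alt[:1] != (u,)]
    if not xs:
        return g
    u2 = u + "'"  # assumes the name U' is unused
    g = dict(g)
    g[u] = [y + (u2,) for y in ys]
    g[u2] = [x + (u2,) for x in xs] + [()]
    return g


def remove_left_recursion(g: Grammar, order: list[str]) -> Grammar:
    """For each U_i in `order`, replace a leading U_j (j < i) by the alternatives
    of U_j, then remove the direct left recursion of U_i."""
    g = {lhs: list(alts) for lhs, alts in g.items()}
    for i, ui in enumerate(order):
        for uj in order[:i]:
            new_alts = []
            for alt in g[ui]:
                if alt[:1] == (uj,):
                    new_alts.extend(delta + alt[1:] for delta in g[uj])
                else:
                    new_alts.append(alt)
            g[ui] = new_alts
        g = remove_direct(g, ui)
    return g


g = {"S": [("A", "a"), ("b",)], "A": [("A", "c"), ("S", "d"), ("e",)]}
for lhs, alts in remove_left_recursion(g, ["S", "A"]).items():
    print(lhs, "::=", " | ".join(" ".join(a) or "ε" for a in alts))
```

The output is S ::= A a | b, A ::= b d A' | e A' and A' ::= c A' | a d A' | ε, matching the grammar obtained in Example 4.2.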
FIRST(X) and FOLLOW(U)
• 1. FIRST(X)
The definition of FIRST(X) is
FIRST(X) = {a | X ⇒* a…, a ∈ VT}
In particular, if X ⇒* ε, then ε ∈ FIRST(X).
Example 4.3 G[E]:
E::=E+F|T
T::=T*F|F
F::=(E)|i
Then FIRST(E) = {(, i}, FIRST(T) = {(, i} and FIRST(F) = {(, i}.
2. FOLLOW(U)
FOLLOW(U) = {a | Z ⇒* …Ua…, a ∈ VT}
If Z ⇒* …U, then # ∈ FOLLOW(U).
• Example 4.4 G[E]:
E::=TE’
E’::=+TE’|ε
T::=FT’
T’::=*FT’|ε
F::=(E)|i
What is FOLLOW(T)?
Starting from the start symbol E, we have the derivations
E ⇒ TE’ ⇒ T+TE’ (so + follows T)
E ⇒ TE’ ⇒ T (so T can end a sentential form, and # follows T)
E ⇒ TE’ ⇒ FT’E’ ⇒ (E)T’E’ ⇒ (TE’)T’E’ ⇒ (T)T’E’ (so ) follows T)
So FOLLOW(T) = {+, #, )}.
The rules for computing FOLLOW(U) are:
(1) If Z is the start symbol, then # ∈ FOLLOW(Z).
(2) If there is a rule A::=αUβ, then every element of FIRST(β) except ε belongs to FOLLOW(U).
(3) If there is a rule A::=αU, or a rule A::=αUβ with ε ∈ FIRST(β), then FOLLOW(A) ⊆ FOLLOW(U).
These rules are applied repeatedly until no FOLLOW set changes.
For Example 4.4, we use the rules above to obtain FOLLOW(E), FOLLOW(T), FOLLOW(F), FOLLOW(E’) and FOLLOW(T’).
• FOLLOW(E)
Because E is the start symbol, # ∈ FOLLOW(E). Because of the rule F::=(E), ) ∈ FOLLOW(E). Thus FOLLOW(E) = {#, )}.
FOLLOW(T)
From the rule E::=TE’, the + in FIRST(E’) belongs to FOLLOW(T). In addition, ε ∈ FIRST(E’), so FOLLOW(E) ⊆ FOLLOW(T), where FOLLOW(E) = {#, )}. Thus FOLLOW(T) = {+, #, )}.
FOLLOW(F)
From the rule T::=FT’, * ∈ FIRST(T’), so * ∈ FOLLOW(F). Moreover, ε ∈ FIRST(T’), so FOLLOW(T) = {+, #, )} ⊆ FOLLOW(F). Thus FOLLOW(F) = {*, +, #, )}.
• FOLLOW(E’) and FOLLOW(T’)
E’ only appears at the end of the rules for E and E’, so FOLLOW(E’) = FOLLOW(E) = {#, )}. Similarly, FOLLOW(T’) = FOLLOW(T) = {+, #, )}.
Example 4.5 Grammar G[S]:
S::=aAB|bA|ε
A::=BSb|ε
B::=bB|ε
Its FIRST sets are:
FIRST(B) = {b, ε}
FIRST(A) = {a, b, ε}
FIRST(S) = {a, b, ε}
Its FOLLOW sets are:
FOLLOW(S) contains # (S is the start symbol) and FIRST(b) from A::=BSb, so FOLLOW(S) = {#, b}.
FOLLOW(B) contains FOLLOW(S) (from S::=aAB) and FIRST(Sb) (from A::=BSb), so FOLLOW(B) = {#, b, a}.
FOLLOW(A) contains FOLLOW(S) and FIRST(B) (from S::=aAB and S::=bA), so FOLLOW(A) = {#, b}.
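The definition of FIRST and the three FOLLOW rules translate directly into a fixed-point computation: keep applying them until no set changes. The Python sketch below is one such formulation, not taken from the text; the dict-based grammar, the string "ε" for the empty string and "#" for the end marker are assumptions. Run on Example 4.4, it prints the FIRST and FOLLOW sets derived above, and passing in the grammar of Example 4.5 instead gives the sets listed there.

```python
EPS, END = "ε", "#"
# Hypothetical representation: non-terminal -> list of alternatives; () stands for ε.
Grammar = dict[str, list[tuple[str, ...]]]


def first_of(seq, first, g):
    """FIRST of a symbol string, using the current FIRST sets of the non-terminals."""
    out = set()
    for s in seq:
        f = first[s] if s in g else {s}   # a terminal's FIRST set is itself
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}                    # every symbol of seq (possibly none) derives ε


def compute_first(g: Grammar) -> dict[str, set[str]]:
    first = {u: set() for u in g}
    changed = True
    while changed:                        # repeat until no FIRST set grows
        changed = False
        for u, alts in g.items():
            for alt in alts:
                f = first_of(alt, first, g)
                if not f <= first[u]:
                    first[u] |= f
                    changed = True
    return first


def compute_follow(g: Grammar, start: str, first) -> dict[str, set[str]]:
    follow = {u: set() for u in g}
    follow[start].add(END)                          # rule (1)
    changed = True
    while changed:
        changed = False
        for lhs, alts in g.items():                 # each rule lhs ::= alpha U beta
            for alt in alts:
                for i, u in enumerate(alt):
                    if u not in g:
                        continue
                    beta = first_of(alt[i + 1:], first, g)
                    new = beta - {EPS}              # rule (2)
                    if EPS in beta:                 # rule (3): beta is empty or beta =>* ε
                        new |= follow[lhs]
                    if not new <= follow[u]:
                        follow[u] |= new
                        changed = True
    return follow


g44 = {
    "E": [("T", "E'")], "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")], "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("i",)],
}
first = compute_first(g44)
follow = compute_follow(g44, "E", first)
for u in g44:
    print(u, sorted(first[u]), sorted(follow[u]))
```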
Avoiding backtracking
• Example 4.6 S::=xAy A::=ab|a
Determine whether the string “xay” can be recognized by the grammar.
[Figures: the first deduction and the second deduction of “xay”]
In the deduction above, the parser first follows the first deduction; when that goes wrong, it returns to the upper level and continues with the second deduction. This return is called backtracking.
• As the example shows, we may have to backtrack several times to determine whether a string can be recognized by a grammar.
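To make the cost of backtracking concrete, here is a naive top-down parser in Python that tries the alternatives of each non-terminal in order and falls back to the next one when matching fails. It is an illustrative sketch with hypothetical names and a hypothetical grammar representation, not a method from the text, and it would loop forever on a left-recursive grammar. For “xay” it records that A::=ab was tried and abandoned before A::=a succeeded.

```python
# Hypothetical representation: non-terminal -> list of alternatives (tuples of symbols).
Grammar = dict[str, list[tuple[str, ...]]]


def parse_with_backtracking(g: Grammar, start: str, tokens: str):
    """Try alternatives in order, backtracking on failure.
    Returns (accepted, tried), where tried lists each (non-terminal, alternative, position)
    the parser attempted. Loops forever on a left-recursive grammar."""
    tried = []

    def match(symbols, pos):
        # Yield every input position reachable after matching `symbols` from `pos`.
        if not symbols:
            yield pos
            return
        head, rest = symbols[0], symbols[1:]
        if head in g:                                    # non-terminal: try each alternative
            for alt in g[head]:
                tried.append((head, alt, pos))
                for mid in match(alt, pos):
                    yield from match(rest, mid)
        elif pos < len(tokens) and tokens[pos] == head:  # terminal: must match the input
            yield from match(rest, pos + 1)

    for end in match((start,), 0):
        if end == len(tokens):                           # whole input consumed
            return True, tried
    return False, tried


g = {"S": [("x", "A", "y")], "A": [("a", "b"), ("a",)]}
accepted, tried = parse_with_backtracking(g, "S", "xay")
print(accepted)
for u, alt, pos in tried:
    print(f"at position {pos}: try {u}::={''.join(alt)}")
```

The output is True, followed by the attempts S::=xAy at position 0 and then A::=ab and A::=a, both at position 1.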
Is there a way to avoid backtracking? Yes: backtracking is avoided when the grammar satisfies the following conditions.
(1) For every rule U::=x1|x2|…|xn, FIRST(xi) ∩ FIRST(xj) = Φ for all i ≠ j.
(2) If some xj ⇒* ε, then FIRST(xi) ∩ FOLLOW(U) = Φ for all i ≠ j.
Example 4.7 G[S]: S::=xAy A::=ab|a
FIRST(ab) = FIRST(a) = {a}, so this grammar requires backtracking. To eliminate backtracking, we factor out the common prefix a and rewrite the grammar as
S::=xAy
A::=aB
B::=b|ε
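The rewrite in Example 4.7 factors out the common prefix a of the two alternatives of A. A minimal Python sketch of one such factoring step is shown below; the function names, the dict-based grammar (with () for ε) and the caller-chosen name of the new non-terminal are all assumptions, and only the first group of alternatives sharing a first symbol is factored per call.

```python
# Hypothetical representation: non-terminal -> list of alternatives; () stands for ε.
Grammar = dict[str, list[tuple[str, ...]]]


def common_prefix(seqs):
    """Longest sequence of symbols that starts every sequence in seqs."""
    prefix = []
    for column in zip(*seqs):
        if any(s != column[0] for s in column):
            break
        prefix.append(column[0])
    return tuple(prefix)


def left_factor(g: Grammar, u: str, new: str) -> Grammar:
    """U ::= a b1 | a b2 | y  ->  U ::= a NEW | y,  NEW ::= b1 | b2.
    `new` is a fresh non-terminal name chosen by the caller."""
    for alt in g[u]:
        group = [x for x in g[u] if alt and x[:1] == alt[:1]]
        if len(group) > 1:
            prefix = common_prefix(group)
            result = dict(g)
            result[u] = [prefix + (new,)] + [x for x in g[u] if x not in group]
            result[new] = [x[len(prefix):] for x in group]
            return result
    return g  # no two alternatives share a first symbol


g = {"S": [("x", "A", "y")], "A": [("a", "b"), ("a",)]}
for lhs, alts in left_factor(g, "A", "B").items():
    print(lhs, "::=", " | ".join(" ".join(a) or "ε" for a in alts))
```

It prints S ::= x A y, A ::= a B and B ::= b | ε, the grammar given in Example 4.7.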
Example 4.8 G[S]:
S::=AB
A::=Aa|bB
B::=a|Sb
First, we eliminate the left recursion and rewrite the grammar as
S::=AB
A::=bBA’
A’::=aA’|ε
B::=a|Sb
Second, we determine whether the grammar requires backtracking. For the rule A’::=aA’|ε, FIRST(aA’) = {a} and FOLLOW(A’) = {a, b} (FOLLOW(A’) = FOLLOW(A), which contains FIRST(B) = {a} ∪ FIRST(S) = {a, b}). Since FIRST(aA’) ∩ FOLLOW(A’) = {a} ≠ Φ, the grammar still requires backtracking.
Example 4.9 G[E] is a grammar of arithmetic expressions:
E::=TE’
E’::=+TE’|ε
T::=FT’
T’::=*FT’|ε
F::=(E)|i
We check in three steps whether backtracking is needed.
(1) The second rule E’::=+TE’|ε has an ε alternative, so we check the second condition, FIRST(xi) ∩ FOLLOW(U) = Φ. Here FIRST(+TE’) = {+} and FOLLOW(E’) = {), #}, so FIRST(+TE’) ∩ FOLLOW(E’) = Φ. Hence this rule needs no backtracking.
(2) For the fourth rule T’::=*FT’|ε, FIRST(*FT’) = {*} and FOLLOW(T’) = {+, ), #}. Since FIRST(*FT’) ∩ FOLLOW(T’) = Φ, this rule needs no backtracking.
• (3) For the last rule F::=(E)|i, FIRST((E)) = {(} and FIRST(i) = {i} have no common element, so this rule needs no backtracking.
The rules E::=TE’ and T::=FT’ have only one alternative each. In summary, the expression grammar requires no backtracking.
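Both conditions can be checked with a small function once the FIRST and FOLLOW sets are known. In this hypothetical Python sketch the sets are passed in directly, taken from Examples 4.7 to 4.9 (for Example 4.7, FOLLOW(A) = {y}), and the function returns the symbols that cause a conflict, so an empty set means the rule needs no backtracking.

```python
from itertools import combinations

EPS = "ε"


def backtracking_conflicts(alt_firsts: list[set[str]], follow_u: set[str]) -> set[str]:
    """Symbols that violate the two conditions for U ::= x1 | ... | xn.
    alt_firsts[i] is FIRST(xi), containing ε when xi =>* ε; follow_u is FOLLOW(U)."""
    clash: set[str] = set()
    for fi, fj in combinations(alt_firsts, 2):    # condition (1): pairwise disjoint FIRST sets
        clash |= fi & fj
    for j, fj in enumerate(alt_firsts):           # condition (2): only when xj =>* ε
        if EPS in fj:
            for i, fi in enumerate(alt_firsts):
                if i != j:
                    clash |= fi & follow_u
    return clash


print(backtracking_conflicts([{"a"}, {"a"}], {"y"}))              # A ::= ab | a     (Example 4.7)
print(backtracking_conflicts([{"a"}, {EPS}], {"a", "b"}))         # A' ::= aA' | ε   (Example 4.8)
print(backtracking_conflicts([{"+"}, {EPS}], {")", "#"}))         # E' ::= +TE' | ε  (Example 4.9)
print(backtracking_conflicts([{"*"}, {EPS}], {"+", ")", "#"}))    # T' ::= *FT' | ε  (Example 4.9)
```

The four calls print {'a'}, {'a'}, set() and set(), matching the conclusions of the three examples.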
Top-down parsing
• The top-down parsing methods consist of recursive-descent parsing and the LL(1) method.
Recursive-descent parsing
• Recursive-descent parsing can be viewed as building a parsing tree whose root is the start symbol and whose nodes are expanded according to the rules. In this section, we consider a general form of recursive-descent parsing. The key point during parsing is deciding which alternative to apply for a non-terminal.
The steps of recursive-descent parsing are:
(1) Each non-terminal U is written as a sub-program P(U);
(2) On entry to a recursive sub-program (SCIN), the return address is pushed onto an address stack S: k:=k+1; S[k]:=return address
(3) On exit from a recursive sub-program (SCOUT): k:=k-1; GOTO S[k+1]
(4) For a rule U::=x1|x2|…|xn, the sub-program P(U) must decide which alternative xi to apply.
The algorithm of recursive-descent parsing is as follows.
The recursive-descent parsing program for Example 4.9 is as follows.
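Since the program figure is not reproduced here, the following Python sketch shows one way to write a recursive-descent recognizer for the grammar of Example 4.9, with one method per non-terminal as in step (1). It is not the authors' program: Python's own call stack stands in for the SCIN/SCOUT address stack, tokens are assumed to be single characters with i standing for an identifier, and every name is hypothetical.

```python
class ExprParser:
    """Recursive-descent recognizer for Example 4.9: one method P(U) per non-terminal."""

    def __init__(self, text: str):
        # Assumption: every token is one character; "i" stands for an identifier.
        self.tokens = [c for c in text if not c.isspace()] + ["#"]
        self.pos = 0

    def peek(self) -> str:
        return self.tokens[self.pos]

    def expect(self, t: str) -> None:
        if self.peek() != t:
            raise SyntaxError(f"expected {t!r} but found {self.peek()!r} at {self.pos}")
        self.pos += 1

    def E(self):        # E ::= T E'
        self.T()
        self.E_prime()

    def E_prime(self):  # E' ::= + T E' | ε   (ε is chosen for any other lookahead)
        if self.peek() == "+":
            self.expect("+")
            self.T()
            self.E_prime()

    def T(self):        # T ::= F T'
        self.F()
        self.T_prime()

    def T_prime(self):  # T' ::= * F T' | ε   (ε is chosen for any other lookahead)
        if self.peek() == "*":
            self.expect("*")
            self.F()
            self.T_prime()

    def F(self):        # F ::= ( E ) | i
        if self.peek() == "(":
            self.expect("(")
            self.E()
            self.expect(")")
        else:
            self.expect("i")


def recognizes(text: str) -> bool:
    parser = ExprParser(text)
    try:
        parser.E()
        parser.expect("#")   # the whole input must be consumed
        return True
    except SyntaxError:
        return False


for s in ["i+i*i", "(i+i)*i", "i+", "i)"]:
    print(s, recognizes(s))
```

The loop reports True for “i+i*i” and “(i+i)*i” and False for “i+” and “i)”. An ε alternative is taken whenever the lookahead does not start the other alternative, so an invalid lookahead is reported by a later expect call.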
LL(1) method
• LL(1) is a top-down parsing method for grammars that require no backtracking. The first “L” in LL(1) stands for scanning the input from left to right, the second “L” for producing a leftmost derivation, and the “1” for using one input symbol of lookahead at each step to make parsing decisions. LL(1) grammars have two distinctive restrictions: the grammar is not ambiguous, and it contains no left recursion.
1. Parsing table M
We construct a parsing table M before doing LL(1) parsing. The rows are labeled with the non-terminal symbols, in order beginning from the start symbol, and the columns with the input terminal symbols (including #). The table is constructed in two steps for each rule U::=x:
(1) For each terminal a ∈ FIRST(x), set M[U, a] = “U::=x”;
(2) If ε ∈ FIRST(x), then for each b ∈ FOLLOW(U) (including # when # ∈ FOLLOW(U)), set M[U, b] = “U::=x”.
The following is the parsing table of Example 4.9.
The steps for constructing the parsing table of Example 4.9 are as follows:
G[E]:
E::=TE’
E’::=+TE’|ε
T::=FT’
T’::=*FT’|ε
F::=(E)|i
(1) For the first rule E::=TE’, FIRST(TE’) = {(, i}, so M[E, (] = “E::=TE’” and M[E, i] = “E::=TE’”.
(2) For the second rule E’::=+TE’|ε, FIRST(+TE’) = {+}, so M[E’, +] = “E’::=+TE’”. In addition, for the alternative E’::=ε, FOLLOW(E’) = {), #}, so M[E’, )] = “E’::=ε” and M[E’, #] = “E’::=ε”.
(3) For the rule T’::=*FT’|ε, FIRST(*FT’) = {*} and FOLLOW(T’) = {+, ), #}, so M[T’, *] = “T’::=*FT’”, M[T’, +] = “T’::=ε”, M[T’, )] = “T’::=ε” and M[T’, #] = “T’::=ε”.
(4) Similarly, for the rule T::=FT’, M[T, (] = M[T, i] = “T::=FT’”; for the last rule F::=(E)|i, M[F, (] = “F::=(E)” and M[F, i] = “F::=i”.
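The two construction steps can be sketched in Python as follows, with the FIRST and FOLLOW sets of Example 4.9 copied from the text. The grammar representation and the function names are assumptions. Each table cell holds a list, so a cell with more than one rule shows that the grammar is not LL(1).

```python
EPS = "ε"
# Hypothetical representation: non-terminal -> list of alternatives; () stands for ε.
Grammar = dict[str, list[tuple[str, ...]]]


def first_of(seq, first):
    """FIRST of a symbol string; a terminal's FIRST set is the terminal itself."""
    out = set()
    for s in seq:
        f = first.get(s, {s})
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}


def build_table(g: Grammar, first, follow):
    """M[(U, a)] -> list of alternatives; more than one entry means a conflict."""
    table: dict[tuple[str, str], list[tuple[str, ...]]] = {}
    for u, alts in g.items():
        for x in alts:
            fx = first_of(x, first)
            targets = fx - {EPS}              # step (1): a in FIRST(x)
            if EPS in fx:
                targets |= follow[u]          # step (2): b in FOLLOW(U), including #
            for a in targets:
                table.setdefault((u, a), []).append(x)
    return table


# FIRST and FOLLOW sets of Example 4.9 as computed in the text.
g = {
    "E": [("T", "E'")], "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")], "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("i",)],
}
first = {"E": {"(", "i"}, "E'": {"+", EPS}, "T": {"(", "i"}, "T'": {"*", EPS}, "F": {"(", "i"}}
follow = {"E": {")", "#"}, "E'": {")", "#"}, "T": {"+", ")", "#"},
          "T'": {"+", ")", "#"}, "F": {"*", "+", ")", "#"}}
M = build_table(g, first, follow)
for (u, a), alts in sorted(M.items()):
    print(f"M[{u}, {a}] = {u}::=" + " | ".join("".join(x) or "ε" for x in alts))
```

This table has 13 filled cells, each holding exactly one rule. Applied to the reconstructed grammar of Example 4.10, the same function puts both S’::=else S and S’::=ε into M[S’, else].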
2. LL(1) parsing
• The LL(1) method starts from the start symbol and always expands the leftmost non-terminal. If M[U, a] = “U::=x”, then when the current input symbol is a, the right-hand side x replaces the non-terminal U on top of the stack. Because a stack is last-in first-out, the symbols of x are pushed in the reverse of their order in the rule. When the symbol on top of the stack equals the current input symbol, both are popped. If the required table entry is empty, the string is rejected. When the stack is empty and all input has been consumed, parsing stops and the string is recognized by the grammar. Table 4.2 shows the recognition of the string “i+i*i” of Example 4.9 by the LL(1) method.
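The stack-driven procedure can be sketched in Python as below. The table M is the one constructed above for Example 4.9, with # marking both the bottom of the stack and the end of the input; tokens are assumed to be single characters, and the function name is hypothetical. It returns the rules in the order they are applied, which should correspond to the expansion steps of Table 4.2.

```python
# Parsing table M of Example 4.9 as constructed in the text; () stands for ε.
M = {
    ("E", "("): ("T", "E'"), ("E", "i"): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): (), ("E'", "#"): (),
    ("T", "("): ("F", "T'"), ("T", "i"): ("F", "T'"),
    ("T'", "*"): ("*", "F", "T'"), ("T'", "+"): (), ("T'", ")"): (), ("T'", "#"): (),
    ("F", "("): ("(", "E", ")"), ("F", "i"): ("i",),
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(text: str, start: str = "E") -> list[tuple[str, tuple[str, ...]]]:
    """Return the rules applied (a leftmost derivation) or raise SyntaxError.
    Assumption: one-character tokens, with "i" standing for an identifier."""
    tokens = [c for c in text if not c.isspace()] + ["#"]
    stack = ["#", start]            # the end of the list is the top of the stack
    pos, applied = 0, []
    while True:
        top, a = stack.pop(), tokens[pos]
        if top == "#" and a == "#":
            return applied          # stack and input both exhausted: accept
        if top in NONTERMINALS:
            rhs = M.get((top, a))
            if rhs is None:
                raise SyntaxError(f"M[{top}, {a}] is empty")
            applied.append((top, rhs))
            stack.extend(reversed(rhs))  # push x in reverse so its first symbol is on top
        elif top == a:
            pos += 1                # terminal on top equals input symbol: pop both
        else:
            raise SyntaxError(f"expected {top!r} but found {a!r}")


for u, rhs in ll1_parse("i+i*i"):
    print(f"{u}::={''.join(rhs) or 'ε'}")
```

For “i+i*i” it applies 11 rules, starting with E::=TE’ and ending with E’::=ε; an empty table entry or a mismatched terminal raises SyntaxError.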
Example 4.10 Consider the grammar G[S]:
S::=if E then S else S | if E then S | other
E::=b
Before parsing, we first reconstruct the grammar, that is, remove left recursion and avoid backtracking. Second, we determine whether G[S] is an LL(1) grammar. Finally, if it is an LL(1) grammar, we parse with it.
(1) After reconstruction, the grammar becomes:
S::=if E then S S’ | other
S’::=else S | ε
E::=b
(2) The FIRST and FOLLOW sets are:
FIRST(S) = {if, other}, FIRST(S’) = {else, ε}, FIRST(E) = {b}
FOLLOW(S) = FOLLOW(S’) = {else, #}, FOLLOW(E) = {then}
(3) Determine whether the grammar is an LL(1) grammar:
FIRST(if E then S S’) ∩ FIRST(other) = Φ
FIRST(else S) ∩ FIRST(ε) = Φ
FIRST(else S) ∩ FOLLOW(S’) = {else}
Since the last intersection is not empty, G[S] is not an LL(1) grammar: M[S’, else] contains both S’::=else S and S’::=ε.
(4) The parsing table is shown in Table 4.3.