Introduction to Compilers
Compiler Lecture Note, LL Parsing
Chapter 7: LL Parsing
Contents
I. Deterministic Parsing
II. Recursive-descent Parser
III. Predictive Parser
IV. Constructing the Predictive Parsing Table
V. Strong LL(k) Grammars and LL(k) Grammars
I. Deterministic Parsing
▶ Deterministic top-down parsing ::= deterministic selection of the production rules to be applied in top-down syntax analysis.
▶ One pass, no backup
    1. The input string is scanned once from left to right.
    2. The parsing process is deterministic.
▶ Top-down parsing with no backup ::= deterministic top-down parsing, called LL parsing: "Left-to-right scanning and Left parse".
▶ How to decide which production is to be applied:
    sentential form : a1 a2 … ai-1 X α
    input string    : a1 a2 … ai-1 ai ai+1 … an
  When X → α1 | α2 | ... | αk ∈ P, the X-production to apply is chosen uniquely by looking at ai.
  The condition for no backtracking (= the LL condition) is stated in terms of FIRST and FOLLOW.
FIRST
▶ Computation of FIRST(X), where X ∈ V.
    1) if X ∈ VT, then FIRST(X) = {X}
    2) if X ∈ VN and X → aα ∈ P, then FIRST(X) = FIRST(X) ∪ {a}
       if X → ε ∈ P, then FIRST(X) = FIRST(X) ∪ {ε}
    3) if X → Y1Y2 …Yk ∈ P and Y1Y2 …Yi-1 ⇒* ε, then
       FIRST(X) = FIRST(X) ∪ ( ∪(j=1..i) (FIRST(Yj) − {ε}) )
       if Y1Y2 …Yk ⇒* ε, then FIRST(X) = FIRST(X) ∪ {ε}.
▶ FIRST(α) ::= the set of terminals that begin the strings derived from α. If α ⇒* ε, then ε is also in FIRST(α).
    FIRST(A) ::= { a ∈ VT ∪ {ε} | A ⇒* aγ, γ ∈ V* }.
Text p.268
ex1) E → TE'
     E' → +TE' | ε
     T → FT'
     T' → *FT' | ε
     F → (E) | id
     FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
     FIRST(E') = {+, ε}
     FIRST(T') = {*, ε}
ex2) PROGRAM → begin d semi X end
     X → d semi X | s Y
     Y → semi s Y | ε
     FIRST(PROGRAM) = {begin}
     FIRST(X) = {d, s}
     FIRST(Y) = {semi, ε}
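The three FIRST rules can be applied repeatedly until no set grows any further. The following is a minimal Python sketch of that iteration, not the textbook's code: the dictionary encoding of the grammar, the name `compute_first`, and the use of the string "ε" as the empty-string marker are all assumptions made for illustration. It is run on the grammar of ex1.

```python
EPS = "ε"  # marker for the empty string (an assumption of this sketch)


def compute_first(grammar):
    """grammar maps each nonterminal to a list of right-hand sides.

    A right-hand side is a list of symbols; [] stands for an ε-production.
    Any symbol that is not a key of `grammar` is treated as a terminal.
    """
    first = {nt: set() for nt in grammar}

    def first_of(symbol):
        # Rule 1: FIRST of a terminal is the terminal itself.
        return first[symbol] if symbol in grammar else {symbol}

    changed = True
    while changed:  # repeat until no FIRST set grows
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                before = len(first[nt])
                all_nullable = True
                for sym in rhs:
                    f = first_of(sym)
                    first[nt] |= f - {EPS}  # rules 2 and 3: add FIRST(Yj) - {ε}
                    if EPS not in f:        # Yj is not nullable: stop scanning
                        all_nullable = False
                        break
                if all_nullable:            # Y1...Yk =>* ε, or the rhs is ε
                    first[nt].add(EPS)
                if len(first[nt]) != before:
                    changed = True
    return first


# Grammar of ex1.
EXPR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = compute_first(EXPR)
```

Under these assumptions, `FIRST` reproduces the sets listed in ex1.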
Exercise 7.4 (1) - p.299
• Compute FIRST.
  (1) S → aRTb | bRR
      R → cRd | ε
      T → RS | TaT
▶ left-dependency graph
  - The vertices are the terminal and nonterminal symbols, and there is an arc from X to Y if and only if X → X1...XnYα, where n ≥ 0 and each of X1,...,Xn can produce the empty string.
ex) S → AB
    A → aA | ε
    B → bB | ε
    [Figure: left-dependency graph with arcs S → A, S → B, A → a, B → b]
    FIRST(S) = {a, b, ε}
    FIRST(A) = {a, ε}
    FIRST(B) = {b, ε}
★ In general, for A → A1A2...An the graph contains
    if A1 is non-nullable : the arc A → A1
    if A1 is nullable     : the arcs A → A1 and A → A2
    if A1A2 is nullable   : the arcs A → A1, A → A2 and A → A3
FOLLOW
▶ FOLLOW(A) ::= the set of terminals that can appear immediately to the right of A in some sentential form. If A can be the rightmost symbol in some sentential form, then $ is in FOLLOW(A). $ is the right end marker of the input.
    FOLLOW(A) ::= { a ∈ VT ∪ {$} | S ⇒* αAaβ, α, β ∈ V* }.
▶ Computation of FOLLOW(A)
    1) FOLLOW(S) = {$}
    2) if A → αBβ ∈ P and β ≠ ε, then FOLLOW(B) = FOLLOW(B) ∪ (FIRST(β) − {ε})
    3) if A → αB ∈ P, or A → αBβ and β ⇒* ε, then FOLLOW(B) = FOLLOW(B) ∪ FOLLOW(A).
Text p.271
ex) E → TE'
    E' → +TE' | ε
    T → FT'
    T' → *FT' | ε
    F → (E) | id
    Nullable = { E', T' }
    FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
    FIRST(E') = {+, ε}
    FIRST(T') = {*, ε}
    FOLLOW(E) = {), $}
    FOLLOW(E') = {), $}
    FOLLOW(T) = {+, ), $}
    FOLLOW(T') = {+, ), $}
    FOLLOW(F) = {*, +, ), $}
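The FOLLOW rules can likewise be iterated to a fixed point. This is a hedged Python sketch for the expression grammar above: the FIRST sets are copied from the example rather than recomputed, and the names `first_of_string` and `compute_follow` as well as the grammar encoding are assumptions of the sketch.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
# FIRST sets of the nonterminals, taken from the example.
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}


def first_of_string(symbols, first):
    """FIRST of a symbol string; it contains ε only if every symbol is nullable."""
    result = set()
    for sym in symbols:
        f = first.get(sym, {sym})  # a terminal stands for itself
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)                # also covers the empty string
    return result


def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                      # rule 1
    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                for i, b in enumerate(rhs):
                    if b not in grammar:
                        continue                # FOLLOW is only kept for nonterminals
                    beta = first_of_string(rhs[i + 1:], first)
                    new = beta - {EPS}          # rule 2: FIRST(β) - {ε}
                    if EPS in beta:             # rule 3: β is empty or nullable
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow


FOLLOW = compute_follow(GRAMMAR, FIRST, "E")
```

With these inputs the result agrees with the FOLLOW sets listed in the example.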
Exercise 7.4 (3) - p.299
• Compute FOLLOW.
  (3) S → aAa | ε
      A → abS | c
▶ LL condition ::= no-backup condition ::= the condition for deterministic parsing by the top-down method.
    input          : a1a2 ... ai-1ai ... an
    derived string : a1a2 ... ai-1Xα
    X → α1 | α2 | ... | αm
  Looking at ai, the rule used to expand X is selected deterministically from the X-productions.
★ <LL condition> For every A → α | β ∈ P,
    1. FIRST(α) ∩ FIRST(β) = ∅
    2. if α ⇒* ε, then FOLLOW(A) ∩ FIRST(β) = ∅
ex) A → aBc | Bc | dAa
    B → bB | ε
    FIRST(A) = {a, b, c, d}    FOLLOW(A) = {$, a}
    FIRST(B) = {b, ε}          FOLLOW(B) = {c}
    1) For A → aBc | Bc | dAa:
       FIRST(aBc) ∩ FIRST(Bc) ∩ FIRST(dAa) = {a} ∩ {b,c} ∩ {d} = ∅
       (the three sets are in fact pairwise disjoint, which is what the condition requires)
    2) For B → bB | ε:
       FIRST(bB) ∩ FOLLOW(B) = {b} ∩ {c} = ∅
    By 1) and 2), the grammar satisfies the LL condition.
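The check carried out by hand above can be sketched as a small Python function. The function name `satisfies_ll_condition` and the representation of each alternative by its FIRST set are assumptions of this sketch; the sets themselves are the ones given in the example.

```python
from itertools import combinations

EPS = "ε"


def satisfies_ll_condition(firsts, follow_a):
    """firsts: the FIRST sets of all alternatives of one nonterminal A.

    follow_a: FOLLOW(A). Every pair of alternatives is tested in both orders.
    """
    for fa, fb in combinations(firsts, 2):
        if fa & fb:                       # condition 1: FIRST sets overlap
            return False
        if EPS in fa and follow_a & fb:   # condition 2, first alternative nullable
            return False
        if EPS in fb and follow_a & fa:   # condition 2, second alternative nullable
            return False
    return True


# A → aBc | Bc | dAa with FOLLOW(A) = {$, a}
A_OK = satisfies_ll_condition([{"a"}, {"b", "c"}, {"d"}], {"$", "a"})
# B → bB | ε with FOLLOW(B) = {c}
B_OK = satisfies_ll_condition([{"b"}, {EPS}], {"c"})
```

Both `A_OK` and `B_OK` come out true, matching steps 1) and 2).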
II. Recursive-descent Parser
▶ Recursive-descent parsing ::= a top-down method that uses a set of recursive procedures to recognize its input with no backtracking.
▶ Create a procedure for each nonterminal.
ex) G : S → aA | bB
        A → aA | c
        B → bB | d

    procedure pS;
    begin
      if nextsymbol = qa then
        begin get_nextsymbol; pA end
      else if nextsymbol = qb then
        begin get_nextsymbol; pB end
      else error
    end;
    procedure pA;
    begin
      if nextsymbol = qa then
        begin get_nextsymbol; pA end
      else if nextsymbol = qc then get_nextsymbol
      else error
    end;

    procedure pB;
      ...

    (* main *)
    begin
      get_nextsymbol;
      pS;
      if nextsymbol = '$' then accept else error
    end.

Input string: ω = aac$
Procedure call sequence ::= leftmost derivation
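For readers who want to run the example, here is a hedged Python transcription of the procedures for G. It is a sketch only: the production numbering (1. S → aA, 2. S → bB, 3. A → aA, 4. A → c, 5. B → bB, 6. B → d), the `Parser` class, and the `ParseError` exception are assumptions introduced for illustration, and each input character is treated as one token.

```python
class ParseError(Exception):
    """Raised when the input is not a sentence of G."""


class Parser:
    def __init__(self, text):
        self.symbols = list(text) + ["$"]  # '$' marks the end of the input
        self.pos = 0
        self.left_parse = []               # production numbers in call order

    @property
    def nextsymbol(self):
        return self.symbols[self.pos]

    def get_nextsymbol(self):
        self.pos += 1                      # plays the role of the scanner

    def pS(self):
        if self.nextsymbol == "a":         # S → aA
            self.left_parse.append(1)
            self.get_nextsymbol()
            self.pA()
        elif self.nextsymbol == "b":       # S → bB
            self.left_parse.append(2)
            self.get_nextsymbol()
            self.pB()
        else:
            raise ParseError(f"unexpected {self.nextsymbol!r} in S")

    def pA(self):
        if self.nextsymbol == "a":         # A → aA
            self.left_parse.append(3)
            self.get_nextsymbol()
            self.pA()
        elif self.nextsymbol == "c":       # A → c
            self.left_parse.append(4)
            self.get_nextsymbol()
        else:
            raise ParseError(f"unexpected {self.nextsymbol!r} in A")

    def pB(self):
        if self.nextsymbol == "b":         # B → bB
            self.left_parse.append(5)
            self.get_nextsymbol()
            self.pB()
        elif self.nextsymbol == "d":       # B → d
            self.left_parse.append(6)
            self.get_nextsymbol()
        else:
            raise ParseError(f"unexpected {self.nextsymbol!r} in B")


def parse(text):
    parser = Parser(text)
    parser.pS()
    if parser.nextsymbol != "$":           # main program: accept only at end of input
        raise ParseError("input continues after a complete sentence")
    return parser.left_parse
```

For the input aac the recorded call sequence is S → aA, A → aA, A → c, which is the leftmost derivation of the string.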
▶ The main problem in constructing a recursive-descent syntax analyzer is choosing the production to apply when a procedure is first entered. To resolve this problem, we compute the lookahead of each production.
▶ LOOKAHEAD of a production
    Definition : LOOKAHEAD(A → α) = FIRST({ ω | S ⇒* μAβ ⇒ μαβ ⇒* μω ∈ VT* }).
    Meaning : the set of terminals that can begin a string generated by α; if α ⇒* ε, then FOLLOW(A) is added to the set.
    Computing formula : LOOKAHEAD(A → X1X2...Xn) = FIRST(X1X2...Xn) ⊕ FOLLOW(A),
                        where S1 ⊕ S2 = S1 if ε ∉ S1, and (S1 − {ε}) ∪ S2 otherwise.
ex) S → aSA | ε
    A → c
    Nullable Set = {S}
    FIRST(S) = {a, ε}    FOLLOW(S) = {$, c}
    FIRST(A) = {c}       FOLLOW(A) = {$, c}
    LOOKAHEAD(S → aSA) = FIRST(aSA) ⊕ FOLLOW(S) = {a}
    LOOKAHEAD(S → ε) = FIRST(ε) ⊕ FOLLOW(S) = {$, c}
    LOOKAHEAD(A → c) = FIRST(c) ⊕ FOLLOW(A) = {c}
    Order of computation : Nullable => FIRST => FOLLOW => LOOKAHEAD
▶ Strong LL condition
    Definition : for every A → α | β ∈ P, LOOKAHEAD(A → α) ∩ LOOKAHEAD(A → β) = ∅.
    Meaning : for each distinct pair of productions with the same left-hand side, the parser can select the unique alternative that derives a string beginning with the input symbol.
    Definition : the grammar G is said to be strong LL(1) if it satisfies the strong LL condition.
ex) G : S → aSA | ε
        A → c
    LOOKAHEAD(S → aSA) = {a}
    LOOKAHEAD(S → ε) = FOLLOW(S) = {$, c}
    LOOKAHEAD(S → aSA) ∩ LOOKAHEAD(S → ε) = ∅
    Therefore G is strong LL(1).
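The LOOKAHEAD sets and the strong LL(1) test for this grammar can be reproduced with a short Python sketch. The FIRST and FOLLOW sets are copied from the example; the function names `lookahead` and `is_strong_ll1` and the list encoding of the productions are assumptions made here.

```python
EPS = "ε"

# G : S → aSA | ε,  A → c   ([] stands for the ε right-hand side)
PRODUCTIONS = [("S", ["a", "S", "A"]), ("S", []), ("A", ["c"])]
FIRST = {"S": {"a", EPS}, "A": {"c"}}
FOLLOW = {"S": {"$", "c"}, "A": {"$", "c"}}


def first_of_string(symbols):
    """FIRST of a symbol string; contains ε only if the whole string is nullable."""
    result = set()
    for sym in symbols:
        f = FIRST.get(sym, {sym})  # a terminal stands for itself
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)
    return result


def lookahead(lhs, rhs):
    """FIRST(rhs), with ε replaced by FOLLOW(lhs) when rhs is nullable."""
    f = first_of_string(rhs)
    if EPS in f:
        return (f - {EPS}) | FOLLOW[lhs]
    return f


def is_strong_ll1(productions):
    """True if productions with the same left-hand side have disjoint LOOKAHEADs."""
    for i, (a, alpha) in enumerate(productions):
        for b, beta in productions[i + 1:]:
            if a == b and lookahead(a, alpha) & lookahead(b, beta):
                return False
    return True


STRONG = is_strong_ll1(PRODUCTIONS)
```

`STRONG` is true here because {a} and {$, c} do not intersect.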
▶ Implementation of a recursive-descent parser
If a grammar is strong LL(1), we can construct a parser for sentences of the grammar using the following scheme.
For each a ∈ VT:
    procedure pa; (* get_nextsymbol = scanner *)
    begin
      if nextsymbol = qa then get_nextsymbol
      else error
    end;
get_nextsymbol : the routine corresponding to the scanner; it reads one token from the input stream and assigns it to the variable nextsymbol.
Text p.278
For each A ∈ VN:
    procedure pA;
    var i: integer;
    begin
      case nextsymbol of
        LOOKAHEAD(A → X1X2...Xm): for i := 1 to m do pXi;
        LOOKAHEAD(A → Y1Y2...Yn): for i := 1 to n do pYi;
            :
        LOOKAHEAD(A → Z1Z2...Zr): for i := 1 to r do pZi;
        LOOKAHEAD(A → ε): ;
        otherwise: error
      end (* case *)
    end;
▶ Improving the efficiency and structure of a recursive-descent parser
1) Eliminating terminal procedures ::= In practice it is better not to write a procedure for each terminal. Instead, the nonterminal procedures themselves advance the input marker. In this way many redundant tests are eliminated.
   ex) text p.279 [Example 9]
2) BNF → EBNF : reduces the number of productions and nonterminals.
   ① repetitive part : { }
   ② optional part : [ ]
   ③ alternation : ( | )
ex) [Example 10] --- text p.281
<IF_st> ::= 'if' <C> 'then' <S> [ 'else' <S> ]
    procedure pIF;
    begin
      if nextsymbol = qif then
      begin
        get_nextsymbol;
        pC;
        if nextsymbol = qthen then
          begin get_nextsymbol; pS end
        else error(10)
      end
      else error(20);
      if nextsymbol = qelse then
        begin get_nextsymbol; pS end
    end;
ex) [Example 11] --- text p.281
<id_list> ::= 'id' { ',' 'id' }
    procedure pID_LIST;
    begin
      if nextsymbol = qid then
      begin
        get_nextsymbol;
        while (nextsymbol = qcomma) do
        begin
          get_nextsymbol;
          if nextsymbol = qid then get_nextsymbol
          else error
        end
      end
      else error
    end;
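The way an EBNF repetition { ... } turns into a while loop can be tried out with the following Python sketch of the <id_list> rule. The token representation (a list of the strings "id" and ","), the appended "$" end marker, and the function name `parse_id_list` are assumptions of the sketch.

```python
def parse_id_list(tokens):
    """Recognize 'id' { ',' 'id' } at the start of `tokens`.

    Returns (number of identifiers, number of tokens consumed).
    """
    symbols = list(tokens) + ["$"]     # '$' marks the end of the input
    pos = 0
    if symbols[pos] != "id":
        raise SyntaxError("id expected at the start of the list")
    pos += 1
    count = 1
    while symbols[pos] == ",":         # the repetitive part { ',' 'id' }
        pos += 1
        if symbols[pos] != "id":
            raise SyntaxError("id expected after ','")
        pos += 1
        count += 1
    return count, pos


RESULT = parse_id_list(["id", ",", "id", ",", "id"])
```

The loop needs no extra nonterminal or recursion, which is the reduction in productions that the EBNF form is meant to give.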
[Exercise 7.8 (2)] --- Text p.300
<Problem> Convert the following grammar to extended BNF and write the corresponding procedure for a recursive-descent parser.
    <D> ::= 'label' <L> | 'integer' <L>
    <L> ::= <id> <R>
    <R> ::= ';' | ',' <L>
Since <L> ⇒* <id> ( ',' <id> )* ';', the grammar in EBNF is
    <D> ::= ( 'label' | 'integer' ) <id> { ',' <id> } ';'
    procedure pD;
    begin
      if nextsymbol in [qlabel, qinteger] then
      begin
        get_nextsymbol;
        if nextsymbol = qid then
        begin
          get_nextsymbol;
          while (nextsymbol = qcomma) do
          begin
            get_nextsymbol;
            if nextsymbol = qid then get_nextsymbol
            else error(3)
          end
        end
        else error(2);
        if nextsymbol = qsemi then get_nextsymbol
        else error(4)
      end
      else error(1)
    end;
Programming Assignment #1
Implement a recursive-descent syntax analyzer for the grammar given in exercise 5.30 (text p.224).
Problem Specifications
  - input : an SPL program that finds a minimum and a maximum.
  - output : left parse
  - methods :
    (1) write the get_nextsymbol routine.
    (2) compute the LOOKAHEADs for each production.
    (3) create a procedure for each nonterminal.
    (4) assemble the procedures with the main program.
[Figure: a set of productions → Computation of LOOKAHEADs → LOOKAHEADs for each nonterminal]
III. Predictive Parsing
▶ Predictive parsing ::= a deterministic parsing method using a stack. The stack contains a sequence of grammar symbols.
▶ Model of a predictive parser
  [Figure: an input buffer ending in $, a stack with $ at the bottom, a driver routine that consults the parsing table, and the output]
Parsing is driven by the relation between the current input symbol and the stack top symbol.
The input buffer contains the string to be parsed, followed by $.
Initial configuration :   STACK    INPUT
                          $S       ω$
Parsing table (LL) : determines the parsing action.
※ M[X,a] = r : when the stack top symbol is X and the current input symbol is a, expand X by production r. Rows are indexed by the nonterminals X, columns by the terminals a.
▶ Parsing Actions
    X : stack top symbol, a : current input symbol
    1. if X = a = $, then accept.
    2. if X = a, then pop X and advance the input.
    3. if X ∈ VN, then if M[X,a] = r (X → α), then replace X by α; else error.
Text p.284
▶ Predictive parsing algorithm
    set ip to point to the first symbol of ω$;
    repeat
      let X be the top stack symbol and a the symbol pointed to by ip;
      if X is a terminal or $ then
        if X = a then pop X from the stack and advance ip
        else error(1)
      else /* X is a nonterminal */
        if M[X,a] = X → Y1Y2...Yk then
        begin
          pop X from the stack;
          push Yk, Yk-1, ..., Y1 onto the stack, with Y1 on top;
          output the production X → Y1Y2...Yk
        end
        else error(2)
    until X = $ /* stack is empty */
ex) G : 1. S → aSb   2. S → bA   3. A → aA   4. A → b
    string : aabbbb
• Parsing Table:
                      terminals
    nonterminals      a     b
    S                 1     2
    A                 3     4
    STACK     INPUT      ACTIONS               OUTPUT
    $S        aabbbb$    expand 1              1
    $bSa      aabbbb$    pop a and advance
    $bS       abbbb$     expand 1              1
    $bbSa     abbbb$     pop a and advance
    $bbS      bbbb$      expand 2              2
    $bbAb     bbbb$      pop b and advance
    $bbA      bbb$       expand 4              4
    $bbb      bbb$       pop b and advance
    $bb       bb$        pop b and advance
    $b        b$         pop b and advance
    $         $          Accept
※ Next: how to construct a predictive parsing table for a grammar.
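The driver loop and the trace for aabbbb can be reproduced with a minimal Python sketch. The dictionary encodings `PRODUCTIONS` and `TABLE` and the function name `predictive_parse` are assumptions of this sketch; the grammar, the table entries, and the production numbers are the ones of the example.

```python
PRODUCTIONS = {
    1: ("S", ["a", "S", "b"]),
    2: ("S", ["b", "A"]),
    3: ("A", ["a", "A"]),
    4: ("A", ["b"]),
}
# M[X, a] = r; a missing entry means error.
TABLE = {("S", "a"): 1, ("S", "b"): 2, ("A", "a"): 3, ("A", "b"): 4}
NONTERMINALS = {"S", "A"}


def predictive_parse(text):
    """Return the left parse (list of production numbers) of `text`."""
    tokens = list(text) + ["$"]
    stack = ["$", "S"]                 # top of the stack is the end of the list
    ip = 0
    output = []
    while True:
        x, a = stack[-1], tokens[ip]
        if x == "$" and a == "$":      # action 1: accept
            return output
        if x not in NONTERMINALS:      # x is a terminal or $
            if x != a:
                raise SyntaxError(f"expected {x!r}, found {a!r}")
            stack.pop()                # action 2: pop and advance
            ip += 1
        else:
            r = TABLE.get((x, a))
            if r is None:
                raise SyntaxError(f"no entry M[{x}, {a}]")
            stack.pop()                # action 3: expand x by production r
            stack.extend(reversed(PRODUCTIONS[r][1]))  # leftmost symbol ends on top
            output.append(r)


LEFT_PARSE = predictive_parse("aabbbb")
```

`LEFT_PARSE` is the OUTPUT column of the trace read from top to bottom.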
IV. Constructing the Predictive Parsing Table
▶ main idea : If A → α is a production with a in FIRST(α), then the parser expands A by α when the current input symbol is a. And if α ⇒* ε, then the parser also expands A by α when the current input symbol is in FOLLOW(A).
▶ parsing table (LL) : rows are indexed by the nonterminals X ∈ VN, columns by the terminals a ∈ VT.
    M[X,a] = r : expand X with production r
    blank : error
▶ Algorithm : for each production A → α,
    1. for every terminal a ∈ FIRST(α), M[A,a] := <A → α>
    2. if α ⇒* ε, then for every b ∈ FOLLOW(A), M[A,b] := <A → α>.
ex) G : 1. E → TE'    2. E' → +TE'    3. E' → ε
        4. T → FT'    5. T' → *FT'    6. T' → ε
        7. F → (E)    8. F → id
    FIRST(E) = FIRST(T) = FIRST(F) = { ( , id }
    FIRST(E') = { + , ε }
    FIRST(T') = { * , ε }
    FOLLOW(E) = FOLLOW(E') = { ) , $ }
    FOLLOW(T) = FOLLOW(T') = { + , ) , $ }
    FOLLOW(F) = { + , * , ) , $ }
• Parsing Table:
                         Terminals
    Nonterminals   id    +     *     (     )     $
    E              1                 1
    E'                   2                 3     3
    T              4                 4
    T'                   6     5           6     6
    F              8                 7
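The two-step table construction can be sketched in Python for the expression grammar. The FIRST and FOLLOW sets are copied from the example rather than computed, and the names `build_table` and `first_of_string` and the dictionary encodings are assumptions of this sketch. Each table cell holds a list of production numbers so that a multiply-defined entry would be visible.

```python
EPS = "ε"

PRODUCTIONS = {
    1: ("E", ["T", "E'"]),
    2: ("E'", ["+", "T", "E'"]),
    3: ("E'", []),
    4: ("T", ["F", "T'"]),
    5: ("T'", ["*", "F", "T'"]),
    6: ("T'", []),
    7: ("F", ["(", "E", ")"]),
    8: ("F", ["id"]),
}
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}


def first_of_string(symbols):
    """FIRST of a symbol string; contains ε only if the whole string is nullable."""
    result = set()
    for sym in symbols:
        f = FIRST.get(sym, {sym})          # a terminal stands for itself
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)
    return result


def build_table(productions):
    table = {}
    for number, (a, alpha) in productions.items():
        f = first_of_string(alpha)
        targets = f - {EPS}                # step 1: terminals in FIRST(α)
        if EPS in f:                       # step 2: α =>* ε, so add FOLLOW(A)
            targets |= FOLLOW[a]
        for terminal in targets:
            table.setdefault((a, terminal), []).append(number)
    return table


TABLE = build_table(PRODUCTIONS)
```

Every cell of `TABLE` contains exactly one production number, and the filled cells are the thirteen non-blank entries of the parsing table above.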
▶ LL(1) Grammar ::= a grammar whose parsing table has no multiply-defined entries.
  If an entry is multiply defined, the parser cannot decide which rule to expand by, so it cannot parse deterministically.
▶ LL(1) condition : for every A → α | β,
    1. FIRST(α) ∩ FIRST(β) = ∅.
    2. if α ⇒* ε, then FOLLOW(A) ∩ FIRST(β) = ∅.
ex) G : 1. S → iCtSS'   2. S → a   3. S' → eS   4. S' → ε   5. C → b
    FIRST(S) = {i, a}      FOLLOW(S) = {$, e}
    FIRST(S') = {e, ε}     FOLLOW(S') = {$, e}
    FIRST(C) = {b}         FOLLOW(C) = {t}
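Whether a grammar has a multiply-defined entry can be checked mechanically by building the table with a list per cell. The following Python sketch does this for the grammar G of this example, using the FIRST and FOLLOW sets listed above; the helper names and the dictionary encodings are assumptions of the sketch.

```python
EPS = "ε"

PRODUCTIONS = {
    1: ("S", ["i", "C", "t", "S", "S'"]),
    2: ("S", ["a"]),
    3: ("S'", ["e", "S"]),
    4: ("S'", []),
    5: ("C", ["b"]),
}
FIRST = {"S": {"i", "a"}, "S'": {"e", EPS}, "C": {"b"}}
FOLLOW = {"S": {"$", "e"}, "S'": {"$", "e"}, "C": {"t"}}


def first_of_string(symbols):
    """FIRST of a symbol string; contains ε only if the whole string is nullable."""
    result = set()
    for sym in symbols:
        f = FIRST.get(sym, {sym})          # a terminal stands for itself
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)
    return result


def build_table(productions):
    table = {}
    for number, (a, alpha) in productions.items():
        f = first_of_string(alpha)
        targets = f - {EPS}
        if EPS in f:                       # nullable right-hand side: use FOLLOW(A)
            targets |= FOLLOW[a]
        for terminal in targets:
            table.setdefault((a, terminal), []).append(number)
    return table


TABLE = build_table(PRODUCTIONS)
# Cells that hold more than one production number.
CONFLICTS = {cell: rules for cell, rules in TABLE.items() if len(rules) > 1}
```

The only conflicting cell is M[S', e], which receives both rule 3 and rule 4, so G fails the LL(1) test.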
Parsing Table:
             a     b     e     i     t     $
    S        2                 1
    S'                   3,4               4
    C              5
M[S',e] := <3,4> is multiply defined. When the stack top is S' and the input symbol is e, the parser cannot tell whether to expand by rule 3 or by rule 4. Therefore G is not an LL(1) grammar.
ex) [Example 15] --- text p.291
    G : S → aA | abA
        A → Ab | a
    ω : abab
V. Strong LL(k) and LL(k) Grammars
▶ FIRSTk(α) = { ω | α ⇒* ωβ and |ω| = k, or α ⇒* ω and |ω| < k }
▶ G is said to be strong LL(k), for some fixed integer k > 0, if whenever there are two leftmost derivations
    1. S ⇒* ω1Aγ1 ⇒ ω1αγ1 ⇒* ω1x ∈ VT*, and
    2. S ⇒* ω2Aγ2 ⇒ ω2βγ2 ⇒* ω2y ∈ VT*
  such that
    3. FIRSTk(x) = FIRSTk(y),
  it follows that
    4. α = β.
▶ Meaning : Consider any state of the parse in which A is the nonterminal currently being parsed and FIRSTk(x) is the k-lookahead at the current point. If the k-lookahead is the same, the two productions A → α and A → β are identical. Any other information provided by the closed portion and the open portion of the current state of the parse is disregarded.
▶ S ⇒* ωAγ, ω : closed portion, γ : open portion
▶ Two states of the parse : FIRSTk(x) = FIRSTk(y) ===> α = β.
  [Figure: two parse states, each with root S and current nonterminal A, deriving x and y respectively]
▶ Def) LL(k) grammar : whenever there are two leftmost derivations
    1. S ⇒* ωAγ ⇒ ωαγ ⇒* ωx ∈ VT*, and
    2. S ⇒* ωAγ ⇒ ωβγ ⇒* ωy ∈ VT*
  such that
    3. FIRSTk(x) = FIRSTk(y),
  it follows that
    4. α = β.
ex) S → aAaa | bAba
    A → b | ε
    [Figure: parse trees for S → aAaa and S → bAba]
    When the lookahead is ba, which of A → b and A → ε should be chosen? If the symbol already read is a, choose A → b; if it is b, choose A → ε. Hence the grammar is not SLL(2), but it is LL(2).
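The difference between the two definitions can be checked by brute force for this small grammar. The Python sketch below is an illustration under strong simplifying assumptions: the two contexts in which A occurs are read off the S-productions by hand, the strings around A consist of terminals only, and only the alternatives of A are examined (the two S-alternatives start with different terminals). All names are hypothetical.

```python
K = 2

# Contexts (closed portion ω, open portion γ) of A, read off S → aAaa | bAba.
CONTEXTS = [("a", "aa"), ("b", "ba")]
ALTERNATIVES = ["b", ""]               # A → b | ε  ("" stands for ε)


def first_k(terminal_string, k=K):
    """FIRSTk of a terminal string is its prefix of length at most k."""
    return terminal_string[:k]


def lookaheads(alpha, contexts):
    """k-lookaheads seen when A → alpha is applied in any of the given contexts."""
    return {first_k(alpha + gamma) for _omega, gamma in contexts}


def alternatives_disjoint(contexts):
    """True if no k-lookahead is shared by two different alternatives of A."""
    seen = set()
    for alpha in ALTERNATIVES:
        current = lookaheads(alpha, contexts)
        if seen & current:
            return False
        seen |= current
    return True


# Strong LL(2): lookaheads are compared across all contexts at once.
STRONG_LL2 = alternatives_disjoint(CONTEXTS)
# LL(2): lookaheads are compared within one context (same ω and γ) at a time.
LL2 = all(alternatives_disjoint([context]) for context in CONTEXTS)
```

The lookahead ba belongs to A → b in the context after a and to A → ε in the context after b, so the sets overlap only when the contexts are merged.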
▶ SLL(k) and LL(k)
  [Figure: SLL(k) grammars as a subset of LL(k) grammars]
▶ <theorem> strong LL(1) ⇔ LL(1)
  Proof) (⇒) clear!
  (⇐) Suppose that G is not strong LL(1). Then, by definition, there are two distinct productions A → α and A → β such that
      S ⇒* ω1Aγ1 ⇒ ω1αγ1 ⇒* ω1x1γ1 ⇒* ω1x1y1
      S ⇒* ω2Aγ2 ⇒ ω2βγ2 ⇒* ω2x2γ2 ⇒* ω2x2y2
  and FIRST(x1y1) = FIRST(x2y2).
  Now we must prove that G is not LL(1).
  1) x1 = x2 = ε : G is not LL(1). Indeed, it is ambiguous.
  2) One (or both) of x1 and x2 is not ε.
     x1 ≠ ε : FIRST1(x1y1) = FIRST1(x1) = FIRST1(x2y2).
     But then
        S ⇒* ω2Aγ2 ⇒ ω2αγ2 ⇒* ω2x1γ2 ⇒* ω2x1y2
        S ⇒* ω2Aγ2 ⇒ ω2βγ2 ⇒* ω2x2γ2 ⇒* ω2x2y2
     satisfy the property FIRST1(x1y2) = FIRST1(x1) = FIRST1(x2y2).
     Thus, by definition, G is not LL(1).