Chapter 4.4 Bottom-Up Parsing

• 4.4.1 Overview of Bottom-Up Parsing
• 4.4.2 Finite Automata of LR(0) Items and LR(0) Parsing
• 4.4.3 SLR(1) Parsing
• 4.4.4 General LR(1) and LALR(1) Parsing

4.4.1 OVERVIEW OF BOTTOM-UP PARSING
● General form of a bottom-up parse: the parse starts with the parsing stack holding $ and the input holding InputString $. It ends with the stack holding $ StartSymbol, the input holding $, and the action accept.
● A bottom-up parser has two possible actions (besides "accept"):
• Shift: move a terminal from the front of the input to the top of the stack.
• Reduce: replace a string α at the top of the stack with a nonterminal A, given the BNF choice A → α.

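● Example 1) Given G: S → (S)S | ε, use bottom-up parsing to decide whether ( ) ∈ L(G).
Solution:
(1) Augmented grammar
S' → S
S → (S)S | ε
Rightmost derivation: S' => S [S' → S] => (S)S [S → (S)S] => (S) [S → ε] => ( ) [S → ε]
(2) Parsing process
     Parsing stack   Input   Action
1    $               ( ) $   shift
2    $ (             ) $     reduce S → ε
3    $ ( S           ) $     shift
4    $ ( S )         $       reduce S → ε
5    $ ( S ) S       $       reduce S → (S)S
6    $ S             $       reduce S' → S
7    $ S'            $       accept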

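● Example 2) Given G: E → E + n | n, use bottom-up parsing to decide whether n+n ∈ L(G).
Solution:
(1) Augmented grammar
E' → E
E → E + n | n
Rightmost derivation: E' => E [E' → E] => E+n [E → E+n] => n+n [E → n]
(2) Parsing process
     Parsing stack   Input   Action
1    $               n+n$    shift
2    $ n             +n$     reduce E → n
3    $ E             +n$     shift
4    $ E +           n$      shift
5    $ E + n         $       reduce E → E+n
6    $ E             $       reduce E' → E
7    $ E'            $       accept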
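The trace of Example 2 can be replayed with a few lines of Python. This is only a sketch for this one grammar: the handle checks are hard-coded by hand rather than driven by a parsing table, and the function name `shift_reduce_trace` is made up for illustration.

```python
def shift_reduce_trace(tokens):
    """Trace a shift-reduce parse for E' -> E, E -> E + n | n.

    The handle tests below are hand-written for this grammar only
    (an assumption of this sketch, not a general algorithm).
    """
    stack, rest, steps = [], list(tokens), []
    while True:
        snapshot = ("$" + "".join(stack), "".join(rest) + "$")
        if stack[-3:] == ["E", "+", "n"]:      # handle E + n on top of the stack
            stack[-3:] = ["E"]
            steps.append(snapshot + ("reduce E -> E+n",))
        elif stack == ["n"]:                   # handle n at the very start
            stack[:] = ["E"]
            steps.append(snapshot + ("reduce E -> n",))
        elif rest:                             # nothing to reduce: shift
            stack.append(rest.pop(0))
            steps.append(snapshot + ("shift",))
        elif stack == ["E"]:                   # input exhausted, only E left
            stack[:] = ["E'"]
            steps.append(snapshot + ("reduce E' -> E",))
        elif stack == ["E'"]:
            steps.append(snapshot + ("accept",))
            return steps
        else:
            steps.append(snapshot + ("error",))
            return steps


for number, (stack, remaining, action) in enumerate(shift_reduce_trace("n+n"), 1):
    print(number, stack, remaining, action)
```

Calling it on "n+n" produces seven steps with the same stack, input, and action columns as the table above.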

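● Example 3) Given the grammar G[S] below, use bottom-up parsing to decide whether accd ∈ L(G).
• S → aA
• A → cA | d
Solution:
(1) Augmented grammar
S' → S
S → aA
A → cA | d
(2) Finite automaton
State         Items
s0 (start)    S' → ·S, S → ·aA
s1            S → a·A, A → ·cA, A → ·d
s2            A → c·A, A → ·cA, A → ·d
s3            A → d·
s4            A → cA·
s5            S → aA·
(final)       S' → S·
Transitions: s0 --S--> (final), s0 --a--> s1, s1 --A--> s5, s1 --c--> s2, s1 --d--> s3, s2 --A--> s4, s2 --c--> s2, s2 --d--> s3.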

(3) Parsing process
      Parsing stack                   Input   Action
1     $ s0                            accd$   shift
2     $ s0 a s1                       ccd$    shift
3     $ s0 a s1 c s2                  cd$     shift
4     $ s0 a s1 c s2 c s2             d$      shift
5     $ s0 a s1 c s2 c s2 d s3        $       reduce A → d
6     $ s0 a s1 c s2 c s2 A s4        $       reduce A → cA
7     $ s0 a s1 c s2 A s4             $       reduce A → cA
8     $ s0 a s1 A s5                  $       reduce S → aA
9     $ s0 S                          $       reduce S' → S
10    $ S'                            $       accept

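● Terminology
1) Right sentential form. In the rightmost derivation S' => S => (S)S => (S) => ( ), each intermediate string of terminals and nonterminals is called a right sentential form.
2) Viable prefix of a right sentential form. During a parse, each right sentential form is split between the parsing stack and the input, for example E || +n, E+ || n, ... In each case the sequence of symbols on the parsing stack is called a viable prefix of the right sentential form. Hence E, E+, and E+n are all viable prefixes of the right sentential form E+n.
3) Handle of a right sentential form. The string to be reduced, together with the position in the right sentential form where it occurs and the production used to reduce it, is called the handle of the right sentential form.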

4.4.2 FINITE AUTOMATA OF LR(0) ITEMS AND LR(0) PARSING
4.4.2.1 LR(0) ITEMS
1) An LR(0) item is a production choice with a distinguished position in its right-hand side. For example, if A → α is a production choice, and if β and γ are any two strings of symbols (including the empty string ε) such that βγ = α, then A → β·γ is an LR(0) item.

● Example
• 1) Given G, find its items.
• S' → S
• S → (S)S | ε
Solution: this grammar has three production choices and eight items:
S' → ·S
S' → S·
S → ·(S)S
S → (·S)S
S → (S·)S
S → (S)·S
S → (S)S·
S → ·
• 2) Given G, find its items.
• E' → E
• E → E + n | n
Solution: this grammar has three production choices and eight items:
E' → ·E
E' → E·
E → ·E + n
E → E· + n
E → E +· n
E → E + n·
E → ·n
E → n·
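Enumerating items is mechanical: a production with k symbols on its right-hand side yields k + 1 items, and an ε-production yields exactly one. The sketch below assumes a representation chosen only for illustration, where a production is a `(head, body)` pair and `body` is a tuple of symbols (empty for ε).

```python
def lr0_items(productions):
    """Return every LR(0) item as (head, body, dot_position)."""
    return [(head, body, dot)
            for head, body in productions
            for dot in range(len(body) + 1)]


def show(item):
    """Render an item with '·' marking the distinguished position."""
    head, body, dot = item
    return f"{head} → " + " ".join(body[:dot] + ("·",) + body[dot:])


# Grammar 1: S' → S, S → (S)S | ε   (ε is the empty tuple)
G1 = [("S'", ("S",)), ("S", ("(", "S", ")", "S")), ("S", ())]
# Grammar 2: E' → E, E → E + n | n
G2 = [("E'", ("E",)), ("E", ("E", "+", "n")), ("E", ("n",))]

for item in lr0_items(G1):
    print(show(item))
```

Both grammars give eight items, in agreement with the lists above.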

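● Terminology
1) Item. A production of the grammar with a dot '·' marked at some position of its right-hand side is called an LR(0) item of the grammar.
• An item of the form A → ·α is called an initial item.
• An item of the form A → α· is called a reduce item (complete item).
• An item of the form A → α·Bβ with B ∈ N is called a pending item (a basic item).
• An item of the form A → α·aβ with a ∈ T is called a shift item (a basic item).
2) Valid item. The item A → β1·β2 is valid for the viable prefix γ = αβ1 if there is a rightmost derivation S =>* αAw => αβ1β2w. If the item A → β1·Bβ2 is valid for the viable prefix γ = αβ1 and B → η is a production, then the item B → ·η is also valid for γ = αβ1.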

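● Terminology
3) Valid item set and canonical collection. The set of all items valid for a viable prefix γ of grammar G is called the LR(0) valid item set of γ. The set of all valid item sets of G is called the canonical collection of LR(0) item sets of G.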

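● Terminology
4) Closure of an item set. Let I be a set of LR(0) items of grammar G. The closure closure(I) is defined as follows:
(1) I ⊆ closure(I).
(2) If the item A → α·Bβ ∈ closure(I) and B → γ is a production of G, then the item B → ·γ ∈ closure(I).
(3) closure(I) contains only the LR(0) items determined by the two rules above.
5) Transition function. Let I be a set of LR(0) items of G and X a grammar symbol of G. Then
go(I, X) = closure(J), where J = { A → αX·β | A → α·Xβ ∈ I }.
The function go(I, X) is called the transition function. The item A → αX·β is called the successor of the item A → α·Xβ.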
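The two definitions translate almost line for line into code. The sketch below assumes the same illustrative representation as before: a production is a `(head, body)` pair with `body` a tuple of symbols, and an item is `(head, body, dot)`. The grammar used is the one from Example 3 (S' → S, S → aA, A → cA | d).

```python
def closure(items, productions):
    """closure(I): add B → ·γ whenever some A → α·Bβ is already in the set."""
    result = set(items)                        # rule (1): I ⊆ closure(I)
    changed = True
    while changed:                             # repeat rule (2) until nothing new appears
        changed = False
        for head, body, dot in list(result):
            if dot < len(body):                # there is a symbol after the dot
                for h, b in productions:
                    if h == body[dot] and (h, b, 0) not in result:
                        result.add((h, b, 0))
                        changed = True
    return frozenset(result)


def go(items, symbol, productions):
    """go(I, X) = closure(J), J = successors of the items with X after the dot."""
    moved = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == symbol}
    return closure(moved, productions)


G = [("S'", ("S",)), ("S", ("a", "A")), ("A", ("c", "A")), ("A", ("d",))]
I0 = closure({("S'", ("S",), 0)}, G)           # {S' → ·S, S → ·aA}
I1 = go(I0, "a", G)                            # {S → a·A, A → ·cA, A → ·d}
```

If the symbol after the dot is a terminal, no production has it as head, so rule (2) adds nothing; `go` returns the empty set when no item of I has X after the dot.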

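● Terminology
6) Automaton recognizing the handles of G. If G = (VT, VN, S, P), then the automaton recognizing the handles of G is the DFA M = (Σ, Q, q0, F, δ), where
• Σ = VT ∪ VN,
• Q = the canonical collection of LR(0) item sets of G,
• q0 = closure({ S' → ·S }),
• F = the set of all valid item sets that contain a reduce item,
• δ = go(I, X).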
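The DFA can be built by starting from q0 and applying go until no new item set appears. The following is a sketch under assumptions: productions are `(head, body)` pairs with tuple bodies, the first production is taken to be S' → S, and the name `build_dfa` is invented here. State numbers come out in discovery order, so they need not match the numbering drawn on a slide.

```python
def closure(items, productions):
    result, work = set(items), list(items)
    while work:
        head, body, dot = work.pop()
        if dot < len(body):
            for h, b in productions:
                if h == body[dot] and (h, b, 0) not in result:
                    result.add((h, b, 0))
                    work.append((h, b, 0))
    return frozenset(result)


def go(items, symbol, productions):
    moved = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == symbol}
    return closure(moved, productions)


def build_dfa(productions):
    """Return (Q, delta, F); productions[0] is assumed to be S' → S."""
    symbols = sorted({x for _, body in productions for x in body})   # Σ = VT ∪ VN
    head0, body0 = productions[0]
    states = [closure({(head0, body0, 0)}, productions)]             # q0
    delta = {}
    i = 0
    while i < len(states):                     # states found later are processed too
        for x in symbols:
            target = go(states[i], x, productions)
            if target:
                if target not in states:
                    states.append(target)
                delta[(i, x)] = states.index(target)
        i += 1
    # F: every state that contains a reduce (complete) item
    final = [n for n, st in enumerate(states) if any(d == len(b) for _, b, d in st)]
    return states, delta, final


G = [("S'", ("S",)), ("S", ("a", "A")), ("S", ("b", "B")),
     ("A", ("c", "A")), ("A", ("d",)), ("B", ("c", "B")), ("B", ("d",))]
states, delta, final = build_dfa(G)
print(len(states), "states,", len(delta), "transitions,", len(final), "final states")
```

For the grammar S → aA | bB, A → cA | d, B → cB | d this yields twelve item sets, the same number as states 0 to 11 in the worked example of this section.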

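4.4.2.2 Finite Automata of Items
● Transitions of the NFA of LR(0) items
• For every item A → α·Xη there is a transition on X: A → α·Xη --X--> A → αX·η.
• If X is a nonterminal, then for every production X → β there is an ε-transition: A → α·Xη --ε--> X → ·β.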

● Example
• 1) Given G, construct its DFA.
• S' → S
• S → (S)S | ε
Solution: the grammar has eight items, so the NFA has eight states:
S' → ·S, S' → S·, S → ·(S)S, S → (·S)S, S → (S·)S, S → (S)·S, S → (S)S·, S → ·

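NFA (eight items, eight states), start state S' → ·S:
• S' → ·S --S--> S' → S·
• S' → ·S --ε--> S → ·(S)S and S → ·
• S → ·(S)S --(--> S → (·S)S
• S → (·S)S --S--> S → (S·)S
• S → (·S)S --ε--> S → ·(S)S and S → ·
• S → (S·)S --)--> S → (S)·S
• S → (S)·S --S--> S → (S)S·
• S → (S)·S --ε--> S → ·(S)S and S → ·
DFA:
State   Items
0       S' → ·S, S → ·(S)S, S → ·
1       S' → S·
2       S → (·S)S, S → ·(S)S, S → ·
3       S → (S·)S
4       S → (S)·S, S → ·(S)S, S → ·
5       S → (S)S·
Transitions: 0 --S--> 1, 0 --(--> 2, 2 --(--> 2, 2 --S--> 3, 3 --)--> 4, 4 --(--> 2, 4 --S--> 5.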

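● Example
• 2) Given G, construct its DFA.
• E' → E
• E → E + n | n
Solution: the grammar has eight items:
E' → ·E, E' → E·, E → ·E + n, E → E· + n, E → E +· n, E → E + n·, E → ·n, E → n·
NFA (eight items, eight states), start state E' → ·E:
• E' → ·E --E--> E' → E·
• E' → ·E --ε--> E → ·E+n and E → ·n
• E → ·E+n --ε--> E → ·E+n and E → ·n
• E → ·E+n --E--> E → E·+n
• E → E·+n --+--> E → E+·n
• E → E+·n --n--> E → E+n·
• E → ·n --n--> E → n·
DFA:
State   Items
0       E' → ·E, E → ·E+n, E → ·n
1       E' → E·, E → E·+n
2       E → n·
3       E → E+·n
4       E → E+n·
Transitions: 0 --E--> 1, 0 --n--> 2, 1 --+--> 3, 3 --n--> 4.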

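● Example: given the augmented grammar G[S'], construct its LR(0) parsing table.
S' → S
S → aA | bB
A → cA | d
B → cB | d
Solution:
(1) DFA recognizing the viable prefixes of the grammar
State        Items
0 (start)    S' → ·S, S → ·aA, S → ·bB
1            S' → S·
2            S → a·A, A → ·cA, A → ·d
3            S → b·B, B → ·cB, B → ·d
4            S → aA·
5            A → c·A, A → ·cA, A → ·d
6            A → d·
7            S → bB·
8            B → c·B, B → ·cB, B → ·d
9            B → d·
10           A → cA·
11           B → cB·
Transitions: 0 --S--> 1, 0 --a--> 2, 0 --b--> 3, 2 --A--> 4, 2 --c--> 5, 2 --d--> 6, 3 --B--> 7, 3 --c--> 8, 3 --d--> 9, 5 --A--> 10, 5 --c--> 5, 5 --d--> 6, 8 --B--> 11, 8 --c--> 8, 8 --d--> 9.
In each state the first item listed is the kernel item and the remaining items are closure items. All closure items are initial items.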

(2) LR(0) parsing table
Productions: (0) S' → S, (1) S → aA, (2) S → bB, (3) A → cA, (4) A → d, (5) B → cB, (6) B → d
          action                       goto
State     a     b     c     d     $    S     A     B
0         s2    s3    -     -     -    1     -     -
1         -     -     -     -     acc  -     -     -
2         -     -     s5    s6    -    -     4     -
3         -     -     s8    s9    -    -     -     7
4         r1    r1    r1    r1    r1   -     -     -
5         -     -     s5    s6    -    -     10    -
6         r4    r4    r4    r4    r4   -     -     -
7         r2    r2    r2    r2    r2   -     -     -
8         -     -     s8    s9    -    -     -     11
9         r6    r6    r6    r6    r6   -     -     -
10        r3    r3    r3    r3    r3   -     -     -
11        r5    r5    r5    r5    r5   -     -     -
Entries shown as - are blank (error) entries.

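● Structure and operation of an LR parser
• Input: a1 ... ai ... an $
• Stack (top to bottom): sm Xm sm-1 Xm-1 ... s0
• The LR driver program reads the input and the stack, consults the parsing table (its action and goto parts), and produces the output.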

● The LR(0) parsing algorithm
Let s be the current state (at the top of the parsing stack). Then actions are defined as follows:
1. If state s contains any item of the form A → α·Xβ, where X is a terminal, then the action is to shift the current input token onto the stack. If this token is X, and state s contains the item A → α·Xβ, then the new state to be pushed on the stack is the state containing the item A → αX·β. If this token is not X for some item in state s of the form just described, an error is declared.

2. If state s contains any complete item (an item of the form A → γ·), then the action is to reduce by the rule A → γ. (Uniqueness: the complete item must be the only item in the state.) A reduction by the rule S' → S, where S is the start symbol, is equivalent to acceptance, provided the input is empty, and to an error if the input is not empty.

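● LR parsing algorithm
Input: an input string w and an LR parsing table M for grammar G.
Output: if w ∈ L(G), a bottom-up parse of w; otherwise an error indication.
Method: push s0 onto the stack and place w$ in the input buffer; set ip to point to the first symbol of w$; then:
repeat forever begin
    let s be the state on top of the stack and a the symbol pointed to by ip
    if action[s, a] = shift s' then begin
        push a and then s' onto the stack
        advance ip to the next input symbol
    end
    else if action[s, a] = reduce A → β then begin
        pop 2*|β| symbols off the stack
        let s' be the state now on top of the stack
        push A and then goto[s', A] onto the stack
        output the production A → β
    end
    else if action[s, a] = accept then return
    else error( )
end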
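A minimal Python sketch of this driver follows. The table is transcribed by hand from the LR(0) example grammar S → aA | bB, A → cA | d, B → cB | d of this section; the dictionary layout and the names `SHIFT`, `REDUCE`, `GOTO`, and `lr_parse` are assumptions made for illustration. Because the table is LR(0), a reduce state reduces regardless of the lookahead.

```python
# Productions by number: 1 S→aA, 2 S→bB, 3 A→cA, 4 A→d, 5 B→cB, 6 B→d
PRODUCTIONS = {1: ("S", "aA"), 2: ("S", "bB"), 3: ("A", "cA"),
               4: ("A", "d"), 5: ("B", "cB"), 6: ("B", "d")}
SHIFT = {(0, "a"): 2, (0, "b"): 3, (2, "c"): 5, (2, "d"): 6, (3, "c"): 8,
         (3, "d"): 9, (5, "c"): 5, (5, "d"): 6, (8, "c"): 8, (8, "d"): 9}
REDUCE = {4: 1, 6: 4, 7: 2, 9: 6, 10: 3, 11: 5}   # state -> production number
GOTO = {(0, "S"): 1, (2, "A"): 4, (3, "B"): 7, (5, "A"): 10, (8, "B"): 11}
ACCEPT_STATE = 1


def lr_parse(word):
    """Return the productions used, in reduction order; raise SyntaxError on error."""
    tokens = list(word) + ["$"]
    ip = 0
    stack = [0]                                  # s0, then alternating symbol/state
    output = []
    while True:
        s, a = stack[-1], tokens[ip]
        if (s, a) in SHIFT:                      # shift: push a, then the new state
            stack += [a, SHIFT[(s, a)]]
            ip += 1
        elif s in REDUCE:                        # reduce A → β
            head, body = PRODUCTIONS[REDUCE[s]]
            if body:                             # guard: stack[-0:] would be the whole stack
                del stack[-2 * len(body):]
            stack += [head, GOTO[(stack[-1], head)]]
            output.append(f"{head} -> {body}")
        elif s == ACCEPT_STATE and a == "$":
            return output
        else:
            raise SyntaxError(f"unexpected {a!r} in state {s}")


print(lr_parse("accd"))
```

The reductions come out in the reverse order of the rightmost derivation, which is what "a bottom-up parse of w" means in the output clause above.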

● Example: given the grammar G[S], construct its LR(0) parsing table.
S → A | B, A → aAb | c, B → aBd | d
Solution:
(1) Augmented grammar
S' → S
S → A
S → B
A → aAb
A → c
B → aBd
B → d
Items:
1) S' → ·S      2) S' → S·
3) S → ·A       4) S → A·
5) S → ·B       6) S → B·
7) A → ·aAb     8) A → a·Ab     9) A → aA·b     10) A → aAb·
11) A → ·c      12) A → c·
13) B → ·aBd    14) B → a·Bd    15) B → aB·d    16) B → aBd·
17) B → ·d      18) B → d·

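(2) DFA recognizing the viable prefixes of the grammar
State   Items
0       S' → ·S, S → ·A, S → ·B, A → ·aAb, A → ·c, B → ·aBd, B → ·d
1       S' → S·
2       S → A·
3       S → B·
4       A → a·Ab, B → a·Bd, A → ·aAb, A → ·c, B → ·aBd, B → ·d
5       A → c·
6       B → d·
7       A → aA·b
8       A → aAb·
9       B → aB·d
10      B → aBd·
Transitions: 0 --S--> 1, 0 --A--> 2, 0 --B--> 3, 0 --a--> 4, 0 --c--> 5, 0 --d--> 6, 4 --a--> 4, 4 --c--> 5, 4 --d--> 6, 4 --A--> 7, 4 --B--> 9, 7 --b--> 8, 9 --d--> 10.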

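(3) LR(0) parsing table
Productions: (0) S' → S, (1) S → A, (2) S → B, (3) A → aAb, (4) A → c, (5) B → aBd, (6) B → d
          action                       goto
State     a     b     c     d     $    S     A     B
0         s4    -     s5    s6    -    1     2     3
1         -     -     -     -     acc  -     -     -
2         r1    r1    r1    r1    r1   -     -     -
3         r2    r2    r2    r2    r2   -     -     -
4         s4    -     s5    s6    -    -     7     9
5         r4    r4    r4    r4    r4   -     -     -
6         r6    r6    r6    r6    r6   -     -     -
7         -     s8    -     -     -    -     -     -
8         r3    r3    r3    r3    r3   -     -     -
9         -     -     -     s10   -    -     -     -
10        r5    r5    r5    r5    r5   -     -     -
Entries shown as - are blank (error) entries.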

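● Example: given the grammar G[A], construct its LR(0) parsing table and decide whether ((a)) ∈ L(G).
A → (A) | a
Solution: (1) augmented grammar; (2) DFA recognizing the viable prefixes of the grammar; (3) LR(0) parsing table; (4) parsing process.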

4.4.3 SLR(1) Parsing
● The SLR(1) parsing algorithm
Let s be the current state (at the top of the parsing stack). Then actions are defined as follows:
1. If state s contains any item of the form A → α·Xβ, where X is a terminal, and X is the next token in the input string, then the action is to shift the current input token onto the stack, and the new state to be pushed on the stack is the state containing the item A → αX·β.

2. If state s contains the complete item A → γ·, and the next token in the input string is in Follow(A), then the action is to reduce by the rule A → γ. A reduction by the rule S' → S, where S is the start symbol, is equivalent to acceptance; this will happen only if the next input token is $. In all other cases, the new state is computed as follows. Remove the string γ and all of its corresponding states from the parsing stack. Correspondingly, back up in the DFA to the state from which the construction of γ began. By construction, this state must contain an item of the form B → α·Aβ. Push A onto the stack, and push the state containing the item B → αA·β.
3. If the next input token is such that neither of the above two cases applies, an error is declared.

● Conditions
A grammar is an SLR(1) grammar if the application of the above SLR(1) parsing rules results in no ambiguity. In particular, a grammar is SLR(1) if and only if, for any state s, the following two conditions are satisfied:
1. For any item A → α·Xβ in s with X a terminal, there is no complete item B → γ· in s with X in Follow(B).
2. For any two complete items A → α· and B → β· in s, Follow(A) ∩ Follow(B) is empty.

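● SLR(1) parsing
Suppose a valid item set contains conflicting actions:
I = { X → α·bβ, A → γ·, B → δ· }
The first item calls for shifting b onto the stack, the second for reducing γ to A, and the third for reducing δ to B. Let the current input symbol be a:
1. If a = b, shift.
2. If a ∈ Follow(A), reduce by A → γ.
3. If a ∈ Follow(B), reduce by B → δ.
4. In all other cases, report an error.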

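● SLR table construction algorithm
• Input: an augmented grammar G'
• Output: the action and goto parts of the parsing table for G'
• Method:
• 1. Construct the canonical collection of LR(0) item sets of G'.
• 2. The parsing actions for state Ii are determined as follows:
• (a) If A → α·aβ ∈ Ii and go(Ii, a) = Ij, then action[i, a] = shift j.
• (b) If A → α· ∈ Ii with A ≠ S', then for every a ∈ Follow(A), action[i, a] = reduce A → α.
• (c) If S' → S· ∈ Ii, then action[i, $] = accept.
• 3. If go(Ii, A) = Ij with A ∈ VN, then goto[i, A] = j.
• 4. All remaining table entries are error.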
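Step 2 can be sketched for a single state as follows. The representation is an assumption for illustration: items are `(head, body, dot)` triples, `go_targets` maps a symbol to the index j with go(Ii, symbol) = Ij, and `follow` and `production_numbers` are supplied by the caller. The helper raises an error when two different actions land in the same cell, which is exactly the situation that makes a grammar not SLR(1).

```python
def slr_row(items, go_targets, follow, production_numbers, terminals, start="E'"):
    """Fill the action row of one state according to rules 2(a), 2(b), 2(c)."""
    row = {}

    def put(symbol, action):
        if row.get(symbol, action) != action:
            raise ValueError(f"conflict on {symbol!r}: {row[symbol]} / {action}")
        row[symbol] = action

    for head, body, dot in items:
        if dot < len(body):
            if body[dot] in terminals:                 # 2(a): shift item
                put(body[dot], f"s{go_targets[body[dot]]}")
        elif head == start:                            # 2(c): S' → S·
            put("$", "acc")
        else:                                          # 2(b): reduce on Follow(A)
            for a in follow[head]:
                put(a, f"r{production_numbers[(head, body)]}")
    return row


# Expression grammar used in this section; only Follow(E) is needed here.
TERMINALS = {"+", "*", "(", ")", "id"}
FOLLOW = {"E": {"$", "+", ")"}}
NUMBERS = {("E", ("E", "+", "T")): 1, ("E", ("T",)): 2}

I1 = {("E'", ("E",), 1), ("E", ("E", "+", "T"), 1)}
I2 = {("E", ("T",), 1), ("T", ("T", "*", "F"), 1)}
print(slr_row(I1, {"+": 6}, FOLLOW, NUMBERS, TERMINALS))
print(slr_row(I2, {"*": 7}, FOLLOW, NUMBERS, TERMINALS))
```

State I2 contains both a shift item and a complete item, yet the row has no clash because * is not in Follow(E).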

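SLR (SLR(1)) grammars: if the parsing table constructed for grammar G by the algorithm above contains no conflicting actions, G is called an SLR grammar. LR(0) grammars can be defined in a similar way.
Question: how is an LR(0) grammar defined? If a ∈ Follow(A) in step 2(b) of the algorithm above is replaced by a ∈ VT ∪ {$}, the grammars defined by the modified algorithm are the LR(0) grammars.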

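● Example
• Given the grammar G[E], use the SLR(1) method to decide whether id*id+id ∈ L(G[E]).
• E → E+T | T
• T → T*F | F
• F → (E) | id
• Solution: (1) Augmented grammar
• The augmented grammar G'[E'] of G:
• (0) E' → E
• (1) E → E+T
• (2) E → T
• (3) T → T*F
• (4) T → F
• (5) F → (E)
• (6) F → id
• (2) DFA recognizing the viable prefixes of the grammar
• (3) SLR(1) parsing table
• (4) Parsing process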

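(2) DFA recognizing the viable prefixes of the grammar
G': (0) E' → E (1) E → E+T (2) E → T (3) T → T*F (4) T → F (5) F → (E) (6) F → id
State   Items
I0      E' → ·E, E → ·E+T, E → ·T, T → ·T*F, T → ·F, F → ·(E), F → ·id
I1      E' → E·, E → E·+T
I2      E → T·, T → T·*F
I3      T → F·
I4      F → (·E), E → ·E+T, E → ·T, T → ·T*F, T → ·F, F → ·(E), F → ·id
I5      F → id·
I6      E → E+·T, T → ·T*F, T → ·F, F → ·(E), F → ·id
I7      T → T*·F, F → ·(E), F → ·id
I8      F → (E·), E → E·+T
I9      E → E+T·, T → T·*F
I10     T → T*F·
I11     F → (E)·
Transitions:
• I0: E → I1, T → I2, F → I3, ( → I4, id → I5
• I1: + → I6
• I2: * → I7
• I4: E → I8, T → I2, F → I3, ( → I4, id → I5
• I6: T → I9, F → I3, ( → I4, id → I5
• I7: F → I10, ( → I4, id → I5
• I8: ) → I11, + → I6
• I9: * → I7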


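Three states of this DFA contain both a complete item and an item with a terminal after the dot:
I1: E' → E·, E → E·+T
I2: E → T·, T → T·*F
I9: E → E+T·, T → T·*F
In general, let I = { X → α·bβ, A → γ·, B → δ· }. If the sets {b}, FOLLOW(A), and FOLLOW(B) are pairwise disjoint, then on the current input symbol a, state I is handled as follows:
1. If a = b, shift.
2. If a ≠ b and a ∈ FOLLOW(A), reduce by A → γ.
3. If a ≠ b and a ∈ FOLLOW(B), reduce by B → δ.
4. Otherwise, report an error.
This way of resolving conflicts is relatively simple, so it is called SLR (simple LR) parsing, and the parsing table constructed this way is called an SLR parsing table.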

For the expression grammar example, the FOLLOW sets give:
G': (0) E' → E (1) E → E+T (2) E → T (3) T → T*F (4) T → F (5) F → (E) (6) F → id
I1: { E' → E·, E → E·+T }
I2: { E → T·, T → T·*F }
I9: { E → E+T·, T → T·*F }
I1: FOLLOW(E') ∩ {+} = ∅
I2: FOLLOW(E) ∩ {*} = ∅
I9: FOLLOW(E) ∩ {*} = ∅
∴ the grammar can be parsed with the SLR(1) method.
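The three disjointness checks depend on FOLLOW(E') and FOLLOW(E). The sketch below computes First and Follow with the usual fixed-point iteration; the function name `first_follow` and the `(head, body)` tuple representation are assumptions for illustration, and the end marker is written "$".

```python
def first_follow(productions, start):
    """Fixed-point computation of First and Follow for every nonterminal."""
    nonterminals = {head for head, _ in productions}
    first = {a: set() for a in nonterminals}
    follow = {a: set() for a in nonterminals}
    nullable = set()
    follow[start].add("$")

    def size():
        return (len(nullable), sum(map(len, first.values())),
                sum(map(len, follow.values())))

    while True:
        before = size()
        for head, body in productions:
            # First(head): scan the body until a non-nullable symbol is met
            for x in body:
                if x in nonterminals:
                    first[head] |= first[x]
                    if x in nullable:
                        continue
                else:
                    first[head].add(x)
                break
            else:
                nullable.add(head)             # every symbol nullable (or body empty)
            # Follow: walk the body right to left, carrying what can come next
            trailer = set(follow[head])
            for x in reversed(body):
                if x in nonterminals:
                    follow[x] |= trailer
                    trailer = (trailer | first[x]) if x in nullable else set(first[x])
                else:
                    trailer = {x}
        if size() == before:                   # sets only grow, so equal sizes mean done
            return first, follow


G = [("E'", ("E",)), ("E", ("E", "+", "T")), ("E", ("T",)),
     ("T", ("T", "*", "F")), ("T", ("F",)),
     ("F", ("(", "E", ")")), ("F", ("id",))]
first, follow = first_follow(G, "E'")
print(sorted(follow["E'"]), sorted(follow["E"]))
```

With these sets, the checks for I1, I2, and I9 reduce to three set intersections, all of which are empty.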

(3) SLR(1) parsing table
Follow(E) = {$, +, )}

(4) LR parsing process for id*id+id
        Parsing stack      Input        Action
(1)     0                  id*id+id$    shift
(2)     0 id 5             *id+id$      reduce by F → id
(3)     0 F 3              *id+id$      reduce by T → F
(4)     0 T 2              *id+id$      shift
(5)     0 T 2 * 7          id+id$       shift
(6)     0 T 2 * 7 id 5     +id$         reduce by F → id
(7)     0 T 2 * 7 F 10     +id$         reduce by T → T*F
(8)     0 T 2              +id$         reduce by E → T
(9)     0 E 1              +id$         shift
(10)    0 E 1 + 6          id$          shift
(11)    0 E 1 + 6 id 5     $            reduce by F → id
(12)    0 E 1 + 6 F 3      $            reduce by T → F
(13)    0 E 1 + 6 T 9      $            reduce by E → E+T
(14)    0 E 1              $            accept

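● Example: given the grammar G[E], construct its SLR(1) parsing table and decide whether n+n ∈ L(G).
E → E + n | n
Solution: (1) augmented grammar; (2) DFA recognizing the viable prefixes of the grammar; (3) SLR(1) parsing table; (4) parsing process.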

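● Example: given the grammar G[S], construct its SLR(1) parsing table and decide whether ( )( ) ∈ L(G).
S → (S)S | ε
Solution: (1) augmented grammar; (2) DFA recognizing the viable prefixes of the grammar; (3) SLR(1) parsing table; (4) parsing process.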

4.4.4 General LR(1) and LALR(1) Parsing
● Definition of LR(1) transitions
(Part 1) Given an LR(1) item [A → α·Xγ, a], where X is any symbol (terminal or nonterminal), there is a transition on X to the item [A → αX·γ, a].
(Part 2) Given an LR(1) item [A → α·Bγ, a], where B is a nonterminal, there are ε-transitions to items [B → ·β, b] for every production B → β and for every token b in First(γa).
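Following the ε-transitions of Part 2 until nothing new appears gives the LR(1) closure of an item set. The sketch below makes a simplifying assumption that is stated loudly: the grammar has no ε-productions, so First(γa) is First of the first symbol of γ, or {a} when γ is empty. Items are `(head, body, dot, lookahead)` tuples and the function names are invented for illustration. The grammar is S → id | V := E, V → id, E → V | n.

```python
def first_of(symbol, productions, seen=frozenset()):
    """First set of one symbol. ASSUMPTION: the grammar has no ε-productions."""
    nonterminals = {head for head, _ in productions}
    if symbol not in nonterminals:
        return {symbol}                        # a terminal starts only itself
    if symbol in seen:
        return set()                           # cut left-recursive cycles
    result = set()
    for head, body in productions:
        if head == symbol:
            result |= first_of(body[0], productions, seen | {symbol})
    return result


def lr1_closure(items, productions):
    """Add [B → ·β, b] for every [A → α·Bγ, a] and every b in First(γa)."""
    result, work = set(items), list(items)
    while work:
        head, body, dot, look = work.pop()
        if dot == len(body):
            continue                           # complete item: nothing after the dot
        rest = body[dot + 1:]                  # γ
        lookaheads = first_of(rest[0], productions) if rest else {look}
        for h, b in productions:
            if h == body[dot]:
                for t in lookaheads:
                    item = (h, b, 0, t)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return result


G = [("S'", ("S",)), ("S", ("id",)), ("S", ("V", ":=", "E")),
     ("V", ("id",)), ("E", ("V",)), ("E", ("n",))]
state0 = lr1_closure({("S'", ("S",), 0, "$")}, G)
for item in sorted(state0):
    print(item)
```

Note how V → ·id receives the lookahead := in the start state (from S → ·V := E) but the lookahead $ after := has been read (from E → ·V); this split is what separates LR(1) from SLR(1) here.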

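● Example
• Given the grammar G[S], construct its LR(1) parsing table and decide whether id:=id ∈ L(G).
• S → id | V := E
• V → id
• E → V | n
Solution: (1) augmented grammar; (2) DFA recognizing the viable prefixes of the grammar; (3) LR(1) parsing table; (4) parsing process.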

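Solution:
• (1) Augmented grammar
• S' → S
• S → id
• S → V := E
• V → id
• E → V
• E → n
• (2) DFA recognizing the viable prefixes of the grammar
State        Items
0 (start)    [S' → ·S, $], [S → ·id, $], [S → ·V := E, $], [V → ·id, :=]
1            [S' → S·, $]
2            [S → id·, $], [V → id·, :=]
3            [S → V· := E, $]
4            [S → V := ·E, $], [E → ·V, $], [E → ·n, $], [V → ·id, $]
5            [S → V := E·, $]
6            [E → V·, $]
7            [E → n·, $]
8            [V → id·, $]
Transitions: 0 --S--> 1, 0 --id--> 2, 0 --V--> 3, 3 --:=--> 4, 4 --E--> 5, 4 --V--> 6, 4 --n--> 7, 4 --id--> 8.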

• (3) LR(1) parsing table
Productions: (0) S' → S, (1) S → id, (2) S → V := E, (3) V → id, (4) E → V, (5) E → n
          action                  goto
State     id    :=    n     $     S     V     E
0         s2    -     -     -     1     3     -
1         -     -     -     acc   -     -     -
2         -     r3    -     r1    -     -     -
3         -     s4    -     -     -     -     -
4         s8    -     s7    -     -     6     5
5         -     -     -     r2    -     -     -
6         -     -     -     r4    -     -     -
7         -     -     -     r5    -     -     -
8         -     -     -     r3    -     -     -
Entries shown as - are blank (error) entries.

(4) LR parsing process for id:=id
       Parsing stack        Input       Action
(1)    0                    id:=id $    shift
(2)    0 id 2               :=id $      reduce by V → id
(3)    0 V 3                :=id $      shift
(4)    0 V 3 := 4           id $        shift
(5)    0 V 3 := 4 id 8      $           reduce by V → id
(6)    0 V 3 := 4 V 6       $           reduce by E → V
(7)    0 V 3 := 4 E 5       $           reduce by S → V := E
(8)    0 S 1                $           accept

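● Example: given the grammar G[A], construct its LR(1) parsing table and decide whether ((a)) ∈ L(G).
A → (A) | a
Solution: (1) augmented grammar; (2) DFA recognizing the viable prefixes of the grammar; (3) LR(1) parsing table; (4) parsing process.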

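Solution:
• (1) Augmented grammar
• A' → A
• A → (A) | a
• (2) DFA recognizing the viable prefixes of the grammar
State        Items
0 (start)    [A' → ·A, $], [A → ·(A), $], [A → ·a, $]
1            [A' → A·, $]
2            [A → (·A), $], [A → ·(A), )], [A → ·a, )]
3            [A → a·, $]
4            [A → (A·), $]
5            [A → (·A), )], [A → ·(A), )], [A → ·a, )]
6            [A → a·, )]
7            [A → (A)·, $]
8            [A → (A·), )]
9            [A → (A)·, )]
Transitions: 0 --A--> 1, 0 --(--> 2, 0 --a--> 3, 2 --A--> 4, 2 --(--> 5, 2 --a--> 6, 4 --)--> 7, 5 --A--> 8, 5 --(--> 5, 5 --a--> 6, 8 --)--> 9.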
