Compilation 0368-3133 (Semester A, 2013/14). Lecture 6a: Syntax (Bottom–up parsing). Noam Rinetzky. Slides credit: Roman Manevich, Mooly Sagiv, Eran Yahav.
What is a Compiler? “A compiler is a computer program that transforms source code written in a programming language (source language) into another language (target language). The most common reason for wanting to transform source code is to create an executable program.” --Wikipedia
Conceptual Structure of a Compiler: Source text (txt) → Compiler → Executable code (exe). Inside the compiler: Frontend → Semantic Representation → Backend. Phases: Lexical Analysis (words) → Syntax Analysis / Parsing (sentences) → Semantic Analysis → Intermediate Representation (IR) → Code Generation.
From scanning to parsing: program text ((23 + 7) * x) → Lexical Analyzer → token stream → Parser → valid (Abstract Syntax Tree) or syntax error. Grammar: E → ... | Id; Id → ‘a’ | ... | ‘z’. Abstract Syntax Tree nodes shown: Op(*), Op(+), Id(b), Num(23), Num(7).
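The scanning step above can be sketched in a few lines of Python. This is only an illustration: the token kinds (NUM, ID, OP, LPAREN, RPAREN) and the function name `tokenize` are assumptions made for this example, not names taken from the slides.

```python
import re

# Token kinds are illustrative assumptions; the slide only shows Num, Id and Op nodes.
TOKEN_SPEC = [
    ("NUM", r"\d+"),
    ("ID", r"[a-z]"),
    ("OP", r"[+*]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Turn program text into a list of (kind, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":        # whitespace is dropped, not tokenized
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("((23 + 7) * x)"))
```

The resulting token stream is what the parser consumes; the parser never looks at raw characters.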
Top-Down vs Bottom-Up. Grammar: A → aAb | c. Input: aacbb$, split into already read… and to be read… • Top-down (predict match/scan-complete) • Bottom-up (shift reduce) [Figure: partial parse trees over a, a, c, b, b for each strategy]
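To make the bottom-up (shift reduce) idea concrete, here is a minimal sketch for the grammar A → aAb | c. It is not a table-driven LR parser: the reduce decision is hand-coded by inspecting the top of the stack, which happens to be enough for this tiny grammar. The function name `shift_reduce` is hypothetical.

```python
def shift_reduce(tokens):
    """Recognize A -> a A b | c bottom-up and return the list of moves taken."""
    stack, moves = [], []
    for tok in tokens:
        if tok == "$":                       # end-of-input marker
            break
        stack.append(tok)                    # shift
        moves.append(f"shift {tok}")
        # Reduce for as long as a right-hand side sits on top of the stack.
        while True:
            if stack[-1:] == ["c"]:
                stack[-1:] = ["A"]
                moves.append("reduce A -> c")
            elif stack[-3:] == ["a", "A", "b"]:
                stack[-3:] = ["A"]
                moves.append("reduce A -> a A b")
            else:
                break
    if stack != ["A"]:
        raise SyntaxError(f"cannot reduce stack {stack} to A")
    return moves


for move in shift_reduce(list("aacbb$")):
    print(move)
```

The parse tree is built from the leaves upward: the inner c is reduced first, and the outermost a…b pair is reduced last.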
Model of an LR parser • Input: remainder of text to be processed • Stack: terminals and non-terminals, each followed by a state; initial stack contains q0 • The state on top of the stack controls decisions • The LR parser consults the control tables and produces the output (AST/Error)
LR(0) parser tables: ACTION Table and GOTO Table [table not reproduced] • Empty cell = error move
LR(0) parser tables • Shift action row • Tells which state to GOTO for current token • Blank entry indicates an error • Reduce action row • Tells which rule to reduce with • Independent of current token • GOTO entries are blank
Shift Move • Before: input id + id $, stack q0 • After shift: input + id $, stack q0 id q5 • Remove first token from input • Push it on the stack • Compute next state based on GOTO table • Push new state on the stack • If new state is error – report error
Reduce Move (using N → α) • Reduce: T → id • Before: input + id $, stack q0 id q5 • After removing id and q5: stack q0 • After reduce: input + id $, stack q0 T q6 • Symbols in α and their following states are removed from stack • New state computed based on GOTO table (using top of stack, before pushing N) • N is pushed on the stack • New state pushed on top of N
GOTO/ACTION table. Grammar: Z → E $; E → T; E → E + T; T → i; T → ( E ). [table not reproduced] Warning: numbers mean different things! rn = reduce using rule number n; sm = shift to state m
(1) S → E $ (2) E → T (3) E → E + T (4) T → id (5) T → ( E ) Parsing id+id$: initialize with state 0
Parsing id+id$, reducing with rule (4) T → id: pop id 5, then push T 6
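The parse of id+id$ can be replayed with a small table-driven driver. The tables below are an assumption: they are reconstructed from the LR(0) automaton example (states q0 to q9) rather than copied from a table on the slides, and the names `RULES`, `REDUCE`, `GOTO` and `parse` are hypothetical.

```python
# Grammar rules from the slides: rule number -> (left-hand side, right-hand side).
RULES = {
    1: ("S", ["E", "$"]),
    2: ("E", ["T"]),
    3: ("E", ["E", "+", "T"]),
    4: ("T", ["id"]),
    5: ("T", ["(", "E", ")"]),
}
# Reduce states: in LR(0) the rule depends on the state only, not on the token.
REDUCE = {2: 1, 4: 3, 5: 4, 6: 2, 9: 5}
# Shift states: next state per symbol (terminals and non-terminals alike).
GOTO = {
    0: {"id": 5, "(": 7, "E": 1, "T": 6},
    1: {"+": 3, "$": 2},
    3: {"id": 5, "(": 7, "T": 4},
    7: {"id": 5, "(": 7, "E": 8, "T": 6},
    8: {"+": 3, ")": 9},
}


def parse(tokens):
    """Return the rule numbers in the order they are reduced, or raise SyntaxError."""
    tokens = list(tokens)
    stack = [0]                              # state, symbol, state, symbol, ...
    used = []
    while True:
        state = stack[-1]
        if state in REDUCE:
            rule = REDUCE[state]
            lhs, rhs = RULES[rule]
            del stack[len(stack) - 2 * len(rhs):]   # pop symbols and their states
            used.append(rule)
            if rule == 1:                    # reduced to the start symbol: accept
                return used
            exposed = stack[-1]              # top of stack before pushing lhs
            stack += [lhs, GOTO[exposed][lhs]]
        else:
            if not tokens:
                raise SyntaxError("unexpected end of input")
            tok = tokens.pop(0)              # remove first token from input
            target = GOTO[state].get(tok)    # blank entry means error
            if target is None:
                raise SyntaxError(f"unexpected token {tok!r} in state {state}")
            stack += [tok, target]


print(parse(["id", "+", "id", "$"]))
```

For id + id $ the driver reduces with rules 4, 2, 4, 3, 1 in that order, starting with the "pop id 5, push T 6" step shown above.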
Constructing an LR parsing table • Construct a transition diagram (deterministic FSM) • States = sets of LR(0) items • Transitions = one-step derivation • If there are conflicts – stop • Fill table entries from diagram
Terminology: Reductions & Handles • The opposite of derivation is called reduction • Let A → α be a production rule • Derivation: βAµ ⇒ βαµ • Reduction: βαµ reduces to βAµ • A handle is the reduced substring • α is the handle for βαµ
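As a worked instance of these definitions, the input id + id $ can be reduced step by step with the expression grammar used in the parsing example, rules (1) to (5). The handle of each step is underlined; the layout is only a sketch and assumes the amsmath package.

```latex
\begin{align*}
\underline{id} + id\ \$ &\ \text{reduces to}\ T + id\ \$ && \text{by (4) } T \to id \\
\underline{T} + id\ \$  &\ \text{reduces to}\ E + id\ \$ && \text{by (2) } E \to T \\
E + \underline{id}\ \$  &\ \text{reduces to}\ E + T\ \$  && \text{by (4) } T \to id \\
\underline{E + T}\ \$   &\ \text{reduces to}\ E\ \$      && \text{by (3) } E \to E + T \\
\underline{E\ \$}       &\ \text{reduces to}\ S          && \text{by (1) } S \to E\ \$
\end{align*}
```

Read from bottom to top, the same five steps form a rightmost derivation of id + id $ from S.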
LR(0) Items. Grammar: (1) S → E $ (2) E → T (3) E → E + T (4) T → id (5) T → ( E ). The LR(0) items of a grammar are obtained by placing a dot at every position in every production
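Placing a dot at every position in every production is mechanical, as this short sketch shows. The representation of an item as a `(lhs, rhs, dot)` tuple and the helper names `lr0_items` and `show` are assumptions made for illustration.

```python
GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("T",)),
    ("E", ("E", "+", "T")),
    ("T", ("id",)),
    ("T", ("(", "E", ")")),
]


def lr0_items(grammar):
    """Yield every LR(0) item as (lhs, rhs, dot), with 0 <= dot <= len(rhs)."""
    for lhs, rhs in grammar:
        for dot in range(len(rhs) + 1):
            yield (lhs, rhs, dot)


def show(item):
    """Render an item with a middle dot marking the current position."""
    lhs, rhs, dot = item
    return f"{lhs} → {' '.join(rhs[:dot] + ('·',) + rhs[dot:])}"


for item in lr0_items(GRAMMAR):
    print(show(item))
```

A production with n symbols on its right-hand side yields n + 1 items, so this grammar has 3 + 2 + 4 + 2 + 4 = 15 items.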
LR(0) Item - Intuition: in N → α·β, α is already matched and β is still to be matched against the input. The item is a hypothesis that N → αβ is the rule being reduced: so far we’ve matched α and we expect to see β
Types of LR(0) items • N → α·β: Shift Item • N → αβ·: Reduce Item
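LR(0) automaton example (each state is a shift state or a reduce state; q6 is a reduce state). q0: Z → ·E$, E → ·T, E → ·E + T, T → ·i, T → ·(E). q1: Z → E·$, E → E· + T. q2: Z → E$·. q3: E → E + ·T, T → ·i, T → ·(E). q4: E → E + T·. q5: T → i·. q6: E → T·. q7: T → (·E), E → ·T, E → ·E + T, T → ·i, T → ·(E). q8: T → (E·), E → E· + T. q9: T → (E)·. Transitions: from q0, E to q1, T to q6, i to q5, ( to q7; from q1, + to q3, $ to q2; from q3, T to q4, i to q5, ( to q7; from q7, E to q8, T to q6, i to q5, ( to q7; from q8, + to q3, ) to q9.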
Computing item sets • Initial set • Z is the start symbol • ε-closure({ Z → ·α | Z → α is in the grammar }) • Next set from a set S and the next symbol X • step(S,X) = { N → αX·β | N → α·Xβ in the item set S } • nextSet(S,X) = ε-closure(step(S,X))
Operations for transition diagram construction • Initial = { S’ → ·S$ } • For an item set I: Closure(I) = Closure(I) ∪ { X → ·µ | X → µ is in grammar and N → α·Xβ in I }, applied repeatedly until no new items are added • Goto(I, X) = { N → αX·β | N → α·Xβ in I }
Initial example. Grammar: (1) S → E $ (2) E → T (3) E → E + T (4) T → id (5) T → ( E ). Initial = { S → ·E $ }
Closure example: Closure({ S → ·E $ }) = { S → ·E $, E → ·T, E → ·E + T, T → ·id, T → ·( E ) }
Goto example: Goto({ S → ·E $, E → ·E + T, T → ·id }, E) = { S → E· $, E → E· + T }
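The Closure and Goto operations, and the two examples above, can be sketched as follows. Items are represented as `(lhs, rhs, dot)` tuples; that encoding and the function names are assumptions for this example.

```python
GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("T",)),
    ("E", ("E", "+", "T")),
    ("T", ("id",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add X -> ·µ for every non-terminal X that appears right after a dot."""
    result = set(items)
    worklist = list(items)
    while worklist:
        _, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for lhs, new_rhs in GRAMMAR:
                item = (lhs, new_rhs, 0)
                if lhs == rhs[dot] and item not in result:
                    result.add(item)
                    worklist.append(item)   # newly added items are expanded too
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` in every item that has it right after the dot."""
    return frozenset(
        (lhs, rhs, dot + 1)
        for lhs, rhs, dot in items
        if dot < len(rhs) and rhs[dot] == symbol
    )


initial = {("S", ("E", "$"), 0)}
print(sorted(closure(initial)))
print(sorted(goto(closure(initial), "E")))
```

The worklist makes the closure transitive: expanding E adds E → ·T, which in turn brings in both T items.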
Constructing the transition diagram • Start with state 0 containing item set Closure({ S → ·E $ }) • Repeat until no new states are discovered • For every state p containing item set Ip, and symbol N, compute state q containing item set Iq = Closure(goto(Ip, N))
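The construction loop can be sketched with a work queue of undiscovered states. This is an illustrative sketch: states are numbered in discovery order, so the numbers will generally differ from the q0 to q9 labels in the automaton figure, and all function names are hypothetical.

```python
from collections import deque

GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("T",)),
    ("E", ("E", "+", "T")),
    ("T", ("id",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    result, worklist = set(items), list(items)
    while worklist:
        _, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for lhs, new_rhs in GRAMMAR:
                item = (lhs, new_rhs, 0)
                if lhs == rhs[dot] and item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)


def goto(items, symbol):
    return frozenset((l, r, d + 1) for l, r, d in items if d < len(r) and r[d] == symbol)


def build_automaton():
    """Return (states, transitions): item set -> number, (number, symbol) -> number."""
    start = closure({GRAMMAR[0] + (0,)})     # state 0 = Closure({S -> ·E $})
    states = {start: 0}
    transitions = {}
    queue = deque([start])
    while queue:                             # repeat until no new states appear
        items = queue.popleft()
        symbols = sorted({rhs[dot] for _, rhs, dot in items if dot < len(rhs)})
        for symbol in symbols:
            target = closure(goto(items, symbol))
            if target not in states:
                states[target] = len(states)
                queue.append(target)
            transitions[(states[items], symbol)] = states[target]
    return states, transitions


states, transitions = build_automaton()
print(len(states), "states,", len(transitions), "transitions")
```

Using frozensets as dictionary keys is what lets the loop recognize that an item set has already been discovered, for example when "(" leads from the parenthesis state back to itself.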
Automaton construction example. Grammar: (1) S → E $ (2) E → T (3) E → E + T (4) T → id (5) T → ( E ). Initialize: q0 = { S → ·E$ }
Automaton construction example, apply Closure: q0 = { S → ·E$, E → ·T, E → ·E + T, T → ·i, T → ·(E) }
Automaton construction example, add the successors of q0: q1 (on E) = { S → E·$, E → E· + T }, q5 (on i) = { T → i· }, q6 (on T) = { E → T· }, and the state reached on ( = { T → (·E), E → ·T, E → ·E + T, T → ·i, T → ·(E) }
Automaton construction example, complete automaton with states q0 to q9 • A non-terminal transition corresponds to a goto action in the parse table • A terminal transition corresponds to a shift action in the parse table • A single reduce item corresponds to a reduce action
Are we done? • Can make a transition diagram for any grammar • Can make a GOTO table for every grammar • Cannot make a deterministic ACTION table for every grammar
LR(0) conflicts. Grammar: Z → E $; E → T; E → E + T; T → i; T → ( E ); T → i[E]. q0: Z → ·E$, E → ·T, E → ·E + T, T → ·i, T → ·(E), T → ·i[E]. On i, q0 goes to q5: T → i·, T → i·[E]. Shift/reduce conflict
LR(0) conflicts. Grammar: Z → E $; E → T; E → E + T; T → i; V → i; T → ( E ). q0: Z → ·E$, E → ·T, E → ·E + T, T → ·i, V → ·i, T → ·(E), T → ·i[E], … On i, q0 goes to q5: T → i·, V → i·. Reduce/reduce conflict
LR(0) conflicts • Any grammar with an ε-rule cannot be LR(0) • Inherent shift/reduce conflict • A → ε· – reduce item • P → α·Aβ – shift item • A → ε· can always be predicted from P → α·Aβ
Conflicts • Can construct a diagram for every grammar but some may introduce conflicts • shift-reduce conflict: an item set contains at least one shift item and one reduce item • reduce-reduce conflict: an item set contains two reduce items
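Both definitions translate directly into a check on an item set. The sketch below follows the definitions as stated on this slide; the `(lhs, rhs, dot)` item encoding and the function name `classify` are assumptions for illustration.

```python
def classify(item_set):
    """Return the set of LR(0) conflict kinds present in an item set."""
    reduce_items = [item for item in item_set if item[2] == len(item[1])]
    shift_items = [item for item in item_set if item[2] < len(item[1])]
    conflicts = set()
    if reduce_items and shift_items:
        conflicts.add("shift/reduce")
    if len(reduce_items) >= 2:
        conflicts.add("reduce/reduce")
    return conflicts


# q5 when the grammar also has T -> i[E]: T -> i· and T -> i·[E]
q5_shift_reduce = {("T", ("i",), 1), ("T", ("i", "[", "E", "]"), 1)}
# q5 when the grammar also has V -> i: T -> i· and V -> i·
q5_reduce_reduce = {("T", ("i",), 1), ("V", ("i",), 1)}

print(classify(q5_shift_reduce))
print(classify(q5_reduce_reduce))
```

An item set with a single reduce item and nothing else, such as { E → T· }, yields no conflict.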
LR variants • LR(0) – what we’ve seen so far • SLR(1) • Removes infeasible reduce actions via FOLLOW set reasoning • LR(1) • LR(0) with one lookahead token in items • LALR(1) • LR(1) with merging of states with same LR(0) component
LR(0) GOTO/ACTION tables [tables not reproduced] • GOTO table is indexed by state and a grammar symbol from the stack • ACTION table determined only by state, ignores input
SLR parsing • A handle should not be reduced to a non-terminal N if the lookahead is a token that cannot follow N • A reduce item N → α· is applicable only when the lookahead is in FOLLOW(N) • If b is not in FOLLOW(N), then by the definition of FOLLOW there is no derivation S ⇒* βNb • Thus, it is safe to remove the reduce item from the conflicted state • Differs from LR(0) only on the ACTION table • Now a row in the parsing table may contain both shift actions and reduce actions, and we need to consult the current token to decide which one to take
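The FOLLOW-set reasoning can be tried on the conflicted state { T → i·, T → i·[E] } from the shift/reduce example. The sketch below assumes a grammar without ε-rules, which keeps the FIRST and FOLLOW computations short; the function names are hypothetical.

```python
GRAMMAR = [
    ("Z", ("E", "$")),
    ("E", ("T",)),
    ("E", ("E", "+", "T")),
    ("T", ("i",)),
    ("T", ("(", "E", ")")),
    ("T", ("i", "[", "E", "]")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_of(symbol, first):
    return first[symbol] if symbol in NONTERMINALS else {symbol}


def first_sets():
    """FIRST for an ε-free grammar: only the first right-hand-side symbol matters."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first_of(rhs[0], first)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets():
    first = first_sets()
    follow = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                # What can come next: FIRST of the next symbol, or FOLLOW(lhs) at the end.
                new = first_of(rhs[i + 1], first) if i + 1 < len(rhs) else follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


def slr_actions(item_set, follow):
    """Map each lookahead token to the set of actions SLR allows in this state."""
    actions = {}
    for lhs, rhs, dot in item_set:
        if dot == len(rhs):                  # reduce item: only on FOLLOW(lhs)
            tokens, action = follow[lhs], f"reduce {lhs} -> {' '.join(rhs)}"
        elif rhs[dot] not in NONTERMINALS:   # shift item on a terminal
            tokens, action = {rhs[dot]}, "shift"
        else:
            continue
        for tok in tokens:
            actions.setdefault(tok, set()).add(action)
    return actions


q5 = {("T", ("i",), 1), ("T", ("i", "[", "E", "]"), 1)}
print(slr_actions(q5, follow_sets()))
```

Since "[" is not in FOLLOW(T), the row for this state holds a shift on "[" and a reduce on each token of FOLLOW(T), so every lookahead token selects exactly one action.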
SLR action table • The lookahead token from the input selects the action • SLR – use 1 token look-ahead vs. LR(0) – no look-ahead • … as before… • Items shown: T → i, T → i[E]