Dive into LR(0) grammars and parsing computer programs in formal languages, exploring concepts like left-to-right parsing and the fastest parsing algorithm. Learn about items in parsing algorithms and build part of the parse tree.
Fall 2010, The Chinese University of Hong Kong
CSCI 3130: Automata theory and formal languages
LR(0) grammars
Andrej Bogdanov
http://www.cse.cuhk.edu.hk/~andrejb/csc3130
Parsing computer programs
• First the javac compiler does a lexical analysis:
    if (n == 0) { return x; }
    if (ID == INT_LIT) { return ID; }
  ID = identifier (name of variable, procedure, class, ...)
  INT_LIT = integer literal (value)
• The alphabet of the java CFG consists of symbols like:
    Σ = {if, return, (, ), {, }, ;, ==, ID, INT_LIT, ...}
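The lexical analysis step above can be sketched in a few lines of Python. This is only an illustration, not how javac works: the regular expression, the KEYWORDS set and the function name lex are assumptions made for this example, and only the token names ID and INT_LIT come from the slide.

```python
import re

# Hypothetical token rules for this sketch; only ID and INT_LIT are from the slide.
KEYWORDS = {"if", "return"}
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(==|[(){};]))")

def lex(source):
    """Turn source text into the terminal symbols that the grammar sees."""
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        number, word, punct = match.groups()
        if number is not None:
            tokens.append("INT_LIT")      # integer literal: the value is dropped
        elif word is not None:
            # keywords stay as they are, every other name becomes ID
            tokens.append(word if word in KEYWORDS else "ID")
        else:
            tokens.append(punct)          # punctuation is its own symbol
        pos = match.end()
    return tokens

print(" ".join(lex("if (n == 0) { return x; }")))
# if ( ID == INT_LIT ) { return ID ; }
```

The parser then works on this token sequence only, so n and x are indistinguishable to it.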
Parsing computer programs
    if (n == 0) { return x; }
    if (ID == INT_LIT) { return ID; }
[Figure: the parse tree of a java statement. The root Statement expands to if ParExpression Statement; the other nodes shown are (Expression), Block, Expression, ExpressionRest, { BlockStatements }, BlockStatement, Statement, Primary, Infixop, Identifier and Literal, with leaves ==, return, ;, ID and INT_LIT.]
CFG of the java programming language
    Identifier:
        ID
    QualifiedIdentifier:
        Identifier { . Identifier }
    Literal:
        IntegerLiteral
        FloatingPointLiteral
        CharacterLiteral
        StringLiteral
        BooleanLiteral
        NullLiteral
    Expression:
        Expression1 [AssignmentOperator Expression1]
    AssignmentOperator:
        = += -= *= /= &= |= …
from http://java.sun.com/docs/books/jls/second_edition/html/syntax.doc.html#52996
Parsing java programs
    class Point2d {
        /* The X and Y coordinates of the point--instance variables */
        private double x;
        private double y;
        private boolean debug;    // A trick to help with debugging

        public Point2d (double px, double py) {    // Constructor
            x = px;
            y = py;
            debug = false;    // turn off debugging
        }

        public Point2d () {    // Default constructor
            this (0.0, 0.0);    // Invokes 2 parameter Point2D constructor
        }

        // Note that a this() invocation must be the BEGINNING of
        // statement body of constructor

        public Point2d (Point2d pt) {    // Another constructor
            x = pt.getX();
            y = pt.getY();
        }
    }
    …
Simple java program: about 500 symbols
Parsing algorithms
• How long would it take to parse this?
    exhaustive algorithm: about 10^80 years (longer than life of universe)
    CYK algorithm: about 1 week!
• Can we parse faster?
• No! CYK is the fastest known general-purpose parsing algorithm
Another way of thinking
Scientist: Find an algorithm that can parse strings in any grammar
Engineer: Design your grammar so it has a very fast parsing algorithm
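Parsing left to right
    S → Tc (1)
    T → TA (2) | A (3)
    A → aTb (4) | ab (5)
input: abaabbc
[Figure: the • moves through the input from left to right while the parse tree, with nodes S, T and A, is built over the symbols to the left of the •.]
Try to match to the left of •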
Items
• An item is a production augmented with a •
• The item is complete if the • is the last symbol
    S → Tc (1):   S → •Tc    S → T•c    S → Tc•
    T → TA (2):   T → •TA    T → T•A    T → TA•
    T → A (3):    T → •A     T → A•
    A → aTb (4):  A → •aTb   A → a•Tb   A → aT•b   A → aTb•
    A → ab (5):   A → •ab    A → a•b    A → ab•
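As a small illustration, the items of the grammar can be enumerated mechanically. The sketch below represents an item as a tuple (head, rhs, dot); that representation and the helper names items, show and is_complete are choices made for this example, not notation from the slides.

```python
# Grammar from the slides: each variable maps to its right-hand sides.
GRAMMAR = {
    "S": ["Tc"],
    "T": ["TA", "A"],
    "A": ["aTb", "ab"],
}

def items(grammar):
    """List every item (head, rhs, dot position) of the grammar."""
    return [(head, rhs, dot)
            for head, bodies in grammar.items()
            for rhs in bodies
            for dot in range(len(rhs) + 1)]

def show(item):
    """Write an item with the • at its dot position."""
    head, rhs, dot = item
    return f"{head} → {rhs[:dot]}•{rhs[dot:]}"

def is_complete(item):
    """An item is complete if the • is the last symbol."""
    _, rhs, dot = item
    return dot == len(rhs)

all_items = items(GRAMMAR)
print(len(all_items))    # 15
print([show(i) for i in all_items if is_complete(i)])
# ['S → Tc•', 'T → TA•', 'T → A•', 'A → aTb•', 'A → ab•']
```

A production with k symbols on its right-hand side contributes k + 1 items, which is where the count of 15 comes from.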
Meaning of items
    S → Tc (1)
    T → TA (2) | A (3)
    A → aTb (4) | ab (5)
[Figure: snapshots of parsing abaabbc, annotated with the items A → a•Tb, A → a•b and A → aT•b at different positions of the •.]
Items represent possibilities at various stages of the parsing process
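Meaning of items
    S → Tc (1)
    T → TA (2) | A (3)
    A → aTb (4) | ab (5)
[Figure: snapshots of parsing abaabbc, annotated with the items A → a•Tb, A → a•b, A → aT•b and A → aTb•; once the • reaches the end of A → aTb, the corresponding A node is added to the parse tree.]
When a complete item occurs, a part of the parse tree is discovered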
LR(0) parsing
    A → aAb | ab
• Move from left to right
• Keep track of all possible valid items
• Prune the invalid items
• When a complete item occurs, build part of parse tree
valid items registry, step by step on input aabb:
    • a a b b    A → •aAb, A → •ab
    a • a b b    A → a•Ab, A → a•b, A → •aAb, A → •ab
    a a • b b    A → a•Ab, A → a•b, A → •aAb, A → •ab
    a a b • b    A → ab•     (complete: build the inner A)
    a A • b      A → aA•b
    a A b •      A → aAb•    (complete: build the outer A)
LR(0): two kinds of actions
    shift: no complete item in registry
        a • a b b    valid items registry: A → a•Ab, A → a•b, A → •aAb, A → •ab
    reduce: exactly one complete item
        a a b • b    valid items registry: A → ab•
LR(0) implementation: first take
    A → aAb | ab, input a a b b
    stack     action   valid items
    (empty)   S        A → •aAb, A → •ab
    a         S        A → a•Ab, A → a•b, A → •aAb, A → •ab
    aa        S        A → a•Ab, A → a•b, A → •aAb, A → •ab
    aab       R        A → ab•
    aA        S        A → aA•b
    aAb       R        A → aAb•
    A
valid item update rules
notation: a, b: terminals; A, B, C: variables; α, β, δ: mixed strings
initial valid items: S → •α
shift updates:
    read a: A → α•aβ becomes A → αa•β
    read b: A → α•aβ disappears
    read x: A → α•Bβ disappears
reduce updates:
    reduce B: A → α•Bβ becomes A → αB•β
    reduce C: A → α•Bβ disappears
common updates: whenever A → α•Bβ is valid, add B → •δ
valid item update rules
notation: a, b: terminals; A, B, C: variables; α, β, δ: mixed strings; X: terminal or variable
initial valid items: S → •α
shift updates (read X) and reduce updates (reduce X): A → α•Xβ becomes A → αX•β
common updates: whenever A → α•Bβ is valid, add B → •δ
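The update rules can be tried out on the grammar A → aAb | ab. The following is a minimal sketch under the assumption that an item is a tuple (head, rhs, dot) and that variables are exactly the keys of GRAMMAR; the function names closure, advance and initial are made up for this example.

```python
GRAMMAR = {"A": ["aAb", "ab"]}   # grammar from the slides
START = "A"

def closure(items, grammar):
    """Common update: if A → α•Bβ is valid, add B → •δ for every B → δ."""
    result, todo = set(items), list(items)
    while todo:
        _, rhs, dot = todo.pop()
        if dot < len(rhs) and rhs[dot] in grammar:
            for delta in grammar[rhs[dot]]:
                new = (rhs[dot], delta, 0)
                if new not in result:
                    result.add(new)
                    todo.append(new)
    return frozenset(result)

def advance(items, symbol, grammar):
    """Update on X: A → α•Xβ becomes A → αX•β, all other items disappear."""
    moved = {(h, r, d + 1) for (h, r, d) in items if d < len(r) and r[d] == symbol}
    return closure(moved, grammar)

def initial(grammar, start):
    """Initial valid items: S → •α for every production of the start variable."""
    return closure({(start, rhs, 0) for rhs in grammar[start]}, grammar)

def show(item):
    head, rhs, dot = item
    return f"{head} → {rhs[:dot]}•{rhs[dot:]}"

registry = initial(GRAMMAR, START)
for symbol in "aab":                 # read a, a, b
    registry = advance(registry, symbol, GRAMMAR)
print(sorted(show(i) for i in registry))   # ['A → ab•']
```

For a reduce, advance has to be applied to the registry that was valid before the reduced substring was read, not to the current one; keeping those earlier registries around is what the stack in the later slides is for.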
LR(0) parsing: NFA representation
notation: a, b: terminals; A, B, C: variables; α, β, δ: mixed strings; X: terminal or variable
    q0 --ε--> S → •α               for every item S → •α
    A → α•Xβ --X--> A → αX•β       for every item A → α•Xβ
    A → α•Cβ --ε--> C → •δ         for every pair of items A → α•Cβ, C → •δ
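NFA example
    A → aAb | ab
NFA alphabet is Σ = {a, b, A}
start state is q0
other states are items
transitions:
    q0 --ε--> A → •aAb
    q0 --ε--> A → •ab
    A → •aAb --a--> A → a•Ab --A--> A → aA•b --b--> A → aAb•
    A → •ab --a--> A → a•b --b--> A → ab•
    A → a•Ab --ε--> A → •aAb
    A → a•Ab --ε--> A → •ab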
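NFA to DFA conversion
    A → aAb | ab
DFA states are sets of NFA states (items):
    {A → •aAb, A → •ab} --a--> {A → a•Ab, A → a•b, A → •aAb, A → •ab}
    {A → a•Ab, A → a•b, A → •aAb, A → •ab} --a--> itself
    {A → a•Ab, A → a•b, A → •aAb, A → •ab} --b--> {A → ab•}
    {A → a•Ab, A → a•b, A → •aAb, A → •ab} --A--> {A → aA•b}
    {A → aA•b} --b--> {A → aAb•}
    all other transitions (for example b, A out of the start state; a, A out of {A → aA•b}) go to qdie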
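The DFA can also be computed by a program. The sketch below does not build the NFA first; it swaps in the usual shortcut of taking ε-closures of item sets directly, which should yield the same five states for this grammar. Items are tuples (head, rhs, dot), states are numbered from 0 in discovery order (so the numbers differ from those on the slides), qdie is represented by a missing table entry, and all function names are assumptions of this example.

```python
GRAMMAR = {"A": ["aAb", "ab"]}   # grammar from the slides
START = "A"

def closure(items, grammar):
    """Follow ε-transitions: from A → α•Cβ add C → •δ."""
    result, todo = set(items), list(items)
    while todo:
        _, rhs, dot = todo.pop()
        if dot < len(rhs) and rhs[dot] in grammar:
            for delta in grammar[rhs[dot]]:
                new = (rhs[dot], delta, 0)
                if new not in result:
                    result.add(new)
                    todo.append(new)
    return frozenset(result)

def goto(state, symbol, grammar):
    """Follow X-transitions A → α•Xβ to A → αX•β, then close under ε."""
    moved = {(h, r, d + 1) for (h, r, d) in state if d < len(r) and r[d] == symbol}
    return closure(moved, grammar)

def build_dfa(grammar, start):
    """Return (list of DFA states, transition table on state indices)."""
    first = closure({(start, rhs, 0) for rhs in grammar[start]}, grammar)
    states, transitions, todo = [first], {}, [first]
    while todo:
        state = todo.pop(0)
        # symbols that appear right after a • in this state
        for symbol in sorted({r[d] for (_, r, d) in state if d < len(r)}):
            target = goto(state, symbol, grammar)
            if target not in states:
                states.append(target)
                todo.append(target)
            transitions[(states.index(state), symbol)] = states.index(target)
    return states, transitions

states, transitions = build_dfa(GRAMMAR, START)
print(len(states))    # 5
print(transitions)
# {(0, 'a'): 1, (1, 'A'): 2, (1, 'a'): 1, (1, 'b'): 3, (2, 'b'): 4}
```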
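Shift states and reduce states
    state 1: A → •aAb, A → •ab
    state 2: A → a•Ab, A → a•b, A → •aAb, A → •ab
    state 3: A → aA•b
    state 4: A → aAb•
    state 5: A → ab•
    transitions: 1 --a--> 2, 2 --a--> 2, 2 --A--> 3, 2 --b--> 5, 3 --b--> 4
1, 2, 3 are shift states
4, 5 are reduce states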
LR(0) parsing: take 2
    A → aAb | ab, input a a b b
    stack     state   action
    (empty)   1       S
    a         2       S
    aa        2       S
    aab       5       R
    ?
How do we know what to reduce to?
remember state in stack!
    A → aAb | ab, input a a b b
    stack     state   action
    (empty)   1       S
    1a        2       S
    1a2a      2       S
    1a2a2b    5       R    (backtrack two steps)
    1a2A      3       S
    1a2A3b    4       R
PDA for LR(0) parsing
    A → aAb | ab, input a a b b
    stack     state   action
    (empty)   1       S
    1         2       S
    12        2       S
    122       5       R
    12        3       S
    123       4       R
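The trace above can be reproduced with a short program. This is a sketch, not a general LR(0) parser: the transition table and reduce table are typed in by hand from the DFA on the slides, and the names TRANSITIONS, REDUCE and parse, as well as the rule "accept when a reduce leads back to a state without a transition for the variable and the input is used up", are assumptions made for this example.

```python
# DFA from the slides for A → aAb | ab; missing entries lead to the die state.
TRANSITIONS = {(1, "a"): 2, (2, "a"): 2, (2, "A"): 3, (2, "b"): 5, (3, "b"): 4}
# Reduce states: state -> (variable, right-hand side of the complete item).
REDUCE = {4: ("A", "aAb"), 5: ("A", "ab")}

def parse(word):
    """Return the list of reductions if word is accepted, otherwise None."""
    stack, state, pos, reductions = [], 1, 0, []
    while True:
        if state in REDUCE:                          # reduce (R)
            head, rhs = REDUCE[state]
            reductions.append(f"{head} -> {rhs}")
            origin = stack[-len(rhs)]                # state len(rhs) steps back
            del stack[-len(rhs):]                    # backtrack that many steps
            if (origin, head) in TRANSITIONS:
                stack.append(origin)
                state = TRANSITIONS[(origin, head)]  # take the A transition
            else:
                # back at the start with nothing left to match
                return reductions if pos == len(word) else None
        else:                                        # shift (S)
            if pos == len(word) or (state, word[pos]) not in TRANSITIONS:
                return None                          # die state
            stack.append(state)                      # remember state in stack
            state = TRANSITIONS[(state, word[pos])]
            pos += 1

print(parse("aabb"))   # ['A -> ab', 'A -> aAb']
print(parse("aab"))    # None
```

On input aabb the stack goes through (empty), 1, 12, 122, 12, 123, matching the table. The list of reductions gives the parse tree bottom-up.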
PDA for LR(0) parsing
    A → aAb | ab
[Figure: the DFA with states 1 to 5 turned into a PDA. An ε, ε/$ transition leads from •A into the DFA, and an A, $/ε transition leads to A•.]
• In reduce state A → ab•: pop state_b, state_a; take A transition out of state_a; push state_a
• In reduce state A → aAb•: pop state_b, state_A, state_a; take A transition out of state_a; push state_a
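Example 1
    L = {w#w^R : w ∈ {a, b}*}
    A → aAa | bAb | #
NFA:
    q0 --ε--> A → •aAa,  q0 --ε--> A → •bAb,  q0 --ε--> A → •#
    A → •aAa --a--> A → a•Aa --A--> A → aA•a --a--> A → aAa•
    A → •bAb --b--> A → b•Ab --A--> A → bA•b --b--> A → bAb•
    A → •# --#--> A → #•
    A → a•Aa --ε--> A → •aAa, A → •bAb, A → •#
    A → b•Ab --ε--> A → •aAa, A → •bAb, A → •#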
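Example 1
    A → aAa | bAb | #
    stack     state   action
    (empty)   1       S
    1         2       S
    12        2       S
    122       6       R
    122       3       S
    1223      4       R
    12        3       S
    123       5       R
[Figure: DFA for the grammar. The start state contains A → •aAa, A → •bAb, A → •#. Reading a or b leads to a state containing A → a•Aa or A → b•Ab together with A → •aAa, A → •bAb, A → •#. Reading # leads to A → #•. An A transition leads to A → aA•a or A → bA•b, and the matching a or b then leads to A → aAa• or A → bAb•.]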
LR(0) grammars and deterministic PDAs
• The PDA for LR(0) parsing is deterministic
• Some CFLs require non-deterministic PDAs, e.g. L = {ww^R : w ∈ {a, b}*}
• What goes wrong when we do LR(0) parsing on L?
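Example 2
    L = {ww^R : w ∈ {a, b}*}
    A → aAa | bAb | ε
NFA:
    q0 --ε--> A → •aAa,  q0 --ε--> A → •bAb,  q0 --ε--> A → •
    A → •aAa --a--> A → a•Aa --A--> A → aA•a --a--> A → aAa•
    A → •bAb --b--> A → b•Ab --A--> A → bA•b --b--> A → bAb•
    A → a•Aa --ε--> A → •aAa, A → •bAb, A → •
    A → b•Ab --ε--> A → •aAa, A → •bAb, A → •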
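Example 2
    L = {ww^R : w ∈ {a, b}*}
    A → aAa | bAb | ε
[Figure: DFA for the grammar. The start state contains A → •aAa, A → •bAb, A → •. Reading a or b leads to a state containing A → a•Aa or A → b•Ab together with A → •aAa, A → •bAb, A → •. An A transition leads to A → aA•a or A → bA•b, and the matching a or b then leads to A → aAa• or A → bAb•.]
input: abba
shift-reduce conflict: the states that contain the complete item A → • also contain items that are not complete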
Parsing computer programs
    if (n == 0) { return x; } else { return x + 1; }
[Figure: the parse tree of the if statement without else. The root Statement expands to if ParExpression Statement; the other nodes shown are (Expression), Block, Expression, ExpressionRest, { BlockStatements }, BlockStatement, Statement, Primary, Infixop, Identifier and Literal, with leaves ==, return, ;, ID and INT_LIT.]
Parsing computer programs
    if (n == 0) { return x; } else { return x + 1; }
[Figure: parse tree in which the root Statement expands to if ParExpression Statement else Statement, with (Expression) under ParExpression and a Block under each Statement.]
LR(0) parsers cannot tell apart if ... then from if ... then ... else
When you can’t LR(0) parse
• Algorithm can perform two actions:
    no complete item is valid: shift (S)
    there is one valid item, and it is complete: reduce (R)
• What if:
    some valid items complete, some not: S / R conflict
    more than one valid complete item: R / R conflict
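The two kinds of conflict can be checked mechanically for the small grammars in these slides. The sketch below builds the sets of valid items with the same closure construction as the DFA and then classifies each set; the tuple representation of items (head, rhs, dot), the use of the empty string for an ε-production and the function names are assumptions of this example.

```python
def closure(items, grammar):
    """If A → α•Bβ is in the set, add B → •δ for every production B → δ."""
    result, todo = set(items), list(items)
    while todo:
        _, rhs, dot = todo.pop()
        if dot < len(rhs) and rhs[dot] in grammar:
            for body in grammar[rhs[dot]]:
                item = (rhs[dot], body, 0)
                if item not in result:
                    result.add(item)
                    todo.append(item)
    return frozenset(result)

def lr0_states(grammar, start):
    """All reachable sets of valid items (the DFA states, without qdie)."""
    first = closure({(start, body, 0) for body in grammar[start]}, grammar)
    states, todo = {first}, [first]
    while todo:
        state = todo.pop()
        for x in {r[d] for (_, r, d) in state if d < len(r)}:
            moved = {(h, r, d + 1) for (h, r, d) in state if d < len(r) and r[d] == x}
            target = closure(moved, grammar)
            if target not in states:
                states.add(target)
                todo.append(target)
    return states

def conflicts(grammar, start):
    """Return the kinds of conflict found: a subset of {'S/R', 'R/R'}."""
    kinds = set()
    for state in lr0_states(grammar, start):
        complete = [i for i in state if i[2] == len(i[1])]
        if len(complete) > 1:
            kinds.add("R/R")     # more than one valid complete item
        if complete and len(complete) < len(state):
            kinds.add("S/R")     # some valid items complete, some not
    return kinds

print(conflicts({"A": ["aAb", "ab"]}, "A"))         # set()
print(conflicts({"A": ["aAa", "bAb", "#"]}, "A"))   # set()
print(conflicts({"A": ["aAa", "bAb", ""]}, "A"))    # {'S/R'}
```

The third grammar is the one for {ww^R}: its start set already contains the complete item A → • next to A → •aAa and A → •bAb, so the parser cannot decide between shifting and reducing.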
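Hierarchy of context-free grammars
[Figure: nested classes of grammars, from largest to smallest]
    context-free grammars: parse using CYK algorithm (slow)
    LR(∞) grammars
    …
    LR(1) grammars
    LR(0) grammars: parse using LR(0) algorithm
    languages marked in the figure: java, perl, python, …
… to be continued…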