Week 3
• Questions / Concerns
• What’s due:
  • Lab1b due Friday at midnight
  • Lab1b check-off next week (schedule will be announced on Monday)
  • Homework #2 due next Monday (Draw a parse tree)
  • Homework #3 due next Wednesday (Define grammar for your language)
  • Homework #4 due next Thursday (Grammar modifications)
• Top down parser
• Grammar modifications
Structure of Compilers
skeletal source program
-> preprocessor -> modified source program
-> Lexical Analyzer (scanner) -> tokens
-> Syntax Analysis (Parser) -> syntactic structure
-> Semantic Analysis -> intermediate representation
-> Optimizer
-> Code Generator -> target machine code
Symbol Table (consulted by the phases above)
Parser
• Choose a type of parser
  • Top-Down parser
  • Bottom-Up parser
• Choose a parsing technique
  • Recursive Descent
  • Table driven parser (LL(1) or LR(1))
• Generate a grammar for your language
• Modify the grammar to fit the particular parsing technique
  • Remove lambda productions
  • Remove unit productions
  • Remove left recursion
  • Left factor the grammar
Parser
• The parser is just a matching tool
  • It matches the list of tokens against the grammar rules to determine whether they form legal constructs/statements.
  • Yes/No machine
• Context-Free
  • It doesn’t care about context (types); it cares only about syntax.
  • If it looks like an assignment statement, then it is an assignment statement. For example, the parser accepts the following even though the types do not match:
    int x;
    x = “Hello”;
Grammar #1
S -> aaSc | B
B -> bbbB | λ
Generate a parse tree for the input string aaaabbbcc
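A hand-drawn tree for Grammar #1 can be checked with a small recursive-descent sketch in Python. It assumes the empty alternative of B is the lambda production; the names parse_S, parse_B, and parse and the nested-tuple tree format are illustrative choices, not part of the lecture.

```python
def parse_S(s, i=0):
    """S -> aaSc | B. Returns (tree, next_index) or raises SyntaxError."""
    if s.startswith("aa", i):                 # lookahead picks S -> aaSc
        sub, j = parse_S(s, i + 2)
        if not s.startswith("c", j):
            raise SyntaxError(f"expected 'c' at position {j}")
        return ("S", "a", "a", sub, "c"), j + 1
    sub, j = parse_B(s, i)                    # otherwise S -> B
    return ("S", sub), j


def parse_B(s, i):
    """B -> bbbB | lambda."""
    if s.startswith("bbb", i):
        sub, j = parse_B(s, i + 3)
        return ("B", "b", "b", "b", sub), j
    return ("B", "λ"), i                      # lambda consumes no input


def parse(s):
    """Parse the whole string; leftover input is an error."""
    tree, j = parse_S(s)
    if j != len(s):
        raise SyntaxError(f"unexpected input at position {j}")
    return tree


print(parse("aaaabbbcc"))
```

Each tuple is one tree node: the first element is the non-terminal and the remaining elements are its children, in order.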
Grammar #2
S -> E
E -> E + E
E -> E * E
E -> a | b | c
Generate a parse tree for the input string a + b * c
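For Grammar #2 it can help to see how many different trees the rules allow. The sketch below is a brute-force enumeration, not a parsing technique from the lecture; the function name trees and the tuple encoding (left, operator, right) are assumptions made for illustration, and the S -> E step is left out because it adds only a root node.

```python
def trees(tokens):
    """All parse trees for E -> E + E | E * E | a | b | c over a token list."""
    if len(tokens) == 1:
        return [tokens[0]] if tokens[0] in ("a", "b", "c") else []
    result = []
    # Try every operator position as the top-level split point.
    for k in range(1, len(tokens) - 1):
        if tokens[k] in ("+", "*"):
            for left in trees(tokens[:k]):
                for right in trees(tokens[k + 1:]):
                    result.append((left, tokens[k], right))
    return result


for tree in trees(["a", "+", "b", "*", "c"]):
    print(tree)
# ('a', '+', ('b', '*', 'c'))
# (('a', '+', 'b'), '*', 'c')
```

The input has two distinct trees under these rules, so the grammar by itself does not say which operator binds tighter.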
Grammar #3 • Lua Grammar
Grammar
• Two formats
  • Context-Free Grammar
  • Extended Backus-Naur Form
Lua Example
EBNF: laststat ::= return [explist] | break
CFG:  Laststat -> return LaststatOptional | break
      LaststatOptional -> Explist | λ
EBNF: varlist ::= var {`,´ var}
CFG:  Varlist -> Var Varlist2
      Varlist2 -> `,´ Var Varlist2 | λ
Grammar
• Two formats
  • Context-Free Grammar
  • Extended Backus-Naur Form
Mini C example
EBNF: Program = Definition { Definition }
CFG:  program -> Definition MoreDefinitions
      MoreDefinitions -> Definition MoreDefinitions | λ
EBNF: Definition = Data_definition | Function_definition
CFG:  Definition -> Data_definition | Function_definition
EBNF: Function_definition = ['int'] Function_header Function_body
CFG:  Function_definition -> OptionalType Function_header Function_body
      OptionalType -> ‘int’ | λ
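The two EBNF shortcuts used in these examples, [x] for optional and {x} for repetition, each turn into one extra non-terminal with a lambda alternative. A minimal Python sketch of that rewriting follows; the helper names optional and repetition and the dict-of-lists grammar encoding (an empty list stands for lambda) are assumptions for illustration.

```python
def optional(new_name, body):
    """EBNF [body]: new_name -> body | lambda (the empty list stands for lambda)."""
    return {new_name: [list(body), []]}


def repetition(new_name, body):
    """EBNF {body}: new_name -> body new_name | lambda."""
    return {new_name: [list(body) + [new_name], []]}


# laststat ::= return [explist] | break
rules = {"Laststat": [["return", "LaststatOptional"], ["break"]]}
rules.update(optional("LaststatOptional", ["Explist"]))

# varlist ::= var {`,´ var}
rules["Varlist"] = [["Var", "Varlist2"]]
rules.update(repetition("Varlist2", [",", "Var"]))

for head, alternatives in rules.items():
    print(head, "->", " | ".join(" ".join(alt) or "λ" for alt in alternatives))
```

The repetition helper produces a right-recursive rule, which is the form a top-down parser can handle.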
Top-down parser
• Start with the start symbol of the grammar.
• Grab an input token and select a production rule.
• Use a “stack” to store the production rule.
• Try to parse that rule by matching input tokens.
• Keep going until all of the input tokens have been processed.
• If the rule is not the right one, put all the tokens back and try a different rule (backtracking).
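The steps above can be sketched as a tiny backtracking recognizer in Python. This is one possible reading of the procedure, not the lecture's implementation: the name derives, the dict-of-lists grammar encoding, and the use of Grammar #1 (with the empty alternative read as lambda) as test data are all assumptions.

```python
GRAMMAR = {
    "S": [["a", "a", "S", "c"], ["B"]],
    "B": [["b", "b", "b", "B"], []],      # [] is the lambda production
}


def derives(grammar, stack, tokens):
    """True if the symbols on the stack derive exactly the remaining tokens.

    The stack is a list with its top at index 0. This sketch assumes the
    grammar has no left recursion; otherwise it would never terminate.
    """
    if not stack:
        return not tokens                  # success only if all input is used
    top, rest = stack[0], stack[1:]
    if top not in grammar:                 # terminal: must match the next token
        return bool(tokens) and tokens[0] == top and derives(grammar, rest, tokens[1:])
    # Non-terminal: try each rule in turn. A failed attempt returns False with
    # `tokens` untouched, which is the "put the tokens back" step.
    return any(derives(grammar, list(rhs) + rest, tokens) for rhs in grammar[top])


print(derives(GRAMMAR, ["S"], list("aaaabbbcc")))   # True
print(derives(GRAMMAR, ["S"], list("aabbb")))       # False
```

Because every failed rule is retried from the same input position, the running time can grow quickly, which is why the later slides reshape the grammar so that one token of lookahead picks the rule.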
Top-down Parser • Ideal grammar: • Unique rule for each type of token. • One-token look ahead
One token look ahead
Stat -> local function Name Funcbody | local Namelist LocalOptional
• Based on one token, we should be able to pick a unique rule so that we don’t have to backtrack. Here both rules start with “local”, so that one token cannot decide between them.
• If we combine these 2 rules into one rule by factoring out the common part, we eliminate the need for backtracking.
One token look ahead
Stat -> local function Name Funcbody | local Namelist LocalOptional
• Left factor the grammar:
Stat -> local Morelocal
Morelocal -> function Name Funcbody | Namelist LocalOptional
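Left factoring is mechanical enough to sketch in Python. The function below performs a single factoring step on one non-terminal; the names common_prefix and left_factor, and the list-of-lists encoding of alternatives, are assumptions for illustration.

```python
def common_prefix(seqs):
    """Longest run of leading symbols shared by every sequence."""
    prefix = []
    for column in zip(*seqs):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix


def left_factor(alternatives, new_name):
    """One left-factoring step.

    Finds the first group of alternatives that share a leading symbol and
    replaces it with prefix + new_name. Returns (alternatives for the
    original non-terminal, alternatives for new_name); the second list is
    empty when there is nothing to factor.
    """
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[0] if alt else None, []).append(alt)
    for first, group in groups.items():
        if first is not None and len(group) > 1:
            prefix = common_prefix(group)
            kept = [alt for alt in alternatives if alt not in group]
            tails = [alt[len(prefix):] for alt in group]
            return kept + [prefix + [new_name]], tails
    return alternatives, []


stat = [["local", "function", "Name", "Funcbody"],
        ["local", "Namelist", "LocalOptional"]]
print(left_factor(stat, "Morelocal"))
```

An empty tail in the second list corresponds to a lambda alternative for the new non-terminal.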
Top-down Parser
• Ideal grammar:
  • Unique rule for each type of token.
  • One-token look ahead
• Minimize unit productions
  • Unit productions don’t parse tokens immediately; they require another production.
  • It’s hard to tell which tokens match a unit production, so there are more chances for backtracking.
Minimize Unit Productions
S -> aaSc
S -> B
B -> bbbB
B -> λ
[Parse tree: S has the single child B (the unit production); B expands to b b b B]
Minimize Unit Productions Exp -> nil | false | true | Number | String | `...´ | Functioncall | Prefixexp | Tableconstructor | Exp Binop Exp | Unop Exp
Remove Unit Productions
Before:
S -> aaSc
S -> B
B -> bbbB
B -> λ
After:
S -> aaSc
S -> bbbB
S -> λ
B -> bbbB
B -> λ
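The substitution shown on this slide can be sketched in Python as follows. This is a minimal sketch under assumptions: the function name remove_unit_productions and the dict-of-lists encoding (an empty list stands for lambda) are illustrative, and a unit production is taken to be an alternative consisting of exactly one non-terminal.

```python
def remove_unit_productions(grammar):
    """Replace every unit production A -> B with B's non-unit alternatives."""

    def is_unit(alt):
        return len(alt) == 1 and alt[0] in grammar

    def expand(head, seen):
        alts = []
        for alt in grammar[head]:
            if is_unit(alt):
                if alt[0] not in seen:            # guard against cycles such as A -> B, B -> A
                    alts.extend(expand(alt[0], seen | {alt[0]}))
            else:
                alts.append(alt)
        return alts

    result = {}
    for head in grammar:
        unique = []
        for alt in expand(head, {head}):
            if alt not in unique:                 # drop duplicates, keep order
                unique.append(alt)
        result[head] = unique
    return result


grammar = {
    "S": [["a", "a", "S", "c"], ["B"]],
    "B": [["b", "b", "b", "B"], []],
}
print(remove_unit_productions(grammar))
```

The result keeps B's own rules because B is still reachable through S -> bbbB.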
Minimize Unit Productions
Exp -> nil | false | true | Number | String | `...´ | Functioncall | Prefixexp | Tableconstructor | Exp Binop Exp | Unop Exp
Replacing Tableconstructor with its right-hand side:
Exp -> nil | false | true | Number | String | `...´ | Functioncall | Prefixexp | { Fieldlistoptional } | Exp Binop Exp | Unop Exp
Minimize Unit Productions
Exp -> nil | false | true | Number | String | `...´ | Functioncall | Prefixexp | Tableconstructor | Exp Binop Exp | Unop Exp
Replacing Functioncall and Tableconstructor with their right-hand sides:
Exp -> nil | false | true | Number | String | `...´ | Prefixexp Args | Prefixexp `:´ Name Args | Prefixexp | { Fieldlistoptional } | Exp Binop Exp | Unop Exp
Minimize Unit Productions
• More left factoring needed: after the substitutions, several alternatives of Exp begin with Prefixexp (Prefixexp Args | Prefixexp `:´ Name Args | Prefixexp).
Top-down Parser
• Ideal grammar:
  • Unique rule for each type of token.
  • One-token look ahead
• Minimize unit productions
  • Unit productions don’t parse tokens immediately; they require another production.
  • It’s hard to tell which tokens match a unit production, so there are more chances for backtracking.
• Lambda productions are okay, but we have to process them accordingly.
  • Removing lambdas always adds more rules.
  • It’s not possible to remove all lambda productions and still yield unique token-rule matching.
• Remove left recursion in the grammar.
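Grammar (left recursive vs. right recursive)
Right Recursion
A -> aA
A -> λ
Left Recursion
A -> Aa
A -> λ
Same grammar?
• Only non-recursive rule is λ
[Parse trees: with right recursion each A nests to the right of its a; with left recursion each A nests to the left of its a]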
Grammar (left recursive vs. right recursive)
Which one works for top down?
Left Recursion
A -> Aa
A -> λ
Right Recursion
A -> aA
A -> λ
[Parse trees for both grammars]
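Transcribing both grammars literally into recursive functions shows the difference. In this Python sketch the function names parse_A_right and parse_A_left are made up for illustration; each function is meant to return the index just past the tokens it matched.

```python
def parse_A_right(tokens, i=0):
    """A -> a A | lambda: consume one token, then recurse."""
    if i < len(tokens) and tokens[i] == "a":
        return parse_A_right(tokens, i + 1)
    return i                                   # lambda


def parse_A_left(tokens, i=0):
    """A -> A a | lambda, written exactly as the rule reads."""
    j = parse_A_left(tokens, i)                # recurses at the same position, consuming nothing
    if j < len(tokens) and tokens[j] == "a":   # never reached
        return j + 1
    return i


print(parse_A_right(list("aaa")))              # 3

try:
    parse_A_left(list("aaa"))
except RecursionError:
    print("left recursion never consumes a token")
```

The right-recursive version makes progress on every call; the left-recursive version calls itself before looking at any input, so Python stops it with a RecursionError.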
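Grammar (left recursive vs. right recursive)
Same grammar?
Left Recursion
A -> Aa
A -> b
Right Recursion
A -> aA
A -> b
• Non-recursive rules are not only λ
[Parse trees: the left-recursive grammar derives b followed by a's; the right-recursive grammar derives a's followed by b]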
Remove Left Recursion in the Grammar
• Example:
  A -> Aa   (A -> A… is a left recursive rule)
  A -> b
• Step 1: Make all left recursive rules right recursive, but give them a new non-terminal
  A -> Aa becomes X -> aX
• Step 2: Add a lambda production to the new non-terminal
  X -> λ
• Step 3: Identify all non-recursive rules.
  A -> b
• Step 4: Append the new non-terminal to the end of all non-recursive rules
  A -> bX
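The four steps can be sketched as one Python function. It handles only immediate left recursion (rules of the form A -> A…) and assumes no rule is just A -> A; the name remove_left_recursion and the list-of-lists encoding (an empty list stands for lambda) are illustrative.

```python
def remove_left_recursion(head, alternatives, new_name):
    """Rewrite head -> head x | y as head -> y new_name, new_name -> x new_name | lambda."""
    # Steps 1 and 3: split the rules into left recursive tails and the rest.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == head]
    others = [alt for alt in alternatives if not alt or alt[0] != head]
    if not recursive:
        return {head: alternatives}            # nothing to do
    return {
        # Step 4: append the new non-terminal to every non-recursive rule.
        head: [alt + [new_name] for alt in others],
        # Steps 1 and 2: right recursive rules plus a lambda production.
        new_name: [tail + [new_name] for tail in recursive] + [[]],
    }


print(remove_left_recursion("A", [["A", "a"], ["b"]], "X"))
# {'A': [['b', 'X']], 'X': [['a', 'X'], []]}
```

Applied to A -> Aa | b, this gives A -> bX and X -> aX | λ.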
Grammar (left recursive vs. right recursive)
Same grammar?
After removing left recursion:
A -> bX
X -> aX | λ
Original (left recursive):
A -> Aa
A -> b
• Non-recursive rules are not only λ
[Parse trees for both grammars]
Remove Left Recursion
Before:
S -> Sab
S -> c
S -> d
After:
X -> abX
X -> λ
S -> cX
S -> dX
Remove Left Recursion
Before:
PARAMLIST -> IDLIST : TYPE | PARAMLIST ; IDLIST : TYPE
After:
PARAMLIST2 -> ; IDLIST : TYPE PARAMLIST2
PARAMLIST2 -> λ
PARAMLIST -> IDLIST : TYPE PARAMLIST2
Remove Unit Production Example
Before:
S -> abSc
S -> A
S -> AB
A -> aA
A -> λ
B -> bbB
B -> λ
After:
S -> abSc
S -> aA
S -> λ
S -> AB
A -> aA
A -> λ
B -> bbB
B -> λ
Remove Unit Production Example
Before:
TERM -> FACTOR
FACTOR -> id | id ( EXPR_LIST ) | num | ( EXPRESSION ) | not FACTOR
After:
TERM -> id | id ( EXPR_LIST ) | num | ( EXPRESSION ) | not FACTOR
FACTOR -> id | id ( EXPR_LIST ) | num | ( EXPRESSION ) | not FACTOR
Left Factor Example
Before:
S -> abS
S -> aaA
S -> a
A -> bA
A -> λ
After:
S -> aX
X -> bS
X -> aA
X -> λ
A -> bA
A -> λ
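Once the grammar is left factored, one token of lookahead is enough to pick a rule, so no backtracking is needed. The Python sketch below parses the factored grammar (S -> aX, X -> bS | aA | λ, A -> bA | λ, reading the empty alternatives as lambda); the function names and the accepts helper are assumptions for illustration.

```python
def parse_S(s, i=0):
    """S -> a X"""
    if not s.startswith("a", i):
        raise SyntaxError(f"expected 'a' at position {i}")
    return parse_X(s, i + 1)


def parse_X(s, i):
    """X -> b S | a A | lambda, chosen by looking at one character."""
    lookahead = s[i:i + 1]
    if lookahead == "b":
        return parse_S(s, i + 1)
    if lookahead == "a":
        return parse_A(s, i + 1)
    return i                                   # lambda


def parse_A(s, i):
    """A -> b A | lambda"""
    if s.startswith("b", i):
        return parse_A(s, i + 1)
    return i                                   # lambda


def accepts(s):
    """True if the whole string is derived from S."""
    try:
        return parse_S(s) == len(s)
    except SyntaxError:
        return False


print(accepts("abaab"))   # True:  S -> abS, then S -> aaA with A -> bA
print(accepts("ab"))      # False: abS needs at least an 'a' for the inner S
```

Every function decides what to do from the next character alone, which is the one-token look ahead property the factoring was meant to achieve.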
Left Factor Example
Before:
EXPRESSION -> SIMPLE_EXPR | SIMPLE_EXPR relop SIMPLE_EXPR
After:
EXPRESSION -> SIMPLE_EXPR RestOfExp
RestOfExp -> λ | relop SIMPLE_EXPR
In-Class Exercise #5
• Remove Unit Production
  S -> abS | bSa | A | d
  A -> c | dA
• Left Factor this grammar
  FACTOR -> id | id ( EXPR_LIST ) | num | ( EXPRESSION ) | not FACTOR
• Remove Left recursion:
  SIMPLE_EXPR -> TERM | SIGN TERM | SIMPLE_EXPR addop TERM