Lab 3: Using ML-Yacc Zhong Zhuang dyzz@mail.ustc.edu.cn
How to write a parser?
• Write a parser by hand
• Use a parser generator
    • May not be as efficient as a hand-written parser
    • General and robust
• How does it work?
    [Diagram: a parser specification is fed to the parser generator, which produces the parser; the parser turns a stream of tokens into abstract syntax]
ML-Yacc specification
• Three parts again
    User Declarations: declare values available in the rule actions
    %%
    ML-Yacc Definitions: declare terminals and nonterminals; special declarations to resolve conflicts
    %%
    Rules: parser specified by CFG rules and associated semantic actions that generate abstract syntax
ML-Yacc Definitions
• specify type of positions
    %pos int * int
• specify terminal and nonterminal symbols
    %term IF | THEN | ELSE | PLUS | MINUS ...
    %nonterm prog | exp | op
• specify end-of-parse token
    %eop EOF
• specify start symbol (by default, the nonterminal on the LHS of the first rule)
    %start prog
A Simple ML-Yacc File
    %%
    %term NUM | PLUS | MUL | LPAR | RPAR | EOF
    %nonterm exp | fact | base
    %pos int
    %start exp
    %eop EOF
    %%
    exp  : fact ()
         | fact PLUS exp ()
    fact : base ()
         | base MUL fact ()
    base : NUM ()
         | LPAR exp RPAR ()
• Definitions section: grammar symbols (the end-of-parse token EOF is declared as a terminal too)
• Rules section: grammar rules; the semantic actions in parentheses currently do nothing
• each nonterminal may have a semantic value associated with it
• when the parser reduces with (X ::= s)
    • a semantic action is executed
    • it uses the semantic values of the symbols in s
• when parsing completes successfully
    • the parser returns the semantic value associated with the start symbol
    • usually a syntax tree
• to use semantic values during parsing, we must declare symbol types:
    %term NUM of int | PLUS | MUL | ...
    %nonterm exp of int | fact of int | base of int
• the type of a semantic action must match the type declared for the nonterminal on the left-hand side of its rule
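A Simple ML-Yacc File with Action
    %%
    %term NUM of int | PLUS | MUL | LPAR | RPAR | EOF
    %nonterm exp of int | fact of int | base of int
    %pos int
    %start exp
    %eop EOF
    %%
    exp  : fact (fact)
         | fact PLUS exp (fact + exp)
    fact : base (base)
         | base MUL base (base1 * base2)
    base : NUM (NUM)
         | LPAR exp RPAR (exp)
• Definitions section: grammar symbols with type declarations
• Rules section: grammar rules with semantic actions that compute an integer result
• Inside an action, a symbol name stands for that symbol's semantic value; when a symbol occurs more than once on the right-hand side, the occurrences are numbered (base1, base2)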
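To make the effect of the semantic actions concrete, here is a minimal hand-written sketch in Standard ML that computes the same integer for the same grammar. It is a recursive-descent parser, not the table-driven parser ML-Yacc generates, and every name in it (token, ParseError, parse, and so on) is an assumption made for illustration. Each function returns the semantic value of its nonterminal together with the tokens that remain.

```sml
(* Illustrative token type; in the lab the tokens come from ML-Lex. *)
datatype token = NUM of int | PLUS | MUL | LPAR | RPAR | EOF

exception ParseError

(* exp : fact | fact PLUS exp *)
fun exp ts =
  let val (f, rest) = fact ts
  in case rest of
       PLUS :: rest' =>
         let val (e, rest'') = exp rest' in (f + e, rest'') end
     | _ => (f, rest)
  end

(* fact : base | base MUL base *)
and fact ts =
  let val (b1, rest) = base ts
  in case rest of
       MUL :: rest' =>
         let val (b2, rest'') = base rest' in (b1 * b2, rest'') end
     | _ => (b1, rest)
  end

(* base : NUM | LPAR exp RPAR *)
and base (NUM n :: rest) = (n, rest)
  | base (LPAR :: rest) =
      (case exp rest of
         (e, RPAR :: rest') => (e, rest')
       | _ => raise ParseError)
  | base _ = raise ParseError

(* The value of the start symbol is the result of the whole parse. *)
fun parse ts =
  case exp ts of
    (v, [EOF]) => v
  | _ => raise ParseError

(* 2 + 3 * 4 *)
val result = parse [NUM 2, PLUS, NUM 3, MUL, NUM 4, EOF]
```

Here result is 14: the value of fact (3 * 4) is computed first and then added to 2, just as the actions (base1 * base2) and (fact + exp) would do when the generated parser reduces.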
Conflicts in ML-Yacc
• We often write ambiguous grammars
• Example
    exp ::= NUM | exp PLUS exp | exp MUL exp | LPAR exp RPAR
    • Tokens from lexer
        • NUM PLUS NUM MUL NUM
    • State of parser
        • E+E on the stack, with MUL NUM still to be read
    • If we shift, the result is E+(E*E)
    • If we reduce, the result is (E+E)*E
This is a shift-reduce conflict
• We want E+(E*E), because “*” has higher precedence than “+”
• Another shift-reduce conflict
    • Tokens from lexer
        • NUM PLUS NUM PLUS NUM
    • State of parser
        • E+E on the stack, with PLUS NUM still to be read
    • If we shift, the result is E+(E+E)
    • If we reduce, the result is (E+E)+E
Deal with shift-reduce conflicts
• In this case, we need to reduce, because “+” is left associative
• Deal with it!
    • let ML-Yacc complain
        • the default choice is to shift when it encounters a shift-reduce conflict
        • BAD: programmer intentions unclear; harder to debug other parts of your grammar; generally inelegant
    • rewrite the grammar to eliminate ambiguity
        • can be complicated and less clear
    • use Yacc precedence directives
        • %left, %right, %nonassoc
Precedence and Associativity
• precedence of a terminal is based on the order in which associativity is specified: terminals declared in later %left, %right, or %nonassoc directives have higher precedence
• precedence of a rule is the precedence of its right-most terminal
    • eg: precedence of (E ::= E + E) == prec(+)
• a shift-reduce conflict is resolved as follows, where terminal is the next token to be read
    • prec(terminal) > prec(rule) ==> shift
    • prec(terminal) < prec(rule) ==> reduce
    • prec(terminal) = prec(rule) ==>
        • assoc(terminal) = left ==> reduce
        • assoc(terminal) = right ==> shift
        • assoc(terminal) = nonassoc ==> report as error
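The resolution rules above can be written down as a small Standard ML function. This is only a model of the decision table on this slide, not ML-Yacc's actual implementation; the names assoc, action, and resolve, and the use of integers for precedence levels, are assumptions made for illustration.

```sml
datatype assoc = Left | Right | NonAssoc
datatype action = Shift | Reduce | ReportError

(* termPrec  : precedence of the next terminal to be read
   termAssoc : associativity declared for that terminal
   rulePrec  : precedence of the rule, i.e. of its right-most terminal *)
fun resolve (termPrec : int, termAssoc : assoc, rulePrec : int) : action =
  if termPrec > rulePrec then Shift
  else if termPrec < rulePrec then Reduce
  else
    case termAssoc of
      Left => Reduce
    | Right => Shift
    | NonAssoc => ReportError

(* Assume PLUS is declared first (level 1) and MUL later (level 2),
   both left associative. *)
val plusPrec = 1
val mulPrec = 2

(* Stack holds E+E, next token is MUL: MUL binds tighter, so shift. *)
val seeingMul = resolve (mulPrec, Left, plusPrec)

(* Stack holds E+E, next token is PLUS: equal precedence, left assoc, so reduce. *)
val seeingPlus = resolve (plusPrec, Left, plusPrec)
```

With these assumed levels, seeingMul is Shift, giving E+(E*E), and seeingPlus is Reduce, giving (E+E)+E, which matches the two conflicts discussed earlier.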
    datatype exp = Int of int
                 | Add of exp * exp
                 | Sub of exp * exp
                 | Mul of exp * exp
                 | Div of exp * exp
    %%
    %left PLUS MINUS
    %left MUL DIV
    %%
    exp : NUM (Int NUM)
        | exp PLUS exp (Add (exp1, exp2))
        | exp MINUS exp (Sub (exp1, exp2))
        | exp MUL exp (Mul (exp1, exp2))
        | exp DIV exp (Div (exp1, exp2))
        | LPAR exp RPAR (exp)
• MUL and DIV are declared after PLUS and MINUS, so they have higher precedence
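Once the parser returns a syntax tree of this type, ordinary Standard ML code can walk it. The sketch below is one possible evaluator; the function name eval and the two sample trees are assumptions for illustration, and Div is taken to mean integer division.

```sml
datatype exp = Int of int
             | Add of exp * exp
             | Sub of exp * exp
             | Mul of exp * exp
             | Div of exp * exp

(* Evaluate a syntax tree to an integer. *)
fun eval (Int n) = n
  | eval (Add (a, b)) = eval a + eval b
  | eval (Sub (a, b)) = eval a - eval b
  | eval (Mul (a, b)) = eval a * eval b
  | eval (Div (a, b)) = eval a div eval b

(* 1 + 2 * 3 with MUL given higher precedence: 1 + (2 * 3) *)
val intended = Add (Int 1, Mul (Int 2, Int 3))

(* The other grouping of the same tokens: (1 + 2) * 3 *)
val otherGrouping = Mul (Add (Int 1, Int 2), Int 3)

val v1 = eval intended
val v2 = eval otherGrouping
```

Here v1 is 7 and v2 is 9, so the two ways of resolving the shift-reduce conflict really do produce different programs; the precedence directives select the first tree.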
Reduce-reduce Conflict
• This kind of conflict is more difficult to deal with
• Example
    sequence  ::= (empty)
                | maybeword
                | sequence word
    maybeword ::= (empty)
                | word
• When we get a “word” from the lexer, there is more than one way to get “sequence” from it
    • word -> maybeword -> sequence (rule 1)
    • empty -> sequence, then sequence word -> sequence (rule 2)
Reduce-reduce Conflict
• A reduce-reduce conflict means there are two or more rules that apply to the same sequence of input. This usually indicates a serious error in the grammar.
• ML-Yacc reduces by the first rule
• Generally, reduce-reduce conflicts are not allowed in your ML-Yacc file
• We need to fix our grammar
    sequence ::= (empty)
               | sequence word
Summary of conflicts
• Shift-reduce conflict
    • resolve with precedence and associativity
    • shift by default
• Reduce-reduce conflict
    • reduce by the first rule
    • Not allowed!
Lab3
• Your job is to finish a parser for the C language
    • Input: a “.c” file
    • Output: “Success!” if the “.c” file is correct
• File description
    • c.lex
    • c.grm
    • main.sml
    • call-main.sml
    • sources.cm
    • lab3.mlb
    • test.c
Using ML-Yacc
• Read the ML-Yacc Manual
• Run
    • Once you finish “c.grm” and “c.lex”
    • In command-line (using MLton’s tools):
        mlyacc c.grm
        mllex c.lex
    • we will get
        • “c.grm.sig”, “c.grm.sml”, “c.grm.desc”, “c.lex.sml”
• Then compile Lab3
    • Start SML/NJ and run: CM.make "sources.cm";
    • or in command-line: mlton lab3.mlb
• To run lab3
    • In SML/NJ: Main.parse "test.c";
    • or in command-line: lab3 test.c
“Debug” ML-Yacc File
• When you run mlyacc, you’ll see error messages if your ML-Yacc file has conflicts. For example,
    mlyacc c.grm
    2 shift/reduce conflicts
• Open the file “c.grm.desc” (this file is generated by mlyacc)
    • The beginning of this file lists the conflicts
    • the rest lists all the states
    • rule 12 means the rule numbered 12 in your ML-Yacc file, counting from 0
    2 shift/reduce conflicts
    error: state 0: shift/reduce conflict (shift MYSTRUCT, reduce by rule 12)
    error: state 1: shift/reduce conflict (shift MYSTRUCT, reduce by rule 12)
    state 0:
        prog : . structs vdecs preds funcs
        MYSTRUCT shift 3
        prog goto 429
        structs goto 2
        structdec goto 1
        . reduce by rule 12
Use ML-Lex with ML-Yacc
• Most of the work in “c.lex” this time can be copied from Lab2
    • You can re-use regular expressions and lexical rules
• Difference from Lab2
    • You have to define the tokens in “c.grm”
        %term INT of int | EOF
    • The “%term” declaration in “c.grm” is automatically turned into the token signature in “c.grm.sig”
    signature C_TOKENS =
    sig
        type ('a,'b) token
        type svalue
        val EOF: 'a * 'a -> (svalue,'a) token
        val INT: (int) * 'a * 'a -> (svalue,'a) token
    end
Hints • Read ML-Yacc Manual • Read the language specification • Test a lot!