Lab 3: Using ML-Yacc

Zhong Zhuang dyzz@mail.ustc.edu.cn

How to write a parser?
• Write a parser by hand
• Use a parser generator
  • May not be as efficient as a hand-written parser
  • General and robust
• How does it work?
  • parser specification -> parser generator -> parser
  • stream of tokens -> parser -> abstract syntax

ML-Yacc specification
• Three parts again

User Declarations: declare values available in the rule actions
%%
ML-Yacc Definitions: declare terminals and non-terminals; special declarations to resolve conflicts
%%
Rules: parser specified by CFG rules and associated semantic actions that generate abstract syntax
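The three-part layout can be sketched as a small but complete specification. This is only an illustration: the name Sum, the helper add, and the token set are assumptions, not part of the lab files.

```sml
(* Part 1: user declarations. Plain SML, visible inside the rule actions. *)
fun add (a, b) = a + b   (* hypothetical helper used by an action below *)

%%
(* Part 2: ML-Yacc definitions. *)
%name Sum                          (* hypothetical name; prefixes generated names *)
%pos int                           (* type of the positions attached to tokens *)
%term NUM of int | PLUS | EOF      (* terminals; NUM carries an int *)
%nonterm start of int | sum of int (* nonterminals and their value types *)
%start start
%eop EOF                           (* token that ends the input *)
%noshift EOF

%%
(* Part 3: rules. Each alternative is followed by its action in parentheses. *)
start : sum             (sum)
sum   : NUM             (NUM)
      | sum PLUS NUM    (add (sum, NUM))
```

The grammar is deliberately left-recursive and unambiguous, so mlyacc should report no conflicts for it.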

• Each nonterminal may have a semantic value associated with it
• When the parser reduces with (X ::= s)
  • a semantic action is executed
  • the action uses semantic values from the symbols in s
• When parsing is completed successfully
  • the parser returns the semantic value associated with the start symbol
  • usually a syntax tree
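A minimal sketch of semantic actions that build a syntax tree, assuming a hypothetical companion structure Absyn and a hypothetical specification file exp.grm (neither is one of the lab files). The two files are shown in one block for brevity.

```sml
(* absyn.sml: hypothetical file defining the syntax tree *)
structure Absyn =
struct
  datatype exp = Num of int
               | Add of exp * exp
end

(* exp.grm: hypothetical ML-Yacc specification, a separate file *)
structure A = Absyn

%%
%name Exp
%pos int
%term NUM of int | PLUS | EOF
%nonterm start of A.exp | exp of A.exp
%start start
%eop EOF
%noshift EOF

%%
(* Reducing by a rule runs the action in parentheses. Inside an action,
   a symbol name such as exp or NUM denotes that symbol's semantic value. *)
start : exp             (exp)
exp   : NUM             (A.Num NUM)
      | exp PLUS NUM    (A.Add (exp, A.Num NUM))
```

With this sketch, the input NUM(1) PLUS NUM(2) would yield the value A.Add (A.Num 1, A.Num 2) for the start symbol.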

Conflicts in ML-Yacc
• We often write ambiguous grammars
• Example

  exp ::= NUM
        | exp PLUS exp
        | exp MUL exp
        | LPAR exp RPAR

  • Tokens from lexer: NUM PLUS NUM MUL NUM
  • State of parser: E+E, with MUL NUM still to be read
  • If we shift, the result is E+(E*E)
  • If we reduce, the result is (E+E)*E

This is a shift-reduce conflict
• We want E+(E*E), because “*” has higher precedence than “+”
• Another shift-reduce conflict
  • Tokens from lexer: NUM PLUS NUM PLUS NUM
  • State of parser: E+E, with PLUS NUM still to be read
  • If we shift, the result is E+(E+E)
  • If we reduce, the result is (E+E)+E

Deal with shift-reduce conflicts
• In this case, we need to reduce, because “+” is left associative
• Deal with it!
  • Let ML-Yacc complain
    • the default choice is to shift when it encounters a shift-reduce conflict
    • BAD: programmer intentions unclear; harder to debug other parts of your grammar; generally inelegant
  • Rewrite the grammar to eliminate ambiguity
    • can be complicated and less clear
  • Use Yacc precedence directives
    • %left, %right, %nonassoc
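As an illustration of the "rewrite the grammar" option, here is a sketch of the expression grammar split into one nonterminal per precedence level. The names Calc, prod and atom are assumptions made for this example.

```sml
(* Hypothetical specification: ambiguity removed by layering the grammar. *)
%%
%name Calc
%pos int
%term NUM of int | PLUS | MUL | LPAR | RPAR | EOF
%nonterm start of int | exp of int | prod of int | atom of int
%start start
%eop EOF
%noshift EOF

%%
start : exp               (exp)

(* Lowest level: sums. Left recursion makes PLUS left associative. *)
exp   : exp PLUS prod     (exp + prod)
      | prod              (prod)

(* Middle level: products, so MUL binds tighter than PLUS. *)
prod  : prod MUL atom     (prod * atom)
      | atom              (atom)

(* Highest level: numbers and parenthesized expressions. *)
atom  : NUM               (NUM)
      | LPAR exp RPAR     (exp)
```

No precedence directives are needed here, at the cost of two extra nonterminals and a grammar that is further from the way the language is usually described.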

Precedence and Associativity
• The precedence of a terminal is based on the order in which associativity is declared: later %left, %right and %nonassoc declarations give higher precedence
• The precedence of a rule is the precedence of its right-most terminal
  • e.g. precedence of (E ::= E + E) == prec(+)
• A shift-reduce conflict is resolved as follows
  • prec(terminal) > prec(rule) ==> shift
  • prec(terminal) < prec(rule) ==> reduce
  • prec(terminal) = prec(rule) ==>
    • assoc(terminal) = left ==> reduce
    • assoc(terminal) = right ==> shift
    • assoc(terminal) = nonassoc ==> report as error
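The resolution rules above can be applied to the ambiguous expression grammar from the earlier slides. The sketch below keeps that grammar and adds two precedence directives; the name Calc and the integer-valued actions are assumptions for illustration.

```sml
(* Hypothetical specification: ambiguous grammar plus precedence directives. *)
%%
%name Calc
%pos int
%term NUM of int | PLUS | MUL | LPAR | RPAR | EOF
%nonterm start of int | exp of int
%start start
%eop EOF
%noshift EOF

(* Declared from lowest to highest precedence: MUL binds tighter than PLUS. *)
%left PLUS
%left MUL

%%
start : exp              (exp)
exp   : NUM              (NUM)
      | exp PLUS exp     (exp1 + exp2)
      | exp MUL exp      (exp1 * exp2)
      | LPAR exp RPAR    (exp)

(* How the two conflicts from the slides are resolved:
   - E+E parsed, next token MUL:
       prec(MUL) > prec(exp PLUS exp) = prec(PLUS)  ==> shift,  giving E+(E*E)
   - E+E parsed, next token PLUS:
       equal precedence and PLUS is %left           ==> reduce, giving (E+E)+E *)
```

When a symbol occurs more than once on a right-hand side, the actions refer to the occurrences as exp1, exp2, and so on.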

Reduce-reduce Conflict
• This kind of conflict is more difficult to deal with
• Example

  sequence  ::= (empty)
              | maybeword
              | sequence word
  maybeword ::= (empty)
              | word

  • When we get a “word” from the lexer,
    • word -> maybeword -> sequence (rule 1)
    • empty -> sequence, then sequence word -> sequence (rule 2)
  • We have more than one way to get “sequence” from the input “word”

Reduce-reduce Conflict
• A reduce-reduce conflict means there are two or more rules that apply to the same sequence of input. This usually indicates a serious error in the grammar.
• ML-Yacc reduces by the first rule
• Generally, reduce-reduce conflicts are not allowed in your ML-Yacc file
• We need to fix our grammar

  sequence ::= (empty)
             | sequence word
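For reference, the fixed grammar could be written in ML-Yacc roughly as follows. The name Words, the WORD token carrying a string, and the list-building actions are assumptions for this sketch.

```sml
(* Hypothetical specification for the repaired "sequence" grammar. *)
%%
%name Words
%pos int
%term WORD of string | EOF
%nonterm start of string list | sequence of string list
%start start
%eop EOF
%noshift EOF

%%
(* Words are consed onto the front as they are read, so reverse at the end. *)
start    : sequence          (rev sequence)

(* An empty right-hand side is written by giving only the action. *)
sequence :                   ([])
         | sequence WORD     (WORD :: sequence)
```

Each input now has exactly one derivation: every word is added by the second alternative, and the empty alternative is used once at the start.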

Summary of conflicts
• Shift-reduce conflict
  • resolved by precedence and associativity
  • shift by default
• Reduce-reduce conflict
  • reduce by the first rule
  • Not allowed!

Lab3
• Your job is to finish a parser for the C language
  • Input: a “.c” file
  • Output: “Success!” if the “.c” file is correct
• File description
  • c.lex
  • c.grm
  • main.sml
  • call-main.sml
  • sources.cm
  • lab3.mlb
  • test.c

Using ML-Yacc
• Read the ML-Yacc Manual
• Run
  • Once you finish “c.grm” and “c.lex”
  • In command-line (use MLton’s mlyacc and mllex):
    • mlyacc c.grm
    • mllex c.lex
  • we will get
    • “c.grm.sig”, “c.grm.sml”, “c.grm.desc”, “c.lex.sml”
• Then compile Lab3
  • Start SML/NJ and run CM.make "sources.cm";
  • or in command-line, mlton lab3.mlb
• To run lab3
  • In SML/NJ, Main.parse "test.c";
  • or in command-line, lab3 test.c

“Debug” ML-Yacc File
• When you run mlyacc, you’ll see error messages if your ML-Yacc file has conflicts. For example,
  • mlyacc c.grm
  • 2 shift/reduce conflicts
• Open the file “c.grm.desc” (this file is generated by mlyacc)
  • The beginning of this file (shown below) reports the conflicts
  • The rest describes all the states
  • rule 12 means the rule numbered 12 (counting from 0) in your ML-Yacc file

  2 shift/reduce conflicts

  error:  state 0: shift/reduce conflict (shift MYSTRUCT, reduce by rule 12)
  error:  state 1: shift/reduce conflict (shift MYSTRUCT, reduce by rule 12)

  state 0:

      prog : . structs vdecs preds funcs

      MYSTRUCT    shift 3

      prog        goto 429
      structs     goto 2
      structdec   goto 1

      .           reduce by rule 12

Use ML-Lex with ML-Yacc
• Most of the work in “c.lex” this time can be copied from Lab2
  • You can re-use regular expressions and lexical rules
• Difference from Lab2
  • You have to define the tokens in “c.grm”
    • %term INT of int | EOF
  • The tokens declared with “%term” in “c.grm” automatically appear as constructor functions in “c.grm.sig”

  signature C_TOKENS =
  sig
    type ('a,'b) token
    type svalue
    val EOF: 'a * 'a -> (svalue,'a) token
    val INT: (int) * 'a * 'a -> (svalue,'a) token
  end
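A sketch of how “c.lex” might use these token functions, following the pattern of the calculator example in the ML-Yacc manual. The functor name CLexFun, the pos counter, and the rules shown are assumptions; the real “c.lex” needs many more rules.

```sml
(* User declarations: types that a lexer used with ML-Yacc must define. *)
structure Tokens = Tokens
type pos = int
type svalue = Tokens.svalue
type ('a,'b) token = ('a,'b) Tokens.token
type lexresult = (svalue,pos) token

val pos = ref 0                        (* hypothetical line counter *)
fun eof () = Tokens.EOF (!pos, !pos)   (* called when the input ends *)

%%
%header (functor CLexFun(structure Tokens: C_TOKENS));
digit=[0-9];
ws=[\ \t];

%%
\n       => (pos := !pos + 1; lex());
{ws}+    => (lex());
{digit}+ => (Tokens.INT (valOf (Int.fromString yytext), !pos, !pos));
.        => (print ("bad character " ^ yytext ^ "\n"); lex());
```

Each token function takes the semantic value (if the token has one) followed by the left and right positions, matching the types in the C_TOKENS signature above.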

Hints
• Read the ML-Yacc Manual
• Read the language specification
• Test a lot!
