
Lab 3: Using ML-Yacc


  1. Lab 3: Using ML-Yacc Zhong Zhuang dyzz@mail.ustc.edu.cn

  2. How to write a parser?
     • Write a parser by hand
     • Use a parser generator
       • May not be as efficient as a hand-written parser
       • General and robust
     • How does it work?
       • Parser specification → parser generator → parser
       • Stream of tokens → parser → abstract syntax

  3. ML-Yacc specification
     • Three parts again, separated by %%

         User Declarations: declare values available in the rule actions
         %%
         ML-Yacc Definitions: declare terminals and nonterminals; special declarations to resolve conflicts
         %%
         Rules: parser specified by CFG rules and associated semantic actions that generate abstract syntax

  4. ML-Yacc Definitions
     • Specify the type of positions

         %pos int * int

     • Specify terminal and nonterminal symbols

         %term IF | THEN | ELSE | PLUS | MINUS ...
         %nonterm prog | exp | op

     • Specify the end-of-parse token

         %eop EOF

     • Specify the start symbol (by default, the nonterminal on the LHS of the first rule)

         %start prog

  5. A Simple ML-Yacc File

         %%
         %term NUM | PLUS | MUL | LPAR | RPAR
         %nonterm exp | fact | base
         %pos int
         %start exp
         %eop EOF
         %%
         exp  : fact               ()
              | fact PLUS exp      ()
         fact : base               ()
              | base MUL fact      ()
         base : NUM                ()
              | LPAR exp RPAR      ()

     • The %term and %nonterm lines declare the grammar symbols
     • The part after the second %% holds the grammar rules
     • The semantic actions in parentheses currently do nothing

  6. Each nonterminal may have a semantic value associated with it
     • When the parser reduces with (X ::= s)
       • a semantic action is executed
       • the action uses semantic values from the symbols in s
     • When parsing completes successfully
       • the parser returns the semantic value associated with the start symbol
       • usually a syntax tree

  7. To use semantic values during parsing, we must declare symbol types:

         %term NUM of int | PLUS | MUL | ...
         %nonterm exp of int | fact of int | base of int

     • The type of a semantic action must match the type declared for the nonterminal of its rule

  8. A Simple ML-Yacc File with Actions

         %%
         %term NUM of int | PLUS | MUL | LPAR | RPAR
         %nonterm exp of int | fact of int | base of int
         %pos int
         %start exp
         %eop EOF
         %%
         exp  : fact               (fact)
              | fact PLUS exp      (fact + exp)
         fact : base               (base)
              | base MUL base      (base1 * base2)
         base : NUM                (NUM)
              | LPAR exp RPAR      (exp)

     • The grammar symbols now carry type declarations
     • The grammar rules carry semantic actions that compute an integer result
     • When a symbol occurs more than once in a rule, its occurrences are referred to by numbered names (base1, base2)
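
The way the semantic actions of slide 8 combine values can be mimicked by hand. The following is a minimal sketch in Python, used here only for illustration: it is a hand-written recursive-descent evaluator, not the LALR parser that ML-Yacc generates, and the token encoding (pairs such as `("NUM", 3)`) and all function names are assumptions. Each function corresponds to one nonterminal and returns the same integer its action would produce.

```python
# Hypothetical token encoding: ("NUM", 3), ("PLUS", None), ("MUL", None), ...

def parse_exp(tokens, i):
    """exp : fact (fact) | fact PLUS exp (fact + exp)"""
    fact, i = parse_fact(tokens, i)
    if i < len(tokens) and tokens[i][0] == "PLUS":
        exp, i = parse_exp(tokens, i + 1)
        return fact + exp, i
    return fact, i


def parse_fact(tokens, i):
    """fact : base (base) | base MUL base (base1 * base2)"""
    base1, i = parse_base(tokens, i)
    if i < len(tokens) and tokens[i][0] == "MUL":
        base2, i = parse_base(tokens, i + 1)
        return base1 * base2, i
    return base1, i


def parse_base(tokens, i):
    """base : NUM (NUM) | LPAR exp RPAR (exp)"""
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    kind, value = tokens[i]
    if kind == "NUM":
        return value, i + 1
    if kind == "LPAR":
        exp, i = parse_exp(tokens, i + 1)
        if i >= len(tokens) or tokens[i][0] != "RPAR":
            raise SyntaxError("expected RPAR")
        return exp, i + 1
    raise SyntaxError("unexpected token " + kind)


def parse(tokens):
    """Return the semantic value of the start symbol exp."""
    value, i = parse_exp(tokens, 0)
    if i != len(tokens):
        raise SyntaxError("trailing tokens")
    return value
```

For the token list of `2 + 3 * 4`, `parse` returns the value of the start symbol, just as the generated parser returns the semantic value of `exp`.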

  9. Conflicts in ML-Yacc
     • We often write ambiguous grammars
     • Example

         exp ::= NUM | exp PLUS exp | exp MUL exp | LPAR exp RPAR

       • Tokens from lexer: NUM PLUS NUM MUL NUM
       • State of parser: E+E (MUL NUM still to be read)

  10. Conflicts in ML-Yacc
      • We often write ambiguous grammars
      • Example

          exp ::= NUM | exp PLUS exp | exp MUL exp | LPAR exp RPAR

        • Tokens from lexer: NUM PLUS NUM MUL NUM
        • State of parser: E+E (MUL NUM still to be read)
        • If we shift, the result is E+(E*E)

  11. Conflicts in ML-Yacc
      • We often write ambiguous grammars
      • Example

          exp ::= NUM | exp PLUS exp | exp MUL exp | LPAR exp RPAR

        • Tokens from lexer: NUM PLUS NUM MUL NUM
        • State of parser: E+E (MUL NUM still to be read)
        • If we reduce, the result is (E+E)*E

  12. This is a shift-reduce conflict
      • We want E+(E*E), because "*" has higher precedence than "+"
      • Another shift-reduce conflict
        • Tokens from lexer: NUM PLUS NUM PLUS NUM
        • State of parser: E+E (PLUS NUM still to be read)
        • If we shift, the result is E+(E+E)
        • If we reduce, the result is (E+E)+E
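
The two outcomes can be reproduced with a toy shift-reduce loop. This is a sketch in Python under stated assumptions: it handles only the parenthesis-free part of the ambiguous grammar (E ::= NUM | E + E | E * E), expects well-formed input, and is not how ML-Yacc's table-driven parser is implemented. The `on_conflict` parameter is hypothetical and simply fixes the choice made at every shift-reduce conflict.

```python
OPERATORS = ("+", "*")


def is_operator(item):
    return isinstance(item, str) and item in OPERATORS


def shift_reduce(tokens, on_conflict):
    """Parse ints and "+"/"*" tokens into nested (op, left, right) tuples.

    on_conflict is "shift" or "reduce" and is applied whenever E op E is on
    top of the stack while input remains.
    """
    remaining = list(tokens)
    stack = []
    while True:
        handle_on_top = (
            len(stack) >= 3
            and is_operator(stack[-2])
            and not is_operator(stack[-1])
        )
        if handle_on_top and (not remaining or on_conflict == "reduce"):
            # Reduce E op E to a single E.
            right, op, left = stack.pop(), stack.pop(), stack.pop()
            stack.append((op, left, right))
        elif remaining:
            # Shift the next token onto the stack.
            stack.append(remaining.pop(0))
        else:
            break
    if len(stack) != 1:
        raise SyntaxError("input is not a complete expression")
    return stack[0]


tokens = [1, "+", 2, "*", 3]
tree_if_shift = shift_reduce(tokens, "shift")    # ("+", 1, ("*", 2, 3))
tree_if_reduce = shift_reduce(tokens, "reduce")  # ("*", ("+", 1, 2), 3)
```

Shifting groups the later operator first and reducing groups the earlier one first, which is exactly the difference between E+(E*E) and (E+E)*E.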

  13. Deal with shift-reduce conflicts
      • In this case we need to reduce, because "+" is left associative
      • Three ways to deal with it:
        • Let ML-Yacc complain
          • The default choice is to shift when it encounters a shift-reduce conflict
          • BAD: programmer intentions unclear; harder to debug other parts of your grammar; generally inelegant
        • Rewrite the grammar to eliminate ambiguity
          • Can be complicated and less clear
        • Use Yacc precedence directives
          • %left, %right, %nonassoc

  14. Precedence and Associativity
      • The precedence of a terminal is based on the order in which associativity is specified: terminals declared later have higher precedence
      • The precedence of a rule is the precedence of its right-most terminal
        • e.g. precedence of (E ::= E + E) == prec(+)
      • A shift-reduce conflict between a lookahead terminal and a rule is resolved as follows
        • prec(terminal) > prec(rule) ==> shift
        • prec(terminal) < prec(rule) ==> reduce
        • prec(terminal) = prec(rule) ==>
          • assoc(terminal) = left ==> reduce
          • assoc(terminal) = right ==> shift
          • assoc(terminal) = nonassoc ==> report as error
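
The resolution rules of slide 14 translate almost line by line into code. The sketch below is in Python and is only an illustration of the decision procedure, not ML-Yacc's implementation. The names `build_table`, `rule_precedence` and `resolve` are hypothetical, and the sketch assumes that every terminal appearing in a rule has a declared precedence (any symbol missing from the table is treated as a nonterminal).

```python
def build_table(declarations):
    """Map each terminal to (precedence level, associativity).

    Declarations are listed in file order, so a later entry gets a higher level.
    """
    table = {}
    for level, (assoc, terminals) in enumerate(declarations):
        for terminal in terminals:
            table[terminal] = (level, assoc)
    return table


def rule_precedence(rhs, table):
    """Precedence of a rule: that of its right-most terminal.

    Assumption: symbols not in the table are nonterminals.
    """
    for symbol in reversed(rhs):
        if symbol in table:
            return table[symbol][0]
    raise ValueError("rule has no terminal with a declared precedence")


def resolve(lookahead, rhs, table):
    """Return "shift", "reduce" or "error" for a shift-reduce conflict."""
    terminal_prec, assoc = table[lookahead]
    rule_prec = rule_precedence(rhs, table)
    if terminal_prec > rule_prec:
        return "shift"
    if terminal_prec < rule_prec:
        return "reduce"
    return {"left": "reduce", "right": "shift", "nonassoc": "error"}[assoc]


# %left PLUS MINUS followed by %left MUL DIV
table = build_table([("left", ["PLUS", "MINUS"]), ("left", ["MUL", "DIV"])])
after_plus_sees_mul = resolve("MUL", ["exp", "PLUS", "exp"], table)    # "shift"
after_plus_sees_plus = resolve("PLUS", ["exp", "PLUS", "exp"], table)  # "reduce"
```

With E+E on the stack, a MUL lookahead is shifted (giving E+(E*E)) and a PLUS lookahead triggers a reduce (giving (E+E)+E), which matches the intended results from slides 12 and 13.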

  15. Example with precedence directives

          datatype exp = Int of int
                       | Add of exp * exp
                       | Sub of exp * exp
                       | Mul of exp * exp
                       | Div of exp * exp
          %%
          %left PLUS MINUS
          %left MUL DIV
          %%
          exp : NUM             (Int NUM)
              | exp PLUS exp    (Add (exp1, exp2))
              | exp MINUS exp   (Sub (exp1, exp2))
              | exp MUL exp     (Mul (exp1, exp2))
              | exp DIV exp     (Div (exp1, exp2))
              | LPAR exp RPAR   (exp)

      • MUL and DIV are declared after PLUS and MINUS, so they have higher precedence
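
To see which syntax tree the grammar of slide 15 is meant to produce, the following Python sketch builds the same tree shapes by hand. It uses precedence climbing, a different technique from the LALR parsing that ML-Yacc performs, so it only illustrates the intended result. The dataclasses mirror the constructors of the `exp` datatype; the token encoding and the function names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Int:
    value: int


@dataclass(frozen=True)
class Add:
    left: object
    right: object


@dataclass(frozen=True)
class Sub:
    left: object
    right: object


@dataclass(frozen=True)
class Mul:
    left: object
    right: object


@dataclass(frozen=True)
class Div:
    left: object
    right: object


# Levels follow the declaration order: %left PLUS MINUS, then %left MUL DIV.
OPERATORS = {"PLUS": (1, Add), "MINUS": (1, Sub), "MUL": (2, Mul), "DIV": (2, Div)}


def parse_atom(tokens):
    """NUM or LPAR exp RPAR; returns (tree, remaining tokens)."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    kind, value = tokens[0]
    if kind == "NUM":
        return Int(value), tokens[1:]
    if kind == "LPAR":
        tree, rest = parse_exp(tokens[1:], 1)
        if not rest or rest[0][0] != "RPAR":
            raise SyntaxError("expected RPAR")
        return tree, rest[1:]
    raise SyntaxError("unexpected token " + kind)


def parse_exp(tokens, min_prec):
    left, tokens = parse_atom(tokens)
    while tokens and tokens[0][0] in OPERATORS and OPERATORS[tokens[0][0]][0] >= min_prec:
        prec, node = OPERATORS[tokens[0][0]]
        # Left associativity: the right operand may only contain
        # operators of strictly higher precedence.
        right, tokens = parse_exp(tokens[1:], prec + 1)
        left = node(left, right)
    return left, tokens


def parse(tokens):
    tree, rest = parse_exp(list(tokens), 1)
    if rest:
        raise SyntaxError("trailing tokens")
    return tree
```

For the tokens of `1 + 2 * 3` this yields `Add(Int(1), Mul(Int(2), Int(3)))`, and for `1 - 2 - 3` it yields `Sub(Sub(Int(1), Int(2)), Int(3))`, the trees that the `%left` declarations are intended to select.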

  16. Reduce-reduce Conflict
      • This kind of conflict is more difficult to deal with
      • Example

          sequence  ::= (* empty *)
                      | maybeword
                      | sequence word
          maybeword ::= (* empty *)
                      | word

      • When we get a "word" from the lexer, there is more than one way to get "sequence" from it:
        • word -> maybeword -> sequence (rule 1)
        • empty -> sequence, then sequence word -> sequence (rule 2)

  17. Reduce-reduce Conflict
      • A reduce-reduce conflict means that two or more rules apply to the same sequence of input. This usually indicates a serious error in the grammar.
      • ML-Yacc reduces by the first rule
      • Generally, reduce-reduce conflicts are not allowed in your ML-Yacc file
      • We need to fix our grammar

          sequence ::= (* empty *)
                     | sequence word
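
The ambiguity behind this reduce-reduce conflict can be made concrete by counting parse trees. The Python sketch below is a toy counter written for these two small grammars, not a general tool and not part of ML-Yacc; the dictionary encoding of the grammars and the name `count_parses` are assumptions. It counts how many distinct parse trees derive a token list from the start symbol.

```python
from functools import lru_cache


def count_parses(grammar, start, tokens):
    """Count the parse trees that derive tokens from start.

    grammar maps each nonterminal to a list of right-hand sides (tuples of
    symbols); any symbol that is not a key of grammar is a terminal.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count_symbol(symbol, i, j):
        if symbol not in grammar:  # terminal: must match exactly one token
            return 1 if j - i == 1 and tokens[i] == symbol else 0
        return sum(count_rhs(rhs, i, j) for rhs in grammar[symbol])

    def count_rhs(rhs, i, j):
        if not rhs:
            return 1 if i == j else 0
        total = 0
        for k in range(i, j + 1):
            # Check the tail first so a left-recursive rule is only
            # expanded on a strictly shorter span.
            tail = count_rhs(rhs[1:], k, j)
            if tail:
                total += count_symbol(rhs[0], i, k) * tail
        return total

    return count_symbol(start, 0, len(tokens))


ambiguous = {
    "sequence": [(), ("maybeword",), ("sequence", "word")],
    "maybeword": [(), ("word",)],
}
fixed = {
    "sequence": [(), ("sequence", "word")],
}

ambiguous_count = count_parses(ambiguous, "sequence", ["word"])  # more than one tree
fixed_count = count_parses(fixed, "sequence", ["word"])          # exactly one tree
```

The original grammar admits several trees for the single token `word`, so the parser cannot know which reduction to perform, while the rewritten grammar admits exactly one.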

  18. Summary of conflicts
      • Shift-reduce conflict
        • Resolve with precedence and associativity
        • Shift by default
      • Reduce-reduce conflict
        • Reduce by first rule
        • Not allowed!

  19. Lab3
      • Your job is to finish a parser for the C language
        • Input: a ".c" file
        • Output: "Success!" if the ".c" file is correct
      • File description
        • c.lex
        • c.grm
        • main.sml
        • call-main.sml
        • sources.cm
        • lab3.mlb
        • test.c

  20. Using ML-Yacc
      • Read the ML-Yacc Manual
      • Run
        • Once you finish "c.grm" and "c.lex", run on the command line (using MLton's tools):

            mlyacc c.grm
            mllex c.lex

        • This produces "c.grm.sig", "c.grm.sml", "c.grm.desc" and "c.lex.sml"
      • Then compile Lab3
        • Start SML/NJ and run CM.make "sources.cm";
        • or, on the command line, run mlton lab3.mlb
      • To run lab3
        • In SML/NJ, run Main.parse "test.c";
        • or, on the command line, run lab3 test.c

  21. "Debug" ML-Yacc File
      • When you run mlyacc, you will see error messages if your ML-Yacc file has conflicts. For example:

          mlyacc c.grm
          2 shift/reduce conflicts

      • Open the file "c.grm.desc" (this file is generated by mlyacc)
        • The beginning of the file lists the conflicts
        • The rest describes all the states
        • "rule 12" refers to rule number 12 in your ML-Yacc file, counting from 0

          2 shift/reduce conflicts

          error: state 0: shift/reduce conflict (shift MYSTRUCT, reduce by rule 12)
          error: state 1: shift/reduce conflict (shift MYSTRUCT, reduce by rule 12)

          state 0:

              prog : . structs vdecs preds funcs

              MYSTRUCT    shift 3

              prog        goto 429
              structs     goto 2
              structdec   goto 1

              .           reduce by rule 12
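
When a grammar has many conflicts, it can help to pull the conflict lines out of "c.grm.desc" automatically. The Python sketch below is a hypothetical helper, not part of ML-Yacc. It assumes the shift/reduce lines look exactly like the two shown on slide 21 and does not attempt to handle other kinds of messages.

```python
import re

# Matches lines such as:
#   error: state 0: shift/reduce conflict (shift MYSTRUCT, reduce by rule 12)
CONFLICT = re.compile(
    r"error:\s+state (\d+): shift/reduce conflict "
    r"\(shift (\w+), reduce by rule (\d+)\)"
)


def shift_reduce_conflicts(desc_text):
    """Return (state, shifted terminal, rule number) for each conflict line."""
    return [
        (int(state), terminal, int(rule))
        for state, terminal, rule in CONFLICT.findall(desc_text)
    ]


sample = """2 shift/reduce conflicts

error: state 0: shift/reduce conflict (shift MYSTRUCT, reduce by rule 12)
error: state 1: shift/reduce conflict (shift MYSTRUCT, reduce by rule 12)
"""
conflicts = shift_reduce_conflicts(sample)
```

Each tuple names the state to look up in the rest of the file, the terminal that could be shifted, and the rule (numbered from 0) that could be reduced.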

  22. Use ML-Lex with ML-Yacc
      • Most of the work in "c.lex" this time can be copied from Lab2
        • You can re-use the regular expressions and lexical rules
      • Difference from Lab2
        • You have to define the tokens in "c.grm"

            %term INT of int | EOF

        • Each "%term" entry in "c.grm" automatically becomes a token constructor in "c.grm.sig"

            signature C_TOKENS =
            sig
              type ('a,'b) token
              type svalue
              val EOF: 'a * 'a -> (svalue,'a) token
              val INT: (int) * 'a * 'a -> (svalue,'a) token
            end

  23. Hints
      • Read the ML-Yacc Manual
      • Read the language specification
      • Test a lot!
