Using Yacc
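Introduction • Grammar • Context-free grammars (CFGs) • Recursive rules • Shift/reduce parsing • See Figure 3-2. • LALR(1): yacc builds parsers that look at most one token ahead • What Yacc Cannot Parse • It cannot deal with ambiguous grammars, in which the same input can match more than one parse tree • It also cannot deal with grammars that need more than one token of lookahead to decide whether a rule has been matched • If you give it a grammar it cannot handle, it will tell you, so there is no problem of overly complex parsers silently failing.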
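To make shift/reduce parsing concrete, the following C sketch parses a hypothetical two-rule grammar by hand. It is only an illustration, not what yacc generates: a real LALR(1) parser drives its shifts and reductions from state tables and one token of lookahead, while this toy (the grammar and the function `shift_reduce` are assumptions) simply checks after each shift whether the top of its stack matches the right-hand side of a rule.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical toy grammar, for illustration only:
 *   E : E '+' 'n'     (rule 1, left-recursive)
 *     | 'n'           (rule 2)
 * 'n' stands for a NUMBER token. */

/* Parses a string of 'n' and '+' characters. Records each action in
 * trace ('s' = shift, '1' or '2' = reduce by that rule); the caller must
 * supply at least 2 * strlen(input) + 1 bytes. Returns 1 if accepted. */
int shift_reduce(const char *input, char *trace)
{
    char stack[64];
    int sp = 0;                         /* number of symbols on the stack */
    int t = 0;                          /* next free slot in trace */

    for (const char *p = input; *p != '\0'; p++) {
        assert(sp < (int)sizeof stack);
        stack[sp++] = *p;               /* shift the next token */
        trace[t++] = 's';

        /* If the top of the stack now matches a rule's right-hand side
         * (a handle), reduce it. In this grammar at most one reduction
         * can follow a shift. */
        if (sp >= 3 && stack[sp - 3] == 'E' && stack[sp - 2] == '+' && stack[sp - 1] == 'n') {
            sp -= 3;                    /* pop E '+' 'n' ... */
            stack[sp++] = 'E';          /* ... and push E */
            trace[t++] = '1';
        } else if (sp == 1 && stack[0] == 'n') {
            stack[0] = 'E';             /* replace 'n' by E */
            trace[t++] = '2';
        }
    }
    trace[t] = '\0';
    return sp == 1 && stack[0] == 'E';  /* accept only if everything reduced to E */
}
```

For input "n+n" the trace is "s2ss1": shift n, reduce by rule 2, shift '+', shift n, reduce by rule 1. Because rule 1 is left-recursive, a reduction happens as soon as each handle appears, so the stack never holds more than three symbols however long the input is.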
The Structure of a Yacc Grammar
(definition section)
%%
(rules section)
%%
(user subroutines section)
The Definition Section • The definition section includes declarations of the tokens used in the grammar, the types of values used on the parser stack, and other odds and ends • You don’t have to specify token numbers; yacc assigns them automatically • It can also include a literal block: C code enclosed in %{ and %}, which yacc copies verbatim into the generated parser
The Rules Section • Since ASCII keyboards don’t have a → key, we use a colon between the left- and right-hand sides of a rule and put a semicolon at the end of each rule, e.g. statement: NAME '=' expression ; • The symbol on the left-hand side of the first rule in the grammar is normally the start symbol, though you can use a %start declaration in the definition section to override that.
Symbol Values and Actions • Every symbol in a yacc parser has a value (its semantic record), which gives additional information about that particular instance of the symbol • For a token, the value might be the numeric value of a number, the text of a literal string, … • Nonterminal symbols can have any values you want, created by action code in the parser • In real parsers, the values of different symbols use different data types • int, double, char *, … • If you have multiple value types, you have to list all of them (in a %union declaration) so that yacc can create a C union typedef called YYSTYPE to contain them • By default, yacc makes all values of type int
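As a rough illustration of the union yacc builds, here is a C sketch of a YYSTYPE holding three value types. The member names and helper functions are hypothetical; in a real parser this typedef is generated from the %union declaration, and you do not write it yourself.

```c
#include <assert.h>

/* Roughly what yacc generates from a declaration such as
 *   %union { double dval; char *sval; int ival; }
 * The member names are hypothetical. Only one member is meaningful at a
 * time, which is why each symbol's type is declared (e.g. %token <dval>). */
typedef union {
    double dval;   /* e.g. the value of a NUMBER token     */
    char  *sval;   /* e.g. the text of a STRING token      */
    int    ival;   /* e.g. an index into a symbol table    */
} YYSTYPE;

/* Hypothetical helpers; each one sets exactly one member. */
YYSTYPE number_value(double d) { YYSTYPE v; v.dval = d; return v; }
YYSTYPE string_value(char *s)  { YYSTYPE v; v.sval = s; return v; }
YYSTYPE index_value(int i)     { YYSTYPE v; v.ival = i; return v; }
```

Reading a member other than the one last stored is a bug that C will not catch. This is why symbols carry declared types (for example %token <dval> NUMBER and %type <dval> expression): yacc then picks the right member for each $$ and $i on your behalf.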
Symbol Values and Actions • $$: • The value of the LHS symbol • The rule's action code should assign a value to it • $i: • The value of the i-th symbol on the RHS of the production • Terminal symbol: the value was set by the lexer (in yylval) • Nonterminal symbol: the value was set earlier by the action of the rule that was reduced to produce it
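The following C sketch shows roughly what happens when the parser reduces by a rule such as expr: expr '+' expr { $$ = $1 + $3; }. Generated parsers keep a value stack alongside the state stack and translate $1, $3, and $$ into positions on it; the array, its size, and the function names here are assumptions made for illustration.

```c
#include <assert.h>

#define STACK_MAX 100

/* Hypothetical value stack: one entry per grammar symbol on the parse stack. */
static int values[STACK_MAX];
static int top = -1;                 /* index of the topmost value, -1 when empty */

void push_value(int v)
{
    assert(top + 1 < STACK_MAX);
    values[++top] = v;
}

int top_value(void)
{
    assert(top >= 0);
    return values[top];
}

/* Reduce by   expr : expr '+' expr   { $$ = $1 + $3; }
 * The right-hand side occupies the top three entries:
 *   $1 = values[top - 2], $2 = values[top - 1] (the '+'), $3 = values[top]. */
void reduce_add(void)
{
    assert(top >= 2);
    int dollar1 = values[top - 2];
    int dollar3 = values[top];
    int dollar_dollar = dollar1 + dollar3;  /* the action: $$ = $1 + $3 */
    top -= 3;                               /* pop the three RHS values */
    values[++top] = dollar_dollar;          /* push $$ as the LHS value */
}
```

After pushing 2, a placeholder for '+', and 3, calling reduce_add() leaves the single value 5 on the stack. Note that $2, the '+' token, still occupies a slot even though the action ignores it.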
The Lexer • The parser is the higher-level routine; it calls the lexer, yylex(), each time it needs a token • Yacc defines the token names in the parser as C preprocessor names; when run with the -d flag, it also writes them to y.tab.h so the lexer can include them • See ch3-01.l • Whenever the lexer returns a token to the parser, if the token has an associated value, the lexer must store the value in yylval before returning • In the first example, we explicitly declare yylval • In more complex parsers, yacc defines yylval as a union and puts the definition in y.tab.h
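Here is a hand-written C stand-in for a lexer that follows this protocol: it returns a token code and, for NUMBER, stores the value in yylval first. In a real program yylex() would come from lex and NUMBER from y.tab.h; the value 258, the lex_input variable, and set_lex_input() are assumptions for this self-contained sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Stand-in for y.tab.h; the real number is chosen by yacc (258 is assumed). */
#define NUMBER 258

/* In the first example the lexer only declares "extern int yylval;" and the
 * generated parser defines it. There is no parser here, so define it. */
int yylval;

static const char *lex_input;   /* hypothetical input source */

void set_lex_input(const char *s) { lex_input = s; }

/* Returns NUMBER (with its value in yylval), a single-character token,
 * or 0 at end of input. */
int yylex(void)
{
    assert(lex_input != NULL);
    while (*lex_input == ' ' || *lex_input == '\t')
        lex_input++;                               /* skip blanks */
    if (*lex_input == '\0')
        return 0;                                  /* 0 means end of input */
    if (isdigit((unsigned char)*lex_input)) {
        char *end;
        yylval = (int)strtol(lex_input, &end, 10); /* store the value first... */
        lex_input = end;
        return NUMBER;                             /* ...then return the token */
    }
    return (unsigned char)*lex_input++;            /* e.g. '+' is returned as itself */
}
```

Returning 0 tells the parser that the input has ended, and single-character tokens such as '+' are returned as their own character codes, which is why yacc gives named tokens numbers above the character range.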
Compiling and Running a Simple Parser • See P. 59. • Note that yacc must run before the lexer is compiled: lex.yy.c includes y.tab.h, which is generated by yacc -d, so the two steps cannot be exchanged.
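Arithmetic Expressions and Ambiguity • You can give yacc an ambiguous grammar to see how it reports conflicts • There are 16 shift/reduce conflicts in the program of P.60 • There are two ways to specify precedence and associativity in a grammar: implicitly and explicitly • To specify them implicitly, • Rewrite the grammar using separate nonterminal symbols for each precedence level • See P.62 • To specify them explicitly, • Add precedence declarations to the definition section:
%left '+' '-'
%left '*' '/'
%nonassoc UMINUS
• Each declaration line has higher precedence than the lines before it, so '*' and '/' bind more tightly than '+' and '-', and UMINUS binds most tightly of all • %left makes operators left-associative; %nonassoc makes them non-associative • UMINUS is a pseudo-token that the unary-minus rule refers to with %prec UMINUS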
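yacc applies these precedences while building its parse tables, so there is no direct C equivalent. As a separate technique that produces the same groupings, here is a sketch of a precedence-climbing evaluator in C: the binding powers mirror the three declarations, and left associativity comes from parsing the right operand one level higher. The function names, integer-only input, and lack of whitespace handling are assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

static const char *pos;   /* current position; input is integers, operators, parens, no blanks */

/* Binding power of each binary operator, mirroring the declaration order:
 *   %left '+' '-'   -> 1 (declared first, so lowest)
 *   %left '*' '/'   -> 2
 * %nonassoc UMINUS is handled in parse_unary, above every binary operator. */
static int prec(char op)
{
    switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    default:            return 0;   /* not a binary operator: ends the loop */
    }
}

static int parse_expr(int min_prec);

/* Operand: a number, a parenthesized expression, or unary minus applied
 * to another operand (so unary minus binds tighter than any binary op). */
static int parse_unary(void)
{
    if (*pos == '-') {
        pos++;
        return -parse_unary();
    }
    if (*pos == '(') {
        pos++;
        int v = parse_expr(1);
        assert(*pos == ')');
        pos++;
        return v;
    }
    assert(isdigit((unsigned char)*pos));
    char *end;
    long v = strtol(pos, &end, 10);
    pos = end;
    return (int)v;
}

/* Precedence climbing. A left-associative operator of level p parses its
 * right operand at level p + 1, so a following operator of the same level
 * is left for this loop: 8-2-1 groups as (8-2)-1. */
static int parse_expr(int min_prec)
{
    int lhs = parse_unary();
    while (prec(*pos) >= min_prec) {
        char op = *pos++;
        int rhs = parse_expr(prec(op) + 1);
        switch (op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': assert(rhs != 0); lhs /= rhs; break;
        }
    }
    return lhs;
}

int eval(const char *s)
{
    pos = s;
    int v = parse_expr(1);
    assert(*pos == '\0');     /* the whole input must be consumed */
    return v;
}
```

With this sketch, eval("8-2-1") gives 5 because %left groups it as (8-2)-1, and eval("1+2*3") gives 7 because '*' is declared on a later line than '+'.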
Exercise • Use the expression rules shown on P.62 of “lex & yacc” to write a yacc program. • Hint: start from ch3-01.y and ch3-01.l • Please list your source code and execution results.
When Not to Use Precedence Rules • You can use precedence rules to fix any shift/reduce conflict that occurs in the grammar • We recommend that you use precedence in only two situations • In expression grammars • To resolve the “dangling else” conflict in grammars for if-then-else language constructs
Variables and Typed Tokens • See Example 3-2, P.64 • Symbol Values and %union