Syntax error handling
• Errors can occur at many levels:
  • lexical: an unknown operator
  • syntactic: unbalanced parentheses
  • semantic: a variable that is never declared
  • runtime: dereferencing a NULL pointer
• Goals of error handling in a parser:
  • To detect and report the presence of errors
  • To recover from an error and detect subsequent errors
  • To avoid slowing down the processing of correct programs
Error recovery strategies
• Panic-mode recovery
  • On discovering an error, discard input symbols one at a time until one of a designated set of synchronizing tokens is found.
• Phrase-level recovery
  • On discovering an error, perform a local fix (for example, insert a missing semicolon or delete an extra one) so that the parser can continue.
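Panic-mode recovery can be sketched in C as a loop that skips tokens until it reaches a member of the synchronizing set. This is a minimal sketch under stated assumptions: the token codes (T_ID, T_SEMI and the others) and the function names is_sync and panic_skip are hypothetical, the input is an array of already-scanned tokens, and the end-of-input token is always treated as synchronizing so that the loop terminates.

```c
#include <stddef.h>

/* Hypothetical token codes, for illustration only. */
enum { T_ID, T_NUM, T_PLUS, T_SEMI, T_RBRACE, T_EOF };

/* Returns 1 if tok belongs to the synchronizing set, 0 otherwise. */
static int is_sync(int tok, const int *sync, size_t nsync)
{
    for (size_t i = 0; i < nsync; i++)
        if (sync[i] == tok)
            return 1;
    return 0;
}

/* Panic-mode recovery: starting at the erroneous token input[pos],
 * discard tokens one at a time until a synchronizing token is found.
 * T_EOF is always treated as synchronizing so the loop terminates.
 * Returns the index of the token where parsing resumes and stores the
 * number of discarded tokens in *discarded. */
static size_t panic_skip(const int *input, size_t len, size_t pos,
                         const int *sync, size_t nsync, size_t *discarded)
{
    size_t start = pos;
    while (pos < len && input[pos] != T_EOF && !is_sync(input[pos], sync, nsync))
        pos++;
    *discarded = pos - start;
    return pos;
}
```

For example, with the input `id + + num ; id` and the synchronizing set {`;`, `}`}, an error detected at the second `+` discards `+ num` and parsing resumes at `;`. Statement terminators and closing braces are a common choice of synchronizing tokens because they usually end a complete construct.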
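Error recovery in predictive parsing
• Recovery in a non-recursive predictive parser is easier than in a recursive-descent parser.
• Panic-mode recovery
  • If a terminal on top of the stack does not match the input, pop the terminal.
  • If a nonterminal is on top of the stack and the parsing table has no entry for the current input token, skip input tokens until one appears for which the nonterminal can be expanded (or pop the nonterminal if the token is in its synchronizing set).
• Phrase-level recovery
  • Carefully fill in the blank entries of the parsing table with error routines that decide what to do (for example, change, insert, or delete symbols and issue an error message).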
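The panic-mode rules for predictive parsing can be illustrated with a small table-driven LL(1) parser for the classic expression grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id. This is a sketch with several assumptions: the grammar, the token codes, and the function name ll1_parse are illustrative choices rather than part of the slides, the SYNCH entries are taken from the FOLLOW sets of E, T, and F (a common textbook choice), and the parser only counts error actions instead of printing messages.

```c
#include <stddef.h>

/* Terminals; END plays the role of the $ end marker. */
enum { ID, PLUS, STAR, LP, RP, END, NTERMS };
/* Nonterminals; EP and TP stand for E' and T'. */
enum { E = NTERMS, EP, T, TP, F, NSYMS };

#define BLANK (-1)  /* empty entry: skip the input token */
#define SYNCH (-2)  /* token is in FOLLOW(A): pop the nonterminal */

/* Right-hand sides, terminated by -1 (an immediate -1 means epsilon). */
static const int rhs[8][4] = {
    {T, EP, -1},          /* 0: E  -> T E'   */
    {PLUS, T, EP, -1},    /* 1: E' -> + T E' */
    {-1},                 /* 2: E' -> eps    */
    {F, TP, -1},          /* 3: T  -> F T'   */
    {STAR, F, TP, -1},    /* 4: T' -> * F T' */
    {-1},                 /* 5: T' -> eps    */
    {LP, E, RP, -1},      /* 6: F  -> ( E )  */
    {ID, -1},             /* 7: F  -> id     */
};

/* LL(1) table M[A][a]: a production number, BLANK, or SYNCH. */
static const int table[NSYMS - NTERMS][NTERMS] = {
    /*          id     +      *      (      )      $     */
    /* E  */ {  0,     BLANK, BLANK, 0,     SYNCH, SYNCH },
    /* E' */ {  BLANK, 1,     BLANK, BLANK, 2,     2     },
    /* T  */ {  3,     SYNCH, BLANK, 3,     SYNCH, SYNCH },
    /* T' */ {  BLANK, 5,     4,     BLANK, 5,     5     },
    /* F  */ {  7,     SYNCH, SYNCH, 6,     SYNCH, SYNCH },
};

/* Parses an END-terminated token array and returns the number of error
 * actions taken; panic-mode recovery lets parsing continue after each one.
 * There is no stack-overflow check in this sketch. */
static int ll1_parse(const int *in)
{
    int stack[256];
    int top = 0, errors = 0;
    size_t ip = 0;

    stack[top++] = END;  /* bottom marker $ */
    stack[top++] = E;    /* start symbol */
    while (stack[top - 1] != END) {
        int x = stack[top - 1], a = in[ip];
        if (x < NTERMS) {                     /* terminal on top */
            if (x == a) { top--; ip++; }      /* match */
            else { errors++; top--; }         /* pop it, as if inserted */
        } else {
            int p = table[x - NTERMS][a];
            if (p >= 0) {                     /* expand: push RHS reversed */
                int n = 0;
                top--;
                while (rhs[p][n] != -1) n++;
                while (n > 0) stack[top++] = rhs[p][--n];
            } else if (p == SYNCH || a == END) {
                errors++; top--;              /* give up on this nonterminal */
            } else {
                errors++; ip++;               /* skip the input token */
            }
        }
    }
    if (in[ip] != END)                        /* leftover input after $ */
        errors++;
    return errors;
}
```

For instance, `id + * id` is parsed with one error (the `*` is skipped because T cannot be expanded on it), and `( id` is parsed with one error (the unmatched `)` on the stack is popped as if it had been inserted).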
Error recovery in LR parsing
• A canonical LR parser detects an error without making any extra reductions.
• SLR and LALR parsers may make extra reductions before detecting an error, but they never shift an erroneous input symbol onto the stack.
• Panic-mode recovery
  • Scan down the stack until a state s with a goto on some nonterminal A is found, where A represents a major program construct (for example, an expression, statement, or block). Discard input symbols until one is found that is in FOLLOW(A), then push the state goto(s, A) and resume parsing. This tries to isolate the phrase containing the error.
• Phrase-level recovery
  • Implement an error recovery routine for each error entry in the parsing table.
Writing a parser with YACC (Yet Another Compiler-Compiler)
• YACC generates LALR(1) parsers.
• It works with lex: the generated parser calls yylex() to get the next token.
  • YACC and lex must agree on the number used for each token.
• Running “yacc yaccfile” produces the file y.tab.c, which contains the routine yyparse().
• yyparse() returns 0 if the input is accepted and non-zero otherwise.
• YACC file format:
  declarations
  %%
  translation rules
  %%
  supporting C routines
The declarations part specifies tokens, nonterminal symbols, and other C constructs.
• To declare the tokens AAA and BBB:
  %token AAA BBB
• To assign a token number to a token (needed when using lex), put a nonnegative integer immediately after the token's first appearance:
  %token EOFnumber 0
  %token SEMInumber 101
• Nonterminals do not need to be declared unless you want to associate them with a type (discussed later).
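Because the parser and the lexer must agree on token numbers, the lexer's return values have to match the %token declarations. The sketch below is a hand-written stand-in for a lex-generated yylex() that returns those numbers. Assumptions: EOFnumber 0, SEMInumber 101, and ICONSTnumber 119 come from these slides, while the other token numbers, ERRORnumber, lex_input, and the plain int yylval are hypothetical. In a real project, running yacc with the -d option writes the matching #define lines to y.tab.h, which the lex file then includes.

```c
#include <ctype.h>

/* Token numbers matching declarations such as "%token SEMInumber 101".
 * 0, 101 and 119 come from the slides; the others are hypothetical.
 * A return value of 0 tells a yacc parser that the input has ended. */
#define EOFnumber     0
#define SEMInumber    101
#define PLUSnumber    102
#define MINUSnumber   103
#define TIMESnumber   104
#define DIVIDEnumber  105
#define LPARENnumber  106
#define RPARENnumber  107
#define ICONSTnumber  119
#define ERRORnumber   120   /* hypothetical code for an unknown character */

int yylval;                    /* value of the last token (normally declared via yacc) */
static const char *lex_input;  /* hypothetical: the lexer reads from this string */

/* Returns the next token number and stores integer constants in yylval. */
int yylex(void)
{
    while (isspace((unsigned char)*lex_input))
        lex_input++;
    if (*lex_input == '\0')
        return EOFnumber;
    if (isdigit((unsigned char)*lex_input)) {
        yylval = 0;
        while (isdigit((unsigned char)*lex_input))
            yylval = yylval * 10 + (*lex_input++ - '0');
        return ICONSTnumber;
    }
    switch (*lex_input++) {
    case ';': return SEMInumber;
    case '+': return PLUSnumber;
    case '-': return MINUSnumber;
    case '*': return TIMESnumber;
    case '/': return DIVIDEnumber;
    case '(': return LPARENnumber;
    case ')': return RPARENnumber;
    default:  return ERRORnumber;   /* lexical error: unknown operator */
    }
}
```

For the input `12 + (3);`, successive calls return ICONSTnumber (with yylval 12), PLUSnumber, LPARENnumber, ICONSTnumber (with yylval 3), RPARENnumber, SEMInumber, and finally EOFnumber.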
Translation rules specify the grammar productions:
  exp : exp PLUSnumber exp
      | exp MINUSnumber exp
      | exp TIMESnumber exp
      | exp DIVIDEnumber exp
      | LPARENnumber exp RPARENnumber
      | ICONSTnumber
      ;
Each alternative can also be written as a separate rule with the same left-hand side, for example:
  exp : exp PLUSnumber exp ;
  exp : exp MINUSnumber exp ;
Yacc environment
• Yacc processes the specification file and produces a y.tab.c file.
• Yacc produces an integer function yyparse(), which:
  • calls yylex() to get tokens;
  • returns non-zero when an error is found;
  • returns 0 if the program is accepted.
• The program also needs main() and yyerror() functions. Example:

    #include <stdio.h>

    extern int yyline;   /* line counter, assumed to be maintained by the lexer */
    int yyparse(void);

    void yyerror(const char *str)   /* called by yyparse() on a syntax error */
    {
        printf("yyerror: %s at line %d\n", str, yyline);
    }

    int main(void)
    {
        if (!yyparse())
            printf("accept\n");
        else
            printf("reject\n");
        return 0;
    }
YACC builds an LALR parser for the grammar.
• If the grammar is not LALR(1) (for example, if it is ambiguous), the parser may have shift/reduce and reduce/reduce conflicts.
• Default conflict resolution:
  • shift/reduce --> shift
  • reduce/reduce --> reduce by the production listed first in the grammar
  • Reduce/reduce conflicts should always be avoided.
• ‘yacc -v *.y’ writes a report of the parser states, including any conflicts, to the file ‘y.output’. See example1.y.
• The programmer MUST resolve all conflicts (unless you really know what you are doing), either by:
  • modifying the grammar (see example2.y), or
  • using precedence and associativity of operators.
Use precedence and associativity of operators
• Use the keywords %left, %right, and %nonassoc in the declarations section.
• All tokens on the same line have the same precedence level and associativity.
• The lines are listed in order of increasing precedence:
  %left PLUSnumber MINUSnumber
  %left TIMESnumber DIVIDEnumber
• See example3.y.
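The effect of these declarations can be checked with a small evaluator. This sketch swaps in a different technique, precedence climbing, instead of yacc's LALR tables: each operator gets a precedence level and an associativity, as a %left line would assign. The names opinfo, yacc_like, shift_default, and eval are hypothetical. The second table models what yacc's default preference for shifting does to the ambiguous exp grammar when no precedence is declared: every operator behaves as if it had the same precedence and grouped to the right.

```c
#include <ctype.h>
#include <stddef.h>

/* Precedence level and associativity of one operator. */
struct opinfo { char op; int prec; int right_assoc; };

/* Mirrors:  %left PLUSnumber MINUSnumber
 *           %left TIMESnumber DIVIDEnumber
 * The later line gets the higher level; %left means left-associative. */
static const struct opinfo yacc_like[4] = {
    {'+', 1, 0}, {'-', 1, 0}, {'*', 2, 0}, {'/', 2, 0}
};

/* Models yacc's default "prefer shift" on the ambiguous grammar with no
 * precedence declarations: all operators equal, grouping to the right. */
static const struct opinfo shift_default[4] = {
    {'+', 1, 1}, {'-', 1, 1}, {'*', 1, 1}, {'/', 1, 1}
};

static const struct opinfo *ops;  /* table used by the current evaluation */
static const char *cur;           /* current position in the input string */

static void skip_spaces(void) { while (*cur == ' ') cur++; }

/* Returns the table entry for operator c, or NULL if c is not an operator. */
static const struct opinfo *lookup(char c)
{
    for (int i = 0; i < 4; i++)
        if (ops[i].op == c)
            return &ops[i];
    return NULL;
}

static int parse_expr(int min_prec);

/* Primary: an integer constant or a parenthesized expression. */
static int parse_primary(void)
{
    int v = 0;
    skip_spaces();
    if (*cur == '(') {
        cur++;
        v = parse_expr(1);
        skip_spaces();
        if (*cur == ')')
            cur++;
        return v;
    }
    while (isdigit((unsigned char)*cur))
        v = v * 10 + (*cur++ - '0');
    return v;
}

/* Precedence climbing: keep absorbing operators whose level is >= min_prec.
 * A left-associative operator parses its right operand at a higher minimum
 * level, so an equal-level operator that follows groups to the left. */
static int parse_expr(int min_prec)
{
    int lhs = parse_primary();
    for (;;) {
        const struct opinfo *o;
        skip_spaces();
        o = lookup(*cur);
        if (o == NULL || o->prec < min_prec)
            return lhs;
        cur++;
        int rhs = parse_expr(o->right_assoc ? o->prec : o->prec + 1);
        switch (o->op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs /= rhs; break;  /* no division-by-zero check */
        }
    }
}

/* Evaluates s using the given precedence table. */
static int eval(const char *s, const struct opinfo *table)
{
    ops = table;
    cur = s;
    return parse_expr(1);
}
```

With yacc_like, `8-4-2` evaluates to 2 and `2*3+4` to 10; with shift_default they evaluate to 6 and 14. This difference is why the conflicts are best resolved explicitly rather than left to the default.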
Symbol attributes
• Each symbol can be associated with some attributes.
• The data type of the attributes is specified by a %union in the declarations section (see example4.y):
  %union {
      int semantic_value;
  }
  %token <semantic_value> ICONSTnumber 119
  %type <semantic_value> exp
  %type <semantic_value> term
  %type <semantic_value> item
• Semantic actions can be associated with productions.
Semantic actions
• Semantic actions can be associated with productions:
  item : LPARENnumber exp RPARENnumber   {$$ = $2;}
       | ICONSTnumber                    {$$ = $1;}
       ;
• $$ is the attribute associated with the left-hand side of the production.
• $1 is the attribute associated with the first symbol on the right-hand side, $2 with the second symbol, and so on.
• An action can appear anywhere in the right-hand side of a production; an action in the middle of a rule also counts as a symbol when the $n are numbered.
• See example5.y for an example with different types associated with different symbols.
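The $$ and $n notation can be understood through the value stack that a yacc-generated parser keeps alongside its state stack: shifting a token pushes its value (yylval), and reducing by a rule with n right-hand-side symbols pops n values and pushes $$. The sketch below simulates only that mechanism. It is a simplification that assumes a single int attribute, and the names vstack, shift_value, reduce, and DOLLAR, as well as the exp : exp PLUSnumber exp action, are hypothetical.

```c
/* Assumed attribute type: a single int, as in
 * %union { int semantic_value; } with only one member. */
typedef int value_t;

#define MAXDEPTH 64
static value_t vstack[MAXDEPTH];  /* value stack, parallel to the state stack */
static int vtop = 0;              /* number of values on the stack */

/* Shift: push the value of the shifted token (yylval), or a dummy value
 * for tokens such as parentheses that carry no attribute.
 * There is no overflow check in this sketch. */
static void shift_value(value_t v)
{
    vstack[vtop++] = v;
}

/* $i for a rule whose right-hand side has n symbols. */
#define DOLLAR(i, n) (vstack[vtop - (n) + (i) - 1])

/* Reduce by a rule with n right-hand-side symbols: pop n values, push $$.
 * The argument is evaluated before the pops, so it may use DOLLAR(). */
static void reduce(int n, value_t dollar_dollar)
{
    vtop -= n;
    vstack[vtop++] = dollar_dollar;
}

/* item : LPARENnumber exp RPARENnumber  {$$ = $2;} */
static void reduce_item_paren(void)  { reduce(3, DOLLAR(2, 3)); }

/* item : ICONSTnumber  {$$ = $1;} */
static void reduce_item_iconst(void) { reduce(1, DOLLAR(1, 1)); }

/* Hypothetical rule: exp : exp PLUSnumber exp  {$$ = $1 + $3;} */
static void reduce_exp_plus(void)    { reduce(3, DOLLAR(1, 3) + DOLLAR(3, 3)); }
```

For `(7)`, the parser shifts `(`, shifts 7, and reduces item : ICONSTnumber, leaving 7 on top. Assuming unit rules such as term : item with $$ = $1 then carry that value up to exp, the parser shifts `)` and reduces item : LPARENnumber exp RPARENnumber, leaving the single value 7 on the stack.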