Compiler Automation Tools

liluye

Presentation Transcript


  1. Compiler Automation Tools

  2. Why Automation?
     • With N programming languages and M machines, we need N*M compilers if each one is written by hand.
     • 1970s
       • automatic lexical analyzer (scanner) generators
       • automatic syntax analyzer (parser) generators
     • Code generation module generators
       • some generators have been developed only recently
       • a formalism for code generation is difficult to define

  3. Compiler Generator
     • Sometimes called a “Compiler-Compiler”
     • language description + machine description → Compiler-Compiler → Compiler
     • source program → Compiler → object program

  4. Scanner Generator
     • regular expression + action code → Scanner Generator → Scanner
     • source program → Scanner → series of tokens

  5. Implementation of Scanner Generator
     • Recognition of regular expressions
     • RE → NFA (Non-deterministic Finite Automaton)
     • NFA → DFA (Deterministic Finite Automaton)
     • DFA optimization (minimizing the number of states)
     • Scanner program generation
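The last step of this pipeline typically produces a table-driven recognizer. The following is only a minimal sketch of that idea, assuming a hand-built three-state DFA for the pattern [a-zA-Z]([a-zA-Z]|[0-9])*. The names dfa_accepts, char_class, and next_state are hypothetical and are not what Lex actually emits.

```c
#include <ctype.h>
#include <stddef.h>

/* Character classes: the columns of the transition table. */
enum { C_LETTER, C_DIGIT, C_OTHER, N_CLASSES };
/* DFA states: S_START (initial), S_IDENT (accepting), S_DEAD (trap). */
enum { S_START, S_IDENT, S_DEAD, N_STATES };

/* Map an input character to its class (hypothetical helper). */
static int char_class(unsigned char c)
{
    if (isalpha(c)) return C_LETTER;
    if (isdigit(c)) return C_DIGIT;
    return C_OTHER;
}

/* next_state[state][class]: one row per state, one column per class. */
static const int next_state[N_STATES][N_CLASSES] = {
    /* S_START */ { S_IDENT, S_DEAD,  S_DEAD },
    /* S_IDENT */ { S_IDENT, S_IDENT, S_DEAD },
    /* S_DEAD  */ { S_DEAD,  S_DEAD,  S_DEAD },
};

/* Returns 1 if the whole string s is accepted by the DFA, else 0. */
int dfa_accepts(const char *s)
{
    int state = S_START;
    for (size_t i = 0; s[i] != '\0'; i++)
        state = next_state[state][char_class((unsigned char)s[i])];
    return state == S_IDENT;
}
```

A generated scanner differs in that it remembers the last accepting state it passed through, so it can stop at the longest matching prefix instead of requiring the whole input to match.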

  6. Lex
     • Scanner generator by M. E. Lesk, 1975
     • Lex input (written by user) → Lex → lex.yy.c
     • lex.yy.c + Lex library → C Compiler → Scanner
     • source program → Scanner → series of tokens

  7. Lex Input Structure
     <definitions>
     %%
     <rules>
     %%
     <user sub-programs>

  8. Definition Part
     %{
     /* This part is merely copied onto the generated program */
     /* data structures, variables, constants used in action codes */
     %}
     name1 substitution1
     name2 substitution2
     …
     %%
     rules…
     %%
     user subprogram…

  9. Definition Example
     L   [a-zA-Z]                      (definitions)
     D   [0-9]
     %%
     {L}({L}|{D})*   return xxx;       (rules)
     • After substitution, {L}({L}|{D})* becomes [a-zA-Z]([a-zA-Z]|[0-9])*

  10. Rules
      %%
      R1   A1
      R2   A2
      …
      • R: regular expressions
      • A: actions

  11. Rule Examples
      • int    printf("found keyword int\n");
        • Regular expression: int. Action: the printf call.
        • If the string "int" is matched in the input stream, output the message "found keyword int".
      • [0-9]+    { nc++; printf("found an integer constant\n"); }
        • Regular expression: [0-9]+. Action: the block in braces.
        • If any numeric string is matched in the input stream, increment nc and output the message "found an integer constant".

  12. User Subprograms • Just copied into lex.yy.c without any processing by Lex.

  13. Lex Regular Expressions - 1
      • Lex RE := text characters + operator characters
      • Operators
        • Escaping
          • a"*"b matches the literal string a*b (inside quotes, * is not an operator)
          • but a*b matches {b, ab, aab, aaab, …}
          • a\+b matches the literal string a+b (a backslash escapes only a single character)
        • [ ] : character class (matches one character from a set)
        • - : range operator inside [ ]
          • [a-z]: any single character in the set {a, b, …, z}
          • [-+0-9]: any single character that is a minus sign, a plus sign, or a digit from 0 to 9
          • [A-Za-z0-9]: any single character in the set {A, B, …, Z, a, b, …, z, 0, 1, …, 9}
        • ^ : complement, when it is the first character inside [ ]
          • [^*]: any single character except *
          • [^a-zA-Z]: any single character that is not a letter of the Latin alphabet

  14. Lex Regular Expressions - 2
      • Operators
        • \ : escape character (C-style escape sequences)
          • [ \t\n]: one of space, tab, or newline
          • [\40-\176]: one character in the range from octal 40 (space) to octal 176 (~)
        • * : zero or more repetitions
          • a*: the empty string, a, aa, aaa, …
          • [a-z][A-Z]*: e.g., aA, a, bBBF, …
        • + : one or more repetitions
          • a+: a, aa, aaa, …
        • | : alternation (or)
        • ^ : at the start of a pattern, matches only at the beginning of a line
          • e.g., ^abc matches "abc" only when it appears at the beginning of a line
        • $ : at the end of a pattern, matches only at the end of a line

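  15. Lex Regular Expressions - 3
      • Operators
        • / : trailing context
          • e.g., ab/cd matches "ab" only when it is followed by "cd"; the "cd" is not part of the matched token
        • . : any single character except newline
        • ? : optional (zero or one occurrence of the preceding expression)
          • e.g., ab?c matches "abc" or "ac"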

  16. Lex Regular Expressions - 4
      • Exercise: represent the following tokens as Lex regular expressions
        • Identifiers in C
          • answer) [a-zA-Z_][a-zA-Z0-9_]*
        • Real numbers in C (simplified: digits on both sides of the point, lowercase e only)
          • answer) [0-9]+"."[0-9]+(e[+-]?[0-9]+)?
        • String constants in C
          • answer) \"([^\042\134]|"\\".)*\"    cf) \042 = ", \134 = \
        • Comments in C
          • answer) "/*"([^*]|"*"+[^*/])*"*"+"/"
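The identifier and real-number answers can be checked outside Lex. The sketch below assumes a POSIX system and uses regcomp and regexec from <regex.h>. Because POSIX extended regular expressions have no Lex-style quoting, the quoted "." is rewritten as \. and the anchors ^ and $ are added so the whole string must match. The names ere_matches, IDENT_ERE, and REAL_ERE are hypothetical helpers for this illustration.

```c
#include <regex.h>
#include <stddef.h>

/* Lex answers rewritten as anchored POSIX extended regular expressions. */
const char IDENT_ERE[] = "^[a-zA-Z_][a-zA-Z0-9_]*$";
const char REAL_ERE[]  = "^[0-9]+\\.[0-9]+(e[+-]?[0-9]+)?$";

/* Returns 1 if text matches pattern, 0 if it does not,
   and -1 if the pattern fails to compile. */
int ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

For instance, ere_matches(REAL_ERE, "42") reports no match, because the pattern requires a decimal point with digits on both sides.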

  17. Lex Action Code - 1
      • Null statement
        • ex) [ \t\n]   ;   /* for blank, tab, and newline, do nothing */
      • yytext
        • global variable in Lex
        • holds the text of the matched token
        • ex) print out the matched string
          • [a-z]+   printf("%s", yytext);
          • [a-z]+   ECHO;

  18. Lex Action Code - 2
      • Global variables in Lex
        • yytext: the text of the matched token
        • yyleng: the length of the matched string
      • Global functions in Lex
        • yymore(): appends the next matched string to the end of the current yytext instead of replacing it
        • yyless(n): keeps the first n characters in yytext and returns the remaining characters to the input stream for further processing
        • yywrap(): called automatically by Lex when it reaches the end of the input stream; in the normal case it returns 1, meaning there is no more input
        • yylex(): reads characters from the input stream and tries to match them against the patterns in the rules; when a rule matches, its action code is executed, and if the action executes a return statement, yylex() returns that value

  19. Lex Action Code - 3
      • I/O functions in Lex
        • input(): gets the next character from the input stream
        • output(c): writes the character c to the output stream
        • unput(c): pushes the character c back onto the input stream; it will be read again by the next input()
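The relationship between input() and unput(c) can be pictured as a small pushback stack in front of the real input. This is only a sketch of the idea, not Lex's actual implementation. The struct char_stream, the functions stream_init, stream_input, and stream_unput, and the choice to return 0 at end of input are all assumptions made for this illustration.

```c
#include <stddef.h>

#define PUSHBACK_MAX 16

/* An in-memory input stream with a small pushback stack. */
struct char_stream {
    const char *src;              /* underlying input text */
    size_t pos;                   /* next unread position in src */
    char pushback[PUSHBACK_MAX];  /* characters returned by stream_unput */
    size_t npushed;               /* number of characters in pushback */
};

void stream_init(struct char_stream *s, const char *src)
{
    s->src = src;
    s->pos = 0;
    s->npushed = 0;
}

/* Like input(): pushed-back characters are read first (most recent first).
   Returns 0 at end of input (an assumption of this sketch). */
int stream_input(struct char_stream *s)
{
    if (s->npushed > 0)
        return (unsigned char)s->pushback[--s->npushed];
    if (s->src[s->pos] == '\0')
        return 0;
    return (unsigned char)s->src[s->pos++];
}

/* Like unput(c): returns 0 on success, -1 if the pushback stack is full. */
int stream_unput(struct char_stream *s, int c)
{
    if (s->npushed == PUSHBACK_MAX)
        return -1;
    s->pushback[s->npushed++] = (char)c;
    return 0;
}
```

Reading a character, pushing it back, and reading again yields the same character, which is the behavior the slide describes for unput(c).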

  20. Example
      • Convert "int" into "integer", "{" into "begin", and "}" into "end" in the input stream
          %{
          #include <stdio.h>
          %}
          %%
          int   printf("integer");
          "{"   printf("begin");
          "}"   printf("end");

  21. Which Rules?
      • When two or more rules can match at the current input position:
        • rule 1: Lex takes the rule that matches the longest token.
        • rule 2: If several rules match a token of the same length, Lex takes the rule that is defined first.
      • ex)
          integer    printf("Keyword integer\n");
          [a-z]+     printf("Identifier => %s\n", yytext);
        • when the input stream is "… integers …", Lex takes the second rule (it matches 8 characters, the first only 7)
        • when the input stream is "… integer …", both rules match 7 characters, so Lex takes the first rule
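The two disambiguation rules can be sketched in plain C. In this illustration each rule is a function that reports how many leading characters it matches, and the selection loop prefers the longest match and, on a tie, the rule listed first. The names match_fn, match_keyword_integer, match_lowercase_word, and select_rule are hypothetical; a real Lex scanner reaches the same decision with a single DFA rather than by trying each rule in turn.

```c
#include <stddef.h>
#include <string.h>

/* A rule reports how many leading characters of s it matches (0 = no match). */
typedef size_t (*match_fn)(const char *s);

/* Rule 0: the keyword "integer". */
static size_t match_keyword_integer(const char *s)
{
    return strncmp(s, "integer", 7) == 0 ? 7 : 0;
}

/* Rule 1: the pattern [a-z]+ . */
static size_t match_lowercase_word(const char *s)
{
    size_t n = 0;
    while (s[n] >= 'a' && s[n] <= 'z')
        n++;
    return n;
}

/* Rules in definition order. */
static const match_fn rules[] = { match_keyword_integer, match_lowercase_word };

/* Returns the index of the selected rule, or -1 if no rule matches.
   The length of the match is stored in *len. */
int select_rule(const char *s, size_t *len)
{
    int best = -1;
    *len = 0;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rules[i](s);
        if (n > *len) {  /* strictly longer wins; a tie keeps the earlier rule */
            *len = n;
            best = (int)i;
        }
    }
    return best;
}
```

With the input "integers x" the second rule is selected with length 8; with "integer x" both rules match 7 characters and the first rule is selected.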

  22. Lex Scanner Example (1)
      • Lex File: "test.l"
          %{
          #include <stdio.h>
          #include <stdlib.h>
          enum tnumber {TEOF, TIDEN, TNUM, TASSIGN, TADD, TSEMI, TDOT, TBEGIN, TEND, TERROR};
          %}
          letter  [a-zA-Z]
          digit   [0-9]
          %%
          begin                         return(TBEGIN);  /* return value of yylex() */
          end                           return(TEND);
          {letter}({letter}|{digit})*   return(TIDEN);   /* must match the enum name TIDEN */
          …

  23. Lex Scanner Example (2)
      • Lex File: "test.l" (continued)
          %%
          int main(void)
          {
              enum tnumber tn;  /* token number */
              printf("Start of Lex\n");
              while ((tn = yylex()) != TEOF) {
                  switch (tn) {
                  case TBEGIN: printf("Begin\n"); break;
                  case TEND:   printf("End\n");   break;
                  …
                  }
              }
              return 0;
          }

  24. Lex Scanner Example (3)
      • Lex File: "test.l" (continued)
          int yywrap(void)
          {
              printf(" End of Lex\n");
              return 1;
          }

  25. Lex Scanner Example (4)
      • Data File: "test.dat"
          begin num := 0; num := num + 526; end.
      • Making the scanner
        • lex test.l → generates lex.yy.c
        • generate scanner.exe by compiling lex.yy.c and linking it with the Lex library
        • scanner < test.dat → output generated

  26. Lex Scanner Example (5)
      • Scan Result
          Start of Lex
          Begin
          Identifier: num
          Assignment_op
          Number: 0
          Semicolon
          Identifier: num
          …
          End of Lex

  27. Parser Generator
      • PGS (Parser Generating System)
      • PGS input
        • BNF or EBNF context-free grammar
      • BNF/EBNF grammar → PGS → parsing table
      • token stream → Parser (driven by the parsing table) → program structure

  28. YACC
      • YACC (Yet Another Compiler Compiler)
        • Stephen C. Johnson, Bell Labs, 1975
        • LALR(1) parser generator
      • YACC spec (*.y) → YACC → y.tab.c
      • y.tab.c + Yacc library → C Compiler → Parser
      • token stream → Parser → parser output

  29. YACC Specification File
      <definition part>
      %%
      <production rules>
      %%
      <user program part>

  30. Production Rules
      • A : BODY ;
      • Example
        • BNF:  <expression> ::= <expression> + <term>
        • YACC: expression : expression '+' term ;
      • Example
          exp : exp '+' term
              | exp '-' term
              ;
          A   : ;   /* empty rule for A */

  31. Token Names
      • Tokens (terminals) must be declared in the <definition part>. (They are also the values passed from the scanner.)
          %token name1 name2 …
      • Example
          %token TVAR
          %%
          var_dcl : TVAR var_def ';' ;

  32. Start Symbol • Start symbol can be explicitly defined in <definition part>: %start symbol_name • If no start symbol is explicitly defined, the lhs of the first production rule will be the start symbol.

  33. Semantic Action
      • A semantic action is executed when the parser reduces by the corresponding production rule.
      • Example
          exp : exp '+' term   { printf("addition exp detected\n"); } ;
          exp : term           { printf("simple exp detected\n"); } ;

  34. Pseudo Variables
      • $1: the value of the first symbol on the rhs
      • $2: the value of the second symbol on the rhs
      • …
      • $$: the value of the symbol on the lhs
      • Example
          factor : '-' factor    { $$ = -$2; }
                 | '(' exp ')'   { $$ = $2; }
                 | NUMBER        { $$ = $1; }
                 ;
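One way to picture the pseudo variables is as positions on a value stack that the parser keeps alongside its state stack. The sketch below is a simplified model, not the code YACC generates. The names value_stack, vs_push, reduce_negate, and reduce_paren are hypothetical, and bounds checks are omitted for brevity.

```c
#include <stddef.h>

#define STACK_MAX 64

/* One semantic value per grammar symbol currently on the parse stack. */
struct value_stack {
    int vals[STACK_MAX];
    size_t top;  /* number of values currently on the stack */
};

/* Push a value (no overflow check in this sketch). */
void vs_push(struct value_stack *st, int v)
{
    st->vals[st->top++] = v;
}

/* Reduce by  factor : '-' factor  { $$ = -$2; }
   The rhs has two symbols, so $1 sits at top-2 and $2 at top-1. */
void reduce_negate(struct value_stack *st)
{
    int *rhs = &st->vals[st->top - 2];  /* rhs[0] is $1, rhs[1] is $2 */
    int result = -rhs[1];               /* $$ = -$2 */
    st->top -= 2;                       /* pop the rhs values */
    vs_push(st, result);                /* push $$ for the lhs symbol */
}

/* Reduce by  factor : '(' exp ')'  { $$ = $2; }
   The rhs has three symbols; $2 is the value of exp. */
void reduce_paren(struct value_stack *st)
{
    int *rhs = &st->vals[st->top - 3];
    int result = rhs[1];                /* $$ = $2 */
    st->top -= 3;
    vs_push(st, result);
}
```

After a reduction, the values of the rhs symbols are replaced by a single value for the lhs symbol, which later reductions then see as one of their own $n.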

  35. Intermediate Action Codes
      X : Y { f(); } Z ;
      • The action f() is executed after Y has been recognized and before Z is parsed.
      …

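  36. Ambiguous Grammar
      • For an ambiguous rule such as
          term : term '*' term ;
        YACC reports a shift/reduce conflict and, by default, resolves it by shifting, which groups the operators to the right:
        • term * term * term = term * (term * term)
      • Declaring the operator with %left in the definition part makes it left-associative instead.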
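The grouping matters whenever the operator is not associative. As a small illustration, the sketch below uses subtraction in place of the slide's '*' (an assumption made only so that the two groupings give different results) and evaluates the same operand list both ways. The function names eval_left_assoc and eval_right_assoc are hypothetical.

```c
#include <stddef.h>

/* ((v[0] - v[1]) - v[2]) - ... : left-associative grouping.
   Assumes n >= 1. */
int eval_left_assoc(const int *v, size_t n)
{
    int acc = v[0];
    for (size_t i = 1; i < n; i++)
        acc -= v[i];
    return acc;
}

/* v[0] - (v[1] - (v[2] - ...)) : right-associative grouping,
   the shape produced when the conflict is resolved by shifting.
   Assumes n >= 1. */
int eval_right_assoc(const int *v, size_t n)
{
    int acc = v[n - 1];
    for (size_t i = n - 1; i-- > 0; )
        acc = v[i] - acc;
    return acc;
}
```

For the operands 8, 3, 2 the left grouping computes (8 - 3) - 2 = 3, while the right grouping computes 8 - (3 - 2) = 7.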

  37. Implementations of Lex and Yacc
      • AT&T Lex & Yacc
        • UNIX
        • AT&T, 1975
      • Berkeley Lex & Yacc
        • UNIX (BSD version)
      • GNU Bison & Flex
        • Bison: GNU Yacc
        • Flex: fast, free replacement for Lex (commonly used with Bison)
        • Free source code
      • …
