Compiler Automation Tools

Why Automation?
• With N programming languages and M machines, we need N*M compilers.
• Since the 1970’s:
  • automatic lexical analyzer (scanner) generators
  • automatic syntax analyzer (parser) generators
• Code generation module generators
  • Some generators have been developed only recently.
  • A formalism for code generation is difficult to define.

Compiler Generator
• Sometimes called a “Compiler-Compiler”
• language description + machine description → Compiler-Compiler → Compiler
• source program → Compiler → object program

Scanner Generator
• regular expressions + action code → Scanner Generator → Scanner
• source program → Scanner → series of tokens

Implementation of Scanner Generator
• Recognition of regular expressions
• RE → NFA (Non-deterministic Finite Automaton)
• NFA → DFA (Deterministic Finite Automaton)
• DFA optimization (by reducing the number of states)
• Scanner program generation
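
To make the last step concrete, here is a minimal sketch of what a table-driven scanner core could look like for the single pattern letter(letter|digit)*. It is hand-written for illustration and is not the code Lex actually emits; the state names, the character classes, and the function dfa_accepts_identifier are assumptions made for this example.

```c
#include <stddef.h>

/* Input symbols of the DFA: each character is mapped to one class. */
enum char_class { CC_LETTER, CC_DIGIT, CC_OTHER, CC_COUNT };

/* DFA states: start, inside an identifier (accepting), dead (no match). */
enum { ST_START, ST_IDENT, ST_DEAD, ST_COUNT };

/* transition[state][class] gives the next state. */
static const int transition[ST_COUNT][CC_COUNT] = {
    /* ST_START */ { ST_IDENT, ST_DEAD,  ST_DEAD },
    /* ST_IDENT */ { ST_IDENT, ST_IDENT, ST_DEAD },
    /* ST_DEAD  */ { ST_DEAD,  ST_DEAD,  ST_DEAD },
};

static enum char_class classify(char c)
{
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
        return CC_LETTER;
    if (c >= '0' && c <= '9')
        return CC_DIGIT;
    return CC_OTHER;
}

/* Returns 1 if the whole string matches letter(letter|digit)*, else 0. */
int dfa_accepts_identifier(const char *s)
{
    int state = ST_START;
    for (size_t i = 0; s[i] != '\0'; i++)
        state = transition[state][classify(s[i])];
    return state == ST_IDENT;
}
```

A generated scanner works on the same principle, but its table covers all rules at once and it remembers the last accepting state so that it can return the longest match.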

Lex
• Scanner generator by M. E. Lesk, 1975
• Lex input (by user) → Lex → lex.yy.c
• lex.yy.c + Lex library → C Compiler → Scanner
• source program → Scanner → series of tokens

Lex Input Structure
    <definitions>
    %%
    <rules>
    %%
    <user subprograms>

Definition Part
    %{
    /* This part is copied verbatim into the generated program */
    /* data structures, variables, constants used in action code */
    %}
    name1 substitution1
    name2 substitution2
    …
    %%
    rules…
    %%
    user subprograms…

Definition Example
    L [a-zA-Z]
    D [0-9]
    %%
    {L}({L}|{D})* return xxx;
• The lines before %% are definitions; the line after %% is a rule.
• After substitution, the pattern of the rule is [a-zA-Z]([a-zA-Z]|[0-9])*

Rules
    %%
    R1 A1
    R2 A2
    …
• R: regular expressions
• A: actions

Rule Examples
• In each rule the regular expression comes first and the action follows it.
    int printf("found keyword int\n");
• If the string “int” is matched in the input stream, output the message “found keyword int”.
    [0-9]+ { nc++; printf("found an integer constant\n"); }
• If any numeric string is matched in the input stream, increment nc and output the message “found an integer constant”.

User Subprograms • Just copied into lex.yy.c without any processing by Lex.

Lex Regular Expressions - 1
• Lex RE := text characters + operator characters
• Operators
  • Escaping
    • a"*"b → a*b // i.e., * is not an operator here
    • but a*b → {b, ab, aab, aaab, …}
    • a\+b → a+b // \ escapes only a single character
  • [ ] : character class (matches any single character in the set)
    • - : range operator
    • [a-z]: any single character in the set {a, b, …, z}
    • [-+0-9]: any single character that is a minus sign, a plus sign, or a digit from 0 to 9
    • [A-Za-z0-9]: any single character in the set {A, B, …, Z, a, b, …, z, 0, 1, …, 9}
    • ^ : complement (as the first character inside [ ])
    • [^*]: any single character except *
    • [^a-zA-Z]: any single character except a letter of the Roman alphabet

Lex Regular Expressions - 2
• Operators
  • \ : escape character (as in the C language)
    • [ \t\n] : one of space, tab, or newline
    • [\40-\176]: one character from octal ASCII code 40 (blank) to 176 (~)
  • * : zero or more repetitions
    • a*: one of the empty string, a, aa, aaa, …
    • [a-z][A-Z]* : e.g., aA, a, bBBF, …
  • + : one or more repetitions
    • a+: one of a, aa, aaa, …
  • | : or (alternation)
  • ^ : matches only at the beginning of a line
    • ex) ^abc: matched when the string “abc” appears at the beginning of a line
  • $ : matches only at the end of a line

Lex Regular Expressions - 3
• Operators
  • / : trailing context
    • ex) ab/cd : “ab” is matched only when “cd” follows it; “cd” itself is not part of the matched token
  • . : any single character except newline
  • ? : optional (zero or one occurrence)
    • ex) ab?c → “abc” or “ac”

Lex Regular Expressions - 4
• Exercise: represent the following tokens with Lex regular expressions
  • Identifiers in the C language
    • answer) [a-zA-Z_][a-zA-Z0-9_]*
  • Real numbers in the C language (simplified form)
    • answer) [0-9]+"."[0-9]+(e[+-]?[0-9]+)?
  • String constants in the C language
    • answer) \"([^\042\134]|"\\".)*\"
    • cf) octal \042 = ", octal \134 = \
  • Comments in the C language
    • answer) "/*"([^*]|"*"+[^*/])*"*"+"/"

Lex Action Code - 1
• Null statement
  • ex) [ \t\n] ; // for blank, tab, and newline, do nothing
• yytext
  • global variable in Lex
  • holds the text of the matched token
  • ex) print out the matched string
    [a-z]+ printf("%s", yytext);
    [a-z]+ ECHO;

Lex Action Code - 2
• Global variables in Lex
  • yytext
  • yyleng: the length of the matched string
• Global functions in Lex
  • yymore(): the next matched string is appended to the tail of the current yytext instead of replacing it
  • yyless(n): keep the first n characters in yytext and return the remaining characters to the input stream for further processing
  • yywrap(): called automatically by Lex when it reaches the end of the input stream. In the normal case it returns 1, meaning there is no more input.
  • yylex(): the generated scanner function. It reads characters from the input stream, finds the rule whose regular expression matches, and executes that rule's action code. If the action executes a return statement, yylex() returns that value to its caller. At the end of input it returns 0.

Lex Action Code - 3
• I/O functions in Lex
  • input(): get the next character from the input stream.
  • output(c): write the character c to the output stream.
  • unput(c): push the character c back onto the input stream. The character will be read again by the next input().
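
The relationship between input() and unput(c) can be sketched with a small pushback buffer. This is only an illustration of the idea, not Lex's implementation; the struct char_stream and the functions stream_input and stream_unput are hypothetical names, and the sketch returns 0 at end of input.

```c
#include <stddef.h>

#define PUSHBACK_MAX 16

struct char_stream {
    const char *src;              /* remaining unread input */
    char pushback[PUSHBACK_MAX];  /* characters returned by stream_unput */
    size_t pushed;                /* number of characters in pushback */
};

/* Next character, or 0 at end of input; pushed-back characters come first. */
int stream_input(struct char_stream *s)
{
    if (s->pushed > 0)
        return (unsigned char)s->pushback[--s->pushed];
    if (*s->src == '\0')
        return 0;
    return (unsigned char)*s->src++;
}

/* Push c back so the next stream_input returns it; -1 if the buffer is full. */
int stream_unput(struct char_stream *s, int c)
{
    if (s->pushed == PUSHBACK_MAX)
        return -1;
    s->pushback[s->pushed++] = (char)c;
    return 0;
}
```

Because the pushback buffer is a stack, characters pushed back one after another are read again in reverse order of the pushes.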

Example
• Convert “int” into “integer”, “{” into “begin”, and “}” into “end” in the input stream
    %{
    #include <stdio.h>
    %}
    %%
    int printf("integer");
    "{" printf("begin");
    "}" printf("end");

Which Rules?
• When two or more rules match the current input:
  • rule 1: Lex takes the rule that matches the longest token.
  • rule 2: If the rules match tokens of the same length, Lex takes the rule defined first.
• ex)
    integer printf("Keyword integer\n");
    [a-z]+ printf("Identifier => %s\n", yytext);
  • when the input stream is “… integers …”, Lex takes the second rule
  • when the input stream is “… integer …”, Lex takes the first rule
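
The two disambiguation rules can be sketched in plain C for the two patterns of the example. This is an illustrative sketch, not how Lex is implemented (Lex runs one combined DFA rather than trying each rule in turn); the function names and the rule numbering (0 for the keyword, 1 for the identifier) are assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Length of the prefix of s matched by the literal "integer" (0 if none). */
static size_t match_keyword_integer(const char *s)
{
    static const char kw[] = "integer";
    size_t n = sizeof kw - 1;
    return strncmp(s, kw, n) == 0 ? n : 0;
}

/* Length of the longest prefix of s matched by [a-z]+ (0 if none). */
static size_t match_lowercase_word(const char *s)
{
    size_t n = 0;
    while (s[n] >= 'a' && s[n] <= 'z')
        n++;
    return n;
}

/* Returns the index of the winning rule, or -1 if no rule matches.
   The matched length is stored in *matched_len. */
int select_rule(const char *s, size_t *matched_len)
{
    size_t (*const rules[])(const char *) = {
        match_keyword_integer,  /* rule 0, defined first */
        match_lowercase_word,   /* rule 1 */
    };
    int best = -1;
    size_t best_len = 0;
    for (int i = 0; i < 2; i++) {
        size_t len = rules[i](s);
        if (len > best_len) {   /* strictly longer wins; a tie keeps the earlier rule */
            best = i;
            best_len = len;
        }
    }
    *matched_len = best_len;
    return best;
}
```

For the input "integers" the identifier rule wins with length 8; for "integer " both rules match 7 characters and the keyword rule wins because it is defined first.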

Lex Scanner Example (1)
• Lex file: “test.l”
    %{
    #include <stdio.h>
    #include <stdlib.h>
    enum tnumber {TEOF, TIDEN, TNUM, TASSIGN, TADD, TSEMI, TDOT, TBEGIN, TEND, TERROR};
    %}
    letter [a-zA-Z]
    digit [0-9]
    %%
    begin return(TBEGIN); /* return value of yylex() */
    end return(TEND);
    {letter}({letter}|{digit})* return(TIDEN); /* must match the name in enum tnumber */
    …

Lex Scanner Example (2)
• Lex file: “test.l” (continued)
    %%
    int main(void)
    {
        enum tnumber tn; /* token number */
        printf("Start of Lex\n");
        /* yylex() returns 0 (TEOF) at the end of input */
        while ((tn = yylex()) != TEOF) {
            switch (tn) {
            case TBEGIN: printf("Begin\n"); break;
            case TEND: printf("End\n"); break;
            …
            }
        }
        return 0;
    }

Lex Scanner Example (3)
• Lex file: “test.l” (continued)
    int yywrap()
    {
        printf(" End of Lex\n");
        return 1;
    }

Lex Scanner Example (4)
• Data file: “test.dat”
    begin num := 0; num := num + 526; end.
• Making the scanner
  • lex test.l → generates lex.yy.c
  • compile lex.yy.c and link it with the Lex library → generates scanner.exe
  • scanner < test.dat → output generated

Lex Scanner Example (5)
• Scan result
    Start of Lex
    Begin
    Identifier: num
    Assignment_op
    Number: 0
    Semicolon
    Identifier: num
    …
    End of Lex

Parser Generator
• PGS (Parser Generating System)
• PGS input
  • BNF or EBNF context-free grammar
• BNF/EBNF grammar → PGS → parsing table
• token stream → Parser (driven by the parsing table) → program structure

YACC
• YACC (Yet Another Compiler Compiler)
• Stephen C. Johnson, Bell Labs, 1975
• LALR(1) parser generator
• YACC spec (*.y) → YACC → y.tab.c
• y.tab.c + Yacc library → C Compiler → Parser
• token stream → Parser → parser output

YACC Specification File
    <definition part>
    %%
    <production rules>
    %%
    <user program part>

Production Rules
• A : BODY ;
• Example
  • BNF: <expression> ::= <expression> + <term>
  • YACC: expression : expression '+' term ;
• Example
    exp : exp '+' term
        | exp '-' term
        ;
    A : ; /* rule for A */

Token Names
• Tokens (terminals) should be declared in the <definition part>. (They are also passed from the scanner.)
    %token name1 name2 …
• Example
    %token TVAR
    %%
    var_dcl : TVAR var_def ';' ;

Start Symbol • Start symbol can be explicitly defined in <definition part>: %start symbol_name • If no start symbol is explicitly defined, the lhs of the first production rule will be the start symbol.

Semantic Action
• A semantic action is executed when the parser recognizes (reduces by) the corresponding production rule.
• Example
    exp : exp '+' term { printf("addition exp detected\n"); } ;
    exp : term { printf("simple exp detected\n"); } ;

Pseudo Variables
• $1: the value of the first symbol in the rhs
• $2: the value of the second symbol in the rhs
• …
• $$: the value of the symbol in the lhs
• Example
    factor : '-' factor { $$ = -$2; }
           | '(' exp ')' { $$ = $2; }
           | NUMBER { $$ = $1; }
           ;
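
One way to picture the pseudo variables is as slots on a value stack that the parser keeps next to its state stack. The sketch below is a simplified illustration under that assumption, not the code Yacc generates; struct value_stack, vs_push, reduce_negate, and reduce_paren are hypothetical names, and all values are plain int.

```c
#include <stddef.h>

#define STACK_MAX 64

struct value_stack {
    int values[STACK_MAX];
    size_t top;  /* number of values currently on the stack */
};

void vs_push(struct value_stack *s, int v)
{
    s->values[s->top++] = v;
}

/* Reduce by  factor : '-' factor  with action { $$ = -$2; }.
   The two topmost entries hold $1 (the '-' token) and $2. */
void reduce_negate(struct value_stack *s)
{
    const int *rhs = &s->values[s->top - 2];  /* rhs[0] is $1, rhs[1] is $2 */
    int result = -rhs[1];                     /* $$ = -$2 */
    s->top -= 2;                              /* pop the right-hand side */
    vs_push(s, result);                       /* push $$ */
}

/* Reduce by  factor : '(' exp ')'  with action { $$ = $2; }. */
void reduce_paren(struct value_stack *s)
{
    const int *rhs = &s->values[s->top - 3];  /* rhs[1] is $2, the exp value */
    int result = rhs[1];                      /* $$ = $2 */
    s->top -= 3;
    vs_push(s, result);
}
```

After a reduction the right-hand-side values are gone and the single value left in their place is what a later rule sees as one of its own $n.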

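Intermediate Action Codes
• An action may also appear in the middle of a rule; it is executed after the symbols to its left have been recognized and before the symbols to its right are parsed.
    X : Y { f(); } Z ;
    …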

Ambiguous Grammar
• By default YACC resolves this kind of ambiguity with a “right-precedence” rule: on a shift/reduce conflict it shifts, so the operators group to the right.
    term : term '*' term ;
• term*term*term = term * (term * term)
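
For '*' the grouping does not change the numeric result, but for an operator such as '-' it does. The following sketch compares the two groupings on a list of operands joined by '-'; the function names are hypothetical and the code only illustrates the arithmetic, not the parser.

```c
#include <stddef.h>

/* Left grouping: ((v[0] - v[1]) - v[2]) - ...   Requires n >= 1. */
int eval_left_grouped(const int *v, size_t n)
{
    int acc = v[0];
    for (size_t i = 1; i < n; i++)
        acc -= v[i];
    return acc;
}

/* Right grouping: v[0] - (v[1] - (v[2] - ...))   Requires n >= 1. */
int eval_right_grouped(const int *v, size_t n)
{
    int acc = v[n - 1];
    for (size_t i = n - 1; i > 0; i--)
        acc = v[i - 1] - acc;
    return acc;
}
```

For the operands 8, 3, 2 the left grouping gives (8 - 3) - 2 = 3 while the right grouping gives 8 - (3 - 2) = 7, which is why the grouping chosen by the parser matters for such operators.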

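Implementations of Lex and Yacc
• AT&T Lex & Yacc
  • UNIX
  • AT&T, 1975
• Berkeley Lex & Yacc
  • UNIX (BSD version)
• Bison & Flex
  • Bison: the GNU version of Yacc
  • Flex: a free reimplementation of Lex, commonly used together with Bison (it is not itself a GNU project)
  • Free source code
• …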
