Lecture 1 Overview CSCE 531 Compiler Construction • Topics • Instructions • Readings: January 16, 2018
Overview • Today’s Lecture • Pragmatics • A little History • Compilers vs Interpreters • Data-Flow View of Compilers • Why Study Compilers? • Course Pragmatics • References • Chapter 1 • Appendix A – sample compiler • Lord J.M. Keynes's Aphorism • "What's not worth doing isn't worth doing well."
Course Pragmatics • Instructor: Manton Matthews • Office: 2233 Storey Innovation Center • Office Hours: 11:30-1:30 T-TH others by appointment • Phone: 777-3285 • Email: “mm” at “sc” in domain “edu” • Class Web Site: http://www.cse.sc.edu/class/csce531-001/ • Class Directory on Linux Machines ~matthews/public/class/csce531
"Compilers: Principles, Techniques, and Tools," Aho, Lam, Sethi and Ullman, Addison-Wesley, 2nd edition, 2007. • C reference – Kernighan and Ritchie • K&R • Many others
Class Directory on Linux Boxes • Class directory /acct/matthews/public/csce531/ • Website - https://cse.sc.edu/~matthews/Courses/212/index.html • Submissions – “Moodle dropbox” https://dropbox.cse.sc.edu/
Evolution of Programming • First electronic computers – programmed by plugboard • Von Neumann – concept of stored program computer • Program stored in memory as data (1946) • Fortran – first widely used high-level language compiler, John Backus (1957); Backus also introduced BNF • Lisp 1958 • Cobol 1959 • Algol 68, Pascal • C – Ritchie 1972 • http://cm.bell-labs.com/cm/cs/who/dmr/chist.html • C++, Java • Perl, Python, Ruby – scripting languages
Compiler vs Interpreter • Compiler • Interpreter
Where’s Java? • Hybrid • Compilation step – translate source to bytecode • Interpretive step – run bytecode program on Java Virtual Machine (JVM)
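The same two-step model can be observed in Python, used here only as an analogy to the Java pipeline: the built-in `compile` translates source text to a bytecode object, and `exec` runs that bytecode on the Python virtual machine. The source string and variable names below are illustrative assumptions, not part of the lecture.

```python
import dis

source = "total = sum(range(100))"

# Compilation step: translate source text to a bytecode (code) object.
code_obj = compile(source, "<example>", "exec")

# Interpretive step: run the bytecode on the virtual machine.
namespace = {}
exec(code_obj, namespace)
print(namespace["total"])

# Show the bytecode instructions (exact opcodes vary by Python version).
dis.dis(code_obj)
```

In Java the two steps are separate tools (`javac` and `java`); in Python they usually happen together, which is why the hybrid nature is less visible.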
Example of Dataflow of Compilation Process • Token Stream • Token Code | Lexeme • ID | main • LPAREN | ( • RPAREN | ) • LBRACE | { • INT | int (keyword) • ID | i • COMMA | , • ID | sum • SEMICOLON | ; • … • Source Program in file • main(){ • int i, sum; • for(i=0, sum=0; i < 100; ++i){ • sum = sum + i; • } • printf("Sum=%d\n", sum); • }
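A minimal sketch of how a lexical analyzer might turn the source text into (token code, lexeme) pairs like those in the table. The token codes beyond those shown on the slide (FOR, NUM, ASSIGN, LT, INCR, PLUS, RBRACE) and the function name `tokenize` are assumptions made for illustration; the `printf` line is left out to keep the token set small.

```python
import re

# Token codes and their patterns. Keywords come before ID so that
# "int" is reported as INT rather than as an identifier.
TOKEN_SPEC = [
    ("INT",       r"\bint\b"),
    ("FOR",       r"\bfor\b"),
    ("ID",        r"[A-Za-z_]\w*"),
    ("NUM",       r"\d+"),
    ("INCR",      r"\+\+"),
    ("PLUS",      r"\+"),
    ("ASSIGN",    r"="),
    ("LT",        r"<"),
    ("LPAREN",    r"\("),
    ("RPAREN",    r"\)"),
    ("LBRACE",    r"\{"),
    ("RBRACE",    r"\}"),
    ("COMMA",     r","),
    ("SEMICOLON", r";"),
    ("SKIP",      r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token code, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

source = "main(){ int i, sum; for(i=0, sum=0; i < 100; ++i){ sum = sum + i; } }"
for code, lexeme in tokenize(source)[:9]:
    print(code, lexeme)
```

The first nine pairs printed correspond to the rows of the token table above.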
Assembly Code • more ex1.s • .file "ex1.c" • .def ___main; • .text • LC0: • .ascii "Sum=%d\12\0" • .globl _main • .def _main; .scl • _main: • pushl %ebp • movl %esp, %ebp • subl $24, %esp • andl $-16, %esp • movl $0, %eax • movl %eax, -12(%ebp) • movl -12(%ebp), %eax • call __alloca • call ___main • movl $0, -4(%ebp) • movl $0, -8(%ebp) • L2: • cmpl $99, -4(%ebp) • jle L5 • jmp L3 • L5: • movl -4(%ebp), %eax • leal -8(%ebp), %edx • addl %eax, (%edx) • leal -4(%ebp), %eax • incl (%eax) • jmp L2 • L3: • movl -8(%ebp), %eax • movl %eax, 4(%esp) • movl $LC0, (%esp) • call _printf • leave • ret • .def _printf;
Machine Code • Loader phase – link in libraries • #include <stdio.h> • /usr/lib/libc.a - standard library
GCC and some of its flags • GCC – GNU Compiler Collection (originally the GNU C Compiler; GNU stands for "GNU's Not Unix") • gcc phases • Preprocessor only: gcc -E ex1.c > ex1.i • Stop with assembly: gcc -S ex1.c • Compile only: gcc -c ex1.c • Loader – link in libraries • Disassemblers • objdump
Dynamic vs Static typing • Static means type checking happens at compile time • Dynamic means type checking happens at run time • Python – • x = parseInput(line) • if x < 0: • sq = "sqrt of x is not a real number" • else: • sq = sqrt(x) • print("sq=", sq) • Here sq holds a string on one path and a number on the other
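A runnable version of the idea on this slide, as a sketch: the function name `describe_sqrt` and the sample inputs are assumptions standing in for the slide's `parseInput(line)`. The second part shows that a type error in Python surfaces only when the offending operation actually executes.

```python
import math

def describe_sqrt(x):
    # The name sq is bound to a str on one path and a float on the other;
    # no declaration fixes its type ahead of time.
    if x < 0:
        sq = "sqrt of x is not a real number"
    else:
        sq = math.sqrt(x)
    return sq

for value in (9.0, -4.0):
    result = describe_sqrt(value)
    print(type(result).__name__, result)

def bad_addition():
    return "abc" + 1  # accepted when defined; fails only when run

try:
    bad_addition()
except TypeError as err:
    print("run-time type error:", err)
```

A statically typed language such as C or Java would reject both the two-typed `sq` and the string-plus-integer expression before the program ever runs.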
Parameter Passing • Call by value • Call by reference • Call by copy-restore
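The three mechanisms can be told apart by passing the same variable for both parameters of one procedure. The sketch below simulates them in Python; the `Cell` class (a mutable memory location), the callee `body`, and the wrapper names are all hypothetical helpers introduced for illustration, since Python itself offers only one passing mechanism.

```python
class Cell:
    """A mutable memory location holding one value (hypothetical helper)."""
    def __init__(self, value):
        self.value = value

def body(a, b):
    """The callee: increments both of its formal parameters."""
    a.value += 1
    b.value += 1

def call_by_value(x, y):
    # The callee works on fresh copies; the caller's cells are never touched.
    body(Cell(x.value), Cell(y.value))

def call_by_reference(x, y):
    # The callee works directly on the caller's cells.
    body(x, y)

def call_by_copy_restore(x, y):
    # Copy in, run the callee on the copies, then copy the results back out.
    a, b = Cell(x.value), Cell(y.value)
    body(a, b)
    x.value = a.value
    y.value = b.value

def run(mechanism):
    x = Cell(5)
    mechanism(x, x)  # pass the same variable for both parameters
    return x.value

print(run(call_by_value))         # 5
print(run(call_by_reference))     # 7
print(run(call_by_copy_restore))  # 6
```

With aliased arguments, call by reference sees both increments, while copy-restore writes back two independent copies that were each incremented once.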
Formal Language Theory • Mathematical Models of languages • where a language is just a set of strings of characters from a particular alphabet • Examples • L1 = {legitimate English words} • L2 = {Keywords of C} • L3 = {strings of zeroes and ones that have more zeroes than ones}
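Since a language is just a set of strings, it can be modeled as a membership test. A small sketch for L2 and L3: the function names are assumptions, and `C_KEYWORDS` is deliberately a partial list of C keywords, not the full set.

```python
# Partial, illustrative subset of the C keywords (not the complete list).
C_KEYWORDS = {"int", "char", "for", "while", "if", "else", "return"}

def in_L2(s):
    """Membership in L2, restricted to the partial keyword set above."""
    return s in C_KEYWORDS

def in_L3(s):
    """Membership in L3: strings over {0, 1} with more zeroes than ones."""
    return set(s) <= {"0", "1"} and s.count("0") > s.count("1")

for s in ("while", "main", "00100", "0101"):
    print(repr(s), in_L2(s), in_L3(s))
```

L2 is finite, so a lookup table suffices; L3 is infinite and needs counting, which is a first hint that different languages need machines of different power.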
Chomsky Hierarchy • Noam Chomsky, the famous linguist, came up with the Chomsky Hierarchy • Level 1 – regular languages (Type 3 in the conventional numbering) • Level 2 – context free languages (Type 2) • Level 3 – context sensitive languages (Type 1) • Level 4 – unrestricted languages (Type 0) • We will look at a few of these in our attempts to define efficient compilers
Regular languages (level 1) • Regular Expressions are expressions that represent regular languages • The base operators are • Concatenation, written . or by juxtaposition • Alternation | • Kleene closure * • For a regular expression R, the language that it represents is denoted L(R)
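The three base operators can be tried out with Python's `re` module. This is a sketch under one assumption worth stating: Python's pattern syntax supports much more than true regular expressions, so the example restricts itself to juxtaposition, `|`, and `*`. The expression R = (a|b)*abb and the helper name `in_language` are chosen for illustration.

```python
import re

# R = (a|b)*abb uses all three base operators:
#   (a|b)  alternation
#   *      Kleene closure
#   abb    concatenation by juxtaposition
R = re.compile(r"(a|b)*abb")

def in_language(s):
    """True if s is in L(R), i.e. the whole string matches R."""
    return R.fullmatch(s) is not None

for s in ("abb", "aababb", "ab", "abba"):
    print(repr(s), in_language(s))
```

`fullmatch` is used rather than `search` because membership in L(R) requires the entire string to match, not just some substring.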
Deterministic Finite Automata • A Deterministic finite automaton (DFA) is a mathematical model that consists of • 1. a set of states S • 2. a set of input symbols ∑, the input alphabet • 3. a transition function δ: S × ∑ → S that for each state and each input symbol maps to the next state • 4. a state s0 that is distinguished as the start state • 5. a set of states F distinguished as accepting (or final) states
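The five components map directly onto a small data structure. The sketch below encodes one example DFA, which accepts strings over {a, b} that end in abb; the dictionary layout and the function name `accepts` are assumptions made for illustration.

```python
# The five components of a DFA, for strings over {a, b} ending in "abb".
DFA = {
    "states": {0, 1, 2, 3},                      # 1. S
    "alphabet": {"a", "b"},                      # 2. the input alphabet
    "delta": {                                   # 3. transition function
        (0, "a"): 1, (0, "b"): 0,
        (1, "a"): 1, (1, "b"): 2,
        (2, "a"): 1, (2, "b"): 3,
        (3, "a"): 1, (3, "b"): 0,
    },
    "start": 0,                                  # 4. s0
    "accepting": {3},                            # 5. F
}

def accepts(dfa, s):
    """Run the DFA on s and report whether it halts in an accepting state."""
    state = dfa["start"]
    for ch in s:
        if ch not in dfa["alphabet"]:
            return False  # symbol outside the alphabet: reject
        state = dfa["delta"][(state, ch)]
    return state in dfa["accepting"]

for s in ("abb", "aababb", "ab", "abba"):
    print(repr(s), accepts(DFA, s))
```

Because δ is a total function on S × ∑, the loop does exactly one table lookup per input character, which is why DFA-based scanners run in time linear in the input length.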
A complete front-end: Appendix A • A.1 – the Source Language • A.2 – Main.java • A.3 – Lexical Analyzer – Tag.java • A.4 – Symbol Tables and Types • A.5 – Intermediate Code for Expressions • A.6 – Jumping Code for Boolean Expressions • A.7 – Intermediate Code for Statements • A.8 – Parser • A.9 – Creating the Front End
Tree – directory of source code • tree -a /acct/matthews/Courses/531/Code • /acct/matthews/Courses/531/Code • ├── dragon-book-source-code-master • │ ├── inter • │ │ ├── Access.java • │ │ ├── And.java • │ │ ├── Arith.java • │ │ ├── ... • │ │ └── While.java • │ ├── lexer • │ │ ├── Lexer.java • │ │ ├── Num.java • │ │ ├── Real.java • │ │ ├── Tag.java • │ │ ├── Token.java • │ │ └── Word.java • │ ├── main • │ │ └── Main.java • │ ├── Makefile • │ ├── parser • │ │ └── Parser.java • │ ├── README • │ ├── symbols • │ │ ├── Array.java • │ │ ├── Env.java • │ │ └── Type.java • │ ├── tests • │ │ ├── block1.i • │ │ ├── ... • │ │ ├── prog3.i • │ │ ├── prog3.t • │ │ ├── prog4.i • │ │ └── prog4.t • │ └── tmp • │ ├── block1.i • │ ├── ... • │ └── prog4.i • └── flexbison
Copying and unpacking • $ mkdir 531 • $ cd 531 • $ cp -fr ~matthews/public/csce531/dragon-book-source-code-master.tgz . • $ tar xvfz dragon-book-source-code-master.tgz • $ tree -a .
Make – Unix build utility • Targets – • Dependency list - • Target tree (forest) • Action list - • Minimal recompilation • Usage: • make // make first target • make targ // make target “targ” • make -f makefileNotNamedMakefile • http://www.eecs.umich.edu/courses/eecs380/HANDOUTS/Makefile-Tutorial.html
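Minimal recompilation rests on a timestamp comparison: a target is rebuilt only if it is missing or older than something in its dependency list. The sketch below shows just that check; the function name `needs_rebuild` and the file names are assumptions, and real make additionally walks the whole target tree and runs the action list.

```python
import os
import tempfile

def needs_rebuild(target, dependencies):
    """True if target is missing or older than any of its dependencies."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_time for dep in dependencies)

results = []
with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, "prog.c")
    obj = os.path.join(workdir, "prog.o")

    open(src, "w").close()
    results.append(needs_rebuild(obj, [src]))  # target does not exist yet

    open(obj, "w").close()
    os.utime(src, (1000, 1000))                # source older than target
    os.utime(obj, (2000, 2000))
    results.append(needs_rebuild(obj, [src]))

    os.utime(src, (3000, 3000))                # source edited after the build
    results.append(needs_rebuild(obj, [src]))

print(results)
```

The explicit `os.utime` calls set the modification times by hand so the demonstration does not depend on file system clock resolution.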
Makefile Example • CC=gcc • CFLAGS=-DYYDEBUG • LEX=flex • YACC=bison • YACCFLAGS=-t • simple3: simple3.tab.h simple3.tab.o lex.yy.o • $(CC) $(CFLAGS) simple3.tab.o lex.yy.o -ly -o simple3 • simple3.tab.h: simple3.y • bison -d $(YACCFLAGS) simple3.y • simple3.tab.c: simple3.y • bison $(YACCFLAGS) simple3.y • lex.yy.c: simple3.l • flex simple3.l • clean: • -rm *.o lex.yy.c *.tab.[ch] simple[0-9] *.output *.act
Make: Variables and builtin rules • tree: $(OBJS) • $(CC) $(LDFLAGS) -o $(TREE_DEST) $(OBJS) • %.o: %.c tree.h • $(CC) $(CFLAGS) -c -o $@ $< • make -n // print what would be done but don’t do it • make -p // print rules
Appendix A: Front End Makefile • build: compile test • compile: • javac lexer/*.java • javac symbols/*.java • javac inter/*.java • javac parser/*.java • javac main/*.java • yacc: • /usr/ccs/bin/yacc -v doc/front.y • rm y.tab.c • mv y.output doc • …
Makefile: testing target • test: • @for i in `(cd tests; ls *.t | sed -e 's/.t$$//')`;\ • do echo $$i.t;\ • java main.Main <tests/$$i.t >tmp/$$i.i;\ • diff tests/$$i.i tmp/$$i.i;\ • done
clean: • (cd lexer; rm *.class) • (cd symbols; rm *.class) • (cd inter; rm *.class) • (cd parser; rm *.class) • (cd main; rm *.class)
Main.java • package main; • import java.io.*; import lexer.*; import parser.*; • public class Main { • public static void main(String[] args) throws IOException { • Lexer lex = new Lexer(); • Parser parse = new Parser(lex); • parse.program(); • System.out.write('\n'); • } • }