This course explores the significance of compilers in computer science, bridging architecture and programming, with tools for language interpreters beyond programming languages. Learn about Chomsky Hierarchy, Formal Languages, Automata Theory, and practical compiler applications.
Introduction CPSC 388 Ellen Walker Hiram College
Why Learn About Compilers? • Practical application of important computer science theory • Ties together computer architecture and programming • Useful tools for developing language interpreters • Not just programming languages!
Computer Languages • Machine language • Binary numbers stored in memory • Bits correspond directly to machine actions • Assembly language • A “symbolic face” for machine language • Line-for-line translation • High-level language (our goal!) • Closer to human expressions of problems, e.g. mathematical notation
Assembler vs. HLL • Assembler:
Ldi $r1, 2 -- put the value 2 in R1
Sto $r1, x -- store that value in X
• HLL:
X = 2;
Characteristics of HLL’s • Easier to learn (and remember) • Machine independent • No knowledge of architecture needed • … as long as there is a compiler for that machine!
Early Milestones • FORTRAN (Formula Translation) • IBM (John Backus) 1954-1957 • First High-level language, and first compiler • Chomsky Hierarchy (1950’s) • Formal description of natural language structure • Ranks languages according to the complexity of their grammar
Chomsky Hierarchy • Type 3: Regular languages • Too simple for programming languages • Good for tokens, e.g. numbers • Type 2: Context Free languages • Standard representation of programming languages • Type 1: Context Sensitive Languages • Type 0: Unrestricted
Another View of the Hierarchy • [Figure: nested sets, with RL inside CFL inside CSL]
Formal Language & Automata Theory • Machines to recognize each language class • Turing Machine (computable languages) • Push-down Automaton (context-free languages) • Finite Automaton (regular languages) • Use machines to prove that a given language belongs to a class • Formally prove that a given language does not belong to a class
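As a rough illustration of a finite automaton recognizing a regular language, the sketch below accepts unsigned numbers such as 42 or 3.14. The state names and the exact number format are assumptions made for this example, not part of the course material.

```python
# Hypothetical DFA for unsigned numbers like "42" or "3.14".
# States: "start", "int", "dot", "frac". Accepting states: "int", "frac".
DIGITS = "0123456789"
ACCEPTING = {"int", "frac"}
ON_DIGIT = {"start": "int", "int": "int", "dot": "frac", "frac": "frac"}

def step(state, ch):
    """Return the next state, or None when no transition exists."""
    if ch in DIGITS:
        return ON_DIGIT[state]
    if ch == "." and state == "int":
        return "dot"
    return None

def accepts(text):
    """Run the automaton over text; accept only if it ends in an accepting state."""
    state = "start"
    for ch in text:
        state = step(state, ch)
        if state is None:
            return False
    return state in ACCEPTING
```

The machine needs only its current state, with no stack or tape, which is what keeps it within the regular languages.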
Practical Applications of Theory • Translate from grammar to formal machine description • Implement the formal machine to parse the language • Tools: • Scanner Generator (RL / FA): LEX, FLEX • Parser Generator (CFL / PDA): YACC, Bison
Beyond Parsing • Code generation • Optimization • Techniques to “mindlessly” improve code • Usually after code generation • Rarely “optimal”, simply better
Phases of a Compiler • Scanner -> tokens • Parser -> syntax tree • Semantic Analyzer -> annotated tree • Source code optimizer -> intermediate code • Code generator -> target code • Target code optimizer -> better target code
Additional Tables • Symbol table • Tracks all variable names and other symbols that will have to be mapped to addresses later • Literal table • Tracks literals (such as numbers and strings) that will have to be stored along with the eventual program
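The two tables can be sketched as small dictionary-backed classes. The class names, method names, and the fields stored per symbol are all assumptions for illustration; a real compiler would record more (scope, size, and so on).

```python
class SymbolTable:
    """Maps each declared name to its attributes (hypothetical layout)."""

    def __init__(self):
        self._entries = {}

    def declare(self, name, type_name):
        if name in self._entries:
            raise KeyError(f"{name!r} is already declared")
        # The address is unknown until a later phase assigns storage.
        self._entries[name] = {"type": type_name, "address": None}

    def lookup(self, name):
        """Return the entry for name, or None if it was never declared."""
        return self._entries.get(name)


class LiteralTable:
    """Stores each distinct literal once and hands back its index."""

    def __init__(self):
        self._index = {}
        self._values = []

    def intern(self, value):
        # Key on the type as well, so 1 and 1.0 stay distinct entries.
        key = (type(value).__name__, value)
        if key not in self._index:
            self._index[key] = len(self._values)
            self._values.append(value)
        return self._index[key]
```

Interning the same literal twice returns the same index, so the literal is stored only once alongside the eventual program.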
Scanner • Read a stream of characters • Perform lexical analysis to generate tokens • Update symbol and literal tables as needed • Example: • Input: a[j] = 4 + 2 • Tokens: ID Lbrack ID Rbrack EQL NUM PLUS NUM
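A minimal hand-written scanner for statements of this shape might look like the following. The token names follow the slide. The regular expressions and the `scan` function are assumptions for this sketch, and table updates are left out.

```python
import re

# One regular expression per token kind (patterns are assumptions).
TOKEN_SPEC = [
    ("NUM", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("Lbrack", r"\["),
    ("Rbrack", r"\]"),
    ("EQL", r"="),
    ("PLUS", r"\+"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def scan(source):
    """Return a list of (kind, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

kinds = [kind for kind, _ in scan("a[j] = 4 + 2")]
```

Here `kinds` holds the token kinds in source order. A scanner generator such as LEX builds an equivalent matcher from the same kind of pattern list.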
Parser • Performs syntax analysis • Relates the sequence of tokens to the grammar • Builds a tree that represents this relationship, the parse tree
Partial Grammar • assign-expr -> expr = expr • array-expr -> ID [ expr ] • expr -> array-expr • expr -> expr + expr • expr -> ID • expr -> NUM
Example Parse
assign-expression
  expression
    array-expression
      ID
      [
      expression
        ID
      ]
  =
  expression
    add-expression
      expression
        NUM
      +
      expression
        NUM
Abstract Syntax Tree
assign-expression
  expression
    array-expression
      ID
      expression
        ID
  expression
    add-expression
      expression
        NUM
      expression
        NUM
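One way to build such a tree by hand is a small recursive-descent parser. The sketch below makes two assumptions: the rule expr -> expr + expr is left-recursive and ambiguous, so it is rewritten here as a loop that treats + as left-associative, and the tree is stored as nested tuples without the intermediate "expression" nodes. The function and node names are illustrative only.

```python
def parse_assign(tokens):
    """Parse a list of (kind, text) tokens into a nested-tuple syntax tree."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, found {peek()}")
        text = tokens[pos][1]
        pos += 1
        return text

    def term():
        # term -> NUM | ID | ID [ expr ]
        if peek() == "NUM":
            return ("NUM", int(expect("NUM")))
        name = expect("ID")
        if peek() == "Lbrack":
            expect("Lbrack")
            index = expr()
            expect("Rbrack")
            return ("array-expr", ("ID", name), index)
        return ("ID", name)

    def expr():
        # expr -> term { + term }, grouping to the left
        node = term()
        while peek() == "PLUS":
            expect("PLUS")
            node = ("add-expr", node, term())
        return node

    target = expr()
    expect("EQL")
    value = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected trailing token {peek()}")
    return ("assign-expr", target, value)

tokens = [("ID", "a"), ("Lbrack", "["), ("ID", "j"), ("Rbrack", "]"),
          ("EQL", "="), ("NUM", "4"), ("PLUS", "+"), ("NUM", "2")]
tree = parse_assign(tokens)
```

Punctuation tokens such as the brackets and the equals sign guide the parse but do not appear in the resulting tree, which is the difference between a parse tree and an abstract syntax tree.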
Semantic Analyzer • Determine the meaning (not structure) of the program • This is “compile-time” or static semantics only • Example: a[j] = 4 + 2 • a refers to an array location • a contains integers • j is an integer • j is in the range of the array (not checked in C) • Parse or Syntax tree is “decorated” with this information
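The static checks listed above can be sketched as a recursive type checker over a tuple-based syntax tree. The tree shape, the type encoding ("int" or ("array", "int")), and the function name are assumptions for this example. The range check on j is omitted, as it is in C.

```python
def type_of(node, symbols):
    """Return the static type of a syntax-tree node, or raise TypeError."""
    kind = node[0]
    if kind == "NUM":
        return "int"
    if kind == "ID":
        if node[1] not in symbols:
            raise TypeError(f"undeclared identifier {node[1]!r}")
        return symbols[node[1]]
    if kind == "array-expr":
        base = type_of(node[1], symbols)
        if not (isinstance(base, tuple) and base[0] == "array"):
            raise TypeError("subscripted value is not an array")
        if type_of(node[2], symbols) != "int":
            raise TypeError("array index must be an integer")
        return base[1]  # element type
    if kind == "add-expr":
        if type_of(node[1], symbols) != "int" or type_of(node[2], symbols) != "int":
            raise TypeError("operands of + must be integers")
        return "int"
    if kind == "assign-expr":
        target = type_of(node[1], symbols)
        value = type_of(node[2], symbols)
        if target != value:
            raise TypeError(f"cannot assign {value} to {target}")
        return target
    raise ValueError(f"unknown node kind {kind!r}")

# Hypothetical declarations: a is an array of int, j is an int.
symbols = {"a": ("array", "int"), "j": "int"}
tree = ("assign-expr",
        ("array-expr", ("ID", "a"), ("ID", "j")),
        ("add-expr", ("NUM", 4), ("NUM", 2)))
result = type_of(tree, symbols)
```

A fuller analyzer would store each computed type on its node, which is the "decoration" the slide refers to. This sketch only returns the types.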
Source Code Optimizer • Simplify and improve the source code by applying rules • Constant folding: replace “4+2” by 6 • Combine common sub-expressions • Reordering expressions (often prior to constant folding) • Etc. • Result: modified, decorated syntax tree or Intermediate Representation
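Constant folding can be sketched as a bottom-up rewrite of the syntax tree. The nested-tuple tree representation and the name `fold` are assumptions for this example, and only addition is handled.

```python
def fold(node):
    """Return a copy of the tree with constant additions evaluated."""
    if node[0] in ("NUM", "ID"):
        return node  # leaves carry a value or a name, not child nodes
    children = [fold(child) for child in node[1:]]
    if node[0] == "add-expr" and all(child[0] == "NUM" for child in children):
        return ("NUM", children[0][1] + children[1][1])
    return (node[0], *children)

tree = ("assign-expr",
        ("array-expr", ("ID", "a"), ("ID", "j")),
        ("add-expr", ("NUM", 4), ("NUM", 2)))
folded = fold(tree)
```

Folding the children first means nested constant expressions such as (1 + 2) + 3 collapse completely in one traversal, while an addition involving an identifier is left in place.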
Code Generator • Generates code for the target machine • Example:
MOV R0, j -- value of j into R0
MUL R0, 2 -- 2*j in R0 (int = 2 words)
MOV R1, &a -- address of a in R1
ADD R1, R0 -- a+2*j in R1 (address of a[j])
MOV *R1, 6 -- 6 into the address held in R1
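A deliberately narrow sketch of this step is shown below. It assumes a tuple-based syntax tree that has already been constant-folded, handles only the form array[identifier] = constant, and emits the slide's pseudo-instructions as strings. The function name and the `int_size` parameter are hypothetical.

```python
def gen_array_store(node, int_size=2):
    """Emit pseudo-assembly for a tree of the form array[ID] = NUM."""
    kind, target, value = node
    supported = (kind == "assign-expr"
                 and target[0] == "array-expr"
                 and target[2][0] == "ID"
                 and value[0] == "NUM")
    if not supported:
        raise NotImplementedError("only array[ID] = NUM is handled in this sketch")
    array_name = target[1][1]
    index_name = target[2][1]
    return [
        f"MOV R0, {index_name}",   # index value into R0
        f"MUL R0, {int_size}",     # scale by the element size
        f"MOV R1, &{array_name}",  # base address into R1
        "ADD R1, R0",              # address of the element
        f"MOV *R1, {value[1]}",    # store the constant
    ]

code = gen_array_store(("assign-expr",
                        ("array-expr", ("ID", "a"), ("ID", "j")),
                        ("NUM", 6)))
```

A real code generator would walk arbitrary trees and allocate registers. Fixing the registers to R0 and R1 here simply mirrors the example.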
Target Code Optimizer • Apply rules to improve machine code • Use more complex machine instructions to replace simpler ones • Example:
MOV R0, j
SHL R0 -- shift to multiply by 2
MOV &a[R0], 6
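One such rule, replacing a multiply by 2 with a shift, can be sketched as a peephole pass over a list of instruction strings. The textual instruction format and the function name are assumptions. The second improvement in the example (the indexed addressing mode) is not attempted here.

```python
import re

# Matches "MUL <register>, 2" exactly, e.g. "MUL R0, 2" but not "MUL R0, 20".
MUL_BY_TWO = re.compile(r"MUL (R\d+), 2")

def peephole(instructions):
    """Replace each multiply-by-2 with a one-bit left shift."""
    improved = []
    for instruction in instructions:
        match = MUL_BY_TWO.fullmatch(instruction)
        if match:
            improved.append(f"SHL {match.group(1)}")
        else:
            improved.append(instruction)
    return improved

before = ["MOV R0, j", "MUL R0, 2", "MOV R1, &a", "ADD R1, R0", "MOV *R1, 6"]
after = peephole(before)
```

The pass looks at one instruction at a time and knows nothing about the program's meaning, which is the sense in which such rules improve code "mindlessly".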
Major Data Structures • Tokens • Syntax Tree • Symbol Table • Literal Table • Intermediate Code • Temporary files
Structuring a Compiler • Analysis vs. Synthesis • Analysis = understanding the source code • Synthesis = generating the target code • Front end vs. Back end • Front end: parsing & intermediate code generation (target machine-independent) • Back end: target code generation • Optimization included in both parts
Multiple Passes • Each pass processes the source code once • One pass per phase • One pass for several phases • One pass for entire compilation • Language definition can preclude one-pass compilation
Runtime Environments • Static (e.g. FORTRAN) • No pointers, no dynamic allocation, no recursion • All memory allocation done prior to execution • Stack-based (e.g. C family) • Stack for nested allocation (call/return) • Heap for random allocation (new) • Fully dynamic (LISP) • Allocation is automatic (not in source code) • Garbage collection required
Error Handling • Each phase finds and handles its own types of errors • Scanning: lexical errors, e.g. 1o1 (invalid ID) • Parsing: syntax errors • Semantic Analysis: type errors • Runtime errors handled by the runtime environment • Exception handling by programmer often allowed
Compiling the Compiler • Using machine language • Immediately executable, hard to write • Necessary for the first (FORTRAN) compiler • Using a language with an existing compiler and the same target machine • Using the language to be compiled (bootstrapping)
Bootstrapping • Write a “quick & dirty” compiler for a subset of the language (using machine language or another available HLL) • Write a complete compiler in the language subset • Compile the complete compiler using the “quick & dirty” compiler