Understanding Compilers: Theory to Practical Application

Introduction CPSC 388 Ellen Walker Hiram College

Why Learn About Compilers? • Practical application of important computer science theory • Ties together computer architecture and programming • Useful tools for developing language interpreters • Not just programming languages!

Computer Languages • Machine language • Binary numbers stored in memory • Bits correspond directly to machine actions • Assembly language • A “symbolic face” for machine language • Line-for-line translation • High-level language (our goal!) • Closer to human expressions of problems, e.g. mathematical notation

Assembler vs. HLL • Assembler: • Ldi $r1, 2 -- put the value 2 in R1 • Sto $r1, x -- store that value in x • HLL: • x = 2;

Characteristics of HLLs • Easier to learn (and remember) • Machine independent • No knowledge of architecture needed • … as long as there is a compiler for that machine!

Early Milestones • FORTRAN (Formula Translation) • IBM (John Backus) 1954-1957 • First High-level language, and first compiler • Chomsky Hierarchy (1950’s) • Formal description of natural language structure • Ranks languages according to the complexity of their grammar

Chomsky Hierarchy • Type 3: Regular languages • Too simple for programming languages • Good for tokens, e.g. numbers • Type 2: Context Free languages • Standard representation of programming languages • Type 1: Context Sensitive Languages • Type 0: Unrestricted

Another View of the Hierarchy • [Figure: nested sets, with RL inside CFL inside CSL]

Formal Language & Automata Theory • Machines to recognize each language class • Turing Machine (computable languages) • Push-down Automaton (context-free languages) • Finite Automaton (regular languages) • Use machines to prove that a given language belongs to a class • Formally prove that a given language does not belong to a class
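The difference between the two lower machine classes can be sketched in a few lines. This is an illustration only, not part of the course material: the function names and the two toy languages (unsigned integers, balanced square brackets) are assumptions chosen for the example.

```python
DIGITS = "0123456789"


def accepts_number(text):
    """Finite automaton: two states, 'start' and 'digits' (accepting)."""
    state = "start"
    for ch in text:
        if ch in DIGITS:
            state = "digits"   # both states move to 'digits' on a digit
        else:
            return False       # no transition defined: reject
    return state == "digits"


def accepts_balanced(text):
    """Push-down recognizer: the stack remembers unmatched '[' symbols."""
    stack = []
    for ch in text:
        if ch == "[":
            stack.append(ch)
        elif ch == "]":
            if not stack:
                return False   # closing bracket with nothing to match
            stack.pop()
        else:
            return False       # symbol outside the alphabet
    return not stack           # accept only if every '[' was matched
```

The first recognizer needs only a fixed number of states, which is why regular languages suit tokens. The second needs unbounded memory (the stack) to count nesting depth, which a finite automaton cannot provide.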

Practical Applications of Theory • Translate from grammar to formal machine description • Implement the formal machine to parse the language • Tools: • Scanner Generator (RL / FA): LEX, FLEX • Parser Generator (CFL / PDA): YACC, Bison

Beyond Parsing • Code generation • Optimization • Techniques to “mindlessly” improve code • Usually after code generation • Rarely “optimal”, simply better

Phases of a Compiler • Scanner -> tokens • Parser -> syntax tree • Semantic Analyzer -> annotated tree • Source code optimizer -> intermediate code • Code generator -> target code • Target code optimizer -> better target code

Additional Tables • Symbol table • Tracks all variable names and other symbols that will have to be mapped to addresses later • Literal table • Tracks literals (such as numbers and strings) that will have to be stored along with the eventual program

Scanner • Read a stream of characters • Perform lexical analysis to generate tokens • Update symbol and literal tables as needed • Example: • Input: a[j] = 4 + 2 • Tokens: ID Lbrack ID Rbrack EQL NUM PLUS NUM
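A hand-written scanner for this token set might look like the sketch below. It uses Python's standard `re` module in place of a generated scanner such as LEX would produce; the token names follow the slide, while `TOKEN_SPEC`, `MASTER`, `scan`, and the `SKIP` whitespace class are assumptions made for the example. Table updates are left out.

```python
import re

# (token kind, regular expression); SKIP is a helper class for whitespace
TOKEN_SPEC = [
    ("NUM", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("Lbrack", r"\["),
    ("Rbrack", r"\]"),
    ("EQL", r"="),
    ("PLUS", r"\+"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(source):
    """Return a list of (kind, text) tokens, or raise on an unknown character."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":          # drop whitespace
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Applied to the slide's example input, the token kinds come out in the order ID Lbrack ID Rbrack EQL NUM PLUS NUM.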

Parser • Performs syntax analysis • Relates the sequence of tokens to the grammar • Builds a tree that represents this relationship, the parse tree

Partial Grammar • assign-expr -> expr = expr • array-expr -> ID [ expr ] • expr -> array-expr • expr -> expr + expr • expr -> ID • expr -> NUM

Example Parse
assign-expression
    expression
        array-expression
            ID
            [
            expression
                ID
            ]
    =
    expression
        add-expression
            expression
                NUM
            +
            expression
                NUM

Abstract Syntax Tree
assign-expression
    expression
        array-expression
            ID
            expression
                ID
    expression
        add-expression
            expression
                NUM
            expression
                NUM
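One way to turn the partial grammar into a working parser is recursive descent. The sketch below is an assumption-laden illustration rather than the course's method: it rewrites the left-recursive rule `expr -> expr + expr` as `expr -> term { + term }`, represents tokens as `(kind, text)` pairs, and builds the abstract syntax tree from nested tuples with hypothetical tags `"assign"`, `"index"`, `"add"`, `"id"`, and `"num"`.

```python
def expect(tokens, pos, kind):
    """Check that the token at pos has the given kind; return the next position."""
    if pos >= len(tokens) or tokens[pos][0] != kind:
        raise SyntaxError(f"expected {kind} at position {pos}")
    return pos + 1


def parse_term(tokens, pos):
    """term -> NUM | ID | ID [ expr ]"""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    kind, text = tokens[pos]
    if kind == "NUM":
        return ("num", int(text)), pos + 1
    if kind == "ID":
        if pos + 1 < len(tokens) and tokens[pos + 1][0] == "Lbrack":
            index, pos = parse_expr(tokens, pos + 2)
            pos = expect(tokens, pos, "Rbrack")
            return ("index", text, index), pos
        return ("id", text), pos + 1
    raise SyntaxError(f"unexpected {kind} at position {pos}")


def parse_expr(tokens, pos):
    """expr -> term { + term }, a left-associative rewrite of expr -> expr + expr"""
    node, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos][0] == "PLUS":
        right, pos = parse_term(tokens, pos + 1)
        node = ("add", node, right)
    return node, pos


def parse_assign(tokens):
    """assign-expr -> expr = expr; the whole token list must be consumed."""
    left, pos = parse_expr(tokens, 0)
    pos = expect(tokens, pos, "EQL")
    right, pos = parse_expr(tokens, pos)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return ("assign", left, right)
```

The tuples drop the brackets, the equals sign, and the plus sign, which is the same simplification that separates the abstract syntax tree from the full parse tree.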

Semantic Analyzer • Determine the meaning (not structure) of the program • This is “compile-time” or static semantics only • Example: a[j] = 4 + 2 • a refers to an array location • a contains integers • j is an integer • j is in the range of the array (not checked in C) • Parse or Syntax tree is “decorated” with this information
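The static checks listed on this slide can be sketched as a recursive walk over the tree. Everything named here is an assumption for illustration: the tree is nested tuples tagged `"assign"`, `"index"`, `"add"`, `"id"`, and `"num"`, the symbol table is a plain dictionary from names to type strings, and `type_of` is a hypothetical helper. The range check on j is omitted, since it generally cannot be decided at compile time.

```python
def type_of(node, symbols):
    """Return the static type of a tree node, raising TypeError on a violation."""
    kind = node[0]
    if kind == "num":
        return "int"
    if kind == "id":
        if node[1] not in symbols:
            raise TypeError(f"undeclared identifier {node[1]!r}")
        return symbols[node[1]]
    if kind == "index":
        array_type = type_of(("id", node[1]), symbols)
        if not array_type.startswith("array of "):
            raise TypeError(f"{node[1]!r} is not an array")
        if type_of(node[2], symbols) != "int":
            raise TypeError("array index must be an int")
        return array_type[len("array of "):]   # element type
    if kind == "add":
        if type_of(node[1], symbols) != "int" or type_of(node[2], symbols) != "int":
            raise TypeError("operands of + must be int")
        return "int"
    if kind == "assign":
        left, right = type_of(node[1], symbols), type_of(node[2], symbols)
        if left != right:
            raise TypeError(f"cannot assign {right} to {left}")
        return left
    raise ValueError(f"unknown node kind {kind!r}")
```

With a table such as `{"a": "array of int", "j": "int"}`, the example assignment checks successfully and has type `int`. A real analyzer would store these types on the tree nodes rather than only returning them.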

Source Code Optimizer • Simplify and improve the source code by applying rules • Constant folding: replace “4+2” by 6 • Combine common sub-expressions • Reordering expressions (often prior to constant folding) • Etc. • Result: modified, decorated syntax tree or Intermediate Representation
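Constant folding can be sketched as a bottom-up rewrite of the tree. The tuple-based tree shape (tags `"assign"`, `"index"`, `"add"`, `"id"`, `"num"`) and the function name `fold` are assumptions for this example, not a fixed representation.

```python
def fold(node):
    """Return a copy of the tree with constant additions replaced by their value."""
    kind = node[0]
    if kind == "add":
        left, right = fold(node[1]), fold(node[2])
        if left[0] == "num" and right[0] == "num":
            return ("num", left[1] + right[1])   # both operands known: fold
        return ("add", left, right)
    if kind == "assign":
        return ("assign", fold(node[1]), fold(node[2]))
    if kind == "index":
        return ("index", node[1], fold(node[2]))
    return node                                   # "num" and "id" leaves


print(fold(("add", ("num", 4), ("num", 2))))      # prints ('num', 6)
```

Because children are folded before their parent, nested constant expressions collapse in a single pass.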

Code Generator • Generates code for the target machine • Example: • MOV R0, j -- value of j into R0 • MUL R0, 2 -- 2*j in R0 (int = 2 words) • MOV R1, &a -- address of a into R1 • ADD R1, R0 -- a+2*j in R1 (address of a[j]) • MOV *R1, 6 -- 6 into the address in R1
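A deliberately narrow sketch of this step follows. It handles only the one statement shape from the example, an array element indexed by a variable and assigned a constant. The tuple tree shape, the function name `gen_array_store`, and the `word_size` parameter are assumptions; the instruction mnemonics are those of the slide's hypothetical target machine.

```python
def gen_array_store(node, word_size=2):
    """Emit target code for a tree of the form ID[ID] = NUM."""
    _, target, value = node
    if target[0] != "index" or target[2][0] != "id" or value[0] != "num":
        raise NotImplementedError("sketch handles only ID[ID] = NUM")
    array, index = target[1], target[2][1]
    return [
        f"MOV R0, {index}",        # value of the index variable into R0
        f"MUL R0, {word_size}",    # scale the index by the element size
        f"MOV R1, &{array}",       # base address of the array into R1
        "ADD R1, R0",              # R1 now holds the element's address
        f"MOV *R1, {value[1]}",    # store the constant at that address
    ]
```

A full code generator would walk arbitrary trees and allocate registers; here R0 and R1 are simply hard-coded.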

Target Code Optimizer • Apply rules to improve machine code • Example: • MOV R0, j • SHL R0 -- shift to multiply by 2 • MOV &a[R0], 6 -- use a more complex machine instruction to replace simpler ones
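Rules of this kind are often applied as a peephole pass over a small window of instructions. The sketch below encodes just the two rewrites from the example, treating instructions as strings; the function name `peephole` and the regular expressions are assumptions. The second rule also assumes the address register is not used again afterwards, which a real optimizer would have to verify.

```python
import re


def peephole(code):
    """Apply two local rewrite rules to a list of instruction strings."""
    out = []
    i = 0
    while i < len(code):
        # Rule 1: multiplying by 2 becomes a left shift.
        mul = re.fullmatch(r"MUL (R\d+), 2", code[i])
        if mul:
            out.append(f"SHL {mul.group(1)}")
            i += 1
            continue
        # Rule 2: load base address, add offset, store through the pointer
        # becomes one indexed store (assumes the pointer register is dead after).
        if i + 2 < len(code):
            load = re.fullmatch(r"MOV (R\d+), &(\w+)", code[i])
            add = re.fullmatch(r"ADD (R\d+), (R\d+)", code[i + 1])
            store = re.fullmatch(r"MOV \*(R\d+), (\w+)", code[i + 2])
            if (load and add and store
                    and load.group(1) == add.group(1) == store.group(1)):
                out.append(f"MOV &{load.group(2)}[{add.group(2)}], {store.group(2)}")
                i += 3
                continue
        out.append(code[i])        # no rule applies: copy unchanged
        i += 1
    return out
```

Given the five-instruction sequence `MOV R0, j`, `MUL R0, 2`, `MOV R1, &a`, `ADD R1, R0`, `MOV *R1, 6`, the pass returns the three instructions shown on this slide.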

Major Data Structures • Tokens • Syntax Tree • Symbol Table • Literal Table • Intermediate Code • Temporary files

Structuring a Compiler • Analysis vs. Synthesis • Analysis = understanding the source code • Synthesis = generating the target code • Front end vs. Back end • Front end: parsing & intermediate code generation (target machine-independent) • Back end: target code generation • Optimization included in both parts

Multiple Passes • Each pass processes the source code once • One pass per phase • One pass for several phases • One pass for entire compilation • Language definition can preclude one-pass compilation

Runtime Environments • Static (e.g. FORTRAN) • No pointers, no dynamic allocation, no recursion • All memory allocation done prior to execution • Stack-based (e.g. C family) • Stack for nested allocation (call/return) • Heap for random allocation (new) • Fully dynamic (LISP) • Allocation is automatic (not in source code) • Garbage collection required

Error Handling • Each phase finds and handles its own types of errors • Scanning: lexical errors, e.g. 1o1 (invalid ID) • Parsing: syntax errors • Semantic Analysis: type errors • Runtime errors handled by the runtime environment • Exception handling by programmer often allowed

Compiling the Compiler • Using machine language • Immediately executable, hard to write • Necessary for the first (FORTRAN) compiler • Using a language with an existing compiler and the same target machine • Using the language to be compiled (bootstrapping)

Bootstrapping • Write a “quick & dirty” compiler for a subset of the language (using machine language or another available HLL) • Write a complete compiler in the language subset • Compile the complete compiler using the “quick & dirty” compiler
