
Understanding Compilers: Theory to Practical Application

This course explores why compilers matter in computer science: they tie together computer architecture and programming, and their techniques apply to language interpreters and to tools beyond programming languages. Topics include the Chomsky hierarchy, formal languages, automata theory, and practical compiler applications.


Presentation Transcript


  1. Introduction CPSC 388 Ellen Walker Hiram College

  2. Why Learn About Compilers? • Practical application of important computer science theory • Ties together computer architecture and programming • Useful tools for developing language interpreters • Not just programming languages!

  3. Computer Languages • Machine language • Binary numbers stored in memory • Bits correspond directly to machine actions • Assembly language • A “symbolic face” for machine language • Line-for-line translation • High-level language (our goal!) • Closer to human expressions of problems, e.g. mathematical notation

  4. Assembler vs. HLL • Assembler • Ldi $r1, 2 -- put the value 2 in R1 • Sto $r1, x -- store that value in x • HLL • x = 2;

  5. Characteristics of HLLs • Easier to learn (and remember) • Machine independent • No knowledge of architecture needed • … as long as there is a compiler for that machine!

  6. Early Milestones • FORTRAN (Formula Translation) • IBM (John Backus) 1954-1957 • First widely used high-level language, with the first optimizing compiler • Chomsky Hierarchy (1950s) • Formal description of natural language structure • Ranks languages according to the complexity of their grammar

  7. Chomsky Hierarchy • Type 3: Regular languages • Too simple for programming languages • Good for tokens, e.g. numbers • Type 2: Context Free languages • Standard representation of programming languages • Type 1: Context Sensitive Languages • Type 0: Unrestricted

  8. Another View of the Hierarchy • [Figure: nested regions labeled CSL, CFL, RL, from outermost to innermost]

  9. Formal Language & Automata Theory • Machines to recognize each language class • Turing Machine (computable languages) • Push-down Automaton (context-free languages) • Finite Automaton (regular languages) • Use machines to prove that a given language belongs to a class • Formally prove that a given language does not belong to a class
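
As a rough illustration of the machine classes listed on this slide, the Python sketch below hand-codes a finite automaton for unsigned decimal numbers and a stack-based recognizer (standing in for a push-down automaton) for nested brackets. The state names, the transition table, and the function names `accepts` and `balanced` are assumptions made for the example, not part of the course material.

```python
# Hypothetical DFA for unsigned decimal numbers such as "42" or "3.14".
START, INT, DOT, FRAC = "start", "int", "dot", "frac"
ACCEPTING = {INT, FRAC}

TRANSITIONS = {
    (START, "digit"): INT,
    (INT, "digit"): INT,
    (INT, "dot"): DOT,
    (DOT, "digit"): FRAC,
    (FRAC, "digit"): FRAC,
}


def char_class(ch):
    """Map one character to the input class used by the transition table."""
    if ch in "0123456789":
        return "digit"
    if ch == ".":
        return "dot"
    return "other"


def accepts(text):
    """Run the finite automaton; accept only if it ends in an accepting state."""
    state = START
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:  # no transition defined: reject
            return False
    return state in ACCEPTING


def balanced(text):
    """Stack-based recognizer for nested brackets (a context-free language)."""
    pairs = {")": "(", "]": "["}
    stack = []
    for ch in text:
        if ch in "([":
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack  # every opened bracket must have been closed
```

Nested brackets need the stack because a finite automaton cannot count arbitrarily deep nesting, which is one reason tokens are described with regular languages while program structure is described with context-free grammars.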

  10. Practical Applications of Theory • Translate from grammar to formal machine description • Implement the formal machine to parse the language • Tools: • Scanner Generator (RL / FA): LEX, FLEX • Parser Generator (CFL / PDA): YACC, Bison

  11. Beyond Parsing • Code generation • Optimization • Techniques to “mindlessly” improve code • Usually after code generation • Rarely “optimal”, simply better

  12. Phases of a Compiler • Scanner -> tokens • Parser -> syntax tree • Semantic Analyzer -> annotated tree • Source code optimizer -> intermediate code • Code generator -> target code • Target code optimizer -> better target code

  13. Additional Tables • Symbol table • Tracks all variable names and other symbols that will have to be mapped to addresses later • Literal table • Tracks literals (such as numbers and strings) that will have to be stored along with the eventual program

  14. Scanner • Read a stream of characters • Perform lexical analysis to generate tokens • Update symbol and literal tables as needed • Example: Input: a[j] = 4 + 2 Tokens: ID Lbrack ID Rbrack EQL NUM PLUS NUM
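
A minimal scanner sketch in Python for the slide's example statement follows. The token names come from the slide; the regular expressions, the `scan` function, and the shape of the symbol and literal tables are assumptions chosen for illustration.

```python
import re

# Token names follow the slide; the patterns themselves are assumptions.
TOKEN_SPEC = [
    ("NUM", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("Lbrack", r"\["),
    ("Rbrack", r"\]"),
    ("EQL", r"="),
    ("PLUS", r"\+"),
    ("SKIP", r"\s+"),  # whitespace is matched but produces no token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(source):
    """Return (tokens, symbols, literals) for one line of source text."""
    tokens, symbols, literals = [], {}, []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue
        if kind == "ID":
            # Address is filled in by a later phase.
            symbols.setdefault(text, {"address": None})
        elif kind == "NUM" and text not in literals:
            literals.append(text)
        tokens.append((kind, text))
    return tokens, symbols, literals
```

Calling `scan` on the slide's input yields the token kinds listed above, with `a` and `j` entered in the symbol table and the two numbers in the literal table.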

  15. Parser • Performs syntax analysis • Relates the sequence of tokens to the grammar • Builds a tree that represents this relationship, the parse tree

  16. Partial Grammar • assign-expr -> expr = expr • array-expr -> ID [ expr ] • expr -> array-expr • expr -> expr + expr • expr -> ID • expr -> NUM

  17. Example Parse
      assign-expression
        expression
          array-expression
            ID
            [
            expression
              ID
            ]
        =
        expression
          add-expression
            expression
              NUM
            +
            expression
              NUM

  18. Abstract Syntax Tree
      assign-expression
        expression
          array-expression
            ID
            expression
              ID
        expression
          add-expression
            expression
              NUM
            expression
              NUM
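
One way to turn the partial grammar into a working parser is recursive descent. The sketch below is an illustration under stated assumptions: it rewrites the left-recursive rule `expr -> expr + expr` as a loop over terms, takes tokens as `(kind, text)` pairs using the scanner slide's token names, and returns an abstract syntax tree as nested tuples. The function name `parse_assign` and the tuple layout are hypothetical.

```python
def parse_assign(tokens):
    """Parse `expr = expr` and return a nested-tuple abstract syntax tree."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, found {peek()} at token {pos}")
        text = tokens[pos][1]
        pos += 1
        return text

    def term():
        # term -> NUM | ID | ID [ expr ]
        if peek() == "NUM":
            return ("num", int(expect("NUM")))
        name = expect("ID")
        if peek() == "Lbrack":
            expect("Lbrack")
            index = expr()
            expect("Rbrack")
            return ("array", name, index)
        return ("id", name)

    def expr():
        # expr -> term (PLUS term)*, a left-associative rewrite of expr + expr
        node = term()
        while peek() == "PLUS":
            expect("PLUS")
            node = ("add", node, term())
        return node

    target = expr()
    expect("EQL")
    value = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()} at token {pos}")
    return ("assign", target, value)
```

The returned tuples drop punctuation tokens such as brackets and the equals sign, which is the same simplification that separates an abstract syntax tree from a full parse tree.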

  19. Semantic Analyzer • Determine the meaning (not structure) of the program • This is “compile-time” or static semantics only • Example: a[j] = 4 + 2 • a refers to an array location • a contains integers • j is an integer • j is in the range of the array (not checked in C) • Parse or Syntax tree is “decorated” with this information
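
The static checks listed on this slide can be sketched as a small type checker in Python. Everything here is an assumption for illustration: the nested-tuple tree layout, the type strings (`"int[]"` meaning array of int), and the function names `lookup` and `check`. The sketch returns types instead of decorating the tree, and, like C, it does not check that the index is in range.

```python
# Hypothetical AST layout: ("assign", target, value), ("array", name, index),
# ("add", left, right), ("id", name), ("num", value).


def lookup(name, symbols):
    """Fetch a declared type, rejecting names missing from the symbol table."""
    if name not in symbols:
        raise NameError(f"undeclared identifier {name!r}")
    return symbols[name]


def check(node, symbols):
    """Return the static type of a node, raising TypeError on a mismatch."""
    kind = node[0]
    if kind == "num":
        return "int"
    if kind == "id":
        return lookup(node[1], symbols)
    if kind == "array":
        array_type = lookup(node[1], symbols)
        if not array_type.endswith("[]"):
            raise TypeError(f"{node[1]!r} is not an array")
        if check(node[2], symbols) != "int":
            raise TypeError(f"index of {node[1]!r} must be an int")
        return array_type[:-2]  # element type of the array
    if kind == "add":
        if check(node[1], symbols) != "int" or check(node[2], symbols) != "int":
            raise TypeError("operands of + must be ints")
        return "int"
    if kind == "assign":
        target, value = check(node[1], symbols), check(node[2], symbols)
        if target != value:
            raise TypeError(f"cannot assign {value} to {target}")
        return target
    raise ValueError(f"unknown node kind {kind!r}")
```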

  20. Source Code Optimizer • Simplify and improve the source code by applying rules • Constant folding: replace “4+2” by 6 • Combine common sub-expressions • Reordering expressions (often prior to constant folding) • Etc. • Result: modified, decorated syntax tree or Intermediate Representation
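
Constant folding, the first rule on this slide, can be sketched as a recursive rewrite of the syntax tree. The nested-tuple tree layout and the function name `fold` are assumptions made for this example; a real optimizer would cover many more node kinds and operators.

```python
# Hypothetical AST layout: ("assign", target, value), ("array", name, index),
# ("add", left, right), ("id", name), ("num", value).


def fold(node):
    """Return a copy of the tree with constant additions evaluated."""
    kind = node[0]
    if kind == "add":
        left, right = fold(node[1]), fold(node[2])
        if left[0] == "num" and right[0] == "num":
            return ("num", left[1] + right[1])  # both operands known at compile time
        return ("add", left, right)
    if kind == "array":
        return ("array", node[1], fold(node[2]))
    if kind == "assign":
        return ("assign", fold(node[1]), fold(node[2]))
    return node  # "id" and "num" leaves are already as simple as possible
```

Folding children before their parent lets nested constants such as `(1 + 2) + 3` collapse in a single pass.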

  21. Code Generator • Generates code for the target machine • Example: • MOV R0, j -- value of j into R0 • MUL R0, 2 -- 2*j in R0 (int = 2 words) • MOV R1, &a -- address of a in R1 • ADD R1, R0 -- a+2*j in R1 (addr of a[j]) • MOV *R1, 6 -- 6 into address in R1
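
To show how a tree might become the instruction sequence on this slide, here is a deliberately narrow Python sketch. It handles only one tree shape, an array element indexed by a variable and assigned a constant, and it emits the slide's instruction mnemonics as strings. The tuple layout, the `generate` function, and the `int_size` parameter are assumptions for illustration.

```python
def generate(node, int_size=2):
    """Emit instructions for ("assign", ("array", name, ("id", index)), ("num", value)).

    Any other tree shape fails during unpacking; a real code generator would
    walk arbitrary trees and allocate registers instead of hard-coding R0/R1.
    """
    _, (_, array_name, (_, index_name)), (_, value) = node
    return [
        f"MOV R0, {index_name}",   # value of the index variable into R0
        f"MUL R0, {int_size}",     # scale the index by the size of an int
        f"MOV R1, &{array_name}",  # base address of the array into R1
        "ADD R1, R0",              # address of the selected element
        f"MOV *R1, {value}",       # store the constant at that address
    ]
```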

  22. Target Code Optimizer • Apply rules to improve machine code • Example: • MOV R0, j • SHL R0 (shift to multiply by 2) • MOV &a[R0], 6 • Use a more complex machine instruction to replace simpler ones
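
Rules of this kind are often applied by a peephole optimizer that scans short windows of instructions. The Python sketch below is a hypothetical illustration with two hard-coded rules matching the slide's example; the `peephole` function and the string representation of instructions are assumptions, and the second rule assumes the base register is not needed afterwards.

```python
import re


def peephole(code):
    """Apply two illustrative rewrite rules to a list of instruction strings."""
    # Rule 1: multiplying by 2 becomes a left shift.
    shifted = []
    for instr in code:
        match = re.fullmatch(r"MUL (R\d+), 2", instr)
        shifted.append(f"SHL {match.group(1)}" if match else instr)

    # Rule 2: "load base address, add index, store through pointer" becomes
    # one indexed store (assumes the base register is dead afterwards).
    pattern = r"MOV (R\d+), &(\w+)\nADD \1, (R\d+)\nMOV \*\1, (\w+)"
    result = []
    i = 0
    while i < len(shifted):
        window = "\n".join(shifted[i:i + 3])
        match = re.fullmatch(pattern, window)
        if match:
            result.append(f"MOV &{match.group(2)}[{match.group(3)}], {match.group(4)}")
            i += 3
        else:
            result.append(shifted[i])
            i += 1
    return result
```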

  23. Major Data Structures • Tokens • Syntax Tree • Symbol Table • Literal Table • Intermediate Code • Temporary files

  24. Structuring a Compiler • Analysis vs. Synthesis • Analysis = understanding the source code • Synthesis = generating the target code • Front end vs. Back end • Front end: parsing & intermediate code generation (target machine-independent) • Back end: target code generation • Optimization included in both parts

  25. Multiple Passes • Each pass processes the source code once • One pass per phase • One pass for several phases • One pass for entire compilation • Language definition can preclude one-pass compilation

  26. Runtime Environments • Static (e.g. FORTRAN) • No pointers, no dynamic allocation, no recursion • All memory allocation done prior to execution • Stack-based (e.g. C family) • Stack for nested allocation (call/return) • Heap for random allocation (new) • Fully dynamic (LISP) • Allocation is automatic (not in source code) • Garbage collection required

  27. Error Handling • Each phase finds and handles its own types of errors • Scanning: lexical errors such as 1o1 (invalid ID) • Parsing: syntax errors • Semantic Analysis: type errors • Runtime errors handled by the runtime environment • Exception handling by programmer often allowed
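
One possible way to keep each phase responsible for its own errors is to give every phase its own exception class. The Python sketch below is an assumption-based illustration: the class names, the `scan_word` helper, and the `report` function are hypothetical, and only the scanning case from the slide is implemented.

```python
import re


class CompileError(Exception):
    """Base class; `phase` records which compiler phase reported the problem."""
    phase = "compile"


class LexicalError(CompileError):
    phase = "scanning"


class ParseError(CompileError):
    phase = "parsing"


class SemanticError(CompileError):
    phase = "semantic analysis"


def scan_word(word):
    """Classify one whitespace-delimited word as NUM or ID, else raise LexicalError."""
    if re.fullmatch(r"[0-9]+", word):
        return "NUM"
    if re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", word):
        return "ID"
    raise LexicalError(f"invalid token {word!r}")


def report(error):
    """Format an error message that names the phase that detected it."""
    return f"{error.phase} error: {error}"
```

A driver can catch `CompileError` once and still tell the user which phase rejected the program.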

  28. Compiling the Compiler • Using machine language • Immediately executable, hard to write • Necessary for the first (FORTRAN) compiler • Using a language with an existing compiler and the same target machine • Using the language to be compiled (bootstrapping)

  29. Bootstrapping • Write a “quick & dirty” compiler for a subset of the language (using machine language or another available HLL) • Write a complete compiler in the language subset • Compile the complete compiler using the “quick & dirty” compiler
