Chapter 1: Introduction to Compiling
Dr. Winai Wichaipanitch
Rajamangala Institute of Technology, Klong 6, Thanyaburi, Pathumthani 12110
Tel: 06-999-2974
wina@rit.ac.th
http://www.en.rit.ac.th/winai
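Purpose of Compiler
• A compiler translates a program written in one language (the source) into an equivalent program in another language (the target)
• Source and target languages are diverse and varied
[Figure: source program → compiler → target program, with error messages as a side output of the compiler]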
Introduction to Compilers
• As a discipline, compiling involves multiple CS&E areas:
  • Programming languages and algorithms
  • Theory of computing and software engineering
  • Computer architecture and operating systems
Translation Mechanisms
• Compilation
  • Translate a source program in one language into an executable program in another language; results are produced later, when the new program is executed
  • Examples: C, C++, FORTRAN
• Interpretation
  • Read a source program and produce the results directly while analyzing that program
  • Examples: BASIC, LISP
• Case study: Java
  • First, translate to Java bytecode
  • Second, execute the bytecode by interpretation (JVM)
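The two-step scheme in the Java case study can be tried out in Python, which is used here only as a stand-in: CPython's built-in `compile()` first translates source text to bytecode, and the Python virtual machine then interprets it. This is a rough analogy, not a description of the JVM; the variable values are made up for illustration.

```python
import dis

# Step 1: translate source text to a bytecode object (nothing runs yet).
code = compile("position = initial + rate * 60", "<example>", "exec")

# Step 2: the virtual machine interprets the bytecode against some data.
env = {"initial": 10.0, "rate": 2.0}   # illustrative input values
exec(code, env)
print(env["position"])   # 130.0

# Show the bytecode instructions that were interpreted.
dis.dis(code)
```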
Comparison of Compiler/Interpreter
[Figure: compiler path: source code → compiler → object code; object code + data → results. Interpreter path: source code + data → interpreter → results.]
Classifications of Compilers
• Compilers can be viewed from many perspectives: single pass, multiple pass, load & go, debugging, optimizing
• These classes reflect either how the compiler is constructed or what function it performs
• However, all use the same basic tasks to accomplish their actions
We do not yet know the address values, so the source code must be read twice.
The Model
• The two fundamental parts:
  • Analysis: decompose the source into an intermediate representation
  • Synthesis: generate the target program from that representation
• We will discuss both in this class, and focus on analysis
Important Notes
• Today there are many software tools that help with the analysis part. This wasn't the case in the early days.
• (Some) analysis is also important in:
  • Structure / syntax-directed editors: force syntactically correct code to be entered
  • Pretty printers: produce a standardized layout of program structure (blank space, indenting, etc.)
  • Static checkers: a "quick" compilation to detect rudimentary errors
  • Interpreters: "real-time" execution of code, a line at a time
Important Notes
• Compilation is not limited to programming language applications
• Text formatters
  • LaTeX and TROFF are languages whose commands format text
• Silicon compilers
  • Textual / graphical: take input and generate a circuit design
• Database query processors
  • Database query languages are also programming languages
  • Input is compiled into a set of operations for accessing the database
The Many Phases of a Compiler
[Figure: source program → phases 1 to 6 → target program; the symbol-table manager and the error handler interact with every phase]
1. Lexical Analyzer
2. Syntax Analyzer
3. Semantic Analyzer
4. Intermediate Code Generator
5. Code Optimizer
6. Code Generator
• 1, 2, 3: Analysis (our focus)
• 4, 5, 6: Synthesis
Phases of a Modern Compiler
• Source program:
    IF (a<b) THEN c=1*d;
• Lexical analyzer produces the token sequence:
    IF ( ID "a" < ID "b" THEN ID "c" = CONST "1" * ID "d"
• Syntax analyzer produces the syntax tree:
    IF_stmt
      cond_expr: a < b
      list
        assign_stmt
          lhs: c
          rhs: 1 * d
• Semantic analyzer produces 3-address code:
    GE a, b, L1
    MULT 1, d, c
    L1:
• Code optimizer produces optimized 3-address code:
    GE a, b, L1
    MOV d, c
    L1:
• Code generation produces assembly code:
    loadi R1,a
    cmpi R1,b
    jge L1
    loadi R1,d
    storei R1,c
    L1:
Language-Processing System
[Figure: source program → pre-processor → compiler → assembler → relocatable machine code → loader / link-editor (which also takes in library and relocatable object files) → executable]
The Analysis Task For Compilation
• Three phases:
  • Linear / lexical analysis: left-to-right scan to identify tokens (a token is a sequence of characters having a collective meaning)
  • Hierarchical analysis: grouping of tokens into meaningful collections
  • Semantic analysis: checking to ensure correctness of components
Phase 1. Lexical Analysis
• The easiest analysis: identify tokens, which are the basic building blocks
• For example, in the statement
    position := initial + rate * 60 ;
  all of these are tokens: position, :=, initial, +, rate, *, 60, ;
• Blanks, line breaks, etc. are scanned out
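A minimal sketch of such a scanner, written in Python with regular expressions. The token class names (`ID`, `NUMBER`, `ASSIGN`, `OP`, `SEMI`) are assumptions chosen for this illustration; they are not taken from the slides.

```python
import re

# Token classes for the example statement; the names are illustrative assumptions.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("OP",     r"[+*]"),
    ("SEMI",   r";"),
    ("SKIP",   r"\s+"),   # blanks and line breaks are scanned out
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Scan left to right and return a list of (kind, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

for kind, lexeme in tokenize("position := initial + rate * 60 ;"):
    print(kind, lexeme)
```

The scan is purely linear: each step looks only at the characters starting at `pos`, with no recursion and no knowledge of how the tokens nest.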
Phase 2. Hierarchical Analysis (aka Parsing or Syntax Analysis)
• For the previous example, we would have the parse tree:
    assignment statement
      identifier: position
      :=
      expression
        expression
          identifier: initial
        +
        expression
          expression
            identifier: rate
          *
          expression
            number: 60
• Nodes of the tree are constructed using a grammar for the language
What is a Grammar?
• A grammar is a set of rules that govern the interdependencies and structure among the tokens
• statement is an
    assignment statement, or
    while statement, or
    if statement, or ...
• assignment statement is an
    identifier := expression ;
• expression is an
    (expression), or
    expression + expression, or
    expression * expression, or
    number, or
    identifier, or ...
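The assignment and expression rules can be turned into a small recursive-descent parser. The sketch below is in Python and makes one assumption the slide does not state: the ambiguous rules `expression + expression` and `expression * expression` are refactored into the usual expression / term / factor layers so that `*` binds tighter than `+`. The tuple-based tree format is also just an illustration.

```python
def parse_assignment(tokens):
    """Parse  identifier := expression ;  and return a nested-tuple tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {peek()!r}")
        pos += 1

    def factor():                      # number | identifier | ( expression )
        nonlocal pos
        tok = peek()
        if tok == "(":
            expect("(")
            node = expression()        # recursion handles nested structure
            expect(")")
            return node
        if tok is not None and tok.isdigit():
            pos += 1
            return ("number", tok)
        if tok is not None and tok.isidentifier():
            pos += 1
            return ("identifier", tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():                        # factor { * factor }
        node = factor()
        while peek() == "*":
            expect("*")
            node = ("*", node, factor())
        return node

    def expression():                  # term { + term }
        node = term()
        while peek() == "+":
            expect("+")
            node = ("+", node, term())
        return node

    target = peek()
    if target is None or not target.isidentifier():
        raise SyntaxError("assignment must start with an identifier")
    pos += 1
    expect(":=")
    tree = (":=", ("identifier", target), expression())
    expect(";")
    if peek() is not None:
        raise SyntaxError("trailing tokens after ';'")
    return tree

print(parse_assignment("position := initial + rate * 60 ;".split()))
```

Unlike the scanner, the parser calls itself (through `factor` and `expression`) to recognize nested structure.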
Syntax Tree
    if statement
      if
      expression: id relop id  (num = 0)
      then
      statement: assign statement
        id := expression  (avg := 0)
      else
      statement: assign statement
        id := expression, where expression is id mulop id  (avg / num)
      ;
Why Have We Divided Analysis in This Manner?
• Lexical analysis scans the input; its linear actions are not recursive
  • It identifies only the individual "words" that are the tokens of the language
• Recursion is required to identify the structure of an expression, as indicated in the parse tree
  • It verifies that the "words" are correctly assembled into "sentences"
Phase 3. Semantic Analysis
• Find more complicated semantic errors and support code generation
• The parse tree is augmented with semantic actions
• Most important activity in this phase: type checking (legality of operands)
[Figure: the compressed tree  := (position, + (initial, * (rate, 60)))  becomes  := (position, + (initial, * (rate, inttoreal(60))))  once the conversion action is added]
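The conversion action can be sketched as a tree walk that computes a type for every node and wraps an integer operand in `inttoreal` when it meets a real one. This Python sketch assumes a nested-tuple tree, only two types (`int` and `real`), and that `position`, `initial` and `rate` are declared real; none of these details come from the slides.

```python
def annotate(node, types):
    """Return (new_node, type), inserting inttoreal where int meets real."""
    kind = node[0]
    if kind == "number":
        return node, "int"
    if kind == "identifier":
        name = node[1]
        if name not in types:
            raise TypeError(f"undeclared identifier {name!r}")
        return node, types[name]
    # Otherwise a binary node: (operator, left, right)
    op, left, right = node
    left, ltype = annotate(left, types)
    right, rtype = annotate(right, types)
    if ltype == rtype:
        return (op, left, right), ltype
    if op == ":=":
        # Assignment only widens: an int value may be stored into a real.
        if ltype == "real":
            return (op, left, ("inttoreal", right)), "real"
        raise TypeError("cannot assign a real value to an int variable")
    if ltype == "int":
        left = ("inttoreal", left)
    else:
        right = ("inttoreal", right)
    return (op, left, right), "real"

# Assumed declarations for the running example.
declared = {"position": "real", "initial": "real", "rate": "real"}
tree = (":=", ("identifier", "position"),
        ("+", ("identifier", "initial"),
         ("*", ("identifier", "rate"), ("number", "60"))))
print(annotate(tree, declared)[0])
```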
Supporting Phases / Activities for Analysis
• Symbol table creation / maintenance
  • Contains info (storage, type, scope, args) on each "meaningful" token, typically identifiers
  • The data structure is created / initialized during lexical analysis
  • It is utilized / updated during later analysis and synthesis
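One simple way to picture this life cycle is a dictionary keyed by identifier name. In the Python sketch below, the class name, the method names (`intern`, `set_attr`, `lookup`) and the `index` attribute are hypothetical, chosen only to show the lexer creating entries and a later phase filling in attributes.

```python
class SymbolTable:
    """Toy symbol table: one attribute dictionary per identifier."""

    def __init__(self):
        self._entries = {}          # name -> attribute dict

    def intern(self, name):
        """Used by the lexer: add the name if it is new, return its entry."""
        if name not in self._entries:
            self._entries[name] = {"name": name, "index": len(self._entries) + 1}
        return self._entries[name]

    def set_attr(self, name, **attrs):
        """Used by later phases to record type, storage, scope, and so on."""
        self._entries[name].update(attrs)

    def lookup(self, name):
        """Return the entry for name, or None if it was never seen."""
        return self._entries.get(name)

table = SymbolTable()
for lexeme in ["position", "initial", "rate", "rate"]:   # lexer sees 'rate' twice
    table.intern(lexeme)
table.set_attr("rate", type="real")                      # filled in after analysis
print(table.lookup("rate"))
```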
Symbol Table for Example
[Table: symbol table entries for the running example]
Error Handling
• Detection of different errors, which correspond to all phases
• What kinds of errors are found during the analysis phase?
• What happens when an error is found?
The Synthesis Task For Compilation
• Intermediate code generation
  • An abstract-machine version of the code, independent of the architecture
  • Easy to produce, and easy to turn into final, machine-dependent code
• Code optimization
  • Find more efficient ways to execute the code
  • Replace code with more efficient statements
  • Two approaches: high-level language optimization and "peephole" optimization
• Final code generation
  • Generate relocatable, machine-dependent code
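As a rough illustration of intermediate code generation, the Python sketch below flattens an expression tree into three-address instructions, introducing one temporary per operator. The nested-tuple tree format and the `tempN` naming are assumptions made for this example.

```python
import itertools

def gen_tac(tree):
    """Flatten a (':=', identifier, expression) tree into three-address code."""
    code = []
    temps = (f"temp{n}" for n in itertools.count(1))

    def walk(node):
        kind = node[0]
        if kind in ("identifier", "number"):
            return node[1]                       # leaves need no instruction
        if kind == "inttoreal":
            arg = walk(node[1])
            t = next(temps)
            code.append(f"{t} := inttoreal({arg})")
            return t
        op, left, right = node                   # binary operator
        l = walk(left)
        r = walk(right)
        t = next(temps)
        code.append(f"{t} := {l} {op} {r}")
        return t

    op, target, expr = tree
    if op != ":=":
        raise ValueError("expected an assignment at the root")
    code.append(f"{target[1]} := {walk(expr)}")
    return code

tree = (":=", ("identifier", "id1"),
        ("+", ("identifier", "id2"),
         ("*", ("identifier", "id3"), ("inttoreal", ("number", "60")))))
print("\n".join(gen_tac(tree)))
```

Each emitted instruction has at most one operator and names its result, which is what makes this form independent of any particular machine.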
Reviewing the Entire Process
    position := initial + rate * 60
      ↓ lexical analyzer
    id1 := id2 + id3 * 60
      ↓ syntax analyzer
    := (id1, + (id2, * (id3, 60)))
      ↓ semantic analyzer
    := (id1, + (id2, * (id3, inttoreal(60))))
      ↓ intermediate code generator
• Symbol table: position ...., initial …., rate ….
• Errors are reported from every phase
Reviewing the Entire Process
      ↓ intermediate code generator (3-address code)
    temp1 := inttoreal(60)
    temp2 := id3 * temp1
    temp3 := id2 + temp2
    id1 := temp3
      ↓ code optimizer
    temp1 := id3 * 60.0
    id1 := id2 + temp1
      ↓ final code generator
    MOVF id3, R2
    MULF #60.0, R2
    MOVF id2, R1
    ADDF R2, R1
    MOVF R1, id1
• Symbol table: position ...., initial …., rate ….
• Errors are reported from every phase
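The optimizer step above can be imitated with two small rewrites: doing `inttoreal` on a literal at compile time, and removing the final copy by writing the last result directly into its destination. The Python sketch below assumes instructions are `(dest, op, arg1, arg2)` tuples and that every temporary is used exactly once, which holds for code generated from an expression tree but not in general. It does not renumber temporaries, so its output uses `temp2` where the slide shows `temp1`.

```python
def optimize(code):
    """code: list of (dest, op, arg1, arg2) tuples; returns a shorter list."""
    consts = {}      # temporaries known to hold a compile-time constant
    out = []
    for dest, op, a, b in code:
        a = consts.get(a, a)                     # substitute known constants
        b = consts.get(b, b)
        if op == "inttoreal" and a.isdigit():
            consts[dest] = f"{int(a)}.0"         # convert at compile time
            continue
        if op == "copy" and out and out[-1][0] == a and a.startswith("temp"):
            # The previous instruction computed the temporary being copied:
            # make it write straight into the final destination instead.
            _, prev_op, pa, pb = out.pop()
            out.append((dest, prev_op, pa, pb))
            continue
        out.append((dest, op, a, b))
    return out

three_address = [
    ("temp1", "inttoreal", "60", None),
    ("temp2", "*", "id3", "temp1"),
    ("temp3", "+", "id2", "temp2"),
    ("id1", "copy", "temp3", None),
]
for instr in optimize(three_address):
    print(instr)
```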
Assemblers
• Assembly code: names are used for instructions, and names are used for memory addresses
• Two-pass assembly:
  • First pass: all identifiers are assigned memory addresses as offsets starting from 0, e.g. substitute 0 for a and 4 for b
  • Second pass: produce relocatable machine code:
      MOV a, R1     0001 01 00 00000000 *
      ADD #2, R1    0011 01 10 00000010
      MOV R1, b     0010 01 00 00000100 *
    (* marks the relocation bit)
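A minimal two-pass assembler for exactly these three instructions might look as follows in Python. The bit layout (4-bit opcode, 2-bit register, 2-bit tag, 8-bit operand) is read off the example above; the labels `LOAD` and `STORE` for the two `MOV` encodings and the 4-byte word size are assumptions of this sketch.

```python
import re

OPCODES = {"LOAD": "0001", "STORE": "0010", "ADD": "0011"}   # bit patterns from the example
WORD = 4   # assumed bytes per variable, so a -> 0 and b -> 4

def is_register(operand):
    return re.fullmatch(r"R\d+", operand) is not None

def assemble(lines):
    parsed = [tuple(re.split(r"[,\s]+", line.strip())) for line in lines]

    # Pass 1: assign each identifier the next free offset from 0.
    addresses = {}
    for _, src, dst in parsed:
        for operand in (src, dst):
            if not is_register(operand) and not operand.startswith("#"):
                addresses.setdefault(operand, WORD * len(addresses))

    # Pass 2: emit relocatable machine code, one word per instruction.
    output = []
    for mnemonic, src, dst in parsed:
        if mnemonic == "MOV" and is_register(src):   # register -> memory
            op, reg, other = "STORE", src, dst
        elif mnemonic == "MOV":                      # memory -> register
            op, reg, other = "LOAD", dst, src
        else:
            op, reg, other = mnemonic, dst, src
        if other.startswith("#"):                    # immediate operand
            tag, value, reloc = "10", int(other[1:]), ""
        else:                                        # address: needs relocation
            tag, value, reloc = "00", addresses[other], " *"
        output.append(f"{OPCODES[op]} {int(reg[1:]):02b} {tag} {value:08b}{reloc}")
    return output

for word in assemble(["MOV a, R1", "ADD #2, R1", "MOV R1, b"]):
    print(word)
```

The first pass is needed because an instruction may mention an identifier before its address is known; only after every identifier has an address can the second pass fill in the operand fields.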
Loaders and Link-Editors
• Loader: takes relocatable machine code, alters the addresses, and places the altered instructions into memory
• Link-editor: takes many (relocatable) machine code programs (with cross-references) and produces a single file
  • It needs to keep track of the correspondence between variable names and their addresses in each piece of code
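The address-altering step of a loader can be sketched in a few lines of Python. The sketch assumes machine words are written as space-separated bit fields with a trailing `*` marking a relocatable address, as in the assembler example, and that the load address is simply added to each marked field; the load address 15 is an arbitrary choice.

```python
def relocate(words, load_address):
    """Add load_address to the address field of every word marked with '*'."""
    loaded = []
    for word in words:
        fields = word.split()
        if fields[-1] == "*":                        # relocation bit is set
            opcode, reg, tag, addr = fields[:4]
            addr = format(int(addr, 2) + load_address, "08b")
            fields = [opcode, reg, tag, addr]
        loaded.append(" ".join(fields))
    return loaded

relocatable = [
    "0001 01 00 00000000 *",
    "0011 01 10 00000010",
    "0010 01 00 00000100 *",
]
for word in relocate(relocatable, 15):
    print(word)
```

The immediate operand in the second word is left alone, since it is a constant rather than an address.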
Compiler Cousins: Preprocessors Provide Input to Compilers
1. Macro Processing
• #define in C does text substitution before compiling:
    #define X 3
    #define Y A*B+C
    #define Z getchar()
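To make "text substitution" concrete, here is a deliberately simplified Python sketch of object-like macro expansion. It is not how a real C preprocessor works (no function-like macros, no awareness of strings or comments); the function name `expand_macros` is hypothetical.

```python
import re

def expand_macros(source):
    """Collect '#define NAME body' lines and substitute NAME elsewhere."""
    macros = {}
    out = []
    for line in source.splitlines():
        m = re.match(r"\s*#define\s+(\w+)\s*(.*)", line)
        if m:
            macros[m.group(1)] = m.group(2).strip()
            continue                                  # definitions produce no output
        for name, body in macros.items():
            # Whole-word replacement; the lambda keeps the body literal.
            line = re.sub(rf"\b{re.escape(name)}\b", lambda _m, b=body: b, line)
        out.append(line)
    return "\n".join(out)

program = "#define X 3\n#define Y A*B+C\nint n = X + Y;\nint r = 2 * Y;"
print(expand_macros(program))
```

Because the substitution is purely textual, `2 * Y` expands to `2 * A*B+C`, which multiplies only `A` by 2. That is why macro bodies in C are usually parenthesized.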
2. File Inclusion
• #include in C brings in another file before compiling
[Figure: main.c contains the line  #include "defs.h" ; the contents of defs.h are inserted at that point, between the surrounding lines of main.c]
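File inclusion can be sketched the same way. To keep the Python example self-contained, the "files" below live in a dictionary rather than on disk; the function name `include_files` and the file contents are illustrative assumptions, and only the quoted form `#include "name"` is handled.

```python
import re

def include_files(name, files, seen=()):
    """Return files[name] with each  #include "x"  line replaced by x's text."""
    if name in seen:
        raise ValueError(f"circular include of {name}")
    out = []
    for line in files[name].splitlines():
        m = re.fullmatch(r'\s*#include\s+"([^"]+)"\s*', line)
        if m:
            # Splice in the included file, expanding its own includes first.
            out.append(include_files(m.group(1), files, seen + (name,)))
        else:
            out.append(line)
    return "\n".join(out)

files = {
    "defs.h": "#define X 3",
    "main.c": '#include "defs.h"\nint n = X;',
}
print(include_files("main.c", files))
```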
3. Rational Preprocessors
• Augment "old" languages with modern constructs
• Add macros for if-then, while, etc.
• #define can make C code more Pascal-like:
    #define begin {
    #define end }
    #define then
4. Language Extensions for a Database System
• EQUEL: a database query language embedded in C
    ## Retrieve (DN=Department.Dnum) where
    ## Department.Dname = 'Research'
• This is preprocessed into a procedure call in the host programming language:
    ingres_system("Retr…..Research'",____,____);
The Grouping of Phases
• Front end: analysis + intermediate code generation
• Back end: code generation + optimization
• Number of passes:
  • A pass requires reading and writing intermediate files
  • Fewer passes mean more efficiency
  • However, fewer passes require more sophisticated memory management and compiler phase interaction
  • Tradeoffs ...
Compiler Construction Tools
• Parser generators: produce syntax analyzers
• Scanner generators: produce lexical analyzers
• Syntax-directed translation engines: generate intermediate code
• Automatic code generators: generate actual code
• Data-flow engines: support optimization
Tools
• Tools exist to help in the development of some stages of the compiler
  • Lex (Flex): lexical analyzer generator
  • Yacc (Bison): parser generator