Chapter 1: Introduction to Compiling

Prof. Steven A. Demurjian, Sr.
Computer Science & Engineering Department
The University of Connecticut
191 Auditorium Road, Box U-155
Storrs, CT 06269-3155
steve@engr.uconn.edu
http://www.engr.uconn.edu/~steve
(860) 486-4818

Dr. Robert LaBarre
United Technologies Research Center
411 Silver Lane
E. Hartford, CT 06018
LaBarrRE@utrc.utc.com
labarre_math@hotmail.com
Introduction to Compilers
• As a Discipline, Involves Multiple CSE Areas
  • Programming Languages and Algorithms
  • Software Engineering & Theory / Foundations
  • Computer Architecture & Operating Systems
• But, Has Surprisingly Simplistic Intent:
  [Diagram: Source Program -> Compiler -> Target Program, with Error Messages as a Second Output of the Compiler; Annotated “Diverse & Varied”]
Classifications of Compilers
• Compilers Viewed from Many Perspectives
  • Construction: Single Pass, Multiple Pass, Load & Go
  • Functional: Debugging, Optimizing
• However, All Utilize the Same Basic Tasks to Accomplish Their Actions
Classifications of Compilers
• Also, Broadly Categorized as:
  • Analysis: Decompose the Source into an Intermediate Representation
  • Synthesis: Generate the Target Program from That Representation
• We Will Discuss Each Category in This Class
Important Notes
• In Today’s Technology, Analysis Is Often Performed by Software Tools - This Wasn’t the Case in Early CSE Days
  • Structure / Syntax-Directed Editors: Force “Syntactically” Correct Code to Be Entered
  • Pretty Printers: Produce a Standardized Layout of Program Structure (e.g., Blank Space, Indenting, etc.)
  • Static Checkers: A “Quick” Compilation to Detect Rudimentary Errors
  • Interpreters: “Real-Time” Execution of Code, a Line at a Time
Important Notes
• Compilation Is Not Limited to Programming Language Applications
  • Text Formatters
    • LATEX & TROFF Are Languages Whose Commands Format Text
  • Silicon Compilers
    • Textual / Graphical: Take Input and Generate Circuit Design
  • Database Query Processors
    • Database Query Languages Are Also Programming Languages
    • Input Is “Compiled” Into a Set of Operations for Accessing the Database
The Many Phases of a Compiler

Source Program
  1. Lexical Analyzer
  2. Syntax Analyzer
  3. Semantic Analyzer
  4. Intermediate Code Generator
  5. Code Optimizer
  6. Code Generator
Target Program

The Symbol-table Manager and the Error Handler Interact With All Six Phases

1, 2, 3 : Analysis - Our Focus
4, 5, 6 : Synthesis
The Analysis Task For Compilation
• Three Phases:
  • Linear / Lexical Analysis:
    • Left-to-Right Scan to Identify Tokens
  • Hierarchical Analysis:
    • Grouping of Tokens Into Meaningful Collections
  • Semantic Analysis:
    • Checking to Ensure Correctness of Components
Phase 1. Lexical Analysis
Easiest Analysis - Identify Tokens, Which Are the Building Blocks
For Example:

    position := initial + rate * 60 ;
    ________ __ _______ _ ____ _ __ _

All Are Tokens: position, :=, initial, +, rate, *, 60, ;
Blanks, Line Breaks, etc. Are Scanned Out
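The left-to-right scan can be sketched in a few lines of Python. This is a minimal illustration, not the scanner of any particular compiler: the token class names (ID, NUM, ASSIGN, OP, SEMI) and the function name tokenize are assumptions introduced here, and only the symbols that appear in the example statement are recognized.

```python
import re

# Token classes for the example statement. The class names are
# illustrative assumptions; they do not come from the slides.
TOKEN_SPEC = [
    ("NUM",    r"\d+(?:\.\d+)?"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("OP",     r"[+*]"),
    ("SEMI",   r";"),
    ("SKIP",   r"\s+"),          # blanks and line breaks are scanned out
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Scan left to right and return (kind, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

tokens = tokenize("position := initial + rate * 60 ;")
for kind, lexeme in tokens:
    print(kind, lexeme)
```

The scan is purely linear: each token is recognized from the current position alone, with no recursion and no knowledge of how tokens nest.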
Phase 2. Hierarchical Analysis, aka Parsing or Syntax Analysis
For Previous Example, We Would Have Parse Tree:

assignment statement
├── identifier: position
├── :=
└── expression
    ├── expression
    │   └── identifier: initial
    ├── +
    └── expression
        ├── expression
        │   └── identifier: rate
        ├── *
        └── expression
            └── number: 60

Nodes of Tree Are Constructed Using a Grammar for the Language
What is a Grammar?
• Grammar is a Set of Rules Which Govern the Interdependencies & Structure Among the Tokens

statement is an
    assignment statement, or
    while statement, or
    if statement, or ...

assignment statement is an
    identifier := expression ;

expression is an
    (expression), or
    expression + expression, or
    expression * expression, or
    number, or
    identifier, or ...
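A recursive-descent parser is one way to turn such rules into code. The sketch below is an illustration under stated assumptions: the rules as written are ambiguous about whether + or * groups first, so the sketch splits expression into the layers expression / term / factor (a standard technique, not something the slides specify) so that * binds tighter than +. The class and method names are hypothetical.

```python
import re

def tokenize(text):
    # Minimal scanner: identifiers, integers, ':=' and single-character symbols.
    # Note: re.findall silently skips characters it does not recognize.
    return re.findall(r"[A-Za-z_]\w*|\d+|:=|[+*();]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1

    def statement(self):
        # assignment statement is: identifier := expression ;
        name = self.peek()
        if name is None or not (name[0].isalpha() or name[0] == "_"):
            raise SyntaxError(f"expected identifier, found {name!r}")
        self.pos += 1
        self.expect(":=")
        value = self.expression()
        self.expect(";")
        return (":=", name, value)

    def expression(self):
        # expression is: term { + term }
        node = self.term()
        while self.peek() == "+":
            self.pos += 1
            node = ("+", node, self.term())
        return node

    def term(self):
        # term is: factor { * factor }
        node = self.factor()
        while self.peek() == "*":
            self.pos += 1
            node = ("*", node, self.factor())
        return node

    def factor(self):
        # factor is: ( expression ) | number | identifier
        tok = self.peek()
        if tok == "(":
            self.pos += 1
            node = self.expression()   # recursion handles nested structure
            self.expect(")")
            return node
        if tok is not None and (tok[0].isalnum() or tok[0] == "_"):
            self.pos += 1
            return int(tok) if tok.isdigit() else tok
        raise SyntaxError(f"unexpected token {tok!r}")

tree = Parser(tokenize("position := initial + rate * 60 ;")).statement()
print(tree)   # (':=', 'position', ('+', 'initial', ('*', 'rate', 60)))
```

The result is a compressed tree of nested tuples rather than the full parse tree: interior "expression" nodes with a single child are omitted.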
Why Have We Divided Analysis in This Manner?
• Lexical Analysis - Scans Input & Its Linear Actions Are Not Recursive
  • Identify Only the Individual “Words” That Are the Tokens of the Language
• Recursion Is Required to Identify Structure of an Expression, As Indicated in Parse Tree
  • Verify That the “Words” Are Correctly Assembled into “Sentences”
• What Is the Third Phase?
  • Determine Whether Each Sentence Has One and Only One Interpretation
  • “John Took Picture of Mary Out on the Patio” (More Than One Reading Is Possible)
Phase 3. Semantic Analysis
• Find More Complicated Semantic Errors and Support Code Generation
• Parse Tree Is Augmented With Semantic Actions

Compressed Tree:

:=
├── position
└── +
    ├── initial
    └── *
        ├── rate
        └── 60

With Conversion Action:

:=
├── position
└── +
    ├── initial
    └── *
        ├── rate
        └── inttoreal
            └── 60
Phase 3. Semantic Analysis
• Most Important Activity in This Phase:
  • Type Checking - Legality of Operands
• Many Different Situations:

    Real := int + char ;
    A[int] := A[real] + int ;
    while char <> int do ….
    Etc.
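Type checking can be sketched as a walk over the compressed tree that computes a type for every node and inserts a conversion where an int operand meets a real one. This is a minimal sketch under assumptions: the slides do not give the declared types, so position, initial, and rate are assumed to be real, and the names SYMBOL_TYPES and check are hypothetical.

```python
# Declared types for the running example. The slides do not state them,
# so treating all three identifiers as real is an assumption.
SYMBOL_TYPES = {"position": "real", "initial": "real", "rate": "real"}

def check(node):
    """Return (typed_node, type), inserting inttoreal where int meets real."""
    if isinstance(node, int):
        return node, "int"
    if isinstance(node, float):
        return node, "real"
    if isinstance(node, str):
        if node not in SYMBOL_TYPES:
            raise TypeError(f"undeclared identifier {node!r}")
        return node, SYMBOL_TYPES[node]
    op, left, right = node
    left, ltype = check(left)
    right, rtype = check(right)
    if op == ":=":
        if ltype == "real" and rtype == "int":
            right = ("inttoreal", right)
        elif ltype != rtype:
            raise TypeError(f"cannot assign {rtype} to {ltype}")
        return (op, left, right), ltype
    # Arithmetic: widen the int operand when the other side is real.
    if ltype == "int" and rtype == "real":
        left, ltype = ("inttoreal", left), "real"
    elif ltype == "real" and rtype == "int":
        right, rtype = ("inttoreal", right), "real"
    if ltype != rtype or ltype not in ("int", "real"):
        raise TypeError(f"illegal operands for {op!r}: {ltype}, {rtype}")
    return (op, left, right), ltype

tree = (":=", "position", ("+", "initial", ("*", "rate", 60)))
typed_tree, _ = check(tree)
print(typed_tree)
# (':=', 'position', ('+', 'initial', ('*', 'rate', ('inttoreal', 60))))
```

An operand combination such as int + char has no widening rule in this sketch, so it is rejected with a TypeError rather than silently converted.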
Analysis in Text Formatting
Simple Commands: LATEX

    \begin{single}
    \end{single}
    \noindent
    \section{Introduction}
    $A_i$
    $A_{i_j}$

Language Commands: begin, single, noindent, section
• Commands Are Embedded in a Stream of Text, i.e., a FILE
• \ and $ Serve as Signals to LATEX
• What Are Tokens?
• What Is Hierarchical Structure?
• What Kind of Semantic Analysis Is Required?
Supporting Phases / Activities for Analysis
• Symbol Table Creation / Maintenance
  • Contains Info on Each “Meaningful” Token, Typically Identifiers
  • Data Structure Created / Initialized During Lexical Analysis
  • Utilized / Updated During Later Analysis & Synthesis
• Error Handling
  • Detection of Different Errors Which Correspond to All Phases
  • What Kinds of Errors Are Found During the Analysis Phase?
  • What Happens When an Error Is Found?
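A symbol table that is created during scanning and updated by later phases might look like the following sketch. The class name SymbolTable, its methods, and the id1, id2, ... handle format are assumptions made for illustration; real compilers store far more per entry (scope, storage location, and so on).

```python
import re

class SymbolTable:
    """Maps each identifier to a record that later phases can extend."""
    def __init__(self):
        self.entries = {}

    def intern(self, name):
        # Called by the scanner: create the record on first sight only.
        if name not in self.entries:
            self.entries[name] = {"id": f"id{len(self.entries) + 1}", "type": None}
        return self.entries[name]["id"]

    def set_type(self, name, type_name):
        # Called by a later phase once declarations have been processed.
        self.entries[name]["type"] = type_name

def scan(text, table):
    """Replace every identifier by its symbol-table handle (id1, id2, ...)."""
    out = []
    for lexeme in re.findall(r"[A-Za-z_]\w*|\d+|:=|[+*;]", text):
        if lexeme[0].isalpha() or lexeme[0] == "_":
            out.append(table.intern(lexeme))
        else:
            out.append(lexeme)
    return " ".join(out)

table = SymbolTable()
line = scan("position := initial + rate * 60", table)
table.set_type("rate", "real")   # filled in after lexical analysis
print(line)                      # id1 := id2 + id3 * 60
print(table.entries["rate"])     # {'id': 'id3', 'type': 'real'}
```

The scanner only records that an identifier exists; attributes such as its type are left empty until a later phase can supply them.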
The Synthesis Task For Compilation
• Intermediate Code Generation
  • Abstract Machine Version of Code - Independent of Architecture
  • Easy to Produce, and Easy to Translate Into Final, Machine-Dependent Code
• Code Optimization
  • Find More Efficient Ways to Execute Code
  • Replace Code With More Efficient Statements
  • Two Approaches: High-Level Language & “Peephole” Optimization
• Final Code Generation
  • Generate Relocatable, Machine-Dependent Code
Reviewing the Entire Process

position := initial + rate * 60

  -> lexical analyzer ->

id1 := id2 + id3 * 60

  -> syntax analyzer ->

:=
├── id1
└── +
    ├── id2
    └── *
        ├── id3
        └── 60

  -> semantic analyzer ->

:=
├── id1
└── +
    ├── id2
    └── *
        ├── id3
        └── inttoreal
            └── 60

  -> intermediate code generator ->

Symbol Table (Shared by All Phases): position ...., initial …., rate ….
Errors (Reported by Any Phase)
Reviewing the Entire Process

  -> intermediate code generator ->

    temp1 := inttoreal(60)
    temp2 := id3 * temp1
    temp3 := id2 + temp2
    id1 := temp3

  -> code optimizer ->

    temp1 := id3 * 60.0
    id1 := id2 + temp1

  -> final code generator ->

    movf id3, r2
    mulf #60.0, r2
    movf id2, r1
    addf r2, r1
    movf r1, id1

Symbol Table (Shared by All Phases): position ...., initial …., rate ….
Errors (Reported by Any Phase)
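The step from the annotated tree to three-address code, and the two improvements applied by the optimizer, can be sketched as follows. This is an illustration under assumptions: the function names (gen_expr, gen_assign, optimize, show) and the tuple encoding of instructions are hypothetical, and the optimizer handles only the two cases seen in this example (folding inttoreal of a constant, and removing the final copy).

```python
def gen_expr(node, code):
    """Emit three-address code for a tree; return the name holding its value."""
    if not isinstance(node, tuple):
        return node                      # identifier name or numeric constant
    op, *operands = node
    args = [gen_expr(child, code) for child in operands]
    temp = f"temp{len(code) + 1}"        # one new temporary per instruction
    code.append((temp, op, *args))
    return temp

def gen_assign(tree):
    _, target, expr = tree
    code = []
    value = gen_expr(expr, code)
    code.append((target, "copy", value))
    return code

def optimize(code):
    """Fold inttoreal on constants, then drop the final copy."""
    consts = {}
    out = []
    for dest, op, *args in code:
        args = [consts.get(a, a) for a in args]
        if op == "inttoreal" and isinstance(args[0], int):
            consts[dest] = float(args[0])    # converted at compile time
            continue
        out.append((dest, op, *args))
    # If the last instruction copies the temp defined just before it,
    # write the result straight into the target instead.
    if len(out) >= 2 and out[-1][1] == "copy" and out[-1][2] == out[-2][0]:
        target = out[-1][0]
        out[-2:] = [(target, *out[-2][1:])]
    return out

def show(code):
    lines = []
    for dest, op, *args in code:
        if op == "copy":
            rhs = str(args[0])
        elif op == "inttoreal":
            rhs = f"inttoreal({args[0]})"
        else:
            rhs = f"{args[0]} {op} {args[1]}"
        lines.append(f"{dest} := {rhs}")
    return lines

tree = (":=", "id1", ("+", "id2", ("*", "id3", ("inttoreal", 60))))
code = gen_assign(tree)
print("\n".join(show(code)))
print("\n".join(show(optimize(code))))
```

The unoptimized output matches the four instructions shown above. The optimized output is temp2 := id3 * 60.0 followed by id1 := id2 + temp2: the same two instructions as on the slide, except that this sketch does not renumber the surviving temporary.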
Compiler Cousins: Preprocessors Provide Input to Compilers
1. Macro Processing
#define in C: Does Text Substitution Before Compiling

    #define X 3
    #define Y A*B+C
    #define Z getchar()
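Because macro processing is plain text substitution, the replacement text is pasted in without regard to the surrounding expression. The Python sketch below imitates that behavior for the three macros above. It is a deliberately simplified model and an assumption-laden one: the function name expand is hypothetical, and unlike the real C preprocessor it makes a single pass, does not rescan expanded text, and does not skip string literals or comments.

```python
import re

def expand(source, macros):
    """Replace each whole-word macro name by its body, as plain text."""
    def substitute(match):
        word = match.group()
        return macros.get(word, word)    # non-macro words pass through
    return re.sub(r"[A-Za-z_]\w*", substitute, source)

macros = {"X": "3", "Y": "A*B+C", "Z": "getchar()"}

expanded = expand("n = 2*Y + X;", macros)
print(expanded)   # n = 2*A*B+C + 3;
```

Note that 2*Y becomes 2*A*B+C, which the compiler then reads as (2*A*B)+C, not 2*(A*B+C). This is why macro bodies containing operators are usually written with enclosing parentheses.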
2. File Inclusion
#include in C - Bring in Another File Before Compiling
[Figure: main.c Contains the Line #include "defs.h"; Before Compiling, That Line Is Replaced by the Contents of defs.h]
3. Rational Preprocessors
• Augment “Old” Languages With Modern Constructs
• Add Macros for If - Then, While, Etc.
• #define Can Make C Code More Pascal-like

    #define begin {
    #define end }
    #define then
4. Language Extensions for a Database System
EQUEL - Database Query Language Embedded in C

    ## Retrieve (DN=Department.Dnum) where
    ## Department.Dname = 'Research'

Is Preprocessed Into:

    ingres_system("Retr…..Research'",____,____);

a Procedure Call in a Programming Language.
The Grouping of Phases
• Front End: Analysis + Intermediate Code Generation
  vs.
  Back End: Code Generation + Optimization
• Number of Passes:
  • Single - Preferred
  • Multiple - Easier, but Less Efficient
• Tradeoffs ……..
Compiler Construction Tools
• Parser Generators: Produce Syntax Analyzers
• Scanner Generators: Produce Lexical Analyzers
• Syntax-Directed Translation Engines: Generate Intermediate Code
• Automatic Code Generators: Generate Actual Code
• Data-Flow Engines: Support Optimization