
Chapter 1: Introduction to Compiling

Chapter 1: Introduction to Compiling. Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut 371 Fairfield Way, Unit 2155 Storrs, CT 06269-3155. steve@engr.uconn.edu http://www.engr.uconn.edu/~steve (860) 486 - 4818.


Presentation Transcript


  1. Chapter 1: Introduction to Compiling Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut 371 Fairfield Way, Unit 2155 Storrs, CT 06269-3155 steve@engr.uconn.edu http://www.engr.uconn.edu/~steve (860) 486 - 4818 Material for course thanks to: Laurent Michel Aggelos Kiayias Robert LeBarre

  2. Introduction to Compilers • As a Discipline, Involves Multiple CSE Areas (Diverse & Varied) • Programming Languages and Algorithms • Software Engineering & Theory / Foundations • Computer Architecture & Operating Systems • But, Has Surprisingly Simplistic Intent: [Figure: Source program → Compiler → Target program, with the Compiler also producing Error messages]

  3. What is the Compilation Process? Consider the main.c C Program:
     #include <stdio.h>
     #include <ctype.h>
     #include <string.h>
     main()
     {
         char day[10];
         int m, d, y;
         m = 9; d = 7; y = 2011;
         strcpy(day, "Wednesday");
         printf("Today is: ");
         printf("%s %d/%d/%d\n", day, m, d, y);
     }

  4. Viewing C from Compiler Theory • Tokens or Patterns for the “Words” of the Programming Language • Specified in Flex • Grammar Rules that Describe the Allowable Sentences of the Programming Language • Specified in Bison • Other Stages of Compilation • Semantic and Type Checking Analysis • Intermediate Code Generation • Code Optimization • Final (Relocatable) Code Generation

  5. Viewing C from Usage Side • ANSI C Specification of Lexical Structure in Flex • ANSI C Specification of Grammar Rules in Bison • Multi-Stage Compilation Process from Source Code to Preprocessed Code to Assembly Code to Object Code to Executable • Preprocess: gcc -E main.c > main.e • Assembly: gcc -S main.c Generates main.s • Object: gcc -c main.c Generates main.o • Executable: gcc main.c Generates a.out

  6. What are Tokens? Smallest Individual Units that are Recognized
     #include <stdio.h>
     #include <ctype.h>
     #include <string.h>
     main()
     {
         char day[10];
         int m, d, y;
         m = 9; d = 7; y = 2011;
         strcpy(day, "Wednesday");
         printf("Today is: ");
         printf("%s %d/%d/%d\n", day, m, d, y);
     }
     Tokens???

  7. C Lexical Specification in Flex • http://www.lysator.liu.se/c/ANSI-C-grammar-l.html
     D   [0-9]
     L   [a-zA-Z_]
     H   [a-fA-F0-9]
     E   [Ee][+-]?{D}+
     FS  (f|F|l|L)
     IS  (u|U|l|L)*
     %{
     #include <stdio.h>
     #include "y.tab.h"
     void count();
     %}
     %%
     "/*"  { comment(); }

  8. C Flex Continued …
     "auto"      { count(); return(AUTO); }
     "break"     { count(); return(BREAK); }
     "case"      { count(); return(CASE); }
     "char"      { count(); return(CHAR); }
     "const"     { count(); return(CONST); }
     "continue"  { count(); return(CONTINUE); }
     "default"   { count(); return(DEFAULT); }
     "do"        { count(); return(DO); }
     "double"    { count(); return(DOUBLE); }
     "else"      { count(); return(ELSE); }
     "enum"      { count(); return(ENUM); }
     "extern"    { count(); return(EXTERN); }
     "float"     { count(); return(FLOAT); }
     "for"       { count(); return(FOR); }
     "goto"      { count(); return(GOTO); }
     "if"        { count(); return(IF); }
     "int"       { count(); return(INT); }

  9. C Flex Continued …
     "long"      { count(); return(LONG); }
     "register"  { count(); return(REGISTER); }
     "return"    { count(); return(RETURN); }
     "short"     { count(); return(SHORT); }
     "signed"    { count(); return(SIGNED); }
     "sizeof"    { count(); return(SIZEOF); }
     "static"    { count(); return(STATIC); }
     "struct"    { count(); return(STRUCT); }
     "switch"    { count(); return(SWITCH); }
     "typedef"   { count(); return(TYPEDEF); }
     "union"     { count(); return(UNION); }
     "unsigned"  { count(); return(UNSIGNED); }
     "void"      { count(); return(VOID); }
     "volatile"  { count(); return(VOLATILE); }
     "while"     { count(); return(WHILE); }
     What TOKEN is Missing?

  10. C Flex Continued …
      {L}({L}|{D})*            { count(); return(check_type()); }
      0[xX]{H}+{IS}?           { count(); return(CONSTANT); }
      0{D}+{IS}?               { count(); return(CONSTANT); }
      {D}+{IS}?                { count(); return(CONSTANT); }
      L?'(\\.|[^\\'])+'        { count(); return(CONSTANT); }
      {D}+{E}{FS}?             { count(); return(CONSTANT); }
      {D}*"."{D}+({E})?{FS}?   { count(); return(CONSTANT); }
      {D}+"."{D}*({E})?{FS}?   { count(); return(CONSTANT); }
      L?\"(\\.|[^\\"])*\"      { count(); return(STRING_LITERAL); }
      What Do These Represent?
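As a partial answer to the question on this slide, two of the patterns can be checked by hand. The following is a minimal sketch in C, not code from the ANSI C Flex file: the helper names `is_identifier` and `is_decimal_constant` are hypothetical, and the decimal check leaves out the optional `{IS}` suffix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* L  [a-zA-Z_] */
static bool is_letter(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

/* D  [0-9] */
static bool is_digit(char c) {
    return c >= '0' && c <= '9';
}

/* {L}({L}|{D})* : one letter, then any number of letters or digits. */
bool is_identifier(const char *s) {
    assert(s != NULL);
    if (!is_letter(s[0])) return false;
    for (size_t i = 1; s[i] != '\0'; i++)
        if (!is_letter(s[i]) && !is_digit(s[i])) return false;
    return true;
}

/* {D}+ : one or more digits (the {IS}? suffix is omitted in this sketch). */
bool is_decimal_constant(const char *s) {
    assert(s != NULL);
    if (s[0] == '\0') return false;
    for (size_t i = 0; s[i] != '\0'; i++)
        if (!is_digit(s[i])) return false;
    return true;
}
```

Note that a keyword such as `int` also satisfies the identifier pattern. Flex prefers the longest match and, on a tie, the rule listed first, which is why the keyword rules appear before `{L}({L}|{D})*` in the specification.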

  11. C Flex Continued …
      "..."   { count(); return(ELLIPSIS); }
      ">>="   { count(); return(RIGHT_ASSIGN); }
      "<<="   { count(); return(LEFT_ASSIGN); }
      "+="    { count(); return(ADD_ASSIGN); }
      "-="    { count(); return(SUB_ASSIGN); }
      "*="    { count(); return(MUL_ASSIGN); }
      "/="    { count(); return(DIV_ASSIGN); }
      "%="    { count(); return(MOD_ASSIGN); }
      "&="    { count(); return(AND_ASSIGN); }
      "^="    { count(); return(XOR_ASSIGN); }
      "|="    { count(); return(OR_ASSIGN); }
      ">>"    { count(); return(RIGHT_OP); }
      "<<"    { count(); return(LEFT_OP); }
      "++"    { count(); return(INC_OP); }
      "--"    { count(); return(DEC_OP); }
      "->"    { count(); return(PTR_OP); }

  12. C Flex Continued …
      [MISSING LOTS OF FLEX]
      "<"          { count(); return('<'); }
      ">"          { count(); return('>'); }
      "^"          { count(); return('^'); }
      "|"          { count(); return('|'); }
      "?"          { count(); return('?'); }
      [ \t\v\n\f]  { count(); }
      .            { /* ignore bad characters */ }
      %%
      yywrap() { return(1); }
      [MISSING OTHER CODE]

  13. C Bison Specification • http://www.lysator.liu.se/c/ANSI-C-grammar-y.html
      %token IDENTIFIER CONSTANT STRING_LITERAL SIZEOF
      %token PTR_OP INC_OP DEC_OP LEFT_OP RIGHT_OP LE_OP
      %token GE_OP EQ_OP NE_OP
      %token AND_OP OR_OP MUL_ASSIGN DIV_ASSIGN MOD_ASSIGN
      %token SUB_ASSIGN LEFT_ASSIGN RIGHT_ASSIGN AND_ASSIGN
      %token XOR_ASSIGN OR_ASSIGN TYPE_NAME ADD_ASSIGN
      %token TYPEDEF EXTERN STATIC AUTO REGISTER
      %token CHAR SHORT INT LONG SIGNED UNSIGNED FLOAT DOUBLE CONST
      %token VOLATILE VOID
      %token STRUCT UNION ENUM ELLIPSIS
      %token CASE DEFAULT IF ELSE SWITCH WHILE DO FOR GOTO CONTINUE
      %token BREAK RETURN
      %start translation_unit
      %%

  14. Bison Spec Continued … These are Grammar Rules
      primary_expression
          : IDENTIFIER
          | CONSTANT
          | STRING_LITERAL
          | '(' expression ')'
          ;
      postfix_expression
          : primary_expression
          | postfix_expression '[' expression ']'
          | postfix_expression '(' ')'
          | postfix_expression '(' argument_expression_list ')'
          | postfix_expression '.' IDENTIFIER
          | postfix_expression PTR_OP IDENTIFIER
          | postfix_expression INC_OP
          | postfix_expression DEC_OP
          ;
      argument_expression_list
          : assignment_expression
          | argument_expression_list ',' assignment_expression

  15. Bison Spec Continued … These Rules Declare Variables
      declaration_list
          : declaration
          | declaration_list declaration
      declaration
          : declaration_specifiers ';'
          | declaration_specifiers init_declarator_list ';'
          ;
      declaration_specifiers
          : storage_class_specifier
          | storage_class_specifier declaration_specifiers
          | type_specifier
          | type_specifier declaration_specifiers
          | type_qualifier
          | type_qualifier declaration_specifiers

  16. Bison Spec Continued … List of Statements and Different Statements
      statement_list
          : statement
          | statement_list statement
          ;
      statement
          : labeled_statement
          | compound_statement
          | expression_statement
          | selection_statement
          | iteration_statement
          | jump_statement
          ;
      labeled_statement
          : IDENTIFIER ':' statement
          | CASE constant_expression ':' statement
          | DEFAULT ':' statement
          ;

  17. Bison Spec Continued …
      expression_statement
          : ';'
          | expression ';'
          ;
      selection_statement
          : IF '(' expression ')' statement
          | IF '(' expression ')' statement ELSE statement
          | SWITCH '(' expression ')' statement
          ;
      iteration_statement
          : WHILE '(' expression ')' statement
          | DO statement WHILE '(' expression ')' ';'
          | FOR '(' expression_statement expression_statement ')' statement
          | FOR '(' expression_statement expression_statement expression ')' statement
          ;

  18. Reviewing Compilation Process in C • Current Programmers Used to State of the Art IDEs (Eclipse, etc.) • IDEs focus on Higher Level Concepts • Syntax Correction Capabilities • Matching Parens and Brackets • Sophisticated Compilation Error Messages • Debugging Environment • Process Hides Details – In C: • Multi-Stage Compilation Process from Source Code to Preprocessed Code to Assembly Code to Object Code to Executable

  19. From Source to Preprocessing: What Does main.e Contain?
      # 1 "main.c"
      # 1 "<built-in>"
      # 1 "<command-line>"
      # 1 "main.c"
      # 1 "/usr/include/stdio.h" 1 3 4
      # 28 "/usr/include/stdio.h" 3 4
      # 1 "/usr/include/features.h" 1 3 4
      # 330 "/usr/include/features.h" 3 4
      # 1 "/usr/include/sys/cdefs.h" 1 3 4
      # 348 "/usr/include/sys/cdefs.h" 3 4
      # 1 "/usr/include/bits/wordsize.h" 1 3 4
      # 349 "/usr/include/sys/cdefs.h" 2 3 4
      # 331 "/usr/include/features.h" 2 3 4
      # 354 "/usr/include/features.h" 3 4
      # 1 "/usr/include/gnu/stubs.h" 1 3 4
      … etc ….

  20. From Source to Preprocessing: What Does main.e Contain?
      … missing almost 900 lines of code …
      # 4 "main.c" 2
      main()
      {
          char day[10];
          int m, d, y;
          m = 9; d = 7; y = 2011;
          strcpy(day, "Wednesday");
          printf("Today is: ");
          printf("%s %d/%d/%d\n", day, m, d, y);
      }

  21. From Preprocessing to Assembly: What Does main.s Contain?

  22. What are the Final Two Steps? • Generation of “.o” files • Linking of “.o” files to Create a.out • In Olden Days (no IDEs): • Compilation of 250K of Code Took 2-3 Hours • Utilization of Makefiles for Smart Compilation • Creation of Directory Structure for Code • Makefiles to Establish Dependencies Among “.h” and “.c” and “.o” Files • Change in Time Stamp Causes Recompile • Try to Recompile “Least” Amount of Code • Then, Relink all “.o” Files into a.out

  23. Programming Languages Offer … • Abstractions • At different levels • From low • Good for machines…. • To high • Good for humans…. • Three Approaches • Interpreted • Compiled • Mixed

  24. Interpreter • Motivation… • Easiest to implement! • Upside ? • Downside ? • Phases • Lexical analysis • Parsing • Semantic checking • Interpretation • Interpreted Languages?

  25. Compiler • Motivation • It is natural! • Upside? • Downside? • Phases • [Interpreter] • Code Generation • Code Optimization • Link & load • Compiled Languages?

  26. Mixed • Motivation • The best of two breeds… • Upside ? • Downside? • Mixed Languages?

  27. Classifications of Compilers • Compilers Viewed from Many Perspectives • Construction: Single Pass, Multiple Pass, Load & Go • Functional: Debugging, Optimizing • However, All Utilize the Same Basic Tasks to Accomplish Their Actions

  28. Compiler Structure • Two Fundamental Sets of Issues • Analysis: Decompose Source into an Intermediate Representation (Text, Syntactic, and Structural Analysis) • Synthesis: Target Program Generation from the Representation, Which Includes Optimization • We Will Discuss Each Category in This Class

  29. Important Notes • In Today’s Technology, Analysis Is Often Performed by Software Tools - This Wasn’t the Case in Early CSE Days • Structure / Syntax directed editors: Force “syntactically” correct code to be entered • Pretty Printers: Standardized version for program structure (i.e., blank space, indenting, etc.) • Static Checkers: A “quick” compilation to detect rudimentary errors • Interpreters: “real-time” execution of code, a “line-at-a-time”

  30. Important Notes • Compilation Is Not Limited to Programming Language Applications • Text Formatters • LATEX & TROFF Are Languages Whose Commands Format Text • Silicon Compilers • Textual / Graphical: Take Input and Generate Circuit Design • Database Query Processors • Database Query Languages Are Also Programming Languages • Input Is “Compiled” Into a Set of Operations for Accessing the Database

  31. Summary of Various Languages • Compiled • Programming Languages • [C,C#,ML,LISP,…] • Communication Languages • [XML,HTML,…] • Presentation Languages • [CSS,SGML,…] • Hardware Languages • [VHDL,…] • Formatting Languages • [Postscript,troff,LaTeX,…] • Query Languages • [SQL & friends]

  32. The Many Phases of a Compiler [Figure: Source Program → 1 Lexical Analyzer → 2 Syntax Analyzer → 3 Semantic Analyzer → 4 Intermediate Code Generator → 5 Code Optimizer → 6 Code Generator → Target Program; the Symbol-table Manager and the Error Handler interact with all six phases] • 1, 2, 3 : Analysis - Our Focus • 4, 5, 6 : Synthesis

  33. What Does Relocatable Mean? [Figure: five numbered stages (Pre-Processor, Compiler, Assembler, Link/Editor, Loader) take the Source Program to an Executable; the Assembler emits Relocatable Machine Code, which is combined with Library, relocatable object files]

  34. The Analysis Task For Compilation • Three Phases: • Linear / Lexical Analysis: • Left-to-Right Scan to Identify Tokens • Hierarchical Analysis: • Grouping of Tokens Into Meaningful Collections • Semantic Analysis: • Checking to Ensure Correctness of Components [Figure: Language Analysis Phases: Source Program → 1 Lexical Analyzer → 2 Syntax Analyzer → 3 Semantic Analyzer]

  35. Phase 1. Lexical Analysis • Easiest Analysis - Identify tokens, which are the building blocks • For Example: Position := initial + rate * 60 ; • All eight underlined items are tokens: Position, :=, initial, +, rate, *, 60, ; • Blanks, Line breaks, etc. are scanned out
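The scan described on this slide can be sketched in a few lines of C. This is an illustrative hand-written scanner, not the Flex-generated one: the function name `tokenize`, the `MAX_LEXEME` limit, and the choice to treat every unrecognized character as a one-character symbol are assumptions of the sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_LEXEME 32

/* Copies up to `max` lexemes from `src` into `out`; returns how many were stored. */
int tokenize(const char *src, char out[][MAX_LEXEME], int max) {
    assert(src != NULL && out != NULL);
    int count = 0;
    const char *p = src;
    while (*p != '\0' && count < max) {
        if (isspace((unsigned char)*p)) { p++; continue; }  /* blanks are scanned out */
        const char *start = p;
        if (isalpha((unsigned char)*p) || *p == '_') {       /* identifier */
            while (isalnum((unsigned char)*p) || *p == '_') p++;
        } else if (isdigit((unsigned char)*p)) {             /* number */
            while (isdigit((unsigned char)*p)) p++;
        } else if (p[0] == ':' && p[1] == '=') {             /* assignment symbol */
            p += 2;
        } else {                                             /* any other one-char symbol */
            p++;
        }
        size_t len = (size_t)(p - start);
        if (len >= MAX_LEXEME) len = MAX_LEXEME - 1;         /* truncate overlong lexemes */
        memcpy(out[count], start, len);
        out[count][len] = '\0';
        count++;
    }
    return count;
}
```

Calling `tokenize("Position := initial + rate * 60 ;", toks, 16)` on a `char toks[16][MAX_LEXEME]` buffer stores the eight lexemes listed on the slide. The sketch only splits the text; it does not yet attach a token class (Id, Keyword, Symbol, Integer) to each lexeme.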

  36. Lexical Analysis • Purpose • Slice the sequence of symbols into tokens • Example: Date x := new Date ( System.today( ) + 30 ) ; • Resulting tokens, in source order:
      Id       Date
      Id       x
      Symbol   :=
      Keyword  new
      Id       Date
      Symbol   (
      Id       System
      Symbol   .
      Id       today
      Symbol   (
      Symbol   )
      Symbol   +
      Integer  30
      Symbol   )
      Symbol   ;

  37. Phase 2. Hierarchical Analysis, aka Parsing or Syntax Analysis • For 1st example, we would have Parse Tree:
      assignment statement
        identifier: position
        :=
        expression
          expression
            identifier: initial
          +
          expression
            expression
              identifier: rate
            *
            expression
              number: 60
      • Nodes of tree are constructed using a grammar for the language

  38. Syntax Analysis (parsing) • For Second example, we would have: • Organize tokens in sentences based on grammar

  39. What is a Grammar? • Grammar is a Set of Rules Which Govern the Interdependencies & Structure Among the Tokens
      statement is an assignment statement, or while statement, or if statement, or ...
      assignment statement is an identifier := expression ;
      expression is an (expression), or expression + expression, or expression * expression, or number, or identifier, or ...
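To make the rules concrete, here is a minimal recursive-descent sketch in C for the assignment and expression rules. It is not how Bison works internally (Bison generates a table-driven bottom-up parser), and it rewrites the slide's expression rule into an assumed stratified form (expression, term, factor) so that `*` binds tighter than `+`, as in the parse tree for the first example. All function names are hypothetical, and the tree is emitted as a prefix string rather than built in memory.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Stratified form of the slide's rules (an assumption of this sketch):
 *   assignment : identifier ":=" expression ";"
 *   expression : term   { "+" term }
 *   term       : factor { "*" factor }
 *   factor     : identifier | number | "(" expression ")"
 */
#define BUF 256

static const char *cur;   /* next unread input character (not thread-safe) */

static void skip_blanks(void) { while (isspace((unsigned char)*cur)) cur++; }

static int parse_expression(char *out);

static int parse_factor(char *out) {
    skip_blanks();
    if (*cur == '(') {
        cur++;
        if (!parse_expression(out)) return 0;
        skip_blanks();
        if (*cur != ')') return 0;
        cur++;
        return 1;
    }
    if (!isalnum((unsigned char)*cur)) return 0;
    size_t n = 0;
    while (isalnum((unsigned char)*cur) && n < BUF - 1) out[n++] = *cur++;
    out[n] = '\0';
    return 1;
}

/* Parses operand { op operand } and builds a left-associative prefix tree. */
static int parse_chain(char *out, char op, int (*operand)(char *)) {
    char left[BUF], right[BUF], node[BUF];
    if (!operand(left)) return 0;
    for (skip_blanks(); *cur == op; skip_blanks()) {
        cur++;
        if (!operand(right)) return 0;
        snprintf(node, sizeof node, "(%c %s %s)", op, left, right);
        strcpy(left, node);
    }
    strcpy(out, left);
    return 1;
}

static int parse_term(char *out)       { return parse_chain(out, '*', parse_factor); }
static int parse_expression(char *out) { return parse_chain(out, '+', parse_term); }

/* Returns 1 and writes the tree in prefix form on success, 0 on a syntax error. */
int parse_assignment(const char *src, char *out, size_t out_size) {
    assert(src != NULL && out != NULL);
    char target[BUF], value[BUF];
    cur = src;
    skip_blanks();
    if (!isalpha((unsigned char)*cur) || !parse_factor(target)) return 0;
    skip_blanks();
    if (cur[0] != ':' || cur[1] != '=') return 0;
    cur += 2;
    if (!parse_expression(value)) return 0;
    skip_blanks();
    if (*cur != ';') return 0;
    snprintf(out, out_size, "(:= %s %s)", target, value);
    return 1;
}
```

For the input `position := initial + rate * 60 ;` the sketch yields `(:= position (+ initial (* rate 60)))`, which has the same shape as the parse tree for the first example. The slide's rule `expression + expression` is ambiguous on its own; a Bison grammar resolves this either by stratifying the rules, as assumed here, or with precedence declarations.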

  40. Summary so far… • Turn a symbol stream into a parse tree

  41. Semantic Analysis • Lexical Analysis - Scans Input to Identify “words” that are the Tokens of the Language • Syntactic Analysis uses Recursion to Identify Structure as Indicated in Parse Tree • What is Semantic Analysis? • Purpose • Determine Unique / Unambiguous Interpretation • Catch errors related to program meaning • Determine Whether the Sentences have One and Only One Unambiguous Interpretation • “John Took Picture of Mary Out on the Patio” • For a PL: Wrong Types, Missing Declaration, Missing Methods, Ambiguous Statements, etc.

  42. Phase 3. Semantic Analysis • Find More Complicated Semantic Errors and Support Code Generation • Parse Tree Is Augmented With Semantic Actions • Compressed Tree: := ( position, + ( initial, * ( rate, 60 ) ) ) • With Conversion Action: := ( position, + ( initial, * ( rate, inttoreal ( 60 ) ) ) )

  43. Phase 3. Semantic Analysis • Most Important Activity in This Phase: • Type Checking - Legality of Operands • Many Different Situations: • Real := int + char ; • A[int] := A[real] + int ; • while char <> int do …. • Etc. • Primary Tool is Symbol Table

  44. Analysis in Text Formatting • Simple Commands: LATEX • Language Commands: \begin{single} \end{single} \noindent \section{Introduction} $A_i$ $A_{i_j}$ (command words: begin, single, noindent, section) • Embedded in a stream of text, i.e., a FILE • \ and $ serve as signals to LATEX • What are tokens? • What is hierarchical structure? • What kind of semantic analysis is required?

  45. Supporting Phases/ Activities for Analysis • Symbol Table Creation / Maintenance • Contains Info on Each “Meaningful” Token, Typically Identifiers • Data Structure Created / Initialized During Lexical Analysis • Utilized / Updated During Later Analysis & Synthesis • Error Handling • Detection of Different Errors Which Correspond to All Phases • What Kinds of Errors Are Found During the Analysis Phase? • What Happens When an Error Is Found?
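The symbol table life cycle described here (created during lexical analysis, updated by later phases) can be sketched with a fixed-size array in C. This is a deliberately simple illustration: the type and function names are hypothetical, lookup is a linear search, and real compilers typically use hash tables with scope handling.

```c
#include <assert.h>
#include <string.h>

#define MAX_SYMBOLS 64
#define MAX_NAME 32

typedef struct {
    char name[MAX_NAME];
    char type[MAX_NAME];   /* empty until a later phase records it */
} Symbol;

typedef struct {
    Symbol entries[MAX_SYMBOLS];
    int count;
} SymbolTable;

/* Returns the index of `name`, or -1 if it is not in the table. */
int symtab_lookup(const SymbolTable *t, const char *name) {
    assert(t != NULL && name != NULL);
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->entries[i].name, name) == 0) return i;
    return -1;
}

/* Lexical analysis: enter an identifier once; returns its index,
 * or -1 if the table is full or the name is too long. */
int symtab_insert(SymbolTable *t, const char *name) {
    int found = symtab_lookup(t, name);
    if (found >= 0) return found;
    if (t->count >= MAX_SYMBOLS || strlen(name) >= MAX_NAME) return -1;
    strcpy(t->entries[t->count].name, name);
    t->entries[t->count].type[0] = '\0';
    return t->count++;
}

/* Later analysis: record the declared type; returns 0 on success, -1 on failure. */
int symtab_set_type(SymbolTable *t, const char *name, const char *type) {
    int i = symtab_lookup(t, name);
    if (i < 0 || strlen(type) >= MAX_NAME) return -1;
    strcpy(t->entries[i].type, type);
    return 0;
}
```

A table declared as `SymbolTable t = {0};` starts empty. Inserting `position` twice returns the same index both times, so each identifier has exactly one entry that later phases can annotate.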

  46. Summary so far... • Turned a symbol stream into an annotated parse tree

  47. From Analysis to Synthesis [Figure: Source Program → 1 Lexical Analyzer → 2 Syntax Analyzer → 3 Semantic Analyzer → 4 Intermediate Code Generator → 5 Code Optimizer → 6 Code Generator → Target Program; the Symbol-table Manager and the Error Handler interact with all six phases] • 1, 2, 3 : Analysis - Our Focus • 4, 5, 6 : Synthesis Phase

  48. The Synthesis Task For Compilation • Intermediate Code Generation • Abstract Machine Version of Code - Independent of Architecture • Easy to Produce and Easy to Translate into Final, Machine-Dependent Code • Code Optimization • Find More Efficient Ways to Execute Code • Replace Code With More Efficient Statements • Two Approaches: High-level Language & “Peephole” Optimization • Final Code Generation • Generate Relocatable Machine-Dependent Code

  49. Intermediate Code • What is intermediate code ? • A low level representation • A simple representation • Easy to reason about • Easy to manipulate • Programming Language independent • Hardware independent

  50. IR Code example • Quadruples • (x,y,op,z) to represent x := y op z • Infinitely many temporaries • Implicit call stack management • Example: Date x := new Date ( System.today( ) + 30 ) ;
      push System
      t0 := call today
      t1 := 30
      t2 := t0 + t1
      push t2
      t3 := call DateFactory
      x := t3
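The quadruple form (x,y,op,z) maps naturally onto a small C struct. The sketch below is one possible representation, not the one used in the course: the names `Quad`, `make_quad`, `new_temp`, and `format_quad` are hypothetical, an `op` of 0 is an assumed convention for a plain copy `x := y`, and the `push` and `call` instructions from the example are not modeled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 16

/* One quadruple (x, y, op, z) standing for x := y op z. */
typedef struct {
    char x[NAME_LEN];
    char y[NAME_LEN];
    char op;              /* '+', '*', ...; 0 means a plain copy x := y */
    char z[NAME_LEN];
} Quad;

static void copy_name(char dst[NAME_LEN], const char *src) {
    strncpy(dst, src, NAME_LEN - 1);
    dst[NAME_LEN - 1] = '\0';     /* always terminate, even when truncated */
}

Quad make_quad(const char *x, const char *y, char op, const char *z) {
    assert(x != NULL && y != NULL && z != NULL);
    Quad q;
    copy_name(q.x, x);
    copy_name(q.y, y);
    q.op = op;
    copy_name(q.z, z);
    return q;
}

/* Writes the next temporary name (t0, t1, ...) into `name`. */
void new_temp(int *counter, char name[NAME_LEN]) {
    assert(counter != NULL);
    snprintf(name, NAME_LEN, "t%d", (*counter)++);
}

/* Renders a quadruple in the textual form used on the slide. */
void format_quad(const Quad *q, char *buf, size_t size) {
    assert(q != NULL && buf != NULL);
    if (q->op == 0)
        snprintf(buf, size, "%s := %s", q->x, q->y);
    else
        snprintf(buf, size, "%s := %s %c %s", q->x, q->y, q->op, q->z);
}
```

Formatting `make_quad("t2", "t0", '+', "t1")` gives `t2 := t0 + t1`, the fourth line of the example. Because temporaries come from a simple counter, the generator can request as many as it needs, which is what "infinitely many temporaries" amounts to in practice.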
