Chapter 1 Introduction to Compiling Yu-Chen Kuo
1.1 Compilers
Source languages: Fortran, Pascal, C, etc. • Target languages: another programming language or machine language • Compilers: • Single-pass • Multi-pass • Load-and-Go • Debugging • Optimizing
Analysis-Synthesis Model • Compilation: Analysis & Synthesis • Analysis: • Break the source program into pieces • Create an intermediate representation • Hierarchical structure: syntax tree • Node: operation • Children of a node: arguments of the operation • Synthesis: construct the target program from the tree
Context of a Compiler • Several other programs may be needed to create an executable (.exe) file • Preprocessor: expands macros • Assembler: translates assembly into machine code • Loader/link-editor: links library routines
1.2 Analysis of the source program • Three phases • Linear analysis • Divide source program into tokens • Hierarchical analysis • Tokens grouped hierarchically • Semantic analysis • Ensure components fit meaningfully
Lexical Analysis • Linear analysis: lexical analysis, scanning • e.g., position := initial + rate * 60 • Identifier position • Assignment symbol ":=" • Identifier initial • "+" sign • Identifier rate • "*" sign • Number 60
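The token list on this slide can be reproduced with a small scanner. The following is only a sketch: the token class names (ID, NUMBER, ASSIGN, PLUS, TIMES) and the function name tokenize are assumptions made for illustration and do not come from the slides.

```python
import re

# Token classes are illustrative names, not taken from the slides.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),
    ("ASSIGN", r":="),
    ("PLUS", r"\+"),
    ("TIMES", r"\*"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Scan text left to right and return a list of (class, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            # Lexical error: the remaining characters form no token.
            raise SyntaxError(f"cannot form a token at offset {pos}")
        if match.lastgroup != "SKIP":  # blanks separate tokens and are dropped
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for token in tokenize("position := initial + rate * 60"):
    print(token)
```

The scan is purely linear: no token requires looking at nested structure, which is what separates this phase from parsing.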
Syntax Analysis • Hierarchical analysis: parsing or syntax analysis • Group tokens into grammatical phrases • Grammatical phrases are represented by a parse tree
Syntax Analysis • Hierarchical structure is expressed by recursive rules • Recursively define expression • 1. Any identifier is an expression • 2. Any number is an expression • 3. If expression1 and expression2 are expressions, then so are expression1 + expression2, expression1 * expression2, and (expression1) • By rule 1, initial and rate are expressions • By rule 2, 60 is an expression • By rule 3, rate*60 is an expression, and therefore so is initial+rate*60
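The three recursive rules map naturally onto a recursive-descent parser. The sketch below is one possible reading, not the method used in the slides: it assumes tokens arrive as plain strings, that * binds tighter than +, and that the tree is stored as nested tuples (operator, left, right). The function names are hypothetical.

```python
def parse_expression(tokens):
    """Parse a list of token strings into a nested-tuple syntax tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":                      # rule 3: (expression1)
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if token.isdigit() or token.isidentifier():
            return advance()                  # rules 1 and 2: leaf
        raise SyntaxError(f"unexpected token {token!r}")

    def term():
        node = factor()
        while peek() == "*":                  # rule 3: expression1 * expression2
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":                  # rule 3: expression1 + expression2
            advance()
            node = ("+", node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse_expression(["initial", "+", "rate", "*", "60"]))
```

Each interior node holds an operation and its children hold the arguments, matching the syntax-tree description earlier in the chapter.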
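Syntax Analysis • Recursively define statement • If identifier1 is an identifier and expression2 is an expression, then identifier1 := expression2 is a statement • If expression1 is an expression and statement2 is a statement, then while (expression1) do statement2 and if (expression1) then statement2 are statements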
Lexical vs. Syntax Analysis • The division is somewhat arbitrary • One criterion: whether recursion is needed • Identifiers are recognized by a linear scan that stops at the first character that is neither a letter nor a digit; no recursion is needed • E.g., initial • A linear scan is not powerful enough to analyze expressions or statements, which require hierarchical structure • E.g., ( …..), begin …. end, statements
Semantic Analysis • Check for semantic errors • Gather type information for code generation • Use the hierarchical structure to identify operators and operands • Type checking • E.g., using a real number to index an array (error) • Type conversion • E.g., Fig. 1.5: inttoreal(60) is inserted when the identifiers are declared real
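The inttoreal conversion can be illustrated with a small tree walk. This is a sketch under several assumptions: the tree is a nested tuple (operator, left, right), only the types "int" and "real" exist, and the declared types are passed in as a plain dictionary. The function name check_types is hypothetical.

```python
def check_types(node, symbol_types):
    """Return (typed_tree, type), wrapping int operands in inttoreal when mixed."""
    if isinstance(node, str):
        if node.isdigit():
            return node, "int"                # integer literal
        if node not in symbol_types:
            raise NameError(f"undeclared identifier {node!r}")
        return node, symbol_types[node]
    op, left, right = node
    left, left_type = check_types(left, symbol_types)
    right, right_type = check_types(right, symbol_types)
    if left_type == right_type:
        return (op, left, right), left_type
    # Mixed int/real operands: convert the int side to real.
    if left_type == "int":
        left = ("inttoreal", left)
    else:
        right = ("inttoreal", right)
    return (op, left, right), "real"


tree = ("+", "initial", ("*", "rate", "60"))
declared = {"initial": "real", "rate": "real"}
print(check_types(tree, declared))
```

With both identifiers declared real, only the literal 60 is wrapped, so the conversion node sits directly above that leaf.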
Analysis in Text Formatters • \hbox {<list of boxes>} • \hbox {\vbox{! 1} \vbox{@ 2}}
1.3 The Phases of A Compiler • Phases • First three phases: analysis portion • Last three phases: synthesis portion • Symbol-table management phase • Error-handling phase
Symbol-table Management • Records the identifiers used in the source program • An identifier is detected by lexical analysis and then stored in the symbol table • Collects the attributes of identifiers (not determined by lexical analysis) • Storage allocation: memory address • Type • Scope (where it is valid, local or global) • For procedure names: • Number and types of arguments • Argument-passing method (e.g., by reference) • Return type
Symbol-table Management • Semantic analysis uses the type information to check the type consistency of identifiers • Code generation uses the storage allocation information to generate properly relocated addresses
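A symbol table along these lines might be sketched as below. The class and method names (Symbol, SymbolTable, insert, declare, lookup) and the 4-byte size used in the example are assumptions for illustration; scope and procedure attributes are left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Symbol:
    name: str
    type: Optional[str] = None      # unknown when the lexer first sees the name
    address: Optional[int] = None   # assigned when storage is allocated


class SymbolTable:
    def __init__(self):
        self.entries = {}
        self.next_address = 0

    def insert(self, name):
        """Lexical analysis: record the identifier once, with no attributes."""
        if name not in self.entries:
            self.entries[name] = Symbol(name)
        return self.entries[name]

    def declare(self, name, type_, size):
        """Later phases: attach the type and allocate storage."""
        symbol = self.insert(name)
        symbol.type = type_
        symbol.address = self.next_address
        self.next_address += size
        return symbol

    def lookup(self, name):
        return self.entries[name]


table = SymbolTable()
for lexeme in ["position", "initial", "rate"]:
    table.insert(lexeme)                    # done while scanning
for lexeme in ["position", "initial", "rate"]:
    table.declare(lexeme, "real", size=4)   # done once declarations are known
print(table.lookup("rate"))
```

Splitting insert from declare mirrors the point on the slide: the lexer can enter a name, but its attributes are filled in by later phases.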
Error Detection and Reporting • Syntax and semantic analysis handle a large fraction of errors • Lexical phase: the remaining characters do not form any token • Syntax phase: the token stream violates the structure rules • Semantic phase: the operation has no meaning • E.g., adding an array name and a procedure name
Translation of A Statement
The Analysis Phases • Lexical analysis • Group characters into tokens • Identifiers • Keywords (if, while) • Punctuation ('(', ')') • Multi-character operators (':=') • Enter lexical value (lexeme) into symbol table • position, rate, initial • Syntax analysis • Fig. 1.11(a), 1.11(b)
The Analysis Phases • Syntax analysis • Semantic analysis • Type checking and conversion
Intermediate Code Generation • Represent the source program as code for an abstract machine • Should be easy to produce • Should be easy to translate into the target program • Three-address code (at most three operands per instruction) • temp2 := id3 * temp1 • Every memory location can act like a register • temp2 BX
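Three-address code can be produced by a post-order walk of the typed tree, creating one temporary per interior node. The sketch below assumes the nested-tuple tree form (operator, left, right) with ("inttoreal", operand) for conversions, and uses id1, id2, id3 for position, initial, and rate; the function name is hypothetical.

```python
def gen_three_address(target, tree):
    """Return a list of three-address instructions that assign tree to target."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        if isinstance(node, str):             # leaf: identifier or literal
            return node
        if node[0] == "inttoreal":            # unary conversion node
            operand = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} := inttoreal({operand})")
            return temp
        op, left, right = node
        left_name = walk(left)
        right_name = walk(right)
        temp = new_temp()
        code.append(f"{temp} := {left_name} {op} {right_name}")
        return temp

    result = walk(tree)
    code.append(f"{target} := {result}")
    return code


tree = ("+", "id2", ("*", "id3", ("inttoreal", "60")))
for line in gen_three_address("id1", tree):
    print(line)
```

Every generated instruction has at most one operator besides the assignment, which is what forces the compiler to fix the evaluation order.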
Code Optimization • Improve the intermediate code • Result: faster-running machine code • temp1 := id3 * 60.0 • id1 := id2 + temp1
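Two simple improvements are enough to shrink the intermediate code to two instructions: performing inttoreal on a constant at compile time, and merging a trailing copy into the instruction that computed the copied temporary. The sketch below assumes instructions are tuples (dest, op, arg1, arg2) with the hypothetical op names "inttoreal" and "copy", and assumes each temporary is used only once. It does not renumber temporaries, so its output uses temp2 where the slide shows temp1.

```python
def optimize(code):
    """Fold constant conversions and merge copies of single-use temporaries."""
    constants = {}
    out = []
    for dest, op, a, b in code:
        # Substitute any operand already known to be a constant.
        a = constants.get(a, a)
        b = constants.get(b, b)
        if op == "inttoreal" and a.isdigit():
            constants[dest] = f"{a}.0"        # convert once, at compile time
            continue
        if op == "copy" and out and out[-1][0] == a and a.startswith("temp"):
            # Assumption: the temporary is not used again after this copy.
            _, prev_op, prev_a, prev_b = out.pop()
            out.append((dest, prev_op, prev_a, prev_b))
            continue
        out.append((dest, op, a, b))
    return out


code = [
    ("temp1", "inttoreal", "60", None),
    ("temp2", "*", "id3", "temp1"),
    ("temp3", "+", "id2", "temp2"),
    ("id1", "copy", "temp3", None),
]
for instruction in optimize(code):
    print(instruction)
```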
Code Generation • Generate relocatable machine code or assembly code • MOVF id3, R2 • MULF #60.0, R2 • MOVF id2, R1 • ADDF R2, R1 • MOVF R1, id1
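A very naive generator can reproduce the five instructions above from the optimized intermediate code. This sketch rests on strong assumptions: instructions are tuples (dest, op, arg1, arg2), only + and * occur, temporaries stay in registers, and registers are handed out from a two-element pool (R2 first, then R1) with no spilling. It is not a general register allocator.

```python
OPCODES = {"+": "ADDF", "*": "MULF"}


def generate(code, registers=("R1", "R2")):
    """Translate (dest, op, arg1, arg2) tuples into two-operand assembly."""
    pool = list(registers)      # free registers; the last one is taken first
    location = {}               # temporary name -> register holding it
    asm = []

    def operand(name):
        if name in location:
            return location[name]       # value lives in a register
        if name[0].isdigit():
            return f"#{name}"           # immediate constant
        return name                     # memory operand

    for dest, op, a, b in code:
        reg = pool.pop()                # raises IndexError if none is free
        asm.append(f"MOVF {operand(a)}, {reg}")
        asm.append(f"{OPCODES[op]} {operand(b)}, {reg}")
        if dest.startswith("temp"):
            location[dest] = reg        # keep the temporary in its register
        else:
            asm.append(f"MOVF {reg}, {dest}")
            pool.append(reg)            # register is free again
    return asm


code = [("temp1", "*", "id3", "60.0"), ("id1", "+", "id2", "temp1")]
for line in generate(code):
    print(line)
```

The second operand of each instruction is both a source and the destination, which is why every computation starts with a MOVF into a register.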
1.4 Cousins of The Compiler • Preprocessors • Assemblers • Two-Pass Assembler • Loaders and Link-Editors
Preprocessors • Macro processing • File inclusion • #include <global.h> is replaced by the contents of the file "global.h" • Rational preprocessors • Language extensions • Statements beginning with ## belong to a query language embedded in C • Translated into procedure calls
Preprocessors • Example 1.2 • \define\JACM #1; #2; #3 {{\sl J. ACM} {\bf #1}: #2, pp. #3.} • \JACM 17;4;715-728 produces: J. ACM 17:4, pp. 715-728.
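The substitution performed by the macro in Example 1.2 can be imitated with a regular expression. This is only a sketch of the idea: it hard-codes the single \JACM macro rather than implementing \define, and the function name expand_jacm is hypothetical.

```python
import re

# Matches a use such as: \JACM 17;4;715-728
JACM_USE = re.compile(r"\\JACM\s+([^;\s]+);([^;\s]+);([\d-]+)")


def expand_jacm(text):
    """Replace each \\JACM use by the macro body with #1, #2, #3 filled in."""
    def body(match):
        volume, number, pages = match.groups()
        return "{{\\sl J. ACM} {\\bf %s}:%s, pp. %s.}" % (volume, number, pages)

    return JACM_USE.sub(body, text)


print(expand_jacm(r"\JACM 17;4;715-728"))
```

The actual parameters are matched by position and by the separating semicolons, just as the formal parameters #1, #2, #3 are laid out in the macro template.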
Assembler • Produces relocatable machine code • DW a #10 • DW b #20 • MOV a, R1 • ADD #2, R1 • MOV R1, b • Load the content of address a into R1 • Add the constant 2 • Store R1 into address b
Two-Pass Assembly • First pass • Find all identifiers and their storage locations and store them in the symbol table • Identifier / Address: a / 0, b / 4 • Second pass • Translate each operation code into a sequence of bits • Produces relocatable machine code
Two-Pass Assembly • Example 1.3
Inst. Code    Register   Mem/Const.      Content        (R)
0001 (MOV)    01 (R1)    00 (Mem)        00000000 (a)   *
0011 (ADD)    01 (R1)    10 (Constant)   00000010
0010 (MOV)    01 (R1)    00 (Mem)        00000100 (b)   *
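The two passes can be sketched for the five-line program used in this section. The bit patterns follow Example 1.3 (0001 for MOV into a register, 0010 for MOV into memory, 0011 for ADD, 01 for R1, 00 for memory, 10 for constant). The 4-byte word size is inferred from the addresses 0 and 4; the function name and the (bits, relocatable) output form are assumptions.

```python
REGISTERS = {"R1": "01"}    # only R1 appears in the example
WORD_SIZE = 4               # inferred from a at address 0 and b at address 4


def assemble(lines):
    """Return (symbol_table, code), where code is a list of (bits, relocatable)."""
    # Pass 1: assign an address to every identifier declared with DW.
    symbols = {}
    next_address = 0
    for line in lines:
        parts = line.replace(",", " ").split()
        if parts[0] == "DW":
            symbols[parts[1]] = next_address
            next_address += WORD_SIZE

    # Pass 2: translate each instruction into bits.
    code = []
    for line in lines:
        parts = line.replace(",", " ").split()
        if parts[0] == "DW":
            continue
        op, src, dst = parts
        if op == "MOV" and dst in REGISTERS:      # load: memory -> register
            opcode, reg, operand = "0001", REGISTERS[dst], src
        elif op == "MOV":                         # store: register -> memory
            opcode, reg, operand = "0010", REGISTERS[src], dst
        elif op == "ADD":
            opcode, reg, operand = "0011", REGISTERS[dst], src
        else:
            raise ValueError(f"unknown operation {op!r}")
        if operand.startswith("#"):               # constant: no relocation
            tag, value, relocatable = "10", int(operand[1:]), False
        else:                                     # address: relocation bit set
            tag, value, relocatable = "00", symbols[operand], True
        code.append((f"{opcode} {reg} {tag} {value:08b}", relocatable))
    return symbols, code


program = ["DW a #10", "DW b #20", "MOV a, R1", "ADD #2, R1", "MOV R1, b"]
symbols, code = assemble(program)
print(symbols)
for bits, relocatable in code:
    print(bits, "*" if relocatable else "")
```

Two passes are needed because an instruction may refer to an identifier whose address is only known after the whole program has been scanned.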
Two-Pass Assembly • '*' denotes the relocation bit • If the data is loaded starting at address 00001111: • a should be at location 00001111 + 00000000 = 00001111 • b should be at location 00001111 + 00000100 = 00010011
Inst. Code    Register   Mem/Const.      Content        (R)
0001 (MOV)    01 (R1)    00 (Mem)        00001111 (a)   *
0011 (ADD)    01 (R1)    10 (Constant)   00000010
0010 (MOV)    01 (R1)    00 (Mem)        00010011 (b)   *
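The relocation step is plain binary addition applied only to words whose relocation bit is set. The sketch below assumes each word is given as a pair (bit string, relocatable flag) with the 8-bit content as the last space-separated field; the function name relocate is hypothetical.

```python
def relocate(code, load_address):
    """Add load_address to the content field of every relocatable word."""
    relocated = []
    for bits, relocatable in code:
        if relocatable:
            head, content = bits.rsplit(" ", 1)
            new_content = int(content, 2) + load_address
            bits = f"{head} {new_content:08b}"
        relocated.append(bits)
    return relocated


code = [
    ("0001 01 00 00000000", True),    # MOV a, R1
    ("0011 01 10 00000010", False),   # ADD #2, R1 (constant, left unchanged)
    ("0010 01 00 00000100", True),    # MOV R1, b
]
for word in relocate(code, 0b00001111):
    print(word)
```

The constant 2 in the ADD instruction is left alone because its relocation bit is not set; only the addresses of a and b shift by the load address.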
Loaders and Link-Editors • Loader • Takes relocatable machine code, alters the relocatable addresses, and places the result in memory • Link-editors • Resolve external references • Combine code from library files, system-provided routines, and any other program