System Programming, Chapter 2: Assemblers
Assembling Programs for Execution: Assembly Programs → Macro Processors → Expanded Assembly Programs → Core Assemblers → Object Programs → Linkers (with Libraries) → Loaders → Executables
Assemblers • Input and output of an assembler: • Assembly program for programmers (pp. 45) • Object code for hardware (pp. 49) • Types of “instructions” in an assembly program: • Mnemonic machine instructions • Assembler directives (pp. 44) • Major functions of assemblers (pp. 46)
Compiler Directives
main() {
    ……
    x = x + y;
#ifdef DEBUG
    printf("%d", x);
#endif
    ……
Major Functions of Assemblers • Conversion of mnemonic op-codes • Conversion of machine instructions • Creation of a proper object file • Conversion of data constants • Saving results in the object program
A Simple 80X86 ASM Program
; This program displays "Hello, world!"
.model small
.stack 100h
.data
message db "Hello, world!", 0dh, 0ah, '$'
.code
main proc
    mov ax, @data
    mov ds, ax
    mov ah, 9
    mov dx, offset message
    int 21h
    mov ax, 4c00h
    int 21h
main endp
end main
Some Details • Forward reference • Assembler directives (pseudo-instructions) • Format of object program (pp. 48-49)
A 2-Pass Assembler • Pass 1 • Assign addresses to statements • Record addresses for labels • Process assembler directives • Pass 2 • Assemble instructions • Generate data values • Finish processing of assembler directives • Write output files
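The two passes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `OPTAB` is a small hand-picked subset of SIC opcodes, the statement tuples `(label, op, operand)` and the tiny `program` are hypothetical, and only plain (non-indexed) SIC instructions plus WORD, RESW, and RESB are handled.

```python
# Assumed subset of the SIC operation table (mnemonic -> opcode).
OPTAB = {"STL": 0x14, "JSUB": 0x48, "LDA": 0x00, "COMP": 0x28,
         "JEQ": 0x30, "J": 0x3C}

def statement_length(op, operand):
    """Number of bytes a statement occupies (SIC instructions are 3 bytes)."""
    if op in OPTAB or op == "WORD":
        return 3
    if op == "RESW":
        return 3 * int(operand)
    if op == "RESB":
        return int(operand)
    raise ValueError(f"unknown operation {op}")

def pass1(lines, start):
    """Pass 1: assign addresses to statements and record label addresses."""
    symtab, located, locctr = {}, [], start
    for label, op, operand in lines:
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol {label}")
            symtab[label] = locctr
        located.append((locctr, op, operand))
        locctr += statement_length(op, operand)
    return symtab, located

def pass2(symtab, located):
    """Pass 2: assemble instructions and generate data values."""
    obj = []
    for addr, op, operand in located:
        if op in OPTAB:
            obj.append((addr, f"{OPTAB[op]:02X}{symtab[operand]:04X}"))
        elif op == "WORD":
            obj.append((addr, f"{int(operand) & 0xFFFFFF:06X}"))
        # RESW / RESB reserve space only and generate no object code.
    return obj

# Hypothetical program; RETADR, RDREC and ZERO are forward references.
program = [("FIRST", "STL", "RETADR"), ("CLOOP", "JSUB", "RDREC"),
           (None, "J", "CLOOP"), ("RETADR", "RESW", "1"),
           ("RDREC", "LDA", "ZERO"), ("ZERO", "WORD", "0")]
symtab, located = pass1(program, 0x1000)
obj = pass2(symtab, located)
```

Pass 2 can resolve `RETADR` even though it is defined after its first use, because Pass 1 has already filled `symtab`.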
The First Pass…
STL   RETADR   1000
JSUB  RDREC    1003
LDA   LENGTH   1006
COMP  ZERO     1009
JEQ   ENDFIL   100C
JSUB  WRREC    100F
J     CLOOP    1012
LDA   EOF      1015
SYMBOL TABLE (SYMTAB)
COPY   = 1000
FIRST  = 1000
CLOOP  = 1003
ENDFIL = 1015
Translating to SIC instructions • During the first pass, the assembler collects addresses for symbols • Given this information, we can translate assembly instructions to their machine code counterparts. • Example • STL RETADR (line 10) • STL: opcode = 14 • RETADR: 1033 • This instruction does not use indexed mode. (why?) • According to the instruction format of SIC, the machine code for this instruction is 141033. (ref. p. 6)
More Examples for SIC • Examples • JSUB RDREC (line 15) • JSUB: OPCODE = 48 • RDREC: 2039 • NOT INDEXED MODE • MACHINE CODE 482039 • STCH BUFFER,X (line 160) • STCH: OPCODE=54 (0101 0100) • BUFFER: 1039(001 0000 0011 1001) • INDEXED MODE (BECAUSE OF “,X”): x bit =1 • MACHINE CODE 0101 0100 1001 0000 0011 1001 (549039) • LDCH BUFFER,X
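The SIC examples above all follow one layout: 8 bits of opcode, 1 index bit, and 15 bits of address. A minimal sketch, where the function name `encode_sic` is an assumption made for illustration:

```python
def encode_sic(opcode, address, indexed=False):
    """Pack a SIC instruction: 8-bit opcode, x bit, 15-bit address."""
    if not 0 <= address < 0x8000:
        raise ValueError("SIC addresses are 15 bits")
    word = (opcode << 16) | (int(indexed) << 15) | address
    return f"{word:06X}"

stl = encode_sic(0x14, 0x1033)                  # STL RETADR
jsub = encode_sic(0x48, 0x2039)                 # JSUB RDREC
stch = encode_sic(0x54, 0x1039, indexed=True)   # STCH BUFFER,X
# LDCH BUFFER,X works the same way; opcode 50 is taken from the
# standard SIC opcode table, not from the slide.
ldch = encode_sic(0x50, 0x1039, indexed=True)
```

Setting the x bit adds 8000h to the address field, which is why BUFFER = 1039 shows up as 9039 in the indexed instructions.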
Object Program • Why do we need to specify and standardize the format? • For communication between assemblers and linkers/loaders • For linking object programs created by different assemblers/compilers • The format • Header record • Text record • End record
Standardizing Object Programs: Assembly Programs → Assemblers → Object Programs; C/C++ ... Programs → Compilers → Object Programs; Object Programs → Linkers (with Libraries) → Loaders
Standardizing Object Programs (2)
% ps -u sp10
USER  PID  %CPU  %MEM
sp10  xxx  98%   50%
…
sp10  yyy  2%    1%
Object Program (2) • Reasons for having multiple Text records in the object program • Line width limitation (The author arbitrarily chooses 1E bytes.) • End of object code • Reserving memory space • Putting the object code for a source statement in one text record • Program blocks (pp. 83, Section 2.3.4)
Figure 2.3
H,COPY ,001000,00107A
T,001000,1E,141033,482039,001036,……,00102D
T,00101E,15,0C1036,482061,……,000000
T,002039,1E,041030,001030,……
T,002057,1C,101036,4C0000,F1,……,2C1036
T,002073,07,382064,4C0000,05
E,001000
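A rough sketch of how Header, Text, and End records like those in Figure 2.3 could be produced. The function names and the comma-separated layout are assumptions that mirror the slide. The sketch only splits on the 1E-byte limit; it does not start a new record after reserved storage or a change of program block.

```python
def header_record(name, start, length):
    """H record: program name padded to 6 columns, start address, length."""
    return f"H,{name:<6},{start:06X},{length:06X}"

def text_records(start, codes, max_bytes=0x1E):
    """Pack consecutive object codes (hex strings) into T records."""
    records, current, cur_start, cur_len = [], [], start, 0
    addr = start
    for code in codes:
        size = len(code) // 2            # two hex digits per byte
        if current and cur_len + size > max_bytes:
            records.append(f"T,{cur_start:06X},{cur_len:02X}," + ",".join(current))
            current, cur_start, cur_len = [], addr, 0
        current.append(code)
        cur_len += size
        addr += size
    if current:
        records.append(f"T,{cur_start:06X},{cur_len:02X}," + ",".join(current))
    return records

def end_record(first_exec):
    """E record: address of the first executable instruction."""
    return f"E,{first_exec:06X}"

# Eleven 3-byte instructions: ten fill one record (1E bytes), one spills over.
records = text_records(0x1000, ["141033"] * 11)
```

The second record starts at 00101E, the same address at which the second Text record of Figure 2.3 begins.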
Algorithm and Data Structure • Operation table (OPTAB) • Symbol table (SYMTAB) • Hash tables for OPTAB and SYMTAB • Location counter (LOCCTR) • Intermediate files • Skeleton of the algorithm (Figure 2.4)
Assembler for SIC/XE • Continue to use the 2-pass approach • Need to enhance the assembler for SIC • more registers • more complex instruction set (addressing modes): @, +, #, BASE • larger available memory • Improved efficiency due to better architecture. • Multiprogramming and multiprocessing
Assembler for SIC/XE (2) • The author arbitrarily chooses to use pc relative before base relative. (second paragraph on page 59) • Examples: (pp. 57-61) • format 4 (pp. 57) • pc relative (pp. 59) • base relative (pp. 60): BASE, NOBASE • others
Translating Format 1/2 Instructions • Format 1 • Simply find the opcode of the instruction from the OPTAB • Format 2 • Find the opcode of the instruction • Convert register names to register numbers • Example: COMPR A, S (line 150, pp. 55) → opcode A0 (8 bits), register A = 0, register S = 4, giving A004
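Format 2 translation is a table lookup plus two register numbers. A small sketch, assuming the usual SIC/XE register numbering; the names `REGISTERS` and `encode_format2` are hypothetical:

```python
# SIC/XE register numbers.
REGISTERS = {"A": 0, "X": 1, "L": 2, "B": 3, "S": 4, "T": 5,
             "F": 6, "PC": 8, "SW": 9}

def encode_format2(opcode, r1, r2=None):
    """8-bit opcode followed by two 4-bit register numbers."""
    n1 = REGISTERS[r1]
    n2 = REGISTERS[r2] if r2 is not None else 0   # unused field is 0
    return f"{opcode:02X}{n1:X}{n2:X}"

compr = encode_format2(0xA0, "A", "S")   # COMPR A, S
clear = encode_format2(0xB4, "X")        # CLEAR X (opcode B4)
```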
Translating Format 3 Instructions • Steps: • Determine the format (+ for format 4), and set the e bit. • Find and set the opcode • Determine if indexed mode (“,X” for indexed), and set the x bit • Determine the addressing mode (@ for indirect and # for immediate), and set bits n and i accordingly • If immediate mode: If a constant follows the # sign, set b=0 and p=0, set disp to the constant, and then process the next instruction. Otherwise, carry out the translation of the symbol that follows # with the method described in the following step. • Try program-counter relative addressing. If feasible, set b=0, p=1, and disp. Otherwise, try base relative addressing. If feasible, set b=1, p=0, and disp. Report an error if neither succeeds.
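The steps above can be sketched as one function. This is an illustration, not the textbook's algorithm verbatim: the name `encode_format3` and its keyword parameters are assumptions, the caller passes the n, i, and x bits it has already determined, and immediate constants are limited to 0..4095 for simplicity.

```python
def encode_format3(opcode, operand, *, pc, base=None,
                   n=1, i=1, x=0, constant=False):
    """Encode a format 3 instruction (e = 0).

    operand is the target address, or the constant itself when
    constant=True (immediate mode with a number after '#').
    """
    if constant:
        if not 0 <= operand <= 0xFFF:
            raise ValueError("constant does not fit in 12 bits")
        b, p, disp = 0, 0, operand
    elif -2048 <= operand - pc <= 2047:           # try pc-relative first
        b, p, disp = 0, 1, (operand - pc) & 0xFFF
    elif base is not None and 0 <= operand - base <= 4095:
        b, p, disp = 1, 0, operand - base         # then base-relative
    else:
        raise ValueError("neither pc-relative nor base-relative works")
    word = ((opcode & 0xFC) | (n << 1) | i) << 16  # opcode bits + n, i
    word |= (x << 15) | (b << 14) | (p << 13) | disp
    return f"{word:06X}"

stl = encode_format3(0x14, 0x0030, pc=0x0003)                    # STL RETADR
j = encode_format3(0x3C, 0x0006, pc=0x001A)                      # J CLOOP
stch = encode_format3(0x54, 0x0036, pc=0x1051, base=0x0033, x=1) # STCH BUFFER,X
lda = encode_format3(0x00, 3, pc=0, n=0, i=1, constant=True)     # LDA #3
```

The `pc` argument is the address of the next instruction, and `base` is whatever value the BASE directive announced; both have to be tracked by the caller.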
Translating Format 3 Instructions • Example: STL RETADR (line 10, pp. 58) • Format 3 → e bit = 0 • Opcode = 14 (0001 0100) • Not indexed → x bit = 0 • Neither indirect nor immediate → n = i = 1 • Not immediate • Try pc-relative • RETADR = 0030 (TA) • (PC) = 0003 (How do we know???) • TA = (PC) + disp → disp = TA - (PC) = 02D (0000 0010 1101) • -2048 ≤ 02D (45) ≤ 2047 → b = 0 and p = 1 • STL RETADR → 000101 1 1 0 0 1 0 0000 0010 1101 = 17202D
Translating Format 3 Instructions(2) • Example: J CLOOP (line 40, pp. 58) • Format 3 → e bit = 0 • Opcode = 3C (0011 1100) • Not indexed → x bit = 0 • Neither indirect nor immediate → n = i = 1 • Not immediate • Try pc-relative • CLOOP = 0006 (TA) • (PC) = 001A • TA = (PC) + disp → disp = TA - (PC) = -14h, which is FEC in 12-bit two's complement (1111 1110 1100) • -2048 ≤ -14h ≤ 2047 is good → b = 0 and p = 1 • J CLOOP → 001111 1 1 0 0 1 0 1111 1110 1100 = 3F2FEC
Translating Format 3 Instructions(3) • Example: STCH BUFFER,X (line 160, pp. 58) • Format 3 → e bit = 0 • Opcode = 54 (0101 0100) • Indexed → x bit = 1 • Neither indirect nor immediate → n = i = 1 • Not immediate • Try pc-relative • BUFFER = 0036 (TA) • (PC) = 1051 • TA = (PC) + disp → disp = TA - (PC) = -(101Bh) • -(101Bh) ∉ [-2048, 2047] → cannot use pc-relative addressing • Must resort to base relative addressing; the tentative pattern 010101 1 1 1 0 1 0 ???? ???? ???? cannot be completed
Translating Format 3 Instructions(4) • Example: STCH BUFFER,X (line 160, pp. 58) • Format 3 → e bit = 0 • Opcode = 54 (0101 0100) • Indexed → x bit = 1 • Neither indirect nor immediate → n = i = 1 • Not immediate • We try base relative addressing this time. • BUFFER = 0036 (TA) • (B) = 0033 (How do we know???) • TA = (B) + disp → disp = TA - (B) = 003 (0000 0000 0011) • 003 ∈ [0, 4095] → set b = 1 and p = 0 • STCH BUFFER,X → 010101 1 1 1 1 0 0 0000 0000 0011 = 57C003
Translating Format 3 Instructions(5) • Example: LDA #3 (line 55, pp. 58) • Format 3 → e bit = 0 • Opcode = 00 (0000 0000) • Not indexed → x bit = 0 • Immediate → n = 0 and i = 1 • Since it is immediate mode with a constant operand, set b = 0, p = 0, and disp = operand = 3 • LDA #3 → 000000 0 1 0 0 0 0 0000 0000 0011 = 010003
Translating Format 3 Instructions(6) • Example: LDB #LENGTH (line 12, pp. 58) • Format 3 → e bit = 0 • Opcode = 68 (0110 1000) • Not indexed → x bit = 0 • Immediate → n = 0 and i = 1 • It is immediate mode, but LENGTH is a symbol rather than a constant, so its address must be translated • Try pc-relative • TA = LENGTH = 0033 • (PC) = 0006 • TA = (PC) + disp → disp = TA - (PC) = 02D • 02D ∈ [-2048, 2047] → set b = 0 and p = 1 • LDB #LENGTH → 011010 0 1 0 0 1 0 0000 0010 1101 = 69202D
Translating Format 3 Instructions(7) • Example: J @RETADR (line 70, pp. 58) • Format 3 → e bit = 0 • Opcode = 3C (0011 1100) • Not indexed → x bit = 0 • Indirect → n = 1 and i = 0 • Try pc-relative • TA = RETADR = 0030 • (PC) = 002D • TA = (PC) + disp → disp = TA - (PC) = 003 • 003 ∈ [-2048, 2047] → set b = 0 and p = 1 • J @RETADR → 001111 1 0 0 0 1 0 0000 0000 0011 = 3E2003
Translating Format 4 Instructions • Example: +JSUB RDREC (line 15, pp. 58) • Format 4 → e bit = 1 • Opcode = 48 (0100 1000) • Not indexed → x bit = 0 • Neither indirect nor immediate → n = i = 1 • Not immediate • (Twenty bits are sufficient to record any SIC/XE address, so there is no need to employ displacement.) → b = p = 0, address = RDREC = 01036 • +JSUB RDREC → 010010 1 1 0 0 0 1 0000 0001 0000 0011 0110 = 4B101036
Translating Format 4 Instructions(2) • Example: +LDT #4096 (line 133, pp. 58) • Format 4 → e bit = 1 • Opcode = 74 (0111 0100) • Not indexed → x bit = 0 • Immediate → n = 0 and i = 1 • Immediate mode and a constant following # • Set b = 0 and p = 0 • 4096 = 01000h • +LDT #4096 → 011101 0 1 0 0 0 1 0000 0001 0000 0000 0000 = 75101000
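Format 4 needs no displacement search, so the encoding is a direct bit-packing exercise. A minimal sketch; the name `encode_format4` and its parameters are assumptions for illustration:

```python
def encode_format4(opcode, address, *, n=1, i=1, x=0):
    """Encode a format 4 instruction: b = p = 0, e = 1, 20-bit address."""
    if not 0 <= address <= 0xFFFFF:
        raise ValueError("address does not fit in 20 bits")
    word = ((opcode & 0xFC) | (n << 1) | i) << 24   # opcode bits + n, i
    word |= (x << 23) | (1 << 20) | address         # x, then e = 1
    return f"{word:08X}"

jsub = encode_format4(0x48, 0x01036)             # +JSUB RDREC
ldt = encode_format4(0x74, 4096, n=0, i=1)       # +LDT #4096
```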
Compiling Programs for Execution: C/C++ Programs → Macro Processors → Expanded C/C++ Programs → Compilation Programs → Assembly Programs → Assemblers → Object Programs → Linkers (with Libraries) → Loaders → Executables. In reality, compilers may not use assembly programs for this intermediate step. This step may be skipped in some rare cases.
Assembling Programs for Execution: Assembly Programs → Macro Processors → Expanded Assembly Programs → Core Assemblers → Object Programs → Linkers (with Libraries) → Loaders → Executables. Notice that a complete assembler may include a macro processor. This step may be skipped in some rare cases.
Program Relocation • Why bother? • Do we really know where our programs will be placed in memory beforehand? • Efficient memory management • Multiprogramming and multiprocessing • Load time • Challenges (pp. 62-63): relative and absolute • The algorithm: mark the memory references that need to be modified before run time (pp. 63)
Figure 2.8
H,COPY ,000000,001077
T,000000,1D,17202D,69202D,4B101036,032036,290000,332007,4B10105D,……
……
M,000007,05
M,000014,05
M,000027,05
E,000000
[Figure: Multiprogramming]
Relocation (2) • Examples • Figure 2.7 • The third paragraph counted from the bottom of page 62. • Modification records • Page 64 • Half bytes
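What a loader does with Modification records can be sketched as follows. This assumes the program was assembled at address 0, that each record names a starting byte and a length in half bytes, and that an odd-length field begins in the low half of its first byte. The function name `relocate` and the 5000h load address are chosen only for illustration.

```python
def relocate(image, mod_records, load_addr):
    """Add load_addr to every address field named by a Modification record.

    image: bytes of the program assembled at address 0.
    mod_records: (start_byte, length_in_half_bytes) pairs.
    """
    out = bytearray(image)
    for addr, half_bytes in mod_records:
        nbytes = (half_bytes + 1) // 2
        field = int.from_bytes(out[addr:addr + nbytes], "big")
        mask = (1 << (4 * half_bytes)) - 1        # low half_bytes nibbles
        patched = (field & ~mask) | (((field & mask) + load_addr) & mask)
        out[addr:addr + nbytes] = patched.to_bytes(nbytes, "big")
    return out

# First three instructions of Figure 2.8: 17202D 69202D 4B101036
image = bytes.fromhex("17202D69202D4B101036")
loaded = relocate(image, [(0x000007, 5)], 0x5000)   # M,000007,05
```

Only the 20-bit address in +JSUB changes (01036 becomes 06036); the two pc-relative instructions are untouched, which is why they need no Modification records.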
Literals • What? Examples: page 66 (=) • Difference between literals and immediate constants (size) • Why? Allow programmers to avoid defining constants in their programs.
Literals (2) • How? Literal pools • Normally put at the end of the program • LTORG (used when it is desirable to keep the literal operand close to the instruction that uses it) • Duplicate literals (pros and cons) • Literal for location counter: * (pp. 70) • BASE * • LDB =* • Taking care of literals (pp. 70-71)
45   001A  ENDFIL  LDA  =C'EOF'
70   002A          J    @RETADR
93                 LTORG
215  1062  WLOOP   TD   =X'05'
255                END
45   001A  ENDFIL  LDA  =C'EOF'    032010
70   002A          J    @RETADR    ______
93                 LTORG
     002D  *       =C'EOF'         454F46
215  1062  WLOOP   TD   =X'05'     E32011
255                END
     1076  *       =X'05'          05
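A literal pool can be sketched as a small table that collects literals until LTORG or END and then assigns them addresses. The class `LiteralTable` and the helper `literal_bytes` are hypothetical names. Duplicates are detected by comparing the literal text, so =C'EOF' and =X'454F46' would count as different literals here.

```python
def literal_bytes(literal):
    """Convert =C'...' or =X'...' to the bytes stored in the pool."""
    kind, body = literal[1], literal[3:-1]
    if kind == "C":
        return body.encode("ascii")
    if kind == "X":
        return bytes.fromhex(body)
    raise ValueError(f"unsupported literal {literal}")

class LiteralTable:
    def __init__(self):
        self.pending = []     # literals seen since the last pool
        self.address = {}     # literal text -> assigned address

    def use(self, literal):
        """Record a literal operand; duplicates are stored only once."""
        if literal not in self.address and literal not in self.pending:
            self.pending.append(literal)

    def flush(self, locctr):
        """At LTORG or END: place pending literals, return the new LOCCTR."""
        for literal in self.pending:
            self.address[literal] = locctr
            locctr += len(literal_bytes(literal))
        self.pending.clear()
        return locctr

littab = LiteralTable()
littab.use("=C'EOF'")
after_ltorg = littab.flush(0x002D)   # LTORG on line 93
littab.use("=X'05'")
littab.use("=X'05'")                 # duplicate, pooled once
after_end = littab.flush(0x1076)     # END on line 255
```

With =C'EOF' at 002D, the LDA on line 45 has (PC) = 001D and disp = 010, which matches the object code 032010 in the listing.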
#define in A Simple C Program
We define symbols with the #define command in C.
#include <stdio.h>
#define ABC 5
int main(void) {
    int x, y, z;
    x = 3;
    y = ABC + x;
    z = 3*x + ABC;
    printf("%d %d %d\n", x, y, z);
    return 0;
}
Symbol-Defining Statements • Symbol-defining statements • What? (pp. 71) • Similar to #define command for C preprocessor • symbol EQU value • Why? Easy to maintain programs… • program parameters • registers (pp. 72) • ORG (origin) • ORG value (indirectly assign value to symbol)
NO forward references in EQU and ORG statements (pp. 74-75) • Why not?! Our 2-pass assembler must be able to give every symbol its value during Pass 1, so any symbol on the right-hand side of EQU or ORG must already be defined! • Discuss examples on page 75
Expressions • What? • 107 MAXLEN EQU BUFEND-BUFFER • Validity checking • relative and absolute expressions • An absolute expression may contain relative terms only in pairs with opposite signs • A relative expression is one in which all relative terms, except one with a positive sign, are paired. • No relative term may enter into multiplication or division. • Discuss examples on page 77.
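The pairing rule can be checked by counting signed relative terms. This is a simplified sketch: the term representation `(sign, kind)` is an assumption, and the rule against multiplying or dividing relative terms is not checked.

```python
def classify(terms):
    """Classify an expression from its terms.

    terms: list of (sign, kind) with sign +1 or -1 and kind 'R'
    (relative) or 'A' (absolute).
    """
    net = sum(sign for sign, kind in terms if kind == "R")
    if net == 0:
        return "absolute"    # every relative term is paired off
    if net == 1:
        return "relative"    # one unpaired positive relative term
    return "invalid"

maxlen = classify([(+1, "R"), (-1, "R")])    # BUFEND-BUFFER
shifted = classify([(+1, "R"), (+1, "A")])   # BUFFER+3
bad_sum = classify([(+1, "R"), (+1, "R")])   # BUFEND+BUFFER
bad_neg = classify([(+1, "A"), (-1, "R")])   # 100-BUFFER
```

BUFEND-BUFFER is absolute because relocating the program shifts both symbols by the same amount, so their difference does not change.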
Program blocks • Program blocks and control sections • internal identity vs. external identity • What? pp. 81 • Separate the spatial relationship between source code and their object code • Why? • program readability • May reduce the usage of format 4 instructions • How? • USE • The assembler will rearrange the segments…
Program blocks (2) • How? (Cont’d) • Need to maintain a separate table for data about blocks • Calculation of address (page 82) • Why again! Program readability and reducing complexity in assembled code (avoid extended format) • Discuss Figure 2.13 for code generation (no need to rewrite code generation; the loader is supposed to put things in their places)
[Figure: Memory Appropriation. Program blocks shown: default (STL, JSUB, LDA, ……), CDATA, CBLKS. Labels shown: RETADR, BUFFER, LENGTH, INPUT, EOF; marked address 0066.]
Control Sections • What? • Each control section can be loaded and relocated independently. • Linking is required (to resolve external references) • EXTDEF, EXTREF • Why? flexibility • How? • Leave the largest address field for external references (use the extended format) • A related C example
A C Example
p1.c:
#include <stdio.h>
extern int outside(int x);
int inside(int y);
int main(void) {
    int i = 3;
    printf("%d", outside(i));
    return 0;
}
int inside(int y) { return 2*y; }
p2.c:
extern int inside(int z);
int outside(int x) { return (x + inside(x)); }
Commands:
gcc -c p1.c
gcc -c p2.c
gcc -o go p1.o p2.o
go ???