System Software (CS 1203): Assemblers
Outline • Basic Assembler Functions (Sec. 2.1) • Machine-dependent Assembler Features (Sec. 2.2) • Machine-independent Assembler Features (Sec. 2.3) • Assembler Design Options (Sec. 2.4)
Basic Assembler Functions Section 2.1
Introduction to Assemblers • Fundamental functions • Assign machine addresses to the symbolic labels used by programmers • Translate mnemonic operation codes to their machine language equivalents • Machine dependency • Assemblers depend heavily on the source language they translate and the machine language they produce • E.g., different machine instruction formats and codes
Role of Assemblers • Source Program -> Assembler -> Object Code -> Linker -> Executable Code -> Loader
SIC Example Program (Fig. 2.1) • Purpose • Read records from the input device (code F1) • Copy them to the output device (code 05) • Repeat the above steps until EOF is encountered • Write EOF to the output device • Return to the operating system with RSUB
SIC Example Program (Fig. 2.1) (contd.) • Consists of a main routine that reads records from the input device (F1) and copies them to the output device (05) • The main routine calls subroutine RDREC to read a record into a buffer and subroutine WRREC to write the record from the buffer to the output device • Each subroutine must transfer the record one character at a time, because the only I/O instructions available are RD and WD • The buffer is necessary because the I/O rates of the two devices (e.g., a disk and a slow printing terminal) may differ
SIC Example Program (Fig. 2.1) (contd.) • The end of each record is marked by a null character (hexadecimal 00) • If a record is longer than the buffer (4096 bytes), only the first 4096 bytes are copied • The end of the file to be copied is indicated by a zero-length record • When end-of-file is detected, the program writes EOF on the output device and terminates by executing an RSUB instruction
Assembler Directives (or Pseudo-Instructions) • Assembler directives • Are not translated into machine instructions • Provide instructions to the assembler itself • Basic assembler directives • START • Specify the name and starting address of the program • END • Indicate the end of the source program and (optionally) the first executable instruction in the program
Assembler Directives (cont.) • BYTE • Generate a character or hexadecimal constant, occupying as many bytes as needed to represent the constant • WORD • Generate a one-word integer constant • RESB • Reserve the indicated number of bytes for a data area • RESW • Reserve the indicated number of words for a data area • BYTE and WORD direct the assembler to generate constants • RESB and RESW instruct the assembler to reserve memory locations without generating data values
SIC Example Program (Fig. 2.1) (cont.) [Figure: annotated source listing; callouts: name and starting address for the program; main program; end-of-file; end-of-record; call subroutine; forward reference; when end-of-file is reached; char] • Line numbers are not part of the program. They are for reference only.
SIC Example Program (cont.) [Figure: annotated source listing; callouts: comment line; "<" means ready; end-of-record null character; indexed addressing; hexadecimal number]
SIC Example Program (cont.) [Figure: annotated source listing; callouts: subroutine entry point; subroutine return point]
SIC Example Program (cont.) • Data transfer • A record is a stream of bytes with a null character (hexadecimal 00) at the end • If a record is longer than 4096 bytes, only the first 4096 bytes are copied • EOF is indicated by a zero-length record (i.e., a byte stream containing only a null character, hexadecimal 00) • Because the input and output devices may run at different speeds, a buffer is used to hold the record temporarily • Subroutine call and return • On line 10, “STL RETADR” saves the return address that is already stored in register L • Otherwise, the JSUB instructions that call RDREC and WRREC would overwrite L, and COPY could not return to its caller
An Assembler’s Job • Convert mnemonic operation codes to their machine language equivalents (e.g., translate STL to 14, line 10) • Convert symbolic operands (e.g., jump labels, variable names) to their machine addresses (e.g., translate RETADR to 1033, line 10) • Use proper addressing modes and formats to build efficient machine instructions • Translate data constants in the source program into their internal machine representations (e.g., translate EOF to 454F46, line 80) • Output the object program and provide other information (e.g., for the linker and loader)
An Assembler’s Job (contd.) • All of these tasks except the second (converting symbolic operands to addresses) can easily be done by processing the source program sequentially, one line at a time • Forward reference • Consider line 10: 10 1000 FIRST STL RETADR 141033 • It contains a forward reference, i.e., a reference to a label (RETADR) that is defined later in the program • Line 10 stores the value of register L in RETADR, but RETADR is not defined until line 95 • If we translate the program line by line, we cannot process this statement, because we do not yet know the address that will be assigned to RETADR • For this reason, most assemblers make two passes • Pass 1 scans the source program for label definitions and assigns addresses • Pass 2 performs most of the actual translation
Fig. 2.1 with Object Code • There is no object code corresponding to addresses 1033-2038. This storage is simply reserved by the loader for use by the program during execution.
Examples • Mnemonic code (or instruction name) -> opcode • STL RETADR -> 14 1033 = 0001 0100 | 0 | 001 0000 0011 0011 • STCH BUFFER,X -> 54 9039 = 0101 0100 | 1 | 001 0000 0011 1001 • Fields: 8-bit opcode | index flag x | 15-bit address
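To make the bit layout concrete, here is a small sketch that splits a standard SIC object word back into its fields. It assumes the 3-byte SIC layout shown in the examples (8-bit opcode, 1-bit index flag x, 15-bit address); `sic_fields` and `show_bits` are illustrative names, not part of any assembler.

```python
from typing import Tuple


def sic_fields(objcode: str) -> Tuple[int, int, int]:
    """Split a 6-hex-digit SIC instruction into (opcode, x, address)."""
    word = int(objcode, 16)
    opcode = word >> 16            # leftmost 8 bits
    x = (word >> 15) & 1           # index flag
    address = word & 0x7FFF        # rightmost 15 bits
    return opcode, x, address


def show_bits(objcode: str) -> str:
    """Render the fields in binary, grouped as opcode | x | address."""
    opcode, x, address = sic_fields(objcode)
    return f"{opcode:08b} {x} {address:015b}"


print(show_bits("141033"))   # STL  RETADR   -> 00010100 0 001000000110011
print(show_bits("549039"))   # STCH BUFFER,X -> 01010100 1 001000000111001
```

Note how the x bit turns the address field 1039 into the second hex byte pair 9039: the index flag is simply the top bit of the 16 bits that follow the opcode.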
Object Program Format • Header record • Col. 1: H • Col. 2~7: program name • Col. 8~13: starting address of object program (hex) • Col. 14~19: length of object program in bytes (hex) • Text record • Col. 1: T • Col. 2~7: starting address of object code in this record (hex) • Col. 8~9: length of object code in this record in bytes (hex) • Col. 10~69: object code in hex (2 columns per byte of object code) • End record • Col. 1: E • Col. 2~7: address of first executable instruction in object program (hex)
Object Program Format (contd.) (^ only separates fields visually; it is not part of the actual object program)
H^COPY  ^001000^00107A
T^001000^1E^141033^482039^001036^281030^301015^482061^3C1003^00102A^0C1039^00102D
T^00101E^15^0C1036^482061^081033^4C0000^454F46^000003^000000
T^002039^1E^041030^001030^E0205D^30203F^D8205D^281030^302057^549039^2C205E^38203F
T^002057^1C^101036^4C0000^F1^001000^041030^E02079^302064^509039^DC2079^2C1036
T^002073^07^382064^4C0000^05
E^001000
• Length of object program in bytes (Header record) = 207A - 1000 = 107A (hex)
• Length of object code in a Text record = (number of hex digits) / 2 bytes, written in hex: 60/2 = 30 = 1E, 42/2 = 21 = 15, 56/2 = 28 = 1C, 14/2 = 7 = 07
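The way object code is split into Text records can be sketched as follows. This is an illustrative sketch, not the textbook's algorithm verbatim: it assumes Pass 2 has already produced (address, object code) pairs, starts a new Text record whenever addresses are not contiguous (as after RESW/RESB) or when the 30-byte limit of columns 10~69 would be exceeded, and keeps the ^ separators used in the listing. The name `text_records` is hypothetical.

```python
from typing import List, Tuple


def text_records(code: List[Tuple[int, str]], max_bytes: int = 30) -> List[str]:
    """Group (address, object-code-hex) pairs into T records."""
    records: List[str] = []
    current: List[str] = []          # object code of the record being built
    start = size = next_addr = 0
    for addr, obj in code:
        nbytes = len(obj) // 2       # 2 hex digits per byte
        # Close the current record on a gap (RESW/RESB) or when it is full.
        if current and (addr != next_addr or size + nbytes > max_bytes):
            records.append(f"T^{start:06X}^{size:02X}^" + "^".join(current))
            current = []
        if not current:
            start, size = addr, 0
        current.append(obj)
        size += nbytes
        next_addr = addr + nbytes
    if current:
        records.append(f"T^{start:06X}^{size:02X}^" + "^".join(current))
    return records


# Main routine of COPY plus the first instruction of RDREC (after BUFFER).
main_routine = [
    (0x1000, "141033"), (0x1003, "482039"), (0x1006, "001036"),
    (0x1009, "281030"), (0x100C, "301015"), (0x100F, "482061"),
    (0x1012, "3C1003"), (0x1015, "00102A"), (0x1018, "0C1039"),
    (0x101B, "00102D"), (0x101E, "0C1036"), (0x1021, "482061"),
    (0x1024, "081033"), (0x1027, "4C0000"), (0x102A, "454F46"),
    (0x102D, "000003"), (0x1030, "000000"),
    (0x2039, "041030"),
]
for record in text_records(main_routine):
    print(record)
```

The gap between 1033 and 2039 (RETADR, LENGTH, BUFFER) is what forces the second record to end at 1033 even though it holds only 21 bytes.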
Symbolic Operands • Writing memory addresses directly in the program is inconvenient • Instead, we define variable names • Other examples of symbolic operands • Labels (for jump instructions) • Subroutines • Constants
Converting Symbols to Values or Addresses • Isn’t it simply the sequential processing of the source program, one line at a time? • Not so if there are forward references: the value of the symbol is still unknown, because it is defined later in the code • Forward reference: a reference to a label that is defined later in the program • Example: COPY START 1000 • … • LDA LEN • … • … • LEN RESW 1 (LDA uses LEN before RESW defines it)
Two-Pass Assemblers • Pass 1 • Assign addresses to all statements in the program • Save the values (addresses) assigned to all labels (including label and variable names) for use in Pass 2 (this handles forward references) • Perform some processing of assembler directives (e.g., BYTE and RESW, which affect address assignment) • Pass 2 • Assemble instructions (generate opcodes and look up addresses) • Generate data values defined by BYTE and WORD • Perform processing of assembler directives not done in Pass 1 • Write the object program and the assembly listing
Two-Pass Assembler (cont.) • Each input line is split into LABEL, OPCODE, and OPERAND fields • Operation Code Table (OPTAB) • Symbol Table (SYMTAB) • Location Counter (LOCCTR) • The information in OPTAB is predefined when the assembler itself is written
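Splitting an input line into LABEL, OPCODE and OPERAND can be sketched like this. The column conventions are assumptions for illustration: a label starts in column 1, an unlabeled statement begins with whitespace, a "." in column 1 marks a comment line, and operands contain no spaces. `SourceLine` and `parse_line` are hypothetical names.

```python
from typing import NamedTuple, Optional


class SourceLine(NamedTuple):
    label: Optional[str]
    opcode: str
    operand: Optional[str]


def parse_line(line: str) -> Optional[SourceLine]:
    """Split a statement into LABEL, OPCODE, OPERAND (None for blank/comment)."""
    if not line.strip() or line.startswith("."):
        return None                       # blank line or full-line comment
    fields = line.split()
    label = None
    if not line[0].isspace():             # assumed: a label starts in column 1
        label = fields.pop(0)
    if not fields:
        raise ValueError(f"missing operation code: {line!r}")
    opcode = fields[0]
    operand = fields[1] if len(fields) > 1 else None   # later fields: comment
    return SourceLine(label, opcode, operand)


print(parse_line("FIRST   STL   RETADR"))
print(parse_line("        STCH  BUFFER,X"))
print(parse_line("        RSUB"))
```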
Two-Pass Assembler (cont.) • Data flow: Source program -> Pass 1 -> Intermediate file -> Pass 2 -> Object code • Both passes use OPTAB; Pass 1 builds SYMTAB and Pass 2 reads it • OPTAB is used to look up mnemonic opcodes and translate them to their machine language equivalents • SYMTAB stores the values (addresses) assigned to labels
Operation Code Table (OPTAB) • In pass 1, OPTAB is used to look up and validate mnemonic opcodes in the source program • In pass 2, OPTAB is used to translate mnemonic opcodes to machine instructions • In SIC, where every instruction is 3 bytes long, both tasks could be done together in either pass 1 or pass 2 • For SIC/XE, whose instructions have different lengths, OPTAB is needed in both passes • Pass 1 searches OPTAB to find the instruction length for incrementing LOCCTR • Pass 2 uses OPTAB to tell which instruction format to use to assemble the instruction
Operation Code Table (OPTAB) (contd.) • Content • The mapping between mnemonics and machine codes, plus the instruction format, available addressing modes, and length information • Characteristic • Static table • Contents are predefined and are not normally added or deleted • Implementation • An array or hash table keyed on the mnemonic, which makes searching easy • Because the set of keys is fixed, the table can be designed to give optimum performance for exactly the keys being stored
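A static OPTAB maps naturally onto a read-only hash table. This sketch assumes a Python dict wrapped in `types.MappingProxyType`, so the table cannot be modified after the assembler is built; only a subset of SIC mnemonics is listed, with opcodes matching the object code shown earlier. `OPTAB` and `lookup` are illustrative names.

```python
from types import MappingProxyType

# OPTAB: mnemonic -> (opcode, instruction length in bytes).
# Subset of SIC only; every SIC instruction is 3 bytes long.
OPTAB = MappingProxyType({
    "LDA": (0x00, 3), "LDX": (0x04, 3), "LDL": (0x08, 3), "STA": (0x0C, 3),
    "STX": (0x10, 3), "STL": (0x14, 3), "COMP": (0x28, 3), "TIX": (0x2C, 3),
    "JEQ": (0x30, 3), "JLT": (0x38, 3), "J": (0x3C, 3), "JSUB": (0x48, 3),
    "RSUB": (0x4C, 3), "LDCH": (0x50, 3), "STCH": (0x54, 3),
    "RD": (0xD8, 3), "WD": (0xDC, 3), "TD": (0xE0, 3),
})


def lookup(mnemonic: str):
    """Pass 1 uses this to validate a mnemonic; Pass 2 uses the opcode."""
    try:
        return OPTAB[mnemonic]
    except KeyError:
        raise ValueError(f"invalid operation code: {mnemonic}") from None


print(lookup("STL"))   # (20, 3), i.e. opcode 14 hex
```

Wrapping the dict in a read-only view mirrors the "static table" property: `OPTAB["X"] = ...` raises `TypeError`.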
Symbol Table (SYMTAB) • Example SYMTAB for the COPY program: COPY 1000, FIRST 1000, CLOOP 1003, ENDFIL 1015, EOF 102A, THREE 102D, ZERO 1030, RETADR 1033, LENGTH 1036, BUFFER 1039, RDREC 2039, WRREC 2061 • Content • The label name and value (address) for each label in the source program • Data type and length information • Flags to indicate errors (e.g., a symbol defined in two places) • In pass 1, labels are entered into SYMTAB as they are encountered in the source program, along with the addresses assigned from LOCCTR • In pass 2, symbols used as operands are looked up in SYMTAB to obtain the addresses to be inserted in the assembled instructions
Symbol Table (SYMTAB) (contd.) • Characteristic • Dynamic table (i.e., symbols may be inserted, deleted, or searched in the table) • Implementation • Usually organized as a hash table, for efficient insertion and retrieval • Because symbol names are often very similar (e.g., LOOP1, LOOP2), the chosen hash function must perform well with such non-random keys
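Why the hash function matters for non-random keys can be shown with two toy functions. Both are assumptions chosen for illustration (a bucket count of 101 and a multiplier of 31 are arbitrary): summing character codes ignores character order, so similar names such as LOOP12 and LOOP21 always collide, while a polynomial hash weights each position differently.

```python
def additive_hash(name: str, buckets: int = 101) -> int:
    """Sum of character codes: cheap, but order-insensitive."""
    return sum(map(ord, name)) % buckets


def polynomial_hash(name: str, buckets: int = 101) -> int:
    """Horner-style polynomial hash: character positions matter."""
    h = 0
    for ch in name:
        h = (h * 31 + ord(ch)) % buckets
    return h


a, b = "LOOP12", "LOOP21"
print("additive collides:  ", additive_hash(a) == additive_hash(b))      # True
print("polynomial collides:", polynomial_hash(a) == polynomial_hash(b))  # False
```

A real assembler would pair such a hash with a collision strategy (chaining or probing); Python's own `dict` already does this, so a SYMTAB sketch can simply use a `dict`.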
Location Counter (LOCCTR) • This variable is used to help in the assignment of addresses • It is initialized to the beginning address specified in the START statement • After each source statement is processed, the length of the assembled instruction or data area to be generated is added to LOCCTR • When a label in the source program is reached, the current value of LOCCTR gives the address associated with that label
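The LOCCTR rule can be sketched as a length function plus a running counter. The sketch assumes standard SIC sizes (3 bytes per instruction and per WORD, decimal operands for RESW/RESB, C'...' and X'...' forms for BYTE); `statement_length` and `assign_addresses` are hypothetical helpers. The starting value 102A is where EOF sits in the listing (the operand of LDA EOF assembles to 00102A).

```python
from typing import Iterable, Iterator, Optional, Tuple

Statement = Tuple[Optional[str], str, Optional[str]]   # (LABEL, OPCODE, OPERAND)


def statement_length(opcode: str, operand: Optional[str]) -> int:
    """Number of bytes a SIC statement adds to LOCCTR."""
    if opcode in ("START", "END"):
        return 0
    if opcode == "RESW":
        return 3 * int(operand)          # operand is a decimal word count
    if opcode == "RESB":
        return int(operand)              # operand is a decimal byte count
    if opcode == "BYTE":
        body = operand[2:-1]             # text between the quotes
        return len(body) if operand[0] == "C" else len(body) // 2
    return 3                             # WORD and (assumed valid) instructions


def assign_addresses(statements: Iterable[Statement],
                     start: int) -> Iterator[Tuple[int, Statement]]:
    """Yield (LOCCTR, statement): the address each statement is assembled at."""
    locctr = start
    for stmt in statements:
        yield locctr, stmt
        locctr += statement_length(stmt[1], stmt[2])


data_area = [
    ("EOF", "BYTE", "C'EOF'"),
    ("THREE", "WORD", "3"),
    ("ZERO", "WORD", "0"),
    ("RETADR", "RESW", "1"),
    ("LENGTH", "RESW", "1"),
    ("BUFFER", "RESB", "4096"),
    ("RDREC", "LDX", "ZERO"),
]
for addr, (label, opcode, operand) in assign_addresses(data_area, 0x102A):
    print(f"{addr:04X} {label}")
```

The printed addresses agree with the SYMTAB values for these labels; in particular, RESB 4096 advances LOCCTR by 1000 hex, which is why RDREC lands at 2039.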
Pseudo Code for Pass 1 (SIC) • First, find the starting address of the program • If the first statement is START, its operand is the starting address
Pseudo Code for Pass 1 (contd.) • Whenever a label is found, save it in the symbol table together with the current LOCCTR value • Set an error flag if an unrecognized opcode is found or if a symbol is defined more than once
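The Pass 1 steps can be put together in a minimal sketch. Assumptions, all for illustration: statements arrive already split into (LABEL, OPCODE, OPERAND) tuples, the START operand is hexadecimal, `OPCODES` is a hypothetical subset standing in for a full OPTAB, and errors are collected as messages instead of per-line flags. As in the usual SIC algorithm, LOCCTR is not advanced for an invalid opcode.

```python
from typing import Dict, List, Optional, Tuple

Statement = Tuple[Optional[str], str, Optional[str]]   # (LABEL, OPCODE, OPERAND)

# Hypothetical subset of SIC mnemonics; a real OPTAB lists every opcode.
OPCODES = {"LDA", "STA", "LDL", "STL", "JSUB", "RSUB", "COMP", "J", "JEQ"}


def pass1(statements: List[Statement]) -> Tuple[Dict[str, int], int, List[str]]:
    """Return (SYMTAB, program length, error messages)."""
    symtab: Dict[str, int] = {}
    errors: List[str] = []
    start = 0
    if statements and statements[0][1] == "START":
        start = int(statements[0][2], 16)        # START operand is hex
    locctr = start
    for label, opcode, operand in statements:
        if opcode == "END":
            break
        if label is not None:
            if label in symtab:
                errors.append(f"duplicate symbol: {label}")
            else:
                symtab[label] = locctr
        if opcode in OPCODES or opcode == "WORD":
            locctr += 3
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode == "BYTE":
            body = operand[2:-1]                  # text between the quotes
            locctr += len(body) if operand[0] == "C" else len(body) // 2
        elif opcode != "START":
            errors.append(f"invalid operation code: {opcode}")
    return symtab, locctr - start, errors


program = [
    ("COPY", "START", "1000"),
    ("FIRST", "STL", "RETADR"),
    ("CLOOP", "JSUB", "RDREC"),
    (None, "LDA", "LENGTH"),
    ("CLOOP", "J", "CLOOP"),     # duplicate definition of CLOOP
    (None, "MOVE", "A"),         # not in OPCODES
    ("EOF", "BYTE", "C'EOF'"),
    ("RETADR", "RESW", "1"),
    ("LENGTH", "RESW", "1"),
    (None, "END", "FIRST"),
]
symtab, length, errors = pass1(program)
print({name: f"{addr:04X}" for name, addr in symtab.items()})
print(f"length = {length:X}", errors)
```

Note that RETADR and LENGTH are used as operands before they are defined; Pass 1 does not need their values, which is exactly what lets Pass 2 resolve these forward references.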
Pseudo Code for Pass 2 (SIC) • Write the HEADER
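A minimal sketch of the record-writing and assembly steps of Pass 2 follows, assuming SYMTAB and the program length come from Pass 1 and OPTAB maps mnemonics to opcodes. The field widths follow the Header/Text/End record format described earlier; `header_record`, `assemble` and `end_record` are hypothetical names, and the ^ separators are kept only for readability.

```python
from typing import Dict, Optional


def header_record(name: str, start: int, length: int) -> str:
    """H record: name padded/truncated to 6 columns, start and length in hex."""
    return f"H^{name:<6.6}^{start:06X}^{length:06X}"


def assemble(opcode: str, operand: Optional[str],
             optab: Dict[str, int], symtab: Dict[str, int]) -> str:
    """Assemble one SIC instruction: 8-bit opcode, x flag, 15-bit address."""
    address, indexed = 0, 0
    if operand:                                   # RSUB has no operand
        if operand.endswith(",X"):
            operand, indexed = operand[:-2], 1
        address = symtab[operand]                 # KeyError = undefined symbol
    return f"{(optab[opcode] << 16) | (indexed << 15) | address:06X}"


def end_record(first_executable: int) -> str:
    """E record: address of the first executable instruction."""
    return f"E^{first_executable:06X}"


optab = {"STL": 0x14, "STCH": 0x54, "RSUB": 0x4C}
symtab = {"RETADR": 0x1033, "BUFFER": 0x1039}
print(header_record("COPY", 0x1000, 0x207A - 0x1000))
print(assemble("STL", "RETADR", optab, symtab))
print(assemble("STCH", "BUFFER,X", optab, symtab))
print(assemble("RSUB", None, optab, symtab))
print(end_record(0x1000))
```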
Assembler Design • Machine Dependent Assembler Features (Sec. 2.2) • instruction formats and addressing modes • program relocation • Machine Independent Assembler Features (Sec. 2.3) • literals • symbol-defining statements • expressions • program blocks • control sections and program linking
Machine-dependent Assembler Features Section 2.2
SIC/XE Assemblers • What’s new for SIC/XE? • more addressing modes • program relocation
Differences Between the SIC and SIC/XE Programs • Register-to-register instructions are used to improve execution speed • They are faster because they are shorter and do not require another memory reference; fetching a value from a register is much faster than fetching it from memory • In line 150, COMP ZERO is changed to COMPR A,S • Similarly, in line 165, TIX MAXLEN is changed to TIXR T • Immediate addressing mode is used whenever possible • The operand is already included in the fetched instruction, so there is no need to fetch it from memory • Denoted by the prefix #
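The register-to-register instructions above use SIC/XE format 2, which is only 2 bytes: an 8-bit opcode followed by two 4-bit register numbers. This sketch assumes the standard SIC/XE register numbering and lists only a few format-2 opcodes for illustration; `encode_format2` is a hypothetical name.

```python
from typing import Dict

# SIC/XE register numbers (standard numbering, assumed here).
REGISTERS: Dict[str, int] = {"A": 0, "X": 1, "L": 2, "B": 3, "S": 4,
                             "T": 5, "F": 6, "PC": 8, "SW": 9}
# Small subset of format-2 opcodes, for illustration only.
FORMAT2: Dict[str, int] = {"COMPR": 0xA0, "TIXR": 0xB8, "CLEAR": 0xB4}


def encode_format2(mnemonic: str, operands: str) -> str:
    """Format 2: opcode byte, then r1 and r2 nibbles (r2 = 0 if absent)."""
    regs = [REGISTERS[r.strip()] for r in operands.split(",")]
    r1 = regs[0]
    r2 = regs[1] if len(regs) > 1 else 0
    return f"{FORMAT2[mnemonic]:02X}{r1:X}{r2:X}"


print(encode_format2("COMPR", "A,S"))   # A004
print(encode_format2("TIXR", "T"))      # B850
```

Each result is 2 bytes, against 3 bytes for COMP ZERO or TIX MAXLEN, and involves no memory operand at run time.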
Differences Between the SIC and SIC/XE Programs (contd.) • Indirect addressing mode is used whenever possible • One instruction is enough instead of two • Denoted by the prefix @ • Instructions that refer to memory are normally assembled using PC-relative or base-relative mode • If the displacement is too large for both PC-relative and base-relative addressing to fit into a 3-byte (format 3) instruction, the 4-byte extended format (format 4) is used • Denoted by the prefix +
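The choice between PC-relative, base-relative and format 4 can be sketched as below. Assumptions: simple addressing (n = i = 1), PC holds the address of the next instruction, PC-relative displacements are signed 12-bit (-2048..2047) and base-relative ones unsigned 12-bit (0..4095). The addresses in the usage lines (RETADR = 0030, CLOOP = 0006, BUFFER = 0036, base register = 0033) are assumed values in the style of the SIC/XE COPY listing, not taken from this document; `encode_format3` is a hypothetical name.

```python
from typing import Optional


def encode_format3(opcode: int, target: int, pc: int,
                   base: Optional[int] = None, indexed: bool = False) -> str:
    """Assemble a format-3 instruction with simple addressing (n=i=1).

    Tries PC-relative first, then base-relative; if neither displacement
    fits in 12 bits, the instruction must be written in format 4 (+op).
    """
    disp = target - pc                       # pc = address of next instruction
    if -2048 <= disp <= 2047:
        b, p = 0, 1
    elif base is not None and 0 <= target - base <= 4095:
        disp, b, p = target - base, 1, 0
    else:
        raise ValueError("displacement out of range: use format 4 (+op)")
    first_byte = (opcode & 0xFC) | 0b11      # set n=1, i=1
    xbpe = (int(indexed) << 3) | (b << 2) | (p << 1)   # e=0 for format 3
    word = (first_byte << 16) | (xbpe << 12) | (disp & 0xFFF)
    return f"{word:06X}"


print(encode_format3(0x14, 0x0030, pc=0x0003))        # STL RETADR   (PC-relative)
print(encode_format3(0x3C, 0x0006, pc=0x001A))        # J CLOOP      (negative disp)
print(encode_format3(0x54, 0x0036, pc=0x1051,
                     base=0x0033, indexed=True))      # STCH BUFFER,X (base-relative)
```

A negative PC-relative displacement is stored in two's complement, which is what `disp & 0xFFF` produces.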
Differences Between the SIC and SIC/XE Programs (contd.) • The larger main memory of SIC/XE leaves room to load and run several programs at the same time • This sharing of the machine between programs is called multiprogramming • It results in more productive use of the hardware • To take full advantage of this, we must be able to load programs into memory wherever there is room, rather than at a fixed address • This introduces the idea of program relocation
Instruction Formats and Addressing Modes • SIC/XE • PC-relative or base-relative addressing: op m • Indirect addressing: op @m • Immediate addressing: op #c • Extended format (4 bytes): +op m • Index addressing: op m,X • Register-to-register instructions • Larger memory -> multiprogramming (program relocation)
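The source-level prefixes listed above map onto the n, i, x and e flag bits of a SIC/XE instruction. This sketch assumes the usual encoding (immediate: n=0, i=1; indirect: n=1, i=0; simple: n=i=1; e=1 for format 4) and treats ",X" as the index suffix; `Operand` and `parse_operand` are hypothetical names.

```python
from typing import NamedTuple


class Operand(NamedTuple):
    symbol: str
    n: int
    i: int
    x: int
    e: int


def parse_operand(mnemonic: str, operand: str) -> Operand:
    """Translate #, @, + and ,X notation into SIC/XE flag bits."""
    e = 1 if mnemonic.startswith("+") else 0      # +op -> format 4
    x = 0
    if operand.upper().endswith(",X"):
        operand, x = operand[:-2], 1              # index addressing
    if operand.startswith("#"):
        n, i, operand = 0, 1, operand[1:]         # immediate
    elif operand.startswith("@"):
        n, i, operand = 1, 0, operand[1:]         # indirect
    else:
        n, i = 1, 1                               # simple addressing
    return Operand(operand, n, i, x, e)


print(parse_operand("LDA", "#3"))
print(parse_operand("J", "@RETADR"))
print(parse_operand("+JSUB", "RDREC"))
print(parse_operand("STCH", "BUFFER,X"))
```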
Relative Addressing Modes • PC-relative or base-relative addressing is preferred over direct addressing • Format 3 can be used instead of format 4, saving one byte per instruction • Reduces program storage space • Reduces program instruction fetch time • Makes relocation easier
An SIC/XE Program (Fig. 2.5) [Figure: annotated source listing; callouts: for relocation; format 4; immediate addressing; indirect addressing]