Chapter 10 The Assembly Process Basically why does it all work
Hierarchy of Programming • Problem Statement • ? • ? • ? • ? • ? • ?
The Assembly Process • A computer understands machine code • People (and compilers) write assembly language • An assembler is a program that translates each instruction to its binary machine code equivalent. • It is a relatively simple program • There is a one-to-one or near one-to-one correspondence between assembly language instructions and machine language instructions. • Assemblers now do some code manipulation • Like MAL to TAL • Label resolution • A macro assembler can process simple macros, like cpp • Assembly source code → assembler → machine code
MAL TAL • MAL is the set of instructions accepted by the assembler. • TAL is a subset of MAL – the instructions that can be directly turned into machine code. • There are many MAL instructions that have no single TAL equivalent. • To determine whether an instruction is a TAL instruction or not: • Look in appendix C. • The assembler takes (non MIPS) MAL instructions and synthesizes them into 1 or more MIPS instructions.
MAL TAL
    mul $8, $17, $20
Becomes
    mult $17, $20
    mflo $8
• MIPS has 2 registers for results from integer multiplication and division: HI and LO
• Each is a 32-bit register
• mult and multu place the least significant 32 bits of the result into LO, and the most significant 32 bits into HI.
• Multiplying two 32-bit numbers gives a 64-bit result
• (2^32 - 1)(2^32 - 1) = 2^64 - 2 x 2^32 + 1
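The HI/LO split can be checked with a short sketch. Python is used here only as a calculator; the name `mult_hi_lo` is hypothetical, and the function models the unsigned case (multu) only.

```python
MASK32 = 0xFFFFFFFF

def mult_hi_lo(a, b):
    """Model multu: return (HI, LO) for the product of two unsigned 32-bit values."""
    product = (a & MASK32) * (b & MASK32)   # up to 64 bits wide
    hi = (product >> 32) & MASK32           # most significant 32 bits
    lo = product & MASK32                   # least significant 32 bits
    return hi, lo

# Largest case: (2^32 - 1) * (2^32 - 1)
hi, lo = mult_hi_lo(MASK32, MASK32)
print(hex(hi), hex(lo))
```

A mul that only copies LO into the destination silently drops whatever ended up in HI, which is why the full product needs both registers.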
MAL TAL
mflo, mtlo, mfhi, mthi: "mf" = move from, "mt" = move to; "lo" = register LO, "hi" = register HI
• Data is moved into or out of register HI or LO
• One operand is needed to tell where the data is coming from or going to.
• For division (div or divu)
• LO gets the quotient
• HI gets the remainder
• Why aren't these in $0-$31?
MAL TAL
TAL has only base displacement addressing
    lw $8, label
Becomes:
    la $8, label
    lw $8, 0($8)
Which becomes
    lui $8, 0xMSpart of label
    ori $8, $8, 0xLSpart of label
    lw $8, 0($8)
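A minimal sketch of how the "MS part" and "LS part" of a label's address could be produced. The helper names and the address 0x00400004 are assumptions chosen for illustration; a real assembler only knows the address once the symbol table is complete.

```python
def split_address(addr):
    """Split a 32-bit address into its upper and lower 16-bit halves."""
    upper = (addr >> 16) & 0xFFFF   # goes into lui
    lower = addr & 0xFFFF           # goes into ori
    return upper, lower

def expand_lw_label(reg, addr):
    """Expand 'lw $reg, label' into TAL, given the label's resolved address."""
    upper, lower = split_address(addr)
    return [
        f"lui ${reg}, 0x{upper:04x}",
        f"ori ${reg}, ${reg}, 0x{lower:04x}",
        f"lw ${reg}, 0(${reg})",
    ]

for line in expand_lw_label(8, 0x00400004):
    print(line)
```

ori is used for the lower half because it zero-extends its immediate, so the upper half written by lui is left untouched.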
MAL TAL
• Instructions with immediates are synthesized with other instructions
    add $sp, $sp, 4
• Becomes:
    addi $sp, $sp, 4
• add requires 3 operands in registers.
    add $12, $18
• Becomes:
    add $12, $12, $18
• addi has one operand that is immediate.
• On the MIPS immediate instructions include:
• addi, addiu, andi, lui, ori, xori
• Why not more?
MAL TAL
TAL implementation of I/O instructions:
    putc $18
Becomes
    li $2, 11
    move $4, $18
    syscall
Or
    addi $2, $0, 11
    add $4, $18, $0
    syscall
MAL TAL
    getc $11
Becomes:
    li $2, 12
    syscall
    move $11, $2

    puts $13
Becomes:
    li $2, 4
    move $4, $13
    syscall

    done
Becomes:
    li $2, 10
    syscall
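The I/O expansions above amount to a small lookup from MAL mnemonic to instruction sequence. The following is only a sketch of that idea; the function name `expand_io` is hypothetical, and the syscall codes are the ones listed on the slides.

```python
def expand_io(mnemonic, reg=None):
    """Return the instruction sequence for one MAL I/O instruction."""
    if mnemonic == "putc":
        return ["li $2, 11", f"move $4, {reg}", "syscall"]
    if mnemonic == "getc":
        return ["li $2, 12", "syscall", f"move {reg}, $2"]
    if mnemonic == "puts":
        return ["li $2, 4", f"move $4, {reg}", "syscall"]
    if mnemonic == "done":
        return ["li $2, 10", "syscall"]
    raise ValueError(f"not an I/O instruction: {mnemonic}")

print(expand_io("putc", "$18"))
print(expand_io("done"))
```

Note that li and move are themselves MAL, so a second expansion step (for example to addi and add with $0) is still needed to reach TAL.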
Assembly • The assembler will • Assign addresses • Generate machine code • If necessary, the assembler will • Translate (synthesize) from the accepted assembly to the instructions available in the architecture • Provide macros and other features • Generate an image of what memory must look like for the program to be executed.
Assembly • A 2-pass assembler will • Pass 1: create a complete symbol table, which is just a list of the labels (symbols) together with the addresses assigned to each label by the assembler. • Pass 2: complete the machine code for instructions that didn't get finished in pass 1.
Assembler • What should the assembler do when it sees a directive? • .data • .text • .space, .word, .byte • org (HC11) • equ (HC11) • How is the memory image formed?
Assembler
• Example Data Declaration
• Assembler aligns data to word addresses unless told not to.
• Assembly is very sequential.
    .data
    a1: .word 3
    a2: .byte '\n'
    a3: .space 5

    Address      Contents
    0x00001000   0x00000003
    0x00001004   0x??????0a
    0x00001008   0x????????
    0x0000100c   0x????????
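The address column above can be reproduced with a small layout sketch. It assumes, as the slide states, that every data item is aligned to a word address; the helper names and the tuple format are hypothetical.

```python
def align_up(addr, boundary=4):
    """Round addr up to the next multiple of boundary."""
    return (addr + boundary - 1) // boundary * boundary

def layout(items, base):
    """Assign an address to each (label, directive, arg) item, in order."""
    sizes = {".word": 4, ".byte": 1}
    addr = base
    table = {}
    for label, directive, arg in items:
        addr = align_up(addr)            # word-align every item
        table[label] = addr
        # .space reserves 'arg' bytes; .word and .byte have fixed sizes
        addr += arg if directive == ".space" else sizes[directive]
    return table

items = [("a1", ".word", 3), ("a2", ".byte", 0x0A), ("a3", ".space", 5)]
for label, addr in layout(items, 0x00001000).items():
    print(label, hex(addr))
```

Because a3 reserves 5 bytes starting at 0x00001008, it spills into the word at 0x0000100c, which is why the table shows four words.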
Assembler
• Machine code generation from simple instructions:
    Assembly language: addi $8, $20, 15
    Machine code format (bit 31 down to bit 0): opcode | rs | rt | immediate
• Opcode is 6 bits – addi is defined to be 001000
• Rs is 5 bits, encoding of 20, 10100
• Rt is 5 bits, encoding of 8, 01000
• The 32-bit instruction for addi $8, $20, 15 is:
• 001000 10100 01000 0000000000001111
• Or
• 0x2288000f
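The field packing above can be written out as shifts and ORs. This is a sketch rather than a full encoder; the function name `encode_i_type` is hypothetical.

```python
def encode_i_type(opcode, rs, rt, imm):
    """Pack opcode(6) | rs(5) | rt(5) | immediate(16) into one 32-bit word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

# addi $8, $20, 15: note that rs ($20) comes before rt ($8) in the word,
# even though the destination $8 is written first in assembly.
word = encode_i_type(0b001000, 20, 8, 15)
print(f"{word:032b}")
print(f"0x{word:08x}")
```

Masking the immediate with 0xFFFF also handles negative immediates, which are stored in 16-bit two's complement.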
Instruction Formats
• I-Type: instructions with 16-bit immediates
• ADDI, ORI, ANDI:  OPC:6 | rs1:5 | rd:5 | immediate:16
• LW, SW:           OPC:6 | rs1:5 | rs2/rd:5 | displacement:16
• BNE:              OPC:6 | rs1:5 | rs2:5 | distance(instr):16
Instruction Formats
• J-Type: instructions with 26-bit immediate
• J, JAL:  OPC:6 | 26 bits of jump address
• R-Type: all other instructions
• ADD, AND, OR, JR, JALR, SYSCALL, MULT, MFHI, SLT:  OPC:6 | rs1:5 | rs2:5 | rd:5 | ALU function:11
• (LUI carries a 16-bit immediate, so it is I-Type.)
Assembly Example
    .data
    a1: .word 3
    a2: .word 16:4
    a3: .word 5
    .text
    __start: la $6, a2
    loop:    lw $7, 4($6)
             mult $9, $10
             b loop
             done
Assembly Example
Symbol Table
    Symbol     Address
    a1         0040 0000
    a2         0040 0004
    a3         0040 0014
    __start    0080 0000
    loop       0080 0008
[Figure: memory map of data section]
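Pass 1 of the assembler only needs sizes to build this table. The sketch below reproduces the addresses for the example program, under the assumptions that la and done each expand to two TAL instructions and that data starts at 0x00400000 and text at 0x00800000 as in the table; the names `TAL_COUNT` and `pass_one` are hypothetical.

```python
# MAL instructions that expand to more than one TAL instruction (assumed counts)
TAL_COUNT = {"la": 2, "done": 2}

def pass_one(data, text, data_base, text_base):
    """Build the symbol table: label -> address."""
    symbols = {}
    addr = data_base
    for label, word_count in data:        # each entry reserves word_count words
        if label:
            symbols[label] = addr
        addr += 4 * word_count
    addr = text_base
    for label, mnemonic in text:          # each instruction is 4 bytes per TAL instruction
        if label:
            symbols[label] = addr
        addr += 4 * TAL_COUNT.get(mnemonic, 1)
    return symbols

data = [("a1", 1), ("a2", 4), ("a3", 1)]  # .word 16:4 reserves 4 words
text = [("__start", "la"), ("loop", "lw"), (None, "mult"), (None, "b"), (None, "done")]
for name, addr in pass_one(data, text, 0x00400000, 0x00800000).items():
    print(f"{name:8s} {addr:08x}")
```

The label loop lands at 0080 0008 rather than 0080 0004 precisely because la occupies two instruction slots.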
Assembly Example
Translation to TAL code
    .text
    __start: lui $6, 0x0040        # la $6, a2
             ori $6, $6, 0x0004
    loop:    lw $7, 4($6)
             mult $9, $10
             beq $0, $0, loop      # b loop
             ori $2, $0, 10        # done
             syscall
Assembly
• Branch offset computation.
• At execution time: PC ← NPC + {sign extended offset field, 00}
• PC points to the instruction after the beq when the offset is added.
• At assembly time:
• Byte offset = target addr - (address of branch + 4)
•             = 00800008 - (00000004 + 00800010) (hex)
•             = FFFFFFF4 (-12)
• Word offset = -12 / 4 = -3, stored in the 16-bit field as FFFD
• 3 important observations:
• Offset is stored in the instruction as a word offset
• An offset may be negative
• The field dedicated to the offset is 16 bits, so the range is limited.
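The assembly-time computation can be sketched as follows, using the addresses from the example (beq at 00800010, loop at 00800008). The function name `branch_offset_field` is hypothetical.

```python
def branch_offset_field(branch_addr, target_addr):
    """Return the 16-bit offset field for a branch at branch_addr to target_addr."""
    byte_offset = target_addr - (branch_addr + 4)   # relative to the next instruction
    if byte_offset % 4 != 0:
        raise ValueError("target is not word aligned")
    word_offset = byte_offset // 4                  # stored as a word offset
    if not -(1 << 15) <= word_offset < (1 << 15):
        raise ValueError("target out of range for a 16-bit offset")
    return word_offset & 0xFFFF                     # 16-bit two's complement

field = branch_offset_field(0x00800010, 0x00800008)
print(f"0x{field:04x}")
```

The range check reflects the third observation above: a branch can reach at most 2^15 instructions backward or 2^15 - 1 instructions forward of the instruction that follows it.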
Assembly • Jump target computation. • At execution time: • PC ← {most significant 4 bits of PC, target field, 00} • At assembly time: • Take the 32-bit target address • Eliminate the least significant 2 bits (always 00, since instructions are word aligned) • Eliminate the most significant 4 bits • What remains is 26 bits, and goes in the target field
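Both directions of the jump computation fit in a few lines. This is a sketch with hypothetical function names; the addresses 0x004003B0 and 0x00400048 are chosen only for illustration.

```python
def jump_target_field(target_addr):
    """Assembly time: reduce a 32-bit word-aligned address to the 26-bit field."""
    if target_addr % 4 != 0:
        raise ValueError("target is not word aligned")
    return (target_addr >> 2) & 0x03FFFFFF   # drop low 2 bits, then top 4 bits

def jump_destination(pc, field):
    """Execution time: rebuild the address from the PC's top 4 bits and the field."""
    return (pc & 0xF0000000) | (field << 2)

field = jump_target_field(0x004003B0)
print(field)
print(hex(jump_destination(0x00400048, field)))
```

Since the top 4 bits come from the PC, a single j or jal can only reach targets in the same 256 MB region as the jump itself.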
Linking and Loading
Object file
    header            start/size of other parts
    text              machine language
    data              static data – size and initial values
    relocation info   instructions and data with absolute addresses
    symbol table      addresses of external labels
    debugging info
Linking and Loading • Linker • Search libraries • Read object files • Relocate code/data • Resolve external references • Loader • Create address spaces for text & data • Copy text & data into memory • Initialize stack and copy args • Initialize regs (maybe) • Initialize other things (OS) • Jump to startup routine • And then to the address of __start
Linking and Loading
• The data section starts at 0x00400000 for the MIPS RISC processor.
• If the source code has
    .data
    a1: .word 15
    a2: .word -2
• then the assembler specifies the initial configuration of memory as
    address      contents
    0x00400000   0000 0000 0000 0000 0000 0000 0000 1111
    0x00400004   1111 1111 1111 1111 1111 1111 1111 1110
• Like the data, the code needs to be placed starting at a specific location to make it work
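The two bit patterns are the 32-bit two's complement encodings of 15 and -2. A short sketch to regenerate them; the function name `to_word_bits` is hypothetical.

```python
def to_word_bits(value):
    """Format a signed integer as a 32-bit two's complement bit string in groups of 4."""
    bits = format(value & 0xFFFFFFFF, "032b")   # the mask gives the two's complement pattern
    return " ".join(bits[i:i + 4] for i in range(0, 32, 4))

print(to_word_bits(15))
print(to_word_bits(-2))
```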
Linking and Loading
• Consider the case where the assembly language code is split across 2 files. Each is assembled separately.
File 1:
    .data
    a3: .word 0
    .text
    proc5: lw $t6, a1
           sub $t2, $t0, $s4
           jr $ra
File 2:
    .data
    a1: .word 15
    a2: .word -2
    .text
    __start: la $t0, a1
             add $t1, $t0, $s3
             jal proc5
             done
Linking and Loading • What happens to… • a1 • a3 • __start • proc5 • lw • la • jal
Linking and Loading • Problem: there are absolute addresses in the machine code. • Solutions: • Only allow a single source file • Why not? • Allow linking and loading to • Relocate pieces of data and code sections • Finish the machine code where symbols were left undefined • Basically makes an absolute address a relative address
Linking and Loading • The assembler will • Start both data and code sections at address 0, for all files. • Keep track of the size of every data and code section. • Keep track of all absolute addresses within the file. • Linking and loading will: • Assign starting addresses for all data and code sections, based on their sizes. • The blocks of data and code go at non-overlapping locations. • Fix all absolute addresses in the code • Place the linked code and data in memory at the location assigned • Start it up
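A deliberately simplified sketch of the two linker steps: assigning non-overlapping base addresses from section sizes, then fixing up addresses that were assembled relative to 0. All names are hypothetical, and the model treats each relocatable item as a whole 32-bit word holding an address; real relocation entries patch fields inside instructions (for example the lui/ori halves or a jal target).

```python
def assign_bases(section_sizes, start):
    """Place sections back to back; return the base address of each."""
    bases = []
    addr = start
    for size in section_sizes:
        bases.append(addr)
        addr += size
    return bases

def relocate(words, relocation_indexes, base):
    """Add the section base to every word listed in the relocation info."""
    patched = list(words)
    for i in relocation_indexes:
        patched[i] = (patched[i] + base) & 0xFFFFFFFF
    return patched

# Two data sections of 8 and 4 bytes, both assembled starting at address 0
bases = assign_bases([8, 4], 0x00400000)
print([hex(b) for b in bases])
# Words 0 and 1 hold addresses relative to the section start; word 2 is plain data
print([hex(w) for w in relocate([0, 4, 99], [0, 1], bases[1])])
```

The relocation list is what the assembler's "keep track of all absolute addresses" step produces; without it the linker could not tell an address from ordinary data.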
MIPS Example
Code levels of abstraction (from James Larus)
C code
    #include <stdio.h>
    int main (int argc, char *argv[])
    {
        int I;
        int sum = 0;
        for (I=0; I<=100; I++)
            sum += I * I;
        printf ("The sum 0..100=%d\n", sum);
    }
Compile this HLL into a machine's assembly language with the compiler.
MIPS Example
        .text
        .align 2
        .globl main
        .ent main 2
    main:
        subu $sp, 32
        sw $31, 20($sp)
        sd $4, 32($sp)
        sw $0, 24($sp)          # sum = 0
        sw $0, 28($sp)          # I = 0
    loop:
        lw $14, 28($sp)         # load I
        mul $15, $14, $14       # I * I
        lw $24, 24($sp)         # load sum
        addu $25, $24, $15      # sum + I * I
        sw $25, 24($sp)         # store sum
        addu $8, $14, 1         # I + 1
        sw $8, 28($sp)          # store I
        ble $8, 100, loop
        la $4, str
        lw $5, 24($sp)
        jal printf
        move $2, $0
        lw $31, 20($sp)
        addu $sp, 32
        j $31
        .end main
        .data
        .align 0
    str: .asciiz "The sum 0..100=%d\n"
MIPS Assembly Language, labels resolved
    addiu sp, sp,-32
    sw ra, 20(sp)
    sw a0, 32(sp)
    sw a1, 36(sp)
    sw zero, 24(sp)
    sw zero, 28(sp)
    lw t6, 28(sp)
    lw t8, 24(sp)
    multu t6, t6
    addiu t0, t6, 1
    slti at, t0, 101
    sw t0, 28(sp)
    mflo t7
    addu t9, t8, t7
    bne at, zero, -9
    sw t9, 24(sp)
    lui a0,4096
    lw a1, 24(sp)
    jal 1048812
    addiu a0, a0, 1072
    lw ra, 20(sp)
    addiu sp, sp, 32
    jr ra
    move v0, zero
The assembler then translates this into binary machine code for instructions and data.
MIPS Machine language
    00100111101111011111111111100000
    10101111101111110000000000010100
    10101111101001000000000000100000
    10101111101001010000000000100100
    10101111101000000000000000011000
    10101111101000000000000000011100
    10001111101011100000000000011100
    10001111101110000000000000011000
    00000001110011100000000000011001
    00100101110010000000000000000001
    00101001000000010000000001100101
    10101111101010000000000000011100
    00000000000000000111100000010010
    00000011000011111100100000100001
    00010100001000001111111111110111
    10101111101110010000000000011000
    00111100000001000001000000000000
    10001111101001010000000000011000
    00001100000100000000000011101100
    00100100100001000000010000110000
    10001111101111110000000000010100
    00100111101111010000000000100000
    00000011111000000000000000001000
    00000000000000000001000000100001
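Any of these words can be pulled apart again by reversing the field packing. The sketch below decodes only the I-type layout (opcode, rs, rt, 16-bit signed immediate); the function name `decode_i_type` is hypothetical.

```python
def decode_i_type(bits):
    """Split a 32-character bit string into (opcode, rs, rt, signed immediate)."""
    word = int(bits, 2)
    opcode = word >> 26
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    imm = word & 0xFFFF
    if imm & 0x8000:            # sign-extend the 16-bit immediate
        imm -= 0x10000
    return opcode, rs, rt, imm

# First word of the program: addiu sp, sp, -32 (sp is register 29)
print(decode_i_type("00100111101111011111111111100000"))
```

Applied to the first word, the fields come out as opcode 9 (addiu), rs 29, rt 29 and immediate -32, matching the first line of the resolved assembly listing.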