assembly language source code -> assembler -> "machine code"
To execute a program:
1. put the "machine code" into memory
2. jal __start (the OS does this)
assembler's task
• assign addresses
• generate "machine code"
• (architecture dependent) further translation of assembly language source code
previous architectures:
  1 assembly language instruction -> 1 machine code instruction
MIPS architecture:
  1 assembly language instruction -> 1 or more machine code instructions
This further translation is also called synthesis.
MIPS example of synthesis:
  add $8, $9, -16
becomes
  addi $8, $9, -16
Two operands in the source code
  add $8, $9
are expanded back out to become
  add $8, $8, $9
Integer multiplication and division each produce 2 32-bit results.
• integer division produces a quotient and a remainder
• integer multiplication of 2 32-bit operands produces a 64-bit result
MIPS hardware implements 2 extra registers (called HI and LO) to hold these results.
Here are 4 more MIPS instructions:
  mflo R
  mtlo R
  mfhi R
  mthi R
Reading the mnemonics: m = move, f = from, t = to, lo = register LO, hi = register HI.
multiplication
  mul $8, $9, $10
becomes
  mult $9, $10
  mflo $8
Only LO is copied to the destination; the upper 32 bits of the product, in HI, are ignored.
division
  div $8, $9, $10
becomes
  div $9, $10
  mflo $8        # quotient in LO

  rem $12, $13, $14
becomes
  div $13, $14
  mfhi $12       # remainder in HI
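The expansions shown so far can be collected into one small routine. The following Python is only a sketch: the function name synthesize is hypothetical, and it handles just the cases from these slides (2-operand add, add with an immediate, mul, div, rem), returning everything else unchanged.

```python
def synthesize(instruction):
    """Expand one MAL instruction into a list of TAL instructions.

    Only the cases shown on the slides are handled; anything else is
    returned unchanged. This is an illustration, not a full assembler pass.
    """
    op, _, rest = instruction.strip().partition(" ")
    args = [a.strip() for a in rest.split(",")]

    if op in ("mul", "div", "rem") and len(args) == 3:
        rd, rs, rt = args
        hw_op = "mult" if op == "mul" else "div"    # hardware instruction
        mover = "mfhi" if op == "rem" else "mflo"   # remainder is in HI
        return [f"{hw_op} {rs}, {rt}", f"{mover} {rd}"]
    if op == "add" and len(args) == 2:
        rd, rs = args
        return [f"add {rd}, {rd}, {rs}"]            # expand back to 3 operands
    if op == "add" and len(args) == 3 and not args[2].startswith("$"):
        rd, rs, imm = args
        return [f"addi {rd}, {rs}, {imm}"]          # third operand is a constant
    return [instruction.strip()]


for line in synthesize("rem $12, $13, $14"):
    print(line)
```

The 2-operand TAL form div $9, $10 passes through untouched, because only the 3-operand MAL form is expanded.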
puts, putc, getc, and done are not TAL!
I/O is accomplished by requesting service from the operating system (OS).
All architectures do this with a single instruction. On MIPS, this instruction is
  syscall        (no operands)
(note that this is specific to our simulator) To help the OS distinguish what service is required, $v0 ($2) is set:
synthesis of puts $8
  lw $8, X
becomes
  la $8, X
  lw $8, 0($8)
Oops! la must be synthesized.
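synthesis of la $8, my_label
• requires the address assigned for my_label
• every address is assigned by the assembler
The 32-bit address is split into two 16-bit halves: the MS part (most significant 16 bits) and the LS part (least significant 16 bits).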
  la $8, my_label
becomes
  lui $8, 0xMS part
  ori $8, $8, 0xLS part
After lui $8, 0xMS part, the contents of $8 are
  MS part | 000 . . . 0
This is then logically ORed with
  000 . . . 0 | LS part
due to the instruction ori $8, $8, 0xLS part. Resulting contents of $8:
  MS part | LS part
Synthesize lw $8, X
Assume X is assigned address 0xaaee0018.
first try:
  la $8, X
  lw $8, 0($8)
with synthesis of the la instruction:
  lui $8, 0xaaee
  ori $8, $8, 0x0018
  lw $8, 0($8)
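Splitting the address into its MS and LS parts is a shift and a mask. A minimal Python sketch, assuming the address is already known from the assembler (the function name synthesize_la is hypothetical):

```python
def synthesize_la(reg, address):
    """Return the lui/ori pair that loads a 32-bit address into reg."""
    ms_part = (address >> 16) & 0xFFFF   # most significant 16 bits
    ls_part = address & 0xFFFF           # least significant 16 bits
    return [
        f"lui {reg}, 0x{ms_part:04x}",
        f"ori {reg}, {reg}, 0x{ls_part:04x}",
    ]


# lw $8, X with X at 0xaaee0018, as in the worked example
tal = synthesize_la("$8", 0xAAEE0018) + ["lw $8, 0($8)"]
print("\n".join(tal))
```

This prints the same three TAL instructions as the worked example: lui $8, 0xaaee, then ori $8, $8, 0x0018, then lw $8, 0($8).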
Synthesize sb $12, X Assume X is assigned address 0x080001a0.
Generate machine code for addi $8, $20, 15
From the TAL table: addi Rt, Rs, I
  Rt is $8
  Rs is $20
  I is 0000 0000 0000 1111
Format (6-bit op code, Rs, Rt, 16-bit I):
  0010 00ss ssst tttt ii .. ii
  sssss is 10100 (for $20)
  ttttt is 01000 (for $8)
Result:
  0010 0010 1000 1000 0000 0000 0000 1111
  in hex 0x2288000f
Generate machine code for lw $8, 12($sp)
From the TAL table: lw Rt, I(Rb)
  Rt is $8
  Rb is $sp (which is $29)
  I is 12
Format (6-bit op code, Rb, Rt, 16-bit value 12):
  1000 11bb bbbt tttt ii .. ii
  bbbbb is 11101
  ttttt is 01000
Result:
  1000 1111 1010 1000 0000 0000 0000 1100
  in hex 0x8fa8000c
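Both encodings above follow the same pattern: shift each field into place and OR the pieces together. A short Python sketch (the name encode_i_type is hypothetical; the op code values 001000 for addi and 100011 for lw are read off the bit patterns above):

```python
def encode_i_type(opcode, rs, rt, immediate):
    """Pack op code (6 bits), Rs (5), Rt (5) and I (16) into one 32-bit word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


ADDI = 0b001000
LW = 0b100011

# addi $8, $20, 15: Rs is $20, Rt is $8
print(f"0x{encode_i_type(ADDI, 20, 8, 15):08x}")   # 0x2288000f

# lw $8, 12($sp): the base register $29 goes in the Rs position
print(f"0x{encode_i_type(LW, 29, 8, 12):08x}")     # 0x8fa8000c
```

Masking the immediate with 0xFFFF keeps a negative constant such as -16 within its 16-bit field.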
assembly language source code -> assembler (assign addresses, produce machine code) -> memory image
Problem: forward references
  .text
  beq $8, $11, later_in_code
  later_in_code: lw $20, X
  .data
  X: .word 16
Simple solution: 2-pass assembler
• first pass:
  • (MIPS-only) MAL -> TAL synthesis
  • assign all addresses
• second pass:
  • produce all machine code
More complex and more efficient: 1-pass assembler
• Keep a list of instructions that cannot be completed due to yet-to-be-assigned addresses. As addresses are assigned, check the list and complete instructions.
Assigning all addresses (and remembering them) implies the use of a table that maps each label to its assigned address, called a symbol table.
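The 2-pass idea can be sketched in a few lines of Python. This is a simplification under stated assumptions: the input is already TAL, every statement occupies 4 bytes, there is a single code segment, and the second pass only substitutes addresses for labels rather than producing machine code. The names first_pass and second_pass are hypothetical.

```python
def first_pass(lines, start_address):
    """Assign addresses; return (symbol_table, [(address, statement), ...])."""
    symbol_table = {}
    placed = []
    location_counter = start_address
    for line in lines:
        line = line.split("#", 1)[0].strip()      # drop comments
        if not line:
            continue
        if ":" in line:                           # a label is defined here
            label, line = line.split(":", 1)
            symbol_table[label.strip()] = location_counter
            line = line.strip()
        if line:
            placed.append((location_counter, line))
            location_counter += 4                 # each instruction is one word
    return symbol_table, placed


def second_pass(symbol_table, placed):
    """Replace every label operand by the address remembered in pass one."""
    resolved = []
    for address, statement in placed:
        words = statement.replace(",", " ").split()
        words = [f"0x{symbol_table[w]:08x}" if w in symbol_table else w
                 for w in words]
        resolved.append((address, " ".join(words)))
    return resolved


source = [
    "beq $8, $11, later_in_code   # forward reference",
    "later_in_code: lw $20, 0($9)",
]
symbols, placed = first_pass(source, 0x00800000)
for address, statement in second_pass(symbols, placed):
    print(f"0x{address:08x}  {statement}")
```

The forward reference causes no trouble because later_in_code is already in the symbol table by the time the second pass looks it up.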
As the assembler works on the source code, it scans the characters in the file.
• Scanner (a SW module)
  • breaks a set of characters into significant groups known as tokens
  • often, tokens are separated by white space or special punctuation
Example lines to scan:
  .data
  a1: .word 3
  loop: lw $7, 4($6)
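A scanner for lines like these can be approximated with one regular expression. The token classes below (names and directives, registers, numbers, punctuation) are an assumption made for illustration, not the definition used by any particular assembler.

```python
import re

# One alternative per token class, tried left to right at each position.
TOKEN = re.compile(
    r"[A-Za-z_.][\w.]*:?"            # label, mnemonic or directive (optional ':')
    r"|\$\w+"                        # register such as $7 or $sp
    r"|-?(?:0x[0-9a-fA-F]+|\d+)"     # hexadecimal or decimal number
    r"|[(),:]"                       # punctuation that is a token by itself
)


def scan(line):
    """Break one source line into tokens, ignoring white space and comments."""
    code = line.split("#", 1)[0]
    return TOKEN.findall(code)


print(scan("a1: .word 3"))
print(scan("loop: lw $7, 4($6)"))
```

For the second line, scan returns the tokens loop:, lw, $7, a comma, 4, an opening parenthesis, $6, and a closing parenthesis.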
  .data
  a1: .word 3
  a2: .word 16:4
  a3: .word 5
  .text
  __start: la $6, a2
  loop:    lw $7, 4($6)
           mult $9, $10
           b loop
           done
2 segments: code and data
• The assembler places items into these 2 segments. So, it needs addresses.
• Use starting addresses of
    data 0x0040 0000
    code 0x0080 0000
• The variable internal to the assembler that represents the next address to be assigned is the location counter.
TAL equivalent of code:
  .text
  __start: lui $6, 0x0040        # la $6, a2
           ori $6, $6, 0x0004
  loop:    lw $7, 4($6)
           mult $9, $10
           beq $0, $0, loop      # b loop
           ori $2, $0, 10        # done
           syscall
As a result of processing the entire .data section, the memory image will be
  address        contents
  0x0040 0000    0x0000 0003
  0x0040 0004    0x0000 0010
  0x0040 0008    0x0000 0010
  0x0040 000c    0x0000 0010
  0x0040 0010    0x0000 0010
  0x0040 0014    0x0000 0005
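The way the location counter produces this image can be sketched in Python. The sketch assumes every line has the form label: .word value or label: .word value:count, as in the example program; the function name assemble_data is hypothetical.

```python
def assemble_data(lines, start_address=0x00400000):
    """Lay out .word directives; return (symbol_table, memory_image)."""
    symbol_table = {}
    memory_image = {}
    location_counter = start_address
    for line in lines:
        label, directive = line.split(":", 1)
        symbol_table[label.strip()] = location_counter
        operand = directive.split()[1]              # text after ".word"
        value, _, count = operand.partition(":")    # "16:4" means 4 copies of 16
        for _ in range(int(count) if count else 1):
            memory_image[location_counter] = int(value) & 0xFFFFFFFF
            location_counter += 4                   # one word is 4 bytes
    return symbol_table, memory_image


symbols, image = assemble_data(["a1: .word 3", "a2: .word 16:4", "a3: .word 5"])
for address in sorted(image):
    print(f"0x{address:08x}  0x{image[address]:08x}")
```

The symbol table built along the way places a2 at 0x00400004, which is the address the la $6, a2 instruction needs.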
Machine code for la $6, a2
Synthesized:
  (1) lui $6, 0x0040        (address from symbol table)
  (2) ori $6, $6, 0x0004
For (1): lui Rt, I
  Rt is $6
Format (6-bit op code, Rt, 16-bit I):
  0011 1100 000t tttt ii .. ii
  ttttt is 00110
Result:
  0011 1100 0000 0110 0000 0000 0100 0000
  in hex 0x3c060040
Add to the memory image
  address        contents
  0x0080 0000    0x3c06 0040
Machine code for (2): ori $6, $6, 0x0004
ori Rt, Rs, I
  Rt is $6
  Rs is $6
Format (6-bit op code, Rs, Rt, 16-bit I):
  0011 01ss ssst tttt ii .. ii
  ttttt is 00110
  sssss is 00110
Result:
  0011 0100 1100 0110 0000 0000 0000 0100
  in hex 0x34c60004
Add it to the memory image as well, updating the location counter
  address        contents
  0x0080 0000    0x3c06 0040
  0x0080 0004    0x34c6 0004
Scanning on, machine code for lw $7, 4($6)
lw Rt, I(Rb)
  Rt is $7
  Rb is $6
  I is 4
Format (6-bit op code, Rb, Rt, 16-bit I):
  1000 11bb bbbt tttt ii .. ii
Result:
  1000 1100 1100 0111 0000 0000 0000 0100
  in hex 0x8cc70004
Add it to the memory image as well, updating the location counter
  address        contents
  0x0080 0000    0x3c06 0040    (lui)
  0x0080 0004    0x34c6 0004    (ori)
  0x0080 0008    0x8cc7 0004    (lw)
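Next comes mult $9, $10
Format (6-bit op code, Rs, Rt, Rd, then the remaining bits):
  0000 00ss ssst tttt 0000 0000 0001 1000
  Rs is $9: sssss is 01001
  Rt is $10: ttttt is 01010
  the Rd field is 00000
Result:
  0000 0001 0010 1010 0000 0000 0001 1000
  in hex 0x012a0018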
op code 000000 is used for any arithmetic or logical instruction with 3 register operands
  0000 00ss ssst tttt dddd d??? ???? ????
The ? bits say which operation.
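The register format can be packed the same way as the immediate format. In the Python sketch below, the 11 low-order bits are split into a 5-bit shift amount and a 6-bit function code; those two field names come from standard MIPS documentation rather than from these slides, and encode_r_type is a hypothetical name.

```python
def encode_r_type(rs, rt, rd, shamt, funct):
    """Pack an op-code-000000 instruction: Rs, Rt, Rd, shift amount, function."""
    opcode = 0b000000
    return ((opcode << 26) | (rs << 21) | (rt << 16)
            | (rd << 11) | (shamt << 6) | funct)


MULT_FUNCT = 0b011000   # the low 6 bits of the mult encoding

# mult $9, $10: Rd and the shift amount are not used, so they are 0
word = encode_r_type(rs=9, rt=10, rd=0, shamt=0, funct=MULT_FUNCT)
print(f"0x{word:08x}")   # 0x012a0018
```

Changing only funct selects a different operation while the op code field stays 000000.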
Add mult to the memory image as well, updating the location counter
  address        contents
  0x0080 0000    0x3c06 0040    (lui)
  0x0080 0004    0x34c6 0004    (ori)
  0x0080 0008    0x8cc7 0004    (lw)
  0x0080 000c    0x012a 0018    (mult)
b loop is a pseudoinstruction (must be synthesized)
Many translations:
  beq $0, $0, loop
  bgez $0, loop
  blez $0, loop
  j loop
beq $0, $0, loop
Format (6-bit op code, Rs, Rt, 16-bit I):
  0001 00ss ssst tttt iii ... ii
  Rs is $0: sssss is 00000
  Rt is $0: ttttt is 00000
I is derived from an offset.
At run (execution) time, for a taken branch:
  1. take I (from the instruction)
  2. form I || 00 (concatenate)
  3. sign extend I || 00 to 32 bits
  4. PC <- PC + sign-extended (I || 00)
The I computed by the assembler relies on the byte difference between the target and the branch:
  byte difference = target address - branch address
Except that when the PC (the branch address!) is used at execution time, the PC update step (of the fetch and execute cycle) has already been completed. So,
  byte difference = target address - (branch address + 4)
From the symbol table, the target is loop, at 0x0080 0008.
The beq is at 0x0080 0010.
  byte offset = 0x00800008 - (0x00800010 + 4)
      0000 0000 1000 0000 0000 0000 0000 1000
    - 0000 0000 1000 0000 0000 0000 0001 0100
(can't do this in unsigned, so convert to 2's complement)
The additive inverse of 0000 0000 1000 0000 0000 0000 0001 0100 is
      1111 1111 0111 1111 1111 1111 1110 1100
Adding the target address and the additive inverse of (branch address + 4):
      0000 0000 1000 0000 0000 0000 0000 1000
    + 1111 1111 0111 1111 1111 1111 1110 1100
    = 1111 1111 1111 1111 1111 1111 1111 0100
This represents -12.
-12 is the byte offset to be added to the PC to form the new (correct) target PC.
Why? Recall that at run (execution) time, for a taken branch:
  1. take I (from the instruction)
  2. form I || 00 (concatenate)
  3. sign extend I || 00 to 32 bits
  4. PC <- PC + sign-extended (I || 00)
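Both halves of the branch calculation, the assembler's and the hardware's, can be checked with a short Python sketch. The names branch_immediate and taken_branch_target are hypothetical, and the 16-bit value computed for I follows from the run-time steps above (the slides stop at the byte offset).

```python
def branch_immediate(branch_address, target_address):
    """Assembler side: compute the 16-bit I field for a branch."""
    byte_offset = target_address - (branch_address + 4)   # PC already updated
    if byte_offset % 4 != 0:
        raise ValueError("branch target is not word aligned")
    word_offset = byte_offset >> 2      # drop the two 0 bits added back at run time
    if not -(1 << 15) <= word_offset < (1 << 15):
        raise ValueError("branch target is out of range")
    return word_offset & 0xFFFF         # 16-bit 2's complement field


def taken_branch_target(branch_address, immediate):
    """Hardware side: PC <- PC + sign-extended (I || 00)."""
    offset = immediate << 2             # I || 00, now 18 bits
    if offset & 0x20000:                # sign bit of the 18-bit value
        offset -= 0x40000               # sign extend
    return (branch_address + 4 + offset) & 0xFFFFFFFF


i_field = branch_immediate(0x00800010, 0x00800008)
print(f"I = 0x{i_field:04x}")                                     # I = 0xfffd
print(f"0x{taken_branch_target(0x00800010, i_field):08x}")        # 0x00800008
```

The byte offset of -12 becomes a word offset of -3, stored as 0xfffd, and the run-time steps turn it back into the address of loop.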