MIPS Assembly Language Synthesis for Machine Code Generation

assembly language source code → assembler → "machine code"

To execute a program:
1. put the "machine code" into memory
2. jal __start (the OS does this)

assembler's task
• assign addresses
• generate "machine code"
• (architecture dependent) further translation of assembly language source code

previous architectures: 1 assembly language instruction → 1 machine code instruction
MIPS architecture: 1 assembly language instruction → 1 or more machine code instructions

This further translation is also called synthesis. MIPS example of synthesis:
    add $8, $9, -16
becomes
    addi $8, $9, -16

Two operands in the source code
    add $8, $9
are expanded back out to become
    add $8, $8, $9
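The two rewrites above can be sketched in a few lines of Python. This is only an illustration: the function name synthesize_add is hypothetical, it handles nothing but add, and it does not check that the immediate fits in 16 bits.

```python
import re

def synthesize_add(line):
    """Rewrite shorthand forms of `add` (hypothetical helper, add only)."""
    mnemonic, rest = line.split(None, 1)
    if mnemonic != "add":
        return line
    ops = [op.strip() for op in rest.split(",")]
    if len(ops) == 2:
        # add Rd, Rs  ->  add Rd, Rd, Rs
        ops = [ops[0], ops[0], ops[1]]
    if re.fullmatch(r"-?\d+", ops[2]):
        # The third operand is a constant, so the true instruction is addi.
        mnemonic = "addi"
    return f"{mnemonic} {', '.join(ops)}"

print(synthesize_add("add $8, $9, -16"))  # addi $8, $9, -16
print(synthesize_add("add $8, $9"))       # add $8, $8, $9
```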

integer multiplication and division each produce 2 32-bit results
• integer division produces
    • quotient
    • remainder
• integer multiplication of 2 32-bit operands produces a 64-bit result

MIPS hardware implements 2 extra registers (called HI and LO) to hold these results. Here are 4 more MIPS instructions:
    mflo R
    mtlo R
    mfhi R
    mthi R
The mnemonics decode as: m = move, f = from, t = to, lo = register LO, hi = register HI.

multiplication
    mul $8, $9, $10
becomes
    mult $9, $10
    mflo $8
(the 64-bit product is held in HI and LO)

division
    div $8, $9, $10
becomes
    div $9, $10
    mflo $8          # quotient in LO

    rem $12, $13, $14
becomes
    div $13, $14
    mfhi $12         # remainder in HI
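The three expansions above follow one pattern: a two-operand hardware instruction followed by a move out of LO or HI. A minimal Python sketch of that pattern, using a hypothetical helper name and covering only the three cases shown:

```python
# Maps each three-operand source instruction to
# (two-operand hardware instruction, move that fetches the result).
EXPANSIONS = {
    "mul": ("mult", "mflo"),  # low 32 bits of the product come from LO
    "div": ("div", "mflo"),   # quotient is in LO
    "rem": ("div", "mfhi"),   # remainder is in HI
}

def synthesize_muldiv(mnemonic, rd, rs, rt):
    """Return the instruction pair for mul/div/rem (hypothetical helper)."""
    op, move = EXPANSIONS[mnemonic]
    return [f"{op} {rs}, {rt}", f"{move} {rd}"]

print(synthesize_muldiv("mul", "$8", "$9", "$10"))
print(synthesize_muldiv("rem", "$12", "$13", "$14"))
```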

puts, putc, getc, and done are not TAL! I/O is accomplished by requesting service from the operating system (OS). All architectures do this with a single instruction. On MIPS, this instruction is syscall (no operands).

(note that this is specific to our simulator) To help the OS distinguish what service is required, $v0 ($2) is set:

synthesis of puts $8

    lw $8, X
becomes
    la $8, X
    lw $8, 0($8)
Oops! la must be synthesized.

synthesis of la $8, my_label
• requires the address assigned for my_label
• every address is assigned by the assembler
The 32-bit address is split into two 16-bit halves: the MS (most significant) part and the LS (least significant) part.

    la $8, my_label
becomes
    lui $8, 0xMS part
    ori $8, $8, 0xLS part

after lui $8, 0xMS part
    $8:  MS part | 000 . . . 0
this is then logically ORed with
         000 . . . 0 | LS part
due to the instruction ori $8, $8, 0xLS part
resulting contents of $8:
    $8:  MS part | LS part

Synthesize lw $8, X
Assume X is assigned address 0xaaee0018.
first try:
    la $8, X
    lw $8, 0($8)
with synthesis of the la instruction:
    lui $8, 0xaaee
    ori $8, $8, 0x0018
    lw $8, 0($8)
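Splitting an address into its MS and LS halves is a shift and a mask. The Python sketch below reproduces the lui/ori/lw sequence for the address 0xaaee0018; the function names are hypothetical and the output format is only meant to mirror the slides.

```python
def synthesize_la(reg, address):
    """Expand la into lui + ori using the two 16-bit halves of the address."""
    ms_part = (address >> 16) & 0xFFFF  # most significant 16 bits
    ls_part = address & 0xFFFF          # least significant 16 bits
    return [f"lui {reg}, 0x{ms_part:04x}",
            f"ori {reg}, {reg}, 0x{ls_part:04x}"]

def synthesize_lw(reg, address):
    """First-try expansion of lw reg, X: load the address, then load through it."""
    return synthesize_la(reg, address) + [f"lw {reg}, 0({reg})"]

for instruction in synthesize_lw("$8", 0xAAEE0018):
    print(instruction)
```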

Synthesize sb $12, X
Assume X is assigned address 0x080001a0.

Generate machine code for addi $8, $20, 15
From the TAL table: addi Rt, Rs, I
    Rt is $8
    Rs is $20
    I is 0000 0000 0000 1111
Format (6-bit op code, sssss, ttttt, 16-bit immediate):
    0010 00ss ssst tttt ii .. ii
    sssss is 10100 (for $20)
    ttttt is 01000 (for $8)
Result:
    0010 0010 1000 1000 0000 0000 0000 1111
in hex 0x2288000f

Generate machine code for lw $8, 12($sp)
From the TAL table: lw Rt, I(Rb)
    Rt is $8
    Rb is $sp (which is $29)
    I is 12
Format (6-bit op code, bbbbb, ttttt, 16-bit immediate):
    1000 11bb bbbt tttt ii .. ii
    bbbbb is 11101
    ttttt is 01000
Result (the low 16 bits hold the value 12):
    1000 1111 1010 1000 0000 0000 0000 1100
in hex 0x8fa8000c
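Both encodings above pack the same four fields, so they can be checked with shifts and ORs. In this Python sketch the function name is hypothetical; the op code values 001000 (addi) and 100011 (lw) are read off the bit patterns shown above.

```python
def encode_i_type(opcode, rs, rt, immediate):
    """Pack op code (6 bits), rs (5), rt (5) and a 16-bit immediate into one word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)

ADDI = 0b001000
LW = 0b100011

# addi $8, $20, 15: Rs is $20, Rt is $8
print(f"0x{encode_i_type(ADDI, 20, 8, 15):08x}")  # 0x2288000f
# lw $8, 12($sp): the base register $29 sits in the rs field, Rt is $8
print(f"0x{encode_i_type(LW, 29, 8, 12):08x}")    # 0x8fa8000c
```

Masking the immediate with 0xFFFF also lets a negative constant such as -16 be stored as its 16-bit two's complement pattern.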

assembly language source code → assembler (assign addresses, produce machine code) → memory image

Problem: forward references
    .text
                   beq $8, $11, later_in_code
    later_in_code: lw $20, X
    .data
    X:             .word 16

Simple solution: 2-pass assembler
• first pass:
    • (MIPS-only) MAL → TAL synthesis
    • assign all addresses
• second pass:
    • produce all machine code
More complex and more efficient: 1-pass assembler
• Keep a list of instructions that cannot be completed due to yet-to-be-assigned addresses. As addresses are assigned, check the list and complete instructions.

"assign all addresses (and remember them)" implies the use of a table that maps each label to its address, called a symbol table

As the assembler works on the source code, it scans the characters in the file.
• Scanner (a SW module)
    • breaks a set of characters into significant groups known as tokens
    • often, tokens are separated by white space or special punctuation
    .data
    a1:   .word 3
    loop: lw $7, 4($6)
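A scanner of this kind can be sketched with one regular expression. The token classes below (labels and mnemonics, registers, numbers, punctuation) are an assumption chosen to fit the sample lines, not the grammar of any particular assembler, and the function name scan is hypothetical.

```python
import re

# Alternatives are tried left to right:
#   names, directives and labels | registers | numbers | punctuation
TOKEN = re.compile(
    r"[A-Za-z_.][\w.]*:?"
    r"|\$\w+"
    r"|-?(?:0x[0-9a-fA-F]+|\d+)"
    r"|[(),:]"
)

def scan(line):
    """Split one source line into tokens, ignoring white space and # comments."""
    code = line.split("#", 1)[0]
    return TOKEN.findall(code)

print(scan("loop: lw $7, 4($6)"))
print(scan("a1: .word 3"))
```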

    .data
    a1:      .word 3
    a2:      .word 16:4
    a3:      .word 5
    .text
    __start: la $6, a2
    loop:    lw $7, 4($6)
             mult $9, $10
             b loop
             done

2 segments: code and data
• The assembler places items into these 2 segments. So, it needs addresses.
• Use starting addresses of
    data 0x0040 0000
    code 0x0080 0000
• The variable internal to the assembler that represents the next address to be assigned is the location counter.

TAL equivalent of code:
    .text
    __start: lui $6, 0x0040        # la $6, a2
             ori $6, $6, 0x0004
    loop:    lw $7, 4($6)
             mult $9, $10
             beq $0, $0, loop      # b loop
             ori $2, $0, 10        # done
             syscall
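Once the code is in TAL form, every instruction occupies one 4-byte word, so the first pass can assign label addresses with a simple location counter. The Python sketch below is a simplified illustration: the function name is hypothetical, and it assumes one optional label and at most one instruction per line.

```python
CODE_START = 0x00800000  # starting address of the code segment used above

def build_symbol_table(tal_lines, start=CODE_START):
    """First pass: record the address of each label in the TAL code."""
    symbols = {}
    location_counter = start
    for line in tal_lines:
        line = line.split("#", 1)[0].strip()  # drop comments
        if ":" in line:
            label, line = line.split(":", 1)
            symbols[label.strip()] = location_counter
            line = line.strip()
        if line:
            location_counter += 4  # each TAL instruction is one 32-bit word
    return symbols

tal = [
    "__start: lui $6, 0x0040   # la $6, a2",
    "ori $6, $6, 0x0004",
    "loop: lw $7, 4($6)",
    "mult $9, $10",
    "beq $0, $0, loop          # b loop",
    "ori $2, $0, 10            # done",
    "syscall",
]
for name, address in build_symbol_table(tal).items():
    print(f"{name}: 0x{address:08x}")
```

With the table built before any machine code is produced, a reference to a label that appears later in the file is no longer a problem for the second pass.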

As a result of processing the entire .data section, the memory image will be
    address        contents
    0x0040 0000    0x0000 0003
    0x0040 0004    0x0000 0010
    0x0040 0008    0x0000 0010
    0x0040 000c    0x0000 0010
    0x0040 0010    0x0000 0010
    0x0040 0014    0x0000 0005
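The memory image above can be reproduced with a location counter that starts at the data segment address and advances by 4 per word. This Python sketch handles only lines of the form label: .word value or label: .word value:count, and it reads 16:4 as four copies of 16, which matches the image shown; the function name is hypothetical.

```python
DATA_START = 0x00400000  # starting address of the data segment used above

def assemble_data(lines, start=DATA_START):
    """Assign addresses to .word declarations and build the memory image."""
    symbols, memory = {}, {}
    location_counter = start
    for line in lines:
        label, directive, operand = line.split()
        assert directive == ".word", "this sketch only handles .word"
        symbols[label.rstrip(":")] = location_counter
        value, _, count = operand.partition(":")
        for _ in range(int(count or 1)):
            memory[location_counter] = int(value)
            location_counter += 4
    return symbols, memory

symbols, memory = assemble_data(["a1: .word 3", "a2: .word 16:4", "a3: .word 5"])
for address, contents in memory.items():
    print(f"0x{address:08x}  0x{contents:08x}")
```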

Machine code for la $6, a2
Synthesized (address from symbol table):
    (1) lui $6, 0x0040
    (2) ori $6, $6, 0x0004

(1) lui Rt, I
    Rt is $6
    0011 1100 000t tttt ii .. ii     (op code, ttttt, 16-bit immediate)
    ttttt is 00110
    0011 1100 0000 0110 0000 0000 0100 0000
in hex 0x3c060040

Add to the memory image
    address        contents
    0x0080 0000    0x3c06 0040

(2) Machine code for ori $6, $6, 0x0004
ori Rt, Rs, I
    Rt is $6
    Rs is $6
    0011 01ss ssst tttt ii .. ii     (op code, sssss, ttttt, 16-bit immediate)
    ttttt is 00110
    sssss is 00110
    0011 0100 1100 0110 0000 0000 0000 0100
in hex 0x34c60004

Add it to the memory image as well, updating the location counter
    address        contents
    0x0080 0000    0x3c06 0040
    0x0080 0004    0x34c6 0004

Scanning on, machine code for lw $7, 4($6)
lw Rt, I(Rb)
    Rt is $7
    Rb is $6
    I is 4
    1000 11bb bbbt tttt ii .. ii     (op code, bbbbb, ttttt, 16-bit immediate)
    1000 1100 1100 0111 0000 0000 0000 0100
in hex 0x8cc70004

Add it to the memory image as well, updating the location counter
    address        contents
    0x0080 0000    0x3c06 0040 (lui)
    0x0080 0004    0x34c6 0004 (ori)
    0x0080 0008    0x8cc7 0004 (lw)

next comes mult $9, $10
    0000 00ss ssst tttt 0000 0000 0001 1000
    Rs ($9) is 01001
    Rt ($10) is 01010
    the Rd field is all zeros
    0000 0001 0010 1010 0000 0000 0001 1000
in hex 0x012a0018

op code 000000 is used for any arithmetic or logical instruction with 3 register operands
    0000 00ss ssst tttt dddd d??? ???? ????
The remaining (?) bits select which operation.
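With op code 000000 the instruction is identified by its low-order bits. The Python sketch below packs the register fields and checks the mult encoding; the function name is hypothetical, and the value 011000 for mult is read off the bit pattern 0001 1000 shown above.

```python
def encode_r_type(rs, rt, rd, operation, shift_amount=0):
    """Pack a register-format word; the 6-bit op code field is 000000."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (shift_amount << 6) | operation

MULT = 0b011000  # low 6 bits that select mult

# mult $9, $10 has no destination register, so the Rd field is 0
print(f"0x{encode_r_type(9, 10, 0, MULT):08x}")  # 0x012a0018
```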

Add mult to the memory image as well, updating the location counter
    address        contents
    0x0080 0000    0x3c06 0040 (lui)
    0x0080 0004    0x34c6 0004 (ori)
    0x0080 0008    0x8cc7 0004 (lw)
    0x0080 000c    0x012a 0018 (mult)

b loop is a pseudoinstruction (must be synthesized)
Many translations:
    beq $0, $0, loop
    bgez $0, loop
    blez $0, loop
    j loop

beq $0, $0, loop
    0001 00ss ssst tttt iii ... ii     (op code, sssss, ttttt, 16-bit I)
    Rs is 00000
    Rt is 00000
I is derived from an offset.

At run (execution) time, for a taken branch
    1. I (from instruction)
    2. I || 00 (concatenate)
    3. I || 00 (sign extend to 32 bits)
    4. sign-extended (I || 00) + PC → PC

I, computed by the assembler, relies on
    byte difference = target address - branch address
except, when the PC (the branch address!) is used (at execution time), the PC update step (of the fetch and execute cycle) has already been completed. So,
    byte difference = target address - (branch address + 4)

from the symbol table
    target is loop, at 0x0080 0008
    beq is at 0x0080 0010
byte offset = 0x00800008 - (0x00800010 + 4)
      0000 0000 1000 0000 0000 0000 0000 1000
    - 0000 0000 1000 0000 0000 0000 0001 0100
(can't do this in unsigned, so convert to 2's complement)
additive inverse of 0000 0000 1000 0000 0000 0000 0001 0100 is
      1111 1111 0111 1111 1111 1111 1110 1100

      0000 0000 1000 0000 0000 0000 0000 1000
    + 1111 1111 0111 1111 1111 1111 1110 1100
      1111 1111 1111 1111 1111 1111 1111 0100
this represents -12
-12 is the byte offset to be added to the PC to form the new (correct) target PC

Why? Recall that at run (execution) time, for a taken branch
    1. I (from instruction)
    2. I || 00 (concatenate)
    3. I || 00 (sign extend to 32 bits)
    4. sign-extended (I || 00) + PC → PC
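The offset arithmetic for this branch can be checked with a short Python sketch. The function names are hypothetical. The last step is an assumption drawn from the I || 00 rule: if the hardware appends two zero bits to I, the assembler would store the byte offset divided by 4 in the 16-bit field.

```python
def branch_byte_offset(target, branch_address):
    """Byte offset seen at run time: the PC has already advanced by 4."""
    return target - (branch_address + 4)

def branch_immediate(target, branch_address):
    """16-bit field I, assuming I || 00 must equal the byte offset."""
    offset = branch_byte_offset(target, branch_address)
    assert offset % 4 == 0, "instructions are word aligned"
    return (offset >> 2) & 0xFFFF  # keep the 16-bit two's complement pattern

offset = branch_byte_offset(0x00800008, 0x00800010)
print(offset)                                               # -12
print(f"0x{offset & 0xFFFFFFFF:08x}")                       # 0xfffffff4
print(f"0x{branch_immediate(0x00800008, 0x00800010):04x}")  # 0xfffd
```

Running the execution-time steps in reverse confirms the field: 0xfffd followed by 00 and sign extended is 0xfffffff4, and 0x00800014 plus that value wraps around to 0x00800008, the address of loop.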
