Chapter 2
Why study instruction sets?
• Interface of hardware and software
• Efficient mapping:
    • Software in high-level language → software in assembly language (instruction set) (Chapter 2)
        • Impacts SW cost/performance
    • Instruction set → hardware implementation (Chapter 4)
        • Impacts HW cost/performance
[Figure: SW in high-level language → SW in assembly language (instruction set) → HW implementation]
Electronic System Design Laboratory
• GOAL:
    • Train students to master hardware/software co-design, co-simulation, and co-verification.
[Figure: C, C++, SystemC, etc. | Assembly programming | Verilog/VHDL]
What is “Computer Architecture”?
• Coordination of many levels of abstraction
• Under a rapidly changing set of forces
• Design, Measurement, and Evaluation
[Figure: levels of abstraction, top to bottom: Application; Operating System; Compiler; Firmware; Instruction Set Architecture; Instr. Set Proc. and I/O system; Datapath & Control; Digital Design; Circuit Design; Layout]
Instructions: • Language of the Machine • We’ll be working with the MIPS instruction set architecture • similar to other architectures developed since the 1980's • Almost 100 million MIPS processors manufactured in 2002 • used by NEC, Nintendo, Cisco, Silicon Graphics, Sony, …
MIPS arithmetic
• All instructions have 3 operands
• Operand order is fixed (destination first)
Example:
    C code:      a = b + c
    MIPS ‘code’: add a, b, c      (we’ll talk about registers in a bit)
“The natural number of operands for an operation like addition is three…requiring every instruction to have exactly three operands, no more and no less, conforms to the philosophy of keeping the hardware simple”
MIPS arithmetic
• Design Principle: simplicity favors regularity.
• Of course this complicates some things...
    C code:    a = b + c + d;
    MIPS code: add a, b, c
               add a, a, d
• Operands must be registers, only 32 registers provided
• Each register contains 32 bits
• Design Principle: smaller is faster. Why?
Registers vs. Memory
• Arithmetic instruction operands must be registers
    • only 32 registers provided
• Compiler associates variables with registers
• What about programs with lots of variables?
[Figure: Processor (Control, Datapath, Registers), Memory, Input, Output, I/O]
Memory Organization
• Viewed as a large, single-dimension array, with an address.
• A memory address is an index into the array
• "Byte addressing" means that the index points to a byte of memory.
    Address   Contents
    0         8 bits of data
    1         8 bits of data
    2         8 bits of data
    3         8 bits of data
    4         8 bits of data
    5         8 bits of data
    6         8 bits of data
    ...
Memory Organization
• Bytes are nice, but most data items use larger "words"
• For MIPS, a word is 32 bits or 4 bytes.
• 2^32 bytes with byte addresses from 0 to 2^32-1
• 2^30 words with byte addresses 0, 4, 8, ... 2^32-4
• Words are aligned, i.e., what are the least 2 significant bits of a word address?
• Registers hold 32 bits of data
    Address   Contents
    0         32 bits of data
    4         32 bits of data
    8         32 bits of data
    12        32 bits of data
    ...
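The relationship between word numbers and byte addresses can be checked with a few lines of Python. This is only a sketch: the helper names `word_index_to_byte_address` and `is_word_aligned` are made up for illustration and are not part of MIPS or the slides.

```python
WORD_BYTES = 4  # a MIPS word is 32 bits = 4 bytes


def word_index_to_byte_address(index):
    """Byte address of the index-th word: 0, 4, 8, ..."""
    return index * WORD_BYTES


def is_word_aligned(byte_address):
    """A byte address is word-aligned when it is a multiple of 4."""
    return byte_address % WORD_BYTES == 0


addresses = [word_index_to_byte_address(i) for i in range(4)]
print(addresses)                                   # first four word addresses
print(is_word_aligned(12), is_word_aligned(6))     # aligned vs. unaligned
print(hex(word_index_to_byte_address(2**30 - 1)))  # address of the last word
```

Printing the addresses in binary (for example with `bin(12)`) is a quick way to explore the alignment question on the slide.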
MIPS arithmetic (with registers)
• All instructions have 3 operands
• Operand order is fixed (destination first)
Example:
    C code:    A = B + C
    MIPS code: add $s0, $s1, $s2      (associated with variables by compiler)
MIPS arithmetic (with registers)
• Design Principle: simplicity favors regularity. Why?
• Of course this complicates some things...
    C code:    A = B + C + D;
               E = F - A;
    MIPS code: add $t0, $s1, $s2
               add $s0, $t0, $s3
               sub $s4, $s5, $s0
• Which variables go to which registers?
• Operands must be registers, only 32 registers provided
• Design Principle: smaller is faster. Why?
• Note
    • Additional register usage: $t0 (allocated by the compiler)
Operand in Memory
• Base address and offset
    C code:      g = h + A[8];
    MIPS ‘code’: lw  $t0, 8($s3)      ; assume $s3 holds the start address of array A; 8 is the offset
                 add $s1, $s2, $t0
• Note: 8 here is the element index; with byte addressing the actual displacement is 8*4 = 32.
Instructions
• Load and store instructions
• Example:
    C code:    A[8] = h + A[8];
    MIPS code: lw  $t0, 32($s3)      ; $s3=A, 32=8*4
               add $t0, $s2, $t0
               sw  $t0, 32($s3)
• Store word has destination last
• Remember:
    • Operands of arithmetic/logic instructions are registers, not memory!
    • Load/store instructions have one memory operand.
• Note:
    • Temporary register: $t0
    • Array name: a register, $s3
    • Displacement: 32, not 8!
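To see why the displacement is 32 rather than 8, the load/add/store sequence can be mimicked on a small byte-addressed memory. The following Python sketch makes several assumptions that are not in the slides: a 64-byte memory, big-endian word layout, h = 5, A[8] = 10, and array A starting at address 0. The helpers `lw` and `sw` are illustrative stand-ins for the MIPS instructions.

```python
import struct

memory = bytearray(64)                  # tiny byte-addressed memory (assumed size)
regs = {"$s2": 5, "$s3": 0, "$t0": 0}   # $s2 = h, $s3 = base address of A (assumed values)


def lw(rt, offset, base):
    """Load the 32-bit word at byte address regs[base] + offset into rt."""
    regs[rt] = struct.unpack_from(">i", memory, regs[base] + offset)[0]


def sw(rt, offset, base):
    """Store register rt as a 32-bit word at byte address regs[base] + offset."""
    struct.pack_into(">i", memory, regs[base] + offset, regs[rt])


# A[8] lives at byte offset 8 * 4 = 32 from the start of A.
struct.pack_into(">i", memory, 32, 10)

lw("$t0", 32, "$s3")                      # lw  $t0, 32($s3)
regs["$t0"] = regs["$s2"] + regs["$t0"]   # add $t0, $s2, $t0
sw("$t0", 32, "$s3")                      # sw  $t0, 32($s3)

a8 = struct.unpack_from(">i", memory, 32)[0]
print(a8)  # 15
```

Using offset 8 instead would read the word at bytes 8..11, which is A[2], not A[8].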
Our First Example
• Can we figure out the code?
    void swap(int v[], int k)
    {
        int temp;
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }

    swap: sll $2, $5, 2       ; $2 = k * 4
          add $2, $4, $2      ; $2 = addr. of v[k]
          lw  $15, 0($2)
          lw  $16, 4($2)
          sw  $16, 0($2)
          sw  $15, 4($2)
          jr  $31             ; return addr. is saved in $31
So far we’ve learned:
• MIPS
    • loading words but addressing bytes
    • arithmetic on registers only
• Instruction              Meaning (Register Transfer Language, RTL)
    add $s1, $s2, $s3      $s1 = $s2 + $s3
    sub $s1, $s2, $s3      $s1 = $s2 - $s3
    lw  $s1, 100($s2)      $s1 = Memory[$s2+100]
    sw  $s1, 100($s2)      Memory[$s2+100] = $s1
Machine Language: add/sub (arithmetic)
• Instructions, like registers and words of data, are also 32 bits long
• Example: add $t0, $s1, $s2
• registers have numbers, $t0=8, $s1=17, $s2=18
• Instruction Format:
    000000  10001  10010  01000  00000  100000
    op      rs     rt     rd     shamt  funct
• Can you guess what the field names stand for?
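The bit pattern above can be reproduced by packing the six fields into one 32-bit word. This is a minimal Python sketch; `encode_r_type` is a hypothetical helper name, and the field widths (6, 5, 5, 5, 5, 6 bits) are read off the format shown on the slide.

```python
def encode_r_type(op, rs, rt, rd, shamt, funct):
    """Pack the six R-format fields (6+5+5+5+5+6 bits) into a 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


T0, S1, S2 = 8, 17, 18  # register numbers from the slide

# add $t0, $s1, $s2: destination $t0 goes in rd, sources in rs and rt
word = encode_r_type(op=0, rs=S1, rt=S2, rd=T0, shamt=0, funct=0b100000)

print(f"{word:032b}")  # 00000010001100100100000000100000
print(hex(word))       # 0x2324020
```

Note that the destination register comes first in assembly but sits in the fourth field of the machine word.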
Machine Language: load/store
• Now include the load/store instructions into the same instruction format (regularity principle):
• Example: lw $s1, 32($s2)
• registers have numbers, $s1=17, $s2=18
• Using the same Instruction Format as arithmetic operations:
    100011  10010  xxxxx  10001  00000  100000
    op      rs     rt     rd     shamt  funct
  (the offset 32 has to be squeezed into the shamt and funct fields)
• Can you see any problem?
Machine Language: load/store instructions
• Consider the load-word and store-word instructions,
    • What would the regularity principle have us do?
    • New principle: Good design demands a compromise
• Introduce a new type of instruction format
    • I-type for data transfer instructions
    • other format was R-type for register
• Example: lw $t0, 32($s2)
    35   18   8    32
    op   rs   rt   16 bit number
• Where's the compromise?
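The I-type layout can be sketched the same way as the R-type one. In the Python below, `encode_i_type` is a hypothetical helper; it assumes the 16-bit field is a signed two's-complement number, which is how MIPS treats load/store offsets.

```python
def encode_i_type(op, rs, rt, imm):
    """Pack op (6 bits), rs (5), rt (5) and a signed 16-bit immediate into a word."""
    if not -(1 << 15) <= imm < (1 << 15):
        raise ValueError("immediate does not fit in 16 signed bits")
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


LW, S2, T0 = 35, 18, 8  # opcode and register numbers from the slides

# lw $t0, 32($s2): base register in rs, destination register in rt
word = encode_i_type(LW, rs=S2, rt=T0, imm=32)
print(hex(word))  # 0x8e480020
```

The range check makes one side of the compromise visible: the offset is limited to 16 bits, in exchange for keeping every instruction 32 bits long.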
Machine Language
PROBLEM: How to access an array element whose displacement does not fit in the 16-bit offset field (displacement > 2^16)?
• Example: X = A[100000] + ...
• Assume:
    • m[1024] is a memory location that holds the large displacement; its address is given by 0($s2).
    • $t1 and $t4 are temporary 32-bit registers.
    • $t3 contains the base address of array A.
    lw  $t1, 0($s2)       ; load the large displacement from m[1024] into $t1
    add $t4, $t3, $t1     ; $t4 = base address + displacement = address of A[100000]
    lw  $t5, 0($t4)       ; load A[100000] into $t5
[Figure: $t1 ← m[1024]; $t3 → base of A; displacement > 2^16; $t5 ← A[100000]]
Stored Program Concept
• Instructions are bits
• Programs are stored in memory
    • to be read or written just like data
• Fetch & Execute Cycle
    • Instructions are fetched and put into a special register: instruction register
    • Bits in the register "control" the subsequent actions
    • Fetch the “next” instruction and continue
[Figure: Processor connected to Memory; memory for data, programs, compilers, editors, etc.]
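The fetch and execute cycle can be illustrated with a toy interpreter. This Python sketch simplifies heavily: instructions are stored as tuples rather than 32-bit words, only add and sub are handled, and the function name `run` and the initial register values are made up for the example.

```python
def run(program, regs):
    """Toy fetch/execute loop over a program stored as a list of tuples."""
    pc = 0
    while pc < len(program):
        instruction = program[pc]  # fetch into the "instruction register"
        pc += 1                    # default: continue with the next instruction
        op, dest, src1, src2 = instruction
        if op == "add":
            regs[dest] = regs[src1] + regs[src2]
        elif op == "sub":
            regs[dest] = regs[src1] - regs[src2]
        else:
            raise ValueError(f"unknown operation {op!r}")
    return regs


# A = B + C + D; E = F - A, with assumed values B=1, C=2, D=3, F=10
program = [
    ("add", "$t0", "$s1", "$s2"),
    ("add", "$s0", "$t0", "$s3"),
    ("sub", "$s4", "$s5", "$s0"),
]
regs = run(program, {"$s0": 0, "$s1": 1, "$s2": 2, "$s3": 3,
                     "$s4": 0, "$s5": 10, "$t0": 0})
print(regs["$s0"], regs["$s4"])  # 6 4
```

Because the program is ordinary data in a list, it could be read or modified like any other data, which is the point of the stored program concept.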
Control
• Decision making instructions
    • alter the control flow,
    • i.e., change the "next" instruction to be executed
• Sequential execution: implied by default
• MIPS conditional branch instructions:
    bne $t0, $t1, Label
    beq $t0, $t1, Label
• Example: if (i==j) h = i + j;
           bne $s0, $s1, Label
           add $s3, $s0, $s1
    Label: ....
Control
• MIPS unconditional branch instructions:
    j label
• Example:
    C code             MIPS code
    if (i==j)                bne $s4, $s5, Lab1
        h=i+j;               add $s3, $s4, $s5
    else                     j   Lab2
        h=i-j;         Lab1: sub $s3, $s4, $s5
    ...                Lab2: ...
• Can you build a simple for loop?
Control (II)
• MIPS unconditional branch instructions:
    j label
• Example:
    C code             MIPS code
    if (i!=j)                beq $s4, $s5, Lab1
        h=i-j;               sub $s3, $s4, $s5
    else                     j   Lab2
        h=i+j;         Lab1: add $s3, $s4, $s5
    ...                Lab2: ...
• Can you build a simple for loop?
Is one enough?
• MIPS conditional branch instructions:
    bne $t0, $t1, Label    ; branch if not equal
    beq $t0, $t1, Label    ; branch if equal
• Since bne and beq are complements, can we use and implement only one of them in software and hardware?
[Figure: two flowcharts, one testing I == J and one testing I != J; Y goes to the branch target (Label), N falls through (PC++)]
So far:
• Instruction              Meaning (Register Transfer Language, RTL)
    add $s1,$s2,$s3        $s1 = $s2 + $s3
    sub $s1,$s2,$s3        $s1 = $s2 - $s3
    lw  $s1,100($s2)       $s1 = Memory[$s2+100]
    sw  $s1,100($s2)       Memory[$s2+100] = $s1
    bne $s4,$s5,L          Next instr. is at Label if $s4 ≠ $s5
    beq $s4,$s5,L          Next instr. is at Label if $s4 = $s5
    j   Label              Next instr. is at Label
• Formats:
    R:  op | rs | rt | rd | shamt | funct
    I:  op | rs | rt | 16 bit address
    J:  op | 26 bit address
Control Flow
• We have: beq, bne, what about Branch-if-less-than?
• New instruction:
    slt $t0, $s1, $s2       ; if $s1 < $s2 then $t0 = 1 else $t0 = 0
• Can use this instruction to build "blt $s1, $s2, Label"
    • can now build general control structures
• Note that the assembler needs two registers to do this
    • there is a policy of use conventions for registers
• Pseudo instruction: "blt $s1, $s2, Label" is mapped to
    slt $t0, $s1, $s2
    beq $t0, $t1, Label     ; assumes $t1 = 1
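The slt-plus-branch mapping can be checked with a small model of the two instructions. The Python below is only a sketch of the semantics: `slt` and `blt_taken` are illustrative names, plain Python integers stand in for signed 32-bit register values, and the constant 1 in $t1 follows the assumption on the slide.

```python
def slt(rs, rt):
    """Set on less than: 1 if rs < rt (signed comparison), otherwise 0."""
    return 1 if rs < rt else 0


def blt_taken(s1, s2):
    """Model of 'blt $s1, $s2, Label' expanded into slt followed by beq."""
    t0 = slt(s1, s2)   # slt $t0, $s1, $s2
    t1 = 1             # $t1 is assumed to already hold the constant 1
    return t0 == t1    # beq $t0, $t1, Label: branch is taken when equal


print(blt_taken(1, 2))   # True: branch taken
print(blt_taken(2, 2))   # False: falls through
print(blt_taken(-1, 0))  # True: comparison is signed
```

A common alternative is to compare the slt result against $zero with bne, which avoids keeping a register loaded with 1.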
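Compiling Loops in C
• Use shift left logical (sll) to multiply by 4
    C code:   while (save[i] == k) i += 1;
    MIPS ‘code’:
    Loop: sll  $t1, $s3, 2       ; $t1 = i * 4
          add  $t1, $t1, $s6     ; $t1 = address of save[i]
          lw   $t0, 0($t1)       ; $t0 = save[i]
          bne  $t0, $s5, Exit    ; exit if save[i] != k
          addi $s3, $s3, 1       ; i = i + 1
          j    Loop
    Exit: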
Procedure Call: basic concept • Caller • The program that instigates a procedure and provides the necessary parameter values. • Callee • A procedure that executes a series of stored instructions based on parameters provided by the caller and then returns control to the caller. • Return Address • A link to the calling site that allows a procedure to return to the proper address; in MIPS it is stored in the register $ra • Stack • A data structure for spilling registers organized as a last-in-first-out queue. • Stack Pointer • A value denoting the most recently allocated address in a stack that shows where registers should be spilled or where old register values can be found.
Allocate New Data on the Stack
• Frame pointer ($fp)
    • Points to the first word of the procedure's frame, giving a stable base register within a procedure for local memory references; $sp might change during the procedure.
[Figure: stack before, during, and after Proc. A calls Proc. B. From high address to low address, the frame between $fp and $sp holds: saved argument registers, saved return address, saved saved registers, local arrays and structures.]
Allocate New Data on the Heap
• Heap vs. stack
• Used for static variables (static data segment) and dynamic data structures (heap)
    • Arrays, linked lists, etc.
Memory layout (high address to low address):
    $sp → 7fff fffc hex    Stack
                           Dynamic data
    $gp → 1000 8000 hex    Static data
          1000 0000 hex
    pc  → 0040 0000 hex    Text
          0                Reserved
Saving registers • Both leaf and non-leaf procedures need to save: • Saved registers (so space can be reused by other variables) • Non-leaf procedures need to save additionally: • Argument registers • Temporary registers • Return register
Policy of Use Conventions Register 1 ($at) reserved for assembler, 26-27 for operating system
C Pure Procedure
• Stack pointer ($sp) and return address ($ra)
    C code:
    int leaf_example(int g, int h, int i, int j)
    {
        int f;
        f = (g + h) - (i + j);
        return f;
    }
    MIPS ‘code’:
    leaf_example:
        addi $sp, $sp, -12      ; backup the values
        sw   $t1, 8($sp)        ; of registers which
        sw   $t0, 4($sp)        ; will be used in this
        sw   $s0, 0($sp)        ; procedure
        add  $t0, $a0, $a1
        add  $t1, $a2, $a3
        sub  $s0, $t0, $t1
        add  $v0, $s0, $zero
        lw   $s0, 0($sp)        ; restore the values
        lw   $t0, 4($sp)        ; saved in stack
        lw   $t1, 8($sp)        ; previously
        addi $sp, $sp, 12
        jr   $ra                ; return address
Recursive Procedure
• Stack pointer ($sp) and return address ($ra)
    C code:
    int fact(int n)
    {
        if (n < 1)
            return (1);
        else
            return (n * fact(n-1));
    }
    MIPS ‘code’:
    fact:
        addi $sp, $sp, -8
        sw   $ra, 4($sp)        ; backup the return address
        sw   $a0, 0($sp)        ; & argument n
        slti $t0, $a0, 1
        beq  $t0, $zero, L1
        addi $v0, $zero, 1
        addi $sp, $sp, 8
        jr   $ra
    L1: addi $a0, $a0, -1
        jal  fact
        lw   $a0, 0($sp)        ; restore argument n
        lw   $ra, 4($sp)        ; & the return address
        addi $sp, $sp, 8
        mul  $v0, $a0, $v0
        jr   $ra                ; return to the caller
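The way the recursive procedure uses the stack can be imitated with an explicit list. The Python below is a simplified sketch, not a translation of the MIPS code: `fact_with_stack` is a made-up name, only the argument n is saved (the return address is left out), and the short-lived frame of the base-case call is not modeled.

```python
def fact_with_stack(n):
    """Compute n! with an explicit stack; also report the deepest stack size."""
    stack = []
    max_depth = 0

    # Call phase: each activation with n >= 1 saves its argument before
    # calling fact(n - 1), like 'sw $a0, 0($sp)' followed by 'jal fact'.
    while n >= 1:
        stack.append(n)
        max_depth = max(max_depth, len(stack))
        n -= 1

    result = 1  # base case: addi $v0, $zero, 1

    # Return phase: restore the saved n and multiply, like
    # 'lw $a0, 0($sp)' followed by 'mul $v0, $a0, $v0'.
    while stack:
        result = stack.pop() * result

    return result, max_depth


print(fact_with_stack(5))  # (120, 5)
print(fact_with_stack(0))  # (1, 0)
```

The maximum depth grows linearly with n, which is why each activation has to adjust $sp and cannot reuse a fixed memory location.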
String Copy Procedure
• Stack pointer ($sp) and return address ($ra)
    C code:
    void strcpy(char x[], char y[])
    {
        int i;
        i = 0;
        while ((x[i] = y[i]) != '\0')
            i += 1;
    }
    MIPS ‘code’:
    strcpy:
        addi $sp, $sp, -4        ; adjust stack for 1 more item
        sw   $s0, 0($sp)         ; backup $s0
        add  $s0, $zero, $zero   ; initial i to 0
    L1: add  $t1, $s0, $a1       ; $t1 = addr. of y[i]
        lb   $t2, 0($t1)         ; $t2 = y[i]
        add  $t3, $s0, $a0       ; $t3 = addr. of x[i]
        sb   $t2, 0($t3)         ; x[i] = y[i]
        beq  $t2, $zero, L2      ; if y[i]==0, goto L2
        addi $s0, $s0, 1         ; i = i + 1
        j    L1                  ; goto L1
    L2: lw   $s0, 0($sp)         ; end of string; restore $s0
        addi $sp, $sp, 4         ; pop 1 word off stack
        jr   $ra                 ; return to the caller
Constants • Small constants are used quite frequently (50% of operands) e.g., A = A + 5; B = B + 1; C = C - 18; • Solutions? Why not? • put 'typical constants' in memory and load them. • create hard-wired registers (like $zero) for constants like one. • From an instruction field • MIPS Instructions: addi $29, $29, 4 slti $8, $18, 10 andi $29, $29, 6 ori $29, $29, 4 • Design Principle: Make the common case fast. Which format?
How about larger constants?
• We'd like to be able to load a 32 bit constant into a register
• Must use two instructions, new "load upper immediate" instruction
    lui $t0, 1010101010101010
    $t0 = 1010101010101010 0000000000000000      (lower half filled with zeros)
• Then must get the lower order bits right, i.e.,
    ori $t0, $t0, 1010101010101010
          1010101010101010 0000000000000000
    ori   0000000000000000 1010101010101010
    ---------------------------------------
          1010101010101010 1010101010101010
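The two-instruction sequence is easy to verify with shifts and bitwise OR. In this Python sketch the functions `lui` and `ori` are illustrative models of the instructions, operating on plain integers instead of registers.

```python
def lui(imm16):
    """Load upper immediate: imm16 goes to the upper half, lower half is zero."""
    return (imm16 & 0xFFFF) << 16


def ori(value, imm16):
    """OR immediate: the 16-bit immediate is zero-extended before the OR."""
    return value | (imm16 & 0xFFFF)


t0 = lui(0b1010101010101010)      # lui $t0, 1010101010101010
print(f"{t0:032b}")               # upper half set, lower half zero
t0 = ori(t0, 0b1010101010101010)  # ori $t0, $t0, 1010101010101010
print(hex(t0))                    # 0xaaaaaaaa
```

ori fills in the lower half without disturbing the upper half because its immediate is zero-extended rather than sign-extended.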
Assembly Language vs. Machine Language • Assembly provides convenient symbolic representation • much easier than writing down numbers • e.g., destination first • Machine language is the underlying reality • e.g., destination is no longer first • Assembly can provide 'pseudoinstructions' • e.g., “move $t0, $t1” exists only in Assembly • would be implemented using “add $t0,$t1,$zero” • When considering performance you should count real instructions
Other Issues
• Things we are not going to cover in lecture:
    • support for procedures
    • linkers, loaders, memory layout
    • stacks, frames, recursion
    • manipulating strings and pointers
    • interrupts and exceptions
    • system calls and conventions
• Some of these we'll talk about later
• We've focused on architectural issues
    • basics of MIPS assembly language and machine code
    • we’ll build a processor to execute these instructions.
Overview of MIPS
• simple instructions all 32 bits wide
• very structured, no unnecessary baggage
• only three instruction formats
    R:  op | rs | rt | rd | shamt | funct
    I:  op | rs | rt | 16 bit address
    J:  op | 26 bit address
• rely on compiler to achieve performance
    • what are the compiler's goals?
• Help compiler where we can!!!
Addresses in Branches
• Instructions:
    bne $t4,$t5,Label      Next instruction is at Label if $t4 ≠ $t5
    beq $t4,$t5,Label      Next instruction is at Label if $t4 = $t5
• Formats:
    I:  op | rs | rt | 16 bit address
• Could specify a register (like lw and sw) and add it to address
    • use Instruction Address Register (PC = program counter)
    • most branches are local (principle of locality)
• Jump instructions just use high order bits of PC
    • address boundaries of 256 MB
Addresses in Branches and Jumps
• Instructions:
    bne $t4,$t5,Label      Next instruction is at Label if $t4 ≠ $t5
    beq $t4,$t5,Label      Next instruction is at Label if $t4 = $t5
    j   Label              Next instruction is at Label
• Formats:
    I:  op | rs | rt | 16 bit address
    J:  op | 26 bit address
• Addresses are not 32 bits
    • How do we handle this with load and store instructions?
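How the short address fields turn into full 32-bit targets can be sketched as follows. This Python follows the standard MIPS rules (branch offsets count words relative to PC + 4; jumps keep the upper 4 bits of PC + 4), which the slides only hint at; the function names and the example addresses are made up for illustration.

```python
def branch_target(pc, imm16):
    """PC-relative branch: target = PC + 4 + sign_extend(imm16) * 4."""
    if imm16 & 0x8000:          # sign-extend the 16-bit field
        imm16 -= 0x10000
    return (pc + 4 + (imm16 << 2)) & 0xFFFFFFFF


def jump_target(pc, addr26):
    """Jump: upper 4 bits come from PC + 4, the rest from the 26-bit field * 4."""
    return ((pc + 4) & 0xF0000000) | ((addr26 & 0x03FFFFFF) << 2)


pc = 0x00400000
print(hex(branch_target(pc, 3)))         # 0x400010: three instructions past PC + 4
print(hex(branch_target(pc, 0xFFFF)))    # 0x400000: offset -1 branches back to itself
print(hex(jump_target(pc, 0x00100008)))  # 0x400020
```

The 26-bit field shifted left by 2 covers 2^28 bytes, which is where the 256 MB address boundary comes from.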
Program Translation
• Translation Hierarchy (file extensions: Unix, Windows)
    C program (*.c, *.C)
        → Compiler
    Assembly (*.s, *.ASM)
        → Assembler
    Object: machine language (*.o, *.OBJ)
        + Object: library routine (machine code); library: *.a, *.LIB; dynamic linked library: *.so, *.DLL
        → Linker
    Executable: machine program (a.out, *.EXE)
        → Loader
    Memory
Linking Object Files
• Relocate the addresses in the text segment and data segment
• Link procedures A & B
Relocated Executable Image
• Text segment starts at 0040 0000 hex
• $gp = 1000 8000 hex
• Data segment starts at 1000 0000 hex = $gp + 8000 hex, where 8000 hex is a sign-extended 16-bit offset (= $gp - 0000 8000 hex)
Memory layout (high address to low address):
                           Stack
                           Dynamic data
    $gp → 1000 8000 hex    Static data
          1000 0000 hex
    pc  → 0040 0000 hex    Text
          0                Reserved
Loader
• The operating system reads the executable file into memory and starts it:
    1. Read the header to determine the size of the text and data segments
    2. Create space for the text and data segments
    3. Copy instructions and data into memory
    4. Copy parameters to the main program
    5. Initialize the registers and stack pointer
    6. Jump to the start-up routine
    7. Exit system call (when the main program returns)
Dynamic Linked Library
• Disadvantages of statically linked library routines:
    • In update: the library code in the program becomes out of date when a new version is released
    • In size: library routines become part of the executable code
• Lazy procedure linkage
    • Overhead the first time a routine is called
    • Pay nothing when returning from the library