Chapter 2

Why study instruction sets?
• Interface of hardware and software
• Efficient mapping:
  • Software in high-level language → software in assembly language (instruction set) (Chapter 2)
    • Impacts SW cost/performance
  • Instruction set → hardware implementation (Chapter 4)
    • Impacts HW cost/performance
[Figure: SW in high-level language → SW in assembly language (instruction set) → HW implementation]

Electronic System Design Laboratory
• GOAL: train students who are able to master hardware/software co-design, co-simulation, and co-verification.
[Figure: C, C++, SystemC, etc. / Assembly programming / Verilog/VHDL]

What is “Computer Architecture”?
• Coordination of many levels of abstraction
• Under a rapidly changing set of forces
• Design, Measurement, and Evaluation
[Figure: levels of abstraction, top to bottom: Application; Operating System; Compiler; Firmware; Instruction Set Architecture; Instr. Set Proc. and I/O system; Datapath & Control; Digital Design; Circuit Design; Layout]

Instructions:
• Language of the Machine
• We’ll be working with the MIPS instruction set architecture
  • similar to other architectures developed since the 1980's
  • Almost 100 million MIPS processors manufactured in 2002
  • used by NEC, Nintendo, Cisco, Silicon Graphics, Sony, …

MIPS arithmetic
• All instructions have 3 operands
• Operand order is fixed (destination first)
Example:
    C code:      a = b + c
    MIPS ‘code’: add a, b, c
(we’ll talk about registers in a bit)
“The natural number of operands for an operation like addition is three…requiring every instruction to have exactly three operands, no more and no less, conforms to the philosophy of keeping the hardware simple”

MIPS arithmetic
• Design Principle: simplicity favors regularity.
• Of course this complicates some things...
    C code:
        a = b + c + d;
    MIPS code:
        add a, b, c
        add a, a, d
• Operands must be registers, only 32 registers provided
• Each register contains 32 bits
• Design Principle: smaller is faster. Why?

Registers vs. Memory
• Operands of arithmetic instructions must be registers; only 32 registers are provided
• Compiler associates variables with registers
• What about programs with lots of variables?
[Figure: components of a computer: Processor (Control, Datapath), Memory, Input, Output (I/O)]

Memory Organization
• Viewed as a large, single-dimension array, with an address.
• A memory address is an index into the array
• "Byte addressing" means that the index points to a byte of memory.
    Address  Contents
    0        8 bits of data
    1        8 bits of data
    2        8 bits of data
    3        8 bits of data
    4        8 bits of data
    5        8 bits of data
    6        8 bits of data
    ...

Memory Organization
• Bytes are nice, but most data items use larger "words"
• For MIPS, a word is 32 bits or 4 bytes.
• 2^32 bytes with byte addresses from 0 to 2^32 - 1
• 2^30 words with byte addresses 0, 4, 8, ..., 2^32 - 4
• Words are aligned, i.e., what are the 2 least significant bits of a word address?
• Registers hold 32 bits of data
    Address  Contents
    0        32 bits of data
    4        32 bits of data
    8        32 bits of data
    12       32 bits of data
    ...
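The alignment question above can be checked with a few lines of C. This is a minimal sketch, assuming 32-bit byte addresses; the function names `is_word_aligned`, `word_index`, and `word_to_byte_addr` are illustrative and not part of the slides.

```c
#include <stdbool.h>
#include <stdint.h>

/* A word address is aligned when its two least significant bits are 00,
   i.e., the byte address is a multiple of 4. */
bool is_word_aligned(uint32_t byte_addr) {
    return (byte_addr & 0x3u) == 0;
}

/* Index of the 32-bit word that contains the given byte address. */
uint32_t word_index(uint32_t byte_addr) {
    return byte_addr >> 2;
}

/* Byte address of the word with the given index (index * 4). */
uint32_t word_to_byte_addr(uint32_t index) {
    return index << 2;
}
```

For example, word 8 of an array starts 32 bytes past the array's base address, which is the displacement used by the load and store slides.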

MIPS arithmetic (with registers)
• All instructions have 3 operands
• Operand order is fixed (destination first)
Example:
    C code:    A = B + C
    MIPS code: add $s0, $s1, $s2
(registers are associated with variables by the compiler)

MIPS arithmetic (with registers)
• Design Principle: simplicity favors regularity. Why?
• Of course this complicates some things...
    C code:
        A = B + C + D;
        E = F - A;
    MIPS code:
        add $t0, $s1, $s2
        add $s0, $t0, $s3
        sub $s4, $s5, $s0
• Which variables go to which registers?
• Operands must be registers, only 32 registers provided
• Design Principle: smaller is faster. Why?
• Note: additional register usage: $t0 (allocated by the compiler)

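Operand in Memory
• Base address and offset
    C code:
        g = h + A[8];
    MIPS ‘code’:
        lw  $t0, 8($s3)     ; assume $s3 holds the start address of array A; 8 is the offset
        add $s1, $s2, $t0
• Note: this first version is simplified; with byte addressing the actual offset of A[8] is 8 × 4 = 32 bytes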

Instructions
• Load and store instructions
• Example:
    C code:
        A[8] = h + A[8];
    MIPS code:
        lw  $t0, 32($s3)    ; $s3 = base address of A, 32 = 8 * 4
        add $t0, $s2, $t0
        sw  $t0, 32($s3)
• Store word has destination last
• Remember arithmetic operands are registers, not memory!
• Note:
  • Temporary register: $t0
  • Array name: a register, $s3
  • Displacement: 32 (= 8 × 4 bytes), not 8!
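The load/add/store sequence can be modelled in C to make the base-plus-offset arithmetic explicit. This is only a sketch under stated assumptions: the 256-byte `memory` array and the helpers `load_word`, `store_word`, and `add_to_a8` are hypothetical names, and words use the host machine's byte order.

```c
#include <stdint.h>
#include <string.h>

enum { MEM_BYTES = 256 };            /* hypothetical, tiny byte-addressed memory */
static uint8_t memory[MEM_BYTES];

/* lw: read the 32-bit word at byte address base + offset. */
uint32_t load_word(uint32_t base, int32_t offset) {
    uint32_t value;
    memcpy(&value, &memory[base + (uint32_t)offset], sizeof value);
    return value;
}

/* sw: write a 32-bit word to byte address base + offset. */
void store_word(uint32_t base, int32_t offset, uint32_t value) {
    memcpy(&memory[base + (uint32_t)offset], &value, sizeof value);
}

/* A[8] = h + A[8], where array A starts at byte address a_base. */
void add_to_a8(uint32_t a_base, uint32_t h) {
    uint32_t t0 = load_word(a_base, 8 * 4);   /* lw  $t0, 32($s3) */
    t0 = h + t0;                              /* add $t0, $s2, $t0 */
    store_word(a_base, 8 * 4, t0);            /* sw  $t0, 32($s3) */
}
```

The offset passed to the helpers is a byte count, so element 8 of a word array is reached with 8 * 4 = 32.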

Our First Example
• Can we figure out the code?
    C code:
        void swap(int v[], int k)
        {
            int temp;
            temp = v[k];
            v[k] = v[k+1];
            v[k+1] = temp;
        }
    MIPS code:
        swap: sll $2, $5, 2      ; $2 = k * 4
              add $2, $4, $2     ; $2 = address of v[k]
              lw  $15, 0($2)     ; $15 = v[k]
              lw  $16, 4($2)     ; $16 = v[k+1]
              sw  $16, 0($2)     ; v[k] = $16
              sw  $15, 4($2)     ; v[k+1] = $15
              jr  $31            ; return address is saved in $31

So far we’ve learned:
• MIPS
  • loading words but addressing bytes
  • arithmetic on registers only
• Instruction            Meaning (Register Transfer Language, RTL)
    add $s1, $s2, $s3    $s1 = $s2 + $s3
    sub $s1, $s2, $s3    $s1 = $s2 - $s3
    lw  $s1, 100($s2)    $s1 = Memory[$s2+100]
    sw  $s1, 100($s2)    Memory[$s2+100] = $s1

Machine Language
• Instructions, like registers and words of data, are also 32 bits long
• Example: add $t0, $s1, $s2
• Registers have numbers: $t0=8, $s1=17, $s2=18
• Instruction Format:
    op      rs     rt     rd     shamt  funct
    000000  10001  10010  01000  00000  100000
• Can you guess what the field names stand for?
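The field layout above can be packed into a 32-bit word with shifts and masks. The following is a minimal sketch; `encode_r_type` is an illustrative helper name, and the field widths (6, 5, 5, 5, 5, 6 bits) are the ones shown in the format.

```c
#include <stdint.h>

/* Pack the six R-type fields into one 32-bit instruction word.
   Field widths: op 6, rs 5, rt 5, rd 5, shamt 5, funct 6 bits. */
uint32_t encode_r_type(uint32_t op, uint32_t rs, uint32_t rt,
                       uint32_t rd, uint32_t shamt, uint32_t funct) {
    return ((op    & 0x3Fu) << 26)
         | ((rs    & 0x1Fu) << 21)
         | ((rt    & 0x1Fu) << 16)
         | ((rd    & 0x1Fu) << 11)
         | ((shamt & 0x1Fu) << 6)
         |  (funct & 0x3Fu);
}
```

With the values from the example (op 0, rs 17, rt 18, rd 8, shamt 0, funct 32), the call `encode_r_type(0, 17, 18, 8, 0, 32)` yields 0x02324020, the same bit pattern as the binary fields on the slide.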

Machine Language
• Consider the load-word and store-word instructions
  • What would the regularity principle have us do?
  • New principle: good design demands compromise
• Introduce a new type of instruction format
  • I-type for data transfer instructions
  • the other format was R-type, for register operands
• Example: lw $t0, 32($s2)
    op  rs  rt  16-bit number
    35  18  8   32
• Where's the compromise?
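The I-type layout can be packed the same way as the R-type one. This sketch assumes $t0 is register 8 and $s2 is register 18, as given on the previous slide; `encode_i_type` is an illustrative name, not something defined in the slides.

```c
#include <stdint.h>

/* Pack the I-type fields: op (6 bits), rs (5), rt (5), and a 16-bit
   immediate. A negative immediate is stored in two's complement. */
uint32_t encode_i_type(uint32_t op, uint32_t rs, uint32_t rt, int32_t imm) {
    return ((op & 0x3Fu) << 26)
         | ((rs & 0x1Fu) << 21)
         | ((rt & 0x1Fu) << 16)
         | ((uint32_t)imm & 0xFFFFu);
}
```

For lw $t0, 32($s2), `encode_i_type(35, 18, 8, 32)` gives 0x8E480020. The compromise shows up in the code: all instructions stay 32 bits wide, but the last 16 bits are one field here instead of rd, shamt, and funct.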

Stored Program Concept
• Instructions are bits
• Programs are stored in memory, to be read or written just like data
• Fetch & Execute Cycle
  • Instructions are fetched and put into a special register: the instruction register
  • Bits in the register "control" the subsequent actions
  • Fetch the “next” instruction and continue
[Figure: Processor connected to Memory; memory holds data, programs, compilers, editors, etc.]
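The fetch and execute cycle can be sketched as a small loop in C. This is a toy model under stated assumptions, not a full MIPS simulator: the `Cpu` struct and `run` function are hypothetical names, only R-type add (funct 32) and sub (funct 34) are decoded, and execution stops at the first unsupported instruction or when the PC leaves the program.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t pc;        /* byte address of the next instruction */
    uint32_t regs[32];  /* register file; regs[0] is kept at zero */
} Cpu;

/* Runs the program and returns the number of instructions executed. */
size_t run(Cpu *cpu, const uint32_t *program, size_t n_words) {
    size_t executed = 0;
    while (cpu->pc / 4 < n_words) {
        uint32_t ir = program[cpu->pc / 4];  /* fetch into the instruction register */
        cpu->pc += 4;                        /* point at the "next" instruction */

        /* Decode: bits in the instruction register control what happens. */
        uint32_t op    = ir >> 26;
        uint32_t rs    = (ir >> 21) & 0x1Fu;
        uint32_t rt    = (ir >> 16) & 0x1Fu;
        uint32_t rd    = (ir >> 11) & 0x1Fu;
        uint32_t funct = ir & 0x3Fu;

        /* Execute. */
        if (op == 0 && funct == 32) {
            cpu->regs[rd] = cpu->regs[rs] + cpu->regs[rt];   /* add */
        } else if (op == 0 && funct == 34) {
            cpu->regs[rd] = cpu->regs[rs] - cpu->regs[rt];   /* sub */
        } else {
            break;                                           /* unsupported in this sketch */
        }
        cpu->regs[0] = 0;                    /* $zero always reads as 0 */
        executed++;
    }
    return executed;
}
```

Because the program is just an array of 32-bit words, the same memory could hold data as well, which is the point of the stored program concept.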

Control
• Decision making instructions
  • alter the control flow,
  • i.e., change the "next" instruction to be executed
  • Sequential execution is the implicit default
• MIPS conditional branch instructions:
    bne $t0, $t1, Label
    beq $t0, $t1, Label
• Example: if (i==j) h = i + j;
           bne $s0, $s1, Label
           add $s3, $s0, $s1
    Label: ....

Control
• MIPS unconditional branch instruction: j label
• Example:
    C code:              MIPS code:
    if (i == j)                bne $s4, $s5, Lab1
        h = i + j;             add $s3, $s4, $s5
    else                       j   Lab2
        h = i - j;       Lab1: sub $s3, $s4, $s5
    ...                  Lab2: ...
• Can you build a simple for loop?

Control (II)
• MIPS unconditional branch instruction: j label
• Example:
    C code:              MIPS code:
    if (i != j)                beq $s4, $s5, Lab1
        h = i - j;             sub $s3, $s4, $s5
    else                       j   Lab2
        h = i + j;       Lab1: add $s3, $s4, $s5
    ...                  Lab2: ...
• Can you build a simple for loop?

Is one enough?
• MIPS conditional branch instructions:
    bne $t0, $t1, Label    ; branch if not equal
    beq $t0, $t1, Label    ; branch if equal
• Since bne and beq are complements of each other, can we implement only one of them in software and hardware?
[Figure: two flowcharts, one testing I == J and one testing I != J; Y goes to the branch target (Label), N falls through (PC++)]

So far:
• Instruction           Meaning (Register Transfer Language, RTL)
    add $s1,$s2,$s3     $s1 = $s2 + $s3
    sub $s1,$s2,$s3     $s1 = $s2 - $s3
    lw  $s1,100($s2)    $s1 = Memory[$s2+100]
    sw  $s1,100($s2)    Memory[$s2+100] = $s1
    bne $s4,$s5,L       Next instr. is at Label if $s4 ≠ $s5
    beq $s4,$s5,L       Next instr. is at Label if $s4 = $s5
    j   Label           Next instr. is at Label
• Formats:
    R   op | rs | rt | rd | shamt | funct
    I   op | rs | rt | 16 bit address
    J   op | 26 bit address

Control Flow
• We have beq and bne; what about branch-if-less-than?
• New instruction:
    slt $t0, $s1, $s2      ; if $s1 < $s2 then $t0 = 1 else $t0 = 0
• Can use this instruction to build "blt $s1, $s2, Label"; can now build general control structures
• Note that the assembler needs a temporary register to do this, which is why there are policy-of-use conventions for registers
• Pseudoinstruction "blt $s1, $s2, Label" is mapped to:
    slt $t0, $s1, $s2
    bne $t0, $zero, Label  ; branch if $t0 = 1, i.e., $s1 < $s2

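Compiling Loops in C
• Use shift left logical (sll) to multiply by 4
    C code:
        while (save[i] == k)
            i += 1;
    MIPS ‘code’:
        Loop: sll  $t1, $s3, 2      ; $t1 = i * 4
              add  $t1, $t1, $s6    ; $t1 = address of save[i]
              lw   $t0, 0($t1)      ; $t0 = save[i]
              bne  $t0, $s5, Exit   ; leave the loop if save[i] != k
              addi $s3, $s3, 1      ; i = i + 1
              j    Loop
        Exit: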

C Pure Procedure
• Stack pointer ($sp) and return address ($ra)
    C code:
        int leaf_example(int g, int h, int i, int j)
        {
            int f;
            f = (g + h) - (i + j);
            return f;
        }
    MIPS ‘code’:
        leaf_example:
            addi $sp, $sp, -12      ; back up the values
            sw   $t1, 8($sp)        ; of the registers that
            sw   $t0, 4($sp)        ; will be used in this
            sw   $s0, 0($sp)        ; procedure
            add  $t0, $a0, $a1      ; $t0 = g + h
            add  $t1, $a2, $a3      ; $t1 = i + j
            sub  $s0, $t0, $t1      ; f = (g + h) - (i + j)
            add  $v0, $s0, $zero    ; return value = f
            lw   $s0, 0($sp)        ; restore the values
            lw   $t0, 4($sp)        ; saved on the stack
            lw   $t1, 8($sp)        ; previously
            addi $sp, $sp, 12
            jr   $ra                ; jump to the return address

Recursive Procedure
• Stack pointer ($sp) and return address ($ra)
    C code:
        int fact(int n)
        {
            if (n < 1)
                return (1);
            else
                return (n * fact(n-1));
        }
    MIPS ‘code’:
        fact:
            addi $sp, $sp, -8
            sw   $ra, 4($sp)        ; back up the return address
            sw   $a0, 0($sp)        ; and argument n
            slti $t0, $a0, 1        ; $t0 = 1 if n < 1
            beq  $t0, $zero, L1     ; if n >= 1, go to L1
            addi $v0, $zero, 1      ; return 1
            addi $sp, $sp, 8
            jr   $ra
        L1: addi $a0, $a0, -1       ; argument = n - 1
            jal  fact
            lw   $a0, 0($sp)        ; restore argument n
            lw   $ra, 4($sp)        ; and the return address
            addi $sp, $sp, 8
            mul  $v0, $a0, $v0      ; return n * fact(n-1)
            jr   $ra                ; return to the caller

Policy of Use Conventions Register 1 ($at) reserved for assembler, 26-27 for operating system

Allocate New Data on the Stack
• Frame pointer ($fp)
• Complements $sp by holding the first address of the callee procedure's frame, a stable base while $sp moves
[Figure: stack from High Address to Low Address with $fp and $sp marking the frame, which holds saved argument registers, saved return address, saved saved registers, and local arrays and structures]

Allocate New Data on the Heap
• Heap vs. stack
• Static variables are kept in the static data segment; dynamic data structures are allocated on the heap (dynamic data)
    Address (hex)   Contents
    7fff fffc       Stack ($sp)
                    Dynamic data
    1000 8000       ($gp)
    1000 0000       Static data
    0040 0000       Text (pc)
    0               Reserved

String Copy Procedure
• Stack pointer ($sp) and return address ($ra)
    C code:
        void strcpy(char x[], char y[])
        {
            int i;
            i = 0;
            while ((x[i] = y[i]) != '\0')
                i += 1;
        }
    MIPS ‘code’:
        strcpy:
            addi $sp, $sp, -4
            sw   $s0, 0($sp)          ; back up $s0
            add  $s0, $zero, $zero    ; initialize i to 0
        L1: add  $t1, $s0, $a1        ; address of y[i]
            lb   $t2, 0($t1)          ; $t2 = y[i]
            add  $t3, $s0, $a0        ; address of x[i]
            sb   $t2, 0($t3)          ; x[i] = y[i]
            beq  $t2, $zero, L2       ; end of string?
            addi $s0, $s0, 1          ; i = i + 1
            j    L1
        L2: lw   $s0, 0($sp)          ; end of string: restore $s0
            addi $sp, $sp, 4
            jr   $ra                  ; return to the caller

Constants
• Small constants are used quite frequently (50% of operands), e.g.,
    A = A + 5;
    B = B + 1;
    C = C - 18;
• Solutions? Why not?
  • put 'typical constants' in memory and load them.
  • create hard-wired registers (like $zero) for constants like one.
  • take the constant from an instruction field (immediate operand)
• MIPS Instructions:
    addi $29, $29, 4
    slti $8, $18, 10
    andi $29, $29, 6
    ori  $29, $29, 4
• Design Principle: Make the common case fast. Which format?

How about larger constants?
• We'd like to be able to load a 32 bit constant into a register
• Must use two instructions; new "load upper immediate" instruction:
    lui $t0, 1010101010101010
    $t0 = 1010101010101010 0000000000000000    (lower 16 bits filled with zeros)
• Then must get the lower order bits right, i.e.,
    ori $t0, $t0, 1010101010101010

        1010101010101010 0000000000000000
    ori 0000000000000000 1010101010101010
        ---------------------------------
        1010101010101010 1010101010101010
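The two-instruction sequence amounts to a shift followed by a bitwise OR, which a short C model makes concrete. The function names `lui`, `ori`, and `load_constant32` are illustrative; they mimic the instructions' effect on a register value and are not an assembler API.

```c
#include <stdint.h>

/* lui: place a 16-bit immediate in the upper half; lower half is zero. */
uint32_t lui(uint32_t imm16) {
    return (imm16 & 0xFFFFu) << 16;
}

/* ori: OR a zero-extended 16-bit immediate into the register value. */
uint32_t ori(uint32_t reg, uint32_t imm16) {
    return reg | (imm16 & 0xFFFFu);
}

/* Build any 32-bit constant with the lui + ori pair. */
uint32_t load_constant32(uint32_t value) {
    uint32_t t0 = lui(value >> 16);      /* lui $t0, upper 16 bits */
    return ori(t0, value & 0xFFFFu);     /* ori $t0, $t0, lower 16 bits */
}
```

The slide's bit pattern 1010101010101010 is 0xAAAA, so `lui(0xAAAA)` gives 0xAAAA0000 and the following `ori` completes 0xAAAAAAAA.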

Assembly Language vs. Machine Language
• Assembly provides convenient symbolic representation
  • much easier than writing down numbers
  • e.g., destination first
• Machine language is the underlying reality
  • e.g., destination is no longer first
• Assembly can provide 'pseudoinstructions'
  • e.g., “move $t0, $t1” exists only in Assembly
  • would be implemented using “add $t0,$t1,$zero”
• When considering performance you should count real instructions

Other Issues
• Things we are not going to cover in depth in lecture:
  • support for procedures
  • linkers, loaders, memory layout
  • stacks, frames, recursion
  • manipulating strings and pointers
  • interrupts and exceptions
  • system calls and conventions
• Some of these we'll talk about later
• We've focused on architectural issues
  • basics of MIPS assembly language and machine code
  • we’ll build a processor to execute these instructions.

Overview of MIPS
• simple instructions, all 32 bits wide
• very structured, no unnecessary baggage
• only three instruction formats:
    R   op | rs | rt | rd | shamt | funct
    I   op | rs | rt | 16 bit address
    J   op | 26 bit address
• rely on compiler to achieve performance; what are the compiler's goals?
• help compiler where we can

Addresses in Branches
• Instructions:
    bne $t4,$t5,Label    Next instruction is at Label if $t4 ≠ $t5
    beq $t4,$t5,Label    Next instruction is at Label if $t4 = $t5
• Formats:
    I   op | rs | rt | 16 bit address
• Could specify a register (like lw and sw) and add it to address
  • use Instruction Address Register (PC = program counter)
  • most branches are local (principle of locality)
• Jump instructions just use high order bits of PC
  • address boundaries of 256 MB
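How the 16-bit and 26-bit fields become full 32-bit addresses can be sketched in C. This assumes the standard MIPS convention that both fields count words and are applied relative to PC + 4 (the address of the instruction after the branch or jump); `branch_target` and `jump_target` are illustrative names.

```c
#include <stdint.h>

/* PC-relative branch: sign-extend the 16-bit field, convert words to
   bytes (shift left by 2), and add to the address of the next instruction. */
uint32_t branch_target(uint32_t pc, uint16_t imm16) {
    int32_t offset = (imm16 & 0x8000u) ? (int32_t)imm16 - 0x10000
                                       : (int32_t)imm16;
    return pc + 4u + ((uint32_t)offset << 2);
}

/* Jump: keep the upper 4 bits of PC + 4 and fill the remaining 28 bits
   with the 26-bit field shifted left by 2. */
uint32_t jump_target(uint32_t pc, uint32_t addr26) {
    return ((pc + 4u) & 0xF0000000u) | ((addr26 & 0x03FFFFFFu) << 2);
}
```

Since a jump can only replace the low 28 bits, its target stays inside the current 2^28-byte region, which is the 256 MB boundary mentioned above.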

Addresses in Branches and Jumps
• Instructions:
    bne $t4,$t5,Label    Next instruction is at Label if $t4 ≠ $t5
    beq $t4,$t5,Label    Next instruction is at Label if $t4 = $t5
    j   Label            Next instruction is at Label
• Formats:
    I   op | rs | rt | 16 bit address
    J   op | 26 bit address
• Addresses are not 32 bits. How do we handle this with load and store instructions?

To summarize:

Program Translation
• Translation Hierarchy (file names: Unix, Windows)
    C Program (*.c, *.C)
      → Compiler
    Assembly (*.s, *.ASM)
      → Assembler
    Object: Machine language (*.o, *.OBJ), together with Object: Library routine (machine code) (Library: *.a, *.LIB; Dynamic linked library: *.so, *.DLL)
      → Linker
    Executable: Machine program (a.out, *.EXE)
      → Loader
    Memory

Linking Object Files
• Relocate the addresses in the text segment and data segment
• Link procedures A and B

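Relocated Executable Image
• Text segment starts at 0040 0000hex
• Data segment starts at 1000 0000hex = $gp - 8000hex, with $gp = 1000 8000hex (a 16-bit offset of 8000hex sign-extends to -8000hex)
    Address (hex)   Contents
                    Stack
                    Dynamic data
    1000 0000       Static data ($gp points into this segment)
    0040 0000       Text (pc)
    0               Reserved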
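The relation between $gp and the start of the data segment comes from sign extension of the 16-bit offset field, which can be checked in C. This is a sketch that assumes $gp = 1000 8000hex, the value shown in the memory layout slide; `gp_relative` is an illustrative name.

```c
#include <stdint.h>

/* Effective address of a load or store that uses $gp as base:
   the 16-bit offset is sign-extended before being added. */
uint32_t gp_relative(uint32_t gp, uint16_t offset16) {
    int32_t off = (offset16 & 0x8000u) ? (int32_t)offset16 - 0x10000
                                       : (int32_t)offset16;
    return gp + (uint32_t)off;
}
```

With $gp = 0x10008000, offset 0x8000 reaches 0x10000000 and offset 0x7FFF reaches 0x1000FFFF, so a single base register covers 64 KB of static data.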

Loader
• The operating system reads the executable file into memory and starts it:
    1. Read the header to determine the size of the text and data segments
    2. Create space for the text and data segments
    3. Copy instructions and data into memory
    4. Copy parameters to the main program
    5. Initialize registers and the stack pointer
    6. Jump to the start-up routine
    7. Exit system call (when the main program returns)

Dynamic Linked Library
• Disadvantages of statically linked library routines:
  • Updates: the linked-in library code becomes outdated when a new version is released
  • Size: the library routines become part of the executable code
• Lazy procedure linkage
  • Overhead the first time a routine is called
  • No extra cost on return from the library

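Lazy Procedure Linkage
[Figure: first call vs. subsequent call. First call: jal in the text segment reaches an lw/jr indirect jump through the data segment to a stub (li ID; j ...), which enters the dynamic linker/loader, which then jumps to the DLL routine (ending in jr). Subsequent call: the lw/jr indirect jump goes directly to the DLL routine.]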
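Lazy linkage can be imitated in C with a function pointer that initially points at a stub. This is only an analogy under stated assumptions: `square_entry` stands in for the data-segment slot used by the indirect jump, `square_stub` for the stub plus dynamic linker, and `dll_square` for the library routine; all names are hypothetical.

```c
#include <stddef.h>

typedef int (*routine_fn)(int);

static int resolve_count = 0;        /* how many times the "linker" ran */

/* Stands in for the routine that lives in the dynamic library. */
static int dll_square(int x) { return x * x; }

static int square_stub(int x);       /* forward declaration */

/* Slot used for the indirect jump; it starts out pointing at the stub. */
static routine_fn square_entry = square_stub;

/* First call only: "link" the routine by patching the slot, then call it. */
static int square_stub(int x) {
    resolve_count++;
    square_entry = dll_square;
    return square_entry(x);
}

/* Every call site goes through the slot (the indirect jump). */
int call_square(int x) { return square_entry(x); }

int times_resolved(void) { return resolve_count; }
```

The first `call_square` pays for the resolution step; later calls cost one indirect jump, and the return path is the same as for a statically linked routine.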

Java Program
• Java features
  • Runs on any computer
  • Slow execution time
  • Compiles to Java bytecode instructions that are easy to interpret
[Figure: Java program → Compiler → Class files (Java bytecode), together with Java library routines (machine language) → Java Virtual Machine (software interpreter); Just In Time Compiler → compiled Java methods (machine language)]

Passes or Phases in Optimizing Compiler
• High-level optimization
  • Procedure inlining
• Reduce loop overhead
  • Loop unrolling
• Improve memory behavior
  • Interchange nested loops
  • Blocking loops
[Figure: compiler passes: front end per language → intermediate representation → high-level optimization → global optimizer → code generator, each annotated as language dependent or independent and machine dependent or independent]

Local Optimizations
• Common subexpression elimination
    Before:                          After:
    # x[i] + 4                       # x[i] + 4
    li   R100, x                     li   R100, x
    lw   R101, i                     lw   R101, i
    mult R102, R101, 4               mult R102, R101, 4
    add  R103, R100, R102            add  R103, R100, R102
    lw   R104, 0(R103)               lw   R104, 0(R103)
    # x[i] in R104
    add  R105, R104, 4               add  R105, R104, 4
    # x[i] =
    li   R106, x                     sw   R105, 0(R103)
    lw   R107, i
    mult R108, R107, 4
    add  R109, R106, R108
    sw   R105, 0(R109)
• Strength reduction: replace mult by shift left
• Constant propagation: collapse constants whenever possible
• Copy propagation: eliminate the need to reload a value
• Dead store elimination: eliminate stores of values that are not used again
• Dead code elimination: eliminate code that does not affect the final result
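The same two optimizations can be written out by hand in C for the statement x[i] = x[i] + 4. This is a sketch of what a compiler might do internally, not its actual output; the function names are illustrative and `i` is assumed to be non-negative.

```c
#include <stdint.h>

/* Unoptimized shape: the address of x[i] is computed twice, each time
   with a multiply, mirroring the "before" column. */
void add4_naive(int32_t *x, int32_t i) {
    int32_t *load_addr = (int32_t *)((char *)x + i * 4);
    int32_t value = *load_addr + 4;
    int32_t *store_addr = (int32_t *)((char *)x + i * 4);
    *store_addr = value;
}

/* After common subexpression elimination (one address computation)
   and strength reduction (multiply by 4 becomes shift left by 2). */
void add4_optimized(int32_t *x, int32_t i) {
    int32_t *addr = (int32_t *)((char *)x + ((uint32_t)i << 2));
    *addr = *addr + 4;
}
```

Both functions leave the array in the same state; the optimized one simply does less address arithmetic.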
