Chapter 8 CPU and Memory: Design, Implementation, and Enhancement

The Architecture of Computer Hardware and Systems Software: An Information Technology Approach, 3rd Edition, Irv Englander, John Wiley and Sons, 2003

CPU Architecture Overview
• CISC – Complex Instruction Set Computer
• RISC – Reduced Instruction Set Computer
• CISC vs. RISC Comparisons
• VLIW – Very Long Instruction Word
• EPIC – Explicitly Parallel Instruction Computing

CISC Architecture
• Examples
  • Intel x86, IBM Z-Series mainframes, older CPU architectures
• Characteristics
  • Few general-purpose registers
  • Many addressing modes
  • Large number of specialized, complex instructions
  • Instructions are of varying sizes

Limitations of CISC Architecture
• Complex instructions are infrequently used by programmers and compilers
• Memory references (loads and stores) are slow and account for a significant fraction of all instructions
• Procedure and function calls are a major bottleneck
  • Passing arguments
  • Storing and retrieving values in registers

RISC Features
• Examples
  • PowerPC, Sun SPARC
• Limited and simple instruction set
• Fixed-length, fixed-format instruction words
  • Enable pipelining, parallel fetches and executions
• Limited addressing modes
  • Reduce complicated hardware
• Register-oriented instruction set
  • Reduce memory accesses
• Large bank of registers
  • Reduce memory accesses
  • Efficient procedure calls

CISC vs. RISC Processing

Circular Register Buffer

Circular Register Buffer – After Procedure Call
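The circular register buffer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the window count, the group sizes, and the names `RegisterWindows`, `call`, and `ret` are hypothetical and do not describe any particular CPU. It shows the one idea behind the two slides above: after a procedure call, the caller's "out" registers are the callee's "in" registers, so arguments pass without a memory access.

```python
# Hypothetical sketch of overlapping register windows in a circular buffer.
# Sizes (8 windows, 8 in/local/out registers each) are illustrative assumptions.

NUM_WINDOWS = 8
REGS_PER_GROUP = 8               # size of each in / local / out group
STRIDE = 2 * REGS_PER_GROUP      # physical registers added by each window


class RegisterWindows:
    def __init__(self):
        self.physical = [0] * (NUM_WINDOWS * STRIDE)
        self.cwp = 0             # current window pointer

    def _index(self, group, n):
        """Map (group, register number) in the current window to a physical slot."""
        offset = {"in": 0, "local": REGS_PER_GROUP, "out": STRIDE}[group]
        # The modulo makes the buffer circular: the last window's outs
        # wrap around onto the first window's ins.
        return (self.cwp * STRIDE + offset + n) % len(self.physical)

    def read(self, group, n):
        return self.physical[self._index(group, n)]

    def write(self, group, n, value):
        self.physical[self._index(group, n)] = value

    def call(self):
        """Procedure call: advance the window; caller's outs become callee's ins."""
        self.cwp = (self.cwp + 1) % NUM_WINDOWS

    def ret(self):
        """Procedure return: step back to the caller's window."""
        self.cwp = (self.cwp - 1) % NUM_WINDOWS


regs = RegisterWindows()
regs.write("out", 0, 42)         # caller places an argument in an out register
regs.write("local", 3, 7)        # caller's private local value
regs.call()
arg_seen_by_callee = regs.read("in", 0)      # same physical register as caller's out 0
regs.write("local", 3, 99)       # callee's local lives in a different physical slot
regs.ret()
caller_local_after_return = regs.read("local", 3)
```

The sketch omits what a real design must handle when calls nest deeper than the number of windows: the oldest window has to be saved to memory before it is reused.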

CISC vs. RISC Performance Comparison
• RISC → simpler instructions → more instructions → more memory accesses
• RISC → more bus traffic and increased cache memory misses
• More registers would improve CISC performance, but no space is available for them
• Modern CISC and RISC architectures are becoming similar

VLIW Architecture
• Transmeta Crusoe CPU
• 128-bit instruction bundle = molecule
  • Four 32-bit atoms (atom = instruction)
  • Parallel processing of 4 instructions
• 64 general-purpose registers
• Code morphing layer
  • Translates instructions written for other CPUs into molecules
  • Instructions are not written directly for the Crusoe CPU

EPIC Architecture
• Intel Itanium CPU
• 128-bit instruction bundle
  • Three 41-bit instructions
  • 5 bits to identify type of instructions in bundle
• 128 64-bit general-purpose registers
• 128 82-bit floating-point registers
• Intel x86 instruction set included
• Programmers and compilers follow guidelines to ensure parallel execution of instructions
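The bundle arithmetic on this slide (3 × 41 + 5 = 128 bits) can be checked with a short packing sketch. The function names and the choice to put the 5 type bits in the lowest positions are assumptions made for illustration, and the instruction values are arbitrary bit patterns, not real Itanium encodings.

```python
TEMPLATE_BITS = 5    # bits identifying the types of instructions in the bundle
SLOT_BITS = 41       # bits per instruction
SLOTS = 3            # instructions per bundle

BUNDLE_WIDTH = TEMPLATE_BITS + SLOTS * SLOT_BITS   # 5 + 3 * 41 = 128


def pack_bundle(template, instructions):
    """Pack a 5-bit template and three 41-bit instructions into one integer."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    if len(instructions) != SLOTS:
        raise ValueError("a bundle holds exactly three instructions")
    bundle = template
    for i, instr in enumerate(instructions):
        if not 0 <= instr < (1 << SLOT_BITS):
            raise ValueError("each instruction must fit in 41 bits")
        bundle |= instr << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Recover the template and the three instructions from a packed bundle."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    mask = (1 << SLOT_BITS) - 1
    instructions = [
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & mask for i in range(SLOTS)
    ]
    return template, instructions


packed = pack_bundle(0x10, [1, 2, 3])
template, instructions = unpack_bundle(packed)
```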

Paging
• Managed by the operating system
• Built into the hardware
• Independent of application

Logical vs. Physical Addresses
• Logical addresses are relative locations of data, instructions, and branch targets, and are separate from physical addresses
• Logical addresses are mapped to physical addresses
• Physical addresses do not need to be consecutive

Logical vs. Physical Address

Page Address Layout

Page Translation Process
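The mapping from logical to physical addresses can be sketched as a table lookup. The page size, the table contents, and the name `translate` below are assumptions chosen for illustration; in a real system the hardware and the operating system do this work together.

```python
PAGE_SIZE = 4096                   # assumed page size in bytes

# Hypothetical page table: logical page number -> physical frame number.
# The frames are deliberately not consecutive.
page_table = {0: 5, 1: 2, 2: 9}


def translate(logical_address):
    """Split the address into page number and offset, then swap page for frame."""
    page, offset = divmod(logical_address, PAGE_SIZE)
    if page not in page_table:
        raise LookupError(f"page fault: page {page} is not mapped")
    return page_table[page] * PAGE_SIZE + offset


# Logical address 4100 is page 1, offset 4; page 1 maps to frame 2.
physical = translate(4100)         # 2 * 4096 + 4 = 8196
```

The offset within the page is unchanged by translation; only the page number is replaced.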

Memory Enhancements
• Memory is slow compared to CPU processing speeds!
  • 2 GHz CPU = 1 cycle in ½ of a billionth of a second
  • 70 ns DRAM = 1 access in 70 billionths of a second
• Methods to improve memory access
  • Wide Path Memory Access
    • Retrieve multiple bytes instead of 1 byte at a time
  • Memory Interleaving
    • Partition memory into subsections, each with its own address register and data register
  • Cache Memory

Memory Interleaving
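One common way to partition memory for interleaving is to let the low-order part of the address select the subsection. The slides do not say which scheme is meant, so the sketch below is an assumption, as are the bank count and the name `bank_and_row`.

```python
NUM_BANKS = 4    # assumed number of memory subsections (banks)


def bank_and_row(address):
    """Low-order interleaving: consecutive addresses fall in consecutive banks."""
    return address % NUM_BANKS, address // NUM_BANKS


# Addresses 0..7 spread across the four banks, two rows deep.
placement = [bank_and_row(a) for a in range(8)]
```

Because each bank has its own address and data registers, accesses to addresses 0, 1, 2, and 3 can overlap in time instead of waiting on a single memory unit.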

Why Cache?
• Even the fastest hard disk has an access time of about 10 milliseconds
• 2 GHz CPU waiting 10 milliseconds wastes 20 million clock cycles!
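The cycle counts quoted in the slides follow from multiplying the clock rate by the waiting time. A minimal check, using a hypothetical helper `cycles_elapsed` and integer arithmetic to avoid rounding noise:

```python
def cycles_elapsed(clock_hz, latency_ns):
    """Clock cycles that pass while the CPU waits latency_ns nanoseconds."""
    return clock_hz * latency_ns // 1_000_000_000


CLOCK_HZ = 2_000_000_000                             # 2 GHz

dram_cycles = cycles_elapsed(CLOCK_HZ, 70)           # 70 ns DRAM access
disk_cycles = cycles_elapsed(CLOCK_HZ, 10_000_000)   # 10 ms disk access
```

At 2 GHz, a 70 ns DRAM access costs 140 cycles and a 10 ms disk access costs 20 million cycles.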

Cache Memory
• Blocks: 8 or 16 bytes
• Tags: location in main memory
• Cache controller
  • Hardware that checks tags
• Cache line
  • Unit of transfer between storage and cache memory
• Hit ratio: ratio of hits out of total requests
• Synchronizing cache and memory
  • Write through
  • Write back

Step-by-Step Use of Cache
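The tag check and the hit ratio can be sketched with a very small simulated cache. The slides do not specify how blocks are placed, so the direct-mapped layout, the four-line size, and the class name `DirectMappedCache` are all assumptions. Only read requests are modeled; write-through and write-back are left out.

```python
BLOCK_SIZE = 16    # bytes per block (the slides mention 8 or 16)
NUM_LINES = 4      # assumed, kept tiny so the trace is easy to follow


class DirectMappedCache:
    def __init__(self):
        self.tags = [None] * NUM_LINES   # tag stored for each cache line
        self.hits = 0
        self.requests = 0

    def access(self, address):
        """Return True on a hit; on a miss, load the block and return False."""
        block = address // BLOCK_SIZE
        line = block % NUM_LINES         # the one line this block may occupy
        tag = block // NUM_LINES         # identifies which block is in the line
        self.requests += 1
        if self.tags[line] == tag:
            self.hits += 1
            return True
        self.tags[line] = tag            # miss: bring the block in from memory
        return False

    def hit_ratio(self):
        return self.hits / self.requests if self.requests else 0.0


cache = DirectMappedCache()
# 0, 4, 8 share one block; 64 maps to the same line and evicts it.
results = [cache.access(a) for a in (0, 4, 8, 64, 0)]
ratio = cache.hit_ratio()
```

In this trace the first access to a block misses, nearby addresses then hit, and the final access to address 0 misses again because address 64 replaced its block.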

Performance Advantages
• Hit ratios of 90% common
• 50%+ improved execution speed
• Locality of reference is why caching works
  • Most memory references confined to small region of memory at any given time
  • Well-written program in small loop, procedure or function
  • Data likely in array
  • Variables stored together
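A rough way to see why a 90% hit ratio matters is to compute the average access time. The model below is a simplification (a miss is charged the memory time only), and the 10 ns cache time is an assumed figure, not one from the slides; the 70 ns memory time is the DRAM figure used earlier.

```python
def effective_access_ns(hit_ratio, cache_ns, memory_ns):
    """Average access time when hits cost cache_ns and misses cost memory_ns."""
    return hit_ratio * cache_ns + (1 - hit_ratio) * memory_ns


# 90% of requests served in an assumed 10 ns, the rest in 70 ns.
average_ns = effective_access_ns(0.90, 10, 70)   # about 16 ns
```

Under these assumptions the average access takes about 16 ns instead of 70 ns.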

Two-level Caches
• Why do the sizes of the caches have to be different?

Cache vs. Virtual Memory
• Cache speeds up memory access
• Virtual memory increases amount of perceived storage
  • Independence from the configuration and capacity of the memory system
  • Low cost per bit

Modern CPU Processing Methods
• Timing Issues
• Separate Fetch/Execute Units
• Pipelining
• Scalar Processing
• Superscalar Processing

Timing Issues
• Computer clock used for timing purposes
  • MHz – million steps per second
  • GHz – billion steps per second
• Instructions can (and often do) take more than one step
• Data word width can require multiple steps

Separate Fetch-Execute Units
• Fetch Unit
  • Instruction fetch unit
  • Instruction decode unit
    • Determine opcode
    • Identify type of instruction and operands
  • Several instructions are fetched in parallel and held in a buffer until decoded and executed
  • IP – Instruction Pointer register
• Execute Unit
  • Receives instructions from the decode unit
  • Appropriate execution unit services the instruction

Alternative CPU Organization

Instruction Pipelining
• Assembly-line technique to allow overlapping between fetch-execute cycles of sequences of instructions
• Only one instruction completes execution at a time
  • Scalar processing
  • Average instruction execution rate is approximately equal to the clock rate of the CPU (about one instruction per clock cycle)
• Problems from stalling
  • Instructions have different numbers of steps
• Problems from branching

Branch Problem Solutions
• Separate pipelines for both possibilities
• Probabilistic approach
• Requiring that the following instruction not depend on the branch
• Instruction reordering (superscalar processing)

Pipelining Example
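The benefit of overlapping fetch-execute cycles can be estimated with a simple cycle count. This sketch assumes an ideal pipeline: every stage takes one clock cycle and there are no stalls or branches. The function names and the four-stage, 100-instruction figures are illustrative.

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Each instruction runs all of its stages before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages):
    """The first instruction fills the pipeline; each later one finishes a cycle apart."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


without_pipeline = unpipelined_cycles(100, 4)   # 400 cycles
with_pipeline = pipelined_cycles(100, 4)        # 103 cycles
```

With the pipeline full, 100 instructions take 103 cycles, which is close to one instruction per clock cycle. Stalls and branches push the real figure higher.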

Superscalar Processing
• Process more than one instruction per clock cycle
• Separate fetch and execute cycles as much as possible
• Buffers for fetch and decode phases
• Parallel execution units

Superscalar CPU Block Diagram

Scalar vs. Superscalar Processing

Superscalar Issues
• Out-of-order processing – dependencies (hazards)
• Data dependencies
• Branch (flow) dependencies and speculative execution
  • Parallel speculative execution or branch prediction
  • Branch History Table
• Register access conflicts
  • Logical registers
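A branch history table can be sketched as a small array of counters indexed by the branch address. The slides do not describe the table's contents, so the 2-bit saturating counter scheme, the table size, and the class name `BranchHistoryTable` are assumptions; this is one widely taught design, not necessarily the one the slides have in mind.

```python
TABLE_SIZE = 16    # assumed number of table entries


class BranchHistoryTable:
    def __init__(self):
        # One 2-bit counter per entry: 0 and 1 predict "not taken",
        # 2 and 3 predict "taken". Entries start at 1 (weakly not taken).
        self.counters = [1] * TABLE_SIZE

    def _slot(self, branch_address):
        return branch_address % TABLE_SIZE

    def predict(self, branch_address):
        """Predict taken when the counter is in its upper half."""
        return self.counters[self._slot(branch_address)] >= 2

    def update(self, branch_address, taken):
        """Move the counter toward the actual outcome, saturating at 0 and 3."""
        i = self._slot(branch_address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


bht = BranchHistoryTable()
# A loop-closing branch: taken nine times, then not taken on loop exit.
outcomes = [True] * 9 + [False]
correct = 0
for taken in outcomes:
    if bht.predict(0x40) == taken:
        correct += 1
    bht.update(0x40, taken)
```

In this trace the predictor is wrong on the first iteration and on the loop exit, and right on the eight iterations in between.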

Hardware Implementation
• Hardware – operations are implemented by logic gates
• Advantages
  • Speed
  • RISC designs are simple and typically implemented in hardware

Microprogrammed Implementation
• Microcode consists of tiny programs stored in ROM that implement CPU instructions
• Advantages
  • More flexible
  • Easier to implement complex instructions
  • Can emulate other CPUs
• Disadvantage
  • Requires more clock cycles
