
CSE 8383 - Advanced Computer Architecture




Presentation Transcript


  1. CSE 8383 - Advanced Computer Architecture Week-2 Week of Jan 19, 2004 engr.smu.edu/~rewini/8383

  2. Contents • Placement Policies (Quick Review) • Replacement Policies • FIFO, Random, Optimal, LRU, MRU • Cache Write Policies • Pipelines

  3. Memory Hierarchy [diagram: CPU registers → cache → main memory → secondary storage; moving down the hierarchy, latency increases while bandwidth, speed, and cost per bit decrease]

  4. Placement Policies • How to map memory blocks (lines) to cache block frames (line frames) [diagram: memory blocks mapped into cache block frames]

  5. Placement Policies • Direct Mapping • Fully Associative • Set Associative

  6. Example – Direct Mapping • Memory: 4K blocks • Block size: 16 words • Address size: log2(4K × 16) = 16 bits • Cache: 128 block frames • Address format: Tag (5 bits) | Block frame (7 bits) | Word (4 bits)

  7. Example – Direct Mapping [diagram: each cache frame holds a 5-bit tag; memory blocks 0, 128, …, 3968 map to frame 0; blocks 1, 129, … to frame 1; …; blocks 127, 255, …, 4095 to frame 127]
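The direct-mapped decomposition above can be checked with a short Python sketch (not from the slides; the field widths are taken from slide 6):

```python
def split_direct(addr):
    """Split a 16-bit word address into (tag, frame, word) for the
    direct-mapped example: 5-bit tag, 7-bit block frame, 4-bit word."""
    word = addr & 0xF            # low 4 bits: word within the block
    frame = (addr >> 4) & 0x7F   # next 7 bits: cache block frame
    tag = addr >> 11             # top 5 bits: tag
    return tag, frame, word

# Memory block 129 starts at word address 129 * 16; as on slide 7,
# it maps to cache frame 1 with tag 1.
print(split_direct(129 * 16))  # -> (1, 1, 0)
```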

  8. Example – Fully Associative • Memory: 4K blocks • Block size: 16 words • Address size: log2(4K × 16) = 16 bits • Cache: 128 block frames • Address format: Tag (12 bits) | Word (4 bits)

  9. Example – Fully Associative [diagram: each cache frame holds a 12-bit tag; any of the 4096 memory blocks (0–4095) can be placed in any of the 128 frames]

  10. Example – Set Associative • Memory: 4K blocks • Block size: 16 words • Address size: log2(4K × 16) = 16 bits • Cache: 128 block frames • Blocks per set: 4 • Number of sets: 32 • Address format: Tag (7 bits) | Set (5 bits) | Word (4 bits)

  11. Example – Set Associative [diagram: each frame holds a 7-bit tag; 32 sets of 4 frames each; memory blocks 0, 32, 64, … map to set 0; blocks 1, 33, … to set 1; …; blocks 31, 63, …, 4095 to set 31]

  12. Comparison • Simplicity • Associative Search • Cache Utilization • Replacement

  13. Group Exercise The instruction set for your architecture has 40-bit addresses, with each addressable item being a byte. You elect to design a four-way set-associative cache, with each of the four blocks in a set containing 64 bytes. Assume that you have 256 sets in the cache. Show the format of the address.

  14. Group Exercise (Cont.) • Consider the following sequence of addresses. (All are hex numbers.) • 0E1B01AA05 0E1B01AA07 0E1B2FE305 0E1B4FFD8F 0E1B01AA0E • In your cache, what will be the tags in the set(s) that contain these references at the end of the sequence? Assume that the cache is initially flushed (empty).

  15. Group Exercise (cont.) • Address size: 40 bits • Block size: 64 bytes • Blocks per set: 4 • Number of sets: 256 • Cache: 256 × 4 blocks • Address format: Tag (26 bits) | Set (8 bits) | Byte offset (6 bits)

  16. Group Exercise (cont.) Writing the upper 24 bits in hex and the lower 16 bits in binary, split as tag | set | offset (the 26-bit tag is the hex part plus the first two binary bits): • 0E1B01AA05 = 0E1B01 | 10 | 10101000 | 000101 → set 0xA8 • 0E1B01AA07 = 0E1B01 | 10 | 10101000 | 000111 → set 0xA8

  17. Group Exercise (cont.) • 0E1B2FE305 = 0E1B2F | 11 | 10001100 | 000101 → set 0x8C • 0E1B4FFD8F = 0E1B4F | 11 | 11110110 | 001111 → set 0xF6 • 0E1B01AA0E = 0E1B01 | 10 | 10101000 | 001110 → set 0xA8
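A small Python sketch (not part of the original exercise) can verify the field extraction on slides 15–17:

```python
def split_40bit(addr):
    """Split a 40-bit byte address into (tag, set, offset):
    26-bit tag, 8-bit set index, 6-bit byte offset."""
    offset = addr & 0x3F          # low 6 bits: byte within the 64-byte block
    set_idx = (addr >> 6) & 0xFF  # next 8 bits: one of 256 sets
    tag = addr >> 14              # top 26 bits: tag
    return tag, set_idx, offset

for a in (0x0E1B01AA05, 0x0E1B01AA07, 0x0E1B2FE305,
          0x0E1B4FFD8F, 0x0E1B01AA0E):
    tag, s, off = split_40bit(a)
    print(f"{a:010X}: tag 0x{tag:07X}, set 0x{s:02X}, offset 0x{off:02X}")
```

By this decomposition the three 0E1B01AA·· references share a single tag in set 0xA8 (they are the same block), while 0E1B2FE305 and 0E1B4FFD8F each leave one tag in sets 0x8C and 0xF6 respectively.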

  18. Replacement Techniques • FIFO • LRU • MRU • Random • Optimal

  19. Group Exercise Suppose that your cache can hold only three blocks and the block requests are as follows: 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1 Show the contents of the cache if the replacement policy is a) LRU, b) FIFO, c) Optimal

  20. Group Exercise (Cont.) FIFO and MRU [table garbled in extraction: cache contents after each request under FIFO and MRU]

  21. Group Exercise (Cont.) OPT and LRU [table garbled in extraction: cache contents after each request under OPT and LRU]
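A compact simulator (my own sketch, not from the slides) reproduces the miss counts for the three policies the exercise asks about; MRU, shown on slide 20, could be added by evicting the most recently used block instead of the least.

```python
def simulate(requests, frames, policy):
    """Count misses for a small fully associative cache under FIFO, LRU, or OPT."""
    cache, misses = [], 0
    for i, blk in enumerate(requests):
        if blk in cache:
            if policy == "LRU":              # refresh recency on a hit
                cache.remove(blk)
                cache.append(blk)
            continue
        misses += 1
        if len(cache) == frames:             # cache full: choose a victim
            if policy in ("FIFO", "LRU"):    # both evict the list head
                victim = cache[0]
            else:                            # OPT: evict block used farthest ahead
                future = requests[i + 1:]
                victim = max(cache, key=lambda b: future.index(b)
                             if b in future else len(future))
            cache.remove(victim)
        cache.append(blk)
    return misses

seq = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
for p in ("FIFO", "LRU", "OPT"):
    print(p, simulate(seq, 3, p))  # FIFO 15, LRU 12, OPT 9 misses
```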

  22. Cache Write Policies • Cache Hit • Write Through • Write Back • Cache Miss • Write-allocate • Write-no-allocate
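The hit-side policies above can be contrasted with a toy one-block, write-allocate cache (a sketch of my own, not from the slides): write-through sends every store to memory, while write-back marks the block dirty and flushes it only on eviction.

```python
def memory_writes(writes, policy):
    """Count main-memory writes for a 1-block write-allocate cache over a
    sequence of block addresses, under 'through' vs 'back' write policies."""
    resident, dirty, count = None, False, 0
    for blk in writes:
        if policy == "through":
            count += 1                     # every store also updates memory
        else:  # write-back
            if resident not in (None, blk) and dirty:
                count += 1                 # evicting a dirty block flushes it
            dirty = True
        resident = blk
    if policy == "back" and dirty:
        count += 1                         # final flush of the dirty block
    return count

seq = [5, 5, 5, 5, 9]                      # four stores to block 5, one to block 9
print(memory_writes(seq, "through"))       # -> 5
print(memory_writes(seq, "back"))          # -> 2 (flush 5 on eviction, 9 at the end)
```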

  23. Read Policy – Cache Miss • The missed block is brought to the cache while the required word is forwarded immediately to the CPU • The missed block is entirely stored in the cache first, and the required word is then forwarded to the CPU

  24. Pentium IV two-level cache [diagram: Processor → L1 cache → L2 cache → Main Memory]

  25. Cache L1 • Organization: set-associative • Block size: 64 bytes • L1 size: 8 KB • Blocks per set: four • CPU addressing: byte addressable
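The parameters on slide 25 imply the rest of the geometry; a quick check (my own arithmetic, not on the slide):

```python
cache_bytes, block_bytes, ways = 8 * 1024, 64, 4

blocks = cache_bytes // block_bytes           # 8 KB / 64 B = 128 block frames
sets = blocks // ways                         # 128 / 4 = 32 sets
offset_bits = block_bytes.bit_length() - 1    # log2(64)  = 6 offset bits
index_bits = sets.bit_length() - 1            # log2(32)  = 5 set-index bits
print(blocks, sets, offset_bits, index_bits)  # -> 128 32 6 5
```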

  26. CPU and Memory Interface [diagram: CPU with an n-bit MAR driving n address lines and a b-bit MDR on b data lines, plus an R/W control line, connected to a main memory of 2^n words (addresses 0 to 2^n − 1)]

  27. Pipelining

  28. Contents • Introduction • Linear Pipelines • Nonlinear pipelines

  29. Basic Idea • Assembly Line • Divide the execution of a task among a number of stages • A task is divided into subtasks to be executed in sequence • Performance improvement compared to sequential execution

  30. Pipeline [diagram: a task divided into sub-tasks 1…n; a stream of tasks flows through stages 1…n of the pipeline]

  31. 5 Tasks on a 4-stage pipeline [space-time diagram: task i enters the pipeline at cycle i, so the five tasks complete at cycle 8 = 4 + 5 − 1]

  32. Speedup Stream of m tasks through an n-stage pipeline with stage delay t: • T(Seq) = n × m × t • T(Pipe) = n × t + (m − 1) × t = (n + m − 1) × t • Speedup = T(Seq) / T(Pipe) = n × m / (n + m − 1)
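Since the stage delay t cancels, the speedup depends only on n and m; a one-line sketch (names are mine, not from the slides):

```python
def speedup(n, m):
    """Speedup of an n-stage pipeline over sequential execution of m tasks,
    assuming equal stage delay t: (n*m*t) / ((n + m - 1)*t)."""
    return (n * m) / (n + m - 1)

print(speedup(4, 5))     # the 5-tasks-on-4-stages example -> 2.5
print(speedup(4, 1000))  # approaches n = 4 as the task stream grows
```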

  33. Linear Pipeline • Processing Stages are linearly connected • Perform fixed function • Synchronous Pipeline • Clocked latches between Stage i and Stage i+1 • Equal delays in all stages • Asynchronous Pipeline (Handshaking)

  34. Latches [diagram: stages S1, S2, S3 separated by latches L1, L2] • The slowest stage determines the delay • With equal stage delays, the clock period equals the stage delay

  35. Reservation Table [diagram: stages S1–S4 plotted against time]

  36. 5 tasks on 4 stages [diagram: reservation table showing five tasks streaming through stages S1–S4 over time]

  37. Nonlinear Pipelines • Variable functions • Feed-forward • Feedback

  38. 3 stages & 2 functions [diagram: stages S1–S3 with feed-forward and feedback connections realizing two functions X and Y]

  39. Reservation Tables for X & Y [diagram: one reservation table per function over stages S1–S3]

  40. Linear Instruction Pipelines • Assume the following instruction execution phases: • Fetch (F) • Decode (D) • Operand Fetch (O) • Execute (E) • Write results (W)

  41. Pipeline Instruction Execution [diagram: the F, D, O, E, W phases of successive instructions overlapped in the pipeline]

  42. Instruction Dependencies • Data Dependency (Operand is not ready yet) • Instruction Dependency (Branching) Will that Cause a Problem?
