1 / 55

Computer Architecture Foundations for Graduate Level Students

Learn concepts of CPU, cache memory, data transfers, cache hit ratio, replacement algorithms, fetch mechanisms, and access time in computer architecture for graduate-level students. Understand replacement policies like FIFO and optimal algorithm. Dive into Belady's Anomaly and practical examples for page replacement strategies.

bratt
Download Presentation

Computer Architecture Foundations for Graduate Level Students

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Computer Architecture Foundations for Graduate Level Students

  2. Basic Paradigm HD CPU MM cache

  3. Transfers of data HD CPU MM cache

  4. Transfers of data HD CPU MM cache

  5. CPU needs particular data HD CPU MM cache

  6. If required data is found in cache HD CPU MM cache

  7. When required data is not in cache HD CPU MM cache

  8. CPU ultimately gets data from cache HD CPU MM cache

  9. If data is not in cache and in MM HD CPU MM cache

  10. From HD to MM to cache HD CPU MM cache

  11. If MM is full HD CPU MM cache

  12. If cache is full HD CPU MM cache

  13. Swapping between MM and cache HD CPU MM cache

  14. Access Time • If every memory reference to cache required transfer of one word between MM and cache, no increase in speed is achieved. In fact, speed will drop because apart from MM access, there is additional access to cache • Suppose reference is repeated n times, and after the first reference, location is always found in the cache

  15. Cache Hit Ratio • The probability that a word will be found in the cache • Depends upon the program and the size and organization of the cache h = Number of times required word found in cache Total number of references h: hit ratio

  16. Access Time ta = Average access time tc = Cache access time (1-h) = miss ratio tm = Memory access time

  17. Fetch Mechanisms • Demand Fetch • Fetch a block from memory when it is needed and is not in the cache • Prefetch • Fetch block/s from memory before they are requested • Selective Fetch • Not always fetching blocks, dependent on some defined criterion; blocks are stored in MM rather than the cache

  18. Data in cache should be replaced with data from MM • Blocks (a group of memory addresses) are transferred from MM to cache • Cache has a limited capacity (page frame) MM cache

  19. Replacement Algorithms • When the word being requested by the CPU is not in the cache, it needs to be transferred from MM. (or it can also be from secondary memory to MM) • A page fault occurs when a page or a block is not in the cache (or MM in the case of secondary memory) • Replacement algorithms determine which page/block to remove or overwrite

  20. Characteristics • Usage based or Non-usage based • Usage based : the choice of page/block to replace is dependent on the how many times each page/block has been referenced • Non-usage based : Use some other criteria for replacement

  21. Assumptions • For a given page size, we only need to consider the page/block number. • If we have a reference (hit) to a page p, then any immediately succeeding references to p does not cause a page fault • The size of memory/cache is represented as the number of pages it is capable of holding (page frame )

  22. Example Consider the following address sequence calls: 0110 0432 0101 0612 0102 0103 0104 0101 0611 0102 0103 0302 which, at 100 bytes per page, can be reduced to the following access string: 1 4 1 6 1 6 1 3 This sequence of page requests is called a reference string.

  23. Replacement Policies • Random replacement algorithm • First-in first-out replacement • Optimal Algorithm • Least recently used algorithm • Least Frequently Used • Most Frequently Used

  24. Random Replacement • A page is chosen randomly at page fault time • There is no relationship between the pages or their use. • Choice is done by a random number generator.

  25. FIFO • Memory treated as a queue • When a page comes in, it is inserted at the tail • When a page is removed, the entry at the head of the queue gets deleted • Easy to understand and program • Performance is not consistently good; dependent on reference string

  26. FIFO Example Consider the following reference string: 7 0 1 2 0 3 0 4 2 With a page frame of 3 * * * * * * * * 7 0 1 2 2 3 0 4 2 7 0 1 1 2 3 0 4 7 0 0 1 2 3 0 An * indicates a miss (the page requested by the CPU is not in the cache or in MM)

  27. FIFO Example #2 Consider the following reference string: 1 2 3 4 1 2 5 1 2 3 4 5 With a page frame of 3 * * * * * * * * * 1 2 3 4 1 2 5 5 5 3 4 4 1 2 3 4 1 2 2 2 5 3 3 1 2 3 4 1 1 1 2 5 5 We have 9 page faults Try performing this FIFO with a page frame of 4

  28. Belady’s Anomaly • An increase in page frame does not necessarily mean a decrease in page faults • More formally, Belady’s anomalyreflects the fact that, for some page-replacement algorithms, the page fault rate may increase as the number of allocated frames increases

  29. Optimal Algorithm • The page that will not be used for the longest period of time is replaced • Guarantees the lowest page fault rate for a fixed number of frames • Difficult to implement because it requires future knowledge of the reference string

  30. Optimal Algorithm Example Consider the following reference string: 7 0 1 2 0 3 0 4 2 With a page frame of 3 We look ahead and see that 7 is the page which will not be used again, so we replace 7; we also note that after our first hit we should not replace 0 immediately, but rather 1 because 1 will not be referenced any more (2 will be referenced last.) * * * * * * 7 0 1 2 2 3 3 4 4 7 0 1 1 2 2 3 3 7 0 0 0 0 2 2

  31. Least Recently Used • Approximates the optimal algorithm • Replaces the page that has not been used for the longest period of time • When all page frames have been used up and every time there is a page hit, the referenced page is placed at the tail to indicate it has been recently accessed

  32. LRU Example Consider the following reference string: 7 0 1 2 0 3 0 4 0 3 0 2 With a page frame of 3 * * * * * * * 7 0 1 2 0 3 0 4 0 3 0 2 7 0 1 2 0 3 0 4 0 3 0 7 0 1 2 2 3 3 4 4 3 We have 7 page faults Try performing this LRU with a page frame of 4

  33. Least Frequently Used • Counts the number of references made to each page; when page is accessed, counter is incremented by one • Page with smallest count is replaced • FIFO is used to resolve a tie • Rationale: Page with the bigger counter is an actively used page • Problem • Page initially actively may never be used again • Solved by using a decaying counter

  34. LFU Example Consider the following reference string: 7 0 1 2 0 3 0 4 0 3 0 2 With a page frame of 3 * * * * * * * 71 01 11 21 21 31 31 41 41 41 41 21 71 01 11 11 21 21 31 31 32 32 32 71 01 02 02 03 03 04 04 05 05 We have 7 page faults Try performing this LFU with a page frame of 4

  35. Most Frequently Used • Opposite of LFU • Replace page with the highest count • Tie is resolved using FIFO • Based on the argument that the page with smallest count has just been probably brought in and is yet to be used • Both LFU and MFU are not common and implementation is expensive.

  36. The Central Processing Unit • The operating hub and heart of every computer system • Composed of • Control Unit • Datapath • Each component inside the CPU has a specific role in executing a command • Communicates with other components of the system

  37. Control Unit (CU) • Regulates all activities inside the machine • Serves as “nerve center” that sends control signals to other units and senses their status • Connected to all components in the CPU as well as main memory

  38. How The CU Is Connected Registers ALU Control Unit Main Memory CPU

  39. Registers ALU Inside the CPU: The Datapath

  40. Registers • Components used for data storage (can be read from or written to) • High speed memory locations used to store important information during CPU operations • Two types • Special • General-purpose

  41. Special Registers • Registers used for specific purposes • Used heavily during execution of CPU instructions

  42. General Purpose Registers • CPU registers used as “scratch pad” during execution of machine-level instructions • Number varies between processors

  43. Arithmetic Logic Unit (ALU) • Performs all mathematical and logical operations within the CPU • Operands not in the CPU would have to be retrieved from main memory

  44. CPU-Memory Coordination • Bus - a group of wires that connect separate components • Types of bus: • Control bus (control signals) • Address bus (address information) • Data bus (instruction/data)

  45. CPU-Memory Coordination • The different busses facilitate communication between the CPU and main memory • Actions of the two components are highly-synchronized to ensure efficient and timely execution of instructions

  46. CPU Operations • Instructions do not reside in the CPU, they have to be fetched from memory • Each machine level instruction is broken down into a logical sequence of smaller steps

  47. CPU Operations • Instructions are carried out by performing one or more of the following functions in some pre-specified sequence • Retrieving data from main memory • Putting data to main memory • Register data transfer • ALU operation

  48. How An Instruction is Processed • Instruction is retrieved from memory • Analyze what the instruction is and how to execute it • Operands/parameters (if any) are fetched from main memory • Instruction is executed • Results are stored (CPU or MM) • Prepare for next instruction

  49. Instruction Processing Example • Fetch instruction from memory • Decode it (turns out to be an ADD) • Get the two numbers to add from MM • Perform the addition • Where will it be stored? • Prepare for next instruction

  50. Processing Data in Clusters • Information is organized into groups of fixed-size data that can be stored and retrieved in a single, basic operation • Each group of n bits is referred to as a word of information • Access to each word requires a distinct name (location/address) • Can also refer to a characteristic of other components (i.e. size of bus)

More Related