550 likes | 562 Views
Learn concepts of CPU, cache memory, data transfers, cache hit ratio, replacement algorithms, fetch mechanisms, and access time in computer architecture for graduate-level students. Understand replacement policies like FIFO and optimal algorithm. Dive into Belady's Anomaly and practical examples for page replacement strategies.
E N D
Computer Architecture Foundations for Graduate Level Students
Basic Paradigm HD CPU MM cache
Transfers of data HD CPU MM cache
Transfers of data HD CPU MM cache
CPU needs particular data HD CPU MM cache
If required data is found in cache HD CPU MM cache
When required data is not in cache HD CPU MM cache
CPU ultimately gets data from cache HD CPU MM cache
If data is not in cache and in MM HD CPU MM cache
From HD to MM to cache HD CPU MM cache
If MM is full HD CPU MM cache
If cache is full HD CPU MM cache
Swapping between MM and cache HD CPU MM cache
Access Time • If every memory reference to cache required transfer of one word between MM and cache, no increase in speed is achieved. In fact, speed will drop because apart from MM access, there is additional access to cache • Suppose reference is repeated n times, and after the first reference, location is always found in the cache
Cache Hit Ratio • The probability that a word will be found in the cache • Depends upon the program and the size and organization of the cache h = Number of times required word found in cache Total number of references h: hit ratio
Access Time ta = Average access time tc = Cache access time (1-h) = miss ratio tm = Memory access time
Fetch Mechanisms • Demand Fetch • Fetch a block from memory when it is needed and is not in the cache • Prefetch • Fetch block/s from memory before they are requested • Selective Fetch • Not always fetching blocks, dependent on some defined criterion; blocks are stored in MM rather than the cache
Data in cache should be replaced with data from MM • Blocks (a group of memory addresses) are transferred from MM to cache • Cache has a limited capacity (page frame) MM cache
Replacement Algorithms • When the word being requested by the CPU is not in the cache, it needs to be transferred from MM. (or it can also be from secondary memory to MM) • A page fault occurs when a page or a block is not in the cache (or MM in the case of secondary memory) • Replacement algorithms determine which page/block to remove or overwrite
Characteristics • Usage based or Non-usage based • Usage based : the choice of page/block to replace is dependent on the how many times each page/block has been referenced • Non-usage based : Use some other criteria for replacement
Assumptions • For a given page size, we only need to consider the page/block number. • If we have a reference (hit) to a page p, then any immediately succeeding references to p does not cause a page fault • The size of memory/cache is represented as the number of pages it is capable of holding (page frame )
Example Consider the following address sequence calls: 0110 0432 0101 0612 0102 0103 0104 0101 0611 0102 0103 0302 which, at 100 bytes per page, can be reduced to the following access string: 1 4 1 6 1 6 1 3 This sequence of page requests is called a reference string.
Replacement Policies • Random replacement algorithm • First-in first-out replacement • Optimal Algorithm • Least recently used algorithm • Least Frequently Used • Most Frequently Used
Random Replacement • A page is chosen randomly at page fault time • There is no relationship between the pages or their use. • Choice is done by a random number generator.
FIFO • Memory treated as a queue • When a page comes in, it is inserted at the tail • When a page is removed, the entry at the head of the queue gets deleted • Easy to understand and program • Performance is not consistently good; dependent on reference string
FIFO Example Consider the following reference string: 7 0 1 2 0 3 0 4 2 With a page frame of 3 * * * * * * * * 7 0 1 2 2 3 0 4 2 7 0 1 1 2 3 0 4 7 0 0 1 2 3 0 An * indicates a miss (the page requested by the CPU is not in the cache or in MM)
FIFO Example #2 Consider the following reference string: 1 2 3 4 1 2 5 1 2 3 4 5 With a page frame of 3 * * * * * * * * * 1 2 3 4 1 2 5 5 5 3 4 4 1 2 3 4 1 2 2 2 5 3 3 1 2 3 4 1 1 1 2 5 5 We have 9 page faults Try performing this FIFO with a page frame of 4
Belady’s Anomaly • An increase in page frame does not necessarily mean a decrease in page faults • More formally, Belady’s anomalyreflects the fact that, for some page-replacement algorithms, the page fault rate may increase as the number of allocated frames increases
Optimal Algorithm • The page that will not be used for the longest period of time is replaced • Guarantees the lowest page fault rate for a fixed number of frames • Difficult to implement because it requires future knowledge of the reference string
Optimal Algorithm Example Consider the following reference string: 7 0 1 2 0 3 0 4 2 With a page frame of 3 We look ahead and see that 7 is the page which will not be used again, so we replace 7; we also note that after our first hit we should not replace 0 immediately, but rather 1 because 1 will not be referenced any more (2 will be referenced last.) * * * * * * 7 0 1 2 2 3 3 4 4 7 0 1 1 2 2 3 3 7 0 0 0 0 2 2
Least Recently Used • Approximates the optimal algorithm • Replaces the page that has not been used for the longest period of time • When all page frames have been used up and every time there is a page hit, the referenced page is placed at the tail to indicate it has been recently accessed
LRU Example Consider the following reference string: 7 0 1 2 0 3 0 4 0 3 0 2 With a page frame of 3 * * * * * * * 7 0 1 2 0 3 0 4 0 3 0 2 7 0 1 2 0 3 0 4 0 3 0 7 0 1 2 2 3 3 4 4 3 We have 7 page faults Try performing this LRU with a page frame of 4
Least Frequently Used • Counts the number of references made to each page; when page is accessed, counter is incremented by one • Page with smallest count is replaced • FIFO is used to resolve a tie • Rationale: Page with the bigger counter is an actively used page • Problem • Page initially actively may never be used again • Solved by using a decaying counter
LFU Example Consider the following reference string: 7 0 1 2 0 3 0 4 0 3 0 2 With a page frame of 3 * * * * * * * 71 01 11 21 21 31 31 41 41 41 41 21 71 01 11 11 21 21 31 31 32 32 32 71 01 02 02 03 03 04 04 05 05 We have 7 page faults Try performing this LFU with a page frame of 4
Most Frequently Used • Opposite of LFU • Replace page with the highest count • Tie is resolved using FIFO • Based on the argument that the page with smallest count has just been probably brought in and is yet to be used • Both LFU and MFU are not common and implementation is expensive.
The Central Processing Unit • The operating hub and heart of every computer system • Composed of • Control Unit • Datapath • Each component inside the CPU has a specific role in executing a command • Communicates with other components of the system
Control Unit (CU) • Regulates all activities inside the machine • Serves as “nerve center” that sends control signals to other units and senses their status • Connected to all components in the CPU as well as main memory
How The CU Is Connected Registers ALU Control Unit Main Memory CPU
Registers ALU Inside the CPU: The Datapath
Registers • Components used for data storage (can be read from or written to) • High speed memory locations used to store important information during CPU operations • Two types • Special • General-purpose
Special Registers • Registers used for specific purposes • Used heavily during execution of CPU instructions
General Purpose Registers • CPU registers used as “scratch pad” during execution of machine-level instructions • Number varies between processors
Arithmetic Logic Unit (ALU) • Performs all mathematical and logical operations within the CPU • Operands not in the CPU would have to be retrieved from main memory
CPU-Memory Coordination • Bus - a group of wires that connect separate components • Types of bus: • Control bus (control signals) • Address bus (address information) • Data bus (instruction/data)
CPU-Memory Coordination • The different busses facilitate communication between the CPU and main memory • Actions of the two components are highly-synchronized to ensure efficient and timely execution of instructions
CPU Operations • Instructions do not reside in the CPU, they have to be fetched from memory • Each machine level instruction is broken down into a logical sequence of smaller steps
CPU Operations • Instructions are carried out by performing one or more of the following functions in some pre-specified sequence • Retrieving data from main memory • Putting data to main memory • Register data transfer • ALU operation
How An Instruction is Processed • Instruction is retrieved from memory • Analyze what the instruction is and how to execute it • Operands/parameters (if any) are fetched from main memory • Instruction is executed • Results are stored (CPU or MM) • Prepare for next instruction
Instruction Processing Example • Fetch instruction from memory • Decode it (turns out to be an ADD) • Get the two numbers to add from MM • Perform the addition • Where will it be stored? • Prepare for next instruction
Processing Data in Clusters • Information is organized into groups of fixed-size data that can be stored and retrieved in a single, basic operation • Each group of n bits is referred to as a word of information • Access to each word requires a distinct name (location/address) • Can also refer to a characteristic of other components (i.e. size of bus)