Chapter 19: Translation Lookaside Buffer
Chien-Chung Shen, CIS, UD
cshen@cis.udel.edu
Introduction
• Paging has high performance overheads:
  • a large amount of mapping information (kept in memory)
  • an extra memory access for each virtual address
• Hardware support: the translation-lookaside buffer (TLB)
  • part of the MMU
  • a hardware cache of popular virtual-to-physical address translations
  • a better name would be address-translation cache
• Upon each virtual memory reference, the hardware first checks the TLB to see if the desired translation is held there; if so, the translation is performed (quickly) without consulting the page table (which holds all translations)
TLB Algorithm

    VPN = (VirtualAddress & VPN_MASK) >> SHIFT
    (Success, TlbEntry) = TLB_Lookup(VPN)
    if (Success == True)                          // TLB hit
        if (CanAccess(TlbEntry.ProtectBits) == True)
            Offset   = VirtualAddress & OFFSET_MASK
            PhysAddr = (TlbEntry.PFN << SHIFT) | Offset
            AccessMemory(PhysAddr)
        else
            RaiseException(PROTECTION_FAULT)
    else                                          // TLB miss
        PTEAddr = PTBR + (VPN * sizeof(PTE))
        PTE = AccessMemory(PTEAddr)
        if (PTE.Valid == False)
            RaiseException(SEGMENTATION_FAULT)
        else if (CanAccess(PTE.ProtectBits) == False)
            RaiseException(PROTECTION_FAULT)
        else
            TLB_Insert(VPN, PTE.PFN, PTE.ProtectBits)
            RetryInstruction()
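The same control flow can be mimicked as a user-space C simulation: a small fully associative TLB sitting in front of a linear page table. This is a minimal sketch; all names (TlbEntry, tlb_lookup, NUM_TLB_ENTRIES, the FIFO victim pointer) are made up for illustration, protection checks are omitted, and a real TLB is of course hardware inside the MMU.

    /* Sketch: simulate the lookup/miss path above for an 8-bit virtual address space. */
    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_TLB_ENTRIES 16
    #define PAGE_SHIFT      4                       /* 16-byte pages, as in the example */
    #define OFFSET_MASK     ((1u << PAGE_SHIFT) - 1)

    typedef struct {
        bool     valid;
        uint32_t vpn;
        uint32_t pfn;
    } TlbEntry;

    static TlbEntry tlb[NUM_TLB_ENTRIES];
    static uint32_t page_table[256];                /* PFN indexed by VPN */
    static unsigned next_victim;                    /* trivial FIFO replacement */

    static bool tlb_lookup(uint32_t vpn, uint32_t *pfn) {
        for (int i = 0; i < NUM_TLB_ENTRIES; i++)
            if (tlb[i].valid && tlb[i].vpn == vpn) { *pfn = tlb[i].pfn; return true; }
        return false;
    }

    static void tlb_insert(uint32_t vpn, uint32_t pfn) {
        tlb[next_victim] = (TlbEntry){ .valid = true, .vpn = vpn, .pfn = pfn };
        next_victim = (next_victim + 1) % NUM_TLB_ENTRIES;
    }

    /* Translate a virtual address, filling the TLB from the page table on a miss. */
    uint32_t translate(uint32_t vaddr) {
        uint32_t vpn = vaddr >> PAGE_SHIFT;
        uint32_t pfn;
        if (!tlb_lookup(vpn, &pfn)) {               /* TLB miss: walk the page table */
            pfn = page_table[vpn];
            tlb_insert(vpn, pfn);
        }
        return (pfn << PAGE_SHIFT) | (vaddr & OFFSET_MASK);
    }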
Example: Accessing an Array
• 8-bit virtual address space and 16-byte pages
• an array of 10 4-byte integers starting at virtual address 100
• 4-bit VPN and 4-bit offset

    int sum = 0;
    for (int i = 0; i < 10; i++)
        sum += a[i];

• TLB hit rate: 70% (3 misses, 7 hits; worked out in the sketch below)
• spatial locality: consecutive elements fall on the same page
• Any other way to improve the hit rate? larger pages
• Quick re-reference of memory in time: temporal locality
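The 70% figure follows from where the elements land on 16-byte pages: a[0..2] share one page, a[3..6] the next, and a[7..9] a third, so only the first touch of each page misses. The small program below simply redoes that arithmetic; it assumes the TLB starts empty and is large enough to hold all three translations, and is purely illustrative.

    #include <stdio.h>

    int main(void) {
        int last_vpn = -1, hits = 0, misses = 0;
        for (int i = 0; i < 10; i++) {
            int vpn = (100 + 4 * i) / 16;   /* a[0..2] -> VPN 6, a[3..6] -> 7, a[7..9] -> 8 */
            if (vpn == last_vpn) hits++; else misses++;
            last_vpn = vpn;
        }
        printf("hits=%d misses=%d (hit rate %d%%)\n", hits, misses, hits * 10);
        return 0;
    }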
Caching and Locality
• Caching is one of the most fundamental performance techniques in computer systems, used to make the common case fast
• The idea behind caching is to take advantage of locality in instruction and data references
• Temporal locality: an instruction or data item that has been accessed recently will likely be re-accessed soon (e.g., the instructions in a loop)
• Spatial locality: if a program accesses memory at address x, it will likely soon access memory near x
Who Handles TLB Misses?
• On CISC (complex-instruction-set computer) architectures: the hardware
  • it walks the page table using the page-table base register
• On RISC (reduced-instruction-set computer) architectures: the software; the hardware simply raises an exception and jumps to a trap handler (see the sketch below)
  • advantages: flexibility (the OS may use any data structure to implement the page table) and simplicity
  • return-from-trap must resume at the same instruction that caused the trap, so it can be retried
  • to avoid an infinite chain of TLB misses:
    • keep the TLB miss handler in physical memory (not subject to address translation), or
    • reserve some permanently-valid entries in the TLB and use those permanent translation slots for the handler code itself
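A software-managed TLB miss handler might be structured roughly as follows. This is a hedged sketch only: write_tlb_entry(), raise_exception(), and return_from_trap() stand in for privileged, architecture-specific operations (they are not real APIs), and a simple linear page table is assumed.

    #define SEGMENTATION_FAULT 11                  /* illustrative code number */

    typedef struct { int valid; int prot; unsigned pfn; } PTE;

    extern PTE  *page_table;                       /* page table of the current process */
    extern void  write_tlb_entry(unsigned vpn, unsigned pfn, int prot);
    extern void  raise_exception(int code);
    extern void  return_from_trap(void);           /* re-runs the faulting instruction */

    void tlb_miss_handler(unsigned faulting_vpn) {
        PTE pte = page_table[faulting_vpn];        /* the handler itself must not TLB-miss */
        if (!pte.valid) {
            raise_exception(SEGMENTATION_FAULT);
        } else {
            write_tlb_entry(faulting_vpn, pte.pfn, pte.prot);
            return_from_trap();                    /* retry the instruction that missed */
        }
    }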
TLB Contents
• Typically 32, 64, or 128 entries
• Fully associative: any given translation can be placed anywhere in the TLB, and the hardware searches the entire TLB in parallel to find the desired translation
• An entry looks like: VPN | PFN | other bits (e.g., a valid bit)
• The TLB valid bit ≠ the page table valid bit
  • in the page table, a PTE marked invalid means the page has not been allocated by the process
  • a TLB valid bit indicates whether a TLB entry holds a valid translation
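One way to picture an entry is as a C struct. The field widths below are illustrative only; real widths depend on the architecture (compare the MIPS R4000 entry later in the chapter).

    #include <stdint.h>

    struct tlb_entry {
        uint32_t vpn;          /* virtual page number: the tag searched in parallel */
        uint32_t pfn;          /* physical frame number the page maps to */
        unsigned valid : 1;    /* does this slot hold a real translation? */
        unsigned prot  : 3;    /* protection bits, e.g. read/write/execute */
        unsigned dirty : 1;    /* example of an "other" bit; varies by architecture */
    };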
Context Switch
• The TLB contains virtual-to-physical translations that are valid only for the currently running process; they are not meaningful for other processes
• What to do on a context switch?
  • flush the TLB on every context switch by setting all valid bits to 0
  • this incurs TLB misses after each context switch: what can you do better?
• Add an ASID (address space identifier) field, so the TLB may hold translations from different processes at the same time (a hit then requires both the VPN and the ASID to match; see the sketch below):

    VPN   PFN   valid   prot   ASID
    10    100   1       rwx    1
    —     —     0       —      —
    10    170   1       rwx    2
    —     —     0       —      —

• Two entries pointing at the same PFN show sharing of a page between processes:

    VPN   PFN   valid   prot   ASID
    10    101   1       rwx    1
    —     —     0       —      —
    50    101   1       rwx    2
    —     —     0       —      —
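A minimal sketch of an ASID-aware lookup, with made-up names: a hit requires both the VPN and the ASID of the currently running process to match, so entries from different address spaces can coexist and the TLB need not be flushed on every switch.

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_TLB_ENTRIES 16

    struct asid_tlb_entry { bool valid; uint32_t vpn, pfn; uint8_t asid; };

    static struct asid_tlb_entry tlb[NUM_TLB_ENTRIES];

    bool tlb_lookup_asid(uint32_t vpn, uint8_t current_asid, uint32_t *pfn) {
        for (int i = 0; i < NUM_TLB_ENTRIES; i++) {
            if (tlb[i].valid && tlb[i].vpn == vpn && tlb[i].asid == current_asid) {
                *pfn = tlb[i].pfn;
                return true;                /* entry belongs to this address space */
            }
        }
        return false;                       /* miss, or entry belongs to another process */
    }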
Replacement Policy
• Cache replacement, with the goal of minimizing the miss rate
• Policies:
  • evict the least-recently-used (LRU) entry
    • but consider a loop accessing n + 1 pages with a TLB of size n and an LRU policy: every access misses (see the sketch below)
  • random: simple, and avoids such corner-case pathologies
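The LRU pathology is easy to reproduce in a toy simulation (hypothetical, user-space only): cycling over n + 1 pages with an n-entry LRU TLB always evicts exactly the page that is about to be needed next.

    #include <stdio.h>

    #define N 4                                     /* TLB size n */

    int main(void) {
        int tlb[N], age[N] = {0}, now = 0, misses = 0, accesses = 0;
        for (int i = 0; i < N; i++) tlb[i] = -1;    /* start with an empty TLB */

        for (int round = 0; round < 10; round++) {
            for (int page = 0; page <= N; page++) { /* the loop touches n + 1 pages */
                int hit = -1, lru = 0;
                for (int i = 0; i < N; i++) {
                    if (tlb[i] == page) hit = i;
                    if (age[i] < age[lru]) lru = i; /* least recently used slot */
                }
                if (hit >= 0) age[hit] = ++now;     /* hit: refresh recency */
                else { tlb[lru] = page; age[lru] = ++now; misses++; }
                accesses++;
            }
        }
        printf("%d misses out of %d accesses\n", misses, accesses);  /* 50 of 50 */
        return 0;
    }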
A Real TLB Entry
• Example: the MIPS R4000, which uses a software-managed TLB
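The R4000 entry can be sketched as C bitfields; this is drawn from common descriptions of the chip, the widths are approximate, and it should be treated as a mnemonic rather than a definitive layout (check the processor manual).

    struct r4000_tlb_entry {
        unsigned vpn  : 19;   /* virtual page number (user half of the address space) */
        unsigned g    : 1;    /* global: if set, ignore the ASID when matching */
        unsigned asid : 8;    /* address space identifier */
        unsigned pfn  : 24;   /* physical frame number */
        unsigned c    : 3;    /* cache coherence bits */
        unsigned d    : 1;    /* dirty bit */
        unsigned v    : 1;    /* valid bit */
    };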
Culler's Law
• "The term random-access memory (RAM) implies that you can access any part of RAM just as quickly as another. While it is generally good to think of RAM in this way, because of hardware/OS features such as the TLB, accessing a particular page of memory may be costly, particularly if that page isn't currently mapped by the TLB. Thus, it is always good to remember the implementation tip: RAM isn't always RAM. Sometimes randomly accessing your address space, particularly if the number of pages accessed exceeds the TLB coverage, can lead to severe performance penalties." -- David Culler
• The TLB is the source of many performance problems