Operating Systems & Memory Systems: Address Translation. CPS 220 Professor Alvin R. Lebeck Fall 2001. Outline. Address Translation basics 64-bit Address Space Managing memory OS Performance Throughout Review Computer Architecture Interaction with Architectural Decisions.
Operating Systems & Memory Systems: Address Translation CPS 220 Professor Alvin R. Lebeck Fall 2001
Outline • Address Translation • basics • 64-bit Address Space • Managing memory • OS Performance Throughout • Review Computer Architecture • Interaction with Architectural Decisions CPS 220
System Organization [Figure: processor with cache, connected through the core chip set to main memory and the I/O bus; disk controller (disks), graphics controller (graphics), and network interface (network) sit on the I/O bus; interrupts go to the processor] CPS 220
Computer Architecture • Interface Between Hardware and Software [Figure: software layers (applications, operating system, compiler) above hardware (CPU, memory, I/O, multiprocessor, networks); the boundary between them is marked "This is IT"] CPS 220
Memory Hierarchy 101 • Processor (P): very fast, 1ns clock, multiple instructions per cycle • Cache ($): SRAM, fast, small, expensive • Memory (called physical or main): DRAM, slow, big, cheap • Magnetic storage: really slow, really big, really cheap => Cost Effective Memory System (Price/Performance) CPS 220
Virtual Memory: Motivation • Process = Address Space + thread(s) of control • Address space = PA • programmer controls movement from disk • protection? • relocation? • Linear Address space • larger than physical address space • 32, 64 bits vs. 28-bit physical (256MB) • Automatic management [Figure: virtual address space mapped onto physical memory] CPS 220
Virtual Memory • Process = virtual address space + thread(s) of control • Translation • VA -> PA • What physical address does virtual address A map to • Is VA in physical memory? • Protection (access control) • Do you have permission to access it? CPS 220
Virtual Memory: Questions • How is data found if it is in physical memory? • Where can data be placed in physical memory? Fully Associative, Set Associative, Direct Mapped • What data should be replaced on a miss? (Take CPS210 …) CPS 220
Segmented Virtual Memory • Virtual address (2^32, 2^64) to Physical Address mapping (2^30) • Variable size, base + offset, contiguous in both VA and PA [Figure: virtual segments mapped to contiguous physical regions; address labels 0x1000, 0x0000, 0x1000, 0x6000, 0x2000, 0x9000, 0x11000] CPS 220
Intel Pentium Segmentation [Figure: logical address = segment selector + offset; the selector picks a segment descriptor from the Global Descriptor Table (GDT); segment base address + offset locates the byte in the physical address space] CPS 220
Pentium Segmentation (Continued) • Segment Descriptors • Local and Global • base, limit, access rights • Can define many • Segment Registers • contain segment descriptors (faster than load from mem) • Only 6 • Must load segment register with a valid entry before segment can be accessed • generally managed by compiler, linker, not programmer CPS 220
Paged Virtual Memory • Virtual address (2^32, 2^64) to Physical Address mapping (2^28) • virtual page to physical page frame • Fixed Size units for access control & translation [Figure: virtual address split into virtual page number and offset; virtual pages mapped to physical page frames; address labels 0x1000, 0x0000, 0x1000, 0x6000, 0x2000, 0x9000, 0x11000] CPS 220
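The page-to-frame translation above can be sketched in a few lines. This is a minimal illustration, not anything from the lecture: the 4 KB page size, the dictionary used as a page table, and the names `translate` and `PageFault` are all assumptions.

```python
PAGE_SIZE = 4096                 # assumed 4 KB pages (the slide gives no page size)
OFFSET_BITS = 12                 # log2(PAGE_SIZE)

class PageFault(Exception):
    """Raised when a virtual page has no valid translation."""

def translate(page_table, va):
    """Translate a virtual address using a {virtual page: page frame} dict."""
    vpn = va >> OFFSET_BITS              # virtual page number (upper bits)
    offset = va & (PAGE_SIZE - 1)        # offset is unchanged by translation
    if vpn not in page_table:
        raise PageFault(hex(va))
    return (page_table[vpn] << OFFSET_BITS) | offset

# Hypothetical mapping: virtual page 1 -> frame 6, virtual page 2 -> frame 9
page_table = {0x1: 0x6, 0x2: 0x9}
print(hex(translate(page_table, 0x1234)))   # 0x6234
```

Only the page number is translated; the offset bits pass through, which is why fixed-size pages make translation a simple table lookup.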
Page Table • Kernel data structure (per process) • Page Table Entry (PTE) • VA -> PA translations (if none page fault) • access rights (Read, Write, Execute, User/Kernel, cached/uncached) • reference, dirty bits • Many designs • Linear, Forward mapped, Inverted, Hashed, Clustered • Design Issues • support for aliasing (multiple VA to single PA) • large virtual address space • time to obtain translation CPS 220
Alpha VM Mapping (Forward Mapped) • “64-bit” address divided into 3 segments • seg0 (bit 63 = 0) user code/heap • seg1 (bit 63 = 1, 62 = 1) user stack • kseg (bit 63 = 1, 62 = 0) kernel segment for OS • Three level page table, each one page • Alpha 21064 only 43 unique bits of VA • (future min page size up to 64KB => 55 bits of VA) • PTE bits: valid, kernel & user read & write enable (No reference, use, or dirty bit) • What do you do for replacement? [Figure: virtual address fields seg0/1 (21 bits), L1 (10), L2 (10), L3 (10), page offset PO (13); each level's index is added to a table base, and the last level yields the physical page frame number] CPS 220
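The field widths on the slide (10 + 10 + 10 + 13 = 43 bits) imply a three-step walk. The sketch below splits an address into those fields and walks nested tables; using Python dictionaries for the tables and the names `split_va` and `walk` are illustrative assumptions, and protection bits are left out.

```python
PAGE_OFFSET_BITS = 13                 # 13-bit page offset from the slide (8 KB pages)
LEVEL_BITS = 10                       # 10 index bits per level from the slide
LEVEL_MASK = (1 << LEVEL_BITS) - 1

def split_va(va):
    """Return (L1 index, L2 index, L3 index, page offset) for a virtual address."""
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
    l3 = (va >> PAGE_OFFSET_BITS) & LEVEL_MASK
    l2 = (va >> (PAGE_OFFSET_BITS + LEVEL_BITS)) & LEVEL_MASK
    l1 = (va >> (PAGE_OFFSET_BITS + 2 * LEVEL_BITS)) & LEVEL_MASK
    return l1, l2, l3, offset

def walk(root, va):
    """Walk nested dicts root[l1][l2][l3] -> frame number; None means page fault."""
    l1, l2, l3, offset = split_va(va)
    level2 = root.get(l1)
    if level2 is None:
        return None
    level3 = level2.get(l2)
    if level3 is None:
        return None
    pfn = level3.get(l3)
    if pfn is None:
        return None
    return (pfn << PAGE_OFFSET_BITS) | offset
```

Each level costs one memory reference on a TLB miss, so a three-level table means up to three references before the data access itself.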
Inverted Page Table (HP, IBM) • One PTE per page frame • only one VA per physical frame • Must search for virtual address • More difficult to support aliasing • Force all sharing to use the same VA [Figure: the virtual page number is hashed into the Hash Anchor Table (HAT), which points into the Inverted Page Table (IPT) of VA / PA,ST entries; the offset bypasses the lookup] CPS 220
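A rough sketch of the HAT-plus-IPT search follows. It assumes a single address space (no process ID in the tag), a toy modulo hash, and chaining through a `next` field; none of these details come from the slide, and the class and method names are hypothetical.

```python
class InvertedPageTable:
    """One entry per physical frame; the entry index is the frame number."""

    def __init__(self, num_frames, hat_size):
        self.vpn = [None] * num_frames     # virtual page held by each frame
        self.next = [None] * num_frames    # next frame in the same hash chain
        self.hat = [None] * hat_size       # hash anchor table: bucket -> first frame

    def _bucket(self, vpn):
        return vpn % len(self.hat)         # assumed toy hash function

    def map(self, vpn, frame):
        """Record that `frame` now holds virtual page `vpn`."""
        b = self._bucket(vpn)
        self.vpn[frame] = vpn
        self.next[frame] = self.hat[b]     # push onto the front of the chain
        self.hat[b] = frame

    def lookup(self, vpn):
        """Search the chain for vpn; return its frame, or None on a page fault."""
        frame = self.hat[self._bucket(vpn)]
        while frame is not None:
            if self.vpn[frame] == vpn:
                return frame
            frame = self.next[frame]
        return None
```

Because each frame has exactly one entry and one stored virtual page, a second virtual address for the same frame has nowhere to live, which is the aliasing difficulty the slide points out.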
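Intel Pentium Segmentation + Paging [Figure: logical address (segment selector, offset) selects a segment descriptor in the Global Descriptor Table (GDT); segment base address + offset forms an address in the linear address space, split into Dir, Table, and Offset; Dir indexes the page directory, Table indexes the page table, and the result locates the byte in the physical address space] CPS 220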
The Memory Management Unit (MMU) • Input • virtual address • Output • physical address • access violation (exception, interrupts the processor) • Access Violations • not present • user vs. kernel • write • read • execute CPS 220
Translation Lookaside Buffers (TLB) • Need to perform address translation on every memory reference • 30% of instructions are memory references • 4-way superscalar processor • at least one memory reference per cycle • Make Common Case Fast, others correct • Throw HW at the problem • Cache PTEs CPS 220
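Fast Translation: Translation Buffer • Cache of translated addresses • Alpha 21164 TLB: 48 entry fully associative [Figure: the page number is compared against the tags of 48 entries, each with v, r, w bits, a tag, and a physical frame; a 48:1 mux selects the matching frame, which is combined with the page offset] CPS 220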
TLB Design • Must be fast, not increase critical path • Must achieve high hit ratio • Generally small highly associative • Mapping change • page removed from physical memory • processor must invalidate the TLB entry • PTE is per process entity • Multiple processes with same virtual addresses • Context Switches? • Flush TLB • Add ASID (PID) • part of processor state, must be set on context switch CPS 220
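The ASID option can be illustrated with a small software model of a fully associative TLB. This is only a sketch: the LRU replacement policy, the default of 48 entries, and the method names are assumptions, not a description of any particular processor.

```python
from collections import OrderedDict

class TLB:
    """Fully associative TLB model tagged with an address space ID (ASID)."""

    def __init__(self, entries=48):
        self.entries = entries
        self.map = OrderedDict()           # (asid, vpn) -> pfn, oldest first

    def lookup(self, asid, vpn):
        """Return the cached frame number, or None on a TLB miss."""
        key = (asid, vpn)
        if key not in self.map:
            return None
        self.map.move_to_end(key)          # mark as most recently used
        return self.map[key]

    def insert(self, asid, vpn, pfn):
        """Install a translation, evicting the LRU entry when full (assumed policy)."""
        key = (asid, vpn)
        if key not in self.map and len(self.map) >= self.entries:
            self.map.popitem(last=False)
        self.map[key] = pfn
        self.map.move_to_end(key)

    def invalidate(self, asid, vpn):
        """Drop one translation, e.g. when its page leaves physical memory."""
        self.map.pop((asid, vpn), None)
```

With the ASID in the tag, two processes can use the same virtual page number without a flush on every context switch; the OS only has to set the current ASID.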
Hardware Managed TLBs • Hardware Handles TLB miss • Dictates page table organization • Complicated state machine to “walk page table” • Multiple levels for forward mapped • Linked list for inverted • Exception only if access violation [Figure: CPU, TLB, control, memory] CPS 220
Software Managed TLBs • Software Handles TLB miss • Flexible page table organization • Simple Hardware to detect Hit or Miss • Exception if TLB miss or access violation • Should you check for access violation on TLB miss? CPU TLB Control Memory CPS 220
Mapping the Kernel • Digital Unix Kseg • kseg (bit 63 = 1, 62 = 0) • Kernel has direct access to physical memory • One VA->PA mapping for entire Kernel • Lock (pin) TLB entry • or special HW detection [Figure: virtual address space from 0 to 2^64-1 containing user code/data, kernel, and user stack; the kernel region maps directly onto physical memory] CPS 220
Considerations for Address Translation Large virtual address space • Can map more things • files • frame buffers • network interfaces • memory from another workstation • Sparse use of address space • Page Table Design • space • less locality => TLB misses OS structure • microkernel => more TLB misses CPS 220
Address Translation for Large Address Spaces • Forward Mapped Page Table • grows with virtual address space • worst case 100% overhead not likely • TLB miss time: memory reference for each level • Inverted Page Table • grows with physical address space • independent of virtual address space usage • TLB miss time: memory reference to HAT, IPT, list search CPS 220
Hashed Page Table (HP) • Combine Hash Table and IPT [Huck96] • can have more entries than physical page frames • Must search for virtual address • Easier to support aliasing than IPT • Space • grows with physical space • TLB miss • one less memory ref than IPT [Figure: the virtual page number is hashed directly into the Hashed Page Table (HPT) of VA / PA,ST entries; the offset bypasses the lookup] CPS 220
Clustered Page Table (SUN) • Combine benefits of HPT and Linear [Talluri95] • Store one base VPN (TAG) and several PPN values • virtual page block number (VPBN) • block offset [Figure: virtual address split into VPBN, Boff, and Offset; the VPBN is hashed to a chain of entries, each holding a VPBN tag, a next pointer, and PA0..PA3 with attributes] CPS 220
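The VPBN/Boff split can be sketched as follows. The cluster size of four pages matches the PA0..PA3 fields in the figure; the modulo hash, the Python lists standing in for hash chains, and the class name are illustrative assumptions.

```python
SUBBLOCK = 4    # pages per cluster entry, matching PA0..PA3 in the figure

class ClusteredPageTable:
    """Hashed page table where one tag (VPBN) covers SUBBLOCK adjacent pages."""

    def __init__(self, buckets=64):
        # each bucket is a chain of (vpbn, [frame or None] * SUBBLOCK) entries
        self.buckets = [[] for _ in range(buckets)]

    def _chain(self, vpbn):
        return self.buckets[vpbn % len(self.buckets)]   # assumed toy hash

    def map(self, vpn, pfn):
        vpbn, boff = divmod(vpn, SUBBLOCK)   # block number and offset in block
        for tag, frames in self._chain(vpbn):
            if tag == vpbn:
                frames[boff] = pfn           # reuse the existing cluster entry
                return
        frames = [None] * SUBBLOCK
        frames[boff] = pfn
        self._chain(vpbn).append((vpbn, frames))

    def lookup(self, vpn):
        vpbn, boff = divmod(vpn, SUBBLOCK)
        for tag, frames in self._chain(vpbn):
            if tag == vpbn:
                return frames[boff]          # None if this page is unmapped
        return None
```

Neighbouring pages share one tag and one chain link, so dense regions cost less space per page than a plain hashed table while sparse regions still avoid a full linear table.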
Reducing TLB Miss Handling Time • Problem • must walk Page Table on TLB miss • usually incur cache misses • big problem for IPC in microkernels • Solution • build a small second-level cache in SW • on TLB miss, first check SW cache • use simple shift and mask index to hash table CPS 220
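A minimal sketch of such a software second-level cache, assuming a direct-mapped table of 1024 slots and 8 KB pages; the names `sw_cache`, `sw_index`, and `tlb_miss`, and the `walk_page_table` callback, are hypothetical.

```python
SW_ENTRIES = 1024                 # assumed power-of-two table size
PAGE_SHIFT = 13                   # assumed 8 KB pages

sw_cache = [None] * SW_ENTRIES    # each slot holds (vpn, pfn) or None

def sw_index(va):
    """Shift off the page offset, then mask down to a table index."""
    return (va >> PAGE_SHIFT) & (SW_ENTRIES - 1)

def tlb_miss(va, walk_page_table):
    """Handle a TLB miss: try the software cache first, then the full walk."""
    vpn = va >> PAGE_SHIFT
    i = sw_index(va)
    slot = sw_cache[i]
    if slot is not None and slot[0] == vpn:
        return slot[1]                    # hit: no page table walk needed
    pfn = walk_page_table(vpn)            # slow path (may itself miss in caches)
    sw_cache[i] = (vpn, pfn)              # direct-mapped: overwrite the slot
    return pfn
```

The shift-and-mask index keeps the fast path to a handful of instructions, which matters because the handler runs on every TLB miss.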
Next Time • More TLB issues • Virtual Memory & Caches • Multiprocessor Issues CPS 220
Operating Systems & Memory Systems: Managing the Memory System CPS 220 Professor Alvin R. Lebeck
Review: Address Translation • Map from virtual address to physical address • Page Tables, PTE • va->pa, attributes • forward mapped, inverted, hashed, clustered • Translation Lookaside Buffer • hardware cache of most recent va->pa translation • misses handled in hardware or software • Implications of larger address space • page table size • possibly more TLB misses • OS Structure • microkernels -> lots of IPC -> more TLB misses CPS 220
Cache Memory 102 • Block 7 placed in 4 block cache: • Fully associative, direct mapped, 2-way set associative • S.A. Mapping = Block Number Modulo Number of Sets • DM = 1-way Set Assoc • Cache Frame • location in cache • Bit-selection [Figure: block 7 of main memory placed in a 4-frame cache; DM: 7 mod 4; SA: 7 mod 2 (Set 0, Set 1); FA: any frame] CPS 220
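The placement rule can be checked with a small helper. The function name and the convention that set s occupies frames s*ways through s*ways + ways - 1 are assumptions made for illustration.

```python
def candidate_frames(block, num_frames, ways):
    """Return the cache frames where a memory block may be placed."""
    num_sets = num_frames // ways
    s = block % num_sets                       # bit selection: block mod sets
    return list(range(s * ways, (s + 1) * ways))

print(candidate_frames(7, 4, 1))   # direct mapped: [3]
print(candidate_frames(7, 4, 2))   # 2-way set associative: [2, 3]
print(candidate_frames(7, 4, 4))   # fully associative: [0, 1, 2, 3]
```

Direct mapped and fully associative are the two ends of the same rule: one way per set, or one set holding every frame.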
Cache Indexing • Tag on each block • No need to check index or block offset • Increasing associativity shrinks index, expands tag • Fully Associative: No index • Direct-Mapped: Large index [Figure: address = block address (tag, index) followed by block offset] CPS 220
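How associativity trades index bits for tag bits can be worked out directly. The sketch assumes power-of-two sizes and a 32-bit address by default; the function name and example cache parameters are illustrative.

```python
def address_fields(cache_bytes, block_bytes, ways, addr_bits=32):
    """Return (tag bits, index bits, block offset bits); sizes are powers of two."""
    num_sets = cache_bytes // (block_bytes * ways)
    offset_bits = block_bytes.bit_length() - 1     # log2(block size)
    index_bits = num_sets.bit_length() - 1         # log2(number of sets)
    tag_bits = addr_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits

# An assumed 8 KB cache with 32-byte blocks:
print(address_fields(8192, 32, 1))     # direct mapped: (19, 8, 5)
print(address_fields(8192, 32, 2))     # 2-way: (20, 7, 5)
print(address_fields(8192, 32, 256))   # fully associative: (27, 0, 5)
```

Doubling the associativity halves the number of sets, so one bit moves from the index to the tag each time.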
Address Translation and Caches • Where is the TLB wrt the cache? • What are the consequences? • Most of today’s systems have more than 1 cache • Digital 21164 has 3 levels • 2 levels on chip (8KB-data,8KB-inst,96KB-unified) • one level off chip (2-4MB) • Does the OS need to worry about this? Definition: page coloring = careful selection of va->pa mapping CPS 220
TLBs and Caches • Conventional Organization: CPU sends VA to the TLB, the resulting PA accesses the cache and then memory • Virtually Addressed Cache: CPU sends VA to a cache with VA tags; translate only on miss; Alias (Synonym) Problem • Overlap $ access with VA translation: cache with PA tags is indexed while the TLB translates, backed by a physically addressed L2 $; requires $ index to remain invariant across translation CPS 220
Virtual Caches • Send virtual address to cache. Called Virtually Addressed Cache or just Virtual Cache, vs. Physical Cache or Real Cache • Avoid address translation before accessing cache • faster hit time to cache • Context Switches? • Just like the TLB (flush or pid) • Cost is time to flush + “compulsory” misses from empty cache • Add process identifier tag that identifies process as well as address within process: can’t get a hit if wrong process • I/O must interact with cache CPS 220
I/O and Virtual Caches • I/O is accomplished with physical addresses • DMA • flush pages from cache • need pa->va reverse translation • coherent DMA [Figure: processor with a virtual cache on the memory bus (physical addresses) with main memory; an I/O bridge connects to the I/O bus with disk controller, graphics controller, and network interface; interrupts go to the processor] CPS 220
Aliases and Virtual Caches • Aliases (sometimes called synonyms): two different virtual addresses map to same physical address • But, but... the virtual address is used to index the cache • Could have data in two different locations in the cache [Figure: virtual address space from 0 to 2^64-1 containing user code/data, kernel, and user stack, mapped onto physical memory] CPS 220
Index with Physical Portion of Address • If index is physical part of address, can start tag access in parallel with translation so that can compare to physical tag • Limits cache to page size: what if want bigger caches and use same trick? • Higher associativity • Page coloring [Figure: address viewed as page address + page offset, and as address tag + index + block offset; the index lies within the page offset] CPS 220
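The size limit follows from requiring that index and block offset bits fit inside the page offset, so each way can be at most one page. A small sketch of that arithmetic, with hypothetical function names and example sizes:

```python
def max_parallel_cache_bytes(page_bytes, ways):
    """Largest cache whose index + block offset fit inside the page offset."""
    return page_bytes * ways

def index_within_page_offset(cache_bytes, ways, page_bytes):
    """True if the cache can be indexed with untranslated (page offset) bits only."""
    return cache_bytes // ways <= page_bytes

# With assumed 8 KB pages:
print(index_within_page_offset(8192, 1, 8192))    # True
print(index_within_page_offset(16384, 1, 8192))   # False
print(index_within_page_offset(16384, 2, 8192))   # True
```

This is why higher associativity is one way out: it grows the cache without adding index bits. Page coloring is the other, making the extra index bits agree between virtual and physical addresses.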
Page Coloring for Aliases • HW that guarantees that every cache frame holds unique physical address • OS guarantee: lower n bits of virtual & physical page numbers must have same value; if direct-mapped, then aliases map to same cache frame • one form of page coloring [Figure: address viewed as page address + page offset, and as address tag + index + block offset; the index extends n bits into the page address] CPS 220
Virtual Memory and Physically Indexed Caches • Notion of bin • region of cache that may contain cache blocks from a page • Random vs careful mapping • Selection of physical page frame dictates cache index • Overall goal is to minimize cache misses [Figure: page frames mapped onto regions of the cache] CPS 220
Careful Page Mapping [Kessler92, Bershad94] • Select a page frame such that cache conflict misses are reduced • only choose from available pages (no replacement induced) • static • “smart” selection of page frame at page fault time • dynamic • move pages around CPS 220
Page Coloring • Make physical index match virtual index • Behaves like virtual index cache • no conflicts for sequential pages • Possibly many conflicts between processes • address spaces all have same structure (stack, code, heap) • modify to xor PID with address (MIPS used variant of this) • Simple implementation • Pick arbitrary page if necessary CPS 220
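A sketch of the basic policy, assuming the number of colors is cache size divided by (associativity times page size) and that free frames are kept in a simple list; the function names are hypothetical and the PID xor variant is omitted.

```python
def num_colors(cache_bytes, ways, page_bytes):
    """Number of page-sized bins (colors) in a physically indexed cache."""
    return cache_bytes // (ways * page_bytes)

def color(page_number, colors):
    """Color = low bits of the page number that reach into the cache index."""
    return page_number % colors

def pick_frame(vpn, free_frames, colors):
    """Prefer a free frame whose color matches the virtual page's color."""
    want = color(vpn, colors)
    for pfn in free_frames:
        if color(pfn, colors) == want:
            free_frames.remove(pfn)
            return pfn
    # no matching color available: fall back to an arbitrary free frame
    return free_frames.pop(0) if free_frames else None
```

Sequential virtual pages get sequential colors, so a contiguous virtual region cannot conflict with itself until it exceeds the cache size.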
Bin Hopping • Allocate sequentially mapped pages (time) to sequential bins (space) • Can exploit temporal locality • pages mapped close in time will be accessed close in time • Search from last allocated bin until bin with available page frame • Separate search list per process • Simple implementation CPS 220
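Bin hopping can be sketched as a round-robin search over per-bin free lists. The version below models a single process (the slide keeps a separate search position per process) and starts the search at the bin after the last one used; both choices, and the class name, are assumptions.

```python
class BinHopper:
    """Allocate page frames from successive cache bins for one process."""

    def __init__(self, free_by_bin):
        self.free = free_by_bin      # free_by_bin[b] = list of free frames in bin b
        self.last = -1               # last bin this process allocated from

    def allocate(self):
        """Return (bin, frame) from the next bin with a free frame, else None."""
        n = len(self.free)
        for step in range(1, n + 1):
            b = (self.last + step) % n
            if self.free[b]:
                self.last = b
                return b, self.free[b].pop()
        return None                  # no free frames anywhere
```

No virtual address is consulted at all: pages faulted in close together in time simply land in different bins.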
Best Bin • Keep track of two counters per bin • used: # of pages allocated to this bin for this address space • free: # of available pages in the system for this bin • Bin selection is based on low values of used and high values of free • Low used value • reduce conflicts within the address space • High free value • reduce conflicts between address spaces CPS 220
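The slide does not say how the two counters are combined, so the sketch below picks one plausible rule as an assumption: among bins with a free frame, take the lowest `used` count and break ties by the highest `free` count.

```python
def best_bin(used, free):
    """Pick a bin index given per-bin counters.

    used[b]: pages this address space already has in bin b
    free[b]: free page frames available system-wide in bin b
    """
    candidates = [b for b in range(len(free)) if free[b] > 0]
    if not candidates:
        return None
    # assumed scoring: low `used` first, then high `free`
    return min(candidates, key=lambda b: (used[b], -free[b]))

print(best_bin(used=[2, 0, 0, 1], free=[5, 1, 4, 0]))   # 2
```

This scan is linear in the number of bins, which is the cost the hierarchical variant is meant to reduce.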
Hierarchical • Best bin could be linear in # of bins • Build a tree • internal nodes contain sum of child <used,free> values • Independent of cache size • simply stop at a particular level in the tree CPS 220
Benefit of Static Page Coloring • Reduces cache misses by 10% to 20% • Multiprogramming • want to distribute mapping to avoid inter-address space conflicts CPS 220
Dynamic Page Coloring • Cache Miss Lookaside (CML) buffer [Bershad94] • proposed hardware device • Monitor # of misses per page • If # of misses >> # of cache blocks in page • must be conflict misses • interrupt processor • move a page (recolor) • Cost of moving page << benefit CPS 220
Outline • Page Coloring • Page Size CPS 220
A Case for Large Pages • Page table size is inversely proportional to the page size • memory saved • Fast cache hit time easy when cache <= page size (VA caches) • bigger page makes it feasible as cache size grows • Transferring larger pages to or from secondary storage, possibly over a network, is more efficient • Number of TLB entries is restricted by clock cycle time • larger page size maps more memory • reduces TLB misses CPS 220
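Two of these claims are simple arithmetic. The sketch assumes a linear (one entry per virtual page) page table and defines TLB reach as entries times page size; the function names and example sizes are illustrative.

```python
def linear_pte_count(va_bits, page_bytes):
    """Entries in a linear page table covering a va_bits-bit address space."""
    return (1 << va_bits) // page_bytes

def tlb_reach(entries, page_bytes):
    """Bytes of memory mapped by a full TLB."""
    return entries * page_bytes

# 32-bit address space: 16x larger pages give a 16x smaller table
print(linear_pte_count(32, 4096))     # 1048576
print(linear_pte_count(32, 65536))    # 65536

# 48-entry TLB: reach grows from 384 KB to 3 MB
print(tlb_reach(48, 8192))            # 393216
print(tlb_reach(48, 65536))           # 3145728
```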