Computer Architecture: Virtual Memory (VM) – x86
By Dan Tsafrir, 30/5/2011. Presentation based on slides by Lihu Rappoport.
http://www.youtube.com/watch?v=3ye2OXj32DM (funny beginning)
Reminder: VM motivation
• VM provides
  • Illusion of large memory
  • Illusion of contiguity
  • Ability to overcommit memory
  • Process isolation
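Reminder: page table translates VA => PA
• The page table is indexed by virtual page number
• Each entry has a valid bit and points to a physical memory frame or to a disk address
• Think of it as a hash table that maps VA to PA
[Figure: page table with per-entry valid bits, entries pointing into physical memory or to disk]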
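Reminder: TLB accelerates translation
• TLB is a VA => PA cache
• Each TLB entry holds a valid bit, a tag (virtual page number), and a physical page
• The page table behind it holds, per virtual page, a valid bit and a physical page or disk address
[Figure: virtual page number looked up in the TLB, backed by the page table, which points into physical memory or to disk]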
Reminder: VM concepts
• A page can be
  • Not yet loaded
  • Loaded
  • On disk
• A loaded page can be
  • Dirty
  • Clean
• When a page is not loaded (P bit clear), a page fault occurs
  • It may require evicting a loaded page to make room for the new one
  • The OS prioritizes eviction by LRU and the dirty/clean/avail bits
  • A dirty page must be written to disk; a clean page need not be
  • The new page is either loaded from disk or “initialized”
• The CPU sets a page’s “accessed” flag when it is accessed and its “dirty” flag when it is written
Goal
• In the context of x86…
• Provide a method to map
  • From: virtual address (used by program)
  • To: physical address
• Method should be efficient
  • Can generally be exercised by HW alone
  • Typically no SW involvement
Hierarchical translation
• x86 supports 4KB & 4MB pages
  • Q: why would we want a 4MB page (called a “superpage”)?
  • A: the TLB is small, so a larger page lets each TLB entry cover more memory
• Page directory
  • Each process has its own page directory (but threads share)
  • CR3 points to the page directory of the current process
  • Holds 1024 PDEs (page-directory entries), each 32 bits
  • Each PDE contains a PS (“page size”) flag
    • PS=1: PDE points directly to a 4MB (super)page
    • PS=0: PDE points to a “page table” whose entries point to 4KB pages
• Page table
  • Holds 1024 PTEs (page-table entries), each 32 bits
  • Each PTE points to a 4KB page in physical memory
Mapping only 4KB pages (typical)
• 2-level hierarchy
  • All pages are 4KB aligned
  • Total of 2^20 (=1M) 4KB pages = 4GB
• DIR (10 bits, linear-address bits 31:22)
  • Points to a PDE in the page directory
  • We assume all PDEs have PS=0 (no superpaging)
  • => Each PDE provides the 20-bit, 4KB-aligned base physical address of a 4KB page table
• TABLE (10 bits, linear-address bits 21:12)
  • Points to a PTE in the page table
  • The PTE provides the 20-bit, 4KB-aligned base physical address of a 4KB page
• OFFSET (12 bits, linear-address bits 11:0)
  • Offset within the selected 4KB page
[Figure: 32-bit linear address split into DIR (10) | TABLE (10) | OFFSET (12). CR3 (PDBR) holds the 4KB-aligned base of the 4KB, 1K-PDE page directory; the selected PDE gives the 20-bit base of a 4KB, 1K-PTE page table; the selected PTE gives the 20-bit base of the 4KB page; 20 + 12 = 32 physical address bits]
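The two-level walk described above can be sketched in a few lines. This is a toy model under stated assumptions: the page directory and page tables are plain Python dicts (hypothetical stand-ins for physical memory), only the P flag is checked, and the function names are illustrative rather than any real API.

```python
ENTRY_PRESENT = 0x1      # P flag, bit 0 of a PDE/PTE
BASE_MASK = 0xFFFFF000   # bits 31:12, the 20-bit 4KB-aligned base


def split_linear(va):
    """Split a 32-bit linear address into (DIR, TABLE, OFFSET)."""
    directory = (va >> 22) & 0x3FF   # bits 31:22
    table = (va >> 12) & 0x3FF       # bits 21:12
    offset = va & 0xFFF              # bits 11:0
    return directory, table, offset


def translate_4k(va, page_directory, page_tables):
    """Walk a toy 2-level hierarchy and return the physical address.

    page_directory: dict mapping DIR index -> PDE (int)
    page_tables:    dict mapping page-table base -> {TABLE index -> PTE (int)}
    """
    d, t, off = split_linear(va)
    pde = page_directory.get(d, 0)
    if not pde & ENTRY_PRESENT:
        raise LookupError("page fault: PDE not present")
    pte = page_tables[pde & BASE_MASK].get(t, 0)
    if not pte & ENTRY_PRESENT:
        raise LookupError("page fault: PTE not present")
    return (pte & BASE_MASK) | off


# Toy mapping: DIR 1 -> page table at 0x00010000, TABLE 3 -> frame 0x00ABC000
page_directory = {1: 0x00010000 | ENTRY_PRESENT}
page_tables = {0x00010000: {3: 0x00ABC000 | ENTRY_PRESENT}}
print(hex(translate_4k(0x00403ABC, page_directory, page_tables)))
```

The 12 offset bits pass through untranslated; only the upper 20 bits are replaced by the frame base found in the PTE.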
Mapping only 4MB pages
• 1-level hierarchy
  • All pages are 4MB aligned
  • Total of 2^10 (=1K) 4MB pages = 4GB
• DIR (10 bits, linear-address bits 31:22)
  • Points to a PDE in the page directory
  • We assume all PDEs have PS=1
  • => Each PDE provides the 10-bit, 4MB-aligned base physical address of a 4MB page
• TABLE (10 bits)
  • None! (moved to offset)
• OFFSET (22 bits, linear-address bits 21:0)
  • Offset within the selected 4MB page
• Fine print
  • Must set the PSE flag in CR4 for 4MB support to work
  • Otherwise, PS=1 flag settings are ignored
[Figure: 32-bit linear address split into DIR (10) | OFFSET (22). CR3 (PDBR) holds the 4KB-aligned base of the 4KB, 1K-PDE page directory; the selected PDE gives the 10-bit base of the 4MB page; 10 + 22 = 32 physical address bits]
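For comparison with the 4KB case, here is a minimal sketch of the one-level 4MB translation. As before, the page directory is modelled as a hypothetical dict, and the function name is illustrative; CR4.PSE is assumed to be set.

```python
PDE_PRESENT = 0x01           # P flag, bit 0
PDE_PS = 0x80                # PS flag, bit 7
BASE_4M_MASK = 0xFFC00000    # bits 31:22, the 10-bit 4MB-aligned base


def translate_4m(va, page_directory):
    """Translate a 32-bit linear address through a PDE with PS=1."""
    d = (va >> 22) & 0x3FF   # DIR: bits 31:22
    offset = va & 0x3FFFFF   # OFFSET: bits 21:0 (22 bits)
    pde = page_directory.get(d, 0)
    if not pde & PDE_PRESENT:
        raise LookupError("page fault: PDE not present")
    if not pde & PDE_PS:
        raise ValueError("PDE points to a page table, not a 4MB page")
    return (pde & BASE_4M_MASK) | offset


# Toy mapping: DIR 2 -> 4MB page at physical 0x0C000000
page_directory = {2: 0x0C000000 | PDE_PS | PDE_PRESENT}
print(hex(translate_4m(0x00812345, page_directory)))
```

A single memory reference (the PDE) suffices here, versus two (PDE and PTE) for a 4KB page.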
Mixing 4KB & 4MB pages
• Works “out of the box”
  • When CR4.PSE=1
  • Alignment constraints: 4MB for superpages, 4KB for regular pages
• TLB issues?
  • No, as the CPU keeps 4MB and 4KB translations in separate TLBs
• Benefits
  • Superpages are often used for frequently used kernel code
  • Frees up 4KB TLB entries
  • Reduces TLB misses => improves overall system performance
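The benefit can be put in numbers as "TLB reach", the amount of memory the TLB can cover without a miss. A minimal sketch; the entry count of 64 is a hypothetical figure chosen for illustration, not taken from any specific CPU.

```python
KB = 1 << 10
MB = 1 << 20


def tlb_reach(entries, page_size):
    """Bytes of address space covered when every TLB entry maps one page."""
    return entries * page_size


ENTRIES = 64  # hypothetical TLB size
print(tlb_reach(ENTRIES, 4 * KB) // KB, "KB with 4KB pages")
print(tlb_reach(ENTRIES, 4 * MB) // MB, "MB with 4MB pages")
```

With the same number of entries, 4MB pages cover 1024 times more memory than 4KB pages, which is why mapping hot kernel regions with superpages relieves pressure on the 4KB TLB.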
PDE & PTE format
• 20-bit physical address
  • 4K-aligned pointer
• 12 bits of flags
  • Virtual memory
    • Present, accessed, dirty
  • Protection
    • Read, write, user, privileged
  • Caching
    • WB, WT, disable
  • 3 bits for OS usage
Page directory entry (bit layout):
  31:12 Page frame address | 11:9 AVAIL (available for OS use) | 8 0 | 7 0 (page size, 0: 4 Kbyte) | 6 0 | 5 A (accessed) | 4 PCD (cache disable) | 3 PWT (write-through) | 2 U (user) | 1 W (writable) | 0 P (present)
Page table entry (bit layout):
  31:12 Page frame address | 11:9 AVAIL (available for OS use) | 8 0 | 7 0 | 6 D (dirty) | 5 A (accessed) | 4 PCD (cache disable) | 3 PWT (write-through) | 2 U (user) | 1 W (writable) | 0 P (present)
Bits shown as 0 are reserved for future use (should be zero)
4KB-page PTE format
  31:12 Page base address | 11:9 AVAIL (available for OS use) | 8 G (global page) | 7 PAT (page attribute table index) | 6 D (dirty) | 5 A (accessed) | 4 PCD (cache disable) | 3 PWT (write-through) | 2 U/S (user / supervisor) | 1 R/W (writable) | 0 P (present)
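To make the layout concrete, a raw 32-bit PTE can be decoded field by field. This is a small sketch; the function name and the returned dict shape are illustrative choices, not part of any real interface.

```python
# (bit position, flag name) pairs for a 4KB-page PTE, per the layout above
PTE_FLAGS = [
    (0, "P"), (1, "R/W"), (2, "U/S"), (3, "PWT"), (4, "PCD"),
    (5, "A"), (6, "D"), (7, "PAT"), (8, "G"),
]


def decode_pte(pte):
    """Break a 32-bit 4KB-page PTE into base address, AVAIL bits, and set flags."""
    return {
        "base": pte & 0xFFFFF000,        # bits 31:12
        "avail": (pte >> 9) & 0x7,       # bits 11:9, free for OS use
        "flags": [name for bit, name in PTE_FLAGS if pte & (1 << bit)],
    }


print(decode_pte(0x00ABC067))
```

The example value 0x00ABC067 has low bits 0x067, i.e. a present, writable, user-accessible page that has been both accessed and written.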
4KB-page PDE format
  31:12 Page table base address | 11:9 AVAIL (available for OS use) | 8 G (global page, ignored) | 7 PS (page size, 0 indicates 4 Kbytes) | 6 AVL (available) | 5 A (accessed) | 4 PCD (cache disable) | 3 PWT (write-through) | 2 U/S (user / supervisor) | 1 R/W (writable) | 0 P (present)
4MB-page PDE format
  31:22 Page base address | 21:13 Reserved | 12 PAT (page attribute table index) | 11:9 AVAIL (available for OS use) | 8 G (global page) | 7 PS (page size, 1 indicates 4 Mbytes) | 6 D (dirty) | 5 A (accessed) | 4 PCD (cache disable) | 3 PWT (write-through) | 2 U/S (user / supervisor) | 1 R/W (writable) | 0 P (present)
VM attributes: present flag (P)
• Set => page in physical memory
  • Translation is carried out by the MMU (memory management unit)
• Clear => page not in physical memory
  • When encountered by the MMU => generates a page-fault exception
  • Faulting address is available to the SW exception handler (in CR2)
• MMU does not set/clear this flag (only reads it)
  • It’s up to the OS
• Upon page-fault exception => OS typically does the following:
  • Copy page from disk to memory (unless already in buffer cache)
  • Update PTE/PDE with the page’s RAM address
  • P = 1; dirty = accessed = 0; etc.
  • Invalidate the associated PTE in the TLB
  • Resume the program at the faulting instruction
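The OS steps listed above can be sketched as a toy handler. Everything here is a simplified model under stated assumptions: the page table, disk, memory, and TLB are plain dicts, a free frame is assumed to be available (no eviction), and all names are hypothetical.

```python
PRESENT, ACCESSED, DIRTY = 0x01, 0x20, 0x40


def handle_page_fault(page_table, vpn, free_frames, disk, memory, tlb):
    """Toy page-fault handler following the steps on the slide."""
    frame = free_frames.pop()                 # assume a free frame exists
    memory[frame] = disk[vpn]                 # 1. copy page from disk to memory
    flags = page_table.get(vpn, 0) & 0xFFF    # keep protection/caching flags
    flags = (flags | PRESENT) & ~(ACCESSED | DIRTY)   # 3. P=1, A=D=0
    page_table[vpn] = (frame << 12) | flags   # 2. update PTE with RAM address
    tlb.pop(vpn, None)                        # 4. invalidate stale TLB entry
    # 5. on real hardware, returning from the handler re-executes
    #    the faulting instruction


page_table = {5: 0x006}        # R/W and U/S set, P clear
free_frames = [7]
disk = {5: b"page contents"}
memory, tlb = {}, {5: 0x123}
handle_page_fault(page_table, 5, free_frames, disk, memory, tlb)
print(hex(page_table[5]))
```

A real handler would also have to pick a victim page when no frame is free, writing it back first if its dirty flag is set.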
VM attributes: page size flag (PS)
• In PDEs only
• Determines the page size
  • Clear => page size = 4KB (& PDE points to a page table)
  • Set => page size = 4MB (& PDE points to a superpage)
VM attributes: accessed (A) & dirty (D)
• MMU sets the A flag
  • The first time a page (or page table) is accessed (load or store)
• MMU sets the D flag
  • The first time a page is written (store only)
  • Applies to PTEs and to PDEs that map 4MB pages; a PDE that points to a page table has no dirty flag
• A & D are sticky
  • Once set, the MMU (=HW) never clears them
  • Only SW does
• OS clears them
  • When initially loading the PTE
  • Possibly from time to time as part of an LRU approximation (used to decide which pages to swap out and which to keep)
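The division of labour between hardware (sets A and D, never clears them) and the OS (periodically clears A to approximate LRU) can be modelled as follows. This is a simplified sketch with a dict standing in for the page table; the function names are hypothetical, and a real OS would also have to invalidate the matching TLB entries when it clears A.

```python
ACCESSED, DIRTY = 0x20, 0x40


def mmu_touch(page_table, vpn, is_store):
    """Model the MMU: set A on any access, D on stores; never clear either."""
    page_table[vpn] |= ACCESSED
    if is_store:
        page_table[vpn] |= DIRTY


def clock_sweep(page_table):
    """Model the OS: report pages untouched since the last sweep, then clear A."""
    cold = [vpn for vpn, pte in page_table.items() if not pte & ACCESSED]
    for vpn in page_table:
        page_table[vpn] &= ~ACCESSED
    return cold


pt = {0: 0x01, 1: 0x01, 2: 0x01}   # three present pages, A = D = 0
mmu_touch(pt, 0, is_store=False)
mmu_touch(pt, 1, is_store=True)
print(clock_sweep(pt))   # pages not accessed since the last sweep
```

Pages returned by the sweep are the eviction candidates; among them, clean pages are cheaper to evict because they need no write-back.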
VM attributes: global flag (G)
• Has effect only when PGE=1 in CR4
• When set, indicates the page is “global”
  • Not flushed from the TLB when CR3 is loaded
• Ignored for PDEs with PS=0 (that point to page tables)
• Used to improve performance
  • Keeps important OS pages in the TLB across context switches
• Only software can set or clear this flag
Cache attributes: PWT
• PWT
  • Means “page-level write-through”
  • Controls the write-through / write-back caching policy of the page / PT
    • 1: enable write-through caching
    • 0: disable write-through => enable write-back caching
• Ignored if
  • The CD (“cache disable”) flag is set in CR0
  • The associated PCD flag is set
Cache attributes: PCD
• PCD
  • Means “page-level cache disable” flag
  • Controls caching of individual pages / PTs
    • 1: caching of the associated page/PT is prevented
    • 0: caching allowed
• Used
  • When caching doesn’t help performance (e.g., streaming)
  • For memory-mapped I/O used to communicate with devices
• Assumed as set (regardless of actual value)
  • If the CD (“cache disable”) flag in CR0 is set
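The interaction between PWT, PCD, and CR0.CD described on the last two slides reduces to a small decision function. This sketch deliberately covers only those three inputs; it ignores the PAT flag and any other memory-typing mechanism, and the function name and return strings are illustrative.

```python
def cache_policy(pwt, pcd, cr0_cd=False):
    """Caching policy for a page, using only PWT, PCD and CR0.CD."""
    if cr0_cd or pcd:
        return "uncached"            # PCD is treated as set when CR0.CD = 1
    return "write-through" if pwt else "write-back"


for pwt in (0, 1):
    for pcd in (0, 1):
        print(f"PWT={pwt} PCD={pcd} ->", cache_policy(pwt, pcd))
```

Note that PWT only matters in the branch where caching is enabled, matching the "ignored if" conditions listed for PWT.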
Cache attributes: PAT
• PAT
  • Means “page attribute table index” flag
  • If on, used along with the PCD & PWT flags to select an entry in the PAT
  • Which in turn selects the memory type for the page
• The PAT itself is a 64-bit register
  • (Not going into the details)
Protection attributes: R/W & U/S
• Read/write (R/W) flag
  • Specifies read-write privileges for
    • a page (if PTE),
    • a group of pages (if PDE that points to a page table)
  • 0 = read only
  • 1 = read & write
• User/supervisor (U/S) flag
  • Specifies privileges for a page (PTE) or a group of pages (PDE that points to a page table)
  • 0 = supervisor privilege level
  • 1 = user privilege level
• A user-mode access to a supervisor page triggers a page-fault exception
  • Typically resulting in the termination of the program
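Because both the PDE and the PTE carry R/W and U/S flags, an access has to pass both levels. The following is a minimal sketch of that check, assuming supervisor code may read and write any present page (i.e. CR0.WP = 0, an assumption not stated on the slide); the function name is hypothetical.

```python
RW, US = 0x2, 0x4   # R/W is bit 1, U/S is bit 2


def access_allowed(pde, pte, user_mode, is_write):
    """Check R/W and U/S for an access that walks through pde and then pte."""
    if not user_mode:
        return True                 # assumption: CR0.WP = 0
    effective = pde & pte           # a permission holds only if granted at both levels
    if not effective & US:
        return False                # user access to a supervisor page
    if is_write and not effective & RW:
        return False                # user write to a read-only page
    return True


pde = 0x7                           # present, writable, user
print(access_allowed(pde, 0x5, user_mode=True, is_write=False))  # read-only user page
print(access_allowed(pde, 0x5, user_mode=True, is_write=True))
print(access_allowed(pde, 0x3, user_mode=True, is_write=False))  # supervisor page
```

A denied access is reported to the OS as a page fault, the same exception used for non-present pages.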
Misc issues
• Memory aliasing/sharing
  • When two (or more) PDEs point to a common page table
  • When two (or more) PTEs point to a common page
  • But SW must maintain consistency of the accessed & dirty bits in these PDEs & PTEs
• Base address of page directory
  • Physical address of the current page directory is stored in CR3
  • Also called the page-directory base register (PDBR)
  • PDBR is typically reloaded upon task switches
  • Page directory must remain in memory as long as the task is active
PAE – Physical Address Extension
• A 32-bit address imposes a limit
  • Means we can use memory <= 2^32 bytes = 4GB
  • Too small for many systems
• PAE (physical address extension) support
  • Allows access to up to 2^36 bytes of RAM (= 64 GB)
  • But not directly (the linear address remains 32 bits)
• Only applicable when paging is enabled
  • And when PAE is also turned on in CR4
• Supports 4KB and 2MB (rather than 4MB) pages
PAE – Physical Address Extension (cont.)
• Relies on an additional page-directory-pointer table
  • Lies above the page directory in the translation hierarchy
  • Has 4 entries of 64 bits each, to support up to 4 page directories
• PDEs and PTEs are increased to 64 bits to accommodate 36-bit base physical addresses
  • Each 4KB page directory and page table can thus have up to 512 entries
• CR3 contains the page-directory-pointer-table base address
4KB Page Mapping with PAE
• Linear address divided into
  • Page-directory-pointer-table entry
    • Indexed by bits 31:30 of the linear address
    • Provides an offset to one of the 4 entries in the page-directory-pointer table
    • The selected entry provides the base physical address of a page directory
  • Dir (9 bits, bits 29:21): points to a PDE in the page directory
    • PS in the PDE = 0
    • PDE provides a 24-bit, 4KB-aligned base physical address of a page table
  • Table (9 bits, bits 20:12): points to a PTE in the page table
    • PTE provides a 24-bit, 4KB-aligned base physical address of a 4KB page
  • Offset (12 bits, bits 11:0): offset within the selected 4KB page
[Figure: linear address (4K page) split into Dir ptr (2) | DIR (9) | TABLE (9) | OFFSET (12). CR3 (PDPTR) holds the 32-byte-aligned base of the 4-entry page-directory-pointer table, which selects a 512-entry page directory, which selects a 512-entry page table, which selects the 4KB page]
2MB Page Mapping with PAE
• Linear address divided into
  • Page-directory-pointer-table entry
    • Indexed by bits 31:30 of the linear address
    • Provides an offset to one of the 4 entries in the page-directory-pointer table
    • The selected entry provides the base physical address of a page directory
  • Dir (9 bits, bits 29:21): points to a PDE in the page directory
    • PS in the PDE = 1
    • PDE provides a 15-bit, 2MB-aligned base physical address of a 2MB page
  • Offset (21 bits, bits 20:0): offset within the selected 2MB page
[Figure: linear address (2MB page) split into Dir ptr (2) | DIR (9) | OFFSET (21). CR3 (PDPTR) holds the 32-byte-aligned base of the page-directory-pointer table, which selects a page directory, whose PDE selects the 2MB page]
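The two PAE layouts can be compared side by side by splitting the same 32-bit linear address both ways and composing a 36-bit physical address from a final-level entry. A minimal sketch; the function names and the example entry values are hypothetical.

```python
def split_pae_4k(va):
    """(dir_ptr, DIR, TABLE, OFFSET) for a 4KB page under PAE."""
    return (va >> 30) & 0x3, (va >> 21) & 0x1FF, (va >> 12) & 0x1FF, va & 0xFFF


def split_pae_2m(va):
    """(dir_ptr, DIR, OFFSET) for a 2MB page under PAE."""
    return (va >> 30) & 0x3, (va >> 21) & 0x1FF, va & 0x1FFFFF


def phys_4k(pte, offset):
    """36-bit physical address: 24-bit base (bits 35:12) plus 12-bit offset."""
    return (pte & 0xFFFFFF000) | offset


def phys_2m(pde, offset):
    """36-bit physical address: 15-bit base (bits 35:21) plus 21-bit offset."""
    return (pde & 0xFFFE00000) | offset


va = 0xC0603ABC
print(split_pae_4k(va))
print(split_pae_2m(va))
print(hex(phys_4k(0xF00ABC001, va & 0xFFF)))   # resulting address lies above 4GB
```

The index fields shrink from 10 to 9 bits because each 4KB table now holds 512 eight-byte entries instead of 1024 four-byte ones; the two freed bits select one of the four page directories.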
PTE/PDE/PDP Entry Format with PAE
• The major differences in these entries are as follows:
  • A page-directory-pointer-table entry is added
  • The size of the entries is increased from 32 bits to 64 bits
  • The maximum number of entries in a page directory or page table is 512
  • The base physical address field in each entry is extended to 24 bits
Paging in 64 bit Mode
• PAE paging structures expanded
  • Potentially support mapping a 64-bit linear address to a 52-bit physical address
  • First implementation supports mapping a 48-bit linear address into a 40-bit physical address
• A 4th page mapping table added: the page map level 4 table (PML4)
  • The base physical address of the PML4 is stored in CR3
  • A PML4 entry contains the base physical address of a page-directory-pointer table
• The page-directory-pointer table is expanded to 512 eight-byte entries
  • Indexed by 9 bits of the linear address
• The size of the PDE/PTE tables remains 512 eight-byte entries
  • Each indexed by nine linear-address bits
• The four 9-bit indexes plus the 12-bit page offset use 48 linear-address bits in total
• PS flag in PDEs selects between 4-KByte and 2-MByte page sizes
  • CR4.PSE bit is ignored
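4KB Page Mapping in 64 bit Mode
• Linear address (4K page) divided into
  • Sign extension: bits 63:48
  • PML4 (9 bits): bits 47:39
  • PDP (9 bits): bits 38:30
  • DIR (9 bits): bits 29:21
  • TABLE (9 bits): bits 20:12
  • OFFSET (12 bits): bits 11:0
• CR3 supplies the 4KB-aligned physical base address of the 512-entry PML4 table (40-bit physical address)
[Figure: CR3 -> 512-entry PML4 table -> 512-entry page-directory-pointer table -> 512-entry page directory -> 512-entry page table -> 4KB page. Pointer widths as labeled in the figure: PML4 entry 31, PDP entry 31, PDE 31, PTE 28, CR3 40]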
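The field boundaries above translate directly into shifts and masks. The sketch below also checks the "sign ext." requirement, taking it to mean that bits 63:48 must all equal bit 47 (a so-called canonical address); function names are illustrative.

```python
def is_canonical(va):
    """True if bits 63:47 of a 64-bit address are all 0 or all 1."""
    top = (va >> 47) & 0x1FFFF      # 17 bits: bit 47 plus the 16 sign-extension bits
    return top == 0 or top == 0x1FFFF


def split_long_mode_4k(va):
    """(PML4, PDP, DIR, TABLE, OFFSET) indexes of a 48-bit linear address."""
    return (
        (va >> 39) & 0x1FF,   # bits 47:39
        (va >> 30) & 0x1FF,   # bits 38:30
        (va >> 21) & 0x1FF,   # bits 29:21
        (va >> 12) & 0x1FF,   # bits 20:12
        va & 0xFFF,           # bits 11:0
    )


for va in (0x00007FFFFFFFF000, 0xFFFF800000000000, 0x0000800000000000):
    print(hex(va), is_canonical(va), split_long_mode_4k(va))
```

For a 2MB page, the TABLE field disappears and bits 20:0 become the offset, exactly as in the PAE case.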
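2MB Page Mapping in 64 bit Mode
• Linear address (2M page) divided into
  • Sign extension: bits 63:48
  • PML4 (9 bits): bits 47:39
  • PDP (9 bits): bits 38:30
  • DIR (9 bits): bits 29:21
  • OFFSET (21 bits): bits 20:0
• CR3 supplies the 4KB-aligned physical base address of the 512-entry PML4 table (40-bit physical address)
[Figure: CR3 -> 512-entry PML4 table -> 512-entry page-directory-pointer table -> 512-entry page directory -> 2MB page. Pointer widths as labeled in the figure: PML4 entry 31, PDP entry 31, PDE 19, CR3 40]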
TLBs
• The processor saves the most recently used PDEs and PTEs in TLBs
  • Separate TLBs for data and for instructions
  • Separate TLBs for 4-KByte and 2/4-MByte page sizes
• An OS running at privilege level 0 can invalidate TLB entries
  • The INVLPG instruction invalidates a specific PTE in the TLB
  • This instruction ignores the setting of the G flag
• Whenever a PDE/PTE is changed (including when the present flag is set to zero), the OS must invalidate the corresponding TLB entry
• All (non-global) TLB entries are automatically invalidated when CR3 is loaded
• The global (G) flag prevents frequently used pages from being automatically invalidated on a task switch
  • The entry remains in the TLB indefinitely
  • A global entry is removed only by explicit invalidation, e.g. INVLPG (or by changing the PGE flag in CR4)
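The invalidation rules can be captured in a toy software model. This is only a sketch of the behaviour described on the slide: the class and method names are hypothetical, capacity and replacement are not modelled, and CR4.PGE is assumed to be set so that the G flag is honoured.

```python
class ToyTLB:
    """Tiny model of TLB invalidation rules (no capacity limit)."""

    def __init__(self):
        self.entries = {}   # vpn -> (frame, is_global)

    def insert(self, vpn, frame, is_global=False):
        self.entries[vpn] = (frame, is_global)

    def lookup(self, vpn):
        """Return the cached frame, or None on a TLB miss."""
        entry = self.entries.get(vpn)
        return None if entry is None else entry[0]

    def load_cr3(self):
        """Task switch: drop every non-global entry."""
        self.entries = {v: e for v, e in self.entries.items() if e[1]}

    def invlpg(self, vpn):
        """Invalidate one entry, regardless of its G flag."""
        self.entries.pop(vpn, None)


tlb = ToyTLB()
tlb.insert(0x10, 0x200)                       # user page
tlb.insert(0xC0000, 0x001, is_global=True)    # kernel page marked global
tlb.load_cr3()
print(tlb.lookup(0x10), tlb.lookup(0xC0000))  # user entry gone, global entry kept
tlb.invlpg(0xC0000)
print(tlb.lookup(0xC0000))
```

Keeping kernel translations across `load_cr3` is what saves the TLB refill cost after every context switch.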