310 likes | 425 Views
Virtual Memory. Outline. Pentium/Linux Memory System Core i7 Suggested reading: 9.7. P6 Memory System. 32 bit address space 4 KB page size L1, L2, and TLBs 4-way set associative inst TLB 32 entries 8 sets data TLB 64 entries 16 sets L1 i-cache and d-cache 16 KB
E N D
Outline • Pentium/Linux Memory System • Core i7 • Suggested reading: 9.7
P6 Memory System • 32 bit address space • 4 KB page size • L1, L2, and TLBs • 4-way set associative • inst TLB • 32 entries • 8 sets • data TLB • 64 entries • 16 sets • L1 i-cache and d-cache • 16 KB • 32 B line size • 128 sets • L2 cache • unified • 128 KB -- 2 MB • 32 B line size DRAM external system bus (e.g. PCI) L2 cache cache bus bus interface unit inst TLB data TLB instruction fetch unit L1 i-cache L1 d-cache processor package
Review of Address Translation • Basic Parameters • N = 2n= Virtual address limit • M = 2m = Physical address limit • P = 2p = page size (bytes). • Components of the virtual address (VA) • VPO: Virtual page offset • VPN: Virtual page number • Components of the physical address (PA) • PPO: Physical page offset (same as VPO) • PPN: Physical page number
Review of Address Translation n–1 p p–1 0 virtual address virtual page number page offset TLB tag TLB index address translation m–1 p p–1 0 physical address physical page number page offset Cache tag Cache index Cache offset
Review of Address Translation • Components of the virtual address (VA) • VPO: Virtual page offset • VPN: Virtual page number • TLBI: TLB index • TLBT: TLB tag • Components of the physical address (PA) • PPO: Physical page offset (same as VPO) • PPN: Physical page number • CO: Byte offset within cache line • CI: Cache index • CT: Cache tag
CPU 32 L2 and DRAM result 20 12 virtual address (VA) VPN VPO L1 miss L1 hit 16 4 TLBT TLBI L1 (128 sets, 4 lines/set) TLB hit TLB miss ... ... TLB (16 sets, 4 entries/set) 10 10 VPN1 VPN2 20 7 5 20 12 CT CI CO PPN PPO physical address (PA) PDE PTE Page tables PDBR P6 Address Translation
P6 Page Table Up to 1024 page tables • Page directory • 1024 4-byte page directory entries (PDEs) that point to page tables • one page directory per process. • page directory must be in memory when its process is running • always pointed to by PDBR • Page tables: • 1024 4-byte page table entries (PTEs) that point to pages. • page tables can be paged in and out. 1024 PTEs page directory ... 1024 PDEs 1024 PTEs ... 1024 PTEs
P6 page directory entry (PDE) 31 12 11 9 8 7 6 5 4 3 2 1 0 Page table physical base addr Avail G PS A CD WT U/S R/W P=1 Page table physical base address: 20 most significant bits of physical page table address (forces page tables to be 4KB aligned) Avail: available for system programmers G: global page (don’t evict from TLB on task switch) PS: page size 4K (0) or 4M (1) A: accessed (set by MMU on reads and writes, cleared by software) CD: cache disabled (1) or enabled (0) WT: write-through or write-back cache policy for this page table U/S: user or supervisor mode access R/W: read-only or read-write access P: page table is present in memory (1) or not (0) 31 1 0 Available for OS (page table location in secondary storage) P=0
P6 page table entry (PTE) 31 12 11 9 8 7 6 5 4 3 2 1 0 Page physical base address Avail G 0 D A CD WT U/S R/W P=1 Page base address: 20 most significant bits of physical page address (forces pages to be 4 KB aligned) Avail: available for system programmers G: global page (don’t evict from TLB on task switch) D: dirty (set by MMU on writes) A: accessed (set by MMU on reads and writes) CD: cache disabled or enabled WT: write-through or write-back cache policy for this page U/S: user/supervisor R/W: read/write P: page is present in physical memory (1) or not (0) 31 1 0 Available for OS (page location in secondary storage) P=0
Page tables Translation 10 10 12 Virtual address VPN1 VPN2 VPO word offset into page directory word offset into page table word offset into physical and virtual page page directory page table physical address of page base (if P=1) PTE PDE PDBR physical address of page table base (if P=1) physical address of page directory 20 12 Physical address PPN PPO
CPU 32 L2 and DRAM result 20 12 virtual address (VA) VPN VPO L1 miss L1 hit 16 4 TLBT TLBI L1 (128 sets, 4 lines/set) TLB hit TLB miss ... ... TLB (16 sets, 4 entries/set) 10 10 VPN1 VPN2 20 7 5 20 12 CT CI CO PPN PPO physical address (PA) PDE PTE Page tables PDBR P6 TLB Translation
32 16 1 1 PDE/PTE Tag PD V set 0 entry entry entry entry set 1 entry entry entry entry set 2 entry entry entry entry ... set 15 entry entry entry entry P6 TLB • TLB entry (not all documented, so this is speculative): • V: indicates a valid (1) or invalid (0) TLB entry • PD: is this entry a PDE (1) or a PTE (0)? • tag: disambiguates entries cached in the same set • PDE/PTE: page directory or page table entry • Structure of the data TLB: • 16 sets, 4 entries/set
CPU virtual address 20 12 VPN VPO 16 4 TLBT TLBI 1 2 TLB hit TLB miss PDE PTE 3 ... 20 12 PPN PPO physical address page table translation 4 Translating with the P6 TLB 1. Partition VPN into TLBT and TLBI. 2. Is the PTE for VPN cached in set TLBI? 3. Yes: then build physical address. 4. No: then read PTE (and PDE if not cached) from memory and build physical address.
CPU 32 L2 and DRAM result 20 12 virtual address (VA) VPN VPO L1 miss L1 hit 16 4 TLBT TLBI L1 (128 sets, 4 lines/set) TLB hit TLB miss ... ... TLB (16 sets, 4 entries/set) 10 10 VPN1 VPN2 20 7 5 20 12 CT CI CO PPN PPO physical address (PA) PDE PTE Page tables PDBR P6 Page Table Translation
Translating with the P6 page tables (case 1/1) • Case 1/1: page table and page present. • MMU Action: • MMU build physical address and fetch data word. • OS action • none 20 12 VPN VPO 20 12 VPN1 VPN2 PPN PPO Mem PDE p=1 PTE p=1 data PDBR Data page Page directory Page table Disk
Translating with the P6 page tables (case 1/0) • Case 1/0: page table present but page missing. • MMU Action: • page fault exception • handler receives the following args: • VA that caused fault • fault caused by non-present page or page-level protection violation • read/write • user/supervisor 20 12 VPN VPO VPN1 VPN2 Mem PDE p=1 PTE p=0 PDBR Page directory Page table data Disk Data page
Translating with the P6 page tables (case 1/0, cont) • OS Action: • Check for a legal virtual address. • Read PTE through PDE. • Find free physical page (swapping out current page if necessary) • Read virtual page from disk and copy to virtual page • Restart faulting instruction by returning from exception handler. 20 12 VPN VPO 20 12 VPN1 VPN2 PPN PPO Mem PDE p=1 PTE p=1 data PDBR Data page Page directory Page table Disk
Translating with the P6 page tables (case 0/1) • Case 0/1: page table missing but page present. • Introduces consistency issue. • potentially every page out requires update of disk page table. • Linux disallows this • if a page table is swapped out, then swap out its data pages too. 20 12 VPN VPO VPN1 VPN2 Mem PDE p=0 data PDBR Data page Page directory PTE p=1 Disk Page table
Translating with the P6 page tables (case 0/0) • Case 0/0: page table and page missing. • MMU Action: • page fault exception 20 12 VPN VPO VPN1 VPN2 Mem PDE p=0 PDBR Page directory PTE p=0 data Disk Page table Data page
Translating with the P6 page tables (case 0/0, cont) • OS action: • swap in page table. • restart faulting instruction by returning from handler. • Like case 0/1 from here on. 20 12 VPN VPO VPN1 VPN2 Mem PDE p=1 PTE p=0 PDBR Page table Page directory data Disk Data page
CPU 32 L2 and DRAM result 20 12 virtual address (VA) VPN VPO L1 miss L1 hit 16 4 TLBT TLBI L1 (128 sets, 4 lines/set) TLB hit TLB miss ... ... TLB (16 sets, 4 entries/set) 10 10 VPN1 VPN2 20 7 5 20 12 CT CI CO PPN PPO physical address (PA) PDE PTE Page tables PDBR P6 L1 Cache Access
32 L2 andDRAM data L1 miss L1 hit L1 (128 sets, 4 lines/set) ... 20 7 5 CT CI CO physical address (PA) L1 cache access • Partition physical address into CO, CI, and CT. • Use CT to determine if line containing word at address PA is cached in set CI. • If no: check L2. • If yes: extract word at byte offset CO and return to processor.
Core i7 Summery • Core i7 • The 64-bit Nehalem microarchitecture • Current support 48-bit (256 TB) virtual address space and 52-bit (4 PB) physical address space • Compatibility mode support 32-bit (4 GB) virtual and physical address spaces
Core i7 Memory System Registers Instruction fetch MMU(addr translation) L1 d-TLB 64 entries, 4-way L1 d-cache 32 KB,8-way L1 i-TLB 64 entries, 4-way L1 i-cache 32 KB,8-way L2 unified TLB 512 entries, 4-way Processor package L2 unified cache256 KB, 8-way To othercores QuickPath interconnect4 links @ 25.6 GB/s102.4 GB/s total To I/Obridge L3 unified cache8 MB, 16-way(shared by all cores) DDR3 memory controller3 x 64 bit @ 10.66 GB/s32 GB/s total (shared by all cores) Core x 4 Main memory
32/64 CPU L2, L3, and main memory result Virtual address (VA) 36 12 VPN VPO L1 hit L1 miss 32 4 TLBT TLBI L1 d-cache(64 sets, 8 lines/set) TLB hit TLBmiss ... L1 TLB (16 sets, 4 entries/set) 40 6 6 40 12 PPN PPO CT physical address (PA) Core i7 Address Translation ... 9 9 9 9 VPN1 VPN2 VPN3 VPN4 CI CO CR3 PTE PTE PTE PTE Page Tables
Level 1, Level 2 and Level 3 Page Table Entry 63 62 52 51 12 11 9 8 7 6 5 4 3 2 1 0 XD Unused Page table physical base addr Unused G PS A CD WT U/S R/W P=1 Available for OS (page table location in secondary storage) P=0 XD Disable or enable instruction fetches Base addr40 most significant bits of base address of child page table (forces page tables to be 4KB aligned) G global page (don’t evict from TLB on task switch) PS Page size either 4K(0) or 4M(1) (defined for Level 1 PTEs only) A Reference bit (set by MMU on reads and writes, cleared by software) CD Cache disabled(1) or enabled(0) for child page table WT Write-through or write-back cache policy U/S User or supervisor(kernel) mode access permission R/W Read-only or read-write access permissiom P Child page table present in memory(1) or not(0)
Level 4 Page Table Entry 63 62 52 51 12 11 9 8 7 6 5 4 3 2 1 0 XD Unused Page table physical base addr Unused G D A CD WT U/S R/W P=1 Available for OS (page table location in secondary storage) P=0 XD Disable or enable instruction fetches Base addr40 most significant bits of base address of child page table (forces page tables to be 4KB aligned) G global page (don’t evict from TLB on task switch) D Dirty bit (Set by MMU on writes, cleared by software) A Reference bit (set by MMU on reads and writes, cleared by software) CD Cache disabled(1) or enabled(0) for child page table WT Write-through or write-back cache policy U/S User or supervisor(kernel) mode access permission R/W Read-only or read-write access permissiom P Child page table present in memory(1) or not(0)
Page tables Translation 9 9 9 9 12 VPN 1 VPN 2 VPN 3 VPN 4 VPO Virtual address L1 PTPage global directory L2 PTPage upper directory L3 PTPage middle directory L4 PTPagedirectory 9 9 9 9 CR3 Physical addressof L1 PT 40 40 40 40 Offset into physical and virtual page L1 PTE L1 PTE L1 PTE L1 PTE physical address of page 12 512 GBregionper entry 1 GBregionper entry 2 MBregionper entry 4 KBregionper entry 40 Physical address 40 12 PPO PPO
Cute Trick for Speeding Up L1 Access CT Tag Check • Observation • Bits that determine CI identical in virtual and physical address • Can index into cache while address translation taking place • Generally we hit in TLB, so PPN bits (CT bits) available next • “Virtually indexed, physically tagged” • Cache carefully sized to make this possible 36 6 6 Physical address (PA) CT CI CO PPN PPO No Change Address Translation Virtual address (VA) CI L1 Cache VPN VPO 36 12
Process-specific data structures(e.g. task and mm structs, page tables, kernel stack) Linux Virtual Memory System Different foreach process Kernel virtual memory Physical memory Identical foreach process Kernel code and data User stack %esp Memory mapped regionfor shared libraries Processvirtualmemory brk Run-time heap (via malloc) Uninitialized data (.bss) Initialized data (.data) Program text (.text) x08048000 (32)x40000000 (64)