
Computer Structure X86 Virtual Memory and TLB

Computer Structure X86 Virtual Memory and TLB. Franck Sala Slides from Lihu and Adi’s Lecture. Virtual Memory. Provides the illusion of a large memory Different machines have different amount of physical memory Allows programs to run regardless of actual physical memory size


Presentation Transcript


  1. Computer Structure: X86 Virtual Memory and TLB • Franck Sala • Slides from Lihu and Adi’s Lecture

  2. Virtual Memory • Provides the illusion of a large memory • Different machines have different amounts of physical memory • Allows programs to run regardless of actual physical memory size • The amount of memory consumed by each process is dynamic • Allows adding memory as needed • Many processes can run on a single machine • Provides each process its own memory space • Prevents a process from accessing the memory of other processes running on the same machine • Allows the sum of the memory spaces of all processes to be larger than physical memory • Basic terminology • Virtual Address Space: address space used by the programmer • Physical Address: actual physical memory address space

  3. Virtual Memory: Basic Idea • Divide memory (virtual and physical) into fixed size blocks • Pages in Virtual space, Frames in Physical space • Page size = Frame size • Page size is a power of 2: page size = 2^k • All pages in the virtual address space are contiguous • Pages can be mapped into physical Frames in any order • Some of the pages are in main memory (DRAM), some of the pages are on disk • All programs are written using Virtual Memory Address Space • The hardware does on-the-fly translation between virtual and physical address spaces • Use a Page Table to translate between Virtual and Physical addresses

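  4. Virtual to Physical Address translation • [Figure: the Virtual Address is split into a Virtual Page Number (bits 47:12) and a Page offset (bits 11:0); the page table base register and the Virtual Page Number select a page table entry holding Phy. page #, V (valid bit), D (dirty bit) and AC (access control); the Physical Address is the Physical Page Number (bits 38:12) followed by the unchanged Page offset (bits 11:0)] • Access Control • Memory type (WB, WT, UC, WP …) • User / Supervisor • Dirty bit • Valid bit • Page size: 2^12 bytes = 4K bytes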
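The translation on this slide can be sketched in a few lines. This is a minimal illustration, assuming a flat, single-level page table modelled as a Python dict; the names `translate`, `PageFault` and `demo_table`, and the mapping used, are hypothetical and not from the slides.

```python
PAGE_SHIFT = 12                 # 4 KB pages, as on the slide
PAGE_SIZE = 1 << PAGE_SHIFT
VA_BITS = 48                    # virtual address bits 47:0


class PageFault(Exception):
    """Raised when the valid bit of the selected entry is 0."""


def translate(va, page_table):
    """Translate a virtual address with a flat page table.

    page_table maps VPN -> (valid, frame_number). This dict layout is an
    illustrative assumption, not the hardware entry format.
    """
    assert 0 <= va < (1 << VA_BITS)
    vpn = va >> PAGE_SHIFT              # bits 47:12 select the entry
    offset = va & (PAGE_SIZE - 1)       # bits 11:0 pass through unchanged
    valid, frame = page_table.get(vpn, (False, 0))
    if not valid:
        raise PageFault(hex(va))
    return (frame << PAGE_SHIFT) | offset


# Hypothetical mapping: virtual page 0x12345 -> physical frame 0xABC
demo_table = {0x12345: (True, 0xABC)}
```

Only the page number is replaced; the 12 offset bits are copied as they are, which is why the page size must be a power of 2.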

  5. Translation Look aside Buffer (TLB) • Page table resides in memory → each translation requires an extra memory access • TLB caches recently used PTEs • speed up translation • typically 128 to 256 entries, 4 to 8 way associative • TLB Indexing • The Virtual page number is split into Tag and Set; the Offset is not translated • On a TLB miss • Page Miss Handler (HW PMH) gets PTE from memory • [Figure: Virtual Address → Access TLB → TLB Hit? Yes: Physical Address; No: Access Page Table in memory]
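The set/tag indexing and the miss path can be sketched as a small software model. This is a sketch under stated assumptions: LRU replacement inside a set is assumed (the slide does not name a policy), the class name `TLB` and the `walk` callback standing in for the PMH are hypothetical, and the default geometry (32 sets, 4 ways) is just one point in the range the slide quotes.

```python
from collections import OrderedDict


class TLB:
    """Set-associative TLB model with LRU replacement inside each set."""

    def __init__(self, num_sets=32, ways=4):   # 128 entries, 4-way
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn, walk):
        """Return the frame for vpn; call walk(vpn) on a miss."""
        index = vpn % self.num_sets      # low VPN bits select the set
        tag = vpn // self.num_sets       # remaining VPN bits form the tag
        entries = self.sets[index]
        if tag in entries:
            self.hits += 1
            entries.move_to_end(tag)     # mark as most recently used
            return entries[tag]
        self.misses += 1
        frame = walk(vpn)                # stands in for the PMH page walk
        if len(entries) >= self.ways:
            entries.popitem(last=False)  # evict the least recently used tag
        entries[tag] = frame
        return frame
```

With a tiny configuration such as `TLB(num_sets=2, ways=1)`, two pages that share a set evict each other, which makes the conflict behaviour easy to observe.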

  6. Virtual Memory And Cache • [Figure: Virtual Address → Access TLB → TLB Hit? No → STLB Hit? No → Page Walk: get PTE from memory hierarchy; once translated, Physical Address → Access Cache → L1 Cache Hit? No → L2 Cache Hit? No → Access Memory; Yes → Data] • TLB access is serial with cache access • Page table entries are cached in L1 D$, L2$ and L3$ as data

  7. Glossary • Page Fault • Page is not in main memory • Context switch • Demand load policy • Access control / Protection fault • Context switch • Page Aliasing • DLL, shared code, large malloc • Page swap out • Snoop-invalidate for each byte in the cache (both L1 and L2) • All data in the caches which belongs to that page is invalidated • The page on the disk is up-to-date • If the TLB hits for the swapped-out page, the TLB entry is invalidated • In the page table • The valid bit in the PTE of the swapped-out page is set to 0 • All the remaining bits in the PTE may be used by the operating system to keep the location of the page on the disk

  8. 32bit Mode: 4KB / 4MB Page Mapping • 2-level hierarchical mapping: Page Directories and Page tables • 4KB aligned • PDE • Present (0 = page fault) • Page size (4KB or 4MB) • CR4.PSE=1 → both 4MB & 4KB pages supported • Separate TLBs • [Figure, Linear Address Space (4K Page): linear address = DIR (10 bits) | TABLE (10 bits) | OFFSET (12 bits); CR3 (PDBR) → 1K entry Page Directory → PDE → 1K entry Page Table → PTE → 4K Page data; pointers are 20 bits, 20+12=32 (4K aligned)] • [Figure, Linear Address Space (4MB Page): linear address = DIR (10 bits) | OFFSET (22 bits); CR3 (PDBR) → Page Directory → PDE → 4MByte Page data; CR3 pointer is 20 bits, 20+12=32 (4K aligned)]
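The two linear-address layouts on this slide can be written out as bit slicing. This is a small sketch; the function names `split_4k` and `split_4m` are hypothetical, and the field widths (10/10/12 and 10/22) are the ones shown on the slide.

```python
def split_4k(linear):
    """4 KB page: DIR [31:22], TABLE [21:12], OFFSET [11:0]."""
    return (linear >> 22) & 0x3FF, (linear >> 12) & 0x3FF, linear & 0xFFF


def split_4m(linear):
    """4 MB page: DIR [31:22], OFFSET [21:0]."""
    return (linear >> 22) & 0x3FF, linear & 0x3FFFFF
```

For a 4 MB page the TABLE bits simply become part of the offset, so the walk stops at the PDE.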

  9. 32bit Mode: PDE and PTE Format • Page Directory Entry (4KB page table), bits: [31:12] Page Frame Address | [11:9] AVAIL (Available for OS Use) | [8] G (Global) | [7] Page Size (0: 4 Kbyte) | [6] - | [5] A (Accessed) | [4] PCD (Cache Disable) | [3] PWT (Write-Through) | [2] U (User / Supervisor) | [1] W (Writable) | [0] P (Present) • Page Table Entry, bits: [31:12] Page Frame Address | [11:9] AVAIL (Available for OS Use) | [8] G (Global) | [7] PAT | [6] D (Dirty) | [5] A (Accessed) | [4] PCD (Cache Disable) | [3] PWT (Write-Through) | [2] U (User / Supervisor) | [1] W (Writable) | [0] P (Present) • 20 bit pointer to a 4K aligned address • Virtual memory • Present • Accessed • Dirty (in PTE only) • Page size (in PDE only) • Global • Protection • Writable (R#/W) • User / Supervisor # • 2 levels/type only • Caching • Page WT • Page Cache Disabled • PAT – PT Attribute Index • 3 bits available for OS usage
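The PTE bit positions can be exercised with a short decoder. This is an illustrative sketch for the 32-bit (non-PAE) 4 KB PTE only; the helper name `decode_pte` and the returned dict layout are assumptions made for the example.

```python
# Bit position -> flag name for a 32-bit (non-PAE) 4 KB page table entry.
PTE_FLAGS = {
    0: "P", 1: "W", 2: "U", 3: "PWT", 4: "PCD",
    5: "A", 6: "D", 7: "PAT", 8: "G",
}


def decode_pte(pte):
    """Split a 32-bit PTE into frame address, OS-available bits and flags."""
    flags = [name for bit, name in PTE_FLAGS.items() if pte & (1 << bit)]
    return {
        "frame_address": pte & 0xFFFFF000,   # bits 31:12, 4 KB aligned
        "avail": (pte >> 9) & 0x7,           # bits 11:9, free for the OS
        "flags": flags,
    }
```

For example, the low bits `0x067` decode to Present, Writable, User, Accessed and Dirty.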

  10. 4KB Page Mapping with PAE • Physical address is extended to M bits / Linear address remains 32 bits • 1995: Pentium Pro… • [Figure, without PAE, Linear Address Space (4K Page): linear address = DIR (10 bits) | TABLE (10 bits) | OFFSET (12 bits); CR3 (PDBR), 20+12=32 (4K aligned) → 1K entry Page Directory → PDE → 1K entry Page Table → PTE → 4K Page data; 32-bit entries with 20-bit pointers] • [Figure, with PAE, Linear Address Space (4K Page): linear address = Dir ptr [31:30] (2 bits) | DIR [29:21] (9 bits) | TABLE [20:12] (9 bits) | OFFSET [11:0] (12 bits); CR3 (PDPTR), 32 (32B aligned) → 4 entry Page Directory Pointer Table → Dir ptr entry → 512 entry Page Directory → PDE → 512 entry Page Table → PTE → 4KByte Page data; 64-bit entries with (M-12)-bit pointers]

  11. 2MB Page Mapping with PAE • Physical address is extended to M bits / Linear address remains 32 bits • [Figure, without PAE, Linear Address Space (4MB Page): linear address = DIR (10 bits) | OFFSET (22 bits); CR3 (PDBR), 20+12=32 (4K aligned) → Page Directory → PDE (32 bits) → 4MByte Page data] • [Figure, with PAE, Linear Address Space (2MB Page): linear address = Dir ptr [31:30] (2 bits) | DIR [29:21] (9 bits) | OFFSET [20:0] (21 bits); CR3 (PDPTR), 32 (32B aligned) → Page Directory Pointer Table → Dir ptr entry (64 bits, (M-12)-bit pointer) → Page Directory → PDE (64 bits, (M-21)-bit pointer) → 2MByte Page data]

  12. 4KB Page Mapping in 64 bit Mode • 2003: AMD Opteron… • PML4: Page Map Level 4 • PDP: Page Directory Pointer • [Figure, Linear Address Space (4K Page): linear address = sign ext. [63:48] | PML4 [47:39] (9 bits) | PDP [38:30] (9 bits) | DIR [29:21] (9 bits) | TABLE [20:12] (9 bits) | OFFSET [11:0] (12 bits); CR3 (PDPTR), 40 (4KB aligned) → 512 entry PML4 Table → PML4 entry → 512 entry Page Directory Pointer Table → PDP entry → 512 entry Page Directory → PDE → 512 entry Page Table → PTE → 4KByte Page data; 64-bit entries with (M-12)-bit pointers] • 256 TB of virtual memory (2^48) • 1 TB of physical memory (2^40)
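The 64-bit layout above can be checked with a small field extractor. This is a sketch assuming the 9/9/9/9/12 split shown on the slide; the function names `split_long_mode` and `is_canonical` are hypothetical.

```python
def split_long_mode(va):
    """Split a 64-bit linear address into the four table indices and offset."""
    return {
        "pml4": (va >> 39) & 0x1FF,    # bits 47:39
        "pdp": (va >> 30) & 0x1FF,     # bits 38:30
        "dir": (va >> 21) & 0x1FF,     # bits 29:21
        "table": (va >> 12) & 0x1FF,   # bits 20:12
        "offset": va & 0xFFF,          # bits 11:0
    }


def is_canonical(va):
    """Bits 63:48 must all equal bit 47 (the sign extension field)."""
    top = va >> 47                     # bits 63:47, 17 bits in total
    return top == 0 or top == 0x1FFFF
```

Each index is 9 bits because a 4 KB table holds 512 entries of 64 bits.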

  13. 2MB Page Mapping in 64 bit Mode • [Figure, Linear Address Space (2M Page): linear address = sign ext. [63:48] | PML4 [47:39] (9 bits) | PDP [38:30] (9 bits) | DIR [29:21] (9 bits) | OFFSET [20:0] (21 bits); CR3 (PDPTR), 40 (4KB aligned) → 512 entry PML4 Table → PML4 entry (M-12) → 512 entry Page Directory Pointer Table → PDP entry (M-12) → 512 entry Page Directory → PDE (M-21) → 2MByte Page data]

  14. 1GB Page Mapping in 64 bit Mode • [Figure, Linear Address Space (1G Page): linear address = sign ext. [63:48] | PML4 [47:39] (9 bits) | PDP [38:30] (9 bits) | OFFSET [29:0] (30 bits); CR3 (PDPTR), 40 (4KB aligned) → 512 entry PML4 Table → PML4 entry (M-12) → 512 entry Page Directory Pointer Table → PDP entry (M-30) → 1GByte Page data]

  15. TLBs • The processor saves most recently used PDEs and PTEs in TLBs • Separate TLB for data and instruction caches • Separate TLBs for 4KB and 2/4MB page sizes

  16. Question 1 • We have a core similar to X86 • 64 bit mode • Supports Small Pages (PTE) and Large Pages (DIR) • Page table size in each hierarchy is the size of a small page • Entry size in the Page Table is 16 bytes, in all the hierarchies • Address layout: sign ext. [63:N4+1] | PML4 [N4:N3+1] | PDP [N3:N2+1] | DIR [N2:N1+1] | TABLE [N1:12] | OFFSET [11:0] • What is the size of a small page? 12 bits in the offset field → 2^12 B = 4KB • How many entries are in each Page Table? Page Table size = Page Size = 4KB, PTE = 16B → 4KB / 16B = 2^12 / 2^4 = 2^8 = 256 entries in each Page Table

  17. Question 1 • 64 bit (large & small) • PT size = Page size = 4KB • PTE = 16B • Page Table: 256 entries • Address layout: sign ext. [63:N4+1] | PML4 [N4:N3+1] | PDP [N3:N2+1] | DIR [N2:N1+1] | TABLE [N1:12] | OFFSET [11:0] • What are the values of N1, N2, N3 and N4? Since we have 256 entries in each table, we need 8 bits to address them • Table [19:12] → N1 = 19 • DIR [27:20] → N2 = 27 • PDP [35:28] → N3 = 35 • PML4 [43:36] → N4 = 43 • What is the size of a large page? Large pages are pointed to by DIR, so the large page offset is 20 bits [19:0] → large page size: 2^20 B = 1MB • We can also say: DIR can point to 256 pages of 4KB = 1MB
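The arithmetic of Question 1 can be reproduced mechanically. This is a minimal sketch; the function name `paging_layout` and its parameters are hypothetical, while the inputs (4 KB tables, 16-byte entries, four levels) come from the question.

```python
def paging_layout(page_size=4096, entry_size=16, levels=4):
    """Return (entries per table, top bit of each index field, large page size).

    The top bits are listed from the lowest level up: N1, N2, N3, N4.
    """
    offset_bits = page_size.bit_length() - 1      # 4096 -> 12
    entries = page_size // entry_size             # 4096 / 16 = 256
    index_bits = entries.bit_length() - 1         # 256 -> 8
    tops = [offset_bits + index_bits * (i + 1) - 1 for i in range(levels)]
    large_page = 1 << (offset_bits + index_bits)  # TABLE bits join the offset
    return entries, tops, large_page
```

Calling `paging_layout()` gives 256 entries, field tops 19, 27, 35, 43 and a 1 MB large page, matching the answers on the slide.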

  18. Question 1 • 64 bit (large & small) • PT size = Page size = 4KB • Large page size = 1 MB • PTE = 16B • Page Table: 256 entries • Address layout: sign ext. [63:44] | PML4 [43:36] | PDP [35:28] | DIR [27:20] | TABLE [19:12] | OFFSET [11:0] • We access a sequence of virtual addresses. For each address, what is the minimal number of tables that were added in all the hierarchies? • See next foil in presentation mode…

  19. Question 1: sequence of allocations • Address layout: sign ext. [63:44] | PML4 [43:36] (8 bits) | PDP [35:28] (8 bits) | DIR [27:20] (8 bits) | TABLE [19:12] (8 bits) | OFFSET [11:0] (12 bits) • 8 bits instead of 9 for example purposes only • CR3 points to the PML4 Table • 00000D382FFE2B25 (PML4=D3, PDP=82, DIR=FF, PTE=E2, offset=B25): 3 new tables + 1 page • 00000D382FFE2349 (PML4=D3, PDP=82, DIR=FF, PTE=E2, offset=349): 0 new tables • 00000D382FF68937 (PML4=D3, PDP=82, DIR=FF, PTE=68, offset=937): 0 new tables + 1 page • 00000D3822849C00 (PML4=D3, PDP=82, DIR=28, PTE=49, offset=C00): 1 new table + 1 page • 000007146B554622 (PML4=71, PDP=46, DIR=B5, PTE=54, offset=622): 3 new tables + 1 page
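The allocation counts for this sequence can be reproduced with a toy model. This is a sketch that represents each table as a nested Python dict and assumes the PML4 table already exists because CR3 points to it; `make_allocator` and `touch` are hypothetical names.

```python
def make_allocator(index_bits=8, offset_bits=12, levels=4):
    """Build a touch(va) function that tracks which tables and pages exist."""
    root = {}   # the PML4 table; assumed to exist already (CR3 points to it)

    def touch(va):
        """Return (new_tables, new_pages) caused by accessing va."""
        new_tables = new_pages = 0
        node = root
        for level in range(levels - 1, -1, -1):   # PML4 first, TABLE last
            shift = offset_bits + level * index_bits
            index = (va >> shift) & ((1 << index_bits) - 1)
            if index not in node:
                node[index] = {}
                if level == 0:
                    new_pages += 1     # a PTE was filled: one data page
                else:
                    new_tables += 1    # a lower-level table was created
            node = node[index]
        return new_tables, new_pages

    return touch
```

Feeding the five addresses from the slide into one `touch` function, in order, yields (3, 1), (0, 0), (0, 1), (1, 1) and (3, 1).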

  20. Caches and Translation Structures • [Figure: Core (on-die): L1 Inst. cache with Inst. TLB, L1 data cache with Data TLB, L2, L3; Platform: Memory. Instruction bytes and load entries go through translation. The PMH (Page Walk Logic) uses the STLB (PTE, indexed by VA[47:12]), the PDE cache (PDE entry, VA[47:21]), the PDP cache (PDP entry, VA[47:30]) and the PML4 cache (PML4 entry, VA[47:39])]

  21. Page Walk • [Figure: linear address fields SIGN EXT. | PML4 | PDP | DIR | TABLE | OFFSET, with field boundaries marked at bits 48, 39, 30, 21, 12 and 0]

  22. Question 2 • Processor similar to X86 – 64 bits • Pages of 4KB • The processor has a TLB • TLB Hit: we get the translation with no need to access the translation tables • TLB Miss: the processor has to do a Page Walk • The hardware that does the Page Walk (PMH) contains a cache for each of the translation tables • All Caches and TLB are empty on Reset • For the sequence of memory accesses below, how many accesses are needed for the translations? • Address layout: sign ext. [63:44] | PML4 [43:36] | PDP [35:28] | DIR [27:20] | TABLE [19:12] | OFFSET [11:0]
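The access sequence for this question did not survive in the transcript, so the following is only a sketch of how such a count could be modelled. It assumes unlimited-capacity TLB and PMH caches, one cache each for PML4, PDP and PDE entries keyed by the matching upper address bits, and a walk that starts just below the deepest cached level; `make_walker` and `translation_accesses` are hypothetical names.

```python
def make_walker():
    """Count page-table memory accesses per translation (assumed model)."""
    tlb, pde_cache, pdp_cache, pml4_cache = set(), set(), set(), set()

    def translation_accesses(va):
        va &= (1 << 44) - 1                # drop the sign extension [63:44]
        vpn, pde_key = va >> 12, va >> 20
        pdp_key, pml4_key = va >> 28, va >> 36
        if vpn in tlb:
            return 0                       # TLB hit: no table access
        if pde_key in pde_cache:
            count = 1                      # read only the PTE
        elif pdp_key in pdp_cache:
            count = 2                      # read PDE and PTE
        elif pml4_key in pml4_cache:
            count = 3                      # read PDP entry, PDE and PTE
        else:
            count = 4                      # full walk from the PML4 table
        tlb.add(vpn)
        pde_cache.add(pde_key)
        pdp_cache.add(pdp_key)
        pml4_cache.add(pml4_key)
        return count

    return translation_accesses
```

As a usage example under these assumptions, the five addresses 0x00000D382FFE2B25, 0x00000D382FFE2349, 0x00000D382FF68937, 0x00000D3822849C00 and 0x000007146B554622, borrowed here purely for illustration, cost 4, 0, 1, 2 and 4 accesses.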


  27. Question 2 • L1 data Cache: 32KB, 2 ways, 64B lines • How can we access this cache before we get the physical address? • 64B → 6 offset bits [5:0] • 32KB → 2^15 / (2 ways * 2^6 bytes) = 2^8 = 256 sets [13:6] • 12 bits are not translated: [11:0] • We lack 2 bits [13:12] to get the set address • So we do a lookup using 2 un-translated (virtual) bits for the set address • Those bits can be different from the PFN obtained after translation, therefore we need to compare the whole PFN to the tag stored in the Cache Tag array
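The bit counting on this slide can be checked with a few integer operations. This is a small sketch; the function name `cache_geometry` and its return shape are assumptions made for the example.

```python
def cache_geometry(size=32 * 1024, ways=2, line=64, page=4096):
    """Return (sets, (top set bit, bottom set bit), virtual set-index bits)."""
    offset_bits = line.bit_length() - 1          # 64 B -> bits [5:0]
    sets = size // (ways * line)                 # 32 KB / (2 * 64 B) = 256
    set_bits = sets.bit_length() - 1             # 256 -> 8 bits
    top_set_bit = offset_bits + set_bits - 1     # set index is [13:6]
    page_bits = page.bit_length() - 1            # [11:0] are not translated
    # Set-index bits above the page offset are still virtual at lookup time.
    virtual_set_bits = max(0, top_set_bit + 1 - page_bits)
    return sets, (top_set_bit, offset_bits), virtual_set_bits
```

With the default arguments the result is 256 sets, a set index in [13:6], and 2 index bits ([13:12]) that are taken from the virtual address.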

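  28. Question 2: Read Access • [Figure: the virtual address is split into a translated VPN [47:12] and not-translated bits [11:0]; the cache Set index uses bits [13:6] and the Offset uses bits [5:0]; the PFN [40:12] obtained from translation is compared (=) with Tag [40:12] of Way 0 and of Way 1 to produce Tag Match]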

  29. Question 2: example • [Figure: VPN [47:12] translated, bits [11:0] not translated; Set [13:6], Offset [5:0]] • Write Virtual Address A • VA: 0000 … 0000 0000 0000 0000 • PA: 0 … 0000 0000 0000 0000 • Set: [13:6] = 0 • Tag: 0 • Read Virtual Address B • VB: 0011 … 0000 0000 0000 0000 • PB: 0… 0011 0000 0000 0000 • Set: [13:6] = 0 • Tag: 3 (0 if we don’t take [13:12])

  30. Question 2: Virtual Alias • L1 data Cache: 32KB • 2 ways of 64B each • What will happen when we access virtual page A with a given offset and then access virtual page B with the same offset, where B is mapped by the OS to the same physical page as A? • 2 virtual pages map to the same frame • xxxx01.set.offset and yyyy00.set.offset • 01 and 00 are bits 13:12 • [Figure: VPN xxxx01 and VPN yyyy00 both translate to PFN zzzz [40:12]; the cache is indexed at set 01.set[11:6] for the first and at set 00.set[11:6] for the second] • Physically addressed cache (Set[11:6] not translated, max 64 sets): the data exists only once • Virtually addressed cache: the data may exist twice. AVOID THIS !!!

  31. Question 2: Virtual Alias • L1 data Cache: 32KB • 2 ways of 64B each • What will happen when we access virtual page A with a given offset and then access virtual page B with the same offset, where B is mapped by the OS to the same physical page as A? • Avoid having the same data twice in the cache: xxxx01.set.offset and yyyy00.set.offset • Check 4 sets when we allocate a new entry and see if the same tag appears • If yes, evict the second occurrence of the data (the alias) • [Figure: PFN zzzz [40:12] can be found at set 01.set[11:6] or at set 00.set[11:6]] • Virtually addressed cache: the data may exist twice. AVOID THIS !!!

  32. Question 2: Snoop • L1 data Cache: 32KB • 2 ways of 64B each • What happens in case of a snoop in the cache? • The cache is snooped with a physical address [40:0] • Since the 2 MSB bits of the set address are virtual, a given physical address can map to 4 different sets in the cache (depending on the virtual page that is mapped to it) • So we must snoop 4 sets * 2 ways in the cache • [Figure: PFN zzzz [40:12] can be found at set 01.set[11:6] or at set 00.set[11:6]] • Virtually addressed cache: the data may exist twice. AVOID THIS !!!
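The four candidate sets for a snoop can be enumerated directly from the physical address. This is a sketch assuming the geometry from this question (set index [13:6], of which [13:12] are virtual); the function name `snoop_sets` is hypothetical.

```python
def snoop_sets(pa, virtual_bits=2, page_bits=12, offset_bits=6):
    """Return every set index that may hold the line at physical address pa."""
    low_bits = page_bits - offset_bits               # set[11:6]: 6 known bits
    low = (pa >> offset_bits) & ((1 << low_bits) - 1)
    # The upper set-index bits came from the virtual address, so try them all.
    return [(hi << low_bits) | low for hi in range(1 << virtual_bits)]
```

Each returned set still has 2 ways to compare, which gives the 4 sets * 2 ways lookups mentioned on the slide.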

  33. Question 3 • Core similar to X86 in 64 bit mode • Supports small pages (pointed to by PTE) and large pages (pointed to by DIR) • Size of an entry in all the different page tables is 8 Bytes • PMH Caches at all the levels • 4 entries, direct mapped • Access time on hit: 2 cycles • Miss known after 1 cycle • PMH caches are accessed at all the levels in parallel • In each level, when there is a hit, the PMH cache provides the relevant page table entry for that level • In each level, when there is a miss, the core accesses the relevant page table in the main memory • Access time to the main memory is 100 cycles, not including the time needed to detect the PMH cache miss • Address layout: sign ext. [63:56] | PML4 [55:48] | PDP [47:36] | DIR [35:24] | TABLE [23:12] | OFFSET [11:0]

  34. Question 3 • Address layout: sign ext. [63:56] | PML4 [55:48] | PDP [47:36] | DIR [35:24] | TABLE [23:12] | OFFSET [11:0] • What is the size of the large pages? The large page is pointed to by DIR, therefore all the bits below DIR are the offset inside the large page: 2^24 B = 16 MB • How many entries are in each Page Table? • PTE: 2^12 • DIR: 2^12 • PDP: 2^12 • PML4: 2^8
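The answers above follow from the field boundaries alone, which a short sketch can confirm. The function name `fields_from_boundaries` and the returned structure are hypothetical; the boundaries (23, 35, 47, 55) and the 8-byte entry size come from the question, and the table sizes in bytes are derived values not stated on the slide.

```python
def fields_from_boundaries(tops=(23, 35, 47, 55), offset_top=11, entry_size=8):
    """Derive entries per table and the large page size from field top bits."""
    names = ("TABLE", "DIR", "PDP", "PML4")
    low = offset_top + 1
    tables = {}
    for name, top in zip(names, tops):
        bits = top - low + 1                       # width of this index field
        tables[name] = {
            "entries": 1 << bits,
            "table_bytes": (1 << bits) * entry_size,   # derived, not on slide
        }
        low = top + 1
    large_page = 1 << (tops[0] + 1)   # everything below DIR becomes offset
    return tables, large_page
```

With the defaults this gives 2^12 entries for TABLE, DIR and PDP, 2^8 for PML4, and a 16 MB large page.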

  35. TLB: 4 entries, direct mapped • TLB hit: 2 cycles • TLB miss: 1 cycle • Memory access: 100 cycles • Address layout: sign ext. [63:56] | PML4 [55:48] | PDP [47:36] | DIR [35:24] | TABLE [23:12] | OFFSET [11:0]


  41. Backup Slides

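  42. [Figure: the Virtual Address is split into a translated VPN (bits 47:12) and not-translated bits (11:0); the cache Set index uses bits 13:6 and the Offset uses bits 5:0; the PFN (upper bit 40) is compared (=) with the Tag field of Way 0 … Way 3 to produce Tag Match]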

  43. [Figure: same diagram without bit numbers: Virtual Address split into translated VPN and not-translated Set and Offset; the PFN is compared (=) with the Tag field of Way 0 … Way 3 to produce Tag Match]

  44. Question 3 • TLB: 4 entries, direct mapped • TLB hit: 2 cycles • TLB miss: 1 cycle • Memory access: 100 cycles • We have a sequence of accesses to virtual addresses. For each address, how many cycles are needed to translate the address? Assume that we use small pages only and that the TLBs are empty upon reset • Address layout: sign ext. [63:56] | PML4 [55:48] | PDP [47:36] | DIR [35:24] | TABLE [23:12] | OFFSET [11:0]

