
Computer Architecture Virtual Memory (VM)

Computer Architecture Virtual Memory (VM). By Dan Tsafrir, 23/5/2011 Presentation based on slides by Lihu Rappoport. http://www.youtube.com/watch?v=3ye2OXj32DM (funny beginning). DRAM (dynamic random-access memory). Corsair 1333 MHz DDR3 Laptop Memory Price (at amazon.com): $43 for 4 GB


Presentation Transcript


  1. Computer Architecture: Virtual Memory (VM) • By Dan Tsafrir, 23/5/2011 • Presentation based on slides by Lihu Rappoport

  2. http://www.youtube.com/watch?v=3ye2OXj32DM (funny beginning)

  3. DRAM (dynamic random-access memory) • Corsair 1333 MHz DDR3 Laptop Memory • Price (at amazon.com): • $43 for 4 GB • $79 for 8 GB • “The physical memory”

  4. VM – motivation • Provides isolation between processes • Processes can concurrently run on a single machine • VM prevents them from accessing the memory of one another • (But still allows for convenient sharing when required) • Provides illusion of large memory • VM size can be bigger than physical memory size • VM decouples program from real size (can differ across machines) • Provides illusion of contiguous memory • Programmers need not worry about where data is placed exactly • Allows for dynamic memory growth • Can add memory to processes at runtime as needed • Allows for memory overcommitment • Sum of VM spaces (across all processes) can exceed physical memory size • DRAM is often one of the most costly parts in the system

  5. VM – terminology • Virtual address space • Space used by the programmer • “Ideal” = contiguous & as big as you’d like • Physical address • The real, underlying physical memory address • Completely abstracted away by OS/HW

  6. VM – basic idea • Divide memory (virtual & physical) into fixed-size blocks • “page” = chunk of contiguous data in virtual space • “frame” = physical memory exactly big enough to hold one page • |page| = |frame| (= size) • page size = power of 2 = 2^k (bytes) • By default, k=12 almost always => page size is 4KB • While virtual address space is contiguous • Pages can be mapped into arbitrary frames • Pages can reside • In memory or on disk (hence, overcommitment) • All programs are written using the virtual address space • HW does on-the-fly translation from virtual to physical addresses • Uses a page table to translate between virtual and physical addresses

  7. VM – simplistic illustration • Memory acts as a cache for the secondary storage (disk) • Immediate advantages • Illusion of contiguity & of having more physical memory • Program actual location unimportant • Dynamic growth, isolation, & sharing are easy to obtain • [Figure: pages (virtual space) mapped through address translation to frames (DRAM) or to disk]

  8. Translation – use a “page table” • Virtual address (64 bit): bits [63:12] = virtual page number (52 bit), bits [11:0] = page offset (12 bit) • Physical address (32 bit): bits [31:12] = physical frame number (20 bit), bits [11:0] = page offset (12 bit) • How to map the virtual page number to a physical frame number? • (Page size is typically 2^12 bytes = 4KB)

  9. Translation – use a “page table” • Page table base register locates the page table • Each entry holds: frameNumber, V (valid bit), D (dirty bit), AC (access control) • (Page size is typically 2^12 bytes = 4KB)

  10. Translation – use a “page table” • Virtual address (64 bit): bits [63:12] = virtual page number (52 bit), bits [11:0] = page offset (12 bit) • The virtual page number indexes the page table located by the page table base register • Each entry holds: frameNumber, V (valid bit), D (dirty bit), AC (access control) • Physical address (32 bit): bits [31:12] = physical frame number (20 bit), bits [11:0] = page offset (12 bit) • (Page size is typically 2^12 bytes = 4KB)
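
The translation pictured on slides 8 to 10 can be sketched in a few lines. This is only an illustration under simplifying assumptions: the page table is modeled as a plain Python dict from virtual page number to frame number, and the names `split_virtual_address`, `translate`, and the sample mapping are hypothetical, not taken from the slides.

```python
PAGE_OFFSET_BITS = 12                 # page size = 2**12 bytes = 4KB (from the slides)
PAGE_SIZE = 1 << PAGE_OFFSET_BITS


def split_virtual_address(vaddr):
    """Return (virtual page number, page offset) for a virtual address."""
    return vaddr >> PAGE_OFFSET_BITS, vaddr & (PAGE_SIZE - 1)


def translate(vaddr, page_table):
    """Map a virtual address to a physical one via a hypothetical dict page table."""
    vpn, offset = split_virtual_address(vaddr)
    frame_number = page_table[vpn]    # a KeyError stands in for "no mapping"
    # The page offset is copied unchanged; only the page number is replaced.
    return (frame_number << PAGE_OFFSET_BITS) | offset


# Hypothetical mapping: virtual page 0x5 lives in physical frame 0x2A.
page_table = {0x5: 0x2A}
physical = translate(0x5ABC, page_table)
print(hex(physical))                  # 0x2aabc
```

Real hardware walks an in-memory table (often multi-level) rather than a dict; the sketch only shows how the address bits are split and recombined.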

  11. Translation – use a “page table” • Each row of the table (frameNumber, V, D, AC) is a “PTE” (page table entry)

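  12. Page tables • Page table entry points to a memory frame or to a disk address • [Figure: the virtual page number indexes the page table; entries with valid = 1 point to frames in physical memory, entries with valid = 0 point to disk]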

  13. Checks • If ( valid == 1 ) page is in main memory at frame address stored in table => data is readily available (e.g., can copy it to the cache) • else /* page fault */ need to fetch page from disk => causes a trap, usually accompanied by a context switch: current process suspended while page is fetched from disk • Access control • R=read-only, R/W=read/write, X=execute • If ( access type incompatible with specified access rights ) => protection violation fault => traps to fault handler • Demand paging • Pages fetched from secondary memory only upon the first fault • Rather than, e.g., upon file open
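
The two checks on this slide (valid bit first, then access control) might look roughly as follows. This is a sketch, not how any particular CPU implements it: the `PTE` class, the exception names, and the encoding of access rights as a string of the letters "R", "W", "X" are all assumptions made for illustration.

```python
from dataclasses import dataclass


class PageFault(Exception):
    """Raised when valid == 0: the OS must fetch the page from disk."""


class ProtectionFault(Exception):
    """Raised when the access type conflicts with the access rights."""


@dataclass
class PTE:
    frame_number: int
    valid: bool
    dirty: bool = False
    access: str = "R"   # hypothetical encoding: any mix of "R", "W", "X"


def check_access(pte, access_type):
    """Return the frame number, or raise the fault the hardware would trap on."""
    if not pte.valid:
        raise PageFault("page not in main memory")
    if access_type not in pte.access:
        raise ProtectionFault(f"{access_type} not permitted by {pte.access}")
    if access_type == "W":
        pte.dirty = True    # a write marks the page as modified
    return pte.frame_number
```

For example, `check_access(PTE(7, True, access="RW"), "W")` returns frame 7 and sets the dirty bit, while the same call on an entry with `valid=False` raises `PageFault`, which is where the OS would take over.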

  14. Page replacement • Page replacement policy • Decides which page to evict to disk • LRU (least recently used) • Typically too wasteful (updated upon each memory reference) • FIFO (first in first out) • Simplest: no need to update upon references, but ignores usage • Second-chance • Set per-page “was it referenced?” bit (can be done by HW or SW) • Swap out first page with bit = 0, in FIFO order • When traversed, if bit = 1, set it to 0 and push the associated page to the end of the list (in FIFO terms, page becomes newest) • Clock • More efficient variant of second-chance • Pages are cyclically ordered (no FIFO); search clockwise for first page with bit=0; set bit=0 for pages that have bit=1
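
The clock policy described above can be sketched as a single function. It assumes a non-empty list of per-frame "referenced" bits and a hand index; the name `clock_evict` and the sample bits are hypothetical.

```python
def clock_evict(referenced, hand):
    """Pick a victim frame with the clock policy.

    referenced: list of per-frame 'was it referenced?' bits (mutated in place).
    hand: index where the clock hand currently points.
    Returns (victim_index, new_hand).
    """
    n = len(referenced)
    while True:
        if referenced[hand]:
            referenced[hand] = 0          # give the page a second chance
            hand = (hand + 1) % n
        else:
            return hand, (hand + 1) % n   # first page with bit == 0 is the victim


bits = [1, 1, 0, 1]
victim, hand = clock_evict(bits, hand=0)
print(victim, hand, bits)                 # 2 3 [0, 0, 0, 1]
```

The loop always terminates: after at most one full sweep every bit has been cleared, so the hand finds a page with bit 0 on the second pass at the latest.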

  15. Page replacement – cont. • NRU (not recently used) • More sophisticated LRU approximation • HW or SW maintains per-page ‘referenced’ & ‘modified’ bits • Periodically (on a clock interrupt), SW turns ‘referenced’ off • Replacement algorithm partitions pages into four classes • Class 0: not referenced, not modified • Class 1: not referenced, modified • Class 2: referenced, not modified • Class 3: referenced, modified • Choose at random a page from the lowest non-empty class for removal • Underlying principles (order is important): • Prefer keeping referenced over unreferenced • Prefer keeping modified over unmodified • Can a page be modified but not referenced?
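
The NRU classes map directly onto a two-bit number, which makes the selection step easy to sketch. The function names and the dict-based page bookkeeping below are assumptions for illustration only.

```python
import random


def nru_class(referenced, modified):
    """Class number as on the slide: 2 * referenced + modified."""
    return 2 * int(referenced) + int(modified)


def nru_victim(pages, rng=random):
    """pages: dict page_id -> (referenced, modified). Returns a page to evict."""
    lowest = min(nru_class(r, m) for r, m in pages.values())
    candidates = [p for p, (r, m) in pages.items() if nru_class(r, m) == lowest]
    return rng.choice(candidates)         # random pick within the lowest class


# Hypothetical state: only "b" is unreferenced, so it is the sole candidate.
pages = {"a": (1, 1), "b": (0, 1), "c": (1, 0)}
print(nru_victim(pages))                  # b
```

Weighting the referenced bit by 2 encodes the slide's ordering: being referenced matters more than being modified when deciding what to keep.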

  16. Page replacement – advanced • ARC (adaptive replacement cache) • Factors not only recency (when latest access), but also frequency (how many times accessed) • User determines which factor has more weight • Better (but more wasteful) than LRU • Developed by IBM: Nimrod Megiddo & Dharmendra Modha • Details: http://www.usenix.org/events/fast03/tech/full_papers/megiddo/megiddo.pdf • CAR (clock with adaptive replacement) • Similar to ARC, and comparable in performance • But, unlike ARC, doesn’t require user-specified parameters • Likewise developed by IBM: Sorav Bansal & Dharmendra Modha • Details: http://www.usenix.org/events/fast04/tech/full_papers/bansal/bansal.pdf

  17. Page faults • Page faults: the data is not in memory => retrieve it from disk • CPU detects the situation (valid=0) • But it cannot remedy the situation (doesn’t know disk; it’s the OS’s job) • Thus, it must trap to OS • OS loads page from disk • Possibly writing victim page to disk (if no room & if dirty) • Possibly avoids reading from disk due to OS “buffer cache” • OS updates page table (valid=1) • OS resumes process; now, HW will retry & succeed! • Page fault incurs a significant penalty • “Major” page fault = must go get page from disk • “Minor” page fault = page already resides in OS buffer cache • Possible only for files; not for “anonymous” spaces like the stack • => pages shouldn’t be too small (as noted, typically 4KB)

  18. Page size • Smaller page size (typically 4KB) • PROS: minimizes internal fragmentation • CONS: increases size of page table • Bigger size (called “superpages” if > 4K) • PROS: • Amortizes disk access cost • May prefetch useful data • May discard useless data early • CONS: • Increased fragmentation • Might transfer unnecessary info at the expense of useful info • Lots of work to increase page size beyond 4K • HW has supported it for years; OS is the “bottleneck” • Attractive because: • Bigger DRAMs, increasing memory/disk performance gap
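
To make the "smaller pages => bigger page table" point concrete, here is a small calculation. It assumes a flat, single-level page table with one PTE per virtual page, 32-bit virtual addresses, and 4-byte PTEs; none of these parameters come from the slides, and real systems typically use multi-level tables precisely to avoid this cost.

```python
def flat_page_table_bytes(virtual_address_bits, page_offset_bits, pte_bytes):
    """Size of a single-level page table with one PTE per virtual page."""
    num_pages = 1 << (virtual_address_bits - page_offset_bits)
    return num_pages * pte_bytes


# Assumed parameters (not from the slides): 32-bit virtual addresses, 4-byte PTEs.
for offset_bits in (12, 21):              # 4KB pages vs. 2MB pages
    size = flat_page_table_bytes(32, offset_bits, 4)
    print(f"page size 2**{offset_bits}: table is {size} bytes")
# page size 2**12: table is 4194304 bytes
# page size 2**21: table is 8192 bytes
```

Under these assumptions, going from 4KB to 2MB pages shrinks the table from 4 MiB to 8 KiB per process, at the price of coarser allocation granularity.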

  19. TLB (translation lookaside buffer) • Page table resides in memory • Each translation requires a memory access • Might be required for each load/store! • TLB • Caches recently used PTEs • Speeds up translation • Typically 128 to 256 entries • Usually 4 to 8 way associative • TLB access time is comparable to L1 cache access time • [Figure: virtual address => TLB access; on a TLB hit the physical address is available, on a miss the page table is accessed]

  20. Making Address Translation Fast • TLB is a cache for recent address translations • [Figure: the virtual page number is looked up in the TLB (valid, tag, physical page); the page table (valid, physical page or disk address) backs the TLB and points to physical memory or disk]

  21. TLB Access • [Figure: the virtual page number is split into a tag and a set number, while the offset bypasses the TLB; the set number selects one set, the tag is compared (=) against ways 0 to 3 in parallel, and a way MUX outputs the matching PTE together with a hit/miss signal]
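
A software model of the set-associative lookup in the figure might look like this. It is a sketch under stated assumptions: the set index is taken as the low bits of the virtual page number, the rest is the tag, and a full set evicts its oldest entry (FIFO). The slides do not specify a TLB replacement policy, and the class name is hypothetical.

```python
class SetAssociativeTLB:
    """Tiny model: num_sets sets, each holding up to `ways` (tag, frame) pairs."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [[] for _ in range(num_sets)]

    def lookup(self, vpn):
        """Return the cached frame number, or None on a TLB miss."""
        index, tag = vpn % self.num_sets, vpn // self.num_sets
        for entry_tag, frame in self.sets[index]:
            if entry_tag == tag:          # hardware compares all ways in parallel
                return frame
        return None

    def insert(self, vpn, frame):
        """Cache a translation, evicting the oldest entry of a full set (FIFO)."""
        index, tag = vpn % self.num_sets, vpn // self.num_sets
        entries = self.sets[index]
        entries[:] = [e for e in entries if e[0] != tag]   # drop a stale copy
        if len(entries) == self.ways:
            entries.pop(0)
        entries.append((tag, frame))
```

With `SetAssociativeTLB(num_sets=4, ways=2)`, virtual pages 5, 9, and 13 all fall into set 1, so inserting the third one evicts the first; a later `lookup(5)` then misses and would fall back to the page table.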

  22. Unified L2 • L2 is unified (no separation of data/instructions), like the main memory • In case of a miss in either d-L1, i-L1, d-TLB, or i-TLB => try to get the missed data from L2 • PTEs can and do reside in L2 • [Figure: L1 data cache, L1 instruction cache, data TLB, and instruction TLB (translations) are all backed by the L2 cache, which is backed by memory]

  23. VM & cache • TLB access is serial with cache access => performance is crucial! • Page table entries can be cached in L2 cache (as data) • [Figure: virtual address => access TLB; on a TLB miss, look for the PTE in the L2 cache and, failing that, access the page table in memory; the resulting physical address is used to access the L1 cache, then the L2 cache, then memory, until the data is found]

  24. Overlapped TLB & cache access • VM view of a physical address: bits [29:12] = physical page number, bits [11:0] = page offset • Cache view of a physical address: bits [29:14] = tag, bits [13:6] = set, bits [5:0] = disp • #Set is not contained within the page offset • The #Set is not known until the physical page number is known • Cache can be accessed only after address translation is done

  25. Overlapped TLB & cache access (cont) • Virtual memory view of a physical address: bits [29:12] = physical page number, bits [11:0] = page offset • Cache view of a physical address: bits [29:12] = tag, bits [11:6] = set, bits [5:0] = disp • In the above example #Set is contained within the page offset • The #Set is known immediately • Cache can be accessed in parallel with address translation • Once translation is done, match upper bits with tags • Limitation: cache size ≤ (page size × associativity)

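  26. Overlapped TLB & cache access (cont) • [Figure: the virtual page number (tag, set) is looked up in the TLB, whose way MUX yields the physical page number and a TLB hit/miss; in parallel, the set and disp bits of the page offset index the cache; the physical page number is then compared (=) with the tags of the selected cache set, and the cache way MUX yields hit/miss and the data]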

  27. Overlapped TLB & cache access (cont) • Assume cache is 32 KB, 2-way set-associative, 64 bytes/line • (2^15 / 2 ways) / (2^6 bytes/line) = 2^(15-1-6) = 2^8 = 256 sets • In order to still allow overlap between set access and TLB access • Take the upper two bits of the set number from bits [1:0] of the VPN • Physical_addr[13:12] may differ from virtual_addr[13:12] • Tag is comprised of bits [31:12] of the physical address • The tag may mismatch bits [13:12] of the physical address • Cache miss => allocate missing line according to its virtual set address and physical tag • [Figure: VM view of the address: bits [29:12] = physical page number, bits [11:0] = page offset; cache view: set = bits [13:6], of which bits [13:12] are taken from VPN[1:0], disp = bits [5:0]]
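
The arithmetic on this slide, and the "cache size ≤ page size × associativity" limitation from slide 25, can be checked with a short helper. The function name and return shape are hypothetical; sizes are assumed to be powers of two, and the page offset defaults to 12 bits as in the slides.

```python
def cache_geometry(cache_bytes, ways, line_bytes, page_offset_bits=12):
    """Return (number of sets, set index bits, index bits above the page offset)."""
    num_sets = cache_bytes // ways // line_bytes
    line_bits = line_bytes.bit_length() - 1      # log2 for powers of two
    set_bits = num_sets.bit_length() - 1
    # Bits of the set index that lie above the page offset must come from
    # the virtual page number if the cache is accessed before translation.
    extra = max(0, line_bits + set_bits - page_offset_bits)
    return num_sets, set_bits, extra


print(cache_geometry(32 * 1024, 2, 64))   # (256, 8, 2)
print(cache_geometry(32 * 1024, 8, 64))   # (64, 6, 0)
```

The first line reproduces the slide's example: 256 sets, 8 set bits, 2 of which spill past the 12-bit page offset and are therefore taken from VPN[1:0]. With 8 ways the same 32 KB cache satisfies the limitation (32 KB ≤ 4 KB × 8), so no virtual bits are needed.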

  28. Swap & DMA (direct memory access) • DMA copies page to disk controller • Accesses memory without requiring CPU involvement • Reads each line: • Executes snoop-invalidate for each line in the cache (both L1 and L2) • If the line resides in the cache: • If it is modified, writes the line from the cache back to memory • Invalidates the line • Writes the line to the disk controller • This means that when a page is swapped out of memory • All data in the caches which belongs to that page is invalidated • The page on disk is up to date • The TLB is snooped • If the TLB hits for the swapped-out page, the TLB entry is invalidated • In the page table • Assign 0 to valid bit in PTE of swapped-out pages • The rest of the PTE bits may be used by the OS for keeping the location of the page on disk

  29. Context switch • Each process has its own address space • Akin to saying “each process has its own page table” • OS allocates frames for process => updates its page table • If only one PTE points to a frame throughout the system • Only the associated process can access the corresponding frame • Shared memory • Two PTEs of two processes point to the same frame • Upon context switching • Save current architectural state to memory • Architectural registers • Register that holds the page table base address in memory • Flush TLB • Same virtual addresses are routinely reused • Load the new architectural state from memory • Architectural registers • Register that holds the page table base address in memory
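
Why the TLB must be flushed on a context switch can be shown with a toy model. Everything here is a simplification: the `CPU` class is hypothetical, the TLB and page tables are plain dicts, and saving/restoring the other architectural registers is omitted.

```python
class CPU:
    """Toy CPU state: a page table base 'register' and an unbounded TLB."""

    def __init__(self):
        self.page_table_base = None   # stands in for the page table base register
        self.tlb = {}                 # vpn -> frame number

    def context_switch(self, new_page_table):
        self.tlb.clear()              # stale translations belong to the old process
        self.page_table_base = new_page_table

    def translate_vpn(self, vpn):
        if vpn not in self.tlb:       # TLB miss: consult the current page table
            self.tlb[vpn] = self.page_table_base[vpn]
        return self.tlb[vpn]


cpu = CPU()
cpu.context_switch({1: 10})           # process A maps virtual page 1 to frame 10
print(cpu.translate_vpn(1))           # 10
cpu.context_switch({1: 20})           # process B reuses virtual page 1
print(cpu.translate_vpn(1))           # 20
```

Without the `clear()` call, the second lookup would hit the cached entry and return frame 10, handing process B a translation that belongs to process A.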

  30. Virtually-addressed cache • Cache uses virtual addresses (tags are virtual) • Only requires address translation on a cache miss • TLB is not in the path to a cache hit! But… • Aliasing: 2 virtual addresses mapped to the same physical address • => 2 cache lines holding data of the same physical address • => Must update all cache entries with the same physical address • [Figure: the CPU sends the VA to the cache; on a hit the data is returned directly; on a miss the VA goes through translation and the resulting PA is used to access main memory]

  31. Virtually-addressed cache • Cache must be flushed at task switch • Possible solution: include unique process ID (PID) in tag • How to share & synchronize memory among processes • As noted, must permit multiple virtual pages to refer to the same physical frame • Problem: incoherence if they point to different physical pages • Solution: require sufficiently many common virtual LSBs • With a direct-mapped cache, it is guaranteed that they all point to the same physical page
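
The "common virtual LSBs" idea can be illustrated with a direct-mapped, virtually-indexed cache: two aliases that agree on their low log2(cache size) bits necessarily select the same cache line. The sketch below assumes a 64 KB cache with 64-byte lines; those sizes, the addresses, and the function names are made up for the example.

```python
def cache_line_index(vaddr, cache_bytes, line_bytes):
    """Line index in a direct-mapped, virtually-indexed cache."""
    return (vaddr // line_bytes) % (cache_bytes // line_bytes)


def share_low_bits(a, b, cache_bytes):
    """True if two virtual addresses agree on their log2(cache_bytes) LSBs."""
    return a % cache_bytes == b % cache_bytes


CACHE, LINE = 64 * 1024, 64               # assumed sizes, not from the slides
alias_a, alias_b = 0x11234, 0x51234       # same low 16 bits (0x1234)
other = 0x52234                           # differs in the low 16 bits

print(share_low_bits(alias_a, alias_b, CACHE))          # True
print(cache_line_index(alias_a, CACHE, LINE),
      cache_line_index(alias_b, CACHE, LINE))           # 72 72
print(cache_line_index(other, CACHE, LINE))             # 136
```

Because both aliases land on line 72, the direct-mapped cache can hold at most one copy of the shared data, which is what removes the incoherence; an alias such as `other` would land elsewhere and could leave two inconsistent copies.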
