
Virtual Memory Primitives for User Programs



Presentation Transcript


  1. Virtual Memory Primitives for User Programs Presentation by David Florey CS533 - Concepts of Operating Systems

  2. Overview • This paper presents the basic primitives, how they're used, and their implementation details on various OSs • Discuss the various primitives and how they are used (in user-level algorithms) • Discuss their performance on various OSs • Discuss the ramifications of these uses (algorithms) on system design

  3. The Primitives (VM Services) • TRAP • Facility allowing user-level handling of page faults (protection or otherwise) • An event that is raised (in the form of a message or signal from the OS) • PROT1 • Decreases accessibility of a single page • A procedure call (via messaging, trap to OS, etc) • PROTN • Decreases accessibility of n pages • A procedure call (via messaging, trap to OS, etc) • UNPROT • Increases the accessibility of a single page • A procedure call (via messaging, trap to OS, etc) • DIRTY • Returns the set of pages that have been touched since the last call to DIRTY • A procedure call (via messaging, trap to OS, etc) • MAP2 • Maps two different virtual addresses to the same physical page • Each virtual address has its own protection level • This is in the same address space (not two different processes or tasks or address spaces) • A procedure call (via messaging, trap to OS, etc)

  4. VM Service Usage: Concurrent Garbage Collection • Stop all threads • Divide memory into from-space and to-space • Copy all objects reachable from “roots” and registers into to-space • Use PROTN to protect all pages in the unscanned area • Use MAP2 to give the collector access to those pages while preventing mutators from accessing them • Restart threads • As mutator threads attempt to access unscanned pages in to-space, a TRAP event: • Stops the mutator in its tracks • Calls the collector, which scans, forwards and UNPROTs the page • The mutator is allowed to continue • At some point this process is restarted and all objects left in from-space are considered garbage and reclaimed

  5. Concurrent Garbage Collection

  6. VM Service Usage: Shared Virtual Memory • Each CPU (or machine) has its own memory and memory mapping manager • Memory mapping managers keep CPU memory consistent with the “shared” memory • When a page is shared, it is marked “read-only” (PROT1) • Upon writing this page, a fault occurs in the writing thread, causing a TRAP event in the associated mapping manager • The mapping manager uses the trap to notify the other MMs, which in turn invalidate their copy of the page (this mechanism may also be used to get an up-to-date copy of the page) • The page is then marked writable (UNPROT) and written • MAP2 is used to allow the trap handler to access the protected page while the client cannot • TRAP is also used by the MM to pull down a page from another CPU or disk when it is not available locally

  7. Shared Memory

  8. VM Service Usage: Concurrent Checkpointing • Checkpointing is the process of saving program state such as the heap, stack, etc – which can be slow • Instead of a synchronous save, we can simply use PROTN to mark the pages that need to be saved to disk read-only • A second thread can then run concurrently with the user threads, writing out pages and UNPROTing each page as it is written • If a user thread hits a “read-only” page, a fault occurs, TRAPping to the concurrent thread, which quickly writes the page and allows the faulting thread to continue • Could also just do this with the DIRTY pages using PROT1

  9. Concurrent Checkpointing

  10. Concurrent Checkpointing With DIRTY

  11. VM Service Usage: Generational Garbage Collection • Objects are kept in generations • The longer an object lives, the older its generation • Typically garbage is in younger generations, but an old object might be pointing at a young object so… • Use DIRTY to see which pages containing old objects were changed; objects in these dirty pages can be scanned to see where they point • Or • PROTN all old pages and TRAP to a handler when an old page is written to; save the page id in a list for later scanning and UNPROT the page so the writer can write • Later, the collector can scan the listed pages to see if any objects within them point to younger generations • Why use a small page size here?

  12. VM Service Usage: Others… • Persistent Stores • Can use VM services to protect pages, trap on writes, and persist dirty pages on commit or toss them on abort • TRAP, UNPROT and PROTN, UNPROT, MAP2 • Extending addressability • After translating 64-bit addresses to 32-bit form, pages may need to be protected so that a TRAP handler can properly “load” the page for suitable access, then UNPROT it • TRAP, UNPROT, PROT1 or PROTN and MAP2 • Data-compression Paging • Compressing n pages into a couple of pages may be faster than writing them to disk. The compressed pages can then be access-protected. When the user then tries to access such a page: TRAP, decompress, UNPROT • Could also use PROT1 to test the access frequency of a page • TRAP, PROT1 or PROTN, TRAP, UNPROT • Heap overflow detection • Terminate the memory allocation area with a “guard” (PROT1) page • Upon access to this page, the TRAP handler triggers the collector • The alternative is a conditional branch on every allocation • PROT1, TRAP

  13. Persistent Store Example & Data Compression Example

  14. Performance in OSs • Devised the Appel1 and Appel2 benchmarks based on the algorithms’ patterns of primitive usage • Appel1 • PROT1, TRAP, UNPROT • e.g. Shared Virtual Memory • Appel2 • PROTN, TRAP, UNPROT • e.g. Concurrent garbage collection

  15. Performance in OSs

  16. Performance of Primitives • All data normalized to the speed of an Add instruction on each CPU • Some OSs didn’t implement MAP2 • Some OSs did a crummy job of implementing these primitives • e.g. mprotect does not flush the TLB correctly • OS designers seem to be relying on old notions like disk latency • Not relevant with CPU-bound algorithms like these • One OS performed exceptionally well, showing that these primitives don’t have to perform poorly

  17. Ramifications on System Design • Fault handling must be fast because we are no longer at the mercy of the disk – we can do it all in the CPU • TLB Consistency • Making memory more accessible is good for TLB consistency • One less thing you need to worry about • Making memory less accessible in the multi-processor case forces TLB “shootdown” • Stop all processors and tell each to flush entry 123 in TLB • Better if done in batches • In fact, paging out could improve if done in batches too

  18. Ramifications on System Design • Optimal Page Size • Some operations depend on the size of the page • “HEY OS DESIGNERS LISTEN UP!” • Disk latency can no longer be counted on to hide crummy design • Computations linearly proportional to page size are now going to be noticed, so we might benefit by cutting down the page size • Algorithms that do a lot of scanning – like the generational garbage collector – would benefit from a smaller page size • Also be aware that shrinking page sizes will cause more page faults and more calls to the fault trap handler, so its overhead must also be very small

  19. Ramifications on System Design • Access to Protected Pages • Mapping the same page two different ways with two different protections in the same address space is FAST • Although it does add some bookkeeping overhead • And cache consistency could be a problem • You could achieve the same results by copying memory around – only 65 copies and you’re there! • Or pounding your head on the desk – that works too • You could also use a heavyweight process and heavyweight RPC to context switch, relying on the OS’s support for pages shared between processes • Techniques employed in LRPC and URPC can alleviate the context switch problem

  20. Ramifications on System Design • What about pipelined processors? • Out-of-order execution • Dependence on sequential execution • Only a problem in the heap overflow detection case • Register tweaking can be a problem • All other algorithms work just like a typical page fault handler – handle fault, pull page in, make page accessible

  21. Final Considerations • Making memory more accessible one page at a time, and less accessible in large batches, is good for TLB consistency • The total performance effect of page size should be considered (fixed costs vs variable costs) • Locality of reference is exploited in these algorithms • Better locality improves fault handling overhead (as data is closer to the CPU) • Pages should be accessible in different ways in a single address space
