1 / 166

Architecture And Software Support For Persistent and Vast Memory

Architecture And Software Support For Persistent and Vast Memory. Swapnil Haria PhD Final Defense. Persistent and Vast Memory is Here!. Intel Optane Memory. So are its challenges …. Executive Summary. Minimally Ordered Durable Datastructures (MOD)

dbock
Download Presentation

Architecture And Software Support For Persistent and Vast Memory

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Architecture And Software Support ForPersistent and Vast Memory Swapnil Haria PhD Final Defense 6/10/19

  2. Persistent and Vast Memory is Here! Intel Optane Memory So are its challenges …

  3. Executive Summary Minimally Ordered Durable Datastructures (MOD) • How to build fast, recoverable applications without changing hardware? • Reduce the number of expensive ordering points in software • Perform 60% faster than state-of-the-art STM library Hands-Off Persistence System (HOPS) • How to improve recoverable application performance with hardware changes? • Separate hardware primitives for frequent ordering points and rarer durability events • Performs 24% faster than state-of-the-art STM library Devirtualized Memory (DVM) • How to lower virtual memory overheads in systems with vast memory? • Eliminate expensive address translation on most accesses • Reduces overheads to less than 4% of execution time

  4. Outline Background (8 slides) Minimally Ordered Durable Datastructures for Persistent Memory (New, 20 slides) Hands-Off Persistence System (Prelim, 2 or 10 or 20 slides) Devirtualized Memory for Heterogeneous Systems (Prelim, 2 or 8 or 20 slides)

  5. The Age of Persistent Memory has begun!

  6. Many other PM technologies may follow Phase Change Memory Resistive RAM Magnetic RAM Spin Transfer Torque RAM

  7. Many other PM technologies may follow High Capacity Data Persistence Near-DRAM latencies Asymmetric Read/Write Latency

  8. Intel Optane DC Persistent Memory

  9. Promise of Persistent Memory Bigger DRAM Durable In-memory Datastructures Care about Memory NOT OUR FOCUS Faster Filesystems Care about Storage

  10. Programming Challenges CPU 0 CPU 1 Private L1 Private L1 Durability Consistency (Failure) Atomicity Shared Last-Level Cache PM Controller DRAM Controller Persistence Domain Volatile Persistent

  11. Background: Software Transactional Memory CACHE array BEGIN-TX value1 = array[index1] value2 = array[index2] LOG (index1, value1); FLUSH LOG; FENCE array[index1] = value2 FLUSH (&array[index1]) LOG (index2, value2); FLUSH LOG; FENCE array[index2] = value1 FLUSH (&array[index2]) FENCE END-TX Y X Y Y Index2=Y Index2=Y Index1=X Index1=X LOG PM array X Y LOG

  12. Background: Software Transactional Memory CACHE array X Y X Y Index2=Y Index1=X LOG Use LOG to clean up the mess System Crash PM array X Y Y Index1=X Index2=Y LOG

  13. Outline Background (8 slides) Minimally Ordered Durable Datastructures for Persistent Memory (New, 20 slides) - Analysis of Flushing Overheads on Optane - Building Blocks for Functional Shadowing - MOD Datastructures - Evaluation Hands-Off Persistence System (Prelim, 2 or 10 or 20 slides) Devirtualized Memory for Heterogeneous Systems (Prelim, 2 or 8 or 20 slides)

  14. STM performance on Optane Execution Time Self-Normalized

  15. Measuring Flush Overheads on Optane FLUSH FENCE FLUSH FENCE FLUSH FENCE FLUSH FENCE FLUSH FENCE FLUSH FENCE FLUSH FENCE FLUSH FENCE FLUSH FENCE FLUSH FENCE FLUSH FENCE FLUSH FENCE FLUSH FENCE FLUSH FENCE FLUSH FENCE FLUSH FENCE FLUSH FENCE FLUSH FENCE …

  16. Measuring Flush Overheads on Optane

  17. MOD Datastructures: Goals Reduce ordering points with NO hardware modifications Library of recoverable datastructures: vector, map, queue, etc.

  18. Eliminating Ordering Points How to provide atomicity and consistency with minimal ordering? Databases, Filesystems: Shadow Paging Key Idea: Avoid overwriting data!

  19. Ordering in STM BEGIN-TX value1 = array[index1] value2 = array[index2] LOG (index1, value1); FLUSH LOG; FENCE array[index1] = value2 FLUSH (&array[index1]) LOG (index2, Y); FLUSH LOG; FENCE array[index2] = value1 FLUSH (&array[index2]) FENCE END-TX

  20. Background: Shadow Paging CACHE array value1 = array[index1] value2 = array[index2] shadow = array // Create shadow copy shadow[index1] = value2 shadow[index2] = value1 FLUSH (shadow) FENCE // Application uses shadow subsequently X Y shadow X Y X Y X Y PM array X Y shadow

  21. Déjà vu? Functional Datastructures! Purely Functional datastructures are immutable Implemented as efficient trees: Hash Array Mapped trie, RRBTree Copying overheads reduced by structural sharing array[8] updatedArray[8] swap Y X X Y

  22. Shadow paging to minimize ordering constraints FunctionalShadowing Functional Datastructures to reduce shadow paging overheads

  23. MOD usecases • One Update to One Datastructure (Atomic, Consistent, Durable) • Multiple Updates to One Datastructure (Atomic, Consistent, Durable) • One Update per Multiple Datastructures (Atomic, Consistent, Durable)

  24. 1: One Update to One Datastructure All Flushes Overlapped dsPtrShadow = dsPtr->Update(updateParams) FENCE dsPtr = dsPtrShadow • Update(arrayPtr, index, value) • // Atomic, Durable, Consistent with 1 FENCE arrayPtr array[8] shadowArray[8] update

  25. 2: Multiple Updates to One Datastructure dsPtrShadow1 = dsPtr->Update1(updateParams) dsPtrShadow2 = dsPtrShadow1->Update2(updateParams) FENCE dsPtr = dsPtrShadow2 Commit (dsPtr, dsPtrShadow1, dsPtrShadow2) All Flushes Overlapped dsPtr dsPtrShadow2 dsPtrShadow1 Update2 Update1 ds dsShadow1 dsShadow2

  26. 3: One Update per Multiple Datastructures ds1PtrShadow = ds1Ptr->Update1(updateParams1) ds2PtrShadow = ds2Ptr->Update2(updateParams2) FENCE Begin-TX { ds1Ptr = ds1PtrShadow ds2Ptr = ds2PtrShadow } End-TX Commit (ds1Ptr, ds1PtrShadow, ds2Ptr, ds2PtrShadow) All Flushes Overlapped More ordering points but short transaction

  27. Evaluation Methodology Used C++ library of functional datastructures: https://github.com/arximboldi/immer Used off-the-shelf persistent memory allocator: https://github.com/hyrise/nvm_malloc.git Compared against Intel PMDK v1.5 (latest, Oct 2018) https://github.com/pmem/pmdk

  28. Performance Comparison on Optane Execution Time Normalized to PMDK

  29. Performance Comparison on Optane Execution Time Normalized to PMDK

  30. MOD: Summary Minimizes number of ordering points in software Uses Shadow Paging instead of STM Requires changes in the application code

  31. Outline Background (8 slides) Minimally Ordered Durable Datastructures for Persistent Memory (New, 20 slides) Hands-Off Persistence System (Prelim, 2 or 10 or 20 slides) Devirtualized Memory for Heterogeneous Systems (Prelim, 2 or 8 or 20 slides)

  32. MOD: Summary HOPS: Goals Minimizes number of ordering points in software Minimizes cost of ordering points in hardware Uses Shadow Paging instead of STM Improves STM performance Requires changes in the application code Requires minimal changes in library code

  33. DFENCE: Durability Fence Makes the stores preceding DFENCE durable Execution DFENCE ST A=1 ST B=2 ST C=1 Time Persistent Memory ST B=2 ST A=1

  34. OFENCE: Ordering Fence Orders stores preceding OFENCE before later stores OFENCE Execution ST A=1 ST B=2 Time Persistent Memory ST B=2 ST A=1

  35. Array Swap: HOPS // TX-Start value1 = arrayPtr->at(index1) value2 = arrayPtr->at(index2) LOG (index1, value1);ofenceFLUSH LOG; FENCE arrayPtr[index1] = value2 FLUSH (&arrayPtr[index1]) LOG (index2, value2);ofenceFLUSH LOG; FENCE arrayPtr[index2] = value1 FLUSH (&arrayPtr[index1]) dfence FENCE // TX-End

  36. Base System CPU CPU Private L1 Private L1 Shared LLC + Loads Stores Loads + Stores PM Controller DRAM Controller Volatile Persistent

  37. Base System + Persist Buffers CPU CPU Persist Buffer Persist Buffer Private L1 Private L1 Shared LLC + Loads Stores Loads + Stores PM Controller DRAM Controller Volatile Persistent 38

  38. NVM Analysis HOPS Design 4% accesses to PM, 96% to DRAM 5-50 ordering points /transaction Order writes without flushing Volatile memory hierarchy (almost) unchanged Self-dependencies common Allows copies of same cacheline Correct, conservative method based on coherence Cross-dependencies rare

  39. Performance Evaluation on gem5 simulator HOPS Ideal performance, unsafe on crash Baseline, uses Intel instructions (Lower is Better)

  40. HOPS v. MOD

  41. HOPS + MOD MOD reduces ofencesto allow greater flush overlap at dfence HOPS makes it easier to port functional datastructures to MOD

  42. Outline Background (8 slides) Minimally Ordered Durable Datastructures for Persistent Memory (New, 20 slides) Hands-Off Persistence System (Prelim, 2 or 10 or 20 slides) Devirtualized Memory for Heterogeneous Systems (Prelim, 2 or 8 or 20 slides)

  43. Virtual Memory hardware on CPUs CPUs have expensive hardware to mitigate VM overheads FA 8 entries 2M page 8-way SA 64 entries 4K page 4-way SA 64 entries 4K page 4-way SA 32 entries 2M page 4-way SA 4 entries 1G page L1 Data TLB L1 Instruction TLB 12-way SA 1536 entries 4K/2M pages 4-way SA 16 entries 1G page L2 TLB

  44. Virtual Memory hardware on Accelerators? CPUs have expensive hardware to mitigate VM costs Cannot replicate in power/area-constrained accelerators Accelerator Compute Logic 4-way SA 4 entries 1G page 4-way SA 64 entries 4K page 4-way SA 32 entries 2M page 12-way SA 1536 entries 4K/2M pages

  45. 3Ps for Memory Management How do we get the best of both worlds?

  46. Devirtualized Memory (DVM) (Usually) Allocate data such that its VA == PA! Exploit PA == VA to mostly skip translation

  47. Memory Accesses with DVM BAD Permission Violation GOOD PA == VA UGLY PA != VA PA == VA? Translate VA Validate: Validate:× Validate:×? Load Exception Perms OK? Load

  48. Performance Evaluation on gem5 simulator 4% overheads Lower is Better Unsafe Physical Addresses

  49. Summary

  50. Persistent and Vast Memory is Here! Intel Optane Memory We have some timely answers!

More Related