
An Evaluation of Using Deduplication in Swappers



Presentation Transcript


  1. An Evaluation of Using Deduplication in Swappers Weiyan Wang, Chen Zeng

  2. Motivation • Deduplication detects duplicate pages in storage • NetApp, Data Domain: a billion-dollar business • We explore another direction: using deduplication in swappers • Our experimental results indicate that using deduplication in swappers is beneficial

  3. What is a swapper? • A mechanism to expand the usable address space • Swap out: move a page from memory to the swap area • Swap in: move a page from the swap area back to memory • The swap area is on disk

  4. Why is deduplication useful? • Writes to disk are slow • Disk accesses are much slower than memory accesses! • When duplicate pages exist: • Do we really need to swap out all of them? • If a duplicate page already appears in the swap area, we can save one I/O.

  5. Architecture • Swap out a page: compute its checksum, then look it up in the dedup cache • Hit: skip the pageout • Miss: page out and add the checksum to the dedup cache
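The swap-out path above can be sketched in user-space C. This is a minimal illustration, not the kernel implementation: the dedup cache is modeled as a fixed-size direct-mapped table, and the checksum is a stand-in FNV-1a hash rather than SHA-1; all names (`dedup_slot`, `swap_out`, `page_checksum`) are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE   4096
#define CACHE_SLOTS 1024

/* One entry of the sketched dedup cache. */
typedef struct {
    bool     used;
    uint32_t checksum;   /* 32-bit page checksum */
    int      swap_index; /* where the first copy lives in the swap area */
} dedup_slot;

static dedup_slot cache[CACHE_SLOTS];

/* Stand-in for the truncated SHA-1: a simple 32-bit FNV-1a hash. */
static uint32_t page_checksum(const unsigned char *page)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < PAGE_SIZE; i++) {
        h ^= page[i];
        h *= 16777619u;
    }
    return h;
}

/* Returns the swap index the page maps to; performs a "pageout"
 * (sets *did_io) only when no duplicate is already in the cache. */
static int swap_out(const unsigned char *page, int next_free_index, bool *did_io)
{
    uint32_t sum  = page_checksum(page);
    size_t   slot = sum % CACHE_SLOTS;

    if (cache[slot].used && cache[slot].checksum == sum) {
        *did_io = false;   /* duplicate found: skip the pageout, save one I/O */
        return cache[slot].swap_index;
    }
    cache[slot] = (dedup_slot){ true, sum, next_free_index };
    *did_io = true;        /* unique page: write it to the swap area */
    return next_free_index;
}
```

Swapping out two pages with identical contents then maps both to the same swap index, and only the first write incurs I/O.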

  6. Computing Checksums • SHA-1 checksum (160 bits) • Collision probability of one in 2^80 • Only use the first 32 bits (collision probability of one in 2^16) • Related to the implementation of the dedup cache • Only the checksum is stored • We assume two pages are identical if their checksums are equal • Trade consistency for performance
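Keeping only the first 32 bits of the 160-bit digest amounts to taking the first four bytes as the cache key. A minimal sketch (the function name `dedup_key` is an assumption, and the digest here is an arbitrary example value, not a real SHA-1 output):

```c
#include <stdint.h>

/* Derive the 32-bit dedup-cache key from a 160-bit (20-byte) SHA-1
 * digest by keeping only its first four bytes, big-endian. */
static uint32_t dedup_key(const unsigned char digest[20])
{
    return (uint32_t)digest[0] << 24 |
           (uint32_t)digest[1] << 16 |
           (uint32_t)digest[2] << 8  |
           (uint32_t)digest[3];
}
```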

  7. Dedup Cache • Dedup cache - a radix tree • Maps a checksum to a dedup_entry_t • A trie with O(|key|) lookup and update overhead • Already well implemented in the kernel • Keys in the radix tree are 32 bits • So we only keep the first 32 bits of a checksum as the key
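The O(|key|) behavior can be seen in a toy binary trie keyed by the 32-bit checksum: lookup and insert each take one step per key bit. This is only a sketch of the shape of the walk; the kernel's radix tree uses a multi-bit fan-out per level, and all names here are illustrative.

```c
#include <stdint.h>
#include <stdlib.h>

/* Toy binary radix tree (trie) keyed by a 32-bit checksum. */
typedef struct trie_node {
    struct trie_node *child[2];
    void             *value;   /* non-NULL only where an entry is stored */
} trie_node;

static trie_node *node_new(void) { return calloc(1, sizeof(trie_node)); }

/* Insert walks one level per key bit: O(|key|) = O(32) steps. */
static void trie_insert(trie_node *root, uint32_t key, void *value)
{
    trie_node *n = root;
    for (int bit = 31; bit >= 0; bit--) {
        int b = (key >> bit) & 1;
        if (!n->child[b]) n->child[b] = node_new();
        n = n->child[b];
    }
    n->value = value;
}

/* Lookup follows the same bit-by-bit path; NULL means a miss. */
static void *trie_lookup(trie_node *root, uint32_t key)
{
    trie_node *n = root;
    for (int bit = 31; bit >= 0 && n; bit--)
        n = n->child[(key >> bit) & 1];
    return n ? n->value : NULL;
}
```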

  8. Entries in Dedup Cache • The index of a page in the swap area • The number of duplicate pages with a given checksum • A lock for consistency

typedef struct {
    swp_entry_t base;
    atomic_t count;
    spinlock_t lock;
} dedup_entry_t;

  9. Changes to Linux Kernel • Swap cache • Maps a swap_entry_t to a page • Avoids repeatedly swapping in • Happens when a swapped-out page is shared by multiple processes • Example • Processes A and B share page P • P is swapped out; the PTEs in A and B are updated • A wants to access P • B wants to access P

  10. Will the dedup cache grow infinitely? • A swap counter for each swap_entry_t • The number of references in memory • counter++ when • one more PTE contains the swap_entry_t • it is in the swap cache • it is in the dedup cache • counter-- when a page is swapped in • remove the swap_entry_t from the dedup cache and swap cache when counter = 2
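The counting rules above can be sketched as follows. Under the stated rules, the counter equals the number of referencing PTEs plus one each for swap-cache and dedup-cache membership, so count == 2 means only the two cache references remain and the entry can be reclaimed. All names here (`swap_counter`, `ref_pte`, etc.) are hypothetical, not the kernel's.

```c
#include <stdbool.h>

/* count = (#PTEs referencing the entry) + 1 (swap cache) + 1 (dedup cache) */
typedef struct {
    int  count;
    bool in_swap_cache;
    bool in_dedup_cache;
} swap_counter;

/* One more PTE now holds this swap entry. */
static void ref_pte(swap_counter *c) { c->count++; }

/* The entry enters both caches, adding two references. */
static void add_to_caches(swap_counter *c)
{
    c->in_swap_cache = c->in_dedup_cache = true;
    c->count += 2;
}

/* Swap-in drops one PTE reference; returns true when only the two
 * cache references remain, i.e. the entry can be removed from both. */
static bool swap_in(swap_counter *c)
{
    c->count--;
    return c->count == 2;
}
```

For example, a page shared by two processes and cached in both structures starts at count 4; after both processes swap it back in, the counter hits 2 and the entry is freed.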

  11. Reference Counters • [Diagram: reference counts on swap-area entries held by process PTEs (A, B), the swap cache, and the dedup cache]

  12. Changes to Swap Cache • The swap cache maintains the mapping between a swap_entry and a page • We change that mapping to one between a swap_entry and a list of pages with the same contents • Why do we need a list?

  13. Possible Inconsistency • Swap out page P1 to swap_entry E1 • Swap out page P2, a duplicate of P1 • The mapping E1 -> P2 cannot be added to the swap cache, which holds only E1 -> P1 • Swap in P1: the mapping is deleted • Swap in P2: oops! E1 is no longer in the swap cache

  14. Our Solution • Swap out page P1 to swap_entry E1 • Swap out page P2, a duplicate of P1 • The mapping E1 -> P2 is added to the list: E1 -> P1, P2 • Swap in P1: only P1 is deleted, leaving E1 -> P2 • Swap in P2: delete E1 -> P2
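The per-entry page list above can be sketched as a singly linked list: swapping in one duplicate unlinks only that page, so the remaining mappings survive. This is an illustrative user-space model (`page_node`, integer page IDs standing in for `struct page *`), not the kernel's swap-cache code.

```c
#include <stdlib.h>

/* One in-memory page mapped to a swap entry. */
typedef struct page_node {
    int               page_id;  /* stand-in for struct page * */
    struct page_node *next;
} page_node;

/* One swap-cache slot: a swap entry and all duplicates mapped to it. */
typedef struct {
    int        entry;   /* swap entry, e.g. E1 */
    page_node *pages;   /* list of pages with identical contents */
} swap_cache_slot;

static void add_page(swap_cache_slot *s, int page_id)
{
    page_node *n = malloc(sizeof *n);
    n->page_id = page_id;
    n->next = s->pages;
    s->pages = n;
}

/* Swapping in one page removes only that page from the list,
 * so E1 -> P2 survives after P1 is swapped in. */
static void remove_page(swap_cache_slot *s, int page_id)
{
    page_node **pp = &s->pages;
    while (*pp) {
        if ((*pp)->page_id == page_id) {
            page_node *dead = *pp;
            *pp = dead->next;
            free(dead);
            return;
        }
        pp = &(*pp)->next;
    }
}

static int list_len(const swap_cache_slot *s)
{
    int n = 0;
    for (const page_node *p = s->pages; p; p = p->next) n++;
    return n;
}
```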

  15. Experimental Evaluation • We ran our experiments on VMware with Linux 2.6.26 • Our test program sequentially accesses an array • Each element is 4 KB • We vary the percentage of duplicate pages in the array

  16. All of the pages are duplicates • Deduplication significantly reduces the access time

  17. No Duplicate Pages • However, deduplication also incurs a significant overhead

  18. Overheads in Deduplication • Major overheads: • Calculating checksums: 35 us • Whenever a page is swapped in or swapped out, we always calculate the checksum • Maintaining the reference counters • Explicitly acquiring locks imposes a significant overhead: 65 us on average in our experiments

  19. Conclusion • Deduplication is a double-edged sword in swappers • When many duplicate pages are present, deduplication reduces the access time by orders of magnitude • When few duplicate pages are present, the overhead is non-negligible
