Geiger: Monitoring the Buffer Cache in a Virtual Machine Environment Stephen T. Jones Andrea C. Arpaci-Dusseau Remzi H. Arpaci-Dusseau Department of Computer Sciences
Buffer Cache • In modern OSes, the file system buffer cache and virtual memory system are unified • When a file is first accessed, its data is buffered in a memory page • Under memory pressure, pages are evicted • If a page is dirty, its contents are first written to swap space or the file system • The page can then be reused • Later, if the data is needed again, a page fault occurs • A free page is allocated and the data is reloaded from disk into it
Useful Information About Buffer Cache • If the VMM knows about eviction/promotion events, it can • Tell whether the guest OS is thrashing and how much additional memory would prevent it • Guide eviction-based cache placement • Exclusive cache: on a hit, the data item is removed from the cache • A transparent secondary cache may be desirable • E.g., a 32-bit OS running on a host with 16 GB of memory • Why does an exclusive cache work? • Normally, after reading a page from disk, the OS will not read it again without first evicting it • Increases cache utilization
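The exclusivity rule above can be sketched in a few lines. This is a minimal Python illustration, not the paper's implementation; the class and method names are hypothetical. Blocks enter the second-level cache only when the client evicts them, and a hit removes the block, so the two levels hold disjoint sets.

```python
from collections import OrderedDict

class ExclusiveSecondLevelCache:
    """Sketch of an eviction-based (exclusive) second-level cache.

    Blocks are inserted only when the client evicts them, and a block
    is dropped when the client reads it back, keeping the two cache
    levels disjoint and raising aggregate cache utilization.
    """
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block id -> data, in LRU order

    def on_client_eviction(self, block, data):
        # Placement hint: cache exactly the block the client evicted.
        self.blocks[block] = data
        self.blocks.move_to_end(block)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # drop our LRU block

    def read(self, block):
        # On a hit the block moves back into the client's cache, so we
        # drop our copy to preserve exclusivity.
        return self.blocks.pop(block, None)
```

A usage example: after `on_client_eviction(7, data)`, the first `read(7)` hits and the second misses, because the hit itself evicted the block from the second level.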
Services in a VMM • The VMM layer is an attractive development target • Security (isolation from the OS and applications) • Portability (transparent to the OS) • Our target services • VMM-driven eviction-based cache placement • Increases the hit ratio of remote storage caches • Transparent to the guest OS • Working set size estimation for thrashing VMs • Complements the ESX Server technique
VMM Services Need Information • Information about the guest operating system • Our target services need information about the OS buffer cache • This information is hidden from the VMM • Layered design approach • Narrow interface (the virtual architecture)
Geiger Monitors the Buffer Cache • A virtual machine monitor extension • Implicitly observes buffer cache events • Uses only information intrinsically available to the VMM • An explicit approach is possible, but has drawbacks • No guest OS modifications required • Applicable to closed-source and legacy OSes • Accurate (usually less than 5% error) • Low cost (usually less than 3% overhead) • Enables service implementation in the VMM
Outline • Geiger approach • New Geiger techniques • Evaluation • Application
Buffer Cache Events • Cache promotion • Disk block inserted into buffer cache • Cache demotion • Disk block removed from cache
Detecting Promotion • Block read • Block write • Disk reads and writes are visible to the VMM • Associated Disk Location (ADL) [Diagram: a user process reads and writes blocks; each buffer-cache page is tagged with the ADL of the disk block it caches]
Detecting Demotion • Detect when a page is removed from the cache • The VMM cannot observe a page free directly • Instead, look for page reuse • If a cache page's data is reused, the page was logically freed in the interim • Reuse inconsistent with the ADL -> eviction [Diagram: a page's contents are replaced by a different disk block, revealing that the previously cached block was evicted]
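The promotion/demotion inference on these two slides can be sketched together. This is a minimal Python model with hypothetical names, assuming only what the slides state: the VMM sees disk I/O, records an ADL per page, and reports an eviction when a page is reused for a different disk location.

```python
class GeigerLikeMonitor:
    """Sketch of implicit promotion/eviction inference.

    The VMM records an Associated Disk Location (ADL) for each page it
    observes in a disk read or write (promotion). If a later disk
    transfer reuses the same page for a *different* disk location, the
    block it previously cached must have been evicted in the interim.
    """
    def __init__(self):
        self.adl = {}        # page frame -> disk block it caches
        self.evictions = []  # inferred evicted blocks

    def on_disk_io(self, page, block):
        old = self.adl.get(page)
        if old is not None and old != block:
            # Reuse inconsistent with the ADL -> demotion of old block.
            self.evictions.append(old)
        self.adl[page] = block  # promotion: page now caches this block
```

Feeding the monitor two I/Os on the same page frame for different blocks yields one inferred eviction of the first block.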
Read / Write Evictions • Read eviction • A non-free page is reused for reading from a different disk location • E.g., reading a large file or memory region • Write eviction • A non-free page is reused for writing; the reuse (eviction) is detected only when the page is written back • Detection therefore lags the actual eviction
Existing Techniques • Promotion via reads and writes • Demotion via reads and writes • Chen et al. -- USENIX 2003 • Within OS (pseudo device driver) • Initial basis for Geiger
Outline • Geiger approach • New Geiger techniques • Evaluation • Application
New Geiger Techniques • Other ways buffer cache pages are evicted • Unified buffer cache/virtual memory system • Non-I/O allocations cause eviction • Two new eviction detection heuristics • Copy-on-write • Anonymous allocation
When Does Eviction Happen? • Explicit eviction • Read eviction • Write eviction • Implicit eviction • A non-free page is reused without any disk read or write • Page allocation or copy-on-write • E.g., when a process requests a new page, a non-dirty cached page may be allocated to it
Detecting Allocation Eviction • Page not-present fault • Page allocation (possible reuse) • New writable mapping • Detect eviction • Invalidate the ADL [Diagram: a process write to a freshly mapped page reuses a buffer-cache page, implicitly evicting the block it cached]
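The allocation/COW heuristic can be sketched the same way. A minimal Python model with hypothetical names, under the assumptions on this slide: a page-not-present fault followed by a new writable mapping means the page is being reused for anonymous memory, so any block it cached was implicitly evicted and its ADL must be invalidated.

```python
class AllocEvictionDetector:
    """Sketch of allocation/COW eviction detection.

    adl maps page frames to the disk block each page currently caches.
    A new *writable* mapping of a tracked page means the page is being
    reused for anonymous memory, with no disk I/O to reveal the reuse,
    so the cached block is reported as an implicit eviction.
    """
    def __init__(self):
        self.adl = {}
        self.evictions = []

    def on_disk_read(self, page, block):
        self.adl[page] = block  # promotion: page now caches this block

    def on_new_writable_mapping(self, page):
        block = self.adl.pop(page, None)  # invalidate the ADL
        if block is not None:
            self.evictions.append(block)  # implicit eviction detected
```

A fault on an untracked page is simply ignored; only pages with a live ADL produce an eviction event.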
Filesystem Issues • Filesystem features cause false positives • Filesystem blocks can be deleted • Leads to dangling ADL and spurious eviction • Journaling causes aliasing • Same cache page written to both the journal and filesystem locations • Interferes with write-eviction heuristic
Geiger Is Filesystem Aware • Uses static filesystem info • Journal location and size • Block allocation bitmaps • Ignore writes to the journal • Track allocation bitmap updates and invalidate ADLs when blocks are deallocated • Significantly reduces Geiger's false positives
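Both filtering rules on this slide fit in a short sketch. This is a hypothetical Python model (the journal extent and all names are illustrative, not the paper's code): writes landing in the journal are ignored to avoid aliasing, and freeing a block invalidates any dangling ADL so the page's later reuse is not reported as an eviction.

```python
class FsAwareFilter:
    """Sketch of filesystem-aware ADL maintenance.

    Writes inside the journal extent are ignored (the same cache page
    is written to both journal and filesystem locations, which would
    otherwise confuse the write-eviction heuristic). Freeing a block
    invalidates the ADL of any page caching it, since reuse of a page
    holding dead data is not an eviction.
    """
    def __init__(self, journal_start, journal_len):
        self.journal = range(journal_start, journal_start + journal_len)
        self.adl = {}           # page -> disk block
        self.page_of = {}       # disk block -> page (reverse map)

    def on_disk_write(self, page, block):
        if block in self.journal:
            return              # journal write: not a cache placement
        self.adl[page] = block
        self.page_of[block] = page

    def on_block_freed(self, block):
        page = self.page_of.pop(block, None)
        if page is not None:
            self.adl.pop(page, None)  # dangling ADL invalidated
```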
Block Liveness • Reusing a free page is not an eviction • Geiger infers the liveness of a page from the liveness of its backing disk block • A block dies when • A file is deleted or truncated • A process terminates, freeing its virtual memory
Block Liveness for Files • Observe writes to filesystem metadata such as the superblock • Metadata resides at known disk locations • The OS caches metadata in memory and syncs it to disk every 30 seconds or so • Pages used to cache metadata are marked read-only • Write attempts cause page faults • Invalidate the affected ADLs
Block Liveness for Swap Space • No on-disk structure tracks block usage • When a disk block is written from a different memory page, the original block contents are considered dead • Maintain a reverse mapping between blocks and ADLs • Invalidate ADLs when blocks are overwritten • If a block is never overwritten, its death cannot be detected • This leads to as much as 37% false-positive evictions
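The reverse-mapping rule can be sketched as follows. A minimal Python model with hypothetical names, assuming only what this slide states: a swap block overwritten from a different page is dead, so the previous page's ADL is invalidated and its later reuse is not counted as an eviction.

```python
class SwapLivenessTracker:
    """Sketch of block-liveness tracking for swap space.

    writer_of records which page last wrote each swap block. When a
    different page writes the same block, the old copy is dead, so the
    old page's ADL is invalidated; reusing that page later will not be
    reported as a (false-positive) eviction.
    """
    def __init__(self):
        self.adl = {}        # page -> swap block it caches
        self.writer_of = {}  # swap block -> page that last wrote it

    def on_swap_write(self, page, block):
        prev = self.writer_of.get(block)
        if prev is not None and prev != page:
            # Block overwritten from another page: old contents dead.
            self.adl.pop(prev, None)
        self.writer_of[block] = page
        self.adl[page] = block
```

Blocks that are never overwritten keep their stale ADLs, which is exactly the residual false-positive source the slide quantifies.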
Outline • Geiger approach • New Geiger techniques • Evaluation • Application
Evaluation Goals • Measure Geiger accuracy • Missed evictions (false negatives) • Spurious evictions (false positives) • Measure Geiger timeliness • Lag between actual event and detection
Experimental Environment • Xen 2.0.7 VMM [Barham et al., SOSP03] • Extensions to observe page faults, page table updates, and I/O requests/completions • Linux 2.4 and 2.6 guests • Microbenchmarks • Isolate specific eviction types • Read, write, COW, allocation • Application benchmarks • Dbench, Mogrify, TPC-W, SPC disk trace
Eviction Detection Accuracy

Workload      False Neg %   False Pos %
Read Evict    0.96%         0.58%
Write Evict   1.68%         0.03%
COW Evict     2.47%         1.45%
Alloc Evict   0.17%         0.17%
Application Accuracy

Workload (Geiger Opt)           False Neg %   False Pos %
Dbench  w/o block liveness      1.10%         30.23%
Dbench  w/  block liveness      2.30%         5.72%
Mogrify w/o block liveness      0.05%         22.99%
Mogrify w/  block liveness      0.65%         2.46%
TPC-W                           0.14%         3.12%
SPC Web2                        2.24%         0.32%
Outline • Geiger approach • New Geiger techniques • Evaluation • Application • Eviction-based cache placement
Application: Eviction-based Cache Placement • Disk cache utilization is critical to performance • Storage servers have large caches • Demand-based placement => poor utilization • Increase cache utilization via exclusivity • Use client cache evictions as placement hints [Chen et al., USENIX ’03; Wong and Wilkes, USENIX ’02] • Use VMM-based, implicit eviction information to inform a remote storage cache • No changes to the client OS or storage interfaces
Cache Placement Results [Chart: second-level cache hit rates; annotations: 13%, 51%] • Geiger outperforms demand placement • Mogrify: buffering causes too many evictions to be missed • Mogrify: false positives are fortuitous • Dbench: detection lag lets the in-OS approach outperform Geiger
Outline • Geiger approach • New Geiger techniques • Evaluation • Application • Eviction-based cache placement • Working set size estimator
LRU Miss Ratio Curve [Animated diagram: pages are kept in an LRU queue; a hit histogram records, for each LRU position, how many reloads hit at that depth; summing the histogram yields the fault curve (faults vs. pages of memory)]
Application: Working Set Size Estimator • MemRx • Observe evictions and reloads • Compute the miss ratio curve • WSS = current memory allocation + LRU estimate • Only works when WSS > current memory allocation
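The MemRx idea on the last two slides can be sketched briefly. This is a hypothetical Python model, not the paper's implementation: evicted blocks go onto a ghost LRU list, a reload's depth in that list says how much extra memory would have avoided the fault, and the deepest observed depth added to the current allocation gives a working-set-size estimate.

```python
class MemRxSketch:
    """Sketch of VMM-side working-set-size estimation.

    Evicted blocks are kept on a ghost LRU list. When a block is
    reloaded, its depth in the list is the extra memory (in pages)
    that would have turned the fault into a hit; a histogram of
    depths is the miss-ratio curve beyond the current allocation.
    """
    def __init__(self, allocation_pages):
        self.allocation = allocation_pages
        self.ghost = []        # evicted blocks, most recent first
        self.hit_depth = {}    # depth -> reload count (hit histogram)
        self.cold_misses = 0

    def on_eviction(self, block):
        if block in self.ghost:
            self.ghost.remove(block)
        self.ghost.insert(0, block)

    def on_reload(self, block):
        if block in self.ghost:
            depth = self.ghost.index(block) + 1
            self.hit_depth[depth] = self.hit_depth.get(depth, 0) + 1
            self.ghost.remove(block)
        else:
            self.cold_misses += 1  # never seen evicted: cold miss

    def working_set_size(self):
        # WSS = current allocation + deepest reload depth observed.
        return self.allocation + max(self.hit_depth, default=0)
```

With a 128-page allocation, evicting blocks a then b and reloading a (depth 2 in the ghost list) estimates a working set of 130 pages.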
Estimation Results: Microbenchmarks • Virtual machine configured with 128 MB of memory • Each benchmark accesses a 256 MB file / memory region • FS: file access • VM: memory access
Summary • System services in a VMM • Need information about the guest OS • Implicit information about the buffer cache • No guest OS modification • Accurate • Low overhead • Build services and optimizations in a VMM • Eviction-based cache placement • Working set size estimation
Computer Sciences Department Advanced Systems Laboratory http://cs.wisc.edu/adsl