Exploiting Gray-Box Knowledge of Buffer Cache Management. Nathan C. Burnett, John Bent, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau. University of Wisconsin - Madison, Department of Computer Sciences.
Caching • Buffer cache impacts I/O performance • Cache hits are much faster than disk reads • Figure: OS buffer cache in front of data blocks; without cache knowledge: 2 disk reads; with cache knowledge: 1 disk read
Knowledge is Power • Applications can use knowledge of cache state to improve overall performance • Web Server • Database Management Systems • Often no interface for finding cache state • Abstractions hide information
Workload + Policy → Contents • Cache contents determined by: • Workload • Replacement policy • Algorithmic Mirroring • Observe workload • Simulate cache using policy knowledge • Infer cache contents from simulation model
Gaining Knowledge • Application knows workload • Assume application dominates cache • Cache policy is usually hidden • Documentation can be old, vague or incorrect • Source code may not be available • How can we discover cache policy?
Policy Discovery • Fingerprinting: automatic discovery of algorithms or policies (e.g. replacement policy, scheduling algorithm) • Dust - Fingerprints buffer cache policies • Correctly identifies many different policies • Requires no kernel modification • Portable across platforms
This Talk • Dust • Detecting initial access order (e.g. FIFO) • Detecting recency of access (e.g. LRU) • Detecting frequency of access (e.g. LFU) • Distinguishing clock from other policies • Fingerprints of Real Systems • NetBSD 1.5, Linux 2.2.19, Linux 2.4.14 • Exploiting Gray-Box Knowledge • Cache-Aware Web Server • Conclusions & Future Work
Dust • Fingerprints the buffer cache • Determines cache size • Determines cache policy • Determines cache history usage • Manipulate cache in controlled way • open/read/seek/close
Replacement Policies • Cache policies often use • access order • recency • frequency • Need access pattern to identify attributes • Explore in simulation • Well controlled environment • Variety of policies • Known implementations
Dust • Move cache to known state • Sets initial access order • Sets access recency • Sets frequency • Cause part of test data to be evicted • Sample data to determine cache state • Read a block and time it • Repeat for confidence
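The sampling step (read a block and time it) can be sketched as below. This is a minimal illustration, not Dust's actual code: the function names, the 4096-byte probe size and the latency threshold are all assumptions, and a real tool would need to calibrate the threshold per machine. Note that a probe read itself pulls the block into the cache, so each block can only be sampled once per experiment.

```python
import os
import time

def probe_block(fd, offset, size=4096):
    """Time one positional read; a fast read suggests a cache hit."""
    start = time.perf_counter()
    os.pread(fd, size, offset)          # read `size` bytes at `offset`
    return time.perf_counter() - start

def sample_region(path, offsets, size=4096):
    """Return {offset: seconds} for each sampled offset of the file."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return {off: probe_block(fd, off, size) for off in offsets}
    finally:
        os.close(fd)

def classify(timings, threshold_s):
    """Map each offset to True (probably cached) or False (probably on disk).

    threshold_s is a hypothetical cutoff between memory and disk latency.
    """
    return {off: t < threshold_s for off, t in timings.items()}
```

Repeating the whole experiment and averaging the classifications gives the confidence the slide asks for.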
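Setting Initial Access Order • Figure: file laid out as a test region followed by an eviction region • Pseudocode, one sequential pass over the test region:
    for (i = 0; i < test_region_size / read_size; i++) {
        read(read_size);   /* sequential read of the next chunk */
    }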
FIFO Priority • Figure labels: Newer Pages, Older Pages • FIFO gives the latter part of the file priority
Detecting FIFO • Figure legend: Out of Cache, In Cache • FIFO evicts the first half of the test region
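The FIFO fingerprint can be reproduced in a small simulation. The sketch below assumes a test region exactly the size of the cache and an eviction region half that size; the block counts and the `#`/`.` rendering are illustrative choices, not values from the talk.

```python
from collections import OrderedDict

def simulate_fifo(accesses, capacity):
    """Return the set of blocks resident after replaying `accesses` under FIFO."""
    cache = OrderedDict()                 # insertion order = arrival order
    for block in accesses:
        if block in cache:
            continue                      # FIFO ignores hits
        if len(cache) >= capacity:
            cache.popitem(last=False)     # evict the oldest arrival
        cache[block] = True
    return set(cache)

CACHE_BLOCKS = 8                                      # assumed cache size
test_region = list(range(CACHE_BLOCKS))               # blocks 0..7
eviction_region = list(range(CACHE_BLOCKS, CACHE_BLOCKS + CACHE_BLOCKS // 2))

resident = simulate_fifo(test_region + eviction_region, CACHE_BLOCKS)
fingerprint = "".join("#" if b in resident else "." for b in test_region)
print(fingerprint)   # ....####  (first half of the test region evicted)
```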
Setting Recency • Figure: test region with a left pointer and a right pointer, followed by the eviction region • Pseudocode:
    do_sequential_scan();
    left = 0;
    right = test_region_size / 2;
    /* each pointer covers one half of the test region */
    for (i = 0; i < test_region_size / (2 * read_size); i++) {
        seek(left);  read(read_size);
        seek(right); read(read_size);
        left  += read_size;
        right += read_size;
    }
LRU Priority • LRU gives priority to the 2nd and 4th quarters of the test region
Detecting LRU • LRU evicts 1st and 3rd quarters of test region
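A short simulation shows why the two-pointer pattern separates LRU from FIFO. It assumes an 8-block test region that exactly fills the cache and a 4-block eviction region; these sizes are illustrative only.

```python
from collections import OrderedDict

def simulate_lru(accesses, capacity):
    """Return the set of blocks resident after replaying `accesses` under LRU."""
    cache = OrderedDict()                 # order = least to most recently used
    for block in accesses:
        if block in cache:
            cache.move_to_end(block)      # a hit refreshes recency
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False) # evict the least recently used
            cache[block] = True
    return set(cache)

T = 8                                     # assumed test region size in blocks
test_region = list(range(T))
accesses = list(test_region)              # initial sequential scan
for i in range(T // 2):                   # left and right pointers advance together
    accesses += [i, T // 2 + i]
accesses += list(range(T, T + T // 2))    # eviction region

resident = simulate_lru(accesses, capacity=T)
fingerprint = "".join("#" if b in resident else "." for b in test_region)
print(fingerprint)   # ..##..##  (1st and 3rd quarters evicted)
```

Under FIFO the same access sequence would still evict the first half of the region, so the two policies produce visibly different pictures.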
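Setting Frequency • Figure: test region and eviction region labeled with access counts 2 3 4 5 6 6 5 4 3 2 7, with left and right pointers • Pseudocode:
    do_sequential_scan();
    left = 0;
    right = test_region_size / 2;
    left_count = 1;
    right_count = 5;
    for (i = 0; i < test_region_size / (2 * read_size); i++) {
        for (j = 0; j < left_count; j++)  { seek(left);  read(read_size); }
        for (j = 0; j < right_count; j++) { seek(right); read(read_size); }
        left  += read_size;
        right += read_size;
        left_count++;    /* counts rise from the left edge toward the center */
        right_count--;   /* and fall from the center toward the right edge */
    }
LFU Priority • LFU gives priority to the center of the test region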
Detecting LFU • LFU evicts outermost stripes • Two stripes partially evicted
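The frequency pattern can be checked the same way. This sketch makes several assumptions that are not stated on the slides: ten stripes of two blocks each, ties broken by least recent use, and each eviction-region block read seven times so that it outranks every test-region block. It evicts exactly four blocks, so only the two outermost stripes are lost.

```python
def simulate_lfu(accesses, capacity):
    """Return resident blocks under LFU, breaking ties by least recent use."""
    counts, last_use = {}, {}
    for t, block in enumerate(accesses):
        if block not in counts:
            if len(counts) >= capacity:
                victim = min(counts, key=lambda b: (counts[b], last_use[b]))
                del counts[victim], last_use[victim]
            counts[block] = 0
        counts[block] += 1
        last_use[block] = t
    return set(counts)

STRIPES, PER_STRIPE = 10, 2               # assumed layout of the test region
test_region = list(range(STRIPES * PER_STRIPE))

def stripe(i):
    """Blocks belonging to stripe i."""
    return test_region[i * PER_STRIPE:(i + 1) * PER_STRIPE]

accesses = list(test_region)              # initial sequential scan: count 1 each
left_count, right_count = 1, 5
for i in range(STRIPES // 2):
    accesses += stripe(i) * left_count                    # left pointer
    accesses += stripe(STRIPES // 2 + i) * right_count    # right pointer
    left_count += 1
    right_count -= 1
for block in range(20, 24):               # eviction region, 7 reads per block
    accesses += [block] * 7

resident = simulate_lfu(accesses, capacity=len(test_region))
fingerprint = "".join("#" if b in resident else "." for b in test_region)
print(fingerprint)   # ..################..  (outermost stripes evicted)
```

A larger eviction region would start eating into the next pair of stripes, which is where partially evicted stripes come from.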
The Clock Algorithm • Used in place of LRU • Ref. bit set on reference • Ref. bit cleared as hand passes • Hand replaces a page with a ref. bit that’s already clear • On eviction, hand searches for a clear ref. bit • Figure labels: Page Frame, Reference bit
Detecting Clock Replacement • Two pieces of initial state • Hand Position • Reference Bits • Hand position is irrelevant – circular queue • Dust must control for reference bits • Reference bits affect order of replacement
Detecting Clock Replacement • Uniform reference bits • Random reference bits
Clock - Reference Bits Matter • Two fingerprints for Clock • Ability to produce both will imply Clock • Need a way to selectively set reference bits • Dust manipulates reference bits • To set bits, reference page • To clear all bits, cause hand to sweep • Details in paper
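The two Clock fingerprints can be illustrated with a small simulation. The class below is a textbook one-bit Clock, written under the assumption that a newly loaded page starts with its reference bit set; the `"sweep"` block stands in for the small extra read that forces the hand around the queue, and the sizes are arbitrary.

```python
class ClockCache:
    """One-bit Clock (second chance) replacement over a fixed set of frames."""

    def __init__(self, capacity):
        self.frames = [None] * capacity
        self.ref = [False] * capacity
        self.hand = 0
        self.index = {}                    # block -> frame number

    def access(self, block):
        if block in self.index:            # hit: just set the reference bit
            self.ref[self.index[block]] = True
            return True
        while self.ref[self.hand]:         # hand clears bits as it passes
            self.ref[self.hand] = False
            self.hand = (self.hand + 1) % len(self.frames)
        victim = self.frames[self.hand]
        if victim is not None:
            del self.index[victim]
        self.frames[self.hand] = block
        self.index[block] = self.hand
        self.ref[self.hand] = True         # assumption: loading counts as a reference
        self.hand = (self.hand + 1) % len(self.frames)
        return False

def clock_fingerprint(touched, capacity=8, evictions=3):
    cache = ClockCache(capacity)
    test_region = list(range(capacity))
    for b in test_region:                  # fill the cache; every bit is set
        cache.access(b)
    cache.access("sweep")                  # extra read: hand circuits, bits cleared
    for b in touched:                      # selectively set reference bits
        cache.access(b)
    for b in range(capacity, capacity + evictions):
        cache.access(b)                    # eviction region
    return "".join("#" if b in cache.index else "." for b in test_region)

print(clock_fingerprint(touched=[]))            # ....####  (looks like FIFO)
print(clock_fingerprint(touched=[1, 3, 5, 7]))  # .#.#.#.#  (set bits are skipped)
```

Block 0 is missing in both runs because the sweep read itself evicts one page. A policy that yields both pictures depending on how the bits were prepared is consistent with Clock, which is the argument the slide makes.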
Dust Summary • Determines cache size (needed to control eviction) • Differentiates policies based on • access order • recency • frequency • Identifies many common policies • FIFO, LRU, LFU, Clock, Segmented FIFO, Random • Identifies history-based policies • LRU-2, 2-Queue
Fingerprinting Real Systems • Issues: • Data is noisy • Policies usually more complex • Buffer Cache/VM Integration • Cache size might be changing • Platform: • Dual 550 MHz P-III Xeon, 1GB RAM, Ultra2 SCSI 10000RPM Disks
NetBSD 1.5 • Figure panels: FIFO, LRU, LFU • Increased variance due to storage hierarchy
NetBSD 1.5 • Figure panels: FIFO, LRU, LFU • Four distinct regions of eviction/retention
NetBSD 1.5 • Figure panels: FIFO, LRU, LFU • Trying to clear reference bits makes no difference • Conclusion: LRU
Linux 2.2.19 • Figure panels: FIFO, LRU, LFU • Very noisy but looks like LRU • Conclusion: LRU or Clock
Linux 2.2.19 • Figure panels: FIFO, LRU, LFU • Clearing reference bits changes fingerprint • Conclusion: Clock
Linux 2.4.14 • Figure panels: FIFO, LRU, LFU • Low recency areas are evicted • Low frequency areas also evicted • Conclusion: LRU with page aging
Algorithmic Mirroring • Model Cache Contents • Observe inputs to cache (reads) • Use knowledge of cache policy to simulate cache • Use model to make application-level decisions
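An algorithmic mirror can be as small as a user-level replay of the discovered policy. The sketch below assumes the fingerprinted policy is LRU and that the application sees every read; the class and method names are hypothetical, and the mirror is only an estimate of the kernel's real state.

```python
from collections import OrderedDict

class LRUMirror:
    """User-level model of an LRU buffer cache, updated on every observed read."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks    # size as discovered by fingerprinting
        self.blocks = OrderedDict()        # least to most recently used

    def observe_read(self, block):
        """Record one read; return True if the model predicted a hit."""
        hit = block in self.blocks
        if hit:
            self.blocks.move_to_end(block)
        else:
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)
            self.blocks[block] = True
        return hit

    def probably_cached(self, block):
        """Query the model without changing it."""
        return block in self.blocks

mirror = LRUMirror(capacity_blocks=2)
for name in ["a", "b", "a", "c"]:
    mirror.observe_read(name)
print(mirror.probably_cached("a"), mirror.probably_cached("b"))   # True False
```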
NeST • NeST - Network Storage Technology • Software based storage appliance • Supports HTTP, NFS, FTP, GridFTP, Chirp • Allows configurable number of requests to be serviced concurrently • Scheduling Policy: FIFO
Cache-Aware NeST • Takes policy & size discovered by Dust • Maintains algorithmic mirror of buffer cache • Updates mirror on each request • No double buffering • May not be a perfect mirror • Scheduling Policy: In-Cache-First • Reduce latency by approximating SJF • Improve throughput by reducing disk reads
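In-Cache-First scheduling can be sketched as a stable reordering of the pending queue. This is an illustration only, not NeST's code: `in_cache_first` and the `probably_cached` callback are hypothetical names, and the callback would be answered by whatever mirror the server maintains.

```python
def in_cache_first(pending, probably_cached):
    """Return pending requests with predicted cache hits first.

    sorted() is stable, so arrival (FIFO) order is kept within each group.
    """
    return sorted(pending, key=lambda request: not probably_cached(request))

# Hypothetical example: the mirror believes two of three files are cached.
believed_cached = {"a.html", "c.css"}
queue = ["a.html", "b.iso", "c.css"]
print(in_cache_first(queue, lambda name: name in believed_cached))
# ['a.html', 'c.css', 'b.iso']
```

Requests served from memory finish quickly, so putting them first approximates shortest-job-first. A wrong prediction only costs a misplaced request, which is one reason an imperfect mirror can still help.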
Performance • 144 clients randomly requesting 200 1MB files • Server: P-III Xeon, 128MB • Clients: 4 x P-III Xeon, 1GB • Gigabit Ethernet • Linux 2.2.19 • Improvement in response time • Robust to inaccuracies in cache estimate
Summary • Fingerprinting • Discovers OS algorithms and policies • Dust • Portable, user-level cache policy fingerprinting • Identifies FIFO, LRU, LFU, Clock, Random, 2Q, LRU-2 • Fingerprinted Linux 2.2 & 2.4, Solaris 2.7, NetBSD 1.5 & HP-UX 11.20 • Algorithmic Mirroring • Keep track of kernel state in user-space • Use this information to improve performance • Cache-Aware NeST • Uses mirroring to improve HTTP performance
Future Work • On-line, adaptive detection of cache policy • Policy manipulation • Make other applications cache aware • Databases • File servers (ftp, NFS, etc.) • Fingerprint other OS components • CPU scheduler • filesystem layout
Questions?? • Gray-Box Systems • http://www.cs.wisc.edu/graybox/ • Wisconsin Network Disks • http://www.cs.wisc.edu/wind/ • NeST • http://www.cs.wisc.edu/condor/nest/
Solaris 2.7 • Figure panels: FIFO, LRU, LFU
HP-UX 11.20 (IPF) • Figure panels: FIFO, LRU, LFU • Low recency areas are evicted • Low frequency areas also evicted • Conclusion: LRU with page aging
Related Work • Gray-Box (Arpaci-Dusseau) • Cache content detector • Connection Scheduling (Crovella et al.) • TBIT (Padhye & Floyd)
Clock - Uniform Reference Bits • Figure: file; buffer cache before test scan; buffer cache after test scan, before eviction scan • After initial scan, cache state does not change • First half of test region is evicted
Clock - Random Reference Bits • Figure: file; buffer cache before test scan; buffer cache after test scan, before eviction scan • Initial sequential scan • Test scan does not change cache state
Manipulating Reference Bits • Figure: buffer cache after touching all resident data; buffer cache after an additional small read • Setting bits is easy • Clear bits by causing hand to do a circuit