IO-Lite: A Unified Buffering and Caching System. By Pai, Druschel, and Zwaenepoel (1999). Presented by Justin Kliger for CS780: Advanced Techniques in Caching, Professor Zhang (Summer 2005).
Outline • Problem & Significance • Literature Review • Proposed Solution • Design, Implementation, & Operation • Experimental Design • Results • Conclusion • Further Research
The Problem • The I/O subsystem and various applications all tend to use their own private I/O buffers • Redundant data copying • Multiple buffering • Lack of cross-subsystem optimization
Problem’s Significance • Wastes memory • Reduces space available for caching • Causes higher cache miss rates • High CPU overhead • Limits server throughput
Literature Review • POSIX I/O • Problem: double-buffering • Memory-mapped files (mmap) • Problem: not generalized to network I/O
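To make the contrast on this slide concrete, here is a minimal sketch using only standard POSIX calls. The function names `sum_with_read` and `sum_with_mmap` are illustrative and not from the paper. With `read`, the kernel copies file data into a private user buffer, so the data can exist both there and in the file cache; with `mmap`, the process reads the mapped pages directly, but the same trick is not available for network I/O.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sum the bytes of a file using read(): every chunk is copied from the
 * kernel into the private buffer buf before the application sees it. */
long sum_with_read(const char *path) {
    unsigned char buf[4096];
    long sum = 0;
    ssize_t n;
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        for (ssize_t i = 0; i < n; i++) sum += buf[i];
    close(fd);
    return n < 0 ? -1 : sum;
}

/* Sum the bytes of a file using mmap(): the application reads the mapped
 * pages in place, with no explicit copy into a private buffer. */
long sum_with_mmap(const char *path) {
    struct stat st;
    long sum = 0;
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    if (fstat(fd, &st) < 0) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; }
    unsigned char *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
                            MAP_PRIVATE, fd, 0);
    close(fd);                       /* the mapping stays valid after close */
    if (p == MAP_FAILED) return -1;
    for (off_t i = 0; i < st.st_size; i++) sum += p[i];
    munmap(p, (size_t)st.st_size);
    return sum;
}
```

Both functions return the same result for the same file; the difference is only in how many copies of the data exist along the way.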
Literature Review • Transparent Copy Avoidance • Problem: VM page alignment problems • Copy-on-write faults • Genie (emulated copy) • Lack of full transparency leads to the same problems • Copy Avoidance with Handoff Semantics • Problem: lack of concurrent sharing reduces effectiveness
Literature Review • Fast buffers (fbufs) • Designed by Druschel • Problem: does not support filesystem access or a file cache • Extensible kernels • Problem: more overhead, not OS-portable
IO-Lite Solution • Unified buffering and caching • Allow all applications and subsystems to share the same buffered I/O data • Very simple at face value, very complex to implement
Basic Design • Immutable buffers • Initially allocated data cannot be modified • Effectively read-only sharing • Advantages? • Eliminates synchronization and protection problems • Disadvantages? • I/O data cannot be modified in place
Further Design Considerations • To make up for immutable buffers: • Create a buffer aggregate abstraction (an ADT) • Mutable • Refers to the IO-Lite window in VM • Aggregates contain an ordered list of <address, length> pairs • Aggregates are passed by value • Buffers are passed by reference
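The aggregate idea can be sketched in a few lines of C. This is an illustration only, not the IO-Lite implementation: the names `slice_t`, `agg_t`, `agg_append`, and the fixed capacity `AGG_MAX_SLICES` are assumptions made for the example. The point it shows is that copying an aggregate copies only the <address, length> pairs, while the underlying bytes stay where they are and are shared by reference.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define AGG_MAX_SLICES 8            /* illustrative fixed capacity */

/* One <address, length> pair pointing into an immutable buffer. */
typedef struct {
    const char *addr;
    size_t      len;
} slice_t;

/* A mutable, ordered list of slices. Passing agg_t by value copies the
 * pairs only; the buffers themselves are never copied. */
typedef struct {
    slice_t slices[AGG_MAX_SLICES];
    size_t  count;
} agg_t;

agg_t agg_empty(void) {
    agg_t a;
    memset(&a, 0, sizeof a);
    return a;
}

/* Return a new aggregate with one more slice; the argument is unchanged
 * in the caller because it was passed by value. */
agg_t agg_append(agg_t a, const char *addr, size_t len) {
    assert(a.count < AGG_MAX_SLICES);
    a.slices[a.count].addr = addr;
    a.slices[a.count].len  = len;
    a.count++;
    return a;
}

/* Total number of bytes the aggregate refers to. */
size_t agg_length(agg_t a) {
    size_t total = 0;
    for (size_t i = 0; i < a.count; i++) total += a.slices[i].len;
    return total;
}

/* Copy the referenced bytes into contiguous memory. Used here only to
 * inspect the result; avoiding this copy is the whole point of the design. */
size_t agg_flatten(agg_t a, char *out, size_t cap) {
    size_t pos = 0;
    for (size_t i = 0; i < a.count; i++) {
        assert(pos + a.slices[i].len <= cap);
        memcpy(out + pos, a.slices[i].addr, a.slices[i].len);
        pos += a.slices[i].len;
    }
    return pos;
}
```

For example, a server could build a response by appending a header slice and a cached file-body slice to one aggregate without touching either buffer.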
Further Design Considerations • Buffer sharing must be concurrent • To achieve this, use a method similar to fbufs • Expand it to include the filesystem • Adapt it for a general-purpose OS • Worst-case scenario (in terms of overhead): • Page remapping • (when the last buffer is allocated before the first is deallocated)
IO-Lite Implementation • New read & write API which supersedes the regular read & write • size_t IOL_read(int fd, IOL_Agg **aggr, size_t size); • size_t IOL_write(int fd, IOL_Agg *aggr); • IOL_Agg is buffer aggregate data type • Both operations are atomic
IO-Lite Implementation • Applications: • Recommends implementation in runtime I/O Libraries to avoid modifying all programs • Filesystem: • File cache data structure: <file-id, offset, length> • Network: • Need to modify network device drivers to allow early demultiplexing (using a packet filter)
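As a rough picture of the file cache bullet, the sketch below keys cached data by a <file-id, offset, length> triple and looks up the entry that covers a requested offset. The struct layout, the linear search, and the name `cache_lookup` are assumptions for illustration; the slide does not say how the real cache is organized, and the `data` pointer merely stands in for whatever the cache entry refers to.

```c
#include <assert.h>
#include <stddef.h>

/* One cache entry, keyed by the <file-id, offset, length> triple. */
typedef struct {
    int         file_id;
    size_t      offset;
    size_t      length;
    const char *data;     /* placeholder for the cached I/O data */
} cache_entry_t;

/* Return the entry of file_id whose byte range contains offset,
 * or NULL on a cache miss. Linear search keeps the sketch short. */
const cache_entry_t *cache_lookup(const cache_entry_t *entries, size_t n,
                                  int file_id, size_t offset) {
    assert(entries != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const cache_entry_t *e = &entries[i];
        if (e->file_id == file_id &&
            offset >= e->offset &&
            offset - e->offset < e->length)
            return e;
    }
    return NULL;
}
```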
IO-Lite Operation • With regard to the cache: • Cache replacement is basically LRU • Allows for application customization • Cache eviction is controlled by the VM daemon • Trigger: did more than half of the recently replaced pages contain I/O data? If so, evict a cache entry
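The "more than half" test on this slide can be written as a small counter. This is a sketch under assumptions: the struct, the function names, and the choice to count since the last eviction are all illustrative, and the real VM daemon logic is not shown in the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Counts of page replacements observed since the last cache eviction
 * (the counting window is an assumption of this sketch). */
typedef struct {
    unsigned replaced_total;   /* all pages replaced            */
    unsigned replaced_io;      /* of those, pages with I/O data */
} evict_stats_t;

/* Record one page replacement. */
void note_page_replaced(evict_stats_t *s, int held_io_data) {
    assert(s != NULL);
    s->replaced_total++;
    if (held_io_data) s->replaced_io++;
}

/* True when strictly more than half of the replaced pages held I/O data,
 * which suggests the file cache is taking too much memory. */
int should_evict_cache_entry(const evict_stats_t *s) {
    assert(s != NULL);
    return 2u * s->replaced_io > s->replaced_total;
}

/* Start a new counting window after an eviction. */
void reset_after_eviction(evict_stats_t *s) {
    assert(s != NULL);
    s->replaced_total = 0;
    s->replaced_io = 0;
}
```

Note that exactly half does not trigger an eviction in this sketch, since the slide asks for more than half.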
IO-Lite Operation • Impact of immutable buffers: • Case 1: Entire object is modified • Lack of in-place modification has no ill effect • Case 2: Subset of object needs to be modified • Rather than recopying the entire object, use chaining • Performance loss is small if the modified blocks are localized • Case 3: Scattered subset needs modification • IO-Lite incorporates an mmap interface for this
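Case 2 (chaining) can be illustrated with a toy aggregate: to change a byte range, build a new aggregate that references the unchanged bytes before and after the range and places a freshly written buffer in between. The types and the names `agg_append_range` and `agg_patch` are hypothetical and exist only for this sketch; the original buffer is never written to.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define AGG_MAX_SLICES 8            /* illustrative fixed capacity */

typedef struct { const char *addr; size_t len; } slice_t;
typedef struct { slice_t slices[AGG_MAX_SLICES]; size_t count; } agg_t;

size_t agg_length(const agg_t *a) {
    size_t total = 0;
    for (size_t i = 0; i < a->count; i++) total += a->slices[i].len;
    return total;
}

static void agg_push(agg_t *a, const char *addr, size_t len) {
    assert(a->count < AGG_MAX_SLICES);
    a->slices[a->count].addr = addr;
    a->slices[a->count].len  = len;
    a->count++;
}

/* Append logical bytes [from, to) of src to dst, by reference only. */
void agg_append_range(agg_t *dst, const agg_t *src, size_t from, size_t to) {
    size_t pos = 0;
    for (size_t i = 0; i < src->count; i++) {
        size_t start = pos, end = pos + src->slices[i].len;
        size_t lo = from > start ? from : start;
        size_t hi = to < end ? to : end;
        if (lo < hi)
            agg_push(dst, src->slices[i].addr + (lo - start), hi - lo);
        pos = end;
    }
}

/* Replace len bytes at offset off with repl, chaining old and new data. */
agg_t agg_patch(const agg_t *src, size_t off, size_t len,
                const char *repl, size_t repl_len) {
    agg_t out;
    size_t total = agg_length(src);
    memset(&out, 0, sizeof out);
    assert(off <= total && len <= total - off);
    agg_append_range(&out, src, 0, off);            /* unchanged prefix */
    if (repl_len > 0) agg_push(&out, repl, repl_len); /* new buffer     */
    agg_append_range(&out, src, off + len, total);  /* unchanged suffix */
    return out;
}

/* Copy out the referenced bytes, only to inspect the result. */
size_t agg_flatten(const agg_t *a, char *out, size_t cap) {
    size_t pos = 0;
    for (size_t i = 0; i < a->count; i++) {
        assert(pos + a->slices[i].len <= cap);
        memcpy(out + pos, a->slices[i].addr, a->slices[i].len);
        pos += a->slices[i].len;
    }
    return pos;
}
```

Each localized change adds at most two extra slices here, which fits the slide's remark that the cost stays small when modifications are localized; many scattered changes would fragment the aggregate, which is where Case 3 comes in.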
Experimental Design • Compared: • Apache 1.3.1 • Widely used web server • Flash (event-driven HTTP server) • Designed by authors in previous year • Flash-Lite (Flash modified to use IO-Lite API) • New design by authors
Experimental Design • General: varied requested file size • 40 requests for same file • File size ranged from 500 bytes – 200 Kbytes • Persistent connections • Reduces overhead • CGI • Additional I/O traditionally slows servers
Experimental Design • Real workloads • Shows performance benefits from allowing more space for caching • Based on access logs from Rice's Computer Science department • Wide Area Network (WAN) • Test throughput with 0-256 slow clients connecting • Applications • Incorporated the API into UNIX programs
Results • General test: • Bandwidth increase of 43% over Flash, 137% over Apache • No real difference for files smaller than 5 Kbytes
Results • Persistent Connections • Flash-Lite even more effective at smaller file sizes • CGI • All servers slow, but Flash-Lite still much better • Real workload • Flash-Lite throughput 65% greater than Apache • WAN • Flash-Lite does not suffer from slow clients • Applications • Varied improvement for all programs tested
Conclusion • IO-Lite consistently improved performance in all contexts tested • Requires modification to numerous libraries and network device drivers • E.g., see Peng, Sharma, & Chiueh (2003)
Further Research • There have been 42 citations • Almost all fell between 2001-2003 • Authors have not written any follow-ups • Lack of papers that involve implementation of IO-Lite or a variation of it • Probably because of complexity and number of modifications that are necessary
Appendix: Figures • [Figures 2 through 6; images not reproduced in this transcript]