Differentiated Storage Services
Tian Luo (The Ohio State University); Michael Mesnier, Jason Akers, Feng Chen (Intel Corporation)
23rd ACM Symposium on Operating Systems Principles (SOSP), October 23-26, 2011, Cascais, Portugal
Technology overview
An analogy: moving & shipping
• Classification
• Policy assignment
• Policy enforcement
Why should computer storage be any different?
Differentiated Storage Services: technology overview
[Architecture diagram: applications or a DB classify each I/O in-band; the classification flows through the file system, operating system, and storage controller to the storage system, where management firmware maps QoS policies onto QoS mechanisms (storage pools A, B, and C). Classification is in-band; policy assignment is offline. Highlighted components mark current & future research.]
The SCSI CDB: the GROUP NUMBER field is 5 bits wide, allowing 32 I/O classes.
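In READ(10)/WRITE(10) commands the GROUP NUMBER field occupies the low 5 bits of CDB byte 6, which is where the 32-class limit comes from. A minimal sketch, assuming 10-byte CDBs, of packing a class into that field and recovering it on the storage side; the helper names are illustrative, not from the actual patch:

    #include <assert.h>
    #include <stdint.h>

    #define CDB_GROUP_BYTE 6    /* GROUP NUMBER lives in byte 6 of a 10-byte CDB */
    #define CDB_GROUP_MASK 0x1F /* 5 bits -> 32 possible I/O classes */

    /* Pack a class into the GROUP NUMBER field, preserving the upper bits. */
    static void cdb_set_class(uint8_t cdb[10], uint8_t ioclass)
    {
        assert(ioclass <= CDB_GROUP_MASK);
        cdb[CDB_GROUP_BYTE] = (cdb[CDB_GROUP_BYTE] & ~CDB_GROUP_MASK)
                            | ioclass;
    }

    /* Recover the class on the storage-system side. */
    static uint8_t cdb_get_class(const uint8_t cdb[10])
    {
        return cdb[CDB_GROUP_BYTE] & CDB_GROUP_MASK;
    }

    int main(void)
    {
        uint8_t cdb[10] = { 0x2A };    /* WRITE(10) opcode */
        cdb_set_class(cdb, 9);         /* e.g., ext3's small-file class */
        return cdb_get_class(cdb) == 9 ? 0 : 1;
    }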
Motivation: disk caching with SSDs
Universal challenges in the industry
• Keeping the right data cached
• Avoiding thrash under cache pressure
Conventional approaches
• Cache bypass for large/sequential requests
• Evict cold data (LRU is commonly used)
How I/O classification can help
• Identify cacheable I/O classes
• Assign relative caching priorities
Filesystem prototypes (Ext3 & NTFS)
[Architecture diagram: the file system performs the classification (in-band, per I/O) and the policy assignment; the storage system, an SSD cache in front of a disk, performs the policy enforcement through its QoS mechanisms.]
Database prototype (PostgreSQL)
[Architecture diagram: the database performs the classification (in-band, per I/O) and the policy assignment; as in the filesystem prototypes, the storage system, an SSD cache in front of a disk, performs the policy enforcement.]
Technology overview: selective cache algorithms
• Selective allocation
  • Always allocate high-priority classes
    • E.g., FS metadata and DB system tables are always allocated
  • Conditionally allocate low-priority classes
    • Depends on cache pressure, cache contents, etc.
  • The high/low cutoff is a tunable parameter
• Selective eviction
  • Evict in priority order (lowest priority first)
    • E.g., temporary DB tables are evicted before DB system tables
  • Trivially implemented by managing one LRU list per class (see the sketch below)
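A minimal, self-contained sketch of both policies under the assumptions above: one LRU list per class, a lower class number means higher priority, and "cache pressure" is approximated as a full cache. Capacity, class count, and the cutoff are illustrative, not the paper's values:

    #include <stdbool.h>
    #include <stdio.h>

    #define NUM_CLASSES   4   /* kept small for the sketch; the CDB allows 32 */
    #define HIGH_PRIORITY 2   /* tunable cutoff: classes < 2 always allocate */
    #define CAPACITY      8   /* total cache blocks across all classes */

    struct block { long lba; struct block *next; };

    static struct block pool[CAPACITY];
    static struct block *free_list;
    static struct block *mru[NUM_CLASSES];  /* head = most recent, one list per class */
    static int used;

    static void init(void)
    {
        for (int i = 0; i < CAPACITY - 1; i++)
            pool[i].next = &pool[i + 1];
        free_list = &pool[0];
    }

    /* Selective eviction: scan from lowest priority (highest class number) to
     * highest, and evict the least-recently-used block of the first non-empty
     * class. Returns the evicted LBA, or -1 if the cache is empty. */
    static long selective_evict(void)
    {
        for (int c = NUM_CLASSES - 1; c >= 0; c--) {
            struct block **pp = &mru[c];
            if (!*pp)
                continue;
            while ((*pp)->next)              /* walk to the LRU tail */
                pp = &(*pp)->next;
            struct block *victim = *pp;
            *pp = NULL;
            long lba = victim->lba;
            victim->next = free_list;
            free_list = victim;
            used--;
            return lba;
        }
        return -1;
    }

    /* Selective allocation: high-priority classes always get a block, evicting
     * if needed; low-priority classes are bypassed under cache pressure. */
    static bool selective_allocate(int c, long lba)
    {
        bool pressure = (used == CAPACITY);
        if (c >= HIGH_PRIORITY && pressure)
            return false;                    /* send this I/O straight to disk */
        if (pressure)
            selective_evict();               /* a real cache would clean the victim */
        struct block *b = free_list;
        free_list = b->next;
        b->lba = lba;
        b->next = mru[c];
        mru[c] = b;
        used++;
        return true;
    }

    int main(void)
    {
        init();
        for (long i = 0; i < CAPACITY; i++)
            selective_allocate(3, i);        /* low-priority I/O fills the cache */
        printf("high-priority admitted: %d\n", selective_allocate(0, 99));  /* 1: evicts class 3 */
        printf("low-priority admitted:  %d\n", selective_allocate(3, 100)); /* 0: bypassed */
        return 0;
    }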
Technology development: Ext3 prototype
• OS changes (block layer)
  • Add a classifier to I/O requests
  • Only coalesce like-class requests (see the sketch below)
  • Copy the classifier into the SCSI CDB
• Ext3 changes
  • 18 classes identified
  • Optimized for a file server (small files & metadata)
• A small kernel patch; a one-time change to the FS
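The like-class rule keeps a merged request mappable to a single GROUP NUMBER in the outgoing CDB. A sketch of such a merge predicate; the structure and field names are hypothetical stand-ins for the kernel's bio/request structures, not the actual patch:

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical stand-in for a block-layer request carrying a classifier. */
    struct io_request {
        uint64_t start_lba;
        uint32_t nr_sectors;
        uint8_t  ioclass;    /* 5-bit classifier, later copied into the CDB */
    };

    /* Coalesce only physically adjacent requests of the same class. */
    static bool can_merge(const struct io_request *a, const struct io_request *b)
    {
        if (a->ioclass != b->ioclass)
            return false;
        return a->start_lba + a->nr_sectors == b->start_lba;
    }

    int main(void)
    {
        struct io_request a = { 1000, 8, 9 }, b = { 1008, 8, 9 }, c = { 1016, 8, 8 };
        /* a+b merge (adjacent, same class); b+c do not (classes differ). */
        return can_merge(&a, &b) && !can_merge(&b, &c) ? 0 : 1;
    }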
Technology development: Ext3 classification illustrated
echo 'Hello, world!' >> foo; sync
• READ_10(lba 231495 len 8 grp 9)    file data (<= 4KB class)
• WRITE_10(lba 231495 len 8 grp 9)   file data (<= 4KB class)
• WRITE_10(lba 16519223 len 8 grp 8) journal
• WRITE_10(lba 16519231 len 8 grp 8) journal
• WRITE_10(lba 16519239 len 8 grp 8) journal
• WRITE_10(lba 16519247 len 8 grp 8) journal
• WRITE_10(lba 8279 len 8 grp 5)     inode
7 I/Os (28KB) to write 13 bytes: metadata accounts for most of the overhead.
I/O classification exposes the read-modify-write and the metadata updates.
NTFS classification is implemented with Windows filter drivers.
Technology development: PostgreSQL prototype
• Classification API: scatter/gather I/O
• OS changes (block layer)
  • Add an O_CLASSIFIED file flag
  • Extract the classifier from the SG I/O
• A small OS & DB patch; a one-time change to the OS & DB

    #include <fcntl.h>
    #include <sys/uio.h>

    /* With O_CLASSIFIED set, the first iovec element carries the class byte,
     * which the kernel strips before issuing the write. */
    int fd = open("foo", O_RDWR | O_CLASSIFIED, 0666);
    unsigned char class = 19;
    struct iovec myiov[2];
    myiov[0].iov_base = &class;
    myiov[0].iov_len  = 1;
    myiov[1].iov_base = "Hello, world!";
    myiov[1].iov_len  = 13;
    writev(fd, myiov, 2);

[Table: preliminary DB classes]
Technology development: cache implementations
• Fully associative read/write LRU cache
  • Insert(), Lookup(), Delete(), etc.
  • A hash table maps disk LBAs to SSD LBAs (see the sketch below)
• Syncer daemon asynchronously cleans the cache
  • Monitors cache pressure for selective allocation
  • Maintains multiple LRU lists for selective eviction
• Front-ends: iSCSI (OS-independent) and Linux MD
  • MD cache module ("RAID-9")

Striping:  mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdd /dev/sde
Mirroring: mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdd /dev/sde
RAID-9:    mdadm --create /dev/md0 --level=9 --raid-devices=2 <cache> <base>
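A minimal sketch of the disk-LBA-to-SSD-LBA map, using a chained hash table; the bucket count, hash function, and dirty flag handling are illustrative assumptions, not the module's actual layout:

    #include <stdint.h>
    #include <stdlib.h>

    #define NBUCKETS 4096   /* illustrative; a real table is sized to the SSD */

    /* One entry per cached block: where a disk block currently lives on the SSD. */
    struct map_entry {
        uint64_t disk_lba;
        uint64_t ssd_lba;
        int      dirty;             /* set on write; cleared by the syncer */
        struct map_entry *next;     /* chaining for collisions */
    };

    static struct map_entry *buckets[NBUCKETS];

    static unsigned bucket_of(uint64_t lba)
    {
        return (unsigned)(lba * 2654435761u) % NBUCKETS;  /* multiplicative hash */
    }

    /* Lookup: translate a disk LBA to its SSD location; NULL means a cache miss. */
    static struct map_entry *map_lookup(uint64_t disk_lba)
    {
        for (struct map_entry *e = buckets[bucket_of(disk_lba)]; e; e = e->next)
            if (e->disk_lba == disk_lba)
                return e;
        return NULL;
    }

    /* Insert: record a new mapping once a cache block has been allocated. */
    static struct map_entry *map_insert(uint64_t disk_lba, uint64_t ssd_lba)
    {
        unsigned h = bucket_of(disk_lba);
        struct map_entry *e = malloc(sizeof *e);
        if (!e)
            return NULL;
        e->disk_lba = disk_lba;
        e->ssd_lba  = ssd_lba;
        e->dirty    = 0;
        e->next     = buckets[h];
        buckets[h]  = e;
        return e;
    }

    int main(void)
    {
        map_insert(231495, 0);     /* cache the data block from the ext3 trace */
        struct map_entry *e = map_lookup(231495);
        return (e && e->ssd_lba == 0) ? 0 : 1;
    }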
Evaluation: experimental setup
• Host (Xeon, 2-way, quad-core, 12GB RAM)
  • Linux 2.6.34 (patched as described)
• Target storage system
  • HW RAID array + Intel X25-E SSD cache
• Workloads and cache sizes
  • SPECsfs: 18GB cache (10% of the 184GB working set)
  • TPC-H: 8GB cache (28% of the 29GB working set)
• Comparison: LRU versus LRU-S (LRU with selective caching)
SPECsfs I/O breakdown
[Charts: I/O breakdown under LRU vs. LRU-S. Large files pollute the LRU cache, evicting metadata and small files; LRU-S fences off large-file I/O.]
SPECsfs performance metrics
[Charts: hit rate, running time, syncer overhead, and I/O throughput for HDD, LRU, and LRU-S. LRU-S yields a 1.8x speedup in running time.]
SPECsfs file latencies
[Charts: reduction in write and read latency over HDD, for LRU and LRU-S. LRU suffers from write outliers caused by eviction overheads; LRU-S reduces read latency because most small files are cached.]
TPC-H I/O breakdown
[Charts: I/O breakdown under LRU vs. LRU-S. Indexes pollute the LRU cache, evicting user tables; LRU-S fences off the index files.]
TPC-H performance metrics
[Charts: hit rate, running time, syncer overhead, and I/O throughput for HDD, LRU, and LRU-S. LRU-S yields a 1.2x speedup in running time.]
Conclusion & future work
• Intelligent caching is just the beginning
  • Other types of performance differentiation
  • Security, reliability, retention, ...
• Other applications we're looking at
  • Databases
  • Hypervisors
  • Cloud storage
  • Big Data (NoSQL DBs)
• Standardization work is already underway in T10 (the SCSI standards body)
• Open source coming soon...
Thank you! Questions?