Bridging the Information Gap in Storage Protocol Stacks Timothy E. Denehy, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin, Madison
State of Affairs • File System: namespace, files, metadata, layout, free space • Interface: block-based read/write • Storage System: parallelism, redundancy
Problem • Information gap may cause problems • Poor performance • Partial stripe write operations • Duplicated functionality • Logging in file system and storage system • Reduced functionality • Storage system lacks knowledge of files • Time to re-examine the division of labor
Our Approach: Informed LFS over Exposed RAID • Enhance the storage interface • Expose performance and failure information • Use information to provide new functionality • On-line expansion • Dynamic parallelism • Flexible redundancy
Outline • ERAID Overview • I·LFS Overview • Functionality and Evaluation • On-line expansion • Dynamic parallelism • Flexible redundancy • Lazy redundancy • Conclusion
ERAID Goals • Backwards compatibility • Block-based interface • Linear, concatenated address space • Expose information to the file system above • Regions • Performance • Failure • Allow file system to utilize semantic knowledge
ERAID Regions • Region: a contiguous portion of the address space • Regions can be added to expand the address space • Region composition • RAID: one region spanning all disks • Exposed: a separate region per disk • Hybrid: RAID and exposed regions mixed within one ERAID
ERAID Performance Information • Exposed on a per-region basis • Queue length and throughput • Reveals • Static disk heterogeneity • Dynamic performance and load fluctuations
ERAID Failure Information • Exposed on a per-region basis • Number of tolerable failures • Reveals • Static differences in failure characteristics • Dynamic failures to the file system above (see the sketch below)
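To make the exposed information concrete, here is a minimal C sketch of the per-region state an ERAID might export to the file system above. The struct and field names are my own illustration, not the paper's actual interface.

    /* Hypothetical per-region information an ERAID could expose;
     * names are illustrative, not the paper's interface. */
    #include <stdio.h>

    struct eraid_region {
        unsigned long start_blk;  /* first block in the linear address space */
        unsigned long nblks;      /* region length in blocks */
        int    queue_len;         /* current request queue length */
        double throughput;        /* recently observed throughput, MB/s */
        int    tol_failures;      /* tolerable failures (0 = none, 1 = mirrored) */
    };

    int main(void) {
        /* e.g. one fast exposed disk and one slower mirrored region */
        struct eraid_region r[2] = {
            { 0,      100000, 2, 21.6, 0 },
            { 100000, 200000, 7,  7.5, 1 },
        };
        for (int i = 0; i < 2; i++)
            printf("region %d: qlen=%d tput=%.1f MB/s failures=%d\n",
                   i, r[i].queue_len, r[i].throughput, r[i].tol_failures);
        return 0;
    }

A file system polling this table can see both static heterogeneity (the two throughput values) and dynamic load (the queue lengths), rather than guessing from request latencies.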
Outline • ERAID Overview • I·LFS Overview • Functionality and Evaluation • On-line expansion • Dynamic parallelism • Flexible redundancy • Lazy redundancy • Conclusion
I·LFS Overview • Log-structured file system • Transforms all writes into large sequential writes • All data and metadata are written to a log • Log is a collection of segments • Segment table describes each segment (sketched below) • Cleaner process produces empty segments • Why use LFS for an informed file system? • Write-anywhere design provides flexibility • Ideas applicable to other file systems
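As a rough illustration of the write-anywhere design, the following C sketch models a segment table. The names and states are assumptions of mine; the real NetBSD LFS structures are more involved.

    /* Minimal model of an LFS segment table; invented names. */
    #include <stdint.h>
    #include <stdio.h>

    enum seg_state { SEG_EMPTY, SEG_DIRTY };  /* cleaner turns DIRTY -> EMPTY */

    struct segment {
        enum seg_state state;
        uint32_t live_bytes;  /* live data left; low counts are cheap to clean */
    };

    /* All writes append to the current segment; when it fills, the
     * file system can pick any EMPTY segment. This freedom is what
     * makes it easy to layer region-aware placement on top. */
    int pick_empty(struct segment *tab, int nsegs) {
        for (int i = 0; i < nsegs; i++)
            if (tab[i].state == SEG_EMPTY)
                return i;
        return -1;  /* cleaner must run to produce empty segments */
    }

    int main(void) {
        struct segment tab[4] = {
            { SEG_DIRTY, 512 }, { SEG_EMPTY, 0 },
            { SEG_DIRTY, 64 },  { SEG_EMPTY, 0 },
        };
        printf("next segment: %d\n", pick_empty(tab, 4));
        return 0;
    }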
I·LFS Overview • Goals • Improve performance, functionality, and manageability • Minimize system complexity • Exploits ERAID information to provide • On-line expansion • Dynamic parallelism • Flexible redundancy • Lazy redundancy
I·LFS Experimental Platform • NetBSD 1.5 • 1 GHz Intel Pentium III Xeon • 128 MB RAM • Four fast disks • Seagate Cheetah 36XL, 21.6 MB/s • Four slow disks • Seagate Barracuda 4XL, 7.5 MB/s
I·LFS Baseline Performance • Four slow disks: 30 MB/s • Four fast disks: 80 MB/s
Outline • ERAID Overview • I·LFS Overview • Functionality and Evaluation • On-line expansion • Dynamic parallelism • Flexible redundancy • Lazy redundancy • Conclusion
I·LFS On-line Expansion • Goal: Expand storage incrementally • Capacity • Performance • Ideal: Instant disk addition • Minimize downtime • Simplify administration • I·LFS supports on-line addition of new disks
I·LFS On-line Expansion Details • ERAID: Expandable address space • Expansion is equivalent to adding empty segments • Start with an oversized segment table • Activate new portion of segment table
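A minimal sketch of the oversized-table approach, using invented names: because the table is preallocated, adding a disk only requires activating the corresponding entries, which is why expansion can be nearly instant.

    /* Sketch of on-line expansion via an oversized segment table. */
    #include <stdio.h>

    #define MAX_SEGS 1024  /* table sized with headroom for future disks */

    enum seg_state { SEG_INACTIVE, SEG_EMPTY, SEG_DIRTY };

    static enum seg_state seg_tab[MAX_SEGS];  /* all INACTIVE initially */
    static int nactive;                       /* segments currently usable */

    /* Adding a disk extends the ERAID linear address space; the file
     * system just marks the new entries EMPTY, i.e. writable. */
    void expand(int new_segs) {
        if (nactive + new_segs > MAX_SEGS)
            new_segs = MAX_SEGS - nactive;
        for (int i = nactive; i < nactive + new_segs; i++)
            seg_tab[i] = SEG_EMPTY;
        nactive += new_segs;
        printf("expanded to %d active segments\n", nactive);
    }

    int main(void) {
        expand(256);  /* initial disk */
        expand(256);  /* disk added on-line: immediately writable */
        return 0;
    }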
I·LFS On-line Expansion Experiment • I·LFS immediately takes advantage of each extra disk
I·LFS Dynamic Parallelism • Goal: Perform well on heterogeneous storage • Static performance differences • Dynamic performance fluctuations • Ideal: Maximize throughput of the storage system • I·LFS writes data proportionate to performance
I·LFS Dynamic Parallelism Details • ERAID: Dynamic performance information • Most file system routines are not changed • Aware of only the ERAID linear address space • Reduces file system complexity • Segment selection routine • Aware of ERAID regions and performance • Chooses next segment based on current performance
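One plausible policy for the segment selection routine, sketched in C. The paper's goal is writing data proportional to observed performance; this queue-drain heuristic approximates that, but the exact policy here is my assumption, not the paper's code.

    /* Performance-aware segment selection: send the next segment to
     * the region whose queue will drain soonest, so faster or less
     * loaded disks absorb proportionally more of the log. */
    #include <stdio.h>

    struct region {
        double throughput;  /* from ERAID performance info, MB/s (> 0) */
        int    pending;     /* segments already queued to this region */
    };

    int select_region(struct region *r, int n) {
        int best = 0;
        for (int i = 1; i < n; i++)
            if (r[i].pending / r[i].throughput <
                r[best].pending / r[best].throughput)
                best = i;
        return best;
    }

    int main(void) {
        struct region r[2] = { { 21.6, 4 }, { 7.5, 4 } };  /* fast vs. slow */
        printf("write next segment to region %d\n", select_region(r, 2));
        return 0;
    }

Because only this routine knows about regions, the rest of the file system keeps seeing one linear address space, which is how the design limits added complexity.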
I·LFS Static Parallelism Experiment • Simple striping limited by the rate of the slowest disk • I·LFS provides the full throughput of the system
I·LFS Dynamic Parallelism Experiment • I·LFS adjusts to the performance fluctuation
I·LFS Flexible Redundancy • Goal: Offer new redundancy options to users • Ideal: Range of mechanisms and granularities • I·LFS provides mirrored per-file redundancy
I·LFS Flexible Redundancy Details • ERAID: Region failure characteristics • Use separate files for redundancy • Even inode N for original files • Odd inode N+1 for redundant files • Original and redundant data in different sets of regions • Flexible data placement within the regions • Use recursive vnode operations for redundant files • Leverage existing routines to reduce complexity
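A small sketch of the even/odd inode pairing described above, with hypothetical helper names; the recursive vnode machinery itself is elided.

    /* Even inode N holds an original file; odd inode N+1 holds its
     * mirror, placed in regions with disjoint failure groups. */
    #include <stdio.h>

    typedef unsigned long ino_no;  /* stand-in for the kernel's ino_t */

    ino_no shadow_inode(ino_no ino) { return ino + 1; }      /* redundant copy */
    int    is_shadow(ino_no ino)    { return ino % 2 == 1; } /* odd = mirror */

    /* A write to file N is recursively re-issued to N+1 through the
     * same vnode operations, reusing existing code paths. */
    int main(void) {
        ino_no f = 42;  /* even: an original file */
        printf("file %lu mirrors to inode %lu (shadow? %d)\n",
               f, shadow_inode(f), is_shadow(shadow_inode(f)));
        return 0;
    }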
I·LFS Flexible Redundancy Experiment • I·LFS provides a throughput and reliability tradeoff
I·LFS Lazy Redundancy • Goal: Avoid replication performance penalty • Ideal: Replicate data immediately before failure • I·LFS offers redundancy with delayed replication • Avoids replication penalty for short-lived files
I·LFS Lazy Redundancy • ERAID: Region failure characteristics • Segments needing replication are flagged • Cleaner acts as replicator • Locates flagged segments • Checks data liveness and lifetime • Generates redundant copies of files
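A sketch of the cleaner-as-replicator idea, again with invented names: segments holding un-replicated data are flagged at write time, and only data still live when the cleaner arrives is ever copied, so short-lived files die before paying the mirroring cost.

    /* Lazy replication in the cleaner. */
    #include <stdio.h>
    #include <stdbool.h>

    struct lazy_seg {
        bool needs_copy;   /* flagged at write time instead of mirroring inline */
        int  live_blocks;  /* 0 means the data was deleted or overwritten */
    };

    void cleaner_pass(struct lazy_seg *s, int n) {
        for (int i = 0; i < n; i++) {
            if (!s[i].needs_copy)
                continue;
            if (s[i].live_blocks > 0)   /* liveness check; a lifetime check
                                           could also defer very young data */
                printf("segment %d: replicate %d live blocks\n",
                       i, s[i].live_blocks);
            else
                printf("segment %d: all data dead, skip replication\n", i);
            s[i].needs_copy = false;
        }
    }

    int main(void) {
        struct lazy_seg segs[2] = { { true, 128 }, { true, 0 } };
        cleaner_pass(segs, 2);
        return 0;
    }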
I·LFS Lazy Redundancy Experiment • I·LFS avoids performance penalty for short-lived files
Outline • ERAID Overview • I·LFS Overview • Functionality and Evaluation • On-line expansion • Dynamic parallelism • Flexible redundancy • Lazy redundancy • Conclusion
Comparison with Traditional Systems • On-line expansion • Yes • Dynamic parallelism (heterogeneous storage) • Yes, but with duplicated functionality • Flexible redundancy • No, the storage system is not aware of file composition • Lazy redundancy • No, the storage system is not aware of file deletions
Conclusion • Introduced ERAID and I·LFS • Extra information enables new functionality • Difficult or impossible in traditional systems • Minimal complexity • 19% increase in code size • Time to re-examine the division of labor
Questions? http://www.cs.wisc.edu/wind/