Storage Research Meets the Grid • Remzi Arpaci-Dusseau
ADSL • Where gray-box techniques meet storage systems • [Diagram labels: Gray Box, Storage]
The Who, How, and What of ADSL • Who: Andrea and Remzi Arpaci-Dusseau • And of course a bunch of students • How: Gray-box Techniques • Assume system is a “gray box” • Leverage knowledge of its implementation to: • Gain more information • Control its behavior • What: Storage Systems • Smarter disks and RAIDs
Semantically-smart Disks • Problem: Most disks don’t know much • Block-based SCSI interface limits knowledge • And what a waste of potential! • Modern RAIDs have substantial processing, memory • A semantically-smart disk system • Figures out how the file system is using it • Exploits that to build new functionality into storage
Trend that Drives This Session: Data Demands on the Rise • Focus of original batch queueing systems: CPU • “cycle stealing” • Compute clusters • Distributed supercomputer • But data demands of jobs are on the rise… • Input, output, temp files and checkpoints • Modern science is increasingly data centric
Focus of this talk: Traditional storage vs. Grid storage • Most aspects of modern storage systems are designed with a certain domain in mind • Local area environment, presence of admin, etc. • Grid changes almost every assumption • Wide area, no admin, etc. • Conclusion: Must reexamine how to build storage systems from the ground up
Outline • Introduction • Traditional vs. Grid Storage • Data reliability • Management • Caching and Overlap • Evaluation • Conclusions
Data Reliability: Traditional • All data treated equally, and is sacred • Most users tolerate some amount of data loss (30-second delay before flush to disk) • Losing one byte after flush is catastrophic • Strong implications for design: Backup + disaster recovery
Data Reliability: Grid • Different types of I/O, treat accordingly • Einstein’s Matter-Energy equivalence: E = mc^2 • Grid analogy: Data-Computation equivalence • E(M) = C • Knowledge is key: If you can refetch M, you can recompute C
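The data-computation equivalence can be sketched in a few lines. This is only an illustration, under the assumption that E is the job's computation, M its refetchable input, and C its output (the slide does not define the symbols). The class name `RecomputableOutputs` and its methods are hypothetical, not part of any system named in this talk.

```python
from typing import Any, Callable, Dict


class RecomputableOutputs:
    """Hypothetical output store that repairs loss by recomputation.

    Assumption: an output C is defined as E(M), where M can be
    refetched on demand, so C never needs a backup of its own.
    """

    def __init__(self, fetch_input: Callable[[str], Any],
                 execute: Callable[[Any], Any]) -> None:
        self._fetch_input = fetch_input   # refetches M for a given key
        self._execute = execute           # the computation E
        self._outputs: Dict[str, Any] = {}

    def get(self, key: str) -> Any:
        # If the output is missing (never computed, or lost),
        # refetch the input and recompute: C = E(M).
        if key not in self._outputs:
            self._outputs[key] = self._execute(self._fetch_input(key))
        return self._outputs[key]

    def lose(self, key: str) -> None:
        # Simulate losing the stored output, e.g. a disk failure.
        self._outputs.pop(key, None)
```

Calling `get` after `lose` returns the same value again at the cost of one more fetch and one more execution, which is the trade the slide points at: storage reliability is exchanged for computation.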
Management: Traditional • Storage administrators control system • Performance tuning • Problem fixing • User handling • Human intelligence can be applied to make things run smoothly
Management: Grid • No administrator to help out • Though may have to live within administrative limitations • System must automatically handle problems • Tune to environment • Deal with failures • Give reasonable feedback to users upon errors and other problem scenarios
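One way to picture "deal with failures" and "give reasonable feedback" without an administrator is a retry wrapper that records what went wrong on each attempt. This is a minimal sketch, not the mechanism of any system in this talk. The function name `run_with_retries`, its parameters, and the choice of exponential backoff are all assumptions made for illustration.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_with_retries(operation: Callable[[], T],
                     attempts: int = 3,
                     base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `operation`, retrying on OSError with exponential backoff.

    Hypothetical helper: if every attempt fails, raise one error that
    summarizes each failure so the user sees what happened.
    """
    errors: List[str] = []
    for attempt in range(attempts):
        try:
            return operation()
        except OSError as exc:
            errors.append(f"attempt {attempt + 1}: {exc}")
            if attempt + 1 < attempts:
                # Wait longer after each failure: base, 2*base, 4*base, ...
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(
        f"operation failed after {attempts} attempts: " + "; ".join(errors)
    )
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real delays; a real deployment would also need to decide which errors are worth retrying at all.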
Buffering and Overlap: Traditional • Used throughout systems for performance • Important cache: Client-side • NFS: Memory • AFS: Disk (and memory) • Caches are managed transparently • Overlap: Disk->memory, across network, also transparent • Result: Operations can run as if they are local • [Diagram: Client and Server, each marked with a cache ($)]
Buffering and Overlap: Grid • Used throughout for performance, reliability • Many more levels of cache • Not just clients/servers • Caches managed both transparently and not transparently • Overlap is more complex too (multiple users, resources) • Have to deal with more issues: failure, cost differentials • [Diagram: several caches ($) between the WAN and the home site]
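The "many more levels of cache" idea can be illustrated with a lookup that walks tiers from nearest to farthest and fills the nearer tiers on a hit. This is a simplified sketch under stated assumptions: each tier is modeled as a plain dictionary, the tier names are invented, and failure and cost differentials (which the slide flags as the hard part) are deliberately left out.

```python
from typing import Any, Dict, List, Tuple

Tier = Tuple[str, Dict[str, Any]]


def tiered_read(key: str, tiers: List[Tier]) -> Tuple[Any, str]:
    """Read `key` from the nearest tier that holds it.

    `tiers` is ordered nearest to farthest; the last entry stands in
    for the home site across the WAN. Returns (value, tier_name).
    """
    for index, (name, store) in enumerate(tiers):
        if key in store:
            value = store[key]
            # Populate every nearer tier so the next read stays local.
            for _, nearer_store in tiers[:index]:
                nearer_store[key] = value
            return value, name
    raise KeyError(key)
```

For example, with tiers named "local", "site", and "home", a first read served from "home" leaves copies in "local" and "site", so a repeated read is answered by "local".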
Evaluation: Traditional • Traditional storage metrics: Myopic focus • May miss “big picture” • One example: Availability • Defined as “uptime” of system • What’s good: “5 9s” of availability (up 99.999%) • Implications: • Systems are engineered for enterprise use (and thus over-engineered for many uses)
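To make "5 9s" concrete, the arithmetic below converts an availability fraction into the downtime it permits. The function name and the use of a 365-day year are assumptions chosen for the illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a 365-day year


def allowed_downtime_seconds(availability: float,
                             period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Return the downtime permitted by an availability fraction.

    availability is a fraction of the period, e.g. 0.99999 for "5 9s".
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * period_seconds


# "5 9s" leaves roughly 315 seconds, a little over 5 minutes, per year.
five_nines_downtime = allowed_downtime_seconds(0.99999)
```

A budget of about five minutes per year is what drives the enterprise-grade engineering the slide describes.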
Evaluation: Grid • Grid metrics can focus on what’s important for Grid jobs: Job throughput • Instead of availability, measure impact of failure on the aspect of system that matters most • Result: An end-to-end perspective to evaluate merit of new approaches in the Grid space
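An end-to-end measurement along these lines might compare job throughput with and without a failure, rather than reporting uptime. The sketch below is illustrative only: the function names are hypothetical and the job counts in the usage note are made-up numbers, not results from the talk.

```python
def jobs_per_hour(jobs_completed: int, window_seconds: float) -> float:
    """Job throughput over a measurement window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return jobs_completed / (window_seconds / 3600.0)


def throughput_loss(baseline: float, with_failure: float) -> float:
    """Fraction of baseline throughput lost when a failure occurs."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 1.0 - with_failure / baseline
```

As a made-up example, if 120 jobs finish in a two-hour window without failures and 90 finish in the same window with a storage failure injected, `throughput_loss` reports 0.25: the failure cost a quarter of the work, whatever the uptime figure says.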
Summary • Grid changes storage systems • Makes some things harder (caching, overlap, failures) • Makes other things easier (better understanding of workload and metrics) • How to make it all work? • Exploit knowledge of workloads and systems to reduce difficult problems to tractable ones
The Data-centric Lineup • Lots of exciting work going on at Wisconsin in this space! • First session: • John Bent - “Batch-pipelined Workloads” • Doug Thain - “Migratory File Services” • Second session: • Joseph Stanley - “NeST” • Tevfik Kosar - “Stork” • George Kola - “Disk Router” • Guest speaker: Arie Shoshani - “Coscheduling Storage and CPUs”