MASSIVE ARRAYS OF IDLE DISKS FOR STORAGE ARCHIVES
D. Colarelli, D. Grunwald, U. Colorado, Boulder
Highlights
• Paper proposes
  • To replace tape libraries by large non-redundant arrays of disks
  • To cache on active drives
    • Files that have been recently accessed
    • Update logs for other files
  • To keep other drives mostly inactive by spinning them down between accesses
Introduction (I)
• Robotic tape libraries are now the standard solution for archiving very large amounts of data
• Disadvantages include
  • Slow access times: average search time of 41 s for T9940 drives
  • Not much cheaper than disk drives
• Could we replace them with massive arrays of hard drives?
Introduction (II)
• Major limitation of hard drive solution is power consumption
  • Almost ten times that of equivalent tape library
• Could power down disks that are not currently accessed
  • 50% of data are likely to be never accessed
  • 25% of data are likely to be accessed once
Introduction (III)
• Must be at least as reliable as tape libraries
• No need to use a redundant scheme
• Solution is a Massive Array of Idle Disks (MAID)
• Paper investigates design issues through trace-driven simulations
Design Issues
• Two major design decisions
  • Data migration or duplication (caching)
  • File system or block-level interface
Migration or caching
• Migration would move “hot” data to active drives
  • Uses disk space more efficiently
  • Requires a map or directory mechanism that maps the storage across all drives
• Caching would cache read data and act as a write log for write data
  • Keeps two copies of all cached files
  • Maps or directories are proportional to the size of the cache (see the sketch below)
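The map-size difference between the two schemes can be made concrete with a small sketch. Everything below is illustrative only; the names and data structures are assumptions, not the paper's.

```python
# Migration: any block's home drive can change, so the map must cover
# the whole store -- one entry per block across all drives.
migration_map = {}                    # block_id -> (drive_id, offset)

def migrate(block_id, active_drive, offset):
    """Move a hot block to an active drive; its only copy moves with it."""
    migration_map[block_id] = (active_drive, offset)

# Caching: passive drives keep the authoritative copy, so the map only
# needs entries for blocks currently cached -- proportional to cache size.
cache_map = {}                        # block_id -> cache_slot

def locate(block_id, home_location):
    """Prefer the cached copy; otherwise fall back to the fixed home."""
    return cache_map.get(block_id, home_location)
```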
File system or block interface
• A file system interface could use file system information to cache entire files
  • Would probably perform better
  • Would require system modifications
• A block-level interface would work with existing systems
MAID with caching
• Figure: system architecture (a hypothetical sketch follows)
  • Virtualization Manager
  • Cache Manager, managing the active drives (always on)
  • Passive Drive Manager, managing the passive drives (spin up/down)
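As a rough illustration of how the figure's components could fit together, here is a hypothetical Python sketch; the class and method names and the wiring are assumptions, not the paper's design.

```python
class CacheManager:
    """Serves requests from the always-on active drives."""
    def __init__(self):
        self.store = {}                      # chunk_id -> data
    def read(self, chunk_id):
        return self.store.get(chunk_id)      # None signals a cache miss

class PassiveDriveManager:
    """Spins a passive drive up on demand and down after an idle timeout."""
    def read(self, chunk_id):
        # Placeholder: spin up the owning drive, read, restart idle timer
        return b"data-from-passive-drive"

class VirtualizationManager:
    """Presents one address space and routes each request."""
    def __init__(self, cache, passive):
        self.cache, self.passive = cache, passive
    def read(self, chunk_id):
        data = self.cache.read(chunk_id)     # try the active drives first
        return data if data is not None else self.passive.read(chunk_id)
```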
Design choices (I)
• Compared MAID-cache and MAID-no-cache
• MAID-cache
  • Caches reads and writes on active drives
  • Caching unit is a “chunk” of 64 sectors
  • Cache policy is LRU
  • All writes are placed in the cache write log, where they wait to be committed to the non-active (passive) drives (sketched below)
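A minimal sketch of an LRU chunk cache with a write log, using the stated parameters (64-sector chunks, LRU eviction). The class name, capacity handling, and commit mechanism are assumptions.

```python
from collections import OrderedDict

SECTOR = 512
CHUNK_SECTORS = 64                     # caching unit from the paper
CHUNK_BYTES = CHUNK_SECTORS * SECTOR   # 32 KiB per cached chunk

class ChunkCache:
    """LRU cache of 64-sector chunks; writes also enter a write log
    where they await commit to the passive drives."""

    def __init__(self, capacity_chunks):
        self.capacity = capacity_chunks
        self.chunks = OrderedDict()    # chunk_id -> data, in LRU order
        self.write_log = []            # (chunk_id, data) pending commit

    def _touch(self, chunk_id, data):
        self.chunks[chunk_id] = data
        self.chunks.move_to_end(chunk_id)      # mark most recently used
        if len(self.chunks) > self.capacity:
            self.chunks.popitem(last=False)    # evict least recently used

    def read(self, chunk_id):
        if chunk_id in self.chunks:
            self._touch(chunk_id, self.chunks[chunk_id])
            return self.chunks[chunk_id]       # hit: no spin-up needed
        return None                            # miss: caller spins up drive

    def write(self, chunk_id, data):
        self._touch(chunk_id, data)            # cache the new data...
        self.write_log.append((chunk_id, data))  # ...and log it for commit
```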
Design choices (II)
• Must always check the write log before reading data from the cache or the passive drives
• Passive drives remain on standby until
  • A cache miss occurs, or
  • The write log becomes too long
• Return to standby when the spin-down inactivity time limit is reached
• Varying this time limit is the primary way to affect system performance and energy consumption (see the sketch after this list)
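The lookup ordering and the two spin-up triggers could look like the following sketch; the threshold value and helper names are hypothetical.

```python
WRITE_LOG_LIMIT = 1024   # hypothetical: commit once the log grows past this

def read_chunk(chunk_id, write_log, cache, passive):
    # 1. The write log may hold a newer version of the chunk than either
    #    the cache or the passive drives, so it must be checked first.
    for logged_id, data in reversed(write_log):   # newest entries first
        if logged_id == chunk_id:
            return data
    # 2. Then the cache of recently accessed chunks on the active drives.
    data = cache.read(chunk_id)
    if data is not None:
        return data
    # 3. Only on a cache miss is a passive drive spun up.
    return passive.read(chunk_id)

def maybe_flush(write_log, passive):
    # The other spin-up trigger: a write log that has grown too long.
    if len(write_log) > WRITE_LOG_LIMIT:
        for chunk_id, data in write_log:
            passive.write(chunk_id, data)         # commit to passive drives
        write_log.clear()
```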
Simulation parameters
• Power management policy
  • Always on
  • Fixed-delay spin-down
  • Adaptive spin-down
• Data layout (the two mappings are sketched below)
  • Linear: keep successive blocks on the same drive
  • Striped: spread successive blocks across drives
• Caching / no caching
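The two layouts reduce to two block-to-drive mappings. The sketch below assumes fixed-size drives and a flat block address space; the array size and capacity are hypothetical.

```python
DRIVES = 8                    # hypothetical array size
BLOCKS_PER_DRIVE = 1_000_000  # hypothetical drive capacity in blocks

def linear(block):
    """Successive blocks stay on one drive, so a sequential scan
    spins up as few drives as possible."""
    return divmod(block, BLOCKS_PER_DRIVE)       # -> (drive, offset)

def striped(block):
    """Successive blocks rotate across drives, spreading the load but
    touching every drive on a sequential scan."""
    return (block % DRIVES, block // DRIVES)     # -> (drive, offset)

# Blocks 5 and 6 share a drive under linear but not under striped:
print(linear(5), linear(6))    # (0, 5) (0, 6)
print(striped(5), striped(6))  # (5, 0) (6, 0)
```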
Simulation results (I)
• Based on a supercomputer center workload
• All MAID configurations achieve similar power consumption
  • 15 to 16% of that of the always-on configuration
• MAID configurations without a cache have average response times comparable to that of the always-on configuration
  • Workload had little locality
Simulation results (II)
• Average response times of MAID configurations with a cache are much worse than that of the always-on configuration
  • 0.680 to 0.720 s compared to 0.303 s
• The striped configuration with a fixed spin-down delay has the lowest average response time of all MAID configurations
  • 0.309 s
Conclusion
• MAID can achieve average response times comparable to that of an always-on configuration with much lower power consumption
• IMPORTANT: In a more recent paper, the authors found that cached configurations worked much better for workloads exhibiting more locality of accesses than their supercomputer center workload