The HP AutoRAID Hierarchical Storage System • John Wilkes, Richard Golding, Carl Staelin, and Tim Sullivan • Presented by Arthur Strutzenberg
What is RAID • RAID stands for a (R)edundant (A)rray of (I)ndependent (D)isks • RAID can be configured at many different levels; for this paper we will focus on • Level 3 • Level 5 • A RAID array has two goals • Performance boosting • Data redundancy
Data Redundancy (Mirroring) • Mirroring keeps a complete second copy of all of the data • Provides reliability • The cost is a high storage overhead (twice the raw capacity)
Redundancy at the different levels • RAID 5 interleaves blocks of data and parity information across all of the disks; this alleviates the bottleneck of devoting a single disk to parity • RAID 3 interleaves data across a set of disks and stores parity information on a separate, dedicated parity disk
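Both layouts rebuild a lost block from the exclusive-OR of the surviving blocks in the same stripe. Here is a minimal sketch of that idea; the helper names are invented for illustration and are not from the paper:

```python
# Sketch of XOR parity as used by RAID 3/5 (illustrative only).
def parity(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-sized data blocks yields the parity block."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

def recover(surviving: list[bytes], parity_block: bytes) -> bytes:
    """Any single missing block is the XOR of the parity block and the survivors."""
    return parity(surviving + [parity_block])

stripe = [b"\x01\x02", b"\x0f\x00", b"\xaa\x55"]
p = parity(stripe)                       # kept on the dedicated parity disk (RAID 3)
                                         # or rotated across the disks (RAID 5)
assert recover([stripe[0], stripe[2]], p) == stripe[1]
```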
The Challenges of RAID • This technology is difficult to use • Different RAID levels have different performance characteristics • Typically there are a LOT of configuration parameters • Wrong Choices in the configuration usually result in • Poor Performance • Having to rebuild the array (a daunting task)
Combining Level 5 & Mirroring • This paper explores what happens when you combine mirrored storage with Level 5 in a single array • Gain the performance advantages of mirroring for active data • Gain the read-rate advantages of Level 5 for inactive data • Gain the large-write advantages of Level 3 by logging the RAID 5 writes • Assumption #1: It is possible to divide the stored data into active and inactive subsets • Assumption #2: The active subset must change relatively slowly over time
Implementation Levels • As with any OS design, Managed Storage can occur at many different levels • Manually by the Sysadmin • In the FileSystem • At the Hardware (Controller) level
Manually by the Sysadmin • Advantage: it brings the human element into the problem • Disadvantage: it brings the human element into the problem
Implementation: FileSystem Level • Advantage: the implementation has detailed knowledge about the file system and its files • Disadvantage: there are many file system implementations out there, so the work has to be repeated for each one
At the Controller Level • Advantage: this is easily deployable; in the case of HP AutoRAID, the system looks to the host like a POD (Plain ‘ol Drive) • Disadvantage: the host loses knowledge of where its data is physically located on disk
Features of HP AutoRAID • In the design of AutoRAID the authors attempted to provide a system with • (Transparent) Mapping • Mirroring • Adaptation to storage changes • Hot pluggable • Storage Expansion (Adding a new disk) • Disk Upgrades (Switching out a disk) • Controller Failover (Redundancy) • Simple Administration & Setup • Log Structured RAID 5 Writes • In other words they wanted a product that would be easy to convert into something that was eventually salable!
Bringing in other Papers • Part of the approach for AutoRAID borrows an idea from virtual memory management systems • The host sees a single, simple “address space” for the file system • Behind the scenes, however, a lot of complicated mapping and bookkeeping work goes on
Looking at this in layers • If we were to look at this in a layered approach, there are two ways to examine the system
Data Layout • The AutoRAID system is organized around the following logical units (see the sketch after this list) • PEX: (P)hysical (EX)tent • A large-granularity chunk of space into which each disk is divided • PEG: (P)hysical (E)xtent (G)roup • Several PEXes combined together form one PEG • Each PEG is in one of three states: mirrored, RAID 5, or unallocated • Segments • Units of contiguous space on a drive included in a stripe or mirrored pair • A segment is the stripe unit in RAID 5, or the unit of duplication in a mirrored pair • RB: (R)elocation (B)lock • The unit of migration used by the system • LUN: (L)ogical (U)nit • The virtual disk visible to the host
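A minimal sketch of how these units might nest. The class names, fields, and states below are assumptions made for illustration; they are not the actual AutoRAID on-disk structures:

```python
# Illustrative data-layout sketch; names and fields are assumptions,
# not the real AutoRAID metadata.
from dataclasses import dataclass, field
from enum import Enum

class PegState(Enum):
    MIRRORED = "mirrored"
    RAID5 = "raid5"
    UNALLOCATED = "unallocated"

@dataclass
class Pex:
    """A large-granularity extent of space on one physical disk."""
    disk_id: int
    offset: int                    # byte offset of the extent on its disk
    size: int                      # extent size in bytes

@dataclass
class Peg:
    """A group of PEXes on different disks, managed as one redundancy unit."""
    state: PegState
    pexes: list[Pex] = field(default_factory=list)
    rbs: list[int] = field(default_factory=list)      # IDs of the RBs stored here

@dataclass
class Lun:
    """The virtual disk the host sees: a flat array of relocation blocks."""
    rb_size: int
    rb_to_peg: dict[int, Peg] = field(default_factory=dict)
```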
Data Layout at the various layers • The RAID 5 layer uses a log-structured approach to writes • Because of this, it needs plenty of free space into which new log segments can be written • That in turn requires a cleaner, similar to the one in the log-structured system presented in last week's paper • Parity adds some complexity to this process, because of the nature of the parity data and when it is written
Mapping Structure • AutoRAID makes use of a virtual device table • This table maps RBs to the PEGs in which they reside • Each PEG table holds the list of RBs in that PEG and the PEXes used to store them • Finally there are the PEX tables, one per drive • All of this goes on behind the scenes, just like address translation in a VMM (a lookup sketch follows)
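A hedged sketch of the lookup chain these tables imply, reusing the hypothetical Pex/Peg/Lun classes from the earlier sketch; the layout details are assumptions, not the paper's actual table format:

```python
# Hypothetical translation from a host byte offset to a physical disk location.
# Builds on the Pex/Peg/Lun sketch above; not the real AutoRAID tables.
def translate(lun: Lun, host_offset: int) -> tuple[int, int]:
    """Return (disk_id, disk_offset) for a host byte offset."""
    rb_id = host_offset // lun.rb_size            # which relocation block?
    peg = lun.rb_to_peg[rb_id]                    # virtual device table lookup
    slot = peg.rbs.index(rb_id)                   # position of the RB within the PEG
    pex = peg.pexes[slot % len(peg.pexes)]        # PEG table -> backing PEX
    within_rb = host_offset % lun.rb_size
    # A real controller would also account for the RB's position inside the PEX;
    # this sketch only shows the chain of table lookups.
    return pex.disk_id, pex.offset + within_rb
```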
Reading and Writing • Here is where it gets interesting • The AutoRAID system makes use of a cache, like most standard computer systems • However, it divides the cache between standard DRAM (for the read cache) and NVRAM (for the write cache) • Reads first check the cache to see if the data resides there; a cache miss results in a read request dispatched to the back-end storage • Using NVRAM for the write cache allows several things • The host can consider a write request complete once a copy of the data has been placed in NVRAM • It also lets AutoRAID amortize the cost of multiple writes by coalescing them into one larger back-end write (a sketch of this path follows)
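A minimal sketch of that front-end path under the assumptions above; the class, method names, and flush threshold are invented for illustration and are not the controller's actual interfaces:

```python
# Illustrative front-end read/write path; not the AutoRAID controller code.
class FrontEnd:
    def __init__(self, read_cache, nvram, back_end):
        self.read_cache = read_cache      # volatile DRAM read cache (dict-like)
        self.nvram = nvram                # battery-backed write cache (dict-like)
        self.back_end = back_end          # mirrored + RAID 5 storage classes

    def read(self, rb_id):
        data = self.read_cache.get(rb_id)
        if data is None:                  # miss: dispatch to back-end storage
            data = self.back_end.read(rb_id)
            self.read_cache[rb_id] = data
        return data

    def write(self, rb_id, data):
        self.nvram[rb_id] = data          # host sees the write as complete here
        if len(self.nvram) >= 64:         # arbitrary illustrative threshold
            self.flush()

    def flush(self):
        dirty = dict(self.nvram)
        self.back_end.write_batch(dirty)  # coalesce many RBs into one back-end write
        self.nvram.clear()
```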
Mirrored Reads & Writes • Both reads and writes to the mirrored storage are straightforward • A read picks up a copy of the data from either of the two disks • A write request generates a write to both disks, and the request returns only once both copies have been updated • Note that these back-end writes are independent of the write performed by the host, which was acknowledged as soon as its data reached NVRAM
RAID 5 Reads & Writes • Reads from RAID 5 are just as straightforward • Writes are where things get a tad more complex • The AutoRAID RAID 5 layer uses a logging approach, and logged writes happen • Per RB, or • As a batch (sketched below) • There are also times when the RAID 5 implementation uses in-place writes instead of the logged approach (more about this later)
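A sketch of the batched, log-style case under the assumptions already made; the stripe width and the append_stripe helper are invented for illustration:

```python
# Illustrative batched RAID 5 log write: fill whole stripes, compute parity once
# per stripe, and append to free log space; not the controller's actual logic.
def log_write_batch(raid5_peg, dirty_rbs, stripe_width=4):
    """Write dirty RBs a full stripe at a time so parity needs no read-modify-write."""
    rbs = list(dirty_rbs.items())
    for i in range(0, len(rbs), stripe_width):
        stripe = rbs[i:i + stripe_width]
        data_blocks = [data for _, data in stripe]
        parity_block = parity(data_blocks)                  # XOR helper from the earlier sketch
        raid5_peg.append_stripe(data_blocks, parity_block)  # hypothetical append to the log
```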
Background Operations • Because of the nature of this system, several background operations occur to keep the array healthy and balanced • Compaction, cleaning & hole plugging • Both the mirrored and the RAID 5 storage classes suffer from fragmentation; this process identifies the holes and compacts the data to generate fresh, unfragmented space for reuse • Migration (sketched below) • The crux of this system: older data that has not been modified recently is moved down into RAID 5 • This is done as a housekeeping step • It is also driven by the bursty nature of most writes; the system ensures that a minimum threshold of free space is kept within the mirrored storage • Balancing • The process of migrating PEXes between disks to equalize the data load, which is necessary because of the dynamic nature of the disk system
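A rough sketch of the demotion loop the migration step implies; the threshold, the idleness test, and the method names are assumptions for illustration only:

```python
# Illustrative background migration: demote the least-recently-written RBs
# from mirrored storage into RAID 5 until enough mirrored space is free again.
def migrate(mirrored, raid5, min_free_rbs=128):
    """Keep a minimum amount of free mirrored space ready for bursts of writes."""
    while mirrored.free_rbs() < min_free_rbs:
        rb_id = mirrored.least_recently_written()   # the "coldest" mirrored RB
        data = mirrored.read(rb_id)
        raid5.log_write(rb_id, data)                # demote into the RAID 5 storage class
        mirrored.release(rb_id)                     # its mirrored space becomes free
```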
Does it work? Test Results • Test results were gathered from a combination of • Prototype testing • Simulation • When possible, comparisons were made against two baseline configurations • A standard RAID setup using RAID Level 5 • Individual disks (aka JBOD, or Just a Bunch Of Disks)
Transaction Rates (Macrobenchmarks) • The test was an OLTP database workload comprising a series of medium-weight transactions • The first graph compares AutoRAID to the baseline configurations • The second graph shows what happens when you add more disks to the system
Simulation Results • The simulations compared several workloads • Cello, an HP-UX time-sharing system • Snake, a clustered file server • OLTP benchmark tests • Hplajw, a workstation and NetWare server • The authors ran hundreds of simulation experiments; what follows is only a few • Consequences of adjusting disk speed • Consequences of adjusting the RB size • Consequences of poor data layout • Consequences of various read-disk policies • Write cache overwrites • RB demotion via the standard log approach vs. demotion using hole plugging
Write Cache Overwrites and Hole Plugging • The normal mode of operation for AutoRAID is to demote RBs using a standard log process. Changing the demotion process to hole plugging had the effect of • Reducing the RBs moved by the cleaner by 93% for the cello/usr workload and 96% for the snake workload • Improving the mean I/O time for user I/Os by 8.4% and 3.2% respectively
Conclusions • The authors indicated that what they had was a working prototype • They showed that such a system is workable, and demonstrated that its performance approaches that of the baseline JBOD configuration • This disk-system approach hearkens back to the days of the Commodore: the drive units on Commodore-brand machines could generally be considered computer systems in and of themselves