The TickerTAIP Parallel RAID Architecture P. Cao, S. B. Lim, S. Venkatraman, J. Wilkes HP Labs
RAID Architectures • Traditional RAID architectures have • A central RAID controller interfacing to the host and processing all I/O requests • Disk drives organized in strings • One disk controller per disk string (mostly SCSI)
Limitations • The capabilities of the RAID controller are crucial to the performance of the array • Can become memory-bound • Presents a single point of failure • Can become a bottleneck • Having a spare controller is an expensive proposition
Our Solution • Have a cooperating set of array controller nodes • Major benefits are: • Fault-tolerance • Scalability • Smooth incremental growth • Flexibility: can mix and match components
TickerTAIP • [Architecture diagram: hosts, host interconnects, controller nodes]
TickerTAIP (I) A TickerTAIP array consists of: • Worker nodes connected to one or more local disks through a bus • Originator nodes interfacing with host computer clients • A high-performance small-area network: • Mesh-based switching network (Datamesh) • PCI backplanes for small networks
TickerTAIP (II) • Can combine or separate worker and originator nodes • Parity calculations are done in a decentralized fashion: • Bottleneck is memory bandwidth, not CPU speed • Cheaper than having faster paths to a dedicated parity engine
Design Issues (I) • Normal-mode reads are trivial to implement • Normal-mode writes: • Three ways to calculate the new parity: • full stripe: calculate parity from the new data • small stripe: read old data and old parity, write new data and new parity (at least four I/Os) • large stripe: if we rewrite more than half a stripe, compute the parity by reading the unmodified data blocks
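A minimal sketch (not from the paper) of the three parity-update strategies, using bytewise XOR over equal-length blocks; the helper names are purely illustrative:

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equal-length blocks (the RAID 5 parity operation)."""
    return bytes(reduce(lambda a, b: a ^ b, vals) for vals in zip(*blocks))

# Full stripe: parity comes entirely from the new data.
def full_stripe_parity(new_blocks: list) -> bytes:
    return xor_blocks(*new_blocks)

# Small stripe: read old data and old parity, XOR out the old data and
# XOR in the new (at least four I/Os: two reads plus two writes).
def small_stripe_parity(old_parity: bytes, old_block: bytes, new_block: bytes) -> bytes:
    return xor_blocks(old_parity, old_block, new_block)

# Large stripe (more than half the stripe rewritten): read only the
# unmodified blocks and recompute parity from scratch.
def large_stripe_parity(new_blocks: list, unmodified_blocks: list) -> bytes:
    return xor_blocks(*new_blocks, *unmodified_blocks)
```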
Design Issues (II) • Parity can be calculated: • At the originator node • Solely parity: at the parity node for the stripe • Must ship all involved blocks to the parity node • At parity: same as solely parity, but partial results for small stripe writes are computed at the worker nodes and shipped to the parity node • Generates less traffic than solely parity
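A hypothetical sketch contrasting "solely parity" with "at parity" for a small-stripe write; the function names are made up for illustration, and `xor_blocks` is the same bytewise-XOR helper as above:

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    return bytes(reduce(lambda a, b: a ^ b, vals) for vals in zip(*blocks))

# "Solely parity": each worker ships both its old and new data blocks to
# the parity node, which does all the XOR work (two blocks per worker
# cross the network).
# "At parity": each worker XORs its old and new data locally and ships
# only the partial result (one block per worker), so less traffic overall.

def worker_partial(old_block: bytes, new_block: bytes) -> bytes:
    # Computed locally at the worker; only this one block is shipped.
    return xor_blocks(old_block, new_block)

def parity_node_new_parity(old_parity: bytes, partials: list) -> bytes:
    # New parity = old parity XOR all the workers' partial results.
    return xor_blocks(old_parity, *partials)
```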
Handling single failures (I) • TickerTAIP must provide request atomicity • Disk failures are treated as in standard RAID • Worker failures: • Treated like disk failures • Detected by time-outs (assuming fail-silent nodes) • A distributed consensus algorithm reaches consensus among the remaining nodes
Handling single failures (II) • Originator failures: • Worst case is the failure of an originator/worker node during a write • TickerTAIP uses a two-phase commit protocol: • Two options: • Late commit • Early commit
Late commit/Early commit • Late commit only commits after parity has been computed • Only the writes remain to be performed • Early commit commits as soon as the new data and old data have been replicated • Somewhat faster • Harder to implement
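A toy sketch of where the commit point falls in each variant. The Node class and the step strings are stand-ins, not the paper's protocol messages:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Toy stand-in for a worker or parity node; it just records protocol steps."""
    name: str
    log: list = field(default_factory=list)

    def do(self, step: str) -> None:
        self.log.append(step)

def write_late_commit(workers: list, parity: Node) -> None:
    # Phase 1: ship new data to the workers and compute the new parity up front.
    for w in workers:
        w.do("stage new data")
    parity.do("compute new parity")
    # Commit point: once every node agrees, only the disk writes remain.
    for n in (*workers, parity):
        n.do("commit: write to disk")

def write_early_commit(workers: list, parity: Node) -> None:
    # Phase 1: stage new data and keep a copy of the old data for recovery.
    for w in workers:
        w.do("stage new data")
        w.do("replicate old data")
    # Commit point is reached earlier; parity is computed after committing,
    # which is somewhat faster but harder to recover from correctly.
    for n in (*workers, parity):
        n.do("commit: write data")
    parity.do("compute and write new parity")

# Example: write_late_commit([Node("w0"), Node("w1")], Node("parity"))
```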
Handling multiple failures • Power failures during writes can corrupt the stripe being written: • Use a UPS to eliminate them • Must guarantee that some specific requests will always be executed in a given order: • Example: cannot update the i-nodes containing block addresses before the data blocks they point to have been written • Uses request sequencing to achieve partial write ordering
Request sequencing (I) • Each request • Is given a unique identifier • Can specify one or more requests on whose previous completion it depends (explicit dependencies) • TickerTAIP adds enough implicit dependencies to prevent concurrent execution of overlapping requests
Request sequencing (II) • Sequencing is performed by a centralized sequencer • Several distributed solutions were considered but not selected because of the complexity of the recovery protocols they would require
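A minimal sketch, under assumed block-range semantics, of a centralized sequencer that adds implicit dependencies between overlapping requests; the class and field names are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    rid: int                                           # unique request identifier
    first_block: int
    n_blocks: int
    explicit_deps: set = field(default_factory=set)    # supplied by the client
    implicit_deps: set = field(default_factory=set)    # added by the sequencer

    def overlaps(self, other: "Request") -> bool:
        return (self.first_block < other.first_block + other.n_blocks and
                other.first_block < self.first_block + self.n_blocks)

class CentralSequencer:
    """Adds implicit dependencies so overlapping requests never run concurrently."""

    def __init__(self) -> None:
        self.outstanding: list = []

    def admit(self, req: Request) -> Request:
        # Make the new request wait for every outstanding request it overlaps.
        for prev in self.outstanding:
            if req.overlaps(prev):
                req.implicit_deps.add(prev.rid)
        self.outstanding.append(req)
        return req

    def complete(self, rid: int) -> None:
        self.outstanding = [r for r in self.outstanding if r.rid != rid]
```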
Disk Scheduling (not discussed in class in Fall 2005) • Considered: • First come first served (FCFS): implemented in the working prototype • Shortest seek time first (SSTF): considers seek time only • Shortest access time first (SATF): considers both seek time and rotation time • Batched nearest neighbor (BNN): runs SATF on all requests in the queue
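A toy model (the seek and rotation constants are made up) of Batched Nearest Neighbor: SATF applied repeatedly to the whole batch of queued requests:

```python
REV_MS = 8.3   # one full revolution at roughly 7200 rpm (toy constant)

def access_time(head_cyl: int, head_ang: float, cyl: int, ang: float) -> float:
    """Estimated seek time plus the rotational wait after the seek finishes."""
    seek = 0.5 + 0.05 * abs(cyl - head_cyl)                    # made-up seek model
    rot_wait = ((ang - head_ang) * REV_MS / 360.0 - seek) % REV_MS
    return seek + rot_wait

def bnn_schedule(batch: list, head_cyl: int = 0, head_ang: float = 0.0) -> list:
    """Batched Nearest Neighbor: repeatedly pick the request with the
    shortest estimated access time from the current head position."""
    order, pending = [], list(batch)
    while pending:
        nxt = min(pending, key=lambda r: access_time(head_cyl, head_ang, *r))
        order.append(nxt)
        pending.remove(nxt)
        head_cyl, head_ang = nxt
    return order

# Requests are (cylinder, angle) pairs in this toy model.
print(bnn_schedule([(500, 30.0), (20, 200.0), (480, 300.0), (90, 10.0)]))
```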
Evaluation (I) • Based upon: • Working prototype: used seven relatively slow Parsytec cards, each with its own disk drive • An event-driven simulator was used to test other configurations: • Results were always within 6% of prototype measurements
Evaluation (II) • Read performance: • 1MB/s links are enough unless the request sizes exceed 1MB
Evaluation (III) • Write performance: • Large stripe policy always results in a slight improvement • At-parity significantly better than at-originator, especially for link speeds below 10 MB/s • Late commit protocol reduces throughput by at most 2% but can increase response time by up to 20% • Early commit protocol is not much better
Evaluation (IV) • TickerTAIP always outperforms a comparable centralized RAID architecture • Best disk scheduling policy is Batched Nearest Neighbor(BNN)
Conclusion • Can use physical redundancy to eliminate single points of failure • Can use eleven 5 MIPS processors instead of a single 50 MIPS processor • Can use off-the-shelf processors for parity computations • Disk drives remain the bottleneck for small request sizes