Parity Lost and Parity Regained Andrew Krioukov, Lakshmi N. Bairavasundaram, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau University of Wisconsin - Madison Garth R. Goodson, Kiran Srinivasan, Randy Thelen
Bare-bones RAID • Stripe data across multiple drives • Store redundant parity data • Can reconstruct data after any single disk failure • Will RAID protect data in all single-failure cases? [Figure: one RAID stripe with data blocks A, B, C on disks Data 1, Data 2, Data 3 and parity P(ABC) on the Parity disk]
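How single parity lets RAID survive one disk failure can be sketched in a few lines. This assumes bytewise XOR parity, as in RAID-4/5; the block contents are illustrative only.

```python
from functools import reduce


def parity(blocks):
    """Bytewise XOR of equal-length blocks (single-parity RAID)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks):
    """Rebuild the one missing block: XOR of the surviving data blocks and parity."""
    return parity(surviving_blocks)


A, B, C = b"AAAA", b"BBBB", b"CCCC"
P = parity([A, B, C])
# Data 2 fails completely; rebuild B from A, C, and P(ABC).
assert reconstruct([A, C, P]) == B
```

Because XOR is its own inverse, the same function both computes parity and rebuilds a lost block.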
Bare-bones RAID Problems • Stripe contains file ABC consisting of 3 blocks • RAID has redundancy to recover data • RAID does not detect corruption [Figure: block C on Data 3 is silently corrupted (@#$%); reading file ABC returns the corrupt file]
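A toy illustration of why redundancy alone is not detection, assuming the same XOR parity as above and a read path that, like bare-bones RAID, returns blocks without verification. The helper `locate_by_parity` is hypothetical; it shows that even an explicit parity check cannot say *which* block is bad.

```python
from functools import reduce


def xor(blocks):
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


# One stripe: three data blocks followed by their parity (contents illustrative).
stripe = [b"AAAA", b"BBBB", b"CCCC"]
stripe.append(xor(stripe))


def bare_bones_read(stripe):
    """Return the file's data blocks exactly as stored; nothing is verified."""
    return b"".join(stripe[:3])


def locate_by_parity(stripe):
    """Indices whose reconstruction from the others would make the stripe consistent."""
    suspects = []
    for i in range(len(stripe)):
        rebuilt = xor(stripe[:i] + stripe[i + 1:])
        trial = stripe[:i] + [rebuilt] + stripe[i + 1:]
        if xor(trial) == bytes(len(rebuilt)):
            suspects.append(i)
    return suspects


stripe[2] = b"@#$%"              # silent corruption of C: the disk reports no error
print(bare_bones_read(stripe))   # the corrupt block comes back as-is
print(locate_by_parity(stripe))  # every block is an equally good suspect
```

A parity mismatch says only that the stripe is inconsistent; rebuilding any one of the four blocks restores consistency, so some extra, per-block evidence is needed to pick the right one.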
Bare-bones RAID Problems • RAID cannot detect partial disk failures: • Corruptions • Torn writes • Lost writes • Misdirected writes • RAID only protects against • Complete disk failures • Errors reported by the disk (e.g. Latent Sector Errors)
Data Protection Techniques • Need improvements to bare-bones RAID • Techniques needed to help detect errors • Checksums are common • Many kinds: block, sector, parent checksums • Which type of checksums are used? • We examined real systems to determine protection schemes
Enterprise RAID Systems • Mixed bag of protections
Question • Which errors do these systems protect against? • How can we ensure complete data protection? • Need method to identify all corruption & data loss scenarios in a design
Model Checking Solution • Create a model of storage system design using primitives • Checker exhaustively searches space of all possible states • Start with clean RAID stripe • Apply single disk error • Apply any number of disk operations (e.g. write) • Identifies all possible data loss scenarios
Results Summary • Applied model checking on enterprise RAID system designs • For all designs, a single error can cause data loss • Identified a common problem, parity pollution • Partial disk failure goes undetected • The erroneous data is used to compute parity • Recovery is no longer possible • Presented a design that protects against all single failures
Outline • Introduction • Background: Storage Errors • Model Checking Approach • Data Protection Design & Analysis • Conclusion
Storage Errors • Latent Sector Errors • Data is inaccessible • Explicit error code returned • Affect 19% of nearline, 2% of enterprise disks in 2 years [Bairavasundaram et al. SIGMETRICS’07] • Corruptions • Data is silently corrupted • Affect 0.6% of nearline and 0.06% of enterprise disks in 17 months [Bairavasundaram et al. FAST’08] • Reality: Partial disk failures happen
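Storage Errors (Cont’d) • Torn Write • Only part of a block is written • Some sectors are lost • Write returns success code • Lost Writes • Write returns success code • Data not reflected on disk [Figure: torn write: writing B over A returns success, but the block ends up part B and part A. Lost write: writing B over A returns success, but the block still holds A]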
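Storage Errors (Cont’d) • Misdirected Writes • Write goes to the wrong location (either the wrong block or the wrong disk) • Combination of a lost write and a corruption [Figure: blocks A and B; an overwrite of A with A’ returns success, but A’ lands on B’s location, leaving stale A in place and clobbering B]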
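The three write faults above can be modeled on a toy disk. This is a minimal sketch, assuming 4-byte "sectors" and hypothetical fault labels (`"torn"`, `"lost"`, `"misdirected"`); the key property it captures is that every fault still reports success to the caller.

```python
SECTOR = 4  # bytes per sector in this toy model (real sectors are 512 B or 4 KiB)


class ToyDisk:
    def __init__(self, blocks):
        self.blocks = dict(blocks)

    def read(self, n):
        return self.blocks[n]

    def write(self, n, data, fault=None, target=None, sectors_written=1):
        """Write block n; every fault below still returns "success"."""
        if fault is None:
            self.blocks[n] = data
        elif fault == "torn":
            # Only the first sectors_written sectors reach the media.
            cut = sectors_written * SECTOR
            self.blocks[n] = data[:cut] + self.blocks[n][cut:]
        elif fault == "lost":
            pass  # nothing reaches the media
        elif fault == "misdirected":
            # Lands on the wrong block: block n stays stale (lost write)
            # and block `target` is clobbered (corruption).
            self.blocks[target] = data
        else:
            raise ValueError(f"unknown fault: {fault}")
        return "success"


disk = ToyDisk({1: b"AAAAaaaa", 2: b"BBBBbbbb"})
disk.write(1, b"XXXXxxxx", fault="torn")
print(disk.read(1))  # half new, half old
```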
Modeling Storage System • Use primitives to describe: • On-disk layout in terms of sectors • Data protections • Checker uses built-in models: • Storage errors • Disk operations (e.g. Read/Write) • Basic RAID functionality
Model Checking • Assumptions • Single RAID stripe • Single storage error • Single parity protection • Data disks are interchangeable • Apply error followed by any number of disk operations • Generate state diagram with all data loss states
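A toy exhaustive search in the spirit of these assumptions, not the authors' checker: one stripe of three 1-bit data blocks plus parity, a single error modeled as a bit flip on any one disk, then any number of writes. We assume writes update parity either additively (recompute from all data disks) or subtractively (old parity XOR old data XOR new data), which we take to be what Wadd and Wsub denote in the state diagram. The loss labels are our own names.

```python
from collections import deque
from itertools import product

DATA_DISKS = 3           # one stripe: disks 0..2 hold data ...
PARITY = DATA_DISKS      # ... and disk 3 holds parity


def xor_all(values):
    out = 0
    for v in values:
        out ^= v
    return out


def write(disk, i, v, mode):
    """Write 1-bit value v to data disk i and update parity."""
    new = list(disk)
    if mode == "add":    # additive: recompute parity by reading all other data disks
        new[i] = v
        new[PARITY] = xor_all(new[:DATA_DISKS])
    else:                # subtractive: parity ^= old block ^ new block
        new[PARITY] = disk[PARITY] ^ disk[i] ^ v
        new[i] = v
    return tuple(new)


def unrecoverable(disk, intended):
    """True if some data block is wrong and rebuilding it from parity is also wrong."""
    for i in range(DATA_DISKS):
        if disk[i] != intended[i]:
            others = [disk[j] for j in range(DATA_DISKS + 1) if j != i]
            if xor_all(others) != intended[i]:
                return True
    return False


def explore():
    """Explore every state reachable from a clean stripe plus one corruption."""
    clean = (0, 0, 0, 0)
    frontier = deque()
    for bad in range(DATA_DISKS + 1):    # single error: flip one data or parity block
        disk = list(clean)
        disk[bad] ^= 1
        frontier.append((tuple(disk), clean[:DATA_DISKS]))
    seen, losses = set(), set()
    while frontier:
        state = frontier.popleft()
        if state in seen:
            continue
        seen.add(state)
        disk, intended = state
        # Bare-bones reads do no verification, so a wrong block is returned as-is.
        if any(disk[i] != intended[i] for i in range(DATA_DISKS)):
            losses.add("corrupt data returned by read")
        if unrecoverable(disk, intended):
            losses.add("polluted parity")
        # Any disk operation: every write of every value with either parity method.
        for i, v, mode in product(range(DATA_DISKS), (0, 1), ("add", "sub")):
            new_intended = intended[:i] + (v,) + intended[i + 1:]
            frontier.append((write(disk, i, v, mode), new_intended))
    return seen, losses


states, losses = explore()
print(len(states), sorted(losses))
```

In this toy model, the polluted state arises only through an additive write to another disk, which reads the corrupt block when recomputing parity.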
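State Diagram Example • Bare-bones RAID state diagram [Figure: states Clean, Parity Error, Disk x Error, Corrupt Data, and Polluted Parity. From Clean, Corrupt(p), Torn(p), Lost(p), or Misdir(p) leads to Parity Error, and Corrupt(x), Torn(x), Lost(x), or Misdir(x) leads to Disk x Error. Other transitions are labeled with the disk operations R(x), W(x+), Wadd(), Wadd(x+), Wadd(!x), and Wsub(x+)]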
Data Protection Design • Need fault tolerance for all partial failures • Bare-bones RAID handles latent sector errors and complete disk failures • Corruption is the next most common failure • Add protections cumulatively until the design has complete protection
Protections (the original slide highlights in red the protections discussed in the talk) • Scrubbing • Sector checksums • Block checksums • Parental checksums • Write verify • Physical identity • Logical identity • Version mirroring
Checksums • Checksum per data block • Checksum per sector • Parent checksum • Checksum stored in parent inode [Figure: block A stored with cksum(A); block A split into sectors a1, a2, …, each stored with its own ck(a1), ck(a2), …]
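The three checksum placements differ mainly in *where* the checksum lives. A minimal sketch, assuming CRC-32 (`zlib.crc32`) as the checksum and a hypothetical dictionary standing in for an inode; real systems use their own formats.

```python
import zlib

SECTOR = 512


def block_checksum(block):
    """One checksum covering the whole block, stored alongside the block."""
    return zlib.crc32(block)


def sector_checksums(block):
    """One checksum per sector, each stored with its own sector."""
    return [zlib.crc32(block[i:i + SECTOR]) for i in range(0, len(block), SECTOR)]


# Parental checksum: stored in the parent (here a hypothetical inode), not with the data.
disk = {7: b"A" * 4096}
inode = {"blocks": [7], "checksums": {7: zlib.crc32(disk[7])}}


def read_with_parent_checksum(inode, disk, n):
    data = disk[n]
    if zlib.crc32(data) != inode["checksums"][n]:
        raise IOError(f"checksum mismatch in block {n}")
    return data
```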
Checksum Example • Corruption scenario is now fixed [Figure: C on Data 3 is corrupted (@#$%) while every block, including parity, carries a checksum. Reading file ABC detects the mismatch with cksum(C), reconstructs C from A, B, and P(ABC), and returns a valid file]
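The detect-then-reconstruct read path in this example can be sketched as follows, assuming block checksums (CRC-32 here) stored with each block, XOR parity, and at most one bad block per stripe as in the single-error model.

```python
import zlib
from functools import reduce


def xor(blocks):
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def make_stripe(data_blocks):
    """Entries are (contents, checksum stored with the block); the last is parity."""
    blocks = list(data_blocks) + [xor(data_blocks)]
    return [(b, zlib.crc32(b)) for b in blocks]


def read_block(stripe, i):
    data, cksum = stripe[i]
    if zlib.crc32(data) == cksum:
        return data
    # Checksum mismatch: rebuild block i from the other data blocks and parity.
    rebuilt = xor([stripe[j][0] for j in range(len(stripe)) if j != i])
    if zlib.crc32(rebuilt) != cksum:
        raise IOError(f"block {i} is unrecoverable")
    stripe[i] = (rebuilt, cksum)  # repair the block in place
    return rebuilt


stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC"])
stripe[2] = (b"@#$%", stripe[2][1])  # silent corruption of C; stored checksum untouched
print(b"".join(read_block(stripe, i) for i in range(3)))
```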
Checksum Problems • Great for protecting against corruption errors • Fails to protect when data and checksum are lost together: • Lost write (with any type of checksums) • Torn write (only with sector checksums) • Parity pollution can occur
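Why a torn write slips past sector checksums but not a block checksum: each sector carries its own checksum, so a half-old, half-new block still verifies sector by sector. This sketch assumes 4-byte toy sectors and, for the block-checksum case, that the checksum sits in the block's final on-disk unit; both are assumptions made for illustration.

```python
import zlib

SECTOR = 4  # toy sector size


def sectors(data):
    return [data[i:i + SECTOR] for i in range(0, len(data), SECTOR)]


# Sector checksums: each sector is written together with its own checksum.
def write_sector_cksum(data):
    return [(s, zlib.crc32(s)) for s in sectors(data)]


def verify_sector_cksum(stored):
    return all(zlib.crc32(s) == c for s, c in stored)


# Block checksum: one checksum over the whole block, stored in its last unit.
def write_block_cksum(data):
    return sectors(data) + [zlib.crc32(data)]


def verify_block_cksum(stored):
    *data, cksum = stored
    return zlib.crc32(b"".join(data)) == cksum


def torn(old, new, k):
    """Only the first k on-disk units of the new write reach the media."""
    return new[:k] + old[k:]


old, new = b"AAAAaaaa", b"BBBBbbbb"
s = torn(write_sector_cksum(old), write_sector_cksum(new), 1)
b = torn(write_block_cksum(old), write_block_cksum(new), 1)
print(verify_sector_cksum(s))  # the torn block passes
print(verify_block_cksum(b))   # the torn block is detected
```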
Checksum Problems – Lost Write • Block checksums [Figure: the overwrite C→C’ on Data 3 is a lost write, so Data 3 still holds C with its matching cksum(C), while parity is updated to P(ABC’). Reading file ABC’ returns ABC: corrupt data (C instead of C’) that passes the checksum]
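A sketch of why a lost write defeats checksums stored with the block: the stale data and its stale checksum are lost-or-kept together, so they still agree. It assumes CRC-32 block checksums and that the parity update succeeds while the data write is lost, as in the figure.

```python
import zlib
from functools import reduce


def xor(blocks):
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def stored(data):
    """A block together with the checksum written alongside it."""
    return (data, zlib.crc32(data))


def verify(entry):
    data, cksum = entry
    return zlib.crc32(data) == cksum


A, B, C, C_new = b"AAAA", b"BBBB", b"CCCC", b"cccc"  # C_new plays the role of C'
disks = [stored(A), stored(B), stored(C), stored(xor([A, B, C]))]

# Overwrite C -> C': the data write (with its checksum) is lost; parity is updated.
disks[3] = stored(xor([A, B, C_new]))

returned = disks[2][0]
print(verify(disks[2]), returned)  # stale C passes its stale checksum
```

Parity now holds the information needed to rebuild C’, but since the checksum passes, nothing triggers reconstruction.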
Write Verify • Attempt to solve the lost write problem • Costly solution, but expected to give good protection • Procedure: • Write data to disk • Read back to verify • If a lost write is detected, write again or remap to a new location [Figure: the overwrite C→C’ is lost; reading back returns C, so the lost write is detected and C’ (with cksum(C’)) is written again, this time successfully]
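The write-verify procedure can be sketched as a write/read-back loop. `FlakyDisk` and the retry limit are hypothetical; the sketch simply gives up after a few attempts where a real system might remap the block.

```python
class FlakyDisk:
    """Toy disk that can silently drop writes (lost writes) while reporting success."""

    def __init__(self, blocks, lose_next=0):
        self.blocks = dict(blocks)
        self.lose_next = lose_next  # number of upcoming writes to drop

    def write(self, n, data):
        if self.lose_next > 0:
            self.lose_next -= 1  # lost write: nothing reaches the media
        else:
            self.blocks[n] = data
        return "success"

    def read(self, n):
        return self.blocks[n]


def write_verify(disk, n, data, retries=3):
    """Write, read back, and rewrite until the read-back matches."""
    for _ in range(retries):
        disk.write(n, data)
        if disk.read(n) == data:  # this extra read is the doubled I/O cost
            return True
    return False  # a real system might remap the block here


disk = FlakyDisk({5: b"C"}, lose_next=1)
print(write_verify(disk, 5, b"C'"), disk.read(5))
```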
Write Verify Problems • Protects against lost writes • Susceptible to misdirected writes • Cannot detect/recover the overwritten data
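Write Verify – Misdirected Write [Figure: initially stripe XYZ has parity P(XYZ) and stripe ABC has parity P(ABC). The overwrite X→X’ is misdirected onto A’s location in the other stripe. Reading back finds X unchanged, treats it as a lost write, and rewrites X’ correctly, so the first stripe becomes X’YZ with P(X’YZ). Later, reading file ABC returns corrupt data: A’s location now holds X’ with a valid cksum(X’), so A has been corrupted]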
Physical Identity • Protection against misdirected writes • Store disk & block number of destination in each block [Figure: each block stores (data, block number): (A, 1) and (B, 2). An overwrite of block 1 with A’ is misdirected to block 2, which now holds (A’, 1). Reading block 2 returns (A’, 1); the stored block number does not match (1 ≠ 2), so the misdirected write is detected]
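A minimal sketch of the physical-identity check, assuming each block stores the (disk, block) address it was intended for. `IdentityDisk` and its `misdirect_to` parameter are hypothetical, and only wrong-block (not wrong-disk) misdirection is simulated.

```python
class IdentityDisk:
    def __init__(self, disk_id):
        self.disk_id = disk_id
        self.blocks = {}

    def write(self, n, data, misdirect_to=None):
        # Each block records where it was MEANT to go: (disk id, block number).
        location = n if misdirect_to is None else misdirect_to
        self.blocks[location] = (data, self.disk_id, n)
        return "success"

    def read(self, n):
        data, disk_id, block = self.blocks[n]
        if (disk_id, block) != (self.disk_id, n):
            raise IOError(f"misdirected write detected: block {n} holds data for block {block}")
        return data


disk = IdentityDisk(disk_id=0)
disk.write(1, b"A")
disk.write(2, b"B")
disk.write(1, b"A'", misdirect_to=2)  # reports success, lands on block 2
print(disk.read(1))  # stale A still passes: the lost-write half is not caught here
```

Physical identity catches the clobbered block; the stale block left behind is the lost-write half, which is why the slides pair it with write verify.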
Problem Solved? • Write verify with block checksums and physical identity offers complete protection • But… twice the I/O cost! • Need a more efficient solution
Logical Identity • Less expensive protection against lost writes • Store file identifier (e.g. inode number) in each data block • Test that the file identifier matches on a read [Figure: a block holds A with cksum(A) and logical ID File 0. Overwriting it with data X of File 1 is a lost write. Reading File 1 finds logical ID File 0, which does not match, so the lost write is detected]
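A sketch of the logical-identity check on the file-system read path, assuming each block stores (data, CRC-32 checksum, file id). The function names and the dictionary disk are hypothetical.

```python
import zlib


def write_block(disk, n, data, file_id):
    disk[n] = (data, zlib.crc32(data), file_id)


def read_file_block(disk, n, expected_file_id):
    data, cksum, file_id = disk[n]
    if zlib.crc32(data) != cksum:
        raise IOError("checksum mismatch")
    if file_id != expected_file_id:
        raise IOError(f"lost write detected: block {n} still belongs to file {file_id}")
    return data


disk = {}
write_block(disk, 9, b"A", file_id=0)
# Block 9 is reallocated to file 1 and overwritten with X, but that write is lost.
try:
    read_file_block(disk, 9, expected_file_id=1)
except IOError as err:
    print(err)
```

The check works only when the reader knows which file it expects, which is exactly the context RAID lacks.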
Logical Identity Problem • Cannot be verified when re-computing parity • RAID reads raw blocks during parity computation, not files, so it has no file identifier to check against • Parity pollution may occur
Parity Pollution Example [Figure: initially Data 1, Data 2, Data 3 hold A, B, C, all with logical ID File 0, and parity P(ABC). Writing File 1 (C→C’) is a lost write, but parity becomes P(ABC’). Later, writing File 2 (AB→A’B’) recomputes parity by reading Data 3, whose stale C cannot be checked, giving P(A’B’C): parity is now consistent with invalid data. Later still, reading File 1 finds a logical ID mismatch (File 0 ≠ File 1) and reconstructs, but A’, B’, and P(A’B’C) are consistent and yield C again, so data loss is reported. What should be on disk: A’, B’, C’, and P(A’B’C’)]
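The pollution sequence above, replayed step by step. This sketch assumes XOR parity and that the File 2 write recomputes parity by reading Data 3 (an additive computation); the lowercase contents are stand-ins for A’, B’, C’.

```python
from functools import reduce


def xor(blocks):
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


A, B, C = b"AAAA", b"BBBB", b"CCCC"
A2, B2, C2 = b"aaaa", b"bbbb", b"cccc"  # stand-ins for A', B', C'

data = [(A, 0), (B, 0), (C, 0)]  # (contents, logical id = file number)
parity = xor([A, B, C])

# 1. Write File 1 (C -> C'): the data write is lost, but parity becomes P(ABC').
parity = xor([A, B, C2])

# 2. Write File 2 (A, B -> A', B'): RAID recomputes parity by reading Data 3 as
#    raw bytes. It has no file context, so the stale C is used unchecked.
data[0], data[1] = (A2, 2), (B2, 2)
parity = xor([data[0][0], data[1][0], data[2][0]])  # P(A'B'C): polluted

# 3. Read File 1: the logical ID check fires (file 0 != file 1) ...
contents, file_id = data[2]
# ... but reconstruction from A', B', and the polluted parity yields C, not C'.
rebuilt = xor([data[0][0], data[1][0], parity])
print(file_id, rebuilt)
```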
Version Mirroring • Lost write protection • Verifiable at RAID level • Store a version number in each data block • Mirror the version numbers on the parity disk • Version numbers verified on read [Figure: data blocks A, B, C each carry a checksum and version Ver0; the parity block P(ABC) carries cksum(P) and the mirrored versions 0,0,0]
Parity Pollution Solved [Figure: initially A, B, C carry Ver0 and the parity block mirrors 0,0,0. Writing File 1 (C→C’, Ver1) is a lost write, but parity becomes P(ABC’) with mirrored versions 0,0,1. Later, writing File 2 (AB→A’B’) reads Data 3 to compute parity; its Ver0 does not match the mirrored 1, so Data 3 is reconstructed as C’ from A, B, and P(ABC’). The new parity is P(A’B’C’) with versions 1,1,1, matching what should be on disk]
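Version mirroring applied to the same scenario, as a sketch rather than the authors' implementation. It assumes XOR parity, versions kept beside each block, a mirrored version list on the parity disk, and a hypothetical `lost` argument that drops chosen data writes. Before any block is read for a parity computation, its version is checked against the mirror and stale blocks are rebuilt first.

```python
from functools import reduce


def xor(blocks):
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


class Stripe:
    def __init__(self, blocks):
        self.data = [(b, 0) for b in blocks]  # (contents, version) per data disk
        self.parity = xor(blocks)
        self.mirrored = [0] * len(blocks)     # versions mirrored on the parity disk

    def read(self, i):
        contents, version = self.data[i]
        if version != self.mirrored[i]:
            raise IOError(f"version mismatch on block {i}")
        return contents

    def write(self, updates, lost=()):
        """Apply {index: contents}; data writes for indices in `lost` are dropped."""
        # Verify every block that parity computation will read; rebuild stale ones.
        for i, (contents, version) in enumerate(self.data):
            if i not in updates and version != self.mirrored[i]:
                others = [c for j, (c, _) in enumerate(self.data) if j != i]
                self.data[i] = (xor(others + [self.parity]), self.mirrored[i])
        new_contents = [updates.get(i, c) for i, (c, _) in enumerate(self.data)]
        for i, new in updates.items():
            self.mirrored[i] += 1
            if i not in lost:
                self.data[i] = (new, self.mirrored[i])
        self.parity = xor(new_contents)


A, B, C = b"AAAA", b"BBBB", b"CCCC"
A2, B2, C2 = b"aaaa", b"bbbb", b"cccc"  # stand-ins for A', B', C'
s = Stripe([A, B, C])
s.write({2: C2}, lost={2})  # File 1: C -> C' is lost; mirror becomes 0,0,1
s.write({0: A2, 1: B2})     # File 2: version mismatch on Data 3 triggers rebuild
print(s.mirrored, s.read(2))
```

Unlike the logical ID, the version check needs no file context, so RAID can run it before every parity computation.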
Problem Solved… Efficiently • Version mirroring with block checksums and physical identity provides complete protection • Use with logical identity for efficiency • More efficient than write verify
Conclusion • Applied model checking on real system designs • For all designs, a single error can cause data loss • Parity pollution is a common problem • Version mirroring is a key technique for offering complete and efficient data protection • Partial failures are complex; there is no obvious data protection solution • Model checking is useful
Advanced Systems Laboratory www.cs.wisc.edu/adsl • Advanced Technology Group http://www.netapp.com/company/research/