Data Corruption in the Enterprise

Jim Williams, HEPiX Fall 2007. Agenda: What is Data Integrity?; End to End Data Integrity; Existing E2E Data Integrity; Limitations of Today’s E2E Data Integrity; Future Work.


Presentation Transcript


  1. Data Corruption in the Enterprise Jim Williams HEPiX Fall 2007

  2. Agenda
     • What is Data Integrity?
     • End to End Data Integrity
     • Existing E2E Data Integrity
     • Limitations of Today’s E2E Data Integrity
     • Future Work

  3. What is Data Integrity? Definition
     • Data corruption, the loss of data integrity, is defined here as the non-malicious loss of data resulting from component failure (hardware or software) or inadvertent administrative action
     • Low-frequency, high-impact

  4. What is Data Integrity? Causes
     • Operating System bugs
       • Core O/S
       • Device drivers
     • Storage hardware and firmware bugs
       • HBAs
       • Arrays
       • Disks
     • Administrative errors
       • System administrators
       • Database administrators

  5. What is Data Integrity? What happens after data corruption?
     • Find a “good” copy of the lost data…
       • Block recovery, usually from offline storage
       • Requires a highly trained skill set
       • Often involves extended downtime
     • High-intensity situation, best avoided
     What if it could have been avoided?

  6. What is Data Integrity? Remediation
     • Adding protection metadata to data
     • Universally done at component level
       • CRC, Parity, Reference Tags
       • Proprietary protection metadata
     • T10 Protection Information Model: a standard that provides for protection metadata across system components
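As an illustration of protection metadata, the sketch below appends an 8-byte tuple (a 16-bit CRC guard tag, a 16-bit application tag, and a 32-bit reference tag taken from the low bits of the LBA) to a 512-byte sector, in the spirit of the T10 Protection Information Model. The function names `crc16_t10dif`, `protect_sector`, and `verify_sector` are hypothetical and not from the talk; treat this as a minimal sketch, not a conforming implementation.

```python
import struct

T10_DIF_POLY = 0x8BB7   # generator polynomial used for the 16-bit guard CRC
SECTOR_SIZE = 512       # assumed sector size for this sketch


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16: polynomial 0x8BB7, initial value 0, no reflection, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ T10_DIF_POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def protect_sector(data: bytes, lba: int, app_tag: int = 0) -> bytes:
    """Return the sector followed by its 8-byte protection tuple (big-endian)."""
    if len(data) != SECTOR_SIZE:
        raise ValueError("expected a 512-byte sector")
    guard = crc16_t10dif(data)
    ref_tag = lba & 0xFFFFFFFF  # reference tag: low 32 bits of the target LBA
    return data + struct.pack(">HHI", guard, app_tag, ref_tag)


def verify_sector(protected: bytes, expected_lba: int) -> bool:
    """Check both the guard CRC (data intact) and the reference tag (right location)."""
    data, tuple_bytes = protected[:SECTOR_SIZE], protected[SECTOR_SIZE:]
    if len(data) != SECTOR_SIZE or len(tuple_bytes) != 8:
        return False
    guard, _app_tag, ref_tag = struct.unpack(">HHI", tuple_bytes)
    return guard == crc16_t10dif(data) and ref_tag == (expected_lba & 0xFFFFFFFF)
```

The guard tag catches bit-level damage to the data, while the reference tag catches a sector that is intact but was written to, or read from, the wrong address.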

  7. What is Data Integrity? Why is storage data corruption different?
     • TCP end to end data integrity is pretty good
     • Applications can use proprietary means for end to end data integrity
     • Applications do not control both ends of the path between application and storage
     • Short-term application failures are much less costly than data loss

  8. What is Data Integrity?
     • At the storage level, there are two kinds of data corruption
       • Latent sector errors
       • Silent data corruption
     • From a storage device's perspective, it is usually better to return no data than to return the wrong data
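The difference between the two kinds of corruption, and the "no data is better than wrong data" rule, can be sketched as a checked read. Everything here is an assumption made for illustration: the device is modeled as a dict of sectors (with `None` standing for an unreadable sector), checksums are kept separately, and CRC-32 from `zlib` stands in for whatever checksum a real stack would use.

```python
import zlib


class LatentSectorError(OSError):
    """The device reports that it cannot read the sector at all."""


class SilentCorruptionError(OSError):
    """The device returned data without complaint, but the data is wrong."""


def checked_read(sectors: dict, checksums: dict, lba: int) -> bytes:
    """Return the sector only if it is readable and matches its stored checksum."""
    data = sectors.get(lba)
    if data is None:
        # Latent sector error: the failure is visible as soon as the sector is read.
        raise LatentSectorError(f"sector {lba} is unreadable")
    if zlib.crc32(data) != checksums[lba]:
        # Silent corruption: only the checksum reveals it, so refuse to return the data.
        raise SilentCorruptionError(f"sector {lba} failed its checksum")
    return data
```

A caller that receives either exception knows it must fetch another copy; a caller that receives silently corrupted bytes has no such signal.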

  9. Existing E2E Data Integrity E2E Data Integrity prevents, not simply detects, corruption
     • The checksum in an Oracle data block, by itself, only allows the Oracle RDBMS to detect, when the data block is read, that something in the storage stack corrupted the data.
     • However, if the storage device understands the Oracle data block structure, then the storage device can prevent corrupted data from being permanently written. This is the idea behind Oracle HARD.
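The prevent-versus-detect distinction can be sketched with a store that validates each block before committing it. The block layout below (payload followed by a 4-byte big-endian CRC-32 trailer) is invented for this example and is not Oracle's data block format; `seal_block`, `ValidatingStore`, and `CorruptBlockError` are likewise hypothetical names.

```python
import struct
import zlib


class CorruptBlockError(ValueError):
    """Raised when a block arrives at the store with a bad checksum."""


def seal_block(payload: bytes) -> bytes:
    """Application side: append a CRC-32 trailer to the payload."""
    return payload + struct.pack(">I", zlib.crc32(payload))


class ValidatingStore:
    """Storage side: understands the block layout and checks it on every write."""

    def __init__(self) -> None:
        self.blocks = {}

    def write(self, address: int, block: bytes) -> None:
        if len(block) < 4:
            raise CorruptBlockError("block too short to carry a checksum")
        payload, trailer = block[:-4], block[-4:]
        (stored,) = struct.unpack(">I", trailer)
        if zlib.crc32(payload) != stored:
            # Reject the write: the previous good copy stays in place.
            raise CorruptBlockError(f"write to {address} rejected: bad checksum")
        self.blocks[address] = block
```

Because the check happens at write time, a block damaged somewhere between the application and the device never replaces the last good copy, and the writer learns of the failure immediately rather than at some later read.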

  10. Existing E2E Data Integrity Oracle HARD

  11. Existing E2E Data Integrity T10 Protection Information Model

  12. Existing E2E Data Integrity Comparison

  13. Limitations of Today’s E2E Data Integrity
      • T10
        • Does not span to application
        • Does not address host-oriented failures
        • Computationally expensive to implement on host
      • Oracle HARD
        • Does not span to disk drive
        • Proprietary
        • Oracle-oriented (DB block structure)

  14. Future Work
      • Data Integrity Initiative (DII)
        • Oracle, Emulex, LSI, and Seagate came together to address the problem of data corruption
        • Announced DII technology demo at SNW Spring 07
      • DII turning over the reins to SNIA
        • SNIA Data Integrity Task Force (DITF) kicked off in October
        • Open to new members: http://www.snia.org/apps/org/workgroup/data_integrity/

  15. Future Work
      • Enhanced integrity checking
        • Operating System (I/O stack)
          • Passing of protection metadata through stack
        • Application to Operating System
          • File I/O extensions for protection metadata
        • HBA and driver
          • Validation of protection metadata
          • Translation of protection metadata
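One way to picture validation and translation of protection metadata at the HBA or driver layer: check the checksum the application attached to a whole block, and only then generate the per-sector checksums the next layer expects, so there is no window in which the data travels unprotected. This is a sketch under assumptions. The function `translate_protection` is hypothetical, the application checksum is assumed to be CRC-32, and `binascii.crc_hqx` (CRC-CCITT) is used only as a convenient 16-bit stand-in, not as the T10 guard CRC.

```python
import binascii
import zlib

SECTOR_SIZE = 512  # assumed sector size for this sketch


def translate_protection(block: bytes, app_crc32: int) -> list:
    """Validate the application checksum, then emit (sector, 16-bit guard) pairs."""
    if len(block) % SECTOR_SIZE != 0:
        raise ValueError("block length must be a multiple of the sector size")
    # Validation: refuse to generate new metadata for data that is already bad.
    if zlib.crc32(block) != app_crc32:
        raise ValueError("application checksum mismatch; refusing to translate")
    # Translation: one guard value per sector for the layer below.
    protected = []
    for offset in range(0, len(block), SECTOR_SIZE):
        sector = block[offset:offset + SECTOR_SIZE]
        protected.append((sector, binascii.crc_hqx(sector, 0)))
    return protected
```

The ordering matters: computing the new checksums before verifying the old one would stamp valid-looking metadata onto corrupted data.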

  16. Future Work
      • Important studies
        • Data Integrity [Bernd Panzer-Steindel, CERN/IT]
        • Disk replacements [Schroeder, Gibson, FAST’07]
        • Disk replacements & SMART data [Pinheiro et al., FAST’07]
        • Latent sector errors [Bairavasundaram et al., Sigmetrics’07]
        • Disk Failures in the Real World [L. Bairavasundaram, G. Goodson, B. Schroeder, A. Arpaci-Dusseau, R. Arpaci-Dusseau, FAST’08]

  17. Questions???
