This presentation explores the use of algebraic signatures in storage applications such as data integrity checks, file synchronization, and distributed data structures. It discusses their properties, benefits, and implementation, and the trade-off between their algebraic properties and cryptographic security.
Using Algebraic Signatures in Storage Applications. Thomas Schwarz, S.J., Associate Professor, Santa Clara University; Associate, SSRC (UCSC Storage Systems Research Center, University of California, Santa Cruz). Retreat, June 1-2, 2004
Signatures • Small strings that characterize objects. • Calculated from the object. • Distinct signatures ⇒ objects are different. • Same signatures ⇒ objects are the same, • with high probability. • Error probability is $2^{-f}$, where f is the signature length in bits. • A.k.a. checksums, hashes, fingerprints, condensed representations, …
Signatures • Examples • Tripwire: protection against malware. • Maintains the signatures of all system libraries in a secure location. • Before a library module is called, verify the module's signature.
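A minimal sketch of the Tripwire-style check, assuming SHA-256 from Python's standard `hashlib` as the signature and a plain dictionary as the trusted manifest. The names `file_signature`, `build_manifest`, and `verify_before_use` are hypothetical and are not Tripwire's interface; in practice the manifest itself must be stored where an attacker cannot modify it.

```python
import hashlib
from pathlib import Path


def file_signature(path) -> str:
    """SHA-256 hex digest of a file's contents, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(paths) -> dict:
    """Record one signature per library file (hypothetical manifest format)."""
    return {str(Path(p)): file_signature(p) for p in paths}


def verify_before_use(path, manifest: dict) -> bool:
    """True only if the file is listed and its current signature matches."""
    expected = manifest.get(str(Path(path)))
    return expected is not None and file_signature(path) == expected
```

A loader would call `verify_before_use` and refuse to load any module that fails the check.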
Signatures • Examples • Remote comparison of files: • The problem arose from the first prototypes of replicated databases. • Divide records into pages. • Calculate and compare signatures for all pages. • Do this efficiently by combining the signatures of a set of pages into a super-signature.
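The page-comparison idea can be sketched as follows, assuming SHA-1 (Python's `hashlib`) as the page signature and a hypothetical 4 KiB page size. Here a super-signature is simply a hash of the concatenated page signatures of a group, a stand-in chosen for brevity; algebraic signatures can instead compose page signatures directly. Only groups whose super-signatures disagree are compared page by page.

```python
import hashlib

PAGE_SIZE = 4096  # hypothetical page size in bytes


def page_signatures(data: bytes, size: int = PAGE_SIZE) -> list:
    """One signature per page."""
    return [hashlib.sha1(data[i:i + size]).digest()
            for i in range(0, len(data), size)]


def super_signature(sigs) -> bytes:
    """Combine a group of page signatures into one (hash of their concatenation)."""
    return hashlib.sha1(b"".join(sigs)).digest()


def differing_pages(local_sigs, remote_sigs, group: int = 8) -> list:
    """Indices of pages whose signatures differ between the two copies.

    Groups of `group` pages are compared via super-signatures first; only
    groups that disagree are examined page by page.
    """
    n = max(len(local_sigs), len(remote_sigs))
    diffs = []
    for start in range(0, n, group):
        a = local_sigs[start:start + group]
        b = remote_sigs[start:start + group]
        if len(a) == len(b) and super_signature(a) == super_signature(b):
            continue  # whole group identical with high probability
        for i in range(start, min(start + group, n)):
            mine = local_sigs[i] if i < len(local_sigs) else None
            theirs = remote_sigs[i] if i < len(remote_sigs) else None
            if mine != theirs:
                diffs.append(i)
    return diffs
```

In the replicated setting only the signatures (first the super-signatures, then page signatures for disagreeing groups) need to cross the network.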
Signatures • Integrity check for archival storage • Keep two copies of archived data. • Maintain the signatures of tape contents. • Periodically “scrub” tapes.
Signatures • Similarity measurement between files. • Similarity of web pages. • Similarity of files in Deep-Store.
Signatures • For Scalable Distributed Data Structures • SDDSs implement a large file of records in buckets distributed over a network. • SDDS operations (insert, update, delete, read, scan) have execution times independent of the SDDS file size. • Use signatures of blocks to decide which portions of a bucket need to be backed up. • More secure than a dirty bit. Litwin, W., Mokadem, R., Schwarz, T.: Disk Backup through algebraic signatures in scalable and distributed data structures. Proc. 5th Workshop on Distributed Data and Structures (WDAS 2003), Thessaloniki, June 2003.
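A sketch of signature-driven backup under stated assumptions: SHA-1 stands in for the algebraic signatures the cited work uses, the 512-byte block size is hypothetical, and a bucket is modeled as a byte string. A block is copied only when its signature differs from the one recorded at the last backup.

```python
import hashlib

BLOCK_SIZE = 512  # hypothetical block size in bytes


def block_signatures(bucket: bytes, size: int = BLOCK_SIZE) -> list:
    """Signature of each fixed-size block of a bucket."""
    return [hashlib.sha1(bucket[i:i + size]).digest()
            for i in range(0, len(bucket), size)]


def blocks_to_back_up(bucket: bytes, backup_sigs: list, size: int = BLOCK_SIZE):
    """Return (indices of blocks to copy, signature list to keep for next time).

    A block is copied only if its current signature differs from the one
    recorded at the last backup (or the block is new).
    """
    current = block_signatures(bucket, size)
    changed = [i for i, s in enumerate(current)
               if i >= len(backup_sigs) or s != backup_sigs[i]]
    return changed, current
```

Unlike a dirty bit, this decision does not depend on every writer remembering to set a flag, and rewriting a block with identical contents does not trigger a copy.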
Signatures • For Scalable Distributed Data Structures • Use signatures of records to test whether they have changed. • This leads to a concurrency scheme based on read verification: • Read the record. • Process the record. • Verify through its signature that the record has not changed. Litwin, W., Mokadem, R., Schwarz, T.: Disk Backup through algebraic signatures in scalable and distributed data structures. Proc. 5th Workshop on Distributed Data and Structures (WDAS 2003), Thessaloniki, June 2003. Schwarz, T., Holliday, J.: A Signature Based Concurrency Scheme for Scalable Distributed Data Structures. Workshop on Distributed Data and Structures (WDAS 2004), Lausanne, 2004.
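The read, process, verify cycle can be sketched as an optimistic update loop. This is a single-process illustration, not the cited scheme itself: `Bucket`, `write_if_unchanged`, and `update` are hypothetical names, SHA-1 stands in for an algebraic signature, and in a real SDDS the check-and-write would have to run atomically at the server holding the bucket.

```python
import hashlib


def sig(record: bytes) -> bytes:
    """Record signature (SHA-1 stand-in for an algebraic signature)."""
    return hashlib.sha1(record).digest()


class Bucket:
    """Hypothetical in-memory SDDS bucket mapping keys to records."""

    def __init__(self):
        self.records = {}

    def read(self, key):
        """Return the record and the signature seen at read time."""
        record = self.records[key]
        return record, sig(record)

    def write_if_unchanged(self, key, new_record: bytes, seen_sig: bytes) -> bool:
        """Store new_record only if the record still has the signature seen at read time."""
        if sig(self.records[key]) != seen_sig:
            return False  # another client changed the record in the meantime
        self.records[key] = new_record
        return True


def update(bucket: Bucket, key, process, max_retries: int = 10) -> bool:
    """Read, process without holding a lock, then verify and write; retry on conflict."""
    for _ in range(max_retries):
        record, seen = bucket.read(key)
        new_record = process(record)
        if bucket.write_if_unchanged(key, new_record, seen):
            return True
    return False
```

No lock is held while the record is processed; the price is a retry whenever the verification fails, and a missed conflict with the small probability that two different records share a signature.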
Cryptographically Secure Signatures • Computationally infeasible to find another object with the same signature. • Protects against malicious attacks. • Used to protect data integrity or to sign data: [Figure: compute the object's signature, apply the private key to obtain K(signature), and store the encrypted signature K(signature) with the object.]
Cryptographically Secure Signatures • MD5 • 1992, Rivest (RFC 1321) • 16 B • SHA-1 • 1995, NSA: FIPS 180-1 / ANSI X9.30 • 20 B • … • Both implement a one-way hash function.
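A quick check of the digest sizes listed above with Python's standard `hashlib` module. It only confirms the output lengths (f = 128 and 160 bits); the one-way property cannot be demonstrated this way.

```python
import hashlib

# Digest size in bytes of each one-way hash named above.
DIGEST_BYTES = {name: hashlib.new(name).digest_size for name in ("md5", "sha1")}

for name, size in DIGEST_BYTES.items():
    digest = hashlib.new(name, b"example object").hexdigest()
    print(f"{name}: {size} B = {8 * size} bits, e.g. {digest}")
```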
Signatures with Algebraic Properties • Composable signatures*: • The signature of an object can be calculated from the signatures of its component objects. • Updatable signatures: • The new signature of a changed object can be calculated from the old signature plus the signature and location of the change. * Suel, T., Noel, P., and Trendafilov, D.: Improved File Synchronization for Maintaining Large Replicated Collections over Slow Networks. In Proc. 20th Int. Conf. on Data Engineering, ICDE, Boston, 2004, p. 153-164. Litwin, W., Schwarz, T. Algebraic Signatures for Scalable Distributed Data Structures. Proc. of the 20th International Conference on Data Engineering (ICDE), Boston, 2004, p. 412-423.
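To make composability concrete, here is a sketch using the Karp-Rabin style signature $\mathrm{sig}_\alpha(P)=\sum_{i=1}^{l} p_i\alpha^i$ over GF(2^8), where addition is XOR. The field size f = 8, the polynomial 0x11D, and α = 2 are assumptions chosen for brevity, and the helper names are hypothetical. For a concatenation of P (length l) and Q, $\mathrm{sig}_\alpha(P\,\|\,Q) = \mathrm{sig}_\alpha(P) + \alpha^{l}\,\mathrm{sig}_\alpha(Q)$, because every symbol of Q moves l positions to the right.

```python
POLY = 0x11D   # x^8 + x^4 + x^3 + x^2 + 1, a primitive polynomial over GF(2)
ALPHA = 2      # the element x, primitive for this polynomial


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8): carry-less product reduced modulo POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return result


def gf_pow(a: int, n: int) -> int:
    """a**n in GF(2^8) by square-and-multiply."""
    result = 1
    while n:
        if n & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        n >>= 1
    return result


def sig(data: bytes, beta: int = ALPHA) -> int:
    """sig_beta(P) = sum of p_i * beta^i for i = 1..len(P); addition is XOR."""
    s, power = 0, 1
    for p in data:
        power = gf_mul(power, beta)   # beta^i for i = 1, 2, ...
        s ^= gf_mul(p, power)
    return s


def concat_sig(sig_p: int, len_p: int, sig_q: int) -> int:
    """Signature of P || Q computed only from sig(P), len(P) and sig(Q)."""
    return sig_p ^ gf_mul(gf_pow(ALPHA, len_p), sig_q)
```

Applied page by page, this lets a file signature be assembled from page signatures without rereading the pages.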
Signatures with Algebraic Properties • Algebraic properties prevent cryptographic security. • Fundamental Design Trade-off.
Algebraic Signatures • Karp-Rabin signatures over Galois fields. • A Galois field GF(2^f) defines addition, multiplication, subtraction, division, etc. over bit strings of length f. • It follows the same algebraic rules as the rational, real, and complex numbers. • Single and compound signatures of P = (p_1, p_2, …, p_l), with α a primitive element: $\mathrm{sig}_{\alpha}(P) = \sum_{i=1}^{l} p_i \alpha^{i}$ and $\mathrm{sig}_{\alpha,m}(P) = \left(\mathrm{sig}_{\alpha}(P), \mathrm{sig}_{\alpha^2}(P), \ldots, \mathrm{sig}_{\alpha^m}(P)\right)$. Karp, R. and Rabin, M.: Efficient randomized pattern-matching algorithms. In IBM Journal of Research and Development, Vol. 31, No. 2, March 1987. Schwarz, T., Bowdidge, R. and Burkhard, W.: Low Cost Comparison of File Copies. In Proc. Intern. Conf. on Distributed Computing Systems, Paris, Fr., 1990, (ICDCS 5 Proceedings), p. 196-202.
Algebraic Signatures • Properties of the compound signature: • Size is mf bits. • Detects with certainty any change of up to m symbols (for objects of at most $2^f - 1$ symbols). • A symbol is a GF element, i.e. a bit string of length f. • Collision probability is $2^{-fm}$.
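A sketch of the compound signature $\mathrm{sig}_{\alpha,m}(P)$ under the same assumed parameters (f = 8, polynomial 0x11D, α = 2), so it consists of m bytes. Since the powers α^1, …, α^255 are distinct, a change of up to m symbols in an object of at most 255 bytes always alters the compound signature. The usage lines show a two-symbol change that a single signature (m = 1) misses: adding 2 to p_1 and 1 to p_2 changes sig_α by 2·α + α² = 4 + 4 = 0.

```python
POLY = 0x11D   # assumed field polynomial for GF(2^8)
ALPHA = 2      # assumed primitive element


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8): carry-less product reduced modulo POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return result


def gf_pow(a: int, n: int) -> int:
    """a**n in GF(2^8) by square-and-multiply."""
    result = 1
    while n:
        if n & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        n >>= 1
    return result


def sig(data: bytes, beta: int) -> int:
    """sig_beta(P) = sum of p_i * beta^i for i = 1..len(P); addition is XOR."""
    s, power = 0, 1
    for p in data:
        power = gf_mul(power, beta)
        s ^= gf_mul(p, power)
    return s


def compound_sig(data: bytes, m: int) -> tuple:
    """(sig_alpha(P), sig_alpha^2(P), ..., sig_alpha^m(P)): m symbols, m*f bits."""
    return tuple(sig(data, gf_pow(ALPHA, j)) for j in range(1, m + 1))


# Usage: a two-symbol change that a single signature (m = 1) misses.
original = bytes(10)
changed = bytes([2, 1]) + bytes(8)   # p_1 += 2, p_2 += 1
print(compound_sig(original, 1) == compound_sig(changed, 1))  # True: collision
print(compound_sig(original, 2) == compound_sig(changed, 2))  # False: detected
```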
Algebraic Signatures • Algebraic Signatures Properties • Can update the signature after a simple change. • Discovers changes from a cut-and-paste operation.
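A sketch of the update rule for an in-place change, under the same assumed GF(2^8) parameters. If k symbols starting after offset r are overwritten and Δ = (δ_1, …, δ_k) with δ_j = p_{r+j} + p'_{r+j} (XOR), then $\mathrm{sig}_\beta(P') = \mathrm{sig}_\beta(P) + \beta^{r}\,\mathrm{sig}_\beta(\Delta)$ for any β, so each component of a compound signature can be updated the same way. `updated_sig` is a hypothetical helper; changes that shift later symbols (insertions, deletions) are not covered by this sketch.

```python
POLY = 0x11D   # assumed field polynomial for GF(2^8)
ALPHA = 2      # assumed primitive element


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8): carry-less product reduced modulo POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return result


def gf_pow(a: int, n: int) -> int:
    """a**n in GF(2^8) by square-and-multiply."""
    result = 1
    while n:
        if n & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        n >>= 1
    return result


def sig(data: bytes, beta: int = ALPHA) -> int:
    """sig_beta(P) = sum of p_i * beta^i for i = 1..len(P); addition is XOR."""
    s, power = 0, 1
    for p in data:
        power = gf_mul(power, beta)
        s ^= gf_mul(p, power)
    return s


def updated_sig(old_sig: int, offset: int, old_bytes: bytes, new_bytes: bytes,
                beta: int = ALPHA) -> int:
    """New signature after overwriting old_bytes with new_bytes at 0-based offset."""
    if len(old_bytes) != len(new_bytes):
        raise ValueError("in-place update must keep the length")
    delta = bytes(a ^ b for a, b in zip(old_bytes, new_bytes))
    return old_sig ^ gf_mul(gf_pow(beta, offset), sig(delta, beta))
```

Only the changed bytes and their offset are needed, not the rest of the object.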
Algebraic Signatures • Algebraic Signatures Properties • Can calculate the signature of a parity object from the signatures of the data objects. • Holds for normal parity (RAID Level 5) • But also for some forms of generalized parity. • Reed-Solomon Codes. • Convolutional Array Codes. Thomas Schwarz, S.J.: Verification of Parity Data in Large Scale Storage Systems, PDPTA 2004, Las Vegas.
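The parity property follows from linearity. If each parity symbol is $P[i] = \sum_k c_k D_k[i]$ over GF(2^f), then $\mathrm{sig}_\alpha(P) = \sum_k c_k\,\mathrm{sig}_\alpha(D_k)$. A sketch under the same assumed GF(2^8) parameters: all c_k = 1 gives RAID Level 5 XOR parity, while the coefficient row α^0, α^1, … used in the usage note is a hypothetical stand-in for one parity row of a Reed-Solomon-style code. Convolutional array codes are not modeled here.

```python
POLY = 0x11D   # assumed field polynomial for GF(2^8)
ALPHA = 2      # assumed primitive element


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8): carry-less product reduced modulo POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return result


def gf_pow(a: int, n: int) -> int:
    """a**n in GF(2^8) by square-and-multiply."""
    result = 1
    while n:
        if n & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        n >>= 1
    return result


def sig(data: bytes, beta: int = ALPHA) -> int:
    """sig_beta(P) = sum of p_i * beta^i for i = 1..len(P); addition is XOR."""
    s, power = 0, 1
    for p in data:
        power = gf_mul(power, beta)
        s ^= gf_mul(p, power)
    return s


def linear_parity(blocks, coeffs) -> bytes:
    """P[i] = sum over k of coeffs[k] * blocks[k][i] in GF(2^8)."""
    parity = bytearray(len(blocks[0]))
    for block, c in zip(blocks, coeffs):
        for i, d in enumerate(block):
            parity[i] ^= gf_mul(c, d)
    return bytes(parity)


def parity_sig_from_data_sigs(data_sigs, coeffs) -> int:
    """sig(P) = sum over k of coeffs[k] * sig(D_k), by linearity of sig."""
    s = 0
    for ds, c in zip(data_sigs, coeffs):
        s ^= gf_mul(c, ds)
    return s
```

For five data blocks, `coeffs = [1] * 5` checks XOR parity and `coeffs = [gf_pow(ALPHA, k) for k in range(5)]` checks the hypothetical Reed-Solomon-style row; in both cases only the data signatures are needed, not the data.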
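Algebraic Signatures in Large Scale Storage Systems • Data – Parity Coherency: • If we miss an update to parity data, then we can no longer reconstruct data: [Figure: stripe D1, D2, D3, D4, D5, P; D1 is updated to D1’ but the update to P is missed, so the stripe D1’, D2, D3, D4, D5, P no longer reconstructs correctly (“?”).]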
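A small sketch of this failure, assuming byte-wise XOR parity over a five-block stripe with made-up contents: D1 is rewritten, the parity update is lost, and a later reconstruction of D2 from the surviving blocks and the stale parity returns the wrong bytes.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks (RAID-5 style parity)."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, x in enumerate(block):
            out[i] ^= x
    return bytes(out)


# A stripe D1..D5 with hypothetical contents, plus its parity P.
data = [bytes([k] * 4) for k in range(1, 6)]
parity = xor_blocks(*data)
original_d2 = data[1]

# D1 is updated to D1', but the matching parity update is missed.
data[0] = b"\xff" * 4

# D2 is later lost; rebuild it from D1', D3, D4, D5 and the stale P.
reconstructed_d2 = xor_blocks(data[0], *data[2:], parity)
print(reconstructed_d2 == original_d2)  # False: the stripe is incoherent
```

Nothing in the stripe itself reveals the problem until a reconstruction is attempted, which is why the coherence check has to be done proactively.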
Protecting Data in a Large Archival Storage System. • Disk-Based Archival Storage System • Data is cold: • Power down disks between accesses. • Data on disk storage systems is lost because of: • Device Failure. • Block Failure. • Periodically check whether we can access disks. • Periodically check whether we can still read all data on disks.
Protecting Data in a Large Archival Storage System. • We need to read all the data anyway. • We also need to be concerned about software failures. • Therefore: check the signatures of the data.
Protecting Data in a Large Archival Storage System. • Divide disks into scrubbing blocks. • Assume that the redundancy scheme creates generalized parity blocks for scrubbing blocks. • Maintain a map of the signatures of the scrubbing blocks. [Figure: data blocks (D1, D2, D3, …) and generalized parity blocks (P1,4,7, P2,25,31, P9,12,25, …) scattered across the disks.]
Protecting Data in a Large Archival Storage System. • When data in a scrubbing block is updated, update its signature in the map. • This happens rarely. • When we scrub, check whether the actual signature of the block matches the signature in the metadata. • If not: something bad has happened. • Typically a software error, but occasionally data corruption. • Comes at almost no cost. • We need to read the data anyway.
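A sketch of the signature map and the scrub pass, under stated assumptions: `ScrubbingMap` and its methods are hypothetical names, blocks are read through a caller-supplied function, and SHA-1 stands in for the signature. Any signature serves for this check; algebraic signatures additionally allow the parity coherence check.

```python
import hashlib


class ScrubbingMap:
    """Hypothetical metadata: one signature per scrubbing block."""

    def __init__(self):
        self.sigs = {}

    @staticmethod
    def signature(block: bytes) -> bytes:
        return hashlib.sha1(block).digest()

    def record_write(self, block_id, block: bytes) -> None:
        """Updates are rare: refresh the stored signature when a block is rewritten."""
        self.sigs[block_id] = self.signature(block)

    def scrub(self, read_block) -> list:
        """Read every block and return the ids that are unreadable or whose
        contents no longer match the signature in the metadata."""
        bad = []
        for block_id, expected in self.sigs.items():
            try:
                data = read_block(block_id)
            except OSError:
                bad.append(block_id)   # device or block failure
                continue
            if self.signature(data) != expected:
                bad.append(block_id)   # software error or data corruption
        return bad
```

Because the scrub has to read every block anyway to detect unreadable sectors, computing the signature during that read adds little cost.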
Protecting Data in a Large Archival Storage System. • Periodically check whether parity blocks and data blocks cohere. • Access the signatures of the data blocks. • Calculate the expected signature of the parity block(s) from them. • Compare with the actual signature on file.
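A sketch of the coherence check for XOR parity, assuming the GF(2^8) signature with polynomial 0x11D and α = 2; `parity_coheres` is a hypothetical helper. The expected parity signature is the XOR of the data-block signatures, so the check needs only the signature map plus the parity block's signature. With a single 8-bit signature, a multi-symbol incoherence escapes detection with probability of roughly $2^{-8}$; a compound signature or a larger f lowers this.

```python
POLY = 0x11D   # assumed field polynomial for GF(2^8)
ALPHA = 2      # assumed primitive element


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8): carry-less product reduced modulo POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return result


def sig(data: bytes) -> int:
    """sig_alpha(P) = sum of p_i * alpha^i for i = 1..len(P); addition is XOR."""
    s, power = 0, 1
    for p in data:
        power = gf_mul(power, ALPHA)
        s ^= gf_mul(p, power)
    return s


def xor_blocks(blocks) -> bytes:
    """RAID-5 style parity: byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, x in enumerate(block):
            out[i] ^= x
    return bytes(out)


def parity_coheres(data_sigs, parity_sig: int) -> bool:
    """For XOR parity, sig(P) must equal the XOR of the data blocks' signatures."""
    expected = 0
    for s in data_sigs:
        expected ^= s
    return expected == parity_sig
```

A stale parity block left behind by a missed update makes `parity_coheres` return False; a change confined to a single symbol is always caught, since its effect on the signature is a nonzero δ·α^i.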
Protecting Data in a Large Archival Storage System. • Conclusion • Low-cost scheme. • Protects against data corruption and parity/data incoherence.