Hashing: Then and Now
Mike Smorul – ADAPT Project
Commodity Storage Performance
• 2003: JetStor III IDE-FC, 62 MB/s (large-block I/O)
• 2013: workstation SSD, 218 MB/s; PERC 6/MD1000, 400+ MB/s
Chip Speed
• 2003: Pentium 4, 3.2 GHz
• 2013: Core i7 Extreme, 3.5 GHz
Hashing Performance
• SHA-256 hashing throughput
  • Java: 85 MB/s
  • Crypto++: 111-134 MB/s
• Real-world penalty
  • Java: 20-40% penalty on slow-seek disks
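To get a comparable number on your own hardware, a minimal sketch can time SHA-256 over in-memory data so that disk seeks are excluded. It assumes Python's `hashlib`, which is not one of the implementations measured above, and `sha256_throughput_mb_s` is a hypothetical helper name.

```python
import hashlib
import os
import time


def sha256_throughput_mb_s(size_mb: int = 64) -> float:
    """Time SHA-256 over `size_mb` MB of in-memory random data; return MB/s.

    The data is generated before timing starts, so disk speed plays no
    part: this measures the digest alone, not the real-world case.
    """
    block = os.urandom(1_000_000)  # 1 MB (decimal), reused for every update
    h = hashlib.sha256()
    start = time.perf_counter()
    for _ in range(size_mb):
        h.update(block)
    elapsed = time.perf_counter() - start
    return size_mb / elapsed


if __name__ == "__main__":
    print(f"SHA-256: {sha256_throughput_mb_s():.0f} MB/s")
```

Comparing this figure with the same hash run over files on disk gives a rough sense of the real-world penalty on your own storage.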
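Implications
• Flipped bottlenecks: storage (218-400+ MB/s) now outruns SHA-256 (85-134 MB/s), so the digest, not the disk, limits throughput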
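A back-of-the-envelope model shows why the flipped bottleneck matters. This is our simplification, not a measurement from the talk: treat reading and hashing as two stages. If each block is read and then hashed in turn, the per-MB times add. If the stages overlap, the slower stage sets the pace. The function names are illustrative.

```python
def serial_rate(io_mb_s: float, digest_mb_s: float) -> float:
    """Throughput when each block is read, then hashed, one after the other.

    Time per MB is read time plus hash time, so the rates combine harmonically.
    """
    return 1.0 / (1.0 / io_mb_s + 1.0 / digest_mb_s)


def pipelined_rate(io_mb_s: float, digest_mb_s: float) -> float:
    """Ideal throughput when reading and hashing fully overlap: slowest stage wins."""
    return min(io_mb_s, digest_mb_s)


# 2013 figures from the slides: 400 MB/s storage, 85 MB/s Java SHA-256.
print(round(serial_rate(400, 85), 1))  # 70.1
print(pipelined_rate(400, 85))         # 85
```

With these figures, fast storage no longer hides the hashing cost. Overlapping I/O and hashing recovers the gap up to the digest's own speed.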
How to overcome
• Faster/weaker digests
• Simultaneous transfers
• Data locality (tape?)
• Improve single-stream performance
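Simultaneous transfers spread the digest work over more than one core. The sketch below assumes local files and CPython's `hashlib`, whose documentation states that the GIL is released while hashing large buffers, so plain threads can run digests in parallel. `sha256_file` and `sha256_many` are hypothetical names.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream one file through SHA-256 and return its hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def sha256_many(paths, workers: int = 4) -> dict:
    """Hash several files at once, one stream per worker thread."""
    paths = list(paths)  # materialize so it can be iterated twice below
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(paths, pool.map(sha256_file, paths)))
```

This helps only when many objects move at once. A single large file still runs at single-stream speed.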
Parallelize Single Stream
• Independent I/O and digest threads
• Always have work ready for the digest algorithm
• Large files: over 95% of the algorithm's potential throughput
• Small files: unchanged
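The reader/digest split can be sketched as a bounded producer/consumer queue. This is a Python illustration of the idea, not the ADAPT code. The `chunk_size` and `depth` defaults are arbitrary assumptions. One thread reads ahead while the calling thread hashes, so the digest loop always has a block waiting.

```python
import hashlib
import queue
import threading


def pipelined_sha256(path: str, chunk_size: int = 1 << 20, depth: int = 8) -> str:
    """SHA-256 a file using one reader thread and one digest (calling) thread.

    The reader keeps up to `depth` blocks queued, so the digest loop
    rarely waits on the disk; file reads and large hashlib updates both
    release the GIL in CPython, so the two stages can overlap.
    """
    blocks: queue.Queue = queue.Queue(maxsize=depth)

    def reader() -> None:
        try:
            with open(path, "rb") as f:
                while block := f.read(chunk_size):
                    blocks.put(block)  # waits while the queue is full
        except Exception as exc:  # hand I/O errors to the digest side
            blocks.put(exc)
            return
        blocks.put(None)  # end-of-file sentinel

    threading.Thread(target=reader, daemon=True).start()
    h = hashlib.sha256()
    while (item := blocks.get()) is not None:
        if isinstance(item, Exception):
            raise item
        h.update(item)
    return h.hexdigest()
```

A plausible reading of the small-file result: a small file fits in one or two blocks, so there is little reading left to overlap with hashing.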
Where to apply fixity
• Internal integrity services
• At transfer, via manifests
• End to end?
Operational Integrity
• Internal auditing
  • Prove your hardware
  • Detects error, not malice
• Peer auditing
  • Prove your friends
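An internal audit amounts to re-hashing stored objects and comparing them with the digests recorded at ingest. The sketch assumes a hypothetical in-memory `registry` dict. A registry kept next to the data can be altered by whoever alters the data, so this catches hardware error rather than malice. Peer auditing covers the malice case.

```python
import hashlib
from pathlib import Path


def audit(registry: dict, root: Path) -> dict:
    """Re-hash each registered file and compare with the digest on record.

    `registry` maps a path relative to `root` to the SHA-256 hex digest
    recorded at ingest. Returns {relative path: "ok" | "corrupt" | "missing"}.
    """
    report = {}
    for rel, expected in registry.items():
        path = root / rel
        if not path.is_file():
            report[rel] = "missing"
            continue
        h = hashlib.sha256()
        with open(path, "rb") as f:
            # Stream in 1 MiB blocks so large files need not fit in memory.
            for block in iter(lambda: f.read(1 << 20), b""):
                h.update(block)
        report[rel] = "ok" if h.hexdigest() == expected else "corrupt"
    return report
```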
Transporting Integrity
• Manifest lists
  • Transfer validation
• Digital signatures
  • Prove identity
• Token-based
  • Prove time
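Transfer validation with a manifest list can be sketched as follows. The sender writes one `<sha256>  <relative path>` line per file, a layout similar to `sha256sum` output and BagIt payload manifests. The receiver re-hashes and reports mismatches. The helper names are hypothetical. A bare manifest proves neither who produced it nor when, which is the gap signatures and tokens fill.

```python
import hashlib
from pathlib import Path


def _sha256(p: Path) -> str:
    h = hashlib.sha256()
    with open(p, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def write_manifest(root: Path) -> str:
    """Sender side: one "<sha256>  <relative path>" line per file under root."""
    lines = [f"{_sha256(p)}  {p.relative_to(root).as_posix()}"
             for p in sorted(root.rglob("*")) if p.is_file()]
    return "\n".join(lines) + "\n"


def validate_manifest(manifest: str, root: Path) -> list:
    """Receiver side: return relative paths that are missing or mismatched."""
    bad = []
    for line in manifest.splitlines():
        if not line.strip():
            continue
        digest, rel = line.split("  ", 1)  # digests never contain spaces
        p = root / rel
        if not p.is_file() or _sha256(p) != digest:
            bad.append(rel)
    return bad
```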
Chronopolis Integrity
• Current:
  • Producer supplies the authoritative manifest
  • Peers monitor integrity locally
  • Tracing back to the point of ingest is manual
Chronopolis Integrity
• In progress:
  • Single integrity token back to ingest
• Ideal:
  • Tokens issued before the data arrives
  • ‘Prove’ the state of the data back to a point before Chronopolis
Manifests 2.0
• Beyond a simple transfer list
• Token manifests
• Portable, embeddable
• Python, etc.
Cloud Integrity
• Digests computed in a cloud validate the transfer only
• HTTP headers can pass extended integrity information
• End-user verification
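End-user verification could look like this sketch: the service returns the ingest-time digest in a response header, and the client checks the bytes it actually received. The header name `X-Object-SHA256` and its hex encoding are hypothetical stand-ins. Substitute whatever header your service really sends.

```python
import hashlib


def verify_download(body: bytes, headers: dict, header: str = "X-Object-SHA256") -> bool:
    """Compare the downloaded body's SHA-256 with a digest carried in a header.

    `X-Object-SHA256` (hex digest) is a hypothetical header, not a standard one.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    claimed = lowered.get(header.lower())
    if claimed is None:
        return False  # nothing to verify against
    return hashlib.sha256(body).hexdigest() == claimed.strip().lower()
```

The digest only means something if it was recorded before the object reached the cloud. A checksum the provider computes on upload proves only that the upload itself arrived intact.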
Integrity as provenance
• Integrity checking forward in time
• Consumer-level verification of data
• Integrity from object creation
• Start integrity checking before archiving
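One generic way to carry integrity forward from object creation is a hash chain. Each provenance event's hash covers the previous entry's hash, so a later consumer can recheck the whole history. This is a swapped-in textbook technique, not ACE's token format, and the event fields are illustrative. A chain alone does not stop someone from rewriting every entry. The latest hash must also be published or witnessed elsewhere, which is the role token services play.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev: str, event: dict) -> str:
    payload = json.dumps(event, sort_keys=True).encode()  # canonical encoding
    return hashlib.sha256(prev.encode() + payload).hexdigest()


def append_event(chain: list, event: dict) -> None:
    """Append an event whose hash covers the previous entry's hash."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "hash": _entry_hash(prev, event)})


def verify_chain(chain: list) -> bool:
    """Recompute every link from the start; an edited event fails its check."""
    prev = GENESIS
    for entry in chain:
        if _entry_hash(prev, entry["event"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```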
Closing
• Why are you hashing?
• What do you want to prove?
• Hashing cost/performance
Contact
• Mike Smorul
• msmorul@sesync.org
• http://adapt.umiacs.umd.edu/ace