Audit Control Environment (ACE) • Mike Smorul, UMD ADAPT Project
ACE Motivation • Many archives use digests to monitor the integrity of their data. • Most cannot assert that their digests have not been tampered with. • A solution should be lightweight • No public/private key infrastructure • Must be auditable by any party • Auditor has no prior relationship with the archive or depositor • Audit is based only on publicly available information
ACE Concept • Issue a small token that can be stored alongside an object to be preserved. • The token secures the digest of the object. • The token is cryptographically linked to an external witness value. • Witness value is a single number/digest produced daily. • Easy to secure. • Small amount of data independent of the number of objects (several dozen KB/yr)
ACE Workflow • The digest to secure is submitted to an Integrity Management Service (IMS). • The IMS aggregates all the digests into a round • Rounds are triggered by time (5s) or a critical mass of requests • A Merkle tree is generated from all submitted digests. • The root of the tree is the round summary (RS). • For each digest, a token is created and returned that contains a path from the digest to the RS.
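Hash Authentication • [Figure: hashes gathered during a round (Hash 1 to Hash 6, one marked as the challenge hash) are combined through intermediate hash values (IHV) into a Merkle tree; the tree is linked to the previous round hash to give the CSI (one hash value). Steps shown: gather hashes during round, create Merkle tree for supplied hashes, link to previous round, generate proof for hash.]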
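The round construction described on the two slides above can be sketched in a few lines of Python. This is a minimal illustration, not the ACE implementation: the binary tree shape, SHA-256, the rule of duplicating the last node on odd-sized levels, and the names `build_round` and `root_from_proof` are all assumptions, and the link to the previous round is left out for brevity.

```python
import hashlib

def h(data: bytes) -> bytes:
    """SHA-256 digest (the choice of hash function is an assumption)."""
    return hashlib.sha256(data).digest()

def build_round(digests):
    """Build a binary Merkle tree over the submitted digests.

    Returns (round_summary, proofs), where proofs[i] is the path from
    digests[i] to the root as a list of (side, sibling_hash) pairs.
    """
    if not digests:
        raise ValueError("a round needs at least one digest")
    level = list(digests)
    positions = list(range(len(digests)))  # index of each leaf's ancestor
    proofs = [[] for _ in digests]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # pad odd levels by repeating the last node
        for leaf, pos in enumerate(positions):
            sibling = pos ^ 1
            side = "L" if sibling < pos else "R"
            proofs[leaf].append((side, level[sibling]))
            positions[leaf] = pos // 2
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0], proofs

def root_from_proof(digest, proof):
    """Recompute the round summary from one digest and its token path."""
    node = digest
    for side, sibling in proof:
        node = h(sibling + node) if side == "L" else h(node + sibling)
    return node

# Usage: five digests submitted in one round.
leaves = [h(bytes([i])) for i in range(5)]
round_summary, proofs = build_round(leaves)
```

Each proof holds one sibling hash per tree level, so a token grows with the logarithm of the round size rather than with the number of digests in the round.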
Daily Witness Value • Each day a witness value is created • A Merkle tree of all RS issued in a day is created • The root of the tree is linked to the previous day's witness to create the daily witness value. • The new witness value is widely published. • Small, 365 digests/year. (UMD IMS witnesses <120KB) • The IMS needs to store only RS values. • UMIACS IMS is <270MB for 2.2 million rounds • The witness value ties all hashes to a 24-hour window
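A rough sketch of how a daily witness could be chained, assuming SHA-256, a binary Merkle tree, and a link computed by hashing the previous witness together with the day's tree root. The slides do not specify the exact linking function, so `daily_witness` and its argument order are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    """SHA-256 digest (assumed hash function)."""
    return hashlib.sha256(data).digest()

def merkle_root(nodes):
    """Root of a binary Merkle tree; odd levels repeat their last node."""
    level = list(nodes)
    if not level:
        raise ValueError("no round summaries for this day")
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def daily_witness(previous_witness: bytes, round_summaries) -> bytes:
    """Link the day's Merkle root to the previous day's witness (assumed scheme)."""
    return h(previous_witness + merkle_root(round_summaries))

# Usage: two consecutive days, starting from an all-zero placeholder witness.
genesis = bytes(32)
day1_rounds = [h(b"round 1"), h(b"round 2"), h(b"round 3")]
witness_day1 = daily_witness(genesis, day1_rounds)
witness_day2 = daily_witness(witness_day1, [h(b"round 4")])
```

Because each witness depends on the one before it, changing any earlier round summary changes every later witness. With a 32-byte digest, 365 witnesses come to 365 × 32 = 11,680 bytes of raw hash data per year, before any encoding or metadata.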
Integrity Management Service (IMS) • Aggregates requests into rounds • Generates round summaries and corresponding tokens • Publishes a nightly witness value over the previous day's rounds. • Stores round summaries and generates a proof for a round upon request
Audit Manager • Local auditing service, to be installed at an archive • Day-to-day monitoring service for archive staff • Handles both requesting tokens and auditing data. • Future versions will allow token/digest injection (BagIt, etc.). • We’ve found most sites don’t have existing digest lists. • Manages multiple collections on different resources • Current version is a general-purpose integrity management tool. • Provides logging of all events encountered on an object • Compares collections to digest manifests or remote sites • Generates periodic reports on collection activity • Output available as JSON
ACE Audit • Audit Local Files: Audit Manager periodically scans all files and compares stored digests with computed digests. • Assume valid hashes in database • Audit Local Manager: Manager computes the round summary for each digest using that digest and its token. This is compared to the value stored on the IMS. • Assume IMS returns valid summary information, do not trust hashes in database • IMS Audit: Round summaries are used to compute witness values. These are compared with offsite witness values. • Do not trust IMS, force IMS to prove its round summaries (CSIs) link to a witness
Local Audit • (1) Audit Manager computes the digest of an object and retrieves its token. • (2) Using the digest and token, the AM computes the round summary. • (3) Using the round ID stored in the token, the AM asks the IMS for its copy of the round summary. • (4) If the round summaries from steps 2 and 3 match, the object and its integrity token are intact.
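The local audit steps can be sketched as follows. The `Token` layout, the `("L"/"R", sibling)` proof encoding, SHA-256, and the `fetch_round_summary` callable standing in for the IMS request are all hypothetical; the real Audit Manager and IMS interfaces are not described in these slides.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Token:
    """Hypothetical token layout: a round ID plus the path to the round summary."""
    round_id: int
    proof: List[Tuple[str, bytes]]  # ("L" or "R", sibling hash) per tree level

def h(data: bytes) -> bytes:
    """SHA-256 digest (assumed hash function)."""
    return hashlib.sha256(data).digest()

def compute_round_summary(digest: bytes, token: Token) -> bytes:
    """Walk the token's path from the object's digest up to the round summary."""
    node = digest
    for side, sibling in token.proof:
        node = h(sibling + node) if side == "L" else h(node + sibling)
    return node

def local_audit(data: bytes, token: Token,
                fetch_round_summary: Callable[[int], bytes]) -> bool:
    """Compare the locally computed round summary with the IMS copy."""
    computed = compute_round_summary(h(data), token)      # digest, then summary
    return computed == fetch_round_summary(token.round_id)  # IMS copy and check

# Usage: a two-object round, with a dict standing in for the IMS.
digest_a, digest_b = h(b"object A"), h(b"object B")
fake_ims = {7: h(digest_a + digest_b)}
token_a = Token(round_id=7, proof=[("R", digest_b)])
ok = local_audit(b"object A", token_a, fake_ims.__getitem__)
```

Note that the stored digest is never consulted: the object is rehashed from its bytes, which matches the slide's point that this audit does not trust the hashes in the local database.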
IMS Audit • Extract the round ID and the daily witness value for the round you wish to verify. • Request that the IMS supply a proof linking the round to the day’s witness value • Compute a witness value using the round summary and the supplied proof • Compare the computed witness to your stored value
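A sketch of the IMS audit, under the same assumptions as a binary Merkle tree over the day's round summaries whose root is hashed together with the previous day's witness. The proof format (a sibling path plus the previous witness) and the function names are illustrative guesses, not the IMS wire format.

```python
import hashlib
from typing import List, Tuple

def h(data: bytes) -> bytes:
    """SHA-256 digest (assumed hash function)."""
    return hashlib.sha256(data).digest()

def witness_from_proof(round_summary: bytes,
                       path: List[Tuple[str, bytes]],
                       previous_witness: bytes) -> bytes:
    """Recompute the daily witness from one round summary and the IMS proof."""
    node = round_summary
    for side, sibling in path:
        node = h(sibling + node) if side == "L" else h(node + sibling)
    # Assumed linking step: hash the previous witness with the day's tree root.
    return h(previous_witness + node)

def ims_audit(round_summary: bytes, path, previous_witness: bytes,
              stored_witness: bytes) -> bool:
    """True if the IMS proof ties the round to the independently stored witness."""
    return witness_from_proof(round_summary, path, previous_witness) == stored_witness

# Usage: a day with two rounds; the auditor holds the published witness.
previous = bytes(32)
rs1, rs2 = h(b"round 1"), h(b"round 2")
published = h(previous + h(rs1 + rs2))
ok = ims_audit(rs2, [("L", rs1)], previous, published)
```

The comparison only means something if `stored_witness` comes from outside the IMS, for example from a published copy the auditor kept, since the IMS is the party being checked.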
External Auditor • The auditor is supplied with the object and token it wishes to validate • The auditor obtains the appropriate witness values by subscribing to them or acquiring them from a trusted source. • The auditor performs a local audit, followed by an IMS audit using the computed round summary • The external auditor has never seen the data before
What can we prove • Witness-to-token validation shows • Object is intact if its digest matches the token • IMS and AM have not been compromised • The file’s state can be linked to a 24-hour time window.
Chronopolis Deployment • Three sites • UMD, SDSC, NCAR • Differing hardware (Linux/Sun/filesystem/SAM QFS) • 20+ TB monitored, 5+ million files • A complete audit at UMD took a little over a week • Bottleneck was the underlying storage system
AM Improvements • Audit Throttling • Limit both the metadata query rate and file transfer rate • Digest Comparison • Compare stored digests to uploaded manifests or partner sites. • JSON • Most data from the AM is available for harvesting • Expanded digest support • Added multiple types of digests to use
Future Directions • Statistical Sampling • Expand the AM to allow for sampling on local resources • Good for offline/tape resources • Tools for verifying a pile of hard drives • Securing BagIt-style manifests using tokens. • Cloud data validation • An SLA means nothing if you cannot verify it • Must be done cheaply
Cloud data validation • Tokens may be stored alongside data in a cloud. • Applications submitted to a cloud may verify data during operation. • App requests data and token • Queries IMS for round summary • Verifies object against proof prior to processing. • Cheap • Amazon, etc. charge for transfer out of the cloud • Round summaries are small (several KB)
More Info • http://adapt.umiacs.umd.edu/ace • Downloadable Audit Manager