
Trustworthy History and Provenance for Files and Databases

Trustworthy History and Provenance for Files and Databases. Ragib Hasan Ph.D. Defense University of Illinois at Urbana-Champaign August 28, 2009. We live in the age of information. 5000 years of civilization. Hard Disk, 2008. Sumerian clay tablet (2600BC).


Presentation Transcript


  1. Trustworthy History and Provenance for Files and Databases. Ragib Hasan, Ph.D. Defense, University of Illinois at Urbana-Champaign, August 28, 2009.

  2. We live in the age of information. 5000 years of civilization: Sumerian clay tablet (2600 BC); hard disk, 2008. In the past, data was stored in physical objects (paper, papyrus, clay tablets). Currently, 92%-99% of all business data is stored electronically. Today, data is created, processed, and stored by many different people and systems.

  3. Data History is Important How do we trust data created, processed, and maintained by others? By knowing the history of data, we can decide if we can trust it. But adversaries have high incentives to tamper with data. So, digital history must be made trustworthy. “History is not history unless it is the truth”. Abraham Lincoln, 1856

  4. Many laws require trustworthy and long-term retention of data history • Sarbanes-Oxley Act (financial records): 3 years to forever • Financial officers/CFOs face fines / prison if their financial records are tampered with • Gramm-Leach-Bliley Act (financial records): Organizations must have policies for long-term retention of record history • OSHA (medical records): 30 years • Companies Act (UK) (financial records): 3-6 years

  5. History can be represented in many ways: data history and provenance; application-specific history. We first develop generic techniques for protecting data history, and then focus on history preservation in a particular application: databases.

  6. Thesis Statement: By leveraging cryptographic techniques and trustworthy storage architectures, we can make data history trustworthy in a low-cost and low-overhead manner.

  7. Prelim vs. Now. Generic solution for data history protection (securing provenance of files). Prelim: • Generic scheme for integrity and confidentiality assurances for data history/provenance • Empirical evaluation for file system workloads [StorageSS07, FAST09]. Now: • Proof of correctness of our scheme • Notion of plausible provenance • Co-provenance [ACM TOS09, USENIX ;login: 09, USENIX09 Poster]. History integrity in databases. Prelim: • Initial evaluation of different solution approaches [Bookchapter07, SDM07]. Now: • Architecture for efficient regulatory-compliant data retention in databases • Fast audit schemes • Proof of correctness [IDAR09, ICDE-submission10]. Trustworthy vacuuming and litigation holds. Prelim: • Initial evaluation of legal requirements and possible solutions. Now: • Faster, simpler trustworthy vacuuming • Semantics of litigation holds • Trustworthy litigation holds [To be submitted to EDBT10].

  8. Garçon à la Pipe (Boy with a Pipe) is a painting by Pablo Picasso. It was painted in 1905 when Pablo Picasso was 24 years old, during his Rose Period, soon after he settled in the Montmartre section of Paris, France. The oil on canvas painting depicts a Parisian (a person from Paris) boy holding a pipe in his left hand and wearing a garland or wreath of flowers. Secure Provenance for Files Protecting the integrity of the past history of data

  9. Summary of Contributions (Provenance) • A cross-platform, low-overhead architecture for capturing provenance information at the application layer and guaranteeing its security [StorageSS07, FAST09, login09, TOS09]; • A proof of correctness for our approach to secure provenance [TOS09]; • An implementation of our approach for file systems [FAST09, login09, USENIX09-poster]; • An experimental evaluation that shows that our approach to provenance collection with security guarantees introduces very low overheads at run time: • 1%-13% for most real-life workloads (when tracking write operations); and • 8%-10% with Postmark and 12%-16% with real-life workloads (when tracking both read and write operations) [FAST09, TOS09].

  10. History Integrity in Databases

  11. The history of a database is the history of the tuples in the database • Transaction-time databases retain all versions of tuples • The generic scheme for securing data history via provenance chains is not feasible for high-throughput databases. Can we design a database system that will allow fast audits of history integrity, while requiring almost no changes to the database kernel, and incurring little performance overhead?

  12. How to tamper with database history. The database is updated by transactions; the transaction log is recorded first. Adversaries can insert or delete tuples or logs, or launch untamper attacks (first tamper, then untamper right before audits). [Diagram labels: Alice, transactions (T), DBMS, DB tables, transaction log (L).]

  13. The main threat we consider is Regret, by insider adversaries • Adversaries can be insiders with superuser capabilities • Regret = attempt to tamper with data already committed to the database. [Diagram labels: Alice, Bob (query Q), Audrey (audit), tuples, database, regret interval, query verification interval, data expiration.]

  14. Existing solution: Log Consistent DB Architecture (LDA). The database is updated with transactions. • A special compliance log, containing information about all new tuple insertions, is stored on a WORM server. Also, the tail of the transaction log is stored on WORM. • The auditor verifies whether the old snapshot of the DB (stored on WORM) plus the new tuples equals the current DB state. [Diagram labels: Alice, transactions (T), DBMS, plugin, DB tables, transaction log (L), compliance log (CL), WORM.]

  15. Pros and cons of LDA. Pros: • Can detect any tampering with the tuples in the DB • Can prevent untamper attacks using the hash-page-on-read technique. Cons: • Requires clocks in the WORM and DB servers to be roughly synchronized • Performance overhead is 10-24% • The regret window (i.e., window of vulnerability) is large (5 minutes) • Audit times are large: almost 2 weeks for 1 year’s worth of transactions

  16. Our Transaction Log on WORM (TLOW) architecture uses a WORM server to store transaction logs. [Diagram labels: DBMS engine; transaction log on the WORM storage server; tables and indices on the database storage server.]

  17. TLOW: the DBMS must switch log files every r/2 time (i.e., every half regret interval). [Diagram labels: Alice, transactions (T), DBMS, DB tables, log (L), WORM.]
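
The r/2 switching rule can be sketched as a log writer that opens a new file once the current one has been open for half the regret interval. This is only an illustration, not the TLOW implementation: the class name `RotatingLog`, the in-memory lists standing in for log files on WORM, and the caller-supplied clock value `now` are all assumptions made for the example.

```python
class RotatingLog:
    """Hypothetical in-memory stand-in for transaction log files on WORM."""

    def __init__(self, r: float, start_time: float = 0.0):
        self.switch_interval = r / 2   # switch every half regret interval
        self.files = [[]]              # each inner list models one log file
        self.opened_at = start_time    # time the current file was opened

    def append(self, record: str, now: float) -> None:
        # Open a new log file once the current one is at least r/2 old.
        if now - self.opened_at >= self.switch_interval:
            self.files.append([])
            self.opened_at = now
        self.files[-1].append(record)


log = RotatingLog(r=30)
for record, now in [("a", 0), ("b", 10), ("c", 15), ("d", 29)]:
    log.append(record, now)
```

In this sketch the switch happens lazily on the next append; a real system would more likely switch on a timer so that an idle log is still closed on schedule.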

  18. An audit succeeds if the tuple completeness check is satisfied. We use Bellare et al.’s additive hash for H. Tuple completeness: Old DB State + New tuples = Current DB State. Checked as: H(Old DB State) + H(New tuples) = H(Current DB State). [Diagram legend: stored on regular storage; stored on WORM.]
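
The tuple completeness check can be illustrated with a toy additive hash: hash each tuple, add the per-tuple hashes modulo a fixed modulus, and compare the two sides. This is a sketch under assumptions, not the construction used in the thesis: SHA-256 as the per-tuple hash, the modulus 2^256, and the byte-string encoding of tuples are all chosen here for illustration.

```python
import hashlib

MODULUS = 2 ** 256  # assumed modulus for this sketch


def tuple_hash(t: bytes) -> int:
    """Hash one tuple to an integer (SHA-256 is an illustrative choice)."""
    return int.from_bytes(hashlib.sha256(t).digest(), "big")


def additive_hash(tuples) -> int:
    """Sum of per-tuple hashes; the result does not depend on tuple order."""
    total = 0
    for t in tuples:
        total = (total + tuple_hash(t)) % MODULUS
    return total


def completeness_check(old_state, new_tuples, current_state) -> bool:
    """H(old) + H(new) == H(current), all modulo MODULUS."""
    expected = (additive_hash(old_state) + additive_hash(new_tuples)) % MODULUS
    return expected == additive_hash(current_state)


old_state = [b"t1", b"t2"]
new_tuples = [b"t3"]
honest_state = [b"t3", b"t1", b"t2"]      # same tuples, different order
tampered_state = [b"t1", b"tX", b"t3"]    # t2 was altered
```

Because addition is commutative, the auditor can hash the snapshot, the log, and the current state independently and in any order, and can fold new tuples into a running total without rehashing the old ones.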

  19. The auditor checks if the final state is consistent with the snapshot and the log. [Diagram labels: snapshot DB state, transactions, final DB state, traditional storage, transaction log, WORM storage, integrity check, auditor.]

  20. The Audit Helper pre-computes the hashes to reduce audit time. AH stores the hashes in H-files in the WORM, and switches to a new H-file every r/2 time. [Diagram labels: DBMS engine, Audit Helper (AH); transaction log and audit hashes on the WORM storage server; tables and indices on the database storage server.]

  21. The Audit Helper hashes the log files as they come in. The auditor uses an H-file *only if* its timestamp is within r/2 of the log file. [Diagram labels: Alice, transactions (T), DBMS, DB tables, log (L) on WORM, AH, H-file (H).]
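
The "only if its timestamp is within r/2" rule can be sketched as follows: the auditor takes a pre-computed hash when the H-file timestamp is close enough to the log file's, and otherwise recomputes the hash from the log. The data layout (tuples and a dict), the function names, and treating the bound as inclusive are assumptions for this example; the slides do not specify them.

```python
def h_file_usable(log_ts: float, h_ts: float, r: float) -> bool:
    """An H-file counts only if its timestamp is within r/2 of the log file's."""
    return abs(h_ts - log_ts) <= r / 2


def collect_hashes(logs, h_files, r, rehash):
    """Return one hash per log file, plus the names that had to be rehashed.

    logs:    list of (name, timestamp, records)
    h_files: dict mapping log name -> (timestamp, precomputed_hash)
    rehash:  fallback function that hashes the records of a log file
    """
    hashes, recomputed = [], []
    for name, log_ts, records in logs:
        entry = h_files.get(name)
        if entry is not None and h_file_usable(log_ts, entry[0], r):
            hashes.append(entry[1])          # trust the Audit Helper's hash
        else:
            hashes.append(rehash(records))   # ignore it and hash the log itself
            recomputed.append(name)
    return hashes, recomputed


logs = [("L1", 0.0, ["a"]), ("L2", 15.0, ["b"])]
h_files = {"L1": (10.0, 111), "L2": (40.0, 222)}
hashes, recomputed = collect_hashes(logs, h_files, r=30, rehash=len)
```

Here `len` is only a placeholder for a real hash function; the H-file for `L2` is 25 seconds away from its log file, more than r/2 = 15, so it is rejected.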

  22. Performance evaluation

  23. TLOW has less than 1-2% overhead on transaction throughput. [Plot annotations:] LDA has an overhead of 14% over TPC-C. TLOW with r = 30, 120, and 300 seconds all have overheads of 0.2%-1.5%, so these 3 lines almost coincide with TPC-C. A further line shows TLOW-AH with r = 30 seconds.

  24. With 100% of hash files valid, audit time for 100K transactions is less than 2 seconds, compared to 100+ seconds without AH. [Plot annotations:] Without using H-files, the audit for 100K transactions took 101 seconds. With all H-files valid, auditing 100K transactions took less than 1 second.

  25. TLOW provides two types of guarantees against untamper attacks. Probabilistic: • Since audits are very fast, frequent audits can be done. Deterministic: • Using the same hash-page-on-read technique (where DB pages are hashed whenever they are read).

  26. Summary of contributions in TLOW • TLOW is a simpler and much more efficient approach to supporting history integrity, with no changes to the DBMS kernel. • TLOW imposes 1-11% overhead in TPC-C transaction throughput (depending on the exact security guarantees supported), while at the same time, it reduces LDA’s window of vulnerability by a factor of 5. • We prove the correctness of TLOW. The proof illustrates a number of subtle threats, as well as a correctness problem with LDA that we show how to fix. • We introduce the TLOW audit helper (AH), which reduces the cost of audits 100-fold, allowing frequent internal audits without compromising the performance of the production workload. Reference: Ragib Hasan and Marianne Winslett, Efficient Audit-based Compliance for Relational Data Retention, UIUC DCS Tech Report, March 2009 (under submission at ICDE 2010).

  27. Secure Vacuuming and Litigation Holds in Databases

  28. Vacuuming is periodically done to remove expired tuples. Tuples expire after their retention period ends, and are removed from the DB and index. This is the only allowed shredding in a transaction-time DB. It is logged as “delete tuple” on the transaction log. If a litigation hold exists, a tuple should not be vacuumed even if it has expired. How do we enhance TLOW with support for vacuuming and litigation holds? [Diagram labels: vacuuming process, DBMS, DB, transaction log (L), transactions (T).]

  29. Litigation holds are an inevitable part of almost any court case. The Federal Rules of Civil Procedure require • Placement of a hold on relevant data during litigation • Held data must be retained “in original form”. Non-compliance => negative inference • Zubulake vs. UBS: UBS fined $29.3 million

  30. Modeling Litigation Holds. A litigation hold has a name, a query q, a start time ts, and an end time te, and applies to a DB D. Litigation hold constraint: the View(q) on the database D must remain invariant in the interval <ts, te>. Legal interpretation of FRCP: • Holds are placed on past data • New tuples entering the DB do not fall under existing holds
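
The hold model and its invariance constraint can be sketched as follows. The sketch assumes that a query is a per-row predicate and that every row carries a creation time; the names `Row`, `LitigationHold`, `held_view`, and `hold_satisfied` are hypothetical, and the filter on creation time is one reading of "new tuples entering the DB do not fall under existing holds".

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable


@dataclass(frozen=True)
class Row:
    key: int
    value: str
    created: int   # assumed per-row creation time


@dataclass(frozen=True)
class LitigationHold:
    name: str
    query: Callable[[Row], bool]   # q, modeled here as a row predicate
    start: int                     # ts
    end: int                       # te


def held_view(rows: Iterable[Row], hold: LitigationHold) -> FrozenSet[Row]:
    """View(q) restricted to rows that existed when the hold was placed."""
    return frozenset(r for r in rows if r.created <= hold.start and hold.query(r))


def hold_satisfied(state_at_start, later_state, hold: LitigationHold) -> bool:
    """The held view must be identical at ts and at any later time up to te."""
    return held_view(state_at_start, hold) == held_view(later_state, hold)


hold = LitigationHold("case-1", lambda r: r.value.startswith("inv"), 10, 20)
d_start = [Row(1, "inv-a", 5), Row(2, "memo", 6)]
d_grown = d_start + [Row(3, "inv-b", 12)]   # a new matching row arrives
d_shrunk = [Row(2, "memo", 6)]              # the held row was removed
```

The grown state still satisfies the hold because the new row postdates ts; the shrunk state violates it.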

  31. What to hold: Tuples or Views? What is “original form”? Base relations: Person (SSN, Name, Phone, Address); Order (ID, SSN, Date, Price, Desc.). View on the base relations: Invoice (SSN, Name, Address, Date, Price). Options for litigation holds: • Hold only the (materialized) view • Hold all base relations “contributing” to a view. Given a query, how to find “contributing” tuples? Use Cui and Widom’s scheme (2000), which provides an algorithm to decompose a SQL query into queries on single base relations. Run the queries on base relations to get the “contributing” tuples.

  32. Litigation holds can be implemented in several ways. Scheme 1. Store the view on WORM. Advantage: Only the result needs to be stored; vacuuming/auditing operations do not need to consider holds. Disadvantage: For large or overlapping views, a lot of space is needed on WORM. [Diagram labels: auditor, hold query, DBMS, materialized view stored on WORM.]

  33. Litigation holds can be implemented in several ways. Scheme 2. Store contributing tuples on WORM. Advantage: The vacuuming operation does not need to care about holds. Disadvantage: For large or overlapping views, a lot of space is needed on WORM. [Diagram labels: auditor, hold query to base relations, DBMS, contributing tuples stored on WORM.]

  34. Litigation holds can be implemented in several ways. Scheme 3. Maintain a hold counter for each contributing tuple. Advantage: No space overhead for overlapping holds; no extra data copied to WORM. Disadvantage: The vacuumer and auditor need to perform extra lookups for each vacuumed tuple. Hold counter: <tupleID, hold count>. The hold query is run against the base relations, and the hold counters of the resulting tuples are updated on the hold counter table. The vacuumer checks the hold counter table, and skips expired tuples with a nonzero hold counter. The auditor first performs a TLOW-style audit on the hold counter table to ensure it’s valid. [Diagram labels: auditor, DBMS, DB.]
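
The hold counter idea can be sketched as a table mapping tuple IDs to the number of holds covering them, which the vacuumer consults before removing an expired tuple. The class and function names are hypothetical, and the table here is an in-memory `Counter` rather than an audited database table.

```python
from collections import Counter


class HoldCounterTable:
    """Hypothetical <tupleID, hold count> table kept in memory."""

    def __init__(self):
        self.counts = Counter()

    def place_hold(self, tuple_ids) -> None:
        # Each contributing tuple gains one hold.
        for tid in tuple_ids:
            self.counts[tid] += 1

    def remove_hold(self, tuple_ids) -> None:
        # Releasing a hold never drives a counter below zero.
        for tid in tuple_ids:
            if self.counts[tid] > 0:
                self.counts[tid] -= 1

    def is_held(self, tid) -> bool:
        return self.counts[tid] > 0


def vacuumable(expired_ids, table: HoldCounterTable):
    """Expired tuples the vacuumer may remove: those with a zero hold counter."""
    return [tid for tid in expired_ids if not table.is_held(tid)]


table = HoldCounterTable()
table.place_hold([1, 2])   # hold A covers tuples 1 and 2
table.place_hold([2, 3])   # hold B overlaps on tuple 2
before_release = vacuumable([1, 2, 3, 4], table)
table.remove_hold([1, 2])  # hold A is lifted
after_release = vacuumable([1, 2, 3, 4], table)
```

Overlapping holds cost only a larger integer per tuple, which matches the stated advantage of no space overhead for overlaps.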

  35. Litigation holds can be implemented in several ways. Scheme 4. Create a Bloom filter per hold. Advantage: Fast lookup during vacuuming and audits. Disadvantage: There is a false positive probability (though it can be <0.001%), which would cause additional checks. The hold query is run against the base relations, and a Bloom filter is created for the resulting tuples and stored on WORM. The Bloom filter is maintained on WORM. The vacuumer checks all existing Bloom filters, and skips expired tuples included in at least one of the filters. In case of false positives, a further check is performed. [Diagram labels: auditor, DBMS, WORM.]
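
A per-hold Bloom filter and the vacuumer's lookup can be sketched as below. The filter size, the number of hash functions, the SHA-256-based bit positions, and the `exact_check` callback (standing in for whatever "further check" resolves a possible false positive) are assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Small illustrative Bloom filter over string tuple IDs."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def may_vacuum(tuple_id: str, filters, exact_check) -> bool:
    """Skip a tuple if some filter reports it and the exact check confirms it."""
    for f in filters:
        if tuple_id in f and exact_check(tuple_id):
            return False
    return True


held = {"t1", "t2"}
hold_filter = BloomFilter()
for tid in held:
    hold_filter.add(tid)
```

A Bloom filter never reports a held tuple as absent, so the exact check is needed only on positives, which is why false positives cost time but not correctness.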

  36. How to detect improper vacuuming? The auditor finds the time Tvac of each vacuuming operation (from the transaction log). The auditor finds supporting information, e.g., holds valid at Tvac (from the list of litigation holds and the tuple expiration times). For each tuple t vacuumed at Tvac, the auditor checks compliance with expiration and litigation holds.
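
The per-tuple compliance check can be sketched as a function that flags a vacuumed tuple if it had not yet expired at Tvac or if a hold covering it was active at Tvac. The record layouts (`Hold`, the `expiry` dict, and `(tuple_id, t_vac)` events) are hypothetical; in the scheme described, this information would come from the transaction log, the list of holds, and the expiration times.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hold:
    name: str
    tuple_ids: frozenset   # tuples covered by the hold
    start: int
    end: int


def improper_vacuums(vacuum_events, expiry, holds):
    """Return (tuple_id, reason) for every vacuuming event that breaks a policy.

    vacuum_events: list of (tuple_id, t_vac) pairs
    expiry:        dict mapping tuple_id -> expiration time
    holds:         list of Hold records
    """
    violations = []
    for tid, t_vac in vacuum_events:
        if expiry[tid] > t_vac:
            violations.append((tid, "not expired"))
            continue
        for h in holds:
            if tid in h.tuple_ids and h.start <= t_vac <= h.end:
                violations.append((tid, f"under hold {h.name}"))
                break
    return violations


expiry = {"a": 10, "b": 50, "c": 10}
holds = [Hold("case-1", frozenset({"c"}), 5, 30)]
events = [("a", 20), ("b", 20), ("c", 20)]
violations = improper_vacuums(events, expiry, holds)
```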

  37. We can augment TLOW audits to consider vacuuming. Old DB State + New tuples - Vacuumed tuples = Current DB State. Checked as: H(Old DB State) + H(New tuples) - H(Vacuumed tuples) = H(Current DB State).
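
With an additive hash, vacuumed tuples are accounted for by subtracting their hash from the running total. The sketch below makes the same illustrative assumptions as any toy additive hash: SHA-256 per tuple, a modulus of 2^256, and byte-string tuples; none of these are specified by the slides.

```python
import hashlib

MODULUS = 2 ** 256  # assumed modulus for this sketch


def additive_hash(tuples) -> int:
    """Sum of SHA-256 tuple hashes modulo MODULUS (illustrative choice)."""
    return sum(int.from_bytes(hashlib.sha256(t).digest(), "big") for t in tuples) % MODULUS


def audit_with_vacuuming(old_state, new_tuples, vacuumed, current_state) -> bool:
    """H(old) + H(new) - H(vacuumed) == H(current), all modulo MODULUS."""
    expected = (additive_hash(old_state) + additive_hash(new_tuples)
                - additive_hash(vacuumed)) % MODULUS
    return expected == additive_hash(current_state)


old_state = [b"t1", b"t2"]
new_tuples = [b"t3"]
vacuumed = [b"t1"]
current_ok = [b"t2", b"t3"]
current_bad = [b"t3"]   # t2 disappeared without a logged vacuuming event
```

Python's `%` always returns a non-negative result for a positive modulus, so the subtraction needs no special handling.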

  38. Hold Placement Times. Hold query: SELECT s_id, s_w_id, quantity FROM STOCK WHERE s_w_id = warehouse

  39. Space Overhead. For overlapping holds, the hold counter scheme requires no extra storage. For aggregation queries, holding the view requires the least space. [Plot panel label: Overlapping.]

  40. Summary of Contributions • We augment existing schemes for database integrity audits to support vacuuming. • We present a generic scheme for verifying vacuuming events for compliance with expiration and litigation hold policies. • We propose semantics and methods for the placement, audit, and removal of litigation holds on relational data. • We present four schemes for implementing litigation holds. • We perform an empirical evaluation of the four schemes to show the tradeoffs.

  41. Data history and provenance: Generic scheme for securing data history and provenance • Architecture for efficient collection of secure provenance for files • Application layer library for secure provenance • Proof of correctness • Experimental evaluation [StorageSS07, FAST09, Login09, CIDR09, USENIX-poster, TOS09]. Application-specific history: Efficient scheme for database history integrity • Fast audit scheme • Experimental evaluation • Schemes for litigation hold support • Support for vacuuming [TechReport09, IDAR09, ICDE10 (submitted), EDBT10 (to be submitted)].

  42. Questions?

  43. Backup slides start here

  44. Plausible History. A plausible history is a sequence of events that could have generated the current state of the document. Our guarantee: Auditors will always be able to detect if a given history is plausible. With co-provenance, we can provide even stronger guarantees (i.e., we can say with high probability which of several plausible histories is the real history).

  45. Co-provenance. Idea: Entangle the provenance chains of multiple “related” documents. How? Similar to spiral chains, but each provenance entry now has multiple checksums, each checksum connecting it with a different “related” document. Which documents are “related”? Examples: medical records of all patients visiting a certain doctor on a certain date; birth certificates of all babies born on the same day.

  46. Verifiable Vacuuming. We can generalize auditing a vacuuming event as a policy compliance problem, i.e., does the vacuuming operation comply with organizational retention policies as well as existing litigation holds? • Has the tuple expired? (Compliance with expiration policy) • Was the tuple under a litigation hold? (Compliance with litigation hold policy) A vacuuming event is valid if it complies with expiration and litigation hold policies.

  47. Untamper Attack. Alice commits a tuple to the DB. Mallory tampers with the tuple. Bob reads the tampered value. Mallory restores the tuple to a valid value. Audrey cannot find any tampering. [Diagram labels: Alice, Bob (query Q), Audrey (audit).]

  48. Hash-page-on-Read. Bob hashes every page he reads, and stores the hashes h(P) on WORM. The auditor checks if these page hashes match what the page should have contained. [Diagram labels: Alice, Bob (query Q), Audrey (audit), WORM.]
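
Hash-page-on-read can be sketched as follows: every page read appends a hash of the page to an append-only log, and the auditor later compares each logged hash with the hash of what the page should have contained at that time. The names `WormLog`, `read_page`, and `audit_reads`, the use of SHA-256, and the `expected_content` callback (standing in for the auditor's reconstruction of correct page contents) are all assumptions for this example.

```python
import hashlib


class WormLog:
    """Hypothetical append-only log standing in for WORM storage."""

    def __init__(self):
        self._entries = []

    def append(self, entry) -> None:
        self._entries.append(entry)

    def entries(self):
        return tuple(self._entries)


def page_hash(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def read_page(storage, page_id, read_time, worm: WormLog) -> bytes:
    """Read a page and record h(P) on WORM before returning it to the reader."""
    content = storage[page_id]
    worm.append((page_id, read_time, page_hash(content)))
    return content


def audit_reads(worm: WormLog, expected_content):
    """Return the (page_id, read_time) pairs whose logged hash is wrong."""
    return [(pid, ts) for pid, ts, logged in worm.entries()
            if page_hash(expected_content(pid, ts)) != logged]


storage = {1: b"balance=100"}
worm = WormLog()
read_page(storage, 1, 1, worm)
storage[1] = b"balance=999"      # tamper
read_page(storage, 1, 2, worm)   # the reader sees the tampered page
storage[1] = b"balance=100"      # untamper before the audit
read_page(storage, 1, 3, worm)
suspicious = audit_reads(worm, lambda pid, ts: b"balance=100")
```

Even though the page is back to its valid value at audit time, the hash logged at time 2 still exposes the tampered read.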

  49. TLOW Performance Evaluation. Experimental setup: • Berkeley DB 4.6.24 • Shore implementation of the TPC-C benchmark, ported to BDB • 10 warehouses • 512 MB DB cache • Pentium 4, 2.6 GHz processor, 4 GB RAM. Experiment modes: • Unmodified TPC-C • TPC-C-TLOW, with r = 30, 120, 300 seconds: without Audit Helper (AH), and with Audit Helper (AH)

  50. Which tuples to hold? • We can decide to hold only the materialized view, or all “contributing” tuples • A tuple “contributes” to a litigation hold view if any part of the tuple appears in the litigation hold view • We can take a complex SQL query, and then apply Cui et al.’s transformation to convert it to queries on base relations
