1 / 47

Outline

Auditing Compliance with a Hippocratic Database Rakesh Agrawal Roberto Bayardo Christos Faloutsos Jerry Kiernan Ralf Rantzau Ramakrishnan Srikant Intelligent Information Systems Research IBM Almaden Research Center. Outline. Introduction and motivation Problem statement Foundations

ewa
Download Presentation

Outline

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Auditing Compliance with a Hippocratic DatabaseRakesh AgrawalRoberto BayardoChristos FaloutsosJerry KiernanRalf RantzauRamakrishnan SrikantIntelligent Information Systems ResearchIBM Almaden Research Center

  2. Outline • Introduction and motivation • Problem statement • Foundations • System organization and algorithms • Performance • Summary

  3. Motivation • Hippocratic databases advocate policy directed data management for privacy sensitive data • Need reinforced by legislations and regulations: • Health Insurance Portability & Accountability Act • Gramm-Leach Bliley Act – Consumer Privacy Rule • Goal • Build a system to assist with auditing compliance with the stated policy • Event driven - privacy complaint • Periodic - monitor exposure to privacy violation

  4. Audit Scenario The doctor must now review disclosures of Jane’s information in order to understand the circumstances of the disclosure, and take appropriate action Sometime later, Jane receives promotional literature from a pharmaceutical company, proposing over the counter diabetes tests The doctor uncovers that Jane’s blood sugar level is high and suspects diabetes Jane complains to the department of Health and Human Services saying that she had opted out of the doctor sharing her medical information with pharmaceutical companies for marketing purposes Jane has not been feeling well and decides to consult her doctor

  5. Audit Expression Who has accessed Jane’s disease information? audit T.disease from Customer C, Treatment T where C.cid=T.pcid and C.name = ‘Jane’

  6. Outline • Introduction and motivation • Problem statement • Foundations • System organization and algorithms • Performance • Summary

  7. Problem Statement • Given • A log of queries executed over a database • An audit expression specifying sensitive data • Precisely identify • Those queries that accessed the data specified by the audit expression

  8. “Suspicious” Queries A query Qi has accessed information contained in the Customer table The audit expression A specifies the data to the audited Customer table If query Qi accesses all the cells specified by the audit expression A for any row, Qi is suspicious

  9. Issues • Convenient language • Audit expression (essentially SPJ query) • Fast and precise on audits • Non disruptive • Minimal performance impact on normal database operation • Fine grained

  10. Assumptions • Disclosures stemming from multiple query executions is not considered • No use of outside knowledge to deduce information without detection • Queries considered include • Joins and aggregation, but not nested subqueries • Note that existential subqueries can be converted into joins [SIGMOD92]

  11. Outline • Introduction and motivation • Problem statement • Foundations • System organization and algorithms • Performance • Summary

  12. Informal Definitions • “Candidate” query • Logged query that accesses all columns specified by the audit expression • “Indispensable” tuple (for a query) • A tuple whose omission makes a difference to the result of a query • “Suspicious” query • A candidate query that shares an indispensable tuple with the audit expression

  13. Definition 1 - A virtual tuple v cT is indispensable for an SPJ query Q if the result of Q changes when we delete v: Indispensable Tuple Output columns in Q Duplicate preserving projection operator Predicates in Q Tables common to Q and A The SPJ query Q and the audit expression A are of the form: Columns appearing anywhere in Q

  14. “Candidate” Query Definition 6 - Q is a candidate query with respect to A if: Only candidate queries can be suspicous queries

  15. Definition 5 - Maximal virtual tuple (MVT): A tuple v is a MVT for queries Q1 and Q2 if it belongs to the cross product of common tables in their from clauses Definition 7 - Q is suspicious with respect to A if they share an indispensable MVT v “Suspicious” Query For example, Query Q: Addresses of people with diabetes Audit A: Jane’s diagnosis Jane’s tuple is indispensable for both; hence query Q is “suspicious” with respect to A

  16. Outline • Introduction and motivation • Problem statement • Foundations • System organization and algorithms • Performance • Summary

  17. System Overview Query with purpose, recipient Audit expression IDs of log queries having accessed data specified by the audit query Updates, inserts, delete Generate audit query Database Layer Audit Static analysis Database triggers track updates to base tables Database Layer Data Tables Backlog Audit query Query Log

  18. Static Analysis Query Log Audit expression Eliminates queries that could not possibly have violated the audit expression Insures that Accomplished by examining only the queries themselves (i.e., without running the queries) Filter Queries Candidate queries

  19. Audit Query Generation • Goal • Build a query which, when run, returns the id’s of suspicious queries with respect to an audit expression A

  20. Union T1 T2 Generating the Audit Query Audit Expression Replace each table with it’s backlog to restore the version of the table to the time of each query Combine the audit expression with individual candidate queries to identify suspicious queries Combine individual candidate queries and the audit expression into a single query graph Candidate Query 1 Candidate Query 2 Lines represent input/output relationships between operators QGM is a graphical representation of a query Boxes represent operators, such as select Boxes with no inputs are tables

  21. Suspicious SPJ Query The candidate SPJ query Q and the audit expression A are of the form: Proof of correctness is based upon Definition 7 (suspicious query) and given in the paper QGM rewrites, shown in previous slide, transform Q and A into: Theorem 2 - A candidate SPJ query Q is suspicious with respect to an audit expression A if and only if:

  22. Suspicious Aggregate Query (Including Having) • Solution in the paper

  23. Example Jane’s audit

  24. Audit Expression Who has accessed Jane’s disease information? audit T.disease from Customer C, Treatment T where C.cid=T.pcid and C.name = ‘Jane’

  25. Query Log Query 1 was executed at time T3

  26. Backlog Table (Time Stamp) Operation on a tuple among Insert, Update and Delete Jane’s record was inserted at time T2 and updated at time T4. The backlog table records both versions of her information Timestamp of the operation C. S. Jensen, L. Mark, and N. Roussopoulos [TKDE 1991] Attributes also in the source table Attributes only in the backlog table

  27. Merge Logged Queries and Audit Expression Merge logged queries and audit expression into a single query graph T.s C.n, C.a, C.z Select := T.s=‘diabetes’ and T.p=C.c audit expression := T.p=C.c and C.n= ‘Jane’ C T T C p, r, …, t c, n, …, t Treatment Customer

  28. Transform Query Graph into an Audit Query ‘Q1’ The audit expression now ranges over the logged query. If the logged query is suspicious, the audit query will output the id of the logged query audit expression := X.n= ‘Jane’ X C.n Select := T.s=‘diabetes’ and C.c=T.p C T View of Customer (Treatment) is a temporal view at the time of the query was executed p, r, ..., t c, n, …, t Treatment Customer

  29. Scenario Outcome • The audit uncovers that Query 1 in the query log accessed Jane’s information

  30. Outline • Introduction and motivation • Problem statement • Foundations • System organization and algorithms • Performance • Summary

  31. Empirical Evaluation: Goals • Cost of maintaining backlog tables • Understand the impact of maintaining backlog tables on ongoing database operations • Cost of running audits • Understand whether audits can run in reasonable time

  32. Experimental Setup • IBM M Pro 6868 Intellistation • 800 MHz Pentium III processor • 512 MB of memory • 16.9 GB disk drive • Windows 2000 Version 5, SP 4 • DB2 v7 with default settings • TPC-H database • Supplier table • 100,000 tuples

  33. System Structures • Indexing • Eager indexing • Maintain an index over the backlog table • Maintained during ongoing database operations • Lazy indexing • No index over the backlog table • Create indices at the time of audit • Choice of index • Simple index • Primary key of source table • Composite index • Primary key of source table • Time stamp

  34. Impact on Ongoing Operations • Queries • Additionally log the query string • Already performed in many application environments • Updates • For each updated tuple, • Insert a tuple to the backlog table • Inserts and deletes are handled similarly • In a majority of environments, queries are much more frequent than updates

  35. Update Performance • 100,000 tuples in Supplier table • Update statement updates all tuples • Each update statement fires triggers which inserts an additional 100,000 tuples in backlog • Evaluate impact of multiple versions on performance

  36. Overhead on Updates Simple wins over Composite Number of version of each tuple in the Supplier backlog table 7x if all tuples are updates 3x if a single tuple is updated Eager indexing doesn’t add much cost

  37. Audit Query Performance Audit query: select ‘Q’ from Supplier where skey = k Experiment: Evaluate the impact of the number of versions of tuples in the backlog table on performance

  38. Audit Query Execution Time Composite wins over simple if initial version is selected Simple wins over composite if the current version is selected

  39. Takeaways • The composite index • Enhances the performance of audits, but • Additionally burdens updates when using eager indexing • The system supports • Efficient auditing • Without substantially burdening normal query processing

  40. Related Work • Oracle Privacy Security Auditing • Facility for logging queries with timestamp • Flash-back queries • Restores the version of the data at the time of the query • No support for automated auditing • User manually selects queries from the log and runs them • The user to decide if the query is suspicious • G. Miklau D. Suciu [SIGMOD 2004] • Formal analysis of information disclosure in data exchange • Is information about a secret query S revealed by views V1,…,Vn • Considers all possible instances of a database schema • Assumes tuple independence • We’re interested in given instances (temporal versions) • Nonetheless, it will be interesting to explore the connection between the two works • Active enforcement of policies by limiting disclosure [VLDB’04] • Literature on multi-query optimization

  41. Summary • In light of new privacy legislation • The problem of auditing usage of information represents an important opportunity for database research • Formalized the problem through the fundamental concepts of indispensable tuple and suspicious queries • Achieved our design goals:

  42. Design Goals • Convenient language • Fast and precise on audits • Non disruptive • Minimal performance impact on normal database operation • Fine grained

  43. Backup

  44. Multiple Candidate Queries Union ‘Q1’ ‘Q2’ audit expression := C.n= ‘Jane’ audit expression := C.n= ‘Jane’

  45. ‘Q1’ select:= q1.c1=q2.c1 and … and q1.ci=q2.ci q1 q1 c1, …, ck c1, …, ck audit expression := … audit expression := … Aggregate Queries with Having The join on aggregate columns ensures that the group being tracked by the audit has not been eliminated by the having clause c1, …, ci Qh select:= … Qg c1, …, ci, agg1, …, aggn group:= c1, …, ci Qs

  46. Dynamic Temporal Views View of Customer table at time t c = id n = name a = address h = phone z = zip o = contact t = marketing ts = ts op = opr C1 Time stamp of the logged query c, n, a, h, z, o, t Select :=ts <= t andop <> ‘delete’ andnot(C5) C5 * Exists :=C4.ts <= t andC3.c = C4.c andC4.ts > C3.ts C3 C4 c, n, a, h, z, o, t, ts, op Customer_backlog

  47. Cost of Building Indices over Backlog Tables

More Related