240 likes | 259 Views
Auditing Compliance with a Hippocratic Database. Javier Salinas Martín. Outline. Introduction System architecture: Logs Audits Audit queries Performance. Introduction. Responsibly managing privacy sensitive data is mandatory Approaches: Physically logging the results of each query
E N D
Auditing Compliance with a Hippocratic Database Javier Salinas Martín
Outline • Introduction • System architecture: • Logs • Audits • Audit queries • Performance
Introduction • Responsibly managing privacy sensitive data is mandatory • Approaches: • Physically logging the results of each query • New system to audit whether the database executed a query in the past that accessed private data
System properties • Non-disruptive • Fast and precise • Fine-grained • Convenient
Logs • Query log: timestamp, user ID • Temporal extensions: for each table T, a backlog table Tb is created • Time stamped • Interval stamped
Time stamped organization • A tuple in Tb has two additional columns: • TS: time of storage • OP: operation {‘insert’, ‘delete’, ‘update’} • Triggers are used to capture updates • Recover state of T at time τ: take a snapshot
Interval stamped organization • Period of time for wich each tuple was alive: • TS: time of storage • TE: end time • Insert trigger adds t to Tb, setting TE to null • Update trigger searches for tuple b such that b.P=t.P and b.TE=null and sets b.TE to the current time and inserts new tuple t • Delete trigger searches for tuple b such that b.P=t.P and b.TE=null and sets b.TE to the current time
Audit expressions • Identical to that of a select query • No disctinct in the select list • “Audit” replaces “Select” • U: cross product of all the base tables in the database • Cells that satisfy the expression are marked in U
Example of audit expression • Audit if the disease information of anybody living in the ZIP code 95120 was diclosed • Cells corresponding to the disease column of those tuples in the Customer x Treatment table that have c.cid=t.pcid and c.zip = 95120 are marked
Some definitions Tuple t, Query Q, Audit A • Indispensable tuple: omitting t makes a difference on Q • Candidate query: Q accesses all columns A specifies in its audit list • Suspicious query: Q and A share an indispensable tuple
Example 1 • Q is a candidate query with respect to A • Q is suspicious with respect to A if there is a customer who lived in the ZIP code 95120 and was treated for diabetes
Example 2 • Q is not suspicious with respect to A • Anyone who looks at the output of the query will not learn that Alice has cancer
Audit query generation • Full audit expression • Two steps: • Static analysis: select candidate queries from the query log • Audit query generation: augment every candidate query with information from the audit expression and combine them into an audit query that unions their output
Static analysis • Select candidate queries • Four steps: • Check whether Q is a candidate query • Check whether timestamp of Q is out of range • Check whether the purpose-recipient pair of Q matches any of the purpose-recipient specified in the otherthan clause of A • Check for contradictions between predicates • Set of candidate queries Q= {Q1,…,Qn}
Audit Query Generation • Augment every Qi with A • Result is another query AQi, defined against the backlog database at time τi • τi is the timestamp of Qi as recorded in the query log • All AQi are combined into one AQ audit query whose output is the union of the output of the individual AQi • AQ is executed against the backlog database
Audit Query Generation example • Example:
Performance • Cost of maintaining backlog tables
Performance • Execution time of an audit query