530 likes | 801 Views
Hippocratic Databases. Paper Authors: Rakesh Agrawal , Jerry Kiernan, Ramakrishnan Srikant , Yirong Xu Presented By: Camille Gaspard Originally taken from : pages.cpsc.ucalgary.ca /~ hammad /Fall04-00_files/Reg_Hippocratic%20Databases2.ppt. Introduction .
E N D
Hippocratic Databases Paper Authors: RakeshAgrawal, Jerry Kiernan, RamakrishnanSrikant, Yirong Xu Presented By: Camille Gaspard Originally taken from:pages.cpsc.ucalgary.ca/~hammad/Fall04-00_files/Reg_Hippocratic%20Databases2.ppt
Introduction • Privacy with respect to databases • Related work • Founding principles of Hippocratic databases. • Strawman design • Future research • Limiting Disclosure in Hippocratic Databases Hippocratic Oath: “And about whatever I may see or hear in treatment, or even without treatment, in the life of human beings – things that should not ever be blurted out outside – I will remain silent, holding such things to be unutterable.”
Overview • Dramatic and escalating increase in digital data • Lack of privacy awareness, lax security • We need to re-architect database systems to include responsibility for the privacy of data. Automatic collection Digitization Techniques Networks Processors Storage
Vocabulary • Privacy is the right of individuals to determine for themselves when, how and to what extent information about them is communicated to others. • Computer security is the prevention of, or protection against, • access to information by unauthorized recipients, and • intentional but unauthorized destruction or alteration of that information
Related Work • Traditional databases • Statistical databases • Secure databases
Traditional Databases • Fundamental to a database system is • Ability to manage persistent data. • Ability to access a large amount of data efficiently. • Universal capabilities of a database system • Support for at least one data model. • Support for certain high-level languages that allow the user to define the structure of data, access data, and manipulate data. • Transaction management, the capability to provide correct, concurrent access to the database by many users at once. • Access control, the ability to deny access to data by unauthorized users and the ability to check the validity of the data. • Resiliency, the ability to recover from system failures without losing data.
Efficiency • Maximizing Concurrency • Resiliency • Privacy • Consented sharing Traditional Databases • Hippocratic databases require all the capabilities provided by current database systems • Different focus • Need to rethink data definition and query languages, query processing, indexing and storage structures, and access control mechanisms
Statistical Databases • Goal: Provide statistical information (sum, count, average, maximum, minimum, pth percentile) without compromising sensitive information about individuals • Query restriction • Restricting the size of query results • Controlling the overlap among successive queries • Keeping audit trails of all answered queries and checking for possible compromises • Data perturbation • Swapping values between records • Replacing the original database by a sample from the same distribution • Adding noise to the values in the database • Adding noise to the results of a query • Hippocratic databases share the goal of preventing disclosure of private information but the class of queries for Hippocratic databases is much broader.
Secure Databases • Sensitive information is transmitted over a secure channel and stored securely • Access controls • Encryption • Multilevel secure databases • Multiple levels of security (top secret, secret, confidential, unclassified) • Eg: Bell-LaPadula model (no read up, no write down)
Privacy Regulations • United States Privacy Act of 1974 requires federal agencies to • permit an individual to determine what records pertaining to him are collected, maintained, used, or disseminated; • permit an individual to prevent records pertaining to him obtained for a particular purpose from being used or made available for another purpose without his consent; • permit an individual to gain access to information pertaining to him in records, and to correct or amend such records; • collect, maintain, use or disseminate any record of personally identifiable information in a manner that assures that such action is for a necessary and lawful purpose, that the information is current and accurate for its intended use, and that adequate safeguards are provided to prevent misuse of such information; • permit exemptions from the requirements with respect to the records provided in this Act only in those cases where there is an important public policy need for such exemption as has been determined by specific statutory authority; and • be subject to civil suit for any damages which occur as a result of willful or intentional action which violates any individual’s right under this Act.
Privacy Regulations • Recent privacy documents • 1996 Health Insurance Portability and Accountability Act (HIPAA) • 1999 Gramm-Leach-Bliley Financial Services Modernization Act • 1995 Canada Personal Information Protection and Electronic Documents Act (PIPEDA)
Guidelines • Collection • Retention • Use • Disclosure • Example: Grad student information at the university
Ten Founding Principles • Purpose Specification. For personal information stored in the database, the purposes for which the information has been collected shall be associated with that information. • Consent. The purposes associated with personal information shall have consent of the donor of the personal information. • Limited Collection. The personal information collected shall be limited to the minimum necessary for accomplishing the specified purposes. • Limited Use. The database shall run only those queries that are consistent with the purposes for which the information has been collected. • Limited Disclosure. The personal information stored in the database shall not be communicated outside the database for purposes other than those for which there is consent from the donor of the information.
Ten Founding Principles • Limited Retention. Personal information shall be retained only as long as necessary for the fulfillment of the purposes for which it has been collected. • Accuracy. Personal information stored in the database shall be accurate and up-to-date. • Safety. Personal information shall be protected by security safeguards against theft and other misappropriations. • Openness. A donor shall be able to access all information about the donor stored in the database. • Compliance. A donor shall be able to verify compliance with the above principles. Similarly, the database shall be able to address a challenge concerning compliance.
Strawman Design • Use scenario: Mississippi Bookstore • Mississippi is an on-line bookseller who needs to obtain certain minimum personal information to complete a purchase transaction. This information includes name, shipping address, and credit card number. Mississippi also needs an email address to notify the customer of the status of the order. • Mississippi uses the purchase history of customers to offer book recommendations on its site. It also publishes information about books popular in the various regions of the country (purchase circles).
The Characters • Name: Super Mario • Privacy pragmatist • Likes the convenience of providing his email and shipping address only once by registering at Mississippi . He also likes recommendations but he does not want his transactions used for purchase circles.
The Characters • Name: Princess Peach • Privacy fundamentalist • Does not want Mississippi to retain any information once her purchase transaction is complete.
The Characters • Name: Luigi • Mississippi employee with questionable ethics • The database and privacy officer must ensure that he is not able to obtain more information that she is supposed to.
Privacy Metadata Privacy Metadata Schema Database Schema Privacy-Policies Table
Privacy Metadata Privacy-Authorizations Table This object is copied from the original paper.
Alternate Organizations • This design may be too restrictive or may not fit in some situations. • May create a new table with the columns • User • Table • Attribute • Purpose • External recipient
Data Collection • Privacy Constraint Validatorchecks whether the business’s privacy policy is acceptable to the user • Example: If Princess Peach required a 2 week retention period, the database would reject the transaction • Data is inserted with the purpose for which it may be used • Data Accuracy Analyzer addresses the Principle of Accuracy. For example, verify that the postal code corresponds to the street address.
Queries • Submitted to the database along with their purpose. Example: recommendations • Before query execution: Attribute Access Control checks privacy-authorizations table for a match on purpose, attribute and user. • During query execution: Record Access Control ensures that only records whose purpose attribute includes the query’s purpose will be visible to the query.
Queries • After query execution: Query Intrusion Detector is run on the query results to spot queries whose access pattern is different from the usual access pattern for queries with that purpose and by that user. • Detector uses the Query Intrusion Model built by analyzing past queries for each purpose and each authorized user. • An audit trail of all queries is maintained for external privacy audits, as well as addressing challenges regarding compliance.
Other Features • Data Retention Manager deletes data items that have outlived their purpose. • Data Collection Analyzer examines the set of queries for each purpose to determine if any information is being collected but not used. This supports the Principle of Limited Collection. • DRA determines if data is being kept for longer than necessary. (Limited Retention) • DCA determines if people have unused (unnecessary) authorizations to issue queries with a given purpose. (Limited Use) • Encryption Support allows some data items to be stored in encrypted form to guard against snooping.
P3P and Hippocratic Databases • Platform for Privacy Preferences (emerging standard developed by the WWW Consortium) • P3P provides a way for a web site to encode its data-collection practices in an XML P3P policy • The sites policy is programmatically compared to a user’s privacy preferences • How to enforce? • Integrate with Hippocratic databases • The Electronic Privacy Information Center (EPIC) has been critical of P3P and believes P3P makes it too difficult for users to protect their privacy
New Challenges – Language • P3P language insufficient • Developed for web shopping language restricted • P3P is a good starting for a language which can be used in a wider variety of environments such as finance, insurance, and health care • Difficult to find balance between expressibility and usability • Work is being done to arrange purposes in a hierarchy rather than the flat space that P3P uses • Other research treats the problem as a coalition game [Kleinberg et al. 2001]where information is valuable and users are compensated for revealing some private information.
New Challenges – Efficiency • Current databases have undergone years of optimization without concern for privacy. What type of performance hit will integrated privacy checking entail? • Some techniques from multilevel secure databases will apply • Technique that can reduce the cost of each check: • Assume number of purposes is 32 or less • Encode the set of purposes by setting a bit in a word • Then the Record Access Control check only requires a bit-wise AND and XOR of two words with a check that the result is 0 • Storage of purpose – space versus efficiency
Challenges – Limited Collection • Access Analysis: Analyze the queries for each purpose and identify attributes that are collected for a given purpose but not used. • Problem: Necessity of one attribute may depend on others • Granularity Analysis: Analyze the queries for each purpose and numeric attribute and determine the granularity at which information is needed. • Minimal Query Generation: Generate the minimal query that is required to solve a given problem.
Challenges – Limited Disclosure • Strawman design as it exists now does not protect against a dynamically determined set of recipients • Example: EquiRate is a credit rating agency that maintains a database of credit ratings • Princess Peach has asked EquiRate to only disclose her credit rating to companies that she has contacted • Luigi has stolen Princess Peach’s identity and has contacted EasyCredit pretending to be her. • EasyCredit contacts EquiRate with Princess Peach’s credentials and obtains her credit rating • Both companies have adhered to the protocol but it has failed
Challenges – Limited Disclosure • Potential solution • Princess Peach digitally signs EasyCredit’s company ID with her cryptographic private key • She provides the result to EasyCredit • EasyCredit contacts EquiRate and gives them Alice’s signed permission • EquiRate verifies the signature and provides the information to EasyCredit
Challenges – Limited Retention • Deleting a record from a database is simple but how does one completely remove the information? • Deleting information for all logs, checkpoints, and backups is difficult
New Challenges – Safety • How do we protect against someone whose access controls prevent them from accessing tables through the DBMS but they have access to the files via other means? • Example: Suppose Luigi has no access to the DBMS but has root privileges on the system • Encryption incurs performance penalties • How can one index or query encrypted columns
New Challenges – Openness • A person challenging information about themselves that was not provided by them (eg: credit report) • Super Mario wishes to find out what databases have information about him • If the database does not have information about him, it should not know who issued the query • Related work: symmetrically private information retrieval
New Challenges – Compliance • Provide each user whose data is accessed with a log of that access along with the query reading the data, without paying a large performance penalty • Fingerprinting – sign up with a company that inserts fingerprint records into the database with an email address, phone number and credit card number along with various approved purposes. If one is used for an unauthorized purpose it would be immediately detected and the company notified that a breach has occurred • Challenge: use the least number of fingerprint records to maximum effect.
Conclusion • Presented a vision, inspired by the Hippocratic Oath, of databases that preserve privacy • Enunciated key privacy principles • Discussed a strawman design for a Hippocratic database • Identified technical challenges • Discussed related work
Limiting Disclosure in Hippocratic Databases Paper Authors: Kristen LeFevrey, RakeshAgrawaly, VukErcegovac, RaghuRamakrishnan, YirongXuy and David DeWitt Presented By: Camille Gaspard
Motivation • It’s essential to manage disclosure in a more fine-grain manner. • Cell-level disclosure schemes have not been well studied before this work.
Contribution • Investigate the alternative semantics for limited disclosure in relational databases and present two models of cell-level limited disclosure: table semantics and query semantics, weighing the relative semantic and performance tradeoffs implicit in the two models. • Rather than viewing the disclosure control problem as one of checking against a list of privileges, they transform it into a query modification problem. • Experimental evaluation.
Table semantics • The table semantics model conceptually defines a view of each data table for each purpose-recipient pair, based on the disclosure constraints specified in the privacy policy. These views combine to produce a coherent relational database for each purpose-recipient pair, independent of any queries, and queries are evaluated against this database.
Query semantics • The query semantics model takes the query into account when enforcing disclosure. • Both models mask prohibited values using SQL's null value.
System Overview (1) • Policy definition: Privacy policies are expressed electronically and stored in the database where they can be used to enforce limited disclosure. • Privacy meta-data: The rules and conditions prescribed by the privacy policy are stored a privacy meta-data tables in the database. • Application context: Each query must be associated with a purpose and recipient. In their system, this information is inferred based on the context of the application issuing the query. The query interceptor infers the purpose and recipient of the query based on context information stored in an additional meta-data table.
System Overview (2) • Query modifier: SQL queries issued to the database are intercepted and augmented to reflect the privacy policy rules regarding the purpose and recipient issuing the query. The results of this new query are returned to the issuer. • Disclosure model: The result of executing a privacy modified query will reflect one of two cell-level limited disclosure models.
Discussion • Limited Disclosure. Can we really limit it? • Limited Retention. Can we really delete the data? • Isn’t it better to keep such complex privacy mechanism as part of the application? • Why not using new techniques in writing stored procedures? • Databases are only part of the whole system. Securing the database helps but not prevent attacks. A holistic solution is what might be needed.