Inference Problem Privacy Preserving Data Mining
Readings and Assignments • Required: • Pfleeger: Chapter 7 • Interesting reading: • I. Moskowitz, M. H. Kang: Covert Channels – Here to Stay? http://citeseer.nj.nec.com/cache/papers/cs/1340/http:zSzzSzwww.itd.nrl.navy.milzSzITDzSz5540zSzpublicationszSzCHACSzSz1994zSz1994moskowitz-compass.pdf/moskowitz94covert.pdf • Jajodia, Meadows: Inference Problems in Multilevel Secure Database Management Systems http://www.acsac.org/secshelf/book001/book001.html, essay 24 CSCE 522 - Farkas
Indirect Information Flow Channels • Covert channels • Inference channels
Communication Channels • Overt Channel: designed into a system and documented in the user's manual • Covert Channel: not documented. Covert channels may be deliberately inserted into a system, but most such channels are accidents of the system design.
Covert Channel • Timing channel: communication based on system timing • Storage channel: communication that is not time related • Each type can be converted into the other
Inference Channels • Non-sensitive information + Meta-data = Sensitive information
Inference Channels • Statistical Database Inferences • General Purpose Database Inferences
Statistical Databases • Goal: provide aggregate information about groups of individuals • E.g., average grade point of students • Security risk: specific information about a particular individual • E.g., grade point of student John Smith • Meta-data: • Working knowledge about the attributes • Supplementary knowledge (not stored in database)
Types of Statistics • Macro-statistics: collections of related statistics presented in 2-dimensional tables • Micro-statistics: Individual data records used for statistics after identifying information is removed
Statistical Compromise • Exact compromise: find exact value of an attribute of an individual (e.g., John Smith’s GPA is 3.8) • Partial compromise: find an estimate of an attribute value corresponding to an individual (e.g., John Smith’s GPA is between 3.5 and 4.0)
Methods of Attacks and Protection • Small/Large Query Set Attack • C: characteristic formula that identifies a group of individuals • If C identifies a single individual I, i.e., count(C) = 1: • Find out whether I has a property D: count(C and D) = 1 means I has D; count(C and D) = 0 means I does not have D • OR find the value of a property: sum(C, D) gives I's value of D
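The small query set attack above can be sketched in a few lines. This is only an illustration: the records, attribute names, and the helper functions `count` and `total` are hypothetical stand-ins for a statistical query interface that never returns individual rows.

```python
# Hypothetical records; the attacker never sees them directly,
# only the answers to count/sum queries.
RECORDS = [
    {"dept": "CS", "year": 4, "honors": True, "gpa": 3.8},
    {"dept": "CS", "year": 3, "honors": False, "gpa": 3.1},
    {"dept": "EE", "year": 4, "honors": False, "gpa": 2.9},
    {"dept": "EE", "year": 3, "honors": True, "gpa": 3.5},
]

def count(pred):
    """Statistical query: number of records satisfying pred."""
    return sum(1 for r in RECORDS if pred(r))

def total(pred, attr):
    """Statistical query: sum of attr over records satisfying pred."""
    return sum(r[attr] for r in RECORDS if pred(r))

# C is a characteristic formula that happens to match one individual.
C = lambda r: r["dept"] == "CS" and r["year"] == 4
D = lambda r: r["honors"]

assert count(C) == 1                         # C isolates individual I
has_d = count(lambda r: C(r) and D(r)) == 1  # does I have property D?
leaked_gpa = total(C, "gpa")                 # sum over one record = I's value
```

Both answers are ordinary aggregates, yet each one discloses a fact about a single person.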
Small/Large Query Set Attack cont. • Protection from small/large query set attack: query-set-size control • A query q(C) is permitted only if n ≤ |C| ≤ N − n, where n ≥ 0 is a parameter of the database and N is the total number of records in the database
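A minimal sketch of query-set-size control, assuming the condition n ≤ |C| ≤ N − n. The function names `is_permitted` and `guarded_count` are illustrative, not part of any real DBMS. The upper bound matters because a very large query set has a very small complement, which could be obtained by subtracting from a query over all records.

```python
def is_permitted(set_size: int, total: int, n: int) -> bool:
    """Return True when n <= set_size <= total - n."""
    return n <= set_size <= total - n

def guarded_count(records, pred, n):
    """Answer a count query only if the query set size is within bounds."""
    size = sum(1 for r in records if pred(r))
    if not is_permitted(size, len(records), n):
        raise PermissionError("query set size outside [n, N - n]")
    return size
```

With N = 10 and n = 2, query sets of size 2 through 8 are answered, while sizes 0, 1, 9, and 10 are refused.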
Tracker attack • q(C) is disallowed • C = C1 and C2 • Tracker: T = C1 and ~C2 • q(C) = q(C1) – q(T)
Tracker attack • q(C and D) is disallowed • C = C1 and C2 • Tracker: T = C1 and ~C2 • q(C and D) = q(T or (C and D)) – q(T)
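The two tracker identities can be checked on a toy database. The sketch below assumes a count interface `q` protected by query-set-size control with n = 2; the records and attribute names are hypothetical.

```python
N_MIN = 2  # the parameter n of query-set-size control (assumed value)

# Hypothetical records: exactly one person is both CS and F.
RECORDS = [
    {"dept": "CS", "sex": "F", "honors": True},
    {"dept": "CS", "sex": "M", "honors": False},
    {"dept": "CS", "sex": "M", "honors": True},
    {"dept": "CS", "sex": "M", "honors": False},
    {"dept": "EE", "sex": "F", "honors": False},
    {"dept": "EE", "sex": "F", "honors": True},
    {"dept": "EE", "sex": "M", "honors": False},
    {"dept": "EE", "sex": "M", "honors": True},
]

def q(pred):
    """Count query, refused when the query set is too small or too large."""
    size = sum(1 for r in RECORDS if pred(r))
    if not (N_MIN <= size <= len(RECORDS) - N_MIN):
        raise PermissionError("refused by query-set-size control")
    return size

def refused(pred):
    """Return True if the guarded interface rejects the query."""
    try:
        q(pred)
    except PermissionError:
        return True
    return False

C1 = lambda r: r["dept"] == "CS"
C2 = lambda r: r["sex"] == "F"
D = lambda r: r["honors"]
C = lambda r: C1(r) and C2(r)      # identifies one individual
T = lambda r: C1(r) and not C2(r)  # the tracker

direct_refused = refused(C)        # |C| = 1, so q(C) is disallowed
count_c = q(C1) - q(T)             # 4 - 3
count_c_and_d = q(lambda r: T(r) or (C(r) and D(r))) - q(T)  # 4 - 3
```

Every query the attacker actually issues has a set size of 3 or 4, so the size control never triggers, yet the differences reproduce the disallowed answers.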
Query overlap attack • Q(John) = q(C1) – q(C2) • Diagram: overlapping query sets C1 and C2 over Kathy, Paul, John, Eve, Max, Fred, Mitch • Protection: query-overlap control
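A small sketch of the overlap attack and of one possible overlap control. The salary values, the membership of C1 and C2, and the `OverlapControl` class are assumptions made for illustration; the slide gives only the names and the formula Q(John) = q(C1) – q(C2).

```python
# Hypothetical values (in thousands) for the individuals named on the slide.
SALARY = {"Kathy": 50, "Paul": 60, "John": 70, "Eve": 55,
          "Max": 65, "Fred": 45, "Mitch": 80}

# Assumed query sets: C1 and C2 differ only in John.
C1 = {"Kathy", "Paul", "John", "Eve"}
C2 = {"Kathy", "Paul", "Eve"}

def q_sum(names):
    """Unprotected sum query over a set of individuals."""
    return sum(SALARY[n] for n in names)

john = q_sum(C1) - q_sum(C2)  # 235 - 165

class OverlapControl:
    """Refuse a query whose set overlaps an earlier one in too many records."""

    def __init__(self, max_overlap):
        self.max_overlap = max_overlap
        self.history = []

    def q_sum(self, names):
        names = frozenset(names)
        if any(len(names & prev) > self.max_overlap for prev in self.history):
            raise PermissionError("refused by query-overlap control")
        self.history.append(names)
        return sum(SALARY[n] for n in names)
```

With `OverlapControl(max_overlap=1)`, the query over C1 is answered but the follow-up over C2 is refused because the two sets share three records. The cost is that the system must remember every answered query set.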
Insertion/Deletion Attack • Observing changes over time • q1 = q(C) • insert(i) • q2 = q(C) • q(i) = q2 – q1 • Protection: insertions/deletions performed as pairs
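The insertion attack and the pairing defense can be illustrated as follows. The values are hypothetical, and `sum_query` stands in for the statistic q(C).

```python
def sum_query(table):
    """Stand-in for the statistic q(C) over the current table."""
    return sum(table)

# Attack: query, insert one record, query again.
table = [50, 60, 55]           # hypothetical values matching C
q1 = sum_query(table)
table.append(70)               # insert(i)
q2 = sum_query(table)
leaked = q2 - q1               # exactly the inserted record's value

# Protection sketch: records enter the table only in pairs, so the
# difference exposes the pair's combined value, not either record.
paired = [50, 60, 55]
before = sum_query(paired)
paired.extend([70, 40])        # two records inserted together
after = sum_query(paired)
pair_delta = after - before    # 110, the sum of the pair
```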
Statistical Inference Theory • Given an unlimited number of statistics and correct statistical answers, all statistical databases can be compromised (Ullman)
Inferences in General-Purpose Databases • Queries based on sensitive data • Inference via database constraints • Inferences via updates
Queries based on sensitive data • Sensitive information is used in the selection condition but not returned to the user. • Example: Salary is secret, Name is public; a query that returns Name for the tuples where Salary = $25,000 reveals the salary of every person returned • Protection: apply the query to database views at different security levels
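One way to picture this leak and the view-based protection is with Python's built-in `sqlite3` module. The table name, view name, and rows below are hypothetical; the point is only that a public-level view without the Salary column cannot be used in a selection on Salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Black", 25000), ("Brown", 40000)],  # hypothetical rows
)

# Leaky query: only names come back, but the selection condition
# tells the user that each returned person earns 25,000.
leaky = [row[0] for row in conn.execute(
    "SELECT name FROM employee WHERE salary = 25000")]

# Protection sketch: public users query a view that omits the secret column.
conn.execute("CREATE VIEW employee_public AS SELECT name FROM employee")
try:
    conn.execute("SELECT name FROM employee_public WHERE salary = 25000")
    view_blocked = False
except sqlite3.OperationalError:
    # The view has no salary column, so the condition cannot be evaluated.
    view_blocked = True
```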
Database Constraints • Integrity constraints • Database dependencies • Key integrity
Integrity Constraints • C=A+B • A=public, C=public, and B=secret • B can be calculated from A and C, i.e., secret information can be calculated from public data
Database Dependencies Metadata: • Functional dependencies • Multi-valued dependencies • Join dependencies • etc.
Functional Dependency • FD: A → B, that is, for any two tuples in the relation, if they have the same value for A, they must have the same value for B. • Example: FD: Rank → Salary • Secret information: Name and Salary together • Query 1: Name and Rank • Query 2: Rank and Salary • Combine the answers to Query 1 and Query 2 to reveal Name and Salary together
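The combination step amounts to a join on Rank. In the sketch below the query answers are hypothetical; the lookup is valid only because the functional dependency Rank → Salary guarantees one salary per rank.

```python
# Hypothetical answers to the two individually permitted queries.
query1 = [("Smith", "Assistant"), ("Jones", "Full")]  # (Name, Rank)
query2 = [("Assistant", 60000), ("Full", 90000)]      # (Rank, Salary)

# Rank -> Salary means each rank maps to exactly one salary,
# so the second answer can be used as a lookup table.
salary_by_rank = dict(query2)

# Joining on Rank reveals the secret association (Name, Salary).
name_salary = {name: salary_by_rank[rank] for name, rank in query1}
```

Neither query returns Name and Salary together, which is why access control on the individual queries does not catch the disclosure.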
Key integrity • Every tuple in the relation has a unique key • Users at different levels see different versions of the database • Users might attempt to update data that is not visible to them
Example • Secret View • Public View
Updates • Public user: • Update Black’s address to Orlando • Add new tuple: (Red, 22,000, Manassas) • If the update is refused: covert channel • If the update is allowed: • Overwrite high data – may be incorrect • Create new tuple – which data is correct? (polyinstantiation) – violates key constraints
Updates • Secret user: • Update Black’s salary to 45,000 • If the update is refused: denial of service • If the update is allowed: • Overwrite low data – covert channel • Create new tuple – which data is correct? (polyinstantiation) – violates key constraints
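Polyinstantiation, as named on the two update slides, can be sketched by extending the key with the security level of the writer. This is a simplified illustration under assumed data: the public salary and the address are hypothetical, the 45,000 salary is the secret user's update from the slide, and the `write` and `read` helpers are invented for this example.

```python
LEVELS = {"public": 0, "secret": 1}

# Key is (name, level) rather than name alone, so tuples written at
# different levels coexist instead of overwriting each other.
table = {}

def write(user_level, name, salary, address):
    """Store a tuple at the writer's level; lower or higher data is untouched."""
    table[(name, user_level)] = (salary, address)

def read(user_level, name):
    """Return every visible version of the tuple, keyed by level."""
    return {
        level: row
        for (n, level), row in table.items()
        if n == name and LEVELS[level] <= LEVELS[user_level]
    }

write("public", "Black", 38000, "Columbia")  # hypothetical public tuple
write("secret", "Black", 45000, "Columbia")  # secret user's salary update

public_view = read("public", "Black")  # one version
secret_view = read("secret", "Black")  # two versions: which one is correct?
```

The update is neither refused nor allowed to overwrite low data, but the name alone no longer identifies a single tuple, which is the key-constraint violation the slides point out.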
Inference Problem • No general technique is available to solve the problem • Need assurance of protection • Hard to incorporate outside knowledge
The Inference Problem • General Purpose Database: Non-confidential data + Metadata → Undesired Inferences • Web Enabled Data: Non-confidential data + Metadata (data and application semantics) + Computational Power + Connectivity → Undesired Inferences
Correlated Inference • Ontology: Object[]. waterSource :: Object. basin :: waterSource. place :: Object. district :: place. address :: place. base :: Object. fort :: base. • Figure labels: Base, Place, Water source; Public, Public, Confidential
Inference Control • Figure labels: Access Control, Organizational Data (Confidential, Public, Misinfo), Attacker, Data Integration and Inferences, Ontology, Web Data
Inference Control • Organizational Data: Confidential, Public, Misinfo • Access and Inference Control Policy: • Logic-based inference detection • Exact and partial disclosure • Data and metadata protection • Heterogeneous data manipulation • Metadata discovery
Data Mining and Privacy • Statistical inference: • K-anonymity • Correlation • General inference: • Pattern metadata • Biased learning
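The slide only names K-anonymity. As a brief illustration under the usual definition (every combination of quasi-identifier values appears in at least k released records), a check might look like the sketch below; the rows, attribute names, and the function `is_k_anonymous` are hypothetical.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(tuple(r[a] for a in quasi_identifiers) for r in rows)
    return all(size >= k for size in groups.values())

# Hypothetical released records with generalized quasi-identifiers.
rows = [
    {"zip": "292**", "age": "20-29", "gpa": 3.8},
    {"zip": "292**", "age": "20-29", "gpa": 3.1},
    {"zip": "292**", "age": "30-39", "gpa": 2.9},
    {"zip": "292**", "age": "30-39", "gpa": 3.5},
]

two_anonymous = is_k_anonymous(rows, ["zip", "age"], 2)
three_anonymous = is_k_anonymous(rows, ["zip", "age"], 3)
```

Each group here has two records, so the release is 2-anonymous but not 3-anonymous with respect to `zip` and `age`.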
Next Class • Software security