Explore privacy concerns & inference control in database applications. Learn how to build accurate models of aggregate data without compromising individual data privacy. Discuss cardinality-based inference control and its application in OLAP systems.
Cardinality-based Inference Control in OLAP Systems: An Information-Theoretic Approach • Nan Zhang, Texas A&M University • This is joint work with Dr. Wei Zhao and Dr. Jianer Chen
Privacy Concern • Growing Privacy Concern in Database Applications on the Internet (e.g., Data Mining) • 17% privacy fundamentalists, 56% pragmatic majority, 27% marginally concerned (AT&T Survey) • Challenge: Can we build accurate models of the aggregate data without access to the precise values of individual data?
Problem Definition • Will the application invade privacy? [Figure: data providers, randomization, OLAP server, inference control, application (data miner); each query Q is answered or rejected]
Inference Problem [Figure: queries, public information, sensitive information]
Inference Problem • SU = 20 • S1+S3-SB-ST = 87
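To make the inference problem concrete, here is a minimal Python sketch on a hypothetical 2x2 sales table. The table values, the key names, and the `sum_query` helper are illustrative assumptions, not the data behind the slide. It shows how one SUM answer combined with one publicly known cell determines a sensitive cell exactly.

```python
# Hypothetical 2x2 sales table; all values are made up for illustration.
cells = {
    ("new", "jan"): 30, ("new", "feb"): 45,
    ("used", "jan"): 12, ("used", "feb"): 20,
}

def sum_query(predicate):
    """Answer a SUM query over the cells selected by predicate."""
    return sum(value for key, value in cells.items() if predicate(key))

# Answer the server would return for one aggregate query:
# total sales of used books over all months.
row_used = sum_query(lambda key: key[0] == "used")

# One cell assumed to be public knowledge.
public_used_jan = cells[("used", "jan")]

# A linear combination of the answer and the public value
# reveals the sensitive cell exactly.
inferred_used_feb = row_used - public_used_jan
print(inferred_used_feb)
```

An inference control mechanism would have to reject the row query (or another query in the combination) to keep the sensitive cell protected.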
Inference Control Goal • Reject queries that may result in an inference problem • Answer as many other queries as we can [Figure: application (data miner), inference control, OLAP server, database / data warehouse; each query Q is answered or rejected]
Related Work • A lot of work on statistical databases • Survey • Differences • Restriction on OLAP queries • Structure of data cube • Online response time
Related Work • A similar scheme • Our Advantages • Much easier approach • A tighter bound • More general framework
Definition: Query [Figure: examples of 1-dimensional queries and 2-dimensional queries]
Definition: Query • There exists a unique cuboid S such that a cell of S is the aggregation of W. • Suppose that S is a k-dimensional cuboid. The dimensionality of Q is defined to be n - k.
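The n - k rule can be sketched in a few lines of Python. This is an illustration under assumptions: a cuboid cell is written as a tuple with one entry per dimension of the n-dimensional cube, and the hypothetical marker `ALL` stands for a dimension that has been aggregated away. Neither the representation nor the names come from the slides.

```python
# Hypothetical marker for a dimension that the cuboid aggregates over.
ALL = "ALL"

def query_dimensionality(cell_spec):
    """Return n - k for a query whose answer is the given cuboid cell.

    cell_spec has one entry per dimension of the n-dimensional cube:
    either a concrete value (dimension kept by the cuboid) or ALL.
    """
    n = len(cell_spec)                               # dimensions of the data cube
    k = sum(1 for value in cell_spec if value != ALL)  # dimensions of the cuboid S
    return n - k

# In a 2-dimensional cube (book type x month):
row_query = query_dimensionality(("used", ALL))    # sum over one row
total_query = query_dimensionality((ALL, ALL))     # sum over the whole cube
print(row_query, total_query)
```

Under this reading, a row or column sum in a 2-dimensional cube is a 1-dimensional query, and the grand total is a 2-dimensional query.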
Definition: compromisability SU = Sales amount of used books in Feb
Definition: compromisability • Compromisability • Direct inference • Compromisability <= 1
Cardinality-based Inference Control • S3, ST: minimum compromisability = 2, 2^1*(4+3) - 2*2^2 - 1 = 5 > 2 • + S1, SB: minimum compromisability = 2, 2^1*(4+3) - 2*2^2 - 1 = 5 = 5 • + S1, SD: minimum compromisability = 2, 2^1*(4+3) - 2*2^2 - 1 = 5 > 4
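The arithmetic on this slide can be checked with a short Python sketch. It assumes the flattened expression is read with exponents, as 2^1*(4+3) - 2*2^2 - 1, a reading chosen because it evaluates to the stated 5. The slide does not say what 4, 3, or the right-hand values 2, 5, and 4 stand for, so the sketch treats them as plain numbers; the `relation` helper is hypothetical.

```python
def relation(left, right):
    """Return the comparison symbol that holds between two numbers."""
    if left > right:
        return ">"
    if left == right:
        return "="
    return "<"

# The expression from the slide, read with exponents (an assumption).
bound = 2**1 * (4 + 3) - 2 * 2**2 - 1

# The three right-hand values the slide compares the bound against.
right_hand_values = [2, 5, 4]
relations = [relation(bound, value) for value in right_hand_values]
print(bound, relations)
```

The three relations reproduce the comparisons shown on the slide, in the same order.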
Our Approach • A k-dimensional query Q(F, W) can be safely answered if every (k+1)-dimensional dice X’ in X that • Contains W as a subset • Can be queried as a cell of an (n-k-1)-dimensional cuboid satisfies
Proof of Our Bound • Basic idea • Inference: H(x|AQ) = 0
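The condition H(x|AQ) = 0 says that the query answers leave no uncertainty about a cell x. The following Python sketch computes that conditional entropy by brute force. It rests on assumptions that are not in the slides: a tiny cube of three cells, each taking a value in {0, 1, 2}, a uniform prior over all assignments, and SUM queries given as tuples of cell indices. The function name `conditional_entropy` is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product
from math import log2

def conditional_entropy(domain, n_cells, target, queries):
    """H(x_target | query answers) under a uniform prior over all cubes."""
    joint = Counter()
    for cube in product(domain, repeat=n_cells):
        # Each query is a tuple of cell indices whose values are summed.
        answers = tuple(sum(cube[i] for i in query) for query in queries)
        joint[(answers, cube[target])] += 1
    total = sum(joint.values())

    # Marginal counts of each answer vector.
    by_answer = defaultdict(int)
    for (answers, _), count in joint.items():
        by_answer[answers] += count

    # H(X|A) = -sum over (a, x) of p(a, x) * log2 p(x | a)
    entropy = 0.0
    for (answers, _), count in joint.items():
        entropy -= (count / total) * log2(count / by_answer[answers])
    return entropy

domain = (0, 1, 2)
# Only the grand total is answered: cell 0 stays uncertain.
h_total_only = conditional_entropy(domain, 3, 0, [(0, 1, 2)])
# The total and the sum of cells 1 and 2 are answered: cell 0 is their difference.
h_with_subquery = conditional_entropy(domain, 3, 0, [(0, 1, 2), (1, 2)])
print(h_total_only, h_with_subquery)
```

In the second case the entropy drops to zero, which is the inference situation the slide describes; in the first it stays positive.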
An Information-Theoretic Definition • Let we have Thus, no inference problem exists in a data cube X if
Main Theorem • Let we have
Final Remarks • Future Work • Quantitative measure of the inference problem • Combination of randomization and inference control approaches
Thank you • Questions