Answering Multi-Dimensional Analytical Queries under Local Differential Privacy. Tianhao Wang*, Bolin Ding , Jingren Zhou, Cheng Hong, Zhicong Huang, Ninghui Li, Somesh Jha. * Work done at Alibaba. Local Differential Privacy (LDP).
Answering Multi-Dimensional Analytical Queries under Local Differential Privacy
Tianhao Wang*, Bolin Ding, Jingren Zhou, Cheng Hong, Zhicong Huang, Ninghui Li, Somesh Jha
* Work done at Alibaba
Local Differential Privacy (LDP)
• A randomized algorithm \(A\) takes an input value \(v\) from domain \(D\) and outputs \(A(v)\)
• \(A\) is \(\varepsilon\)-LDP iff for any \(v\) and \(v'\) from \(D\), and any valid output \(y\): \(\Pr[A(v) = y] \le e^{\varepsilon} \Pr[A(v') = y]\)
• Smaller \(\varepsilon\) -> stronger privacy
• LDP frequency oracle: takes reports from all users and outputs estimations for any value \(v\) in domain \(D\), counting how many times \(v\) appears, e.g., [Wang et al. USENIX’17]
• Active line of research since [Duchi, Jordan, and Wainwright 2013]
[Figure: users' Data inside the trust boundary; only Noisy Data is sent across it]
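The LDP condition can be checked numerically for any mechanism with a finite input and output domain. The sketch below is illustrative only: the function name `ldp_epsilon` and the table layout `channel[v][y] = Pr[A(v) = y]` are assumptions made for this example, not part of the talk.

```python
import math

def ldp_epsilon(channel):
    """Smallest epsilon for which the mechanism is epsilon-LDP.

    channel[v][y] is assumed to hold Pr[A(v) = y] for input v and output y.
    """
    worst = 0.0
    for y in range(len(channel[0])):
        column = [row[y] for row in channel]
        hi, lo = max(column), min(column)
        if hi == 0.0:
            continue          # output y never occurs, nothing to compare
        if lo == 0.0:
            return math.inf   # some input rules out y entirely: no finite epsilon
        worst = max(worst, math.log(hi / lo))
    return worst

# A binary mechanism that reports the true bit with probability 0.75.
epsilon = ldp_epsilon([[0.75, 0.25], [0.25, 0.75]])
```

Here `epsilon` is the log of the largest probability ratio over all outputs and all pairs of inputs, which is exactly the quantity the definition bounds.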
Frequency Oracle (FO): Random Response
• Survey technique for private questions [1]
• Survey people:
  • “Do you have disease X?”
• Each person:
  • Flip a secret coin
  • Answer truth if head (w.p. \(p\))
  • Answer randomly if tail (w.p. \(1 - p\)):
    • reply “yes”/“no” w.p. 0.5
• So \(\Pr[\text{reply yes} \mid \text{truth is yes}] = p + (1 - p)/2\). Similarly, \(\Pr[\text{reply yes} \mid \text{truth is no}] = (1 - p)/2\)
[1] Stanley L. Warner. Randomized response: A survey technique for eliminating evasive answer bias. J. Amer. Statist. Assoc. 1965.
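A minimal sketch of the coin-flip protocol above. The heads probability is not readable on the slide, so the code assumes \(p = (e^{\varepsilon} - 1)/(e^{\varepsilon} + 1)\), the value that makes this protocol exactly \(\varepsilon\)-LDP; the function names are hypothetical.

```python
import math
import random

def truth_probability(epsilon):
    """Heads probability p (assumed here) that yields epsilon-LDP."""
    return (math.exp(epsilon) - 1.0) / (math.exp(epsilon) + 1.0)

def randomized_response(truth, epsilon, rng):
    """Report one yes/no answer (True = yes) under the coin-flip protocol."""
    if rng.random() < truth_probability(epsilon):
        return truth                 # head: answer truthfully
    return rng.random() < 0.5        # tail: reply yes/no uniformly at random

def yes_probability(truth, epsilon):
    """Exact probability that a person with the given truth replies yes."""
    p = truth_probability(epsilon)
    return (p if truth else 0.0) + (1.0 - p) * 0.5

rng = random.Random(7)
report = randomized_response(True, 1.0, rng)
```

With this choice of `p`, the ratio `yes_probability(True, eps) / yes_probability(False, eps)` equals \(e^{\varepsilon}\), so no single report reveals more than the budget allows.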
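Frequency Oracle (FO): Random Response
• To get unbiased estimation of the histogram:
  • If \(n_v\) out of \(n\) people have the disease, and each person answers truthfully w.p. \(p\) and uniformly at random otherwise, we expect to see \(E[I_v] = n_v \cdot p + n \cdot (1 - p)/2\) “yes” answers
  • Solving the above equation: \(\hat{n}_v = \frac{I_v - n (1 - p)/2}{p}\) is an unbiased estimation of \(n_v\), where \(I_v\) is the observed number of “yes” answers
• Privacy guarantee:
  • For any \(v\) and \(v'\) from “yes” and “no”, \(\frac{\Pr[A(v) = y]}{\Pr[A(v') = y]} \le \frac{p + (1 - p)/2}{(1 - p)/2} = \frac{1 + p}{1 - p}\), which equals \(e^{\varepsilon}\) when \(p = \frac{e^{\varepsilon} - 1}{e^{\varepsilon} + 1}\)
• When the domain is large, one can achieve better utility using, e.g., [2-4]
[2] Wang, et al. Locally Differentially Private Protocols for Frequency Estimation. In USENIX Security 2017
[3] Bassily, et al. Practical Locally Private Heavy Hitters. In NIPS 2017
[4] Ding, et al. Collecting Telemetry Data Privately. In NIPS 2017 and PPML@NIPS 2018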
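The debiasing step can be sketched as follows, assuming each person answers truthfully with probability `p` and uniformly at random otherwise. The names `estimate_yes_count` and `simulate` are made up for this illustration.

```python
import random

def estimate_yes_count(reports, p):
    """Unbiased estimate of the true number of "yes" holders.

    E[observed yes] = n_yes * p + n * (1 - p) / 2, solved for n_yes.
    """
    n = len(reports)
    observed_yes = sum(1 for r in reports if r)
    return (observed_yes - n * (1.0 - p) / 2.0) / p

def simulate(n_yes, n, p, seed=0):
    """Perturb n_yes true and n - n_yes false answers, then estimate n_yes."""
    rng = random.Random(seed)
    truths = [True] * n_yes + [False] * (n - n_yes)
    reports = [t if rng.random() < p else rng.random() < 0.5 for t in truths]
    return estimate_yes_count(reports, p)

estimate = simulate(n_yes=5000, n=20000, p=0.5)
```

The estimate is noisy for any single run; it is only correct in expectation, and its variance grows as `p` shrinks (stronger privacy).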
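Problem Setting: Multi-Dimensional Analytics (MDA) under LDP
• A randomized algorithm \(A\) over domain \(D\) is \(\varepsilon\)-LDP iff for any \(v\) and \(v'\) from \(D\), and any valid output \(y\): \(\Pr[A(v) = y] \le e^{\varepsilon} \Pr[A(v') = y]\)
• LDP protects users’ data, which is sensitive and generated on devices
• Server gets users’ non-sensitive profile and log data
• We can answer queries in this (joined) fact table
[Figure: fact table joining non-sensitive attributes with attributes perturbed by LDP Random Response]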
Answering MDA Queries: Key Contributions
• Challenges
  • Handle aggregation
  • Range predicates
  • Multiple dimensions
[Figure: fact table with non-sensitive attributes and perturbed attributes contributed by users; trust boundary between server and the users]
Challenge 1: How to Aggregate
• Strawman method
  • Evaluate row by row: $120+$100=$220
  • Bias due to randomization
• Group users by aggregating attributes
  • Group $100: 2 users satisfy the predicate
  • Group $120: 1 user satisfies the predicate
• Weighted sum of estimates
  • For the group Purchase = $100, estimate how many users satisfy the predicate
  • If estimates of group sizes are unbiased, the weighted sum is unbiased
• What if aggregating attributes are sensitive? (randomized rounding!)
• Welcome to our VLDB 2019 System Demo
[Figure: fact table with non-sensitive attributes and sensitive attributes]
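The grouping idea can be sketched as below for a query like SUM(Purchase) WHERE the sensitive attribute is "yes". It assumes the sensitive bit was reported through binary random response that tells the truth with probability `p` and answers uniformly otherwise; the row layout `(purchase_amount, perturbed_bit)` and the function names are hypothetical.

```python
from collections import defaultdict

def estimate_count(reports, p):
    """Unbiased count of users whose true bit is yes, from perturbed bits."""
    n = len(reports)
    return (sum(reports) - n * (1.0 - p) / 2.0) / p

def estimate_sum(rows, p):
    """Weighted sum: group by the non-sensitive amount, debias each group."""
    groups = defaultdict(list)
    for amount, perturbed_bit in rows:
        groups[amount].append(perturbed_bit)
    return sum(amount * estimate_count(bits, p) for amount, bits in groups.items())

def strawman_sum(rows):
    """Row-by-row evaluation on perturbed bits; biased by the randomization."""
    return sum(amount for amount, perturbed_bit in rows if perturbed_bit)

rows = [(100, True), (100, True), (100, True), (100, False),
        (120, True), (120, False), (120, False), (120, False)]
debiased = estimate_sum(rows, p=0.5)
naive = strawman_sum(rows)
```

Each group estimate is unbiased, so by linearity of expectation the weighted sum is unbiased too, whereas the row-by-row sum treats random answers as if they were true ones.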
Challenge 2: Range Predicates (1-dim)
• Baseline: Histogram
  • Each bar has noise
  • Bad when query range is large
• Solution: Hierarchical intervals
  • Domain size (= 8)
  • A range predicate is decomposed into intervals
  • Partition users on the layers
  • Each user reports the histogram on her/his layer using FO
• MSE =
[Figure: hierarchy of intervals over the domain, with the intervals in the predicate marked]
[5] Hay, et al. Boosting the Accuracy of Differentially Private Histograms Through Consistency. VLDB 2010
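The decomposition of a range into hierarchical intervals can be sketched as below. This assumes a binary hierarchy and a domain size that is a power of two (8 in the slide's example); intervals are half-open `[lo, hi)`, and the function names are illustrative. The per-interval estimates would come from the frequency oracle on each layer, which is stubbed here as a plain dictionary.

```python
def decompose(lo, hi, node_lo, node_hi, branching=2):
    """Minimal set of hierarchy intervals that exactly covers [lo, hi)."""
    if hi <= node_lo or node_hi <= lo:
        return []                          # node is disjoint from the range
    if lo <= node_lo and node_hi <= hi:
        return [(node_lo, node_hi)]        # node lies fully inside the range
    width = (node_hi - node_lo) // branching
    pieces = []
    for i in range(branching):             # partially covered: recurse on children
        child_lo = node_lo + i * width
        pieces += decompose(lo, hi, child_lo, child_lo + width, branching)
    return pieces

def answer_range(lo, hi, domain_size, interval_estimates):
    """Sum the (noisy) estimates of the intervals that make up the range."""
    return sum(interval_estimates[iv] for iv in decompose(lo, hi, 0, domain_size))

pieces = decompose(1, 7, 0, 8)   # [(1, 2), (2, 4), (4, 6), (6, 7)]
```

A range of width 6 is answered from 4 noisy intervals instead of 6 noisy histogram bars, which is why the hierarchy helps most when the query range is large.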
Challenge 3: \(d\)-Dimensional Queries
• HIO (Hierarchical Interval Optimized)
  • Product of hierarchies
  • Partition users into groups
  • Decompose a \(d\)-dim range predicate into sub-queries
• MSE(HIO) =
• What if the number of dimensions \(d\) is large but the number of dimensions in the query is small?
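One way to picture the product of hierarchies: decompose the range on each dimension separately, then take the Cartesian product, so each sub-query is a box made of one hierarchy interval per dimension. The sketch assumes binary hierarchies over power-of-two domains and half-open ranges; the helper names are hypothetical, and the user partitioning and noise are omitted.

```python
from itertools import product

def decompose_1d(lo, hi, node_lo, node_hi):
    """Minimal set of binary-hierarchy intervals covering [lo, hi)."""
    if hi <= node_lo or node_hi <= lo:
        return []
    if lo <= node_lo and node_hi <= hi:
        return [(node_lo, node_hi)]
    mid = (node_lo + node_hi) // 2
    return decompose_1d(lo, hi, node_lo, mid) + decompose_1d(lo, hi, mid, node_hi)

def decompose_nd(ranges, domain_sizes):
    """Sub-queries for a multi-dim range: one interval per dimension each."""
    per_dim = [decompose_1d(lo, hi, 0, m) for (lo, hi), m in zip(ranges, domain_sizes)]
    return list(product(*per_dim))

# 2-dim predicate: 1 <= x < 7 AND 0 <= y < 4, both domains of size 8.
sub_queries = decompose_nd([(1, 7), (0, 4)], [8, 8])
```

The number of sub-queries is the product of the per-dimension counts, so it grows quickly with the number of dimensions in the predicate.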
Challenge 3-2: \(d\)-Dimensional Queries
• The number of dimensions \(d\) is large but the number of dimensions in the query is small
• SC (Split and Conjunction)
  • Split: each user divides the privacy budget by \(d\), reporting every 1-dim marginal independently
  • Conjunction: estimating joint distribution from 1-dim marginals
  • Decompose a multi-dimensional range predicate into sub-queries
• MSE(SC) = MSE(HIO) = , if
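The slide does not spell out the conjunction estimator, so the following is only one plausible sketch, for two binary attributes. It assumes each attribute is reported independently through binary random response with budget \(\varepsilon / d\), keeping the true bit with probability \(e^{\varepsilon/d} / (e^{\varepsilon/d} + 1)\), and recovers the joint counts by undoing the noise one attribute at a time. All names are hypothetical.

```python
import math

def keep_probability(epsilon):
    """Probability that binary random response reports the true bit."""
    return math.exp(epsilon) / (math.exp(epsilon) + 1.0)

def invert_axis(pair, a):
    """Undo binary random response on a (count_no, count_yes) pair."""
    b = 1.0 - a
    x0, x1 = pair
    return ((a * x0 - b * x1) / (a - b), (a * x1 - b * x0) / (a - b))

def estimate_joint(observed, epsilon_total, d=2):
    """Estimate true joint counts T[v1][v2] from observed counts O[y1][y2]."""
    a = keep_probability(epsilon_total / d)   # split: each attribute gets eps / d
    # Undo the noise on attribute 2 within each row of the observed table.
    rows = [invert_axis(row, a) for row in observed]
    # Undo the noise on attribute 1 within each column.
    cols = [invert_axis((rows[0][j], rows[1][j]), a) for j in range(2)]
    return [[cols[0][0], cols[1][0]], [cols[0][1], cols[1][1]]]
```

Because the two attributes are perturbed independently, the expected observed table is the true table passed through the same 2x2 noise matrix on each axis, and inverting axis by axis gives an unbiased estimate of every joint cell.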
Experiments
• Datasets: IPUMS and TRANS
• (also on Adult and Bank datasets)
• Results of a single run
Experiments: HIO vs. LDP Marginals
• Normalized absolute error is plotted
• Predicate is the conjunction of 3 range constraints
• MG stands for the state-of-the-art LDP marginal-releasing technique [6]
[Figure: error plotted against Query Range; annotation: Better Accuracy]
[6] Zhikun Zhang, Tianhao Wang, Ninghui Li, Shibo He, and Jiming Chen. CALM: Consistent Adaptive Local Marginal for Marginal Release under Local Differential Privacy. In CCS 2018.
Experiments: More Dimensions
• Normalized relative error
• x+y means the query is the conjunction of x point queries and y range queries
• Data contains 4 categorical attributes and 4 numerical attributes
• SC performs better when x+y is smaller
[Figure annotation: Better Accuracy]
Conclusion
• Enabling multi-dimensional analytics (MDA) under LDP
• LDP protects users’ sensitive data while the server can utilize other profiles
• We can answer MDA queries in the (joined) fact table
• Come to our poster for more details and discussion
• The solution has been built as a service in a data platform in Alibaba
• Advertisement: demo at VLDB’2019
• LDP data sharing/analytics services – a middleware solution
• DPSAaS@ : Private Multi-Dimensional Data Sharing and Analytics as Services