1 / 28

Privacy Preservation of Aggregates in Hidden Databases: Why and How?

Privacy Preservation of Aggregates in Hidden Databases: Why and How?. Arjun Dasgupta, Nan Zhang, Gautam Das, Surajit Chaudhuri Presented by PENG Yu. Outline. Introduction Problem Definition Our Approach Privacy Guarantee Experiments Conclusion. Privacy leakage.

Download Presentation

Privacy Preservation of Aggregates in Hidden Databases: Why and How?

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Privacy Preservation of Aggregates in Hidden Databases: Why and How? Arjun Dasgupta, Nan Zhang, Gautam Das, Surajit Chaudhuri Presented by PENG Yu

  2. Outline • Introduction • Problem Definition • Our Approach • Privacy Guarantee • Experiments • Conclusion

  3. Privacy leakage • An airline company’s flight search form lets a user search for a flight by specifying a set of attributes such as departure and destination, date, number of stops, carrier, and cabin preferences.

  4. Privacy of Preservation of Aggregates • Reasons: • Legitimate interfaces give chances to attackers to detect the sensitive aggregates information. • Aggregates information can be used by adversaries to master the whole distribution and other features of the hidden databases behind the interfaces. • To some extent, aggregates information is more useful than individual information. • Challenge: Given a hidden database, develop techniques that make it very difficult to obtain uniform random samples of the database via its search interface without necessitating human intervention.

  5. Privacy of Preservation of Aggregates • Some Assumptions • Data is only accessible through a web-based interface • Consider sampling attacks only • Keep bona fide users unaffected • External knowledge is omitted • Consider Boolean attribute and extend it to categorical or numerical one

  6. Outline • Introduction • Problem Definition • Our Approach • Privacy Guarantee • Experiments • Conclusion

  7. Preliminaries • Terms: • D: database table • m: number of tuples in D • Qs: search query • Sel(Qs): the result set of tuples in D that satisfy Qs • n: number of predicates in Qs • Notification If |Sel(Qs)|>k, only the top-k tuples in Sel(Qs) will be returned according to a ranking function.

  8. Preliminaries (Cont.) • A query Qs is called • Underflow; if |Sel(Qs)|=0 • Overflow; if |Sel(Qs)|>k • Valid; if 0<|Sel(Qs)|≤k • Universal space Ω : the set of all possible search queries • Active space Θ : a subset of Ω containing only those queries that are candidates for issuing at a subsequent time

  9. Problem Definition • (ε,δ)-privacy For a sensitive aggregate query QA: • (ε,δ,p)-privacy • Problem

  10. Outline • Introduction • Problem Definition • Our Approach • Privacy Guarantee • Experiments • Conclusion

  11. Our Approach • Observation To obtain a uniform random sample tuple t, a sampler must have discovered at least one valid research query that contains t in its result. • Main idea In order to thwart sampling attacks, we carefully construct and insert dummy tuples into databases such that most valid and some underflowing queries are converted to overflowing queries.

  12. Single-Sample Attack • Observation |Ω|=3n Pr(picking a valid query)≤m•(2/3)n • Three possible outcomes of Q1: • underflow : the size of Θ shrinks to 3n-1 • overflow : the size of Θ shrinks to 3n-1 • valid: the size of Θ shrinks to 1

  13. Single-Sample Attack and Defense • Three possible outcomes of Qc: • underflow : the size of Θ shrinks to (c+1)3n-c • overflow : the size of Θ shrinks to |Θ|/3c • valid: the size of Θ shrinks to 1 • Key Observation: • Shrinking Θ significantly reduces sampling query cost. • Valid queries as well as long overflowing queries contribute the most to shrinking Θ.

  14. Single-Sample Defense • Techniques: Neighbor Insertion • It is difficult to find long overflowing queries, with Pr ≤ m/2c. • Short valid queries are the most dangerous threat. We insert dummy tuples into the “neighboring zone” of real tuples, such that all valid queries with fewer than b predicates will overflow, b is a parameter.

  15. Multi-Sample Attack and Defense Similarly, we analyze the shrinkage of ΘE and ΘF , and try to minimize it.

  16. Multi-Sample Attack and Defense • Three possible outcomes of Qc: • underflow : up to (c+1)3n-c queries should be removed from both ΘE and ΘF. • overflow : 2c queries removed fromΘE, ΘF can be as small as |ΘE|/3c . • valid: similar to underflow, (c+1)3n-c queries should be removed from both ΘE. • Key Observations • Shrinking ΘE contributes more to the efficiency of sampling than shrinking ΘF. • Short underflowing queries become a major threat to defense.

  17. Multi-Sample Defense • Techniques: High-Level Packing • To convert short underflowing queries to overflowing ones, we add dummy tuples such that all underflowing queries with fewer than d predicates will overflow, d is a parameter. • For example: SELECT * FROM D WHERE a1=1 when k=1, we add <1,0,…,0> and <1,0,…,1>

  18. COUNTER-SAMPLER Algorithm

  19. Extensions • The COUNTER-SAMPLER can be directly applied to both Boolean and categorical databases. • For numerical data, we can use discretization techniques to convert it into categorical data.

  20. Outline • Introduction • Problem Definition • Our Approach • Privacy Guarantee • Experiments • Conclusion

  21. Privacy Guarantee

  22. Outline • Introduction • Problem Definition • Our Approach • Privacy Guarantee • Experiments • Conclusion

  23. Delay of Sampling for Boolean

  24. Delay of Sampling for categorical

  25. Efficiency

  26. Outline • Introduction • Problem Definition • Our Approach • Privacy Guarantee • Experiments • Conclusion

  27. Conclusion • Main contributions • Develop a dummy tuple insertion method to prevent sampling of hidden databases. • Extend it to categorical and numerical databases • Future Directions • Integration of dummy insertion and query auditing • Take external knowledge in to consideration

  28. Thank you! Q&A

More Related