A Privacy-Preserving Index for Range Queries

Paper by: Bijit Hore, Sharad Mehrotra, Gene Tsudik • Presented by: Akshay Phadke

What this paper is about • Database as a Service (DAS). • Improving the existing bucketization technique. • Identifying privacy measures in DAS. • Developing a novel privacy-preserving re-bucketization technique.

DAS and its implications • Database as a Service: organizations outsource data management to a service provider. • Privacy becomes a concern because the data is stored at the service provider. • One possible solution: split each query as $Q = Q_{sec} + Q_{unsec}$.

Previous Solutions • Bucketization for range queries: the attribute domain is partitioned into a set of buckets, each identified by a tag. • Deterministic encryption for join queries. Drawbacks: • No in-depth analysis of privacy. • Privacy is treated subjectively: there is no clear specification of how it is measured.
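Deterministic encryption supports joins because equal plaintexts always produce equal ciphertexts, so the server can match rows without seeing the values. A minimal sketch of that property, assuming HMAC-SHA256 as a stand-in for deterministic encryption (unlike real encryption, an HMAC tag cannot be decrypted); the key, table contents, and function names are hypothetical:

```python
import hashlib
import hmac

# Hypothetical client-side secret key; in a DAS setting it never leaves the client.
KEY = b"client-secret-key"


def det_tag(value: str) -> str:
    """Deterministic keyed tag: equal plaintexts always give equal tags."""
    return hmac.new(KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Tables as the server would hold them: (join-key tag, opaque etuple).
employees = [(det_tag("alice"), "etuple-1"), (det_tag("bob"), "etuple-2")]
salaries = [(det_tag("bob"), "etuple-3"), (det_tag("carol"), "etuple-4")]


def server_equijoin(left, right):
    """Equi-join evaluated by the server purely on tag equality."""
    index = {}
    for tag, etuple in right:
        index.setdefault(tag, []).append(etuple)
    return [(l_et, r_et) for tag, l_et in left for r_et in index.get(tag, [])]


print(server_equijoin(employees, salaries))  # [('etuple-2', 'etuple-3')]
```

The same determinism that enables the join also reveals which rows share a value, which is one reason this approach alone offers limited privacy.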

Before we proceed • Etuple: a tuple stored in encrypted form. • Crypto-indices: indices created on sensitive attributes. • Bucket_id: each bucket (set) created is assigned a unique random tag.
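These pieces fit together roughly as follows: each value is mapped to the random tag of its bucket, a range query is rewritten into the set of overlapping bucket ids for the server, and the client discards false positives after decryption. A minimal sketch, assuming fixed, hand-picked bucket boundaries; the helper names are hypothetical, and the "etuples" are left as plaintext because no real encryption is performed:

```python
import random


def make_buckets(boundaries, rng):
    """Assign each bucket [lo, hi] a unique random tag (its bucket_id)."""
    tags = rng.sample(range(10**6), len(boundaries))
    return [(lo, hi, tag) for (lo, hi), tag in zip(boundaries, tags)]


def bucket_id(value, buckets):
    """Crypto-index lookup: the tag of the bucket containing value."""
    for lo, hi, tag in buckets:
        if lo <= value <= hi:
            return tag
    raise ValueError(f"value {value} is outside the domain")


def translate_range(a, b, buckets):
    """Client side: rewrite range [a, b] as the set of overlapping bucket ids."""
    return {tag for lo, hi, tag in buckets if lo <= b and a <= hi}


rng = random.Random(7)
buckets = make_buckets([(0, 24), (25, 49), (50, 74), (75, 99)], rng)

# Server stores (bucket_id, etuple); here the "etuple" is just the value itself.
ages = [3, 18, 27, 33, 41, 58, 62, 90]
server_table = [(bucket_id(v, buckets), v) for v in ages]

# Server part: select by bucket id only.
wanted = translate_range(30, 60, buckets)
returned = [et for tag, et in server_table if tag in wanted]
# Client part: "decrypt" and drop the false positives.
answer = [v for v in returned if 30 <= v <= 60]
print(returned)  # [27, 33, 41, 58, 62]
print(answer)    # [33, 41, 58]
```

The values 27 and 62 are false positives: they share a bucket with matching values, so the server cannot exclude them.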

Example: Allocating a large number of buckets to crypto-indices increases query precision but reduces privacy. Conversely, a small number of buckets increases privacy but adversely affects performance.

Uniform Query Distribution • Total False Positives: • Average Query Precision: Goal: Minimize the total number of false positives.
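Under a uniform query distribution, both quantities can also be computed by brute force. The sketch below assumes that every range query [a, b] over an integer domain is equally likely, that a query returns every tuple in each bucket it overlaps, and that precision means matching tuples divided by returned tuples. It enumerates queries instead of using the paper's closed forms, and the function name and sample frequencies are hypothetical. Comparing a coarse and a fine bucketization also illustrates the precision side of the trade-off in the example above:

```python
from itertools import accumulate


def evaluate(freq, buckets):
    """
    freq[v]: number of tuples with value v, for v in 0..N-1.
    buckets: non-overlapping inclusive (lo, hi) ranges covering 0..N-1.
    Treats every range query [a, b] with a <= b as equally likely.
    Returns (total false positives, average query precision).
    """
    n = len(freq)
    prefix = [0] + list(accumulate(freq))  # prefix[i] = sum(freq[:i])

    def count(lo, hi):
        return prefix[hi + 1] - prefix[lo]

    total_fp = 0
    precisions = []
    for a in range(n):
        for b in range(a, n):
            # The server returns every bucket that overlaps [a, b].
            returned = sum(count(lo, hi) for lo, hi in buckets if lo <= b and a <= hi)
            true_hits = count(a, b)
            total_fp += returned - true_hits
            if returned > 0:  # queries returning nothing are left out of the average
                precisions.append(true_hits / returned)
    return total_fp, sum(precisions) / len(precisions)


freq = [1, 4, 2, 7, 1, 1, 5, 3]  # hypothetical value frequencies
coarse = [(0, 3), (4, 7)]
fine = [(0, 1), (2, 3), (4, 5), (6, 7)]
for name, b in [("2 buckets", coarse), ("4 buckets", fine)]:
    fp, prec = evaluate(freq, b)
    print(name, fp, round(prec, 3))
```

With one bucket per value there are no false positives and precision is 1, which is the least private configuration.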

Algorithm Basics • The number of false positives depends on the width of each bucket (i.e., its minimum and maximum values) and on the sum of the frequencies of its values. • The problem has the optimal substructure property, so it can be solved by splitting it into two smaller subproblems.
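The optimal-substructure argument leads to a dynamic program: the best split of the first j values into k buckets is the best split of some shorter prefix into k - 1 buckets plus one final bucket. A minimal sketch, assuming (hypothetically) that a bucket's cost is its width (max - min + 1) times its total frequency, as a stand-in for the paper's exact false-positive expression; it runs in O(M·n²) time for n distinct values and M buckets:

```python
from itertools import accumulate
from math import inf


def optimal_buckets(values, freqs, m):
    """
    values: sorted distinct attribute values; freqs: their frequencies.
    Splits them into exactly m contiguous buckets minimising the assumed cost
    sum over buckets of (max - min + 1) * (total frequency).
    Returns (cost, list of (min, max) bucket bounds).
    """
    n = len(values)
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= number of distinct values")
    prefix = [0] + list(accumulate(freqs))

    def cost(i, j):  # bucket holding values[i..j] inclusive
        return (values[j] - values[i] + 1) * (prefix[j + 1] - prefix[i])

    # best[k][j]: minimum cost of placing the first j values into k buckets
    best = [[inf] * (n + 1) for _ in range(m + 1)]
    cut = [[0] * (n + 1) for _ in range(m + 1)]
    best[0][0] = 0
    for k in range(1, m + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # last bucket holds values[i..j-1]
                c = best[k - 1][i] + cost(i, j - 1)
                if c < best[k][j]:
                    best[k][j], cut[k][j] = c, i

    # Walk the cut table backwards to recover the bucket boundaries.
    bounds, j = [], n
    for k in range(m, 0, -1):
        i = cut[k][j]
        bounds.append((values[i], values[j - 1]))
        j = i
    return best[m][n], bounds[::-1]


print(optimal_buckets([1, 2, 3, 10, 11, 12], [1] * 6, 2))  # (18, [(1, 3), (10, 12)])
```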

Algorithm

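Variance, ASEE and Entropy • ASEE: Average Squared Error of Estimation. • Privacy goal: maximize Var(x), the variance of the value distribution within a bucket.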
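Variance and entropy can be computed directly from a bucket's value distribution. A small sketch, assuming a bucket is given as its distinct values and their frequencies, and using Shannon entropy in bits; the function name is hypothetical, and ASEE is not computed here. The usage contrasts two buckets with identical frequencies but different spreads:

```python
from math import log2


def bucket_stats(values, freqs):
    """Variance and Shannon entropy (bits) of the value distribution in one bucket."""
    total = sum(freqs)
    probs = [f / total for f in freqs]
    mean = sum(p * v for p, v in zip(probs, values))
    var = sum(p * (v - mean) ** 2 for p, v in zip(probs, values))
    entropy = -sum(p * log2(p) for p in probs if p > 0)
    return var, entropy


narrow = bucket_stats([10, 11, 12], [5, 5, 5])  # tightly clustered values
wide = bucket_stats([10, 40, 70], [5, 5, 5])    # same frequencies, spread out
print(narrow, wide)
```

Both entropies equal log2 3 ≈ 1.585 bits, while the variances are about 0.67 and 600: entropy ignores how far apart the values are, whereas variance captures how poorly an adversary can pin down a value.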

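Controlled Diffusion (CDf) • QoS is specified as the maximum allowed performance degradation factor, K. • The CDf algorithm increases the privacy of the optimal buckets. • Diffusion is carried out in a controlled manner, so that degradation stays within K. • Elements of each optimal bucket are diffused into composite buckets. • $d = K \cdot |B_i| / f_{CB}$, where d is the number of composite buckets that receive elements of optimal bucket $B_i$ and $f_{CB}$ is the size of a composite bucket. • Composite buckets overlap, whereas optimal buckets do not.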
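A simplified diffusion step can be sketched as follows: for each optimal bucket $B_i$, compute d from K, pick d composite buckets at random, and scatter the bucket's elements among them. This is an illustrative sketch, not the paper's algorithm. It assumes M equal-sized composite buckets with $f_{CB} = N/M$ (N = total number of elements), rounds d down and clamps it to [1, M], and does not enforce composite-bucket capacity, so the K bound holds only approximately. The function and parameter names are hypothetical:

```python
import random


def controlled_diffusion(opt_buckets, k, m, rng):
    """
    Simplified diffusion (capacity limits NOT enforced).
    opt_buckets: optimal buckets, each a list of elements (attribute values).
    k: allowed degradation factor K; m: number of composite buckets.
    Returns (composite buckets as lists of elements,
             sorted composite-bucket ids chosen for each optimal bucket).
    """
    n = sum(len(b) for b in opt_buckets)
    composite = [[] for _ in range(m)]
    spread = []
    for b in opt_buckets:
        # d = K * |B_i| / f_CB with f_CB = n / m, rounded down, kept within [1, m]
        d = max(1, min(m, int(k * len(b) * m / n)))
        targets = rng.sample(range(m), d)
        spread.append(sorted(targets))
        for element in b:
            composite[rng.choice(targets)].append(element)
    return composite, spread


rng = random.Random(1)
opt = [[1, 1, 2], [5, 6, 6, 7], [9, 9, 9]]
composite, spread = controlled_diffusion(opt, 2, 3, rng)
print(spread)
print(composite)
```

A query that would have hit optimal bucket $B_i$ now has to fetch its d composite buckets; because a composite bucket can mix elements from several optimal buckets, their value ranges overlap.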

Experiments • Data sets - Synthetic data set - Real data set - Benchmark query set • Measurements - Decrease in precision - Privacy measures - Performance-privacy trade-off - Time taken

Results • The observed decrease in query precision was less than 3. • For the privacy measures, the standard deviation increases by a large factor, while entropy grows more slowly.

Critique • Although it starts promisingly, the paper turns into a mathematics paper and seems to lose sight of its actual intent. • The examples show only the first step and the final solution, with no intermediate steps. • The paper does not explain its results.

Thank you
