This study delves into the complexities of privacy in data sharing, highlighting the risks associated with identity and sensitive information disclosure. A framework based on statistical decision theory is proposed, offering a comprehensive assessment of privacy risks through disclosure rules and loss functions. The research presents an algorithm to minimize privacy risk, emphasizing the importance of risk estimation and sensitivity analysis. The text explores how data can be linked and the implications for privacy, ultimately aiming to enhance privacy protection in both public and private sectors.
Beyond k-Anonymity: A Decision Theoretic Framework for Assessing Privacy Risk M. Scannapieco, G. Lebanon, M. R. Fouad and E. Bertino
Introduction • Release of data • Private organizations can benefit from sharing data with others • Public organizations see data as a value for the society • Privacy preservation • Data disclosure can lead to economic damages, threats to national security, etc. • Regulated by law in both private and public sectors
Two Facets of Data Privacy • Identity disclosure • Uncontrolled data release: identifiers may even be present • Anonymous data release: identifiers suppressed, but no control over possible linking with other sources
Linkage of Anonymous Data • [Figure: two anonymized tables T1 and T2 linked through their shared QUASI-IDENTIFIER attributes]
Two Facets of Data Privacy (cont.) • Sensitive information disclosure • Once identity disclosure occurs, the loss due to such disclosure depends on how sensitive the related data are • Data sensitivity is subjective • E.g.: age is in general more sensitive for women than for men
Our proposal • A framework for assessing privacy risk that takes into account both facets of privacy • based on statistical decision theory • Definition and analysis of: disclosure policies modelled by disclosure rules, and several privacy risk functions • Estimated risk as an upper bound on the true risk, and related complexity analysis • Algorithm for finding the disclosure rule that minimizes the privacy risk
Disclosure rules • A disclosure rule δ is a function that maps a record x to a new record y = δ(x) in which some attributes may have been suppressed: δ(x)j = ⊥ if the j-th attribute is suppressed, xj otherwise
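As a minimal sketch of this definition (the function names and the use of `None` for the suppression symbol ⊥ are illustrative assumptions, not taken from the paper):

```python
# Minimal sketch of a disclosure rule: a function mapping a record to a
# copy in which the chosen attributes are suppressed. None stands in for
# the suppression symbol; all names here are illustrative.

def make_disclosure_rule(suppressed):
    """Return a rule delta that suppresses the given attribute indices."""
    def delta(record):
        return tuple(None if j in suppressed else v
                     for j, v in enumerate(record))
    return delta

delta = make_disclosure_rule({2})  # suppress the third attribute
print(delta(("Alice", "Smith", "555-0100")))  # ('Alice', 'Smith', None)
```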
Loss function • Let θ be the side information used by the attacker in the identification attempt • The loss function ℓ(δ(x), θ) measures the loss incurred by disclosing the data δ(x) due to possible identification based on θ • Empirical distribution p associated with records x1…xn
Risk Definition • The risk of the disclosure rule δ in the presence of the side information θ is the average loss of disclosing x1…xn: R(δ, θ) = (1/n) Σi ℓ(δ(xi), θ)
Putting the pieces together so far… • A hypothetical attacker performs an identification attempt on a disclosed record y = δ(x) on the basis of side information θ, which can be a dictionary • The dictionary is used to link y with some entry present in the dictionary • Example: • y has the form (name, surname, phone#), θ is a phone book • If all attributes are revealed, y is likely linked with one entry • If phone# is suppressed (or missing), y may or may not be linked to a single entry, depending on the popularity of (name, surname)
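The phone-book example can be sketched as follows; treating suppressed attributes (`None`) as wildcards when matching against the dictionary is an illustrative assumption, and the data is invented:

```python
# Sketch of the linkage step: the attacker matches the disclosed record y
# against a dictionary (the side information). Identification succeeds
# when exactly one dictionary entry is consistent with y.

def consistent(y, entry):
    """True if disclosed record y (None = suppressed) matches entry."""
    return all(a is None or a == b for a, b in zip(y, entry))

def matches(y, dictionary):
    return [e for e in dictionary if consistent(y, e)]

phone_book = [
    ("Alice", "Smith", "555-0100"),
    ("Alice", "Smith", "555-0199"),
    ("Bob",   "Jones", "555-0123"),
]

# Full disclosure: a unique match, identity is revealed.
print(len(matches(("Alice", "Smith", "555-0100"), phone_book)))  # 1
# Phone number suppressed: two candidates share (name, surname).
print(len(matches(("Alice", "Smith", None), phone_book)))        # 2
```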
Risk formulation • Let's decompose the loss function into an identification part and a sensitivity part • Identification part: formalized by the random variable Z: Z = 1 if the attacker links δ(x) to the correct identity, 0 otherwise
Risk formulation (cont.) • Sensitivity part: a function Φ(x), where higher values indicate higher sensitivity • Therefore the loss is: ℓ(δ(x), θ) = Φ(x) · P(Z = 1 | δ(x), θ)
Risk formulation (cont.) • Risk: R(δ, θ) = (1/n) Σi Φ(xi) · P(Z = 1 | δ(xi), θ)
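A hedged sketch of this empirical risk, modeling the identification probability as 1/|dictionary entries consistent with δ(x)| — an illustrative choice for the sketch, not necessarily the paper's exact formulation:

```python
# Empirical risk: average over the records of sensitivity times the
# (modeled) probability of identification. Names are illustrative.

def consistent(y, entry):
    """True if disclosed record y (None = suppressed) matches entry."""
    return all(a is None or a == b for a, b in zip(y, entry))

def empirical_risk(records, delta, dictionary, phi):
    """Average of Phi(x) * P(identification | delta(x)) over the records."""
    total = 0.0
    for x in records:
        y = delta(x)
        m = sum(1 for e in dictionary if consistent(y, e))
        total += phi(x) * (1.0 / m if m else 0.0)
    return total / len(records)

records = [("Alice", "Smith", "555-0100"),
           ("Alice", "Smith", "555-0199"),
           ("Bob",   "Jones", "555-0123")]
suppress_phone = lambda x: (x[0], x[1], None)
# Two Alices share (name, surname), Bob is unique: (1/2 + 1/2 + 1) / 3
print(empirical_risk(records, suppress_phone, records, lambda x: 1.0))  # ≈ 0.667
```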
Disclosure Rule vs. Privacy Risk • Suppose that θtrue is the true attacker's dictionary, which is publicly available, and that θ* is the actual database starting from which data will be published • Under the following assumptions: • θtrue contains more records than θ* (θ* ≤ θtrue) • The non-⊥ information in θtrue will be more limited than the non-⊥ information in θ* • Theorem: If θ* contains records that correspond to x1, …, xn and θ* ≤ θtrue, then: R(δ, θtrue) ≤ R(δ, θ*)
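A toy numerical check of the bound, using an illustrative 1/|matches| identification model and unit sensitivity (assumptions of this sketch, not the paper's exact loss): enlarging the attacker's dictionary can only add candidate matches for each disclosed record, so the risk computed against the smaller θ* upper-bounds the risk against θtrue.

```python
# Toy demonstration that R(delta, theta_true) <= R(delta, theta_star)
# when theta_star is contained in the larger dictionary theta_true.

def consistent(y, entry):
    return all(a is None or a == b for a, b in zip(y, entry))

def risk(records, delta, dictionary):
    """Average 1/|consistent entries| over the records (unit sensitivity)."""
    total = 0.0
    for x in records:
        m = sum(1 for e in dictionary if consistent(delta(x), e))
        total += 1.0 / m if m else 0.0
    return total / len(records)

theta_star = [("Alice", "Smith"), ("Bob", "Jones")]          # published database
theta_true = theta_star + [("Ann", "Smith"), ("Al", "Roe")]  # larger dictionary
delta = lambda x: (None, x[1])                               # disclose surname only

r_true = risk(theta_star, delta, theta_true)
r_star = risk(theta_star, delta, theta_star)
print(r_true <= r_star)  # True: the estimated risk bounds the true risk
```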
Disclosure Rule vs. Privacy Risk (cont.) • The theorem proves that the true risk is bounded by R(δ, θ*) • Under the hypothesis that the distribution underlying θ factorizes into a product form: • Theorem: The rule that minimizes the risk, δ* = arg minδ R(δ, θ), can be found in O(nNm) computation
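The efficient O(nNm) algorithm relies on the product-form assumption. As a naive baseline only (exponential in the number of attributes m; the names and the 1/|matches| identification model are illustrative assumptions of this sketch, not the paper's algorithm), one can enumerate every suppression pattern:

```python
from itertools import chain, combinations

def consistent(y, entry):
    return all(a is None or a == b for a, b in zip(y, entry))

def risk(records, suppressed, dictionary, phi):
    """Empirical risk of suppressing the given attribute indices."""
    total = 0.0
    for x in records:
        y = tuple(None if j in suppressed else v for j, v in enumerate(x))
        m = sum(1 for e in dictionary if consistent(y, e))
        total += phi(x) * (1.0 / m if m else 0.0)
    return total / len(records)

def minimize_risk_bruteforce(records, dictionary, phi, num_attrs):
    """Try all 2^num_attrs suppression patterns; return the best one."""
    patterns = chain.from_iterable(
        combinations(range(num_attrs), k) for k in range(num_attrs + 1))
    return min(patterns, key=lambda s: risk(records, set(s), dictionary, phi))
```

In this toy setting, a pure privacy loss is unsurprisingly minimized by suppressing everything; a realistic deployment would also weigh the utility of the disclosed data.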
K-anonymity • K-anonymity is simply a special case of our framework in which: • θtrue = T • Φ is a constant • the loss is underspecified • Our framework makes explicit some questionable hypotheses underlying k-anonymity!
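For comparison, the k-anonymity condition itself is easy to state in code (a generic illustration with invented data, not tied to the paper's notation): every combination of quasi-identifier values must occur in at least k records.

```python
from collections import Counter

def is_k_anonymous(table, qi_indices, k):
    """True if every quasi-identifier value combination occurs >= k times."""
    counts = Counter(tuple(row[j] for j in qi_indices) for row in table)
    return all(c >= k for c in counts.values())

table = [
    ("Alice", "Smith", "Oncology"),
    ("Anna",  "Smith", "Cardiology"),
    ("Bob",   "Jones", "Oncology"),
]
# Using surname alone as the quasi-identifier:
print(is_k_anonymous(table, [1], 2))  # False: Jones appears only once
```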
Conclusions • New framework for privacy risk taking sensitivity into account • Risk estimation as an upper bound for the true privacy risk • Efficient algorithm for risk computation • k-anonymity generalization