
Beyond k-Anonymity: A Decision Theoretic Framework for Assessing Privacy Risk

This study delves into the complexities of privacy in data sharing, highlighting the risks associated with identity and sensitive information disclosure. A framework based on statistical decision theory is proposed, offering a comprehensive assessment of privacy risks through disclosure rules and loss functions. The research presents an algorithm to minimize privacy risk, emphasizing the importance of risk estimation and sensitivity analysis. The text explores how data can be linked and the implications for privacy, ultimately aiming to enhance privacy protection in both public and private sectors.





Presentation Transcript


  1. Beyond k-Anonymity: A Decision Theoretic Framework for Assessing Privacy Risk • M. Scannapieco, G. Lebanon, M. R. Fouad and E. Bertino

  2. Introduction • Release of data • Private organizations can benefit from sharing data with others • Public organizations see data as a value for society • Privacy preservation • Data disclosure can lead to economic damage, threats to national security, etc. • Regulated by law in both the private and public sectors

  3. Two Facets of Data Privacy • Identity disclosure • Uncontrolled data release: identifiers may even be present • Anonymous data release: identifiers are suppressed, but there is no control over possible linking with other sources

  4. Linkage of Anonymous Data • [Figure: tables T1 and T2 linked through a shared QUASI-IDENTIFIER]
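The linkage idea in slide 4 can be sketched as a join on the quasi-identifier. The tables, attribute names, and the choice of (zip, birth_year, sex) as the quasi-identifier are hypothetical, chosen only for illustration; the sketch indexes the identified table by quasi-identifier value and reports, for each anonymous row, which names share it.

```python
from typing import Dict, List, Tuple

# Hypothetical quasi-identifier shared by both tables.
QUASI_IDENTIFIER = ("zip", "birth_year", "sex")

# T1: hypothetical "anonymous" release (identifiers suppressed, sensitive attribute kept).
T1: List[Dict[str, str]] = [
    {"zip": "47906", "birth_year": "1970", "sex": "F", "diagnosis": "flu"},
    {"zip": "47906", "birth_year": "1982", "sex": "M", "diagnosis": "asthma"},
    {"zip": "47907", "birth_year": "1970", "sex": "F", "diagnosis": "diabetes"},
]

# T2: hypothetical public source that carries identities.
T2: List[Dict[str, str]] = [
    {"name": "Alice", "zip": "47906", "birth_year": "1970", "sex": "F"},
    {"name": "Bob", "zip": "47906", "birth_year": "1982", "sex": "M"},
    {"name": "Carol", "zip": "47907", "birth_year": "1970", "sex": "F"},
    {"name": "Dana", "zip": "47907", "birth_year": "1970", "sex": "F"},
]


def qi_key(row: Dict[str, str]) -> Tuple[str, ...]:
    """Project a row onto the quasi-identifier attributes."""
    return tuple(row[a] for a in QUASI_IDENTIFIER)


def link(anon: List[Dict[str, str]],
         public: List[Dict[str, str]]) -> List[Tuple[Dict[str, str], List[str]]]:
    """For each anonymous row, list the names in `public` sharing its quasi-identifier."""
    index: Dict[Tuple[str, ...], List[str]] = {}
    for row in public:
        index.setdefault(qi_key(row), []).append(row["name"])
    return [(row, index.get(qi_key(row), [])) for row in anon]


for row, names in link(T1, T2):
    if len(names) == 1:
        print(f"{row['diagnosis']}: re-identified as {names[0]}")
    else:
        print(f"{row['diagnosis']}: {len(names)} candidates {names}")
```

A quasi-identifier value that is unique in T2 re-identifies the anonymous row outright; a value shared by several people leaves the attacker with a candidate set instead.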

  5. Two Facets of Data Privacy (cont.) • Sensitive information disclosure • Once identity disclosure occurs, the loss due to such disclosure depends on how sensitive the related data are • Data sensitivity is subjective • E.g., age is in general more sensitive for women than for men

  6. Our proposal • A framework for assessing privacy risk that takes into account both facets of privacy • Based on statistical decision theory • Definition and analysis of disclosure policies, modelled by disclosure rules, and of several privacy risk functions • Estimated risk as an upper bound on the true risk, and related complexity analysis • Algorithm for finding the disclosure rule that minimizes the privacy risk

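  7. Disclosure rules • A disclosure rule $\delta$ is a function that maps a record $x$ to a new record $z = \delta(x)$ in which some attributes may have been suppressed: $z_j = \begin{cases} \bot & \text{if the } j\text{-th attribute is suppressed} \\ x_j & \text{otherwise} \end{cases}$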
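A minimal sketch of a disclosure rule, assuming the simplest case of a fixed suppression mask applied to every record (the definition also allows the set of suppressed attributes to depend on the record). `None` stands in for a suppressed value, and `make_disclosure_rule` is a hypothetical helper name.

```python
from typing import Callable, Optional, Sequence, Tuple

Record = Tuple[Optional[str], ...]
SUPPRESSED = None  # stands in for a suppressed attribute value


def make_disclosure_rule(suppress: Sequence[bool]) -> Callable[[Record], Record]:
    """Return a rule delta mapping x to z, where z_j = SUPPRESSED if
    suppress[j] is True and z_j = x_j otherwise."""
    mask = tuple(suppress)

    def delta(x: Record) -> Record:
        if len(x) != len(mask):
            raise ValueError("record length does not match the rule")
        return tuple(SUPPRESSED if s else v for v, s in zip(x, mask))

    return delta


# Example: suppress the third attribute (the phone number).
delta = make_disclosure_rule([False, False, True])
print(delta(("John", "Smith", "555-0100")))  # ('John', 'Smith', None)
```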

  8. Loss function • Let $\theta$ be the side information used by the attacker in the identification attempt • The loss function $\ell(x, \delta(x), \theta)$ measures the loss incurred by disclosing the data $z = \delta(x)$ due to possible identification based on $\theta$ • Empirical distribution $p$ associated with records $x_1, \dots, x_n$: $p(x) = \frac{1}{n} \sum_{i=1}^{n} 1\{x = x_i\}$

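  9. Risk Definition • The risk of the disclosure rule $\delta$ in the presence of the side information $\theta$ is the average loss of disclosing $x_1, \dots, x_n$: $R(\delta, \theta) = \mathbb{E}_{p}\left[\ell(X, \delta(X), \theta)\right] = \frac{1}{n} \sum_{i=1}^{n} \ell(x_i, \delta(x_i), \theta)$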
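Because $p$ is the empirical distribution of $x_1, \dots, x_n$, the expected loss under $p$ is just the average loss over the records. The sketch below computes the expectation over distinct records weighted by $p(x)$ and checks it against the plain average; the toy loss is hypothetical, and records are assumed hashable (tuples).

```python
from collections import Counter
from typing import Any, Callable, Hashable, Sequence


def empirical_risk(records: Sequence[Hashable],
                   delta: Callable[[Any], Any],
                   theta: Any,
                   loss: Callable[[Any, Any, Any], float]) -> float:
    """R(delta, theta) = E_p[loss(X, delta(X), theta)], with p the empirical
    distribution of the records, i.e. p(x) = count(x) / n."""
    n = len(records)
    p = Counter(records)
    return sum((count / n) * loss(x, delta(x), theta) for x, count in p.items())


def toy_loss(x: Any, z: Any, theta: Any) -> float:
    """Hypothetical loss that ignores z and theta, for illustration only."""
    return float(x[1])


records = [("a", 1), ("a", 1), ("b", 2)]
r = empirical_risk(records, lambda x: x, None, toy_loss)
plain_average = sum(toy_loss(x, x, None) for x in records) / len(records)
print(r, plain_average)  # both about 1.333, i.e. (1 + 1 + 2) / 3
```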

  10. Putting the pieces together so far… • A hypothetical attacker performs an identification attempt on a disclosed record $y = \delta(x)$ on the basis of side information $\theta$, which can be a dictionary • The dictionary is used to link y with some entry present in the dictionary • Example: • y has the form (name, surname, phone#) and $\theta$ is a phone book • If all attributes are revealed, y is likely to be linked with exactly one entry • If phone# is suppressed (or missing), y may or may not be linked to a single entity, depending on how common the (name, surname) pair is
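The phone-book example can be made concrete with a small consistency check: an entry of the dictionary is a candidate for y if it agrees with y on every attribute that was not suppressed. The phone book and numbers below are invented for illustration, and `consistent_entries` is a hypothetical helper.

```python
from typing import List, Optional, Tuple

Entry = Tuple[str, str, str]           # (name, surname, phone)
Disclosed = Tuple[Optional[str], ...]  # None marks a suppressed attribute

# Hypothetical phone book playing the role of the side information theta.
PHONE_BOOK: List[Entry] = [
    ("John", "Smith", "555-0100"),
    ("John", "Smith", "555-0199"),
    ("John", "Smith", "555-0142"),
    ("Ada", "Lovelace", "555-0123"),
]


def consistent_entries(y: Disclosed, theta: List[Entry]) -> List[Entry]:
    """Dictionary entries that agree with y on every non-suppressed attribute."""
    return [e for e in theta
            if all(v is None or v == ev for v, ev in zip(y, e))]


# All attributes revealed: the record links to a single entry.
print(len(consistent_entries(("John", "Smith", "555-0100"), PHONE_BOOK)))  # 1
# Phone number suppressed: a common (name, surname) leaves several candidates...
print(len(consistent_entries(("John", "Smith", None), PHONE_BOOK)))        # 3
# ...while a rare one still links to a single entry.
print(len(consistent_entries(("Ada", "Lovelace", None), PHONE_BOOK)))      # 1
```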

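  11. Risk formulation • Let's decompose the loss function into an identification part and a sensitivity part • Identification part: formalized by the random variable $Z = \begin{cases} 1 & \text{if the attacker, using } \theta \text{, links } \delta(x) \text{ to the correct individual} \\ 0 & \text{otherwise} \end{cases}$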

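  12. Risk formulation (cont.) • Sensitivity part: $\Phi(x, \delta(x))$, where higher values indicate higher sensitivity • Therefore the loss is: $\ell(x, \delta(x), \theta) = \Phi(x, \delta(x)) \, \Pr_\theta(Z = 1 \mid x, \delta(x))$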

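  13. Risk formulation (cont.) • Risk: $R(\delta, \theta) = \frac{1}{n} \sum_{i=1}^{n} \Phi(x_i, \delta(x_i)) \, \Pr_\theta(Z = 1 \mid x_i, \delta(x_i))$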
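A minimal sketch of this risk, resting on two assumptions that the slides do not spell out. First, the attacker picks uniformly at random among the dictionary entries consistent with the disclosed record, so the identification probability Pr(Z = 1) is 1 divided by the number of candidates when the true record is among them. Second, the sensitivity part is a hypothetical per-attribute weighting (`phi`, `WEIGHTS`) in which revealing the phone number costs more than revealing a name.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Record = Tuple[str, ...]
Disclosed = Tuple[Optional[str], ...]  # None marks a suppressed attribute


def consistent(y: Disclosed, theta: Sequence[Record]) -> List[Record]:
    """Entries of theta that agree with y on every revealed attribute."""
    return [e for e in theta if all(v is None or v == ev for v, ev in zip(y, e))]


def prob_identified(x: Record, y: Disclosed, theta: Sequence[Record]) -> float:
    """Pr(Z = 1) for an attacker ASSUMED to pick uniformly among the entries
    consistent with y: 1/|candidates| if x is a candidate, 0 otherwise."""
    candidates = consistent(y, theta)
    return 1.0 / len(candidates) if x in candidates else 0.0


def risk(records: Sequence[Record],
         delta: Callable[[Record], Disclosed],
         theta: Sequence[Record],
         phi: Callable[[Record, Disclosed], float]) -> float:
    """Average over the records of sensitivity * identification probability."""
    return sum(phi(x, delta(x)) * prob_identified(x, delta(x), theta)
               for x in records) / len(records)


# Hypothetical sensitivity: each revealed attribute adds its weight.
WEIGHTS = (1.0, 1.0, 3.0)  # (name, surname, phone)


def phi(x: Record, y: Disclosed) -> float:
    return sum(w for w, v in zip(WEIGHTS, y) if v is not None)


theta = [("John", "Smith", "555-0100"),
         ("John", "Smith", "555-0199"),
         ("Ada", "Lovelace", "555-0123")]
records = [("John", "Smith", "555-0100"), ("Ada", "Lovelace", "555-0123")]


def reveal_all(x: Record) -> Disclosed:
    return x


def hide_phone(x: Record) -> Disclosed:
    return (x[0], x[1], None)


print(risk(records, reveal_all, theta, phi))  # 5.0
print(risk(records, hide_phone, theta, phi))  # 1.5
```

With these numbers, suppressing the phone number lowers the risk from 5.0 to 1.5: the weight of the revealed attributes drops, and John Smith's record now has two candidates in the dictionary.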

  14. Disclosure Rule vs. Privacy Risk • Suppose that $\theta_{\text{true}}$ is the attacker's true dictionary, which is publicly available, and that $\theta^*$ is the actual database from which the data will be published • Under the following assumptions: • $\theta_{\text{true}}$ contains more records than $\theta^*$ ($\theta^* \leq \theta_{\text{true}}$) • The non-suppressed values in $\theta_{\text{true}}$ are more limited than those in $\theta^*$ • Theorem: If $\theta^*$ contains records that correspond to $x_1, \dots, x_n$ and $\theta^* \leq \theta_{\text{true}}$, then $R(\delta, \theta_{\text{true}}) \leq R(\delta, \theta^*)$
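In a toy version of this setting, the direction of the bound is easy to see: if every entry of the published database also appears in the attacker's dictionary, each set of consistent entries can only grow, so 1 divided by its size can only shrink. The sketch assumes a constant sensitivity of 1 and a uniformly guessing attacker, and the extra public entries are hypothetical. It illustrates only the "more records" assumption, not the authors' proof.

```python
from typing import Callable, Optional, Sequence, Tuple

Record = Tuple[str, ...]
Disclosed = Tuple[Optional[str], ...]  # None marks a suppressed attribute


def n_consistent(y: Disclosed, theta: Sequence[Record]) -> int:
    """Number of entries of theta that agree with y on its revealed attributes."""
    return sum(all(v is None or v == ev for v, ev in zip(y, e)) for e in theta)


def risk_uniform(records: Sequence[Record],
                 delta: Callable[[Record], Disclosed],
                 theta: Sequence[Record]) -> float:
    """Risk with constant sensitivity 1 and an ASSUMED uniform attacker:
    average of 1 / |entries of theta consistent with delta(x)|.
    Each x is itself in theta here, so the count is always at least 1."""
    return sum(1.0 / n_consistent(delta(x), theta) for x in records) / len(records)


records = [("John", "Smith", "555-0100"), ("Ada", "Lovelace", "555-0123")]
theta_star = list(records)  # the database being published
theta_true = theta_star + [("John", "Smith", "555-0199"),   # hypothetical extra
                           ("Ada", "Lovelace", "555-0188")]  # public entries


def hide_phone(x: Record) -> Disclosed:
    return (x[0], x[1], None)


r_star = risk_uniform(records, hide_phone, theta_star)
r_true = risk_uniform(records, hide_phone, theta_true)
print(r_star, r_true)  # 1.0 0.5
assert r_true <= r_star  # the bound R(delta, theta_true) <= R(delta, theta*)
```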

  15. Disclosure Rule vs. Privacy Risk (cont.) • The theorem proves that the true risk is bounded above by $R(\delta, \theta^*)$ • Under the hypothesis that the distribution underlying $\theta$ factorizes into a product form: • Theorem: The rule that minimizes the risk, $\delta^* = \arg\min_{\delta} R(\delta, \theta)$, can be found in $O(nNm)$ computation
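One way to see why a product form helps (a hedged reading, not the authors' algorithm, which the slides do not describe): if the attributes of the dictionary are independent, the probability that a random entry agrees with y on its revealed attributes is the product of the per-attribute marginals $p_j(y_j)$ over the revealed positions $j$, so the expected number of consistent entries follows from marginals computed once instead of a scan of the whole dictionary per record. The sketch assumes N is the number of dictionary entries and m the number of attributes; the slides do not define these symbols, and `marginals` and `expected_matches` are hypothetical helpers.

```python
from collections import Counter
from math import prod
from typing import Dict, List, Optional, Sequence, Tuple

Record = Tuple[str, ...]
Disclosed = Tuple[Optional[str], ...]  # None marks a suppressed attribute


def marginals(theta: Sequence[Record]) -> List[Dict[str, float]]:
    """Per-attribute empirical marginals p_j(v) of the dictionary (one pass over N*m values)."""
    n_entries, m = len(theta), len(theta[0])
    counts = [Counter(e[j] for e in theta) for j in range(m)]
    return [{v: c / n_entries for v, c in cnt.items()} for cnt in counts]


def expected_matches(y: Disclosed, margs: List[Dict[str, float]], n_entries: int) -> float:
    """Expected number of entries consistent with y, ASSUMING independent
    attributes: n_entries * product over revealed j of p_j(y_j)."""
    return n_entries * prod((margs[j].get(v, 0.0)
                             for j, v in enumerate(y) if v is not None), start=1.0)


theta = [("John", "Smith"), ("John", "Brown"), ("Ada", "Smith"), ("Ada", "Brown")]
margs = marginals(theta)
print(expected_matches(("John", "Smith"), margs, len(theta)))  # 1.0
print(expected_matches(("John", None), margs, len(theta)))     # 2.0
print(expected_matches((None, None), margs, len(theta)))       # 4.0
```

In this four-entry dictionary the attributes are exactly independent, so the estimates coincide with the true counts; on real data they are only approximations.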

  16. k-Anonymity • k-anonymity is SIMPLY a special case of our framework in which: • $\theta_{\text{true}} = T$ • the sensitivity is a constant •  is underspecified • Our framework underlines some questionable hypotheses of k-anonymity!!!
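The special case can be checked directly. A table is k-anonymous when every combination of quasi-identifier values occurs at least k times. With the dictionary equal to the released table T, a constant sensitivity, and (as an assumption) an attacker guessing uniformly within a quasi-identifier group, a record's identification probability is 1 divided by its group size, so the table is k-anonymous exactly when the worst-case identification probability is at most 1/k. The table and its generalized ZIP values are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple

Row = Tuple[str, ...]


def qi_groups(table: Sequence[Row], qi: Sequence[int]) -> Counter:
    """Size of each group of rows sharing the same quasi-identifier values."""
    return Counter(tuple(row[j] for j in qi) for row in table)


def is_k_anonymous(table: Sequence[Row], qi: Sequence[int], k: int) -> bool:
    """True if every quasi-identifier combination occurs at least k times."""
    return all(size >= k for size in qi_groups(table, qi).values())


def max_identification_prob(table: Sequence[Row], qi: Sequence[int]) -> float:
    """Worst-case Pr(Z = 1) with theta = T, matching on the quasi-identifier
    only, and an attacker ASSUMED to guess uniformly inside the matching group."""
    return max(1.0 / size for size in qi_groups(table, qi).values())


# Hypothetical released table: (generalized ZIP, sex, diagnosis).
T = [("479**", "F", "flu"), ("479**", "F", "cold"),
     ("478**", "M", "flu"), ("478**", "M", "asthma")]
QI = (0, 1)

for k in (2, 3):
    # Both tests agree: k-anonymous exactly when the worst case is at most 1/k.
    print(k, is_k_anonymous(T, QI, k), max_identification_prob(T, QI) <= 1.0 / k)
```

Note that nothing in this check looks at the diagnosis column: with a constant sensitivity, disclosing "flu" and disclosing "asthma" count the same, which is the kind of hypothesis the framework makes explicit.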

  17. Conclusions • New framework for privacy risk that takes sensitivity into account • Risk estimation as an upper bound on the true privacy risk • Efficient algorithm for risk computation • Generalization of k-anonymity
