
Semantic privacy protection using ontologies

Semantic privacy protection using ontologies. Sergio Martínez, Aïda Valls, David Sánchez (iTAKA group, as part of the IF-PAD group). Data privacy protection for unbounded categorical attributes. Values are textual: words or noun phrases. The set of possible values is not fixed a priori.


Presentation Transcript


  1. Semantic privacy protection using ontologies. Sergio Martínez, Aïda Valls, David Sánchez. iTAKA group, as part of the IF-PAD group.

  2. Data privacy protection for unbounded categorical attributes
     • Values are textual: words or noun phrases.
     • The set of possible values is not fixed a priori.
     • Semantic interpretation of the values using ontologies.
     • Example: textual answers to the question “What has been the main reason to visit Delta del Ebre?”
       • Diversion, recreation, adventure, sport, scuba diving, swimming, beach, taking photos, wildlife observation, bird watching, ...

  3. (1) Hierarchy-based anonymization of categorical data
     • Existing work:
       • Based on a small ad hoc hierarchy, built as a function of the input data.
       • Exhaustive generalization methods (too expensive with real ontologies such as WordNet).
       • Values are substituted only by more general ones (generalization).
     • Our method:
       • Substitution of sensitive values with the most semantically similar one, using the WordNet ontology.
       • The substitute may be a generalization, a sibling or a specialization.
       • Each substitution increases the level of k-anonymity.
       • The utility of the data, from a semantic point of view, is preserved during the anonymization.
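The recoding idea on this slide can be sketched in a few lines. This is a minimal illustration only, not the authors' published algorithm: the tiny `PARENT` taxonomy stands in for WordNet, path length stands in for the similarity measure, and the greedy loop in `recode` (merge the rarest value into its nearest neighbour until every value occurs at least `k` times) is an assumption made for the example. All names are hypothetical.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent); a stand-in for WordNet.
PARENT = {
    "activity": None,
    "recreation": "activity",
    "sport": "recreation",
    "swimming": "sport",
    "scuba diving": "sport",
    "observation": "activity",
    "wildlife observation": "observation",
    "bird watching": "wildlife observation",
}

def path_to_root(concept):
    """Return the concept followed by all its ancestors up to the root."""
    path = [concept]
    while PARENT[concept] is not None:
        concept = PARENT[concept]
        path.append(concept)
    return path

def path_length(a, b):
    """Number of taxonomy edges between a and b via their lowest common ancestor."""
    pa, pb = path_to_root(a), path_to_root(b)
    common = next(c for c in pa if c in pb)
    return pa.index(common) + pb.index(common)

def recode(values, k):
    """Greedy sketch (an assumption, not the published method): repeatedly
    replace the rarest value by the closest other value in the dataset."""
    values = list(values)
    while True:
        counts = Counter(values)
        rare = [v for v, n in counts.items() if n < k]
        if not rare or len(counts) == 1:
            return values
        target = min(rare, key=lambda v: (counts[v], v))
        others = [v for v in counts if v != target]
        nearest = min(others, key=lambda v: (path_length(target, v), v))
        values = [nearest if v == target else v for v in values]

answers = ["swimming", "swimming", "scuba diving",
           "bird watching", "wildlife observation", "sport"]
anonymized = recode(answers, k=2)
```

Note that the substitute chosen for a value may be a generalization, a sibling or a specialization, depending only on which existing value is closest in the taxonomy.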

  4. Evaluation
     • Data mining with semantic clustering.
     • [Figure: evaluation workflow. Labels: Original data; Anonymizations (based on ontologies, based on VGHs, based on discernability); Anonymized data; Semantic clustering; Clusters; Comparison.]

  5. (2) Record linkage of categorical data
     • Existing work:
       • Semantic approach to categorical data anonymization:
         • Generalization: values are substituted by more general ones.
       • Disclosure risk estimation based on direct matching between values (MRL).
     • Our method:
       • Linkage of each value with the most semantically similar one, using the WordNet ontology (SRL).
       • Semantic similarity measures studied:
         • Path length
         • Wu & Palmer
         • Super-concept distance
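The three measures named on this slide can be illustrated on a small taxonomy. The sketch below is hedged in several ways: `PARENT` is a hypothetical toy hierarchy rather than WordNet, depth is counted in nodes with the root at depth 1, and the super-concept distance is written as the ratio of non-shared to total super-concepts (each concept counted among its own super-concepts), which is one common formulation. The slides do not give the formulas, so treat these definitions as assumptions.

```python
# Hypothetical toy taxonomy (child -> parent); a stand-in for WordNet.
PARENT = {
    "activity": None,
    "recreation": "activity",
    "sport": "recreation",
    "swimming": "sport",
    "scuba diving": "sport",
    "observation": "activity",
    "wildlife observation": "observation",
    "bird watching": "wildlife observation",
}

def path_to_root(concept):
    """Return the concept followed by all its ancestors up to the root."""
    path = [concept]
    while PARENT[concept] is not None:
        concept = PARENT[concept]
        path.append(concept)
    return path

def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(concept))

def lowest_common_ancestor(a, b):
    pb = set(path_to_root(b))
    return next(c for c in path_to_root(a) if c in pb)

def path_length(a, b):
    """Distance: number of edges on the shortest taxonomic path."""
    lca = lowest_common_ancestor(a, b)
    return path_to_root(a).index(lca) + path_to_root(b).index(lca)

def wu_palmer(a, b):
    """Similarity in (0, 1]: 2 * depth(lca) / (depth(a) + depth(b))."""
    lca = lowest_common_ancestor(a, b)
    return 2 * depth(lca) / (depth(a) + depth(b))

def super_concept_distance(a, b):
    """Distance in [0, 1): share of super-concepts not common to both (assumed form)."""
    sa, sb = set(path_to_root(a)), set(path_to_root(b))
    return len(sa ^ sb) / len(sa | sb)

pair = ("swimming", "scuba diving")
scores = (path_length(*pair), wu_palmer(*pair), super_concept_distance(*pair))
```

Path length and super-concept distance are distances (smaller means more similar), whereas Wu & Palmer is a similarity (larger means more similar), so the direction must be accounted for when picking the "most similar" value.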

  6. Evaluation: MRL vs SRL
     • Real data with 975 records and 2 textual attributes.
     • The dataset is anonymized using a generalization scheme based on VGH.
     • We compare record linkage based on semantic similarity (SRL) with record linkage based on direct matching (MRL).
     • [Figure residue: label “VGH3”.]
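To make the MRL/SRL comparison concrete, the following sketch links each masked record to the closest original record and counts the expected number of correct links, splitting credit evenly among tied candidates. Everything here is illustrative: the four records, the toy `PARENT` taxonomy, the tie-splitting rule and the use of path length for SRL are assumptions, and the numbers say nothing about the results on the 975-record dataset.

```python
# Hypothetical toy taxonomy (child -> parent); a stand-in for WordNet.
PARENT = {
    "activity": None,
    "recreation": "activity",
    "sport": "recreation",
    "swimming": "sport",
    "scuba diving": "sport",
    "observation": "activity",
    "wildlife observation": "observation",
    "bird watching": "wildlife observation",
}

def path_to_root(concept):
    path = [concept]
    while PARENT[concept] is not None:
        concept = PARENT[concept]
        path.append(concept)
    return path

def semantic_distance(a, b):
    """SRL-style distance: edges between a and b in the taxonomy."""
    pa, pb = path_to_root(a), path_to_root(b)
    common = next(c for c in pa if c in pb)
    return pa.index(common) + pb.index(common)

def matching_distance(a, b):
    """MRL-style distance: 0 for identical values, 1 otherwise."""
    return 0 if a == b else 1

def expected_correct_links(original, masked, distance):
    """Record i of `masked` corresponds to record i of `original`.
    Each masked record is linked to the nearest original record(s);
    ties share the credit equally (an assumed convention)."""
    total = 0.0
    for i, value in enumerate(masked):
        dists = [distance(value, o) for o in original]
        best = min(dists)
        candidates = [j for j, d in enumerate(dists) if d == best]
        if i in candidates:
            total += 1 / len(candidates)
    return total

original = ["swimming", "scuba diving", "bird watching", "recreation"]
# Masked by generalizing most values one level up the hierarchy.
masked = ["sport", "sport", "wildlife observation", "recreation"]

mrl = expected_correct_links(original, masked, matching_distance)
srl = expected_correct_links(original, masked, semantic_distance)
```

On this toy data a generalized value such as "wildlife observation" no longer matches any original value exactly, so direct matching can only guess, while the semantic distance still singles out "bird watching" as the closest original value.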

  7. Future work
     • Combine several ontologies as background knowledge, so that the knowledge modelled in each one complements the others.
     • Propose other anonymization methods for textual attributes (noise addition, micro-aggregation, ...).
     • Team (ITAKA group at URV):
       • Aïda Valls (aida.valls@urv.cat)
       • David Sánchez
       • Sergio Martínez

  8. Publications
     • Conferences:
       • IPMU 2010. Dortmund, Germany, June 2010.
         Anonymizing Categorical Data with a Recoding Method based on Semantic Similarity. Sergio Martínez, Aida Valls, David Sánchez.
       • MDAI 2010. Perpignan, France, October 2010.
         Ontology-based anonymization of categorical values. Sergio Martínez, Aida Valls, David Sánchez.
       • CCIA 2010. L’Espluga Francolí, Spain, October 2010.
         The role of ontologies in the anonymization of textual variables. Sergio Martínez, David Sánchez, Aida Valls, Montserrat Batet.
     • Journals:
       • Special issue of “Information Fusion”, Elsevier (DOI: 10.1016/j.inffus.2011.03.004).
         Privacy protection of textual attributes through a semantic-based masking method. Sergio Martínez, Aida Valls, David Sánchez, Montserrat Batet.
       • International Journal of Innovative Computing, Information and Control (submitted).
         Towards the evaluation of the disclosure risk of masking methods dealing with textual attributes. Sergio Martínez, Aida Valls, David Sánchez.
