This paper presents a novel approach to quantifying the leakage of private information on social network sites using entropy-based privacy metrics. By analyzing blog sentences, the system assesses the amount of potential knowledge intruders can obtain about individuals. The methodology combines NLP techniques with information retrieval models to identify personal attributes such as age, location, and occupation from user-generated content. Experimental results demonstrate the effectiveness of the proposed metric in evaluating privacy risks associated with online information sharing.
New Approach to Quantification of Privacy on Social Network Sites • IEEE AINA 2010 • Tran Hong Ngoc (VNU, Vietnam), Isao Echizen (NII, Japan), Kamiyama Komei (UEC, Japan), Hiroshi Yoshiura (UEC, Japan) • Presenter: Yu-Song Syu
Social Network Sites • Growth of SNSs • Leads to an explosion in online information sharing • With SNSs • People share information with friends • Information includes sensitive data • Location, age, career, …
Intruders in SNSs • By compiling statistics, intruders may obtain personal information for: • Commercial purposes • Identity theft • Physical harm • … • How do they get such information?
Usually, people do not know how much private information they reveal about themselves and others. http://www.iis.sinica.edu.tw
Privacy Metric • Based on probability and entropy • Helps users know how much private information may leak from their blog sentences • Defines the Leaked Privacy Value, Δ, as the amount of knowledge that intruders can learn about a “problem of interest”
Proposed System Model Info. Retrieval techniques based on NLP methods Quantification of Privacy
System Model Find the information about someone Prefecture, age, city, university, … Blog sentences that users post
Event & Blog Set • Event: • Blog Set: • Intersection: [Figure labels: BlogSet_i, BlogSet_j]
Blog Set / Joint Blog Set Assumed to never be empty
Math Background Before the Proposed Metric… • Entropy (Uncertainty) • Conditional Entropy • Joint Entropy [Formula labels: Event, Possible Value]
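The formulas on this slide did not survive extraction. For reference, the standard textbook definitions of the three quantities are given here, with X standing for an event and x for one of its possible values. The symbols X, Y, x, y and p are the usual notation and are an assumption; the slide's own notation may differ.

```latex
% Entropy (uncertainty) of an event X with possible values x
H(X) = -\sum_{x} p(x) \log_2 p(x)

% Conditional entropy of X given a second event Y
H(X \mid Y) = -\sum_{x, y} p(x, y) \log_2 p(x \mid y)

% Joint entropy of X and Y
H(X, Y) = -\sum_{x, y} p(x, y) \log_2 p(x, y)

% Chain rule linking the three
H(X, Y) = H(Y) + H(X \mid Y)
```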
Why Use Entropy? • Idea: Difference of Uncertainty → Leaked Privacy
Privacy Leakage Metric • Leaked Privacy Value: • The change in privacy value, obtained by subtracting the privacy after the sentences are posted from the privacy before they are posted [Formula labels: before, after, number of events]
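The idea of "privacy before minus privacy after" can be illustrated for a single event with a minimal sketch. It assumes the privacy value is the Shannon entropy of the intruder's belief about that event; the exact formula from the paper is not recoverable from this slide, so this is not the authors' implementation. The function names and the prefecture weights are hypothetical.

```python
import math
from typing import Mapping


def entropy(dist: Mapping[str, float]) -> float:
    """Shannon entropy in bits of a discrete distribution.

    `dist` maps each possible value of an event to a non-negative
    weight; weights are normalised to probabilities here.
    """
    total = sum(dist.values())
    if total <= 0:
        raise ValueError("distribution must have positive total mass")
    h = 0.0
    for weight in dist.values():
        if weight > 0:  # zero-probability values contribute nothing
            p = weight / total
            h -= p * math.log2(p)
    return h


def leaked_privacy(before: Mapping[str, float],
                   after: Mapping[str, float]) -> float:
    """Assumed form of the leaked privacy value for one event:
    uncertainty before posting minus uncertainty after posting."""
    return entropy(before) - entropy(after)


# Hypothetical example: the intruder first considers four prefectures
# equally likely; a blog sentence narrows this to two.
before = {"Tokyo": 1, "Osaka": 1, "Kyoto": 1, "Hokkaido": 1}
after = {"Tokyo": 1, "Osaka": 1}
delta = leaked_privacy(before, after)
print(f"Leaked privacy: {delta:.2f} bits")
```

In this toy case the uncertainty drops from 2 bits to 1 bit, so the sentence leaks 1 bit about the prefecture. A post that does not change the intruder's belief leaks 0 bits.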
Experiments • Dataset: • Statistical Survey Department, Statistics Bureau, Ministry of Internal Affairs and Communications • Problem of Interest: • Gaining information about a victim of an accident that happened in Japan’s subway and was discussed by SNS users
Experiments - Age [Figure; labels: Age, Prefecture]
Experiments – Total Leaked Privacy • Total Leaked Privacy Before & After Blogging
Conclusions • Proposed a new metric to quantify how much private information is leaked from blogs on SNSs • SNS users can see whether a post carelessly exposes private information • Based on probability and entropy, the proposal is simpler than others but effective, as shown in the experiments