
Anonymity and Privacy Issues --- re-identification

Anonymity and Privacy Issues --- re-identification. Yimeng Zhang 12/4/07. Index. Views on Privacy of Social Media Overview of Re-identification You are What You Say: Privacy Risks of Public Mentions, Frankowski et al. SIGIR06. Improper Use of Personal Information Online.


Presentation Transcript


  1. Anonymity and Privacy Issues --- re-identification. Yimeng Zhang, 12/4/07

  2. Index • Views on Privacy of Social Media • Overview of Re-identification • You are What You Say: Privacy Risks of Public Mentions, Frankowski et al. SIGIR06

  3. Improper Use of Personal Information Online

  4. Top Privacy Concerns

  5. Remaining Anonymous

  6. True Information Provided While Registering

  7. Ability to Remain Anonymous

  8. Importance of Controlling Personal Information

  9. Specifying Who Can View Personal Information

  10. Conclusion • Around 40% of people would like to remain anonymous on social media or social networking sites • Most people provide their true personal information while registering • Most people think it is important to have control over their personal information online • Re-identification techniques can identify the users of an anonymous dataset

  11. Privacy Loss through Re-identification • Re-identification: linking datasets that have explicit identifiers to datasets that lack explicit identifiers, through attributes the two have in common • Datasets without explicit identifiers (data people wish to keep private) • Public data made anonymous by users • Public data released by research groups (after suitable anonymizing) • Public data from government agencies (census)

  12. Example of Re-identification • Data made public by the Group Insurance Commission of Massachusetts • Voter registration list of Massachusetts, purchased for only $20 • 87% of the US population in 1990 are likely to be uniquely identified based only on ZIP code, birth date, and sex (Sweeney, 2002)
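
The linkage described on this slide can be sketched as a join on the shared attributes. This is only an illustration under stated assumptions: every record, name, and field below is invented, and `link` is a hypothetical helper, not code from Sweeney's study.

```python
from collections import defaultdict

# Hypothetical toy records; all values are invented for illustration.
medical = [  # "anonymized": no names, but quasi-identifiers remain
    {"zip": "00001", "birth": "1950-03-02", "sex": "M", "diagnosis": "A"},
    {"zip": "00002", "birth": "1960-01-15", "sex": "F", "diagnosis": "B"},
    {"zip": "00002", "birth": "1960-01-15", "sex": "F", "diagnosis": "C"},
]
voters = [  # public list with explicit identifiers
    {"name": "Voter One", "zip": "00001", "birth": "1950-03-02", "sex": "M"},
    {"name": "Voter Two", "zip": "00002", "birth": "1960-01-15", "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth", "sex")


def link(anonymous, identified, keys=QUASI_IDENTIFIERS):
    """Return {name: anonymous record} for people matching exactly one record."""
    index = defaultdict(list)
    for record in anonymous:
        index[tuple(record[k] for k in keys)].append(record)
    linked = {}
    for person in identified:
        matches = index.get(tuple(person[k] for k in keys), [])
        if len(matches) == 1:  # a unique match re-identifies the record
            linked[person["name"]] = matches[0]
    return linked


reidentified = link(medical, voters)
```

In this toy data only "Voter One" is linked, because the second voter shares the same three attributes with two anonymous records and so is not uniquely identified.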

  13. The Rebus Form • [anonymized dataset] + [identified dataset] = Governor’s medical records! (From Frankowski, SIGIR06)

  14. Example of face identification • Friendster: profiles without explicit identifiers • Facebook: profiles with explicit identifiers • A face recognizer links the two: identity violation! (Gross and Acquisti, WPES 05)

  15. You Are What You Say: Privacy Risks of Public Mentions Dan Frankowski, Dan Cosley, Shilad Sen, Loren Terveen, John Riedl University of Minnesota SIGIR 2006

  16. Main Idea • People can be identified by their preferences and what they talk about • Reviews of books, movies, songs • Mentions on forums or blogs • Friend list on Facebook • Wish or purchase list on Amazon • Method for Re-identification • Datasets are represented in Sparse Relation Spaces • Re-identification can be done by matching two Sparse Relation Spaces

  17. Sparse Relation Space • Relates people to items • Sparse: has few relationships recorded per person • A dataset that can be represented in a sparse relation space is vulnerable
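
As a rough illustration of the data structure, a sparse relation space can be held as a mapping from each person to the small set of items they are related to. The user and movie names and the `density` helper are assumptions made for this sketch.

```python
# Hypothetical toy data: each person is related to only a few items.
relations = {
    "user_a": {"movie_1", "movie_7"},
    "user_b": {"movie_2"},
    "user_c": {"movie_1", "movie_3", "movie_9"},
}


def density(space):
    """Fraction of all possible (person, item) pairs that are recorded."""
    items = set().union(*space.values())
    recorded = sum(len(related) for related in space.values())
    return recorded / (len(space) * len(items))


print(f"density = {density(relations):.2f}")
```

A real dataset of this kind would have far more people and items, so its density would be much lower than in this toy case.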

  18. Research Questions • Risks of dataset release • What are the risks to user privacy when releasing a dataset? • Altering the dataset • How can dataset owners alter the dataset to preserve user privacy? • Self defense • How can users protect their own privacy?

  19. Experiment Dataset: MovieLens • Dataset 1: Movie ratings • Users do not allow them to be revealed • Released for research use as an “anonymous dataset” • Dataset 2: Movie reviews • Public

  20. Feature of the dataset • Both ratings and mentions follow a power law • Important feature for real world sparse relation space Frankowski, SIGIR 06

  21. Evaluation Measure • Input: the mentions by user t and the ratings dataset • Output of the re-identification algorithm: the top k ratings users, ranked by the likelihood that they are user t • k-identified: t is among the k users returned by the algorithm • k-identification rate: the fraction of users who are k-identified
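
The measure on this slide can be sketched as follows. The function name, the input layout (one ranked candidate list per target user), and the sample output are assumptions for illustration.

```python
def k_identification_rate(ranked_by_target, k):
    """Fraction of target users found among their own top-k candidates.

    ranked_by_target maps each target user t to the list of ratings users
    returned for t, most likely first.
    """
    if not ranked_by_target:
        return 0.0
    hits = sum(1 for t, ranked in ranked_by_target.items() if t in ranked[:k])
    return hits / len(ranked_by_target)


# Hypothetical algorithm output for three target users.
results = {
    "t1": ["t1", "u5", "u9"],  # 1-identified
    "t2": ["u3", "t2", "u8"],  # 2-identified but not 1-identified
    "t3": ["u4", "u6", "u7"],  # not identified for k <= 3
}

rate_at_1 = k_identification_rate(results, 1)
rate_at_2 = k_identification_rate(results, 2)
```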

  22. Set Intersection Algorithm for Re-identification • Likely list: users in the ratings database who have rated every movie mentioned by user t • Problem • Users mention movies that they have not rated
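
A minimal sketch of the likely-list computation, assuming the ratings are stored as a mapping from user to the set of movies rated. All names and data are hypothetical.

```python
def likely_list(mentions, ratings):
    """Users whose rated movies include every movie in `mentions`."""
    return sorted(user for user, rated in ratings.items() if mentions <= rated)


# Hypothetical ratings dataset: user -> set of rated movies.
ratings = {
    "u1": {"m1", "m2", "m3"},
    "u2": {"m1", "m3"},
    "u3": {"m2"},
}

candidates = likely_list({"m1", "m3"}, ratings)
# One mentioned-but-unrated movie ("m4") empties the list entirely,
# which is the problem noted on the slide.
no_candidates = likely_list({"m1", "m4"}, ratings)
```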

  23. TF-IDF Algorithm • Mentions of a user: vector of the movies the user mentioned • Ratings of a user: vector of the movies the user rated • Likelihood: TF-IDF cosine similarity
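
One way to sketch this ranking is shown below. The slide does not give the exact weighting, so this version assumes binary term frequency and an IDF of log(N / number of raters). The function names and toy data are illustrative only.

```python
import math


def idf_weights(ratings):
    """IDF per movie: log(number of users / number of users who rated it)."""
    counts = {}
    for rated in ratings.values():
        for movie in rated:
            counts[movie] = counts.get(movie, 0) + 1
    return {m: math.log(len(ratings) / c) for m, c in counts.items()}


def tfidf_cosine(mentions, rated, idf):
    """Cosine similarity of two binary movie vectors weighted by IDF."""
    a = {m: idf.get(m, 0.0) for m in mentions}  # unknown movies get weight 0
    b = {m: idf.get(m, 0.0) for m in rated}
    dot = sum(w * b[m] for m, w in a.items() if m in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(mentions, ratings):
    """Ratings users ordered from most to least similar to the mentions."""
    idf = idf_weights(ratings)
    return sorted(ratings,
                  key=lambda u: tfidf_cosine(mentions, ratings[u], idf),
                  reverse=True)


# Hypothetical data: "popular" is rated by everyone, so it carries no weight.
ratings = {
    "u1": {"popular", "rare"},
    "u2": {"popular"},
    "u3": {"popular", "other"},
}
ranking = rank_candidates({"popular", "rare"}, ratings)
```

Under these assumptions, a movie rated by everyone contributes nothing to the similarity, so the rarely rated movie decides the ranking.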

  24. Scoring Algorithm • Scoring: • Emphasize mentions of rarely rated movies • De-emphasize the number of ratings a user has • Score for one mention m of a user: the fraction of users who have not rated m • Score for a user: the product of the scores for all mentions by this user
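
The scoring rule on this slide can be sketched as below. The slide does not say how a candidate is scored for a mentioned movie they have not rated. This sketch assumes a small constant, the hypothetical parameter `unrated_penalty`, so that one missing rating lowers the score without ruling the candidate out.

```python
def mention_score(movie, candidate, ratings, unrated_penalty=0.05):
    """Score of one mentioned movie for one candidate ratings user."""
    if movie not in ratings[candidate]:
        return unrated_penalty  # assumption: not specified on the slide
    n_users = len(ratings)
    n_rated = sum(1 for rated in ratings.values() if movie in rated)
    return (n_users - n_rated) / n_users  # fraction who have NOT rated it


def user_score(mentions, candidate, ratings, unrated_penalty=0.05):
    """Product of the per-mention scores for this candidate."""
    score = 1.0
    for movie in mentions:
        score *= mention_score(movie, candidate, ratings, unrated_penalty)
    return score


# Hypothetical ratings dataset: user -> set of rated movies.
ratings = {
    "u1": {"rare", "popular"},
    "u2": {"popular"},
    "u3": {"popular", "mid"},
    "u4": {"mid"},
}
mentions = {"rare", "popular"}
best = max(ratings, key=lambda u: user_score(mentions, u, ratings))
```

Because the product runs only over the mentioned movies, a candidate gains nothing from having many other ratings, which is one way to read the "de-emphasize the number of ratings" goal.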

  25. Scoring Algorithm with Ratings • Suppose we have a magic analyzer that can guess the rating of a movie from the mention • E.g., using the context of that mention • Algorithms • ExactRating: the analyzer can perfectly determine the rating • FuzzyRating: the analyzer can guess the rating value within +/-1
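
A rough sketch of how the two variants could differ: a candidate counts as matching a mention only if their rating is equal to the guessed rating (tolerance 0) or within one point of it (tolerance 1). The data layout, the `tolerance` parameter, and the `unmatched_penalty` constant are assumptions for illustration, not details taken from the slides.

```python
def rating_matches(guessed, actual, tolerance):
    """True if the candidate rated the movie within `tolerance` of the guess."""
    return actual is not None and abs(actual - guessed) <= tolerance


def score_with_ratings(mention_ratings, candidate, ratings,
                       tolerance=0, unmatched_penalty=0.05):
    """Product over mentions of the fraction of users with no matching rating."""
    n_users = len(ratings)
    score = 1.0
    for movie, guessed in mention_ratings.items():
        if not rating_matches(guessed, ratings[candidate].get(movie), tolerance):
            score *= unmatched_penalty  # assumption: small constant penalty
            continue
        n_matching = sum(
            1 for user_ratings in ratings.values()
            if rating_matches(guessed, user_ratings.get(movie), tolerance)
        )
        score *= (n_users - n_matching) / n_users
    return score


# Hypothetical ratings dataset: user -> {movie: star rating}.
ratings = {
    "u1": {"m1": 5, "m2": 2},
    "u2": {"m1": 4, "m2": 2},
    "u3": {"m1": 1},
    "u4": {"m3": 3},
}
guess = {"m1": 5}  # the analyzer's guess for the target's mention of m1

exact_u1 = score_with_ratings(guess, "u1", ratings, tolerance=0)
exact_u2 = score_with_ratings(guess, "u2", ratings, tolerance=0)
fuzzy_u1 = score_with_ratings(guess, "u1", ratings, tolerance=1)
fuzzy_u2 = score_with_ratings(guess, "u2", ratings, tolerance=1)
```

In this toy data the exact variant separates u1 from u2, while the within-one variant scores them equally, which illustrates why a less precise analyzer gives the attacker less to work with.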

  26. Percent of users identified by different algorithms

  27. 1-identification rate

  28. RQ2: Altering the dataset • How can dataset owners alter the dataset they release to preserve user privacy? • Data Suppression • Algorithm: Drop rarely rated movies • Not a big problem for industry, but harmful for research
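
Dropping rarely rated movies before release can be sketched as below. The threshold parameter `min_raters` and the toy data are assumptions. The slides do not state what cutoff was used.

```python
from collections import Counter


def suppress_rare_items(ratings, min_raters):
    """Remove every movie rated by fewer than `min_raters` users."""
    counts = Counter(movie for rated in ratings.values() for movie in rated)
    return {
        user: {movie for movie in rated if counts[movie] >= min_raters}
        for user, rated in ratings.items()
    }


# Hypothetical ratings dataset: user -> set of rated movies.
ratings = {
    "u1": {"rare", "popular"},
    "u2": {"popular"},
    "u3": {"popular", "mid"},
    "u4": {"mid"},
}
released = suppress_rare_items(ratings, min_raters=2)
```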

  29. Dataset-Level Suppression: Does not work!

  30. RQ3: Self Defence • How can users protect their own privacy? • Suppression • Do not mention rarely rated movies • Misdirection • Mention items they have not rated

  31. User-Level Suppression: Does not work!

  32. Misdirection: Works when users mention popular items

  33. Conclusion • Simple data mining algorithms can identify users who make mentions in a sparse relation space and believe they are anonymous • Possible uses of the algorithms: e.g., finding paper reviewers (future work of Frankowski) • Privacy risks for users on social media sites • Privacy is hard to preserve • Do not reveal private information, even when the setting seems anonymous
