Exploring Linkability of User Reviews

Mishari Almishari and Gene Tsudik, Computer Science Department, University of California, Irvine. {malmisha, gts}@ics.uci.edu

Presentation Transcript


  1. Exploring Linkability of User Reviews • Mishari Almishari and Gene Tsudik • Computer Science Department, University of California, Irvine • {malmisha, gts}@ics.uci.edu

  2. Increasing Popularity of Reviewing Sites • Yelp, more than 39M visitors and 15M reviews in 2010

  3. category Rating

  4. Rising Awareness of Privacy

  5. How Does Privacy Apply to Reviews? • Traceability • Linkability of Ad Hoc Reviews • Linkability of Several Accounts

  6. Contribution • Extensive study to measure privacy/linkability in user reviews • Propose models that adequately identify authors

  7. Settings & Problem Formulation

  8. IR: Identified Record • AR: Anonymous Record • [Diagram of IR and AR labels]

  9. Top-X Linkability • Anonymous Record (AR) size: 1, 5, 10, 20, …, 60 • X: 1 and 10 • Matching Model • Identified Record (IR) size
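The Top-X measure on this slide can be read as the fraction of anonymous records whose true author appears among the first X candidates returned by the matching model. The following is a minimal sketch under that reading; the function name `top_x_linkability` and the toy rankings are hypothetical and do not come from the presentation.

```python
def top_x_linkability(rankings, true_authors, x):
    """Fraction of anonymous records whose true author is in the top x candidates.

    rankings: maps each AR id to a list of candidate authors, best match first.
    true_authors: maps each AR id to the author who actually wrote it.
    """
    hits = sum(
        1 for ar_id, ranked in rankings.items()
        if true_authors[ar_id] in ranked[:x]
    )
    return hits / len(rankings)


# Toy data (made up for illustration only).
rankings = {
    "ar1": ["alice", "bob", "carol"],
    "ar2": ["carol", "bob", "alice"],
    "ar3": ["bob", "alice", "carol"],
    "ar4": ["carol", "alice", "bob"],
}
true_authors = {"ar1": "alice", "ar2": "bob", "ar3": "carol", "ar4": "carol"}

top1 = top_x_linkability(rankings, true_authors, 1)
top2 = top_x_linkability(rankings, true_authors, 2)
```

In the study itself X is 1 and 10; the toy example uses 1 and 2 only because it has three candidates.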

  10. Dataset • 1 Million Reviews • 2000 Users • More than 300 reviews per user

  11. Methodology • Naïve Bayesian Model • Kullback-Leibler Model • Symmetric Version

  12. Naïve Bayesian (NB) • Anonymous Record (AR) • Identified Record (IR) • Decreasing Sorted List of IRs
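A minimal sketch of the Naïve Bayesian ranking described on this slide, assuming unigram (letter) tokens: each IR is scored by the log-probability of the AR's tokens under that IR's token distribution, and the IRs are sorted in decreasing order of score. The add-one smoothing, the function names, and the toy texts are assumptions made for illustration; the slides do not specify them.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def letter_counts(text):
    """Count unigram (single-letter) tokens, ignoring everything else."""
    return Counter(c for c in text.lower() if c in ALPHABET)


def nb_log_score(ar_counts, ir_counts):
    """log P(AR | IR) under a naive independence assumption.

    ASSUMPTION: add-one smoothing, so unseen letters do not yield log(0).
    """
    total = sum(ir_counts.values()) + len(ALPHABET)
    return sum(
        n * math.log((ir_counts[t] + 1) / total)
        for t, n in ar_counts.items()
    )


def rank_by_nb(ar_text, identified):
    """Return IR owners sorted by decreasing NB score (best match first)."""
    ar = letter_counts(ar_text)
    scores = {
        user: nb_log_score(ar, letter_counts(text))
        for user, text in identified.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy identified records (made up for illustration only).
identified = {
    "alice": "zzzz pizza jazz buzz",
    "bob": "eee keep the geese here",
}
nb_ranking = rank_by_nb("fizzy pizza", identified)
```

Here the anonymous text is heavy in the letter "z", so the IR with the z-heavy history ranks first.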

  13. Kullback-Leibler Divergence (KLD) • Anonymous Record (AR) • Identified Record (IR) • Increasing Sorted List of IRs
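A comparable sketch for the KLD model: the token distribution of the AR is compared with that of each IR, and the IRs are sorted in increasing order of divergence. The smoothing scheme and the helper names are assumptions, and the symmetric variant shown (the sum of both directions) is one common symmetrization, not necessarily the one used in the study.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def distribution(text):
    """Smoothed unigram distribution over the 26 letters.

    ASSUMPTION: add-one smoothing keeps every probability non-zero.
    """
    counts = Counter(c for c in text.lower() if c in ALPHABET)
    total = sum(counts.values()) + len(ALPHABET)
    return {t: (counts[t] + 1) / total for t in ALPHABET}


def kld(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)


def symmetric_kld(p, q):
    """ASSUMPTION: symmetrize by summing both directions."""
    return kld(p, q) + kld(q, p)


def rank_by_kld(ar_text, identified, divergence=kld):
    """Return IR owners sorted by increasing divergence (best match first)."""
    p = distribution(ar_text)
    scores = {
        user: divergence(p, distribution(text))
        for user, text in identified.items()
    }
    return sorted(scores, key=scores.get)


# Toy identified records (made up for illustration only).
identified = {
    "alice": "zzzz pizza jazz buzz",
    "bob": "eee keep the geese here",
}
kld_ranking = rank_by_kld("fizzy pizza", identified)
sym_ranking = rank_by_kld("fizzy pizza", identified, divergence=symmetric_kld)
```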

  14. Maximum Likelihood Estimation

  15. Tokens • Unigram: ‘a’, …, ‘z’ • Digram: ‘aa’, ‘ab’, …, ‘zz’ • Rating: 1, 2, 3, 4, 5 • Category: Restaurant, Beauty and Spa, Education
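The lexical tokens listed above can be extracted with a few lines of code. This sketch assumes that text is lowercased, that non-letters are dropped, and that a digram is a pair of adjacent letters with no intervening character; the slides do not state how word boundaries or punctuation were handled.

```python
import string
from collections import Counter

LETTERS = string.ascii_lowercase


def unigrams(text):
    """Count single-letter tokens 'a'..'z'."""
    return Counter(c for c in text.lower() if c in LETTERS)


def digrams(text):
    """Count adjacent letter pairs 'aa'..'zz'.

    ASSUMPTION: pairs that span a space or punctuation mark are skipped.
    """
    lowered = text.lower()
    return Counter(
        a + b
        for a, b in zip(lowered, lowered[1:])
        if a in LETTERS and b in LETTERS
    )


sample = "Good food!"
sample_unigrams = unigrams(sample)
sample_digrams = digrams(sample)

# Vocabulary sizes: 26 unigrams versus 26 * 26 = 676 digrams.
unigram_vocab = len(LETTERS)
digram_vocab = len(LETTERS) ** 2
```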

  16. Lexical Token Results

  17. NB - Unigram • Size 60: LR 83% (Top-1), LR 96% (Top-10)

  18. KLD - Unigram • Size 60: LR 83% (Top-1), LR 96% (Top-10)

  19. NB - Digram • Size 20: LR 97% (Top-1) • Size 10: LR 88% (Top-1)

  20. KLD - Digram • Size 60: LR 99% (Top-1) • Size 30: LR 75% (Top-1)

  21. Improvement (1): Combining Lexical and Non-Lexical Tokens

  22. Combining in NB Model • Straightforward • P(Rating|IR), P(Category|IR) • But for KLD? • Weighted Average

  23. First, combine Rating and Category: weight 0.5 • Second, combine non-lexical and lexical: weight 0.997/0.97 for Unigram/Digram
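The two-step weighted average for KLD can be sketched as follows. The slide gives the weights (0.5, then 0.997 for unigrams or 0.97 for digrams) but does not say which term of the second combination receives the weight; this sketch assigns it to the non-lexical term purely as an illustrative assumption. The function names and input values are hypothetical.

```python
import math


def weighted_average(first, second, weight):
    """weight * first + (1 - weight) * second."""
    return weight * first + (1.0 - weight) * second


def combined_kld(d_rating, d_category, d_lexical, lexical_kind="unigram"):
    """Combine per-token-type divergences into one score.

    Step 1: rating and category are averaged with weight 0.5.
    Step 2: the result is averaged with the lexical divergence.
    ASSUMPTION: the 0.997/0.97 weight is applied to the non-lexical term;
    the slide does not say which side it belongs to.
    """
    non_lexical = weighted_average(d_rating, d_category, 0.5)
    weight = {"unigram": 0.997, "digram": 0.97}[lexical_kind]
    return weighted_average(non_lexical, d_lexical, weight)


# Made-up divergence values for a single (AR, IR) pair.
example_unigram = combined_kld(0.2, 0.4, 0.1, "unigram")
example_digram = combined_kld(0.2, 0.4, 0.1, "digram")
```

If the weight actually belongs to the lexical term, swapping the first two arguments of the second `weighted_average` call is the only change needed.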

  24. Token Combining Results

  25. Rating, Category, and Unigram - NB • Gain up to 20% • Size 30: 60% to 80% • Size 60: 83% to 96%

  26. Rating, Category, and Unigram - KLD • Gain up to 12% • Size 40: 68% to 80% • Size 60: 83% to 92%

  27. Rating, Category, and Digram - NB

  28. Rating, Category, and Digram - KLD

  29. What about Restricting Identified Record (IR) Size?

  30. Top-X Linkability • Anonymous Record (AR) size • X: 1 and 10 • Matching Model • Identified Record (IR) size

  31. Top-X Linkability • Anonymous Record (AR) size • X: 1 and 10 • Matching Model • Identified Record (IR) size

  32. Restricted IR - NB • Affected by IR size

  33. Restricted IR - KLD • Performed better for smaller IR • Size 20 or less: improved • The rest: comparable

  34. What about Matching All AR’s at once?

  35. Top-X Linkability • Anonymous Record (AR) size • X: 1 and 10 • Matching Model • Identified Record (IR) size

  36. Anonymous Records (AR’s) Matching Model Identified Records (IR’s)

  37. Improvement (2): Matching All AR’s At Once

  38. ✖ ✔ ✖ ✖ ✔ ✖ ✖ ✖ ✔
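The slides do not describe how all records are matched at once. One plausible reading, sketched here strictly as an assumption, is a one-to-one assignment: instead of ranking IRs for each AR independently, each IR is allowed to claim at most one AR. The greedy strategy and the name `match_all` are illustrative choices, not the authors' algorithm.

```python
def match_all(distance):
    """Greedy one-to-one assignment of ARs to IRs.

    distance[ar][ir] is a dissimilarity (e.g. a KLD value); lower is better.
    ASSUMPTION: pairs are taken in order of increasing distance, and a pair
    is skipped if its AR or its IR has already been matched.
    """
    pairs = sorted(
        (d, ar, ir)
        for ar, row in distance.items()
        for ir, d in row.items()
    )
    assigned = {}
    used_irs = set()
    for _, ar, ir in pairs:
        if ar not in assigned and ir not in used_irs:
            assigned[ar] = ir
            used_irs.add(ir)
    return assigned


# Made-up distances: matched independently, both ARs would pick "alice".
distance = {
    "ar1": {"alice": 0.10, "bob": 0.30},
    "ar2": {"alice": 0.15, "bob": 0.20},
}
assignment = match_all(distance)
```

In the toy data, "ar2" falls back to "bob" once "alice" has been claimed by the closer "ar1", which is the kind of correction a joint matching can provide.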

  39. MatchAll - Restricted • Gain up to 16% • Size 30: from 74% to 90%

  40. MatchAll - Full • Gain up to 23% • Size 20: from 35% to 55%

  41. Improvement (3): For Small IR Size

  42. Changing it to: + Review Length 0.5

  43. Results - Improvement (3) • Gain up to 5% • Size 10: 89% to 92% • Size 7: 79% to 84%

  44. Discussion • Implications • Cross-Referencing • Review Spam • Non-Prolific Users • Gradually become prolific • IR of 20: linkability around 70% • Anonymous Record Size • Linkability is high even for small ARs (92% for AR of 10) • 60 is only 20% of the minimum user contribution

  45. Discussion (cont.) • Unigram Token • Very comparable for larger AR • Entails fewer resources in the attack: 26 vs. 676 tokens

  46. Future Directions • Improving more for Small AR’s • Other Probabilistic Models • Using Stylometry • Exploring Linkability in other Preference Databases • More than one AR for different Users: Exploring it more

  47. Conclusion • Extensive Study to Assess Linkability of User Reviews • For large set of users • Using very simple features • Users are very exposed even with simple features and large number of authors

  48. Thank you all!
