Agenda • What is (Web) data mining? And what does it have to do with privacy? – a simple view – • Examples of data mining and "privacy-preserving data mining": • Association-rule mining (& privacy-preserving AR mining) • Collaborative filtering (& privacy-preserving collaborative filtering) • A second look at ...privacy • A second look at ...Web / data mining • The goal: More than modelling and hiding – Towards a comprehensive view of Web mining and privacy. Threats, opportunities and solution approaches. • An outlook: Data mining for privacy
Privacy Problems: Example 1 • Technical background of the problem: • The dataset allows for Web mining (e.g., which search queries lead to which site choices), • it violates k-anonymity (e.g., "Lilburn" → a likely k = #inhabitants of Lilburn)
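A minimal sketch of what "violating k-anonymity" means in practice: k is the size of the smallest group of records that share the same quasi-identifier values. The function name `k_anonymity`, the field names, and the toy records are assumptions made up for illustration; they are not taken from the dataset discussed on the slide.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest group of records that
    share identical values on all quasi-identifier fields."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values())

# Hypothetical toy records (illustration only).
records = [
    {"town": "Lilburn", "age_band": "60-69", "query": "landscapers"},
    {"town": "Lilburn", "age_band": "60-69", "query": "dog training"},
    {"town": "Atlanta", "age_band": "20-29", "query": "concert tickets"},
    {"town": "Atlanta", "age_band": "30-39", "query": "used cars"},
]

k_town = k_anonymity(records, ["town"])
k_town_age = k_anonymity(records, ["town", "age_band"])
```

Each additional quasi-identifier can only split groups further, so k shrinks (here from 2 to 1) as more attributes become known to an observer.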
Privacy Problems: Example 2 Where do people live who will buy the Koran soon? • Technical background of the problem: • A mashup of different data sources • Amazon wishlists • Yahoo! People (addresses) • Google Maps • each with insufficient k-anonymity, allows for attribute matching and thereby inferences
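The attribute matching behind such a mashup can be sketched as a join on shared attributes. This is an illustrative sketch only: the function `link_records`, the field names, and all records are hypothetical, and no real service or API is queried.

```python
def link_records(wishlists, directory):
    """Join two sources on shared attributes (name, city).
    Only a unique match (k = 1 in the directory) yields an inference."""
    index = {}
    for entry in directory:
        key = (entry["name"], entry["city"])
        index.setdefault(key, []).append(entry["address"])

    linked = []
    for wish in wishlists:
        addresses = index.get((wish["name"], wish["city"]), [])
        if len(addresses) == 1:  # exactly one candidate: re-identified
            linked.append({"item": wish["item"], "address": addresses[0]})
    return linked

# Hypothetical toy data (illustration only).
wishlists = [
    {"name": "A. Example", "city": "Springfield", "item": "Book X"},
    {"name": "B. Sample", "city": "Shelbyville", "item": "Book Y"},
]
directory = [
    {"name": "A. Example", "city": "Springfield", "address": "1 Main St"},
    {"name": "B. Sample", "city": "Shelbyville", "address": "2 Oak Ave"},
    {"name": "B. Sample", "city": "Shelbyville", "address": "9 Elm Rd"},
]

linked = link_records(wishlists, directory)
```

In this toy data the second person stays ambiguous because two directory entries match (k = 2); the first is linked to a single address because only one entry matches.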
Privacy Problems: Example 3 • Predicting political affiliation from Facebook profile and link data (1): Most Conservative Traits • Lindamood et al. 09 & Heatherly et al. 09
Predicting political affiliation from Facebook profile and link data (2): Most Liberal Traits per Trait Name Lindamood et al. 09 & Heatherly et al. 09
"Privacy-preserving Web mining" example: find patterns, unlink personal data • Volvo S40 website targets people in 20s • Are visitors in their 20s or 40s? • Which demographic groups like/dislike the website? • An example of the "Randomization Approach" to PPDM: R. Agrawal and R. Srikant, "Privacy Preserving Data Mining", SIGMOD 2000.
Randomization Approach Overview • Original records (e.g., 30 | 70K | ..., 50 | 40K | ...) → Randomizer → randomized records (e.g., 65 | 20K | ..., 25 | 60K | ...) → Reconstruct distribution of Age, reconstruct distribution of Salary, ... → Data Mining Algorithms → Model
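The pipeline above can be sketched in a few lines: each value is perturbed with noise before it leaves the user, and the miner recovers only the aggregate distribution. This is a simplified sketch in the spirit of the iterative Bayesian reconstruction, not the exact procedure from the cited paper. The Gaussian noise, the use of bin midpoints, the function names, and the synthetic ages are all assumptions made for illustration.

```python
import math
import random

def randomize(values, sigma, rng):
    """Perturb each value with independent Gaussian noise (std. dev. sigma)."""
    return [v + rng.gauss(0.0, sigma) for v in values]

def reconstruct_distribution(randomized, bin_edges, sigma, iterations=30):
    """Iteratively estimate the probability of each bin of the ORIGINAL
    values, given only the randomized values and the noise distribution.
    Each bin is approximated by its midpoint (a simplification)."""
    mids = [(lo + hi) / 2 for lo, hi in zip(bin_edges, bin_edges[1:])]
    probs = [1.0 / len(mids)] * len(mids)  # start from a uniform estimate
    for _ in range(iterations):
        totals = [0.0] * len(mids)
        for w in randomized:
            # Posterior weight of each bin for this observation (Bayes' rule;
            # the Gaussian normalizing constant cancels out).
            weights = [
                p * math.exp(-((w - m) ** 2) / (2 * sigma ** 2))
                for m, p in zip(mids, probs)
            ]
            norm = sum(weights)
            if norm == 0.0:
                continue
            for i, weight in enumerate(weights):
                totals[i] += weight / norm
        total = sum(totals)
        probs = [t / total for t in totals]
    return probs

# Synthetic visitors: mostly in their 20s, some in their 40s (made-up data).
rng = random.Random(0)
ages = [
    rng.uniform(20, 30) if rng.random() < 0.8 else rng.uniform(40, 50)
    for _ in range(2000)
]
noisy_ages = randomize(ages, sigma=10.0, rng=rng)
edges = [0, 10, 20, 30, 40, 50, 60, 70]
estimate = reconstruct_distribution(noisy_ages, edges, sigma=10.0)
```

Individual noisy ages reveal little about any one visitor, yet the reconstructed histogram can still indicate which age group dominates, which is the kind of question the Volvo example asks.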
What is collaborative filtering? • "People like what people like them like" – regardless of support and confidence
User-based Collaborative Filtering • Idea: People who agreed in the past are likely to agree again • To predict a user’s opinion for an item, use the opinion of similar users • Similarity between users is decided by looking at their overlap in opinions for other items • Next step: build a model of user types • "global model" rather than "local patterns" as mining result
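The idea above can be sketched as a similarity-weighted average of other users' ratings. This is a minimal sketch under stated assumptions: cosine similarity over co-rated items is one common choice (Pearson correlation is another), and the function names and the toy ratings are hypothetical.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity of two users' ratings, over co-rated items only."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def predict(ratings, user, item):
    """Predict user's rating for item as the similarity-weighted mean
    of the ratings given by other users who rated that item."""
    numerator = denominator = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        numerator += sim * other_ratings[item]
        denominator += abs(sim)
    return numerator / denominator if denominator else None

# Hypothetical ratings on a 1-5 scale (illustration only).
ratings = {
    "ann": {"a": 5, "b": 1},
    "bob": {"a": 5, "b": 1, "c": 5},
    "cat": {"a": 1, "b": 5, "c": 1},
}
prediction = predict(ratings, "ann", "c")
```

Because ann agreed with bob on items a and b, bob's opinion of item c counts more than cat's, and the prediction lands closer to bob's rating.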
1. Privacy as confidentiality: "the right to be let alone" – and to hide data • Is this all there is to privacy?
2. Privacy as control: informational self-determination • e.g. data privacy: "the right of the individual to decide what information about himself should be communicated to others and under what circumstances" (Westin, 1970) • behind much of data-protection legislation (see Eleni Kosta’s talk) • [Figure labels: Data, §, "Don’t do THIS!"]
Discussion item: What is this an example of? • Tracing anonymous edits in Wikipedia • http://wikiscanner.virgil.gr/