Lecture 20: Privacy in Online Social Networks. Xiaowei Yang. References: On the Leakage of Personally Identifiable Information Via Online Social Networks by Balachander Krishnamurthy and Craig E. Wills
References: • On the Leakage of Personally Identifiable Information Via Online Social Networks by Balachander Krishnamurthy and Craig E. Wills • Characterizing Privacy in Online Social Networks by Balachander Krishnamurthy and Craig E. Wills
Problem • Online social networks are places for users to share private information • Personally identifiable information (PII) • Information that can be used to distinguish or trace an individual’s identity, either alone or when combined with other information linkable to an individual • Examples of PII • Photos • Status updates • However, this information can be leaked to unintended parties
Today • Measurement studies of the importance of the problem • PII can be leaked to third-party websites, making users’ browsing history linkable • OSN default privacy settings leak PII
USER PRIVACY CONTROLS • Defaults are dangerous • By default, a user’s Facebook profile, content, and comments (such as those on the user’s “Wall”) are viewable by any other user in the user’s networks • Has it changed? • MySpace uses similarly permissive defaults for access to a user’s information: all users have access to all other users’ information.
Do users change their defaults? • A 2005 study found that • only 1.2% of college Facebook users at CMU changed the searchability of their thumbnail profile • 0.06% changed their profile visibility (second row) • 75% of 200 users in the Facebook London regional network have their full profile viewable by other users in the network
Measurement Methodology • MySpace • Generated 5000 random numeric userids in an observed range of valid userids • Retrieved their corresponding user profiles • Bebo • Examined the profiles of users who were members of interest groups within Bebo
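The MySpace sampling step can be sketched as follows. This is a minimal illustration, not the study's actual crawler: the ID bounds are hypothetical (the lecture does not give the observed range), and drawing distinct IDs is an assumption.

```python
import random

def sample_userids(low, high, count, seed=None):
    """Draw `count` distinct numeric IDs uniformly from [low, high]."""
    rng = random.Random(seed)
    return rng.sample(range(low, high + 1), count)

# Hypothetical ID range; the observed bounds are not stated in the lecture.
candidate_ids = sample_userids(1_000_000, 9_999_999, 5000, seed=20)
```

Not every sampled ID maps to a real account, which is why the results report fewer valid userids than the 5000 generated.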
Facebook • Join regional networks • Large and small • Geographic diversity • Linguistic/cultural diversity • Used the random network browsing feature of Facebook to crawl users’ profiles • 10 users are displayed per retrieval • 200 retrievals for each regional network • 1600-1700 users
Results • MySpace • Obtained profile information for 3851 valid userids • 79% (3046) of users retained their default settings • Profile, friends, comments and user content world viewable. • Bebo • 80% of the Bebo users allowed their profile, friends, comments and user content to be viewable.
Observations • Users in smaller networks are less concerned about making private information available • Users place a higher privacy value on profile information than on their list of friends • Wall is the most valuable • 79% of those with a viewable profile allowed their Wall to be viewable to anyone in the network for NY • 83% for Seattle • 95% for the Worcester region.
Information leakage to 3rd party domains • PII is sent to 3rd party domains via HTTP requests • Same PII may be sent to the same 3rd party domains when users browse other websites • Online history traceable
HTTP Background • A cookie is a piece of text stored by a web browser • A cookie is sent as an HTTP header by a web server to a web browser • The web browser sends it back unchanged to the server each time it accesses the server • A cookie makes web browsing stateful • HTTP itself is a stateless request/response protocol
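The cookie round trip can be sketched with Python's standard `http.cookies` module. The cookie name and value below are made up for illustration.

```python
from http.cookies import SimpleCookie

# Server side: build a Set-Cookie header (name and value are hypothetical).
server_cookie = SimpleCookie()
server_cookie["session"] = "abc123"
set_cookie_header = server_cookie.output(header="Set-Cookie:")

# Browser side: store what the server sent ...
jar = SimpleCookie()
jar.load(set_cookie_header.split(": ", 1)[1])

# ... and echo it back unchanged in the Cookie header of later requests.
cookie_header = "; ".join(f"{name}={m.value}" for name, m in jar.items())
```

Because the same value returns on every request, the server can recognize the browser across otherwise independent requests.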
HTTP background (cont.) • An HTTP request contains • The method to be applied to the resource • The Request-URI (the uniform resource identifier of the resource) • The protocol version in use • Example of a request line followed by a Host header: GET /pub/WWW/TheProject.html HTTP/1.1, then Host: www.w3.org
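To make the parts of a request concrete, here is a small sketch that splits a raw request into method, Request-URI, protocol version, and headers. The helper `parse_request` is illustrative only and ignores many details of real HTTP parsing.

```python
def parse_request(raw):
    """Split a raw HTTP request into method, URI, version, and headers."""
    head = raw.split("\r\n\r\n", 1)[0]          # drop any message body
    lines = head.split("\r\n")
    method, uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, uri, version, headers

raw = "GET /pub/WWW/TheProject.html HTTP/1.1\r\nHost: www.w3.org\r\n\r\n"
method, uri, version, headers = parse_request(raw)
```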
HTTP background (cont.) • Referer is a request header field • Specifies to the server the address (URI) of the resource from which the Request-URI was obtained • I.e., which page led the browser to request this URI • Referer allows a server to generate customized content
Sample of Leakage • Friendid is associated with the doubleclick cookie • Other sites the user browses can be linked to the friendid
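A sketch of how a third party could do this linking, assuming it logs its own cookie together with the Referer header of each request. The domains, cookie value, and the `friendid` query parameter layout are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse, parse_qs

def link_ids(requests):
    """Map each tracker cookie to the OSN IDs and sites seen with it."""
    seen = defaultdict(lambda: {"friendids": set(), "sites": set()})
    for cookie_id, referer in requests:
        url = urlparse(referer)
        seen[cookie_id]["sites"].add(url.netloc)
        for fid in parse_qs(url.query).get("friendid", []):
            seen[cookie_id]["friendids"].add(fid)
    return seen

# Hypothetical third-party log entries: (tracker cookie, Referer header).
log = [
    ("ck-42", "http://osn.example/profile?friendid=12345"),
    ("ck-42", "http://news.example/story/7"),
]
profile = link_ids(log)["ck-42"]
```

One request carrying the OSN ID is enough: every other site visited with the same cookie is then tied to that ID.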
Leakage of OSN IDs • z.digg.com is a 3rd party advertisement site
Protection Against PII Leakage • User actions • Providing no PII to OSNs • Filtering HTTP headers • Referer, Cookie • Disallowing cookies • … • Aggregators • Filtering PII • Are they going to do it?
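Header filtering on the user side can be sketched as below. The function name and the rule of filtering only third-party requests are assumptions for illustration; real browser extensions and proxies apply more nuanced policies.

```python
SENSITIVE = {"referer", "cookie"}

def filter_headers(headers, third_party):
    """Drop Referer and Cookie when the request goes to a third party."""
    if not third_party:
        return dict(headers)
    return {k: v for k, v in headers.items() if k.lower() not in SENSITIVE}

# Hypothetical outgoing request to an ad server embedded in an OSN page.
headers = {
    "Host": "ads.example",
    "Referer": "http://osn.example/profile?id=1",
    "Cookie": "uid=1",
    "Accept": "*/*",
}
filtered = filter_headers(headers, third_party=True)
```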
OSNs • Strip PII from HTTP requests • A session specific value for UID • External applications • Similarly, strip PII from HTTP requests
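One possible way to realize a session-specific UID is a keyed hash of the real ID under a per-session secret. This is an assumed construction for illustration, not something the lecture prescribes; the truncation to 16 hex characters is also arbitrary.

```python
import hashlib
import hmac

def session_uid(real_uid, session_key):
    """Derive a per-session pseudonym for `real_uid` (illustrative only)."""
    digest = hmac.new(session_key, real_uid.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]

# Fixed keys keep the example deterministic; a real deployment would
# generate a fresh random key for each session.
key_session_1 = b"a" * 32
key_session_2 = b"b" * 32
uid_1 = session_uid("12345", key_session_1)
uid_2 = session_uid("12345", key_session_2)
```

A third party that sees `uid_1` in one session and `uid_2` in another cannot tell from the values alone that they belong to the same user.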
Problem Not Unique to OSNs • Any site you have an account with can do so • Examples • A news site leaks user email addresses to online aggregators • A travel site embeds a user’s first name and default airport in its cookies, and leaks them to any site hiding in its domain
Conclusion • Eric Schmidt: “If you have something that you don’t want anyone to know, maybe you shouldn’t be doing it in the first place.” • When you click links and browse online, third parties learn a lot more about you than you might think
Discussion • What can be done to improve online user privacy? • Browser isolation • Next lecture: privacy-preserving online advertisements • Law enforcement?
References • Challenges in Measuring Online Advertising Systems by Saikat Guha, Bin Cheng, and Paul Francis • Serving Ads from localhost for Performance, Privacy, and Profit by Saikat Guha, Alexey Reznichenko, Kevin Tang, Hamed Haddadi, and Paul Francis
Problem • Online advertising funds many web services • E.g., all the free stuff we get from Google • Ad networks gather much user information • How do they use the user information?
Goals • Determining how well ad networks target users
Methodology • Creating two clients representing two different user types • Measuring the different ads each client sees
Challenges • How to compare ads • How to collect a representative snapshot of ads • Quantifying the differences • Avoiding measurement artifacts
Comparing Ads is challenging • Ads don’t have unique IDs • A & B are semantically the same, but with different text • A & C are different, but with the same display URL
How to define when two ads are the same? • Easy but illegal approach: comparing destination URLs • FP: flagged as equal but not equal • FN: equal but not flagged • Display URL has the lowest FNs • Use the display URL to define ad equality
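A sketch of display-URL equality and of counting FPs and FNs against hand-labeled pairs. The ad records and labels below are invented, modeled on the A/B/C example; case-insensitive comparison is an assumption.

```python
def same_ad(ad1, ad2):
    """Treat two ads as equal when their display URLs match."""
    return ad1["display_url"].lower() == ad2["display_url"].lower()

def error_counts(pairs):
    """pairs: (ad1, ad2, truly_equal). Returns (false_pos, false_neg)."""
    fp = sum(1 for a, b, eq in pairs if same_ad(a, b) and not eq)
    fn = sum(1 for a, b, eq in pairs if not same_ad(a, b) and eq)
    return fp, fn

# Hypothetical ads: A and B say the same thing in different words,
# C is a different ad that shares A's display URL.
ad_a = {"text": "Cheap shoes", "display_url": "shoes.example"}
ad_b = {"text": "Shoes at low prices", "display_url": "shoes.example"}
ad_c = {"text": "Luxury handbags", "display_url": "shoes.example"}
fp, fn = error_counts([(ad_a, ad_b, True), (ad_a, ad_c, False)])
```

Here the display URL correctly merges A and B but also merges A and C, which is the false-positive cost of keeping false negatives low.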
Taking a Snapshot • More ads exist than can be displayed on any single page • How to determine all ads that may be fed to a user? • Reload the page multiple times • But too many reloads may lead to ad churn: old ads expire, new ads show up
Determining the # of reloads • Reload every 5 seconds • Repeated for 200 queries • Curve becomes linear after more than 10 reloads • This is due to ad churn • Use 10 reloads as the threshold
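The curve in question plots the number of distinct ads seen so far against the number of reloads. A minimal sketch with invented ad identifiers:

```python
def cumulative_unique(reloads):
    """reloads: one set of ad keys per page reload, in order."""
    seen, curve = set(), []
    for ads in reloads:
        seen |= ads
        curve.append(len(seen))
    return curve

# Hypothetical ad sets observed on four consecutive reloads.
curve = cumulative_unique([{"a", "b"}, {"b", "c"}, {"a", "c"}, {"d"}])
```

Early reloads mostly reveal ads from the current pool, so the curve flattens; steady linear growth afterwards suggests new ads entering through churn rather than undiscovered ads from the original pool.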
Quantifying Change • Metrics • Jaccard index: |A ∩ B| / |A ∪ B| • Extended Jaccard index (cosine similarity)
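The plain Jaccard index of two ad sets is the size of their intersection divided by the size of their union. A short sketch; returning 1.0 for two empty sets is a convention chosen here, not something from the lecture.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections of ad keys."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen for this sketch
    return len(a & b) / len(a | b)

# Two hypothetical snapshots sharing two of four distinct ads.
overlap = jaccard({"x", "y", "z"}, {"y", "z", "w"})
```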
Comparing Effectiveness • Views: # of page reloads containing the ad • Value: # of page reloads scaled by the position of the ad • Overlap: Jaccard index
The winner is • Weight: log(views) or log(value)
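A sketch of a weighted overlap using log(views) as the weight. The formula below is the common vector form of the extended Jaccard index, A·B / (|A|² + |B|² − A·B); that this matches the exact form used in the lecture is an assumption, as are the view counts.

```python
import math

def log_weights(views):
    """Weight each ad by log(views); an ad seen once gets weight 0."""
    return {ad: math.log(v) for ad, v in views.items()}

def extended_jaccard(wa, wb):
    """A.B / (|A|^2 + |B|^2 - A.B) over ad -> weight mappings."""
    dot = sum(wa[k] * wb[k] for k in wa.keys() & wb.keys())
    norm_a = sum(v * v for v in wa.values())
    norm_b = sum(v * v for v in wb.values())
    denom = norm_a + norm_b - dot
    return dot / denom if denom else 1.0

# Hypothetical view counts for two client instances.
views_a = {"ad1": 8, "ad2": 4}
views_b = {"ad1": 8, "ad2": 4}
views_c = {"ad3": 5}
same = extended_jaccard(log_weights(views_a), log_weights(views_b))
disjoint = extended_jaccard(log_weights(views_a), log_weights(views_c))
```

With log weights, an ad shown on nearly every reload counts for more than one shown occasionally, but not in direct proportion to its view count.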
Avoiding artifacts • Different system parameters may lead to different ad views • Browsers use different DNS servers • Browsers receive different cookies • HTTP proxy
Analysis • Configure two or more instances to differ by one parameter • Comparing results for • Search Ads • Website Ads • Online Social Network Ads
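The one-parameter comparison can be sketched as follows: the overlap between two identical control instances gives a noise baseline, and a treatment instance shows targeting only if its overlap with a control falls clearly below that baseline. The ad sets are invented and the function names are illustrative.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two ad sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 1.0

def targeting_signal(control_a, control_b, treatment):
    """Return (baseline, observed, drop) in overlap relative to control."""
    baseline = jaccard(control_a, control_b)   # noise between identical setups
    observed = jaccard(control_a, treatment)   # setup differing in one parameter
    return baseline, observed, baseline - observed

# Hypothetical ad snapshots for instances A, B (controls) and C (treatment).
baseline, observed, drop = targeting_signal(
    {1, 2, 3, 4}, {1, 2, 3, 5}, {1, 6, 7, 8}
)
```

Two controls are needed because even identical instances rarely see exactly the same ads.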
Search Ads • A, B: control w/o cookies • C, D: w/ cookies enabled, seeded w/ different personae • Issued 730 random product-related queries to Google over 5 days • No obvious behavioral targeting in search ads. Why? • Keyword-based ad bidding • Location targeting not studied
Website Ads • Measure 15 websites that show Google ads • A, B: control in NY • C: SF; D: Germany • Location affects web ads
Website Ads • A, B: control • C: browses 3 out of 15 websites • D and E: browse random websites and run random Google searches • Google does not use browsing behavior to pick ads
Online social network ads • Set up three or more Facebook profiles • A, B: control and identical • C: differs from A by one profile parameter