Lecture 21: Privacy and Online Advertising
References • Challenges in Measuring Online Advertising Systems by Saikat Guha, Bin Cheng, and Paul Francis • Serving Ads from localhost for Performance, Privacy, and Profit by Saikat Guha, Alexey Reznichenko, Kevin Tang, Hamed Haddadi, and Paul Francis
Problem • Online advertising funds many web services • E.g., all the free stuff we get from Google • Ad networks gather much user information • How do they use the user information?
Goals • Determining how well ad networks target users
Methodology • Creating two clients representing two different user types • Measuring the different ads each client sees
Challenges • How to compare ads • How to collect a representative snapshot of ads • Quantifying the differences • Avoiding measurement artifacts
Comparing Ads is challenging • Ads don’t have unique IDs • A & B are semantically the same, but with different text • A & C are different, but with same display URLs
How to decide whether two ads are the same? • Easy but illegal approach: comparing destination URLs • FP (false positive): flagged as equal but actually different • FN (false negative): actually equal but not flagged • Display URL has the lowest FNs • Use display URL to define ad equality
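A minimal sketch of display-URL equality, assuming ads are scraped into simple records. The `Ad` fields and the normalization steps (lowercasing, dropping the scheme, `www.`, and trailing slash) are illustrative assumptions, not the procedure from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ad:
    # Hypothetical record for one scraped ad.
    title: str
    text: str
    display_url: str


def ad_key(ad: Ad) -> str:
    """Normalize the display URL so trivial variations compare equal (assumed rules)."""
    url = ad.display_url.strip().lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")


def same_ad(a: Ad, b: Ad) -> bool:
    """Two ads count as the same when their normalized display URLs match."""
    return ad_key(a) == ad_key(b)
```

Under this rule, two ads with different text but one display URL are treated as equal, which is the false-positive case from the previous slide (ads A & C).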
Taking a Snapshot • More ads are available than can be displayed on any single page • How to determine all ads that may be fed to a user? • Reload the page multiple times • But too many reloads may lead to ad churn: old ads expire, new ads show up
Determining the # of reloads • Reloads every 5 seconds • Repeated for 200 queries • Curve (unique ads seen vs. number of reloads) becomes linear beyond 10 reloads • Linear growth is due to ad churn • Use 10 reloads as the threshold
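The reload experiment can be sketched as counting how many distinct ads have been seen after each reload. This assumes each reload has already been reduced to a set of ad keys (for example, display URLs); the function names are hypothetical.

```python
def cumulative_unique(reloads):
    """Return the number of distinct ads seen after each successive reload."""
    seen = set()
    counts = []
    for ads in reloads:
        seen.update(ads)
        counts.append(len(seen))
    return counts


def snapshot(reloads, max_reloads=10):
    """Union of the ads from the first max_reloads reloads (10 is the slide's threshold)."""
    result = set()
    for ads in reloads[:max_reloads]:
        result.update(ads)
    return result
```

Plotting `cumulative_unique` against the reload number gives the curve whose shape is used to choose the threshold.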
Quantifying Change • Metrics • Jaccard index: |A ∩ B| / |A ∪ B| for ad sets A and B • Extended Jaccard index (cosine similarity)
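A small sketch of the two metrics. The set version is the standard Jaccard index. The weighted version uses one common definition of the extended Jaccard (Tanimoto) coefficient; the slide's own formula did not survive, so treat that choice as an assumption.

```python
def jaccard(a, b):
    """Standard Jaccard index |A ∩ B| / |A ∪ B| for two sets of ad keys."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # Convention assumed here: two empty snapshots are identical.
    return len(a & b) / len(a | b)


def extended_jaccard(wa, wb):
    """Extended Jaccard (Tanimoto) over weight dicts: a.b / (|a|^2 + |b|^2 - a.b)."""
    dot = sum(w * wb.get(ad, 0.0) for ad, w in wa.items())
    norm_a = sum(w * w for w in wa.values())
    norm_b = sum(w * w for w in wb.values())
    denom = norm_a + norm_b - dot
    return dot / denom if denom else 1.0
```

With all weights equal to 1, `extended_jaccard` reduces to the plain set-based `jaccard`.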
Comparing Effectiveness • Views: # of page reloads containing the ad • Value: # of page reloads scaled by the position of the ad • Overlap: Jaccard index
The winner is • Weight: log(views) or log(value)
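As a rough illustration of the Views and Value metrics and the log weighting, the sketch below takes each reload as an ordered list of ad keys, top position first. Scaling by 1/position is an assumption; the slides do not say how position is applied.

```python
import math
from collections import Counter


def views_and_value(reloads):
    """Compute per-ad views and position-scaled value from ordered reloads."""
    views = Counter()
    value = Counter()
    for ads in reloads:
        seen_this_reload = set()
        for position, ad in enumerate(ads, start=1):
            if ad in seen_this_reload:
                continue  # Count an ad at most once per reload.
            seen_this_reload.add(ad)
            views[ad] += 1
            value[ad] += 1.0 / position  # Assumed scaling: higher slots count more.
    return views, value


def log_weights(metric):
    """Weight each ad by the log of its metric, i.e. log(views) or log(value)."""
    return {ad: math.log(v) for ad, v in metric.items() if v > 0}
```

Note that with log(views) an ad seen on only one reload gets weight 0, so rarely shown ads contribute little to a weighted comparison.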
Avoiding artifacts • Differences in system parameters may lead to different ads being shown • Browsers use different DNS servers • Browsers receive different cookies • HTTP proxy
Analysis • Configure two or more instances to differ by one parameter • Comparing results for • Search Ads • Website Ads • Online Social Network Ads
Search Ads • A, B: control w/o cookies • C, D: w/ cookies enabled, seeded w/ different personae • Search Google with 730 random product-related queries over 5 days • No obvious behavioral targeting in search ads. Why? • Keyword-based ad bidding • Location targeting not studied
Website Ads • Measure 15 websites that show Google ads • A, B: control in NY • C: SF; D: Germany • Location affects web ads
Website Ads • A, B: control • C: browse 3 out of 15 websites • D and E: browse random websites and Google search random websites • Google does not use browsing behavior to pick ads
Online social network ads • Set up three or more Facebook profiles • A, B: control and identical • C: differs from A by one profile parameter
Online social network ads • Facebook uses all profile parameters to customize ads • Age and gender are the two primary factors • Diurnal patterns due to ad churn • Should it increase or decrease? • Education and relationship matter less, except for engaged and non-engaged women
Checking Impact of Sexual Preference • Six profiles with different sexual preferences • Two males interested in females (male control) • Two females interested in males (female control) • One male interested in male • One female interested in female
Other results • Found neutral ads targeted exclusively to gay men • Clicking would reveal to the advertiser a user’s sexual preference • 66 ads shown exclusively to gay men more than 50 times during experiments
Summary • Search ads are largely keyword-based so far • Website ads use location but probably not behavior • Social network ads use all profile attributes to target users
Question: how can we design a privacy-preserving online advertising system?
Goals • Support online advertising • A good revenue source to fund online services • Preserve user privacy
PrivAd • Serving Ads from a localhost client • Actors: user, publisher, advertiser, broker, and dealer
How it works • Advertisers upload ads to the broker • The user client subscribes with the broker to a set of ads chosen according to the user’s profile • The subscription message is encrypted with the broker’s public key and contains a symmetric key • The broker sends the filtered ads to the user client • Ads are encrypted with the symmetric key • The dealer anonymizes the client’s messages to the broker
Ad View/Click Reporting • When a user views or clicks an ad, the user client sends a view/click report containing the ad ID and publisher ID to the broker via the dealer • The dealer attaches a unique report ID, removes client identity information, and records a mapping from the report ID to the user’s identity
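The relay step can be sketched as follows. All class and field names are hypothetical, and the sketch leaves out the encryption that hides the report contents from the dealer; it only shows the report-ID bookkeeping.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Report:
    # Contents of a view/click report as listed on the slide.
    ad_id: str
    publisher_id: str


@dataclass
class Broker:
    received: list = field(default_factory=list)

    def receive(self, report_id, report):
        # The broker sees the report and its ID, never the client identity.
        self.received.append((report_id, report))


@dataclass
class Dealer:
    broker: Broker
    report_owner: dict = field(default_factory=dict)

    def relay(self, client_id, report):
        """Attach a fresh report ID, remember who sent it, forward without identity."""
        report_id = uuid.uuid4().hex
        self.report_owner[report_id] = client_id
        self.broker.receive(report_id, report)
        return report_id
```

Only the dealer holds the `report_owner` mapping, so the broker cannot tie a report back to a user on its own.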
Click-fraud defense • The broker gives the dealer the report IDs of reports it suspects of click fraud • The dealer uses them to find the user • The dealer stops relaying ads to the user if convinced • Questions not answered: how the broker detects click fraud, and what the punishment is
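One way to picture the dealer's side of this defense: look up the suspicious report IDs in the dealer's report-ID-to-client mapping and block clients that appear often. The slides do not say what convinces the dealer, so the count threshold here is purely an assumption, as are the names.

```python
from collections import Counter


def clients_to_block(report_owner, suspicious_report_ids, threshold=3):
    """Return clients linked to at least `threshold` suspicious reports.

    report_owner: dealer's mapping from report ID to client identity.
    threshold: assumed decision rule; the source leaves this open.
    """
    hits = Counter(
        report_owner[rid]
        for rid in suspicious_report_ids
        if rid in report_owner  # Ignore IDs the dealer never issued.
    )
    return {client for client, n in hits.items() if n >= threshold}
```

The broker still never learns which user was blocked; it only hands over report IDs.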
Defining User Privacy • Unlinkability • No single player can link the identity of a user with any piece of the user’s profile • No single player can link together more than some limited number of pieces of personalization information of a given user • The dealer learns that User A clicked on some ad • The broker learns that someone clicked on ad X • Not robust to dealer/broker collusion
Scaling PrivAd • Ad churn is significant • 2GB/month of compressed ad data
Discussion • What challenges might PrivAd face in a practical deployment?