An Analysis of P3P Deployment • Hyun Jin Kim • Sensitive Information in a Wired World • November 11, 2003
Introduction • Privacy Policies • US self-regulatory approach to online privacy protection • Description of a company’s data practices • What information they collect from individuals and what they do with it
P3P Specifications • Developed by World Wide Web Consortium (W3C) over 5 years of work • Became an official W3C “Recommendation” just over a year ago on April 16, 2002
P3P Evaluation System Design • Automated process to measure P3P adoption and gather data from P3P-enabled web sites • By Lorrie Faith Cranor, Simon Byers, and David Kormann (AT&T Labs-Research) • Five major components • URL Collection Mechanism • P3P Policy Retriever • Scripted Interface to the W3C P3P Validator • P3P Policy Evaluator • Generic Data Analysis Tools
URL Collector • To identify sets of sites of interest • Existing lists of URLs • Newly constructed lists that focus on particular web sites • Web spidering technique • Gather information from web directories and other sources
P3P Policy Retriever • Perl script to retrieve P3P information • All policies, policy reference files, compact header policies
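The retriever described in the slides was a Perl script that is not reproduced here. As a rough illustration of what such a tool looks for, the Python sketch below builds the well-known policy reference file location defined by P3P 1.0 (`/w3c/p3p.xml`) and splits a `P3P` response header into its `policyref` URI and compact-policy tokens. The function names and the example host are assumptions made for illustration, and the sketch performs no network access.

```python
import re
from urllib.parse import urljoin

# Well-known location of a policy reference file under P3P 1.0.
WELL_KNOWN_PATH = "/w3c/p3p.xml"


def well_known_url(site_url: str) -> str:
    """Return the well-known policy reference file URL for a site (hypothetical helper)."""
    return urljoin(site_url, WELL_KNOWN_PATH)


def parse_p3p_header(value: str) -> dict:
    """Split a P3P header value into a policyref URI and compact-policy tokens.

    Expects the usual form: policyref="<uri>", CP="TOKEN TOKEN ..."
    Either part may be absent.
    """
    fields = dict(re.findall(r'(\w+)\s*=\s*"([^"]*)"', value))
    return {
        "policyref": fields.get("policyref"),
        "compact_policy": fields.get("CP", "").split(),
    }


if __name__ == "__main__":
    # Illustrative header value, not taken from any surveyed site.
    header = 'policyref="http://www.example.com/w3c/p3p.xml", CP="NOI DSP COR"'
    print(well_known_url("http://www.example.com/shop/index.html"))
    print(parse_p3p_header(header))
```

A real retriever would also follow policy references embedded in HTML `link` elements and then fetch each referenced policy; those steps are omitted here.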
P3P Validator • W3C P3P Validator • Fetches P3P policy reference files, policy files and compact policies • Checks them for compliance with the P3P 1.0 Specification • Stops validation upon encountering an error • Scripted interface to the W3C P3P Validator • Retrieve P3P policies from sites with errors in their policy reference files
P3P Policy Evaluator • Compares a web site’s policy with a user’s privacy preferences • Finds a mismatch between the P3P policy and the privacy preferences
Data Analysis • Outputs of policy evaluations gathered in a rectangular matrix • Row – policy from a web site • Column – APPEL rule set file • Run a Perl script over the matrix • Produce various tabulations • e.g., number of sites that returned a mismatch between privacy preferences and P3P policies
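The tabulation step can be pictured with a small Python sketch (the study itself used a Perl script). It assumes a matrix with one row per site policy and one column per APPEL rule set, where `True` marks a mismatch. The site names, rule set names, and values below are invented for illustration and are not results from the study.

```python
from collections import Counter

# Hypothetical column labels, one per APPEL rule set file.
RULESETS = ["low", "medium", "high"]

# Hypothetical evaluation matrix: True means the policy evaluator reported
# a mismatch between the site's P3P policy and that rule set.
RESULTS = {
    "site-a.example": [False, False, True],
    "site-b.example": [True, True, True],
    "site-c.example": [False, True, True],
}


def mismatches_per_ruleset(matrix: dict, rulesets: list) -> dict:
    """Count, for each rule set (column), how many sites (rows) mismatched."""
    counts = Counter()
    for row in matrix.values():
        for name, mismatch in zip(rulesets, row):
            if mismatch:
                counts[name] += 1
    # Report every rule set, including those with zero mismatches.
    return {name: counts[name] for name in rulesets}


if __name__ == "__main__":
    for name, count in mismatches_per_ruleset(RESULTS, RULESETS).items():
        print(f"{name}: {count} of {len(RESULTS)} sites mismatched")
```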
Web Site Selection • Focus on the sites frequently visited by users • PFF Most Popular • 85 of the 100 busiest sites determined by the October 2001 Nielsen/NetRatings ranking of sites with the most unique visitors per month • Excludes adult sites, children’s sites, business-to-business sites, and sites not in the .com top level domain • PFF Random • Random sample of 302 of the 7821 domains with at least 39,000 unique monthly visitors in October 2001 by Nielsen/NetRatings • PFF Refined Random • 209 domains from the PFF Random list that were in the top 5,625 domains in October 2001 by Nielsen/NetRatings • Excludes adult sites, children’s sites, business-to-business sites, and non-dot-coms • Netscore Top 500 • 500 domains with the most unique visitors during July 2002 by comScore Media Metrix netScore Standard Traffic Measurement report • Key Measures • Top 500 domains with the most unique visitors during July 2002 by comScore Media Metrix Key Measures report • Includes “third-party” sites
Web Site Selection (Cont.) • Alexa • Top 500 domains by Alexa Traffic Ranking on Feb. 4, 2003 • Includes non-US domains and adult sites • Froogle • 1,017 sites obtained by crawling the www.froogle.com web site in April 2003 • Sites offer products for sale • Yahooligans • 900 sites obtained by crawling www.yahooligans.com in April 2003 • Sites for children ages 7-12 • Firstgov • 344 government sites indexed at www.firstgov.gov in April 2003 • Includes US federal and state government sites and sites for some quasi-government organizations • News • 2,429 sites obtained from news.google.com in April 2003 • Includes a variety of news-reporting organizations from the US and other countries
P3P Adoption (Cont.) • P3P adoption increasing over time • Highest for the most popular web sites • Adoption on the Key Measures list higher than on the Netscore list • Presence of “third-party” sites • To avoid having their cookies blocked by IE6 • Adoption on the Alexa top 500 list lowest • International nature • Large number of adult sites • One third of the P3P-enabled sites had errors flagged by the W3C P3P Validator • 7% had errors that prevented their evaluation by the Privacy Bird evaluation engine • Omitting required components of a P3P policy • Improperly referencing data elements
Privacy Bird Evaluation • Definition of not sharing data • Sites share data only with agents that use it only to complete the transaction for which it was provided or with delivery companies • Data sharing occurs only under an opt-in policy • 3 standard settings • Low • Trigger a red bird – policy does not match the preferences • Collects health/medical info • Share it with other companies • Use it for analysis, marketing or to make decisions about what content or ads the user sees • Engage in marketing but do not provide a way to opt out
Privacy Bird Evaluation (Cont.) • Medium • Same as low • Sites sharing PII (physical contact info, online contact info, government-issued identifier), financial info, or purchase info with other companies • Sites collecting PII but providing no access provisions • High • Same as medium • Sites sharing any personal info (including non-identified info) with other companies • Use it to determine the user’s habits, interests, or other characteristics • Sites contacting users for marketing • Sites using financial or purchase info for analysis, marketing, or to make decisions that may affect what content or ads the user sees
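Because each setting inherits the triggers of the one below it, the three settings behave like nested sets. The Python sketch below illustrates only that nesting. The trigger labels paraphrase a subset of the bullets above and are hypothetical, not identifiers from Privacy Bird or APPEL, and the real evaluator matches APPEL rules against full P3P policies rather than simple labels.

```python
# Hypothetical trigger labels paraphrasing the slides (not Privacy Bird names).
LOW = {
    "health_info_shared_or_used_for_marketing",
    "marketing_without_opt_out",
}
# Medium adds its own triggers on top of everything in low.
MEDIUM = LOW | {
    "shares_pii_financial_or_purchase_info",
    "collects_pii_without_access",
}
# High adds its own triggers on top of everything in medium.
HIGH = MEDIUM | {
    "shares_any_personal_info",
    "contacts_users_for_marketing",
    "uses_financial_or_purchase_info_for_marketing",
}

SETTINGS = {"low": LOW, "medium": MEDIUM, "high": HIGH}


def bird_color(site_practices: set, setting: str) -> str:
    """Return 'red' if any site practice matches a trigger for the setting, else 'green'."""
    return "red" if site_practices & SETTINGS[setting] else "green"


if __name__ == "__main__":
    # A hypothetical site that contacts users for marketing but offers opt-out.
    site = {"contacts_users_for_marketing"}
    for setting in ("low", "medium", "high"):
        print(setting, bird_color(site, setting))
```

Under these assumptions, a site can pass the low setting and still fail the high one, since the high setting's trigger set is a superset of the others.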
Privacy Bird Evaluation (Cont.) • Red bird on 24% of the evaluated sites • No ability to opt out of marketing and/or telemarketing offered • Most popular sites receive both green bird on low setting and red bird on high setting • Green bird - Greater awareness of the importance of the “choice” principle • Red bird - Most offer rich ecommerce environments that rely heavily on targeted marketing and profiling visitors • Red birds on Froogle and Yahooligans most likely • Collect health and medical info
Types of Data Collected (Cont.) • Most collected data • Computer info and click stream info • HTTP protocol used for retrieving content from website • Demographic data • Less by Froogle and gov’t web sites • Online contact info, physical contact info, interactive data, unique ids • Mostly by news web sites • Preference info, purchase info, and state management info (cookies) • Fewer collected financial info (excludes purchase process) • Least collected data • Content (email msgs, bulletin board postings, etc.) • Government-issued identifiers • Health information • Political information • Location information (e.g., GPS positioning data) • Information not falling into any other pre-defined categories • No government websites collect government-issued identifiers
Data Usage (Cont.) • Almost all websites used data for • Completion and support of the activity for which data was provided • Web site and system administration • Research and development • Majority of sites used data for • Email and postal mail marketing • One-time tailoring of the site content • Two forms of pseudonymous profiling • Fewer sites used data for • Telemarketing • Profiling in which individuals are identified by name or other PII • Very few sites used data for • Historical preservation (Not by government sites) • Other purposes that do not fall into these categories • News web sites use data for almost every purpose.
Data Recipients and Sharing (Cont.) • Half the websites share PII with parties other than agents who use data for the purpose for which it was provided • Most likely by • News web sites • Froogle list sites with delivery company • Least likely by • Government web sites
Choice Options (Cont.) • Top sites more likely to engage in marketing than less popular sites • Top sites most likely to offer choices (opt-in/opt-out) • Internal choices (telemarketing and other marketing) offered more opt-out than opt-in • Third-party choices offered more opt-in than opt-out
Access Provisions (Cont.) • 92% of sites collecting identified data provide some access provisions • Most provide access to both contact info and other data • Smaller number provide access to only contact info or to all identified data • Very few provide no access • None provide access only to non-contact info
Dispute Resolution Options and Remedies • Individuals can contact customer service to resolve their disputes on most sites • About one-third offered resolution via an independent organization (e.g., a privacy seal provider) • Mostly by the most popular sites • Very few indicated resolution of dispute under an applicable law • Almost none indicated resolution in court
Data Retention Policies (Cont.) • Majority did not have a data retention policy for all of the data they collected • Government web sites more likely to have a policy of not retaining info or to have a retention policy based on a legal requirement
Conclusion • P3P adoption is increasing over time, especially for the most popular web sites • Yahooligans (sites for children) most likely to offer opt-in policies • Large number of websites with technical errors in their P3P policies • Debates continue about the need for further privacy legislation and the effectiveness of industry self-regulation in the privacy area. • Essential to have good statistics and privacy policies • US government web sites began posting P3P policies to comply with the privacy requirements of section 208 of the E-Government Act of 2002 • Continue web sweeps of gov’t web sites to monitor compliance with these requirements