Research Ethics in the 2.0 Era: Conceptual Gaps for Ethicists, Researchers, IRBs

Michael Zimmer, PhD School of Information Studies University of Wisconsin-Milwaukee zimmerm@uwm.edu http://michaelzimmer.org Secretary’s Advisory Committee on Human Research Protections July 21, 2010

My Perspective • Approaching the problem of “The Internet in Human Subjects Research” from the field of information ethics • Focus on how 2.0 tools, environments, and experiences are creating new conceptual gaps in our understanding of: • Privacy • Anonymity vs. Identifiability • Consent • Harm

Illuminating Cases • Tastes, Ties, and Time (T3) Facebook data release • Pete Warden’s harvesting (and proposed release) of public Facebook profiles • Question of consent for using “public” Twitter streams • Library of Congress archiving “public” Twitter streams

T3 Facebook Project • Tastes, Ties, and Time research project sought to understand social network dynamics of large groups of students • Solution: Work with Facebook & an “anonymous” University to harvest the Facebook profiles of an entire cohort of college freshmen • Repeat each year for their 4-year tenure • Co-mingle with other University data (housing, major, etc) • Coded for race, gender, political views, cultural tastes, etc

T3 Data Release • As an NSF-funded project, the dataset was made publicly available • First phase released September 25, 2008 • One year of data (n=1,640) • Prospective users must submit application to gain access to dataset • Detailed codebook available for anyone to access

“Anonymity” of the T3 Dataset “All the data is cleaned so you can’t connect anyone to an identity” • But the dataset had unique cases (based on codebook) • If the source university could be identified, individuals could potentially be identified • It took me minimal effort to discern the source was Harvard • The anonymity and privacy of subjects in the study are jeopardized

T3 Good-Faith Efforts to Protect Subject Privacy • Only those data that were accessible by default by each RA were collected • Removing/encoding of “identifying” information • Tastes & interests (“cultural fingerprints”) will only be released after “substantial delay” • To download, must agree to “Terms and Conditions of Use” statement • Reviewed & approved by Harvard’s IRB

1. Only those data that were accessible by default by each RA were collected “We have not accessed any information not otherwise available on Facebook” • False assumption that because the RA could access the profile, it was “publicly available” • RAs were Harvard graduate students, and thus part of the “Harvard network” on Facebook

2. Removing/encoding of “identifying” information “All identifying information was deleted or encoded immediately after the data were downloaded” • While names, birthdates, and e-mails were removed… • Various other potentially “identifying” information remained • Ethnicity, home country/state, major, etc. • The AOL and Netflix cases taught us that nearly any data can potentially be “identifying”
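
The point about remaining attributes can be illustrated with a minimal sketch. It assumes a handful of invented records and a hypothetical helper, `unique_cases`; none of the values come from the T3 dataset or its codebook. It simply counts how many records are the only ones with a given combination of seemingly harmless attributes:

```python
from collections import Counter

# Hypothetical records: invented values, not drawn from the T3 dataset.
records = [
    {"gender": "F", "home_state": "WI", "major": "Economics"},
    {"gender": "F", "home_state": "WI", "major": "Economics"},
    {"gender": "M", "home_state": "WI", "major": "Economics"},
    {"gender": "M", "home_state": "MT", "major": "Classics"},
    {"gender": "F", "home_state": "DE", "major": "Physics"},
]


def unique_cases(rows, quasi_identifiers):
    """Return rows whose combination of quasi-identifier values occurs exactly once."""
    def key(row):
        return tuple(row[q] for q in quasi_identifiers)

    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] == 1]


# No names or e-mails are present, yet some combinations single out one person.
singletons = unique_cases(records, ["gender", "home_state", "major"])
print(len(singletons), "of", len(records), "records are unique")
```

In this toy data no single attribute isolates anyone, but the three together do. Anyone who knows the source institution and one such combination could, in principle, narrow a record to a single student.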

3. Tastes & interests will only be released after “substantial delay” T3 researchers recognize the unique nature of the cultural taste labels: “cultural fingerprints” • Individuals might be uniquely identified by what they list as a favorite book, movie, restaurant, etc. • Steps taken to mitigate this privacy risk: • In initial release, cultural taste labels assigned random numbers • Actual labels to be released after a “substantial delay”, in 2011

3. Tastes & interests will only be released after “substantial delay” • But, is 3 years really a “substantial delay”? • Subjects’ privacy expectations don’t expire after an artificially imposed timeframe • Datasets like these are often used years after their initial release, so the delay is largely irrelevant • T3 researchers will also provide immediate access on a “case-by-case” basis • No details given, but this seemingly contradicts any stated concern over protecting subject privacy
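
A rough sketch of why replacing taste labels with random numbers offers limited protection. It assumes invented profiles and hypothetical helpers (`encode_labels`, `unique_fingerprints`) and does not reproduce the T3 researchers’ actual procedure. The encoding hides what each label says, but it leaves untouched which subjects hold a combination of tastes that nobody else shares:

```python
import random

# Hypothetical profiles: invented subjects and tastes, not T3 data.
profiles = {
    "s1": {"Book A", "Film X"},
    "s2": {"Book A", "Film Y"},
    "s3": {"Book A", "Film X"},
    "s4": {"Book B", "Film Z", "Band Q"},
}


def encode_labels(profiles, seed=0):
    """Replace every taste label with a distinct random number."""
    rng = random.Random(seed)
    labels = sorted({label for tastes in profiles.values() for label in tastes})
    codes = rng.sample(range(10_000), len(labels))  # distinct codes, one per label
    mapping = dict(zip(labels, codes))
    return {sid: {mapping[label] for label in tastes} for sid, tastes in profiles.items()}


def unique_fingerprints(profiles):
    """Return subject ids whose full set of tastes is shared with no one else."""
    fingerprints = [frozenset(tastes) for tastes in profiles.values()]
    return sorted(
        sid for sid, tastes in profiles.items()
        if fingerprints.count(frozenset(tastes)) == 1
    )


encoded = encode_labels(profiles)
print(unique_fingerprints(profiles))
print(unique_fingerprints(encoded))
```

Both calls report the same subjects as unique. Once the actual labels are released, or obtained through “case-by-case” access, those fingerprints become readable again.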

4. “Terms and Conditions of Use” statement • I will use the dataset solely for statistical analysis and reporting of aggregated information, and not for investigation of specific individuals…. • I will produce no links…among the data and other datasets that could identify individuals… • I will not knowingly divulge any information that could be used to identify individual participants • I will make no use of the identity of any person or establishment discovered inadvertently.

4. “Terms and Conditions of Use” statement • The language within the TOS clearly acknowledges the privacy implications of the T3 dataset • Might help raise awareness among potential researchers; appease IRB • But “click-wrap” agreements are notoriously ineffective at changing behavior • Unclear how the T3 researchers specifically intend to monitor or enforce compliance • There has already been one research paper that likely violates the TOS

5. Reviewed & Approved by IRB • “Our IRB helped quite a bit as well. It is their job to insure that subjects’ rights are respected, and we think we have accomplished this” • “The university in question allowed us to do this and Harvard was on board because we don’t actually talk to students, we just accessed their Facebook information”

5. Reviewed & Approved by IRB • For the IRB, downloading Facebook profile information seemed less invasive than actually talking with subjects • Did IRB know unique, personal, and potentially identifiable information was present in the dataset? • Consent was not needed since the profiles were “freely available” • But RA access to restricted profiles complicates this; did IRB contemplate this? • Is putting information on a social network “consenting” to its use by researchers?


Pete Warden Facebook Dataset • Exploited flaw in FB’s architecture to access and harvest public profiles of 215 million users (without needing to log in) • Impressive analyses at aggregate levels • Planned to release entire dataset (with names, locations, etc.) to academic community • Later destroyed data under threat of lawsuit from Facebook http://michaelzimmer.org/2010/02/12/why-pete-warden-should-not-release-profile-data-on-215-million-facebook-users/

Harvesting Public Twitter Streams • Is it ethical for researchers to follow and systematically capture public Twitter streams without first obtaining specific, informed consent from the subjects? • Are tweets publications, or utterances? • Are you reading a text, or recording a discussion? • What are users’ expectations about how their tweets are being found & used? http://michaelzimmer.org/2010/02/12/is-it-ethical-to-harvest-public-twitter-accounts-without-consent/

LOC Archive of Public Tweets • Library of Congress will archive all public tweets • 6-month delay, access restricted to researchers • Open questions: • Can users opt out of being in the permanent archive? • Can users delete tweets from the archive? • Will geolocational and other profile data be included? • What about a public tweet that is re-tweeting a private one?


Conceptual Gaps • Privacy • Presumption that because subjects make information available on Facebook/Twitter, they don’t have an expectation of privacy • Ignores contextual nature of sharing • Ignores whether users really understand their privacy settings • Anonymity vs. Identifiability • Presumption that stripping names & other obvious identifiers provides anonymity • Ignores how anything can be identifying and become the “missing link” to re-identify an entire dataset

Conceptual Gaps • Consent • Presumption that because something is made visible on Facebook/Twitter the subject is consenting to it being harvested for research • Ignores how research method might allow un-anticipated access to data meant to be restricted • Harm • Researchers imply “already public, what harm could happen” • Ignores dignity & autonomy, let alone unanticipated consequences

Filling the Conceptual Gaps • Privacy • Recognize the strict dichotomy of public/private doesn’t apply in the 2.0 world (if it does anywhere) • Consider Nissenbaum’s theory of “contextual integrity” • Privacy in Context (2009, Stanford University Press) • Should strive to consult privacy scholars on projects & reviews

Filling the Conceptual Gaps • Anonymity & Identifiability • Recognize “personally identifiable information” is an imperfect concept • Consider EU approach of “potentially linkable” to an identity • “Anonymous” datasets are not fully achievable and provide a false sense of protection • Paul Ohm, “Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization”

Filling the Conceptual Gaps • Consent • What do we mean by “consent” when it comes to using “publicly” available content? • Must recognize that a user making something public online comes with a set of assumptions about who can access it and how; that’s what is being consented to (implicitly or explicitly) • …

Filling the Conceptual Gaps • Harm • Must move beyond the traditional US focus of harm as requiring a tangible (financial?) consequence • Protecting from harm is more than protecting from hackers, spammers, identity thieves, etc • Consider dignity/autonomy based theories of harm • Must a “wrong” occur for there to be damage to the subject? • Do subjects deserve control over the use of their data streams?

Now What…. • Researchers and IRBs believe they’re doing the right thing (and usually, they are) • Bring together researchers, IRB members, ethicists & technologists to identify and resolve these conceptual gaps • InternetResearchEthics.org • Digital Media & Learning collaboration • Today’s panel…
