Michael Zimmer, PhD School of Information Studies University of Wisconsin-Milwaukee zimmerm@uwm.edu http://michaelzimmer.org Secretary’s Advisory Committee on Human Research Protections July 21, 2010. Research Ethics in the 2.0 Era: Conceptual Gaps for Ethicists, Researchers, IRBs.
My Perspective • Approaching the problem of “The Internet in Human Subjects Research” from the field of information ethics • Focus on how 2.0 tools, environments, and experiences are creating new conceptual gaps in our understanding of: • Privacy • Anonymity vs. Identifiability • Consent • Harm
Illuminating Cases • Tastes, Ties, and Time (T3) Facebook data release • Pete Warden’s harvesting (and proposed release) of public Facebook profiles • Question of consent for using “public” Twitter streams • Library of Congress archiving “public” Twitter streams
T3 Facebook Project • Tastes, Ties, and Time research project sought to understand social network dynamics of large groups of students • Solution: Work with Facebook & an “anonymous” University to harvest the Facebook profiles of an entire cohort of college freshmen • Repeat each year for their 4-year tenure • Co-mingle with other University data (housing, major, etc) • Coded for race, gender, political views, cultural tastes, etc
T3 Data Release • As an NSF-funded project, the dataset was made publicly available • First phase released September 25, 2008 • One year of data (n=1,640) • Prospective users must submit application to gain access to dataset • Detailed codebook available for anyone to access
“Anonymity” of the T3 Dataset “All the data is cleaned so you can’t connect anyone to an identity” • But dataset had unique cases (based on codebook) • If we could identify the source university, individuals could potentially be identified • Took me minimal effort to discern the source was Harvard • The anonymity and privacy of subjects in the study becomes jeopardized
T3 Good-Faith Efforts to Protect Subject Privacy • Only those data that were accessible by default by each RA were collected • Removing/encoding of “identifying” information • Tastes & interests (“cultural fingerprints”) will only be released after “substantial delay” • To download, must agree to “Terms and Conditions of Use” statement • Reviewed & approved by Harvard’s IRB
1. Only those data that were accessible by default by each RA were collected “We have not accessed any information not otherwise available on Facebook” • False assumption that because the RA could access the profile, it was “publicly available” • RAs were Harvard graduate students, and thus part of the “Harvard network” on Facebook
2. Removing/encoding of “identifying” information “All identifying information was deleted or encoded immediately after the data were downloaded” • While names, birthdates, and e-mails were removed… • Various other potentially “identifying” information remained • Ethnicity, home country/state, major, etc. • AOL/Netflix cases taught us how nearly any data could be potentially “identifying”
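The point about residual “identifying” fields can be illustrated with a minimal sketch. It counts how many records remain unique once names are gone but attributes such as ethnicity, home state, and major are kept. Everything here is an assumption for illustration: the records, the field names, and the helper `unique_cases` are hypothetical and are not drawn from the T3 dataset or its codebook.

```python
from collections import Counter

# Hypothetical records: every value below is invented for illustration only.
records = [
    {"ethnicity": "A", "home_state": "WI", "major": "Economics"},
    {"ethnicity": "A", "home_state": "WI", "major": "Economics"},
    {"ethnicity": "B", "home_state": "MT", "major": "Classics"},
    {"ethnicity": "A", "home_state": "NY", "major": "Economics"},
    {"ethnicity": "C", "home_state": "NY", "major": "Physics"},
]

# Fields that are not names, yet may single a person out when combined.
QUASI_IDENTIFIERS = ("ethnicity", "home_state", "major")


def unique_cases(rows, keys):
    """Return the rows whose combination of `keys` values occurs exactly once."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return [row for row in rows if counts[tuple(row[k] for k in keys)] == 1]


singletons = unique_cases(records, QUASI_IDENTIFIERS)
print(f"{len(singletons)} of {len(records)} records are unique on {QUASI_IDENTIFIERS}")
```

A record that is unique on such a combination may be matched to a person by anyone who can learn those same attributes from another source, which is the kind of risk the AOL and Netflix cases brought to light.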
3. Tastes & interests will only be released after “substantial delay” T3 researchers recognize the unique nature of the cultural taste labels: “cultural fingerprints” • Individuals might be uniquely identified by what they list as a favorite book, movie, restaurant, etc. • Steps taken to mitigate this privacy risk: • In initial release, cultural taste labels assigned random numbers • Actual labels to be released after a “substantial delay”, in 2011
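As a rough sketch of what assigning random numbers to taste labels could look like, the example below replaces each label with a random code. The slides do not describe the T3 team's actual procedure, so the function `encode_labels`, the code range, and the sample profiles are all hypothetical.

```python
import random


def encode_labels(profiles, seed=None):
    """Replace each distinct taste label with a distinct random integer code.

    `profiles` maps a (hypothetical) subject id to a list of taste labels.
    Returns the encoded profiles and the label-to-code codebook.
    """
    rng = random.Random(seed)
    labels = sorted({label for tastes in profiles.values() for label in tastes})
    # Draw distinct five-digit codes so that no two labels share a code.
    codes = rng.sample(range(10_000, 100_000), k=len(labels))
    codebook = dict(zip(labels, codes))
    encoded = {
        subject: sorted(codebook[label] for label in tastes)
        for subject, tastes in profiles.items()
    }
    return encoded, codebook


# Invented example data, not taken from any real dataset.
profiles = {
    "s1": ["Book X", "Film Y"],
    "s2": ["Film Y"],
    "s3": ["Restaurant Z"],
}
encoded, codebook = encode_labels(profiles, seed=0)
print(encoded)
```

Coding of this kind hides the text of a label but keeps the pattern of who shares which taste: a favorite listed by only one subject still maps to a code held by only one subject, and the protection ends once the codebook is released.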
3. Tastes & interests will only be released after “substantial delay” • But, is 3 years really a “substantial delay”? • Subjects’ privacy expectations don’t expire after an artificially imposed timeframe • Datasets like these are often used years after their initial release, so the delay is largely irrelevant • T3 researchers also will provide immediate access on a “case-by-case” basis • No details given, but this seemingly contradicts any stated concern over protecting subject privacy
4. “Terms and Conditions of Use” statement • I will use the dataset solely for statistical analysis and reporting of aggregated information, and not for investigation of specific individuals…. • I will produce no links…among the data and other datasets that could identify individuals… • I will not knowingly divulge any information that could be used to identify individual participants • I will make no use of the identity of any person or establishment discovered inadvertently.
4. “Terms and Conditions of Use” statement • The language within the TOS clearly acknowledges the privacy implications of the T3 dataset • Might help raise awareness among potential researchers; appease IRB • But “click-wrap” agreements are notoriously ineffective at changing behavior • Unclear how the T3 researchers specifically intend to monitor or enforce compliance • There has already been one research paper that likely violates the TOS
5. Reviewed & Approved by IRB • “Our IRB helped quite a bit as well. It is their job to insure that subjects’ rights are respected, and we think we have accomplished this” • “The university in question allowed us to do this and Harvard was on board because we don’t actually talk to students, we just accessed their Facebook information”
5. Reviewed & Approved by IRB • For the IRB, downloading Facebook profile information seemed less invasive than actually talking with subjects • Did IRB know unique, personal, and potentially identifiable information was present in the dataset? • Consent was not needed since the profiles were “freely available” • But RA access to restricted profiles complicates this; did IRB contemplate this? • Is putting information on a social network “consenting” to its use by researchers?
Pete Warden Facebook Dataset • Exploited flaw in FB’s architecture to access and harvest public profiles of 215 million users (without needing to log in) • Impressive analyses at aggregate levels • Planned to release entire dataset – with names, locations, etc. – to academic community • Later destroyed data under threat of lawsuit from Facebook http://michaelzimmer.org/2010/02/12/why-pete-warden-should-not-release-profile-data-on-215-million-facebook-users/
Harvesting Public Twitter Streams • Is it ethical for researchers to follow and systematically capture public Twitter streams without first obtaining specific, informed consent from the subjects? • Are tweets publications, or utterances? • Are you reading a text, or recording a discussion? • What are users’ expectations about how their tweets are being found & used? http://michaelzimmer.org/2010/02/12/is-it-ethical-to-harvest-public-twitter-accounts-without-consent/
LOC Archive of Public Tweets • Library of Congress will archive all public tweets • 6-month delay, restricted access to researchers • Open questions: • Can users opt out of being in the permanent archive? • Can users delete tweets from the archive? • Will geolocational and other profile data be included? • What about a public tweet that is re-tweeting a private one?
Conceptual Gaps • Privacy • Presumption that because subjects make information available on Facebook/Twitter, they don’t have an expectation of privacy • Ignores contextual nature of sharing • Ignores whether users really understand their privacy settings • Anonymity vs. Identifiability • Presumption that stripping names & other obvious identifiers provides anonymity • Ignores how anything can be identifying and become the “missing link” to re-identify an entire dataset
Conceptual Gaps • Consent • Presumption that because something is made visible on Facebook/Twitter the subject is consenting to it being harvested for research • Ignores how research method might allow un-anticipated access to data meant to be restricted • Harm • Researchers imply “already public, what harm could happen” • Ignores dignity & autonomy, let alone unanticipated consequences
Filling the Conceptual Gaps • Privacy • Recognize the strict dichotomy of public/private doesn’t apply in the 2.0 world (if it does anywhere) • Consider Nissenbaum’s theory of “contextual integrity” • Privacy in Context (2009, Stanford University Press) • Should strive to consult privacy scholars on projects & reviews
Filling the Conceptual Gaps • Anonymity & Identifiability • Recognize “personally identifiable information” is an imperfect concept • Consider EU approach of “potentially linkable” to an identity • “Anonymous” datasets are not fully achievable and provide a false sense of protection • Paul Ohm, “Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization”
Filling the Conceptual Gaps • Consent • What do we mean by “consent” when it comes to using “publicly” available content? • Must recognize that a user making something public online comes with a set of assumptions about who can access it and how – that’s what is being consented to (implicitly or explicitly) • …
Filling the Conceptual Gaps • Harm • Must move beyond the traditional US focus of harm as requiring a tangible (financial?) consequence • Protecting from harm is more than protecting from hackers, spammers, identity thieves, etc • Consider dignity/autonomy based theories of harm • Must a “wrong” occur for there to be damage to the subject? • Do subjects deserve control over the use of their data streams?
Now What…. • Researchers and IRBs believe they’re doing the right thing (and usually, they are) • Bring together researchers, IRB members, ethicists & technologists to identify and resolve these conceptual gaps • InternetResearchEthics.org • Digital Media & Learning collaboration • Today’s panel…