Research Data in The Social Sciences: How Much is Being Shared?
Amy Pienta, Myron Gutmann, Jared Lyle
ICPSR, University of Michigan
Presentation at the Research Conference on Research Integrity, Niagara Falls, NY, May 16, 2009
Types of Social Science Data
MAJOR SOCIAL SCIENCE TOPICS
• Social - class, crime, social movements, race relations, culture, folklore, family, aging
• Economic - wealth, prosperity, labor, business
• Psychological - cognition, attitudes, stereotypes
• Politics - justice, democracy, public policy, public administration, international conflict
TYPES OF DATA
• Surveys, Opinion Polls, Structured Interviews, Experiments, GIS (map)
• Administrative & Historical Records
• Video, Audio, Transcripts, Text
• Web sites, Email, Blogs
How Can We Think About Data Sharing?
• Making one’s research data available for others to analyze and/or reanalyze
• Placing one’s data in the public domain
• Depositing data in an archive that has an explicit mission to preserve and disseminate data to a wide audience
Value of Data Sharing in the Social Sciences
• Replication
• Surveys often cover more than any one researcher has the need or time to analyze
• Improves other data collections and measurement
• Reduces costs by avoiding duplicate data collection efforts
• Research training
• Data ownership extends beyond the PI
Many Avenues for Sharing Data in the Social Sciences
• Broad-based social science data archives
• National data archives (outside the US)
• Thematic “boutique” archives
• Institutional repositories
• Journal-based archives
• Individual/departmental websites
Why Are Data Not Shared?
• Preparing data and documentation can be enormously time-consuming
• Need to protect the confidentiality of respondents
• Fear of getting “scooped”
• Lack of rewards for sharing
• Limited resources for data preparation
NSF Data Sharing Policy
National Science Foundation Important Notice 106 (April 17, 1989) states: "[NSF] expects investigators to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections, and other supporting materials created or gathered in the course of the research. It also encourages awardees to share software and inventions or otherwise act to make such items or products derived from them widely useful and usable."
NIH Data Sharing Policy
The NIH expects and supports the timely release and sharing of final research data from NIH-supported studies for use by other researchers. Starting with the October 1, 2003 receipt date, investigators submitting an NIH application seeking $500,000 or more in direct costs in any single year are expected to include a plan for data sharing or state why data sharing is not possible.
Goals
• To identify the “universe” of social science data that have been collected
• To know how much social science data is “at risk” of being lost or has been lost (versus that which is available, preserved)
• To understand the value of sharing and/or data archiving
LEADS Database at ICPSR
• NICHD funding – PI Survey about Disclosure Risks
• Library of Congress funding – Identification and Appraisal of “at risk” Social Science Data
• ORI RRI funding (NLM) – Creating a research database
What is LEADS?
A database of records containing information about thousands of scientific studies that may have produced social science data.
The database contains:
• Descriptive information about the scientific studies we identify
• Information used to determine the “fit” and “value” of a scientific study
• Value-added information from bibliometric analysis, PI surveys, and constructed variables
Sources of Information
• National Science Foundation
• National Institutes of Health
LEADS Screening Criteria
• Social science and/or behavioral science
• Original or primary data collection proposed, including assembling a database from existing (archival) sources
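The two screening criteria above can be read as a simple filter over award records. The sketch below is only an illustration of that reading: the `Award` class, its field names, and the sample records are hypothetical and are not taken from the LEADS database itself.

```python
from dataclasses import dataclass


@dataclass
class Award:
    """Hypothetical award record; field names are illustrative assumptions."""
    award_id: str
    is_social_or_behavioral: bool          # criterion 1
    collects_primary_data: bool            # criterion 2, original data collection
    assembles_from_archival_sources: bool = False  # criterion 2, archival variant


def passes_screen(award: Award) -> bool:
    """Return True when an award meets both screening criteria."""
    proposes_data = (
        award.collects_primary_data or award.assembles_from_archival_sources
    )
    return award.is_social_or_behavioral and proposes_data


# Made-up sample records, for illustration only.
awards = [
    Award("A-1", True, True),          # social science, primary data
    Award("A-2", True, False, True),   # social science, archival database
    Award("A-3", False, True),         # not social/behavioral science
    Award("A-4", True, False),         # no data collection proposed
]

selected = [a.award_id for a in awards if passes_screen(a)]
print(selected)  # ['A-1', 'A-2']
```

Treating "assembling a database from existing sources" as a separate flag keeps the second criterion explicit; in practice this judgment would presumably come from reading award abstracts rather than from ready-made fields.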
NSF Grant Awards in LEADS
• LEADS contains 17,194 awards made by NSF
• LEADS spans 30 years of NSF awards, 1976 to 2005
NIH Grant Awards in LEADS
• NICHD, NIA, NIMH, NINR, AHRQ, NIAAA, NIDA, Clinical Center, NIDCD, FIC, NCI, NHLBI, NIDDK (1972+)
• 172,196 awards screened in total
LEADS: How Data Are Lost
Data Intentionally Discarded
• “I generally keep data for…10 years beyond the last time I do something with them.”
• “The material…was considered sensitive data. Institutional review boards…required us to promise to destroy the data after a certain period of time...”
• “As I retired…I simply didn’t have the room to store these data sets at my house.”
LEADS: How Data Are Lost
Unintentionally Lost
• “Some data were collected, but the data file was lost in a technical malfunction.”
• “The data from the studies were on punched cards that were destroyed in a flood in the department in the early 80s.”
Conclusion & Limitations
• Most NIH- and NSF-funded social science data are not publicly archived
• Lower-bound estimate: 3.8%
• Upper-bound estimate: 14.2%
Limitations
• Selectivity abounds (e.g., Harvard Dataverse Catalog; PI Pilot Survey)
• Informal data sharing has not been taken into account
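The slides do not say how the lower and upper bounds were computed. One plausible reading, offered here purely as an assumption, is that the lower bound counts only awards whose data are confirmed to be archived, while the upper bound also counts awards whose status is uncertain. The function name and the toy counts below are hypothetical and do not reproduce the LEADS figures.

```python
def sharing_bounds(total: int, confirmed_archived: int, possibly_archived: int):
    """Return (lower, upper) percentage bounds on the share of archived data.

    Assumption: lower bound = confirmed cases only;
    upper bound = confirmed plus uncertain cases.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if confirmed_archived < 0 or possibly_archived < 0:
        raise ValueError("counts must be non-negative")
    if confirmed_archived + possibly_archived > total:
        raise ValueError("archived counts exceed total")
    lower = 100 * confirmed_archived / total
    upper = 100 * (confirmed_archived + possibly_archived) / total
    return lower, upper


# Toy numbers, not LEADS data: 200 awards, 10 confirmed, 15 uncertain.
lower, upper = sharing_bounds(200, 10, 15)
print(f"{lower:.1f}% to {upper:.1f}%")  # 5.0% to 12.5%
```

Under this reading, the gap between the two bounds reflects how many awards could not be classified either way, which is consistent with the selectivity limitation noted above.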