Data Citation for the Social Sciences

This presentation discusses the importance of data citation in social sciences, including the need for data availability and verification, the role of metadata, concerns about confidentiality, and the growing movement to require data behind findings to be publicly available. It also explores the challenges of versioning, granularity, and replication, and the efforts to create durable linkages between data and publications.


Presentation Transcript


  1. Data Citation for the Social Sciences • Mary Vardigan, ICPSR • CODATA Conference on Data Attribution and Citation • August 22-23, 2011

  2. Today’s Presentation • Norms in the social sciences and implications for data citation • Summary of major citation issues for social science

  3. Knowledge claims • Social science advances through knowledge claims published in the literature • Need to verify and extend claims; Secondary analysis encouraged • Follows that data need to be available for reuse and cited

  4. Data sharing • Strong tradition of data sharing, both formal and informal • Active social science data archives around the world • Some PIs distribute data on Web sites • Pienta, Alter, and Lyle found 88.5% of data generated not publicly archived (since 1985)

  5. Metadata • Metadata play important role – Documentation necessary to understand the data • Questionnaires, user guides, methodology descriptions, record layouts also provided • Heterogeneous in format – most unstructured • Data Documentation Initiative (DDI) seeks to provide a structured metadata standard

  6. Granularity and versioning • “Studies” may be single datasets or aggregations • Also a need to cite data subsets that support the findings in publications • Data are sometimes updated and need to be versioned
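One way to picture the versioning and granularity points on this slide is a citation record that carries an optional version and an optional subset description alongside a persistent identifier. The sketch below is only illustrative: the `DataCitation` class, its field names, the output layout, and the example study and identifier are all hypothetical and do not reproduce any archive's actual citation standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical citation record; field names are illustrative only."""
    creator: str
    year: int
    title: str
    distributor: str
    identifier: str                 # persistent identifier, e.g. a DOI or handle
    version: Optional[str] = None   # set when the data have been updated
    subset: Optional[str] = None    # describes the part of the study that was used

    def format(self) -> str:
        """Join the citation elements, skipping version/subset when absent."""
        parts = [f"{self.creator} ({self.year}). {self.title}"]
        if self.version:
            parts.append(f"Version {self.version}")
        if self.subset:
            parts.append(f"Subset: {self.subset}")
        parts.append(f"{self.distributor} [distributor]")
        parts.append(self.identifier)
        return ". ".join(parts)


# Made-up study and placeholder identifier, for illustration only.
citation = DataCitation(
    creator="Doe, Jane",
    year=2011,
    title="Example Household Survey",
    distributor="Example Data Archive",
    identifier="doi:10.0000/EXAMPLE",
    version="2",
    subset="Wave 1, variables V1-V20",
)
print(citation.format())
```

Keeping version and subset as separate optional fields means a whole-study citation and a subset citation can share the same identifier while still being distinguishable, which is one possible answer to the granularity question rather than a settled practice.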

  7. Content and formats • Mostly quantitative data and some qualitative • Boundaries blurring between social science and other domains • Survey data supplemented by biomarker data • Survey data merged with administrative records • Trend toward complex collections • Social media data • Video, audio data

  8. Confidentiality concerns • Survey respondents promised anonymity, a critical pledge to uphold • Legal agreements required for restricted data use • New mechanisms to analyze restricted data online emerging – virtual enclaves and virtual datasets • Often a public-use version and restricted versions coexist

  9. Replication • Most claims not able to be replicated based on information in publications • Replication archives -- ICPSR, Dataverse, etc. • What is required is chain of evidence and record of decisions – deep citation and provenance • Need both production transparency (record of decisions in transforming data) and analytic transparency (how conclusions drawn)

  10. Some tradition of citation • Citation standard for machine-readable files created in 1979 • Citations available from data providers -- Census Bureau and ICPSR since late 1980s • Journals just beginning to cite data • Persistent identifiers: DOIs or handles

  11. Journal practices • Historically little effort to standardize or verify data references in publications • Growing movement to require data behind findings to be publicly available • AER: Will publish only if “data used in the analysis are clearly and precisely documented and readily available for replication.”

  12. Influencing journals • Data-PASS campaign to influence journals sponsored by professional associations • Wrote to major professional associations demonstrating inconsistencies in citing data • Success with American Sociological Review, which changed submission criteria

  13. Linking data and publications • ICPSR has done this since the beginning in 1962 • Now a Bibliography of 60K citations to publications with two-way linking to data • Vendors like Thomson Reuters now interested in these linkages
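The two-way linking mentioned on this slide can be sketched as a pair of lookup tables that are always updated together, so that a dataset leads to its publications and a publication leads back to its data. This is a minimal sketch under stated assumptions: the `LinkIndex` class and the identifiers used are hypothetical and say nothing about how the ICPSR Bibliography is actually implemented.

```python
from collections import defaultdict


class LinkIndex:
    """Hypothetical two-way index between dataset and publication identifiers."""

    def __init__(self) -> None:
        self._pubs_by_dataset = defaultdict(set)
        self._datasets_by_pub = defaultdict(set)

    def link(self, dataset_id: str, publication_id: str) -> None:
        """Record the link in both directions so neither side can drift."""
        self._pubs_by_dataset[dataset_id].add(publication_id)
        self._datasets_by_pub[publication_id].add(dataset_id)

    def publications_for(self, dataset_id: str) -> list:
        """Return the publications that cite the given dataset, sorted."""
        return sorted(self._pubs_by_dataset.get(dataset_id, ()))

    def datasets_for(self, publication_id: str) -> list:
        """Return the datasets cited by the given publication, sorted."""
        return sorted(self._datasets_by_pub.get(publication_id, ()))


# Placeholder identifiers, for illustration only.
index = LinkIndex()
index.link("dataset-A", "article-1")
index.link("dataset-A", "article-2")
index.link("dataset-B", "article-2")

print(index.publications_for("dataset-A"))
print(index.datasets_for("article-2"))
```

In practice the keys would presumably be persistent identifiers such as DOIs or handles, since the durability of the link depends on the identifiers on both sides staying stable.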

  14. Summary -- Citation issues for social science • Versioning – Data can be dynamic • Unit/Granularity – What is optimal? • Importance of metadata – How to create durable link? • Replication – Cite subsets and replication/workflow files containing scripts?

  15. Thank you… • Mary Vardigan • vardigan@umich.edu
