Data Citation: Key to Discovery, Reuse, and Tracking Impact
Curating and Managing Research Data for Reuse
ICPSR Summer Program, August 2, 2013
Elizabeth Moss, MSLIS, eammoss@umich.edu
Today’s talk • A tour of the ICPSR Bibliography of Data-related Literature • The challenges of tracking data reuse (you have to be able to discern data use before you can track data reuse) • Efforts to improve citing standards and practices, leading to sharing and impact
Who uses these shared data? How are they used? With what impact?
The ICPSR Bibliography of Data-related Literature • Link research data to the scholarly literature about it • Increase the likelihood of discovery and reuse • Aid students, instructors, researchers, and funders
It’s really a searchable database . . . • containing over 65,000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR • that resides in Oracle, with an internal UI for database management • that can generate study bibliographies linking each study with the literature about it, and out to the full text
It’s useful to all stakeholders • Instructors direct students to begin data-related research projects by reading some of the major works based on the data • Advanced researchers also use it to conduct a focused literature review before deciding to use a dataset • Reporters and policymakers looking for processed statistics look for reports explaining studies • Principal investigators and funding agencies want to track how data are used after they are deposited
Provide PIs and data users with citations (since 1990) and DOIs (since 2008) for all study-level data
Explicit citation, in the references, with the DOI “The use of DOI names for the citing of data sets would make their provenance trackable and citable and therefore allow interoperability with existing reference services like Thomson Reuters’ Web of Science . . .” From: http://www.codata.org/taskgroups/TGdatacitation/index.html doi:10.3886/ICPSR21240
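What makes a DOI trackable is that it resolves through a single public resolver, doi.org. A minimal sketch of that convention, using the DOI from this slide (the helper function name is mine, not ICPSR's or DataCite's):

```python
def doi_to_url(doi: str) -> str:
    """Turn a DOI (with or without the 'doi:' prefix) into a resolvable URL.

    https://doi.org/ is the standard public resolver for DOI names.
    """
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    return "https://doi.org/" + doi

print(doi_to_url("doi:10.3886/ICPSR21240"))  # https://doi.org/10.3886/ICPSR21240
```

Because every well-formed DOI resolves the same way, an indexing script needs no per-archive logic to follow a data citation back to its study page.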
Data “Sighting” (implicit) vs. Data Citing (explicit)
Typical “sightings” • Sample described, not named, no author information, no access information, only a publication cited • Data named in text, with some attribution, but no access information • Cited in the reference section, but with no permanent, unique identifier, making it difficult for indexing scripts to find and so to automate tracking
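The three patterns above differ in machine-findability: only the explicit form carries a string a script can match reliably. A rough sketch of that distinction (the heuristic and function name are illustrative, not ICPSR's actual tooling):

```python
import re

# A DOI name has the shape "10.<registrant>/<suffix>"; the "10." directory
# code is fixed by the DOI specification, which makes it a reliable anchor.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/\S+")

def classify_reference(text: str) -> str:
    """Crude heuristic: an explicit citation carries a persistent identifier,
    while a 'sighting' at best names the dataset in prose."""
    if DOI_PATTERN.search(text):
        return "explicit citation (trackable by script)"
    return "sighting (requires human search effort)"

print(classify_reference("Data available at doi:10.3886/ICPSR21240"))
print(classify_reference("We analyzed a national survey of 1,500 respondents."))
```

The asymmetry is the point of the slide: the first case can be harvested at scale, while the second demands the costly human searching described below.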
Challenges in database search infrastructure • Journal databases fielded for journal article discovery are not ideal for finding data “sightation” • No field searching on methods sections • Full-text search brings back too many bad hits • Limiting to abstract misses too many good hits
Challenges in tracking many studies • Tension between highly curating a manageable collection and minimally maintaining a broad collection • Too many publications for efficient collection by humans, so we must make it easy for scripts to do it reliably
Challenges of completeness • Data use that is too difficult/costly to find cannot be counted • The result is a selective sample, from which it is difficult to draw accurate conclusions in broad analyses of reuse
Challenges in lack of data management planning • Publishing sequence prevents citation creation before publication • Potential for change by educating the PI/mentor; graduate directors; liaison librarians • Consciousness raising starting to occur due to funders’ requirements
Poorly described and cited data + Excessive human search effort = Too costly, too questionable for confident measure of impact
Citing data with a DOI + Minimal human search effort = High hit accuracy for the cost, and better confidence of impact measures
Building a culture of viable data citation to improve measures of impact
From: CODATA Data Citation Standards and Practices Task Group. 2012. Task Group Data Citation and Attribution Bibliography http://www.codata.org/taskgroups/TGdatacitation/docs/CODATA_DDCTG_BestPracticesBib_FINAL_17June2012.pdf
The tool enables users to search the DataCite Metadata Store for their works, and subsequently to add (or claim) those research outputs – including datasets, software, and other types – to their ORCID profile. This should increase the visibility of these research outputs, and will make it easier to use these data citations in applications that connect to the ORCID Registry – ImpactStory is one of several services already doing this. http://odin-project.eu/2013/05/13/new-orcid-integrated-data-citation-tool/ http://odin-project.eu/
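As a sketch of what searching the DataCite Metadata Store involves, the snippet below builds a query URL against DataCite's public REST API (api.datacite.org). The endpoint is real, but treat the parameters and helper name as illustrative rather than a tested integration with the ODIN tool itself:

```python
from urllib.parse import urlencode

def datacite_search_url(query: str, page_size: int = 5) -> str:
    """Build a search URL for the DataCite REST API, which exposes the
    DOIs registered in the DataCite Metadata Store."""
    params = urlencode({"query": query, "page[size]": page_size})
    return "https://api.datacite.org/dois?" + params

# Fetching this URL returns JSON records that services like the ORCID
# claiming tool can turn into entries on a researcher's profile.
print(datacite_search_url("ICPSR"))
```

Exposing registered datasets through an open, queryable API like this is what allows third-party services (ImpactStory among them) to connect data citations to researcher profiles automatically.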
Finding data with simple search fields Integration with Web of Knowledge All Databases: Research data is equal to research literature
Articles linked to underlying data. Increased data discovery. Reward for data citation. Potential for automated tracking. What audience does this have? Anecdotally, no large group of adopters yet. Converting journal search infrastructure to meet the needs of data is under way, but syncing metadata is still a work in progress.
“CODATA, the Committee on Data for Science and Technology, is an interdisciplinary Scientific Committee of the International Council for Science (ICSU), established 40 years ago. CODATA works to improve the quality, reliability, management and accessibility of data of importance to all fields of science and technology.” From: http://www.codata.org/about/who.html http://www.codata.org/taskgroups/TGdatacitation/index.html
“The move to encourage wider access to the results of publicly-funded research will have limited impact without the associated tools, networks and standards that are needed for sharing and mining of data. The Research Data Alliance aims to provide them.” https://rd-alliance.org/
Altmetrics are an attempt to augment or replace the inadequate ways we now use to determine relevant and significant sources of knowledge: 1. peer review 2. citation counting 3. journal impact factors • In-text links, • blogs, • tweets, • bookmarks, • likes, • data downloads . . . Altmetrics.org/manifesto