Tracking the impact of data – how?
Sarah Callaghan
sarah.callaghan@stfc.ac.uk
@sorcha_ni
1st Altmetrics conference, London, 25-26 September 2014
Who are we and why do we care about data?
• The UK’s Natural Environment Research Council (NERC) funds six data centres which between them have responsibility for the long-term management of NERC's environmental data holdings.
• We deal with a variety of environmental measurements, along with the results of model simulations, in:
  • Atmospheric science
  • Earth sciences
  • Earth observation
  • Marine science
  • Polar science
  • Terrestrial & freshwater science, hydrology and bioinformatics
  • Space weather
OpenAIRE Portal
Develop an Open Access, participatory infrastructure for scientific information that includes:
• Publications
• Datasets
• Projects
• Interlinking
www.openaire.eu
Data, Reproducibility and Science
Science should be reproducible: other people doing the same experiments in the same way should get the same results.
Observational data is not reproducible (unless you have a time machine!).
Therefore we need to have access to the data to confirm that the science is valid!
Image: http://www.flickr.com/photos/31333486@N00/1893012324/sizes/o/in/photostream/
It used to be “easy”…
(Images: The Scientific Papers of William Parsons, Third Earl of Rosse, 1800-1867; suber cells and mimosa leaves from Robert Hooke, Micrographia, 1665)
…but datasets have become so big that it’s no longer useful to publish them in hard copy.
Creating a dataset is hard work!
(Cartoon: "Piled Higher and Deeper" by Jorge Cham, www.phdcomics.com)
Managing and archiving data so that it’s understandable by other researchers is difficult and time-consuming too. We want to reward researchers for putting in that effort!
Some examples of data (just from the Earth sciences)
• Time series, some still being updated, e.g. meteorological measurements
• Large 4D synthesised datasets, e.g. climate, oceanographic, hydrological and numerical weather prediction model data generated on a supercomputer
• 2D scans, e.g. satellite data, weather radar data
• 2D snapshots, e.g. cloud camera
• Traces through a changing medium, e.g. radiosonde launches, aircraft flights, ocean salinity and temperature
• Datasets consisting of data from multiple instruments as part of the same measurement campaign
• Physical samples, e.g. fossils
What is a Dataset?
• DataCite’s definition (http://www.datacite.org/sites/default/files/Business_Models_Principles_v1.0.pdf):
  • Dataset: "Recorded information, regardless of the form or medium on which it may be recorded including writings, films, sound recordings, pictorial reproductions, drawings, designs, or other graphic representations, procedural manuals, forms, diagrams, work flow, charts, equipment descriptions, data files, data processing or computer programs (software), statistical records, and other research data."
  • (from the U.S. National Institutes of Health (NIH) Grants Policy Statement via DataCite's Best Practice Guide for Data Citation)
• In my opinion a dataset is something that is:
  • The result of a defined process
  • Scientifically meaningful
  • Well-defined (i.e. there is a clear definition of what is in the dataset and what isn’t)
After the data is downloaded, what happens then?
• Short answer:
  • We don’t know!!
  • Unless the data user comes back to tell us.
  • Or we stumble across a paper which:
    • cites us,
    • or mentions us in a way that we can find,
    • and tells us which dataset the authors used.
• This is why we’re working with other groups (like CODATA, Force11, RDA, DataCite, Thomson Reuters, …) to promote data citation.
The Noble Eight-Fold Path to Citing Data
• Importance
• Credit and attribution
• Evidence
• Unique identification
• Access
• Persistence
• Specificity and verifiability
• Interoperability and flexibility
The principles are supplemented with a glossary, references and examples: http://force11.org/datacitation
How we (NERC) cite data
• We use digital object identifiers (DOIs) as part of our dataset citations because:
  • They are actionable, interoperable, persistent links for (digital) objects.
  • Scientists are already used to citing papers using DOIs (and they trust them).
  • Academic journal publishers are starting to require that datasets be cited in a stable way, i.e. using DOIs.
  • We have a good working relationship with the British Library and DataCite.
NERC’s guidance on citing data and assigning DOIs can be found at: http://www.nerc.ac.uk/research/sites/data/doi.asp
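The citation approach described above can be sketched in a few lines of Python. This is a minimal sketch, not NERC's tooling. It assumes a DataCite-style layout of Creator (PublicationYear): Title. Publisher. Identifier, with the DOI turned into an actionable https://doi.org/ link. The `DatasetRecord` class, the helper names and the example metadata are all hypothetical. NERC's own recommended citation format is the one given at the guidance link.

```python
from dataclasses import dataclass

# Common written forms of a DOI; stripped so only the bare DOI is kept.
# (Illustrative list, not exhaustive.)
_DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                 "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(doi: str) -> str:
    """Return the bare DOI (e.g. '10.1234/abc') from common written forms."""
    doi = doi.strip()
    for prefix in _DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            return doi[len(prefix):]
    return doi


def doi_url(doi: str) -> str:
    """Turn a DOI into an actionable link via the doi.org resolver."""
    return "https://doi.org/" + normalize_doi(doi)


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical minimal metadata needed to cite a dataset."""
    creators: tuple[str, ...]
    year: int
    title: str
    publisher: str
    doi: str


def format_citation(rec: DatasetRecord) -> str:
    """Assumed DataCite-style pattern: Creator (Year): Title. Publisher. DOI link."""
    authors = "; ".join(rec.creators)
    return f"{authors} ({rec.year}): {rec.title}. {rec.publisher}. {doi_url(rec.doi)}"


# Made-up metadata; 10.1234 is an example prefix, not a real dataset.
example = DatasetRecord(
    creators=("Smith, J.", "Jones, A."),
    year=2014,
    title="Example radiosonde dataset",
    publisher="Example Data Centre",
    doi="doi:10.1234/example-dataset",
)
print(format_citation(example))
```

Storing the bare DOI and building the resolver link only when formatting keeps one canonical identifier per dataset, however it was originally written down.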
Dataset catalogue page (and DOI landing page)
(Screenshot with callouts marking the dataset citation and a clickable link to the dataset in the archive.)
Data metrics – the state of the art!
• Data citation isn’t common practice (unfortunately).
• Data citation counts don’t exist yet.
• To count how often BADC data is used we have to:
  • Search Google Scholar for “BADC” and “British Atmospheric Data Centre”
  • Scan the results and weed out false positives
  • Read the papers to figure out which datasets the authors are talking about (if we can)
  • Count the mentions and citations (if any)
We’re working with DataCite and Thomson Reuters to get data citation counts.
Image: http://www.lol-cat.org/little-lovely-lolcat-and-big-work/
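Two of the manual steps listed above, matching the data centre's name and discarding obvious false positives, can be partly automated once candidate papers are available as text. This is a minimal sketch. It assumes the paper texts have already been collected locally; the slide describes searching Google Scholar by hand, and no scraping or API is implied here. The search terms come from the slide. `count_mentions` and the example snippets are hypothetical. Working out which dataset a paper actually used still needs a human reader.

```python
import re
from collections import Counter

# Search terms taken from the slide.
SEARCH_TERMS = ["BADC", "British Atmospheric Data Centre"]

# Whole-word, case-sensitive match: "BADC" inside a longer token
# (e.g. "ABADC") is rejected, which removes one class of false positives.
PATTERN = re.compile(r"\b(?:" + "|".join(map(re.escape, SEARCH_TERMS)) + r")\b")


def count_mentions(papers: dict[str, str]) -> Counter:
    """Map each paper ID to its number of search-term mentions.

    Papers with zero mentions are left out of the result.
    """
    counts: Counter = Counter()
    for paper_id, text in papers.items():
        hits = len(PATTERN.findall(text))
        if hits:
            counts[paper_id] = hits
    return counts


# Made-up snippets standing in for downloaded paper text.
papers = {
    "paper-1": "Radiosonde data were obtained from the BADC archive.",
    "paper-2": "Results from the ABADC project are discussed.",
    "paper-3": "We thank the British Atmospheric Data Centre (BADC).",
}
mentions = count_mentions(papers)
print(mentions)
print(len(mentions), "papers mention the data centre")
```

Word-boundary matching is deliberately conservative. It still counts acknowledgements that say nothing about which dataset was used, which is exactly the gap that formal data citation is meant to close.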
Altmetrics and social media for data?
We are mainly focussing on citation as a first step, as it is the metric most commonly accepted by researchers.
We have a social media presence, @CEDAnews, mainly used for announcements about service availability.
We definitely want ways of showing our funders that we provide a good service to our users and the research community.
And we want to be able to tell our depositors what impact their data has had!
RDA Bibliometrics for Data WG – preliminary survey results
• Launched 3rd September
• As of 17th September: 63 responses
• 100% completion
• Survey link still live: https://www.surveymonkey.com/s/RDA_bibliometrics_data
Responses by field:
• Science: 3
• Earth sciences: 16
• Physics: 4
• Scientometrics and bibliometrics: 4
• Engineering: 2
• Chemistry: 1
• Biology (inc. zoology): 2
• STEM: 1
• Medicine & biomedical research: 8
• Energy: 1
• Admin for research: 2
• Computer science: 4
• Social science, policy and economics: 4
• Librarian and digital curation: 11
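As a quick consistency check, the per-field counts listed above sum to the 63 responses reported on 17th September. The sketch below totals the counts and converts each one to a percentage share. The counts are copied from the slide; the `field_shares` helper is hypothetical.

```python
# Responses per field, as listed on the slide (63 responses as of 17th September).
responses = {
    "Science": 3,
    "Earth sciences": 16,
    "Physics": 4,
    "Scientometrics and bibliometrics": 4,
    "Engineering": 2,
    "Chemistry": 1,
    "Biology (inc. zoology)": 2,
    "STEM": 1,
    "Medicine & biomedical research": 8,
    "Energy": 1,
    "Admin for research": 2,
    "Computer science": 4,
    "Social science, policy and economics": 4,
    "Librarian and digital curation": 11,
}


def field_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each field's share of all responses, as a percentage to 1 d.p."""
    total = sum(counts.values())
    return {field: round(100 * n / total, 1) for field, n in counts.items()}


total = sum(responses.values())
shares = field_shares(responses)
# Print fields from most to least responses.
for field, n in sorted(responses.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{field:<40}{n:>3}  {shares[field]:>5.1f}%")
print(f"{'Total':<40}{total:>3}")
```

Earth sciences (16 of 63, about 25%) and librarians/digital curators (11 of 63, about 17%) make up the largest groups in this preliminary sample.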
Future and missing
• In the future, what would you like to use to evaluate the impact of data?
  • Most popular suggestions:
    • Data citations
    • Actual use in professional practice
    • Download statistics
    • Mentions in social media
    • DOIs/PIDs
    • Altmetrics
    • Well regarded indicators
  • Also pleas for:
    • Easy to use and set up
    • Radically different tools
    • Whatever tool can provide reliable information
    • Best estimate of societal benefit in $$ terms
• What is currently missing and/or needs to be created for bibliometrics for data to become widely used?
  • Most popular suggestions:
    • Culture change!
    • Principles and standards for consistent practice (and enforcement of these)
    • Use of PIDs
    • Mature tools for data citation, publishing, discovery and impact analysis
    • Openness in papers and patents
  • Also:
    • Research on what current metrics actually measure
    • Infrastructure
    • Free apps
Please help!
• Survey link still live!
  • https://www.surveymonkey.com/s/RDA_bibliometrics_data
  • Please pass on the link to anyone who might be interested and encourage others to fill in the survey!
• Share your experience with altmetrics: join the RDA WG on Publishing Data Bibliometrics
  • https://rd-alliance.org/group/rdawds-publishing-data-bibliometrics-wg.html
• Thank you!
Sarah Callaghan
sarah.callaghan@stfc.ac.uk
@sorcha_ni
Image: http://weknowmemes.com/generator/meme/379914/
Work funded by the European Commission as part of the project OpenAIREplus (FP7-INFRA-2011-2, Grant Agreement no. 283595)