Data sharing & reuse GEO 802, Data Information Literacy Winter 2019 – Lecture 8 Gary Seitz, MA
Lesson 8 Outline • Issues/obstacles related to reuse and sharing of data • Understand open access • Data citation • Benefits of sharing data (Image: Luis Prado from The Noun Project)
Learner objectives • Explain the benefits of sharing data • Understand open access/open science/open data • Understand the need for data attribution and citation
Data sharing What it is: “… the practice of making data used for scholarly research available to others.” [Wikipedia] • Who’s involved • the data sharer • the data repository • the secondary data user • support staff!
Data sharing: top-down mandates Read and compare http://www.snf.ch/en/theSNSF/research-policies/open_research_data/Pages/default.aspx https://ec.europa.eu/research/participants/docs/h2020-funding-guide/cross-cutting-issues/open-access-dissemination_en.htm
Drivers for data sharing • National research policies • Research Councils UK “Publicly funded research data are a public good, produced in the public interest, which should be made openly available with as few restrictions as possible in a timely and responsible manner that does not harm intellectual property.” • Economic & Social Research Council “ … publicly funded research data … valuable, long-term resources that, where practical, must be made available for secondary, scientific research.” • Medical Research Council “… publicly-funded research data … should be openly available to the maximum extent possible.” • Wellcome Trust “… aim(s) to ensure that the data generated by the research we support is managed and shared in a way that maximises the benefit to the public.”
Value of Data Sharing: Discuss in groups of 3 • To the public? • To the research sponsor? • To the scientific community? • To the scientist?
Value of Data Sharing: To the Public A better informed public yields better decision making with regard to: • Environmental and economic planning • Federal, state, and local policies • social choices such as use of tax money and education options • personal lifestyle and health such as nutrition and recreation
Value of data sharing: to the research sponsor • Organizations that sponsor research must maximize the value of research money • Data sharing enhances the value of research investments by enabling: • verification of performance metrics and outcomes • new research and increased return on investment • advancement of the science • reduced data duplication expenditures
Value of data sharing: to scientific community Access to related research enables community members to: • build upon the work of others and further, rather than repeat, the science • perform meta analyses that cannot be performed with individual datasets or laboratories • share resources and perspectives so that comprehension is expanded and enhanced
Value of data sharing: to scientific community Access to related research enables community members to (cont’d): • increase transparency, reproducibility and comparability of results • expand methodology assessment, recommendations and improvement • educate new researchers as to the most current and significant findings
Value of data sharing: to the scientist Scientists that share data gain the benefit of: • research sponsor recognition as an authoritative source and wise investment • improved data quality due to expanded use, field checks, and feedback • greater opportunity for data exchange • improved connections to scientific network, peers, and potential collaborators
Barriers to data sharing • “Scientists would rather share their toothbrush than their data!” [Carole Goble, Keynote address, EGEE (Enabling Grids for E-sciencE) ‘06 Conference.] • Barriers to sharing can relate to … • the Researcher - intellectual property issues • the Institution - unrealised commercial value • the Subject - confidentiality
Concerns about data sharing (figure; repeated label: metadata)
Overview of current data sharing practice Take a look at these infographics from Wiley titled Research Data Sharing Insights [PDF, 2.08MB], and Global Data Sharing Trends. They provide a succinct overview of current data sharing practice and perceptions. Consider: Why do you think there are differences between disciplines and countries - what changes to these statistics would you expect between 2014/2016 and now? Discuss this in your group.
Making data sharable Step One: Create robust metadata that is discoverable • specify geography and time periods • use discipline specific theme, place and temporal keywords, thesauri, and ontologies • describe attributes • include links to associated data catalogues, data downloads, project websites, etc.
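As a rough illustration of Step One, the sketch below stores the listed elements (geography, time period, keywords, attribute descriptions, links) in one record and runs a simple keyword lookup. The class name, field names, and all values are made up for this example; they do not follow any particular metadata standard.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record; field names are illustrative only."""
    title: str
    bounding_box: tuple          # (west, south, east, north), decimal degrees
    start_date: str              # ISO 8601 date, e.g. "2018-01-01"
    end_date: str
    theme_keywords: list = field(default_factory=list)
    place_keywords: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)   # column name -> description
    links: list = field(default_factory=list)        # catalogues, downloads, websites


def matches(record: DatasetMetadata, term: str) -> bool:
    """Return True if the term appears in the title or equals a keyword."""
    term = term.lower()
    keywords = record.theme_keywords + record.place_keywords
    return term in record.title.lower() or term in (k.lower() for k in keywords)


# Made-up example record, not a real dataset.
record = DatasetMetadata(
    title="Stream temperature measurements (made-up example)",
    bounding_box=(8.4, 47.3, 8.6, 47.4),
    start_date="2018-01-01",
    end_date="2018-12-31",
    theme_keywords=["hydrology", "water temperature"],
    place_keywords=["Switzerland"],
    attributes={"temp_c": "Water temperature in degrees Celsius"},
    links=["https://example.org/data/stream-temperature"],
)

print(matches(record, "hydrology"))
print(matches(record, "glaciology"))
```

In practice, keywords would come from a discipline-specific thesaurus or ontology rather than free text, so that searches by different users land on the same terms.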
Making data sharable Step Two: Include archival and reference information • properly formatted data citations for the data and all sources • Universally Unique Identifiers (UUIDs) that uniquely identify your data and help to link the data with the metadata. See the DataONE unique identifier guidance at: https://releases.dataone.org/online/api-documentation-v2.0/design/PIDs.html Data Citation Example: Sidlauskas, B. 2007. Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
Making data sharable Step Three: Have data contributors review your metadata to ensure validity and organizational ‘correctness’: • are the processes described accurately? • are all contributions adequately identified? • has management reviewed the product and documentation? • is the funding organization properly recognized?
Making data sharable Step Four: Publish your metadata and/or data via: (cont’d) Other Online Resources: • Project and/or Program websites • Links within online lessons and outreach products • Web-accessible folders (WAF) • Community or Public Cloud
Disciplinary, discipline agnostic, and local Data sharing platforms
Preservation & sharing platforms ScholarsArchive@OSU
DOIs, data citation, and researcher IDs being discoverable & obtaining credit
Lesson topics CC image by cybrarian77 on Flickr • Data Citation in the Data Life Cycle • Definitions: What is Data Citation? • Benefits of Data Citation • Collaborating to Support Data Citation • How to Cite Data • How to Obtain a Persistent Identifier for a Data Set • Best Practices to Support Data Citation
Learning Objectives • After completing this lesson, the participant will be able to: • Define data citation • Describe benefits of data citation • Identify roles of data authors/managers, data publishers, and journal publishers in supporting data citation • Recognize metadata elements useful for data citation • Recognize common persistent data locators and describe the process for obtaining one • Summarize best practices for supporting data citation
Definitions • Data citation • The practice of providing a reference to data in the same way as researchers routinely provide a bibliographic reference to printed resources • A key practice underpinning the recognition of data as a primary research output rather than as a by-product of research • Data author • Individual involved in research, education, or other activities that generate digital data that are subsequently deposited in a data collection • Persistent identifier • A unique web-compatible alphanumeric code that points to a resource (e.g., data set) that will be preserved for the long term (i.e., over several hardware and software generations) • Should direct to latest available version of resource or to metadata which enables acquisition of desired version or format
Benefits of data citation CC image by futureatlas.com on Flickr, as “Citation Needed” • Short term • Facilitates discovery of relationships between data and publications, making it easier to validate and build upon previous work • Ensures that proper credit can be given when others use your work • Facilitates impact assessments of datasets based on number of publications that cite them • Helps researchers re-using data to find other ways the data has been used
Benefits of data citation CC image by gruntzooki on Flickr • Long term • Promotes the availability of data into the future • Facilitates discovery of existing data relevant to a particular question • Enables recognition of scholarly effort within disciplines and organizations • Increases transparency of scientific research
Principles of Data Citation https://www.force11.org/datacitation • Importance - Data should be considered legitimate, citable products of research • Credit and Attribution - Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data • Evidence - In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited. • Unique Identification - A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.
Principles of Data Citation https://www.force11.org/datacitation • Access - Data citations should facilitate access to the data, metadata, code, and other materials, as necessary for both humans and machines. • Persistence - Unique identifiers, data, and metadata should persist beyond the lifespan of the data they describe. • Specificity and Verifiability - Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. • Interoperability and Flexibility - Data citation methods should be flexible, but enable interoperability across communities.
How to Cite Data CC image by walknboston on Flickr • Similar to citing a published article or book • Provide information necessary to identify and locate the work cited • Broadly-applicable data citation standards have not yet been established; use standards adopted by relevant academic journal, data repository, or professional organization
Examples of information needed in a citation • Author/Principal Investigator/Data Creator • Release Date/Year of Publication – year of release, for a completed dataset • Title of Data Source – formal title of the dataset • Version/Edition Number – the version of the dataset used in the study • Format of the Data – physical format of the data • 3rd Party Data Producer – refers to data accessed from a 3rd party repository • Archive and/or Distributor – the location that holds the dataset
Examples of information needed in a citation, cont’d • Locator or Identifier – includes Digital Object Identifiers (DOI), Handles, Archival Resource Key (ARK), etc. • Access Date and Time – when data is accessed online • Subset of Data Used – description based on organization of the larger dataset • Editor or Contributor – reference to a person who compiled data, or performed value-added functions • Publication Place – city, state, and country of the distributor of the data • Data within a Larger Work – refers to the use of data in a compilation or a data supplement (such as published in a peer-reviewed paper)
Examples of data citation formats • DataCite: Creator (Publication Year): Title. Publisher. Identifier • Dryad: Author (Date of Article Publication) Data from: Article name. Dryad Digital Repository. doi: DOI number CC image by Paxsimius on Flickr
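The Dryad pattern above can be sketched as a small formatting function. The function and parameter names are invented for this illustration; only the element order and the fixed strings ("Data from:", "Dryad Digital Repository", "doi:") come from the pattern on the slide. The sample values are taken from the Sidlauskas dataset cited earlier in the lecture.

```python
def dryad_citation(author: str, article_year: int, article_title: str, doi: str) -> str:
    """Assemble a citation following the Dryad pattern shown on the slide.

    Pattern: Author (Date of Article Publication) Data from: Article name.
             Dryad Digital Repository. doi: DOI number
    """
    return (
        f"{author} ({article_year}) Data from: {article_title}. "
        f"Dryad Digital Repository. doi:{doi}"
    )


citation = dryad_citation(
    author="Sidlauskas B",
    article_year=2007,
    article_title=(
        "Testing for unequal rates of morphological diversification in the "
        "absence of a detailed phylogeny: a case study from characiform fishes"
    ),
    doi="10.5061/dryad.20",
)
print(citation)
```

Keeping the elements as separate arguments, rather than one pre-built string, makes it easy to re-render the same dataset in whatever style a journal or repository asks for.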
Examples of data citation formats, cont’d • Earth Science Information Partners (ESIP): • Required citation elements: Author. Release date. Title. Version. Archive/Distributor. Locator/Identifier. Access date and time. • Optional citation elements: Subset Used; Editor, Compiler, or other important role; Distributor, Associate Archive, or other Institutional Role; Data Within a Larger Work • Example citation: • Zwally, H.J., R. Schutz, C. Bentley, J. Bufton, T. Herring, J. Minster, J. Spinhirne, and R. Thomas. 2003. GLAS/ICESat L1A Global Altimetry Data V018, 15 October to 18 November 2003. National Snow and Ice Data Center. Data set accessed 2011-07-21 at doi:10.3334/NSIDC/gla01.
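One way to use the ESIP list of required elements is as a checklist before a citation is written. The sketch below is only an illustration: the dictionary keys and the function name are assumptions made here, not part of the ESIP guidelines, and the sample values are read off the Zwally et al. example above.

```python
# Hypothetical key names standing in for the seven required ESIP elements.
ESIP_REQUIRED = (
    "author",
    "release_date",
    "title",
    "version",
    "archive",        # Archive/Distributor
    "locator",        # Locator/Identifier
    "access_date",    # Access date and time
)


def missing_required(record: dict) -> list:
    """Return the required elements that are absent or empty in the record."""
    return [key for key in ESIP_REQUIRED if not record.get(key)]


glas = {
    "author": "Zwally, H.J., R. Schutz, C. Bentley, J. Bufton, T. Herring, "
              "J. Minster, J. Spinhirne, and R. Thomas",
    "release_date": "2003",
    "title": "GLAS/ICESat L1A Global Altimetry Data",
    "version": "V018",
    "archive": "National Snow and Ice Data Center",
    "locator": "doi:10.3334/NSIDC/gla01",
    "access_date": "2011-07-21",
}

print(missing_required(glas))

# Dropping an element shows up immediately in the check.
incomplete = {k: v for k, v in glas.items() if k != "version"}
print(missing_required(incomplete))
```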
Data citation Include, AT LEAST: Creator (PublicationYear): Title. Publisher. Identifier Better: Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier. AccessDate “Tenopir C, Allard S, Douglass K, Aydinoglu AU, Wu L, Read E, Manoff M, Frame M (2011) Data from: Data sharing by scientists: practices and perceptions. Dryad Digital Repository. doi:10.5061/dryad.6t94p. Accessed 18 April 2013.” from the article: “Tenopir C, Allard S, Douglass K, Aydinoglu AU, Wu L, Read E, Manoff M, Frame M (2011) Data sharing by scientists: practices and perceptions. PLoS ONE 6(6): e21101. doi:10.1371/journal.pone.0021101”
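The minimal and the fuller element lists above differ only in the optional parts, which a single function with optional arguments can express. This is a sketch under stated assumptions: the function name and signature are invented for illustration, and the punctuation simply follows the two patterns on the slide. The sample values are taken from the Tenopir et al. dataset quoted above.

```python
from typing import Optional


def datacite_style_citation(
    creator: str,
    publication_year: int,
    title: str,
    publisher: str,
    identifier: str,
    version: Optional[str] = None,
    resource_type: Optional[str] = None,
    access_date: Optional[str] = None,
) -> str:
    """Build 'Creator (PublicationYear): Title. [Version.] Publisher.
    [ResourceType.] Identifier[. AccessDate]' from the given elements."""
    parts = [f"{creator} ({publication_year}): {title}"]
    if version:
        parts.append(version)
    parts.append(publisher)
    if resource_type:
        parts.append(resource_type)
    parts.append(identifier)
    if access_date:
        parts.append(access_date)
    return ". ".join(parts)


creators = "Tenopir C, Allard S, Douglass K, Aydinoglu AU, Wu L, Read E, Manoff M, Frame M"
title = "Data from: Data sharing by scientists: practices and perceptions"

# Minimal form: the five elements marked "AT LEAST".
minimal = datacite_style_citation(
    creators, 2011, title, "Dryad Digital Repository", "doi:10.5061/dryad.6t94p"
)

# Fuller form: adds a resource type and an access date.
fuller = datacite_style_citation(
    creators, 2011, title, "Dryad Digital Repository", "doi:10.5061/dryad.6t94p",
    resource_type="Dataset", access_date="Accessed 18 April 2013",
)
print(minimal)
print(fuller)
```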
What are examples of Persistent Identifiers? A persistent identifier should be included in the citation: • DOI (Digital Object Identifier) • Globally unique, alphanumeric string assigned by a registration agency to identify content and provide a persistent link to its location. • May be assigned to any item of intellectual property that is defined by structured metadata • Examples: 10.1234/NP5678; 10.5678/ISBN-0-7645-4889-4; 10.2224/2004-10-ISO-DOI • ARK (Archival Resource Key) • URL designed to support long-term access to information objects • Can refer to digital, physical, or intangible objects or living beings and groups • Example: http://ark.cdlib.org/ark:/13030/tf5p30086k
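A DOI consists of a prefix beginning with "10." and a suffix, separated by the first slash, and it becomes a clickable link when appended to the https://doi.org/ resolver. The sketch below splits a DOI and builds that link; the function names are invented for this illustration and the validation is deliberately loose.

```python
from urllib.parse import quote


def normalize_doi(raw: str) -> str:
    """Strip common leading forms ('doi:', resolver URLs) and check the basic shape."""
    doi = raw.strip()
    for lead in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.lower().startswith(lead):
            doi = doi[len(lead):].strip()
            break
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {raw!r}")
    return doi


def split_doi(raw: str) -> tuple:
    """Return (prefix, suffix); the prefix identifies the registrant."""
    prefix, _, suffix = normalize_doi(raw).partition("/")
    return prefix, suffix


def doi_url(raw: str) -> str:
    """Return the resolver link for a DOI, percent-encoding unsafe characters."""
    return "https://doi.org/" + quote(normalize_doi(raw), safe="/")


print(split_doi("10.5678/ISBN-0-7645-4889-4"))
print(doi_url("doi:10.5061/dryad.20"))
```

Storing the bare DOI and generating the link on output keeps citations stable even if the preferred resolver address changes.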
What are examples of Persistent Identifiers? More persistent identifiers: • UUID (Universally Unique Identifier) • ‘practically unique’ identifiers that can be generated by distributed systems but later combined into a single database without needing to resolve identifier (ID) conflicts • 32 hexadecimal digits, displayed in five groups separated by hyphens, in the form 8-4-4-4-12 for a total of 36 characters • Example: 550e8400-e29b-41d4-a716-446655440000 • Researcher identifier: ORCID (Open Researcher & Contributor ID) • Central registry of unique identifiers for individual researchers to address author name ambiguity • Transparent linking mechanism between ORCID and other author ID schemes
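The 8-4-4-4-12 layout described above can be checked, and new identifiers generated, with Python's standard `uuid` module. The helper name `is_canonical_uuid` is an assumption made for this sketch; `uuid.UUID` and `uuid.uuid4` are standard-library calls.

```python
import uuid

example = "550e8400-e29b-41d4-a716-446655440000"

# Parse the slide's example and inspect its structure.
parsed = uuid.UUID(example)
group_lengths = [len(group) for group in example.split("-")]
print(group_lengths)                  # lengths of the five hyphen-separated groups
print(len(parsed.hex), len(example))  # hex digits vs. characters including hyphens


def is_canonical_uuid(text: str) -> bool:
    """True only for the hyphenated 36-character form.

    uuid.UUID() also accepts other spellings (no hyphens, braces), so the
    parsed value is converted back to a string and compared with the input.
    """
    try:
        return str(uuid.UUID(text)) == text.lower()
    except ValueError:
        return False


# Random (version 4) UUIDs can be generated independently on any machine,
# which is what allows records to be merged later without ID conflicts.
new_id = uuid.uuid4()
print(new_id, is_canonical_uuid(str(new_id)))
```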
Unique identifiers • DOI: Digital object identifier • Digital Identifier of an Object (not "Identifier of a Digital Object") • Object = any entity (thing: physical, digital, or abstract) • Resources, parties, licenses, etc. • Digital Identifier = network actionable identifier ("click on it and do something") http://www.dcc.ac.uk/resources/how-guides/cite-datasets#sthash.assrcuj3.dpuf