
GEO 802, Data Information Literacy Winter 2019 – Lecture 8 Gary Seitz, MA



Presentation Transcript


  1. Data sharing & reuse GEO 802, Data Information Literacy Winter 2019 – Lecture 8 Gary Seitz, MA

  2. Lesson 8 Outline • Issues/obstacles related to reuse and sharing of data Understand open access Data citation Luis Prado from The Noun Project Benefits of sharing data

  3. Learner objectives • Explain the benefits of sharing data • Understand open access/open science/open data • Understand the need for data attribution and citation

  4. Data sharing What it is “… the practice of making data used for scholarly research available to others.” [Wikipedia] • Who’s involved • the data sharer • the data repository • the secondary data user • support staff!

  5. Data sharing: top-down mandates Read and compare http://www.snf.ch/en/theSNSF/research-policies/open_research_data/Pages/default.aspx https://ec.europa.eu/research/participants/docs/h2020-funding-guide/cross-cutting-issues/open-access-dissemination_en.htm

  6. Drivers for data sharing • National research policies • Research Councils UK “Publicly funded research data are a public good, produced in the public interest, which should be made openly available with as few restrictions as possible in a timely and responsible manner that does not harm intellectual property.” • Economic & Social Research Council “ … publicly funded research data … valuable, long-term resources that, where practical, must be made available for secondary, scientific research.” • Medical Research Council “… publicly-funded research data … should be openly available to the maximum extent possible.” • Wellcome Trust “… aim(s) to ensure that the data generated by the research we support is managed and shared in a way that maximises the benefit to the public.”

  7. Value of Data Sharing: Discuss in groups of 3 To the Public? To the research sponsor? To the scientific community? To the scientist?

  8. Value of Data Sharing: To the Public A better informed public yields better decision making with regard to: • Environmental and economic planning • Federal, state, and local policies • social choices such as use of tax money and education options • personal lifestyle and health such as nutrition and recreation

  9. Value of data sharing: to the research sponsor • Organizations that sponsor research must maximize the value of research money • Data sharing enhances the value of research investments by enabling: • verification of performance metrics and outcomes • new research and increased return on investment • advancement of the science • reduced data duplication expenditures

  10. Value of data sharing: to scientific community Access to related research enables community members to: • build upon the work of others and further, rather than repeat, the science • perform meta analyses that cannot be performed with individual datasets or laboratories • share resources and perspectives so that comprehension is expanded and enhanced

  11. Value of data sharing: to scientific community Access to related research enables community members to (cont’d): • increase transparency, reproducibility and comparability of results • expand methodology assessment, recommendations and improvement • educate new researchers as to the most current and significant findings

  12. Value of data sharing: to the scientist Scientists that share data gain the benefit of: • research sponsor recognition as an authoritative source and wise investment • improved data quality due to expanded use, field checks, and feedback • greater opportunity for data exchange • improved connections to scientific network, peers, and potential collaborators

  13. Value of data sharing

  14. http://www.scitechnol.com/2327-4581/2327-4581-1-e101.pdf

  15. Barriers to data sharing • “Scientists would rather share their toothbrush than their data!” [Carole Goble, Keynote address, EGEE (Enabling Grids for E-sciencE) ‘06 Conference.] • Barriers to sharing can relate to … • the Researcher - intellectual property issues • the Institution - unrealised commercial value • the Subject - confidentiality

  16. Concerns about data sharing [figure; repeated label: metadata]

  17. Concerns about data sharing

  18. Exercise: Reasons not to share data

  19. Exercise: Replies or Arguments

  20. Exercise: Reasons not to share data

  21. Exercise: Replies or Arguments

  22. Exercise: Reasons not to share data

  23. Exercise: Replies or Arguments

  24. Overview of current data sharing practice Take a look at these infographics from Wiley titled Research Data Sharing Insights [PDF, 2.08MB], and Global Data Sharing Trends. They provide a succinct overview of current data sharing practice and perceptions. Consider: Why do you think there are differences between disciplines and countries - what changes to these statistics would you expect between 2014/2016 and now? Discuss this in your group.

  25. Making data sharable Step One: Create robust metadata that is discoverable • specify geography and time periods • use discipline specific theme, place and temporal keywords, thesauri, and ontologies • describe attributes • include links to associated data catalogues, data downloads, project websites, etc.
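A minimal sketch of Step One as a completeness check on a metadata record. The field names and the sample record are hypothetical, chosen only to mirror the bullets on the slide; they are not taken from any particular metadata standard.

```python
# Hypothetical field names mirroring the Step One bullets; not a real standard.
REQUIRED_FIELDS = ("title", "geography", "time_period",
                   "keywords", "attributes", "links")


def missing_fields(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Made-up example record for illustration only.
record = {
    "title": "Example stream temperature survey",
    "geography": "Example River basin",
    "time_period": ("2018-01-01", "2018-12-31"),
    "keywords": ["stream temperature", "hydrology"],
    "attributes": {"temp_c": "water temperature in degrees Celsius"},
    "links": [],  # no catalogue, download, or project link recorded yet
}

print(missing_fields(record))
```

Here the check reports that the record still lacks links to associated catalogues, downloads, or project websites. A real project would more likely validate against the schema of its chosen repository.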

  26. Making data sharable Step Two: Include archival and reference information • properly formatted data citations for the data and all sources • Universally Unique Identifiers (UUID) that uniquely identify your data and help to link the data with the metadata. See the DataONE unique identifier guidance at: https://releases.dataone.org/online/api-documentation-v2.0/design/PIDs.html Data Citation Example: Sidlauskas, B. 2007. Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20

  27. Making data sharable Step Three: Have data contributors review your metadata to ensure validity and organizational ‘correctness’: • are the processes described accurately? • are all contributions adequately identified? • has management reviewed the product and documentation? • is the funding organization properly recognized?

  28. Making data sharable Step Four: Publish your metadata and/or data via: (cont’d) Other Online Resources: • Project and/or Program websites • Links within online lessons and outreach products • Web-accessible folders (WAF) • Community or Public Cloud

  29. Disciplinary, discipline agnostic, and local Data sharing platforms

  30. Preservation & sharing platforms ScholarsArchive@OSU

  31. Reciprocal linking

  32. Reciprocal linking

  33. DOIs, data citation, and researcher IDs being discoverable & obtaining credit

  34. Lesson topics CC image by cybrarian77 on Flickr • Data Citation in the Data Life Cycle • Definitions: What is Data Citation? • Benefits of Data Citation • Collaborating to Support Data Citation • How to Cite Data • How to Obtain a Persistent Identifier for a Data Set • Best Practices to Support Data Citation

  35. Learning Objectives • After completing this lesson, the participant will be able to: • Define data citation • Describe benefits of data citation • Identify roles of data authors/managers, data publishers, and journal publishers in supporting data citation • Recognize metadata elements useful for data citation • Recognize common persistent data locators and describe the process for obtaining one • Summarize best practices for supporting data citation

  36. Definitions • Data citation • The practice of providing a reference to data in the same way as researchers routinely provide a bibliographic reference to printed resources • A key practice underpinning the recognition of data as a primary research output rather than as a by-product of research • Data author • Individual involved in research, education, or other activities that generate digital data that are subsequently deposited in a data collection • Persistent identifier • A unique web-compatible alphanumeric code that points to a resource (e.g., data set) that will be preserved for the long term (i.e., over several hardware and software generations) • Should direct to latest available version of resource or to metadata which enables acquisition of desired version or format

  37. Benefits of data citation CC image by futureatlas.com on Flickr, as “Citation Needed” • Short term • Facilitates discovery of relationships between data and publications, making it easier to validate and build upon previous work • Ensures that proper credit can be given when others use your work • Facilitates impact assessments of datasets based on number of publications that cite them • Helps researchers re-using data to find other ways the data has been used

  38. Benefits of data citation CC image by gruntzooki on Flickr • Long term • Promotes the availability of data into the future • Facilitates discovery of existing data relevant to a particular question • Enables recognition of scholarly effort within disciplines and organizations • Increases transparency of scientific research

  39. Principles of Data Citation https://www.force11.org/datacitation • Importance - Data should be considered legitimate, citable products of research • Credit and Attribution - Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data • Evidence - In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited. • Unique Identification - A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.

  40. Principles of Data Citation https://www.force11.org/datacitation Access - Data citations should facilitate access to the data, metadata, code, and other materials, as necessary for both humans and machines. Persistence - Unique identifiers, data, and metadata should persist beyond the lifespan of the data they describe. Specificity and Verifiability - Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Interoperability and Flexibility - Data citation methods should be flexible, but enable interoperability across communities.

  41. How to Cite Data CC image by walknboston on Flickr • Similar to citing a published article or book • Provide information necessary to identify and locate the work cited • Broadly-applicable data citation standards have not yet been established; use standards adopted by relevant academic journal, data repository, or professional organization

  42. Examples of information needed in a citation • Author/Principal Investigator/Data Creator • Release Date/Year of Publication – year of release, for a completed dataset • Title of Data Source – formal title of the dataset • Version/Edition Number – the version of the dataset used in the study • Format of the Data – physical format of the data • 3rd Party Data Producer – refers to data accessed from a 3rd party repository • Archive and/or Distributor – the location that holds the dataset

  43. Examples of information needed in a citation, cont’d • Locator or Identifier – includes Digital Object Identifiers (DOI), Handles, Archival Resource Key (ARK), etc. • Access Date and Time – when data is accessed online • Subset of Data Used – description based on organization of the larger dataset • Editor or Contributor – reference to a person who compiled data, or performed value-added functions • Publication Place – city and state and country of the distributor of the data • Data within a Larger Work – refers to the use of data in a compilation or a data supplement (such as published in a peer-reviewed paper)

  44. Examples of data citation formats • DataCite: Creator (Publication Year): Title. Publisher. Identifier • Dryad: Author (Date of Article Publication) Data from: Article name. Dryad Digital Repository. doi: DOI number CC image by Paxsimius on Flickr

  45. Examples of data citation formats, cont’d • Earth Science Information Partners (ESIP): • Required citation elements: Author. Release date. Title. Version. Archive/Distributor. Locator/Identifier. Access date and time. • Optional citation elements: Subset Used; Editor, Compiler, or other important role; Distributor, Associate Archive, or other Institutional Role; Data Within a Larger Work • Example citation: • Zwally, H.J., R. Schutz, C. Bentley, J. Bufton, T. Herring, J. Minster, J. Spinhirne, and R. Thomas. 2003. GLAS/ICESat L1A Global Altimetry Data V018, 15 October to 18 November 2003. National Snow and Ice Data Center. Data set accessed 2011-07-21 at doi:10.3334/NSIDC/gla01.

  46. Examples of data citation formats, cont’d

  47. Data citation Include, AT LEAST: Creator (PublicationYear): Title. Publisher. Identifier Better: Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier. AccessDate “Tenopir C, Allard S, Douglass K, Aydinoglu AU, Wu L, Read E, Manoff M, Frame M (2011) Data from: Data sharing by scientists: practices and perceptions. Dryad Digital Repository. doi:10.5061/dryad.6t94p. Accessed 18 April 2013.” from the article: “Tenopir C, Allard S, Douglass K, Aydinoglu AU, Wu L, Read E, Manoff M, Frame M (2011) Data sharing by scientists: practices and perceptions. PLoS ONE 6(6): e21101. doi:10.1371/journal.pone.0021101”
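The minimal and "better" citation patterns above can be sketched as a small formatting helper. The function name and parameter names are illustrative assumptions, and the helper covers only the two patterns shown on this slide, not any journal's or repository's required style.

```python
def format_citation(creator, year, title, publisher, identifier,
                    version=None, resource_type=None, access_date=None):
    """Build 'Creator (PublicationYear): Title. [Version.] Publisher.
    [ResourceType.] Identifier[. AccessDate]' from the given elements.

    Optional elements are skipped when they are not supplied.
    """
    elements = [title, version, publisher, resource_type, identifier, access_date]
    body = ". ".join(str(e) for e in elements if e)
    return f"{creator} ({year}): {body}"


minimal = format_citation(
    creator="Tenopir C, Allard S, Douglass K, Aydinoglu AU, Wu L, Read E, Manoff M, Frame M",
    year=2011,
    title="Data from: Data sharing by scientists: practices and perceptions",
    publisher="Dryad Digital Repository",
    identifier="doi:10.5061/dryad.6t94p",
)
print(minimal)
```

Passing `access_date="Accessed 18 April 2013"` appends the access date after the identifier, as in the "better" pattern.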

  48. What are examples of Persistent Identifiers? A persistent identifier should be included in the citation: • DOI (Digital Object Identifier) • Globally unique, alphanumeric string assigned by a registration agency to identify content and provide a persistent link to its location. • May be assigned to any item of intellectual property that is defined by structured metadata • Examples: 10.1234/NP5678, 10.5678/ISBN-0-7645-4889-4, 10.2224/2004-10-ISO-DOI • ARK (Archival Resource Key) • URL designed to support long-term access to information objects • Can refer to digital, physical, or intangible objects or living beings and groups • Example: http://ark.cdlib.org/ark:/13030/tf5p30086k
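As a rough illustration of how a DOI provides a persistent link, the sketch below turns a DOI string into a URL on the public doi.org resolver. The helper name is hypothetical, and the regular expression is a simplified assumption about DOI shape ("10.", a numeric registrant code, a slash, then a suffix), not the full DOI syntax. Suffixes containing special characters may also need URL encoding, which is not handled here.

```python
import re

# Simplified shape check (an assumption, not the complete DOI specification).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi):
    """Return a resolver URL for a DOI, accepting an optional 'doi:' prefix."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    return "https://doi.org/" + doi


print(doi_to_url("doi:10.5061/dryad.20"))
```

The same helper accepts the slide's example strings such as 10.1234/NP5678, although those are illustrative identifiers and may not resolve to a real object.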

  49. What are examples of Persistent Identifiers? More persistent identifiers: • UUID (Universally Unique Identifier) • ‘practically unique’ identifiers that can be generated by distributed systems but later combined into a single database without needing to resolve identifier (ID) conflicts • 32 hexadecimal digits, displayed in five groups separated by hyphens, in the form 8-4-4-4-12 for a total of 36 characters • Example: 550e8400-e29b-41d4-a716-446655440000 • Researcher identifier: ORCID (Open Researcher & Contributor ID) • Central registry of unique identifiers for individual researchers to address author name ambiguity • Transparent linking mechanism between ORCID and other author ID schemes
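The UUID layout described above can be checked with Python's standard `uuid` module. This is only an illustration of the 8-4-4-4-12 form; which UUID version a given repository expects is not stated on the slide, and random (version 4) UUIDs are used here as an assumption.

```python
import uuid

# Generate a random (version 4) UUID; no central registry is involved.
new_id = uuid.uuid4()
text = str(new_id)

# 32 hexadecimal digits plus 4 hyphens give 36 characters in total.
group_lengths = [len(group) for group in text.split("-")]
print(text)
print(len(new_id.hex), len(text), group_lengths)

# Parsing the slide's example confirms it is a well-formed UUID.
example = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")
print(example.version)
```

Because each system can generate identifiers independently, records from several sources can later be merged without renumbering, which is the property the slide highlights.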

  50. Unique identifiers • DOI: Digital object identifier • Digital Identifier of an Object (not "Identifier of a Digital Object") • Object = any entity (thing: physical, digital, or abstract) • Resources, parties, licenses, etc. • Digital Identifier = network actionable identifier ("click on it and do something") http://www.dcc.ac.uk/resources/how-guides/cite-datasets#sthash.assrcuj3.dpuf
