
Use Cases for Identifiers Beyond Data Citation* …

Explore the importance of data citation and identification in scholarship, addressing concerns such as attribution, ownership, and provenance, while enabling traceability and reproducibility.


Presentation Transcript


  1. Use Cases for Identifiers Beyond Data Citation* …
     IN041D-08, December 15, 2016, AGU
     Peter Fox (RPI), pfox@cs.rpi.edu, @taswegian, #twcrpi
     Tetherless World Constellation, http://tw.rpi.edu; Earth and Environmental Science, Computer Science, Cognitive Science, IT and Web Science
     and Mark Parsons (RPI and RDA), parsom3@rpi.edu, Institute for Data Exploration and Applications, http://idea.rpi.edu, @chutneyboy
     * Originating from a 2014 ESIP Winter Meeting dinner conversation

  2. Motivation
     • It started with a tweet in 2010 from Cape Town at the CODATA meeting and continued over many cappuccinos in Melbourne at IUGG 2011
     • Led to Parsons & Fox, 2012, Is Data Publication the Right Metaphor? Data Science Journal, … aka orcid.org/0000-0002-7723-0950 & orcid.org/0000-0002-1009-7163, xsd:2012, http://dx.rpi.edu/10833/4199-5811-3221-0002-CC/?dc:title, jns:10.2481, doi:10.2481/dsj.WDS-042
     • We were exploring metaphors and concluded that the community at large might be expecting too much from the publishing metaphor
     • We had “concerns”
     • Then data citation got people’s attention, cf. literature

  3. Metaphors
     • Lakoff and Johnson (1980): “Metaphor is for most people a device of the poetic imagination and the rhetorical flourish—a matter of extraordinary rather than ordinary language. Moreover, metaphor is typically viewed as characteristic of language alone, a matter of words rather than thought or action. For this reason, most people think they can get along perfectly well without metaphor. We have found, on the contrary, that metaphor is pervasive in everyday life, not just in language but in thought and action. Our ordinary conceptual system, in terms of which we both think and act, is fundamentally metaphorical in nature.” (p. 3, our emphasis)

  4. Data Citation • “Sound, reproducible scholarship rests upon a foundation of robust, accessible data. For this to be so in practice as well as theory, data must be accorded due importance in the practice of scholarship and in the enduring scholarly record. In other words, data should be considered legitimate, citable products of research. Data citation, like the citation of other evidence and sources, is good research practice.” (http://www.force11.org/datacitation)

  5. Concerns
     • Identification v. Location: URI v. URL = what is it versus where is it (now)?
     • Attribution and credit (discredit)
     • Ownership and governance
     • Provenance and traceability
     • Impact and return on investment = fame and fortune ;-)
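The first concern, identification versus location, can be made concrete with a toy resolver. This is a minimal sketch under stated assumptions: the `Resolver` class and the example.org landing-page URLs are hypothetical. The point is only that the identifier (what it is) stays fixed while the location it resolves to (where it is now) may change.

```python
class Resolver:
    """Toy resolver: the identifier (what it is) is the key; the location (where it is now) is the value."""

    def __init__(self) -> None:
        # identifier -> every location it has been registered at, oldest first
        self._history: dict[str, list[str]] = {}

    def register(self, identifier: str, url: str) -> None:
        """Record a new current location; earlier locations are kept and the identifier never changes."""
        self._history.setdefault(identifier, []).append(url)

    def resolve(self, identifier: str) -> str:
        """Return the most recently registered location; raises KeyError for unknown identifiers."""
        return self._history[identifier][-1]

    def locations(self, identifier: str) -> list[str]:
        """Return every location the identifier has resolved to, oldest first."""
        return list(self._history.get(identifier, []))


if __name__ == "__main__":
    r = Resolver()
    pid = "doi:10.2481/dsj.WDS-042"  # identifier cited on slide 2
    r.register(pid, "https://old-host.example.org/landing")  # hypothetical URL
    r.register(pid, "https://new-host.example.org/landing")  # hypothetical URL
    print(r.resolve(pid))  # https://new-host.example.org/landing
```

Keeping the location history, rather than overwriting it, is a design choice made here for illustration; it hints at the provenance and traceability concern as well.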

  6. Complete Traceability for National Climate Assessment
     Spectrum from easier (transparency) to harder (reproducibility):
     • Traceable Sources: references, image sources
     • Traceable Data: data sources, link to datasets, complete metadata
     • Traceable Processes: description of methods, access to process info & review
     • Traceable Tools: access to computer code, description of systems and platforms

  7. Dataset metadata from an image in a figure

  8. Identifier Resolution
     doi:10.5067/MEASURES/GSSTF/DATA308 is a common, persistent, citable reference to that dataset.
     We build GCIS-specific (Global Change Information System) identifiers from those: http://data.globalchange.gov/doi/10.5067/MEASURES/GSSTF/DATA308
     Then we can resolve it (with content negotiation) on our site and link it with identifiers for our other resources, including asserting equivalence and linking with the data center responsible for stewardship and distribution of the actual data. We can also refer and link to other repositories of information about those resources.
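The resolution pattern on slide 8 can be sketched in a few lines. This is a minimal illustration, not the GCIS implementation: the prefix rule mirrors the single example on the slide, the `text/turtle` media type is an assumed choice of representation, and the `owl:sameAs` statement is one hypothetical way of "asserting equivalence" with the doi.org URI. The request is only constructed, never sent.

```python
import urllib.request

# Prefix taken from the example on the slide; assumed (not confirmed) to apply to any DOI.
GCIS_DOI_BASE = "http://data.globalchange.gov/doi/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def bare_doi(doi: str) -> str:
    """Strip an optional 'doi:' scheme prefix, leaving e.g. '10.5067/...'."""
    return doi[4:] if doi.lower().startswith("doi:") else doi


def gcis_uri_for_doi(doi: str) -> str:
    """Build a GCIS-style URI for a dataset DOI by appending it to the base."""
    return GCIS_DOI_BASE + bare_doi(doi)


def negotiated_request(uri: str, media_type: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET that asks for one representation via the Accept header."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


def equivalence_triple(doi: str) -> str:
    """Hypothetical N-Triples line stating the GCIS URI and the doi.org URI denote the same thing."""
    return f"<{gcis_uri_for_doi(doi)}> <{OWL_SAME_AS}> <https://doi.org/{bare_doi(doi)}> ."


if __name__ == "__main__":
    doi = "doi:10.5067/MEASURES/GSSTF/DATA308"
    req = negotiated_request(gcis_uri_for_doi(doi), "text/turtle")
    print(req.full_url, req.get_header("Accept"))
    print(equivalence_triple(doi))
```

Passing the prepared request to `urllib.request.urlopen` would actually send it; what comes back depends on which representations the server really offers.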

  9. Citation as a group of use cases
     • Back to the granularity choice…
     • Citation
     • Micro-citation
     • Nano-citation
     • I mean, really??? Yotta-citation?
     • Software citation use cases: https://github.com/researchsoftwareinstitute/software-data-citation-ws/issues/
     • All about identifiers…

  10. Roles: CRediT (http://docs.casrai.org/CRediT)
     • Facilitate authorship/contributorship disclosure processes and policies
     • Identify good practices for tracking contributions to the components of scholarly published output
     • Minimize authorship disputes
     • Enable appropriate recognition for the different contributions in multi-authored works, across all aspects of the research being reported (including data curation, statistical analysis, etc.)
     • Support identification of peer reviewers and experts
     • Support grant making by enabling funders to more easily identify those responsible for specific research products, developments or breakthroughs

  11. CRediT
     • Improve automated tracking of funding outcomes and impact
     • Support new forms of social and research networking
     • Further developments in data management and nano-publication
     • Inform the “science of science”, e.g. studies of productivity over a career trajectory
     • Enable new metrics of credit and attribution
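The CRediT goals listed on slides 10 and 11 (identifying who is responsible for what, automated tracking, new metrics) all presuppose a machine-readable mapping from contributors to roles. A minimal sketch, assuming contributors are already identified (in practice perhaps by ORCID iDs; the ids below are hypothetical placeholders) and using only a subset of the CRediT role names. The index and count functions are illustrative helpers, not part of CRediT itself.

```python
from collections import defaultdict

# A subset of role names from the CRediT taxonomy (not the full list).
CREDIT_ROLES = frozenset({
    "Conceptualization",
    "Data curation",
    "Formal analysis",
    "Funding acquisition",
    "Software",
    "Visualization",
})


def build_role_index(contributions):
    """Map each CRediT role to the sorted contributor ids that held it.

    contributions: iterable of (contributor_id, role) pairs.
    Raises ValueError for a role name outside CREDIT_ROLES.
    """
    index = defaultdict(set)
    for contributor, role in contributions:
        if role not in CREDIT_ROLES:
            raise ValueError(f"unrecognized CRediT role: {role!r}")
        index[role].add(contributor)
    return {role: sorted(ids) for role, ids in index.items()}


def role_counts(index):
    """Count contributors per role: a hypothetical starting point for role-aware metrics."""
    return {role: len(ids) for role, ids in index.items()}


if __name__ == "__main__":
    contributions = [
        ("contributor-A", "Data curation"),  # hypothetical ids
        ("contributor-B", "Data curation"),
        ("contributor-A", "Software"),
    ]
    index = build_role_index(contributions)
    print(index)
    print(role_counts(index))
```

Rejecting unknown role names up front is one way to keep the vocabulary controlled, which is what makes automated tracking and metrics comparable across works.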

  12. About Types (~ Roles, ~ Artifacts, ~ Activities)
     • Types of identifiers: PIT from the Research Data Alliance, https://www.rd-alliance.org/group/pid-information-types-wg/outcomes/pid-information-types
     • Permanency -> PIDs = get your tattoo
     • Roles seem to be important
     • Conjecture: it might be about reducing uncertainty in the what and the who -> entropy and mutual information
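Slide 12 points to the RDA PID Information Types (PIT) outcomes. A minimal sketch, assuming a simple reading of that idea rather than the RDA specification itself: a PID record carries attributes keyed by type identifiers, so software can ask for a typed value (a checksum, a location, a contributor role) without scraping a landing page. All type identifiers and the handle-style PID below are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical type identifiers standing in for registered PID Information Types.
CHECKSUM_TYPE = "example-types/checksum"
LOCATION_TYPE = "example-types/location"
CONTRIBUTOR_ROLE_TYPE = "example-types/contributor-role"


@dataclass
class PidRecord:
    """A PID plus typed attributes: each key is a type identifier, each value its content."""

    pid: str
    attributes: Dict[str, str] = field(default_factory=dict)

    def get(self, type_id: str) -> Optional[str]:
        """Return the value stored under a type identifier, or None if absent."""
        return self.attributes.get(type_id)

    def missing(self, required_types: List[str]) -> List[str]:
        """List the required type identifiers this record does not carry."""
        return [t for t in required_types if t not in self.attributes]


if __name__ == "__main__":
    record = PidRecord(
        "hdl:00000/example",  # hypothetical PID
        {CHECKSUM_TYPE: "sha256:0000", CONTRIBUTOR_ROLE_TYPE: "Data curation"},
    )
    print(record.get(CHECKSUM_TYPE))
    print(record.missing([CHECKSUM_TYPE, LOCATION_TYPE]))
```

Because the keys are themselves identifiers, a client can decide in advance which types it requires and detect gaps with `missing`, instead of guessing from free-text metadata.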

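  13. Entropy and Rates
     • $R = H(x) - H(x \mid y)$ (Shannon, 1948)
     • R = rate of transmission; the conditional entropy $H(x \mid y)$ (the equivocation) measures the average ambiguity of the received signal
     orcid.org/0000-0002-7723-0950 & orcid.org/0000-0002-1009-7163, xsd:2012, http://dx.rpi.edu/10833/4199-5811-3221-0002-CC/?dc:title, jns:10.2481, doi:10.2481/dsj.WDS-042
     • Mutual information lowers entropy: R is the mutual information between x and y, and since conditioning cannot increase uncertainty, $H(x \mid y) \le H(x)$
     • $H(x) = -\sum_i p(x_i) \log p(x_i)$, where $p(x_i)$ is the probability mass function of outcome $x_i$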
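Slide 12's conjecture ties identifiers to reducing uncertainty in the "what". A minimal sketch, assuming a toy setup that is ours and not the authors': x is which of four equally likely datasets is meant, and y is what a reader learns from a reference. A reference that only narrows the choice to one of two groups transmits R = 1 bit; a unique identifier removes all ambiguity, so H(x|y) = 0 and the full H(x) = 2 bits get through. Logarithms are base 2 here, so all quantities are in bits.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy in bits: sum of p * log2(1/p), skipping zero probabilities."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)


def rate(joint):
    """Return (H(x), H(x|y), R) for a joint distribution given as {(x, y): p}.

    R = H(x) - H(x|y), which equals the mutual information between x and y.
    """
    p_x = defaultdict(float)
    p_y = defaultdict(float)
    for (x, y), p in joint.items():
        p_x[x] += p
        p_y[y] += p
    h_x = entropy(p_x.values())
    # H(x|y) = sum over (x, y) of p(x, y) * log2(p(y) / p(x, y)), i.e. -log2 p(x|y) averaged
    h_x_given_y = sum(p * math.log2(p_y[y] / p) for (_, y), p in joint.items() if p > 0)
    return h_x, h_x_given_y, h_x - h_x_given_y


if __name__ == "__main__":
    # Hypothetical: four candidate datasets; an ambiguous reference only reveals the group.
    ambiguous = {("a", 0): 0.25, ("b", 0): 0.25, ("c", 1): 0.25, ("d", 1): 0.25}
    # Hypothetical: a unique identifier for each dataset.
    unique = {(d, d): 0.25 for d in "abcd"}
    print(rate(ambiguous))  # (2.0, 1.0, 1.0)
    print(rate(unique))     # (2.0, 0.0, 2.0)
```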

  14. Recap
     • Is anyone “crawling” our identifiers?
     • Who is scraping our landing pages?
     • Are we querying for additional content?
     • “It” continues over many adult beverages in <any place>
     • “We” want more metaphors from the community at large
     • “We” still have “concerns”
     We = orcid.org/0000-0002-7723-0950 & orcid.org/0000-0002-1009-7163 aka @chutneyboy & @taswegian
