
IETF BOF Data Set Identifier Interoperability

IETF BOF: Data Set Identifier Interoperability. Beth Plale, Director, Data To Insight Center, Indiana University.


Presentation Transcript


  1. IETF BOF: Data Set Identifier Interoperability. Beth Plale, Director, Data To Insight Center, Indiana University

  2. The DSII BOF: Discussion of persistent identifier solutions (part I) and steps toward interoperability among persistent identifiers (part II) for data sets made available on the Internet. The initial use case: scientific data sets produced by different research teams. Other use cases: media developed by different sources and combined into a common collection. This BoF is not intended to form a working group at this session.

  3. Science Data Deluge • Much of the data being generated comes from the sciences: ocean instruments, air quality sensors, gene sequencing machines, climate models … • Research funding agencies want research data from funded efforts to be available for reuse, both today and decades into the future: • “The National Science Foundation is committed to the principle that the various forms of data collected with public funds belong in the public domain.” Data Archiving Policy, NSF Social Behavioral and Economic Sciences

  4. Problem acute in Long Tail • [Figure: power law graph showing popularity ranking.] To the right (in yellow) is the long tail; to the left are the few that dominate. Note that the areas of both regions are equal.

  5. Long tail and on-line business • Chris Anderson (Wired, 2004) popularized the term “long tail”. It combines two complementary ideas: • First, that merchandise assortments can grow because goods are not limited by shelf space, and • Second, that online venues change the demand curve because consumers value niche products. • Together these forces produce a tail that steadily grows both longer, as more obscure products are made available, and fatter, as consumers discover products better suited to their tastes.

  6. Long tail and data • Emerging trend in science: inexpensive instruments producing huge volumes of data. • E.g., a genetic sequencing machine is inexpensive enough for a research lab to purchase, yet produces terabytes of data with every run. • The long tail of science and scholarly activity goes beyond project size to encompass the set of sub-disciplines that carry out “small or localized science”. • These are researchers whose collective numbers actually account for an enormous amount of data-driven science.

  7. Key role of Metadata in Science Data • Metadata must be preserved when scientific data is generated because metadata is ephemeral – Jim Gray • “The management, organization, access, and preservation of digital data is arguably a ‘grand challenge’ of the information age” - Fran Berman (2008) • If annotation is left to the scientist, it is not done (U.K. e-Science Core) • The further the distance between data producer and re-use, the more detailed the metadata that’s required.

  8. Generalizing to Needs for Tracking “the Object” • Definition of “object”: an information resource that could be a • Data set • Digital document • Software • Website • Physical object: books, bones, statues, etc. • Intangible object: chemicals, diseases, vocabulary terms, performances • [Slide callout: “Area of largest concern”]

  9. Metadata Associated with Identifier • Includes: checksums, pointer to metadata, rights information; also:
       C: [opens session]
       C: GET http://ark.nlm.nih.gov/ark:/12025/psbbantu? HTTP/1.1
       C:
       S: HTTP/1.1 200 OK
       S: <snip>
       S: erc:
       S: who: Lederberg, Joshua
       S: what: Studies of Human Families for Genetic Linkage
       S: when: 1974
       S: where: http://profiles.nlm.nih.gov/BB/A/N/T/U/_/bbantu.pdf
       S: [closes session]
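
The response in the session above is a short list of `label: value` lines. As a rough illustration of how a client might turn such a record into a structure it can work with, here is a minimal Python sketch. It assumes the record body has already been fetched and stripped of the `S:` transcript prefixes, that each element fits on one line, and that a bare `erc:` line only marks the start of the record. The function name `parse_erc` is hypothetical and does not come from any ARK tooling.

```python
def parse_erc(text):
    """Parse single-line 'label: value' elements into a dict.

    Assumption: multi-line values are not handled, and a bare "erc:"
    line is treated as a record marker rather than an element.
    """
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blanks and anything that is not an element
        # Split on the first colon only, so URLs in the value survive.
        label, _, value = line.partition(":")
        label, value = label.strip(), value.strip()
        if label == "erc" and not value:
            continue  # record marker, carries no value
        record[label] = value
    return record


# Record body copied from the slide, without the "S:" prefixes.
SAMPLE = """\
erc:
who: Lederberg, Joshua
what: Studies of Human Families for Genetic Linkage
when: 1974
where: http://profiles.nlm.nih.gov/BB/A/N/T/U/_/bbantu.pdf
"""

record = parse_erc(SAMPLE)
print(record["who"], "/", record["when"])
```

Splitting on the first colon only is the one design choice that matters here: the `where` value is itself a URL containing colons.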

  10. Operations performed upon identifiers • discovery, • data access, • access control, and • logical arrangement. We find cases for all of these operations, implying the need for multiple identifiers.

  11. Governance and Cost • Where are resolvers/assigners run? • Is the distribution model for resolvers scalable to the levels needed by data object discovery and use? • Which organization(s) have long-term oversight over the continued existence of resolving/assigning/interoperability services?

  12. Part II: Data set identifier Interoperability • Metadata interoperability • Relationship interoperability • Service interoperability

  13. Metadata Interoperability • One solution: universal implementation of a common metadata scheme for all identifier schemes • Otherwise: mechanisms through which it is possible to • Use descriptive metadata associated with one identifier in the context of another identifier; • Aggregate descriptive metadata associated with several different identifiers in a single context. • And do so without loss of semantic value (meaning).
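
One way to picture the "otherwise" branch above is a crosswalk: each scheme's labels are mapped to shared field names, and records from several identifiers are then merged into one view. The Python sketch below is only an illustration under stated assumptions. The common field names, the second scheme (`scheme_b`), its labels, and the identifier `hdl:0000/example` are all hypothetical; only the `who`/`what`/`when` labels and the ARK come from the slides.

```python
# Hypothetical crosswalks: scheme-specific labels -> common field names.
CROSSWALKS = {
    "erc": {"who": "creator", "what": "title", "when": "date"},
    "scheme_b": {"author": "creator", "name": "title", "year": "date"},
}


def to_common(scheme, record):
    """Translate one record; return (common fields, labels with no mapping)."""
    mapping = CROSSWALKS[scheme]
    common, unmapped = {}, {}
    for label, value in record.items():
        if label in mapping:
            common[mapping[label]] = value
        else:
            unmapped[label] = value  # kept aside so nothing is silently lost
    return common, unmapped


def aggregate(entries):
    """Merge records from several identifiers into one view.

    Each value stays paired with the identifier it came from, so the
    aggregated view does not blur which scheme asserted what.
    """
    merged = {}
    for identifier, scheme, record in entries:
        common, _ = to_common(scheme, record)
        for field, value in common.items():
            merged.setdefault(field, []).append((identifier, value))
    return merged


entries = [
    ("ark:/12025/psbbantu", "erc",
     {"who": "Lederberg, Joshua",
      "what": "Studies of Human Families for Genetic Linkage",
      "when": "1974"}),
    # Hypothetical second identifier and record, for illustration only.
    ("hdl:0000/example", "scheme_b",
     {"author": "Example, Author", "name": "Hypothetical data set",
      "year": "2012", "license": "unknown"}),
]

merged = aggregate(entries)
for field in sorted(merged):
    print(field, merged[field])
```

Returning the unmapped labels separately is a nod to the "without loss of semantic value" requirement: a crosswalk that drops what it cannot map loses meaning, so the sketch at least makes the loss visible.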

  14. Relationship Interoperability • Standard mechanisms for expressing relationships between the objects identified under different identifier schemes • "The publisher identified with this [standard party identifier] is the publisher of this journal identified with this ISSN." • This implies development of a standard set of typed relationships between identifiers, with well-defined semantics.
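
A typed relationship of the kind quoted above can be sketched as a subject identifier, a relation drawn from a fixed vocabulary, and an object identifier. The Python below is a minimal illustration, not a proposed standard. The relation names, the `party` scheme label, and both identifier values are placeholders invented for the example.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary of relation types.
RELATION_TYPES = {"isPublisherOf", "isPartOf", "isVersionOf"}


@dataclass(frozen=True)
class Identifier:
    scheme: str  # e.g. "issn", "ark", "hdl"
    value: str


@dataclass(frozen=True)
class Relationship:
    subject: Identifier
    relation: str
    obj: Identifier

    def __post_init__(self):
        # Reject relations outside the agreed vocabulary.
        if self.relation not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {self.relation}")


def objects_of(relationships, subject, relation):
    """Return the objects linked to `subject` by `relation`."""
    return [r.obj for r in relationships
            if r.subject == subject and r.relation == relation]


# Placeholder identifiers, for illustration only.
publisher = Identifier("party", "example-publisher-001")
journal = Identifier("issn", "0000-0000")

relationships = [Relationship(publisher, "isPublisherOf", journal)]
print(objects_of(relationships, publisher, "isPublisherOf"))
```

Restricting `relation` to a closed set is what makes the semantics checkable: two systems can only exchange a relationship if both recognize its type.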

  15. Service Interoperability • The creation of common services: • “...the use of shared syntax or physical interface for request/response for provision of services and/or data.” • Types of services might include: • Metadata lookup services: a user resolves an identifier to a set of metadata about the object • Identifier discovery services: a user with a limited set of metadata can discover the identifier or identifiers for that object.
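
The two service types named above are roughly inverse operations, and "shared syntax" suggests both should answer in the same response shape. The Python sketch below illustrates that idea against an in-memory registry. The registry contents beyond the ARK record from slide 9, the function names, and the response fields (`status`, `identifier`, `metadata`, `identifiers`) are assumptions made for the example, not part of any existing resolver API.

```python
# In-memory stand-in for one or more resolvers. The second entry is
# hypothetical and exists only to give discovery something to filter.
REGISTRY = {
    "ark:/12025/psbbantu": {
        "who": "Lederberg, Joshua",
        "what": "Studies of Human Families for Genetic Linkage",
        "when": "1974",
    },
    "hdl:0000/example": {
        "who": "Example, Author",
        "what": "Hypothetical data set",
        "when": "2012",
    },
}


def lookup_metadata(identifier):
    """Metadata lookup: identifier -> metadata about the object."""
    record = REGISTRY.get(identifier)
    if record is None:
        return {"status": "not_found", "identifier": identifier, "metadata": {}}
    return {"status": "ok", "identifier": identifier, "metadata": dict(record)}


def discover_identifiers(partial):
    """Identifier discovery: partial metadata -> matching identifiers."""
    matches = sorted(
        identifier for identifier, record in REGISTRY.items()
        if all(record.get(label) == value for label, value in partial.items())
    )
    return {"status": "ok" if matches else "not_found", "identifiers": matches}


print(lookup_metadata("ark:/12025/psbbantu")["metadata"]["what"])
print(discover_identifiers({"when": "1974"})["identifiers"])
```

Exact matching on every supplied field keeps the sketch short; a real discovery service would presumably need fuzzier matching and some way to rank candidates.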

  16. References • EPIC: European based; works with the Handle System, http://www.pidconsortium.eu • EZID: long term identifiers made easy; works with both DataCite DOI and ARK, http://n2t.net/ezid • The ARK Identifier Scheme, Internet-Draft, 2012-04, http://www.ietf.org/internet-drafts/draft-kunze-ark-16.txt • The Handle System, http://www.handle.net/ • Handle System Overview, Nov 2003, RFC 3650 • Handle System Namespace and Service Definition, Nov 2003, RFC 3651 • Handle System Protocol (v2.1), Nov 2003, RFC 3652 • Terminology and Use Cases for Interoperability of Identifier Resolution Systems, Internet-Draft, 2012-07, https://datatracker.ietf.org/doc/draft-kahn-dsii-id-res-sys/ • On the utility of identification schemes for digital earth science data: an assessment and recommendations, http://rd.springer.com/article/10.1007/s12145-011-0083-6/fulltext.html • Identifier Interoperability: A Report on Two Recent ISO Activities, http://www.dlib.org/dlib/april06/paskin/04paskin.html
