GLOBAL BIODIVERSITY INFORMATION FACILITY. Outcomes of the GBIF LSID-GUID Task Group. Greg Riccardi, Co-chair. 9 November 2009. WWW.GBIF.ORG.
Overview • Task Group Overview • The Characteristics of Effective Identifiers • Benefits and Opportunities • Recommendations • Discussion Session • Thursday, 12 Nov, 1400-1530
GUID Goals from GBIF Strategic Plans • The GBIF strategic plans document includes the following goals: • To consolidate the underlying enabling infrastructure and standardisation for global connectivity of biodiversity data and information • To develop a system of globally unique identifiers and encourage their use throughout biodiversity informatics • To use TDWG standards to allow all data objects to be identified using standard actionable globally unique identifiers • To provision GBIF web services and user interfaces to allow users to locate and view any data object with a standard globally unique identifier.
Call to the Task Group • GBIF convened a task group, the “LSID GUID Task Group” (LGTG) • to explore the issues and offer recommendations on the way forward, with particular reference to the GBIF network, • that will enable GBIF to provide architecture leadership and best practices for implementation. • The principal objective of the group is • to provide recommendations and guidelines on deployment of identifiers on the GBIF network with particular reference to the potential role of GBIF as a stable, long term provider of identifier resolution services.
Members • Phil Cryer (Missouri Botanical Garden) • Roger Hyam (Natural History Museum and PESI) • Chuck Miller (Missouri Botanical Garden) • Nicola Nicolson (Royal Botanic Gardens, Kew) • Éamonn Ó Tuama (GBIF) • Rod Page (University of Glasgow) • Jonathan Rees (Science Commons) • Greg Riccardi (co-chair, Florida State University) • Kevin Richards (Landcare Research, New Zealand) • Richard White (co-chair, Cardiff University)
Results • Report document • Draft written at the August 2009 workshop at GBIF • Revised for distribution in October 2009 • Contents of report • Overview of definitions and technology • Recommendations for the GBIF secretariat and for the biodiversity community • Report delivered to GBIF Science Committee • Response of committee (at end of talk)
Preliminary Definition • An identifier is a character string associated with an object. • Identifiers are used in informatics to refer to objects in data sets, documents and repositories. • Some identifiers are useful • Some are more useful
Characteristics of Effective Identifiers • Two use cases that make identifiers effective for users • Uniqueness of reference to a single object • An identifier can be used to aggregate information about the identified object • For example, information received from multiple sources associated with a single identifier is information about a single object. • Actions may be carried out using the identifier • An identifier can be used to find further information about the object, concept or data to which it refers. • This information might be interpreted directly or used to support services.
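The aggregation use case can be illustrated with a minimal sketch. The record layout, the provider names and the `urn:example:` identifiers are assumptions made for illustration only; they do not come from the task group report.

```python
from collections import defaultdict


def aggregate_by_identifier(records):
    """Group facts from several sources under the identifier they share.

    Each record is a (identifier, source, facts) tuple. Because a unique
    identifier refers to a single object, every fact filed under one
    identifier is treated as a statement about that same object.
    """
    merged = defaultdict(dict)
    for identifier, source, facts in records:
        for key, value in facts.items():
            # Keep the source next to each value so provenance is not lost.
            merged[identifier].setdefault(key, []).append((source, value))
    return dict(merged)


# Hypothetical records from two hypothetical providers.
records = [
    ("urn:example:specimen:1", "provider-a", {"country": "Cuba"}),
    ("urn:example:specimen:1", "provider-b", {"collector": "J. Doe"}),
    ("urn:example:specimen:2", "provider-a", {"country": "Peru"}),
]
aggregated = aggregate_by_identifier(records)
```

Here the two records that share `urn:example:specimen:1` end up in one entry, even though they came from different providers.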
Problems with terminology • The task group struggled with terms • GUID is problematic • Used in IT to refer to the way that Microsoft uses 128-bit UUIDs • Used in biodiversity to refer to … • Persistent, actionable identifier • The Task Group recommendation for terminology • Two required characteristics: persistent and actionable
Persistent Identifier • Persistence: The property that an identifier always refers to a specific object. • All information associated with a persistent identifier is about the same object. • The properties of the object are subject to change, but once a persistent identifier is assigned to one object, it cannot be reused to refer to a different object. • Example • ITIS TSNs are integers that are persistent identifiers for taxa
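The persistence rule (an identifier, once assigned, is never reused for a different object) can be sketched as a small in-memory registry. The class name, the `example:taxon:` identifiers and the object keys are hypothetical; a real provider would back this with durable storage.

```python
class PersistentRegistry:
    """Toy registry that refuses to re-point an identifier at another object."""

    def __init__(self):
        self._objects = {}  # identifier -> key of the object it refers to

    def assign(self, identifier, object_key):
        existing = self._objects.get(identifier)
        if existing is not None and existing != object_key:
            # Persistence: the identifier already refers to another object.
            raise ValueError(
                f"{identifier!r} already identifies {existing!r}; it cannot be reused"
            )
        self._objects[identifier] = object_key

    def lookup(self, identifier):
        return self._objects[identifier]


registry = PersistentRegistry()
registry.assign("example:taxon:1", "taxon-A")
registry.assign("example:taxon:1", "taxon-A")  # repeating the same assignment is harmless
```

The properties of `taxon-A` may still change over time; the registry only guarantees that `example:taxon:1` keeps pointing at it.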
Actionable Identifiers • An identifier is actionable if there is a service that, given the identifier, provides information about the object identified • E.g., a resolution service maps an identifier into a Web service that provides information about the identifier and its associated object • Example • An HTTP URI is actionable. • The HTTP system provides mechanisms for clients to access information about a data object from its associated identifier. • ITIS TSNs are actionable because ITIS supports services that provide information for TSNs.
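As a rough sketch of what "actionable" means in practice, the resolver below dispatches an identifier to whichever registered service claims its prefix. The `Resolver` class, the `example-tsn:` prefix and the returned metadata are all invented for illustration and do not describe the ITIS services or any GBIF API.

```python
class Resolver:
    """Toy resolution service: routes an identifier to a metadata service."""

    def __init__(self):
        self._services = {}  # identifier prefix -> callable returning metadata

    def register(self, prefix, service):
        self._services[prefix] = service

    def resolve(self, identifier):
        """Return metadata for the identifier, or None if no service handles it."""
        for prefix, service in self._services.items():
            if identifier.startswith(prefix):
                return service(identifier)
        return None  # the identifier is not actionable through this resolver


def example_taxon_service(identifier):
    # Hypothetical service; a real one would query a database or web service.
    return {"identifier": identifier, "kind": "taxon"}


resolver = Resolver()
resolver.register("example-tsn:", example_taxon_service)
metadata = resolver.resolve("example-tsn:42")
unresolved = resolver.resolve("other:42")
```

An identifier is only as actionable as the services behind it: `other:42` may be perfectly persistent, but nothing here can act on it.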
Good Identifier Technologies • HTTP URI: A fundamental technology of WWW • Persistence assured using DNS • Actionable through HTTP protocol • LSID: Life Science Identifiers • Persistence assured by convention • Actionable according to the LSID services model • May be mapped into HTTP URI by resolution services • Recommendation: Both are important to biodiversity and should be supported by GBIF • UUID • Persistence assured by random assignment • Not independently actionable • Can be an effective part of HTTP URI and LSID technologies
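The relationship between the three technologies can be sketched as follows: a UUID supplies the object part of an LSID, and a resolution proxy turns the LSID into an HTTP URI. The LSID layout used is `urn:lsid:authority:namespace:object[:revision]`; the proxy address `https://resolver.example.org/` and the authority `example.org` are placeholders, not real services.

```python
import uuid
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text):
    """Split an LSID string into its parts, raising ValueError if malformed."""
    parts = text.split(":")
    well_formed = (
        len(parts) in (5, 6)
        and parts[0].lower() == "urn"
        and parts[1].lower() == "lsid"
        and all(parts[2:])
    )
    if not well_formed:
        raise ValueError(f"not an LSID: {text!r}")
    return LSID(*parts[2:])


def mint_lsid(authority, namespace):
    """Build a new LSID whose object part is a random (version 4) UUID."""
    return f"urn:lsid:{authority}:{namespace}:{uuid.uuid4()}"


def lsid_to_http_uri(text, proxy="https://resolver.example.org/"):
    """Map an LSID to an HTTP URI by prefixing a (hypothetical) proxy address."""
    parse_lsid(text)  # validate before building the URI
    return proxy + text


new_lsid = mint_lsid("example.org", "specimens")
http_uri = lsid_to_http_uri(new_lsid)
```

The UUID on its own is not actionable; it becomes so only once it is embedded in an LSID or HTTP URI that some service will resolve.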
Example Benefits of IDs • Tracking citation and impact • The association among objects might be contained in a blog post: • Joe writes “I searched the GBIF repository for all frogs from Cuba. The objects that I found useful are in the collection [ID1]. I plotted the locations of the records [ID2] and reported the results in my paper [ID3].” • Such an association provides feedback and is used by search engines in rankings and ratings • Management and disambiguation of taxon names • Disambiguation of taxon names requires services that support tests of difference as well as of equality. • Different identifiers do not necessarily refer to different objects. • Tests of inequality for objects must rely on evaluation of metadata or of the objects themselves.
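The asymmetry between tests of equality and tests of difference can be sketched as a three-valued comparison. Equal identifiers settle the question; unequal identifiers settle nothing, so the function falls back on metadata. The metadata fields and the rule "a conflicting field means different objects" are simplifying assumptions, not a method proposed by the task group.

```python
def compare(id_a, id_b, metadata, keys=("scientific_name", "authorship")):
    """Return 'same', 'different' or 'undetermined' for two identifiers.

    metadata maps identifier -> dict of descriptive fields (hypothetical layout).
    """
    if id_a == id_b:
        return "same"  # one identifier always refers to one object
    a, b = metadata.get(id_a), metadata.get(id_b)
    if a is None or b is None:
        return "undetermined"  # nothing to evaluate
    if any(k in a and k in b and a[k] != b[k] for k in keys):
        return "different"  # heuristic: conflicting metadata
    return "undetermined"  # different identifiers may still name one object


# Hypothetical metadata for three hypothetical identifiers.
metadata = {
    "example:name:1": {"scientific_name": "Aus bus", "authorship": "Smith"},
    "example:name:2": {"scientific_name": "Aus bus", "authorship": "Smith"},
    "example:name:3": {"scientific_name": "Aus cus", "authorship": "Jones"},
}
```

With this data, `example:name:1` and `example:name:2` stay undetermined: their metadata agree, but agreement alone does not prove that they identify the same object.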
Opportunity • Integrating identifiers with the Semantic Web and the Linked Data model • Linked Data (http://linkeddata.org) is a vision of a web of interconnected data, to be consumed by machines • HTTP URIs are used as identifiers, and the data is described using RDF • If we use HTTP URIs for identifiers, we will be part of Linked Data
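To make the Linked Data idea concrete, the sketch below writes RDF statements in the N-Triples syntax using plain string formatting. The subject URIs under `http://example.org/` are placeholders; the predicates are Dublin Core terms. It assumes the URIs need no escaping and handles only simple string literals; a real deployment would more likely use an RDF library.

```python
def ntriple(subject, predicate, obj, literal=False):
    """Format one RDF statement as an N-Triples line.

    subject and predicate are URIs; obj is a URI unless literal=True,
    in which case it is written as a plain string literal.
    """
    if literal:
        escaped = (
            obj.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        )
        obj_term = f'"{escaped}"'
    else:
        obj_term = f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."


# Hypothetical specimen record described with Dublin Core terms.
specimen = "http://example.org/specimen/1"
triples = [
    ntriple(specimen, "http://purl.org/dc/terms/title", "Frog from Cuba", literal=True),
    ntriple(specimen, "http://purl.org/dc/terms/isPartOf", "http://example.org/collection/1"),
]
```

Because both the subject and the linked collection are HTTP URIs, a machine that reads these lines can follow either one to fetch more data.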
Recommendations: GBIF Should • Take the leadership role in driving the application and use of identifiers in biodiversity informatics, • Provide materials such as an executive summary targeted to administrative leadership explaining the costs and benefits of implementing persistent identifiers, • Educate the community in general persistent identifier principles and practices, • Encourage, support and advise on the use of appropriate identifier technologies, in particular LSIDs and HTTP URIs, but not impose a requirement for one at the expense of the other, and provide specific advice for the issuing and use of LSIDs and of HTTP URIs, • Support a promotional programme, • Demonstrate good practice in its data portal, • Assist providers that are not currently maintaining their own persistent identifiers to do so: this includes both education and technology, • Make data more inter-connected, • Start a programme to become an RDF consumer and encourage data providers to deploy RDF services, • Provide services to support identifier resolution, redirection, metadata hosting, and caching, • Provide additional services, including persistent identifier monitoring services, • Extend the role of its data portal by hosting resources related to the use of identifiers, such as the TDWG vocabularies, • Assist with the availability of software for data and service providers, and • Continue to be funded to provide support to data providers for the foreseeable future.
Response of the GBIF Science Committee • The SC reviewed and endorsed the report of the LSID GUID TG (LGTG). • The SC recommends that • An additional full case study be developed in the document to highlight the new quality control mechanisms that can be established to let users report and receive feedback on the quality of data being served. • Additionally, • the LGTG report makes excellent “obligatory reading material” for the Biodiversity Informatics community in general and for GBIF Participants in particular. • The SC strongly recommends that all participants read it and be aware of the impact that the implementation of tools such as IPT and GBRDS will have in their local contexts as well as globally
How to contact GBIF: • Web site: www.gbif.org • Data portal: data.gbif.org • GBIF Secretariat • Universitetsparken 15, 2100 Copenhagen, Denmark • E-mail: info@gbif.org • Phone: +45 3532 1470 • Fax: +45 3532 1480