IDs in and out of the database
Entomological Collections Network (ECN) 2012, November 10–11, Knoxville, TN
Debbie Paul, Greg Riccardi
Overview
• What good is identification?
• How are identifiers used by consumers?
• Providing IDs
• Resolving IDs in a server
• Strategies for storing IDs in databases
• Linked Data
• Annotations of all sorts
• Feedback
What good is identification?
• Aggregation
  • If you get information from two sources about the same object, you can combine it
• Resolution (finding information about an object)
  • Types of resolution
    • Determine where to get information
    • Determine how to get information
• Providing information
  • How to create IDs
  • How to publish IDs
  • How to fetch database information for IDs
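The aggregation point can be sketched in Python. This is a minimal illustration, not a description of any real aggregator: `aggregate` and the two source names are hypothetical, the field values are made up, and it assumes both sources already use the identical identifier string for the same object. Values are kept per source so that attribution to the provider is not lost.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (identifier, {field: value}) as delivered by one source
Record = Tuple[str, Dict[str, str]]


def aggregate(sources: Dict[str, Iterable[Record]]) -> Dict[str, Dict[str, Dict[str, str]]]:
    """Combine records that share an identifier.

    Result shape: identifier -> field -> {source name: value}.
    Only exact string matches on the identifier are combined.
    """
    combined: Dict[str, Dict[str, Dict[str, str]]] = defaultdict(dict)
    for source_name, records in sources.items():
        for identifier, fields in records:
            for field, value in fields.items():
                combined[identifier].setdefault(field, {})[source_name] = value
    return dict(combined)


# Hypothetical example: a collection database and an image database
# describe the same specimen under the same HTTP URI.
uri = "http://ids.flmnh.ufl.edu/herb/bbbrc000123"
merged = aggregate({
    "collection_db": [(uri, {"catalogNumber": "bbbrc000123"})],
    "image_db": [(uri, {"imageCount": "2"})],
})
print(merged[uri])
```

If the two sources had used different strings for the specimen, the records would stay apart; aggregation depends entirely on agreeing on the identifier.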
HTTP URIs
• Biggest problem
  • Identification and two types of resolution are commingled
• Resolution
  • Where to get information
    • Look somewhere
  • How to get information
    • Fetch information using some protocol
DOI example
• The DOI is
  • 10.3897/zookeys.209.3135
• A URI (for aggregating) is
  • doi:10.3897/zookeys.209.3135
• A URL for information retrieval (proxy resolution) is
  • http://dx.doi.org/10.3897/zookeys.209.3135
• Information is fetched from
  • HTML: http://www.pensoft.net/journals/zookeys/article/3135/abstract/five-task-clusters-that-enable-efficient-and-effective-digitization-of-biological-collections
  • RDF: http://data.crossref.org/10.3897/zookeys.209.3135
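The split between identification and the two kinds of resolution can be made concrete with the DOI above. A minimal sketch: the helper names are hypothetical, the proxy prefix is the one on the slide, and asking the proxy for RDF through an HTTP `Accept` header (content negotiation) is an assumption about the resolver's behaviour. The code only builds the request; it does not send it.

```python
from urllib.parse import quote
from urllib.request import Request

PROXY = "http://dx.doi.org/"  # proxy resolver named on the slide
DOI = "10.3897/zookeys.209.3135"


def doi_uri(doi: str) -> str:
    """URI form: used only for comparison and aggregation."""
    return "doi:" + doi


def doi_proxy_url(doi: str, proxy: str = PROXY) -> str:
    """URL form: answers 'where to get information' (the proxy)."""
    return proxy + quote(doi, safe="/")


def rdf_request(doi: str) -> Request:
    """Answers 'how to get information': ask for RDF instead of HTML.

    Assumes the resolver honours content negotiation; nothing is sent here.
    """
    return Request(doi_proxy_url(doi), headers={"Accept": "application/rdf+xml"})


print(doi_uri(DOI))
print(doi_proxy_url(DOI))
print(rdf_request(DOI).get_header("Accept"))
```

Only `doi_uri` is needed to decide whether two records are about the same article; the proxy URL and the `Accept` header are separate, replaceable choices about retrieval.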
What’s in an ID?
• For the consumer:
  • NOTHING! No information
  • Might as well be a UUID
  • Can’t type it, remember it, parse it, or resolve it
  • Useful for comparison and aggregation
    • Equal strings (persistence)
    • Different strings about the same object
  • Fetching information
    • Send the ID somewhere for info
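For a consumer, the only safe operation on an opaque ID is string comparison; recognising that different strings refer to the same object needs an explicit assertion from someone who knows. The sketch below is a hypothetical `SameAs` helper built on a standard union-find structure, fed with two identifiers of the FLMNH specimen listed later in this talk. It is an illustration, not part of any named system.

```python
from typing import Dict


class SameAs:
    """Union-find over identifier strings.

    Identifiers are treated as opaque: two IDs denote the same object only
    if the strings are equal or a chain of same-as assertions links them.
    """

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}

    def _find(self, ident: str) -> str:
        self._parent.setdefault(ident, ident)
        root = ident
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the walk straight at the root.
        while ident != root:
            nxt = self._parent[ident]
            self._parent[ident] = root
            ident = nxt
        return root

    def assert_same(self, a: str, b: str) -> None:
        """Record that identifiers a and b refer to the same object."""
        self._parent[self._find(a)] = self._find(b)

    def same(self, a: str, b: str) -> bool:
        return a == b or self._find(a) == self._find(b)


links = SameAs()
links.assert_same("urn:uuid:424854d7-baec-42cf-a142-805b64117b9f",
                  "http://ids.flmnh.ufl.edu/herb/bbbrc000123")
print(links.same("http://ids.flmnh.ufl.edu/herb/bbbrc000123",
                 "urn:uuid:424854d7-baec-42cf-a142-805b64117b9f"))  # True
print(links.same("bbbrc000123", "http://ids.flmnh.ufl.edu/herb/bbbrc000123"))  # False
```

Note that the consumer never looks inside the strings: the bare catalog number is not linked to the HTTP URI until someone asserts it.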
What’s in an ID?
• For the provider/resolver:
  • Use the ID to find the local storage of information
  • E.g.
    • Parse out the DwC triple
    • Extract the database table and primary key
    • Look up the ID in a table of IDs
    • Look up the ID in a URI field of a database table
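The provider-side strategies listed above can be combined in one lookup function. This is a hypothetical sketch: the in-memory `TABLES` and `ID_TABLE` dictionaries stand in for real database tables, `INSTITUTION` is an assumed institution code, the second record is invented to exercise the URI-field lookup, and mapping the collection code or the last two URI path segments onto a table name is an assumption about how a provider might lay out its URIs.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlparse

INSTITUTION = "flmnh"  # hypothetical: the provider's own institution code

# Hypothetical local storage: table name -> primary key -> record
TABLES: Dict[str, Dict[str, dict]] = {
    "herb": {
        "bbbrc000123": {"catalog_number": "bbbrc000123",
                        "uri": "http://ids.flmnh.ufl.edu/herb/bbbrc000123"},
        # hypothetical record whose URI does not follow the table/key layout
        "x0001": {"catalog_number": "x0001",
                  "uri": "http://example.org/legacy/42"},
    },
}

# Hypothetical table of IDs: any known identifier -> (table, primary key)
ID_TABLE: Dict[str, Tuple[str, str]] = {
    "urn:uuid:424854d7-baec-42cf-a142-805b64117b9f": ("herb", "bbbrc000123"),
}


def resolve(ident: str) -> Optional[dict]:
    # 1. Look the ID up in a table of IDs.
    if ident in ID_TABLE:
        table, key = ID_TABLE[ident]
        return TABLES.get(table, {}).get(key)
    # 2. Parse a DwC triple: institution:collection:catalog number.
    parts = ident.split(":")
    if len(parts) == 3 and parts[0] == INSTITUTION:
        return TABLES.get(parts[1], {}).get(parts[2])
    # 3. Extract table and primary key from the path of an HTTP URI.
    parsed = urlparse(ident)
    if parsed.scheme in ("http", "https"):
        segments = parsed.path.strip("/").split("/")
        if len(segments) >= 2:
            record = TABLES.get(segments[-2], {}).get(segments[-1])
            if record is not None:
                return record
    # 4. Fall back to scanning a URI field stored with each record.
    for records in TABLES.values():
        for record in records.values():
            if record.get("uri") == ident:
                return record
    return None


print(resolve("flmnh:herb:bbbrc000123"))
```

The order is a design choice: an explicit table of IDs is checked first because it also covers opaque IDs (such as UUIDs) that carry no table or key to parse.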
What’s in an ID for the provider?
• Record ID: 112234
• UUID: 954c8760-e1a6-4b4b-ab82-6bf7311c25f3
• LSID: urn:lsid:example.org:specimen:22545
• An HTTP URI
• EZID: http://n2t.net/ark:/99999/fk42b9hdf
• DOI: doi:10.1038/ng0609-637
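A provider receiving identifiers like these usually needs to tell the schemes apart before deciding how to look them up. A rough sketch with a hypothetical function name, `id_kind`; the prefix and pattern checks are simplified heuristics, not full validators for the UUID, LSID, ARK or DOI syntaxes.

```python
import re

UUID_RE = re.compile(
    r"^(urn:uuid:)?[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)
BARE_DOI_RE = re.compile(r"^10\.\d{4,9}/")


def id_kind(ident: str) -> str:
    """Guess the identifier scheme from its surface form (heuristic only)."""
    s = ident.strip()
    low = s.lower()
    if s.isdigit():
        return "record id"  # local: only meaningful inside one database
    if UUID_RE.match(s):
        return "uuid"
    if low.startswith("urn:lsid:"):
        return "lsid"
    if low.startswith("doi:") or BARE_DOI_RE.match(s):
        return "doi"
    # Check ARKs before plain HTTP, since EZID ARKs are often written as HTTP URLs.
    if low.startswith("ark:/") or "/ark:/" in low:
        return "ark"
    if low.startswith(("http://", "https://")):
        return "http uri"
    return "unknown"


for example in ["112234",
                "954c8760-e1a6-4b4b-ab82-6bf7311c25f3",
                "urn:lsid:example.org:specimen:22545",
                "http://ids.usms.edu/herb/0014097",
                "http://n2t.net/ark:/99999/fk42b9hdf",
                "doi:10.1038/ng0609-637"]:
    print(example, "->", id_kind(example))
```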
What about specimen identifiers?
• Identifier on the specimen?
  • Readable text
  • Encoded data
  • A barcode is a contextual identifier
• Identifier in the database?
  • http://ids.usms.edu/herb/0014097
  • http://ids.usms.edu/herb/0303134303937
How do providers identify?
• Look at online databases and at your own database, and find the identifiers of the various objects
• Some identifiers are local (e.g. a primary key)
• Some identifiers are globally unique
• Some identifiers are URIs
Identification in the field
• Wireless or workbench
• Data collected and uploaded
Storing IDs in databases
• Your contextual IDs? Your GUIDs?
• What to use for IDs?
  • Record ID
  • UUID
  • LSID
  • URI
• What’s in your database?
• Morphbank example
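One common pattern is to keep the local record ID as the primary key and mint a UUID alongside it, from which URN and HTTP-URI forms can be derived. A minimal sketch using Python's standard `uuid` module; `mint_ids` and the `BASE` URL are hypothetical, and whether an HTTP URI should embed the UUID or a catalog number (as in the examples in this talk) is a local design choice.

```python
import uuid

BASE = "http://ids.example.org/specimen/"  # hypothetical HTTP-URI base


def mint_ids(record_id: int) -> dict:
    """Return the local record ID plus freshly minted global identifiers."""
    u = uuid.uuid4()  # random UUID: unique with overwhelming probability, no coordination needed
    return {
        "record_id": record_id,     # local: unique only within this database
        "uuid": str(u),
        "uuid_urn": u.urn,          # 'urn:uuid:' followed by the UUID string
        "http_uri": BASE + str(u),  # resolvable only if a server answers at BASE
    }


print(mint_ids(19537))
```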
IDs in Morphbank
• Morphbank example
  • http://www.morphbank.net/818505
IDs in Morphbank
• Morphbank example
  • http://www.morphbank.net/643261
Sharing data with IDs
• Into a publication
• Uploaded to the web
• Shared with a database integrator/aggregator
  • GBIF
  • iDigBio
  • VertNet
  • Morphbank
• What exactly goes into the publication?
  • An ID? A GUID? A link to more information?
  • What will be cited? Searched for?
Feedback with IDs
• Annotations
  • Target of the annotation
    • http://www.morphbank.net/818505
  • Filtered PUSH
• Linked data and the Semantic Web
  • (benefits in a minute)
• Updating the database
  • Be(a)ware
  • Remember previous IDs
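Remembering previous IDs lets annotations that target an old identifier still reach the current record after the database is updated. A hypothetical sketch: `replaced_by` is an assumed table mapping each retired ID to its successor, seeded here with the previous catalog number of the FLMNH specimen listed later in this talk, and the annotation text is invented. Chains of replacements are followed and cycles are rejected.

```python
from typing import Dict, List, Tuple


def current_id(ident: str, replaced_by: Dict[str, str]) -> str:
    """Follow retired-ID -> successor links until reaching a live ID."""
    seen = set()
    while ident in replaced_by:
        if ident in seen:
            raise ValueError(f"cycle in ID history at {ident!r}")
        seen.add(ident)
        ident = replaced_by[ident]
    return ident


def retarget(annotations: List[Tuple[str, str]],
             replaced_by: Dict[str, str]) -> List[Tuple[str, str]]:
    """Point each (target ID, note) annotation at the current ID."""
    return [(current_id(target, replaced_by), note) for target, note in annotations]


history = {"212345": "bbbrc000123"}  # previous catalog number -> current one
print(retarget([("212345", "determination questioned")], history))
```

Keeping the old IDs in a lookup table, instead of overwriting them, is what makes this possible; once an old ID is deleted, feedback addressed to it is lost.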
What’s coming up next?
• Expect GUIDs for all sorts of objects
  • Collection objects (example: specimen)
  • Georeferences
  • Taxon concepts
  • Determinations
  • People
GUIDs are key
• One to many IDs are known for a given object
• Store and share the ones you know about
  • Specimen record ID: 19537
  • Specimen previous catalog number: 212345
  • Specimen catalog number / bar code: bbbrc000123
  • Darwin Core triplet (DwC): flmnh:herb:bbbrc000123
  • DwC occurrence URI: urn:catalog:flmnh:herb:bbbrc000123
  • Specimen GUID of type LSID: urn:lsid:biocol.org:flmnh:bbbrc000123
  • Specimen opaque identifier (UUID): 424854d7-baec-42cf-a142-805b64117b9f
  • URI for the UUID: urn:uuid:424854d7-baec-42cf-a142-805b64117b9f
  • Specimen GUID of type HTTP URI: http://ids.flmnh.ufl.edu/herb/bbbrc000123
• *We cannot enforce a single identifier per object
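Since one object can carry many identifiers, one way to store them is a separate identifier table with one row per known ID. A minimal sketch in SQLite via Python's `sqlite3` module, loaded with the identifiers listed above; the table and column names are hypothetical, and the `UNIQUE` constraint only stops one ID string from being attached to two objects. It does not limit how many IDs an object has.

```python
import sqlite3

SCHEMA = """
CREATE TABLE specimen (record_id INTEGER PRIMARY KEY);
CREATE TABLE identifier (
    record_id INTEGER NOT NULL REFERENCES specimen(record_id),
    kind      TEXT NOT NULL,
    value     TEXT NOT NULL UNIQUE  -- one ID string names at most one object
);
"""

EXAMPLE_IDS = [
    ("previous catalog number", "212345"),
    ("catalog number", "bbbrc000123"),
    ("dwc triplet", "flmnh:herb:bbbrc000123"),
    ("dwc occurrence uri", "urn:catalog:flmnh:herb:bbbrc000123"),
    ("lsid", "urn:lsid:biocol.org:flmnh:bbbrc000123"),
    ("uuid", "424854d7-baec-42cf-a142-805b64117b9f"),
    ("uuid urn", "urn:uuid:424854d7-baec-42cf-a142-805b64117b9f"),
    ("http uri", "http://ids.flmnh.ufl.edu/herb/bbbrc000123"),
]


def example_db() -> sqlite3.Connection:
    """In-memory database holding the specimen and all its known IDs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO specimen (record_id) VALUES (?)", (19537,))
    conn.executemany(
        "INSERT INTO identifier (record_id, kind, value) VALUES (19537, ?, ?)",
        EXAMPLE_IDS,
    )
    return conn


def all_ids_for(conn: sqlite3.Connection, any_id: str) -> dict:
    """Given any one known ID, return every ID stored for the same object."""
    rows = conn.execute(
        """SELECT other.kind, other.value
           FROM identifier AS given
           JOIN identifier AS other ON other.record_id = given.record_id
           WHERE given.value = ?""",
        (any_id,),
    ).fetchall()
    return dict(rows)


conn = example_db()
print(all_ids_for(conn, "212345")["http uri"])
```

A lookup by any single known ID returns the whole set, which is what a provider needs both to resolve incoming IDs and to share every ID it knows about.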
Caring for GUIDs
• Store them
  • Database adjustments
  • Tweaking current standard practices
• Share them
  • Data standards
  • 3 ways to modify Darwin Core
• Reap the benefits
Caring for GUIDs: reap the benefits
• Data quality feedback
• Dialog based on annotation
• Tracking objects through analysis and use
• Maintaining attribution to the provider
• Finding related objects
• Finding a way to take advantage of the efforts of many smart, dedicated people
  • BHL, BiSciCol, Filtered PUSH, GNA, TNRS, SGR, …