Integrating Live Plant Images with Other Types of Biodiversity Records
Steve Baskauf, Vanderbilt Dept. of Biological Sciences
http://bioimages.vanderbilt.edu/
August 3, 2010
I. Challenges in Biodiversity Informatics
• Common interest in databasing metadata.
• Metadata describe resources and their properties.
• Resource: anything that can be assigned an identifier (e.g. a tree, a specimen, an image, a taxon, a name).
• Property: a string literal that describes the resource, or a relationship between the subject resource and some other resource.
Example: Vanderbilt Arboretum (5935 identified and geolocated trees)
Example
[Diagram: the subject resource (a tree) has two properties. establishmentMeans is a text string literal property with the value “native”. depiction is a relationship property whose object resource is an image.]
Relationship “graph”
[Diagram: the tree (7-314) has establishmentMeans “native” and a depiction, image (79657). Shown alongside: a traditional database (typical for specimens).]
Non-“flat” relationships in live-plant imaging
[Diagram nodes: taxon, determination, live tree, and the standardized views of the tree: whole tree image, leaf image, bark image.]
Standardized views: Baskauf and Kirchoff (2008) Vulpina 7:16-30
Duplicate herbarium specimens
[Diagram nodes: taxon A, determination A, and a herbarium specimen at institution A; taxon B, determination B, and a duplicate herbarium specimen at institution B with a specimen image. The two live trees shown are the same individual.]
Complex relationships: individual-based organization system
[Diagram nodes: taxon A, taxon B, determination A, determination B, live tree (individual organism), herbarium specimen, whole tree image, bark image, leaf image, specimen image.]
Baskauf (2010) Biodiversity Informatics 7:17-44
II. Building blocks of a Web-based metadata system
• We need to be able to unambiguously identify the resources (globally unique identifiers = GUIDs).
• We need standardized property definitions (e.g. Darwin Core terms).
• We need a technological solution for communicating properties and relationships to a user anywhere (RDF/XML representation sent to the user via the Internet).
Design principles: http://bioimages.vanderbilt.edu/guid
Building block #1: GUIDs
A globally unique identifier (GUID) should be:
• globally unique
• actionable
• persistent
Anyone on the planet should be able to use the GUID to find out about the particular thing that it identifies, forever. That is a pretty tall order (but you can do it)!
1. How do you make an identifier globally unique?
• Create a locally unique identifier:
  • identifier (catalog number) unique within a collection, e.g. GIS tree ID number: 7-314
  • namespace (collection code) unique within the institution, e.g. vanderbilt
  • combined: vanderbilt/7-314
• Make it globally unique by prepending a domain name that you control, e.g. bioimages.vanderbilt.edu
Complete HTTP URI GUID
• Combine “http://” with the other pieces: http://bioimages.vanderbilt.edu/vanderbilt/7-314
• This identifier looks like a URL! An HTTP URI is a uniform resource identifier as well as a resource locator (web address = URL).
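The assembly steps on the last two slides can be sketched in a few lines of Python. This is only an illustration: the function name `make_http_uri_guid` is hypothetical and not part of the Bioimages system; the domain, namespace, and tree ID are the ones used on the slides.

```python
def make_http_uri_guid(domain: str, namespace: str, local_id: str) -> str:
    """Combine a domain name, a collection namespace, and a local ID into an HTTP URI.

    Hypothetical helper for illustration only.
    """
    # Remove stray slashes so the pieces join with exactly one "/" between them.
    parts = [domain.strip("/"), namespace.strip("/"), local_id.strip("/")]
    if not all(parts):
        raise ValueError("domain, namespace, and local_id must all be non-empty")
    return "http://" + "/".join(parts)


# Values taken from the slides: domain, collection code, GIS tree ID number.
guid = make_http_uri_guid("bioimages.vanderbilt.edu", "vanderbilt", "7-314")
print(guid)
```

The local ID only has to be unique within the namespace, and the namespace only within the domain; global uniqueness comes from the domain name being under one party's control.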
2. What does actionable mean?
• Something happens when you put an actionable GUID in a Web browser (the GUID is “resolved”).
• HTTP URIs:
  • unlike LSIDs and DOIs, work in any web browser
  • are resolved using existing Internet infrastructure
  • are the consensus GUID of the Linked Data (Semantic Web) community
• Example: http://bioimages.vanderbilt.edu/vanderbilt/7-314
3. Persistent URIs always work
• URIs “break” when filenames change:
  • JavaScript-based URI: http://bioimages.vanderbilt.edu/metadata.htm?baskauf/66921/metadata/img/3456/2304
  • Independent of method: http://bioimages.vanderbilt.edu/baskauf/66921.htm
  • Both URIs eventually lead to the same page, but the second URI is simpler and won’t change.
• URIs “break” when domain names disappear: bioblitznashville.org vs. vanderbilt.edu
• Planning for URI permanence is important.
How long is “persistent”?
• Forever is a pretty long time.
• The Internet is only 40 years old and the Web only 20.
• Plan for your institution and domain name to last at least 10 years.
• Don’t change the URI of anything that you are trying to identify!
Building block #2: Standardized property definitions
Recent consensus on metadata terms:
• Dublin Core Metadata Initiative (DCMI): describes generic resources
• Friend-Of-A-Friend (FOAF): describes people and their affiliations
• Darwin Core (DwC): describes biodiversity resources
• Media Resources Task Group (MRTG): describes media (e.g. images) in a biodiversity context
A property described by a metadata term:
• is an HTTP URI, e.g. http://rs.tdwg.org/dwc/terms/establishmentMeans
• has a definition that can be accessed via the Internet
• has an abbreviated form that usually makes sense to humans: dwc: = http://rs.tdwg.org/dwc/terms/, so the abbreviated URI for the term is dwc:establishmentMeans
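The abbreviation is a plain string substitution, which a short Python sketch can make concrete. The `dwc:` mapping is the one given on the slide; the `foaf:` entry uses the standard FOAF namespace, which the slides do not spell out. The helper names `expand` and `abbreviate` are hypothetical.

```python
# Prefix table. dwc: comes from the slide; foaf: is the standard FOAF
# namespace URI, added here as an assumption because the slide omits it.
PREFIXES = {
    "dwc": "http://rs.tdwg.org/dwc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(abbreviated: str, prefixes: dict = PREFIXES) -> str:
    """Turn an abbreviated term such as 'dwc:establishmentMeans' into a full URI."""
    prefix, sep, local_name = abbreviated.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown or missing prefix in {abbreviated!r}")
    return prefixes[prefix] + local_name


def abbreviate(uri: str, prefixes: dict = PREFIXES) -> str:
    """Turn a full URI back into its abbreviated form, if a prefix matches."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri  # no known prefix: leave the URI unchanged


full = expand("dwc:establishmentMeans")
print(full)
print(abbreviate(full))
```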
Building block #3: Communicating relationships
Resource Description Framework (RDF) graph:
• subject resource (tree): http://bioimages.vanderbilt.edu/vanderbilt/7-314
• dwc:establishmentMeans: “native”
• foaf:depiction: object resource (image) http://bioimages.vanderbilt.edu/baskauf/79657
How do you translate relationships into language a computer can understand?
Resource Description Framework (RDF) graph: http://bioimages.vanderbilt.edu/vanderbilt/7-314 has dwc:establishmentMeans “native” and foaf:depiction http://bioimages.vanderbilt.edu/baskauf/79657
RDF in XML format (a tiny snippet):
<rdf:Description rdf:about="http://bioimages.vanderbilt.edu/vanderbilt/7-314">
  <dwc:establishmentMeans>native</dwc:establishmentMeans>
  <foaf:depiction rdf:resource="http://bioimages.vanderbilt.edu/baskauf/79657"/>
</rdf:Description>
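To show that a program can recover the graph from the XML, here is a minimal Python sketch that reads the snippet with the standard library and lists its subject, property, and value triples. The slide's snippet omits the enclosing `rdf:RDF` element and the namespace declarations, so they are added here as an assumption. The function `extract_triples` is hypothetical and handles only this simple `rdf:Description` pattern, not RDF/XML in general.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The slide's snippet, wrapped in rdf:RDF with namespace declarations
# (the wrapper is an assumption; the slide shows only rdf:Description).
RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://bioimages.vanderbilt.edu/vanderbilt/7-314">
    <dwc:establishmentMeans>native</dwc:establishmentMeans>
    <foaf:depiction rdf:resource="http://bioimages.vanderbilt.edu/baskauf/79657"/>
  </rdf:Description>
</rdf:RDF>"""


def extract_triples(rdf_xml: str) -> list:
    """Return (subject, predicate, object) tuples from simple rdf:Description blocks."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree tags look like "{namespace}localName"; removing the
            # braces yields the full property URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            # rdf:resource means the value is another resource (a relationship);
            # otherwise the element text is a string literal.
            resource = prop.get(f"{{{RDF_NS}}}resource")
            value = resource if resource is not None else (prop.text or "").strip()
            triples.append((subject, predicate, value))
    return triples


triples = extract_triples(RDF_XML)
for triple in triples:
    print(triple)
```

For real data a full RDF library would be the safer choice, since RDF/XML allows many equivalent spellings that this sketch ignores.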
III. Why use a new way to describe metadata?
• People are good at figuring out what web pages mean.
• Computers (like a GoogleBot) have to guess what the information on a web page means.
• The Semantic Web (a.k.a. Web 3.0) provides a means to provide information to computers explicitly.
Content Negotiation, part 1
• Client: “I am a human. Send me http://bioimages.vanderbilt.edu/vanderbilt/7-314”
• Web server: “I cannot send this guy a tree!”
• Request: GET http://bioimages.vanderbilt.edu/vanderbilt/7-314 with MIME type text/html
• Response: a web page, http://bioimages.vanderbilt.edu/vanderbilt/7-314.htm
Content Negotiation, part 2
• Client: “I am a computer. Send me http://bioimages.vanderbilt.edu/vanderbilt/7-314”
• “10011000101!”
• Request: GET http://bioimages.vanderbilt.edu/vanderbilt/7-314 with MIME type application/rdf+xml
• Response: an XML file, http://bioimages.vanderbilt.edu/vanderbilt/7-314.rdf
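The server-side decision in the two slides above can be sketched as a pure Python function: given the resource URI and the client's Accept header, pick the MIME type and the document to send. The `.htm` and `.rdf` suffixes come from the slides. The function names, the fallback to HTML, and the simplified q-value handling are assumptions for illustration, not a description of how the Bioimages server is implemented.

```python
# MIME type -> file suffix, as shown on the two slides.
REPRESENTATIONS = {
    "text/html": ".htm",
    "application/rdf+xml": ".rdf",
}


def parse_accept(accept_header: str) -> list:
    """Return the MIME types in an Accept header, most preferred first (simplified)."""
    entries = []
    for position, item in enumerate(accept_header.split(",")):
        fields = [field.strip() for field in item.split(";")]
        mime = fields[0].lower()
        if not mime:
            continue
        quality = 1.0  # default preference when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by original position.
            entries.append((-quality, position, mime))
    return [mime for _, _, mime in sorted(entries)]


def negotiate(resource_uri: str, accept_header: str, default: str = "text/html"):
    """Return (mime_type, document_uri) for the best supported representation."""
    for mime in parse_accept(accept_header):
        if mime in REPRESENTATIONS:
            return mime, resource_uri + REPRESENTATIONS[mime]
    # Assumed fallback: serve the human-readable page.
    return default, resource_uri + REPRESENTATIONS[default]


TREE = "http://bioimages.vanderbilt.edu/vanderbilt/7-314"
print(negotiate(TREE, "text/html,application/xhtml+xml;q=0.9"))  # a browser
print(negotiate(TREE, "application/rdf+xml"))                    # a semantic web client
```

The same GUID serves both audiences; only the Accept header differs between the two requests.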
What’s so great about this?
• A computer can crawl the Web and discover metadata about resources that are identified by HTTP URI GUIDs.
• RDF metadata from many sources can be assembled into a database (RDF “triple store”).
• The database can be searched or used to generate web content.
• Source data does not need to be “sent” to the database; any “semantic web client” can retrieve it at will.
• The format is standard; no special communication protocols are required.
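The idea of a triple store can be sketched as a toy in-memory version in Python: triples from separate sources are merged into one set and then searched by pattern. The class `TripleStore` and its methods are hypothetical; real triple stores are dedicated databases, usually queried with SPARQL. The example data are the two statements about tree 7-314 from the earlier slides, treated here as if they came from two sources.

```python
class TripleStore:
    """Toy in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()

    def add_all(self, triples):
        """Merge triples from one source; duplicates collapse automatically."""
        self._triples.update(triples)

    def match(self, subject=None, predicate=None, obj=None):
        """Return sorted triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


TREE = "http://bioimages.vanderbilt.edu/vanderbilt/7-314"
IMAGE = "http://bioimages.vanderbilt.edu/baskauf/79657"
ESTABLISHMENT = "http://rs.tdwg.org/dwc/terms/establishmentMeans"
DEPICTION = "http://xmlns.com/foaf/0.1/depiction"

# Two stand-in "sources", each contributing one statement about the tree.
source_a = [(TREE, ESTABLISHMENT, "native")]
source_b = [(TREE, DEPICTION, IMAGE)]

store = TripleStore()
store.add_all(source_a)
store.add_all(source_b)

# Everything known about the tree, regardless of which source said it.
for triple in store.match(subject=TREE):
    print(triple)

# Which resources are recorded as native?
print([s for s, _, _ in store.match(predicate=ESTABLISHMENT, obj="native")])
```

Because every source names the tree with the same HTTP URI, merging needs no record matching: identical subjects simply end up side by side.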
Why would this benefit me now?
• RDF/XML metadata files for numerous resources can be transformed directly into web pages using a single program file (a single web page using XSLT and/or AJAX).
Benefits (cont.)
• Branding in the URI: http://bioimages.vanderbilt.edu/vanderbilt/7-314
Benefits (cont.)
• HTTP URI GUIDs provide direct access to metadata about a resource to anyone with Internet access:
  • clickable attribution link in a website
  • reference link in a publication PDF
  • physical QR codes for smartphone access