Steven Perry, Dave Vieglais
Overview WASABI is a framework for building scientific data networks based on RDF, OWL, and open data access protocols.
Objective Build a data access network that… • Can handle many types of objects • Is resilient to changes in data models • Refers to objects with GUIDs • Allows fast & efficient searches • Allows incremental harvesting • Simplifies creation of client software
RDF and OWL RDF described by OWL allows… • Machine readable controlled vocabularies • Distinction between classes and properties • Data objects as resources identified with globally unique LSIDs • Query languages to examine patterns of relationships between objects
Framework Components • Server: provides access to RDF data sets through multiple protocols • Client Library: libraries for building client applications • Portal: web-based client for accessing data on a Wasabi network
Wasabi Server Server • Stores a cached copy of source data in RDF format called a data set • Each data set is bound to one or more protocol handlers • Standard protocols include OAI, SimpleLSID, and SPARQL
Loading Data Loading RDF Data • RDF data can be loaded from one or more files directly into Wasabi • Wasabi will not assign new LSIDs • Wasabi checks to see if any data objects are new or have changed and can scan for deleted data objects
Loading Data Loading Non-RDF Data • Wasabi uses a synchronizer program to generate RDF from SQL output or delimited files • Synch program must know about your source data format • Wasabi can assign LSIDs if needed • Wasabi checks to see if any data objects are new or have changed and can scan for deleted data objects
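The slides say that Wasabi detects new, changed, and deleted data objects when loading, but not how. The following is a minimal sketch of one common approach: comparing content hashes keyed by LSID. The function names and the hashing strategy are assumptions for illustration, not Wasabi's actual implementation.

```python
import hashlib


def fingerprint(rdf_text: str) -> str:
    """Return a stable hash of one data object's serialized RDF."""
    return hashlib.sha256(rdf_text.encode("utf-8")).hexdigest()


def diff_data_set(cached: dict, incoming: dict):
    """Compare cached fingerprints with freshly loaded objects.

    cached   maps LSID -> fingerprint stored at the previous load
    incoming maps LSID -> serialized RDF from the current load
    Returns (new, changed, deleted) as sorted lists of LSIDs.
    """
    new, changed = [], []
    for lsid, rdf_text in incoming.items():
        if lsid not in cached:
            new.append(lsid)
        elif cached[lsid] != fingerprint(rdf_text):
            changed.append(lsid)
    # Anything cached but absent from the current load counts as deleted.
    deleted = [lsid for lsid in cached if lsid not in incoming]
    return sorted(new), sorted(changed), sorted(deleted)


# Hypothetical data: object 1 is unchanged, 2 is edited, 3 vanished, 4 is new.
cached = {
    "urn:lsid:auth.org:spec:1": fingerprint("record 1, version A"),
    "urn:lsid:auth.org:spec:2": fingerprint("record 2, version A"),
    "urn:lsid:auth.org:spec:3": fingerprint("record 3, version A"),
}
incoming = {
    "urn:lsid:auth.org:spec:1": "record 1, version A",
    "urn:lsid:auth.org:spec:2": "record 2, version B",
    "urn:lsid:auth.org:spec:4": "record 4, version A",
}
new, changed, deleted = diff_data_set(cached, incoming)
```

Scanning for deletions needs the complete current load, which may be why the slides describe it as an optional scan rather than part of every check.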
OAI-PMH Open Archives Initiative Protocol for Metadata Harvesting • Wasabi implementation allows efficient harvesting • Supports incremental harvesting “What objects have changed since Oct-02-2006?” • Notifies clients about deletions
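An incremental harvest in OAI-PMH is a ListRecords request with a `from` date, and deleted records come back with `status="deleted"` on their header. The sketch below builds such a request and splits a response into live and deleted identifiers. The base URL, the `oai_dc` metadata prefix, and the sample response are assumptions; a Wasabi server would presumably advertise its own RDF prefix.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, since, metadata_prefix="oai_dc"):
    """Build a ListRecords request for records changed on or after `since`."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix, "from": since}
    return base_url + "?" + urlencode(params)


def split_headers(response_xml):
    """Return (live, deleted) identifier lists from a ListRecords response."""
    root = ET.fromstring(response_xml)
    live, deleted = [], []
    for header in root.iter(OAI_NS + "header"):
        identifier = header.findtext(OAI_NS + "identifier")
        if header.get("status") == "deleted":
            deleted.append(identifier)
        else:
            live.append(identifier)
    return live, deleted


# Hypothetical endpoint and hand-written response, for illustration only.
url = list_records_url("http://example.org/wasabi/oai", "2006-10-02")

SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>urn:lsid:auth.org:spec:657</identifier></header></record>
    <record><header status="deleted"><identifier>urn:lsid:auth.org:spec:12</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

live, deleted = split_headers(SAMPLE_RESPONSE)
```

A real harvester would also follow `resumptionToken` elements to page through large result sets; that is omitted here.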
LSID Resolution Life Science Identifier Metadata Resolution • Wasabi supports a simple HTTP-GET LSID metadata resolution service • Supports metadata resolution “What is the RDF metadata for urn:lsid:auth.org:ns:23?” • Compliant LSID resolution through plug-in for IBM LSID resolver.
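To make the metadata question above concrete, here is a minimal sketch that validates an LSID and builds an HTTP GET URL for it. The LSID layout (`urn:lsid:authority:namespace:object[:revision]`) follows the LSID specification; the resolver URL and the `lsid` query parameter name are assumptions, since the slides do not give the request format of Wasabi's simple resolution service.

```python
from typing import NamedTuple, Optional
from urllib.parse import parse_qs, urlencode, urlsplit


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> Lsid:
    """Split urn:lsid:authority:namespace:object[:revision] into its parts."""
    parts = text.split(":")
    well_formed = (
        len(parts) in (5, 6)
        and parts[0].lower() == "urn"
        and parts[1].lower() == "lsid"
        and all(parts[2:5])
    )
    if not well_formed:
        raise ValueError("not a valid LSID: " + repr(text))
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


def metadata_url(resolver_base: str, lsid_text: str) -> str:
    """Build a GET URL asking a resolver for an LSID's metadata.

    The 'lsid' parameter name is hypothetical.
    """
    parse_lsid(lsid_text)  # reject malformed identifiers early
    return resolver_base + "?" + urlencode({"lsid": lsid_text})


lsid = parse_lsid("urn:lsid:auth.org:ns:23")
url = metadata_url("http://example.org/wasabi/lsid", "urn:lsid:auth.org:ns:23")
# Round trip: the server side would recover the original LSID from the query.
recovered = parse_qs(urlsplit(url).query)["lsid"][0]
```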
SPARQL SPARQL Protocol • SPARQL is the W3C Candidate Recommendation for querying RDF • SPARQL protocol bound to HTTP-GET • ASK and SELECT queries return SPARQL XML results • DESCRIBE and CONSTRUCT queries return RDF/XML results
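The HTTP-GET binding and the result format per query form can be sketched as follows. The `query` parameter is the one defined by the SPARQL Protocol; the endpoint URL is hypothetical, and the query-form detection is deliberately naive (it only strips IRIs before looking for the keyword).

```python
import re
from urllib.parse import parse_qs, urlencode, urlsplit

# Result format by query form, as listed on the slide.
RESULT_FORMATS = {
    "ASK": "SPARQL XML results",
    "SELECT": "SPARQL XML results",
    "DESCRIBE": "RDF/XML",
    "CONSTRUCT": "RDF/XML",
}


def query_form(query: str) -> str:
    """Return ASK, SELECT, DESCRIBE, or CONSTRUCT for a query string."""
    without_iris = re.sub(r"<[^>]*>", "", query)  # ignore keywords inside IRIs
    match = re.search(r"\b(ASK|SELECT|DESCRIBE|CONSTRUCT)\b", without_iris, re.IGNORECASE)
    if match is None:
        raise ValueError("no SPARQL query form found")
    return match.group(1).upper()


def sparql_get_url(endpoint: str, query: str) -> str:
    """Encode a query for the HTTP GET binding of the SPARQL Protocol."""
    return endpoint + "?" + urlencode({"query": query})


query = "DESCRIBE <urn:lsid:auth.org:person:3424>"
url = sparql_get_url("http://example.org/wasabi/sparql", query)  # hypothetical endpoint
expected_format = RESULT_FORMATS[query_form(query)]
# Round trip: decoding the URL yields the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```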
SPARQL SPARQL Query Language Example • “What is urn:lsid:auth.org:person:3424?”
Query:
DESCRIBE <urn:lsid:auth.org:person:3424>
Result:
<rdf:RDF xmlns:j.0="http://tdwg.org/onto/bdi/person.owl#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="urn:lsid:auth.org:person:3424">
    <rdf:type rdf:resource="http://tdwg.org/onto/bdi/person.owl#Person"/>
    <j.0:givenName>Steven</j.0:givenName>
    <j.0:familyName>Perry</j.0:familyName>
  </rdf:Description>
</rdf:RDF>
SPARQL SPARQL Query Language Example • “What is the genus of the specimen urn:lsid:auth.org:spec:657?”
Query:
SELECT ?genus WHERE {
  <urn:lsid:auth.org:spec:657> <spec:identifiedAs> ?txname .
  ?txname <tn:rank> <tn:Genus> .
  ?txname <tn:uninomial> ?genus
}
Result:
?genus = "Heteractis"
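A SELECT query like this one returns its answer in the SPARQL Query Results XML Format. As a rough sketch, a client could extract the variable bindings as shown below. The sample response is hand-written to match the slide's answer, not captured from a real Wasabi server.

```python
import xml.etree.ElementTree as ET

SR = "{http://www.w3.org/2005/sparql-results#}"


def select_bindings(results_xml: str):
    """Return one dict per result row, mapping variable name to value text."""
    root = ET.fromstring(results_xml)
    rows = []
    for result in root.iter(SR + "result"):
        row = {}
        for binding in result.findall(SR + "binding"):
            value = binding[0]  # the <uri>, <literal>, or <bnode> child
            row[binding.get("name")] = value.text
        rows.append(row)
    return rows


# Hand-written sample response for the genus query.
SAMPLE_RESULTS = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="genus"/></head>
  <results>
    <result>
      <binding name="genus"><literal>Heteractis</literal></binding>
    </result>
  </results>
</sparql>"""

rows = select_bindings(SAMPLE_RESULTS)
```

According to the client library slide, this XML layer is what the Wasabi client library hides from application code.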
Wasabi Server OAI, SPARQL, and LSID are standard protocols, so Wasabi services can be used by non-Wasabi clients.
Wasabi Client Library Client Library • Contains implementations of clients for protocols used by Wasabi • Can be included in projects that need to communicate with Wasabi servers • Programmatic access to services (hides XML messaging layer) • Provides status and progress listeners • Can be used to query non-Wasabi implementations of OAI or SPARQL
Wasabi Indexer Indexer • Harvests from 1 or more RDF sources • Sources can be Wasabi servers (via OAI), sets of RDF files, etc. • Multiple types of indices can be fed from a single set of descriptions • Indexers can filter by object type, etc. • Indexers should understand incremental updates and deletions
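The indexer behaviour described above (type filtering, incremental updates, deletions) can be illustrated with a small in-memory stand-in. The class name and the shape of the update records are hypothetical; the real indexer feeds indices such as Lucene rather than a dictionary.

```python
class TypeFilteredIndex:
    """Toy index that keeps only objects of accepted types."""

    def __init__(self, accepted_types):
        self.accepted_types = set(accepted_types)
        self.entries = {}  # LSID -> indexed text

    def apply(self, updates):
        """Apply one harvested batch.

        Each update is a dict with 'lsid' and either 'deleted': True
        or both 'type' and 'text'.
        """
        for update in updates:
            if update.get("deleted"):
                # Deletions carry no type, so handle them before filtering.
                self.entries.pop(update["lsid"], None)
            elif update["type"] in self.accepted_types:
                self.entries[update["lsid"]] = update["text"]


index = TypeFilteredIndex(accepted_types={"Specimen"})

# First harvest: the Person object is filtered out.
index.apply([
    {"lsid": "urn:lsid:auth.org:spec:657", "type": "Specimen", "text": "Heteractis"},
    {"lsid": "urn:lsid:auth.org:person:3424", "type": "Person", "text": "Steven Perry"},
])
after_first = dict(index.entries)

# Incremental harvest: only the deletion arrives, and it is applied in place.
index.apply([{"lsid": "urn:lsid:auth.org:spec:657", "deleted": True}])
after_second = dict(index.entries)
```

Running several such indices with different `accepted_types` over the same batches corresponds to feeding multiple indices from a single set of descriptions.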
Wasabi Portal Portal • Customizable human interface that allows access to 1 or more Wasabi servers • Default portal requires a Lucene index of harvested data. Most portal queries are against the index • To retrieve and display data objects, the portal makes repeated LSID resolution calls so servers can log access
Wasabi Portal Portal • Portal automatically configures search forms and renderers based on downloaded OWL ontologies • Provides simple search, advanced search, ontology browsing, and export of downloaded data to CSV or RDF files
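One way a portal could derive search-form fields from an OWL ontology is to group datatype properties by their declared domain class. The sketch below does this with a plain XML parser. The sample ontology is hypothetical (it reuses the Person class and property names from the DESCRIBE example), and Wasabi's actual configuration logic is not described in the slides.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"
OWL = "{http://www.w3.org/2002/07/owl#}"


def form_fields(owl_xml: str):
    """Map each class IRI to the names of datatype properties in its domain."""
    root = ET.fromstring(owl_xml)
    fields = {}
    for prop in root.iter(OWL + "DatatypeProperty"):
        domain = prop.find(RDFS + "domain")
        if domain is None:
            continue  # property not tied to a class; skip it in this sketch
        class_iri = domain.get(RDF + "resource")
        name = prop.get(RDF + "about").rsplit("#", 1)[-1]
        fields.setdefault(class_iri, []).append(name)
    return fields


# Hypothetical ontology fragment, for illustration only.
SAMPLE_ONTOLOGY = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://tdwg.org/onto/bdi/person.owl#Person"/>
  <owl:DatatypeProperty rdf:about="http://tdwg.org/onto/bdi/person.owl#givenName">
    <rdfs:domain rdf:resource="http://tdwg.org/onto/bdi/person.owl#Person"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:about="http://tdwg.org/onto/bdi/person.owl#familyName">
    <rdfs:domain rdf:resource="http://tdwg.org/onto/bdi/person.owl#Person"/>
  </owl:DatatypeProperty>
</rdf:RDF>"""

fields = form_fields(SAMPLE_ONTOLOGY)
```

A production system would use an RDF toolkit such as Jena (listed under Implementation) rather than raw XML parsing, since the same ontology can be serialized in many equivalent ways.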
Implementation • http://wasabi.ecoforge.net • Java 1.5 with Spring, Jena, Lucene, and more • Server requires servlet container (Tomcat, WebLogic, etc.) • Server requires JDBC database (MySQL, PostgreSQL, etc.)
Current State • Server, Client Library and Indexer components are feature complete • Portal is still under development • Using experimental OWL data models; awaiting TDWG ontology.
Future Plans • Complete portal • Construct the FishNet2 network (25+ servers) • Construct the PlantCollections network (15+ servers)
Conclusion WASABI is a framework for building scientific data networks based on RDF, OWL, and open data access protocols.
Conclusion • RDF allows us to share complex data models • OWL allows machines to understand the data models and provides opportunities for extending models over time • Standard protocols (OAI, LSID, & SPARQL) allow for integration across data networks and with the semantic web
Support Development of Wasabi is supported by the National Science Foundation as part of the Integrated Community Infrastructure (ICI) project.