Triplestore Experiences Nathan Wilhelmi 11/27/2012 NCAR - CISL/TDD/VETS
Our Experiences… • Disclaimers: • Did not have an ontologist • Codebase passed through multiple developers • Timelines (changing landscape) • Started work 2006 • Stopped active development ~2011 • Sesame version 2.3.0
Why a Triplestore? • Search functionality • Faceted • Free text • Model metadata • Metadata storage • Display • Semantic web
Initial Architecture • Authoritative metadata source was an RDBMS • Metadata harvested into the triplestore at periodic intervals • Triplestore contained only the metadata needed to drive search • Sesame used as a standalone service
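The periodic harvest can be sketched in a few lines. This is a minimal illustration, not the original code. It assumes a hypothetical `dataset` table with `id` and `name` columns, and it uses a Python set of (subject, predicate, object) tuples as a stand-in for the triplestore. Each run rebuilds the search triples from the authoritative RDBMS. Scheduling (for example via cron) is left out.

```python
import sqlite3
from urllib.parse import quote

ESG = "http://www.earthsystemgrid.org/esg.owl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

Triple = tuple[str, str, str]


def harvest(conn: sqlite3.Connection) -> set[Triple]:
    """Rebuild the full set of search triples from the authoritative RDBMS.

    The `dataset` table and its columns are hypothetical placeholders.
    """
    triples: set[Triple] = set()
    for ds_id, name in conn.execute("SELECT id, name FROM dataset ORDER BY id"):
        # Percent-encode the row ID so it is safe inside the URI fragment.
        subject = ESG + quote(ds_id, safe="")
        triples.add((subject, RDF_TYPE, ESG + "Dataset"))
        triples.add((subject, RDFS_LABEL, name))
    return triples


# Demo with an in-memory SQLite database standing in for the RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (id TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?)",
    [("ucar.cgd.ccsm.b30", "CCSM run b30"), ("ucar.cgd.ccsm.b40", "CCSM run b40")],
)
store = harvest(conn)  # a real service would swap this into the search store
print(len(store), "triples harvested")
```

Rebuilding in full on every run keeps the RDBMS authoritative. The store never needs incremental deletes.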
Sesame Triplestore • Standalone Sesame server • Stability problems • No security: the triplestore could be updated by anyone • Changed to an embedded in-memory store • Stable • Picked up performance improvements • Embedded triplestore used only internal references • RDF didn’t work outside of the application • Effectively reduced to a key-value store
Internal Referencing
<rdf:RDF ...>
  <rdf:Description rdf:about="http://www.earthsystemgrid.org/esg.owl#esg-ncar__ucar_cgd_ccsm_b30_072b">
    ....
    <esg:hasUnconfiguredModelComponent rdf:resource="http://www.earthsystemgrid.org/esg.owl#modelcomponent_ccsm_run_b30.072b" />
    ....
  </rdf:Description>
</rdf:RDF>
Performance • For our query patterns, we were not seeing the needed performance • Removing inferencing improved performance to acceptable levels for <5K datasets • Target volume: 50K datasets • SPARQL missing key operators: ordering, limits
Tooling Support • Managing the triplestore • Protégé round trips didn’t work well • Dumped the full triplestore to XML and grepped by hand • Deleting and updating triples • Deletes were difficult and left dangling triples • Rebuilt from authoritative sources
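Dangling triples appear when you delete a resource but other resources still point at it. The sketch below uses a Python set of tuples as a hypothetical in-memory store, not the Sesame API. A naive delete removes only the triples whose subject is the deleted resource. A cascading delete also removes triples that use it as an object. The resource names and the helpers `delete_subject_only`, `delete_resource`, and `find_dangling` are illustrative assumptions.

```python
ESG = "http://www.earthsystemgrid.org/esg.owl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
HAS_COMPONENT = ESG + "hasUnconfiguredModelComponent"

Triple = tuple[str, str, str]


def delete_subject_only(store: set[Triple], uri: str) -> None:
    """Naive delete: drops triples *about* uri, not triples *pointing at* it."""
    store.difference_update({t for t in store if t[0] == uri})


def delete_resource(store: set[Triple], uri: str) -> set[Triple]:
    """Cascading delete: drops triples with uri as subject or as object."""
    removed = {t for t in store if uri in (t[0], t[2])}
    store.difference_update(removed)
    return removed


def find_dangling(store: set[Triple], link_predicates: set[str]) -> set[Triple]:
    """Link triples whose object no longer appears as a subject anywhere."""
    subjects = {s for s, _, _ in store}
    return {t for t in store if t[1] in link_predicates and t[2] not in subjects}


def sample_store() -> set[Triple]:
    dataset = ESG + "dataset_b30"          # hypothetical instance URIs
    component = ESG + "modelcomponent_b30"
    return {
        (dataset, RDF_TYPE, ESG + "Dataset"),
        (dataset, HAS_COMPONENT, component),
        (component, RDF_TYPE, ESG + "ModelComponent"),
    }


naive = sample_store()
delete_subject_only(naive, ESG + "modelcomponent_b30")
print(find_dangling(naive, {HAS_COMPONENT}))  # the dataset's link survives

cascaded = sample_store()
delete_resource(cascaded, ESG + "modelcomponent_b30")
print(find_dangling(cascaded, {HAS_COMPONENT}))  # set()
```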
Implementation Issues • Schema-less design was perceived as faster • Rapid ontology changes during development • Still needed data migration tools • Modeling the problem domain • Modeled a triplestore, not the domain • Very tightly coupled code was difficult to maintain and replace • Steep learning curve for new developers
URIs Are Foundational • Properly encoding URIs • Created unencoded URIs within the triplestore • Queries were created with string concatenation • Led to broken queries and data • Generated instance URIs through a lossy algorithm to get around encoding • Could only map in one direction: source -> triplestore
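The sketch below contrasts percent-encoding with a lossy scheme. It is an illustration, not the original algorithm. Percent-encoding an identifier before it becomes part of a URI is reversible, so the source ID can be recovered from the triplestore. A lossy substitution can map two different IDs to the same URI. The helpers `instance_uri`, `local_id_from_uri`, `lossy_uri`, and `describe_query` are hypothetical, and the sample IDs are made up.

```python
import re
from urllib.parse import quote, unquote

ESG = "http://www.earthsystemgrid.org/esg.owl#"


def instance_uri(local_id: str) -> str:
    """Percent-encode every reserved character so the ID is safe in a URI fragment."""
    return ESG + quote(local_id, safe="")


def local_id_from_uri(uri: str) -> str:
    """Invert instance_uri: recover the original ID exactly."""
    if not uri.startswith(ESG):
        raise ValueError(f"not an ESG instance URI: {uri}")
    return unquote(uri[len(ESG):])


def lossy_uri(local_id: str) -> str:
    """A lossy scheme: replace anything non-alphanumeric with '_' (not reversible)."""
    return ESG + re.sub(r"[^A-Za-z0-9]", "_", local_id)


def describe_query(local_id: str) -> str:
    """Embed only an already-encoded IRI in the query text, never the raw ID."""
    return f"DESCRIBE <{instance_uri(local_id)}>"


for raw in ("ucar.cgd/ccsm b30", "ucar.cgd ccsm/b30"):
    uri = instance_uri(raw)
    assert local_id_from_uri(uri) == raw  # lossless round trip
    print(uri, "vs lossy:", lossy_uri(raw))
```

Both sample IDs collapse to the same lossy URI. The encoded URIs stay distinct and decode back to their sources.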
Our Current Path Forward • Using SOLR Search • Fantastic search tool! • Metadata in RDBMS • Working well • Effective tools, including schema migration • Scales very well for our metadata • Still needed to expose RDF metadata…
RDF with RDBMS
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:sesame="http://www.openrdf.org/schema/sesame#" xmlns:esg="http://www.earthsystemgrid.org/esg.owl#">
  <rdf:Description rdf:about="http://www.earthsystemgrid.org/esg.owl#${rdfIdFactory.getDatasetId(dataset)}">
    <rdf:type rdf:resource="http://www.earthsystemgrid.org/esg.owl#Resource" />
    <rdf:type rdf:resource="http://www.earthsystemgrid.org/esg.owl#Dataset" />
    <rdf:type rdf:resource="http://www.earthsystemgrid.org/esg.owl#GeophysicalDataset" />
    <rdf:type rdf:resource="http://www.earthsystemgrid.org/esg.owl#ModelDataset" />
    <esg:hasUri rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
      resource://${gateway.name?upper_case}#${dataset.persistentIdentifier}
    </esg:hasUri>
    <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">${dataset.name}</rdfs:label>
  </rdf:Description>
</rdf:RDF>
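The same RDF/XML can also be produced programmatically. The sketch below is an alternative to the template approach and uses Python's standard `xml.etree.ElementTree`. It assumes the RDBMS row has already been read into plain values. The function `dataset_rdf` and its parameters are hypothetical stand-ins for `rdfIdFactory.getDatasetId(dataset)`, `dataset.name`, `dataset.persistentIdentifier`, and `gateway.name`. ElementTree escapes characters such as `&` in values automatically, and the dataset ID is percent-encoded before it goes into `rdf:about`.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
ESG = "http://www.earthsystemgrid.org/esg.owl#"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# Use the familiar prefixes instead of ElementTree's generated ns0, ns1, ...
ET.register_namespace("rdf", RDF)
ET.register_namespace("rdfs", RDFS)
ET.register_namespace("esg", ESG)


def dataset_rdf(dataset_id: str, name: str, persistent_id: str, gateway_name: str) -> str:
    """Serialize one dataset row as RDF/XML (hypothetical helper)."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": ESG + quote(dataset_id, safe="")}
    )
    for cls in ("Resource", "Dataset", "GeophysicalDataset", "ModelDataset"):
        ET.SubElement(desc, f"{{{RDF}}}type", {f"{{{RDF}}}resource": ESG + cls})
    uri = ET.SubElement(desc, f"{{{ESG}}}hasUri", {f"{{{RDF}}}datatype": XSD_STRING})
    uri.text = f"resource://{gateway_name.upper()}#{persistent_id}"
    label = ET.SubElement(desc, f"{{{RDFS}}}label", {f"{{{RDF}}}datatype": XSD_STRING})
    label.text = name  # '&' and '<' are escaped on serialization
    return ET.tostring(root, encoding="unicode")


print(dataset_rdf("dataset_b30", "CCSM & friends", "abc123", "ncar"))
```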
Looking Forward • Storing metadata • Content management systems? • NoSQL storage options? • Modeling complicated relationships • Neo4j looks promising…
Questions / Discussion • Nathan Wilhelmi • wilhelmi@ucar.edu