RDF storages and indexes Enterprise Integration – Semantic Web Maciej Janik September 1, 2005
Outline • RDF storages • Jena • Sesame • Redland • Brahms • Indexing RDF • difference from DB indexing • what to index • examples of index types
Storages • Jena • Implemented in Java • Supports RDF, RDFS and OWL • In-memory and persistent storage (Oracle, MySQL, PostgreSQL) • RDQL query language • Reasoning/inference engine • Optimization for common statement patterns (grouping of properties) • Powerful, but slow and memory-intensive
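One reading of "grouping of properties" is the property-table idea described for Jena2 (Wilkinson et al.): properties that usually appear together on a subject are stored as columns of one row, so reading them needs no self-join over a generic triple table. The sketch below is a simplified, hypothetical illustration of that idea with in-memory dicts; the `GROUPED` property names and `split_triples` function are assumptions, not Jena's API. Values of a grouped property that occur a second time on the same subject fall back to the triple table.

```python
from collections import defaultdict

# Hypothetical set of properties that commonly occur together on one subject.
GROUPED = ("ex:name", "ex:email", "ex:phone")

def split_triples(triples):
    """Place the first value of each grouped property in a per-subject row
    (a 'property table'); every other statement stays in a generic triple table."""
    property_table = defaultdict(dict)   # subject -> {property: object}
    triple_table = []                    # leftover (s, p, o) statements
    for s, p, o in triples:
        if p in GROUPED and p not in property_table[s]:
            property_table[s][p] = o
        else:
            # Ungrouped property, or a repeated value of a grouped one.
            triple_table.append((s, p, o))
    return dict(property_table), triple_table
```

A single row lookup such as `property_table["ex:a"]` then returns all grouped values of `ex:a` at once.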
Storages • Sesame • Implemented in Java • Modules (HTTP/SOAP handler, admin, query, export, Repository Abstraction Layer) • Persistent RDF store • traditional DBMS or dedicated RDF triple storage • Database-independent • Scalable architecture • Node-centric approach • Fast and efficient for a Java implementation
Storages • Redland – together with Rasqal and Raptor • Modular approach • Redland – storage for RDF triples only + low-level API • Implemented in pure C for portability • Rich API and bindings to other languages • Rasqal – RDF query library (RDQL, SPARQL) • Raptor – a very fast RDF parser • Average performance
Storages • Brahms (from the LSDIS lab) • Read-only main-memory storage for RDF • reads RDF and saves an optimized snapshot • Written in C++, optimized for speed • additional Java bindings • Full indexing of Subject-Predicate-Object • Uses Raptor as its RDF parser • Rich low-level API for graph manipulation • Very fast and memory-efficient • SPARQL support not yet implemented
Brahms • Separation of different resource types: • InstanceNode, Literal, SchemaClass, SchemaProperty • Statements • InstanceStatement (instance – property – instance) • LiteralStatement (instance – property – literal) • TypeOfStatement (instance – type – class) • Taxonomy for classes and properties • Each iterator deals with only one type of resource • instance search algorithms do not waste time checking for literal or type relations
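The benefit of separating statement kinds can be sketched in a few lines. This is a hypothetical Python illustration, not the Brahms C++ API: the class names mirror the slide, but `TypedStore`, `add` and `neighbours` are assumptions. Because each kind lives in its own list, a traversal over instance-to-instance edges never has to test whether an object is a literal or a class. The scan is linear only for brevity; a real store would index these lists.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InstanceStatement:      # instance - property - instance
    subject: str
    prop: str
    obj: str

@dataclass(frozen=True)
class LiteralStatement:       # instance - property - literal
    subject: str
    prop: str
    value: str

@dataclass(frozen=True)
class TypeOfStatement:        # instance - rdf:type - class
    subject: str
    cls: str

class TypedStore:
    """Keeps each statement kind in its own list, so an iterator sees only one kind."""

    def __init__(self):
        self.instance_stmts = []
        self.literal_stmts = []
        self.type_stmts = []

    def add(self, stmt):
        if isinstance(stmt, InstanceStatement):
            self.instance_stmts.append(stmt)
        elif isinstance(stmt, LiteralStatement):
            self.literal_stmts.append(stmt)
        elif isinstance(stmt, TypeOfStatement):
            self.type_stmts.append(stmt)
        else:
            raise TypeError(f"unknown statement kind: {stmt!r}")

    def neighbours(self, node):
        """Instances reachable from `node` via one InstanceStatement; no literal checks needed."""
        return [s.obj for s in self.instance_stmts if s.subject == node]
```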
Indexing of RDF • RDF = Graph • traditional DB indexes may not be sufficient • XML cannot be indexed directly like a relational DB • Indexing may take advantage of the tree structure • node depth • common path from the root • convert each path to a string expression • precalculate the path tree • Simple indexes on statements may also be powerful
What to index? • The most straightforward approach is to index statements: subject –[predicate]→ object • Possibilities: • Single: SPO, SOP, OSP, OPS, PSO, POS (Brahms) • Double: SOP, SPO, POS (Redland)
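A minimal sketch of full "single" indexing follows, assuming sorted Python lists stand in for whatever structure a real store uses. One sorted list is kept per ordering of (subject, predicate, object). A triple pattern is answered by choosing the ordering whose leading components are the bound terms and range-scanning it with binary search. The function names are hypothetical, and the two-key "double" hashes attributed to Redland are not modelled here.

```python
from bisect import bisect_left
from itertools import permutations

# Position of each component inside an (s, p, o) triple.
POS = {"S": 0, "P": 1, "O": 2}

def build_indexes(triples):
    """Build one sorted list per ordering: SPO, SOP, PSO, POS, OSP, OPS."""
    indexes = {}
    for order in permutations("SPO"):
        name = "".join(order)
        indexes[name] = sorted(tuple(t[POS[c]] for c in order) for t in triples)
    return indexes

def match(indexes, s=None, p=None, o=None):
    """Answer a triple pattern (None = unbound) using the index led by the bound terms."""
    bound = {"S": s, "P": p, "O": o}
    lead = [c for c in "SPO" if bound[c] is not None]
    free = [c for c in "SPO" if bound[c] is None]
    name = "".join(lead + free)              # e.g. p and o bound -> "POS"
    index = indexes[name]
    prefix = tuple(bound[c] for c in lead)
    i = bisect_left(index, prefix)           # first entry >= the bound prefix
    results = []
    while i < len(index) and index[i][:len(prefix)] == prefix:
        row = dict(zip(name, index[i]))      # map letters back to components
        results.append((row["S"], row["P"], row["O"]))
        i += 1
    return results
```

For example, `match(idx, p="ex:knows", o="ex:c")` scans only the `POS` list, starting at the first entry that begins with `("ex:knows", "ex:c")`.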
Power of single indexes • Full indexing of statements • SPO, SOP, PSO, POS, OSP, OPS • separate indexes for each type of statement (InstanceStatements, LiteralStatements ...) • fast check whether a given resource is connected to another, or uses a given property – using binary search • merging the elements of a 2-hop path in linear time • All RDF storages are based on simple indexes and their extensions
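The two operations named on this slide can be sketched as follows, assuming each index yields a sorted, duplicate-free list of neighbours. A connection check is a binary search in one such list. A 2-hop question "which nodes m satisfy x → m → y?" is read here as intersecting the sorted objects of x (from an SPO-style index) with the sorted subjects pointing at y (from an OSP-style index). A single merge pass does this in O(m + n). The function names are hypothetical.

```python
from bisect import bisect_left

def connected(sorted_objects, target):
    """Binary search in a sorted adjacency list: is `target` among the objects?"""
    i = bisect_left(sorted_objects, target)
    return i < len(sorted_objects) and sorted_objects[i] == target

def two_hop_meeting_points(out_of_x, into_y):
    """Linear merge of two sorted lists: nodes m with x -> m and m -> y."""
    i = j = 0
    common = []
    while i < len(out_of_x) and j < len(into_y):
        if out_of_x[i] == into_y[j]:
            common.append(out_of_x[i])
            i += 1
            j += 1
        elif out_of_x[i] < into_y[j]:
            i += 1
        else:
            j += 1
    return common
```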
Schema vs. Instances [Brahms] • Schema is small compared to instances • Instance to taxonomy • know or check the type of an instance • Taxonomy index (classes and properties) • direct subtypes/supertypes • all ancestors/descendants • dynamically built index of instances for a given type and all its subtypes
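A small taxonomy index along these lines can be sketched as below. This is a hypothetical illustration, not the Brahms implementation. Direct sub/supertype links are stored in both directions. All ancestors or descendants come from a graph traversal that tolerates cycles. The set of instances of a class, including instances of its subclasses, is computed only when requested, mirroring the "dynamically built" index on the slide.

```python
from collections import defaultdict

class Taxonomy:
    """Direct sub/supertype links plus transitive closure computed on demand."""

    def __init__(self, subclass_pairs):
        self.supers = defaultdict(set)   # class -> direct superclasses
        self.subs = defaultdict(set)     # class -> direct subclasses
        for sub, sup in subclass_pairs:
            self.supers[sub].add(sup)
            self.subs[sup].add(sub)

    def _closure(self, start, edges):
        # Depth-first traversal; `seen` guards against cycles in the taxonomy.
        seen, stack = set(), [start]
        while stack:
            for nxt in edges[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def ancestors(self, cls):
        return self._closure(cls, self.supers)

    def descendants(self, cls):
        return self._closure(cls, self.subs)

def instances_of(taxonomy, type_of, cls):
    """Instances typed with `cls` or any of its subclasses, built when requested.
    `type_of` is an iterable of (instance, class) pairs."""
    wanted = {cls} | taxonomy.descendants(cls)
    return {inst for inst, c in type_of if c in wanted}
```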
Tree-based index • The idea is based on the Patricia trie • The index should scale with the growth of data • Each path, together with its leaf, is encoded into a string -> the Index Fabric „A Fast Index for Semistructured Data” - Brian F. Cooper et al.
Index fabrics • The index is used to accelerate path expressions, mainly queries that ask for a root-to-leaf path • Idea of prefix encoding • xml: <A>alpha<B>beta<C>gamma</C></B></A> • paths: <A>alpha ; <A><B>beta ; <A><B><C>gamma • encoded: A alpha ; A B beta ; A B C gamma • infix (less common): A alpha B beta C gamma • Convert each path to a string for fast searches • Replace tags with ‘non-terminal’ characters (like in automata)
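The prefix encoding can be sketched as follows, under several simplifying assumptions. Each tag is replaced by a reserved "designator" character (here a Unicode private-use code point, chosen so it cannot clash with text). The encoded strings are stored in a plain character trie. This is not a compressed Patricia trie and not the multilayered index fabric of Cooper et al. The class and method names are hypothetical. With the slide's example, the leaf under `<A><B>` is found by walking the designators for A and B and then collecting only non-designator continuations.

```python
class TrieNode:
    def __init__(self):
        self.children = {}      # next character -> TrieNode
        self.terminal = False   # True if an encoded key ends here

class PathIndex:
    """Prefix-encoded root-to-leaf paths stored in a plain (uncompressed) character trie."""

    def __init__(self):
        self.root = TrieNode()
        self.designators = {}   # tag -> reserved designator character

    def _designator(self, tag):
        # Each tag gets one private-use Unicode character, so tags never clash with leaf text.
        if tag not in self.designators:
            self.designators[tag] = chr(0xE000 + len(self.designators))
        return self.designators[tag]

    def insert(self, tags, leaf):
        key = "".join(self._designator(t) for t in tags) + leaf
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.terminal = True

    def _walk(self, key):
        node = self.root
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def leaves_at(self, tags):
        """Leaf values stored directly under the exact tag path `tags`."""
        if any(t not in self.designators for t in tags):
            return []
        node = self._walk("".join(self.designators[t] for t in tags))
        if node is None:
            return []
        reserved = set(self.designators.values())
        found, stack = [], [(node, "")]
        while stack:
            cur, text = stack.pop()
            if cur.terminal and text:
                found.append(text)
            for ch, child in cur.children.items():
                if ch not in reserved:   # a designator would start a deeper path
                    stack.append((child, text + ch))
        return sorted(found)
```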
Indexing of graphs: backbone (figure; source: http://www.aisee.com/)
Indexing of graphs: tree-type (prefixes, tries) (figure; source: http://www.aisee.com/)
Indexing of graphs: path templates (figure: 1-index, 2-index, T-index) „Index Structures for Path Expressions” - Tova Milo, Dan Suciu
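As an illustration of the 1-index named in the figure, here is a naive sketch. Nodes are grouped so that nodes in the same group are reached by the same labelled paths. The grouping is computed by backward-bisimulation partition refinement over labelled edges, a computable approximation of the "same incoming label paths" idea. This is a deliberately simple version under stated assumptions: edges are `(source, label, target)` tuples, and the function name is hypothetical. The construction in Milo and Suciu's paper is more efficient than this repeated full pass.

```python
def one_index(nodes, edges):
    """Partition nodes by backward bisimulation over labelled edges.
    edges: iterable of (source, label, target). Returns a list of node sets."""
    incoming = {n: [] for n in nodes}
    for src, label, dst in edges:
        incoming[dst].append((src, label))
    block = {n: 0 for n in nodes}            # start with everything in one block
    while True:
        # A node's signature: its current block plus the (parent block, label) pairs.
        signatures = {
            n: (block[n], frozenset((block[src], label) for src, label in incoming[n]))
            for n in nodes
        }
        ids = {}
        new_block = {n: ids.setdefault(sig, len(ids)) for n, sig in signatures.items()}
        # Refinement only splits blocks, so an unchanged count means a stable partition.
        if len(ids) == len(set(block.values())):
            break
        block = new_block
    groups = {}
    for n, b in block.items():
        groups.setdefault(b, set()).add(n)
    return sorted(groups.values(), key=lambda g: sorted(g))
```

Each resulting group becomes one node of the index graph, so a path query can be evaluated on the (usually much smaller) index instead of the data graph.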
Indexing of graphs: landmarks (figure; source: http://www.aisee.com/)
Indexing of graphs • Indexing semistructured data • index fabric - encoding, multilayered • common prefixes - trie structure • backbone - highways between points • landmarks - county division • path templates - precalculated expressions • clustering - grouping by thematic access • Indexing such data is NOT easy; the right solution depends on how you want to search the graph
References • Beckett D., „The Design and Implementation of the Redland RDF Application Framework” • Cooper B. F. et al., „A Fast Index for Semistructured Data” • Janik M. and Kochut K., „BRAHMS: A WorkBench RDF Store and High Performance Memory System for Semantic Association Discovery” • Milo T. and Suciu D., „Index Structures for Path Expressions” • Wilkinson et al., „Efficient RDF Storage and Retrieval in Jena2” • Jena - http://jena.sourceforge.net/ • Raptor - http://librdf.org/raptor/ • Redland - http://librdf.org/ • Sesame - http://www.openrdf.org/