RDF stores: A walkthrough (SDB, TDB, AllegroGraph, etc.)

By Mohamed Ershad Junaid • UTD ID: 2021041902 • Email: ershadj@student.utdallas.edu

RDF STORES
• A number of tools store RDF data in their own optimized schema. This presentation walks through several of the tools commonly used as RDF stores, namely:
• 1. SDB
• 2. AllegroGraph
• 3. Sesame
• 4. SwiftOWLIM
• 5. BigOWLIM

SDB – A SPARQL DATABASE FOR JENA
• SDB is a component of Jena. It provides scalable storage and query of RDF datasets using conventional SQL databases, for use in standalone applications, J2EE and other application frameworks.
• The database's own tools for load balancing, security, clustering, backup and administration can all be used to manage the installation.
• SDB is designed specifically to support SPARQL, the query language developed by the W3C RDF Data Access Working Group.

Use of an SDB store requires a Store object, which is described in 2 parts:
• a connection to the database
• a description of the store configuration
• Store objects themselves are lightweight, so connections to an SDB database can be created on a per-request basis, as required for use in J2EE application servers.
• Store description: a store description identifies which storage layout is being used, the connection to use and the database type.
    [] rdf:type sdb:Store ;
        sdb:layout "layout2" ;
        sdb:connection <#conn> .
    <#conn> ...
• SDB connections, objects of class SDBConnection, abstract away from the details of the connection and also provide consistent logging and transaction operations.
• Configuring a store and its connection involves choosing the database type by setting the sdbType needed for the connection.
• SDB datasets are built by the assembler from a description. For example, to assemble a particular model in a store, the description is given as:
    # Default graph
    <#myModel1> rdf:type sdb:Model ;
        sdb:dataset <#dataset> .
    # Named graph
    <#myModel2> rdf:type sdb:Model ;
        sdb:namedGraph data:graph1 ;
        sdb:dataset <#dataset> .

• SDB does not have a single database layout. Of the fixed layouts that are available, we look at the two main types: layout2 in its hash form and layout2 in its index form.
• In SDB, one store is one RDF dataset is one SQL database.
• Databases of type layout2 have a triples table for the default graph and a quads table for the named graphs. In the triples and quads tables, the columns are integers referencing a nodes table.
• In the hash form, the integers are 8-byte hashes of the node.
• In the index form, the integers are 4-byte sequence ids into the node table.

Triples
+-----------+
| S | P | O |
+-----------+
Primary key: SPO
Indexes: PO, OS

Quads
+---------------+
| G | S | P | O |
+---------------+
Primary key: GSPO
Indexes: GPO, GOS, SPO, OS, PO
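The index form of layout2 described above can be sketched with Python's built-in sqlite3 module. This is only an illustration of the idea, not SDB's actual DDL: the table and column names (nodes, triples, lex), the helper functions, and the use of a truncated MD5 digest to stand in for the 8-byte hash form are all assumptions made for the example.

```python
import hashlib
import sqlite3


def node_hash(lexical_form: str) -> int:
    """Hash form: a signed 8-byte integer derived from the node's text.

    Truncating an MD5 digest is an assumption made for illustration only.
    """
    digest = hashlib.md5(lexical_form.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def create_index_layout(conn: sqlite3.Connection) -> None:
    """Index form: triples hold sequence ids that reference the nodes table."""
    conn.executescript(
        """
        CREATE TABLE nodes (id  INTEGER PRIMARY KEY AUTOINCREMENT,
                            lex TEXT UNIQUE NOT NULL);
        CREATE TABLE triples (s INTEGER NOT NULL REFERENCES nodes(id),
                              p INTEGER NOT NULL REFERENCES nodes(id),
                              o INTEGER NOT NULL REFERENCES nodes(id),
                              PRIMARY KEY (s, p, o));
        CREATE INDEX triples_po ON triples (p, o);
        CREATE INDEX triples_os ON triples (o, s);
        """
    )


def node_id(conn: sqlite3.Connection, lex: str) -> int:
    """Return the sequence id for a node, creating the node row if needed."""
    conn.execute("INSERT OR IGNORE INTO nodes (lex) VALUES (?)", (lex,))
    row = conn.execute("SELECT id FROM nodes WHERE lex = ?", (lex,)).fetchone()
    return row[0]


def add_triple(conn: sqlite3.Connection, s: str, p: str, o: str) -> None:
    """Store a triple as three integer references into the nodes table."""
    ids = tuple(node_id(conn, term) for term in (s, p, o))
    conn.execute("INSERT OR IGNORE INTO triples (s, p, o) VALUES (?, ?, ?)", ids)


def objects_of(conn: sqlite3.Connection, s: str, p: str) -> list:
    """Look up objects for a subject and predicate by joining back to nodes."""
    rows = conn.execute(
        """
        SELECT o_node.lex
        FROM triples t
        JOIN nodes s_node ON t.s = s_node.id
        JOIN nodes p_node ON t.p = p_node.id
        JOIN nodes o_node ON t.o = o_node.id
        WHERE s_node.lex = ? AND p_node.lex = ?
        ORDER BY o_node.lex
        """,
        (s, p),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
create_index_layout(conn)
add_triple(conn, "ex:alice", "foaf:knows", "ex:bob")
add_triple(conn, "ex:alice", "foaf:knows", "ex:carol")
add_triple(conn, "ex:alice", "foaf:knows", "ex:bob")  # duplicate, ignored
print(objects_of(conn, "ex:alice", "foaf:knows"))
```

Because every term is stored once in the nodes table, the triples table stays narrow and the SPO, PO and OS indexes only have to cover integers. A quads table would follow the same pattern with an extra G column.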

ALLEGROGRAPH
• AllegroGraph RDFStore is a modern, high-performance, persistent RDF graph database.
• AllegroGraph uses disk-based storage, enabling it to scale to billions of triples while maintaining high performance.
• AllegroGraph supports SPARQL, RDFS++, and Prolog reasoning from Java applications.

How the Logical Store Works
• In RDF-land, an assertion is a statement of the form subject predicate object (in the context of a graph).
• The bulk of an AllegroGraph triple-store is composed of assertions. Though called triples for historical reasons, each assertion has five fields: subject (s), predicate (p), object (o), graph (g) and triple-id (i).
• All of s, p, o, and g are strings of arbitrary size. Storing all of the duplicated strings directly would be very inefficient, so a special number (called a Unique Part Identifier or UPI) is associated with each unique string. The string dictionary manages these strings and UPIs and prevents duplication.
• To speed up queries, AllegroGraph creates indices which contain the assertions plus additional information.
• AllegroGraph can also perform freetext searching in the assertions using its freetext indices.
• Finally, AllegroGraph keeps track of deleted triples.
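The logical store described above (five-field assertions, a string dictionary handing out UPIs, an index, and deletion tracking) can be modeled in a few lines of Python. This is a toy sketch and not AllegroGraph's implementation or API: the class names StringDictionary and ToyTripleStore are hypothetical, UPIs are modeled as plain sequential integers, and only a single by-subject index is kept.

```python
from typing import Dict, List, NamedTuple, Optional, Set, Tuple


class Assertion(NamedTuple):
    s: int  # UPI of the subject
    p: int  # UPI of the predicate
    o: int  # UPI of the object
    g: int  # UPI of the graph
    i: int  # triple-id


class StringDictionary:
    """Maps each unique string to one UPI so strings are never duplicated."""

    def __init__(self) -> None:
        self._upi_by_string: Dict[str, int] = {}
        self._string_by_upi: List[str] = []

    def intern(self, text: str) -> int:
        upi = self._upi_by_string.get(text)
        if upi is None:
            upi = len(self._string_by_upi)
            self._upi_by_string[text] = upi
            self._string_by_upi.append(text)
        return upi

    def find(self, text: str) -> Optional[int]:
        return self._upi_by_string.get(text)

    def lookup(self, upi: int) -> str:
        return self._string_by_upi[upi]

    def __len__(self) -> int:
        return len(self._string_by_upi)


class ToyTripleStore:
    def __init__(self) -> None:
        self.strings = StringDictionary()
        self.assertions: List[Assertion] = []
        self.deleted: Set[int] = set()              # triple-ids marked as deleted
        self.by_subject: Dict[int, List[int]] = {}  # index: subject UPI -> triple-ids

    def add(self, s: str, p: str, o: str, g: str = "default") -> int:
        triple_id = len(self.assertions)
        s_upi, p_upi, o_upi, g_upi = (self.strings.intern(t) for t in (s, p, o, g))
        self.assertions.append(Assertion(s_upi, p_upi, o_upi, g_upi, triple_id))
        self.by_subject.setdefault(s_upi, []).append(triple_id)
        return triple_id

    def delete(self, triple_id: int) -> None:
        # The assertion stays in place; it is only recorded as deleted.
        self.deleted.add(triple_id)

    def about(self, s: str) -> List[Tuple[str, str, str, str]]:
        """Return live assertions for a subject, decoded back to strings."""
        s_upi = self.strings.find(s)
        if s_upi is None:
            return []
        result = []
        for triple_id in self.by_subject.get(s_upi, []):
            if triple_id in self.deleted:
                continue
            a = self.assertions[triple_id]
            result.append(tuple(self.strings.lookup(u) for u in (a.s, a.p, a.o, a.g)))
        return result


store = ToyTripleStore()
first = store.add("ex:alice", "foaf:knows", "ex:bob")
store.add("ex:alice", "foaf:name", '"Alice"')
store.delete(first)
print(store.about("ex:alice"))
```

Although "ex:alice" and the graph name appear in both assertions, each is stored only once in the dictionary; the assertions themselves hold nothing but integers.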

ALLEGROGRAPH’S INTERNAL ARCHITECTURE

SESAME
• Sesame is an open source RDF framework with support for RDF Schema inferencing and querying. Originally, it was developed by Aduna (then known as Aidministrator) as a research prototype for the EU research project On-To-Knowledge. Now, it is further developed and maintained by Aduna in cooperation with NLnet Foundation, developers from Ontotext, and a number of volunteer developers who contribute ideas, bug reports and fixes.
• Sesame is an open source Java framework for storing, querying and reasoning with RDF and RDF Schema. It can be used as a database for RDF and RDF Schema, or as a Java library for applications that need to work with RDF internally.
• For example, suppose you need to read a big RDF file, find the relevant information for your application, and use that information. Sesame provides the necessary tools to parse, interpret, query and store all this information, embedded in your own application if you want or, if you prefer, in a separate database or even on a remote server.

• Sesame supports RDF Schema inferencing. This means that, given a set of RDF and/or RDF Schema statements, Sesame can find the implicit information in the data. Sesame does this by simply adding all implicit information to the repository at the time the data is added.
• Sesame's query language is SeRQL (Sesame RDF Query Language). Some of SeRQL's most important features are:
• Graph transformation.
• RDF Schema support.
• XML Schema datatype support.
• Expressive path expression syntax.
• Optional path matching.
• URIs and literals are the basic building blocks of RDF. For a query language like SeRQL, variables are added to this list.
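The inferencing strategy described above, where implicit statements are added to the repository at the time data is added, can be sketched as a small forward-chaining loop. This is not Sesame's API or its inferencer: the class name InferencingRepository is hypothetical, and only two RDFS entailment rules are covered (subclass transitivity and type inheritance through rdfs:subClassOf).

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def infer_once(triples: Set[Triple]) -> Set[Triple]:
    """Apply both rules once and return only statements not already present."""
    new: Set[Triple] = set()
    for s, p, o in triples:
        if p != SUBCLASS:
            continue
        for s2, p2, o2 in triples:
            # Transitivity: s subClassOf o, o subClassOf o2 => s subClassOf o2
            if p2 == SUBCLASS and s2 == o:
                new.add((s, SUBCLASS, o2))
            # Type inheritance: s2 type s, s subClassOf o => s2 type o
            if p2 == TYPE and o2 == s:
                new.add((s2, TYPE, o))
    return new - triples


class InferencingRepository:
    """Stores explicit statements together with everything they imply."""

    def __init__(self) -> None:
        self.statements: Set[Triple] = set()

    def add(self, *triples: Triple) -> None:
        self.statements.update(triples)
        # Keep applying the rules until nothing new can be derived.
        while True:
            derived = infer_once(self.statements)
            if not derived:
                break
            self.statements |= derived


repo = InferencingRepository()
repo.add(
    ("ex:Dog", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
    ("ex:rex", TYPE, "ex:Dog"),
)
print(("ex:rex", TYPE, "ex:Animal") in repo.statements)
```

After the three explicit statements are added, the repository also holds the three implied ones, so a query for rex's types needs no reasoning at query time.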

SESAME API

• OWLIM is a high-performance semantic repository, implemented in Java and packaged as a Storage and Inference Layer (SAIL) for the Sesame RDF database.
• OWLIM is based on TRREE (Triple Reasoning and Rule Entailment Engine), a native RDF rule entailment engine.
• SwiftOWLIM is the OWLIM version that performs reasoning and query evaluation in memory, while still guaranteeing data preservation, consistency and integrity.

OWL Layering and Variations

• The OWLIM rule language is made up of Prefixes, Axioms and Rules.
• TRREE (Triple Reasoning and Rule Entailment Engine) is configured via rule-sets written in this language.
• Syntax:
    Axioms {
        // RDF axiomatic triples
    }

BIGOWLIM
• BigOWLIM is a high-performance semantic repository with support for OWL reasoning and rule extensions.
• BigOWLIM uses the TRREE engine to perform RDFS, OWL DLP, and OWL Horst reasoning, based on forward-chaining of entailment rules.
• The reasoning support can be customized through rulesets. There are four pre-defined rulesets, the most expressive of which supports a proper extension of RDFS with almost full OWL Lite.

• BigOWLIM is a specific configuration for the Sesame RDF database and relies on it for various features and infrastructure, including, but not limited to, an extensive set of RDF and query language parsers.
• BigOWLIM is packaged as a Storage and Inference Layer (SAIL) for Sesame named BigOwlimSchemaRepository; it implements RdfSchemaRepository.
• In contrast to SwiftOWLIM (the "standard" in-memory version), BigOWLIM performs reasoning and query evaluation directly against the permanent image of the repository.
• It is a Java library available under a commercial license from Ontotext Lab.
• In BigOWLIM, reasoning and query evaluation are performed over a storage based on binary files. The reasoning strategy is total materialization.
• The efficiency of TRREE allows BigOWLIM to manage billions of explicit statements on server hardware.
• BigOWLIM has a relatively slow delete operation, a limitation typical for OLAP databases. Upload, storage, inference, and query evaluation are fast even for huge ontologies and knowledge bases.
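Why total materialization tends to make deletes expensive can be shown with a toy model. The sketch below is not BigOWLIM's algorithm, which is not described here: the class name MaterializedStore is hypothetical, a single rule (rdfs:subClassOf transitivity) stands in for a full ruleset, and delete is assumed to rebuild the whole closure from the explicit statements, which is simply the most naive correct strategy.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
SUBCLASS = "rdfs:subClassOf"


def closure(statements: Set[Triple]) -> Set[Triple]:
    """Forward-chain one rule (subClassOf transitivity) until a fixpoint."""
    inferred = set(statements)
    changed = True
    while changed:
        changed = False
        for a, p1, b in list(inferred):
            for c, p2, d in list(inferred):
                candidate = (a, SUBCLASS, d)
                if p1 == p2 == SUBCLASS and b == c and candidate not in inferred:
                    inferred.add(candidate)
                    changed = True
    return inferred


class MaterializedStore:
    def __init__(self) -> None:
        self.explicit: Set[Triple] = set()      # what the user asserted
        self.materialized: Set[Triple] = set()  # explicit plus all inferred

    def add(self, triple: Triple) -> None:
        # Adding only ever extends the existing closure.
        self.explicit.add(triple)
        self.materialized = closure(self.materialized | {triple})

    def delete(self, triple: Triple) -> None:
        # Inferred statements may have depended on the deleted one, so this
        # naive version throws the closure away and rebuilds it from scratch.
        self.explicit.discard(triple)
        self.materialized = closure(self.explicit)

    def ask(self, triple: Triple) -> bool:
        # Queries are plain lookups; no reasoning happens at query time.
        return triple in self.materialized


store = MaterializedStore()
store.add(("ex:A", SUBCLASS, "ex:B"))
store.add(("ex:B", SUBCLASS, "ex:C"))
print(store.ask(("ex:A", SUBCLASS, "ex:C")))  # inferred statement is present
store.delete(("ex:B", SUBCLASS, "ex:C"))
print(store.ask(("ex:A", SUBCLASS, "ex:C")))  # its support is gone
```

The trade-off is visible in the two methods: queries and additions stay cheap because the inferred statements are already stored, while a delete has to work out which inferred statements lost their justification.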

REFERENCES
• http://www.franz.com/agraph/allegrograph
• http://jena.hpl.hp.com/wiki/SDB
• http://www.openrdf.org/doc/sesame/
• www.ontotext.com/owlim/
• www.ontotext.com/owlim/big/

THANK YOU FOR LISTENING PATIENTLY
