Sesame: An Architecture for Storing and Querying RDF Data and Schema Inf. Yasser Ganji Saffar ganji@ce.sharif.edu When they were out of sight Ali Baba came down, and, going up to the rock, said, "Open, Sesame." -- Tales of 1001 Nights
Outline • Querying Levels • Sesame’s Architecture • Sesame’s Modules • Storing RDF Data in RDBMSs
Querying Levels • RDF documents can be considered at three different levels of abstraction: • At the syntactic level they are XML documents. • At the structure level they consist of a set of triples. • At the semantic level they constitute one or more graphs with partially predefined semantics. • At which level is it best to query?
Querying at the Syntactic Level • At this level we just have an XML document. • So we can query RDF using an XML query language (e.g. XQuery). • But RDF is not just an XML dialect. • XML: • Has a tree-structured data model. • Only nodes are labeled. • RDF: • Has a graph-structured data model. • Both edges (properties) and nodes (subjects/objects) are labeled. • The same information can be encoded in XML in different ways, so a query written against one encoding may miss another.
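As a minimal sketch of the last point, the snippet below encodes the statement "twain/mark has type FamousWriter" in two RDF/XML forms that describe the same triple, and then runs one XML-path-style lookup over both. The `http://example.org/schema#` namespace and the function name are assumptions made for illustration; they do not come from the slides.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/schema#"  # hypothetical namespace, for illustration only
NS = {"rdf": RDF_NS, "ex": EX_NS}

# Encoding 1: an rdf:Description element with an explicit rdf:type child.
LONG_FORM = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:ex="{EX_NS}">
  <rdf:Description rdf:about="twain/mark">
    <rdf:type rdf:resource="{EX_NS}FamousWriter"/>
  </rdf:Description>
</rdf:RDF>"""

# Encoding 2: the typed-node abbreviation; same triple, different XML tree.
TYPED_NODE_FORM = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:ex="{EX_NS}">
  <ex:FamousWriter rdf:about="twain/mark"/>
</rdf:RDF>"""


def famous_writers_by_xml_path(document):
    """Look for rdf:Description/rdf:type elements pointing at FamousWriter."""
    root = ET.fromstring(document)
    hits = []
    for description in root.findall("rdf:Description", NS):
        for type_element in description.findall("rdf:type", NS):
            if type_element.get(f"{{{RDF_NS}}}resource") == EX_NS + "FamousWriter":
                hits.append(description.get(f"{{{RDF_NS}}}about"))
    return hits


long_hits = famous_writers_by_xml_path(LONG_FORM)
typed_hits = famous_writers_by_xml_path(TYPED_NODE_FORM)
print(long_hits)   # ['twain/mark']
print(typed_hits)  # []
```

The lookup is tied to one tree shape, so it finds the statement in the first document and misses it in the second, even though both carry the same information.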
Querying at the Structure Level • At this level an RDF document represents a set of triples: • (type, Book, Class) • (subClassOf, FamousWriter, Writer) • (hasWritten, twain/mark, ISBN00023423442) • (type, twain/mark, FamousWriter) • Advantage: Independent of the specific XML syntax. • A successful query: • SELECT ?x FROM … WHERE (type ?x FamousWriter) • An unsuccessful query: • SELECT ?x FROM … WHERE (type ?x Writer) • It returns nothing because no explicit (type, twain/mark, Writer) triple exists; that fact only follows from the subClassOf triple, which plain triple matching ignores.
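The two queries on this slide can be sketched as plain pattern matching over the four triples. This is an illustrative matcher written for this example, not the query engine of any particular tool; the function name `match` and the single-variable restriction are assumptions.

```python
# Triples written as (predicate, subject, object), as on the slide.
TRIPLES = {
    ("type", "Book", "Class"),
    ("subClassOf", "FamousWriter", "Writer"),
    ("hasWritten", "twain/mark", "ISBN00023423442"),
    ("type", "twain/mark", "FamousWriter"),
}


def match(pattern, triples=TRIPLES):
    """Return the sorted bindings of the variable '?x' in a (p, s, o) pattern."""
    results = set()
    for triple in triples:
        binding = None
        matched = True
        for want, got in zip(pattern, triple):
            if want == "?x":
                binding = got          # variable position: remember the value
            elif want != got:
                matched = False        # constant position: must be identical
                break
        if matched and binding is not None:
            results.add(binding)
    return sorted(results)


famous = match(("type", "?x", "FamousWriter"))
writers = match(("type", "?x", "Writer"))
print(famous)   # ['twain/mark']
print(writers)  # []
```

The second result is empty because the matcher only sees explicitly stored triples and knows nothing about the meaning of subClassOf.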
Querying at the Semantic Level • We need a query language that is sensitive to the RDF Schema primitives: • e.g. Class, subClassOf, Property, … • RQL • RDF Query Language • The first proposal for a declarative query language for RDF and RDF Schema. • Adopts the syntax of OQL. • The output of a query is again legal RDF Schema code, which can be used as input to another query. • A sample query: • SELECT Y FROM FamousWriter{X}.hasWritten{Y}
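A rough sketch of what "sensitive to the RDF Schema primitives" means in practice: before matching type triples, the evaluator follows subClassOf transitively. The code below is a hand-written approximation over the example triples from the structure-level slide, not an RQL implementation; all function names are assumptions.

```python
# Triples written as (predicate, subject, object), as on the earlier slide.
TRIPLES = {
    ("type", "Book", "Class"),
    ("subClassOf", "FamousWriter", "Writer"),
    ("hasWritten", "twain/mark", "ISBN00023423442"),
    ("type", "twain/mark", "FamousWriter"),
}


def subclasses_of(cls, triples):
    """Return cls plus every class reachable from it through subClassOf."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for p, s, o in triples:
            if p == "subClassOf" and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def instances_of(cls, triples):
    """Instances of cls, including instances of its (transitive) subclasses."""
    classes = subclasses_of(cls, triples)
    return sorted(s for p, s, o in triples if p == "type" and o in classes)


def written_by_instances_of(cls, triples):
    """Approximates the sample query: objects of hasWritten whose subject is in cls."""
    authors = set(instances_of(cls, triples))
    return sorted(o for p, s, o in triples if p == "hasWritten" and s in authors)


writers = instances_of("Writer", TRIPLES)
books = written_by_instances_of("FamousWriter", TRIPLES)
print(writers)  # ['twain/mark']
print(books)    # ['ISBN00023423442']
```

Unlike pure triple matching, asking for instances of Writer now returns twain/mark, because the subclass relation is taken into account.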
Sesame’s History • The European On-To-Knowledge project kicked off in Feb. 2000: • This project aims at developing ontology-driven knowledge management tools. • In this project Sesame fulfills the role of storage and retrieval middleware for ontologies and metadata expressed in RDF and RDF Schema.
On-To-Knowledge Project • Sesame is positioned as a central tool in this project. • OntoExtract: Extracts ontological conceptual structures from natural-language documents. • OntoEdit: An ontology editor. • RDF Ferret: A user front-end that provides search and query. • [Figure: Sesame together with RDF Ferret, OntoEdit and OntoExtract]
What is Sesame? • Sesame is an open source Java framework for storing, querying and reasoning with RDF and RDF Schema. • It can be used as: • Standalone Server: A database for RDF and RDF Schema. • Java Library: For applications that need to work with RDF internally.
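Sesame’s Architecture • [Diagram: HTTP and SOAP requests reach Sesame through the HTTP Protocol Handler and the SOAP Protocol Handler; inside Sesame, the Admin Module, Query Module and Export Module work on top of the Repository Abstraction Layer (RAL), which talks to the Repository.]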
The Repository • DBMSs • Currently, Sesame is able to use: • PostgreSQL • MySQL • Oracle (9i or newer) • SQL Server • Existing RDF stores • RDF flat files • RDF network services • Multiple Sesame servers can be used to retrieve results for queries. • This opens up the possibility of a highly distributed architecture for RDF(S) storing and querying.
Repository Abstraction Layer (RAL) • The RAL offers a stable, high-level interface for talking to repositories. • It is defined by an API that offers these functionalities: • Add data • Retrieve data • Delete data • Data is returned in streams. (Scalability) • Only a small amount of data is kept in memory. • Suitable for use in highly constrained environments such as portable devices. • Caching data (Performance) • E.g. caching RDF Schema data, which is needed very frequently.
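The shape of such a layer can be sketched as an abstract interface with add, retrieve and delete operations, where retrieval hands out statements one at a time. Every name below (`RepositoryLayer`, `InMemoryRepository`, `matches`) is hypothetical and chosen for this illustration; Sesame's actual Java API is not reproduced here, and caching is left out.

```python
from abc import ABC, abstractmethod
from typing import Iterator, List, Optional, Tuple

Statement = Tuple[str, str, str]  # (subject, predicate, object)
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]  # None = wildcard


def matches(statement: Statement, pattern: Pattern) -> bool:
    """True when every non-wildcard position of the pattern equals the statement."""
    return all(want is None or want == got for want, got in zip(pattern, statement))


class RepositoryLayer(ABC):
    """Hypothetical interface with the three functionalities named on the slide."""

    @abstractmethod
    def add(self, s: str, p: str, o: str) -> None: ...

    @abstractmethod
    def retrieve(self, s=None, p=None, o=None) -> Iterator[Statement]: ...

    @abstractmethod
    def delete(self, s=None, p=None, o=None) -> int: ...


class InMemoryRepository(RepositoryLayer):
    """Toy backend; a database-backed one could yield rows from a cursor instead."""

    def __init__(self) -> None:
        self._statements: List[Statement] = []

    def add(self, s, p, o):
        if (s, p, o) not in self._statements:
            self._statements.append((s, p, o))

    def retrieve(self, s=None, p=None, o=None):
        # Generator: callers receive one statement at a time, not a full result list.
        for statement in self._statements:
            if matches(statement, (s, p, o)):
                yield statement

    def delete(self, s=None, p=None, o=None):
        kept = [st for st in self._statements if not matches(st, (s, p, o))]
        removed = len(self._statements) - len(kept)
        self._statements = kept
        return removed


repo = InMemoryRepository()
repo.add("twain/mark", "type", "FamousWriter")
repo.add("twain/mark", "hasWritten", "ISBN00023423442")
first = next(repo.retrieve(p="type"))     # pull a single statement from the stream
removed = repo.delete(p="hasWritten")     # number of statements deleted
remaining = list(repo.retrieve())
```

Because callers only depend on the abstract class, the backend could be swapped without touching the modules built on top of it.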
Admin Module • Allows incrementally inserting RDF data into, or deleting it from, the repository. • Retrieves its information from an RDF(S) source. • Parses it using an RDF parser. • Checks each (S, P, O) statement it gets from the parser for consistency with the information already present in the repository, and infers implied information if necessary. For instance: • If P equals type, it infers that O must be a class. • If P equals subClassOf, it infers that S and O must be classes. • If P equals subPropertyOf, it infers that both S and O must be properties. • If P equals domain or range, it infers that S must be a property and O must be a class.
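The four inference rules listed above can be sketched as a single function that maps one incoming (S, P, O) statement to the statements it implies. This covers only the inference step, not parsing or the consistency check, and the function name is an assumption.

```python
def implied_statements(s, p, o):
    """Return the (S, P, O) statements implied by one statement under the four rules."""
    implied = []
    if p == "type":
        # O is used as a type, so O must be a class.
        implied.append((o, "type", "Class"))
    elif p == "subClassOf":
        # Both ends of subClassOf must be classes.
        implied.append((s, "type", "Class"))
        implied.append((o, "type", "Class"))
    elif p == "subPropertyOf":
        # Both ends of subPropertyOf must be properties.
        implied.append((s, "type", "Property"))
        implied.append((o, "type", "Property"))
    elif p in ("domain", "range"):
        # S is the property being constrained, O the class it is constrained to.
        implied.append((s, "type", "Property"))
        implied.append((o, "type", "Class"))
    return implied


from_subclass = implied_statements("FamousWriter", "subClassOf", "Writer")
from_domain = implied_statements("hasWritten", "domain", "Writer")
from_other = implied_statements("twain/mark", "hasWritten", "ISBN00023423442")
print(from_subclass)  # [('FamousWriter', 'type', 'Class'), ('Writer', 'type', 'Class')]
print(from_domain)    # [('hasWritten', 'type', 'Property'), ('Writer', 'type', 'Class')]
print(from_other)     # []
```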
Query Module • Evaluates RQL queries posed by the user. • It is independent of the underlying repository, so it cannot use the optimizations and query evaluation offered by specific DBMSs. • RQL queries are translated into a set of calls to the RAL. • E.g. when a query contains a join operation over two subqueries, each of the subqueries is evaluated, and the join operation is then executed by the query engine on the results.
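A minimal sketch of that evaluation strategy: each subquery becomes a simple retrieval call, and the join is computed outside the store by the engine itself. `ral_retrieve` stands in for a RAL call and `join_on_subject` for the engine's join; both names, and the choice of a hash join, are assumptions for illustration.

```python
# Statements written as (subject, predicate, object).
STATEMENTS = [
    ("Book", "type", "Class"),
    ("FamousWriter", "subClassOf", "Writer"),
    ("twain/mark", "hasWritten", "ISBN00023423442"),
    ("twain/mark", "type", "FamousWriter"),
]


def ral_retrieve(s=None, p=None, o=None):
    """Stand-in for a RAL call: stream the statements matching a pattern."""
    pattern = (s, p, o)
    for statement in STATEMENTS:
        if all(want is None or want == got for want, got in zip(pattern, statement)):
            yield statement


def join_on_subject(left, right):
    """Hash join executed by the query engine, not by the underlying store."""
    by_subject = {}
    for statement in right:
        by_subject.setdefault(statement[0], []).append(statement)
    for statement in left:
        for partner in by_subject.get(statement[0], []):
            yield statement, partner


# Subquery 1: resources typed FamousWriter. Subquery 2: all hasWritten statements.
famous_writers = ral_retrieve(p="type", o="FamousWriter")
written = ral_retrieve(p="hasWritten")
books = [partner[2] for _, partner in join_on_subject(famous_writers, written)]
print(books)  # ['ISBN00023423442']
```

The store only ever answers simple pattern lookups, which is what keeps the query module portable, at the price of not using a DBMS's own join optimizer.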
RDF Export Module • This module allows for the extraction of the complete schema and/or data from a model in RDF format. • It supplies the basis for using Sesame with other RDF tools.
Important Features of Sesame • Powerful query language • Portability • It is written completely in Java. • Repository independence • Extensibility • Other functional modules can be created and plugged into it. • Flexible communication by using protocol handlers • The architecture separates the communication details from the actual functionality through the use of protocol handlers.
SeRQL (Sesame RDF Query Language) • It combines the best features of other (query) languages: RQL, RDQL, N-Triples, N3 • Some of the built-in predicates: • {X} serql:directSubClassOf {Y} • {X} serql:directSubPropertyOf {Y} • {X} serql:directType {Y}
Using PostgreSQL as Repository • PostgreSQL is an open-source object-relational DBMS. • It supports subtable relations between its tables. • Subtable relations are also transitive. • These relations can be used to model the subsumption reasoning of RDF Schema: a query on a table also returns the rows of its subtables.
Example RDF Schema & Data • Schema: • hasWritten has domain Writer and range Book. • FamousWriter is a subClassOf Writer. • Data: • …/twain/mark has type FamousWriter. • …/ISBN00023423442 has type Book. • …/twain/mark hasWritten …/ISBN00023423442.
Storing Schema (in an RDBMS) • Tables: Class, SubClassOf, SubPropertyOf, Property, Domain, Range
Storing Data (PostgreSQL) • Tables: Resource, Book, Writer, FamousWriter, hasWritten • To reduce the database size, another table, called resources, is added to the database; it maps resource descriptions to unique IDs.
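The ID-mapping idea can be sketched with a small runnable example. SQLite (via Python's standard `sqlite3` module) is used here as a stand-in for PostgreSQL so the snippet runs without a server; the column names, the `has_written` table layout and the helper `resource_id` are assumptions, not Sesame's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each resource description is stored once and referenced by its integer ID.
    CREATE TABLE resources (
        id  INTEGER PRIMARY KEY,
        uri TEXT UNIQUE NOT NULL
    );
    -- Property table: rows hold only IDs, never the full resource descriptions.
    CREATE TABLE has_written (
        source INTEGER NOT NULL REFERENCES resources(id),
        target INTEGER NOT NULL REFERENCES resources(id)
    );
""")


def resource_id(uri):
    """Return the ID for a resource description, creating a row on first use."""
    conn.execute("INSERT OR IGNORE INTO resources (uri) VALUES (?)", (uri,))
    row = conn.execute("SELECT id FROM resources WHERE uri = ?", (uri,)).fetchone()
    return row[0]


# Short names from the slides; real resource descriptions would be full URIs.
author = resource_id("twain/mark")
book = resource_id("ISBN00023423442")
conn.execute("INSERT INTO has_written (source, target) VALUES (?, ?)", (author, book))

# Reading the data back requires joining against resources to recover the names.
rows = conn.execute("""
    SELECT s.uri, t.uri
    FROM has_written AS h
    JOIN resources AS s ON s.id = h.source
    JOIN resources AS t ON t.id = h.target
""").fetchall()
print(rows)  # [('twain/mark', 'ISBN00023423442')]
```

The space saving comes from repeating small integers instead of long strings in every table; the cost is the extra joins needed to turn IDs back into resource descriptions.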
RDF Schema Ambiguities • There are many ambiguities in RDFS: • RDF Schema is defined in natural language. • No formal description of its semantics is given. • E.g. about subClassOf it only says that it is a property with Class as its domain and range. • RDF Schema is self-describing: • The definition of its terms is itself done in RDF Schema. • As a result it contains some inconsistencies. • Circular dependencies in term definitions: • Class is both a subclass of and an instance of Resource. • Resource is an instance of Class.
Scalability Issues • An experiment using Sesame: • Uploading and querying a collection of nouns from WordNet (http://www.semanticweb.org/library) • Consisting of about 400,000 RDF statements. • Using a desktop computer (Sun UltraSPARC 5 workstation, 256MB RAM) • Uploading the WordNet nouns took 94 minutes. • Querying was quite slow, because the data is distributed over multiple tables and retrieving it requires many joins.
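For a sense of scale, the upload figures above translate into an average throughput as follows. This is only arithmetic on the two numbers from the slide, and since the statement count is approximate ("about 400,000"), so is the result.

```python
statements = 400_000      # approximate number of RDF statements uploaded
upload_minutes = 94       # reported upload time

statements_per_second = statements / (upload_minutes * 60)
print(round(statements_per_second, 1))  # 70.9
```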
References • User Guide for Sesame: http://openrdf.org/doc/users/userguide.html • Broekstra J., Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema, ISWC 2002 • http://sesame.aidministrator.nl • http://www.openRDF.org