SW Application Development with the Sesame Framework

Jeen Broekstra (Aduna), Atanas Kiryakov (OntoText), James Leigh (Workbrain)

Schedule for the afternoon
• 14:00 – 14:10 Introduction
  • RDF and RDF Schema
  • The Sesame project: history, contributors, future
  • A running example for the rest of the afternoon
• 14:10 – 14:45 The Sesame Framework
  • overview of architecture and features
  • how to use Sesame to build apps
• 14:45 – 15:00 Querying RDF with SeRQL / SPARQL
• 15:00 – 15:30 Using Context for Provenance and Time tracking
• 15:30 – 15:50 Coffee break
• 15:50 – 16:35 Elmo
• 16:35 – 17:15 OWLIM
• 17:15 – 17:30 Discussion / wrap-up

Tutorial Materials online
• Location: http://openrdf.org/conferences/eswc2006/
• Links to Sesame 2 user, system, and API documentation
• Query Language documentation
• Supporting material for this tutorial:
  • example queries and an example Sesame server
  • a CVS repository containing some simple, runnable code examples
  • example Elmo applications

Introduction

RDF in one slide
• Data model for expressing knowledge
• basic building block: statement
    <person001> <name> "Jeen" .
• groups of statements form graphs
[Figure: example graph with the nodes person001, project001, "Jeen", j.broekstra@tue.nl and "Sesame", connected by name, email, worksIn and projectMemberEmail edges]
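
The statement-and-graph idea can be sketched in a few lines of plain Java. This is only an illustration, not the Sesame RDF Model API: the class `TinyGraph` and its methods are hypothetical names, resources and literals are simplified to strings, and the edges are an assumed reading of the example figure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration (not the Sesame API): a graph is just a
// collection of subject-predicate-object statements.
final class TinyGraph {
    // Each statement is stored as a String[3]: {subject, predicate, object}.
    private final List<String[]> statements = new ArrayList<>();

    void add(String subject, String predicate, String object) {
        statements.add(new String[] {subject, predicate, object});
    }

    // Returns the objects of all statements with the given subject and
    // predicate, i.e. follows one labeled edge in the graph.
    List<String> objects(String subject, String predicate) {
        List<String> result = new ArrayList<>();
        for (String[] st : statements) {
            if (st[0].equals(subject) && st[1].equals(predicate)) {
                result.add(st[2]);
            }
        }
        return result;
    }

    // Builds a small example graph (edges assumed from the slide's figure).
    static TinyGraph example() {
        TinyGraph g = new TinyGraph();
        g.add("person001", "name", "Jeen");
        g.add("person001", "email", "j.broekstra@tue.nl");
        g.add("person001", "worksIn", "project001");
        g.add("project001", "name", "Sesame");
        return g;
    }

    public static void main(String[] args) {
        TinyGraph g = example();
        // Follow worksIn, then name: the name of the person's project.
        String project = g.objects("person001", "worksIn").get(0);
        System.out.println(g.objects(project, "name"));
    }
}
```

Following two edges in a row, as `main` does, is the same idea that the path expressions of the query languages express declaratively.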

RDF Schema in one more slide
• RDF Schema is a Vocabulary Description Language
• it allows specification of domain vocabulary and a way to structure it
  • Class, Property, subClassOf, subPropertyOf, domain, range
• Formal semantics add simple reasoning capabilities:
  • class and property subsumption
  • domain and range inference
[Figure: example schema graph with the nodes rdfs:Class, rdf:Property, Person, Researcher, name and person001, connected by rdf:type, rdfs:subClassOf and rdfs:domain edges]

Turtle Syntax
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix movie: <http://example.org/movies/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix um: <http://example.org/usermodel/> .
<urn:foaf:jeenbroekstra> a um:User ;
    foaf:firstName "Jeen" ;
    foaf:familyName "Broekstra" ;
    foaf:mbox <mailto:jeen.broekstra@aduna-software.com> ;
    rdfs:seeAlso <http://www.openrdf.org/people/foaf-jeen.rdf> .
<urn:foaf:jeenbroekstra> um:rating [
    a um:ActorRating ;
    um:onActor <http://www.imdb.com/name/nm0000164/> ;
    rdf:value "6"^^xsd:integer
] .

The Sesame Project
• 1999 – 2001: EU IST On-To-Knowledge project
  • development of a research prototype ‘RDF query engine’ for use in the project
  • Aduna developed a query engine (RQL) + RDBMS backend
  • result: Sesame 0.1
• 2001 – 2003: ‘Open Sesame’
  • sponsored by the NLNet foundation
  • two-year open source development project
  • end result: Sesame 1.0
• 2004 – 2006: ‘Open Sesame 2’
  • follow-up project
  • goal: to increase adoption of Sesame, and to further develop the framework
  • result: Sesame 2.0
• 2006 onwards
  • Aduna is committed to further developing Sesame as part of its product suite
  • Dedicated to open source
  • Open invitation for more participation in development: we’d like to have your contributions!

Tutorial example
A community system for Movie and Actor ratings
• user model
  • user profile
  • ratings of actors and movies
• movie ontology
  • movies with genres, descriptions, etc.
  • actors with foaf profiles

The Ontology (or Ontologies)
[Figure: class diagram. movie:Actor and um:User are subclasses of foaf:Person; um:ActorRating and um:MovieRating are subclasses of um:Rating; the properties um:rating, um:onActor and um:onMovie are linked to these classes by rdfs:domain and rdfs:range edges]

User/Rating Model
[Figure: instance graph. actor1 (rdf:type movie:Actor) has foaf:firstName “Johnny” and foaf:familyName “Depp”; user1 (rdf:type um:User) has foaf:firstName “John” and foaf:familyName “Doe”; user1 has a um:rating node (rdf:type um:ActorRating) with um:onActor actor1 and rdf:value “5”^^xsd:int]

Movie Model
[Figure: instance graph. movie1 (rdf:type movie:Movie) has movie:title “Edward ScissorHands”, movie:year “1990”, movie:genre movie:Romance and movie:Comedy (both rdf:type movie:Genre), and movie:hasPart r1; r1 (rdf:type movie:Role) has movie:characterName “Edward ScissorHands” and movie:playedBy actor1]

The Sesame Framework

What is Sesame?
• A framework for storage, querying and inferencing of RDF and RDF Schema
• A Java Library for handling RDF
• A Database Server for (remote) access to repositories of RDF data

Sesame features
• Light-weight yet powerful Java API
• Highly expressive query and transformation languages
  • SeRQL, SPARQL
• High scalability (O(10^7) triples on desktop hardware)
• Various backends
  • Native Store
  • RDBMS (MySQL, Oracle 10, DB2, PostgreSQL)
  • main memory
• Reasoning support
  • RDF Schema reasoner
  • OWL DLP (OWLIM)
  • domain reasoning (custom rule engine)
• Transactional support
• Context support
• Rio Toolkit: parsers and writers for different RDF syntaxes
  • RDF/XML, Turtle, N3, N-Triples

Sesame architecture
[Figure: architecture diagram with the components application, HTTP / SPARQL protocol, HTTP Server, Repository Access API, SeRQL, SPARQL, SAIL Query Model, SAIL API, Rio and RDF Model]

Sesame architecture
• application (remote): Remote apps can communicate over the Web with a Sesame server and update data or do queries, using the HTTP / SPARQL protocol.
• application (local): Local apps can just include (parts of) Sesame as a Java library and use it to process RDF data efficiently.
• HTTP Server: Allows deployment of Sesame as a web-enabled database server (e.g. in Tomcat). Implements a superset of the SPARQL protocol (HTTP REST).
• Repository Access API: Main access API of Sesame. Offers developer-friendly methods for manipulating RDF data (querying, adding, removing, updating).
• SeRQL, SPARQL, SAIL Query Model: Declarative querying and other ‘higher-level’ functions on SAILs.
• SAIL API: Storage And Inference Layer. System API for ‘wrapping’ a storage backend.
• Rio: RDF I/O. Set of parsers and writers for RDF/XML, Turtle, N3, N-Triples. Can be used separately.
• RDF Model: The core RDF model, containing objects and interfaces for URIs, blank nodes, literals, statements.

The SAIL API
• Storage And Inferencing Layer
• Abstraction from physical storage
  • allows other Sesame components to function on any type of store
  • can be used as a wrapper layer for a particular data source
• System Internal API
  • application developers typically do not use it directly

The Repository Access API
A single Java object representation for a Sesame database, offering methods for
• evaluating a query and retrieving the result
• adding RDF data from a local file, from the web, as a text string, etc.
• adding/removing (sets of) RDF statements
• starting/stopping transactions

Installing a Server
• Prepare the environment
  • install a Java Servlet Container (we recommend Apache Tomcat 5.x)
  • install a full Java 5.0 environment (we recommend Sun J2SE 1.5.x)
• deploy the sesame.war web application
  • [TOMCAT]/webapps/sesame
• configure the Sesame server
  • [SESAME]/WEB-INF/server.conf.example
  • edit as XML file in your favourite editor

A repository configuration
• id is the string identifier for the repository
• title is a human-readable title
• sailstack contains a list of sails (top-to-bottom)
  • the bottom sail is the actual storage layer
  • each layered sail adds functionality as a ‘filter’ on top of the store (e.g. inferencing, caching, etc.)
<repository id="mem-rdfs">
  <title>Main Memory RDF Schema repository</title>
  <sailstack>
    <sail class="org.openrdf.sesame.sailimpl.memory.MemoryStoreRDFSInferencer"/>
    <sail class="org.openrdf.sesame.sailimpl.memory.MemoryStore">
      <param name="file" value="/data/mem-rdfs.dat"/>
    </sail>
  </sailstack>
</repository>
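
The ‘filter on top of the store’ idea behind a sail stack resembles the decorator pattern. The following is a minimal sketch in plain Java under that assumption; the `Sail` interface and the `MemorySail` and `CountingSail` classes are hypothetical stand-ins and do not reproduce the real SAIL API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a sail stack (not the real SAIL API):
// each layer wraps the one below it and adds behaviour.
final class SailStackSketch {

    // Minimal stand-in for a storage-and-inference layer.
    interface Sail {
        void add(String statement);
        List<String> statements();
    }

    // Bottom of the stack: the actual storage.
    static final class MemorySail implements Sail {
        private final List<String> data = new ArrayList<>();
        public void add(String statement) { data.add(statement); }
        public List<String> statements() { return new ArrayList<>(data); }
    }

    // A stacked layer acting as a 'filter': it counts writes and then
    // delegates to the sail underneath.
    static final class CountingSail implements Sail {
        private final Sail base;
        private int writes = 0;
        CountingSail(Sail base) { this.base = base; }
        public void add(String statement) { writes++; base.add(statement); }
        public List<String> statements() { return base.statements(); }
        int writes() { return writes; }
    }

    public static void main(String[] args) {
        // Top-to-bottom, as in the sailstack element: filter first, store last.
        CountingSail top = new CountingSail(new MemorySail());
        top.add("<person001> <name> \"Jeen\" .");
        System.out.println(top.writes() + " write(s), "
                + top.statements().size() + " statement(s) stored");
    }
}
```

Because every layer exposes the same interface as the store it wraps, layers such as an inferencer or a cache can be combined in the configuration without the application noticing.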

Reasoning
• Sesame supports RDF Schema entailment
• a reasoner is a ‘stacked’ SAIL and is applied to a storage backend at configuration time
• current implementation computes the RDFS closure at upload-time and stores it
  • main advantage: faster querying
  • main disadvantage: slower update speed
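
Computing a closure at upload time can be illustrated with a tiny forward-chaining loop. This is a sketch in plain Java and not Sesame's inferencer: it covers only two RDFS rules (transitivity of rdfs:subClassOf and propagation of rdf:type along rdfs:subClassOf), and the class name `RdfsClosureSketch`, the string encoding of statements and the example data are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch (not Sesame's inferencer): forward-chain two RDFS
// rules until nothing new is derived, then keep the result.
final class RdfsClosureSketch {
    static final String TYPE = "rdf:type";
    static final String SUB = "rdfs:subClassOf";

    // Statements are encoded as "subject predicate object" strings.
    static String st(String s, String p, String o) { return s + " " + p + " " + o; }

    static Set<String> closure(Set<String> asserted) {
        Set<String> all = new LinkedHashSet<>(asserted);
        boolean changed = true;
        while (changed) {
            List<String> derived = new ArrayList<>();
            for (String a : all) {
                String[] x = a.split(" ");
                for (String b : all) {
                    String[] y = b.split(" ");
                    if (!y[1].equals(SUB) || !x[2].equals(y[0])) continue;
                    // x = (s, subClassOf, C), y = (C, subClassOf, D): s subClassOf D
                    if (x[1].equals(SUB)) derived.add(st(x[0], SUB, y[2]));
                    // x = (s, type, C), y = (C, subClassOf, D): s type D
                    if (x[1].equals(TYPE)) derived.add(st(x[0], TYPE, y[2]));
                }
            }
            // addAll returns false once a round derives nothing new.
            changed = all.addAll(derived);
        }
        return all;
    }

    public static void main(String[] args) {
        Set<String> data = new LinkedHashSet<>();
        data.add(st("Researcher", SUB, "Person"));
        data.add(st("person001", TYPE, "Researcher"));
        // The closure additionally contains: person001 rdf:type Person
        System.out.println(closure(data));
    }
}
```

The trade-off named on the slide is visible here: the loop runs when data is added, so updates cost more, but a later query for all instances of Person is a plain lookup.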

Querying with SeRQL / SPARQL
play along at http://www.openrdf.org/sesame2-webclient

Querying RDF
• RDF is a labeled, directed graph of semistructured data
  • no rigid schema
• An RDF query language needs to be able to address this:
  • graph path expressions
  • dealing with the semistructured nature of RDF
  • flexible querying of both data and schema

SeRQL vs. SPARQL
• Both:
  • expressive query and transformation language for RDF
  • SELECT and CONSTRUCT
  • optional path expressions
  • support for context/named graphs
• SeRQL (“circle”)
  • nested queries (IN, EXISTS operators)
  • user-friendly syntax (a matter of taste of course)
  • efficient Sesame implementation
• SPARQL (“sparkle”)
  • W3C Standard (in progress)
  • tool interoperability: Jena, Redland, 3Store, Sesame, …

SeRQL vs. SPARQL example
SeRQL:
SELECT DISTINCT X, T
FROM {X} movie:title {T};
         movie:hasPart {Y} movie:characterName {Z}
WHERE Z = "Edward Scissorhands"@en
USING NAMESPACE
    movie = <http://example.org/movies/>
SPARQL:
PREFIX movie: <http://example.org/movies/>
SELECT DISTINCT ?x ?t
WHERE {
    ?x movie:title ?t ;
       movie:hasPart ?y .
    ?y movie:characterName ?z .
    FILTER (?z = "Edward Scissorhands"@en)
}

SeRQL path expressions
    {X} movie:hasPart {:role1}
    {X} movie:hasPart {Y}
    {X} P {Y}
[Figure: example graph in which movie1 has movie:hasPart role1, and role1 has movie:characterName “Edward ScissorHands”]

Chaining, branching and comparing
• Chaining:
    {X} movie:hasPart {Y} movie:characterName {Z}
• Branching:
    {Y} rdf:type {movie:Role};
        movie:characterName {Z}
• Comparison operators:
  • String comparison: Z LIKE "*Hands"
  • boolean comparison: X < Y, X <= Y, Z < 20, Z = Y, etc.
[Figure: example graph in which movie1 has movie:hasPart role1, and role1 has movie:characterName “Edward Scissorhands”]

SeRQL query composition
• Using the building blocks, we can compose complex queries.
• SeRQL uses a select-from-where syntax (like SQL):
  • select: the variables that you want to return
  • from: the path in the graph that you want to get the information from
  • where: additional constraints on the values using operators
SELECT X, Y, Z
FROM {X} movie:hasPart {Y} movie:characterName {Z}
WHERE Z LIKE "edward scissorhands" IGNORE CASE
USING NAMESPACE
    movie = <http://example.org/movies/>
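
What a query engine has to do for a chained path expression can be sketched as a join on the shared variable. The code below is plain Java and not Sesame's query engine; the class `PathJoinSketch` is a hypothetical name, the second movie in the data is invented for contrast, and the LIKE comparison is simplified to a case-insensitive equality test without wildcards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not Sesame's query engine): evaluates the chain
//   {X} movie:hasPart {Y} movie:characterName {Z}
// with a case-insensitive comparison on Z, as a nested-loop join.
final class PathJoinSketch {
    // Example data as {subject, predicate, object} triples
    // (movie2 / role2 are invented for illustration).
    static final List<String[]> DATA = Arrays.asList(
        new String[] {"movie1", "movie:hasPart", "role1"},
        new String[] {"role1", "movie:characterName", "Edward Scissorhands"},
        new String[] {"movie2", "movie:hasPart", "role2"},
        new String[] {"role2", "movie:characterName", "Rhett Butler"});

    // Returns one {X, Y, Z} binding per match of the chained path.
    static List<String[]> select(String characterName) {
        List<String[]> solutions = new ArrayList<>();
        for (String[] first : DATA) {
            if (!first[1].equals("movie:hasPart")) continue;
            for (String[] second : DATA) {
                // The shared variable Y joins the two steps of the chain.
                boolean joins = second[0].equals(first[2])
                        && second[1].equals("movie:characterName");
                // WHERE Z LIKE "..." IGNORE CASE (no wildcards in this sketch)
                if (joins && second[2].equalsIgnoreCase(characterName)) {
                    solutions.add(new String[] {first[0], first[2], second[2]});
                }
            }
        }
        return solutions;
    }

    public static void main(String[] args) {
        for (String[] s : select("edward scissorhands")) {
            System.out.println("X=" + s[0] + " Y=" + s[1] + " Z=" + s[2]);
        }
    }
}
```

The from clause corresponds to the two nested loops, the where clause to the final comparison, and the select clause to the values copied into each solution.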

Optional path expressions
• RDF is semi-structured
• Even when the schema says some object should have a particular property, it may not always be present in the data:
  • Users have names and email addresses, but Geert-Jan is a user without a known email address
[Figure: example graph. person001 and person002 both have type um:User; person001 has foaf:firstName “Jeen” and foaf:mbox j.broekstra@tue.nl; person002 has foaf:firstName “Geert-Jan” and no foaf:mbox]

Optional path expressions (2)
To be able to query for all users, their first names, and if known their email address, SeRQL introduces optional path expressions:
SELECT DISTINCT Person, Name, Email
FROM {Person} rdf:type {um:User};
              foaf:firstName {Name};
              [foaf:mbox {Email}]
USING NAMESPACE
    foaf = <http://xmlns.com/foaf/0.1/>,
    um = <http://example.org/usermodel/>
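
The effect of the bracketed optional part is comparable to a left outer join: the required parts must match, while the optional variable is simply left unbound when no statement is found. A minimal plain-Java sketch of that behaviour follows; it is not Sesame's query engine, the class `OptionalMatchSketch` is a hypothetical name, and properties are assumed to be single-valued to keep the code short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not Sesame's query engine): an optional path
// expression behaves like a left outer join; the optional variable
// stays unbound (null here) when no matching statement exists.
final class OptionalMatchSketch {
    static final List<String[]> DATA = Arrays.asList(
        new String[] {"person001", "rdf:type", "um:User"},
        new String[] {"person001", "foaf:firstName", "Jeen"},
        new String[] {"person001", "foaf:mbox", "j.broekstra@tue.nl"},
        new String[] {"person002", "rdf:type", "um:User"},
        new String[] {"person002", "foaf:firstName", "Geert-Jan"});

    // Returns the first object for (subject, predicate), or null if none.
    static String value(String subject, String predicate) {
        for (String[] st : DATA) {
            if (st[0].equals(subject) && st[1].equals(predicate)) return st[2];
        }
        return null;
    }

    // Returns {Person, Name, Email} rows; Email may be null.
    static List<String[]> usersWithOptionalEmail() {
        List<String[]> rows = new ArrayList<>();
        for (String[] st : DATA) {
            if (!st[1].equals("rdf:type") || !st[2].equals("um:User")) continue;
            String name = value(st[0], "foaf:firstName");
            if (name == null) continue;               // required part must match
            String email = value(st[0], "foaf:mbox"); // optional part may not
            rows.add(new String[] {st[0], name, email});
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String[] row : usersWithOptionalEmail()) {
            System.out.println(row[0] + ", " + row[1] + ", " + row[2]);
        }
    }
}
```

Without the optional part, person002 would drop out of the result entirely, which is exactly the situation the slide describes for Geert-Jan.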

CONSTRUCT queries
• CONSTRUCT-queries return RDF statements
  • each RDF statement matching the query pattern is returned
• The query result is
  • a subgraph of the original graph, or
  • a transformed graph
• This mechanism allows formulation of simple rules

CONSTRUCTing subgraphs
Retrieve for each movie the title and year as RDF Statements
CONSTRUCT *
FROM {M} rdf:type {movie:Movie};
         movie:year {Y};
         movie:title {T}
USING NAMESPACE
    movie = <http://example.org/movies/>

Movie Model (repeat)
[Figure: instance graph. movie1 (rdf:type movie:Movie) has movie:title “Edward ScissorHands”, movie:year “1990”, movie:genre movie:Romance and movie:Comedy (both rdf:type movie:Genre), and movie:hasPart r1; r1 (rdf:type movie:Role) has movie:characterName “Edward ScissorHands” and movie:playedBy actor1]

Graph Transformations
Create a graph of actors and relate them to the movies they play in (through a new ‘playsInMovie’ relation)
CONSTRUCT {A} foaf:firstName {FN};
              foaf:familyName {LN};
              my:playsInMovie {M} movie:title {T}
FROM {M} movie:title {T};
         movie:hasPart {} movie:playedBy {A} foaf:firstName {FN};
                                             foaf:familyName {LN}
USING NAMESPACE
    movie = <http://example.org/ontology/movie/>,
    foaf = <http://xmlns.com/foaf/0.1/>,
    my = <http://example.org/my/own/namespace/>
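
The core of such a transformation, matching a pattern and emitting a new statement per match, can be sketched in plain Java. This is an illustration only, not Sesame's query engine: the class `ConstructSketch` is a hypothetical name, and the sketch produces just the new playsInMovie statements, leaving out the name and title statements that the full query also copies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not Sesame's query engine): a CONSTRUCT query
// matches a pattern and emits new statements for every match.
final class ConstructSketch {
    static final List<String[]> DATA = Arrays.asList(
        new String[] {"movie1", "movie:hasPart", "r1"},
        new String[] {"r1", "movie:playedBy", "actor1"},
        new String[] {"movie1", "movie:title", "Edward Scissorhands"});

    // For each match of {M} movie:hasPart {} movie:playedBy {A},
    // emits the new statement {A} my:playsInMovie {M}.
    static List<String[]> playsInMovie() {
        List<String[]> out = new ArrayList<>();
        for (String[] part : DATA) {
            if (!part[1].equals("movie:hasPart")) continue;
            for (String[] played : DATA) {
                if (played[0].equals(part[2]) && played[1].equals("movie:playedBy")) {
                    out.add(new String[] {played[2], "my:playsInMovie", part[0]});
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String[] st : playsInMovie()) {
            System.out.println(String.join(" ", st));
        }
    }
}
```

The emitted statement does not occur in the input data, which is what distinguishes a transformed graph from a subgraph.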

Nested Queries
• often necessary for conditions on a set (rather than a single value)
  • ‘if a value x exists such that…’
  • ‘if the property p is in the set …’
• SeRQL has three nested query forms:
  • IN operator
  • ANY and ALL operator quantification
  • EXISTS operator

Using nested queries (1)
• Using EXISTS to retrieve all movies for which no rating is known
SELECT DISTINCT movie, mtitle
FROM {movie} rdf:type {movie:Movie};
             movie:title {mtitle}
WHERE NOT EXISTS (SELECT rating
                  FROM {rating} rdf:type {um:MovieRating};
                                um:onMovie {movie})
USING NAMESPACE
    movie = <http://example.org/movies/>,
    um = <http://example.org/usermodel/>
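
The NOT EXISTS condition keeps an outer solution only when the nested query finds nothing for it. A plain-Java sketch of this evaluation strategy is given below; it is not Sesame's query engine, the class `NotExistsSketch` and the sample data are assumptions for illustration, and movie titles are left out for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not Sesame's query engine): NOT EXISTS keeps an
// outer solution only if the nested query has no solution for it.
final class NotExistsSketch {
    static final List<String[]> DATA = Arrays.asList(
        new String[] {"movie1", "rdf:type", "movie:Movie"},
        new String[] {"movie2", "rdf:type", "movie:Movie"},
        new String[] {"rating1", "rdf:type", "um:MovieRating"},
        new String[] {"rating1", "um:onMovie", "movie1"});

    // True if the exact statement (s, p, o) is in the data.
    static boolean has(String s, String p, String o) {
        for (String[] st : DATA) {
            if (st[0].equals(s) && st[1].equals(p) && st[2].equals(o)) return true;
        }
        return false;
    }

    // Nested query: is there a rating of type um:MovieRating on this movie?
    static boolean ratingExists(String movie) {
        for (String[] st : DATA) {
            if (st[1].equals("um:onMovie") && st[2].equals(movie)
                    && has(st[0], "rdf:type", "um:MovieRating")) {
                return true;
            }
        }
        return false;
    }

    // Outer query: all movies, filtered by NOT EXISTS (nested query).
    static List<String> unratedMovies() {
        List<String> result = new ArrayList<>();
        for (String[] st : DATA) {
            if (st[1].equals("rdf:type") && st[2].equals("movie:Movie")
                    && !ratingExists(st[0])) {
                result.add(st[0]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(unratedMovies());
    }
}
```

Note that the nested query refers to the outer variable (the movie), so it is re-evaluated for each outer solution.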

Using nested queries (2)
• Using the ALL modifier to retrieve the highest actor rating for each user
SELECT DISTINCT user, rating, fname, lname
FROM {user} rdf:type {um:User};
            um:rating {} rdf:value {rating};
                         um:onActor {} foaf:firstName {fname};
                                       foaf:familyName {lname}
WHERE rating >= ALL (SELECT otherRating
                     FROM {user} um:rating {} rdf:type {um:ActorRating};
                                              rdf:value {otherRating})
USING NAMESPACE
    foaf = <http://xmlns.com/foaf/0.1/>,
    um = <http://example.org/usermodel/>

Using nested queries (3)
• Using the IN operator to find all movies which share at least one genre with “Gone with the Wind”
SELECT DISTINCT movie, mtitle
FROM {movie} rdf:type {movie:Movie};
             movie:title {mtitle};
             movie:genre {genre}
WHERE genre IN (SELECT otherGenre
                FROM {} rdf:type {movie:Movie};
                        movie:title {gwtw};
                        movie:genre {otherGenre}
                WHERE label(gwtw) LIKE "gone with the wind" IGNORE CASE)
USING NAMESPACE
    movie = <http://example.org/movies/>,
    um = <http://example.org/usermodel/>

Using the Sesame API
example applications in Sesame CVS

Using Sesame as a library
• Include Sesame jar files in your classpath
  • sesame.jar, openrdf-util.jar, openrdf-model.jar, rio.jar
  • sparql-sesame.jar, sparql-core.jar optional
• Use the Sesame Repository API to create, access, query, etc. RDF models in Sesame repositories.

Creating a Repository object
import org.openrdf.sesame.repository.*;
import org.openrdf.sesame.sailimpl.memory.*;
…
// first repository: an in-memory store
Repository rep = new Repository(new MemoryStore());
rep.initialize();
// second repository: an in-memory store
// with RDFS inferencing enabled
Repository rep2 = new Repository(
    new MemoryStoreRDFSInferencer(new MemoryStore()));
rep2.initialize();

Querying a Sesame Repository
String query = "SELECT X, Y FROM {X} P {Y}";
// execute the query and give me the result.
QueryResult result = rep.evaluateTupleQuery(QueryLanguage.SERQL, query);
// a query result is a set of solutions.
for (Solution solution : result) {
    // each solution is a set of variable bindings.
    Value x = solution.getValue("X");
    Value y = solution.getValue("Y");
    // do something interesting with the values here …
}
result.close();

Transactions
• Sesame repositories have full transaction support
• By default, the repository runs in autoCommit mode
  • every add or remove operation is treated as a single transaction
• Explicit Transaction objects can be used to group operations into transactions

Using Transactions
Transaction txn = rep.startTransaction();
try {
    // add the first file
    txn.add(inputFile1, baseURI1, RDFFormat.RDFXML);
    // add the second file
    txn.add(inputFile2, baseURI2, RDFFormat.RDFXML);
    txn.commit();
}
finally {
    if (txn.isActive()) {
        // something went wrong during the transaction,
        // so we want to cancel it completely, and return to
        // the state before the transaction started
        txn.rollback();
    }
}
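
The commit-or-rollback pattern can be tried out without a Sesame installation using a small stand-in. The sketch below is plain Java and only mimics the shape of the pattern: `TxnSketch` and its inner `Txn` class are hypothetical, statements are plain strings, and the assumption that pending changes are buffered until commit is made for illustration and says nothing about how Sesame implements transactions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not Sesame's Transaction class): changes are
// buffered and only become visible in the store on commit().
final class TxnSketch {
    private final List<String> store = new ArrayList<>();

    final class Txn {
        private final List<String> pending = new ArrayList<>();
        private boolean active = true;

        void add(String statement) {
            if (statement == null) throw new IllegalArgumentException("null statement");
            pending.add(statement);
        }
        void commit() { store.addAll(pending); active = false; }
        void rollback() { pending.clear(); active = false; }
        boolean isActive() { return active; }
    }

    Txn startTransaction() { return new Txn(); }
    int size() { return store.size(); }

    // Adds all statements as one group; returns false if the group failed.
    boolean addAll(String... statements) {
        Txn txn = startTransaction();
        try {
            for (String s : statements) txn.add(s);
            txn.commit();
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        } finally {
            // Still active here means commit() was never reached.
            if (txn.isActive()) txn.rollback();
        }
    }

    public static void main(String[] args) {
        TxnSketch rep = new TxnSketch();
        System.out.println(rep.addAll("a b c .", "d e f .") + " " + rep.size());
        System.out.println(rep.addAll("g h i .", null) + " " + rep.size());
    }
}
```

In the second call the valid statement is discarded together with the invalid one, so the store is left in the state it had before the transaction started.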

Context support
• Mechanism to identify groups of statements
• Each statement gets an (optional) extra context identifier (a URI)
  • Instead of triples we now have quads
• We can make additional statements about the group by using the context identifier

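How context works
[Figure: the documents foaf-jeen.rdf and foaf-chris.rdf are loaded into one Sesame Repository as the contexts context1 and context2, each context linked to its document by a source edge]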

Some Use Cases for Context
• Provenance tracking
  • allows easy updates when a source document has a new version
  • allows querying of particular sources within one repository
• Versioning and Time Tracking
  • the context identifier can be used to indicate different versions of the same information
  • we can also use it to identify which information is valid at which period in time

Default vs. Named Context
• A named context is a context with an associated context identifier.
  • Each repository can have any number of named contexts
• The default context is that part of the store that is queried when no named context is specified in the query
  • Each repository has exactly one default context
• Depending on configuration, the default context can contain
  • only the statements which have no associated named context (exclusive mode)
  • all statements, including those in all named contexts (inclusive mode)
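
The difference between the two modes can be made concrete with a small quad store. The following is a sketch in plain Java, not Sesame's context API: the class `ContextSketch`, its method names and the sample statements are assumptions, and context identifiers are plain strings rather than URIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not Sesame's context API): statements are quads
// whose fourth element is a context identifier, or null for none.
final class ContextSketch {
    private final List<String[]> quads = new ArrayList<>();
    private final boolean inclusiveDefault;

    ContextSketch(boolean inclusiveDefault) { this.inclusiveDefault = inclusiveDefault; }

    void add(String s, String p, String o, String context) {
        quads.add(new String[] {s, p, o, context});
    }

    // Number of statements visible in one named context.
    int sizeOfNamed(String context) {
        int n = 0;
        for (String[] q : quads) if (context.equals(q[3])) n++;
        return n;
    }

    // Number of statements visible when no named context is specified.
    int sizeOfDefault() {
        int n = 0;
        for (String[] q : quads) {
            // inclusive: everything; exclusive: only context-less statements
            if (inclusiveDefault || q[3] == null) n++;
        }
        return n;
    }

    // Provenance use case: drop everything that came from one source.
    void clearContext(String context) {
        quads.removeIf(q -> context.equals(q[3]));
    }

    // Sample data (invented): two sources plus one statement about a context.
    static ContextSketch example(boolean inclusive) {
        ContextSketch rep = new ContextSketch(inclusive);
        rep.add("jeen", "foaf:firstName", "Jeen", "context1");
        rep.add("chris", "foaf:firstName", "Chris", "context2");
        rep.add("context1", "source", "foaf-jeen.rdf", null);
        return rep;
    }

    public static void main(String[] args) {
        System.out.println("exclusive default: " + example(false).sizeOfDefault());
        System.out.println("inclusive default: " + example(true).sizeOfDefault());
    }
}
```

Updating a source document then amounts to calling `clearContext` for its context and adding the new version under the same identifier, while statements from other sources remain untouched.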

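Inclusive vs. Exclusive
[Figure: a Sesame Repository holding the named contexts context1 and context2, each linked by a source edge to one of the documents foaf-chris.rdf and foaf-jeen.rdf; one outline marks the default context in exclusive mode, another the default context in inclusive mode]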
