A content model/API which unifies Wikis, RDF, binaries, CDS, iMapping, WIF

To be used in NEPOMUK, WAVES, WIF and beyond. Max Völkel (FZI), with a lot of help from Mikhail Kotelnikov (Cognium Systems), Tim Romberg (FZI) and Heiko Haller (FZI). 8.3.2007

History • 12.03.07: Version 2 • Feedback from Tim Romberg • Refined Content API • 10.03.07: Version 1

Inputs to this document • Current Conceptual Data Structures (CDS) API • http://cds.xam.de • NEPOMUK Common Semantic Wiki API (CSWA) • http://www.aifb.uni-karlsruhe.de/Publikationen/showPublikation?publ_id=1449 page 45 • NEPOMUK WMO (Wiki Metadata Ontology) • http://www.aifb.uni-karlsruhe.de/Publikationen/showPublikation?publ_id=1449 page 42 • WIF API • see http://xam.de/2006/2006-12-19-SA-Andreas-Kurz.pdf (German) • WAVES content API - unfinished • Many discussions! Thanks!

Part I: Wikis

What is a wiki? • Collection of pages with one root page • Each page has a name • The content of a page has many versions • Each version has an author and a change date • Content contains • text, • structure (headlines, lists, tables), • links to external resources (images, URLs), • links to other wiki pages • RecentChanges

What is a semantic wiki? • Collection of pages with one root page • Each page has a name and a URI • The content of a page has many versions • Each version has an author and a change date • Content contains • text, • structure (headlines, lists, tables), • (semantic?) links to external resources (images, URLs), • (semantic?) links to other wiki pages, • semantic statements about the current page, • semantic statements about anything • RecentChanges

Integration of legacy wikis • Import / export a single page: WIF+RDF • Import / export a whole wiki (including versions?): WAF.zip • JWiki: no longer a core part of the content API, but a pragmatic tool to ease migration of wiki content; it talks to a legacy wiki over HTTP, exchanging XHTML and wiki syntax • Documented in http://xam.de/2006/2006-12-19-SA-Andreas-Kurz.pdf

JWiki: an API to access (semantic) wikis (formerly known as the WIF API) • Repository functions: Page[] getAllPages(), Page getRootPage(), Page[] getChangedPagesSince(long datetime), Page[] search(String fulltextQuery) • Addressing: URL getURL(String pagename) • Page functions • Content: Reader getWIF(), putWIF(Reader r); Reader getWikiSyntax(), putWikiSyntax(Reader r); Reader getHTML() • Metadata: String getAuthor(), long getLastChanged(), URI getPreviousVersion() • Semantic: Reader getRDF(), putRDF(Reader r) • Collaboration: beginEdit(), cancelEdit() • Note: this API is not the final content API. It is just a simple API that wraps web wikis. The content API can be inspired by it.
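The JWiki functions can be pinned down as Java interfaces. This is a minimal sketch under several assumptions. The interface names `JWiki` and `Page` are hypothetical, and so is the exact split of methods between them. The slide only lists method names and some types, so the `void` return types of the `put…`, `beginEdit` and `cancelEdit` methods are also assumptions. No implementation is implied.

```java
import java.io.Reader;
import java.net.URI;
import java.net.URL;

// Hypothetical interface names; method names follow the JWiki slide.

/** One (semantic) wiki page as seen through JWiki. */
interface Page {
    // Content
    Reader getWIF();
    void putWIF(Reader r);
    Reader getWikiSyntax();
    void putWikiSyntax(Reader r);
    Reader getHTML();
    // Metadata
    String getAuthor();
    long getLastChanged();
    URI getPreviousVersion();
    // Semantic
    Reader getRDF();
    void putRDF(Reader r);
    // Collaboration
    void beginEdit();
    void cancelEdit();
}

/** Repository-level functions of a wrapped web wiki. */
interface JWiki {
    Page[] getAllPages();
    Page getRootPage();
    Page[] getChangedPagesSince(long datetime);
    Page[] search(String fulltextQuery);
    URL getURL(String pagename);
}
```

A wrapper for a concrete legacy wiki would implement both interfaces by screen-scraping or by calling the wiki's HTTP endpoints.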

WIF in a nutshell (see also http://wiki.ontoworld.org/wiki/WIF) • WIF is a subset of XHTML, which makes it easier to generate wiki syntax from it: • structure: <html>, <head>, <title>, <body> • headlines: <h1>, <h2>, <h3>, <h4>, <h5>, <h6> • block elements: <p>, <pre>, <hr> • lists: <dl>, <dd>, <dt>, <ol>, <ul>, <li> • tables: <table>, <tr>, <th>, <td> • inline elements: <i>, <b>, <sub>, <sup>, <tt> • images: <img> with attribute 'src' • links: <a> with attribute 'href' • Other HTML elements may be ignored by WIF processors. • WIF elements can be augmented with XML IDs, so that individual elements can be annotated.
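To make the WIF subset concrete, here is a sketch of a checker. It parses a document with the JDK's built-in DOM parser and reports every element outside the list above. The class name `WifChecker` is hypothetical. The slide leaves open whether a processor should reject such elements or merely ignore them ("may be ignored"), so the sketch only reports them. Loading of the external XHTML DTD is switched off, so documents with a DOCTYPE can be parsed offline. Note that `<span>`, which later slides use for annotated fragments, is not on the list.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class WifChecker { // hypothetical helper, not part of any published WIF tooling
    /** Element names of the WIF subset of XHTML, as listed on the slide. */
    static final Set<String> WIF_ELEMENTS = Set.of(
        "html", "head", "title", "body",
        "h1", "h2", "h3", "h4", "h5", "h6",
        "p", "pre", "hr",
        "dl", "dd", "dt", "ol", "ul", "li",
        "table", "tr", "th", "td",
        "i", "b", "sub", "sup", "tt",
        "img", "a");

    /** Returns the (sorted) local names of all elements outside the WIF subset. */
    static Set<String> nonWifElements(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Do not fetch the external XHTML DTD referenced by a DOCTYPE.
            factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            NodeList all = doc.getElementsByTagName("*");
            Set<String> unknown = new TreeSet<>();
            for (int i = 0; i < all.getLength(); i++) {
                String name = all.item(i).getLocalName();
                if (!WIF_ELEMENTS.contains(name)) unknown.add(name);
            }
            return unknown;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed XML document", e);
        }
    }
}
```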

WIF example
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:ex="http://example.com/ontology/2007#">
<head><title>Example WIF</title></head>
<body>
<h1>Heading h1</h1>
<p>internal link: <a class="internal" href="Sandbox">Sandbox</a>
internal link: <a class="internal" href="Sandbox">See here</a>
external link: <a class="external" href="http://wikipedia.com">Wikipedia</a>
<img src="/images/b/b6/Uka.png" alt="logo"/>
The Elephant is gray and lives in <a id="livesIn">Africa</a>.</p>
</body>
</html>
(The slide marks two parts of this example as "Not required", presumably the XML declaration and the DOCTYPE.)

Part II: Beyond Wikis

Generalised Requirements • Varying content granularity: from a single word (such as a concept or wiki name) to a document (such as the content of a wiki page) • Should be able to represent semantic concept maps (iMapping) • Annotation of everything, even of parts of the content • Semantic statements about everything: concepts, resources, binaries, statements, … • Versioning of everything (binaries, content, model): resources can be versioned individually; RDF is best versioned on the model • Fulltext search • Autocomplete support for links and plain text • Client-server, to enable real-time collaboration, just like in a wiki • Synchronisation: mirror a content store constantly; commit a local repository to a shared repository, and vice versa • Import/Export from other repositories, from RDF, from web sites, from applications • Compatible with the semantic web and web architecture • Implementable

Identified orthogonal requirements • Representation on two layers • Content resources: map URIs to content (strings or binaries), which might be local or remote → need for a binary store • Semantic metadata (statements about URIs), locally managed → RDF store • Queries for named items (such as wiki names) are needed for auto-completion support • Versioning of everything (binaries, content, model) • Versioning of content: on a per-resource basis • Versioning of the semantic model: on a model basis • Fulltext search + auto-completion • Needs an index of all content resources • Queries to the lexicon are needed for fulltext auto-completion • Web API • RESTful API • Integration • Clear export format • Support for synchronisation between two repositories needed • Implementable

Building Blocks: who calls whom? To simplify, we can use RDF both as triple model and as string store. • Content Server, built from: • Triple Model: metadata (URI, URI, URI), versioning per model • String Store: URI → String (can contain XHTML), versioning per resource • Binary Store: URI → stream + mimetype, versioning per resource • Keyword Index: URI → fulltext index
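The four building blocks could be written down as Java interfaces, in the spirit of the Next Steps slide. This is a sketch under assumptions. All interface and method names are hypothetical, and versioning is left out. The keyword index implementation is a naive in-memory inverted index that maps lower-cased letter/digit tokens to URIs; it is not a production fulltext engine.

```java
import java.io.InputStream;
import java.net.URI;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// All names below are hypothetical; versioning is omitted.

/** Triple Model: metadata as (URI, URI, URI); versioned per model. */
interface TripleModel {
    void add(URI s, URI p, URI o);
    void remove(URI s, URI p, URI o);
    boolean contains(URI s, URI p, URI o);
}

/** String Store: URI -> String (may contain XHTML); versioned per resource. */
interface StringStore {
    void put(URI uri, String content);
    String get(URI uri);
}

/** Binary Store: URI -> stream + mimetype; versioned per resource. */
interface BinaryStore {
    void put(URI uri, InputStream data, String mimetype);
    InputStream get(URI uri);
    String getMimetype(URI uri);
}

/** Keyword Index: URI -> fulltext index. */
interface KeywordIndex {
    void add(URI uri, String text);
    void remove(URI uri);
    Set<URI> search(String keyword);
}

/** Naive inverted index: lower-cased letter/digit tokens -> URIs of the texts containing them. */
class InMemoryKeywordIndex implements KeywordIndex {
    private final Map<String, Set<URI>> postings = new HashMap<>();
    private final Map<URI, Set<String>> tokensOf = new HashMap<>();

    public void add(URI uri, String text) {
        remove(uri); // re-indexing a URI replaces its previous text
        Set<String> tokens = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        tokensOf.put(uri, tokens);
        for (String t : tokens) postings.computeIfAbsent(t, k -> new HashSet<>()).add(uri);
    }

    public void remove(URI uri) {
        Set<String> old = tokensOf.remove(uri);
        if (old == null) return;
        for (String t : old) {
            Set<URI> uris = postings.get(t);
            uris.remove(uri);
            if (uris.isEmpty()) postings.remove(t);
        }
    }

    public Set<URI> search(String keyword) {
        return Set.copyOf(postings.getOrDefault(keyword.toLowerCase(Locale.ROOT), Set.of()));
    }
}
```

Per the slide, the Triple Model and the String Store could be backed by the same RDF store, with strings kept as literals.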

Client Building Blocks • Client ↔ Content Server via RESTful HTTP (access rights? authentication?) • Operations offered: SPARQL queries, GET/PUT of RDF files, URI[] search( query ), (URI+String)[] complete(…), GET/POST of resources • RDF Model (triples & strings), versioning per model, with metadata layers: • Content: (:x, c:hasContent, „…String or WIF…“) • Binding: (docURL#ID, c:hasMappedSubject :elephant) • Statements: (:elephant :livesIn :africa) • App. metadata: (:x c:hasAuthor „Heiko Haller“) • Plugin data: (:x, imap:positionX „187“^^xsd:int) • Binary Store: URI → stream + mimetype, versioning per resource (connections labelled "URI+type" and "Add/remove (URI+text)") • Keyword Index: URI → fulltext index; String[] index(), add/remove( URI + text )
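Auto-completion (`complete(…)`, returning URI+String pairs) can be served from a sorted lexicon of the names of NamedItems. This sketch assumes a case-insensitive prefix match over an in-memory `TreeMap`. The class `NameLexicon`, the record `Completion` and the `limit` parameter are all hypothetical. Several items may share a name, because some ambiguity in naming has to be tolerated.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** A (URI, name) pair as returned by a completion request (hypothetical type). */
record Completion(URI uri, String name) {}

class NameLexicon { // hypothetical helper
    // lower-cased name -> all entries carrying that name (names may be ambiguous)
    private final NavigableMap<String, List<Completion>> byName = new TreeMap<>();

    void add(URI uri, String name) {
        byName.computeIfAbsent(name.toLowerCase(Locale.ROOT), k -> new ArrayList<>())
              .add(new Completion(uri, name));
    }

    /** Up to `limit` entries whose name starts with `prefix` (case-insensitive), in name order. */
    List<Completion> complete(String prefix, int limit) {
        String p = prefix.toLowerCase(Locale.ROOT);
        List<Completion> out = new ArrayList<>();
        // All keys k with p <= k < p + '\uffff' share the prefix p.
        for (Map.Entry<String, List<Completion>> e :
                byName.subMap(p, true, p + Character.MAX_VALUE, false).entrySet()) {
            for (Completion c : e.getValue()) {
                if (out.size() == limit) return out;
                out.add(c);
            }
        }
        return out;
    }
}
```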

A closer look at the metadata layers • Content: gives all content snippets a URI • (:x, c:hasContent, „…String or WIF…“) • Binding: annotates XHTML elements • (docURL#ID, c:hasMappedSubject :elephant) • Statements: pure semantic RDF data • (:elephant :livesIn :africa) • Application metadata: needed for versioning etc. • (:x c:hasAuthor „Heiko Haller“) • Plugin data: allows plug-ins to store their own metadata about the content, too; may use any RDF. • (:x, imap:positionX „187“^^xsd:int) • Metadata about binaries: just an index

Normalised RDF for content representation • Classic RDF for any data: <x> foaf:firstName „Max“ ; foaf:lastName „Völkel“ ; foaf:phone „123“ ; foaf:phone „456“ . • Problem: one cannot annotate the literals • Idea: separation of concerns; separate linking from content storage • Linking, in RDF: <x> foaf:firstName <1> ; foaf:lastName <2> ; foaf:phone <3> ; foaf:phone <4> . • Content, which could be stored elsewhere: <1> rdf:value „Max“ . <2> rdf:value „Völkel“ . <3> rdf:value „123“ . <4> rdf:value „456“ . • Idea: give each content literal a URI; this makes versioning and annotation easier
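The normalisation step can be sketched as a transformation over triples. Every literal object is replaced by a freshly minted URI, and the literal moves into an `rdf:value` triple on that URI. The `Stmt` record, the class name and the counter-based URI minting (`base + n`) are assumptions made for illustration. A real store would need to mint stable, globally unique URIs.

```java
import java.util.ArrayList;
import java.util.List;

/** A triple whose object is either a URI or a literal string (hypothetical type). */
record Stmt(String subject, String predicate, String object, boolean literalObject) {}

class ContentNormaliser { // hypothetical helper
    static final String RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value";

    /**
     * Separates linking from content: each literal object gets its own URI (base + counter).
     * Returns all linking triples first, then all content (rdf:value) triples.
     */
    static List<Stmt> normalise(List<Stmt> in, String base) {
        List<Stmt> linking = new ArrayList<>();
        List<Stmt> content = new ArrayList<>();
        int next = 1;
        for (Stmt s : in) {
            if (!s.literalObject()) { // URI objects are already pure links
                linking.add(s);
                continue;
            }
            String node = base + next++;
            linking.add(new Stmt(s.subject(), s.predicate(), node, false));
            content.add(new Stmt(node, RDF_VALUE, s.object(), true));
        }
        List<Stmt> out = new ArrayList<>(linking);
        out.addAll(content);
        return out;
    }
}
```

Once the literal has its own URI, it can carry annotations and version links like any other resource.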

Linking of content snippets is at the application layer • For wikis: :x a wiki:Name . :x wiki:hasPageContent :y . :y a wiki:Content . • Constraints: • A wiki:Name can have zero or one wiki:Content • Links only between wiki:Name • Versioning on WikiPageContent nodes • For iMapping/CDS: :x cds:hasDetail :z . :x cds:hasBefore :u . • Constraints: • All relations have an inverse • …
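Application-level constraints like these can be checked directly over the triples. This sketch covers the first wiki constraint: at most one wiki:Content per wiki:Name. It assumes that triples are plain strings with prefixed names, that Turtle's `a` is written out as `rdf:type`, and that the link is `wiki:hasPageContent`, as in the example. `WikiConstraints` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

record Triple(String s, String p, String o) {}

class WikiConstraints { // hypothetical helper
    /** Subjects typed wiki:Name that have more than one wiki:hasPageContent. */
    static Set<String> namesWithTooManyContents(List<Triple> model) {
        Map<String, Integer> contents = new HashMap<>();
        for (Triple t : model) {
            if (t.p().equals("wiki:hasPageContent")) contents.merge(t.s(), 1, Integer::sum);
        }
        Set<String> violations = new TreeSet<>();
        for (Triple t : model) {
            boolean isName = t.p().equals("rdf:type") && t.o().equals("wiki:Name");
            if (isName && contents.getOrDefault(t.s(), 0) > 1) violations.add(t.s());
        }
        return violations;
    }
}
```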

Storing a wiki page in RDF (XHTML and RDF) • :x a c:NamedItem . :x c:hasContent „Elephant“ . :x c:hasDetail :y . :y c:hasContent „<html><head>…</head><body>… This grey animal lives in <span id="123">Africa</span>…</body></html>“ . • Statements (pure semantic RDF data): :x :livesIn :africa . :africa :hasPart :SouthAfrica . • String content could also be in a separate store, but it's convenient to implement it like this first. • Named items can be treated as concepts, even if their name is versioned.

XHTML and RDF (diagram with three parts: Statements, Mapping (?), Strings) • Statements: :x :livesIn :africa . :africa :hasPart :SouthAfrica . • Strings: :x c:hasContent „Elephant“ . :y c:hasContent „<html><head>…</head><body>… This grey animal lives in <span id="123">Africa</span>… </body></html>“ . • The XML ID implicitly defines: :y#123 c:hasContent „Africa“ . :y c:contains :y#123 . (URI = base URI + '#' + XML ID) • These triples can be copied to RDF, to enable queries for them.
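Deriving the implicit triples is mechanical. Parse the XHTML content of :y, and for every element with an `id`, emit the fragment's text content and its containment in :y. This is a sketch using the JDK DOM parser. The `Triple` record (plain strings, with the literal kept as its lexical form) and the class name are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

record Triple(String subject, String predicate, String object) {}

class ImplicitTriples { // hypothetical helper
    /** For each element with an id: <base#id> c:hasContent "text" and <base> c:contains <base#id>. */
    static List<Triple> derive(String baseUri, String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            NodeList all = doc.getElementsByTagName("*"); // document order
            List<Triple> out = new ArrayList<>();
            for (int i = 0; i < all.getLength(); i++) {
                Element e = (Element) all.item(i);
                String id = e.getAttribute("id"); // "" when the attribute is absent
                if (id.isEmpty()) continue;
                String fragment = baseUri + "#" + id; // URI = base URI + '#' + XML ID
                out.add(new Triple(fragment, "c:hasContent", e.getTextContent()));
                out.add(new Triple(baseUri, "c:contains", fragment));
            }
            return out;
        } catch (Exception ex) {
            throw new IllegalArgumentException("content is not well-formed XML", ex);
        }
    }
}
```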

B: Binding (or Mapping) links XHTML elements to RDF URIs • Use cases: • Retain which semantic statements are derived from which DOM nodes, so we can update the model when a DOM node is deleted • Update rules / embedded queries, so we can present up-to-date data in the document • Since DOM nodes can have an ID, you can annotate everything you want • Example 1: • Record that (:x :livesIn :africa) was derived from element #123 in the content of :y: • :y#123 b:hasMappedSubject :x . :y#123 b:hasMappedPredicate :livesIn . :y#123 b:hasMappedObject :africa . • Example 2: • Update the value of #123 from the triple (:x, :livesIn, *) • :y#123 b:hasQuerySubject :x . :y#123 b:hasQueryPredicate :livesIn .
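This is a sketch of use case 1, assuming an in-memory list of string triples. When an element is deleted from the content, the statement recorded in its binding triples is retracted, together with all triples about the element itself. `Bindings`, `onElementDeleted` and `objectOf` are hypothetical names. `objectOf` also performs the lookup that example 2 needs, namely finding the object of `(:x, :livesIn, *)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

record Triple(String s, String p, String o) {}

class Bindings { // hypothetical helper
    /** First object of the pattern (s, p, *), if any. */
    static Optional<String> objectOf(List<Triple> model, String s, String p) {
        return model.stream()
            .filter(t -> t.s().equals(s) && t.p().equals(p))
            .map(Triple::o)
            .findFirst();
    }

    /**
     * Called after the DOM element `element` (e.g. ":y#123") was deleted. Drops the statement
     * recorded by its b:hasMapped* bindings and every triple whose subject is the element.
     */
    static List<Triple> onElementDeleted(List<Triple> model, String element) {
        Optional<String> s = objectOf(model, element, "b:hasMappedSubject");
        Optional<String> p = objectOf(model, element, "b:hasMappedPredicate");
        Optional<String> o = objectOf(model, element, "b:hasMappedObject");
        Optional<Triple> derived = s.isPresent() && p.isPresent() && o.isPresent()
            ? Optional.of(new Triple(s.get(), p.get(), o.get()))
            : Optional.empty();
        List<Triple> out = new ArrayList<>();
        for (Triple t : model) {
            if (t.s().equals(element)) continue; // bindings and other data about the element
            if (derived.isPresent() && derived.get().equals(t)) continue; // the derived statement
            out.add(t);
        }
        return out;
    }
}
```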

Data Model Layer Cake (implementation ideas): upper layers build on lower layers • Ontologies • Basic data model (like StoRDF): virtually all data is in RDF • Content: URI → content + mimetype • String • with XHTML markup • with XHTML markup and ID attributes defined • just a plain string • Binary • Model: (URI, URI, URI) • Index: • the RDF store indexes semantic statements hidden in content strings • the keyword index indexes all strings + all strings hidden in binaries • Persistence: RDF for triples and strings + a binary store

Ontology Layer Cake • iMapping Ontology „imp:“ • imp:hasPosX, imp:hasPosY, imp:isExpanded, … • CDS Ontology „cds:“ • cds:hasBefore, cds:hasAfter, cds:hasDetail, cds:hasContext, cds:hasAnnotation, cds:hasType, cds:hasTag, cds:hasLinkTarget, cds:hasLinkSource • Constraint: relations must have an inverse • Wiki Ontology „wiki:“ • wiki:hasPageContent: assigns a named snippet to an unnamed snippet; together, they are a wiki page. • Content Management Ontology „c:“ • Named snippets vs. unnamed snippets • c:NamedItem, c:Item • Snippets have an author, versions and a change date • c:hasAuthor, c:hasPreviousVersion, c:hasChangeDate, … • Binary resources are annotated with • c:size, c:mimetype • Editing status • c:currentlyEditedBy, c:exists, …
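The inverse constraint can be enforced by a closure step, or checked as a validation. The inverse pairs in this sketch (hasBefore/hasAfter, hasDetail/hasContext, hasLinkTarget/hasLinkSource) are an assumption read off the relation names. The slide does not say which relation inverts which, and it names no inverses for cds:hasAnnotation, cds:hasType and cds:hasTag, so those three are not handled. `CdsInverses` is a hypothetical name.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

record Triple(String s, String p, String o) {}

class CdsInverses { // hypothetical helper
    // ASSUMED inverse pairs, read off the relation names; not defined on the slide.
    static final Map<String, String> INVERSE = Map.of(
        "cds:hasBefore", "cds:hasAfter",
        "cds:hasAfter", "cds:hasBefore",
        "cds:hasDetail", "cds:hasContext",
        "cds:hasContext", "cds:hasDetail",
        "cds:hasLinkTarget", "cds:hasLinkSource",
        "cds:hasLinkSource", "cds:hasLinkTarget");

    /** The model plus the inverse of each CDS triple; input order kept, no duplicates. */
    static Set<Triple> withInverses(List<Triple> model) {
        Set<Triple> out = new LinkedHashSet<>(model);
        for (Triple t : model) {
            String inv = INVERSE.get(t.p());
            if (inv != null) out.add(new Triple(t.o(), inv, t.s()));
        }
        return out;
    }

    /** CDS triples whose inverse triple is missing, i.e. violations of the constraint. */
    static List<Triple> missingInverses(List<Triple> model) {
        Set<Triple> present = new HashSet<>(model);
        return model.stream()
            .filter(t -> INVERSE.containsKey(t.p()))
            .filter(t -> !present.contains(new Triple(t.o(), INVERSE.get(t.p()), t.s())))
            .toList();
    }
}
```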

Part III: Getting technical

Multiuser API: can be used locally or remotely • Start edit <x> • Fetch content (WIF or binary) of <x> • Fetch RDF of <x> • User editing… • Fetch fresh metadata • Update current content • Commit new content + new RDF • Architecture (diagram): Client: user interface → Content API → content web client; client and server talk over HTTP (WIF, RDF, binaries); Server: content web server → Content API → content server implementation
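The edit sequence above relies on per-item locking. Here is a minimal sketch of the lock bookkeeping behind `beginEdit()`/`cancelEdit()` and the final commit. It assumes one lock per item URI, held by one user at a time. `EditLocks` and its `commit` method are hypothetical. The current lock holder is the information that c:currentlyEditedBy from the content management ontology could expose.

```java
import java.util.HashMap;
import java.util.Map;

/** Per-item edit locks (hypothetical helper): item URI -> user currently editing it. */
class EditLocks {
    private final Map<String, String> lockedBy = new HashMap<>();

    /** Starts an edit; returns false if another user is already editing the item. */
    synchronized boolean beginEdit(String itemUri, String user) {
        String holder = lockedBy.putIfAbsent(itemUri, user);
        return holder == null || holder.equals(user);
    }

    /** Releases the lock, but only if `user` holds it. */
    synchronized void cancelEdit(String itemUri, String user) {
        lockedBy.remove(itemUri, user);
    }

    /** Accepts a commit only from the lock holder, then releases the lock. */
    synchronized boolean commit(String itemUri, String user) {
        if (!user.equals(lockedBy.get(itemUri))) return false;
        // A real server would store the new content and the new RDF at this point.
        lockedBy.remove(itemUri);
        return true;
    }

    synchronized String currentlyEditedBy(String itemUri) {
        return lockedBy.get(itemUri);
    }
}
```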

Content and Naming • Requirements • Everything must be addressable • can be annotated → expressivity • easier versioning • easier data exchange • Addressing for machines: unique IDs needed • must be 100% unique • used for semantic queries • Addressing for humans: memorable names needed • some ambiguity in naming has to be tolerated • used for linking items • needs auto-completion support • Solution: we distinguish two kinds of items, both having a URI: • NamedItem: the content of a NamedItem has naming characteristics, i.e. when linking to somewhere, one links to a name. A NamedItem also represents all kinds of concepts. The content of a named item should not contain markup and may not be binary content. • UnnamedItem: represents plain content. • Semantic links state the relations between NamedItems and UnnamedItems.

What is a semantic content repository? • A repository contains a collection of items … • One root item • Each item has content and a URI • Content can be text, text with mark-up, or any other mime type • … and semantic links between items • Versioning on two layers • Each item's content is versioned (for all kinds of resources) • The semantic model is versioned as a whole • On both layers: each version has an author & change date • Locking on a per-item basis • Recent Changes • Queries on the semantic model and queries on item content

Core concepts • Repository • Model (of semantic statements) • URI • Content • Item • NamedItem • UnnamedItem • Addressable Triple • Mime-type • User • Version • Change date

Data Model (diagram; the original shows cardinalities 1:1 and 1:*) • Repository: URI; 1 root item; 1 current version • ModelVersion: URI, change date, user; contains Addressable Triples • Addressable Triple: URI; (URI, URI, URI) • Item: URI; 0..1 current version (no current version: the item does not exist) • ItemVersion: URI, change date, user; has Content • Content: any binary or character stream; mime-type

Simplified Data Model, no versioning (diagram; the original shows cardinalities 1:1 and 1:*) • Repository: URI; 1 root item • Item: URI; change date, user; 0..1 content • Addressable Triple: URI; (URI, URI, URI) • Content: any binary or character stream; mime-type

The Content API • Repository functions • Content: Item createItem(URI), Item createNamedItem(URI), Item getItem(URI), Item[] getAllItems(), Item getRootItem(), Item[] getChangedItemsSince(long datetime), Item[] query( String query ) (all kinds of queries …) • Addressing: URI getURI( String contentOfANamedItem ) • Item functions • Content: InputStream getContent(), putContent(InputStream in) • Metadata: URI, Mimetype getMimetype(), user, change date • Collaboration: beginEdit(), cancelEdit()
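Here is a sketch of part of the Content API in Java, with an in-memory repository following the simplified (unversioned) data model. It rests on several assumptions:
- `ContentRepository` and `InMemoryRepository` are hypothetical names.
- `putContent` takes the mimetype, user and change date as extra arguments, whereas the slide only shows `putContent(InputStream in)`.
- `getURI` returns the first named item whose content equals the name, or null if there is none.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** An item of the simplified (unversioned) data model. */
class Item {
    final URI uri;
    final boolean named; // NamedItem vs. UnnamedItem
    private byte[] content = new byte[0];
    private String mimetype = "text/plain";
    private String user;
    private long changeDate;

    Item(URI uri, boolean named) { this.uri = uri; this.named = named; }

    InputStream getContent() { return new ByteArrayInputStream(content); }

    // ASSUMPTION: metadata is passed along with the content.
    void putContent(InputStream in, String mimetype, String user, long changeDate) {
        try {
            this.content = in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.mimetype = mimetype;
        this.user = user;
        this.changeDate = changeDate;
    }

    String getMimetype() { return mimetype; }
    String getUser() { return user; }
    long getChangeDate() { return changeDate; }
    String contentAsString() { return new String(content, StandardCharsets.UTF_8); }
}

/** Repository functions of the Content API (subset; hypothetical Java rendering). */
interface ContentRepository {
    Item createItem(URI uri);
    Item createNamedItem(URI uri);
    Item getItem(URI uri);
    Item[] getChangedItemsSince(long datetime);
    URI getURI(String contentOfANamedItem);
}

class InMemoryRepository implements ContentRepository {
    private final Map<URI, Item> items = new LinkedHashMap<>();

    public Item createItem(URI uri) { return items.computeIfAbsent(uri, u -> new Item(u, false)); }
    public Item createNamedItem(URI uri) { return items.computeIfAbsent(uri, u -> new Item(u, true)); }
    public Item getItem(URI uri) { return items.get(uri); }

    public Item[] getChangedItemsSince(long datetime) {
        return items.values().stream().filter(i -> i.getChangeDate() > datetime).toArray(Item[]::new);
    }

    /** URI of the first NamedItem whose content equals the name, or null; names may be ambiguous. */
    public URI getURI(String contentOfANamedItem) {
        for (Item i : items.values()) {
            if (i.named && i.contentAsString().equals(contentOfANamedItem)) return i.uri;
        }
        return null;
    }
}
```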

Webifying the API • Methods return RDF/XML. For each item we always return: NamedItem or UnnamedItem, mimetype, author, changedAt and, for short items, the content. • GET /items/ : all items • GET /items/root : the root item • GET /items?changedSince=… • GET /items?query=… : all kinds of queries … • GET /items?name=… : returns all items with that name • Item functions • GET /items?uri=… : returns the content • PUT /items?uri=… : stores the body of the PUT request; POST is allowed with the same semantics • GET /items/meta?uri=… : returns RDF/XML with the item data • POST /items?uri=…&edit=start • POST /items?uri=…&edit=cancel
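On the server side, this web API boils down to mapping the HTTP method, path and query parameters to Content API calls. The sketch below does that mapping as a pure function, reading the slide's item update as `PUT /items?uri=…`. The returned operation labels are hypothetical; some, such as `getItemsByName` and `getMetadata`, do not appear on the slide. They only name the call a real server would make.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class ItemRoutes { // hypothetical helper
    /** Parses a raw query such as "uri=a&edit=start" into URL-decoded key/value pairs. */
    static Map<String, String> params(String rawQuery) {
        Map<String, String> m = new HashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) return m;
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String k = eq < 0 ? pair : pair.substring(0, eq);
            String v = eq < 0 ? "" : pair.substring(eq + 1);
            m.put(URLDecoder.decode(k, StandardCharsets.UTF_8), URLDecoder.decode(v, StandardCharsets.UTF_8));
        }
        return m;
    }

    /** Maps an HTTP request to a label for the Content API operation it stands for. */
    static String route(String method, String requestUri) {
        URI u = URI.create(requestUri);
        String path = u.getPath();
        Map<String, String> q = params(u.getRawQuery());
        String uri = q.get("uri"); // null when the request has no uri parameter
        if (method.equals("GET")) {
            if (path.equals("/items/")) return "getAllItems";
            if (path.equals("/items/root")) return "getRootItem";
            if (path.equals("/items/meta") && uri != null) return "getMetadata(" + uri + ")";
            if (path.equals("/items")) {
                if (q.containsKey("changedSince")) return "getChangedItemsSince(" + q.get("changedSince") + ")";
                if (q.containsKey("query")) return "query(" + q.get("query") + ")";
                if (q.containsKey("name")) return "getItemsByName(" + q.get("name") + ")";
                if (uri != null) return "getContent(" + uri + ")";
            }
        } else if (path.equals("/items") && uri != null) {
            String edit = q.get("edit");
            if (method.equals("POST") && "start".equals(edit)) return "beginEdit(" + uri + ")";
            if (method.equals("POST") && "cancel".equals(edit)) return "cancelEdit(" + uri + ")";
            if (method.equals("PUT") || method.equals("POST")) return "putContent(" + uri + ")";
        }
        return "unsupported";
    }
}
```

Item URIs passed as the `uri` parameter should be percent-encoded by clients; the router decodes them before dispatch.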

Next Steps • Define the different APIs as Java interfaces: • A content API facade • Components: • Binary store • Keyword index • RDF store • A WIF util class

APPENDICES

Requirements from different sides

WAVES • SWE-specific functions: SW project cockpit, meeting minutes, requirements management (Polarion) • Further general functions: forming (free) views, offering knowledge automatically depending on context, desktop integration (e-mails / Office / browser), synchronisation (offline, sharing), personal knowledge store, restructuring / revising • Basic functions: rights management, organisational model (roles, groups, ..), search, articulating, interlinking / referencing • Legend: presupposes, essential, important, future

Existing approaches for linking XHTML and RDF • RDFa: http://rdfa.info • eRDF: http://research.talis.com/2005/erdf/wiki/Main/RdfInHtml • Comparison: http://www.bnode.org/blog/2007/02/12/comparison-of-microformats-erdf-and-rdfa

A WIF logo

Finding a good name • Semantic Content Repository • Score • Secore: 40,000 hits • Semantic Web Content Repository • Swecor: 28 hits • Swcore: 2,000 hits • Swecr: 10 hits • SWCR: 20,000 hits

Todos • Align with • JCR • WebDAV • SVN

OLD slides kept here as a backup

What you can GET and PUT (WIF, WMO RDF, XHTML) • WIF file: title (for offline browsing); content: text, structure, semantic annotations • RDF file (in fact a clever query result): wiki page metadata (author, changeDate, title, previousVersion, mimeType); semantic annotations extracted from WIF, which enables queries • A global RDF model: remote queries; contains all page RDFs + global statements

First Try: conceptual model (diagram). Layers: conceptual model, serialisation/standard, Java API, user interface, communication. Labels in the diagram: Collaboration API over HTTP between wiki applications and semantic wikis; Exchange API (content: 1 RDF model + n WIF); CDS API with the iMapping knowledge model and a Swing client; WAF API (WAF = n WIF pages, the complete semantic wiki content); semantic wiki syntax; WOM + RDF; WIF using WMO; WEM; semantic wiki page DOM; XHTML + RDFa (semantically annotated document, RDFa API); XML (structured document). Legend: "=" marks existing parts.

Two Content Layers • Content blocks (aka wiki pages or CDS items) • essential subset of XHTML • RDFa annotations for semantic annotations of parts of the content • RDF model • Content metadata, such as (Elephant, authoredBy, User:Max) • Only in the RDF model • Semantic statements, such as (Elephant, livesIn, Africa) • Redundant with the RDFa in the content. The RDF data is authoritative. • Diagram: nodes Elephant, Africa, User:Max, User:Mikhail, „Gray“, „2007-03-08“^^xsd:date and a binary (001011010110…); edges livesIn, hasColor, hasAuthor, changeDate and three hasContent links; example content: <a href="Africa" rel="ex:livesIn">Africa</a>

Content Server API • InputStream getContent( uri ) • Returns a binary stream or WIF • RDF query( query ) • Query can be: a URI, a triple pattern or a SPARQL query • Result can be: RDF/XML, Turtle, N-Triples, JSON, …
