Università Politecnica delle Marche

DBin: an all-round Semantic Web platform for user communities
Giovanni Tummarello, Ph.D.
SEMEDIA Semantic Web and Multimedia, Università Politecnica delle Marche, Ancona, Italy
http://semedia.deit.univpm.it

TOWARD A SCALABLE MULTIMEDIA METADATA INFRASTRUCTURE USING DISTRIBUTED COMPUTING AND SEMANTIC WEB TECHNOLOGIES
Patrizia Asirelli (1), Maria Grazia Di Bono (1), Massimo Martinelli (1), Ovidio Salvetti (1), Oreste Signore (1)
(1) Institute of Information Science and Technologies (ISTI), Italian National Research Council (CNR), via Moruzzi 1, 56124 Pisa, Italy
Patrizia.Asirelli@isti.cnr.it, Maria.Grazia.DiBono@isti.cnr.it, Massimo.Martinelli@isti.cnr.it, Ovidio.Salvetti@isti.cnr.it, Oreste.Signore@isti.cnr.it
Michele Catasta (2), Christian Morbidoni (2), Francesco Piazza (2), Giovanni Tummarello (2)
(2) SeMedia, Università Politecnica delle Marche, Ancona, Italy, http://semedia.deit.univpm.it
mcatasta@acm.org, c.morbidoni@deit.univpm.it, f.piazza@univpm.it, g.tummarello@gmail.com

“Accessing” the Semantic Web
The direct approach:
• “The Semantic Web consists of many RDF graphs nameable by URIs.” (Carroll, Bizer, Hayes, Stickler; ISWC 2004, WWW 2005, etc.)
• Perfectly supported by SPARQL
• Need an ink cartridge? So easy to ask HP.com
• Need a data cable? So easy to ask Nokia.com

The Semantic Web
What if I were interested in “data cables that work with my Nokia 1234” (no matter who produces them)? Or “all about Peroni beer” (reviews, comments, places where I can buy it with prices, pictures of its glass, its brewery)?
Inverse approach:
• The Semantic Web consists of many concepts (URIs) which are annotated at a global scale

Scalability issues
Direct: accessing Nokia.com/data.rdf
• I know exactly whom to ask
• Network traffic = the size of the document
• Computational complexity: negligible
Inverse: “something about” my Nokia 1234
• Many parties will have something to say
• Find them / distribute the query / collection traffic
• Impose the query-answering burden on them
• How to join local data?

A P2P / Personal Semantic Space approach
File sharing, P2P “philosophy”:
• Downloads a lot; too much is not a problem
• Shares what it has downloaded
• Uncommitted, no guarantees, join and leave at will
But for the Semantic Web:
• Exchanges, downloads and serves “RDF” rather than “files”
• Searches by user interests rather than file names (“Sicilian cuisine”, “Scottish pubs”)
• Remembers (almost) everything → grows a local triplestore

SW P2P à la Napster, scenario and possibilities (2)
Storing a lot of metadata locally... why?
1) Why not? Disk space is very cheap (and it's just metadata)
2) Key enabler of global scalability → “use the Semantic Web” without direct network traffic or external computational burden
3) Maximally fast and interactive (high-speed local queries)
4) Puts your local CPUs to work → much more powerful than what a server can give you for free; allows sophisticated information processing (reasoning, filters)
→ It's your computer, “your” data
→ Personalized algorithms for rating and trust
→ Relate it to your local resources (SW desktop integration)

One size (P2P model) doesn't fit all
Several SW P2P approaches have been proposed:
• Centralized + crawlers/feeds
• Distributed queries (Edutella et al.)
• Distributed RDF storage (RDFPeers)
These address different scenarios, not the one studied here; see the RDFGrowth paper.

“RDFGrowth” - Design essentials
In this scenario we don't expect others to:
• Execute external arbitrary graph queries
• Perform an active “information hunt” for us
  • No replicating queries, no query forwarding or routing. In general, no operations that induce a non-constant burden
• Provide a service other than in a purely “best effort” fashion
  • No uptime guarantees, no service guarantees

RDFGrowth Groups
Based on a shared definition of “interesting URIs” via a local semantic query.
Example, Beer & Breweries group:
  Select x where {x} <rdf:type> {<beer:Beer>}
  Select x where {x} <rdf:type> {<beer:Brewery>}
Those who join will execute the query and share information about the resulting URIs.
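A minimal Python sketch of how a peer might evaluate such a group definition against its own data. The in-memory triple list, the `ex:` URIs and the helper name `interesting_uris` are illustrative assumptions standing in for a real query engine; they are not DBin's API.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as plain strings


def interesting_uris(store: Iterable[Triple], group_classes: Set[str]) -> Set[str]:
    """Return every subject typed as one of the classes the group cares about."""
    return {s for (s, p, o) in store if p == "rdf:type" and o in group_classes}


# Hypothetical local triplestore content.
local_store = [
    ("ex:Peroni", "rdf:type", "beer:Beer"),
    ("ex:PeroniBrewery", "rdf:type", "beer:Brewery"),
    ("ex:Peroni", "beer:brewedBy", "ex:PeroniBrewery"),
    ("ex:Nokia1234", "rdf:type", "ex:Phone"),
]

# The group definition: the two classes named in the example queries.
beer_group = {"beer:Beer", "beer:Brewery"}

# URIs this peer would exchange information about after joining the group.
shared = interesting_uris(local_store, beer_group)
```

The point of the design is that the query runs only locally: joining a group costs other peers nothing beyond answering requests about the resulting URIs.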

Information “Surrounding” a URI: RDF “Neighbours”
MSG (Minimum Self-contained Graph) of a statement (approx. def.): the “blank node closure” of the statement.
RDFN (def.): the RDFN of a resource is the graph composed of all the MSGs involving the resource itself.
Similar to the Concise Bounded Resource Description (CBRD) given in [URIQA], but it differs mainly in its use of “involves”.
RDFN(URI) is the only remote query allowed in RDFGrowth.
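The two definitions can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the DBin implementation: triples are plain string tuples, blank nodes are recognised by a `_:` prefix, and “involving the resource” is read as “the resource is the subject or the object”.

```python
from typing import FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as plain strings


def blank_nodes(t: Triple) -> Set[str]:
    """Blank nodes of a triple; the '_:' prefix is an assumed convention."""
    return {term for term in (t[0], t[2]) if term.startswith("_:")}


def msg(statement: Triple, graph: Set[Triple]) -> FrozenSet[Triple]:
    """Blank node closure: follow shared blank nodes until none are left."""
    closure = {statement}
    frontier = blank_nodes(statement)
    visited: Set[str] = set()
    while frontier:
        node = frontier.pop()
        visited.add(node)
        for t in graph:
            if node in blank_nodes(t):
                closure.add(t)
                frontier |= blank_nodes(t) - visited
    return frozenset(closure)


def rdfn(uri: str, graph: Set[Triple]) -> Set[Triple]:
    """Union of the MSGs of all statements with `uri` as subject or object."""
    neighbourhood: Set[Triple] = set()
    for t in graph:
        if uri in (t[0], t[2]):
            neighbourhood |= msg(t, graph)
    return neighbourhood


# Hypothetical data: a review of Peroni modelled with a blank node.
graph = {
    ("ex:Peroni", "rdf:type", "beer:Beer"),
    ("ex:Peroni", "ex:review", "_:r1"),
    ("_:r1", "ex:rating", "4"),
    ("_:r1", "ex:author", "ex:Anna"),
    ("ex:Anna", "ex:name", "Anna"),
}
peroni_rdfn = rdfn("ex:Peroni", graph)
```

Here the RDFN of `ex:Peroni` pulls in the whole review, because all three review statements share the blank node `_:r1`, but it stops at `ex:Anna`: her name is a ground statement that does not involve `ex:Peroni`.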

Locating “News”: RDFN Hash Set
RHS(URI) = Hashes(canonicalize(RDFN(URI)))
• Concise values exposed to the network to represent the knowledge a peer has about a URI
• Peers looking for information about a URI use the published RHS to select whom to talk to (i.e. the most “interesting” peer)
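A hedged Python sketch of the idea. Several details are assumptions made for illustration only: one hash is taken per MSG, SHA-1 is the hash function, “most interesting” means “publishes the most hashes I do not have yet”, and `canonicalize` is a naive stand-in (blank node labels replaced by a placeholder, lines sorted) rather than a real RDF canonicalization algorithm.

```python
import hashlib
from typing import Dict, FrozenSet, Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def canonicalize(msg: FrozenSet[Triple]) -> str:
    """Naive stand-in: hide blank node labels, then sort the statement lines."""
    def norm(term: str) -> str:
        return "_:b" if term.startswith("_:") else term
    return "\n".join(sorted(" ".join(norm(x) for x in t) + " ." for t in msg))


def rhs(msgs: Iterable[FrozenSet[Triple]]) -> Set[str]:
    """RDFN Hash Set: one digest per MSG of the RDFN (assumed granularity)."""
    return {hashlib.sha1(canonicalize(m).encode("utf-8")).hexdigest() for m in msgs}


def most_interesting_peer(mine: Set[str], published: Dict[str, Set[str]]) -> Optional[str]:
    """Pick the peer publishing the most hashes unknown to us, if any."""
    if not published:
        return None
    best = max(published, key=lambda peer: len(published[peer] - mine))
    return best if published[best] - mine else None


# Hypothetical MSGs about ex:Peroni.
msg_a = frozenset({("ex:Peroni", "rdf:type", "beer:Beer")})
msg_b = frozenset({("ex:Peroni", "ex:review", "_:r1"), ("_:r1", "ex:rating", "4")})

mine = rhs([msg_a])
published = {"peer1": rhs([msg_a]), "peer2": rhs([msg_a, msg_b])}
best = most_interesting_peer(mine, published)
```

Only the digests travel over the network, so a peer can tell that another peer knows something new about a URI without either side shipping the RDF itself.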

Simulations, no KEL delay

With KEL publishing delay

Using “epidemic news propagation”

DBin: everything else around RDFGrowth

A complete Semantic Web application today means…
→ A lot of pragmatic decisions
• Deliverable integration platform
• Domain application/GUI
• Trust policies tools
• Data flow pipeline
• RDF signing methodologies
• Ontology Import Policies
• RDF P2P transport layer
• URL Data handling (Up/Down)
• URI Minting
• RDF Storage

URL Data Handling and URI Minting

The P2P infrastructure will deal with RDF, but:
• People want to access pictures, MP3s, files, not just see URIs → URL resolving/downloading
• Automated uploading is also needed!

RDF Storage

RDF/S Storage
• Many choices!
• We chose Sesame (SeRQL was schema-aware long ago)
• Thanks, Sesame guys! New features are being added (see trust filtering, pipelining)

RDF P2P Transport Layer

Ontology Import Policies

Ontology importing needs care
• Ontologies have an active role (see Sesame forward inference)
• Policies to control import and export are needed
• Our approach: DBin will “suggest” the import of ontologies when they are discovered, but the process is never fully automated and can be reversed

RDF Signing Methodologies

Authorship@model level: RDFTrustToolkit
Being certain about who said what
We want:
• Small granularity, as information will flow bit by bit
• Signatures INSIDE the RDF, so they flow along with the data and are kept within the triplestore
Tools:
• RDF canonical serialization (J. Carroll)
• MSG theory → reify a single triple / sign it all!

The MSG theory comes in handy
From the RDF blank node semantics:
→ An MSG is also the minimum unit that can be sent across a P2P network so that, once merged, the original graph is restored.
From the MSG definition:
• If s and t are distinct statements and t belongs to MSG(s), then MSG(t) = MSG(s).
• Each statement belongs to one and only one MSG.
• A graph can be uniquely decomposed into MSGs.
→ The signature can be attached to a single, arbitrary triple in an MSG!
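The decomposition property can be illustrated with a short Python sketch. It assumes triples are plain string tuples and that blank nodes carry a `_:` prefix; the function name `decompose` and the sample data are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Triple = Tuple[str, str, str]


def blank_nodes(t: Triple) -> Set[str]:
    """Blank nodes of a triple; the '_:' prefix is an assumed convention."""
    return {term for term in (t[0], t[2]) if term.startswith("_:")}


def decompose(graph: Set[Triple]) -> List[FrozenSet[Triple]]:
    """Split a graph into MSGs: groups of statements linked by blank nodes."""
    remaining = set(graph)
    parts: List[FrozenSet[Triple]] = []
    while remaining:
        seed = remaining.pop()
        part = {seed}
        frontier = blank_nodes(seed)
        while frontier:
            node = frontier.pop()
            # Every statement still unassigned that mentions this blank node
            # belongs to the same MSG as the seed.
            linked = {t for t in remaining if node in blank_nodes(t)}
            remaining -= linked
            part |= linked
            for t in linked:
                frontier |= blank_nodes(t) - {node}
        parts.append(frozenset(part))
    return parts


graph = {
    ("ex:Peroni", "rdf:type", "beer:Beer"),
    ("ex:Peroni", "ex:review", "_:r1"),
    ("_:r1", "ex:rating", "4"),
    ("_:r1", "ex:author", "ex:Anna"),
    ("ex:Anna", "ex:name", "Anna"),
}
parts = decompose(graph)

# Merging the parts gives back the original graph.
restored = set().union(*parts)
```

Ground statements end up as single-statement MSGs, while the three statements sharing `_:r1` stay together; this is why one signature on any triple of that group can vouch for the whole group.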

So RDFTrustToolkit...
• Given a URI, lists the MSGs around it
• Given an MSG, lists and verifies existing signatures
• Can remove existing signatures or add new ones

Signing a Minimum Self-contained Graph (MSG)
[Figure: an MSG with one reified, signed statement. Labels shown: mbz:artistid=15290, IdKtR...j4c=, dbin:Base64sigvalue, mus:is_part_of, http://public../69..bd.pem, rdf:subject, dbin:X509Certificate, mus:plays, rdf:type, rdf:object, rdf:predicate, mus:Song, rdf:statement, mus:file, mus:Band, MD5:123123]
A larger MSG lowers the percentage overhead. In DBin, the signing overhead is approximately 25%.

Example (RDFTrustToolkit run)
Original MSG:
  <rdf:RDF xmlns:dbin="http://dbin.org#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="http://dbin.org/Home/Panaioli">
      <dbin:student>Panaioli Fabio</dbin:student>
    </rdf:Description>
  </rdf:RDF>
Signed MSG:
  <rdf:RDF xmlns:dbin="http://dbin.org#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="http://dbin.org/Home/Panaioli">
      <dbin:student>Panaioli Fabio</dbin:student>
    </rdf:Description>
    <rdf:Description rdf:nodeID="A0">
      <rdf:predicate rdf:resource="http://dbin.org#student"/>
      <dbin:PGPCertificate>http://public.dbin.org/cont/238785872.asc</dbin:PGPCertificate>
      <dbin:Base64SigValue>MCwCFOPX….A7xIaUgBzhkjcB5w==</dbin:Base64SigValue>
      <rdf:subject rdf:resource="http://dbin.org/Home/Panaioli"/>
      <rdf:object>Panaioli Fabio</rdf:object>
      <rdf:type rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement"/>
    </rdf:Description>
  </rdf:RDF>
Canonical representation:
  [http://dbin.org/Home/Panaioli http://dbin.org#student Panaioli Fabio]
Signature, Base64 encoding:
  MCwCFOPX….A7xIaUgBzhkjcB5w==
A triple is reified, and both the signature and a URI to a public key certificate are attached.
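The reify-and-sign step can be sketched in Python as follows. This is not RDFTrustToolkit code. To stay within the standard library, an HMAC-SHA256 over the canonical string stands in for the public key signature (PGP or X.509) used in the slides, so the `key` parameter is a shared secret here rather than a private key. The blank node label `_:A0` and the function names are illustrative.

```python
import base64
import hashlib
import hmac
from typing import List, Tuple

Triple = Tuple[str, str, str]


def canonical(triple: Triple) -> str:
    """Canonical form of a single ground triple, as in the example above."""
    return "[" + " ".join(triple) + "]"


def signature(triple: Triple, key: bytes) -> str:
    """Base64 signature value. HMAC is a stand-in for a real public key signature."""
    digest = hmac.new(key, canonical(triple).encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def sign(triple: Triple, key: bytes, certificate_url: str) -> List[Triple]:
    """Reify the triple and attach the signature and the certificate URL."""
    s, p, o = triple
    node = "_:A0"
    return [
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
        (node, "dbin:PGPCertificate", certificate_url),
        (node, "dbin:Base64SigValue", signature(triple, key)),
    ]


def verify(triple: Triple, reified: List[Triple], key: bytes) -> bool:
    """Check that the reification describes `triple` and the signature matches."""
    stated = {p: o for (_, p, o) in reified}
    described = (stated.get("rdf:subject"), stated.get("rdf:predicate"), stated.get("rdf:object"))
    value = stated.get("dbin:Base64SigValue", "")
    return described == triple and hmac.compare_digest(value, signature(triple, key))


statement = ("http://dbin.org/Home/Panaioli", "http://dbin.org#student", "Panaioli Fabio")
key = b"demo-secret"  # hypothetical key material, for illustration only
signed = sign(statement, key, "http://public.dbin.org/cont/238785872.asc")
ok = verify(statement, signed, key)
```

Because the signature statements share a blank node with the reification, they travel inside the same MSG as the data they vouch for, which is what lets signatures flow through the P2P layer with the triples.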

Trust Policy Tools

From authorship to trust
Given the DS infrastructure, and given that it's a local, personal DB repository → many solutions!
Examples:
• I trust Giovanni and Christian (only)
• I trust whoever Giovanni and Christian trust
• Etc.
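The two example policies differ only in how far trust is allowed to propagate, which a small Python sketch can make concrete. The data structures, the `hops` parameter, and the names Alice, Bob and Mallory are hypothetical; only Giovanni and Christian come from the example above.

```python
from typing import Dict, List, Set, Tuple


def trusted_signers(direct: Set[str], trusts: Dict[str, Set[str]], hops: int) -> Set[str]:
    """Signers accepted when trust may propagate `hops` steps from `direct`."""
    trusted = set(direct)
    frontier = set(direct)
    for _ in range(hops):
        frontier = {q for p in frontier for q in trusts.get(p, set())} - trusted
        trusted |= frontier
    return trusted


def accept(signed_msgs: List[Tuple[str, str]], trusted: Set[str]) -> List[str]:
    """Keep only the MSGs (here just labels) whose signer is trusted."""
    return [msg for (msg, signer) in signed_msgs if signer in trusted]


direct = {"Giovanni", "Christian"}
trusts = {"Giovanni": {"Alice"}, "Christian": {"Bob"}, "Alice": {"Mallory"}}

only_direct = trusted_signers(direct, trusts, hops=0)   # "I trust Giovanni and Christian (only)"
one_hop = trusted_signers(direct, trusts, hops=1)       # "I trust whoever they trust"

incoming = [("msg1", "Giovanni"), ("msg2", "Alice"), ("msg3", "Mallory")]
strict_view = accept(incoming, only_direct)
relaxed_view = accept(incoming, one_hop)
```

Since the repository is local and personal, each user can pick a different propagation depth, or an entirely different rule, without affecting anyone else.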

Data Flow Pipeline

Metadata Pipeline
This P2P scenario requires a pipeline of RDF processing.
• At low pipeline levels → raw growth, monotonicity, no inference
• At higher levels → inference, trusted growth, information revision, filtering, etc.
[Figure: pipeline diagram. Labels shown: Non monotonic filters (revocation), Raw RDF Repository, RDFTrust filtering, RDFS inference enabled Repository, Even smarter Repository, RDFGrowth P2P, OWL, Domain rules, etc., User selected policies, Approved Schema Repository]
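One way to picture the pipeline is as a chain of functions over an append-only raw store, sketched below in Python. The stage names, the data layout (each raw entry is a triple plus its signer) and the single inference rule (type propagation along rdfs:subClassOf) are simplifying assumptions, not the DBin pipeline itself.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
Signed = Tuple[Triple, str]  # (triple, signer)


def trust_filter(raw: List[Signed], trusted: Set[str]) -> Set[Triple]:
    """Non-monotonic stage: keep only triples whose signer is currently trusted."""
    return {t for (t, signer) in raw if signer in trusted}


def rdfs_inference(triples: Set[Triple]) -> Set[Triple]:
    """Tiny RDFS stage: propagate rdf:type along rdfs:subClassOf to a fixpoint."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subclass = {(s, o) for (s, p, o) in inferred if p == "rdfs:subClassOf"}
        for (x, p, cls) in list(inferred):
            if p != "rdf:type":
                continue
            for (sub, sup) in subclass:
                if sub == cls and (x, "rdf:type", sup) not in inferred:
                    inferred.add((x, "rdf:type", sup))
                    changed = True
    return inferred


def view(raw: List[Signed], trusted: Set[str]) -> Set[Triple]:
    """Higher-level repository: recomputed from the raw one, never edited in place."""
    return rdfs_inference(trust_filter(raw, trusted))


# The raw repository only grows (monotonic, no inference).
raw = [
    (("ex:Peroni", "rdf:type", "beer:Lager"), "Giovanni"),
    (("beer:Lager", "rdfs:subClassOf", "beer:Beer"), "Giovanni"),
    (("ex:Peroni", "rdf:type", "ex:BadBeer"), "Mallory"),
]

trusted_view = view(raw, {"Giovanni"})
after_revocation = view(raw, set())  # trust revoked: the view shrinks, raw stays intact
```

Keeping the raw level free of inference is what makes revocation cheap in this sketch: dropping a signer just changes the filter, and the inferred statements that depended on that signer disappear on the next recomputation.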

Domain Application GUI

User interface
Anything but a detail!
• As we're trying to “deliver” a SW tool for regular people, if it is unusable → failure
• More complex than simple “semantic web browsing”:
  • Editing must be taken into consideration
  • Filtering, revocations, ontologies, P2P must be harmonized at user level by an appropriate facade

DBin “domain applications”: brainlets
• A single, downloadable, domain-specific application that runs on top of DBin
• Brainlet creation does NOT require programming knowledge, just XML editing
→ Communities can be started by domain experts rather than SW hackers!

DBin “domain applications”: brainlets
A single, downloadable package containing:
• The setup information for the RDFGrowth transport layer
• The ontologies to be used for annotations in the domain (e.g. the beer ontology)
• A general GUI layout: which components to visualize (e.g. a message board, an ontology browser, a “detail” view) and how they're cascaded in terms of selection/reaction
• Templates for domain-specific “annotations”, e.g. a “movie review template”
• Templates for readily available, “pre-cooked” domain queries, which are structurally complex domain queries with only a few simple free parameters
• A suggested trust model and information filtering rules for the domain, e.g. public keys of well-known “founding members” or authorities, preset “browsing levels”
• Support material, customized icons, help files, etc.
• A basic RDF knowledge package

DBin eats this... (+ annotation ontologies)
  <Brainlet name="Beer" author="Onofrio Panzarino" version="1.0">
    <Ontology file="brainlet/beer.owl"/>
    <GUED name="Beer">
      <Topic name="Beers" uri="http://www.purl.org/net/ontology/beer#Beer">
        <Child query="SELECT X FROM {X} &lt;rdfs:subClassOf&gt; {$parent} WHERE X != $parent" recursive="true">
          <Child subjectBy="rdf:type" icon="/icons/beer.gif"/>
        </Child>
        <Child subjectBy="rdf:type" icon="/icons/beer.gif"/>
      </Topic>
      <Topic name="Ingredients" uri="http://www.purl.org/net/ontology/beer#Ingredient">
        <Child query="SELECT X FROM {X} &lt;rdfs:subClassOf&gt; {$parent} WHERE X != $parent" recursive="true">
          <Child subjectBy="rdf:type"/>
        </Child>
        <Child subjectBy="rdf:type"/>
      </Topic>
    </GUED>
    <View id="Focus" />
    <View id="GUEDNavigator" title="BeerNavigator" icon="icons/nav.gif" selecterFor="main" />
    <View id="Comments" title="Comments" listenTo="main" selecterFor="comments" />
    <View id="Comment" title="Details" listenTo="comments" />
    <View id="Gallery" listenTo="main" />
  </Brainlet>

The user sees this

DBin
Based on the Eclipse RCP, so:
• Looks nice
• Multiplatform
• Completely plug-in based
• Lots of possible plugins
• Open source
Demo time!

Conclusions
• It's RDF for the masses!
• DBin is an early tool to explore this scenario; we don't claim it's fit for the real task yet, notably:
  • Performance issues
  • Real-world hardening
  • Usability testing in real communities
• There are many alternatives to each of the blocks
• A lot of cool ideas are within a plugin's reach (Semantic desktop integration, maps, WS integration, etc.)
Hurray!

Thanks for your attention
Get DBin at http://www.dbin.org
