
Knowledge Systems Course

From data source to Guided Exploration: a tool stack for Semantic Web navigation May 2006 Aduna, Jeen Broekstra jeen.broekstra@aduna-software.com. Knowledge Systems Course. Time table. [8:45 – 9:00] Introduction Aduna RDF and the Semantic Web [9:00 – 9:30] Software stack: Middleware


Presentation Transcript


  1. From data source to Guided Exploration: a tool stack for Semantic Web navigation. May 2006. Aduna, Jeen Broekstra (jeen.broekstra@aduna-software.com). Knowledge Systems Course.

  2. Time table • [8:45 – 9:00] Introduction • Aduna • RDF and the Semantic Web • [9:00 – 9:30] Software stack: Middleware • Sesame: storage and querying for RDF • Aperture: retrieving metadata from data • [9:45 – 10:15] Software stack: Presentation • Spectacle: RDF-based Facet Navigation • AutoFocus: Cluster Visualization • [10:15 – 10:30] Demo + discussion

  3. About Aduna • Where are we: • Amersfoort, the Netherlands • What do we do: • Develop software for effective navigation and visualization of large information sources • Use Semantic Web technology to enable better search

  4. Aduna and Software • Software Components: • Aperture • a framework for extracting metadata from various kinds of sources (e.g. Word files, e-mail, PDF, images, …) • Sesame • a toolkit/database for scalable storage and querying of RDF, RDFS and OWL • Spectacle • efficient facet navigation • Cluster Map • visualization component

  5. RDF in one slide • Data model for expressing knowledge • basic building block: statement <person001> <name> "Jeen" . • groups of statements form graphs [Figure: example graph with the nodes person001 and project001 and the literals "Jeen", "j.broekstra@tue.nl" and "Sesame", connected by the properties name, email, worksIn and projectMemberEmail]
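
The statement-and-graph model on this slide can be sketched in a few lines of Python. This is only an illustration, not Sesame code: plain tuples stand in for RDF statements, the `email` statement is taken from the figure on slide 24, and the `objects` helper is a hypothetical name introduced here.

```python
# A statement is a (subject, predicate, object) triple; a graph is a set of them.
# Plain strings stand in for URIs and literals (an assumption for brevity).
graph = {
    ("person001", "name", "Jeen"),
    ("person001", "email", "j.broekstra@tue.nl"),
}


def objects(graph, subject, predicate):
    """Return all objects of statements with the given subject and predicate."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


names = objects(graph, "person001", "name")
```

Because both statements share the subject `person001`, they form a small graph around that node; a lookup for a property that is not present simply yields an empty set.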

  6. RDF Schema in one more slide • RDF Schema is a Vocabulary Description Language • it allows specification of domain vocabulary and a way to structure it • Class, Property, subClassOf, subPropertyOf, domain, range • Formal semantics add simple reasoning capabilities: • class and property subsumption • domain and range inference [Figure: schema graph with the nodes rdfs:Class, rdf:Property, Person, Researcher, name and person001, connected by rdf:type, rdfs:domain and rdfs:subClassOf edges]
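
The two kinds of inference named on this slide (class subsumption and domain inference) can be illustrated with a tiny forward-chaining loop. This is a minimal sketch that covers only three RDFS rules and is not how Sesame's reasoner is implemented; the example triples are assumptions loosely based on the node names in the figure.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"


def rdfs_closure(triples):
    """Apply three RDFS rules repeatedly until no new statement is inferred."""
    inferred = set(triples)
    while True:
        new = set()
        for (s, p, o) in inferred:
            if p == SUBCLASS:
                # Subclass transitivity: s sub o, o sub c  =>  s sub c
                new |= {(s, SUBCLASS, c) for (b, q, c) in inferred
                        if q == SUBCLASS and b == o}
                # Type inheritance: x type s, s sub o  =>  x type o
                new |= {(x, RDF_TYPE, o) for (x, q, a) in inferred
                        if q == RDF_TYPE and a == s}
            elif p == DOMAIN:
                # Domain inference: s domain o, x s y  =>  x type o
                new |= {(x, RDF_TYPE, o) for (x, q, _) in inferred if q == s}
        if new <= inferred:
            return inferred
        inferred |= new


schema_and_data = {
    ("Researcher", SUBCLASS, "Person"),     # assumed schema statement
    ("name", DOMAIN, "Person"),             # assumed schema statement
    ("person001", RDF_TYPE, "Researcher"),
    ("person002", "name", "Lora"),
}
closure = rdfs_closure(schema_and_data)
```

After the closure, `person001` is also a `Person` (via subsumption) and `person002` is a `Person` (because it uses a property whose domain is `Person`), although neither statement was asserted.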

  7. The tool stack • presentation • middleware • Sesame: metadata storage and reasoning • Aperture: metadata extraction

  8. Aperture

  9. What is Aperture? • Aperture is a Java framework for extracting and querying full-text content and metadata from various information systems (e.g. file systems, web sites, mail boxes) and the file formats (e.g. documents, images) occurring in these systems. • Open Source project by Aduna and DFKI: http://aperture.sourceforge.net/

  10. Aperture Features • Crawl information systems such as file systems, websites, mail boxes and mail servers • Extract full-text and metadata from many common file formats • View files in their native applications • Ease of use: easy to learn, easy to code, easy to deploy in industrial projects • Flexible architecture: can be extended with custom file formats, data sources, etc., with support for deployment on OSGi platforms • Data exchange based on Semantic Web standards (e.g. RDF, SPARQL, ...)

  11. Supported File Formats • Plain text • HTML, XHTML • XML • PDF (Portable Document Format) • RTF (Rich Text Format) • Microsoft Office: Word, Excel, PowerPoint, Visio, Publisher • Microsoft Works • OpenOffice 1.x: Writer, Calc, Impress, Draw • StarOffice 6.x - 7.x+: Writer, Calc, Impress, Draw • OpenDocument (OpenOffice 2.x, StarOffice 8.x) • Corel WordPerfect, Quattro, Presentations • Emails (.eml files)

  12. The Sesame Framework

  13. What is Sesame? • A framework for storage, querying and inferencing of RDF and RDF Schema • A Java Library for handling RDF • A Database Server for (remote) access to repositories of RDF data • Open Source project by Aduna: http://www.openRDF.org/

  14. Sesame features • Light-weight yet powerful Java API • Highly expressive query and transformation languages • SeRQL, SPARQL • High scalability (O(10^7) RDF triples on desktop hardware) • Various backends • Native Store • RDBMS (MySQL, Oracle 10, DB2, PostgreSQL) • main memory • Reasoning support • RDF Schema reasoner • OWL DLP (OWLIM) • domain reasoning (custom rule engine) • Rio Toolkit: parsers and writers for different RDF syntaxes: • RDF/XML, Turtle, N3, N-Triples, TriX

  15. Sesame 2 architecture [Figure: layered architecture diagram with the components application, HTTP / SPARQL protocol, HTTP Server, Repository Access API, SeRQL, SPARQL, SAIL API, Rio, SAIL Query Model and RDF Model]

  16. Sesame 2 architecture (annotated) • application (remote): Remote apps can communicate over the Web, via the HTTP / SPARQL protocol, with a Sesame server and update data or do queries. • HTTP Server: Allows deployment of Sesame as a web-enabled database server (e.g. in Tomcat). Implements a superset of the SPARQL protocol (HTTP REST). • application (local): Local apps can just include (parts of) Sesame as a Java library and use it to process RDF data efficiently. • Repository Access API: Main access API of Sesame. Offers developer-friendly methods for manipulating RDF data (querying, adding, removing, updating). • SeRQL, SPARQL: Declarative querying and other 'higher-level' functions on SAILs. • SAIL API, SAIL Query Model: Storage And Inference Layer. System API for 'wrapping' a storage backend. • Rio: RDF I/O. Set of parsers and writers for RDF/XML, Turtle, N3, N-Triples. Can be used separately. • RDF Model: The core RDF model, containing objects and interfaces for URIs, blank nodes, literals, statements.

  17. The SAIL API • Storage And Inferencing Layer • Abstraction from physical storage • allows other Sesame components to function on any type of store • can be used as a wrapper layer for a particular data source • System Internal API • application developers typically do not use it directly
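
The idea of abstracting from physical storage can be sketched as a small interface with two interchangeable backends. All names below (`StatementSource`, `SetBackedStore`, `RecordBackedStore`, `subjects`) are hypothetical and chosen for this illustration; they are not the actual SAIL interfaces, which are Java.

```python
from abc import ABC, abstractmethod


class StatementSource(ABC):
    """Hypothetical storage abstraction: callers rely only on this interface."""

    @abstractmethod
    def statements(self):
        """Yield (subject, predicate, object) triples."""


class SetBackedStore(StatementSource):
    """Keeps triples in a plain in-memory set."""

    def __init__(self, triples):
        self._triples = set(triples)

    def statements(self):
        return iter(self._triples)


class RecordBackedStore(StatementSource):
    """Wraps an existing data source (here a dict of records) as triples."""

    def __init__(self, records):
        self._records = records

    def statements(self):
        for subject, fields in self._records.items():
            for predicate, value in fields.items():
                yield (subject, predicate, value)


def subjects(store):
    """A 'higher-level' component that works on any StatementSource."""
    return {s for (s, _, _) in store.statements()}


in_memory = SetBackedStore({("person001", "name", "Jeen")})
wrapped = RecordBackedStore({"person002": {"name": "Lora"}})
```

`subjects` never learns which backend it is talking to, which is the property the slide attributes to the SAIL layer: components above it can function on any type of store, including a wrapper around an existing data source.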

  18. The Repository Access API • A single Java object representation for a Sesame database, offering methods for • evaluating a query and retrieving the result • adding RDF data from local file, from the web, as a text string, etc. • adding/removing (sets of) RDF statements • starting/stopping transactions
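
To make the "single object with add, remove and transaction methods" idea concrete, here is a toy repository in Python. It is a sketch under stated assumptions: `ToyRepository` and its method names are invented for this example and do not mirror the real Sesame Java API; transactions are modelled simply as a working copy that is swapped in on commit.

```python
class ToyRepository:
    """Hypothetical single-object facade over a set of triples."""

    def __init__(self):
        self._committed = set()
        self._pending = None  # working copy while a transaction is open

    def begin(self):
        self._pending = set(self._committed)

    def _target(self):
        # Outside a transaction, changes apply directly (auto-commit).
        return self._pending if self._pending is not None else self._committed

    def add(self, s, p, o):
        self._target().add((s, p, o))

    def remove(self, s, p, o):
        self._target().discard((s, p, o))

    def commit(self):
        if self._pending is not None:
            self._committed, self._pending = self._pending, None

    def rollback(self):
        self._pending = None

    def statements(self):
        """Return a copy of the committed statements."""
        return set(self._committed)


repo = ToyRepository()
repo.begin()
repo.add("person001", "name", "Jeen")
visible_before_commit = repo.statements()
repo.commit()
visible_after_commit = repo.statements()
```

Changes made inside the transaction stay invisible to readers until `commit()`, and `rollback()` discards them.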

  19. Querying RDF • RDF is a labeled, directed graph of semistructured data • no rigid schema • An RDF query language needs to be able to address this: • graph path expressions • dealing with semistructured nature of RDF • flexible querying of both data and schema

  20. SeRQL • Language proposal based on best practices • Redesign of RQL to make it easier to use, incorporating ideas from many other query languages • Developed in the Sesame project • Expressive language, but still fairly easy to use • Support for RDF Schema • Implementation: Sesame

  21. SeRQL path expressions [Figure: graph Netherlands --hasCapital--> Amsterdam --areacode--> "020"] • {X} geo:hasCapital {geo:Amsterdam} • {X} geo:hasCapital {Y} • {X} P {Y}

  22. Chaining, branching and comparing [Figure: graph Netherlands --hasCapital--> Amsterdam --areacode--> "020"] • Chaining: • {X} geo:hasCapital {Y} geo:areacode {Z} • Branching: • {X} rdf:type {Y}; geo:areacode {Z} • Comparison operators: • String comparison: • X like "*Netherlands" • Y like "A*" • boolean comparison: • X < Y, X <= Y, Z < 20, Z = Y, etc.
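
Chained and branched path expressions can be read as several triple patterns that share variables. The following sketch shows that reading in Python; the `?`-prefixed variable convention and the helper names `match` and `match_all` are assumptions of this example, not SeRQL syntax or Sesame internals.

```python
def match(graph, pattern, binding):
    """Yield bindings that extend `binding` for one (s, p, o) pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    for triple in graph:
        candidate = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its existing value.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield candidate


def match_all(graph, patterns):
    """Join several patterns: a chain or branch is patterns sharing variables."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match(graph, pattern, b)]
    return bindings


graph = {
    ("geo:Netherlands", "geo:hasCapital", "geo:Amsterdam"),
    ("geo:Amsterdam", "geo:areacode", "020"),
}

# Chaining {X} geo:hasCapital {Y} geo:areacode {Z}: two patterns sharing ?Y.
chain = match_all(graph, [("?X", "geo:hasCapital", "?Y"),
                          ("?Y", "geo:areacode", "?Z")])
```

A branch such as `{X} rdf:type {Y}; geo:areacode {Z}` would be expressed the same way, with both patterns sharing `?X` as subject instead of sharing the middle node.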

  23. SeRQL query composition • Using the building blocks, we can compose complex queries. • SeRQL uses a select-from-where syntax:
        SELECT X, Y
        FROM {X} geo:hasCapital {Y} geo:areacode {Z}
        WHERE Z like "020"
        USING NAMESPACE geo = <http://www.geography.org/schema.rdf#>
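
The three clauses of this query map onto three steps: match the path (FROM), filter the bindings (WHERE), and project the requested variables (SELECT). A minimal Python sketch of that evaluation follows. It is not how Sesame evaluates SeRQL; `fnmatchcase` is used as a stand-in for `like`, which is only an approximation because it also treats `?` and `[]` as wildcards, and the Belgium triples are assumed example data.

```python
from fnmatch import fnmatchcase

graph = {
    ("geo:Netherlands", "geo:hasCapital", "geo:Amsterdam"),
    ("geo:Amsterdam", "geo:areacode", "020"),
    ("geo:Belgium", "geo:hasCapital", "geo:Brussels"),   # assumed data
    ("geo:Brussels", "geo:areacode", "02"),              # assumed data
}


def select_capitals(graph, areacode_pattern):
    """SELECT X, Y FROM {X} geo:hasCapital {Y} geo:areacode {Z} WHERE Z like pattern."""
    rows = []
    for (x, p1, y) in graph:
        if p1 != "geo:hasCapital":
            continue
        for (y2, p2, z) in graph:
            # FROM: the chain requires the second statement to start at Y.
            if y2 == y and p2 == "geo:areacode":
                # WHERE: keep the binding only if Z matches the pattern.
                if fnmatchcase(z, areacode_pattern):
                    rows.append((x, y))  # SELECT: project X and Y only
    return sorted(rows)


exact = select_capitals(graph, "020")
prefix = select_capitals(graph, "02*")
```

Note that `Z` is needed to evaluate the filter but does not appear in the result, because the SELECT clause projects only `X` and `Y`.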

  24. Optional path expressions • RDF is semi-structured • Even when the schema says some object should have a particular property, it may not always be present in the data: • Persons have names and email addresses, but Lora is a person without a known email address [Figure: person001 and person002 both have type Person; person001 has name "Jeen" and email "j.broekstra@tue.nl"; person002 has name "Lora" and no email]

  25. Optional path expressions (2) • To be able to query for all persons, their first names, and, if known, their email address, SeRQL introduces optional path expressions:
        SELECT Person, Name, Email
        FROM {Person} my:name {Name};
                      [my:email {Email}]
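
An optional path expression behaves like a left outer join: every person with a name is returned, and the email column is filled only when a matching statement exists. A minimal Python sketch of that behaviour, using the two persons from the previous slide (the function name is hypothetical, and `None` stands in for an unbound variable):

```python
graph = {
    ("person001", "my:name", "Jeen"),
    ("person001", "my:email", "j.broekstra@tue.nl"),
    ("person002", "my:name", "Lora"),
}


def persons_with_optional_email(graph):
    """Rows of (person, name, email); email is None when it is not in the data."""
    rows = []
    for (person, predicate, name) in graph:
        if predicate != "my:name":
            continue
        emails = sorted(o for (s, p, o) in graph
                        if s == person and p == "my:email")
        # The optional part never removes a row: fall back to one unbound value.
        for email in emails or [None]:
            rows.append((person, name, email))
    return sorted(rows, key=lambda row: row[0])


rows = persons_with_optional_email(graph)
```

Without the optional brackets, the query would be a plain join and `person002` would drop out of the result.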

  26. CONSTRUCT queries • CONSTRUCT queries return RDF statements • each RDF statement matching the query pattern is returned • The query result is • a subgraph of the original graph, or • a transformed graph • This mechanism also allows formulation of simple rules

  27. SeRQL construct-queries
      Subgraph query:
        CONSTRUCT *
        FROM {X} geo:hasCapital {Y}
      [Figure: result graph Netherlands --hasCapital--> Amsterdam]
      Transformation query:
        CONSTRUCT {Y} my:inCountry {X}
        FROM {X} geo:hasCapital {Y}
      [Figure: result graph Amsterdam --inCountry--> Netherlands]
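
Both kinds of CONSTRUCT query can be mimicked with set comprehensions over a set of triples. This is an illustrative sketch only; the function names are hypothetical and the `geo:areacode` triple is included to show that non-matching statements are left out of the result.

```python
graph = {
    ("geo:Netherlands", "geo:hasCapital", "geo:Amsterdam"),
    ("geo:Amsterdam", "geo:areacode", "020"),
}


def construct_subgraph(graph):
    """CONSTRUCT * FROM {X} geo:hasCapital {Y}: return the matching statements."""
    return {(x, p, y) for (x, p, y) in graph if p == "geo:hasCapital"}


def construct_in_country(graph):
    """CONSTRUCT {Y} my:inCountry {X} FROM {X} geo:hasCapital {Y}."""
    return {(y, "my:inCountry", x) for (x, p, y) in graph
            if p == "geo:hasCapital"}


subgraph = construct_subgraph(graph)
transformed = construct_in_country(graph)
```

The transformation query reads like a simple rule ("if X has capital Y, then Y is in country X"), which is the sense in which the previous slide says CONSTRUCT allows the formulation of simple rules.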

  28. SeRQL vs. SPARQL • Both: expressive query and transformation language • SELECT and CONSTRUCT • optional path expressions • support for context/named graphs • SeRQL ("circle") • nested queries, language tags, … • user-friendly syntax (but YMMV) • very efficient Sesame implementation • SPARQL ("sparkle") • W3C Standard (in progress) • tool interoperability: Jena, Redland, 3Store, Sesame, …

  29. SeRQL vs. SPARQL example
      SeRQL:
        SELECT X, Y
        FROM {X} geo:hasCapital {Y} geo:areacode {Z}
        WHERE Z like "020"
        USING NAMESPACE geo = <http://www.geography.org/schema.rdf#>
      SPARQL:
        PREFIX geo: <http://www.geography.org/schema.rdf#>
        SELECT ?x ?y
        WHERE { ?x geo:hasCapital ?y . ?y geo:areacode ?z . FILTER (?z = "020") }

  30. Presentation: how to navigate ontology-based information

  31. An ontology is not enough • End users do not necessarily think in the same terms in which an ontology is modeled • Search and navigation tools need to provide user-oriented access to the information • views • multiple access paths • recognizable options • quick results

  32. Navigation problems 1 • Too many links or categories • overwhelming offer • Deep hierarchies • information remains hidden

  33. Examples

  34. Navigation problems 2 • Query overspecification • zero results! • Query underspecification • millions of hits!

  35. Examples

  36. Faceted navigation 1 • Facet = meta-data element • e.g. 'author', 'title', 'date', 'type' • Facets have values • e.g. 'author is J. Brown' • In collections facet values are related • e.g. author 'J. Brown' is connected to title 'Once upon a time ...' • Faceted navigation = choose a facet value and see all related facets and values
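
The definition on this slide (choose a facet value, then see the related facets and values) can be sketched with a list of records and a counter. This is an illustration and not Spectacle's implementation; apart from 'J. Brown' and 'Once upon a time ...', the example records and the helper names `drill_down` and `facet_counts` are assumptions.

```python
from collections import Counter

documents = [
    {"author": "J. Brown", "title": "Once upon a time ...", "type": "HTML Document"},
    {"author": "J. Brown", "title": "Second story", "type": "XML Document"},
    {"author": "A. Smith", "title": "Notes", "type": "HTML Document"},
]


def drill_down(items, facet, value):
    """Keep only the items that have the chosen value for the chosen facet."""
    return [item for item in items if item.get(facet) == value]


def facet_counts(items, facet):
    """Count how many of the remaining items carry each value of a facet."""
    return Counter(item[facet] for item in items if facet in item)


all_types = facet_counts(documents, "type")
by_brown = drill_down(documents, "author", "J. Brown")
types_for_brown = facet_counts(by_brown, "type")
```

Because the counts are always computed from the items that remain after the previous choice, only values with at least one instance are offered, so a user cannot drill down into an empty result.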

  37. Faceted navigation 2 • Problem solved • user has problems specifying query • over- and underspecification • Solution • showing all options • give ways to drill down the information • Applied • database selection (e.g. job sites), e-commerce (e.g. travel), enhancement of (full text) search

  38. Example of faceted navigation • Facet: Type • Facet values: Adobe AD, HTML Document, XML Document • Number of instances per facet value

  39. Facets are Data Views • Each navigation facet is driven by a SeRQL query on the underlying Sesame repository • SeRQL queries can retrieve and transform the data to provide a facet 'view' • Spectacle uses the query results to populate the facet with values

  40. Information visualization 1 • Types • Model visualization • Instance visualization • Examples • Hyperbolic tree, InXight • Graph visualization, AquaBrowser • Claim of visualization: show things that you can't (easily) express in words or lists

  41. Information visualization 2 • Cluster Map = instance visualization • visualization of the search results • instances can be things like files, jobs, and people • Map shows AND, OR and NOT of query arguments
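
One way to read "the map shows AND, OR and NOT of query arguments" is that instances are grouped by the exact set of query terms they match. The sketch below computes such groups with plain sets. It is an assumption about the underlying idea, not the Cluster Map component's actual algorithm, and the file names and search terms are made up.

```python
from collections import defaultdict


def cluster_map(matches):
    """Group instances by the exact set of query terms they match.

    `matches` maps each query term to the set of instances matching it.
    Returns a dict from frozenset(terms) to the set of instances.
    """
    terms_by_instance = defaultdict(set)
    for term, instances in matches.items():
        for instance in instances:
            terms_by_instance[instance].add(term)
    clusters = defaultdict(set)
    for instance, terms in terms_by_instance.items():
        clusters[frozenset(terms)].add(instance)
    return dict(clusters)


matches = {
    "rdf": {"a.pdf", "b.html"},
    "sesame": {"b.html", "c.doc"},
}
clusters = cluster_map(matches)
```

In this reading, the cluster keyed by both terms is the AND of the arguments, a cluster keyed by a single term holds the instances that match it but NOT the other, and the union of all clusters is the OR.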

  42. Cluster Map examples

  43. Aduna AutoFocus. AutoFocus helps you to explore data sources like files, websites and e-mail with Guided Exploration. AutoFocus scans data sources and automatically makes suggestions after you have entered a search term. So if you are not completely sure what to look for, AutoFocus will help you with suggestions for refinement. In addition, you no longer have to store or search for information in complex directory hierarchies: AutoFocus will retrieve it anyway. • Combined full text search in documents, websites and e-mail • Relations shown in a Cluster Map • Automatically generated suggestions help to refine the question • Support for multiple data sources: documents, websites, e-mail boxes

  44. Aduna Spectacle. Aduna Spectacle helps website visitors to find what they want with Guided Exploration. Aduna Spectacle supports faceted navigation. Users drill down step by step, making choices on multiple meta-data facets. Spectacle overcomes problems related to over- and underspecification. The user gets the right answer. • Visitors find what they want without negative feedback like 'zero results' • Navigation on multiple facets of information collections • Use of information increases with faceted navigation • Easy to implement on top of your existing information sources

  45. Pointers • Aduna: http://aduna-software.com/ • AutoFocus: http://aduna-software.com/products/autofocus/ • Spectacle: http://aduna-software.com/products/spectacle • Sesame: http://www.openrdf.org/ • Aperture: http://aperture.sourceforge.net/

  46. Demo & Discussion Time
