Semantic Web at BBN Parliament & ASIO SCOUT

Dave Kolas, March 23, 2011

Semantic Technology at BBN
• Research: contributing to standards and technologies
  • Contributing authors to OWL and SWRL
  • Currently developing a new semantic-based reasoning language, SILK
  • Active in the Geospatial Semantic Web community, including GeoSPARQL
• Applications: addressing real-world, operational challenges
  • Intelligence data integration and disambiguation
  • Geospatial image applications
  • Analytics with Semantic Web underpinnings

Key BBN-Led Semantic Initiatives, 2000–2010 (timeline): DAML Integration & Transition (DARPA); SASSI/MMON; Horus (DARPA/IMO); ICEWS (DARPA); Combine/APSTARS; CODE/COBRA; SID; ISSL; DIESL (DARPA); FCG (AFRL/AMC); IEII (DARPA); NOTAMS (AFRL/AMC); Integrated Learning (DARPA); GARCON-F (NGA); PINT; Multi-INT Fusion (LM); Geospatial SW (NGA); JFP ACTD (JFCOM); Medical, Commercial Applications; W3C OWL Recommendation; SemWebCentral.org; BBN Hosts ISWC 2009; asio.bbn.com

Parliament
• In continuous customer use for ~8 years (originally DAML-DB)
• Triple store with SPARQL support
• Implemented as a persistence layer for Jena/Sesame
• Includes spatial and temporal indexing/processing
• Open source! http://parliament.semwebcentral.org/

Design (architecture diagram). Labeled components: Jena (Joseki, framework Model), Parliament Graph, IndexingGraph, Spatial Index Processor with an external Spatial Index (PostGIS), Temporal Index Processor with a Temporal Index (BDB), and Parliament.

Parliament’s Index Structure
• Applications often require efficient statement insertion
• Goal: balance insertion performance, query performance, and space requirements
• Parliament stores triples using two components:
  • Resource dictionary
  • Statement table

Parliament Statement Table
Each entry (statement) contains:
• Three resource ID fields: the subject, predicate, and object of the statement
• Three statement ID fields: the next statements using the same resource as subject, predicate, and object
• Bit-field flags encoding statement attributes

Parliament Resource Dictionary
Each entry (resource) contains:
• Bidirectional string-to-ID mapping
• Three statement ID fields: the first statements using this resource as subject, predicate, and object
• Three count fields: the numbers of statements using this resource as subject, predicate, and object
• Bit-field flags encoding resource attributes
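To make the linked-list layout concrete, here is a minimal in-memory Python sketch of a statement table and resource dictionary organized as described on these slides. It is an illustration under stated assumptions, not Parliament's code: Python lists stand in for Parliament's storage files, the class and method names (`ParliamentLikeStore`, `by_role`, `match`) are hypothetical, and using the count fields to choose the shortest list to scan is an assumed use of those counts.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple

NONE = -1                  # sentinel: "no statement"
SUBJ, PRED, OBJ = 0, 1, 2  # roles a resource can play in a statement


@dataclass
class Resource:
    name: str
    first: List[int] = field(default_factory=lambda: [NONE, NONE, NONE])  # head of each per-role list
    count: List[int] = field(default_factory=lambda: [0, 0, 0])           # length of each per-role list
    flags: int = 0


@dataclass
class Statement:
    ids: Tuple[int, int, int]  # subject, predicate, object resource IDs
    next: List[int]            # next statement sharing the same S / P / O resource
    flags: int = 0


class ParliamentLikeStore:
    def __init__(self) -> None:
        self.str_to_id: Dict[str, int] = {}    # string -> ID half of the dictionary
        self.resources: List[Resource] = []    # ID -> entry (and so ID -> string)
        self.statements: List[Statement] = []  # the statement table

    def intern(self, name: str) -> int:
        rid = self.str_to_id.get(name)
        if rid is None:
            rid = len(self.resources)
            self.str_to_id[name] = rid
            self.resources.append(Resource(name))
        return rid

    def add(self, s: str, p: str, o: str) -> int:
        ids = (self.intern(s), self.intern(p), self.intern(o))
        sid = len(self.statements)
        nxt = []
        for role, rid in enumerate(ids):
            res = self.resources[rid]
            nxt.append(res.first[role])  # new statement points at the old list head...
            res.first[role] = sid        # ...and becomes the new head: O(1) insertion
            res.count[role] += 1
        self.statements.append(Statement(ids, nxt))
        return sid

    def by_role(self, name: str, role: int) -> Iterator[Tuple[str, ...]]:
        """Walk the linked list of statements that use `name` in `role`."""
        rid = self.str_to_id.get(name)
        sid = NONE if rid is None else self.resources[rid].first[role]
        while sid != NONE:
            st = self.statements[sid]
            yield tuple(self.resources[i].name for i in st.ids)
            sid = st.next[role]

    def _count(self, name: str, role: int) -> int:
        rid = self.str_to_id.get(name)
        return 0 if rid is None else self.resources[rid].count[role]

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Tuple[str, ...]]:
        """Answer a triple pattern by scanning only the shortest relevant list."""
        bound = [(role, n) for role, n in enumerate((s, p, o)) if n is not None]
        if not bound:  # fully unbound pattern: scan the whole statement table
            for st in self.statements:
                yield tuple(self.resources[i].name for i in st.ids)
            return
        role, name = min(bound, key=lambda rn: self._count(rn[1], rn[0]))
        for t in self.by_role(name, role):
            if all(t[r] == n for r, n in bound):
                yield t
```

In this sketch an insertion touches only three list heads, which is one way to see why such a layout can favor insertion throughput, while a bound pattern walks a single per-resource list rather than a separate sorted index.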

Parliament Index Example (figure: resource table and statement list)

Chart: list length means (standard deviations)

Parliament
• Experiments demonstrate that Parliament maintains excellent query performance while significantly increasing insertion throughput and decreasing space requirements
• Future work will include:
  • Query optimization strategies
  • Analysis of Parliament’s internal rule engine
  • Further optimizations to the storage structure

Part 2 – Asio Scout
• Motivation: linking data across multiple data sources
  • Underlying data is in different formats (RDBMS, web services, RDF) and different vocabularies
  • Consolidating data does not solve the problem
  • Different users need to use this data for different purposes, from different perspectives
• Use Semantic Web technology to link the data sources together in a flexible, evolvable way

Asio Scout architecture (diagram): (1) SPARQL query; (2) Semantic Query Decomposition (SQD) using backwards rule chaining; (3) generation of sub-queries; (4) data access through Semantic Bridges to a relational database, a web service, and a SPARQL endpoint; (6) query result set. Other components shown: Snoggle, Automapper, and Parliament.

Federated query (diagram): Semantic Query Decomposition (SQD) with Semantic Bridges to two relational databases (RDBMS One, RDBMS Two), a web service, and a SPARQL endpoint; the numbered steps run from the SPARQL query (1) through decomposition by backwards rule chaining (2), sub-query generation (3), and data access (4) to the query result set (6).

Rule Expansion
• When a query is received, the system expands it using the mapping rules provided
• Each triple in the query is tagged with the rules that can produce it, and is then expanded into the body of each such rule, with variable unification
• This process repeats until the query cannot be expanded any further
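A small Python sketch of this kind of expansion follows. It is not Scout's implementation; it assumes each rule has a single head triple and a body of triples, that rules are non-recursive (a round limit guards against loops), and that the `dom:`, `db:` and `ws:` names are hypothetical vocabulary. Each query triple is unified against every rule head, and each matching rule contributes one branch of the resulting UNION.

```python
from itertools import count
from typing import Dict, List, Optional, Tuple

Atom = Tuple[str, str, str]     # (subject, predicate, object); "?x" marks a variable
Rule = Tuple[Atom, List[Atom]]  # (head, body): body atoms produce the head triple
Subst = Dict[str, str]

_fresh = count()


def is_var(t: str) -> bool:
    return t.startswith("?")


def walk(t: str, s: Subst) -> str:
    # Follow variable bindings until reaching a constant or an unbound variable.
    while is_var(t) and t in s:
        t = s[t]
    return t


def unify(a: Atom, b: Atom) -> Optional[Subst]:
    s: Subst = {}
    for x, y in zip(a, b):
        x, y = walk(x, s), walk(y, s)
        if x == y:
            continue
        if is_var(x):
            s[x] = y
        elif is_var(y):
            s[y] = x
        else:
            return None  # two different constants: this rule cannot produce the triple
    return s


def rename(rule: Rule) -> Rule:
    # Give the rule's variables fresh names so they cannot clash with the query's.
    n = next(_fresh)

    def r(t: str) -> str:
        return f"{t}_{n}" if is_var(t) else t

    head, body = rule
    return (r(head[0]), r(head[1]), r(head[2])), [(r(a[0]), r(a[1]), r(a[2])) for a in body]


def expand(query: List[Atom], rules: List[Rule], max_rounds: int = 20) -> List[List[Atom]]:
    """Rewrite a query into a UNION of conjunctive queries over data-source atoms."""
    union = [query]
    for _ in range(max_rounds):
        changed = False
        next_union: List[List[Atom]] = []
        for cq in union:
            for i, atom in enumerate(cq):
                # "Tag" the atom with every rule whose head unifies with it.
                alternatives = []
                for rule in rules:
                    head, body = rename(rule)
                    s = unify(head, atom)
                    if s is not None:
                        rewritten = cq[:i] + body + cq[i + 1:]
                        alternatives.append([tuple(walk(t, s) for t in a) for a in rewritten])
                if alternatives:  # one UNION branch per producing rule
                    next_union.extend(alternatives)
                    changed = True
                    break
            else:
                next_union.append(cq)  # nothing left to expand in this branch
        union = next_union
        if not changed:
            return union
    raise RuntimeError("expansion did not reach a fixpoint (recursive rules?)")


# Hypothetical mapping rules: domain terms (dom:) defined over a relational
# source (db:) and a web service (ws:).
RULES: List[Rule] = [
    (("?p", "rdf:type", "dom:Person"), [("?p", "db:person.id", "?id")]),
    (("?p", "dom:name", "?n"), [("?p", "db:person.name", "?n")]),
    (("?p", "dom:name", "?n"), [("?p", "ws:fullName", "?n")]),
]

if __name__ == "__main__":
    query = [("?x", "rdf:type", "dom:Person"), ("?x", "dom:name", "?name")]
    for branch in expand(query, RULES):
        print(branch)
```

With these rules the example query expands into two branches, one reading names from the database and one from the web service, while the query's own variables (`?x`, `?name`) are preserved.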

Ontology Reasoning
• Subclass/subproperty reasoning
  • Creates more possibilities for inferring query statements, as one would expect
• Disjoint classes
  • Liberal use of disjointness statements in the ontology helps reduce the UNIONs generated in certain domain ontology situations
  • Pairwise disjointness can be asserted automatically for some data source ontologies
• Functional / inverse functional properties
  • Allow many of the unbound variables introduced in the rule expansion stage to be unified
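The disjointness and functional-property simplifications can be sketched in Python as a post-processing pass over UNION branches produced by rule expansion. This is an assumed formulation, not Scout's code: branches are lists of triple patterns, the helper names are hypothetical, variables are merged when a functional (or inverse functional) property forces them to be equal, and a branch is dropped when it requires one variable to belong to two disjoint classes.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Atom = Tuple[str, str, str]
RDF_TYPE = "rdf:type"


def is_var(t: str) -> bool:
    return t.startswith("?")


def pairwise_disjoint(classes: Iterable[str]) -> Set[FrozenSet[str]]:
    """Assert every pair of classes disjoint, e.g. for tables of one data source."""
    return {frozenset(pair) for pair in combinations(classes, 2)}


def unify_functional(cq: List[Atom], functional: Set[str],
                     inverse_functional: Set[str]) -> List[Atom]:
    """Merge variables that (inverse) functional properties force to be equal."""
    changed = True
    while changed:
        changed = False
        seen: Dict[Tuple[str, str, str], str] = {}
        for s, p, o in cq:
            if p in functional:            # same subject => same object
                key, val = ("F", p, s), o
            elif p in inverse_functional:  # same object => same subject
                key, val = ("IF", p, o), s
            else:
                continue
            prev = seen.setdefault(key, val)
            # Two different constants are left alone here; a stricter version
            # could drop the branch as inconsistent.
            if prev != val and (is_var(prev) or is_var(val)):
                old, new = (val, prev) if is_var(val) else (prev, val)
                cq = [tuple(new if t == old else t for t in atom) for atom in cq]
                changed = True
                break
    return list(dict.fromkeys(cq))  # drop duplicate atoms created by merging


def violates_disjointness(cq: List[Atom], disjoint: Set[FrozenSet[str]]) -> bool:
    types: Dict[str, Set[str]] = {}
    for s, p, o in cq:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    return any(frozenset(pair) in disjoint
               for classes in types.values()
               for pair in combinations(sorted(classes), 2))


def prune_union(union: List[List[Atom]], disjoint: Set[FrozenSet[str]],
                functional: Set[str], inverse_functional: Set[str]) -> List[List[Atom]]:
    """Simplify each UNION branch, then drop branches that need disjoint types."""
    out = []
    for cq in union:
        cq = unify_functional(cq, functional, inverse_functional)
        if not violates_disjointness(cq, disjoint):
            out.append(cq)
    return out
```

Merging runs before the disjointness check because unifying two variables can bring two type constraints onto the same variable.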

Independent Domain Ontology
• Because the domain ontology is defined independently of the data sources, it is not bound to the design decisions embodied in them
• As data sources are added to or removed from the system, the domain ontology can remain constant
• This is a key difference between Scout and other approaches

Practical Concerns
• New entities often have to be minted for the domain ontology
  • An additional SWRL builtin provides skolems
  • This results in extra processing in the query expansion stage
• In an RDBMS, negation is often meaningful
  • SPARQL can express negation using OPTIONAL query blocks with the BOUND filter operator
  • Restricting negation to leaf data source atoms keeps query rewriting valid
  • This has been a requirement for deploying this software in real situations
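Both concerns can be illustrated with a short Python sketch. `skolem` shows one assumed way a skolem builtin could mint stable IRIs (a hash of a rule ID and argument values under a hypothetical `urn:x-skolem:` prefix); it is not the actual SWRL builtin. `optional_not_bound` evaluates the SPARQL 1.0 `OPTIONAL` plus `FILTER(!BOUND(...))` idiom over lists of solution mappings, which behaves as an anti-join; the `dom:` vocabulary in the example query is hypothetical.

```python
import hashlib
from typing import Dict, List

Binding = Dict[str, str]


def skolem(rule_id: str, *args: str, base: str = "urn:x-skolem:") -> str:
    """Mint a deterministic IRI for a domain entity that no data source names.

    The same rule and argument values always yield the same IRI, so results
    of independently generated sub-queries can still join on it.
    """
    key = "\x1f".join((rule_id,) + args)  # unit separator avoids ambiguous concatenation
    return base + hashlib.sha1(key.encode("utf-8")).hexdigest()[:16]


# Negation as failure in SPARQL 1.0: people with no recorded manager.
# The OPTIONAL atom is meant to be a leaf data-source atom.
UNMANAGED_QUERY = """
PREFIX dom: <http://example.org/domain#>
SELECT ?p WHERE {
  ?p a dom:Person .
  OPTIONAL { ?p dom:hasManager ?m }
  FILTER (!BOUND(?m))
}
"""


def compatible(a: Binding, b: Binding) -> bool:
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def optional_not_bound(left: List[Binding], right: List[Binding], var: str) -> List[Binding]:
    """Evaluate `left OPTIONAL { right } FILTER(!BOUND(?var))` over solution lists."""
    out = []
    for row in left:
        extended = [{**row, **r} for r in right if compatible(row, r)]
        for sol in extended or [row]:  # OPTIONAL keeps the row unextended on no match
            if var not in sol:         # FILTER(!BOUND(?var))
                out.append(sol)
    return out
```

Only rows with no compatible optional match survive the filter, which is the same result an SQL `NOT EXISTS` subquery would give against a single source table.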

Performance
• Preprocessing a query takes milliseconds
• “Streaming” results means that answers start arriving quickly, even when there are many results
• The graphs within the SPARQL algebra are split individually by data source
  • This involves batching database queries
• The processing adds very little overhead over simply executing the queries
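Batching and streaming can be sketched with Python generators and the standard-library `sqlite3` module. The `person` table, its columns, and the batch size are hypothetical; the sketch only shows that each batch becomes a single `IN (...)` query and that rows are yielded as soon as each batch returns, so a consumer sees early answers before the whole result is ready.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Split an iterable into lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def stream_rows(conn: sqlite3.Connection, ids: Iterable[str],
                batch_size: int = 500) -> Iterator[Tuple[str, str]]:
    """Look up many keys with one IN (...) query per batch, yielding rows lazily.

    500 stays below SQLite's historical default limit of 999 bound parameters.
    """
    for chunk in batched(ids, batch_size):
        placeholders = ",".join("?" * len(chunk))
        sql = f"SELECT id, name FROM person WHERE id IN ({placeholders})"
        yield from conn.execute(sql, chunk)  # rows reach the caller batch by batch


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO person VALUES (?, ?)",
                     [(f"p{i}", f"Person {i}") for i in range(5)])
    for row in stream_rows(conn, [f"p{i}" for i in range(5)], batch_size=2):
        print(row)
```

Batching trades one round trip per key for one round trip per batch, while the generator keeps memory bounded by the batch size rather than the full result.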

SHARD
• SHARD is released open source
  • BSD license
• Look at:
  • My webpage (search for “SHARD krohloff”)
  • SourceForge (SHARD-3store)
• Use svn to get the code:
  svn co https://shard-3store.svn.sourceforge.net/svnroot/shard-3store shard-3store
  • Don’t worry, this command is on SourceForge!
• Happy to talk offline about cloud computing and SHARD
  • Use of SHARD, open-source projects, etc.

SHARD Design Overview
• Cloud-based triple store on HDFS
  • Method calls at client
  • Processing in the cloud via MapReduce jobs
  • Results moved to the local machine
• Massively scalable
  • Commodity hardware
• SPARQL queries
  • Optimized for complex queries with large response sets
• Basic inferencing
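As a rough illustration of how SPARQL processing can be phrased as MapReduce jobs, the Python sketch below joins two triple patterns on a shared variable using explicit map, shuffle, and reduce phases. It runs in memory and does not use Hadoop; SHARD's actual job structure may differ, and the function names and `ex:` data are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(pattern: Triple, triple: Triple) -> Optional[Binding]:
    b: Binding = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if b.setdefault(p, t) != t:  # a repeated variable must bind consistently
                return None
        elif p != t:
            return None
    return b


def map_phase(triples: Iterable[Triple], patterns: List[Triple],
              join_var: str) -> Iterator[Tuple[str, Tuple[int, Binding]]]:
    # Map: emit (value of the join variable, (pattern index, binding)).
    for t in triples:
        for i, pat in enumerate(patterns):
            b = match(pat, t)
            if b is not None:
                yield b[join_var], (i, b)


def shuffle(pairs: Iterable[Tuple[str, Tuple[int, Binding]]]) -> Dict[str, List[Tuple[int, Binding]]]:
    # Shuffle: group mapper output by key (the framework does this between phases).
    groups: Dict[str, List[Tuple[int, Binding]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[Tuple[int, Binding]]], n_patterns: int) -> Iterator[Binding]:
    # Reduce: for one join key, combine one binding from each pattern.
    for values in groups.values():
        per_pattern = [[b for i, b in values if i == j] for j in range(n_patterns)]
        for combo in product(*per_pattern):  # empty if some pattern has no match
            merged: Binding = {}
            for b in combo:
                merged.update(b)
            yield merged


def mapreduce_join(triples: Iterable[Triple], patterns: List[Triple], join_var: str) -> List[Binding]:
    # Assumes every pattern contains join_var and no other variable is shared.
    if not all(join_var in pat for pat in patterns):
        raise ValueError(f"{join_var} must appear in every pattern")
    return list(reduce_phase(shuffle(map_phase(triples, patterns, join_var)), len(patterns)))
```

In a cluster the shuffle is performed by the framework and each reduce group can be processed on a different machine; a query with more patterns would chain several such jobs.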

A Map-Reduce Implementation
• Open-source implementation of Google’s MapReduce technology
  • Developed from Google publications
  • VERY large-scale! http://hadoop.apache.org/
• Cloudera has great training material
  • Look for the VMware training virtual machine: http://www.cloudera.com/
• Baked-in robustness makes it practical…
