Knowledge Streams: Stream Processing of Semantic Web Content



  1. Knowledge Streams: Stream Processing of Semantic Web Content Mike Dean Principal Engineer Raytheon BBN Technologies mdean@bbn.com

  2. Assumptions • Technology – Intermediate • Familiarity with RDF and OWL • Interest in • Stream processing • Scalability

  3. Presenter Background • Principal Engineer at Raytheon BBN Technologies (1984-present) • Principal Investigator for DARPA Agent Markup Language (DAML) Integration and Transition (2000-2005) • Chaired the Joint US/EU Committee that developed DAML+OIL and SWRL • Developer and/or Principal Investigator for many Semantic Web tools, datasets, and applications (2000-present) • Member of the W3C RDF Core, Web Ontology, and Rule Interchange Format Working Groups • Co-editor of the W3C OWL Reference • Local co-chair for ISWC2009 • Other SemTech presentations • Semantic Query: Solving the Needs of a Net-Centric Data Sharing Environment (2007, w/ Matt Fisher) • Semantic Queries and Mediation in a RESTful Architecture (2008, w/ John Gilman and Matt Fisher) • Use of SWRL for Ontology Translation (2008) • Semantic Web @ BBN: Application to the Digital Whitewater Challenge (2009, w/ John Hebeler) • How is the Semantic Web Being Used? An Analysis of the Billion Triples Challenge Corpus (2009) • Finding a Good Ontology: The Open Ontology Repository Initiative (2010, w/ Peter Yim and Todd Schneider)

  4. Outline • Motivation • Vision • Building Blocks • Demonstration

  5. Motivations • Timeliness • Performance

  6. Timeliness • Streaming minimizes latency • Processing elements see events as they occur • Resources are expended only when an event occurs • This is in contrast to polling • Latency averages half the polling interval • Resources are expended on every poll • Popular web syndication mechanisms such as RSS and Atom involve polling
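The claim that polling latency averages half the polling interval can be checked with a small simulation. This is a minimal sketch, assuming events arrive at uniformly random times and polls happen at fixed multiples of the interval; the function names and the 60-second interval are illustrative and not from the presentation.

```python
import math
import random


def polling_latency(event_time: float, interval: float) -> float:
    """Delay between an event and the first poll at or after it.

    Assumes polls happen at times 0, interval, 2*interval, ...
    """
    next_poll = math.ceil(event_time / interval) * interval
    return next_poll - event_time


def mean_polling_latency(interval: float, n_events: int = 100_000, seed: int = 0) -> float:
    """Average latency over events placed uniformly at random in time."""
    rng = random.Random(seed)
    horizon = 1000 * interval  # simulate many polling cycles
    total = sum(
        polling_latency(rng.uniform(0, horizon), interval) for _ in range(n_events)
    )
    return total / n_events


if __name__ == "__main__":
    # With a 60-second polling interval the mean should be close to 30 seconds.
    print(round(mean_polling_latency(60.0), 1))
```

A streaming consumer, by contrast, sees each event as it occurs, so its latency does not depend on any polling interval.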

  7. Performance • Many Semantic Web tools provide streaming parsers rather than, or in addition to, model access • Analogous to XML SAX vs. DOM • For suitable applications, this can be 10x faster than loading all statements into memory or a KB
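The SAX-style approach can be illustrated with a statement-at-a-time reader that never builds an in-memory model. This is a minimal sketch using only the Python standard library, not the Jena ARP parser mentioned in the talk; it assumes simplified N-Triples input (one statement per line, no whitespace inside subject or predicate terms), and the sample data is made up.

```python
import io
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Hypothetical sample data in N-Triples syntax.
SAMPLE = """\
<http://example.org/a> <http://xmlns.com/foaf/0.1/name> "Alice Smith" .
<http://example.org/a> <http://xmlns.com/foaf/0.1/knows> <http://example.org/b> .
<http://example.org/b> <http://xmlns.com/foaf/0.1/name> "Bob" .
"""


def stream_triples(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (subject, predicate, object) one statement at a time."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith("."):
            line = line[:-1].rstrip()  # drop the statement terminator
        subject, predicate, obj = line.split(None, 2)
        yield subject, predicate, obj


def count_predicates(lines: Iterable[str]) -> Counter:
    """Aggregate over the stream; memory use depends on distinct predicates only."""
    counts: Counter = Counter()
    for _, predicate, _ in stream_triples(lines):
        counts[predicate] += 1
    return counts


if __name__ == "__main__":
    for predicate, n in count_predicates(io.StringIO(SAMPLE)).items():
        print(n, predicate)
```

Because each statement is discarded after it is counted, the same loop works on a file far larger than available memory.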

  8. 2 Streaming Stories • dumpont of OpenCyc (circa 2003) • HTML-based ontology visualization tool periodically bogged down daml.org server • Reimplementation using event-based Jena ARP parser yielded 10x performance and scalability improvements • Billion Triples Challenge 2009 • Streaming analysis of the 2009 corpus was performed at an overall rate of 103K statements/sec on a Mac laptop with a portable external disk • Compare to loading 10-20K statements/second on a server

  9. Stream Processing Examples • Unix pipes • Dataflow architectures • Streambase • IBM System S/InfoSphere Streams

  10. Vision: Knowledge Streams • Processing elements • Consume and produce subgraphs • Multiple functions may be combined • Persistent pipelines • Streams of statements comprising object subgraphs • URI naming allows drill-down • Provenance, timestamps [Diagram labels: Data Sources: Semantic Web, Sensor Network, Gazetteer, Imagery Database, Archive, Sensor, IM; Distribution and Processing Elements: aggregation, context, filter, augmentation, inference, distribution, correlation, persistent queries, translation, alerts, CEP, NLP, RSS; Users: Community of Interest 1, Community of Interest 2, User 1, User 2, User 3]
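Processing elements that consume and produce subgraphs, and that can be combined, might be sketched as chained generators. This is only an illustration under stated assumptions: a subgraph is modeled as a list of (subject, predicate, object) tuples, and the `http://example.org/` names, `filter_element`, `augment_element`, `has_type`, and `add_provenance` are all hypothetical, not an API from the presentation.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]
Subgraph = List[Triple]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace for the example


def filter_element(
    stream: Iterable[Subgraph], keep: Callable[[Subgraph], bool]
) -> Iterator[Subgraph]:
    """Pass through only the subgraphs accepted by `keep`."""
    for graph in stream:
        if keep(graph):
            yield graph


def augment_element(
    stream: Iterable[Subgraph], extra: Callable[[Subgraph], Subgraph]
) -> Iterator[Subgraph]:
    """Emit each subgraph with additional statements appended."""
    for graph in stream:
        yield graph + extra(graph)


def has_type(type_uri: str) -> Callable[[Subgraph], bool]:
    """Build a test for subgraphs containing an rdf:type statement."""
    return lambda g: any(p == RDF_TYPE and o == type_uri for _, p, o in g)


def add_provenance(source_uri: str) -> Callable[[Subgraph], Subgraph]:
    """Build an augmenter that tags every subject with its source."""
    def extra(g: Subgraph) -> Subgraph:
        subjects = sorted({s for s, _, _ in g})
        return [(s, EX + "source", source_uri) for s in subjects]
    return extra


source = [
    [(EX + "img1", RDF_TYPE, EX + "Image"), (EX + "img1", EX + "sensor", EX + "sat7")],
    [(EX + "note1", RDF_TYPE, EX + "Note")],
]

# Combine two functions into one pipeline: filter, then augment.
pipeline = augment_element(
    filter_element(iter(source), has_type(EX + "Image")),
    add_provenance(EX + "archive"),
)
results = list(pipeline)
```

Generators keep each element lazy, so a subgraph flows through the whole chain before the next one is read from the source.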

  11. Goals • Web-scale • Decentralized among multiple sites • Heterogeneous implementations • Long-lived, persistent connections • User accountability • Introspection over the processing network for control and optimization • E.g. aggregating subscriptions • Balance with security, privacy, and autonomy concerns

  12. Building Blocks • RDF Content • Existing stream processing frameworks • Workflow systems • Publish/subscribe message oriented middleware

  13. RDF Payloads • Malleable data • Standards-based graph structure • Can easily add, remove, and transform statements • Self-describing • Unique naming via URIs • References to vocabularies and ontologies • Potential for inference

  14. Workflow Systems • Graphical environments for developing processing pipelines • Yahoo Pipes, DERI Pipes, SPARQLMotion • Nice user interfaces for development and execution http://pipes.deri.org

  15. Semantic Complex Event Processing • Complex Event Processing • One of the leading edges of rules technology • Formal specification of higher-level events in terms of lower-level events • E.g. alert if the moving average increases 15% within a 10 minute window • Engine can be compiled/optimized for a specific rule set • High-volume deployments in finance and other industries • Most implementations focus on self-contained tuples • Semantic Complex Event Processing • Enrich CEP using Semantic Web technology • Emerging topic at recent conferences • Early implementations • Wrappers around open source CEP engines • Native implementation • Provides a powerful set of operators and engines for Knowledge Streams
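The example rule (alert if the moving average increases 15% within a 10 minute window) can be sketched in plain Python. This is one possible reading, not the semantics of any particular CEP engine: the moving average is taken over the last few samples (three here, an assumption), and an alert fires when the current average is at least 15% above the lowest average seen in the preceding 10 minutes. The class and parameter names are hypothetical.

```python
from collections import deque
from typing import Deque, Tuple


class MovingAverageRiseDetector:
    """Detects a relative rise of the moving average within a time window."""

    def __init__(self, samples_per_average: int = 3,
                 window_seconds: float = 600.0, rise: float = 0.15) -> None:
        self.values: Deque[float] = deque(maxlen=samples_per_average)
        self.averages: Deque[Tuple[float, float]] = deque()  # (timestamp, average)
        self.window_seconds = window_seconds
        self.rise = rise

    def observe(self, timestamp: float, value: float) -> bool:
        """Record one lower-level event; return True if the higher-level event fires."""
        self.values.append(value)
        average = sum(self.values) / len(self.values)
        # Forget averages that fall outside the time window.
        while self.averages and self.averages[0][0] < timestamp - self.window_seconds:
            self.averages.popleft()
        lowest = min((a for _, a in self.averages), default=average)
        self.averages.append((timestamp, average))
        return lowest > 0 and average >= lowest * (1 + self.rise)


detector = MovingAverageRiseDetector()
events = [(0, 100.0), (60, 100.0), (120, 100.0), (180, 130.0), (240, 140.0)]
alerts = [detector.observe(t, v) for t, v in events]
```

A production CEP engine would express this declaratively and compile it for the rule set; the sketch only shows how a higher-level event is derived from lower-level ones.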

  16. Implementation Approach • Well-defined APIs for implementing operators • Operator execution containers • Could encapsulate existing engines • Start with manual processing network configuration, then automate

  17. Use Cases • Dissemination of metadata for new satellite imagery • Social network changes • Alerting of friends’ new publications • …

  18. Demo • Processing using DERI Pipes with new operators • Ingest of #SemTechBiz tweets using Twitter Streaming API • Conversion of JSON to RDF • Mapping to SIOC vocabulary using SWRL rules • Enrich by matching Twitter @handles with contacts • Persistent buffering using Java Message Service • Monitoring
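The JSON-to-RDF step of the demo might look roughly like the following. This is a hedged sketch, not the demo's code: the demo mapped to SIOC using SWRL rules inside DERI Pipes, whereas this example hard-codes the mapping in Python. The tweet fields (`id_str`, `text`, `user.screen_name`) and the URI patterns are assumptions, and `tweet_to_ntriples` is a hypothetical helper.

```python
from typing import Dict, List

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SIOC = "http://rdfs.org/sioc/ns#"


def escape_literal(text: str) -> str:
    """Escape characters that may not appear raw in an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def tweet_to_ntriples(tweet: Dict) -> List[str]:
    """Convert one tweet (as parsed JSON) into N-Triples lines using SIOC terms."""
    screen_name = tweet["user"]["screen_name"]
    # Assumed URI patterns for the account and the post.
    user_uri = f"https://twitter.com/{screen_name}"
    post_uri = f"{user_uri}/status/{tweet['id_str']}"
    return [
        f"<{post_uri}> <{RDF_TYPE}> <{SIOC}Post> .",
        f'<{post_uri}> <{SIOC}content> "{escape_literal(tweet["text"])}" .',
        f"<{post_uri}> <{SIOC}has_creator> <{user_uri}> .",
        f"<{user_uri}> <{RDF_TYPE}> <{SIOC}UserAccount> .",
    ]


if __name__ == "__main__":
    sample = {"id_str": "1", "text": "Streams at #SemTechBiz", "user": {"screen_name": "alice"}}
    print("\n".join(tweet_to_ntriples(sample)))
```

Emitting one small subgraph per tweet keeps the output suitable for downstream elements that process statements as they arrive.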