
Scalable Integration and Processing of Linked Data

Learn about the LarKC platform for large-scale reasoning, including components, features, deployment, and benefits for users. Explore the infrastructure for storage, retrieval, communication, and distributed reasoning in this comprehensive guide.

jdunham




Presentation Transcript


  1. Scalable Integration and Processing of Linked Data Andreas Harth, Aidan Hogan, Spyros Kotoulas, Jacopo Urbani

  2. Outline
     • Session 1: Introduction to Linked Data
       • Foundations and Architectures
       • Crawling and Indexing
       • Querying
     • Session 2: Integrating Web Data with Reasoning
       • Introduction to RDFS/OWL on the Web
       • Introduction and Motivation for Reasoning
     • Session 3: Distributed Reasoning: Because Size Matters
       • Problems and Challenges
       • MapReduce and WebPIE
     • Session 4: Putting Things Together (Demo)
       • The LarKC Platform
       • Implementing a LarKC Workflow
     http://larkc.eu

  3. Session Outline

  4. Goals of LarKC
     LarKC = a platform for large-scale reasoning
     Quote from EU Project Officer: “LarKC's value is as an experimental platform. LarKC is an environment where people can go to replicate (or extend) their results in an environment where all the infrastructural heavy lifting has already been taken care of”

  5. Goals of LarKC
     LarKC = a platform for large-scale reasoning
     Quote from EU Reviewer: “Significant progress is sometimes made not by making something possible that was impossible before, but by substantially lowering the costs of something that was only possible before at high cost”

  6. What do we mean by: LarKC = a platform for large-scale reasoning
     • reusable components
     • reconfigurable workflows
     • provide infrastructure needed by all users:
       • storage & retrieval
       • registration of plugins
       • communication (plugin2datalayer, plugin2plugins)
       • synchronisation (anytime behaviour)
       • remote execution (abstracts from local/remote invocation)
       • remote data-access (abstracts from local/remote storage)
     • (will) provide:
       • instrumentation & measuring
       • caching
     • integration of very heterogeneous components:
       • heterogeneous data: unstructured text, (semi)structured data
       • heterogeneous code: Java, scripts, remote services ("wrap & integrate")

  7. What do we mean by: LarKC = a platform for large-scale reasoning
     • not only from raw large numbers:
       • from performant data-layer
       • from parallel deployment of plugins
       • from load-balancing strategies
       • …
     • but also from interaction of multiple components:
       • e.g. avoid reasoning through selection: SELECT + REASON
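
The SELECT + REASON idea on slide 7 can be illustrated with a toy sketch: first narrow the data to what a query touches, then run inference only over that selection. Everything below is an assumption made for illustration (the `ex:` triples, the `select` and `reason` helper names, and the choice of two RDFS rules); it is not the LarKC plug-in API.

```python
# Hypothetical toy data; identifiers are invented for this sketch.
TRIPLES = {
    ("ex:Dog", "rdfs:subClassOf", "ex:Mammal"),
    ("ex:Mammal", "rdfs:subClassOf", "ex:Animal"),
    ("ex:rex", "rdf:type", "ex:Dog"),
    ("ex:Car", "rdfs:subClassOf", "ex:Vehicle"),
    ("ex:herbie", "rdf:type", "ex:Car"),
}


def select(triples, seed):
    """SELECT step: keep only triples reachable from `seed` via subject -> object links."""
    selected, seen, frontier = set(), set(), {seed}
    while frontier:
        node = frontier.pop()
        seen.add(node)
        for s, p, o in triples:
            if s == node:
                selected.add((s, p, o))
                if o not in seen:
                    frontier.add(o)
    return selected


def reason(triples):
    """REASON step: naive fixpoint over subClassOf transitivity (rdfs11)
    and type inheritance (rdfs9)."""
    closure = set(triples)
    while True:
        new = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                if p2 == "rdfs:subClassOf" and o == s2:
                    if p in ("rdfs:subClassOf", "rdf:type"):
                        new.add((s, p, o2))
        if new <= closure:
            return closure
        closure |= new


selection = select(TRIPLES, "ex:rex")   # the vehicle triples are left out
inferred = reason(selection)            # the quadratic join runs on 3 triples, not 5
```

The reasoner here is deliberately naive; the point is only that its cost depends on the size of the selection rather than on the size of the full data set.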

  8. Overall approach of LarKC

  9. How to deploy LarKC

  10. Why would people (like you) want to use LarKC

  11. Simplified Framework But what about Flexibility, Modularity Scalability, Distribution?

  12. The LarKC Domain

  13. The LarKC Platform Architecture
      (Figure: the LarKC Platform, comprising the Plug-in Registry, LarKC RTE, Management Interface, Plug-in Managers and Data Layer. Storage Resources: RDF Store, RDF Doc. Computing Resources: User Desktop Machine, High-Performance Computer, Cloud Resource.)

  14. The LarKC Platform - Components
      • LarKC RTE: initialisation and invocation of workflows
      • Plug-in Registry: management of plug-ins
      • Mgmt Interface: workflow deployment
      • Plug-in Manager: plug-in execution
      • Data Layer: data management

  15. The LarKC Platform - Features (Plug-in Registry, LarKC RTE, Mgmt Interface)
      • Plug-in / workflow descriptions and plug-in parameters are in RDF
      • Separation of workflow specification and execution
      • Integration of various endpoints (e.g. SPARQL endpoint) and applications
      • Workflow branching, splits, merges

  16. Workflow Description
      _:i larkc:pluginTypeOf <urn:eu.larkc.plugin.identify.MyIdentifier> ;
          larkc:pluginConnectsTo _:t1 , _:t2 .
      _:d larkc:pluginTypeOf <urn:eu.larkc.plugin.decider.MyDecider> .
      _:t1 larkc:pluginTypeOf <urn:eu.larkc.plugin.transform.MyTransformer> ;
          larkc:pluginConnectsTo _:d .
      _:t2 larkc:pluginTypeOf <urn:eu.larkc.plugin.transform.MyTransformer> ;
          larkc:pluginConnectsTo _:d .
      _:e larkc:endpointType <urn:eu.larkc.endpoint.sparql> ;
          larkc:endpointConnectsTo _:d .
      (Figure: workflow diagram with nodes labelled SPARQL, Identifier, Transformer, Filter and Decider.)
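
To make the shape of the slide 16 workflow easier to see, the sketch below re-encodes its `larkc:pluginConnectsTo` statements as a plain Python dictionary and derives one valid invocation order plus the merge points. It assumes that `pluginConnectsTo` means data flows from subject to object; the dictionary encoding and the `execution_order` and `merge_points` helpers are hypothetical and not part of LarKC. The SPARQL endpoint `_:e` is left out because it attaches to the decider rather than taking part in the plug-in chain.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Plug-in types and connections copied from the workflow description on slide 16.
PLUGIN_TYPE = {
    "i": "urn:eu.larkc.plugin.identify.MyIdentifier",
    "t1": "urn:eu.larkc.plugin.transform.MyTransformer",
    "t2": "urn:eu.larkc.plugin.transform.MyTransformer",
    "d": "urn:eu.larkc.plugin.decider.MyDecider",
}
CONNECTS_TO = {"i": ["t1", "t2"], "t1": ["d"], "t2": ["d"], "d": []}


def execution_order(connects_to):
    """Return one order in which every plug-in runs after all of its inputs."""
    sorter = TopologicalSorter()
    for source, targets in connects_to.items():
        sorter.add(source)
        for target in targets:
            sorter.add(target, source)  # target depends on source
    return list(sorter.static_order())


def merge_points(connects_to):
    """Return plug-ins that receive input from more than one upstream plug-in."""
    incoming = {}
    for source, targets in connects_to.items():
        for target in targets:
            incoming.setdefault(target, []).append(source)
    return {node: srcs for node, srcs in incoming.items() if len(srcs) > 1}


order = execution_order(CONNECTS_TO)
merges = merge_points(CONNECTS_TO)
```

Read this way, the identifier `i` splits into the two transformers `t1` and `t2`, and both branches merge again at the decider `d`.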

  17. The LarKC Platform - Features (Plug-in Manager, Data Layer (API))
      • Plug-in (remote) execution, parallelisation support, anytime behaviour
      • Data caching, instrumentation and event processing
      • Data storage, data streaming, parallel request handling
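
The slides name anytime behaviour without defining it. One common reading is that a component publishes progressively better results and can be stopped at any point with the best result so far. The sketch below illustrates that reading with a generator; the `anytime_closure` and `best_within` names, and the use of a round budget in place of a wall-clock deadline, are assumptions for illustration and not LarKC code.

```python
from itertools import islice


def anytime_closure(edges):
    """Yield the growing set of known (a, b) reachability pairs after every round."""
    known = set(edges)
    yield frozenset(known)
    while True:
        derived = {(a, d) for a, b in known for c, d in known if b == c} - known
        if not derived:
            return  # fixpoint reached, nothing better to offer
        known |= derived
        yield frozenset(known)


def best_within(results, max_rounds):
    """Consume at most `max_rounds` intermediate results and keep the last one."""
    best = None
    for best in islice(results, max_rounds):
        pass
    return best


CHAIN = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
partial = best_within(anytime_closure(CHAIN), 2)     # interrupted early
complete = best_within(anytime_closure(CHAIN), 10)   # allowed to finish
```

An interrupted run still returns a sound but incomplete answer, which is the property a consumer of an anytime plug-in would rely on.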

  18. Distributed Execution Support

  19. Distribution: JavaGAT • Toolkit providing adapters to access remote resources • Enables the use of HPC clusters from within LarKC workflows • Causes additional overhead depending on network / resource settings

  20. Distribution: JEE Technology • Wrapping a plug-in into a Java servlet and deploying it to a servlet container (e.g., Tomcat) • Overhead relatively small in comparison to JavaGAT

  21. Parallelisation Support • Running multiple instances of the same plugin simultaneously • Implementation of parallelism in the concurrent regions of the plugin’s code
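
The first option on slide 21, running several instances of the same plug-in at once, can be sketched as data partitioning over a thread pool. The `UppercaseTransformer` class, its `invoke` method and the `run_in_parallel` helper are hypothetical stand-ins; LarKC plug-ins are Java components, and this sketch only mirrors the idea.

```python
from concurrent.futures import ThreadPoolExecutor


class UppercaseTransformer:
    """Hypothetical stand-in for a plug-in; one instance handles one chunk."""

    def invoke(self, statements):
        return [s.upper() for s in statements]


def run_in_parallel(plugin_factory, data, instances):
    """Split `data` into contiguous chunks and give each chunk to a fresh plug-in instance."""
    size = max(1, -(-len(data) // instances))  # ceiling division, at least 1
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=instances) as pool:
        # map() returns results in chunk order, so the input order is preserved
        parts = pool.map(lambda chunk: plugin_factory().invoke(chunk), chunks)
        return [item for part in parts for item in part]


result = run_in_parallel(UppercaseTransformer, ["a", "b", "c", "d", "e"], 2)
```

Creating a fresh instance per chunk avoids shared state between workers. The second option on the slide, parallelism inside the plug-in's own code, would instead keep a single instance and parallelise within `invoke`.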

  22. How it works…

  23. How it works…

  24. How it works…

  25. How it works…

  26. Outline
      • Session 1: Introduction to Linked Data
        • Foundations and Architectures
        • Crawling and Indexing
        • Querying
      • Session 2: Integrating Web Data with Reasoning
        • Introduction to RDFS/OWL on the Web
        • Introduction and Motivation for Reasoning
      • Session 3: Distributed Reasoning: Because Size Matters
        • Problems and Challenges
        • MapReduce and WebPIE
      • Session 4: Putting Things Together (Demo)
        • The LarKC Platform
        • Implementing a LarKC Workflow

  27. Conclusion

  28. See also • Linked data gathering • Slides, pointers etc at http://sild.cs.vu.nl/
