Learn about the LarKC platform for large-scale reasoning, including components, features, deployment, and benefits for users. Explore the infrastructure for storage, retrieval, communication, and distributed reasoning in this comprehensive guide.
Scalable Integration and Processing of Linked Data Andreas Harth, Aidan Hogan, Spyros Kotoulas, Jacopo Urbani
Outline
• Session 1: Introduction to Linked Data
  • Foundations and Architectures
  • Crawling and Indexing
  • Querying
• Session 2: Integrating Web Data with Reasoning
  • Introduction to RDFS/OWL on the Web
  • Introduction and Motivation for Reasoning
• Session 3: Distributed Reasoning: Because Size Matters
  • Problems and Challenges
  • MapReduce and WebPIE
• Session 4: Putting Things Together (Demo)
  • The LarKC Platform
  • Implementing a LarKC Workflow
http://larkc.eu
Goals of LarKC
LarKC = a platform for large-scale reasoning
Quote from EU Project Officer: “LarKC's value is as an experimental platform. LarKC is as an environment where people can go to replicate (or extend) their results in an environment where all the infrastructural heavy lifting has already been taken care of”
Goals of LarKC
LarKC = a platform for large-scale reasoning
Quote from EU Reviewer: “Significant progress is sometimes made not by making something possible that was impossible before, but by substantially lowering the costs of something that was only possible before at high cost”
What do we mean by: LarKC = a platform for large-scale reasoning
• reusable components
• reconfigurable workflows
• provide infrastructure needed by all users:
  • storage & retrieval
  • registration of plugins
  • communication (plugin2datalayer, plugin2plugins)
  • synchronisation (anytime behaviour)
  • remote execution (abstracts from local/remote invocation)
  • remote data-access (abstracts from local/remote storage)
• (will) provide:
  • instrumentation & measuring
  • caching
• integration of very heterogeneous components
  • heterogeneous data: unstructured text, (semi)structured data
  • heterogeneous code: Java, scripts, remote services ("wrap & integrate")
What do we mean by: LarKC = a platform for large-scale reasoning
• not only from raw large numbers
  • from performant data-layer
  • from parallel deployment of plugins
  • from load-balancing strategies
  • …
• but also from interaction of multiple components
  • e.g. avoid reasoning through selection: SELECT + REASON
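The SELECT + REASON idea can be illustrated with a minimal sketch. Nothing here is LarKC API: the toy triples and the `select` and `reason` functions are hypothetical. A selection step narrows the data to the triples relevant to the task, and only that subset is handed to a simple reasoner that computes a transitive closure.

```python
def select(triples, predicate):
    """Selection step: keep only triples that use the given predicate."""
    return [t for t in triples if t[1] == predicate]


def reason(triples):
    """Toy reasoner: transitive closure over (s, p, o) triples.

    Assumes every input triple uses the same transitive predicate.
    """
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        for (s1, p, o1) in list(closure):
            for (s2, _, o2) in list(closure):
                if o1 == s2 and (s1, p, o2) not in closure:
                    closure.add((s1, p, o2))
                    changed = True
    return closure


# Hypothetical toy data mixing schema and instance triples.
DATA = [
    ("ex:Cat", "rdfs:subClassOf", "ex:Mammal"),
    ("ex:Mammal", "rdfs:subClassOf", "ex:Animal"),
    ("ex:tom", "rdf:type", "ex:Cat"),
    ("ex:tom", "ex:name", "Tom"),
]

# Reason only over the selected subset instead of the full data set.
selected = select(DATA, "rdfs:subClassOf")
inferred = reason(selected) - set(selected)
```

The reasoner never sees the instance triples, so its cost depends on the size of the selection rather than on the size of the whole data set.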
Simplified Framework
But what about Flexibility, Modularity, Scalability, Distribution?
The LarKC Platform Architecture
[Diagram: the LarKC Platform (Plug-in Registry, LarKC RTE, Management Interface, Plug-in Managers, Data Layer) on top of Storage Resources (RDF Store, RDF Doc) and Computing Resources (User Desktop Machine, High-Performance Computer, Cloud Resource)]
The LarKC Platform - Components
• LarKC RTE: Initialisation and invocation of workflows
• Plug-in Registry: Management of plug-ins
• Mgmt Interface: Workflow deployment
• Plug-in Manager: Plug-in execution
• Data Layer: Data management
The LarKC Platform - Features
Plug-in Registry, LarKC RTE, Mgmt Interface:
• Plug-in / workflow descriptions and plug-in parameters are in RDF
• Separation of workflow specification and execution
• Integration of various endpoints (e.g. SPARQL endpoint) and applications
• Workflow branching, splits, merges
Workflow Description
_:i larkc:pluginTypeOf <urn:eu.larkc.plugin.identify.MyIdentifier> ;
    larkc:pluginConnectsTo _:t1 , _:t2 .
_:d larkc:pluginTypeOf <urn:eu.larkc.plugin.decider.MyDecider> .
_:t1 larkc:pluginTypeOf <urn:eu.larkc.plugin.transform.MyTransformer> ;
    larkc:pluginConnectsTo _:d .
_:t2 larkc:pluginTypeOf <urn:eu.larkc.plugin.transform.MyTransformer> ;
    larkc:pluginConnectsTo _:d .
_:e larkc:endpointType <urn:eu.larkc.endpoint.sparql> ;
    larkc:endpointConnectsTo _:d .
[Diagram labels: SPARQL, Transformer, Filter, Decider, Identifier, Filter, Transformer]
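To show how such a description separates specification from execution, here is a minimal sketch that derives one valid plug-in execution order from the connections stated above. It is not how the LarKC RTE schedules plug-ins; the dictionary, the short node names and `execution_order` are hypothetical, and the SPARQL endpoint node is left out because it is not a plug-in.

```python
from collections import deque

# Edges taken from the larkc:pluginConnectsTo statements in the workflow
# description; the short names stand for the blank nodes _:i, _:t1, _:t2, _:d.
CONNECTS_TO = {
    "i": ["t1", "t2"],  # identifier feeds both transformers (a split)
    "t1": ["d"],
    "t2": ["d"],        # both transformers feed the decider (a merge)
    "d": [],
}


def execution_order(graph):
    """Return one valid plug-in order using Kahn's topological sort."""
    indegree = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            indegree[target] += 1
    # Start with the plug-ins that have no incoming connection.
    ready = deque(sorted(n for n, deg in indegree.items() if deg == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for target in graph[node]:
            indegree[target] -= 1
            if indegree[target] == 0:
                ready.append(target)
    if len(order) != len(graph):
        raise ValueError("workflow contains a cycle")
    return order


order = execution_order(CONNECTS_TO)
```

Because `t1` and `t2` become ready at the same time, a runtime would be free to execute them in parallel before the merge at `d`.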
The LarKC Platform - Features
Plug-in Manager, Data Layer (API):
• Plug-in (remote) execution, parallelisation support, anytime behaviour
• Data caching, instrumentation and event processing
• Data storage, data streaming, parallel request handling
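Anytime behaviour can be pictured as a computation that exposes its best result so far, so that a caller may stop early. The sketch below is one assumed reading of the term and not LarKC code: `anytime_closure` and `run_with_budget` are hypothetical names, and a round budget replaces a wall-clock deadline to keep the example deterministic.

```python
def anytime_closure(edges):
    """Yield the known pairs after each round of transitive inference."""
    known = set(edges)
    while True:
        # Combine (a, b) and (b, d) into (a, d); keep only new pairs.
        new = {(a, d) for (a, b) in known for (c, d) in known if b == c} - known
        if not new:
            return  # fixpoint reached, nothing more to infer
        known |= new
        yield set(known)


def run_with_budget(edges, max_rounds):
    """Consume at most max_rounds intermediate results; return the last one."""
    best = set(edges)
    for rounds, result in enumerate(anytime_closure(edges), start=1):
        best = result
        if rounds >= max_rounds:
            break  # budget exhausted: settle for the partial answer
    return best


CHAIN = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
partial = run_with_budget(CHAIN, max_rounds=1)   # interrupted early
full = run_with_budget(CHAIN, max_rounds=10)     # runs to the fixpoint
```

The partial answer is a sound subset of the full one, which is the property that lets a caller trade completeness for response time.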
Distribution: JavaGAT
• Toolkit providing adapters to access remote resources
• Enables the use of HPC clusters from / within LarKC workflows
• Causes additional overhead depending on network / resource settings
Distribution: JEE Technology
• Wrapping a plug-in into a Java servlet and deploying it to a servlet container (e.g., Tomcat)
• Overhead relatively small in comparison to JavaGAT
Parallelisation Support
• Running multiple instances of the same plugin simultaneously
• Implementation of parallelism in the concurrent regions of the plugin’s code
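The first form of parallelisation, running several instances of the same plug-in at once, can be sketched as follows. This is an illustration under stated assumptions and not the LarKC plug-in manager: `transformer_plugin` and `run_parallel` are hypothetical, and a thread pool stands in for whatever execution resources the platform actually uses.

```python
from concurrent.futures import ThreadPoolExecutor


def transformer_plugin(chunk):
    """Hypothetical plug-in: normalise a chunk of labels."""
    return [label.strip().lower() for label in chunk]


def run_parallel(plugin, data, instances):
    """Split data into chunks and run one plug-in call per chunk."""
    size = max(1, -(-len(data) // instances))  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=instances) as pool:
        # map() returns results in the same order as the input chunks.
        results = list(pool.map(plugin, chunks))
    return [item for chunk in results for item in chunk]


labels = [" Cat", "DOG ", " Bird ", "fish"]
normalised = run_parallel(transformer_plugin, labels, instances=2)
```

This only works cleanly when the plug-in treats each chunk independently; plug-ins that share state would need the second form, parallelism inside the plug-in's own code.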
Outline
• Session 1: Introduction to Linked Data
  • Foundations and Architectures
  • Crawling and Indexing
  • Querying
• Session 2: Integrating Web Data with Reasoning
  • Introduction to RDFS/OWL on the Web
  • Introduction and Motivation for Reasoning
• Session 3: Distributed Reasoning: Because Size Matters
  • Problems and Challenges
  • MapReduce and WebPIE
• Session 4: Putting Things Together (Demo)
  • The LarKC Platform
  • Implementing a LarKC Workflow
See also • Linked data gathering • Slides, pointers etc at http://sild.cs.vu.nl/