A Proposal for a Distributed Query Execution Engine in a Grid Environment for CoDIMS. Gustavo Gaburro Trevisol, Alvaro C. P. Barbosa
Agenda • Introduction; • Data Integration Middleware Systems; • CoDIMS; • Concepts; • Proposal; • Conclusions.
Introduction • Growth in the volume of heterogeneous and distributed data; • Demand: the need for an integrated view of the data; • Solution: Data Integration Middleware Systems.
Data Integration Middleware • Provides a single, uniform and homogeneous view over distributed and heterogeneous data sources; • Developing data integration middleware systems is not a simple task, due to the complexity of: • Supporting multiple data models; • Semantic integration of data; • Query processing strategies; • Transaction control techniques.
Data Integration Middleware Systems • [Figure: layered architecture with an Application Layer, an Integration Layer (Middleware) containing operators such as Scan, Join, Project and others, and a Data Layer.]
CoDIMS (Configurable Data Integration Middleware System) • Environment for generating configurable data integration systems tailored to a specific application; • Characteristics: • Based on frameworks, components and web services; • Flexible and configurable; • Uses only the necessary, tailored components. • “What you need is only what you get” (wynwyg).
[Figure: example CoDIMS configurations: reading only data sources; updating data sources; incorporating a new component for a specific application.]
Concepts: Wrappers • Translate the native data model of each data source into the global (canonical) data model; • Handle communication with the data sources.
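The wrapper role described above can be illustrated with a minimal Python sketch. It assumes a CSV text source and a canonical model made of plain dictionaries; the class `CsvWrapper`, the field map and the column names are hypothetical and are not taken from CoDIMS.

```python
import csv
import io


class CsvWrapper:
    """Hypothetical wrapper: reads a CSV source and emits canonical rows."""

    def __init__(self, text, field_map):
        self.text = text            # stands in for communication with the source
        self.field_map = field_map  # native column name -> canonical attribute

    def scan(self):
        # Translate each native record into the canonical (global) model.
        for native in csv.DictReader(io.StringIO(self.text)):
            yield {canon: native[src] for src, canon in self.field_map.items()}


# Illustrative native data with source-specific column names.
source = "cod,nome\n1,Ana\n2,Bruno\n"
wrapper = CsvWrapper(source, {"cod": "id", "nome": "name"})
rows = list(wrapper.scan())
```

After the translation, every row uses the canonical attribute names `id` and `name`, so the rest of the system does not need to know the native column names.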
Concepts: MEC (Query Execution Engine) • Responsible for processing the PEC (Query Execution Plan); • A MEC is composed of operators and data types. • [Figure: MEC diagram with the labels PEC, Data and Result.]
Concepts: MEC • In a relational MEC, the operators come from relational algebra; • The operators are: Select; Project; Cartesian product; Natural Join.
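As a rough illustration of these four operators, the following Python sketch models a relation as a list of dictionaries. The function names and the sample data are assumptions made for this example; they do not reproduce the CoDIMS operator interfaces.

```python
from itertools import product


def select(rows, predicate):
    # Keep only the tuples that satisfy the predicate.
    return [r for r in rows if predicate(r)]


def project(rows, attrs):
    # Keep only the listed attributes; relational projection drops duplicates.
    seen, out = set(), []
    for r in rows:
        key = tuple(r[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def cartesian_product(left, right):
    # Assumes the two relations have disjoint attribute names.
    return [{**l, **r} for l, r in product(left, right)]


def natural_join(left, right):
    # Combine tuples that agree on every attribute name they share.
    out = []
    for l, r in product(left, right):
        if all(l[a] == r[a] for a in l.keys() & r.keys()):
            out.append({**l, **r})
    return out


# Illustrative relations.
emp = [{"id": 1, "dept": 10}, {"id": 2, "dept": 20}]
dept = [{"dept": 10, "dname": "Research"}]
joined = natural_join(emp, dept)
```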
Concepts: MEC • A relational algebra operation can be implemented using different algorithms; • Example: Natural Join can be implemented as: Merge Join; Nested Loop Join.
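To make the point concrete, here is a hedged Python sketch of two algorithms that compute the same equi-join on a single key. The function names, the `key` parameter and the sample relations are illustrative assumptions, and the merge join sorts its inputs first (a sort-merge join).

```python
def nested_loop_join(left, right, key):
    # Compare every left tuple with every right tuple: O(|left| * |right|).
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def merge_join(left, right, key):
    # Sort both inputs on the join key, then advance two cursors in step.
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right tuples that share this key value.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left tuple in the matching run with that right run.
            while i < len(left) and left[i][key] == lk:
                out.extend({**left[i], **r} for r in right[j:j_end])
                i += 1
            j = j_end
    return out


# Illustrative relations.
orders = [{"cid": 2, "item": "b"}, {"cid": 1, "item": "a"}, {"cid": 2, "item": "c"}]
customers = [{"cid": 1, "name": "Ana"}, {"cid": 2, "name": "Bruno"}, {"cid": 3, "name": "Caio"}]


def canon(rows):
    # Order-independent representation, used only to compare results.
    return sorted(tuple(sorted(r.items())) for r in rows)


same = canon(nested_loop_join(orders, customers, "cid")) == canon(merge_join(orders, customers, "cid"))
```

Both functions return the same set of tuples; they differ in cost and in the order of the output, which is the kind of choice a query execution engine can make per operator.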
Concepts: PEC (Query Execution Plan) • It defines the steps in which the operators are executed to process the query; • A PEC is composed of operators that the MEC can execute.
Concepts: PEC • The PEC structure is a tree.
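A tree-shaped plan can be sketched in Python as follows. This is only an illustration of the idea: the `PlanNode` class, its fields and the example query are hypothetical and are not the CoDIMS PEC format. Leaves scan data, inner nodes apply an operator to the results of their children.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PlanNode:
    name: str
    op: Callable[..., list]              # operator applied to the children's results
    children: List["PlanNode"] = field(default_factory=list)

    def execute(self):
        # Post-order traversal: evaluate the children first, then this operator.
        inputs = [child.execute() for child in self.children]
        return self.op(*inputs)


# Illustrative relations.
emp = [{"name": "Ana", "dept": 10}, {"name": "Bruno", "dept": 20}]
dept = [{"dept": 10, "dname": "Research"}, {"dept": 20, "dname": "Sales"}]


def join(left, right):
    return [{**l, **r} for l in left for r in right if l["dept"] == r["dept"]]


def project(rows):
    return [{"name": r["name"], "dname": r["dname"]} for r in rows]


plan = PlanNode("project", project, [
    PlanNode("join", join, [
        PlanNode("scan emp", lambda: list(emp)),
        PlanNode("scan dept", lambda: list(dept)),
    ]),
])
result = plan.execute()
```

Executing the root triggers the whole tree: both scans run first, the join combines them, and the projection produces the final result.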
Data Integration Systems • They can benefit from a Grid environment to increase performance by: • Sending wrappers and operators to Grid nodes for parallel execution; • Distributing the execution of sub-queries; • Distributing the execution of operations over sub-results.
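The following Python sketch illustrates the idea of running sub-queries in parallel and then integrating their sub-results. It is an assumption-laden stand-in: local threads play the role of Grid nodes, and the names `run_subquery`, `subqueries` and the sample data are hypothetical. No Grid middleware is involved.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subquery(subquery):
    # Each sub-query filters the rows of one source; on a Grid, this work
    # would be shipped to a remote node instead of a local thread.
    name, rows, predicate = subquery
    return name, [r for r in rows if predicate(r)]


# Illustrative sub-queries: (source name, rows, filter predicate).
subqueries = [
    ("patients", [{"id": 1, "age": 40}, {"id": 2, "age": 17}], lambda r: r["age"] >= 18),
    ("exams", [{"id": 1, "exam": "x-ray"}, {"id": 3, "exam": "mri"}], lambda r: True),
]

# Run the sub-queries concurrently and collect the sub-results by source name.
with ThreadPoolExecutor(max_workers=2) as pool:
    sub_results = dict(pool.map(run_subquery, subqueries))

# Integrate the sub-results with a join on the shared "id" attribute.
integrated = [
    {**p, **e}
    for p in sub_results["patients"]
    for e in sub_results["exams"]
    if p["id"] == e["id"]
]
```

In this sketch only the sub-queries run in parallel, and the integration step still happens in one place.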
Wrapper-Grid Layer [Biancardi 2005] • Incorporates a Wrapper-Grid layer into CoDIMS: • Distributes wrappers across Grid nodes, allowing parallel execution of the sub-queries; • Decreases the execution time of queries.
Restrictions: • All wrappers are allocated on all nodes, which makes maintenance difficult; • Execution is distributed only for sub-queries; • The sub-result of each sub-query is sent back to the MEC, so the sub-results are integrated sequentially.
Example: Query Execution • [Figure: query execution example; the only label recovered is "Application".]
Implementation • Grid environment: Globus Toolkit 3; • Development of a module that sends objects (wrappers and operators) for remote execution on Grid nodes; • Problems with release 4 of the Globus Toolkit; • Configuring the test environment; • Laboratory of Research in Networks and Multimedia (LPRM); • Java; • Apache SOAP.
Expected contributions • Distribute the integration of sub-results, reducing the load on the Query Processing Component (MEC); • Distribute and allocate operators in a Grid environment; • Dynamically send wrappers to Grid nodes; • Decrease the execution time of queries.
Future Work • Implement an optimizer for distributed queries; • Implement a dynamic scheduler that sends wrappers and operators to Grid nodes for execution; • Incorporate a module that executes queries using pipelining and semi-joins.
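For readers unfamiliar with the semi-join technique mentioned above, here is a minimal Python sketch of a semi-join reduction between two sites. The two "nodes" are just local lists, and the names `semi_join`, `node_a`, `node_b` and the sample data are assumptions for illustration, not part of the proposal.

```python
def semi_join(left, right_keys, key):
    """Keep the rows of `left` whose key value appears in `right_keys`."""
    return [r for r in left if r[key] in right_keys]


# Illustrative data held at two different sites.
node_a = [{"cid": 1, "name": "Ana"}, {"cid": 2, "name": "Bruno"}, {"cid": 3, "name": "Caio"}]
node_b = [{"cid": 2, "item": "book"}]

# Step 1: node B ships only its join-key values, not whole rows.
keys_from_b = {r["cid"] for r in node_b}

# Step 2: node A reduces its relation before shipping anything.
reduced_a = semi_join(node_a, keys_from_b, "cid")

# Step 3: the final join runs on the reduced relation.
joined = [{**a, **b} for a in reduced_a for b in node_b if a["cid"] == b["cid"]]
```

The benefit is that only the rows of `node_a` that can take part in the join travel over the network.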
Questions • E-mail: gtrevisol@inf.ufes.br • Web site: http://codims.lprm.inf.ufes.br/