
OLAP Query Processing in Grids

DMG 2007. OLAP Query Processing in Grids. Nelson Kotowski Federal University of Rio de Janeiro, Brazil Alexandre A. B. Lima University of Grande Rio, Brazil Esther Pacitti, Patrick Valduriez INRIA and University of Nantes, France Marta Mattoso Federal University of Rio de Janeiro, Brazil.


Presentation Transcript


  1. DMG 2007. OLAP Query Processing in Grids. Nelson Kotowski (Federal University of Rio de Janeiro, Brazil); Alexandre A. B. Lima (University of Grande Rio, Brazil); Esther Pacitti and Patrick Valduriez (INRIA and University of Nantes, France); Marta Mattoso (Federal University of Rio de Janeiro, Brazil)

  2. Agenda • OLAP in Grids • Database clusters • GParGRES • Preliminary experimental results • Conclusion

  3. OLAP using Grids • Problem: how to fulfill OLAP needs within the current grid software infrastructure? • Grid services? • Adapting database cluster techniques to grids? (Figure thanks to Peter Kacsuk and Gergely Sipos)

  4. Using Database Clusters in Grids [Figure: clients access a middleware layer on top of a PC cluster in which each node runs its own DBMS] • A sequential “black-box” DBMS runs at each node • The approach is based on database replication • The middleware coordinates parallel query execution • Applications and databases are easily migrated from sequential environments • Both inter- and intra-query parallelism can be exploited

  5. Inter-query Parallelism [Figure: four DBMS nodes (Node 1 to Node 4) serving queries Q1 to Q4] • Improves overall system throughput • Good for OLTP applications • Not adequate for OLAP
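
As a rough illustration of inter-query parallelism, the sketch below hands each whole query to a single node. The round-robin policy and the names `assign_queries`, `queries` and `nodes` are assumptions made for this example; the slides do not say how queries are assigned to nodes.

```python
from itertools import cycle


def assign_queries(queries, nodes):
    """Assign each whole query to one node, round-robin (assumed policy)."""
    assignment = {node: [] for node in nodes}
    # cycle() repeats the node list so any number of queries can be placed.
    for query, node in zip(queries, cycle(nodes)):
        assignment[node].append(query)
    return assignment


plan = assign_queries(["Q1", "Q2", "Q3", "Q4", "Q5"], ["node1", "node2"])
```

Each query still runs on one node only, so throughput improves but the running time of a single heavy query does not, which matches the slide's point that this form of parallelism is not adequate for OLAP.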

  6. Intra-query Parallelism [Figure: query Q1 is split by virtual partitioning into subqueries Q11 to Q14, one per node (Node 1 to Node 4); Q2, Q3 and Q4 are also shown] • Reduces individual query execution time • Required for high-performance OLAP
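
A minimal sketch of how virtual partitioning might derive one key interval per node, assuming an integer partitioning key with known minimum and maximum values. The function name `virtual_partitions` and the equal-width policy are assumptions for illustration, not the ParGRES implementation.

```python
def virtual_partitions(min_key, max_key, n_nodes):
    """Split the closed key range [min_key, max_key] into n_nodes
    half-open intervals [lo, hi) of near-equal width."""
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    total = max_key - min_key + 1
    # Integer arithmetic keeps the cut points exact and contiguous.
    bounds = [min_key + (total * i) // n_nodes for i in range(n_nodes + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


ranges = virtual_partitions(1, 100, 4)
```

Each `(lo, hi)` pair can then be used as the bounds of a range predicate of the form `key >= lo and key < hi`, so every node scans a disjoint slice of the same replicated table.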

  7. ParGRES
     • Database cluster middleware developed by our research group
     • Optimized for OLAP support
     • Provides inter- and intra-query parallelism
     • Offers high performance for heavy-weight query processing over large databases
       • using inexpensive components
       • in a non-intrusive way
         • Making no changes to database applications
         • Keeping the same DBMS
         • Keeping the same logical database schema
     • Shows super-linear speedup

  8. GParGRES

  9. GParGRES: a Database Grid Middleware
     • Middleware that provides
       • Transparent access to distributed databases in a grid
       • Intra-query parallelism during heavy-weight query processing
     • Based on ParGRES
       • Assumes that grid nodes are PC clusters running ParGRES instances
       • Intra-query parallelism is achieved through virtual partitioning
     • Two levels of query splitting
       • Grid-level splitting: implemented by GParGRES
       • Node-level splitting: implemented by ParGRES
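
The two levels of query splitting can be sketched as nested range splitting: the key range is first divided among clusters (grid level), and each cluster's share is divided again among its nodes (node level). The helper names and the equal split across clusters are assumptions for this example; the slides do not describe how GParGRES sizes each cluster's share.

```python
def split_range(lo, hi, parts):
    """Split the half-open range [lo, hi) into `parts` contiguous pieces."""
    width = hi - lo
    cuts = [lo + (width * i) // parts for i in range(parts + 1)]
    return list(zip(cuts[:-1], cuts[1:]))


def two_level_plan(lo, hi, nodes_per_cluster):
    """Grid-level split across clusters, then node-level split inside each."""
    grid_ranges = split_range(lo, hi, len(nodes_per_cluster))
    plan = {}
    for (cluster, n_nodes), (c_lo, c_hi) in zip(nodes_per_cluster.items(), grid_ranges):
        plan[cluster] = split_range(c_lo, c_hi, n_nodes)
    return plan


# Cluster names and node counts here are illustrative labels only.
plan = two_level_plan(0, 120, {"cluster_a": 2, "cluster_b": 3})
```

A real deployment would probably weight the grid-level split by cluster capacity instead of splitting evenly; that refinement is left out to keep the sketch short.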

  10. GParGRES: Architecture

  11. GParGRES: Architecture Concentrates metadata concerning GParGRES services, such as the state of each FS and DQS instance, and ParGRES execution in the nodes

  12. GParGRES: Architecture GParGRES entry point, responsible for creating new instances of DQS

  13. GParGRES: Architecture Manages global query execution. Receives the query and splits it into subqueries by using virtual partitioning to implement intra-query parallelism. It also performs final result composition

  14. GParGRES: Architecture Grid Local Query Service (GLQS) – local component responsible for receiving subqueries from DQS and passing them to the local ParGRES instance
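
The roles described on the architecture slides can be sketched as three small classes: an entry point that creates query managers, a manager that splits the work and composes the results, and a local service that forwards subqueries to its cluster. All class and method names below are hypothetical, the local engine is a stub callable, and dispatch is sequential for brevity (a real middleware would send subqueries in parallel).

```python
class LocalQueryService:
    """Plays the GLQS role: forwards a subquery range to the local engine."""

    def __init__(self, run_local):
        # run_local is a stand-in for the local ParGRES instance:
        # callable(lo, hi) -> dict mapping group key to partial count.
        self.run_local = run_local

    def execute(self, lo, hi):
        return self.run_local(lo, hi)


class GlobalQueryManager:
    """Plays the global query role: split, dispatch, compose."""

    def __init__(self, local_services):
        self.local_services = local_services

    def run(self, lo, hi):
        n = len(self.local_services)
        cuts = [lo + ((hi - lo) * i) // n for i in range(n + 1)]
        totals = {}
        for svc, c_lo, c_hi in zip(self.local_services, cuts, cuts[1:]):
            # Partial counts from each cluster are added per group key.
            for key, count in svc.execute(c_lo, c_hi).items():
                totals[key] = totals.get(key, 0) + count
        return totals


class EntryPoint:
    """Plays the entry-point role: creates a new manager per query."""

    def __init__(self, local_services):
        self.local_services = local_services

    def new_manager(self):
        return GlobalQueryManager(self.local_services)


stub = lambda lo, hi: {"rows": hi - lo}  # hypothetical local engine
entry = EntryPoint([LocalQueryService(stub), LocalQueryService(stub)])
result = entry.new_manager().run(0, 10)
```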

  15. GParGRES: Architecture

  16. GParGRES: a Database Grid Middleware

  17. GParGRES: a Database Grid Middleware

  18. GParGRES: a Database Grid Middleware

  19. GParGRES: a Database Grid Middleware

  20. GParGRES: a Database Grid Middleware select o_orderpriority, count(*) from orders where o_orderdate >= date '1993-07-01' group by o_orderpriority;

  21. GParGRES: a Database Grid Middleware create table temp_result_1 ( o_orderpriority varchar(15), order_count integer);

  22. GParGRES: a Database Grid Middleware select o_orderpriority, count(*) from orders where o_orderdate >= date '1993-07-01' and o_orderkey >= ? and o_orderkey < ? group by o_orderpriority;
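
A minimal sketch of binding key intervals to the two `?` placeholders of the subquery above, using Python's built-in `sqlite3` module in place of PostgreSQL. The sample rows are invented for illustration, and the date literal is written as a plain ISO string because SQLite does not accept the `date '...'` syntax.

```python
import sqlite3

# Same shape as the slide's subquery, adapted for SQLite (assumption).
SUBQUERY = """
select o_orderpriority, count(*)
from orders
where o_orderdate >= '1993-07-01'
  and o_orderkey >= ? and o_orderkey < ?
group by o_orderpriority
"""


def run_subquery(conn, lo, hi):
    """Run the range-restricted subquery for the half-open interval [lo, hi)."""
    return conn.execute(SUBQUERY, (lo, hi)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table orders (o_orderkey integer, o_orderdate text, o_orderpriority text)"
)
conn.executemany(
    "insert into orders values (?,?,?)",
    [
        (1, "1993-08-01", "1-URGENT"),
        (2, "1993-06-01", "1-URGENT"),  # filtered out by the date predicate
        (3, "1994-01-15", "2-HIGH"),
        (4, "1995-03-02", "1-URGENT"),
    ],
)

part_a = run_subquery(conn, 1, 3)  # order keys 1 and 2
part_b = run_subquery(conn, 3, 5)  # order keys 3 and 4
```

Because the intervals are half-open and contiguous, every order key falls into exactly one subquery, so no row is counted twice.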

  23. GParGRES: a Database Grid Middleware

  24. GParGRES: a Database Grid Middleware

  25. GParGRES: a Database Grid Middleware

  26. GParGRES: a Database Grid Middleware insert into temp_result_1 values (?,?);

  27. GParGRES: a Database Grid Middleware select o_orderpriority, sum(order_count) from temp_result_1 group by o_orderpriority;
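
The final composition step can be sketched as follows: partial results from each subquery are inserted into a temporary table, and the composition query sums the partial counts per group. The sketch uses `sqlite3` with simplified column types; the `order by` clause is added only to make the output order deterministic, and the partial results are invented sample data.

```python
import sqlite3


def compose(partials):
    """Merge per-partition (priority, count) rows into final counts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table temp_result_1 (o_orderpriority text, order_count integer)"
    )
    for rows in partials:
        conn.executemany("insert into temp_result_1 values (?,?)", rows)
    result = conn.execute(
        "select o_orderpriority, sum(order_count) from temp_result_1 "
        "group by o_orderpriority order by o_orderpriority"
    ).fetchall()
    conn.close()
    return result


final = compose([[("1-URGENT", 1)], [("1-URGENT", 1), ("2-HIGH", 1)]])
```

Summing works here because `count(*)` over disjoint partitions is additive; an aggregate such as `avg` would need its sum and count carried separately before the final division.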

  28. GParGRES: a Database Grid Middleware

  29. GParGRES: Preliminary Experimental Results
      • A preliminary GParGRES prototype has been implemented in Java
        • Simple versions of DQS and GLQS (using ParGRES components) were implemented
      • Experimental setup
        • Two clusters from Grid’5000
          • Parasol cluster: 64 nodes, each with 2 Opteron 2.2GHz CPUs, 2GB RAM and 73GB HD
          • Paraquad cluster: 64 nodes, each with 2 Dual Core Xeon 2.33GHz CPUs, 4GB RAM and 160GB HD
        • Kadeploy
          • Generates customized images of operating systems and applications
        • PostgreSQL 8.2.4
        • ParGRES
        • TPC-H database and queries
          • Scale factor (SF) = 1

  30. GParGRES: Preliminary Experimental Results (cont.) • Two kinds of experiments • Isolated clusters • Mixed Configuration

  31. GParGRES: Preliminary Experimental Results (cont.) • Isolated cluster - Parasol

  32. GParGRES: Preliminary Experimental Results (cont.) • Isolated cluster - Paraquad

  33. GParGRES: Preliminary Experimental Results (cont.) • Mixed Configuration

  34. GParGRES: Implementation Issues
      • Goals
        • To implement all components as grid services
        • WSRF-compliant components: RS, FS and GLQS
        • When running in a grid managed by Globus Toolkit 4, RS can be implemented by the Web Service Monitoring and Discovery Service (WS MDS)
        • Techniques employed in OGSA-DAI will help implement some components (e.g. FS)

  35. Related Work
      • OGSA-DAI: Open Grid Services Architecture, Data Access and Integration
      • OGSA-DQP: Open Grid Services Architecture, Distributed Query Processing
      • New data models for grid warehouses
        • Wehrle et al. propose a data model for distributing and querying a data warehouse in computing grids
        • The warehouse is formed by data “chunks”
        • Special structures are needed (e.g. X-Tree)

  36. Conclusion
      • GParGRES is a grid service for OLAP query processing
      • It provides transparent inter- and intra-query processing with
        • No need for application migration
        • No need for database schema migration
        • DBMS independence
      • GParGRES explores successful techniques implemented in ParGRES
      • Two levels of query splitting
        • Grid-level splitting: implemented by GParGRES
        • Node-level splitting: implemented by ParGRES
      • Components are WSRF-compliant, easing compatibility with existing grid solutions
      • Preliminary results obtained in Grid’5000 show good performance

  37. Future Work • Integration with OGSA-DAI • Support for partial database replication • Support for top-k queries • Extension of best position algorithms

  38. DMG 2007. Thanks! A different view of the Grid: Kandinsky, the Grid, 1923, Albertina Museum, Vienna
