PESC: Programa de Engenharia de Sistemas e Computação
FEDERAL UNIVERSITY OF RIO DE JANEIRO
Spatial Query Broker in a Grid Environment
Author: Wladimir S. Meyer
Advisors: Jano M. Souza, Milton R. Ramirez
Outline
• Motivation and Goal
• The Problem
• Related Work
• The Proposal
• SQB Architecture
• Preliminary Tests
• Remarks
Motivation
• The dissemination of GIS is increasing quickly, helped by the improvement in channel bandwidth, and the interactions between data producers and consumers are becoming more frequent, complex and dynamic.
• Some hot points in these relationships:
  • Huge amounts of data spread over many different geographic places
  • Complexity of spatial data
  • Demand for sophisticated services delivered over the web
  • The high price that shared resources may have in some federations (CPU time, storage space, ...)
  • Integration problems (many levels of heterogeneity)
Distributed spatial operations, and methods to improve their efficiency, play an important role in this context. There are many works on spatial operations in a centralized context, but few in a distributed context. The Grid computing paradigm brings together many characteristics that can improve the execution of distributed spatial operations.
Goal
This work aims to improve the efficiency of the distributed spatial join by means of an architecture that allows non-specialized computers to be allocated to the execution of the operation, reducing the overall response time.
The focus is on the spatial join because it is a very common operation in GIS and has a high processing cost.
The architecture also offers the conditions to experiment with new algorithms (filter/refine, scheduler, ...).
The Problem
How can a spatial join be performed over a pool of data providers that share a huge amount of spatial data, so that the response time stays below a limit stated by some quality criteria?
The data fragmentation may be spatial and/or thematic (i.e., a hybrid schema), and there are local spatial indexes on each dataset.
This scenario could be depicted by a pool of regional governmental agencies responsible for generating cartographic data, offering query services that run over their data by means of the internet.
• Themes related to:
  • Transport
  • Hydrography
  • Infrastructure, ...
Related Work
• Many important works in spatial query processing are related to the filter/refine strategy [5]. Some of them are mentioned below:
  • Multi-step processing of spatial joins, Brinkhoff et al. [6]
  • Raster signatures in spatial joins (4CRS), Zimbrão et al. [30]
  • Multi-step with remote indexes (MR2), Ramirez and Souza [26]
• On the other hand, the execution of the query plan in a distributed context may emphasize parallelism as a way to reduce the overall response time:
  • MR2, Ramirez [26]
  • Grid Greedy Node, Porto et al. [25]
  • OGSA-DQP, Smith et al. [27]
• Some of these strategies need a scheduler module to guarantee an adequate load balance among the selected local SDBMSs.
The Proposal
In this work, the grid's ability to offer resources on demand is used to reduce the overall response time of distributed spatial join operations in databases. The parallelism in previous works involves only the nodes that store the spatial data mentioned in the query. Our proposal is to also involve generic computational resources in the most expensive step of the filter/refine strategy: the exact geometry processing.
[Figure: Multi-step filter/refine strategy [6]. Dataset 1 and Dataset 2 go through the filtering step on the SDBMSs (MBR join, then geometric filtering); the exact processing step runs on generic computational resources and produces the results.]
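The two-step idea can be illustrated with a minimal sketch. It is not the authors' implementation: the filtering step here is a plain nested-loop MBR test (no R-Tree and no 4CRS raster signature), the exact step assumes convex polygons and uses the separating axis theorem, and all function names are hypothetical. Polygons that merely touch are reported as intersecting.

```python
from itertools import product

def mbr(poly):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of [(x, y), ...]."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return min(xs), min(ys), max(xs), max(ys)

def mbrs_intersect(a, b):
    """Cheap filter test on two rectangles."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def convex_intersect(p, q):
    """Exact test, valid for CONVEX polygons only (separating axis theorem)."""
    for poly in (p, q):
        for i in range(len(poly)):
            (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
            nx, ny = y1 - y2, x2 - x1  # normal vector of the current edge
            proj_p = [nx * x + ny * y for x, y in p]
            proj_q = [nx * x + ny * y for x, y in q]
            if max(proj_p) < min(proj_q) or max(proj_q) < min(proj_p):
                return False  # a separating axis exists
    return True

def filter_step(r, s):
    """Filtering step: id pairs whose MBRs intersect (candidates only)."""
    r_boxes = {i: mbr(g) for i, g in r.items()}
    s_boxes = {j: mbr(g) for j, g in s.items()}
    return [(i, j) for i, j in product(r, s)
            if mbrs_intersect(r_boxes[i], s_boxes[j])]

def refine_step(candidates, r, s):
    """Exact processing step: the expensive part, a candidate for generic CEs."""
    return [(i, j) for i, j in candidates if convex_intersect(r[i], s[j])]

if __name__ == "__main__":
    r = {1: [(0, 0), (4, 0), (0, 4)]}
    s = {10: [(3, 3), (5, 3), (3, 5)],      # MBRs overlap, geometries do not
         11: [(1, 1), (2, 1), (1, 2)],      # true hit
         12: [(10, 10), (11, 10), (10, 11)]}  # rejected by the MBR filter
    candidates = filter_step(r, s)
    print(candidates, refine_step(candidates, r, s))
```

Because each candidate pair is tested independently in `refine_step`, the candidate list can be cut into chunks and handed to any machine that holds the two geometries, which is the property the proposal relies on.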
The Proposal
The following picture gives an overview of the context:
[Figure: context overview. Labels: "Receives the global query and checks the user rights", auxiliary services, specialized CEs, generic CEs, meta-schedulers.]
The Proposal
A specialized meta-scheduler, named Spatial Query Broker (SQB), is proposed to handle all spatial query processing, in a way similar to conventional Resource Brokers in grid environments.
SQB Architecture
The SQB is composed of the following modules, listed here by responsibility:
• Manages all data flow and the sequence of events
• Finds the data providers that store the needed data and acquires the status of the CEs
• Manages the exact geometry step over the CEs
• Analyzes and simplifies the query
• Selects the SDBMSs and manages the filtering steps
• Delivers information about resources and data partitioning
• Is the interface with the CEs, submitting and monitoring tasks
[Figure label: Resources shared by organizations]
SQB Architecture
[Figure: For a region r, the SDBMSs holding Theme 1 (T1) and Theme 2 (T2) run the MBR + geometric filtering (MBRs + 4CRS), in steps managed by the optimizer. The inconclusive pairs and some positive hits (ids + number of vertices) reach the Execution Monitor. Other labels: SDBMS, CE1, CE2.]
SQB Architecture
• The Execution Monitor builds two queues to store the inconclusive pairs in order to deliver them to the CEs.
• One queue is shared among the faster CEs, the other among the slower ones.
• The total number of vertices is adopted as an indicator of the complexity of the processing.
• A throughput indicator is collected beforehand from the CEs and registered in the information server (MDS).
It is not necessary to sort the pairs.
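The two-queue scheme can be sketched as follows. The slides do not state how pairs are assigned to a queue or how CEs are classified, so both rules below are assumptions made only for illustration: a pair goes to the fast queue when its vertex count is at least the mean, and a CE counts as fast when its throughput is at least the mean. The function name and data layout are hypothetical as well.

```python
from collections import deque

def build_queues(pairs, throughput):
    """
    pairs: list of (id_r, id_s, n_vertices) inconclusive pairs.
    throughput: dict mapping a CE name to its throughput indicator
                (higher means faster), as it would be read from the MDS.
    Returns (fast_queue, slow_queue, fast_ces, slow_ces).
    """
    fast_q, slow_q = deque(), deque()
    if not pairs or not throughput:
        return fast_q, slow_q, [], []

    # ASSUMPTION: CEs at or above the mean throughput are "fast".
    mean_tp = sum(throughput.values()) / len(throughput)
    fast_ces = [ce for ce, tp in throughput.items() if tp >= mean_tp]
    slow_ces = [ce for ce, tp in throughput.items() if tp < mean_tp]

    # ASSUMPTION: pairs at or above the mean vertex count are "heavy"
    # and go to the queue served by the fast CEs.
    mean_vertices = sum(p[2] for p in pairs) / len(pairs)
    for pair in pairs:  # one pass over the pairs, no sorting needed
        (fast_q if pair[2] >= mean_vertices else slow_q).append(pair)
    return fast_q, slow_q, fast_ces, slow_ces

if __name__ == "__main__":
    pairs = [(1, 10, 6), (2, 11, 300), (3, 12, 12), (4, 13, 90)]
    ces = {"ce1": 10.0, "ce2": 2.0, "ce3": 9.0}
    fast_q, slow_q, fast_ces, slow_ces = build_queues(pairs, ces)
    print(list(fast_q), fast_ces)
    print(list(slow_q), slow_ces)
```

A threshold computed in one pass keeps the cost linear in the number of pairs, which is consistent with the remark that the pairs do not have to be sorted.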
SQB Architecture
[Figure: Simplified sequence diagram with the phases: prepare query, acquire information, filtering step, refining step.]
Preliminary Tests
Although the prototype is still under construction, a few tests were run with synthetic spatial datasets consisting of polygons, to obtain some relative parameters to guide our work on spatial joins among polygons (overlap predicate).
The spatial join operations were performed on servers that have both datasets indexed by R-Trees. The original datasets were partitioned into four and into nine regular parts, and the response time (RT) in each situation was measured:
$RT = T_{MSG} \cdot \#\text{messages} + T_{TX} \cdot \#\text{bytes} + T_{CPU} + T_{I/O}$
• Objects that cross boundaries were replicated in the datasets involved (they were not split).
• The tests were executed in three situations:
  • The whole query at once in a single SDBMS
  • The query over the same region broken into four parts and executed by four identical machines
  • The query over the same region broken into nine parts and executed by nine identical machines
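The regular partitioning with replication described above can be sketched as below. This is only an illustration under assumptions: objects are represented by their MBRs, the grid is n x n over a known extent (n = 2 gives four parts, n = 3 gives nine), and the function names are hypothetical. An object whose MBR reaches a cell boundary is copied to every cell it reaches, so the partial join results have to be merged with duplicates removed.

```python
def cells_for_mbr(box, extent, n):
    """Cells (row, col) of an n x n regular grid over `extent` reached by `box`.

    box and extent are (xmin, ymin, xmax, ymax).
    """
    xmin, ymin, xmax, ymax = extent
    width = (xmax - xmin) / n
    height = (ymax - ymin) / n

    def index(value, low, size):
        # Clamp so that objects on the outer border stay inside the grid.
        return min(n - 1, max(0, int((value - low) // size)))

    c0, c1 = index(box[0], xmin, width), index(box[2], xmin, width)
    r0, r1 = index(box[1], ymin, height), index(box[3], ymin, height)
    return [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]

def partition(boxes, extent, n):
    """Assign each object id to every grid cell its MBR reaches."""
    parts = {(r, c): [] for r in range(n) for c in range(n)}
    for oid, box in boxes.items():
        for cell in cells_for_mbr(box, extent, n):
            parts[cell].append(oid)  # replicated, not split
    return parts

def merge_results(partial_results):
    """Union of per-partition join results, dropping duplicated pairs."""
    merged = set()
    for result in partial_results:
        merged.update(result)
    return sorted(merged)

if __name__ == "__main__":
    boxes = {"A": (10, 10, 20, 20),   # inside one cell
             "B": (40, 10, 60, 20),   # crosses a vertical boundary
             "C": (45, 45, 55, 55)}   # crosses both boundaries
    for cell, ids in partition(boxes, (0, 0, 100, 100), 2).items():
        print(cell, ids)
```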
Preliminary Tests
[Figure: Theme 1 and Theme 2 partitioned into four regions (NW, NE, SW, SE) and into nine regions (1 to 9).]
Preliminary Tests
[Figure: response-time chart with the components I/O, Comm and CPU; data labels 10992 and 11009.]
This operation is CPU bound, and the communication cost has a low impact on the final response time.
$RT = T_{MSG} \cdot \#\text{messages} + T_{TX} \cdot \#\text{bytes} + T_{CPU} + T_{I/O} + T_{\text{remove replicas}}$
* Communication cost based on a 256 kbps bandwidth
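The response-time formula can be evaluated with a small helper such as the one below. It is a sketch under assumptions: the per-byte transfer time is derived from the bandwidth as 8 bits per byte divided by bits per second, "256 kbps" is read as 256000 bit/s, and every input value in the usage example is invented for illustration, not a measurement from the tests.

```python
def response_time(n_messages, n_bytes, t_cpu, t_io, t_msg,
                  bandwidth_bps, t_remove_replicas=0.0):
    """RT = T_MSG * #messages + T_TX * #bytes + T_CPU + T_I/O + T_remove_replicas.

    All times are in seconds. t_msg is the fixed cost per message and
    T_TX is the time to transfer one byte at the given bandwidth.
    """
    t_tx = 8.0 / bandwidth_bps  # seconds per byte (ASSUMPTION: 8 bits per byte)
    return (t_msg * n_messages + t_tx * n_bytes
            + t_cpu + t_io + t_remove_replicas)

if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show how the terms combine.
    rt = response_time(n_messages=10, n_bytes=32_000, t_cpu=5.0, t_io=1.0,
                       t_msg=0.05, bandwidth_bps=256_000)
    print(f"estimated response time: {rt:.2f} s")
```

With a CPU term that dominates the others, as in this made-up example, changes in bandwidth barely move the total, which is the sense in which the slide calls the operation CPU bound.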
Preliminary Tests
[Figure: chart over the number of servers (1, 4, 9).]
The processing cost and the communication cost tend to reach the same order of magnitude as the number of servers increases.
The superlinear speedup means, in this case, that the computational resources available in a single machine were insufficient to reach a good response time.
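For reference, the usual definition of speedup in terms of the response time can be written as below. The notation $RT(n)$ for the response time with $n$ servers is an assumption introduced here; the slides do not define it.

```latex
S(n) = \frac{RT(1)}{RT(n)}, \qquad
E(n) = \frac{S(n)}{n}, \qquad
\text{superlinear speedup} \iff S(n) > n \iff E(n) > 1
```

Under this definition, a superlinear result with four or nine servers means that each part ran more than four or nine times faster than the whole query on one machine, which is consistent with a single machine being short of some resource for the full dataset.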
Test conditions
• The preliminary tests were executed under the following conditions:
  • Spatial database: Secondo
  • Grid middleware: Globus GT4
  • Datasets: two datasets composed of 10060 triangles, indexed
  • Hardware: Sempron 2800, 1 GB RAM, 80 GB HD
  • OS: Fedora Linux
• The overall architecture is under construction and is based on web services (WSRF).
Remarks
• This work presents an architecture based on grid infrastructure, tailored to cover some needs of a distributed geographic information system.
• The focus was on offering a strategy to execute spatial queries over spatial databases managed by several organizations gathered in a federation.
• The filter/refine approach was adopted, making use of the spatial indexes that already exist on the datasets.
• A global ID structure must be proposed in order to:
  • Easily reduce the repeated processing of objects that cross boundaries after the filtering step (avoiding moving them unnecessarily to the CEs)
  • Isolate the processing in the SQB from local IDs, improving scalability
• Next steps:
  • Specify new cost models to help the optimizer and the scheduler, taking into account the dynamics of the environment
  • Research the scheduling process in order to improve the reliability of the architecture
  • Compare the response time of a join executed over a benchmark dataset with that of the same join executed in similar distributed environments
References
1. Adzigogov, L., Soldatos, J., and Polymenakos, L. (2005). "EMPEROR: An OGSA Grid Meta-Scheduler based on Dynamic Resource." Journal of Grid Computing, 3, 19-37.
2. Afgan, E. (2004). "Role of the Resource Broker in the Grid." ACM, Huntsville, Alabama, USA.
3. Andretto, P. et al. (2004). "Practical approaches to Grid workload and resource management in the EGEE project."
4. Azevedo, L. G., Monteiro, R. S., Zimbrão, G., and Souza, J. M. (2004). "Approximate Spatial Query Processing Using Raster Signature."
5. Brinkhoff, T., Kriegel, H., and Seeger, B. (1993). "Efficient Processing of Spatial Joins Using R-Trees." In: Proceedings of the 1993 ACM SIGMOD, Washington, DC.
6. Brinkhoff, T., Kriegel, H., and Schneider, R. (1994). "Multi-Step Processing of Spatial Joins." Washington, DC, USA, 237-246.
7. Buyya, R., and Venugopal, S. (2004). "The Gridbus Toolkit for Service Oriented Grid and Utility Computing: An Overview and Status Report."
8. Câmara, G., and Queiroz, G. (2002). "GeoBR: Intercâmbio Sintático e Semântico de Dados Espaciais."
9. Di, L., Chen, A., Yang, W., and Zhao, P. (2003). "The Integration of Grid Technology with OGC Web Services (OWS) in NWGISS for NASA EOS Data."
10. EGEE. (2006). "GLite - Installation and Configuration Guide v 3.0 (rev 2)." European Union.
11. Egenhofer, M. J., and Herring, J. R. (1994). "Categorizing Binary Topological Relations Between Regions, Lines and Point in Geographical Databases." NCGIA.
12. "Globus Toolkit 4." (2005). www.gridbus.org/escience/051205GlobusTutorialeScience.ppt, July/2006.
13. Foster, I., and Kesselman, C. (1999). "Computational grids." The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann.
14. Foster, I., Kesselman, C., and Tuecke, S. (2001). "The Anatomy of the Grid Enabling Scalable Virtual Organizations." Lecture Notes in Computer Science, 2150.
15. Gustafson, J. L. (1990). "Fixed Time, Tiered Memory, and Superlinear Speedup."
16. GridWay Team. (2006). "GridWay 5 Documentation: User Guide." Madrid, Spain, Universidad Complutense de Madrid.
17. Güting, R. H., Behr, T., Almeida, V., Ding, Z., Hoffmann, F., and Spiekermann, M. (2004). "Secondo: An Extensible DBMS Architecture and Prototype." Hagen, Germany, Fernuniversität Hagen.
18. Hanssen, G. (2005). "The Filter/Refine Strategy: A Study on the Land-Use Resource Dataset in Norway."
19. Ilya, Z., Memon, A., Petropoulos, M., and Baru, C. (2003). "Online Querying of Heterogeneous Distributed Spatial Data on a Grid." Brno, Cz, 813-823.
20. Kang, M.-S., and Choy, Y.-C. (2002). "Deploying parallel spatial join algorithm for network environment." IEEE, 177-181.
21. Meyer, W. S., and Souza, J. M. (2006). "Overlapped Regions with Distributed Spatial Databases in a Grid Environment." Rio de Janeiro, Brazil.
22. Meyer, W. S., Souza, J. M., and Ramirez, M. R. (2005). "Secondo-grid: An Infrastructure to Study Spatial Databases in Computational Grids." Campos do Jordão, SP, Brazil.
23. Mondal, A., Goda, K., and Kitsuregawa, M. (2003). "Effective Load-Balancing via Migration and Replication in Spatial Grids." Lecture Notes in Computer Science, 2736, 202-211.
24. Özsu, M. T., and Valduriez, P. (2001). "Principles of Distributed Database Systems." Prentice-Hall.
25. Porto, F., Silva, V. F. V., Dutra, M. L., and Shulze, B. (2005). "An adaptive distributed query processing grid service." Trondheim, Norway.
26. Ramirez, M. R. (2001). "Spatial Distributed Query Processing." Rio de Janeiro, RJ, COPPE/UFRJ.
27. Smith, J., Gounaris, A., Watson, P., Paton, N. W., Fernandes, A. A. A., and Sakellariou, R. (2002). "Distributed Query Processing on the Grid."
28. "OGSA-DQP 3.1 User's Documentation." (2006). http://www.ogsadai.org.uk/documentation/ogsa-dqp_3.1/, July/2006.
29. Venugopal, S., Buyya, R., and Winton, L. (2004). "A Grid Service Broker for Scheduling Distributed Data-Oriented Applications on Global Grids."
30. Zimbrão, G., and Souza, J. M. (1998). "A Raster Approximation for the Processing of Spatial Joins." New York, USA, 558-569.