GeoLoc: Robust Resource Allocation Method for Query Optimization in Data Grid Systems
Igor EPIMAKHOV, Abdelkader HAMEURLAIN, Franck MORVAN
Baltic DB&IS'2012
Table of contents
• Introduction
• Existing methods classification
• Contributions
• Allocation Space
• Allocation Algorithm
• Performance Evaluation
• Conclusion
Introduction
Data Grid
• Heterogeneity
• Dynamicity
• Large Scale
Introduction
Query processing
• Query execution
• Parsing
• Query rewrite
• Resource allocation
• Resource discovery
Introduction
Problem
Input:
• Set of (dependent) query operations
• Set of nodes
• Distribution of relations
• Dynamic and static characteristics of the Data Grid
Objective:
• Select the optimal subset of nodes on which to allocate resources for the query operations
Existing Methods Classification
Control structure:
• Centralized
• Hierarchical
• Decentralized
Existing Methods Classification
Algorithms:
• Heuristic
• Exact
Existing Methods Classification
Strategies:
• Static: Resource Allocation, Execution
• Dynamic: Resource Allocation, Execution
• Hybrid: Resource Allocation, Execution with Dynamic Reallocation
Existing Methods Classification
Cooperation type:
• Classic
• Incentive-based
• Economic / Reputation
Contributions
• Allocation Space Restriction
• Algorithm of Resource Allocation
Parallelism: pipeline, intra-operation, inter-operation
Distributed and duplicated relations
Allocation Space
[Figure: source nodes and nearest nodes]
Allocation Algorithm
Assumptions
• Each relation is distributed in N equal parts
• Joins use the Hybrid Hash Join algorithm
• Results are retransferred from the nodes
• Memory is used to reduce I/O operations
Allocation Algorithm
Stage 1. Definition of Allocation Space
• Input:
  • All nodes with fragments of queried relations (1)
  • All nodes nearest to (1)
• Algorithm:
  • Selection of source nodes on the basis of their performance
  • Placement of Scan operations
  • Generation of Allocation Space (source nodes + nearest nodes)
[Figure labels: CPU, NET, I/O, Overall Node Bandwidth]
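Stage 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the slide does not say how CPU, NET and I/O combine into the overall node bandwidth, so the sketch assumes the slowest resource is the bound. The names `Node`, `overall_bandwidth`, `select_source_nodes` and `build_allocation_space` are hypothetical, as are all bandwidth values and the `nearest` map; only the replica layout (R1 on n1 and n2, and so on) is taken from the deck's example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    cpu: float  # rows/sec the CPU can process (hypothetical unit)
    net: float  # rows/sec the network link can ship
    io: float   # rows/sec the disk can read


def overall_bandwidth(node: Node) -> float:
    # Assumption (not stated on the slide): the slowest resource is the bound.
    return min(node.cpu, node.net, node.io)


def select_source_nodes(replicas, nodes):
    """For every fragment, pick the replica holder with the best bandwidth."""
    return {
        fragment: max(holders, key=lambda name: overall_bandwidth(nodes[name]))
        for fragment, holders in replicas.items()
    }


def build_allocation_space(sources, nearest):
    """Allocation space = chosen source nodes + the nodes nearest to them."""
    space = set(sources.values())
    for source in sources.values():
        space.update(nearest.get(source, ()))
    return space


# Replica layout from the deck's example; every number below is made up.
replicas = {"R1": ["n1", "n2"], "R2": ["n3", "n4"],
            "S1": ["n5", "n6"], "S2": ["n7", "n8"]}
speeds = {"n1": 2000, "n2": 1200, "n3": 900, "n4": 2000,
          "n5": 1500, "n6": 2000, "n7": 2000, "n8": 1100}
nodes = {name: Node(name, cpu=3000, net=bw, io=2500)
         for name, bw in speeds.items()}
# Hypothetical proximity map: source node -> nodes considered nearest to it.
nearest = {"n1": ["n10", "n11"], "n4": ["n12"],
           "n6": ["n13"], "n7": ["n14", "n10"]}

sources = select_source_nodes(replicas, nodes)  # Scan operations go here
space = build_allocation_space(sources, nearest)
```

With these made-up speeds, one holder per fragment is kept as a source node, and later join placement only has to consider `space` instead of the whole grid.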
Allocation Algorithm
Stage 2. Generation of execution plan
• Input:
  • Logical query plan
  • Generated Allocation Space (AS)
• Idea:
  • Parity in bandwidth between Scan and Join operations
• Algorithm:
BEGIN
  FOR each join DO
    Compute the time to read and transfer the source relations, Tscan_exec
    DO
      Choose the most efficient node Neff from AS for placing the join operation
      Add Neff to the join allocation plan, Pjoin
      Estimate the execution time of the join, Tjoin_exec
    WHILE (Tjoin_exec > Tscan_exec)
    Add Pjoin to the query allocation plan, Pquery
  ENDFOR
END
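The Stage 2 loop can be illustrated with a small Python sketch. It is a hedged reading of the pseudocode, not the authors' code. Three things are assumptions: the cost model (join time = rows divided by the summed bandwidth of the allocated nodes), the guard that stops when the allocation space is exhausted, and the choice to let every join draw on the full allocation space. The function names, node names and numbers are hypothetical.

```python
def allocate_join(t_scan_exec, rows_to_join, candidates):
    """Greedy DO ... WHILE loop for one join.

    candidates maps node name -> bandwidth in rows/sec (hypothetical model).
    Returns (Pjoin, Tjoin_exec).
    """
    if not candidates:
        raise ValueError("allocation space is empty")
    remaining = dict(candidates)
    p_join, total_bandwidth = [], 0.0
    while True:
        n_eff = max(remaining, key=remaining.get)  # most efficient node left
        total_bandwidth += remaining.pop(n_eff)
        p_join.append(n_eff)
        # Assumption: work is split in proportion to bandwidth,
        # so the bandwidths of the allocated nodes add up.
        t_join_exec = rows_to_join / total_bandwidth
        # Extra guard (not on the slide): stop when no candidates are left.
        if t_join_exec <= t_scan_exec or not remaining:
            return p_join, t_join_exec


def allocate_query(joins, candidates):
    """joins: list of (join_name, t_scan_exec, rows_to_join) tuples."""
    # Assumption: every join may draw on the full allocation space.
    return {name: allocate_join(t_scan, rows, candidates)[0]
            for name, t_scan, rows in joins}


# Made-up allocation space: node name -> bandwidth in rows/sec.
candidates = {"a": 2000, "b": 1800, "c": 1500, "d": 1200,
              "e": 1000, "f": 900, "g": 500}
p_join, t_join_exec = allocate_join(t_scan_exec=1.0, rows_to_join=8000,
                                    candidates=candidates)
p_query = allocate_query([("R join S", 1.0, 8000)], candidates)
```

In this run the loop keeps adding the fastest remaining node until the estimated join time no longer exceeds the scan time, which is the "parity in bandwidth" idea: the join side is given just enough capacity to keep up with the scans, and the slowest candidate is left unused.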
Allocation Algorithm
Example
Query: R ⋈ S
R = R1 ∪ R2, S = S1 ∪ S2
R1: n1, n2
R2: n3, n4
S1: n5, n6
S2: n7, n8
[Figure: nodes n1 to n8]
Allocation Algorithm
Example
Query: R ⋈ S
R = R1 ∪ R2, S = S1 ∪ S2
Allocation space: n1, n4, n6, n7, n10, n11, n12, n13, n14, n15, n16, n17, n18, n19, n20, n21, n22, n23, n24, n25, n26
[Figure: nodes n1 to n8 and n10 to n26]
Allocation Algorithm
Example
Allocation space: n1, n4, n6, n7, n10, n11, n12, n13, n14, n15, n16, n17, n18, n19, n20, n21, n22, n23, n24, n25, n26
Resulting Execution Plan
Scans: n1, n4, n7, n6
Joins: n18, n25, n10, n26, n13, n12, n19
Source nodes, bandwidth: 2000 lines/sec
Nodes allocated for Join, bandwidths in lines/sec: 1790, 2000, 1300, 1500, 1650, 1920, 900
Performance Evaluation
Experimental conditions
• Data Grid simulator
• 6000 heterogeneous nodes
• Simple, Average and Complex queries
• Distributed and duplicated relations
Comparison
• Method GeoLoc
• Method Gounaris2004
Performance Evaluation
[Figure: Optimization Time]
Performance Evaluation
[Figure: Response Time]
Conclusion
The proposed method is:
• Efficient
• Scalable
• Adapted to heterogeneous, decentralized Data Grids
Perspective:
• Adaptation to the dynamicity of the Data Grid