Localization and Scheduling Techniques for Optimizing Communications on Heterogeneous Cluster Grid
Ching-Hsien Hsu (許慶賢), Department of Computer Science and Information Engineering, Chung Hua University, http://www.csie.chu.edu.tw
Outline
• Introduction
  • Regular / Irregular Data Distribution, Redistribution
  • Category of Runtime Redistribution Problems
• Processor Mapping Technique for Communication Localization
  • The Processor Mapping Technique
  • Localization on Multi-Cluster Grid System
• Scheduling Contention Free Communications for Irregular Problems
  • The Two-Phase Degree Reduction Method (TPDR)
  • Extended TPDR (E-TPDR)
• Conclusions
Introduction Regular Parallel Data Distribution
• Data Parallel Programming Languages, e.g. HPF (High Performance Fortran), Fortran D
• BLOCK, CYCLIC, BLOCK-CYCLIC(c)
• Ex. 18-Element Array, 3 Logical Processors
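The three regular distributions can be illustrated on the 18-element, 3-processor example. This is a minimal sketch, assuming 0-based element and processor indices and a block size of ceil(n/p) for BLOCK. The function names are illustrative and not part of HPF.

```python
def block_owner(i, n, p):
    """BLOCK: contiguous chunks of ceil(n / p) elements per processor."""
    chunk = -(-n // p)  # ceiling division
    return i // chunk


def cyclic_owner(i, p):
    """CYCLIC: elements are dealt out round-robin, one at a time."""
    return i % p


def block_cyclic_owner(i, p, c):
    """BLOCK-CYCLIC(c): blocks of c elements are dealt out round-robin."""
    return (i // c) % p


if __name__ == "__main__":
    n, p = 18, 3  # 18-element array, 3 logical processors
    print("BLOCK          ", [block_owner(i, n, p) for i in range(n)])
    print("CYCLIC         ", [cyclic_owner(i, p) for i in range(n)])
    print("BLOCK-CYCLIC(2)", [block_cyclic_owner(i, p, 2) for i in range(n)])
```

CYCLIC is the special case BLOCK-CYCLIC(1), and BLOCK on this array coincides with BLOCK-CYCLIC(6).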
Introduction (cont.) Data Distribution
• Two-Dimensional Matrices
Introduction (cont.) Data Redistribution
Introduction (cont.) Data Redistribution
      REAL, DIMENSION(18, 24) :: A
!HPF$ PROCESSORS P(2, 3)
!HPF$ DISTRIBUTE A(BLOCK, BLOCK) ONTO P
      :   (computation)
!HPF$ REDISTRIBUTE A(CYCLIC, CYCLIC(2)) ONTO P
      :   (computation)
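To get a feel for how much data such a redistribution moves, the following sketch counts, for the 18 x 24 array on the 2 x 3 processor grid, how many elements stay on the same processor and how many must be sent. It assumes 0-based indices and block sizes of ceil(n/p). The helper names are hypothetical.

```python
from collections import Counter

ROWS, COLS = 18, 24  # shape of A
PR, PC = 2, 3        # processor grid P(2, 3)


def block(i, n, p):
    """Owner coordinate under BLOCK along one dimension."""
    return i // (-(-n // p))


def cyclic(i, p, c=1):
    """Owner coordinate under CYCLIC(c) along one dimension."""
    return (i // c) % p


def src_owner(r, c):
    """Owner of A(r, c) under (BLOCK, BLOCK)."""
    return (block(r, ROWS, PR), block(c, COLS, PC))


def dst_owner(r, c):
    """Owner of A(r, c) under (CYCLIC, CYCLIC(2))."""
    return (cyclic(r, PR), cyclic(c, PC, 2))


# traffic[(source, destination)] = number of elements on that route
traffic = Counter()
for r in range(ROWS):
    for c in range(COLS):
        traffic[(src_owner(r, c), dst_owner(r, c))] += 1

stay = sum(n for (s, d), n in traffic.items() if s == d)
moved = ROWS * COLS - stay

if __name__ == "__main__":
    print("elements that stay in place:", stay)
    print("elements that must be sent: ", moved)
    print("distinct (source, destination) pairs:", len(traffic))
```

Elements whose source and destination owners coincide need no communication; the rest are the workload that redistribution techniques try to reduce or localize.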
Introduction (cont.) Irregular Redistribution
      PARAMETER (S = /7, 16, 11, 10, 7, 49/)
!HPF$ PROCESSORS P(6)
      REAL A(100), new(6)
!HPF$ DISTRIBUTE A(GEN_BLOCK(S)) ONTO P
!HPF$ DYNAMIC
      new = /15, 16, 10, 16, 15, 28/
!HPF$ REDISTRIBUTE A(GEN_BLOCK(new))
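A GEN_BLOCK redistribution induces one message for every pair of source and target segments that overlap. The sketch below derives those messages for the two size vectors above by sweeping both segment lists once. It assumes 0-based processor ids, and the function name is illustrative.

```python
from itertools import accumulate


def gen_block_messages(src_sizes, dst_sizes):
    """Return (source, target, size) triples for a GEN_BLOCK redistribution."""
    if sum(src_sizes) != sum(dst_sizes):
        raise ValueError("both distributions must cover the same array")
    src_bounds = [0, *accumulate(src_sizes)]
    dst_bounds = [0, *accumulate(dst_sizes)]
    messages = []
    s = t = 0
    while s < len(src_sizes) and t < len(dst_sizes):
        # Overlap of source segment s and target segment t.
        lo = max(src_bounds[s], dst_bounds[t])
        hi = min(src_bounds[s + 1], dst_bounds[t + 1])
        if hi > lo:
            messages.append((s, t, hi - lo))
        # Advance whichever segment ends first (both if they end together).
        src_end, dst_end = src_bounds[s + 1], dst_bounds[t + 1]
        if src_end <= dst_end:
            s += 1
        if dst_end <= src_end:
            t += 1
    return messages


if __name__ == "__main__":
    old = [7, 16, 11, 10, 7, 49]
    new = [15, 16, 10, 16, 15, 28]
    for src, dst, size in gen_block_messages(old, new):
        print(f"SP{src} -> TP{dst}: {size} elements")
```

Because both segment lists are contiguous and ordered, the messages never cross, and with n processors on each side there are at most 2n - 1 of them.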
Introduction (cont.)
• Irregular Data Distribution (GEN_BLOCK)
• Heterogeneous Processors
[Figure: an application runs Algorithm P, then Algorithm Q. Data distribution for algorithm P: 7, 16, 11, 10, 7, 49. Data distribution for algorithm Q: 15, 16, 10, 16, 15, 28.]
Introduction (cont.) Problem Category
• Benefits of runtime redistribution
  • Achieve Data Locality
  • Reduce Communication cost at runtime
• Objectives
  • Indexing sets generation
  • Data Packing & Unpacking Techniques
  • Communication Optimizations
    • Multi-Stage Redistribution Method
    • Processor Mapping Technique
    • Communication Scheduling
Processor Mapping Technique The Original Processor Mapping Technique (Prof. Lionel M. Ni)
• A mapping function generates a new sequence of logical processor IDs
• Increases data hits
• Minimizes the amount of data exchange
Processor Mapping Technique (cont.) An Optimal Processor Mapping Technique (Hsu’05)
• Example: BC86 over 11
• Traditional Method
• Size Oriented Greedy Matching
• Maximum Matching (Optimal)
Processor Mapping Technique (cont.) Localize communications
• Cluster Grid
• Interior Communication
• External Communication
Processor Mapping Technique (cont.) Motivating Example
Processor Mapping Technique (cont.) Communication Table Before Processor Mapping |I|=9 |E|=18
Processor Mapping Technique (cont.) Communication links Before Processor Mapping
Processor Mapping Technique (cont.) Communication table after Processor Mapping |I|=27 |E|=0
Processor Mapping Technique (cont.) Communication links after Processor Mapping
Processor Mapping Technique (cont.) Processor Reordering Flow Diagram
[Figure: flow diagram. Labels: Partitioning; Data Source; Data Alignment/Dispatch; Master Node; Determine Target Cluster; Generate new Pid; Designate Target Node; Reordering; Reordering Agent; SCA(x); DCA(x); SD(Px); SD(Px’); DD(Py).]
Mapping Function: F(X) = X’ = +(X mod C) * K
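The leading term of the mapping function is not legible above. The sketch below therefore assumes the form F(X) = floor(X / C) + (X mod C) * K, which is a guess and not confirmed by the slides. It also assumes a scenario of its own: C = 3 clusters of 3 nodes each, a BLOCK source distribution whose blocks are each split into K = 3 sub-blocks, and a target that deals the sub-blocks round-robin over all 9 nodes. Under these assumptions the counts match the |I| and |E| values shown on the communication-table slides.

```python
C = 3      # number of clusters
N = 3      # nodes per cluster (assumed)
K = 3      # K as in the mapping function; also the sub-blocks per source block here
P = C * N  # total number of nodes


def cluster_of(node):
    """Nodes 0..N-1 form cluster 0, the next N form cluster 1, and so on."""
    return node // N


def mapping(x):
    """ASSUMED form of the mapping function: floor(X / C) + (X mod C) * K."""
    return x // C + (x % C) * K


def count_links(place):
    """Count (interior, exterior) messages when logical source x sits on node place(x)."""
    interior = exterior = 0
    for x in range(P):
        src_node = place(x)
        for sub_block in range(K * x, K * x + K):
            dst_node = sub_block % P  # round-robin target
            if cluster_of(src_node) == cluster_of(dst_node):
                interior += 1
            else:
                exterior += 1
    return interior, exterior


before = count_links(lambda x: x)  # identity placement
after = count_links(mapping)       # placement after reordering

if __name__ == "__main__":
    print("before mapping: |I| = %d, |E| = %d" % before)
    print("after mapping:  |I| = %d, |E| = %d" % after)
```

In this scenario K happens to equal the number of nodes per cluster, so the sketch cannot tell which of the two quantities K stands for in general.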
Processor Mapping Technique (cont.) Identical Cluster Grid vs. Non-identical Cluster Grid
Processor Mapping Technique (cont.) Processor Replacement Algorithm for Non-identical Cluster Grid
Processor Mapping Technique (cont.) Theoretical Analysis The number of interior communications when C=3.
Processor Mapping Technique (cont.) Simulation Setting
• Taiwan UniGrid
• 8 campus clusters
• SPMD Programs
• C+MPI codes
Processor Mapping Technique (cont.) Topology
[Figure: map of participating sites. Institutions: National Tsing Hua University (1 and 2), Academia Sinica, National Center for High-performance Computing, Chung Hua University, Providence University, Tunghai University, Hsing Kuo University, National Dong Hwa University. Cities: Taipei, Hsinchu, Taichung, Hualien, Tainan.]
Processor Mapping Technique (cont.) Hardware Infrastructure
Clusters connected over the Internet:
• NCHC: Dual AMD 2000+, 512M
• SINICA: Dual Intel P3 1.0, 1G
• CHU: Intel P4 2.8, 256M
• NTHU: Dual Xeon 2.8, 1G
• THU: Dual AMD 1.6, 1G
• NDHU: AMD Athlon, 256M
• PU: AMD 2400+, 1G
• HKU: Intel P3 1.0, 256M
Processor Mapping Technique (cont.) System Monitoring Webpage
Processor Mapping Technique (cont.) Experimental Results
Scheduling Irregular Redistributions Example of GEN_BLOCK distributions
• Enhances load balancing in heterogeneous environments
[Figure: an application runs Algorithm P, then Algorithm Q. Data distribution for algorithm P: 7, 16, 11, 10, 7, 49. Data distribution for algorithm Q: 15, 16, 10, 16, 15, 28.]
Scheduling Irregular Redistributions (cont.) Example of GEN_BLOCK redistribution
• Observation
  • Without cross communications
Scheduling Irregular Redistributions (cont.) Convex Bipartite Graph
[Figure: bipartite graph with source nodes SP1, SP2, SP3 and target nodes TP1, TP2, TP3. Legend: circles are nodes, edges are data communications.]
Scheduling Irregular Redistributions (cont.) Example of GEN_BLOCK redistribution: a simple result
• Minimize the number of communication steps.
• Minimize the total message size over all steps.
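To make the two objectives concrete, here is a minimal sketch of a contention-free schedule built with a plain first-fit greedy rule. This is a stand-in for illustration only and is not the TPDR, coloring, LIST or DC method named in this deck. The message list is the one obtained by intersecting the GEN_BLOCK distributions 7, 16, 11, 10, 7, 49 and 15, 16, 10, 16, 15, 28, with 0-based processor ids. The cost of a step is assumed to be its largest message.

```python
# (source, target, size) messages, 0-based processor ids.
MESSAGES = [
    (0, 0, 7), (1, 0, 8), (1, 1, 8), (2, 1, 8), (2, 2, 3), (3, 2, 7),
    (3, 3, 3), (4, 3, 7), (5, 3, 6), (5, 4, 15), (5, 5, 28),
]


def greedy_schedule(messages):
    """Put each message in the earliest step where its sender and receiver are both idle."""
    steps = []
    for msg in messages:
        src, dst, _ = msg
        for step in steps:
            if all(src != s and dst != d for s, d, _ in step):
                step.append(msg)
                break
        else:
            steps.append([msg])  # no existing step is free: open a new one
    return steps


def total_cost(steps):
    """Sum over steps of the largest message in the step (assumed cost model)."""
    return sum(max(size for _, _, size in step) for step in steps)


if __name__ == "__main__":
    schedule = greedy_schedule(MESSAGES)
    for number, step in enumerate(schedule, start=1):
        print(f"S{number}: {step}")
    print("steps:", len(schedule), "total cost:", total_cost(schedule))
```

No schedule can use fewer steps than the largest number of messages touching one processor, which is 3 in this example. First-fit reaches that bound here but makes no attempt to balance message sizes across steps.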
Related Implementations • Coloring
Related Implementations • LIST
Related Implementations • DC1 & DC2 (a)DC1 (b)DC2
Scheduling Irregular Redistributions (cont.) The Two-Phase Degree Reduction Method
• The First Phase (for nodes with degree > 2)
  • Reduces the degree of the maximum-degree nodes by one in each reduction iteration.
• The Second Phase (for nodes with degree 1 or 2)
  • Schedules messages between nodes with degree 1 or 2 using an adjustable coloring mechanism.
Scheduling Irregular Redistributions (cont.) The Two-Phase Degree Reduction Method
• The first phase
S3: m11(6), m5(3) --- 6
Scheduling Irregular Redistributions (cont.) The Two-Phase Degree Reduction Method
• The second phase
S1: m1(7), m3(7), m6(15), m8(4), m10(8), m13(18) --- 18
S2: m2(3), m4(4), m7(3), m9(10), m12(12) --- 12
S3: m11(6), m5(3) --- 6
Scheduling Irregular Redistributions (cont.) Extended TPDR
TPDR
S1: m1(7), m3(7), m6(15), m8(4), m10(8), m13(18) --- 18
S2: m2(3), m4(4), m7(3), m9(10), m12(12) --- 12
S3: m11(6), m5(3) --- 6
E-TPDR
S1: m1(7), m3(7), m6(15), m9(10), m13(18) --- 18
S2: m4(4), m7(3), m10(8), m12(12) --- 12
S3: m11(6), m5(3), m2(3), m8(4) --- 6
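The trailing number on each schedule line equals the largest message in that step. The sketch below recomputes those numbers for the two schedules shown and checks that both cover the same 13 messages. Reading the trailing number as the step cost is an interpretation of the notation and is not stated on the slide. The dictionary layout is illustrative.

```python
# Schedules as listed on the slide: one dict per step, message name -> size.
TPDR = [
    {"m1": 7, "m3": 7, "m6": 15, "m8": 4, "m10": 8, "m13": 18},
    {"m2": 3, "m4": 4, "m7": 3, "m9": 10, "m12": 12},
    {"m11": 6, "m5": 3},
]
E_TPDR = [
    {"m1": 7, "m3": 7, "m6": 15, "m9": 10, "m13": 18},
    {"m4": 4, "m7": 3, "m10": 8, "m12": 12},
    {"m11": 6, "m5": 3, "m2": 3, "m8": 4},
]


def step_costs(schedule):
    """Cost of each step, taken as the size of its largest message."""
    return [max(step.values()) for step in schedule]


def all_messages(schedule):
    """Merge the steps into a single message name -> size mapping."""
    merged = {}
    for step in schedule:
        merged.update(step)
    return merged


if __name__ == "__main__":
    for name, schedule in (("TPDR", TPDR), ("E-TPDR", E_TPDR)):
        costs = step_costs(schedule)
        print(name, "step costs:", costs, "total:", sum(costs))
    print("same message set:", all_messages(TPDR) == all_messages(E_TPDR))
```

In this example the two schedules have identical step costs. They differ only in which step carries m2, m8, m9 and m10.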
Performance Evaluation Simulation of TPDR and E-TPDR algorithms on uneven cases.
Performance Evaluation (cont.) • Simulation A is carried out to examine the performance of TPDR and E-TPDR algorithms on uneven cases.
Performance Evaluation (cont.) • Simulation B is carried out to examine the performance of TPDR and E-TPDR algorithms on even cases.
Summary
• TPDR & E-TPDR for scheduling irregular GEN_BLOCK redistributions
  • Contention free
  • Optimal number of communication steps
  • Outperforms the D&C algorithm
  • TPDR (uneven) performs better than TPDR (even)
Performance Evaluation (cont.) 1000 test cases
Performance Evaluation (cont.) Average