CPAR-Cluster: A Runtime System for Heterogeneous Clusters with Mono and Multiprocessor Nodes


  1. CPAR-Cluster: A Runtime System for Heterogeneous Clusters with Mono and Multiprocessor Nodes. Gisele S. Craveiro, PhD; Prof. Liria M. Sato, PhD. CCGrid 2004, Chicago

  2. Outline • Introduction • CPAR Parallel Programming Language • CPAR-Cluster • Tests and Results • Conclusions

  3. Introduction • Commodity clusters: • Idle machines, SMP nodes • Heterogeneity, programmability and good performance • Hybrid Models • message passing model + shared memory model

  4. Introduction • CPAR • Parallel programming language • Shared memory programming model • CPAR-Cluster • Runtime system • Transparent access to shared variables over heterogeneous clusters • Scheduling at execution time

  5. CPAR Parallel Programming Language • Parallel Blocks • Macrotasks • Microtasks • Shared Variables (global and local scopes) • Synchronization Mechanisms

  6. CPAR Parallelism Grains [Diagram: each grain maps to a level of the platform: Parallel Block → Cluster, Macrotask → Node, Microtask → Processor]

  7. CPAR-Cluster Runtime System • DSM implemented at the compiler/library level • Consistency maintained per shared variable • Eager release consistency model • Write-update coherence protocol
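
  A minimal C sketch of what slide 7 names, assuming nothing beyond the slide itself: under eager release consistency with a write-update protocol, writes are applied to the local copy and buffered, and at the next release point every modified (dirty) shared variable is pushed to the other nodes, updating their copies rather than invalidating them. All names here (shared_var, dsm_write, dsm_release, send_update) are hypothetical illustrations, not CPAR-Cluster's actual API.

      /* Hypothetical sketch: eager release consistency + write-update. */
      #include <stdio.h>

      #define MAX_DIRTY 64
      #define NUM_NODES 4

      typedef struct {
          int id;      /* global identifier of the shared variable */
          int value;   /* local copy (an int, for simplicity)      */
          int dirty;   /* modified since the last release point?   */
      } shared_var;

      static shared_var *dirty_set[MAX_DIRTY];
      static int n_dirty = 0;

      /* Stand-in for the real transport: push one new value to one node. */
      static void send_update(int node, const shared_var *v)
      {
          printf("update -> node %d: var %d = %d\n", node, v->id, v->value);
      }

      /* A write is applied locally and recorded; under release
       * consistency nothing is sent until the next release point. */
      static void dsm_write(shared_var *v, int value)
      {
          v->value = value;
          if (!v->dirty && n_dirty < MAX_DIRTY) {
              v->dirty = 1;
              dirty_set[n_dirty++] = v;
          }
      }

      /* At a release (unlock, barrier, end of macrotask), eagerly push
       * every dirty variable to all other nodes: write-update, not
       * write-invalidate, so remote copies stay readable afterwards. */
      static void dsm_release(void)
      {
          for (int i = 0; i < n_dirty; i++) {
              for (int node = 1; node < NUM_NODES; node++)
                  send_update(node, dirty_set[i]);
              dirty_set[i]->dirty = 0;
          }
          n_dirty = 0;
      }

      int main(void)
      {
          shared_var a = { .id = 0 }, b = { .id = 1 };
          dsm_write(&a, 10);   /* buffered: no message yet          */
          dsm_write(&b, 20);
          dsm_write(&a, 11);   /* overwrites the buffered value     */
          dsm_release();       /* one update per variable, per node */
          return 0;
      }

  Deferring traffic to the release point means repeated writes to the same variable cost one message per node instead of one per write, which matters over a Fast Ethernet interconnect.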

  8. CPAR-Cluster Runtime System • Update distribution criteria: • Total: all nodes receive updates • Central (Master): only the master node receives updates • Macrotask scheduling • Microtask scheduling (loop scheduling): • static • dynamic (both sketched below)
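
  The static/dynamic split above is easiest to see in code. A hedged C sketch (hypothetical, not CPAR-Cluster's implementation): static scheduling fixes each worker's contiguous block of iterations before the loop starts, while dynamic scheduling lets workers claim fixed-size chunks from a shared counter as they finish, which favors heterogeneous nodes of unequal speed.

      /* Hypothetical sketch: static vs. dynamic loop (microtask) scheduling. */
      #include <stdatomic.h>
      #include <stdio.h>

      #define N_ITERS   100
      #define N_WORKERS 4
      #define CHUNK     8

      /* Static: iterations split into fixed contiguous blocks, decided
       * before the loop runs; worker w always gets the same range. */
      static void static_range(int w, int *lo, int *hi)
      {
          int base = N_ITERS / N_WORKERS, rem = N_ITERS % N_WORKERS;
          *lo = w * base + (w < rem ? w : rem);
          *hi = *lo + base + (w < rem ? 1 : 0);   /* half-open [lo, hi) */
      }

      /* Dynamic: workers repeatedly claim the next chunk from a shared
       * counter, so faster nodes naturally take more iterations. */
      static atomic_int next_iter;

      static int dynamic_claim(int *lo, int *hi)
      {
          *lo = atomic_fetch_add(&next_iter, CHUNK);
          if (*lo >= N_ITERS) return 0;           /* loop exhausted */
          *hi = (*lo + CHUNK < N_ITERS) ? *lo + CHUNK : N_ITERS;
          return 1;
      }

      int main(void)
      {
          int lo, hi;
          for (int w = 0; w < N_WORKERS; w++) {   /* static: fixed blocks */
              static_range(w, &lo, &hi);
              printf("static : worker %d -> [%d, %d)\n", w, lo, hi);
          }
          while (dynamic_claim(&lo, &hi))         /* dynamic: on demand */
              printf("dynamic: claimed [%d, %d)\n", lo, hi);
          return 0;
      }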

  9. CPAR-Cluster Execution Model [Diagram: one master node coordinating slave nodes 1, 2, ..., N]

  10. Execution Model - Master Node [Diagram: master node components: Executor, Shared Variables, Comm. (communication), Sender]

  11. Execution Model - Slave Node [Diagram: slave node components: Sender, Comm. (communication), Task Queue, Executor]
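
  The two diagrams suggest a classic producer-consumer structure inside each slave: the communication component enqueues macrotasks received from the master, and the executor dequeues and runs them. The C sketch below shows that structure with pthreads; every name in it is hypothetical, not CPAR-Cluster's actual code.

      /* Hypothetical sketch: slave node as a producer-consumer pair. */
      #include <pthread.h>
      #include <stdio.h>

      #define QSIZE 16

      typedef struct { int task_id; } task;

      static task queue[QSIZE];
      static int  head = 0, tail = 0, count = 0, done = 0;
      static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
      static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;
      static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;

      /* Comm. thread: pretend to receive macrotasks from the master
       * and enqueue them for the executor. */
      static void *comm_thread(void *arg)
      {
          (void)arg;
          for (int id = 1; id <= 5; id++) {
              pthread_mutex_lock(&lock);
              while (count == QSIZE)
                  pthread_cond_wait(&not_full, &lock);
              queue[tail] = (task){ id };
              tail = (tail + 1) % QSIZE;
              count++;
              pthread_cond_signal(&not_empty);
              pthread_mutex_unlock(&lock);
          }
          pthread_mutex_lock(&lock);
          done = 1;                      /* no more tasks will arrive */
          pthread_cond_signal(&not_empty);
          pthread_mutex_unlock(&lock);
          return NULL;
      }

      /* Executor thread: dequeue and run tasks until the queue drains. */
      static void *executor_thread(void *arg)
      {
          (void)arg;
          for (;;) {
              pthread_mutex_lock(&lock);
              while (count == 0 && !done)
                  pthread_cond_wait(&not_empty, &lock);
              if (count == 0 && done) {
                  pthread_mutex_unlock(&lock);
                  return NULL;
              }
              task t = queue[head];
              head = (head + 1) % QSIZE;
              count--;
              pthread_cond_signal(&not_full);
              pthread_mutex_unlock(&lock);
              printf("executing macrotask %d\n", t.task_id);
          }
      }

      int main(void)
      {
          pthread_t c, e;
          pthread_create(&c, NULL, comm_thread, NULL);
          pthread_create(&e, NULL, executor_thread, NULL);
          pthread_join(c, NULL);
          pthread_join(e, NULL);
          return 0;
      }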

  12. Input Files • Hardware platform configuration file • CPAR program file • User task assignment file (optional)

  13. Nodes Configuration File
      #comment line
      #master node
      sun cpu=4
      #slaves nodes
      moon cpu=4
      onix cpu=4
      leo
      taurus1
      taurus2
      taurus3
      orion

  14. Task Pre-Scheduling File
      #nodes suggestion
      init_A onix, leo, moon;
      #architecture suggestion
      Calc_B SMP;
      #node imposition
      multiply onix!;
      #architecture imposition, node suggestion
      tsp SMP! onix;
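
  For illustration only, here is a small C sketch of how such a directive line could be interpreted, written purely from the four examples above (CPAR-Cluster's real parser and exact grammar are not shown in the slides): a directive names a task followed by node or architecture targets, and a trailing '!' on a target turns a suggestion into an imposition.

      /* Hypothetical sketch of pre-scheduling directive interpretation. */
      #include <stdio.h>
      #include <string.h>

      static void classify(const char *line)
      {
          char buf[128], *tok;
          strncpy(buf, line, sizeof buf - 1);
          buf[sizeof buf - 1] = '\0';

          tok = strtok(buf, " ,;\n");            /* first token: task name */
          printf("task %s:", tok);
          while ((tok = strtok(NULL, " ,;\n")) != NULL) {
              size_t n = strlen(tok);
              if (n > 0 && tok[n - 1] == '!') {  /* '!' => imposition */
                  tok[n - 1] = '\0';
                  printf(" %s (imposed)", tok);
              } else {
                  printf(" %s (suggested)", tok);
              }
          }
          printf("\n");
      }

      int main(void)
      {
          classify("init_A onix, leo, moon;");   /* three node suggestions */
          classify("multiply onix!;");           /* node imposition        */
          classify("tsp SMP! onix;");            /* imposed arch, suggested node */
          return 0;
      }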

  15. Macrotask & Microtask Execution: CPAR Parallel Macrotask Execution & Synchronization Coordination
      [Diagram: the parent executes the sequential sections alone; inside the forall microtask the parent and slave children (child 1, child 2, child 3) execute in parallel, then synchronize back to the parent's sequential flow.]
      task body hello(){
        printf("Only parent");
        forall i=1 to 4 {
          printf("Everybody");
        }
        printf("Again, parent");
      }

  16. Tests - Hardware Platform • 1 Intel Pentium II quad node • 16 Intel Celeron nodes • 8 AMD Athlon dual nodes • Fast Ethernet interconnect

  17. Tests Performed • Matrix Multiply: • Shared variables with global scope (total update strategy) • Shared variables with global scope (centralized update strategy) • Without shared variables (no update overhead) • Travelling Salesman Problem

  18. Results – MM (size 2000) [Chart: execution time (s) vs. number of nodes]

  19. Results – MM (size 2000) [Chart: execution time (s) vs. number of nodes]

  20. Results – TSP (23 cities) [Chart: execution time (s) vs. number of nodes]

  21. Results – MM: Omni+SCore vs. CPAR+CPAR-Cluster [Chart: execution time (s) vs. number of nodes for the two systems]

  22. Conclusions • CPAR-Cluster: • Tool implemented at the library level, with no kernel modifications or special hardware • Shared variable update strategies showed suitable behavior • Data distribution criteria • Scheduling and load balancing • Exploits the computational power of interconnected mono- and multiprocessor nodes

  23. Questions? gisele.craveiro@poli.usp.br gisele.scraveiro@sp.senac.br
