CPAR-Cluster: A Runtime System for Heterogeneous Clusters with Mono and Multiprocessor Nodes
Gisele S. Craveiro, PhD
Prof. Liria M. Sato, PhD
CCGrid 2004 - Chicago
Outline • Introduction • CPAR Parallel Programming Language • CPAR-Cluster • Tests and Results • Conclusions
Introduction • Commodity clusters: idle machines, SMP nodes • Heterogeneity, programmability and good performance • Hybrid models: message passing + shared memory
Introduction • CPAR • Parallel programming language • Shared memory programming model • CPAR-Cluster • Runtime system • Transparent access to shared variables across heterogeneous clusters • Run-time scheduling
CPAR Parallel Programming Language • Parallel Blocks • Macrotasks • Microtasks • Shared Variables (global and local scopes) • Synchronization Mechanisms
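To make these constructs concrete, here is a minimal CPAR-like sketch. Only task body and forall appear verbatim in this talk (in the hello example later); the shared declaration, the loop bounds and the helper row_cost() are assumptions about the syntax, not taken from the slides.

    /* Hedged CPAR-like sketch; `shared` and row_cost() are assumed. */
    shared int row_sum[100];        /* shared variable, global scope */

    task body sum_rows()            /* macrotask: scheduled onto a cluster node */
    {
        forall i = 0 to 99 {        /* microtasks: iterations spread over processors */
            row_sum[i] = row_cost(i);   /* each microtask writes its own slot */
        }
    }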
CPAR Parallelism Grains • Parallel Block → Cluster • Macrotask → Node • Microtask → Processor
CPAR-Cluster Runtime System • DSM implemented at the compiler/library level • Consistency maintained per shared variable • Eager release consistency model • Write-update coherence protocol
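As a rough illustration of the last two bullets, the C sketch below pushes every shared-variable write recorded during a critical section to the peer nodes at release time: that is the eager-release, write-update combination. None of these names come from the actual CPAR-Cluster runtime; they are hypothetical.

    #include <stddef.h>

    #define MAX_DIRTY 64

    /* One record per shared-variable write made inside the critical section. */
    struct dirty_entry { void *addr; size_t len; };
    static struct dirty_entry dirty[MAX_DIRTY];
    static int n_dirty;

    void send_update_to_peers(const void *addr, size_t len);  /* network layer (assumed) */

    /* Write hook the compiler/library would insert (assumed). */
    void note_shared_write(void *addr, size_t len)
    {
        dirty[n_dirty].addr = addr;   /* bounds check omitted for brevity */
        dirty[n_dirty].len = len;
        n_dirty++;
    }

    /* Eager release: push all pending updates before the lock is handed
       over, so the other nodes receive the new values (write update). */
    void shared_release(void)
    {
        for (int i = 0; i < n_dirty; i++)
            send_update_to_peers(dirty[i].addr, dirty[i].len);
        n_dirty = 0;
        /* the actual lock release would follow here */
    }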
CPAR-Cluster Runtime System • Update distribution criteria • Total: every node receives updates • Central (master): only the master node receives updates • Macrotask scheduling • Microtask scheduling (loop scheduling): static or dynamic
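The two loop-scheduling policies can be pictured as below; this is a hedged C sketch with illustrative names, not the CPAR-Cluster API. Static scheduling splits the iteration space once; dynamic scheduling lets the faster nodes of a heterogeneous cluster claim more iterations.

    /* Static: equal contiguous blocks, fixed at loop start. */
    void run_static(int n_iters, int n_workers, int my_id, void (*body)(int))
    {
        int chunk = (n_iters + n_workers - 1) / n_workers;
        int lo = my_id * chunk;
        int hi = (lo + chunk < n_iters) ? lo + chunk : n_iters;
        for (int i = lo; i < hi; i++)
            body(i);
    }

    /* Dynamic: workers repeatedly take the next unclaimed iteration. */
    int take_next_iteration(void);   /* shared fetch-and-add counter (assumed) */

    void run_dynamic(int n_iters, void (*body)(int))
    {
        for (int i = take_next_iteration(); i < n_iters; i = take_next_iteration())
            body(i);
    }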
CPAR-Cluster Execution Model [Diagram: one master node coordinating Slave 1, Slave 2, …, Slave N]
Execution Model - Master Node [Diagram: Executor, Shared Variables, Comm. and Sender components]
Execution Model - Slave Node [Diagram: Sender, Comm., Task Queue and Executor components]
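The slave-side flow implied by these two diagrams can be sketched as a C loop: the Comm. thread fills the task queue, the Executor drains it, and the Sender ships shared-variable updates back to the master. All names here are illustrative assumptions, not the runtime's real API.

    struct task {
        void (*run)(void *arg);
        void *arg;
    };

    struct task *task_queue_pop(void);    /* blocks until Comm. enqueues a task (assumed) */
    void sender_ship_updates(void *arg);  /* hands results to the Sender thread (assumed) */

    void executor_loop(void)
    {
        for (;;) {
            struct task *t = task_queue_pop();  /* macrotask received from the master */
            t->run(t->arg);                     /* execute it locally */
            sender_ship_updates(t->arg);        /* updates flow back via the Sender */
        }
    }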
Input Files • Hardware platform configuration file • CPAR program file • User task assignment file (optional)
Nodes Configuration File
#comment line
#master node
sun cpu=4
#slave nodes
moon cpu=4
onix cpu=4
leo
taurus1
taurus2
taurus3
orion
Task Pre-Scheduling File
#nodes suggestion
init_A onix, leo, moon;
#architecture suggestion
Calc_B SMP;
#node imposition
multiply onix!;
#architecture imposition, node suggestion
tsp SMP! onix;
Macrotask & Microtask Execution
CPAR parallel macrotask execution & synchronization
[Diagram: sequential (parent) → parallel microtask (parent + children on slaves 1-3) → sequential (parent), with a coordination task synchronizing parent and children]

task body hello() {
    printf("Only parent");
    forall i = 1 to 4 {
        printf("Everybody");
    }
    printf("Again,parent");
}
Tests - Hardware Platform • 1 Intel Pentium II quad node • 16 Intel Celeron nodes • 8 AMD Athlon dual nodes • Fast Ethernet
Tests Performed • Matrix Multiply • Shared variables with global scope (total update strategy) • Shared variables with global scope (centralized update strategy) • Without shared variables (no update overhead) • Travelling Salesman Problem
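For orientation, the matrix-multiply test can be pictured as the hedged CPAR-like kernel below; N, the shared declaration and the loop bounds are assumptions, not taken from the talk. Under the total strategy each write to C would be sent to all nodes, under the centralized strategy only to the master, which is exactly what the first two MM variants compare.

    shared float A[N][N], B[N][N], C[N][N];   /* global-scope shared variables (assumed syntax) */

    task body mm()
    {
        forall i = 0 to N-1 {                 /* rows spread over nodes/processors */
            for (int j = 0; j < N; j++) {
                float s = 0.0f;
                for (int k = 0; k < N; k++)
                    s = s + A[i][k] * B[k][j];
                C[i][j] = s;                  /* propagated by the chosen update strategy */
            }
        }
    }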
Results – MM (size 2000) [Chart: execution time (s) vs. number of nodes]
Results – MM (size 2000), continued [Chart: execution time (s) vs. number of nodes]
Results – TSP, 23 Cities [Chart: execution time (s) vs. number of nodes]
Results – MM: Omni+SCore vs. CPAR+CPAR-Cluster [Chart: execution time (s) vs. number of nodes for both systems]
Conclusions • CPAR-Cluster: a tool implemented at the library level, requiring no kernel modifications or special hardware • Shared-variable update strategies showed suitable behavior • Data distribution criteria • Scheduling and load balancing • Exploits the computational power of interconnected mono- and multiprocessor nodes
Questions? gisele.craveiro@poli.usp.br gisele.scraveiro@sp.senac.br