Middleware for Active Reduction Operations in Distributed Systems
By Nitin Bahadur and Gokul Nadathur
Department of Computer Sciences, University of Wisconsin-Madison
Spring 2000
Talk Outline
• Motivation and goals
• General architecture of the middleware
• Components of the middleware
• Providing reliability: handling of node failures
• Applications developed using the middleware
• Performance
• Conclusions and possible extensions
Multicast / Reduction Trees
Motivation and Goals
• A middleware for applications that follow the master-worker paradigm
• A scalable framework for communication and for computing client responses (“reduction”)
• Unicast does not scale, so use multicast
• Introduce reduction operations dynamically in clients
• A general framework for communication among clients
The Big Picture...
[Figure: a Master App and several Client Apps, each running on top of ARTL and connected in a tree]
• Master: sends queries, reduces results, hands results back to the application
• Client with downstream nodes: executes responses to queries, forwards queries downstream, reduces incoming results, sends reduced results to the master
• Client without downstream nodes: executes responses to queries, sends results back towards the master
ART - Library Architecture
[Diagram labels: Application (application-specific callbacks), Application API, Reduction functions, Framework for processing messages, Event Handler, ARTL Communication Layer, Network. Arrow labels: ARTL-specific message, outgoing message, incoming packet.]
ARTL messages:
1. Query from master
2. Response from downstream nodes
Communication Subsystem
• Connection setup
• Connect nodes as a binomial tree
• Send and receive ARTL and application messages
• Detect node failure and act accordingly
• Integrate a restarted node into the current tree structure
Why use Binomial Tree
[Figure: a master and three clients under each scheme, with every send labeled by its time step]
• Unicast mechanism: query propagation time = 3
• Binomial tree: query propagation time = 2
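
The two propagation times on this slide can be reproduced with a short calculation. This is a minimal sketch under a simplifying assumption that is not stated on the slide: every send takes one time step and a node sends to one peer per step. The function names are illustrative only.

```python
def unicast_rounds(num_clients: int) -> int:
    """The master sends to each client in turn: one time step per client."""
    return num_clients


def binomial_rounds(num_clients: int) -> int:
    """Every node that already holds the query forwards it in each step,
    so the number of informed nodes doubles per step."""
    total_nodes = num_clients + 1  # the clients plus the master
    # Smallest r with 2**r >= total_nodes, computed without floating point.
    return (total_nodes - 1).bit_length()


for clients in (3, 7, 63):
    print(clients, unicast_rounds(clients), binomial_rounds(clients))
```

With three clients this gives 3 steps for unicast and 2 for the binomial tree, matching the slide; the gap widens as the number of clients grows.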
Reduction
[Figure: tree of nodes 1-8; responses flow up the tree, with reduction at nodes 5 and 3]
• Example reduction operations: Min(), Max()
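
Reduction up a tree can be sketched as a recursive fold: each node combines its own response with the already reduced responses of its children and passes one value upstream. The tree shape below is a hypothetical binomial layout over nodes 1-8 (the figure itself did not survive), the load values are made up, and letting the master contribute a value of its own is also an assumption.

```python
from typing import Callable, Dict, List

# Hypothetical 8-node tree rooted at the master (node 1): parent -> children.
CHILDREN: Dict[int, List[int]] = {1: [5, 3, 2], 5: [7, 6], 7: [8], 3: [4]}


def reduce_up(node: int,
              local: Dict[int, float],
              op: Callable[[float, float], float],
              children: Dict[int, List[int]] = CHILDREN) -> float:
    """Combine this node's own response with its children's reduced responses."""
    value = local[node]
    for child in children.get(node, []):
        value = op(value, reduce_up(child, local, op, children))
    return value


# Made-up per-node responses, for illustration only.
loads = {1: 0.9, 2: 0.4, 3: 1.7, 4: 0.2, 5: 2.5, 6: 0.8, 7: 1.1, 8: 0.6}

print(reduce_up(1, loads, min))  # smallest value in the whole tree
print(reduce_up(1, loads, max))  # largest value in the whole tree
print(reduce_up(5, loads, max))  # partial result that node 5 sends upstream
```

Because Min and Max are associative and commutative, the master sees the same result regardless of how the tree groups the responses.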
Tree connection setup
[Figure sequence: tree of nodes 1-8, with TCP connections set up in three phases]
• Tree Setup - Phase I: TCP connection setup
• Tree Setup - Phase II: TCP connection setup
• Tree Setup - Phase III: TCP connection setup
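
One way to obtain a binomial tree over eight nodes in three phases is to let every already connected node open one new connection per phase, halving the node-number offset each time. The slides do not say which connections belong to which phase, so the scheme below is an assumption, and `binomial_phases` is an illustrative name rather than part of ARTL.

```python
from typing import List, Tuple


def binomial_phases(num_nodes: int) -> List[List[Tuple[int, int]]]:
    """Return, for each phase, the (parent, child) connections opened in it.

    Nodes are numbered 1..num_nodes and node 1 is the master.
    """
    span = 1
    while span < num_nodes:  # round the node count up to a power of two
        span *= 2
    connected = [1]
    phases = []
    offset = span // 2
    while offset >= 1:
        # Every connected node tries to reach the node `offset` away from it.
        links = [(p, p + offset) for p in connected if p + offset <= num_nodes]
        connected += [child for _, child in links]
        phases.append(links)
        offset //= 2
    return phases


for number, links in enumerate(binomial_phases(8), start=1):
    print("Phase", number, links)
```

Under this scheme the master ends up with children 5, 3 and 2, and the number of connected nodes doubles in every phase.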
Inter node communication
[Packet layout: ARTL Header | Data]
• Unicast and multicast data transmission
• ARTL receives application messages for which no receive has been posted
  • these are sent to a callback function registered by the application
• ARTL receives data on behalf of the application when the application explicitly posts a receive
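
The two delivery paths described above (a posted receive versus the registered callback) can be sketched as a small dispatcher. The class, the method names and the use of a message tag to match receives are all hypothetical; the slides do not describe the ARTL header fields or API.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class Dispatcher:
    """Routes an incoming application message to a posted receive if one is
    pending for its tag, otherwise to the application's registered callback."""

    def __init__(self, on_unexpected: Callable[[str, bytes], None]) -> None:
        self._on_unexpected = on_unexpected
        # tag -> handlers of receives the application has explicitly posted
        self._posted: Dict[str, Deque[Callable[[bytes], None]]] = defaultdict(deque)

    def post_receive(self, tag: str, handler: Callable[[bytes], None]) -> None:
        self._posted[tag].append(handler)

    def deliver(self, tag: str, payload: bytes) -> None:
        pending = self._posted.get(tag)
        if pending:
            pending.popleft()(payload)          # a receive was posted: hand over the data
        else:
            self._on_unexpected(tag, payload)   # no receive posted: use the callback


unexpected, received = [], []
d = Dispatcher(lambda tag, payload: unexpected.append((tag, payload)))
d.deliver("status", b"early")            # nothing posted yet, goes to the callback
d.post_receive("status", received.append)
d.deliver("status", b"expected")         # consumed by the posted receive
```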
ART - Library Architecture
[Diagram labels: Application (application-specific callbacks), Application API, Reduction functions, Framework for processing messages, Event Handler, ARTL Communication Layer, Network. Arrow labels: ARTL encapsulated message, outgoing message, incoming packet.]
ARTL messages:
1. Query from master
2. Response from downstream nodes
Reduction Functions
• Implemented as shared objects
• Sent to clients during the setup phase
• Each reduction function is associated with the particular response it reduces
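
The idea of shipping a reduction function to a client and binding it to a response can be sketched in Python, with a plug-in module standing in for the shared object. This is only an analogue under stated assumptions: the original loads native shared objects, and the names `load_reduction`, `register` and `REDUCERS` are invented for illustration.

```python
import importlib.util
from typing import Callable, Dict

# response id -> reduction function that reduces responses with that id
REDUCERS: Dict[int, Callable] = {}


def load_reduction(path: str, func_name: str) -> Callable:
    """Load a function from a code file received at setup time."""
    spec = importlib.util.spec_from_file_location("artl_reduction", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the received code
    return getattr(module, func_name)


def register(response_id: int, path: str, func_name: str) -> None:
    """Associate a freshly loaded reduction function with one response id."""
    REDUCERS[response_id] = load_reduction(path, func_name)
```

A client would call `register` once per received file during setup and later look up `REDUCERS[response_id]` when responses arrive. Running received code is exactly why the deck lists securing dynamic code as future work.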
[Figure: event handling with a thread pool. Labeled parts: Network, Event Handler, Thread Pool, Response Callback, Application.]
• Table containing query id and callback information for currently registered queries
• Run queue of reduction/response operations
• Responses for the shaded table entry arrive from downstream nodes
• The reduced response is sent upstream
Multithreaded Architecture
• No prior knowledge about the behavior of a reduction function
• Exploit concurrency: multiple processors per node
• Static pool of threads: creating and destroying threads is expensive (Firefly RPC)
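
A static pool fed by a run queue can be sketched as follows: the worker threads are created once and then repeatedly pull operations from a shared queue. The class and method names are illustrative, not the ARTL implementation.

```python
import queue
import threading


class StaticPool:
    """Fixed set of worker threads that take tasks from a shared run queue."""

    def __init__(self, num_threads: int) -> None:
        self._run_queue: queue.Queue = queue.Queue()
        self._threads = [threading.Thread(target=self._worker, daemon=True)
                         for _ in range(num_threads)]
        for thread in self._threads:  # threads are created once, up front
            thread.start()

    def _worker(self) -> None:
        while True:
            task = self._run_queue.get()
            if task is None:          # sentinel: stop this worker
                self._run_queue.task_done()
                break
            func, args = task
            try:
                func(*args)
            except Exception:
                pass                  # a real pool would report the error
            finally:
                self._run_queue.task_done()

    def submit(self, func, *args) -> None:
        self._run_queue.put((func, args))

    def wait(self) -> None:
        """Block until every submitted task has finished."""
        self._run_queue.join()

    def shutdown(self) -> None:
        for _ in self._threads:
            self._run_queue.put(None)
        for thread in self._threads:
            thread.join()
```

Since nothing is known in advance about how long a reduction function runs, a slow one occupies a single worker while the others keep draining the queue.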
Crash Reconfiguration
[Figure sequence: tree of nodes 1-8 before and after a crash. Frames are captioned "Crash Reconfiguration at depth 1" and "Crash Reconfiguration at depth 2"; one set of frames shows the tree without node 2, another shows it without node 5.]
Crash Detection
• Break in the TCP connection with a parent or child
  • a signal is received at the other end of the connection
• Periodic refresh messages inform the parent that the child is up and running
  • useful in WAN environments
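
The refresh-message mechanism amounts to a timeout check on the parent side. The sketch below assumes a timeout value and class name that do not come from the slides, and it takes timestamps as arguments so that the logic is easy to follow and test.

```python
from typing import Dict, List


class RefreshMonitor:
    """Parent-side bookkeeping: a child is suspected down when no refresh
    message has arrived from it within `timeout` seconds."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self._last_seen: Dict[int, float] = {}

    def refresh(self, child: int, now: float) -> None:
        """Record a refresh message from `child` received at time `now`."""
        self._last_seen[child] = now

    def suspected_down(self, now: float) -> List[int]:
        return sorted(child for child, seen in self._last_seen.items()
                      if now - seen > self.timeout)


monitor = RefreshMonitor(timeout=3.0)   # hypothetical timeout
monitor.refresh(5, now=0.0)
monitor.refresh(3, now=0.0)
monitor.refresh(3, now=2.5)             # node 3 keeps refreshing, node 5 goes quiet
print(monitor.suspected_down(now=4.0))
```

In a real deployment the timestamps would come from a monotonic clock such as `time.monotonic()`.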
Crash Handling
• The parent of the failed node informs the master
• All nodes are informed of the node failure
• The master recomputes the tree
• If a leaf node is down, no reconfiguration is needed
• If an intermediate node is down, some reconfiguration is required
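
The difference between a leaf failure and an intermediate failure can be made concrete on a child-to-parent map. The slides do not say how the master recomputes the tree; re-attaching orphaned children to the failed node's parent is an assumed policy, the tree layout is hypothetical, and failure of the master itself is not handled.

```python
from typing import Dict

# Hypothetical 8-node tree as a child -> parent map; node 1 is the master.
PARENT: Dict[int, int] = {5: 1, 3: 1, 2: 1, 7: 5, 6: 5, 4: 3, 8: 7}


def is_leaf(parent: Dict[int, int], node: int) -> bool:
    """A node is a leaf if no other node names it as its parent."""
    return node not in parent.values()


def handle_crash(parent: Dict[int, int], failed: int) -> Dict[int, int]:
    """Return a new map without `failed`; its children (if any) are adopted
    by the failed node's own parent (assumed policy)."""
    adopter = parent[failed]
    return {child: (adopter if par == failed else par)
            for child, par in parent.items() if child != failed}


print(is_leaf(PARENT, 2), handle_crash(PARENT, 2))  # leaf: only node 2 disappears
print(is_leaf(PARENT, 5), handle_crash(PARENT, 5))  # intermediate: 7 and 6 move to node 1
```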
Node Restart
• The restarted node contacts the master to report its restart
• The master sends it the current state of the network and the shared object(s)
• All nodes are informed of the node restart
• The master recomputes the tree and informs the new node’s parent about its new child
• Parent and child establish connections
SysMon - A System monitor
• Monitors the load average from /proc
• Displays min, max and average loads
• Per-node load is also displayed
• ARTL reduction operations: Min, Max and Average
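
Unlike Min and Max, an average cannot be reduced pairwise from averages alone, so a tree-friendly version carries a running total and a count. The sketch below is one way to do that, not SysMon's actual code; reading the first field of `/proc/loadavg` (the 1-minute load on Linux) is an assumption about which `/proc` entry is used.

```python
from typing import NamedTuple


class LoadSummary(NamedTuple):
    minimum: float
    maximum: float
    total: float
    count: int

    @property
    def average(self) -> float:
        return self.total / self.count


def parse_loadavg(text: str) -> float:
    """Extract the 1-minute load from the contents of /proc/loadavg."""
    return float(text.split()[0])


def summarize(load: float) -> LoadSummary:
    """Summary for a single node's load."""
    return LoadSummary(load, load, load, 1)


def merge(a: LoadSummary, b: LoadSummary) -> LoadSummary:
    """Reduction step: combine two partial summaries into one."""
    return LoadSummary(min(a.minimum, b.minimum),
                       max(a.maximum, b.maximum),
                       a.total + b.total,
                       a.count + b.count)
```

Each node would call `summarize(parse_loadavg(...))` on its own reading and `merge` the summaries arriving from downstream; the master divides once at the end via `average`.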
SysMon - A System monitor
• Node failures are detected and SysMon pops up an alert
File Transfer Application
• Transfers a file from the master to all clients
• The file can be executed at the clients (if required)
  • execution can start immediately on receiving the file
  • execution can be delayed until all nodes have received the file
File Transfer Performance
Total Startup Time vs Number of Nodes
• Client processes started using ssh on different machines
Conclusions and Extensions
• A middleware for dynamic operations
• Support for crash detection, recovery and dynamic processes
• Demonstrated near-optimal speedup using real applications
• Making the response function dynamic: active services
• Differential scheduling in the thread scheduler for QoS
• Making dynamic code secure