
By Nitin Bahadur and Gokul Nadathur, Department of Computer Sciences, University of Wisconsin-Madison

Middleware for Active Reduction Operations in Distributed Systems, by Nitin Bahadur and Gokul Nadathur, Department of Computer Sciences, University of Wisconsin-Madison. Spring 2000. Talk outline: motivation and goals; general architecture of the middleware; components of the middleware.


Presentation Transcript


  1. Middleware for Active Reduction Operations in Distributed Systems • By Nitin Bahadur and Gokul Nadathur, Department of Computer Sciences, University of Wisconsin-Madison • Spring 2000

  2. Talk Outline • Motivation and Goals • General Architecture of the middleware • Components of the middleware • Providing reliability - handling of node failures • Applications developed using the middleware • Performance • Conclusions and possible extensions Multicast / Reduction Trees

  3. Motivation and Goals • A middleware for applications that follow the master-worker paradigm • A scalable framework for communication and for computing over client responses (“reduction”) • Unicast does not scale, so use multicast • Introduce reduction operations dynamically at the clients • A general framework for communication among clients

  4. The Big Picture • Master (App + ARTL): sends queries, reduces results, hands back results to the application • Intermediate clients (App + ARTL): execute responses to queries, forward queries downstream, reduce incoming results, send reduced results to the master • Leaf clients (App + ARTL): execute responses to queries, send results back toward the master

  5. ART - Library Architecture • Application, with application-specific callbacks and the Application API • Reduction functions • Framework for processing messages • Event handler (ARTL-specific messages) • ARTL communication layer (outgoing messages, incoming packets) • Network • ARTL messages: 1. Query from master 2. Response from downstream nodes


  7. Communication Subsystem • Connection setup: connect the nodes as a binomial tree • Send and receive ARTL and application messages • Detect node failures and act accordingly • Integrate a restarted node into the current tree structure
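A binomial tree can be described arithmetically, so each node can work out its own parent and children from its position alone. The sketch below assumes 0-based ranks with the master at rank 0, a common binomial-tree layout; the slides do not say how ARTL numbers its nodes, and `parent`/`children` are hypothetical names.

```python
from typing import List, Optional


def parent(rank: int) -> Optional[int]:
    """Parent of `rank` in a binomial tree rooted at rank 0: clear the highest set bit."""
    if rank == 0:
        return None  # the master (root) has no parent
    return rank - (1 << (rank.bit_length() - 1))


def children(rank: int, n: int) -> List[int]:
    """Children of `rank` among n nodes: add each power of two above its highest set bit."""
    kids = []
    k = rank.bit_length()  # 0 for the root, so the root's children are 1, 2, 4, ...
    while rank + (1 << k) < n:
        kids.append(rank + (1 << k))
        k += 1
    return kids
```

For eight nodes, `children(0, 8)` is `[1, 2, 4]`, `children(1, 8)` is `[3, 5]` and `parent(6)` is `2`. Because the layout is pure arithmetic, a node only needs its rank and the node count to find its neighbors.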

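  8. Why use a Binomial Tree? [diagram: a master and three clients, each edge labeled with the time step at which the query crosses it] • Unicast mechanism: the master sends to each client in turn; query propagation time = 3 • Binomial tree: the first client to receive the query forwards it while the master sends to another client; query propagation time = 2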
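The numbers on slide 8 follow from a simple cost model, assumed here: one message takes one time step, and a node sends at most one message per step. Under unicast the master alone must reach all n - 1 clients; in a binomial tree every node that already holds the query forwards it, so the informed set doubles each step.

```python
def unicast_steps(n: int) -> int:
    """Master sends the query to each of the n - 1 clients in turn, one per step."""
    return n - 1


def binomial_steps(n: int) -> int:
    """Each node that already holds the query forwards it to one new node per
    step, so the set of informed nodes doubles: ceil(log2(n)) steps."""
    informed, steps = 1, 0
    while informed < n:
        informed *= 2
        steps += 1
    return steps


for n in (4, 8, 32):
    print(n, unicast_steps(n), binomial_steps(n))
```

For n = 4 (the master plus three clients, as on the slide) this gives 3 versus 2; for 32 nodes the gap grows to 31 versus 5, which is why unicast does not scale.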

  9. Reduction [diagram: eight-node tree, nodes 1 to 8; responses are reduced at nodes 5 and 3 on their way upstream] • Example reduction operations: Min(), Max()
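A reduction combines a node's own response with the already-reduced values from its children, so each node sends a single value upstream. The sketch assumes responses are plain numbers and uses Python's built-in `min` and `max` as the operators; the tree below is an assumed binomial layout rooted at node 1, not one read off the slide, and the load values are made up for illustration.

```python
from typing import Callable, Dict, List

Reduction = Callable[[List[float]], float]


def reduce_subtree(node: int, tree: Dict[int, List[int]],
                   local: Dict[int, float], op: Reduction) -> float:
    """Value that `node` sends upstream: its own response combined, via `op`,
    with the already-reduced values received from each of its children."""
    from_children = [reduce_subtree(c, tree, local, op) for c in tree.get(node, [])]
    return op([local[node]] + from_children)


# Assumed eight-node layout rooted at node 1; nodes 3, 5 and 7 combine child responses.
TREE = {1: [2, 3, 5], 3: [4], 5: [6, 7], 7: [8]}
LOADS = {1: 0.5, 2: 1.5, 3: 0.2, 4: 2.0, 5: 0.9, 6: 3.1, 7: 0.1, 8: 1.0}

print(reduce_subtree(1, TREE, LOADS, min))  # 0.1
print(reduce_subtree(1, TREE, LOADS, max))  # 3.1
```

This works because Min and Max give the same answer whether applied to raw values or to partial results; an operation without that property needs extra bookkeeping in what each node sends.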

  10. Tree connection setup [diagram: eight-node tree, nodes 1 to 8]

  11. Tree Setup, Phase I: TCP connection setup [diagram: nodes 1 to 8]

  12. Tree Setup, Phase II: TCP connection setup [diagram: nodes 1 to 8]

  13. Tree Setup, Phase III: TCP connection setup [diagram: nodes 1 to 8]
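Eight nodes set up in three phases is consistent with a doubling scheme, assumed here rather than taken from the slides: in each phase every node already in the tree opens one TCP connection to a new node, so the tree size doubles and n nodes need ceil(log2 n) phases. The sketch uses 0-based ranks and only computes the (parent, child) pairs per phase; it opens no sockets, and `setup_phases` is a hypothetical name.

```python
from typing import List, Tuple


def setup_phases(n: int) -> List[List[Tuple[int, int]]]:
    """(parent, child) TCP connections to open in each phase for nodes 0..n-1.
    In the phase with stride s = 1, 2, 4, ..., every node r < s, which is
    already in the tree, connects to the new node r + s (if it exists)."""
    phases = []
    stride = 1
    while stride < n:
        phases.append([(r, r + stride) for r in range(stride) if r + stride < n])
        stride *= 2
    return phases


for i, links in enumerate(setup_phases(8), start=1):
    print(f"Phase {i}: {links}")
```

The links opened across all phases form a binomial tree: rank 0 ends up with children 1, 2 and 4, rank 1 with 3 and 5, and so on, using n - 1 connections in total.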

  14. Inter-node communication • Each message consists of an ARTL header followed by data • Unicast and multicast data transmission • ARTL may receive application messages for which no receive has been posted; these are passed to a callback function registered by the application • When the application explicitly posts a receive, ARTL receives the data on its behalf
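The two receive paths on slide 14 amount to a dispatch decision: deliver to a posted receive if one is waiting, otherwise hand the data to the registered callback. This is a minimal sketch under stated assumptions: the header layout (a tag plus a payload length, in network byte order) and the names `Receiver`, `post_receive` and `on_packet` are hypothetical; the slides only say that an ARTL header precedes the data.

```python
import struct
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple

# Hypothetical ARTL header: message tag and payload length, network byte order.
HEADER = struct.Struct("!II")


def split_message(packet: bytes) -> Tuple[int, bytes]:
    """Separate the ARTL header from the application data that follows it."""
    tag, length = HEADER.unpack_from(packet)
    return tag, packet[HEADER.size:HEADER.size + length]


class Receiver:
    """Routes application data to a posted receive if one is waiting,
    otherwise to the callback the application registered."""

    def __init__(self, callback: Callable[[int, bytes], None]) -> None:
        self.callback = callback
        self.posted: Dict[int, Deque[List[bytes]]] = defaultdict(deque)

    def post_receive(self, tag: int) -> List[bytes]:
        """Application posts a receive; ARTL will append the data to this buffer."""
        buf: List[bytes] = []
        self.posted[tag].append(buf)
        return buf

    def on_packet(self, packet: bytes) -> None:
        tag, data = split_message(packet)
        if self.posted[tag]:
            self.posted[tag].popleft().append(data)  # explicit receive was posted
        else:
            self.callback(tag, data)                 # no receive posted: use callback
```

A first packet with tag 7 fills the buffer returned by `post_receive(7)`; a second one, arriving with no receive outstanding, goes to the callback instead.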

  15. ART - Library Architecture • Application, with application-specific callbacks and the Application API • Reduction functions • Framework for processing messages • Event handler (ARTL-encapsulated messages) • ARTL communication layer (outgoing messages, incoming packets) • Network • ARTL messages: 1. Query from master 2. Response from downstream nodes

  16. Reduction Functions • Implemented as shared objects • Sent to the clients during the setup phase • Each reduction function is associated with the particular response it reduces
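In C, a client would load a shipped shared object with `dlopen` and look up each function with `dlsym`. The Python sketch below only stands in for that step: the master ships source text, the client turns it into a module, and a registry records which function reduces which response type. The class, the response-type names and the shipped functions are all hypothetical. Running shipped code as-is is exactly the trust problem listed later under "making dynamic code secure".

```python
import types
from typing import Callable, Dict, List

Reducer = Callable[[List[float]], float]


class ReductionRegistry:
    """Associates each response type with the reduction function that reduces it."""

    def __init__(self) -> None:
        self._by_response: Dict[str, Reducer] = {}

    def install(self, module_name: str, source: str, bindings: Dict[str, str]) -> None:
        """Install code shipped by the master at setup time.
        `bindings` maps a response type to a function name defined in `source`."""
        module = types.ModuleType(module_name)
        exec(source, module.__dict__)  # stand-in for dlopen(); runs the shipped code as-is
        for response_type, func_name in bindings.items():
            self._by_response[response_type] = getattr(module, func_name)  # like dlsym()

    def reduce(self, response_type: str, values: List[float]) -> float:
        return self._by_response[response_type](values)


SHIPPED = "def lowest(xs):\n    return min(xs)\n\ndef highest(xs):\n    return max(xs)\n"
registry = ReductionRegistry()
registry.install("sysmon_reducers", SHIPPED, {"load_min": "lowest", "load_max": "highest"})
print(registry.reduce("load_min", [0.7, 0.2, 1.3]))  # 0.2
```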

  17. [diagram] • Event handler: receives responses from downstream nodes for a query (the shaded table entry) • Table containing the query id and callback information for currently registered queries • Run queue of reduction/response operations, serviced by a thread pool • The reduced response is sent upstream; the response callback goes to the application
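One reading of slide 17, sketched under assumptions: the event handler files each downstream response under its query id, and once the expected number of responses has arrived it places a reduction job on the run queue for a worker thread. The field names, the `send_upstream` hook and the "all responses arrived" trigger are hypothetical; with node failures, a real table would also need timeouts or reconfiguration notices.

```python
import queue
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Job = Callable[[], None]


@dataclass
class PendingQuery:
    expected: int                                # number of responses to wait for
    reduce: Callable[[List[float]], float]       # reduction function for this response type
    send_upstream: Callable[[float], None]       # hypothetical hook toward the master
    responses: List[float] = field(default_factory=list)


class QueryTable:
    """Query id -> callback information for the currently registered queries."""

    def __init__(self, run_queue: "queue.Queue[Job]") -> None:
        self.pending: Dict[int, PendingQuery] = {}
        self.run_queue = run_queue

    def register(self, query_id: int, entry: PendingQuery) -> None:
        self.pending[query_id] = entry

    def on_response(self, query_id: int, value: float) -> None:
        """Event-handler path: file one downstream response under its query id."""
        entry = self.pending[query_id]
        entry.responses.append(value)
        if len(entry.responses) == entry.expected:
            del self.pending[query_id]
            # Hand the reduction to the run queue so the event handler never blocks on it.
            self.run_queue.put(lambda: entry.send_upstream(entry.reduce(entry.responses)))
```

Keeping the reduction off the event handler's thread means a slow reduction function delays only its own query, not the reception of other messages.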

  18. Multithreaded Architecture • No prior knowledge about the behavior of reduction functions • Exploit concurrency: multiple processors per node • Static pool of threads: creating and destroying threads per request is costly (cf. Firefly RPC)
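A static pool creates its worker threads once and reuses them for every job, avoiding per-request thread creation and destruction. This is a minimal sketch; `StaticPool` is a hypothetical name, and Python's own `concurrent.futures.ThreadPoolExecutor` provides the same pattern. In CPython the global interpreter lock limits parallel CPU work, so the sketch shows the structure rather than the multiprocessor speedup.

```python
import queue
import threading
import traceback
from typing import Callable, List, Optional

Job = Callable[[], None]


class StaticPool:
    """Fixed set of worker threads, created once and reused for every job
    taken from the shared run queue."""

    def __init__(self, size: int) -> None:
        self.jobs: "queue.Queue[Optional[Job]]" = queue.Queue()
        self.workers: List[threading.Thread] = [
            threading.Thread(target=self._work, daemon=True) for _ in range(size)
        ]
        for worker in self.workers:
            worker.start()

    def _work(self) -> None:
        while True:
            job = self.jobs.get()
            if job is None:  # shutdown sentinel
                return
            try:
                job()  # e.g. a reduction function whose cost is unknown in advance
            except Exception:
                traceback.print_exc()  # keep the worker alive after a failing job

    def submit(self, job: Job) -> None:
        self.jobs.put(job)

    def shutdown(self) -> None:
        """Queue one sentinel per worker (after all pending jobs) and wait for them."""
        for _ in self.workers:
            self.jobs.put(None)
        for worker in self.workers:
            worker.join()
```

Because the queue is FIFO and the sentinels are queued last, every job submitted before `shutdown()` runs before any worker exits.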

  19. Crash Reconfiguration [diagram: eight-node tree, nodes 1 to 8]

  20. Crash reconfiguration at depth 1 [diagram: tree without node 2]

  21. Crash reconfiguration at depth 2 [diagram: tree without node 2]

  22. Crash reconfiguration at depth 1 [diagram: eight-node tree, nodes 1 to 8]

  23. Crash reconfiguration at depth 1 [diagram: tree without node 5]

  24. Crash Detection • A break in the TCP connection with a parent or child: a signal is received at the other end of the connection • Periodic refresh messages inform the parent that the child is up and running: useful in WAN environments
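The refresh-message scheme can be sketched as a parent-side timeout check: the parent records when each child last sent a refresh and suspects any child that has been silent longer than a timeout. The slides give no refresh interval or timeout, so `timeout` is a free, hypothetical parameter; the clock is injectable so the logic can be exercised without sleeping.

```python
import time
from typing import Callable, Dict, List


class RefreshMonitor:
    """Parent-side tracker for periodic 'I am alive' refresh messages from children.
    A child is suspected to have failed if no refresh arrives within `timeout` seconds."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock
        self.last_seen: Dict[int, float] = {}

    def on_refresh(self, child: int) -> None:
        self.last_seen[child] = self.clock()

    def suspected(self) -> List[int]:
        """Children whose most recent refresh is older than the timeout."""
        now = self.clock()
        return sorted(c for c, t in self.last_seen.items() if now - t > self.timeout)
```

A broken TCP connection is noticed quickly on a LAN, but across a WAN a silent partition may go unnoticed for a long time; the timeout bounds how long a failed child can go undetected, at the cost of false suspicions if it is set too short.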

  25. Crash Handling • The parent of the failed node informs the master • All nodes are informed of the node failure • The master recomputes the tree • If a leaf node fails, no reconfiguration is needed • If an intermediate node fails, some reconfiguration is required
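The slides do not say how the master rebuilds the tree, so the sketch below uses one simple policy, assumed for illustration: the failed node's children are adopted by the failed node's parent. It shows the leaf/intermediate distinction directly: a failed leaf re-parents nobody. The tree is a node-to-parent map, and `reparent_after_crash` is a hypothetical name; failure of the master itself is out of scope here.

```python
from typing import Dict, List, Optional


def reparent_after_crash(parent: Dict[int, Optional[int]], failed: int) -> List[int]:
    """Remove `failed` from the tree (a node -> parent map, modified in place)
    and return the nodes that had to be re-parented.
    Assumed policy: orphaned children are adopted by the failed node's parent."""
    if parent[failed] is None:
        raise ValueError("failure of the master (root) is outside this sketch")
    grandparent = parent.pop(failed)
    orphans = sorted(n for n, p in parent.items() if p == failed)
    for child in orphans:
        parent[child] = grandparent
    return orphans  # empty list: the failed node was a leaf
```

The trade-off is that the adopting node's fan-out grows and the tree drifts away from a binomial shape; recomputing the whole tree restores the shape but may move nodes whose own parent is still alive.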

  26. Node Restart • The restarted node contacts the master to report its restart • The master sends it the current state of the network and the shared object(s) • All nodes are informed of the node restart • The master recomputes the tree and informs the new node’s parent about its new child • Parent and child establish connections

  27. SysMon - A System Monitor • Monitors the load average from /proc and displays the minimum, maximum and average loads • Per-node load is also displayed • ARTL reduction operations: Min, Max and Average
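Min and Max can be reduced directly, but Average cannot: averaging the averages of unequal subtrees gives the wrong answer. For example, a subtree with loads 1.0 and 3.0 averages 2.0 and a single node reports 4.0; the average of averages is 3.0, while the true average is 8/3, about 2.67. A standard fix, sketched here as an assumption about how such a reduction could work, is to send (min, max, sum, count) upstream and divide only at the master. On Linux each node's own value could come from the first field of /proc/loadavg; the names below are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable


@dataclass(frozen=True)
class LoadSummary:
    """Partial result a node sends upstream: enough to finish min, max and average."""
    lo: float
    hi: float
    total: float
    count: int

    @staticmethod
    def of(load: float) -> "LoadSummary":
        return LoadSummary(load, load, load, 1)

    def merge(self, other: "LoadSummary") -> "LoadSummary":
        return LoadSummary(min(self.lo, other.lo), max(self.hi, other.hi),
                           self.total + other.total, self.count + other.count)

    @property
    def average(self) -> float:
        return self.total / self.count


def reduce_loads(parts: Iterable[LoadSummary]) -> LoadSummary:
    """Combine a node's own summary with its children's summaries."""
    return reduce(LoadSummary.merge, parts)
```

Since `merge` is associative and commutative, intermediate nodes can combine summaries in any order and the master still gets the exact average.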

  28. SysMon - A System Monitor • Node failures are detected and SysMon pops up an alert

  29. File Transfer Application • Transfers a file from the master to all clients • The file can be executed at the clients (if required) • execution can start immediately on receiving the file • or be delayed until all nodes have received the file

  30. File Transfer Performance [chart]

  31. Total Startup Time vs. Number of Nodes [chart] • Client processes started using ssh on different machines

  32. Conclusions and Extensions • A middleware for dynamic operations • Support for crash detection, recovery and dynamic processes • Demonstrated near-optimal speedup using real applications • Extensions: making the response function dynamic (active services) • Differential scheduling in the thread scheduler for QoS • Making dynamic code secure
