
U-Net: A User-Level Network Interface for Parallel and Distributed Computing



  1. U-Net: A User-Level Network Interface for Parallel and Distributed Computing T. von Eicken, A. Basu, V. Buch and W. Vogels Cornell University Appears in SOSP 1995 Presented by: Joseph Paris

  2. Introduction • The local area network bottleneck has shifted • Traditionally, limited link bandwidth • Now, the message path through software • In the UNIX networking architecture, the message path through the kernel involves • Several copies • Crossing multiple levels of abstraction between the device driver and user applications • Resulting in… overhead • These processing overheads limit the peak communication bandwidth and result in high latency • As a result, upgrades in networking technology largely go unnoticed by the general user community • Partly a vendor-supplied problem? • Vendors tend to optimize for large data streams and think less about per-message overhead

  3. Observation • Most applications use relatively small messages and rely heavily on quick round-trip requests and replies • Distributed shared memory • Remote procedure calls • Remote object-oriented method invocations • Distributed cooperative file caches • These applications could also benefit from more flexible interfaces to the network • The traditional architecture cannot easily support new protocols or interfaces • Integrating application-specific information into protocol processing offers • Higher efficiency • Greater flexibility • e.g. video, audio, transferring data directly from application data structures

  4. Motivation • Low end-to-end communication latency • Requires separating processing overhead from network latency • Distributed systems • Object-oriented technology • Objects are generally small (on the order of 100 bytes vs. Kbytes) • Electronic workplace • Simple database servers that handle object naming, location, authentication, protection (20-80 bytes for requests, 40-200 bytes for responses) • Cache coherence • Keeping copies consistent introduces a large number of small coherence messages • Fault-tolerance algorithms / group communication • Global locks, scheduling, coherence • RPCs, file systems, etc.

  5. Motivation • Small-message bandwidth • The same trends that demand low latency also demand high bandwidth for small messages • Object-oriented technology, electronic workplace, cache coherence, RPCs, etc. • Part of decreasing the overall end-to-end latency is providing high bandwidth for small messages • Basically, we want full network bandwidth with messages as small as possible • Protocol interface flexibility • Traditionally • Protocol stacks are implemented as part of the kernel • Kernel and application buffer management are not integrated • Solution • Remove the boundary between the communication subsystem and the application-specific protocols • Tightly couple the communication protocol and the application

  6. Solution: U-Net • Why? • Focus on low latency and high bandwidth using small messages • Emphasis on protocol design and integration flexibility • Desire to meet these goals on widely available, off-the-shelf hardware • How? • Simply remove the kernel from the critical path of sending and receiving messages • Eliminates the system-call overhead • Offers opportunities to streamline buffer management • What's required? • Virtualizing the network interface among processes • Protection, such that processes using the network cannot interfere with each other • Message multiplexing and de-multiplexing • Managing communication resources without the kernel • An efficient and versatile programming interface to the network

  7. Design & Implementation of U-Net • Virtualize the network interface such that a combination of OS and hardware mechanisms gives each process the illusion of owning the interface • In hardware • The components manipulated by a process correspond to real hardware • In software • Memory locations are interpreted by the OS • Either way, the role of U-Net is limited to • Multiplexing the actual network interface among all processes • Enforcing protection boundaries • Enforcing resource consumption limits • This leaves the process with control over • The contents of messages • The management of send and receive resources (such as buffers)

  8. Design & Implementation of U-Net • There are three main building blocks (sketched in the code below) • Endpoints • Serve as an application's handle into the network and contain… • Communication segments • Regions of memory that hold message data • Message queues • Hold descriptors for messages that are to be sent or have been received • Each process that wants to access the network • Creates one or more endpoints • Associates a communication segment with each endpoint • And a set of send, receive, and free message queues
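To make the three building blocks concrete, here is a minimal sketch of them as C structures. This is an illustrative reconstruction, not the paper's or the SBA-200 firmware's actual definitions: all names, sizes, and field layouts are assumptions.

```c
#include <stdint.h>

#define UNET_SEG_SIZE  (64 * 1024)  /* size of one communication segment (assumed) */
#define UNET_QUEUE_LEN 64           /* descriptors per message queue (assumed) */

/* Descriptor for one message staged in the communication segment. */
typedef struct {
    uint32_t channel;   /* tag identifying the source/destination endpoint */
    uint32_t offset;    /* offset of the message data within the segment */
    uint32_t length;    /* message length in bytes */
} unet_desc_t;

/* A fixed-size circular queue of descriptors, shared between the
 * process and the network interface. */
typedef struct {
    unet_desc_t entries[UNET_QUEUE_LEN];
    volatile uint32_t head;  /* producer index */
    volatile uint32_t tail;  /* consumer index */
} unet_queue_t;

/* An endpoint: the application's handle into the network. */
typedef struct {
    uint8_t      segment[UNET_SEG_SIZE]; /* pinned memory for message data */
    unet_queue_t send;  /* descriptors of messages to transmit */
    unet_queue_t recv;  /* descriptors of messages that have arrived */
    unet_queue_t free;  /* empty buffers offered to the interface */
} unet_endpoint_t;
```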

  9. Design & Implementation of U-Net • Sending (see the send-path sketch below) • The user process composes the data in the communication segment • Then pushes a descriptor for the message onto the send queue • At this point the network interface is expected to pick the message up and insert it into the network • If there is a back-up • Leave the descriptor in the queue • Eventually exert back-pressure on the user process when the queue becomes full • Receiving • Messages are de-multiplexed based on their destination • Data is transferred to the appropriate communication segment • The message descriptor is pushed onto the corresponding receive queue • Receive notification models • Polling, or blocking until the next message arrives via the UNIX select call • Event-driven • Register an up-call • Invoked when the state of the receive queue satisfies a certain condition • Only two conditions are currently supported • Queue is non-empty • Queue is almost full • To keep performance high (and cost low), all pending messages can be consumed in a single up-call
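The kernel-free send path can be sketched in a few lines of C, reusing the illustrative unet_endpoint_t / unet_queue_t types from the previous slide's sketch. This is an assumed reconstruction of the mechanism, not U-Net's actual code; note how a full queue surfaces as back-pressure to the caller.

```c
#include <string.h>

/* True when the circular queue has no room for another descriptor. */
static int queue_full(const unet_queue_t *q) {
    return ((q->head + 1) % UNET_QUEUE_LEN) == q->tail;
}

/* Returns 0 on success, -1 if the send queue is full (back-pressure). */
int unet_send(unet_endpoint_t *ep, uint32_t channel,
              const void *data, uint32_t len, uint32_t seg_off)
{
    if (queue_full(&ep->send))
        return -1;                       /* back-pressure to the caller */

    /* Single copy: application data structure -> communication segment. */
    memcpy(ep->segment + seg_off, data, len);

    /* Fill in the descriptor at the head of the send queue. */
    unet_desc_t *d = &ep->send.entries[ep->send.head];
    d->channel = channel;
    d->offset  = seg_off;
    d->length  = len;

    /* Publish the descriptor; the network interface picks it up from
     * here. A real implementation needs a memory barrier before this
     * store so the interface never sees a half-written descriptor. */
    ep->send.head = (ep->send.head + 1) % UNET_QUEUE_LEN;
    return 0;
}
```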

  10. Design and Implementation of U-Net • Multiplexing and de-multiplexing messages (sketched below) • A tag in each incoming message determines • The destination endpoint • And thus its communication segment and message queues • The exact form of the message tag depends on the network substrate • e.g. ATM uses virtual channel identifiers • An OS-level service assists the application in determining the correct tag to use, based on a specification of the destination process and the route between the two nodes • This service may also perform • Route discovery • Switch-path setup • Other signaling that is specific to the network technology • Authentication and authorization • Checks that the application is allowed to access the specific network resources • And that there are no conflicts with other applications
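A hedged sketch of what the de-multiplexing step might look like from the network interface's side, reusing the illustrative types and queue_full helper from the earlier sketches. The channel_table and its size are assumptions; in the real system the tag-to-endpoint mapping is established via the OS-level service and consulted by the interface.

```c
#define UNET_MAX_CHANNELS 256            /* number of tags (assumed) */

/* Tag-to-endpoint routing table, filled in when tags are handed out
 * (illustrative; the real mapping lives in the interface/firmware). */
static unet_endpoint_t *channel_table[UNET_MAX_CHANNELS];

/* De-multiplex one incoming message by its tag (e.g. an ATM VCI). */
void unet_demux(uint32_t tag, const void *payload, uint32_t len)
{
    unet_endpoint_t *ep =
        (tag < UNET_MAX_CHANNELS) ? channel_table[tag] : NULL;
    if (ep == NULL)
        return;                          /* unknown tag: drop the message */

    /* Need a free buffer and room in the receive queue. */
    if (ep->free.head == ep->free.tail || queue_full(&ep->recv))
        return;                          /* no resources: drop */

    /* Consume a free buffer offered by the application. */
    unet_desc_t buf = ep->free.entries[ep->free.tail];
    ep->free.tail = (ep->free.tail + 1) % UNET_QUEUE_LEN;

    /* Copy the payload into the endpoint's communication segment. */
    memcpy(ep->segment + buf.offset, payload, len);

    /* Post a receive descriptor so the application can consume it. */
    unet_desc_t *d = &ep->recv.entries[ep->recv.head];
    d->channel = tag;
    d->offset  = buf.offset;
    d->length  = len;
    ep->recv.head = (ep->recv.head + 1) % UNET_QUEUE_LEN;
}
```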

  11. Design and Implementation of U-Net • Base-level architecture • The hardware cannot support direct access • "True zero-copy," where data is sent directly out of the application's data structures without intermediate buffering • Requires special memory mapping so that the communication segment can span the entire process address space • So we only get "zero-copy" support for now • Which in reality requires a single copy, namely between the application's data structures and a buffer in the communication segment • Queue-based interface to the network • Stages messages in a limited-size communication segment on their way between application data structures and the network • Send and receive queues hold descriptors with information about the destination, origin endpoints, and length, as well as offsets within the communication segment • Management of the send buffer is entirely up to the process • Buffers must be properly aligned to the requirements of the network interface • The process cannot control the order in which messages are received into the receive buffer • Free queues hold descriptors for free buffers that are made available to the network interface for storing arriving messages • Small-message optimization (sketched below) • Send and receive queues may hold entire messages in descriptors (instead of pointers to the data) • Avoids buffer management and can improve round-trip latency
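The small-message optimization amounts to a descriptor that can carry its payload inline. One possible layout follows; the union arrangement and the inline threshold are illustrative assumptions (the real cut-off depends on the descriptor size the interface uses).

```c
#define UNET_INLINE_MAX 56  /* bytes that fit inside a descriptor (assumed) */

/* Descriptor variant supporting the small-message optimization:
 * messages up to UNET_INLINE_MAX bytes travel inside the descriptor
 * itself, bypassing the communication segment and buffer management. */
typedef struct {
    uint32_t channel;       /* tag identifying the endpoint */
    uint32_t length;        /* <= UNET_INLINE_MAX means inline payload */
    union {
        struct {
            uint32_t offset;                  /* payload in comm. segment */
        } indirect;
        uint8_t inline_data[UNET_INLINE_MAX]; /* payload carried inline */
    } u;
} unet_small_desc_t;
```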

  12. Evaluation • Two U-Net implementations • SBA-100 • Non-programmable; U-Net is done completely in software • Poor performance: a 33-40% increase in overhead because the ATM header CRC calculation is done in software • SBA-200 • Programmable, with custom firmware • Reflects the base-level U-Net architecture in hardware • Three tests • U-Net Active Messages implementation (UAM) — sketched below • Active Messages is a mechanism that allows efficient overlapping of communication with computation in multiprocessors • Communication takes the form of requests and matching replies • Split-C • A parallel extension to C for programming distributed-memory machines using a global address space abstraction • Consists of one thread of control per processor from a single code image; the threads interact through reads and writes on shared data • Implemented on top of U-Net Active Messages • TCP/UDP
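For reference, the dispatch at the heart of the Active Messages model can be sketched as a handler table indexed by an identifier carried in each message: a message names the handler to run on arrival, and a request handler typically issues the matching reply. The table size and signatures below are illustrative assumptions, not the UAM API.

```c
/* An active-message handler: runs on arrival, given the sender and
 * the message payload. */
typedef void (*am_handler_t)(uint32_t src, const void *arg, uint32_t len);

#define AM_MAX_HANDLERS 32               /* table size (assumed) */

/* Handler table, registered by the application at startup. */
static am_handler_t handler_table[AM_MAX_HANDLERS];

/* Dispatch an arriving message: the handler id carried in the message
 * selects the handler, which typically sends the matching reply. */
void am_dispatch(uint32_t src, uint32_t handler_id,
                 const void *arg, uint32_t len)
{
    if (handler_id < AM_MAX_HANDLERS && handler_table[handler_id] != NULL)
        handler_table[handler_id](src, arg, len);
}
```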

  13. Evaluation • Active Messages (UAM) • [Figures: U-Net bandwidth as a function of message size; U-Net round-trip time as a function of message size]

  14. Evaluation • Split-C using UAM • [Figures: overall execution time; CPU and network breakdown for two applications]

  15. Evaluation • TCP/UDP • [Figures: U-Net TCP bandwidth as a function of message size; U-Net UDP bandwidth as a function of message size]

  16. Evaluation • TCP/UDP • [Figure: latency as a function of message size]

  17. Conclusion • Processing overhead on messages has been minimized • The latency experienced by the application is once again dominated by the actual message transmission time • U-Net is a simple networking interface that supports traditional inter-networking protocols as well as abstractions such as Active Messages • Demonstrates that removing the kernel from the communication path can offer new flexibility in addition to high performance • The TCP/UDP protocols achieve latencies and throughput close to the raw maximum
