Seekable Sockets: A Mechanism to Reduce Copy Overheads in TCP-based Messaging
Motivation
• MPI receives may occur out of order
• These out-of-order messages must be managed in library space, creating extra library-to-application copying overhead
Contributions
• A new pseudo-socket type in the Linux kernel for seeking through socket buffers
• Performance is increased when:
  • Messages are expected to be very out of order
  • Message sizes are greater than about 1 KB
• Only 823 lines of code added to the kernel; existing code is reused extensively
Contents
• Theory
• Implementation
• Benchmarks
• Further work on improving MPI
• Conclusions
Theory
• Raw network communication is serial and in order of receipt
• Access to the middle of a TCP stream of data is currently unsupported
• What are the ramifications of managing a stream of data as a queue in user-space applications?
Theory
• MPI_Recv():
  1. Check the out-of-order pool for the requested message (based on tagging) and return if it is there
  2. Receive the next message header
  3. If it is the requested message, return
  4. If not, receive into the out-of-order pool
  5. Repeat from step 2
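The receive loop above can be sketched as a small user-space model. This is only an illustration, not code from any MPI library: the class name `PooledReceiver`, the `(tag, payload)` pairs, and the use of an exception where a real receive would block are all assumptions.

```python
from collections import deque


class PooledReceiver:
    """Toy model of a library-level receive with an out-of-order pool."""

    def __init__(self, arrivals):
        # (tag, payload) pairs in the order they arrived on the stream.
        self.stream = deque(arrivals)
        # tag -> queue of payloads that arrived before they were requested.
        self.pool = {}

    def recv(self, tag):
        # Step 1: the requested message may already sit in the pool.
        early = self.pool.get(tag)
        if early:
            payload = early.popleft()
            if not early:
                del self.pool[tag]
            return payload
        # Steps 2-5: read messages in stream order until the tag matches.
        while self.stream:
            next_tag, payload = self.stream.popleft()
            if next_tag == tag:
                return payload
            # Not the requested message: park it in the pool.
            self.pool.setdefault(next_tag, deque()).append(payload)
        # A real receive would block here; the model just reports it.
        raise LookupError(f"no message with tag {tag} has arrived")
```

Requesting tag 3 from a stream that delivered tags 1, 2, 3 parks the first two messages in the pool, and later requests for tags 1 and 2 are served from the pool without touching the stream.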
Theory
• When MPI_Recv() returns:
  • If the requested data was next in the stream (not in the pool of out-of-order messages), the data is copied directly from the kernel to the application buffer
  • If the requested data was in the pool, the data was first copied from the kernel to the pool, and is now copied a second time from the pool to the application buffer
Theory
• There are two inefficiencies in common MPI implementations:
  • Each out-of-order message received is copied to a temporary buffer
  • A temporary buffer is dynamically allocated for each individual message
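To make the two costs concrete, the sketch below counts them for a given arrival order and request order. It is a simplified model under stated assumptions: tags are unique, every requested tag eventually arrives, and the function name `copy_costs` is made up for this example.

```python
def copy_costs(arrival, requests):
    """Return (direct_copies, extra_copies, allocations) for a pooled receive.

    direct_copies: messages copied straight from the kernel to the application
    extra_copies:  pool-to-application copies (the second copy of a message)
    allocations:   temporary buffers allocated for out-of-order messages
    """
    pending = list(arrival)
    pool = set()
    direct = extra = allocations = 0
    for tag in requests:
        if tag in pool:
            pool.remove(tag)
            extra += 1            # pool -> application buffer
            continue
        while pending:
            head = pending.pop(0)
            if head == tag:
                direct += 1       # kernel -> application buffer
                break
            pool.add(head)        # kernel -> freshly allocated pool buffer
            allocations += 1
        else:
            raise LookupError(f"tag {tag} never arrived")
    return direct, extra, allocations
```

With arrivals `[1, 2, 3]` requested as `[3, 1, 2]`, two messages pay for an allocation and a second copy; requested in arrival order, none do.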
Theory
• A method to seek into the socket buffer within the kernel would effectively eliminate both inefficiencies
• Enter Seekable Sockets
Implementation
• Our research implements a new pseudo-socket to allow for seeking over traditional TCP sockets
• Seekable sockets are processed by the normal TCP stack with modifications
• Normal TCP communication flows through the TCP stack as usual
Implementation
• We have prototyped seekable sockets in Linux 2.6.13
• Most of the changes to the TCP stack are in the tcp_recvmsg() function (net/ipv4/tcp.c)
• Other changes mostly implement the interface or handle corner cases:
  • Managing wrapped sequence numbers
  • Coalescing skbs when the socket buffer is full
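The slides do not show how wrapped sequence numbers are handled. As an illustration of the general problem, TCP sequence numbers are 32-bit values that wrap around, so ordering is usually decided from the difference modulo 2^32 rather than by a plain comparison. The helper names below are hypothetical and the code is a sketch of that general technique, not the prototype's code.

```python
SEQ_MASK = 0xFFFFFFFF   # TCP sequence numbers are 32 bits wide
HALF_RANGE = 0x80000000


def seq_before(a, b):
    """True if sequence number a comes before b, allowing for wraparound."""
    # A difference in the upper half of the range is negative when read
    # as a signed 32-bit integer, meaning a is behind b.
    return ((a - b) & SEQ_MASK) >= HALF_RANGE


def seq_offset(base, seq):
    """Byte distance from base forward to seq, allowing for wraparound."""
    return (seq - base) & SEQ_MASK
```

For example, `0xFFFFFFF0` is treated as coming before `0x10`, and the distance between them is 32 bytes even though the counter wrapped in between.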
Implementation
• Data received from the network is placed into data buffers (skbs in Linux)
• Each packet is normally placed into its own skb
• skbs are added to a linked list, with one list per socket
Implementation
• Receive system calls on TCP sockets all funnel down to tcp_recvmsg()
• Normally, tcp_recvmsg() pulls data off the head of the skb list until the requested buffer is filled
• Seekable sockets can pull data from the middle of an skb list to fill the requested buffer
Implementation
• When a seeking receive call is performed, data is copied from the desired location to the user-space buffer, and the packet is then removed from the linked list and deallocated
• A second, new linked list keeps track of the “holes” in the socket buffer
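A user-space model can illustrate the bookkeeping a seeking receive needs. The sketch below is a deliberate simplification: it uses one byte array and a sorted Python list where the kernel prototype uses skbs and linked lists, and the class and method names are invented for this example.

```python
class SeekableBuffer:
    """Toy receive buffer that allows reads from the middle of the stream."""

    def __init__(self):
        self.base = 0            # stream offset of data[0]
        self.data = bytearray()  # bytes that have arrived and are still held
        self.holes = []          # sorted, merged (start, end) ranges already read

    def deliver(self, payload):
        # Model the arrival of more in-order stream data.
        self.data += payload

    def seek_recv(self, offset, length):
        end = offset + length
        if offset < self.base or end > self.base + len(self.data):
            raise ValueError("range is not in the buffer")
        if any(start < end and offset < stop for start, stop in self.holes):
            raise ValueError("range overlaps data that was already read")
        out = bytes(self.data[offset - self.base:end - self.base])
        self._add_hole(offset, end)
        return out

    def _add_hole(self, start, end):
        merged = []
        for s, e in sorted(self.holes + [(start, end)]):
            if merged and s <= merged[-1][1]:
                # Touching or overlapping ranges collapse into one hole.
                merged[-1] = (merged[-1][0], max(merged[-1][1], e))
            else:
                merged.append((s, e))
        # A hole at the front of the buffer can be released for good.
        if merged and merged[0][0] == self.base:
            freed = merged[0][1] - self.base
            del self.data[:freed]
            self.base += freed
            merged.pop(0)
        self.holes = merged
```

Reading bytes 4 to 8 first leaves a hole at `(4, 8)`; once bytes 0 to 4 are also read, the two ranges merge and the whole prefix is freed.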
Implementation
• Blocking calls are allowed on seekable sockets
• However, if the attempt to seek is beyond the length of the socket buffer and the socket buffer is full, an error is returned
• The application must be aware of this and begin pulling data off the socket in order to free up socket resources
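The three cases described above can be written as a small decision function. This is a sketch of the stated rule only: the names `Outcome` and `seek_outcome` are hypothetical, and the slides do not say which error code the kernel returns.

```python
from enum import Enum


class Outcome(Enum):
    COPY = "copy"    # requested range is already buffered
    BLOCK = "block"  # range not here yet, but the buffer can still grow
    ERROR = "error"  # range not here and the buffer is full


def seek_outcome(offset, length, buffered, capacity):
    """Classify a seeking receive of `length` bytes at `offset`.

    buffered: bytes currently held in the socket buffer
    capacity: maximum bytes the socket buffer may hold
    """
    if offset + length <= buffered:
        return Outcome.COPY
    if buffered >= capacity:
        # Waiting cannot help: no room is left for the data to arrive.
        return Outcome.ERROR
    return Outcome.BLOCK
```

On `Outcome.ERROR` the application has to read buffered data off the socket before the seek can succeed.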
Implementation
• Socket buffer size limits are controlled through a sysctl and can be increased to any reasonable size
• Sizes of individual buffers can be modified through setsockopt()
• Setting these appropriately can increase the amount of seeking possible
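For an ordinary TCP socket, the per-socket receive buffer is requested with the standard `SO_RCVBUF` option, and on Linux the request is capped by the `net.core.rmem_max` sysctl. The sketch below uses a plain TCP socket; whether the seekable-socket prototype uses exactly these knobs is not stated in the slides, and the helper name is made up.

```python
import socket


def request_receive_buffer(sock, size):
    """Ask for a receive buffer of `size` bytes and return what was granted."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    # The kernel may round, double, or cap the request, so read it back.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        granted = request_receive_buffer(s, 256 * 1024)
        print(f"requested 262144 bytes, kernel granted {granted}")
```

Reading the value back matters because a request above the system limit is silently reduced rather than rejected.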
Benchmarks
• Our research goals were to create an interface for seekable sockets and to see if any performance gains are possible
• As this was an exploratory approach, we evaluated the method under a single benchmark to obtain an idea of where seekable sockets may be most beneficial
Benchmarks
• Our benchmark sent messages of a fixed size between two computers
• The messages were transmitted out of order by a certain number of messages
• An out-of-orderness of N means N messages were sent before the desired message was sent
• The run is repeated until performance stabilizes
Benchmarks
• In the normal, non-seeking benchmark, the N out-of-order messages are copied into a temporary buffer
• Then, the “requested” message is copied directly into the final buffer
• Finally, the out-of-order messages are copied from the temporary buffer into the final buffer
Benchmarks
• In the seeking benchmark, the receiver seeks past the out-of-order messages one by one until the requested message is found
• The requested message is then copied directly into the final buffer
• Lastly, all out-of-order messages are copied from the socket buffer into the final buffer
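Counting the copies in the non-seeking and seeking variants gives a rough idea of the expected saving. The function below is a back-of-the-envelope model, not a measurement: it assumes all messages have the same size and ignores header handling and allocation cost, and its name is made up.

```python
def bytes_copied(n_out_of_order, message_size, seeking):
    """Total bytes copied to deliver N out-of-order messages plus the requested one."""
    # Every message crosses from the kernel to user space exactly once.
    kernel_to_user = (n_out_of_order + 1) * message_size
    # Without seeking, each out-of-order message is copied again,
    # from the temporary buffer to the final buffer.
    user_to_user = 0 if seeking else n_out_of_order * message_size
    return kernel_to_user + user_to_user
```

With N = 4 and 1 KB messages, the non-seeking path copies 9 KB and the seeking path 5 KB; the difference of N times the message size is the user-space copying that seeking avoids.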
Benchmarks
• The out-of-orderness and the message sizes were varied
• As message size and out-of-orderness increase, the overhead of user-space to user-space copying should increase
• Performance is measured as the amount of system and user processor time used for a run
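The slides do not say how the processor times were collected. One common way to obtain separate user and system CPU time for a piece of work is to sample the process times before and after it, as in this sketch (the function name `cpu_seconds` is an assumption):

```python
import os


def cpu_seconds(run):
    """Return (user_seconds, system_seconds) consumed by calling run()."""
    before = os.times()
    run()
    after = os.times()
    # os.times() reports cumulative CPU time for this process,
    # so the difference isolates the cost of the call.
    return after.user - before.user, after.system - before.system


if __name__ == "__main__":
    user, system = cpu_seconds(lambda: sum(range(1_000_000)))
    print(f"user={user:.3f}s system={system:.3f}s")
```

Splitting the two numbers is useful here because kernel-to-user copies show up as system time while library-to-application copies show up as user time.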
Implications for MPI
• Seekable sockets may facilitate greater efficiency in TCP-based MPI
• However, implementation may require substantial library code changes:
  • Proper handling of socket-buffer-full situations
  • Managing message headers and seek offsets
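To show what managing headers and seek offsets could involve, the sketch below walks a buffer of length-prefixed messages and returns where a requested payload starts. The header layout (two unsigned 32-bit fields, tag then payload length) and the function name are assumptions made for this example; real MPI libraries define their own wire formats.

```python
import struct

# Hypothetical header: message tag and payload length, network byte order.
HEADER = struct.Struct("!II")


def find_message(buffer, tag):
    """Return (payload_offset, payload_length) for the first message with `tag`.

    Returns None if no complete header with that tag is in the buffer.
    """
    offset = 0
    while offset + HEADER.size <= len(buffer):
        msg_tag, length = HEADER.unpack_from(buffer, offset)
        payload_offset = offset + HEADER.size
        if msg_tag == tag:
            return payload_offset, length
        # Skip this message: only its header was examined, not its payload.
        offset = payload_offset + length
    return None
```

The returned offset is what a library would pass to a seeking receive, so only headers are inspected while skipping and payloads are copied once, directly to their final destination.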
Conclusions
• Seekable sockets are a novel means of increasing efficiency when:
  • Out-of-order communication is expected
  • Message sizes are large
• The MPI library could use seekable sockets efficiently, but doing so may require substantial code changes
Polymorphic Message Management
• Use of a polymorphic message structure within the MPI library could easily manage out-of-order messages
• Helps manage known sequence numbers of out-of-order messages
• Easily deals with socket-buffer-full conditions
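The slides do not define the polymorphic message structure, so the following is only one possible reading: a message is either still in the socket buffer, known by its offset, or already copied into library memory, and both kinds answer the same `read()` call. All class and function names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Message:
    tag: int
    length: int

    def read(self, socket_buffer):
        raise NotImplementedError


@dataclass
class InSocketMessage(Message):
    offset: int  # where the payload sits in the socket buffer

    def read(self, socket_buffer):
        # Payload is fetched by seeking; no intermediate copy exists.
        return bytes(socket_buffer[self.offset:self.offset + self.length])


@dataclass
class PooledMessage(Message):
    payload: bytes  # payload already copied into library memory

    def read(self, socket_buffer):
        return self.payload


def spill(message, socket_buffer):
    """Move an in-socket message into library memory, e.g. when the buffer is full."""
    if isinstance(message, PooledMessage):
        return message
    return PooledMessage(message.tag, message.length, message.read(socket_buffer))
```

Under this reading, a buffer-full condition is handled by spilling the oldest in-socket messages, while the code that matches receives against messages stays unchanged.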