
Seekable Sockets: A Mechanism to Reduce Copy Overheads

Explores a new pseudo-socket type in the Linux kernel designed to improve TCP-based messaging performance when messages are received out of order. Covers theory, implementation, benchmarks, and future MPI enhancements.



Presentation Transcript

  1. Seekable Sockets: A Mechanism to Reduce Copy Overheads in TCP-based Messaging

  2. Motivation • MPI receives may occur out of order • These out-of-order messages must be managed in library space, which creates extra library-to-application copying overhead

  3. Contributions • A new pseudo-socket type in the Linux kernel for seeking through socket buffers • Performance is increased when: • Messages are expected to be very out of order • Message sizes are greater than about 1 KB • Only 823 lines of code were added to the kernel; existing code is reused extensively

  4. Contents • Theory • Implementation • Benchmarks • Further work on improving MPI • Conclusions

  5. Theory • Raw network communication is serial and in order of receipt • Access to the middle of a TCP stream of data is currently unsupported • What are the ramifications of managing a stream of data as a queue in user space applications?

  6. Theory • MPI_Recv(): • Check the out-of-order pool for the requested message (based on tagging) and return it if it is there • Receive the next message header • If it is the requested message, return • If not, receive the message into the out-of-order pool • Repeat from the second step
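
The receive loop on this slide can be sketched as a small Python model. This is an illustration only, not code from any MPI library: the class name `PooledReceiver`, the `(tag, payload)` stream format, and the `pool_copies` counter are all assumptions made for the example, and a plain in-memory queue stands in for the TCP socket.

```python
from collections import deque


class PooledReceiver:
    """Toy model of a library-level receive with an out-of-order pool."""

    def __init__(self, stream):
        # `stream` stands in for the socket: an in-order queue of (tag, payload).
        self.stream = deque(stream)
        self.pool = {}        # tag -> payloads parked because they arrived early
        self.pool_copies = 0  # number of messages copied from "kernel" to pool

    def recv(self, tag):
        # Step 1: check the out-of-order pool for the requested tag.
        parked = self.pool.get(tag)
        if parked:
            return parked.pop(0)  # second copy: pool -> caller's buffer
        # Remaining steps: read messages until the requested tag shows up.
        while self.stream:
            msg_tag, payload = self.stream.popleft()
            if msg_tag == tag:
                return payload  # single copy: "kernel" -> caller's buffer
            # Not the requested message: park a copy in the pool.
            self.pool.setdefault(msg_tag, []).append(bytes(payload))
            self.pool_copies += 1
        raise LookupError(f"no message with tag {tag!r} in stream")
```

Requesting the last of three queued messages first parks the two earlier ones in the pool, so each of them is copied twice before it reaches the application.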

  7. Theory • When MPI_Recv() returns: • If the requested data was next in the stream (not in the pool of out-of-order messages), the data is copied directly from the kernel to the application's buffer • If the requested data was in the pool, the data was first copied from the kernel to the pool, and is then copied a second time from the pool to the application's buffer

  8. Theory • There are two inefficiencies in common MPI implementations • Each out-of-order message received is copied to a temporary buffer • A temporary buffer is dynamically allocated for each individual message

  9. Theory • A method to seek into the socket buffer within the kernel would effectively eliminate both inefficiencies • Enter Seekable Sockets

  10. Implementation • Our research implements a new pseudo-socket to allow for seeking over traditional TCP sockets • Seekable sockets are processed by the normal TCP stack with modifications • Normal TCP communication flows through the TCP stack as usual

  11. Implementation • We have prototyped seekable sockets in Linux 2.6.13 • Most of the changes to the TCP stack are in the tcp_recvmsg() function (net/ipv4/tcp.c) • Other changes mostly implement the interface or handle corner cases: • Managing wrapped sequence numbers • Coalescing skb’s when the socket buffer is full
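
The slides do not show how the prototype handles wrapped sequence numbers. As a hedged illustration of the general problem: TCP sequence numbers are 32-bit and wrap around, so a seek offset cannot be computed with ordinary comparisons. The sketch below uses the usual signed-difference trick (the same idea as the Linux kernel's `before()` helper); the function names `seq_before` and `seq_offset` are hypothetical.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers are 32-bit and wrap around


def seq_before(a, b):
    """Return True if sequence number a comes before b, allowing for wraparound."""
    diff = (a - b) % SEQ_MOD
    # Read the 32-bit difference as signed: a set top bit means "negative".
    return diff >= (1 << 31)


def seq_offset(base, seq):
    """Byte distance from base forward to seq, modulo 2**32."""
    return (seq - base) % SEQ_MOD
```

For example, `seq_offset(0xFFFFFFF0, 0x10)` is 32 even though the second value is numerically smaller, which is the case a seeking receive has to get right when the stream crosses the 4 GB boundary.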

  12. Implementation • Data received across networks are placed into data buffers (skb’s in Linux) • Each packet is normally placed into its own skb • skb’s are added into a linked list, each list containing skb’s for one socket

  13. Implementation • Receive system calls on TCP sockets all funnel down to tcp_recvmsg() • Normally, tcp_recvmsg() pulls data off of the skb list until the requested buffer is filled • Seeking sockets can pull data from the middle of an skb list to fill the requested buffer

  14. Implementation • When a seeking receive call is performed, data is copied from the desired location to the user-space buffer, and the packet is then removed from the linked list and deallocated • A second, new linked list keeps track of the “holes” in the socket buffer

  15. Implementation • Blocking calls are allowed on seekable sockets • However, if the attempt to seek is beyond the length of the socket buffer and the socket buffer is full, an error is returned • The application must be aware of this and begin pulling data off the socket in order to free up socket resources
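
The behaviour described on slides 13 to 15 can be modelled in user space. The following is a toy sketch, not the kernel code: `SeekableBuffer`, `deliver`, and `recv_at` are hypothetical names, Python lists stand in for the skb list and the hole list, offsets are absolute positions in the byte stream, and a blocking wait is represented by raising `BlockingIOError`.

```python
class SeekableBuffer:
    """Toy model of a socket buffer that supports reads from the middle."""

    def __init__(self, capacity):
        self.capacity = capacity  # max queued bytes (stands in for the buffer limit)
        self.segments = []        # (start_offset, data) pairs in stream order
        self.holes = []           # (start, end) ranges already consumed by a seek
        self.tail = 0             # stream offset one past the last byte received

    def used(self):
        return sum(len(data) for _, data in self.segments)

    def deliver(self, data):
        """Queue newly arrived bytes; refuse them if the buffer is full."""
        if self.used() + len(data) > self.capacity:
            return False
        self.segments.append((self.tail, bytes(data)))
        self.tail += len(data)
        return True

    def recv_at(self, offset, length):
        """Copy out [offset, offset + length) and free it, leaving a hole."""
        end = offset + length
        if end > self.tail:
            if self.used() >= self.capacity:
                # Target not yet received and no room for it to arrive.
                raise BufferError("seek past end of a full socket buffer")
            raise BlockingIOError("data not yet received")
        out, kept = bytearray(), []
        for start, data in self.segments:
            seg_end = start + len(data)
            lo, hi = max(start, offset), min(seg_end, end)
            if lo >= hi:  # segment does not overlap the request
                kept.append((start, data))
                continue
            out += data[lo - start:hi - start]
            if start < lo:  # keep the part before the requested range
                kept.append((start, data[:lo - start]))
            if hi < seg_end:  # keep the part after the requested range
                kept.append((hi, data[hi - start:]))
        if len(out) != length:
            raise ValueError("range overlaps data that was already consumed")
        self.segments = kept
        self.holes.append((offset, end))
        return bytes(out)
```

In this model a caller that gets `BufferError` has to read some queued data (for example with `recv_at` on the oldest segment) before the sender's later data can be delivered, which mirrors the requirement that the application drain the socket to free resources.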

  16. Implementation • Socket buffer size limits are controlled through a sysctl and can be increased to any reasonable size • Sizes of individual buffers can be modified through setsockopt() • Setting these appropriately can increase the amount of seeking possible
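
As a hedged example of the per-socket setting, the snippet below requests a larger receive buffer with the standard `SO_RCVBUF` option and reads back what the kernel granted. On stock Linux the request is capped by the `net.core.rmem_max` sysctl and the reported value is typically double the request to account for bookkeeping overhead; whether the prototype relies on exactly these knobs is an assumption, since the slides do not name them. The helper name `request_rcvbuf` is hypothetical.

```python
import socket


def request_rcvbuf(sock, nbytes):
    """Ask for a receive buffer of nbytes and report what the kernel granted."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, nbytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        wanted = 256 * 1024
        granted = request_rcvbuf(s, wanted)
        print(f"requested {wanted} bytes, kernel reports {granted}")
```

If the reported size is smaller than expected, the system-wide limit is the likely cause and would need to be raised by an administrator.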

  17. Benchmarks • Our research goals were to create an interface for seekable sockets and to see if any performance gains are possible • As this was an exploratory approach, we evaluated the method under a single benchmark to obtain an idea of where seekable sockets may be most beneficial

  18. Benchmarks • Our benchmark sent messages of a fixed size between two computers • The messages were transmitted out of order by a certain number of messages • An out-of-orderness of N means N messages were sent before the desired message was sent • The exchange is repeated until performance stabilizes

  19. Benchmarks • In the normal, non-seeking benchmark, the N out-of-order messages are copied into a temporary buffer • Then, the “requested” message is copied directly into the final buffer • Finally, the out-of-order messages are copied from the temporary buffer into the final buffer

  20. Benchmarks • In the seeking benchmark, the receiver seeks past the out-of-order messages one by one until the requested message is found • The requested message is then copied directly into the final buffer • Lastly, all out-of-order messages are copied into the final buffer

  21. Benchmarks • The out-of-orderness and the message sizes were varied • As message size and out-of-orderness increase, the overhead of user-space to user-space copying should increase • Performance is measured by the amount of system and user processor time used for a run
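
The copy pattern of the two benchmark variants can be written down as simple arithmetic. This is a back-of-the-envelope model derived from the slide descriptions, not measured data: it counts only bytes copied per round and ignores allocation, seek, and system-call costs. The function name `bytes_copied` is hypothetical.

```python
def bytes_copied(n_out_of_order, msg_size, seeking):
    """Bytes copied to deliver N out-of-order messages plus the requested one."""
    # Every message crosses from kernel to user space exactly once.
    kernel_to_user = (n_out_of_order + 1) * msg_size
    # Without seeking, each out-of-order message is copied again,
    # from the temporary buffer to the final buffer.
    user_to_user = 0 if seeking else n_out_of_order * msg_size
    return kernel_to_user + user_to_user
```

Under this model the non-seeking variant copies (2N + 1) messages' worth of data and the seeking variant copies (N + 1), so the extra cost grows with both N and the message size, in line with the expectation stated on the slide.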

  22. Benchmarks

  23. Implications for MPI • Seekable sockets may facilitate greater efficiency in TCP-based MPI • However, implementation may require substantial library code changes: • Proper handling of socket-buffer-full situations • Managing message headers and seek offsets

  24. Conclusions • Seekable sockets are a novel means of increasing efficiency when: • Out-of-order communication is expected • Message sizes are large • The MPI library could use seekable sockets efficiently, but doing so may require substantial code changes

  25. Polymorphic Message Management • Use of a polymorphic message structure within the MPI library could easily manage out of order messages • Helps manage known sequence numbers of out of order messages • Easily deal with socket buffer full conditions
