
SPUD: A Distributed High-Performance Publish-Subscribe Cluster


Presentation Transcript


  1. SPUD: A Distributed High-Performance Publish-Subscribe Cluster • Uriel Peled and Tal Kol • Guided by Edward Bortnikov • Software Systems Laboratory, Faculty of Electrical Engineering, Technion

  2. Project Goal • Design and implement a general-purpose Publish-Subscribe server • Push traditional implementations toward global-scale performance demands: • 1 million concurrent clients • Millions of concurrent topics • A high transaction rate • Demonstrate the server's abilities with a fun client application

  3. What is Pub/Sub? • (Diagram) A client publishes “accident in Hashalom” to topic://traffic-jams/ayalon; every client subscribed to that topic receives the “accident in Hashalom” notification

  4. What Can We Do With It? Collaborative Web Browsing • (Diagram) Two browser windows, each with an “others:” pane showing what the other users are browsing

  5. What Can We Do With It? Instant Messaging • (Diagram) One client sends “Hi buddy!” and the other client receives “Hi buddy!”

  6. Seems Easy To Implement, But… • “I’m behind a NAT, I can’t connect!” • Not all client setups are server friendly • “Server is too busy, try again later?!” • 1 million concurrent clients is simply too much • “The server is so slow!!!” • Service time grows exponentially with load • “A server crashed, everything is lost!” • Single points of failure will eventually fail

  7. Naïve Implementation (example 1) • Simple UDP for client-server communication • No need for sessions, since we just send messages • Very low cost per client • Sounds perfect? (Diagram: the client sits behind a NAT)

  8. NAT Traversal • UDP hole punching • The NAT will accept a UDP reply only for a short window • Our measurements: 15–30 seconds • Keep pinging over UDP from each client every 15 s • Days-long TCP sessions • The NAT remembers current sessions for replies • If the WWW works, we should work • Dramatically increases the cost per client • Our research: all IMs do exactly this
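
A minimal sketch of the client-side keep-alive described above, shown with plain BSD sockets (the Winsock version is nearly identical): a small datagram goes out every 15 seconds so the NAT keeps the UDP mapping alive for server replies. The server address, port and “PING” payload are placeholders, not SPUD's actual protocol.

    // Client-side UDP keep-alive: refresh the NAT mapping every 15 s
    // (the measured mapping lifetime is 15-30 s). Address/port are placeholders.
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <cstdio>

    int main() {
        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        if (sock < 0) { perror("socket"); return 1; }

        sockaddr_in server{};
        server.sin_family = AF_INET;
        server.sin_port   = htons(5000);                    // hypothetical SPUD port
        inet_pton(AF_INET, "192.0.2.1", &server.sin_addr);  // placeholder server address

        const char ping[] = "PING";
        for (;;) {
            sendto(sock, ping, sizeof(ping), 0,
                   reinterpret_cast<sockaddr*>(&server), sizeof(server));
            sleep(15);  // ping again before the NAT forgets the mapping
        }
    }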

  9. Naïve Implementation (example 2) • Blocking I/O with one thread per client • The basic model for most servers (the Java default) • Traditional UNIX: fork for every client • Sounds perfect? (Diagram: three groups of 500 clients)

  10. Network I/O Internals • Blocking I/O – one thread per client • With a 2 MB stack each, 1 GB of virtual address space holds only 512 threads (!) • Non-blocking I/O – select • Linear fd searches are very slow • Asynchronous I/O – completion ports • A thread pool handles request completions • Our measurements: 30,000 concurrent clients! • What is the bottleneck? • The number of locked pages (hence zero-byte receives) • TCP/IP kernel driver non-paged pool allocations
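
The completion-port and zero-byte-receive combination above can be sketched roughly as follows. This is a reduced illustration, not SPUD's code: accept logic, error handling and shutdown are omitted, and names such as PerClient and on_client_readable are invented for the example. The point is that an idle connection has only a zero-length WSARecv outstanding, so none of its buffer pages are locked in non-paged pool until data actually arrives.

    // Asynchronous I/O with an I/O completion port and zero-byte receives (sketch).
    #include <winsock2.h>
    #include <windows.h>
    #pragma comment(lib, "ws2_32.lib")

    struct PerClient {
        OVERLAPPED ov;    // kept first: we recover the object from the OVERLAPPED*
        SOCKET     sock;
    };

    static HANDLE g_iocp;

    // Post a 0-byte receive: the completion fires when data becomes readable,
    // but no user buffer is locked while the connection sits idle.
    void post_zero_byte_recv(PerClient* c) {
        ZeroMemory(&c->ov, sizeof(c->ov));
        WSABUF buf = { 0, nullptr };
        DWORD flags = 0;
        WSARecv(c->sock, &buf, 1, nullptr, &flags, &c->ov, nullptr);
    }

    void on_client_readable(PerClient* c) {
        char data[4096];
        int n = recv(c->sock, data, sizeof(data), 0);  // data is available now
        if (n > 0) {
            // ... hand the bytes to the protocol layer ...
            post_zero_byte_recv(c);                    // re-arm for the next notification
        } else {
            closesocket(c->sock);                      // connection closed or failed
            delete c;
        }
    }

    // One worker thread per processor drains the completion port.
    DWORD WINAPI worker(LPVOID) {
        for (;;) {
            DWORD bytes = 0; ULONG_PTR key = 0; OVERLAPPED* ov = nullptr;
            GetQueuedCompletionStatus(g_iocp, &bytes, &key, &ov, INFINITE);
            if (ov)  // on a failed I/O the recv() inside will report the error
                on_client_readable(reinterpret_cast<PerClient*>(ov));
        }
    }

    void register_client(SOCKET s) {
        PerClient* c = new PerClient();
        c->sock = s;
        CreateIoCompletionPort(reinterpret_cast<HANDLE>(s), g_iocp, 0, 0);
        post_zero_byte_recv(c);
    }

    int main() {
        WSADATA wsa;
        WSAStartup(MAKEWORD(2, 2), &wsa);
        g_iocp = CreateIoCompletionPort(INVALID_HANDLE_VALUE, nullptr, 0, 0);
        SYSTEM_INFO si;
        GetSystemInfo(&si);
        for (DWORD i = 0; i < si.dwNumberOfProcessors; ++i)
            CreateThread(nullptr, 0, worker, nullptr, 0, nullptr);
        // ... the accept loop would call register_client() for each new socket ...
        Sleep(INFINITE);
    }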

  11. Scalability • Scale up • Buy a bigger box • Scale out • Buy more boxes • Which one? • Both! • Push each box to its hardware maximum • Thousands of servers are impractical • Add relevant boxes as load increases • The Google way (cheap PC server farms)

  12. Identify Our Load Factors • Concurrent TCP clients • Scale up: async I/O, zero-byte receives, a larger non-paged pool (NPP) • Scale out: dedicate boxes to handling clients => Connection Server (CS) • High transaction throughput (topic load) • Scale up: software optimizations • Scale out: dedicate boxes to handling topics => Topic Server (TS) • Design the cluster accordingly

  13. Network Architecture

  14. Client Load Balancing • (Diagram) The client requests a CS from the CLB; the CLB load-balances on user location and per-CS client load and hands back CS2; the client then logs in to CS2 and publishes/subscribes through it (CS1–CS3 front the topic servers TS1–TS2)
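
A rough sketch of how the CLB might rank Connection Servers by the two factors on the slide, current client load and the user's location. The CsInfo fields, the scoring rule and the region penalty are assumptions made for illustration; the project's actual policy is not shown in the slides.

    // Pick a Connection Server for a new client (illustrative policy only).
    #include <limits>
    #include <string>
    #include <vector>

    struct CsInfo {
        std::string address;       // where the client should connect
        std::string region;        // rough location of the CS
        int         client_count;  // currently connected clients
    };

    // Lower score wins: current load, plus a hypothetical penalty when the
    // CS region does not match the user's location.
    const CsInfo* pick_connection_server(const std::vector<CsInfo>& servers,
                                         const std::string& user_region) {
        const CsInfo* best = nullptr;
        double best_score = std::numeric_limits<double>::max();
        for (const CsInfo& cs : servers) {
            double score = cs.client_count + (cs.region == user_region ? 0 : 100000);
            if (score < best_score) { best_score = score; best = &cs; }
        }
        return best;  // nullptr only if the server list is empty
    }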

  15. Topic Load Balancing – Static • (Diagram) Room 0 holds topic servers TS0–TS3; on “subscribe: traffic” the CS hashes the topic and takes it modulo the number of servers (923481 % 4 = 1), so the subscription is routed to TS1
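
The static rule on the slide amounts to a hash of the topic taken modulo the number of Topic Servers, so that 923481 % 4 = 1 sends the subscription to TS1. A one-function sketch, with std::hash standing in for whatever hash the project actually used:

    // Static topic partitioning: the same topic always lands on the same TS.
    #include <cstddef>
    #include <functional>
    #include <string>

    std::size_t topic_server_index(const std::string& topic, std::size_t num_topic_servers) {
        return std::hash<std::string>{}(topic) % num_topic_servers;
    }

    // e.g. topic_server_index("traffic", 4) needs no lookup table and no
    // coordination between servers, but it cannot react to uneven topic load.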

  16. Topic Load Balancing – Dynamic • (Diagram) The subscribe carries the per-room subscription counts known so far (e.g. R0: 345K, R1: 278K, R2: ?); unknown counts (“?”) are filled in as rooms report their load, and the least loaded room (here Room 1 at 278K) ends up handling the subscribe
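
Under the reading above, the room choice can be sketched as picking the room with the lowest known subscription count, probing rooms whose count is still unknown (“?”) first. The RoomLoad layout and the probe-unknown-first rule are assumptions for illustration only.

    // Dynamic room selection (sketch): the lowest reported load wins.
    #include <cstddef>
    #include <optional>
    #include <vector>

    struct RoomLoad {
        std::size_t                room_id;
        std::optional<std::size_t> subscriptions;  // empty == "?" (not yet reported)
    };

    // Assumes 'rooms' is non-empty.
    std::size_t pick_room(const std::vector<RoomLoad>& rooms) {
        for (const RoomLoad& r : rooms)            // probe unknown rooms first
            if (!r.subscriptions) return r.room_id;

        std::size_t best      = rooms.front().room_id;          // otherwise the lightest room,
        std::size_t best_load = *rooms.front().subscriptions;   // e.g. R1 (278K) over R0 (345K)
        for (const RoomLoad& r : rooms)
            if (*r.subscriptions < best_load) { best_load = *r.subscriptions; best = r.room_id; }
        return best;
    }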

  17. Performance Pitfalls • Data copies • Single instance with reference counting (REF_BLOCK) • Multi-buffer messages (MESSAGE: header, body, tail) • Context switches • Flexible module-execution foundation (MODULE) • Thread pools sized to the number of processors • Memory allocation • MM: custom memory pools (POOL, POOL_BLOCK) • Fine-grained locking, pre-allocation, batching, single-size blocks • Lock contention • EVENT, MUTEX, RW_MUTEX, interlocked API
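
Two of the items above lend themselves to short sketches, under the assumption that REF_BLOCK and POOL behave roughly like this: a reference-counted buffer lets one published payload be shared by every queued notification instead of being copied, and a pre-allocated single-size pool turns hot-path allocation into a free-list pop. The class names echo the slide, but the code is illustrative, not SPUD's.

    #include <atomic>
    #include <cstddef>
    #include <mutex>
    #include <vector>

    // One shared copy of a message payload; each notification add_ref()s it
    // instead of copying the bytes.
    class RefBlock {
    public:
        explicit RefBlock(std::size_t size) : data_(size) {}
        void  add_ref() { refs_.fetch_add(1, std::memory_order_relaxed); }
        void  release() { if (refs_.fetch_sub(1, std::memory_order_acq_rel) == 1) delete this; }
        char* data()    { return data_.data(); }
    private:
        ~RefBlock() = default;                 // destroyed only via release()
        std::atomic<int>  refs_{1};
        std::vector<char> data_;
    };

    // Single-size, pre-allocated block pool: the steady state never touches
    // the general-purpose heap. (The real pools add finer-grained locking,
    // batching, and so on.)
    class Pool {
    public:
        Pool(std::size_t block_size, std::size_t count) : storage_(block_size * count) {
            for (std::size_t i = 0; i < count; ++i)
                free_.push_back(storage_.data() + i * block_size);
        }
        void* alloc() {
            std::lock_guard<std::mutex> g(lock_);
            if (free_.empty()) return nullptr;  // caller applies backpressure
            void* p = free_.back(); free_.pop_back(); return p;
        }
        void free_block(void* p) {
            std::lock_guard<std::mutex> g(lock_);
            free_.push_back(static_cast<char*>(p));
        }
    private:
        std::vector<char>  storage_;
        std::vector<char*> free_;
        std::mutex         lock_;
    };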

  18. Class Diagram (Application)

  19. Class Diagram (TS, CS)

  20. Stress Testing • Measure the publish-to-notify turnaround time • 1 ms resolution using the multimedia (MM) timer, averaged over 30 samples • Increasing client and/or topic load • Several room topologies examined • Results: • An exponential-like climb under load • Adding TS boxes: better turnaround times • Adding CS boxes: higher maximum client count, but turnaround not improved
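
The measurement loop can be sketched as below, with std::chrono standing in for the Windows multimedia timer the project used at 1 ms resolution. publish() and wait_for_notify() are empty placeholders for the real SPUD client calls, which are not shown in the slides.

    // Average publish-to-notify turnaround over 30 samples (sketch).
    #include <chrono>
    #include <cstdio>

    // Placeholder stand-ins for the real client API.
    void publish(const char* /*topic*/, const char* /*msg*/) { /* send via the CS connection */ }
    void wait_for_notify(const char* /*topic*/)              { /* block until our notification */ }

    double average_turnaround_ms(const char* topic, int samples = 30) {
        using clock = std::chrono::steady_clock;
        double total_ms = 0;
        for (int i = 0; i < samples; ++i) {
            auto start = clock::now();
            publish(topic, "ping");
            wait_for_notify(topic);
            total_ms += std::chrono::duration<double, std::milli>(clock::now() - start).count();
        }
        return total_ms / samples;
    }

    int main() {
        std::printf("turnaround: %.3f ms\n", average_turnaround_ms("topic://traffic-jams/ayalon"));
    }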
