
Congestion Control in a Reliable Scalable Message-Oriented Middleware

Congestion Control in a Reliable Scalable Message-Oriented Middleware. Middleware’03, Rio de Janeiro, Brazil, June 2003.

rosannej

Presentation Transcript


  1. Congestion Control in a Reliable Scalable Message-Oriented Middleware Middleware’03, Rio de Janeiro, Brazil, June 2003

  2. Message-Oriented Middleware
     • Scalability
       • Asynchronous communication and loose synchronisation
       • Publish/Subscribe communication with filtering
       • Overlay network of message brokers
     • Reliability
       • Guaranteed delivery semantics for messages
       • Resend messages lost due to failure
     • Congestion
       • Publication rate may be too high → not enough capacity
       • Must guarantee stable behaviour of the system
       • Usually done with over-provisioning of the system
     → Congestion Control for Overlay Networks

  3. The Congestion Control Problem
     • Characteristics of a MOM
       • Large message buffers at brokers
       • Burstiness due to application-level routing
       • TCP CC only deals with inter-broker connections
     [Diagram: message brokers (B) with application-level queues layered over the network]
     • Causes of Congestion
       • Under-provisioned system
         • Network bandwidth (congestion at output queues)
         • Broker processing capacity (congestion at input queues)
       • Additional resource requirement due to recovery

  4. Outline
     • Message-Oriented Middleware
       • The Congestion Control Problem
     • Gryphon
       • Congestion in Gryphon
     • Congestion Control Protocols
       • Publisher-Driven Congestion Control
       • Subscriber-Driven Congestion Control
     • Evaluation
       • Experimental Results
     • Conclusion

  5. The Gryphon MOM
     [Diagram: broker overlay of PHBs, IBs and SHBs with attached publishers (P) and subscribers (S)]
     • IBM’s MOM with publish/subscribe
     • Supports guaranteed in-order, exactly-once delivery
     • Brokers can be
       • Publisher-Hosting (PHB)
       • Subscriber-Hosting (SHB)
       • Intermediate (IB)
     • Clients connect to brokers
     • Publishers are aggregated to publishing endpoints (pubends)
       • Ordered stream of messages; maintained in persistent storage
       • NACKs for lost messages
       • IBs cache stream data and satisfy NACKs

  6. Congestion in Gryphon
     • Congestion due to recovery after link failure
     • System never recovers from unstable state
     [Figure: message rate (kb/s) over time for PHB, SHB1 and SHB2, with the link failure marked]
     • Requirements of CC in MOM
       • Independent from particular MOM implementation
       • No/little involvement of intermediate brokers
       • Detect congestion before queue overflow occurs
       • Ensure that recovering SHBs will eventually catch up

  7. Congestion Control Protocols
     • Detect congestion in the system
       • Change in throughput used as a congestion metric
       • Reduction in throughput → queue build-up
     • Limit message rates to obtain stable behaviour
     • PHB-Driven CC Protocol (PDCC)
       • Feedback loop between pubends and downstream SHBs to monitor congestion
       • Limit publication rate of new messages to prevent congestion
     • SHB-Driven CC Protocol (SDCC)
       • Monitor rate of progress at a recovering SHB
       • Limit rate of NACKs during recovery
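The throughput-based congestion metric on this slide can be sketched as a ratio of observed to expected message counts. This is only an illustration: the class name `ThroughputMonitor`, its methods and the 0.9 threshold are assumptions, not taken from the slides or from Gryphon.

```python
from dataclasses import dataclass


@dataclass
class ThroughputMonitor:
    """Flags congestion when observed throughput drops below the expected rate."""

    # Assumed cut-off; the slides do not give a concrete value.
    threshold: float = 0.9

    def ratio(self, received_msgs: int, expected_msgs: int) -> float:
        """Return received/expected, treating an idle interval as uncongested."""
        if expected_msgs <= 0:
            return 1.0
        return received_msgs / expected_msgs

    def is_congested(self, received_msgs: int, expected_msgs: int) -> bool:
        """A ratio below the threshold is read as queue build-up upstream."""
        return self.ratio(received_msgs, expected_msgs) < self.threshold
```

A broker would call `is_congested` once per measurement interval; a falling ratio signals that messages are accumulating in queues before any queue actually overflows.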

  8. PHB-Driven Congestion Control
     • Downstream Congestion Query Msgs (DCQ)
       • Trigger the congestion control mechanism
       • Periodically sent down the dissemination tree by pubend
     • Upstream Congestion Alert Msgs (UCA)
       • Indicate congestion in the system
       • SHBs observe their message throughput and respond with a UCA msg when congested
       • Cause pubend to reduce its publication rate
     • Properties
       • DCQ/UCA msgs treated as high-priority by brokers
       • Frequency of DCQ msg controls responsiveness of PDCC
       • No UCA msgs flow in an uncongested system
       • Similar to ATM ABR flow control
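The DCQ/UCA exchange at an SHB might look roughly like the sketch below. The message fields, the function `handle_dcq` and the 0.9 threshold are hypothetical; the slides only state that an SHB answers a DCQ with a UCA when it is congested and stays silent otherwise.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DCQ:
    """Downstream Congestion Query, sent periodically by a pubend (fields assumed)."""
    pubend_id: str
    seq: int


@dataclass(frozen=True)
class UCA:
    """Upstream Congestion Alert, sent by a congested SHB (fields assumed)."""
    pubend_id: str
    shb_id: str
    seq: int
    throughput_ratio: float


def handle_dcq(shb_id: str, dcq: DCQ, throughput_ratio: float,
               threshold: float = 0.9) -> Optional[UCA]:
    """Reply with a UCA only when the SHB sees reduced throughput.

    Returning None for an uncongested SHB reflects the property that no
    UCA msgs flow in an uncongested system.
    """
    if throughput_ratio >= threshold:
        return None
    return UCA(dcq.pubend_id, shb_id, dcq.seq, throughput_ratio)
```

Because the pubend chooses how often to send DCQ msgs, that frequency bounds how quickly a UCA can reach it, which is the responsiveness property listed on the slide.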

  9. Processing of DCQ/UCA Msgs
     • Publisher-Hosting Brokers
       • Hybrid additive/multiplicative increase/decrease scheme to change publication rate
       • Attempt to find optimal operating point
     • Intermediate Brokers
       • Aggregate UCA msgs to prevent feedback explosion
       • Pass up UCA msg from worst-congested SHB
       • Short-circuit first UCA msg for fast congestion notification
     • Subscriber-Hosting Brokers
       • Non-recovering brokers should receive msgs at the publication rate
       • Recovering brokers should receive msgs at a higher rate
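The broker roles above can be sketched as follows. The slides do not spell out the hybrid rate scheme, so `adjust_rate` uses plain additive-increase/multiplicative-decrease as a stand-in, and `UcaAggregator` is one possible reading of the aggregation rules. All names, step sizes and the round-based bookkeeping are assumptions.

```python
from typing import Optional


def adjust_rate(rate: float, congested: bool, increase_step: float = 10.0,
                decrease_factor: float = 0.5, min_rate: float = 1.0) -> float:
    """PHB side: AIMD stand-in for the hybrid scheme (parameters assumed)."""
    if congested:
        return max(min_rate, rate * decrease_factor)
    return rate + increase_step


class UcaAggregator:
    """IB side: forward the first UCA at once, then only the worst one per round."""

    def __init__(self) -> None:
        self._forwarded: Optional[float] = None  # ratio already sent upstream
        self._worst: Optional[float] = None      # lowest ratio seen this round

    def on_uca(self, ratio: float) -> Optional[float]:
        """Return a ratio to forward upstream now, or None to hold it back."""
        if self._worst is None or ratio < self._worst:
            self._worst = ratio
        if self._forwarded is None:
            # Short-circuit the first alert for fast congestion notification.
            self._forwarded = ratio
            return ratio
        return None

    def end_round(self) -> Optional[float]:
        """Close the round; forward the worst ratio if it was not already sent."""
        worst, forwarded = self._worst, self._forwarded
        self._worst = self._forwarded = None
        if worst is not None and forwarded is not None and worst < forwarded:
            return worst
        return None
```

In this sketch an IB sends at most two UCA msgs upstream per round regardless of how many SHBs sit below it, which is the point of aggregating feedback.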

  10. SHB-Driven Congestion Control
      • Important to restrict NACK rate
        • Small NACK msg can trigger many large data msgs
        • Mechanism to control degree of resources spent on resent messages during recovery (recovery time)
        • No support from other brokers necessary
      • SHBs maintain NACK window
        • Decide which parts of the message stream to NACK
        • Observe recovery rate
        • Open/close NACK window additively depending on rate change
        • Similar to CC in TCP Vegas
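A NACK window with additive adjustment could be sketched as below. The direction of adjustment (open when the recovery rate rises, close when it falls, hold otherwise), the class name `NackWindow` and all default sizes are assumptions; the slides only say the window opens and closes additively depending on the rate change.

```python
from typing import Optional


class NackWindow:
    """Limits how many ticks past the doubt horizon an SHB may NACK at once."""

    def __init__(self, size: int = 8, step: int = 1,
                 min_size: int = 1, max_size: int = 64) -> None:
        self.size = size
        self.step = step
        self.min_size = min_size
        self.max_size = max_size
        self._last_rate: Optional[float] = None

    def update(self, recovery_rate: float) -> int:
        """Adjust the window by one step based on the change in recovery rate."""
        if self._last_rate is not None:
            if recovery_rate > self._last_rate:
                self.size = min(self.max_size, self.size + self.step)
            elif recovery_rate < self._last_rate:
                self.size = max(self.min_size, self.size - self.step)
        self._last_rate = recovery_rate
        return self.size

    def ticks_to_nack(self, doubt_horizon: int) -> range:
        """Stream positions that may currently be NACKed."""
        return range(doubt_horizon, doubt_horizon + self.size)
```

Since the window is driven purely by the recovery rate the SHB observes locally, this needs no support from other brokers.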

  11. Implementation in Gryphon
      • Gryphon’s message stream is subdivided into ticks
        • Discrete time interval that can hold a single message
        • 4 states:
      • Doubt Horizon: position in stream of first Q tick
        • Rate of progress of the DH as a congestion metric
        • Independent from filtering and actual publication rate
      [Diagram: message stream along a time axis with the doubt horizon marked]
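The doubt horizon and its rate of progress can be sketched over a list of tick states. Only the "Q" state is named on the slide; the placeholder "X" below stands for any non-Q tick and is purely illustrative, as are the function names.

```python
from typing import Sequence


def doubt_horizon(ticks: Sequence[str]) -> int:
    """Return the position of the first 'Q' tick, or len(ticks) if there is none."""
    for position, state in enumerate(ticks):
        if state == "Q":
            return position
    return len(ticks)


def progress_rate(dh_before: int, dh_after: int, elapsed_seconds: float) -> float:
    """Ticks per second by which the doubt horizon advanced over an interval."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (dh_after - dh_before) / elapsed_seconds
```

Because the metric counts ticks rather than delivered messages, it advances at the same pace whether a tick carried a message that matched a subscription or not, which is why it is independent of filtering and of the actual publication rate.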

  12. Experimental Evaluation
      [Diagram: test topology with PHB, IB, SHB1 and SHB2]
      • Network of dedicated broker machines
        • Simple topology (4 brokers)
        • Complex topology (9 brokers; asymmetric paths)
        • Hundreds of publishing and subscribing clients
        • Large queue sizes to maximize throughput (5-25 Mb)
      • Congestion was created by
        • restricting bandwidth on inter-broker links
        • failing inter-broker links

  13. Experiments I
      • Congestion due to recovery after link failure
      • PDCC reduces publication rate
      • SDCC keeps recovery rate steady
      [Figure: message rate (kb/s) over time for PHB, SHB1 and SHB2, with the link failure and recovery phases marked]

  14. Experiments II
      • Congestion due to dynamic b/w limits of IB-SHB1 link
      • Publication rate follows link bottleneck
      • UCA msgs are received at pubend
      [Figure: message rate (kb/s) for PHB, SHB1 and SHB2, with UCA msgs and throughput ratio, across medium and low b/w phases]

  15. Conclusions
      • Reliable, content-based pub/sub needs congestion control
        • Characteristics different from traditional network cc
      • Publisher-driven and subscriber-driven congestion control
        • Distinguish between recovering and non-recovering brokers
        • Hybrid additive and multiplicative adjustment
        • Normalised rate regardless of publication rate
        • NACK window for controlled recovery
      • Future work
        • Fairness between many pubends in the same system
        • Dynamic adjustment of the DCQ rate

  16. Thank you. Any questions?

  17. Related Work
      • TCP Congestion Control
        • Point-to-point congestion control only
        • Throughput-based congestion metric
      • Reliable Multicast
        • Scalable feedback processing
        • Sender-based and receiver-based schemes
        • Feedback loops
      • Multicast ABR ATM
        • Forward and Backward Resource Management Cells
        • BRM cell consolidation at ATM switches
      • Overlay Networks
        • Little work done so far
