
PhD Candidate: Alex K. Y. Cheung Supervisor: Hans-Arno Jacobsen PhD Thesis Presentation

Resource Allocation Algorithms for Event-Based Enterprise Systems. PhD Candidate: Alex K. Y. Cheung. Supervisor: Hans-Arno Jacobsen. PhD Thesis Presentation, University of Toronto, March 28, 2011. Middleware Systems Research Group.



  1. Resource Allocation Algorithms for Event-Based Enterprise Systems PhD Candidate: Alex K. Y. Cheung Supervisor: Hans-Arno Jacobsen PhD Thesis Presentation University of Toronto March 28, 2011 MIDDLEWARE SYSTEMS RESEARCH GROUP

  2. Introduction to Distributed Content-based Publish/Subscribe [Figure: a broker overlay connecting one publisher and two subscribers. Message labels: brand = ‘Honda’, cashback = $6000; brand = ‘Honda’, cashback > $4000; brand = ‘Honda’, cashback > $2000; brand = ‘Honda’, cashback >= $0; multicast. Legend: advertisement path, subscription path, publication path.]
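
The matching step behind content-based routing can be illustrated with a minimal Python sketch. It assumes that the slide's label cashback = $6000 is a publication and that the cashback > $4000 and cashback > $2000 labels are subscriptions; the slide text does not state these roles explicitly. The `matches` function, the `OPS` table, and subscription `s3` are hypothetical names introduced for illustration and are not part of PADRES.

```python
import operator

# Comparison operators a predicate may use (an assumed set, for illustration).
OPS = {
    "=": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}

def matches(subscription, publication):
    """Return True if every (attribute, op, value) predicate holds for the publication."""
    for attr, op, value in subscription:
        if attr not in publication:
            return False
        if not OPS[op](publication[attr], value):
            return False
    return True

# Values taken from the slide; their roles are an assumption of this sketch.
publication = {"brand": "Honda", "cashback": 6000}
subscriptions = {
    "s1": [("brand", "=", "Honda"), ("cashback", ">", 4000)],
    "s2": [("brand", "=", "Honda"), ("cashback", ">", 2000)],
    "s3": [("brand", "=", "Honda"), ("cashback", ">", 8000)],  # hypothetical, not on the slide
}

matched = sorted(name for name, sub in subscriptions.items() if matches(sub, publication))
print(matched)
```

A broker would forward the publication only toward the subscriptions that match, which is why routing depends on message content rather than on a topic name.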

  3. Desirable Properties of Distributed Content-based Publish/Subscribe • Decoupling of data sources and sinks • Ease of component addition and removal • Flexible routing based on message content • Efficient use of network resources • Distributed broker overlay network • Scalable • Fault tolerant

  4. Applications of Publish/Subscribe • Network and systems monitoring [Mukherjee 1994] • Business activity monitoring [Fawcett et al. 1999] • Business process execution [Schuler et al. 2001] • Workflow management [Cugola et al. 2001] • Multiplayer online games [Bharambe et al. 2002] • RSS filtering [Petrovic et al. 2005; Rose et al. 2007] • Automated service composition [Hu et al. 2008] • Resource discovery [Yan et al. 2009]

  5. Real Deployments of Distributed Publish/Subscribe • GooPS • Google’s pub/sub messaging middleware to integrate web applications (such as Gmail, Google Docs, Google Calendar) on a world-wide scale supporting millions of users • Hundreds of brokers with tens of thousands of pub/sub clients • Yahoo Message Broker • Yahoo’s pub/sub middleware to integrate applications with their database system, PNUTS • SuperMontage • Tibco’s pub/sub distribution network for Nasdaq’s quote and order-processing system • GDSN (Global Data Synchronization Network) • A global pub/sub network that allows retailers and suppliers (e.g., Walmart, Target, Metro) to exchange timely and accurate supply chain data

  6. Contributions • Load Balancing in Content-based Publish/Subscribe Systems (ACM TOCS’10) • Publisher Placement Algorithms in Content-based Publish/Subscribe (IEEE ICDCS’10) • Green Resource Allocation Algorithms in Content-based Publish/Subscribe (IEEE ICDCS’11)

  7. Problem • Brokers located in different geographical areas may suffer from uneven load distribution due to • Heterogeneous servers • Network congestion • Different densities and interests of end-users • Consequences • Overloaded brokers introduce high delivery delays and may ultimately crash from running out of memory • The system does not scale with added resources

  8. Visualizing the Problem [Figure: broker overlay with one publisher (P) and five subscribers (S)]

  9. Overview of Load Balancing Approach [Figure: broker overlay with one publisher (P) and five subscribers (S). Labels: load-accepting broker, offloading broker, Local Load Balancing, Global Load Balancing]

  10. Evaluation • Implemented on a real open source pub/sub system called PADRES • PlanetLab and a cluster testbed • Local and global load balancing • Homogeneous and heterogeneous servers • Compared against a naive approach [Figure: Global LB Setup, showing brokers B10 to B60, B11 to B61, and B12 to B62 with publishers (P) and subscribers (S)]

  11. Summary • Load balancing enables the pub/sub system to scale with the number of resources • Load balancing solutions that are unaware of subscription load and relationships are ineffective • Long response time • Unstable system

  12. Contributions • Load Balancing in Content-based Publish/Subscribe Systems (ACM TOCS’10) • Publisher Placement Algorithms in Content-based Publish/Subscribe (IEEE ICDCS’10) • Green Resource Allocation Algorithms in Content-based Publish/Subscribe (IEEE ICDCS’11)

  13. Problem • Publishers can join anywhere in the overlay or connect to the closest broker • Consequences • High delivery delay • Sluggish system • High resource usage in terms of matching, network bandwidth, and subscription storage • High IT costs [Figure: one publisher (P) and two subscribers (S)]

  14. Approach • Adaptively move the publisher to the area of matching subscribers • Two unique solutions • POP (Publisher Optimistic Placement) • Decision is based on the average number of downstream publication deliveries • GRAPE (Greedy Relocation Algorithm for Publishers of Events) • Decision is based on the end-to-end delivery delay, total broker message rate, and user-specified inputs including the minimization metric (load/delivery delay) and weight [Figure: one publisher (P) and two subscribers (S)]
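
To make the role of the GRAPE inputs concrete, here is a minimal Python sketch of a weighted choice among candidate brokers. The slide names the inputs (delivery delay, broker message rate, a minimization metric, and a weight) but does not say how they are combined, so the normalization and the linear score below are assumptions of this sketch and not the thesis's formula. `pick_broker` and the candidate figures are hypothetical.

```python
def pick_broker(candidates, minimize="load", weight=1.0):
    """Pick the candidate broker with the lowest weighted score.

    candidates: dict mapping broker name -> (total_message_rate, delivery_delay)
    minimize:   "load" or "delay", the metric the user prioritizes
    weight:     in [0, 1]; share of the score given to the prioritized metric
    """
    if minimize not in ("load", "delay"):
        raise ValueError("minimize must be 'load' or 'delay'")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")

    # Normalize each metric by its maximum so the two are comparable (assumption).
    max_rate = max(rate for rate, _ in candidates.values()) or 1.0
    max_delay = max(delay for _, delay in candidates.values()) or 1.0

    def score(broker):
        rate, delay = candidates[broker]
        rate_n, delay_n = rate / max_rate, delay / max_delay
        if minimize == "load":
            primary, secondary = rate_n, delay_n
        else:
            primary, secondary = delay_n, rate_n
        return weight * primary + (1.0 - weight) * secondary

    # Sorting first makes tie-breaking deterministic (by broker name).
    return min(sorted(candidates), key=score)

# Hypothetical measurements: (total broker message rate, end-to-end delay in ms).
candidates = {"b1": (100.0, 20.0), "b2": (60.0, 50.0), "b3": (80.0, 30.0)}
best_for_load = pick_broker(candidates, minimize="load", weight=1.0)
best_for_delay = pick_broker(candidates, minimize="delay", weight=1.0)
```

With a weight of 1.0 the choice depends only on the prioritized metric; lower weights trade it off against the other metric.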

  15. Evaluation • Implemented on the open source pub/sub system called PADRES • PlanetLab and a cluster testbed • Enterprise and random workloads • Reduced message rate by up to 85% • Reduced delivery delay by up to 68%

  16. Summary • POP is suitable for pub/sub systems that strive for simplicity, such as GooPS • GRAPE is suitable for systems that strive to minimize a single metric as far as possible, such as system load in sensor networks or delivery delay in SuperMontage

  17. Contributions • Load Balancing in Content-based Publish/Subscribe Systems (ACM TOCS’10) • Publisher Placement Algorithms in Content-based Publish/Subscribe (IEEE ICDCS’10) • Green Resource Allocation Algorithms in Content-based Publish/Subscribe (IEEE ICDCS’11)

  18. Problem • What is the deployment strategy for the broker overlay, publisher assignment, and subscriber assignment that minimizes the broker message rate and the number of allocated brokers? • Proven to be an NP-complete problem • Benefits • Increased capacity of the system • More efficient energy usage of the allocated servers • Fewer servers mean lower investment and maintenance costs • In line with Green IT, which enterprises such as Google and Yahoo are currently pursuing

  19. Approach • 3-phase design (subscription profiling, subscription allocation, broker overlay construction) • Most compelling properties • Language independent • Works for content-based (XPath, regex, ranged, SQL, composite subscriptions, etc.) and topic-based systems, such as GooPS • Works effectively under any workload (defined or undefined)

  20. Phase 1: Subscription Profiling • Profile of each subscriber per advertisement, maintained at the subscriber’s first broker • Each profile stores the message ID of the first index (e.g., B34-M213), which marks the start of the bit vector • The bit vector records the publications delivered to the subscription (e.g., 1 0 0 1 0 0 1 0 1 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0) • Fixed size, so shift left if the next publication is out of bit vector range • Cardinality of the bit vector corresponds to the bandwidth requirement of the subscription • Used to compute the “closeness” between any two subscriptions in the clustering algorithm: closeness = |si ∩ sj| [Figure: message IDs B34-M213, B34-M215, B34-M216, B34-M217, B34-M220, B34-M222, B34-M225, B34-M226]
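
The profiling idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the PADRES implementation: message IDs are reduced to their integer sequence number (213 for B34-M213), the vector is held in a Python integer whose lowest bit is the oldest entry (so the slide's "shift left" becomes a right shift here), and publications older than the window are simply ignored. `SubscriptionProfile` and `closeness` are hypothetical names.

```python
class SubscriptionProfile:
    """Fixed-size bit vector of publications delivered to one subscription."""

    def __init__(self, first_id, size):
        self.first_id = first_id  # sequence number represented by bit 0
        self.size = size          # fixed number of bits in the window
        self.bits = 0             # bit i set => publication first_id + i was delivered

    def record(self, msg_id):
        """Mark a delivered publication, sliding the window forward if needed."""
        if msg_id < self.first_id:
            return  # older than the window; ignored in this sketch
        offset = msg_id - self.first_id
        if offset >= self.size:
            shift = offset - self.size + 1
            self.bits >>= shift       # drop the oldest entries
            self.first_id += shift
            offset = self.size - 1
        self.bits |= 1 << offset

    def delivered_ids(self):
        """Sequence numbers currently recorded in the window."""
        return {self.first_id + i for i in range(self.size) if (self.bits >> i) & 1}

    def cardinality(self):
        """Number of set bits, a proxy for the subscription's bandwidth requirement."""
        return bin(self.bits).count("1")


def closeness(a, b):
    """closeness = |si ∩ sj|: publications delivered to both subscriptions."""
    return len(a.delivered_ids() & b.delivered_ids())


s_i = SubscriptionProfile(first_id=213, size=8)
for msg in (213, 215, 216):
    s_i.record(msg)

s_j = SubscriptionProfile(first_id=213, size=8)
for msg in (215, 216, 217):
    s_j.record(msg)

shared_before = closeness(s_i, s_j)  # both received 215 and 216
s_i.record(222)                      # outside the 8-bit window, so the window slides
```

Because closeness is computed on delivered message IDs rather than on subscription syntax, the same profile works regardless of the subscription language.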

  21. Phase 2: Subscription Allocation Algorithms • MANUAL/(AUTOMATIC) • Tree with fanout of 2, manual (random) placement of clients • Fastest Broker First (FBF) • Assign subscriptions randomly to the next most powerful broker • Bin Packing • Like FBF, but assigns the next highest traffic subscription • PAIRWISE-N, PAIRWISE-K (related approaches in ICDCS’02) • Subscription clustering where the number of clusters is given • CRAM (Clustering with Resource Awareness and Minimization) • Dynamically determines the number of clusters • Utilizes a new clustering algorithm that is more effective • Evaluated with 4 different subscription closeness metrics, with one derived from Banavar et al. in ICDCS '99
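
The difference between Fastest Broker First and Bin Packing, as described on this slide, can be sketched in Python. The sketch assumes each broker has a single numeric capacity and each subscription a single numeric load, and it uses a first-fit rule over brokers ordered from most to least powerful; the slide does not specify these details, so treat them as assumptions. `allocate` and the example figures are hypothetical.

```python
import random

def allocate(subscriptions, brokers, by_traffic=True, seed=0):
    """Assign subscriptions to brokers, trying the most powerful brokers first.

    subscriptions: dict name -> load (e.g., estimated from profile cardinality)
    brokers:       dict name -> capacity, in the same unit as load
    by_traffic:    True  -> Bin Packing style (highest-traffic subscription next)
                   False -> FBF style (subscriptions taken in random order)
    Returns a dict broker -> list of subscriptions; unused brokers are omitted.
    """
    order = list(subscriptions)
    if by_traffic:
        order.sort(key=lambda s: subscriptions[s], reverse=True)
    else:
        random.Random(seed).shuffle(order)

    broker_order = sorted(brokers, key=lambda b: brokers[b], reverse=True)
    remaining = dict(brokers)
    allocation = {}
    for sub in order:
        load = subscriptions[sub]
        for broker in broker_order:
            if remaining[broker] >= load:
                remaining[broker] -= load
                allocation.setdefault(broker, []).append(sub)
                break
        else:
            raise ValueError(f"no broker has capacity left for {sub}")
    return allocation

# Hypothetical loads and capacities.
subs = {"s1": 50, "s2": 30, "s3": 20, "s4": 40}
brokers = {"b1": 100, "b2": 60, "b3": 60}

bin_packing = allocate(subs, brokers, by_traffic=True)
fbf = allocate(subs, brokers, by_traffic=False, seed=0)
```

In this example Bin Packing leaves broker b3 unallocated, which is the kind of saving in allocated brokers that the resource allocation work targets.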

  22. Bin Packing [Figure: six subscribers (S)]

  23. Bin Packing’s Allocation Result [Figure: six subscribers (S)]

  24. Phase 3: Broker Overlay Construction [Figure: nine subscribers (S)]

  25. Bin Packing’s Final Overlay [Figure: two publishers (P), each labeled GRAPE, and nine subscribers (S)]

  26. Evaluation • Implemented on the PADRES open source content-based pub/sub project • Evaluated on a cluster testbed with 80 brokers • Evaluated on SciNet, an HPC cluster, with 1000 brokers • Comparison against two related works (Riabov et al. ICDCS’02, Banavar et al. ICDCS’99) • Homogeneous and heterogeneous scenarios • Workload saturates the initial deployment (MANUAL)

  27. Evaluation Results on SciNet • Reduced message rate by up to 92% • Reduced number of allocated brokers by up to 91%

  28. Summary • CRAM combines the benefits of • Subscription clustering • Resource awareness from Bin Packing by simultaneously reducing both • Broker message rates • Number of allocated brokers • Bit vectors are powerful • Language independent (XPath, regex, topics) • Effective with any workload distribution

  29. Conclusions • Load balancing increases • Availability by circumventing overloads • Scalability of the system • Publisher placement algorithms reduce • Broker input load by up to 68% • Broker message rate by up to 85% • Delivery delay by up to 68% • Resource allocation algorithms reduce • Average broker message rate by up to 92% • Number of allocated brokers by up to 91%

  30. Future Work • Self-tuning of load balancing parameters • React dynamically by growing and shrinking the network in incremental steps • Improve runtime of the CRAM algorithm by parallelization or reducing its computational complexity • Model workload with more sophisticated methods, such as stochastic processes, to improve accuracy of load estimation • Address fault resiliency in each approach

  31. Q & A

  32. Related Works – Clustering • Riabov et al. (ICDCS’02) • The number of clusters K is pre-specified • Each cluster is a multicast address, thus there is no upper limit on its size • Event space is divided into grids • Supports only ranged subscriptions • Their pairwise clustering considers each subscription individually • Gryphon (ICDCS’99) • Supports only equal and * subscriptions • Each cluster is stored in memory, so the upper bound on its size is not a major concern • SUB-2-SUB (IPTPS’06) • Supports only ranged subscriptions • Each cluster is a p2p network, thus there is no upper limit on the cluster size

  33. Related Works – Broker Overlay Construction, Publisher and Subscriber Placement Algorithms • Baldoni et al. (The Computer Journal) • Jaeger et al. (SAC’07) • Migliavacca et al. (DEBS’07) • Reconfigure broker overlay to reduce delivery delay and broker processing load • Cheung et al. (Middleware’06, ICDCS’10) • Load balancing by relocating subscriber clients • Reduce delivery delay and broker processing load by relocating publisher clients
