Resource Allocation Algorithms for Event-Based Enterprise Systems. PhD Candidate: Alex K. Y. Cheung. Supervisor: Hans-Arno Jacobsen. PhD Thesis Presentation, University of Toronto, March 28, 2011. Middleware Systems Research Group.
Introduction to Distributed Content-based Publish/Subscribe [Figure: a publisher, a broker overlay, and two subscribers. Message labels: brand = ‘Honda’, cashback >= $0; brand = ‘Honda’, cashback = $6000; brand = ‘Honda’, cashback > $4000; brand = ‘Honda’, cashback > $2000. Legend: advertisement path, subscription path, publication path (multicast).]
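The matching idea behind the figure can be sketched in a few lines. This is a minimal illustration only, not the PADRES matching engine: the `Predicate` class, the `subscription_matches` helper, and the ‘Toyota’ subscription are hypothetical names and data introduced here, and the assignment of the figure's labels to one publication and two subscriptions is an assumption.

```python
from dataclasses import dataclass
import operator
from typing import Callable, Dict, List, Union

Value = Union[str, float]

# Comparison operators a predicate may use (illustrative subset).
OPS: Dict[str, Callable[[Value, Value], bool]] = {
    "=": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass(frozen=True)
class Predicate:
    """One attribute-operator-value constraint (hypothetical helper type)."""
    attribute: str
    op: str
    value: Value

    def holds(self, publication: Dict[str, Value]) -> bool:
        # A publication that lacks the attribute never satisfies the predicate.
        if self.attribute not in publication:
            return False
        return OPS[self.op](publication[self.attribute], self.value)


def subscription_matches(subscription: List[Predicate],
                         publication: Dict[str, Value]) -> bool:
    """A subscription matches when every one of its predicates holds."""
    return all(p.holds(publication) for p in subscription)


# Assumed reading of the figure: one publication, two matching subscriptions.
publication = {"brand": "Honda", "cashback": 6000}
subscriptions = {
    "sub_a": [Predicate("brand", "=", "Honda"), Predicate("cashback", ">", 4000)],
    "sub_b": [Predicate("brand", "=", "Honda"), Predicate("cashback", ">", 2000)],
    # Hypothetical non-matching subscription, not taken from the slides.
    "sub_c": [Predicate("brand", "=", "Toyota"), Predicate("cashback", ">", 2000)],
}
matched = [name for name, sub in subscriptions.items()
           if subscription_matches(sub, publication)]
```

In a distributed overlay, each broker would apply a test like this against the subscriptions it has stored and forward the publication only toward the neighbors with at least one match.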
Desirable Properties of Distributed Content-based Publish/Subscribe • Decoupling of data sources and sinks • Ease of component addition and removal • Flexible routing based on message content • Efficient use of network resources • Distributed broker overlay network • Scalable • Fault tolerant
Applications of Publish/Subscribe • Network and systems monitoring [Mukherjee 1994] • Business activity monitoring [Fawcett et al. 1999] • Business process execution [Schuler et al. 2001] • Workflow management [Cugola et al. 2001] • Multiplayer online games [Bharambe et al. 2002] • RSS filtering [Petrovic et al. 2005; Rose et al. 2007] • Automated service composition [Hu et al. 2008] • Resource discovery [Yan et al. 2009]
Real Deployments of Distributed Publish/Subscribe • GooPS • Google’s pub/sub messaging middleware to integrate web applications (such as Gmail, Google Docs, Google Calendar) on a world-wide scale supporting millions of users • Hundreds of brokers with tens of thousands of pub/sub clients • Yahoo Message Broker • Yahoo’s pub/sub middleware to integrate applications with their database system, PNUTS • SuperMontage • Tibco’s pub/sub distribution network for Nasdaq’s quote and order-processing system • GDSN (Global Data Synchronization Network) • A global pub/sub network that allows retailers and suppliers (e.g., Walmart, Target, Metro) to exchange timely and accurate supply chain data
Contributions • Load Balancing in Content-based Publish/Subscribe Systems (ACM TOCS’10) • Publisher Placement Algorithms in Content-based Publish/Subscribe (IEEE ICDCS’10) • Green Resource Allocation Algorithms in Content-based Publish/Subscribe (IEEE ICDCS’11)
Problem • Brokers located in different geographical areas may suffer from uneven load distribution due to • Heterogeneous servers • Network congestion • Different densities and interests of end-users • Consequences • Overloaded brokers introduce high delivery delays and may ultimately crash from running out of memory • The system does not scale with added resources
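Visualizing the Problem [Figure: broker overlay with one publisher (P) and five subscribers (S)]
Overview of Load Balancing Approach [Figure: an offloading broker and a load-accepting broker in an overlay with one publisher (P) and five subscribers (S); panels: Local Load Balancing, Global Load Balancing]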
Evaluation • Implemented on a real open source pub/sub system called PADRES • PlanetLab and a cluster testbed • Local and global load balancing • Homogeneous and heterogeneous servers • Compared against a naive approach [Figure: Global LB Setup, with brokers labeled Bx0, Bx1, and Bx2 for x = 1 to 6, three publishers (P), and three subscribers (S)]
Summary • Load balancing enables the pub/sub system to scale with the number of resources • Load balancing solutions that are unaware of subscription load and relationships are ineffective • Long response time • Unstable system
Contributions • Load Balancing in Content-based Publish/Subscribe Systems (ACM TOCS’10) • Publisher Placement Algorithms in Content-based Publish/Subscribe (IEEE ICDCS’10) • Green Resource Allocation Algorithms in Content-based Publish/Subscribe (IEEE ICDCS’11)
Problem • Publishers can join at any broker in the overlay, or at the closest one • Consequences • High delivery delay • Sluggish system • High resource usage in terms of matching, network bandwidth, and subscription storage • High IT costs [Figure: one publisher (P) and two subscribers (S) in the broker overlay]
Approach • Adaptively move the publisher to the area of matching subscribers • Two unique solutions • POP (Publisher Optimistic Placement) • Decision is based on the average number of downstream publication deliveries • GRAPE (Greedy Relocation Algorithm for Publishers of Events) • Decision is based on the end-to-end delivery delay, total broker message rate, and user-specified inputs, including the minimization metric (load or delivery delay) and weight [Figure: one publisher (P) and two subscribers (S) in the broker overlay]
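To make the role of the minimization metric and weight concrete, here is a rough sketch of a weighted relocation decision. It is not GRAPE's actual computation, which the slides do not spell out: the min-max normalization, the linear weighted sum, and the names `Candidate` and `choose_broker` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """Estimated cost of placing the publisher at one broker (hypothetical type)."""
    broker: str
    delivery_delay: float  # estimated end-to-end delivery delay
    message_rate: float    # estimated total broker message rate


def _normalize(values: List[float]) -> List[float]:
    """Min-max scale to [0, 1]; all zeros when every value is equal."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def choose_broker(candidates: List[Candidate], metric: str, weight: float) -> str:
    """Return the broker with the lowest weighted score.

    metric: "delay" or "load", the quantity the user wants to minimize.
    weight: share in [0, 1] given to that metric; the rest goes to the other.
    The weighted sum is an assumed scoring rule, not the published one.
    """
    if metric not in ("delay", "load"):
        raise ValueError("metric must be 'delay' or 'load'")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    delays = _normalize([c.delivery_delay for c in candidates])
    rates = _normalize([c.message_rate for c in candidates])
    scored = []
    for cand, delay, rate in zip(candidates, delays, rates):
        primary, secondary = (delay, rate) if metric == "delay" else (rate, delay)
        scored.append((weight * primary + (1.0 - weight) * secondary, cand.broker))
    return min(scored)[1]


# Hypothetical estimates for three candidate brokers.
candidates = [
    Candidate("B1", delivery_delay=100.0, message_rate=10.0),
    Candidate("B2", delivery_delay=40.0, message_rate=50.0),
    Candidate("B3", delivery_delay=60.0, message_rate=20.0),
]
```

With a weight of 1.0 the choice depends only on the selected metric; lowering the weight lets the other metric pull the publisher toward a compromise broker.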
Evaluation • Implemented on the open source pub/sub system called PADRES • PlanetLab and a cluster testbed • Enterprise and random workloads • Reduced message rate by up to 85% • Reduced delivery delay by up to 68%
Summary • POP is suitable for pub/sub systems that strive for simplicity, such as GooPS • GRAPE is suitable for systems that strive to minimize one metric to the extreme, such as system load in sensor networks or delivery delay in SuperMontage
Contributions • Load Balancing in Content-based Publish/Subscribe Systems (ACM TOCS’10) • Publisher Placement Algorithms in Content-based Publish/Subscribe (IEEE ICDCS’10) • Green Resource Allocation Algorithms in Content-based Publish/Subscribe (IEEE ICDCS’11)
Problem • What is the deployment strategy for the broker overlay, publisher assignment, and subscriber assignment to minimize the broker message rate and number of allocated brokers? • Proven to be an NP-complete problem • Benefits • Increase capacity of the system • More efficient energy usage of the allocated servers • Fewer servers mean lower investment and maintenance costs • In line with Green IT, which enterprises such as Google and Yahoo are currently pursuing
Approach • 3-phase design (subscription profiling, subscription allocation, broker overlay construction) • Most compelling properties • Language independent • Content-based (XPath, regex, ranged, SQL, composite subscriptions, etc.) and topic-based, such as GooPS • Works effectively under any workload (defined or undefined)
Phase 1: Subscription Profiling • Profile of each subscriber per advertisement, maintained at the subscriber’s first broker • Message ID of first index (B34-M213 in the figure) marks the start of the bit vector • Bits record the publications delivered to the subscription: 1 0 0 1 0 0 1 0 1 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 • Fixed size, so shift left if the next publication is out of bit vector range • Cardinality of the bit vector corresponds to the bandwidth requirement of the subscription • Used to compute the “closeness” between any two subscriptions in the clustering algorithm: closeness = |si ∩ sj| [Figure: message IDs B34-M213, B34-M215, B34-M216, B34-M217, B34-M220, B34-M222, B34-M225, B34-M226]
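A bit-vector profile along these lines might look like the sketch below. It assumes message IDs reduce to integer sequence numbers (for example, 213 for B34-M213) and stores the window in a Python integer whose least significant bit is position 0, so the slide's "shift left" appears as a right shift here. The class name `SubscriptionProfile` and the window-alignment step in `closeness` are illustrative assumptions.

```python
class SubscriptionProfile:
    """Fixed-size window of delivered-publication bits for one subscription."""

    def __init__(self, first_index: int, size: int = 24) -> None:
        self.size = size
        self.first_index = first_index  # sequence number held at bit position 0
        self.bits = 0                   # bit k set: publication first_index + k was delivered

    def record(self, seq: int) -> None:
        """Mark publication `seq` as delivered, sliding the window if needed."""
        if seq < self.first_index:
            return  # older than the window; ignored in this sketch
        offset = seq - self.first_index
        if offset >= self.size:
            # Out of range: drop the oldest positions so `seq` lands on the last bit.
            shift = offset - self.size + 1
            self.bits >>= shift
            self.first_index += shift
            offset = self.size - 1
        self.bits |= 1 << offset

    def cardinality(self) -> int:
        """Number of set bits, a proxy for the subscription's bandwidth need."""
        return bin(self.bits).count("1")


def closeness(a: SubscriptionProfile, b: SubscriptionProfile) -> int:
    """|a ∩ b|: publications delivered to both subscriptions."""
    # Align both windows on the later starting index before intersecting.
    start = max(a.first_index, b.first_index)
    a_bits = a.bits >> (start - a.first_index)
    b_bits = b.bits >> (start - b.first_index)
    return bin(a_bits & b_bits).count("1")


# Two hypothetical subscriptions observing the same publication stream.
s_i = SubscriptionProfile(first_index=213, size=8)
for seq in (213, 216, 219, 222):
    s_i.record(seq)

s_j = SubscriptionProfile(first_index=213, size=8)
for seq in (216, 217, 222):
    s_j.record(seq)

shared = closeness(s_i, s_j)
```

Because the profile only records which publications were delivered, it does not depend on the subscription language, which is consistent with the language independence claimed for the approach.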
Phase 2: Subscription Allocation Algorithms • MANUAL/(AUTOMATIC) • Tree with fanout of 2, manual (random) placement of clients • Fastest Broker First (FBF) • Assign subscriptions randomly to the next most powerful broker • Bin Packing • Like FBF, but assigns the next highest traffic subscription • PAIRWISE-N, PAIRWISE-K (related approaches in ICDCS’02) • Subscription clustering where the number of clusters is given • CRAM (Clustering with Resource Awareness and Minimization) • Dynamically determines the number of clusters • Utilizes a new clustering algorithm that is more effective • Evaluated with 4 different subscription closeness metrics, with one derived from Banavar et al. in ICDCS '99
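The difference between FBF and Bin Packing can be sketched as one routine with a switch for the subscription order. This is an interpretation, not the thesis implementation: the first-fit placement over brokers sorted by capacity, the single scalar capacity per broker, and the names `allocate`, `sub_load`, and `broker_capacity` are assumptions.

```python
import random
from typing import Dict, List, Optional


def allocate(sub_load: Dict[str, float],
             broker_capacity: Dict[str, float],
             by_traffic: bool,
             seed: Optional[int] = None) -> Dict[str, List[str]]:
    """Assign subscriptions to brokers, most powerful broker first.

    by_traffic=False: FBF-like, subscriptions taken in random order.
    by_traffic=True:  Bin-Packing-like, highest-traffic subscription first.
    """
    # Brokers are considered in decreasing order of capacity.
    brokers = sorted(broker_capacity, key=lambda b: broker_capacity[b], reverse=True)
    subs = list(sub_load)
    if by_traffic:
        subs.sort(key=lambda s: sub_load[s], reverse=True)
    else:
        random.Random(seed).shuffle(subs)

    remaining = dict(broker_capacity)
    allocation: Dict[str, List[str]] = {}
    for sub in subs:
        for broker in brokers:
            # First fit: place on the most powerful broker with room left.
            if sub_load[sub] <= remaining[broker]:
                allocation.setdefault(broker, []).append(sub)
                remaining[broker] -= sub_load[sub]
                break
        else:
            raise ValueError(f"no broker has capacity for {sub}")
    return allocation


# Hypothetical loads (e.g., derived from bit vector cardinality) and capacities.
sub_load = {"s1": 5.0, "s2": 3.0, "s3": 2.0, "s4": 4.0}
broker_capacity = {"b1": 10.0, "b2": 6.0, "b3": 6.0}

bin_packing = allocate(sub_load, broker_capacity, by_traffic=True)
fbf = allocate(sub_load, broker_capacity, by_traffic=False, seed=1)
```

Neither variant looks at which publications the subscriptions share, which is the gap the clustering-based approaches (PAIRWISE and CRAM) aim to close.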
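Bin Packing [Figure: six subscribers (S) to be allocated]
Bin Packing’s Allocation Result [Figure: the six subscribers (S) after allocation]
Phase 3: Broker Overlay Construction [Figure: overlay under construction with nine subscribers (S)]
Bin Packing’s Final Overlay [Figure: final overlay with nine subscribers (S) and two publishers (P), each labeled GRAPE]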
Evaluation • Implemented on the PADRES open source content-based pub/sub project • Evaluated on a cluster testbed with 80 brokers • Evaluated on SciNet, an HPC cluster, with 1000 brokers • Comparison against two related works (Riabov et al. ICDCS’02, Banavar et al. ICDCS’99) • Homogeneous and heterogeneous scenarios • Workload saturates the initial deployment (MANUAL)
Evaluation Results on SciNet • Reduced message rate by up to 92% • Reduced number of allocated brokers by up to 91%
Summary • CRAM combines the benefits of • Subscription clustering • Resource awareness from Bin Packing by simultaneously reducing both • Broker message rates • Number of allocated brokers • Bit vectors are powerful • Language independent (XPath, regex, topics) • Effective with any workload distribution
Conclusions • Load balancing increases • Availability by circumventing overloads • Scalability of the system • Publisher placement algorithms reduce • Broker input load by up to 68% • Broker message rate by up to 85% • Delivery delay by up to 68% • Resource allocation algorithms reduce • Average broker message rate by up to 92% • Number of allocated brokers by up to 91%
Future Work • Self-tuning of load balancing parameters • React dynamically by growing and shrinking the network in incremental steps • Improve runtime of the CRAM algorithm by parallelization or reducing its computational complexity • Model workload with more sophisticated methods, such as stochastic processes, to improve accuracy of load estimation • Address fault resiliency in each approach
Related Works - Clustering • Riabov et al. (ICDCS’02) • The number of clusters K is pre-specified • Each cluster is a multicast address, thus there is no upper limit on its size • Event space is divided into grids • Supports only ranged subscriptions • Their pairwise clustering considers each subscription individually • Gryphon (ICDCS'99) • Supports only equal and * subscriptions • Each cluster is stored in memory, so the upper bound on its size is not a major concern • SUB-2-SUB (IPTPS'06) • Supports only ranged subscriptions • Each cluster is a p2p network, thus there is no upper limit on the cluster size
Related Works – Broker Overlay Construction, Publisher and Subscriber Placement Algorithms • Baldoni et al. (The Computer Journal) • Jaeger et al. (SAC'07) • Migliavacca et al. (DEBS’07) • Reconfigure broker overlay to reduce delivery delay and broker processing load • Cheung et al. (Middleware’06, ICDCS’10) • Load balancing by relocating subscriber clients • Reduce delivery delay and broker processing load by relocating publisher clients