This extended abstract explores the use of event-driven business process management (BPM) with service level agreements (SLAs) in cloud architectures. It discusses the benefits of using the content-based publish/subscribe paradigm for managing events in BPM and highlights the need for decoupling, fine-grained event filtering, in-network event processing, and event correlation. The PADRES approach is introduced as a solution for event-driven BPM with SLAs.
“BPM in Cloud Architectures: Business Process Management with SLAs and Events”, joint work with Vinod Muthusamy (extended abstract)
Enabling BPM for Clouds via Events & SLAs
Summer School on Service Research, July 19th, 2010
Hans-Arno Jacobsen, Bell University Laboratory Chair, University of Toronto
http://www.padres.msrg.utoronto.ca
An eQoSystem for declarative distributed applications with SLAs: http://eqosystem.msrg.org/
Business Process Example: Loan Application Processing
• SLA annotations: service RTT < 100ms, uptime > 99.99%, service time < 2s, service cost < $0.02
[Figure: loan application process with activities Credit check, Check score, Approve, Reject, Send to officer, and Store in DB; branch conditions < 0.3, < 0.5, > 0.7, else; instances are tagged with Pid/gid values such as c001.]
Service Summer, July 19th, 2010, KIT, Germany
Large-scale Business Processes
• Case study (Chinese electronics manufacturer):
  • Department-level processes with 26 to 47 activities
  • Global processes that compose departmental ones
  • Thousands of concurrent instances
  • Hundreds of collaborating partners
  • Geographically distributed
  • Administrative boundaries
[Figure: global process composed of departmental processes, with activities grouped under labels such as Vendor, Marketing, Design, Sale, Manufacture, Warehouse, Finance, Order, Payment, Dispatch, and FedEx Delivery.]
What Support is Required?
• De-coupling and loose coupling
• Fine-grained event filtering
• In-network event processing
• Composite event detection
• Event correlation
Agenda
• Enabler
• The PADRES approach
• Event-driven BPM with PADRES
• SLA-driven BPM with PADRES
What Abstractions Enable BPM?
What Abstractions Do Not Work? (Cum grano salis)
• Databases
  • Great for managing historical data
  • But what about future data (e.g., events)?
• Data streams
  • Great for managing structured streams of tuples
  • But what about unstructured, multi-typed, sporadic, unordered events from many sources?
• Rule-based expert systems
  • Great for inference and reasoning
  • But what about managing large numbers of fine-grained filters in distributed environments?
What Abstractions Enable BPM?
• It is our opinion that the aforementioned requirements can best be addressed by
  • The content-based publish/subscribe paradigm
  • Realized by content-based message routing
• Events represent state transitions in the environment
  • Conveyed as publications to the pub/sub system
• Event filtering and correlation are based on
  • Subscriptions managed by the pub/sub system
Content-based Publish/Subscribe and Content-based Routing
• Interaction steps: 1. Advertise, 2. Subscribe, 3. Publish
• Event-based content routing is flexible, decoupled, declarative, and responsive
• Example routing table (subscription → dest): Class = Loan → d2; Service RTT < 2s → d3
• Example predicates shown: Service = credit check; class = Loan, status = approved, amount > $500K; Service RTT < 2s; Service RTT < 150ms; Uptime > 99.99%
[Figure: publishers and subscribers connected through brokers (B); each broker has an input queue, a matching engine with a routing table, and one output queue per destination (d2, d3); matching publications are delivered as notifications.]
Application Modeling
• Advertisements (schema or types)
  • [class=Loan], [action=*], [customerID=*], [amount<$100K], [region=East]
• Publications & events (data)
  • [class, Loan], [action, request], [customerID, 876594], [amount, $50K]
• Subscriptions (query)
  • [class=Loan], [amount>$500K], [region=*]
• Application semantics is expressed via advertisements (data sources), publications (data sources), and subscriptions (data sinks)
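A minimal sketch of how a publication could be matched against a subscription, written in Python for illustration. It is not the PADRES matching engine: the tuple encoding of predicates, the `matches` helper, and the rule that `*` means "attribute must be present" are all assumptions made for this example.

```python
import operator

# Hypothetical operator table; PADRES supports a richer subscription language.
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}

def matches(publication, subscription):
    """Return True if every predicate of the subscription holds for the publication."""
    for attr, op, value in subscription:
        if attr not in publication:
            return False          # assumed: a predicate on a missing attribute fails
        if value == "*":
            continue              # assumed: wildcard accepts any present value
        if not OPS[op](publication[attr], value):
            return False
    return True

# Publication from the slide, with the amount written as a plain number.
publication = {"class": "Loan", "action": "request",
               "customerID": 876594, "amount": 50_000}

# Subscription from the slide (large loans) and a looser, made-up one.
large_loans = [("class", "=", "Loan"), ("amount", ">", 500_000)]
any_request = [("class", "=", "Loan"), ("action", "=", "*")]

print(matches(publication, large_loans))   # the $50K request is below the threshold
print(matches(publication, any_request))
```

In this sketch the $50K publication does not satisfy the large-loan subscription, so a broker would not forward it to that subscriber.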
Benefits of Publish/Subscribe
• Simplifies IT development and maintenance by decoupling enterprise components
• Supports sophisticated interactions among components using expressive subscription languages
• Supports fine-grained subscriptions for event management
• Achieves scalability with in-network filtering and processing
Many Applications are Event-based & Benefit from Publish/Subscribe
• Workflows, business processes and job scheduling
• Supply chain and logistics
• Service oriented architectures
• RFID and sensor networks
[Figure: example events for each application area, e.g., Job A done, Triggered, Fault!, Ordered, In flight, Delivered, Invoke Loan, Transform, Callback, RFID: Razor SKU, Light, Temperature.]
Our PADRES ESB for Event-driven BPM
• Enterprise Services Bus (Events & Services Bus)
• First generation of students, when I looked away: Peng, Alex, David, aRno, Eli, Serge
• PADRES is Publish/subscribe Applied to Distributed Resource Scheduling
• PAdres is Distributed REsource Scheduling
• Implemented in Java; web start & open source: padres.msrg.org
• http://www.padres.msrg.utoronto.ca
• Acknowledgements (2004-present)
Our PADRES ESB Stack & Vision
PADRES ESB is an Overlay of P/S Brokers
• Each broker (B) has an input queue, a matching engine with a routing table, and one output queue per destination (dest1, dest2, dest3)
• Example routing table (subscription → dest): service time < 3s → dest2; service time < 2s → dest3
• Example publications: service time = 3s, service time = 2.5s, service time = 1s
• Legend: P = publisher, S = subscriber
• Try out and download at: http://www.padres.msrg.org
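The routing step at a single broker can be sketched as follows, using the subscriptions and publications from this slide. This is an illustrative Python sketch, not PADRES code: the `routing_table` list and the `route` function are hypothetical names, and real brokers also handle advertisements and multi-hop forwarding.

```python
# Routing table of one broker: (subscription predicate, next-hop destination).
# Entries mirror the slide: "service time < 3s -> dest2", "service time < 2s -> dest3".
routing_table = [
    (lambda pub: pub["service_time"] < 3.0, "dest2"),
    (lambda pub: pub["service_time"] < 2.0, "dest3"),
]

def route(publication):
    """Return the output queues (destinations) whose subscription matches."""
    return [dest for predicate, dest in routing_table if predicate(publication)]

# The three publications shown on the slide.
for seconds in (3.0, 2.5, 1.0):
    print(seconds, route({"service_time": seconds}))
```

Under these assumptions the 3s publication is dropped at this broker, the 2.5s publication is forwarded to dest2 only, and the 1s publication is forwarded to both dest2 and dest3.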
Innovative PADRES Features
• Features: Historic Access, Management, Composite Events, Security, Robustness, Load Balancing
• Related publications: CSRG TR 2009; ACM Middleware’2004; IEEE ICDCS’2005; ACM Middleware’2006; ACM DEBS’2007; ACM Middleware’2007; ACM Middleware’2008; IEEE ICDCS’2009; IEEE ICDCS’2010
Event-driven BPM with PADRES
Modeling Business Processes
• Dependencies in processes and more complex process patterns require event correlation
• Event correlation is enabled by the detection of composite events
• Composite events are expressed via composite subscriptions
• A composite subscription consists of atomic subscriptions
• Subscription language features for BPM modeling
  • E.g., AND, OR, and variables ($x)
• Example: D executes if B and C have completed (D depends on B and C)
[Figure: example process graphs over activities A to F, including an exception path.]
Composite Subscription Examples
• Expresses a structural property of a process
  • [class = Activity Status], [cmd. = Archived], [Process ID = $X] AND [class = Activity Status], [cmd. = Signed Off], [Process ID = $X]
• Expresses a performance property of a process
  • [cmd. = Credit check request], [Process ID = $X] AND [status = Approved], [Process ID = $X]
[Figure: Loan Process (Credit check, Check score, Approve for > $50K, Reject) and Archive Process (Signed Off).]
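The role of the variable $X in a composite subscription can be sketched as a join over atomic matches. The Python below is a simplified illustration under stated assumptions: events are plain dictionaries, `match_atomic` and `detect_and` are hypothetical helpers, and all events are assumed to be available at once, whereas a real broker detects composite events incrementally and inside the network.

```python
def match_atomic(event, pattern):
    """Match one event against an atomic pattern; return variable bindings or None."""
    bindings = {}
    for attr, expected in pattern.items():
        if attr not in event:
            return None
        if isinstance(expected, str) and expected.startswith("$"):
            bindings[expected] = event[attr]      # bind the variable, e.g. $X
        elif event[attr] != expected:
            return None
    return bindings

def detect_and(events, left, right):
    """Return bindings for which both atomic patterns matched with equal variables."""
    left_hits = [b for e in events if (b := match_atomic(e, left)) is not None]
    right_hits = [b for e in events if (b := match_atomic(e, right)) is not None]
    results = []
    for lb in left_hits:
        for rb in right_hits:
            if all(lb[v] == rb[v] for v in lb.keys() & rb.keys()):
                results.append({**lb, **rb})
    return results

# Structural property from the slide: archived AND signed off for the same process.
archived = {"class": "Activity Status", "cmd": "Archived", "process_id": "$X"}
signed_off = {"class": "Activity Status", "cmd": "Signed Off", "process_id": "$X"}

events = [
    {"class": "Activity Status", "cmd": "Archived", "process_id": "p1"},
    {"class": "Activity Status", "cmd": "Signed Off", "process_id": "p2"},
    {"class": "Activity Status", "cmd": "Signed Off", "process_id": "p1"},
]
print(detect_and(events, archived, signed_off))
```

Only process p1 produces a composite event here, because p2 was signed off but never archived.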
Business Process Management
• Transformation of process into pub/sub language
• Deployment of transformed process
• Execution of process by triggering instances (multiple instances can be triggered concurrently)
• Monitor process & instance execution
• Manage, i.e., control, version, …
[Figure: process A, B, C, D with a trigger, plus an exception & compensation path through E and F.]
Business Process Execution
[Figure: a BPEL process (Receive, Assign, Flow, Invoke, Wait, Reply, END) executed on the PADRES ESB; a WS Gateway Agent connects the WS client and the Web Services to the ESB; numbered steps 1 to 6 trace one execution; communication inside the ESB uses pub/sub, communication with web services uses http/soap.]
BPEL Transformation Example
[Figure: Process 5, consisting of activity1, flow1, and activity2 through activity8.]
• Subscriptions by agent
  • Sub1 (flow agent): [class = ACTIVITY_STATUS], [process = ’Process5’], [activityName = ’activity1’], [IID isPresent any], [status = ’SUCCESS’]
  • Sub4 (activity6 agent): ( [class = ACTIVITY_STATUS], [process = ’Process5’], [activityName = ’activity5’], [IID = $X], [status = ’SUCCESS’] ) AND ( [class = LINK_STATUS], [process = ’Process5’], [activityName = ’activity2’], [IID = $X], [status isPresent any] )
• Subscriptions in attribute, operator, value notation
  • Sub1: [class,eq,ACTIVITY_STATUS], [process,eq,’Process5’], [activityName,eq,’activity1’], [IID,isPresent,any], [status,eq,’SUCCESS’]
  • Sub2: [class,eq,ACTIVITY_STATUS], [process,eq,’Process5’], [activityName,eq,’flow1’], [IID,isPresent,any], [status,eq,’STARTED’]
  • Sub4: [class,eq,ACTIVITY_STATUS], [process,eq,’Process5’], [activityName,eq,’activity2’], [IID,eq,$X], [status,eq,’SUCCESS’] && [class,eq,LINK_STATUS], [process,eq,’Process5’], [activityName,eq,’activity2’], [IID,eq,$X], [status,isPresent,any]
  • Sub5: [class,eq,ACTIVITY_STATUS], [process,eq,’Process5’], [activityName,eq,’activity4’], [IID,isPresent,any], [status,eq,’SUCCESS’] && [class,eq,ACTIVITY_STATUS], [process,eq,’Process5’], [activityName,eq,’activity7’], [IID,isPresent,any], [status,eq,’SUCCESS’]
• Publications
  • Pub1: [class,ACTIVITY_STATUS], [process,’Process5’], [activityName,’flow1’], [IID,’g001’], [status,’STARTED’]
  • Pub2: [class,ACTIVITY_STATUS], [process,’Process5’], [activityName,’activity2’], [IID,’g001’], [status,’SUCCESS’]
  • Pub3: [class,LINK_STATUS], [process,’Process5’], [activityName,’activity2’], [IID,’g001’], [status,’POSITIVE’]
  • Pub4: [class,ACTIVITY_STATUS], [process,’Process5’], [activityName,’activity6’], [IID,’g001’], [status,’SUCCESS’]
  • Pub5: [class,ACTIVITY_STATUS], [process,’Process5’], [activityName,’activity7’], [IID,’g001’], [status,’SUCCESS’]
• Cf. our ACM Trans Web’2010 for full BPEL mapping
Evaluation: Changing Request Rate
[Figure: comparison of three deployments under a changing request rate: centralized, clustered (20 servers), and distributed PADRES ESB.]
SLA-driven BPM
An eQoSystem for declarative distributed applications with SLAs: http://eqosystem.msrg.org/
Currently, business goals must be manually considered at every stage of the business process development cycle
[Figure: travel process with activities Validate request, Get destination, a decision "Far?" (Y: Find flight, N: Find train); annotated goals: only trusted partners, service time < 3s, cost < $0.02.]
Service Level Agreements (SLAs)
• SLAs are contracts between service consumers and providers that specify the expected behavior of each party and the penalties for violating the contract.
• SLAs specify business goals declaratively.
Runtime Uses of SLAs
• Dynamic service discovery: Discover services with capabilities that satisfy goals.
• Monitoring: Only monitor the business events related to goals. Feed back measurements to support runtime adaptations.
• Distributed execution: Fine-grained allocation of process to available resources. Move portions of process to strategic locations.
• ESB adaptation: Reconfigure the ESB topology to satisfy goals.
[Figure: process A, B, C, D with branch probabilities p and q, mapped onto an ESB broker topology of ESB nodes (PADRES brokers), execution engines, web services, and a monitor (M); annotated goals: service time < 2s, service time < 1s.]
Agenda
• Distributed process execution
  • Architecture
  • Components
  • Execution & optimization algorithms
• Design issues
• Evaluation
Process Execution Architectures
• Centralized
  • One execution engine
  • May not scale
  • Central point of failure
• Clustered
  • Replicated execution engines
  • Centralized control and data
  • High bandwidth and latency
  • Still may not scale & administrative limitations
• Agent-based
  • Distributed execution engine
  • In-network processing
  • Lower bandwidth and latency
  • Fine-grained use of resources
• Problem: How to deploy activities in a distributed manner to satisfy SLAs?
Large-scale Business Processes
[Figure: the global process of the case study, composed of departmental processes with activities grouped under labels such as Vendor, Marketing, Design, Sale, Manufacture, Warehouse, Finance, Order, Payment, Dispatch, and FedEx Delivery.]
Distributed Process Execution: Architecture & Components
SLA Management Stack (Execution Engine)
• Redeployment Manager (CASCON): SLAs, cost models, estimators, ranking algorithms
• Candidate Engine Discovery (ACM DEBS’2009)
• Activity Profiler, Engine Profiler, Activity Manager
  • Labels shown: latency, bandwidth, engine resource, activities (ACM TWEB’2010), instance states
• Atomic Redeployer (IEEE ICDCS’2009): input, output queues
• PADRES messaging layer
Activity Profiler
• Profiles execution of local activities
• Maintains profiles for various metric types
  • Message hops, disk I/O, energy usage, etc.
[Figure: example process with activities P, T, S deployed on an ESB broker topology of brokers 1 to 9 with execution engines.]
Activity Profiler Summary
• APk: activity profiler for metric type Mk
• Input: activity lifecycle events of activity ai
• Output: sk(ai)
Engine Profiler (e.g., distance)
• Computes and caches information about candidate engines
  • Cf. DEBS’2009 for our resource discovery algorithms to identify candidates
• Discover paths
  • e2→e5, e5→e7
• Probe paths
  • e5→e4 (for candidate C)
• Compute paths
  • e2→e4, e4→e7
• Cases
  • e4 in e2→e5
  • e4 in e5→e7
  • otherwise
[Figure: process P, T, S on a topology of brokers 1 to 9 with execution engines; C marks the candidate engine.]
Engine Profiler Summary
• EPk: engine profiler for metric type Mk
• Input: set of candidate engines ej
• Output: pk(ej)
Redeployment Manager
• Estimator: Computes an estimate of the metric cost ck(ai,ej) of hosting an activity ai at engine ej
• Cost model: Computes an estimate of the cost c(ai,ej) of hosting activity ai on engine ej
• Check deployment: Determines what to do with an activity ai
  • Determine best engine e
  • Compute benefit: c(ai) - c(ai,e)
  • Compute resident time at current engine
  • If resident long enough
    • If benefit is large enough, move ai to e
    • Otherwise, apply pressure to other activities
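The check-deployment step can be sketched as a small decision function. This Python sketch is an interpretation, not the published algorithm: the function name, the two thresholds `min_resident_time` and `min_benefit`, and the reading that "otherwise" refers to an insufficient benefit are all assumptions.

```python
def check_deployment(current_cost, candidate_costs, resident_time,
                     min_resident_time=30.0, min_benefit=1.0):
    """Decide what to do with one activity ai.

    current_cost    -- c(ai), cost at the engine currently hosting ai
    candidate_costs -- {engine: c(ai, engine)} for the candidate engines
    resident_time   -- time ai has spent at its current engine
    Both thresholds are hypothetical tuning parameters.
    """
    # Determine the best engine e and the benefit c(ai) - c(ai, e).
    best_engine = min(candidate_costs, key=candidate_costs.get)
    benefit = current_cost - candidate_costs[best_engine]

    if resident_time < min_resident_time:
        return ("stay", None, benefit)          # not resident long enough yet
    if benefit >= min_benefit:
        return ("move", best_engine, benefit)   # redeploy ai to e
    return ("pressure", None, benefit)          # ask other activities to move

print(check_deployment(10.0, {"e4": 6.0, "e7": 8.0}, resident_time=60.0))
```

Requiring a minimum resident time before any move is one way to keep an activity from bouncing between engines on every small cost change.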
Redeployment Manager Summary
• Compute the cost of deploying local activities ai at candidate engines ej
• Measurements
  • C(ai): Cost at local engine
  • E(P(ai)): Location of predecessors
  • E(S(ai)): Location of successors
• Cost model, given
  • F(ai): Cost function
  • P(ai): Predecessors
  • S(ai): Successors
• Compute C(ai, ej) for every ai, ej: estimated cost of deploying activity ai at candidate engine ej
• Complexity
  • Memory: O( |ai| |ej| )
  • Computation: O( |ai| |ej| |Navg(ai)| )
Redeployment Manager Summary
• Ek: estimator for metric type Mk; takes ai and ej, uses APk and EPk, and produces ck(ai,ej)
• CM: cost model; takes ai, ej, the cost function f, and the estimators Ek, and produces c(ai,ej)
• RM: Redeployment Manager, Check Deployment; takes {ai} and {ej}
  • redeploy: { (ai,ej) }
  • pressure: { (ai,{ej}) }
Atomic Redeployment
• Traditional pub/sub client movement protocols are expensive and do not offer transactional properties
• Transactional movement
  • Formalized movement properties similar to ACID properties
  • Efficient and guaranteed routing reconfiguration
  • For example, guarantee that no messages are lost if an activity is redeployed
• See IEEE ICDCS’2009
SLA Management Stack Summary (Execution Engine)
• Redeployment Manager (CASCON): SLAs, cost models, estimators, ranking algorithms
• Candidate Engine Discovery (ACM DEBS’2009)
• Activity Profiler, Engine Profiler, Activity Manager
  • Labels shown: latency, bandwidth, engine resource, activities (ACM TWEB’2010), instance states
• Atomic Redeployer (IEEE ICDCS’2009): input, output queues
• PADRES messaging layer
Cost Model Components
• The cost of a process is based on metrics
  1. Distribution cost: Cdist = f(Cd3, Cd1, Cd2)
    • Cd1: Message rate
    • Cd2: Message size
    • Cd3: Latency
  2. Engine cost: Ceng = f(Ce1, Ce2, Ce3)
    • Ce1: Load (number of instances)
    • Ce2: Resources (CPU, memory, etc.)
    • Ce3: Activity complexity
  3. Service cost: Cserv = f(Cs1, Cs2, Cs3)
    • Cs1: Latency of external service
    • Cs2: Execution time of external service
    • Cs3: Marshalling/unmarshalling
• These metrics can be weighted to achieve different objectives
  • Cost(activity) = f(wiCi)
  • Cost(process) = ∑ Cost(activity)
  • Optimize time: wd1 = wd3 = we3 = wserv = 1, other wi = 0
  • Optimize network overhead: wd2 = wd3 = 1, other wi = 0
• Various optimization criteria can be specified
  • Threshold criteria: ∑wiCi > x (e.g., report SLA violations within 3 s)
  • Minimized criteria: min( ∑wiCi ) (e.g., minimize distribution overhead)
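The weighted cost model can be sketched in a few lines of Python. The slide leaves f unspecified; this sketch assumes f is a plain weighted sum, in line with the ∑wiCi criteria. The metric keys ("d1", "d2", ...) are shorthand invented here for Cd1, Cd2, and so on, and treating wserv as a weight on all three service metrics is also an assumption.

```python
def activity_cost(metrics, weights):
    """Weighted sum of one activity's metric values (assumed form of f)."""
    return sum(weights.get(name, 0.0) * value for name, value in metrics.items())

def process_cost(activities, weights):
    """Cost(process) = sum of Cost(activity) over all activities."""
    return sum(activity_cost(metrics, weights) for metrics in activities)

# Weight vectors from the slide; every weight not listed is 0.
OPTIMIZE_TIME = {"d1": 1.0, "d3": 1.0, "e3": 1.0, "s1": 1.0, "s2": 1.0, "s3": 1.0}
OPTIMIZE_NETWORK = {"d2": 1.0, "d3": 1.0}

# Made-up metric values for two activities of one process.
activities = [
    {"d1": 2.0, "d2": 3.0, "d3": 0.5},
    {"d2": 1.0, "d3": 0.25, "e1": 4.0},
]

cost = process_cost(activities, OPTIMIZE_NETWORK)
print(cost)

# Threshold criterion: flag the process when the weighted cost exceeds x.
x = 4.0
print("violation" if cost > x else "ok")
```

Switching the objective only means swapping the weight vector; the metric values collected by the profilers stay the same.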
Examples of SLAs & Cost Functions
• Minimize message hops
  • f() = msg_hops_rate = msg_rate * engine_distance
• Minimize bandwidth cost
  • f() = msg_rate * link_cost
• Limit CPU & network energy usage
  • f() = 0.3 * cpu_energy + 0.7 * link_energy < X
  • f() = 0.3 * (invocation_rate * engine_unit_energy) + 0.7 * (msg_rate * link_unit_energy) < X
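The three cost functions translate directly into code. The Python below is a sketch with made-up input values; the function names and the `within_energy_limit` helper are hypothetical, while the formulas and the 0.3/0.7 weights are taken from the slide.

```python
def msg_hops_rate(msg_rate, engine_distance):
    """Message hops per time unit between two engines."""
    return msg_rate * engine_distance

def bandwidth_cost(msg_rate, link_cost):
    """Monetary cost of the traffic on a link."""
    return msg_rate * link_cost

def energy_cost(invocation_rate, engine_unit_energy, msg_rate, link_unit_energy):
    """Weighted CPU and network energy usage (weights 0.3 and 0.7 from the slide)."""
    cpu_energy = invocation_rate * engine_unit_energy
    link_energy = msg_rate * link_unit_energy
    return 0.3 * cpu_energy + 0.7 * link_energy

def within_energy_limit(limit, **kwargs):
    """SLA check: the energy cost must stay below the limit X."""
    return energy_cost(**kwargs) < limit

# Made-up numbers for illustration.
print(msg_hops_rate(msg_rate=10, engine_distance=3))
print(within_energy_limit(25.0, invocation_rate=10, engine_unit_energy=2,
                          msg_rate=5, link_unit_energy=4))
```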
Evaluation and design issues
Process Hotspot: Illustration
• SLA: Minimize traffic
• Metric: Message hops
• Red activities are pinned to brokers
[Figure: process with activities A to I and branch probabilities p = 90%, q = 10%, deployed on an ESB broker topology of brokers 1 to 9 with execution engines.]
Process Hotspot: Results
• Traffic with redeployment is 47% of the static case
• Post-redeployment traffic is 10% of the static case
• Redeployment triggered in about 30 sec
Varying Hotspot: Illustration
• Red activities are pinned to brokers
[Figure: process with activities A to I and varying branch probabilities p and q, deployed on an ESB broker topology of brokers 1 to 9 with execution engines.]
Varying Hotspot: Results
• Dynamic redeployment suffers temporarily after the process hotspot moves
• Traffic with redeployment is 42% of the static case
Summary on SLA-driven BPM
• Distributed execution engine has qualitative and quantitative advantages
• Redeployment algorithm can optimize SLA in many cases
• Challenge 1: Local optima
  • Techniques to selectively expand knowledge can work
  • Widen candidate radius
  • Redeploy sets of activities
• Challenge 2: Coordination
  • Independent decisions can destabilize the system
  • Potential problem has not manifested in evaluations so far
• Challenge 3: SLA granularity (more an engineering issue)
  • Can’t specify SLA on portions of a process
  • Can’t specify SLA on particular instances of a process (e.g., VIP user)