Efficient monitoring of Web resources

Avigdor Gal (joint work with Haggai Roitman and Louiqa Raschid). IFIP 2.6 meeting, 24/6/2009, Nicosia, Cyprus

Profile-Based Online Data Delivery • Data delivery: the delivery of data of interest (specified in profiles) from servers (data providers) to clients (data consumers). • Push vs. Pull • Server capabilities vs. Client requirements • Profiles: specify what, when, and how data should be delivered, and its delivery value. • Online: decisions about what and when to deliver are usually made without complete knowledge of the entire “stream” of future requirements or capabilities in the system, while taking the sources’ dynamic behavior into account.

Example: Monitoring RSS Feeds [figure labels: pull, push] • Other example applications: • E-Commerce & E-Markets • Grid • Mashups & Portals • Continuous Queries (CQ) • Cache Management • …

Research Goals • Proposed a generic model for profile-based online data delivery. • Allows negotiation over the dynamic nature of resources and the use of time-based constraints. • Considered both server capabilities and user requirements. • Allows the generation of a hybrid push-pull solution. • Handle various data delivery aspects: • Dual approach for targeted data delivery. • Hybrid push-pull framework and data delivery solution. • Capturing data delivery tradeoffs. • Complex data delivery under bandwidth constraints.

Related Work • Pull • Update Models • Web Crawling/Monitoring (WIC) • Sensor-nets • Grid • Web Services • Mashups • Web Caching (LR-Profiles, Prefetching) • PDCM • Push • Systems: BlackBerry, JMS, Google Alerts • Web Caching & Synchronization • Publish/Subscribe • Stream processing/CQ & CEP • Broadcast systems • CDNs (e.g., RSS aggregation) • Hybrid: Pop-Pap, Data Gerrymandering, Ajax, RSS

Data delivery model: Data and Architecture

ProMo Proxy - Overview

Data delivery model: Profiles • We propose a novel profile model based on execution intervals. • Execution Interval: an association of a time interval with some resource. • Complex execution intervals can also be specified. • Can be specified either explicitly or implicitly (using EI-patterns that are further derived using an update model). • Have some unique properties that affect scheduling. • Profiles: a set of execution intervals, which also include: • Notification rules that associate utility values with execution intervals. • Profile owner role (either a client or a server).
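The profile model above can be pictured as a small data structure. The following is only a minimal sketch: the class names, field names, integer chronons, and the feed names are assumptions made for illustration and are not taken from the talk.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class ExecutionInterval:
    """A time interval [start, end] (in chronons) associated with one resource."""
    resource: str
    start: int
    end: int

    def covers(self, t: int) -> bool:
        # True if chronon t falls inside the interval (both ends inclusive).
        return self.start <= t <= self.end


@dataclass
class NotificationRule:
    """Associates a utility value with one or more execution intervals.

    A rule with several intervals stands in for a complex execution interval.
    """
    intervals: List[ExecutionInterval]
    utility: float


@dataclass
class Profile:
    """A profile: an owner role plus notification rules (hypothetical layout)."""
    owner_role: str  # "client" or "server"
    rules: List[NotificationRule] = field(default_factory=list)

    def resources(self) -> Set[str]:
        # All resources mentioned by any execution interval in the profile.
        return {ei.resource for rule in self.rules for ei in rule.intervals}


profile = Profile(
    owner_role="client",
    rules=[
        NotificationRule([ExecutionInterval("feed_a", 0, 3)], utility=1.0),
        NotificationRule(
            [ExecutionInterval("feed_a", 5, 8), ExecutionInterval("feed_b", 5, 8)],
            utility=2.0,
        ),
    ],
)
```

Using one structure for both owner roles mirrors the idea of a shared profile format for clients and servers; only the `owner_role` field differs.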

Execution intervals - Example

Example Client Profile [figure annotations: profile owner role, notification rules, complex-EI pattern, local and global utilities]

Schedules, Constraints, and Data Delivery Metrics • Schedule: A mapping • Constrained Schedules: limited budget for different data delivery tasks (e.g., “politeness” constraints or an upper bound on parallel monitoring/listening tasks). • Data delivery metrics: • Completeness (max) • Data latency (min) • System resource utilization (Probes) (min) • Execution time (min) • Gained Utility (Satisfiability) (strict) • Data delivery objectives and performance evaluation are based on those metrics.
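To make the metrics concrete, here is a hedged sketch that evaluates a schedule against a set of execution intervals. It assumes a schedule is a set of (resource, chronon) probes and that the budget caps the number of probes per chronon; both are illustrative assumptions, since the talk's formal definition of the mapping is not reproduced in this transcript.

```python
from collections import Counter


def is_feasible(schedule, budget):
    """True if no chronon holds more than `budget` probes (assumed constraint)."""
    per_chronon = Counter(t for _, t in schedule)
    return all(n <= budget for n in per_chronon.values())


def evaluate(schedule, intervals):
    """Compute completeness, mean delay, and probe count for a schedule.

    schedule  : set of (resource, chronon) probes
    intervals : list of (resource, start, end) execution intervals
    """
    captured, delays = 0, []
    for resource, start, end in intervals:
        hits = [t for r, t in schedule if r == resource and start <= t <= end]
        if hits:
            captured += 1
            # Delay: distance from the interval start to its first probe.
            delays.append(min(hits) - start)
    completeness = captured / len(intervals) if intervals else 1.0
    mean_delay = sum(delays) / len(delays) if delays else 0.0
    return {
        "completeness": completeness,
        "mean_delay": mean_delay,
        "probes": len(schedule),
    }


intervals = [("r1", 0, 2), ("r1", 4, 6), ("r2", 1, 3)]
schedule = {("r1", 1), ("r2", 1)}
metrics = evaluate(schedule, intervals)
```

In this toy instance two of the three intervals are captured with two probes, and the schedule is feasible only if the per-chronon budget is at least 2.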

A Dual Approach for Targeted Data Delivery • Instead of maximizing utility under a (strict) system resource constraint, minimize system resource utilization while (strictly) satisfying (all) user profiles. • Main motivation: dynamic allocation according to user profiles may benefit both objectives. • We propose an optimal static algorithm, SUP, for the dual problem. • Under some conditions, SUP is even optimal for both objectives! • We further present adaptive versions of SUP, fbSUP and fbSUP(λ), that handle non-static situations using feedback. • Overall, results show that the dual approach can dominate the traditional approach and has good utility/budget performance in the non-static case.
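The dual objective (use as few probes as possible while capturing every execution interval) can be illustrated on a single resource. The sketch below is not SUP; it is the classic greedy interval-stabbing rule, shown only under the assumptions that there is one resource, no budget limit, and intervals given as (start, end) chronon pairs.

```python
def min_probes(intervals):
    """Return a smallest list of probe times that hits every [start, end] interval.

    Greedy rule: scan intervals by increasing end time and probe at the end
    of the first interval that no earlier probe has captured.
    """
    probes = []
    last = None
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if last is None or start > last:
            last = end
            probes.append(last)
    return probes


eis = [(0, 4), (2, 6), (5, 9), (8, 10)]
probe_times = min_probes(eis)
```

Probing as late as each interval allows lets one probe capture as many overlapping intervals as possible, which is why the greedy choice yields a minimum probe count in this simplified setting.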

ProMo: Hybrid Framework for Online Data Delivery • Idea: mediate between clients and servers while considering both client requirements and server capabilities. • Solution: use the same profile structure for both servers and clients; as a result: • Matching clients and servers becomes easy. • Hybrid schedules are easy to generate. • We provide a taxonomy of server capabilities and data delivery patterns. • The algorithm supports various capability patterns (e.g., pull-only, push-only, hybrid, push-filter, and conditional-pull).

Capturing Approximate Data Delivery Tradeoffs (“The Proxy Dilemma”) • Completeness → more alternatives for drivers ☺ • Delay → more time to react ☺ • High completeness may result in delayed delivery → less time to react. ☹ • Low delay may result in missing updates → fewer alternatives to consider. ☹

Bandwidth Constrained Complex Profile Satisfaction Example 1: Arbitrage Monitoring Example 2: Mashups

Future Work • Dual approach: • Consider more constrained settings (e.g., a lower bound on gained utility). • Adaptive switch between OptMon1 and OptMon2 solutions. • More general probabilistic adaptive framework. • Non-uniform probing costs. • ProMo hybrid push-pull: • Consider a constrained setting with ProMo (e.g., minimizing push-pull costs, politeness constraints). • Develop a more refined server commitment model and server selection and ranking techniques.

Future Work (cont.) • Tradeoffs: • Find an offline approximation for the general case. • Find online policies with competitive guarantees for the general case. • Use Pareto sets as a design tool for online policies. • Tradeoffs based on private profiles. • Consider complex profiles. • Complex: • General cost-benefit model (e.g., consider utility gain vs. monitoring costs). • Use other complex profile semantics (e.g., OR, SUBSET). • Develop update models for complex monitoring.

Backup Slides

Schedules (cont.) [figure labels: delay, ri, Tj]

Model: Feasible Schedules

Execution Intervals – Properties and Effects on Scheduling • Inter-resource overlap • Directly affects the probing congestion. • Intra-resource overlap • Allows more than a single EI to be captured by a single probe. • Rank • Affects the difficulty of satisfying a single client requirement or finding a suitable server capability (thus, some pull will be required). • Can cause a skew in resource access patterns. • Explicit vs. Implicit • Implicit EIs may require update models to derive explicit EIs and can therefore introduce noise into the model. • Utility • Affects the relative importance of capturing an EI.
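The two overlap properties listed above can be measured directly on a set of execution intervals. The following sketch assumes intervals are (resource, start, end) triples over integer chronons; the function names are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict


def congestion(intervals):
    """Map each chronon to the number of distinct resources with an active EI.

    High values indicate inter-resource overlap, i.e. many resources
    competing for probes at the same time.
    """
    active = defaultdict(set)
    for resource, start, end in intervals:
        for t in range(start, end + 1):
            active[t].add(resource)
    return {t: len(rs) for t, rs in sorted(active.items())}


def intra_overlaps(intervals):
    """Count pairs of EIs on the same resource that share at least one chronon.

    Each such pair can be captured by a single well-placed probe.
    """
    by_resource = defaultdict(list)
    for resource, start, end in intervals:
        by_resource[resource].append((start, end))
    pairs = 0
    for ivs in by_resource.values():
        for i in range(len(ivs)):
            for j in range(i + 1, len(ivs)):
                if ivs[i][0] <= ivs[j][1] and ivs[j][0] <= ivs[i][1]:
                    pairs += 1
    return pairs


eis = [("r1", 0, 2), ("r1", 2, 4), ("r2", 1, 3)]
load = congestion(eis)
shared = intra_overlaps(eis)
```

Here chronons 1 to 3 each see two competing resources, while the two intervals on `r1` meet at chronon 2 and could share one probe there.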

SUP – Dual Optimality [figure labels: max clique]

fbSUP (SUP with feedback)

fbSUP(λ)

SUP vs. TTL & WIC [panels: Static case - FPN(1.0); Dynamic case - Poisson] • #probes(SUP) = 2,462 • max #probes(WIC) > 65,000 • max #probes(TTL) > 65,000 • #probes(SUP) = 3,904 • max #probes(WIC) > 20,000 • max #probes(TTL) > 7,000

fbSUP vs. fbSUP(λ) • Both adaptive versions improve on SUP (with a moderate probe budget increase). • fbSUP(λ) improves even for X=1. • fbSUP(λ) is dominant: • For X<4, fbSUP(λ) requires slightly more budget than fbSUP. • For X≥4, fbSUP(λ) completely dominates fbSUP.

ProMo – Server Capabilities vs. Data Delivery Patterns

ProMo Middleware - Example

Pareto Sets and Approximation
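As a rough illustration of what a Pareto set means for the completeness/delay tradeoff, the sketch below filters candidate schedules down to the non-dominated ones. Each candidate is assumed to be summarized as a (completeness, delay) pair, and the numbers are made up for the example; this is a brute-force filter, not the efficient algorithm discussed in the talk.

```python
def pareto_set(points):
    """Keep (completeness, delay) points not dominated by any other point.

    Higher completeness is better; lower delay is better. A point is
    dominated if another point is at least as good on both criteria and
    strictly better on at least one.
    """
    front = []
    for c, d in points:
        dominated = any(
            (c2 >= c and d2 <= d) and (c2 > c or d2 < d)
            for c2, d2 in points
        )
        if not dominated:
            front.append((c, d))
    return sorted(set(front))


candidates = [(0.9, 5.0), (0.7, 2.0), (0.6, 3.0), (0.9, 6.0), (0.5, 1.0)]
front = pareto_set(candidates)
```

Every point left in `front` represents a different compromise: moving along it trades completeness for lower delay, and no remaining point is better than another on both criteria.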

Efficient Offline Optimal Solution (case with no intra-resource overlaps)

Offline Optimal Algorithm Correctness • From the algorithm construction: • Pareto optimality: By induction, let Sj be the jth schedule that is added to S: [figure labels: S, SP, Sj, S’]

Efficient Online Policies (case with no intra-resource overlaps) • Look Ahead: • Look Back:

LA (EDF) Optimal completeness (no intra-resource overlap) [figure labels: case 1, case 2, ri, ri’, Tj, (preserve), (gain)] • → preserve. • → other resource ri’ was selected by LA (but not by S^) → apply Lemma 42 (preserve or gain). • → gain. • → other resource ri’ was selected (but not by LA) → apply Lemma 42 (preserve or gain).
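An earliest-deadline-first (EDF) probing policy of the kind named on this slide can be sketched as follows. The sketch assumes integer chronons, a per-chronon probe budget, and intervals given as (resource, start, end) triples; it illustrates the EDF rule only and is not the authors' LA implementation.

```python
def edf_schedule(intervals, budget, horizon):
    """Online EDF: at each chronon, probe up to `budget` resources, choosing
    those whose active, uncaptured interval ends soonest.

    Returns the list of (resource, chronon) probes and the number of
    captured intervals.
    """
    captured = set()
    schedule = []
    for t in range(horizon):
        # Active, not-yet-captured intervals, ordered by deadline (end time).
        active = sorted(
            (end, resource, idx)
            for idx, (resource, start, end) in enumerate(intervals)
            if idx not in captured and start <= t <= end
        )
        probed = set()
        for end, resource, idx in active:
            if resource in probed:
                # A probe already issued at t also captures this interval.
                captured.add(idx)
                continue
            if len(probed) == budget:
                continue  # budget exhausted for this chronon
            probed.add(resource)
            captured.add(idx)
            schedule.append((resource, t))
    return schedule, len(captured)


intervals = [("r1", 0, 1), ("r2", 0, 0), ("r3", 0, 2)]
schedule, n_captured = edf_schedule(intervals, budget=1, horizon=3)
```

With a budget of one probe per chronon, the policy serves `r2` first because its interval expires immediately, then `r1`, then `r3`, capturing all three intervals.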

LAB Dominates LA • The proof follows from the definition of LAB potential: and ordering operator: . → The proof follows from Lemma 42 (completeness preservation) and Lemma 47 → each local change from LA to LAB would result in less delay.

LB Tradeoff 4-Approximation (no intra-resource overlap) • Completeness 2-approximation: Basic idea: given any schedule S and LA, if we change S into LA, each change might improve the performance by at most 1 → the completeness of S is at most a factor of 2 lower than that of LA. • Latency 4-approximation: • Tks: the k-th first time that OPT didn’t probe but LB did (and Tkf the last such time), and Tks’, Tkf’ when both did. • Best case: LB and OPT act the same → black circles. • Worst case (triangles): OPT has: while LB has at most: • Thus we get approximation ratio = 4. Whenever the EIs have uniform width W → LB is a 2-approximation.

LBW Tradeoff 2-Approximation (no intra-resource overlap) • Completeness 2-approximation: Same as in LB. • Optimal Latency: LB’s greedy “delay traps”: [figure labels: Delay trap; Case 1: S, S’, j; Case 2: S, S’, j]

Online policies vs. Optimal Pareto Set

Online policies vs. Optimal Pareto Set: Runtime Scalability Analysis

Online policies: Budget Impact

Online policies: Workload Impact (no intra-resource overlap)

Online policies: Workload Impact (with intra-resource overlap)

Proposed Offline Approximation • As A we use the algorithm of Bar-Yehuda et al. for scheduling split intervals. • C=1 → A provides a 2k-approximation → we get a (2k+2)-approximation. • C>1 → A provides a (2k+1)-approximation → we get a (2k+3)-approximation. • Drawbacks: the transformation may be quite expensive, and A doesn’t scale (it requires an LP solution for the fractional version of the problem).

Greedy Online Policies

MRSF: l-Competitive (case with no intra-resource overlap) [figure labels: “Bad guys”, “Good guys”] • Pick good guys → gain 3 + 2 (length(I)) • Pick bad guys → gain 1 • Ratio = length(I) → in the worst case every CEI has equal length → comp. ratio:

Online Policies vs. Offline approx. • For rank(P)=1 both WIC and EDF are optimal. • For any rank(P) the worst-case optimal upper bound is OPTrank(1)/rank(k). • Simple policies (i.e., WIC, EDF) are not suited to problems with complex profiles. • Here COMPMRSF ≥ COMPoff ≥ OPT/2k. • The offline policy doesn’t scale. • Online policies scale quite well.
