Efficient monitoring of Web resources. Avigdor Gal (joint work with Haggai Roitman and Louiqa Raschid), IFIP 2.6 meeting, 24/6/2009, Nicosia, Cyprus.
Profile-Based Online Data Delivery • Data delivery: the delivery of data of interest (specified in profiles) from servers (data providers) to clients (data consumers). • Push vs. Pull • Server capabilities vs. Client requirements • Profiles: specify what data should be delivered, when and how, and its delivery value. • Online: the decision of what to deliver and when is usually made without complete knowledge of the entire "stream" of future requirements or capabilities in the system, while taking the sources' dynamic behavior into account.
Example: Monitoring RSS Feeds (figure: pull vs. push) • Other example applications: • E-Commerce & E-Markets • Grid • Mashups & Portals • Continuous Queries (CQ) • Cache Management • …
Research Goals • Propose a generic model for profile-based online data delivery that: • copes with the dynamic nature of resources and supports time-based constraints; • considers both server capabilities and user requirements; • allows the generation of hybrid push-pull solutions. • Handle various data delivery aspects: • Dual approach for targeted data delivery. • Hybrid push-pull framework and data delivery solution. • Capturing data delivery tradeoffs. • Complex data delivery under bandwidth constraints.
Related Work • Pull • Update Models • Web Crawling/Monitoring (WIC) • Sensor-nets • Grid • Web Services • Mashups • Web Caching (LR-Profiles, Prefetching) • PDCM • Push • Systems: BlackBerry, JMS, Google Alerts • Web Caching & Synchronization • Publish/Subscribe • Stream processing/CQ & CEP • Broadcast systems • CDNs (e.g., RSS aggregation) • Hybrid: PoP/PaP, Data Gerrymandering, Ajax, RSS
Data delivery model: Profiles • We propose a novel profile model based on execution intervals (EIs). • Execution Interval: an association of a time interval with some resource. • Complex execution intervals can also be specified. • EIs can be specified either explicitly or implicitly (using EI-patterns from which explicit EIs are derived using an update model). • EIs have some unique properties that affect scheduling. • Profiles: a set of execution intervals, including: • Notification rules that associate utility values with execution intervals. • Profile owner role (either a client or a server).
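The profile model above can be sketched as plain data types. This is a minimal illustration, not the authors' implementation: the class names, the integer chronon encoding of time, the AND semantics of a complex EI, and defining its rank as the number of simple EIs it contains are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class ExecutionInterval:
    """A time interval [start, end] (inclusive chronons) associated with one resource."""
    resource: str
    start: int
    end: int

    def __post_init__(self) -> None:
        if self.start > self.end:
            raise ValueError("start must not exceed end")


@dataclass(frozen=True)
class ComplexEI:
    """Assumed AND semantics: every part must be captured for the rule to be satisfied."""
    parts: Tuple[ExecutionInterval, ...]

    @property
    def rank(self) -> int:
        # Assumption: rank = number of simple EIs in the complex EI.
        return len(self.parts)


@dataclass
class NotificationRule:
    """Associates a utility value with a (complex) execution interval."""
    cei: ComplexEI
    utility: float = 1.0


@dataclass
class Profile:
    owner_role: str  # "client" or "server"
    rules: List[NotificationRule] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.owner_role not in ("client", "server"):
            raise ValueError("owner_role must be 'client' or 'server'")


# Usage: a client that wants two feeds captured within overlapping windows.
a = ExecutionInterval("feedA", 0, 5)
b = ExecutionInterval("feedB", 3, 8)
profile = Profile("client", [NotificationRule(ComplexEI((a, b)), utility=2.0)])
print(profile.rules[0].cei.rank)  # 2
```

Because the same `Profile` type carries an `owner_role`, a server's capabilities could be described with the same structure, which is the property the hybrid framework relies on.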
Example Client Profile (figure; annotated parts: profile owner role, notification rules, complex-EI pattern, local and global utilities)
Schedules, Constraints, and Data Delivery Metrics • Schedule: A mapping • Constrained Schedules: a limited budget for different data delivery tasks (e.g., "politeness" constraints or an upper bound on parallel monitoring/listening tasks). • Data delivery metrics: • Completeness (max) • Data latency (min) • System resource utilization (Probes) (min) • Execution time (min) • Gained Utility (Satisfiability) (strict) • Data delivery objectives and performance evaluation are based on these metrics.
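To make the metrics concrete, here is a small sketch that scores a schedule against a list of EIs. The encoding (an EI as a `(resource, start, end)` tuple, a schedule as a dict from chronon to the set of probed resources) and the latency definition (chronons from the EI's start to its first capturing probe) are assumptions for illustration; the source's exact definitions may differ.

```python
from typing import Dict, List, Optional, Set, Tuple

EI = Tuple[str, int, int]          # (resource, start, end), inclusive chronons
Schedule = Dict[int, Set[str]]     # chronon -> resources probed at that chronon


def first_capture(ei: EI, schedule: Schedule) -> Optional[int]:
    """Earliest chronon in [start, end] at which the EI's resource is probed, else None."""
    resource, start, end = ei
    for t in range(start, end + 1):
        if resource in schedule.get(t, set()):
            return t
    return None


def delivery_metrics(eis: List[EI], schedule: Schedule) -> Dict[str, float]:
    captured = []
    for ei in eis:
        t = first_capture(ei, schedule)
        if t is not None:
            captured.append((ei, t))
    completeness = len(captured) / len(eis) if eis else 1.0
    # Assumed latency: chronons between the EI's start and its first capturing probe.
    delays = [t - ei[1] for ei, t in captured]
    avg_delay = sum(delays) / len(delays) if delays else 0.0
    probes = sum(len(rs) for rs in schedule.values())
    return {"completeness": completeness, "avg_delay": avg_delay, "probes": probes}


eis = [("a", 0, 3), ("b", 2, 4), ("a", 6, 7)]
print(delivery_metrics(eis, {2: {"a", "b"}}))
```

Here a single chronon with two probes captures the first two EIs (delays 2 and 0) and misses the third, so completeness is 2/3 with an average delay of 1 chronon.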
A Dual Approach for Targeted Data Delivery • Instead of maximizing utility under a (strict) system resource constraint, minimize system resource utilization while (strictly) satisfying (all) user profiles. • Main motivation: dynamic allocation according to user profiles may benefit both objectives. • We propose an optimal static algorithm, SUP, for the dual problem. • Under some conditions, SUP is optimal for both objectives! • We further present adaptive versions of SUP, fbSUP and fbSUP(λ), that handle non-static situations using feedback. • Overall, the results show that the dual approach can dominate the traditional approach and has good utility/budget performance in the non-static case.
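The source does not spell out SUP's steps here. As a sketch of the dual objective only (capture every EI while using as few probes as possible), the textbook interval-stabbing greedy can be applied to each resource separately; it uses the minimum number of probes per resource when there is no per-chronon probe limit. This is a swapped-in standard technique, not necessarily SUP itself, and the `(resource, start, end)` encoding is an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

EI = Tuple[str, int, int]  # (resource, start, end), inclusive chronons


def min_probe_schedule(eis: List[EI]) -> Dict[int, Set[str]]:
    """Capture every EI with as few probes as possible (no per-chronon budget).

    Per resource: sort EIs by end time and probe at the end of the first EI that
    the previous probe did not capture. That probe also captures every later EI
    of the same resource that is already open at that chronon.
    """
    by_resource: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for r, s, e in eis:
        by_resource[r].append((s, e))

    schedule: Dict[int, Set[str]] = defaultdict(set)
    for r, intervals in by_resource.items():
        last_probe = None
        for s, e in sorted(intervals, key=lambda iv: iv[1]):
            if last_probe is None or last_probe < s:
                last_probe = e  # latest chronon that still captures this EI
                schedule[e].add(r)
    return dict(schedule)


eis = [("a", 0, 4), ("a", 2, 6), ("a", 5, 9), ("b", 1, 3)]
plan = min_probe_schedule(eis)
print(sum(len(rs) for rs in plan.values()))  # 3
```

Probing as late as possible is what lets one probe capture a whole group of mutually overlapping EIs of the same resource, which is why the four EIs above need only three probes.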
ProMo: Hybrid Framework for Online Data Delivery • Idea: mediate between clients and servers while considering both client requirements and server capabilities. • Solution: use the same profile structure for both servers and clients; as a result: • Matching clients and servers becomes easy. • Generating hybrid schedules is easy. • We provide a taxonomy of server capabilities and data delivery patterns. • The algorithm supports various capability patterns (e.g., pull-only, push-only, hybrid, push-filter, and conditional-pull).
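One way to picture why a shared profile structure makes matching easy: if client needs and server push capabilities are both EIs, matching reduces to an interval-overlap test, and whatever push cannot cover becomes a pull. The sketch below assumes a hypothetical rule (a server push EI serves a client EI when both concern the same resource and overlap in time; uncovered EIs are pulled at their last chronon) and does not model ProMo's capability taxonomy.

```python
from typing import List, Tuple

EI = Tuple[str, int, int]  # (resource, start, end), inclusive chronons


def hybrid_plan(client_eis: List[EI], server_push: List[EI]):
    """Split client EIs into push-served ones and ones that need a pull.

    Hypothetical matching rule: a server push EI serves a client EI when both
    refer to the same resource and their time intervals overlap.
    """
    push_served: List[Tuple[EI, int]] = []
    pulls: List[Tuple[EI, int]] = []
    for ei in client_eis:
        r, s, e = ei
        overlaps = [(ps, pe) for pr, ps, pe in server_push
                    if pr == r and ps <= e and s <= pe]
        if overlaps:
            # Earliest chronon at which some matching push window is open.
            t = min(max(s, ps) for ps, _ in overlaps)
            push_served.append((ei, t))
        else:
            # No matching server capability: pull once, at the EI's last chronon.
            pulls.append((ei, e))
    return push_served, pulls


pushed, pulled = hybrid_plan([("rss1", 0, 5), ("rss2", 2, 4)], [("rss1", 3, 10)])
print(pushed)  # [(('rss1', 0, 5), 3)]
print(pulled)  # [(('rss2', 2, 4), 4)]
```

The result is a hybrid schedule in miniature: one client need is met by the server's push commitment, and the other falls back to a single pull.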
Capturing Approximate Data Delivery Tradeoffs ("The Proxy Dilemma") • High completeness → more alternatives for drivers • Low delay → more time to react • High completeness may result in delayed delivery → less time to react. • Low delay may result in missing updates → fewer alternatives to consider.
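Because completeness and delay pull in opposite directions, candidate schedules are naturally compared by Pareto dominance. A minimal sketch, assuming each candidate schedule has already been scored as a hypothetical `(label, completeness, delay)` triple:

```python
from typing import List, Tuple

Candidate = Tuple[str, float, float]  # (label, completeness, delay)


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both metrics and strictly better on one."""
    no_worse = a[1] >= b[1] and a[2] <= b[2]
    strictly_better = a[1] > b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates (O(n^2), fine for small sets)."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates)]


schedules = [("S1", 0.9, 5.0), ("S2", 0.7, 2.0), ("S3", 0.6, 3.0), ("S4", 0.9, 6.0)]
print([label for label, _, _ in pareto_front(schedules)])  # ['S1', 'S2']
```

S1 and S2 both survive because neither is better on both metrics: S1 is more complete, S2 delivers sooner. Choosing between them is exactly the proxy's dilemma.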
Bandwidth-Constrained Complex Profile Satisfaction • Example 1: Arbitrage Monitoring • Example 2: Mashups
Future Work • Dual approach: • Consider more constrained settings (e.g., a lower bound on gained utility). • Adaptive switching between OptMon1 and OptMon2 solutions. • A more general probabilistic adaptive framework. • Non-uniform probing costs. • ProMo hybrid push-pull: • Consider a constrained setting with ProMo (e.g., minimizing push-pull costs, politeness constraints). • Develop a more refined server commitment model and server selection and ranking techniques.
Future Work (cont.) • Tradeoffs: • Find an offline approximation for the general case. • Find online policies with competitive guarantees for the general case. • Use Pareto sets as a design tool for online policies. • Tradeoffs based on private profiles. • Consider complex profiles. • Complex profiles: • A general cost-benefit model (e.g., utility gain vs. monitoring costs). • Other complex profile semantics (e.g., OR, SUBSET). • Develop update models for complex monitoring.
Schedules (cont.) (figure; labels: delay, $r_i$, $T_j$)
Execution Intervals – Properties and Effects on Scheduling • Inter-resource overlap • Directly affects probing congestion. • Intra-resource overlap • Allows more than a single EI to be captured by a single probe. • Rank • Affects the difficulty of satisfying a single client requirement or of finding a suitable server capability (thus, some pull will be required). • Can cause a skew in resource access patterns. • Explicit vs. Implicit • Implicit EIs may require update models to derive explicit EIs and, therefore, introduce noise into the model. • Utility • Affects the relative importance of capturing an EI.
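Two of the properties above are easy to measure on a concrete workload. The sketch below, assuming the same hypothetical `(resource, start, end)` encoding with inclusive chronons, computes the peak number of simultaneously open EIs (a rough proxy for probing congestion) and checks whether any resource has overlapping EIs (intra-resource overlap).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

EI = Tuple[str, int, int]  # (resource, start, end), inclusive chronons


def peak_open_eis(eis: List[EI]) -> int:
    """Largest number of EIs open at the same chronon (sweep over start/end events)."""
    events = []
    for _, s, e in eis:
        events.append((s, +1))      # EI opens at chronon s
        events.append((e + 1, -1))  # and is closed from chronon e + 1 on
    open_now = peak = 0
    # At equal times, -1 sorts before +1, so an EI ending at t-1 is closed
    # before one starting at t is opened.
    for _, delta in sorted(events):
        open_now += delta
        peak = max(peak, open_now)
    return peak


def has_intra_resource_overlap(eis: List[EI]) -> bool:
    """True if some resource has two EIs that share at least one chronon."""
    by_resource: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for r, s, e in eis:
        by_resource[r].append((s, e))
    for intervals in by_resource.values():
        intervals.sort()
        # After sorting by start, checking neighbours is enough.
        for (_, e1), (s2, _) in zip(intervals, intervals[1:]):
            if s2 <= e1:
                return True
    return False


workload = [("a", 0, 4), ("b", 2, 6), ("c", 5, 7), ("a", 3, 5)]
print(peak_open_eis(workload), has_intra_resource_overlap(workload))  # 3 True
```

The two `a` EIs overlap on chronons 3 to 4, so a single probe of `a` there captures both; this is the effect that intra-resource overlap has on scheduling.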
SUP – Dual Optimality (figure: max clique)
SUP vs. TTL & WIC • Static case (FPN(1.0)): #probes(SUP) = 2,462; max #probes(WIC) > 65,000; max #probes(TTL) > 65,000 • Dynamic case (Poisson): #probes(SUP) = 3,904; max #probes(WIC) > 20,000; max #probes(TTL) > 7,000
fbSUP vs. fbSUP(λ) • Both adaptive versions improve on SUP (with a moderate increase in probe budget). • fbSUP(λ) improves even for X = 1. • fbSUP(λ) is dominant: • For X < 4, fbSUP(λ) requires slightly more budget than fbSUP. • For X ≥ 4, fbSUP(λ) completely dominates fbSUP.
Efficient Offline Optimal Solution (case with no intra-resource overlaps)
Offline Optimal Algorithm Correctness • From the algorithm construction: • Pareto optimality: by induction, let $S_j$ be the $j$-th schedule added to $S$ (figure; labels: $S$, $S_P$, $S_j$, $S'$)
Efficient Online Policies (case with no intra-resource overlaps) • Look Ahead (LA) • Look Back (LB)
LA (EDF): Optimal Completeness (no intra-resource overlap) (figure: two cases at chronon $T_j$ involving resources $r_i$ and $r_{i'}$, labeled "preserve" and "gain") • Preserve: another resource $r_{i'}$ was selected by LA (but not by $\hat{S}$); apply Lemma 42 (preserve or gain). • Gain: another resource $r_{i'}$ was selected (but not by LA); apply Lemma 42 (preserve or gain).
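A Look Ahead policy that behaves like EDF can be sketched as follows. Assumptions: a hypothetical budget of `budget` probes per chronon, a finite `horizon`, no intra-resource overlap (so each open EI belongs to a distinct resource), and ties broken by input order. This illustrates EDF only; it is not a transcription of the authors' LA algorithm.

```python
import heapq
from typing import Dict, List, Set, Tuple

EI = Tuple[str, int, int]  # (resource, start, end), inclusive chronons


def edf_schedule(eis: List[EI], budget: int, horizon: int) -> Dict[int, Set[str]]:
    """At each chronon, probe up to `budget` resources whose open, uncaptured
    EIs have the earliest deadlines (assumes no intra-resource overlap)."""
    order = sorted(range(len(eis)), key=lambda k: eis[k][1])  # by start chronon
    heap: List[Tuple[int, int, str]] = []  # (deadline, input index, resource)
    schedule: Dict[int, Set[str]] = {}
    nxt = 0
    for t in range(horizon + 1):
        # Release EIs that open at or before t.
        while nxt < len(order) and eis[order[nxt]][1] <= t:
            k = order[nxt]
            r, _, e = eis[k]
            heapq.heappush(heap, (e, k, r))
            nxt += 1
        # Discard EIs whose deadline has passed without being captured.
        while heap and heap[0][0] < t:
            heapq.heappop(heap)
        probed: Set[str] = set()
        while heap and len(probed) < budget:
            _, _, r = heapq.heappop(heap)
            probed.add(r)  # one probe of r captures this EI
        if probed:
            schedule[t] = probed
    return schedule


eis = [("a", 0, 1), ("b", 0, 0), ("c", 1, 2)]
print(edf_schedule(eis, budget=1, horizon=2))  # {0: {'b'}, 1: {'a'}, 2: {'c'}}
```

With one probe per chronon, EDF captures all three EIs; a policy that probed `a` first at chronon 0 would lose `b`, whose window closes at chronon 0.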
LAB Dominates LA • The proof follows from the definition of the LAB potential and its ordering operator, together with Lemma 42 (completeness preservation) and Lemma 47: each local change from LA to LAB results in less delay.
LB Tradeoff 4-Approximation (no intra-resource overlap) • Completeness 2-approximation: Basic idea: given any schedule $S$ and LA, if we change $S$ into LA, each change can improve the performance by at most 1 ⇒ $S$ achieves at least half the completeness of LA. • Latency 4-approximation: • $T_k^s$: the $k$-th first time that OPT did not probe but LB did (and $T_k^f$ the last such time); $T_k^{s'}$, $T_k^{f'}$ are the corresponding times when both did. • Best case: LB and OPT act the same (black circles). • Worst case (triangles): comparing the delay of OPT with the upper bound on the delay of LB yields an approximation ratio of 4. • Whenever the EIs have uniform width $W$, LB is a 2-approximation.
LBW Tradeoff 2-Approximation (no intra-resource overlap) • Completeness 2-approximation: same as in LB. • Optimal latency. • LB's greedy "delay traps" (figure: Case 1 and Case 2, comparing schedules $S$ and $S'$ around chronon $j$)
Online policies vs. Optimal Pareto Set: Runtime Scalability Analysis
Online policies: Workload Impact (no intra-resource overlap)
Online policies: Workload Impact (with intra-resource overlap)
Proposed Offline Approximation • As $A$, we use the Bar-Yehuda et al. algorithm for scheduling split intervals. • $C = 1$: $A$ provides a $2k$-approximation ⇒ we get a $(2k+2)$-approximation. • $C > 1$: $A$ provides a $(2k+1)$-approximation ⇒ we get a $(2k+3)$-approximation. • Drawbacks: the transformation may be quite expensive, and $A$ does not scale (it requires an LP solution for the fractional version of the problem).
MRSF: l-Competitive (case with no intra-resource overlap) (figure: "bad guys" vs. "good guys") • Picking the good guys gains 3 + 2 (length(I)). • Picking the bad guys gains 1. • Ratio = length(I). • In the worst case, every CEI has equal length ⇒ competitive ratio l.
Online Policies vs. Offline Approximation • For rank(P) = 1, both WIC and EDF are optimal. • For any rank(P), the worst-case optimal upper bound is $\mathrm{OPT}_{\mathrm{rank}(1)}/\mathrm{rank}(k)$. • Simple policies (e.g., WIC, EDF) do not fit problems with complex profiles. • Here $\mathrm{COMP}_{\mathrm{MRSF}} \ge \mathrm{COMP}_{\mathrm{off}} \ge \mathrm{OPT}/(2k)$. • The offline policy does not scale. • Online policies scale quite well.