Explore quality-of-service challenges in DSMS, proposing a feedback control approach for load shedding to maintain performance under varied loads. Discover steps involving system identification, controller design, and actuator implementation.
Control-based Quality Adaptation in Data Stream Management Systems (DSMS)
Yicheng Tu†, Mohamed Hefeeda‡, Yuni Xia†, Sunil Prabhakar†, and Song Liu¥
†Department of Computer Sciences, Purdue University, USA
‡School of Computing Science, Simon Fraser University at Surrey, Canada
¥School of Mechanical Engineering, Purdue University, USA
DEXA 2005
Data Stream Management
• Continuous data, discarded after being processed
• Continuous query
• Data-active, query-passive model
• Applications
  • Financial analysis
  • Mobile services
  • Sensor networks
  • Network monitoring
  • More …
DSMS architecture
• Network of query operators (O1–O3)
• Each operator has its own queue (q1–q4)
• Scheduler decides which operator to execute
• Query results (Q1, Q2) pushed to clients
• Example systems:
  • Aurora/Borealis
  • STREAM
Quality-of-Service (QoS) in DSMS
• Data processing is QoS-critical in DSMS
  • Tuple delay is the major concern: results generated from old data are useless!
• Highly dynamic environment makes it hard to maintain QoS
  • Bursty data input
  • Unpredictable unit processing cost
• Overloading during spikes degrades (delay) QoS
• Solution: adjust the following (i.e. quality adaptation)
  • Sampling rate (source side)
  • Data loss (DSMS side), i.e. load shedding
Load Shedding
• Eliminating excessive load by dropping data items, leading to fewer QoS violations
• Basic algorithm (Tatbul et al., 2003): periodically
• CPU is the bottleneck resource
• Key questions
  • When?
  • How much?
  • Where?
  • Which tuples?
What’s missing?
• Current solutions focus on steady-state performance
  • They assume the input level changes between stable states
• However, arrivals are bursty in practice, so the system is always in a transient state
• Taking averages (baseline) wouldn’t work
Our approach
• View load shedding as a feedback control problem
• Feedback control: manipulation of system behavior by adjusting system input based on system output
  • Examples: cruise control of automobiles, room temperature control, etc.
• The feedback control loop:
  • Plant
  • Monitor
  • Controller
  • Actuator
• How it works
  • Error = measured output - desired output
  • Focal point: the controller, which maps the error to a control signal
Why Feedback Control?
• Maintains system performance under internal/external uncertainties
• Control theory provides tools to choose and tune the controller toward the desired performance
• The current load shedding solution is also feedback-based
  • Difference: we use control theory to guide the controller design
• Steps of problem-solving using control theory
  • Map the problem to a feedback control loop and determine input/output
  • System identification: model the input/output relationship
  • Controller design: can be done analytically
The feedback control loop
• Plant: current DSMS
  • Input: load admitted
  • Output: delay QoS
  • Reference output: specified by DBA
• Actuator
  • Adaptor: load shedder
  • Admission controller
• Monitor: new
• Controller: new
• System dynamics: disturbances
• Discrete control: control period T
System identification
• To build a dynamic model that describes the relationship between input and output
• Most systems can be modeled by the following linear difference equation:
  • I(x): input at period x
  • O(x): output at period x
  • n: order of the equation
  • ai, bi: system-specific coefficients
• Determine n, ai, bi by experiments using synthetic inputs
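To make the identification step concrete, here is a minimal sketch, assuming a first-order model of the form O(k) = a·O(k-1) + b·I(k-1). This is one common instance of a linear difference equation and not necessarily the exact equation shown on the slide. The function names, the coefficient values used in the usage note, and the noise-free synthetic plant are all hypothetical.

```python
def simulate_plant(inputs, a, b, o0=0.0):
    """Generate outputs of an assumed first-order plant O(k) = a*O(k-1) + b*I(k-1)."""
    outputs = [o0]
    for k in range(1, len(inputs)):
        outputs.append(a * outputs[k - 1] + b * inputs[k - 1])
    return outputs


def fit_first_order(inputs, outputs):
    """Least-squares estimate of (a, b) in O(k) = a*O(k-1) + b*I(k-1).

    Solves the 2x2 normal equations directly, so no external library is needed.
    """
    s11 = s12 = s22 = t1 = t2 = 0.0
    for k in range(1, len(outputs)):
        x1, x2, y = outputs[k - 1], inputs[k - 1], outputs[k]
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        t1 += x1 * y
        t2 += x2 * y
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("inputs are not informative enough to identify the model")
    a = (t1 * s22 - t2 * s12) / det
    b = (t2 * s11 - t1 * s12) / det
    return a, b
```

Feeding the fit a synthetic input sequence (for example, uniformly random loads) and the outputs produced by `simulate_plant` with a = 0.6 and b = 0.004 should recover those two coefficients up to floating-point error. On a real system the outputs are noisy, so the estimate would only be approximate.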
Controller design
• PI controller:
  • E(k): error
  • g, r: controller coefficients
  • Id(k): desired input
• More efficiently:
• Transfer function of the PI controller:
• For example, a second-order system has TF:
• Closed-loop TF (CLTF):
• Determine g and r by pole placement of the CLTF (details skipped)
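As an illustration of why a recursive form is cheaper, the sketch below implements a generic discrete PI controller in two equivalent ways. It assumes the textbook parametrization with a proportional gain `kp` and an integral gain `ki`. These names are hypothetical and are not the coefficients g and r from the slide, whose exact control law is not reproduced here.

```python
def pi_positional(errors, kp, ki):
    """Positional form: u(k) = kp*E(k) + ki*sum(E(0..k)); re-sums all past errors."""
    signals = []
    for k in range(len(errors)):
        signals.append(kp * errors[k] + ki * sum(errors[: k + 1]))
    return signals


def pi_incremental(errors, kp, ki):
    """Incremental form: u(k) = u(k-1) + kp*(E(k) - E(k-1)) + ki*E(k).

    Only the previous signal and previous error are stored, so each step is O(1).
    Assumes u(-1) = 0 and E(-1) = 0.
    """
    signals = []
    prev_signal = 0.0
    prev_error = 0.0
    for error in errors:
        signal = prev_signal + kp * (error - prev_error) + ki * error
        signals.append(signal)
        prev_signal, prev_error = signal, error
    return signals
```

Both functions produce the same control signal for the same error sequence. The incremental version is the one a per-period controller would typically run, since it needs no error history.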
Actuator (load shedder) design
• Id(k) is the desired load (# of data tuples) entering the DSMS during the next control period k
• Let S(k) be the real load during period k; we need to discard S(k) - Id(k) tuples
• Two implementations of the load shedder:
  • Admit the first Id(k) tuples during period k
    • Pros: easy to implement, generates a (100%) accurate control signal
    • Cons: skewed toward the early arrivals
  • Sampling-based shedding: each tuple is discarded with probability 1 - p, where p = Id(k) / S(k)
    • However, S(k) is unknown at the beginning of period k
    • Solution: use S(k-1) to estimate S(k); this does not affect controller performance (see backup slide)
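The two shedder variants above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the tuples of one period are assumed to be available as a list, and falling back to p = 1 when the previous period was empty is an assumption the slides do not state.

```python
import random


def shed_first_n(tuples, desired_load):
    """Admit only the first Id(k) tuples of the period; later arrivals are dropped."""
    return tuples[: max(0, desired_load)]


def shed_by_sampling(tuples, desired_load, previous_load, rng=random):
    """Admit each tuple with probability p = Id(k) / S(k-1).

    S(k-1), the load seen in the previous period, stands in for the unknown S(k).
    """
    if previous_load <= 0:
        p = 1.0  # assumption: with no history, admit everything
    else:
        p = min(1.0, max(0.0, desired_load / previous_load))
    return [t for t in tuples if rng.random() < p]
```

The first variant hits the target count exactly but always drops the late arrivals. The sampling variant spreads the drops across the whole period, at the cost of admitting only approximately Id(k) tuples.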
Determining control period
• Control period T is critical in controller design
• Two primary concerns in setting T
  • T should be short enough to capture the changes of input rate
    • Nyquist-Shannon sampling theorem
    • The shorter the better
  • Output signal (delay) is measured as an average over all data tuples in one control period
    • If T is too short, only a small number of tuples are sampled
    • T cannot be too short, as the output signal may then fail to represent the real system status
• We trade off the above two factors and set T to one second
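The averaging concern can be seen in a small monitor sketch. It assumes each processed tuple is reported as a hypothetical `(departure_time_in_seconds, delay_in_ms)` pair, which is an illustrative format and not one taken from the slides. Returning the per-period tuple count next to the mean makes it easy to see when a short T leaves too few samples behind the output signal.

```python
def average_delay_per_period(samples, period=1.0):
    """Group (time_s, delay_ms) samples into control periods of length `period`.

    Returns {period_index: (mean_delay_ms, tuple_count)}.
    """
    sums = {}
    counts = {}
    for time_s, delay_ms in samples:
        k = int(time_s // period)
        sums[k] = sums.get(k, 0.0) + delay_ms
        counts[k] = counts.get(k, 0) + 1
    return {k: (sums[k] / counts[k], counts[k]) for k in sorted(sums)}
```

With a one-second period, three samples at 0.1 s, 0.9 s and 1.5 s fall into periods 0, 0 and 1. Shrinking `period` splits the same samples across more periods, so each average rests on fewer tuples.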
Experiments
• We evaluate our control-based solution by simulations
  • Set four classes of delays: 500ms–2000ms
  • Operator scheduling policy: Earliest Deadline First
  • Input: CPU utilization
  • Output: deadline miss ratio
• Small query network with 13 operators
• Stream data:
  • Synthetic: Poisson, Pareto
  • Real: TCP traces
• Comparison: static shedding
  • Amount of shedding follows a pre-determined STEPSIZE
  • Similar to TCP rate control
Simulation results: Poisson inputs
• Target deadline miss ratio (control goal) is set to zero
[Figure panels: Inputs, Outputs]
Simulation results: bursty inputs
[Figure panels: a. Pareto, b. TCP trace]
• Far fewer deadline misses than static shedding
• The same or lower level of data loss (load shed)
• Hard to find an appropriate STEPSIZE in static shedding; not a problem in the control-based approach
Summary
• Load shedding is an important quality adaptation method
• Current solutions focusing on steady-state performance do not work well under bursty inputs
• We propose an approach to guide load shedding in a highly dynamic environment based on feedback control theory
• Initial experimental results by simulation show promising potential of our approach
Verification of model
• First-order linear model
Simulation: unpredictable unit processing cost
• Control-based method learns the real cost
Controller stability after replacing S(k) with S(k-1)
Let Id’(k) be the input signal that results from using S(k-1) instead of S(k). We have Id’(k) = pS(k-1), and thus S(k-1)Id(k) = S(k)Id’(k). In the z-domain, we get Id(k) = zId’(k). Plugging the above into the CLTF, we have
According to control theory, the controller is still stable.
Ongoing work
• Performed all three steps in a real DSMS, the Borealis system
• We set the output to average delay
• System identification gives a first-order model structure
• Control function
• Controller analysis gives the following set of parameters:
Ongoing work: results
• Control target: 2000ms
• Comparison:
  • Adaptive: static shedding
  • BASELINE
  • NON-CTRL
• Metrics:
  • Total delay violations
  • Total delayed tuples
  • Max delay
  • Load shed