Load Shedding Techniques for Data Stream Systems • Brian Babcock, Mayur Datar, Rajeev Motwani • Stanford University
Differences from Previous Talk • Our focus: Aggregation queries • No quality of service specifications • Instead, focus on accuracy of query answers • Compensate for dropped data by scaling answers • Random drops only (no semantic drops)
Problem Setting • Sliding window aggregate queries (SUM and COUNT) • Filters, UDFs, and joins with relations • Operator sharing • [Figure: query plan in which queries Q1, Q2, and Q3 each end in an aggregate Σ over inputs R, S1, and S2]
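Inputs to the Problem • Stream rate r • Processing time t • Selectivity s • Mean μ • Std dev σ • [Figure: the same query plan (Q1, Q2, Q3 with aggregates Σ over R, S1, S2), annotated with these quantities]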
Load Shedding via Random Drops • Stream S (stream rate r) flows through operators 1, 2, and Σ3, labeled (time, selectivity) = (t1, s1), (t2, s2), (t3, s3) • Without shedding: Load = r·t1 + r·s1·t2 + r·s1·s2·t3 • With sampling rate p: Load = r·t1 + p(r·s1·t2 + r·s1·s2·t3) • Scale answer by 1/p • Need Load ≤ 1
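The two load formulas on this slide can be checked numerically. The following is a minimal sketch, assuming a linear chain of operators and a single random-drop operator; the function names, the `shed_after` parameter, and the numbers in the example are illustrative and do not come from the talk.

```python
def chain_load(r, ops, p=1.0, shed_after=1):
    """Load of a linear operator chain with one random-drop operator.

    r          -- stream rate
    ops        -- list of (t, s) pairs: processing time and selectivity
    p          -- sampling rate of the shedder (1.0 means no shedding)
    shed_after -- how many operators run before the shedder (assumed knob)
    """
    load = 0.0
    rate = r
    for i, (t, s) in enumerate(ops):
        if i == shed_after:
            rate *= p          # the shedder keeps a fraction p of the tuples
        load += rate * t       # cost of processing the tuples that arrive
        rate *= s              # only a fraction s survives this operator
    return load


def max_sampling_rate(r, ops, shed_after=1):
    """Largest p in [0, 1] that keeps Load <= 1 for this chain."""
    upstream = chain_load(r, ops, p=0.0, shed_after=shed_after)
    downstream = chain_load(r, ops, p=1.0, shed_after=shed_after) - upstream
    if upstream > 1.0:
        raise ValueError("operators before the shedder already exceed Load = 1")
    if downstream <= 0.0:
        return 1.0
    return min(1.0, (1.0 - upstream) / downstream)


# Hypothetical inputs: r = 10, (t1, s1) = (0.02, 0.5), (t2, s2) = (0.1, 0.5),
# (t3, s3) = (0.2, 1.0). Unshed load is 0.2 + 0.5 + 0.5 = 1.2, which is too high.
ops = [(0.02, 0.5), (0.1, 0.5), (0.2, 1.0)]
p = max_sampling_rate(10, ops)
print(chain_load(10, ops), p, chain_load(10, ops, p=p))
```

With these made-up inputs the shedder has to drop about 20% of the tuples after operator 1, and the aggregate's answer would then be scaled by 1/p to compensate.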
Problem Statement • Relative error is metric of choice: |Estimate - Actual| / Actual • Goal: Minimize the maximum relative error across queries, subject to Load ≤ 1 • Want low error with high probability
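As a small illustration of the objective, the sketch below computes the relative error per query and the maximum across queries. The function names and sample values are hypothetical, and dividing by the absolute value of the actual answer is an assumption made here so that negative sums behave sensibly.

```python
def relative_error(estimate, actual):
    """|Estimate - Actual| / |Actual|; the actual answer must be nonzero."""
    return abs(estimate - actual) / abs(actual)


def max_relative_error(estimates, actuals):
    """The quantity the load shedder tries to minimize across all queries."""
    return max(relative_error(e, a) for e, a in zip(estimates, actuals))


# Hypothetical answers for two queries: errors are 10/100 and 10/200.
worst = max_relative_error([90.0, 210.0], [100.0, 200.0])
print(worst)
```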
Relating Load Shedding and Error • [Equation not preserved in this transcript; its labeled terms are the relative error for query i, the sampling rate for query i, and a query-dependent constant Ci] • Equation derived from Hoeffding bounds • Constant Ci depends on: • Variance of aggregated attribute • Sliding window size
Calculate Ratio of Sampling Rates • Minimize maximum relative error → Equal relative error across queries • Express all sampling rates in terms of common variable λ
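The sketch below shows how equalizing relative error fixes the ratios between sampling rates. It assumes the error relation has the form error_i = sqrt(Ci / Pi), which is the usual shape of a Hoeffding-style bound but is an assumption here, not a formula taken from the slides; the constants and function names are made up for illustration.

```python
import math


def sampling_rate_ratios(constants):
    """Coefficients k_i such that P_i = k_i * lam gives equal error everywhere.

    Under the assumed relation error_i = sqrt(C_i / P_i), equal error means
    P_i is proportional to C_i. Normalizing by the largest constant keeps
    every P_i <= 1 for any lam in (0, 1].
    """
    c_max = max(constants)
    return [c / c_max for c in constants]


def assumed_error(c, p):
    """Assumed Hoeffding-style error for constant c and sampling rate p."""
    return math.sqrt(c / p)


# Hypothetical query-dependent constants for three queries.
constants = [4.0, 3.0, 5.0]
ratios = sampling_rate_ratios(constants)      # roughly [0.8, 0.6, 1.0]
lam = 0.5
errors = [assumed_error(c, k * lam) for c, k in zip(constants, ratios)]
print(ratios, errors)
```

Every query ends up with the same assumed error, so the single variable lam is all that remains to be tuned against the load constraint.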
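Placing Load Shedders • [Figure: two aggregates Σ that share an upstream segment, with targets .8λ and .6λ] • Shared segment: sampling rate .8λ • Branch to the query with target .6λ: sampling rate .75 = .6λ / .8λ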
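The arithmetic on this slide can be sketched as follows for the simplest case of one shared segment feeding several queries. The function name and the query labels `QA` and `QB` are hypothetical, and targets are given as coefficients of λ.

```python
def place_shedders(targets):
    """Split per-query target rates into a shared rate and per-branch rates.

    targets -- dict mapping query name to its target coefficient of lambda.
    The shared segment samples at the largest target, and each branch then
    samples at target / shared, so the product along every path to a query
    equals that query's target.
    """
    shared = max(targets.values())
    branches = {q: t / shared for q, t in targets.items()}
    return shared, branches


shared, branches = place_shedders({"QA": 0.8, "QB": 0.6})
print(shared, branches)   # shared is 0.8; QB's branch rate is about 0.75
```

Shedding as early as possible on the shared segment saves the most work, which is why the shared rate is set to the largest target rather than the smallest.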
Conclusion • Load shedding helps cope with bursts • Minimizing relative error is natural objective for aggregate queries • Algorithm for load shedding: • Relate target sampling rates for all queries • Place random drop operators based on target sampling rates • Adjust sampling rates to achieve desired load
Thanks for listening! • Questions?
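Choosing Target Sampling Rates • [Equation not preserved in this transcript; its labeled terms are the relative error, the sliding window size, the sampling rate for the query, and the variance of the aggregated attribute]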
Measuring Inaccuracy • [Figure: operators 1 and 2 feeding aggregate Σ3, with sampling rate p1 at operator 1 and sampling rate p2 at operator 2] • Scale answer by 1/(p1p2) • Tuple w/ value x becomes: x / (p1p2) with pr. p1p2, 0 with pr. 1 - p1p2 • Key point: Product of sampling rates determines quality of approximate answer
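The key point can be verified by computing the mean and variance of a single scaled tuple directly from the two-outcome distribution on this slide. This is a minimal sketch; the function name and the example values are illustrative only.

```python
import math


def scaled_tuple_moments(x, rates):
    """Mean and variance of one tuple's contribution after random drops.

    The tuple survives all shedders with probability p = product of rates
    and then contributes x / p; otherwise it contributes 0.
    """
    p = math.prod(rates)
    scaled = x / p
    mean = p * scaled                      # equals x: the estimate is unbiased
    variance = p * scaled ** 2 - mean ** 2
    return mean, variance


# Three hypothetical placements with the same product p1 * p2 = 0.4.
for rates in [(0.5, 0.8), (0.8, 0.5), (0.4, 1.0)]:
    print(rates, scaled_tuple_moments(10.0, rates))
```

All three placements give the same mean and the same variance, which is why only the product of the sampling rates along a path matters for answer quality.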