Explore the use of adaptive stream filters for entity-based queries with non-value tolerance in data stream management systems, focusing on trading accuracy for query timeliness and user-defined tolerance.
Adaptive Stream Filters for Entity-based Queries with Non-value Tolerance (VLDB 2005)
Data Streams and Applications • Data Stream Management Systems (DSMS) • Sensor networks, location-based applications • STREAM [ABB03], STEAM [HAFME03], AURORA [ACC03], CACQ [MSH02] • Stream applications • Telecom call records • Network security [BO03] • Habitat monitoring [MPS02] • Structural health monitoring • Continuous queries
DSMS Model • Massive, fast streams arrive over the network at a central processor • The query processing unit evaluates the user's continuous query and returns a result (refreshed if needed) • Real-time response-time requirement • Limited memory, CPU, network bandwidth
Trading Accuracy for Query Timeliness • A user may accept an answer with a carefully controlled error tolerance • e.g., wide-area resource accounting • e.g., load balancing in replicated servers • The system exploits error tolerance to reduce communication and computation costs
Value-based Tolerance • Often assumed in the literature [OJW03, JCW04] • Maximum error ε is a numerical value specified by the user • MAX query: return the sensor id with the highest temperature • Guarantee: the temperature of the returned sensor is no more than ε below that of the true answer
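The guarantee on this slide can be checked with a few lines of Python. This is a minimal sketch only: the function name, the `eps` parameter (standing for the user-specified maximum error), and the sample readings are illustrative assumptions, not part of the original protocol.

```python
def max_within_value_tolerance(readings, returned_id, eps):
    """Check a MAX-query answer against a value-based tolerance.

    readings    : dict mapping sensor id -> current temperature (hypothetical data)
    returned_id : sensor id reported as the answer
    eps         : user-specified maximum numerical error
    """
    true_max = max(readings.values())
    # The answer is acceptable if it is at most eps below the true maximum.
    return true_max - readings[returned_id] <= eps


readings = {"s1": 23.0, "s2": 24.5, "s3": 25.0}
print(max_within_value_tolerance(readings, "s2", eps=1.0))  # True  (25.0 - 24.5 = 0.5)
print(max_within_value_tolerance(readings, "s1", eps=1.0))  # False (25.0 - 23.0 = 2.0)
```

Choosing `eps` requires knowing the scale of the data, which is the difficulty the following slides raise.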
Is Selecting ε Easy? • Location-based application: a user asks for his closest neighbor • Should the tolerance be 0.1, 1, or 100 meters? • Sensor network collects humidity, temperature, UV index, wind speed • Does the user know the range of error for each type? • Multi-dimensional data streams (e.g., location) • Multimedia data streams (e.g., CCTV images)
Is Selecting ε for a MAX Query Easy? • Suppose a user accepts an object that ranks 2nd or above • If ε is too small: tolerance is wasted • If ε is too large: the error is unacceptable • The ideal ε …… [Figure: ε on a scale from small to large, with the ideal ε between the two cases.]
Rank-based Tolerance • Express error tolerance as a rank • Error tolerance = no. of positions the returned sensor could rank below the highest one • More intuitive and easier to specify • Example: with rank-based tolerance = 1, a sensor ranked 1st or 2nd is acceptable
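A rank-based tolerance check needs no knowledge of the value scale. The sketch below is illustrative only: the function name and sample readings are assumptions, and ties between equal values are broken arbitrarily by the sort.

```python
def max_within_rank_tolerance(readings, returned_id, r):
    """Check a MAX-query answer against a rank-based tolerance r.

    r = number of positions the returned sensor may rank below the highest one.
    """
    # Order sensor ids from highest to lowest reading; position 0 is the true MAX.
    ranked = sorted(readings, key=readings.get, reverse=True)
    return ranked.index(returned_id) <= r


readings = {"s1": 23.0, "s2": 24.5, "s3": 25.0}
print(max_within_rank_tolerance(readings, "s2", r=1))  # True  (s2 ranks 2nd)
print(max_within_rank_tolerance(readings, "s1", r=1))  # False (s1 ranks 3rd)
```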
Non-Value Tolerance • Rank-based tolerance is a non-value tolerance • numerical value not used • Fraction-based tolerance • False positive F+(t): % of returned answers that are incorrect at time t • False negative F-(t): % of correct answers not returned at time t • F+(t) ≤ ε+; F-(t) ≤ ε-
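The two fractions can be computed directly from the returned set and the correct set. This is a small sketch under the definitions on this slide; the function name and the example sets are hypothetical, and returning 0.0 for an empty set is a convention chosen here.

```python
def fraction_errors(returned, correct):
    """Return (F+, F-) for one time instant.

    F+ = fraction of returned answers that are incorrect
    F- = fraction of correct answers that were not returned
    """
    returned, correct = set(returned), set(correct)
    false_pos = len(returned - correct)
    false_neg = len(correct - returned)
    f_plus = false_pos / len(returned) if returned else 0.0
    f_minus = false_neg / len(correct) if correct else 0.0
    return f_plus, f_minus


f_plus, f_minus = fraction_errors(
    returned={"S1", "S2", "S3", "S4"},
    correct={"S2", "S3", "S4", "S7", "S8"},
)
print(f_plus, f_minus)  # 0.25 0.4
```

The answer meets the tolerance when both values are at or below the user's two thresholds.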
Entity-based Queries • Return sets of object ids, not numerical values [CKP03] • Rank-based queries: order of stream values decides the final answer • e.g., top-k query, k-nearest-neighbor query • Non-rank-based queries: order of stream values is not important • e.g., range query • Non-value tolerance matches entity-based queries!
Continuous Query Classification
Adaptive Filter [OJW03]: Initialization Phase • The constraint assignment unit takes the user-defined tolerance and assigns filter bounds [l1,u1], [l2,u2], [l3,u3] to Data Streams 1, 2, and 3 • The query processing unit returns an approximate answer • Answer tolerance is met as long as no update is generated
Adaptive Filter: Maintenance Phase • Data Stream 2 sends an update when v2 > u2 or v2 < l2 • Tolerance violated: the update triggers the maintenance phase • The query processing unit may request a value from another stream (e.g., v3) • The constraint assignment unit issues a new filter bound [l2,u2] under the user-defined tolerance • The approximate answer is corrected
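The source-side behaviour in the two adaptive-filter slides can be sketched as follows. This is an illustration under assumptions, not the implementation from [OJW03]: the class and method names are hypothetical, and the only rule taken from the slide is the update condition (v > u or v < l).

```python
class FilteredSource:
    """A stream source that suppresses updates while its value stays in [lower, upper]."""

    def __init__(self, value, lower, upper):
        self.value = value
        self.lower = lower
        self.upper = upper
        self.messages_sent = 0  # counts updates actually sent to the server

    def observe(self, value):
        """Record a new reading; return it if it must be reported, else None."""
        self.value = value
        if value < self.lower or value > self.upper:
            self.messages_sent += 1
            return value  # bound violated: the server must be told
        return None       # inside the bound: no message needed

    def set_bound(self, lower, upper):
        """Install a new filter bound sent by the server (maintenance phase)."""
        self.lower, self.upper = lower, upper


src = FilteredSource(value=20, lower=15, upper=25)
print(src.observe(22))   # None: still inside [15, 25]
print(src.observe(27))   # 27: outside the bound, update sent
src.set_bound(22, 32)    # server responds with a new bound
print(src.observe(30))   # None: inside the new bound
print(src.messages_sent) # 1
```

Only one of the three readings costs a message, which is the saving the filter is meant to provide.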
Contributions • Apply filter bounds to rank-based / non-rank-based queries subject to rank-based / fraction-based tolerance to reduce message costs • Correctness proofs, cost analysis, and experimental evaluation of each protocol
Filter Bound Protocols • FT-NRP • RTP • FT-RP • ZT-RP • ZT-NRP
Non-Rank-based Queries • Example: 1D range query with range = [10, 30] • Ordered values: S6 = 2, S5 = 6, S3 = 11, S2 = 14, S1 = 23, S4 = 25, S7 = 34, S8 = 41 • Answer set: S3, S2, S1, S4
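The range-query example can be reproduced in a few lines. The sketch assumes that the stream labels and the values on the slide pair up in the order listed; the function name is illustrative.

```python
# Stream values from the slide, paired with labels in listed order (an assumption).
values = {"S6": 2, "S5": 6, "S3": 11, "S2": 14, "S1": 23, "S4": 25, "S7": 34, "S8": 41}


def range_query(values, low, high):
    """Return the ids of all streams whose current value lies in [low, high]."""
    return {sid for sid, v in values.items() if low <= v <= high}


print(sorted(range_query(values, 10, 30)))  # ['S1', 'S2', 'S3', 'S4']
```

The answer depends only on which side of the range each value falls, not on the order of values, which is what makes this a non-rank-based query.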
Fraction-based Tolerance [Figure: ordered values of streams S6, S5, S3, S2, S1, S4, S7, S8 against the range of Q = [l, u]; labels: Update, Update, False Positive, False Negative.]
Fraction-based Tolerance • A(t): answer actually returned at time t • E+(t): no. of false positives in A(t) • E-(t): no. of false negatives • |A(t)| - E+(t): no. of returned answers that are correct • Size of the true answer at time t = |A(t)| - E+(t) + E-(t)
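Reading E+(t) and E-(t) as counts of false positives and false negatives (an interpretation of the figure labels), the two fractions defined earlier in the deck can be written in terms of the quantities on this slide:

```latex
F^{+}(t) = \frac{E^{+}(t)}{|A(t)|},
\qquad
F^{-}(t) = \frac{E^{-}(t)}{|A(t)| - E^{+}(t) + E^{-}(t)}
```

The denominator of F-(t) is the size of the true answer given on the slide.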
Initialization Phase • Given ε+ and ε- • Collect current stream values • For streams satisfying the range query • Calculate no. of streams (Emax+) that can be false positives • Assign false +ve filters [-∞, +∞] to Emax+ streams • Assign [l,u] to remaining ones • For streams failing the range query • Calculate no. of streams (Emax-) that can be false negatives • Assign false -ve filters [+∞, +∞] to Emax- streams • Assign [l,u] to remaining ones • Tolerance is satisfied if no new updates are received • At any time t without update, F+(t) ≤ ε+ and F-(t) ≤ ε-
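The initialization steps can be sketched as below. The slide does not give formulas for Emax+ and Emax-, so the two bounds used here are derived as an assumption from the worst case in which every stream holding a false +ve or false -ve filter is in fact wrong; they are not taken from the source. The function name is hypothetical, and streams are picked in iteration order as a placeholder for a real selection heuristic.

```python
import math


def init_filters(values, low, high, eps_plus, eps_minus):
    """Assign a filter to every stream for range query [low, high].

    Assumed bounds (derived here, not from the slides; requires eps_minus < 1):
      Emax+ = floor(eps_plus * |A|)
      Emax- = floor(eps_minus * (|A| - Emax+) / (1 - eps_minus))
    """
    inside = [s for s, v in values.items() if low <= v <= high]   # initial answer A
    outside = [s for s in values if s not in inside]
    n = len(inside)
    e_max_plus = math.floor(eps_plus * n)
    e_max_minus = math.floor(eps_minus * (n - e_max_plus) / (1 - eps_minus))
    e_max_minus = min(e_max_minus, len(outside))

    filters = {}
    for i, s in enumerate(inside):
        # False +ve filter [-inf, +inf]: the value never leaves it, so no update is sent.
        filters[s] = (-math.inf, math.inf) if i < e_max_plus else (low, high)
    for i, s in enumerate(outside):
        # False -ve filter [+inf, +inf]: the value never enters it, so no update is sent.
        filters[s] = (math.inf, math.inf) if i < e_max_minus else (low, high)
    return filters, e_max_plus, e_max_minus


values = {"S6": 2, "S5": 6, "S3": 11, "S2": 14, "S1": 23, "S4": 25, "S7": 34, "S8": 41}
filters, e_plus, e_minus = init_filters(values, 10, 30, eps_plus=0.25, eps_minus=0.25)
print(e_plus, e_minus)  # 1 1
print(filters["S3"])    # (-inf, inf)
print(filters["S6"])    # (inf, inf)
```

With four streams in the answer and both tolerances at 25%, one stream on each side can be silenced: in the worst case F+ = 1/4 and F- = 1/(4 - 1 + 1) = 1/4.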
Maintenance Phase: Good Update • Range of Q = [l, u] • Between time t0 and time tc, S7 (with filter [l,u]) moves into the range and sends an update • Insert S7 into A(tc) • F+ and F- drop • F+(tc) < F+(t0) ≤ ε+ • F-(tc) < F-(t0) ≤ ε- • Tolerance is met
Maintenance Phase: Bad Update • Range of Q = [l, u] • Between time t0 and time tc, a stream Si (with filter [l,u]) moves out of the range and sends an update • Remove Si from A(tc) • F+(tc) ≤ ε+ and F-(tc) ≤ ε- may no longer hold • Quality of answer becomes worse • Procedure Fix restores the tolerance
Fix: Consulting False Positive Filter • Range of Q = [l, u] • Select stream S4 ∈ A(tc) with a [-∞, +∞] filter • Request S4 for its updated value V4 • If V4 ∈ [l, u] • install an [l, u] filter on S4 • it can be proved that F+(tc) ≤ ε+ and F-(tc) ≤ ε- are satisfied • If V4 ∉ [l, u], consult a false -ve filter • Worst case: 5 messages
Filter Bound Protocols for Rank-based Queries • The k-NN query is representative of NN, MIN, and MAX queries • Fraction-based tolerance / k-NN query • View a k-NN query as a range query, using the kth nearest neighbor as the "range" • Adapt the fraction-based tolerance / range query protocol • Rank-based tolerance / k-NN query • Maintain knowledge about the (k+r)th and (k+r+1)st items • Filter bound is defined by the average of the (k+r)th and (k+r+1)st items
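The bound for the rank-based tolerance case can be computed as follows. This sketch only illustrates the averaging rule on the slide; the function name, the reading of r as the rank tolerance, and the use of distances from the query point as stream values are assumptions.

```python
def knn_filter_bound(distances, k, r):
    """Midpoint between the (k+r)th and (k+r+1)st smallest distances.

    distances : dict mapping stream id -> distance from the query point (hypothetical)
    k         : number of nearest neighbors requested
    r         : rank-based tolerance
    """
    ranked = sorted(distances.values())
    if len(ranked) < k + r + 1:
        raise ValueError("need at least k + r + 1 streams")
    # Python lists are 0-indexed, so the (k+r)th item is at index k + r - 1.
    return (ranked[k + r - 1] + ranked[k + r]) / 2


distances = {"a": 1.0, "b": 2.0, "c": 4.0, "d": 7.0, "e": 9.0}
print(knn_filter_bound(distances, k=2, r=1))  # 5.5 (midpoint of 4.0 and 7.0)
```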
Experiments • Compare • No filter is used at all • Filter protocols with zero tolerance • Our tolerance-based protocols • Measure total no. of messages required for executing a continuous query
Experimental Setup • Real Data • 30 days of wide-area traces of TCP connections based on TCP trace [ITA20] • Synthetic Data • Generated by CSIM 18 • Data value: Uniform distribution • Fluctuation of updates: Normal distribution • Interarrival time of updates: Exponential distribution
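A workload with the same three distributions can be approximated in Python. This is a stand-in sketch using the standard `random` module, not the CSIM 18 setup from the experiments; the function name and every parameter value (value range, standard deviation, mean gap, horizon) are hypothetical.

```python
import random


def synthetic_updates(n_streams, horizon, rng, low=0.0, high=100.0,
                      sigma=1.0, mean_gap=5.0):
    """Generate (time, stream_id, value) events sorted by time.

    Initial values  : uniform in [low, high]
    Value changes   : normal with mean 0 and standard deviation sigma
    Gaps between updates of one stream : exponential with mean mean_gap
    """
    events = []
    for sid in range(n_streams):
        value = rng.uniform(low, high)
        t = 0.0
        events.append((t, sid, value))
        while True:
            t += rng.expovariate(1.0 / mean_gap)  # rate = 1 / mean gap
            if t > horizon:
                break
            value += rng.gauss(0.0, sigma)
            events.append((t, sid, value))
    events.sort()
    return events


events = synthetic_updates(n_streams=3, horizon=50.0, rng=random.Random(0))
print(len(events), events[0])
```

Feeding such events through a filter and counting the reports that get through gives the message count that the experiments measure.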
Fraction-based Tolerance for Range Query with Real Data
Fraction-based Tolerance for Range Query with Synthetic Data
Conclusions • Value-based tolerance can be difficult to specify for continuous queries in stream systems • Rank-based and fraction-based tolerance • Applied to rank-based queries and non-rank-based queries • Filter bound protocols translate non-value tolerance to filter bounds • Experiments illustrate protocol effectiveness • Please contact Reynold Cheng (csckcheng@comp.polyu.edu.hk) for details
Issues of Running Out of Filters • If all false positive and false negative filters run out, the system degrades to one in which no tolerance is exploited • To improve performance, initialization phase may be executed again • Experiments over long-running queries
Long-Running Queries
False +ve / -ve Filters Selection Heuristic