BMQ-Index: Shared and Incremental Processing of Border Monitoring Queries over Data Streams
Jinwon Lee, Y. Lee, S. Kang, S. Lee, H. Jin, B. Kim and J. Song (Korea Advanced Institute of Science and Technology)
Outline • Border Monitoring Query (BMQ) • BMQ-Index • Experiments • Related work • Conclusion
Data stream monitoring
• Emerging computing environment: data sources such as GPSs and sensors feed data streams to monitoring services
  • Logistics: Management, Theft-proofing, Catalog, Advertisement
  • Remote Medical Service
  • Disaster Prevention: Flood Warning, Earthquake Prediction, Building Monitoring, Traffic light control
  • Location-based Service: Tracking (Friends, Employee), Vehicle Monitoring, Intelligent Transportation
  • Automatic Home: Automatic Ventilation, Automatic Temperature Control, Automatic Humidity Control
• Continuous range queries
  • Q1: 10 < value
  • Q2: 11 < value < 13
  • …
• Data stream: 11 10 12 13 12 14
Motivating Service Scenario #1
• Stock trading
• Monitor stock data streams crossing the borders!!
  • Expensive!! ( > $640): sell
  • Cheap!! ( < $600): buy
• [Figure: SAMSUNG stock price over time, during 23 days from Nov. 16th to Dec. 23rd, 2005, with "sell" and "buy" marks where the price crosses the borders]
Motivating Service Scenario #2
• Location-based advertisement
• Monitor location data streams crossing the borders!!
• Send a special lunch menu to people within 1km during lunch time!!
• [Figure: people coming into and going out of the area around a store; labels: Pet-Care, Coupon]
Border Monitoring Query
• To monitor data streams crossing the borders
  • Essential concern in many practical applications
  • Users' main interest
  • Useful to automatically trigger or stop relevant actions
• BMQ (Border Monitoring Query)
  • A new type of continuous range query!!
  • It reports only data crossing the borders of a query range (= coming into or going out from the query range)
• RMQ (Region Monitoring Query)
  • Conventional continuous range query
  • It reports all matching data within a query range
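The difference between the two semantics can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function names are hypothetical, ranges are read as open intervals, and the first tuple of a stream is assumed to only initialize the state. The sample stream and the queries Q1 and Q2 are the ones shown on the data stream monitoring slide.

```python
import math

# Queries from the data stream monitoring slide, as open intervals (lo, hi).
QUERIES = {"Q1": (10, math.inf), "Q2": (11, 13)}
STREAM = [11, 10, 12, 13, 12, 14]

def inside(value, rng):
    lo, hi = rng
    return lo < value < hi

def rmq_reports(stream, rng):
    """RMQ: report every tuple that falls within the range."""
    return [v for v in stream if inside(v, rng)]

def bmq_reports(stream, rng):
    """BMQ: report only tuples that enter ('+') or leave ('-') the range."""
    events = []
    # Compare each tuple with its predecessor (assumption: the first
    # tuple has no predecessor and therefore produces no event).
    for prev, cur in zip(stream, stream[1:]):
        was_in, is_in = inside(prev, rng), inside(cur, rng)
        if was_in != is_in:
            events.append(("+" if is_in else "-", cur))
    return events
```

For Q1 the RMQ reports five of the six tuples, while the BMQ reports only the two tuples at which the value leaves and re-enters the range.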
Problem: Scalability!!
• A large number of BMQs can be issued
  • Millions of stock investors will register their own queries
  • Millions of stores will register their own queries
  • + A huge volume of data streams is rapidly incoming
  • + Fast response is also essential for users
• How can we process BMQs over data streams efficiently?
  • (1) Naïve approach
    • Individual BMQ processing at each data update: lack of scalability!!
  • (2) Based on existing mechanisms for RMQ evaluation
    • Shared RMQ processing by indexing queries: costly post-processing!!
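One plausible reading of the "costly post-processing" in approach (2) is sketched below; the slides do not spell it out, so this is an assumption. An RMQ index returns the full set of matching queries for each tuple, and the border-crossed queries then have to be derived by comparing that set with the one obtained for the previous tuple. The `matching_queries` function is a hypothetical stand-in for a real RMQ index lookup, and the queries are Q1 and Q2 from the data stream monitoring slide.

```python
import math

QUERIES = {"Q1": (10, math.inf), "Q2": (11, 13)}

def matching_queries(value, queries):
    """Stand-in for an RMQ index lookup: all queries whose range holds value."""
    return {name for name, (lo, hi) in queries.items() if lo < value < hi}

def crossings_by_postprocessing(prev_value, cur_value, queries):
    """Derive border crossings by diffing two complete RMQ result sets."""
    before = matching_queries(prev_value, queries)
    after = matching_queries(cur_value, queries)
    entered = after - before   # matched now, not matched before
    left = before - after      # matched before, not matched now
    return entered, left
```

When the value moves from 12 to 14, both result sets contain Q1 even though nothing happened to it; the comparison work grows with the size of the result sets rather than with the number of borders actually crossed.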
Solution Approach: BMQ-Index
• Shared processing
  • By query indexing approach
  • BMQ-Index is built on registered BMQs
  • Upon a data arrival, only border-crossed queries are quickly searched for
  • Achieves a high level of scalability!!
• [Figure: a data tuple (14) is fed to the BMQ-Index built on the registered BMQs (Q1: 10 < value, Q2: 11 < value < 13, …), which returns the border-crossed queries (Q1, Q2)]
Solution Approach: BMQ-Index
• Incremental processing
  • By incremental access method
  • Use previous search step for the next search: successive searches are significantly accelerated!!
  • Keep only the information needed for incremental search: low storage cost!!
  • Exploits the locality of data streams!!
• [Figure: a series of data tuples (10 12 13 12 14) is fed to the BMQ-Index built on the registered BMQs (Q1: 10 < value, Q2: 11 < value < 13, …), which returns the border-crossed queries (Q1, Q2)]
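One-dimensional BMQ-Index (Example)
• User request: "Notify me whenever the IBM stock price is coming into or going out from my reasonable price range!!" (illustrated with a price range from $10 to $30)
• Registered BMQs (reasonable price ranges, unit: $)
  • Q1: 0 to 10
  • Q2: 0 to 25
  • Q3: 5 to 20
  • Q4: 15 to 30
  • Q5: 35 to 45
• [Figure: a Stream Table pointing into a linked list of query borders in sorted order (0, 5, 10, 15, 20, 25, 30, 35, 45, ∞); each border node is labeled with the queries that start (+Q) or end (-Q) there]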
Search Operation in One-dimension (Example)
• Case 1) previous data value (vt-1) = 21, current data value (vt) = 23
  • No border-crossed query
  • No node traversal
• Case 2) vt-1 = 21, vt = 37
  • -Q2, -Q4, +Q5
  • Traverse BMQ-Index to the right
• Case 3) vt-1 = 21, vt = 8
  • +Q3, -Q4, +Q1
  • Traverse BMQ-Index to the left
• [Figure: the linked list of borders (0, 5, 10, 15, 20, 25, 30, 35, 45, ∞) for Q1 (0 to 10), Q2 (0 to 25), Q3 (5 to 20), Q4 (15 to 30), Q5 (35 to 45), with the data values 8, 21, 23 and 37 marked]
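The three cases above can be reproduced with a minimal Python sketch of a one-dimensional border index. It is not the authors' code. Several things are assumptions: the class and method names are hypothetical, a sorted Python list with a per-stream cursor stands in for the linked list and the Stream Table, ranges are read as lo <= value < hi, and the query ranges are the ones reconstructed from the flattened figure.

```python
from bisect import bisect_right
from collections import defaultdict

class BMQIndex1D:
    """Sorted border list plus one cursor per stream (illustrative sketch)."""

    def __init__(self, queries):
        # queries: name -> (lo, hi), read as lo <= value < hi (assumption).
        marks = defaultdict(list)
        for name, (lo, hi) in queries.items():
            marks[lo].append(("lower", name))
            marks[hi].append(("upper", name))
        self.borders = sorted(marks)
        self.marks = [marks[b] for b in self.borders]
        self.cursor = {}  # stream id -> number of borders <= last value

    def update(self, stream_id, value):
        """Return the set of '+Q' / '-Q' events caused by the new value."""
        if stream_id not in self.cursor:
            # First tuple of a stream: locate its region once, report nothing.
            self.cursor[stream_id] = bisect_right(self.borders, value)
            return set()
        pos, events = self.cursor[stream_id], set()
        # Walk right over every border the value has reached or passed.
        while pos < len(self.borders) and self.borders[pos] <= value:
            for kind, name in self.marks[pos]:
                events.add(("+" if kind == "lower" else "-") + name)
            pos += 1
        # Walk left over every border the value has dropped below.
        while pos > 0 and self.borders[pos - 1] > value:
            for kind, name in self.marks[pos - 1]:
                events.add(("-" if kind == "lower" else "+") + name)
            pos -= 1
        self.cursor[stream_id] = pos
        return events

QUERIES = {"Q1": (0, 10), "Q2": (0, 25), "Q3": (5, 20),
           "Q4": (15, 30), "Q5": (35, 45)}

def step(prev, cur):
    """Replay one transition on a fresh index and return its events."""
    index = BMQIndex1D(QUERIES)
    index.update("s1", prev)
    return index.update("s1", cur)
```

Calling `step(21, 23)`, `step(21, 37)` and `step(21, 8)` replays Cases 1 to 3. Each update touches only the borders lying between the previous and the current value, which is where the benefit of stream locality comes from. A jump over an entire range yields both a `+` and a `-` event for that query in this sketch; the slides do not say how such jumps should be reported.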
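Multi-dimensional BMQ-Index
• Components: Query Table, Stream Table, RS-X List, RS-Y List
• RS-X List: regions RS-X1 to RS-X7 between borders bX0 to bX7, each with a +DQSet-Xi and a -DQSet-Xi
• RS-Y List: regions RS-Y1 to RS-Y7 between borders bY0 to bY7, each with a +DQSet-Yi and a -DQSet-Yi
  • RS-Y7: +DQSet = {}, -DQSet = {Q2}
  • RS-Y6: +DQSet = {}, -DQSet = {Q3}
  • RS-Y5: +DQSet = {}, -DQSet = {Q1}
  • RS-Y4: +DQSet = {Q3}, -DQSet = {}
  • RS-Y3: +DQSet = {Q2}, -DQSet = {}
  • RS-Y2: +DQSet = {Q1}, -DQSet = {}
  • RS-Y1: +DQSet = {}, -DQSet = {}
• [Figure: two-dimensional example with queries Q1, Q2, Q3 and stream values v(s1), v(s2), v1(s3), v2(s3), v3(s3)]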
Search Operation in Multi-dimension
• Overall flow
  • Per-dimension search: RS-X list.search() yields ±XQSet; RS-Y list.search() yields ±YQSet
  • Validation through cross-check: ±XQSet is cross-checked with the Y-dimension, giving ±XBMQSet (xc); ±YQSet is cross-checked with the X-dimension, giving ±YBMQSet (yc)
  • Union of per-dimension results: ±QSet = Union (xc, yc)
• Performance Analysis (d-dimension)
  • Search performance
    • (((d–1)d) × one-dimensional search time)
  • Storage cost
    • (d × one-dimensional storage cost)
Experiments
• Workload generation
  • Stock trading scenario (one-dimensional case)
  • Data stream generation (Korea stock market [9])
    • Fluctuation level (FL): 0.01% ~ 0.1%
    • 2000 stream sources, 1000 tuples in each stream
  • Query generation
    • Lower bound: randomly chosen (1 ~ 10^6)
    • Width of queries (W): 1 ~ 10 times larger than FL
    • Number of queries (N): 10,000 ~ 100,000
• Comparisons
  • An approach based on state-of-the-art RMQ indexes (CEI [CIKM'05] and IS-list [Information Systems'96])
• Performance metrics
  • Average search time per data tuple (millisecond)
  • Index storage size (Mbyte)
Search performance
• [Figure: Effects of the number of queries (W=0.1%, FL=0.01%)]
• [Figure: Effects of the widths of queries (N=100000, FL=0.01%)]
Storage cost
• [Figure: Effects of the number of queries (W=0.1%)]
• [Figure: Effects of the widths of queries (N=100000)]
• Storage relative to the number of queries
  • BMQ-Index: twice the number of queries
  • IS-list: log (# of queries) times
  • CEI: all grids covered by a query range
Related Work
• Semantics
  • CQL (Continuous Query Language developed by the STREAM project)
    • General concept to transform a Relation to a Stream
    • BMQ is a specific class of continuous range query
• Shared and incremental processing
  • Existing techniques are generally not feasible for BMQs!!
Conclusion
• Summary
  • Characterize a new type of continuous range query
    • Border Monitoring Query (BMQ)
    • Useful and practical in many emerging applications
  • One- and multi-dimensional BMQ-Index
    • Evaluates a large number of BMQs in a shared and incremental manner, thereby achieving excellent search performance and low storage cost
Thank you. Questions?
Performance Analysis
• Nq = number of queries, Nd = number of data streams, FL = fluctuation level
• 1-dimensional BMQ-Index
  • Search performance
    • (2 Nq FL)
  • Storage cost
    • (2Nq + Nd)
• d-dimensional BMQ-Index
  • Search performance
    • (((d–1)d) 2 Nq FL), only 2 times the one-dimensional cost when d=2
  • Storage cost
    • (d(2Nq + Nd) + Nq)
Cross checking
• Algorithm
  • For +XQSet
    • check whether vt is located between the Y predicates
  • For –XQSet
    • check whether vt-1 is located between the Y predicates
  • ±YQSet is checked against the X-dimension in a similar manner
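The cross-check rule above, combined with the union step of the multi-dimensional search, can be sketched for two dimensions as follows. This is a hedged illustration rather than the authors' algorithm: all function names are hypothetical, ranges are read as lo <= value < hi, and `axis_events` is a plain scan standing in for the RS-X and RS-Y list searches.

```python
def axis_events(prev, cur, ranges):
    """Per-dimension search result: queries whose range on this axis
    was entered (plus) or left (minus). Stand-in for an RS list search."""
    plus, minus = set(), set()
    for name, (lo, hi) in ranges.items():
        was_in, is_in = lo <= prev < hi, lo <= cur < hi
        if is_in and not was_in:
            plus.add(name)
        elif was_in and not is_in:
            minus.add(name)
    return plus, minus

def cross_check(plus, minus, other_prev, other_cur, other_ranges):
    """Validate one dimension's candidates against the other dimension."""
    def within(value, name):
        lo, hi = other_ranges[name]
        return lo <= value < hi
    # '+' candidates: the current value must satisfy the other predicate.
    entered = {q for q in plus if within(other_cur, q)}
    # '-' candidates: the previous value must have satisfied it.
    left = {q for q in minus if within(other_prev, q)}
    return entered, left

def search_2d(prev, cur, x_ranges, y_ranges):
    """Return (entered, left) query sets for a move from prev to cur."""
    (px, py), (cx, cy) = prev, cur
    x_plus, x_minus = axis_events(px, cx, x_ranges)
    y_plus, y_minus = axis_events(py, cy, y_ranges)
    x_in, x_out = cross_check(x_plus, x_minus, py, cy, y_ranges)
    y_in, y_out = cross_check(y_plus, y_minus, px, cx, x_ranges)
    # Union of the validated per-dimension results.
    return x_in | y_in, x_out | y_out

# Hypothetical rectangles, used only to exercise the sketch.
X_RANGES = {"Q1": (0, 10), "Q2": (5, 15)}
Y_RANGES = {"Q1": (0, 10), "Q2": (20, 30)}
```

A candidate found on one axis is kept only if the other axis agrees, so a point that leaves the X range of Q2 while never having been inside its Y range produces no event.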