Nihal Dindar, Peter M. Fischer, Merve Soner, Nesime Tatbul
ETH Zurich, Switzerland
Efficiently Correlating Complex Events over Live and Archived Data Streams

What is a Pattern Correlation Query (PCQ)?
• Upon detecting a fall in the current price of stock X on the live stream,
• look for a tick-shaped pattern for X within the recent archive.
Figure: timeline (axis: Time) showing a price fall pattern (live match), its recency region, and tick-shaped patterns (archive matches).

PCQ = Live ⋈ Archive
• Fall pattern on live stream:
    PATTERN(A+)
    DEFINE A AS A.Price < PREV(A.Price)
• Tick-shaped pattern on archive stream:
    PATTERN(A+B+)
    DEFINE A AS A.Price < PREV(A.Price)
           B AS B.Price > PREV(B.Price) AND LAST(B.Price) > FIRST(A.Price)
• Correlation criteria:
    WHERE symbol_l = symbol_a
    RECENCY = 10 minutes

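The two pattern definitions above can be illustrated with a minimal Python sketch over a plain list of prices. The function names `fall_matches` and `tick_matches` are hypothetical, and the sketch assumes greedy, maximal, non-overlapping matching and checks `LAST(B.Price) > FIRST(A.Price)` once on the completed match; the slides do not specify these details.

```python
from typing import List, Tuple


def fall_matches(prices: List[float]) -> List[Tuple[int, int]]:
    """PATTERN(A+), A.Price < PREV(A.Price): maximal runs of falling prices.

    Returns (first, last) index pairs of the tuples bound to A.
    """
    matches = []
    n = len(prices)
    i = 1
    while i < n:
        if prices[i] < prices[i - 1]:
            first = i
            while i + 1 < n and prices[i + 1] < prices[i]:
                i += 1
            matches.append((first, i))
        i += 1
    return matches


def tick_matches(prices: List[float]) -> List[Tuple[int, int]]:
    """PATTERN(A+B+): a falling run followed by a rising run whose last
    price exceeds the first price bound to A.

    ASSUMPTION: greedy matching; the LAST/FIRST condition is evaluated
    on the completed match.
    """
    matches = []
    n = len(prices)
    i = 1
    while i < n:
        if prices[i] < prices[i - 1]:
            a_first = i
            while i + 1 < n and prices[i + 1] < prices[i]:
                i += 1  # extend the falling run (A+)
            j = i
            while j + 1 < n and prices[j + 1] > prices[j]:
                j += 1  # extend the rising run (B+)
            if j > i and prices[j] > prices[a_first]:
                matches.append((a_first, j))
            i = j  # continue after this attempt (non-overlapping)
        i += 1
    return matches
```

For the price sequence 10, 9, 8, 9, 10, 11, 10, this sketch reports falling runs at indices 1 to 2 and 6 to 6, and one tick-shaped match spanning indices 1 to 5.
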
Challenges
• A clean, useful, optimizable semantics for PCQs
    • Needed definitions, e.g., the archive of an event, recency
• Efficient access to and processing of fast-growing archive data
• Optimized processing of high-cost complex pattern matching queries to achieve scalability with potentially high live stream rates

Related Work
• Pattern matching systems for live streams
    • Academic: Cayuga, SASE+, ZStream
    • Commercial: Coral8, ESPER, Oracle CEP, StreamBase
• Systems which combine live and historical data
    • Moirae, NiagaraST/Latte, TelegraphCQ
• Summary: either live pattern matching or combined processing of live and historical data, but not both

Outline
• Introduction
• Modeling PCQs
• Processing PCQs
• Optimizing PCQs
• Experimental Results
• Conclusions and Future Work

Modeling PCQs
• An event has a start time and an end time.
• Event a happens before (->) event b if a starts before b starts and a ends before b ends.
• A stream is totally ordered by the start time and then the end time of its events.
• An event b has a recency correlation with an event a if a -> b and a's start time is inside b's recency region.
• Recency region size = P
Figure: timeline (axis: Time) showing a price fall pattern (live match), its recency region of size P, and tick-shaped patterns (archive matches).

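The definitions above translate into a few small predicates. The following Python sketch is illustrative only: the `Event` class and function names are hypothetical, and it assumes that b's recency region is the window of length P ending at b's start time, which the slides suggest in the figure but do not state.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    start: float
    end: float


def happens_before(a: Event, b: Event) -> bool:
    """a -> b: a starts before b starts and a ends before b ends."""
    return a.start < b.start and a.end < b.end


def stream_order(events: List[Event]) -> List[Event]:
    """Total order of a stream: by start time, then by end time."""
    return sorted(events, key=lambda e: (e.start, e.end))


def recency_region(b: Event, p: float) -> Tuple[float, float]:
    """ASSUMPTION: the region of size p ends at b's start time."""
    return (b.start - p, b.start)


def recency_correlated(a: Event, b: Event, p: float) -> bool:
    """b has a recency correlation with a: a -> b and a starts in b's region."""
    lo, hi = recency_region(b, p)
    return happens_before(a, b) and lo <= a.start <= hi
```

With an archive event `Event(1, 3)` and a live event `Event(5, 8)`, the pair is correlated for P = 10 but not for P = 2, because the archive event then starts before the region begins.
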
Baseline PCQ Processing Strategy: The Lazy Approach
• Step 1: Look for live matches
• Step 2: Calculate the recency region
• Step 3: Look for archive matches
• Step 4: Apply the join condition and join the live and archive matches
Figure: timeline (axis: Time) showing a price fall pattern (live match), its recency region, and tick-shaped patterns (archive matches).

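The four steps can be sketched as a simple nested loop. This is a minimal illustration, not the system's implementation: `lazy_pcq`, the `Match` tuple layout, and the `find_archive_matches` callable are hypothetical, and the recency region is assumed to end at the live match's start time.

```python
from typing import Callable, Iterable, List, Tuple

# (symbol, start_time, end_time); a hypothetical layout for a pattern match
Match = Tuple[str, int, int]


def lazy_pcq(
    live_matches: Iterable[Match],
    find_archive_matches: Callable[[int, int], List[Match]],
    p: int,
    join_cond: Callable[[Match, Match], bool],
) -> List[Tuple[Match, Match]]:
    results = []
    for live in live_matches:                      # Step 1: live matches
        lo, hi = live[1] - p, live[1]              # Step 2: recency region (assumed anchoring)
        for arch in find_archive_matches(lo, hi):  # Step 3: archive matches in the region
            if join_cond(live, arch):              # Step 4: join condition
                results.append((live, arch))
    return results
```

Because archive pattern matching is triggered anew for every live match, overlapping recency regions are scanned repeatedly in this approach.
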
Optimizing PCQs - Recent Input Buffer
• is an in-memory data structure that mediates between the live and archived event stores
• caches the most recent stream tuples for efficient access
• provides bulk inserts into the stream archive

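A rough sketch of such a buffer is shown below. The class name, the `capacity` and `batch` parameters, and the flush policy (move the oldest tuples to the archive once the buffer overflows) are assumptions made for illustration; a plain Python list stands in for the stream archive.

```python
from collections import deque
from typing import Deque, List, Tuple

StreamTuple = Tuple[int, float]  # (timestamp, price); hypothetical layout


class RecentInputBuffer:
    def __init__(self, archive: List[StreamTuple], capacity: int, batch: int):
        if not 0 < batch <= capacity:
            raise ValueError("batch must be in 1..capacity")
        self.archive = archive            # stand-in for the archive store
        self.capacity = capacity
        self.batch = batch
        self.recent: Deque[StreamTuple] = deque()

    def append(self, tup: StreamTuple) -> None:
        """Accept a live tuple; spill the oldest ones in bulk when full."""
        self.recent.append(tup)
        if len(self.recent) > self.capacity:
            oldest = [self.recent.popleft() for _ in range(self.batch)]
            self.archive.extend(oldest)   # one bulk insert instead of many single ones

    def read(self, lo: int, hi: int) -> List[StreamTuple]:
        """Return tuples with lo <= timestamp <= hi from archive and buffer."""
        older = [t for t in self.archive if lo <= t[0] <= hi]
        newer = [t for t in self.recent if lo <= t[0] <= hi]
        return older + newer
```

Reads that touch only recent timestamps are served from memory, while the archive receives inserts in batches rather than tuple by tuple.
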
Optimizing PCQs - Query Result Caching
• caches archive matches in order to avoid recomputing them for overlapping regions
• Archive matches are retrieved from the Query Result Cache
Figure: live stream with matches 1 to 3, archive stream with matches 1 to 5, recency region P, and the Query Result Cache holding the archive matches.

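The idea can be sketched as a cache that remembers which time range it already covers and asks the pattern matcher only for the uncovered part of a new recency region. Everything here is a simplifying assumption: the class name, integer timestamps, matches represented as `(start, end)` tuples selected by their start time, and a hypothetical `compute_matches(lo, hi)` callable standing in for archive pattern matching.

```python
from typing import Callable, List, Optional, Tuple

ArchiveMatch = Tuple[int, int]  # (start_time, end_time)


class QueryResultCache:
    def __init__(self, compute_matches: Callable[[int, int], List[ArchiveMatch]]):
        self.compute = compute_matches
        self.matches: List[ArchiveMatch] = []
        self.covered: Optional[Tuple[int, int]] = None  # time range already matched

    def get(self, lo: int, hi: int) -> List[ArchiveMatch]:
        """Archive matches starting in [lo, hi], reusing cached results."""
        if self.covered is None or lo < self.covered[0] or lo > self.covered[1]:
            # Nothing reusable: match the whole region.
            self.matches = list(self.compute(lo, hi))
            self.covered = (lo, hi)
            return list(self.matches)
        covered_hi = self.covered[1]
        # Evict matches that slid out of the region.
        self.matches = [m for m in self.matches if m[0] >= lo]
        if hi > covered_hi:
            # Only the part beyond the cached range is computed.
            self.matches.extend(self.compute(covered_hi + 1, hi))
            covered_hi = hi
        self.covered = (lo, covered_hi)
        return [m for m in self.matches if m[0] <= hi]
```

When consecutive live matches have heavily overlapping recency regions, most of each region is answered from the cache and only a small suffix is matched again.
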
Optimizing PCQs - Join Source Ordering
• Selectivity criterion: process the more selective pattern first
• Processing cost criterion: avoid processing hot spots
Figure: live stream and archive stream with the recency regions for the archive-first and live-first orderings.

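As a rough illustration of the selectivity criterion only, the sketch below picks the source whose pattern matches less often per input tuple. The function name and the use of observed match counts as selectivity estimates are assumptions; the processing cost criterion is not modeled.

```python
def choose_first_source(
    live_matches: int,
    live_tuples: int,
    archive_matches: int,
    archive_tuples: int,
) -> str:
    """Return "live" or "archive": the side with the more selective pattern.

    ASSUMPTION: selectivity is estimated as matches per input tuple,
    so a lower ratio means a more selective pattern.
    """
    if live_tuples <= 0 or archive_tuples <= 0:
        raise ValueError("tuple counts must be positive")
    live_sel = live_matches / live_tuples
    archive_sel = archive_matches / archive_tuples
    return "live" if live_sel <= archive_sel else "archive"
```

Starting with the more selective side means fewer recency regions have to be examined on the other side.
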
Experimental Results
• Data: NYSE stock-market data from January 26 to 31, 2006
• Query: live pattern: fall; archive pattern: tick-shaped
• Stock: Exxon Mobil (XOM); P covers several hours
Figure: results chart (legible label: baseline).

Summary of Experimental Results
• PCQs are expensive
• Optimization pays off
• Our optimizations provide a big improvement
Figure: results chart (legible label: baseline).

Conclusions
• We have investigated the problem of efficiently correlating complex events over live and archived data streams, providing:
    • an optimizable semantics for Pattern Correlation Queries
    • recent input buffering to deal with the different access speeds of live and archive data
    • a query result cache and join source ordering to reduce the quadratic complexity of PCQ processing, for scaling with high stream rates

Future Work
• Optimizations for response time
• Indexes on the result cache
• Introduction of other correlation criteria, such as context similarity and temporal periodicity