Event Stream Processing with Out-of-Order Data Arrival. Presenter: Mo Liu. Presentation based on: Ming Li, Mo Liu, Luping Ding, Elke A. Rundensteiner, and Murali Mani, Worcester Polytechnic Institute, Worcester, MA, USA. DEPSA at ICDCS 2007, June 29, 2007, Toronto, ON, Canada
Outline • Introduction • Preliminary • Problem with Out-of-Order Event Arrival • Solution • Experiment • Conclusion • Related Work
Introduction: Event Stream Processing • Rising interest in the database community • Wide range of growing applications • Example of event stream processing: shoplifting in retail management
Introduction: Complex Event Processing (CEP) • Event stream processing engine • A stream engine specialized for event stream queries: a generic mechanism for detecting and extracting expected pattern sequences • Performance gain compared to stream systems that use joins to handle event sequence queries • [Figure: the SASE approach]
Introduction: Limitations • Total order assumption in event arrivals • The order in which the events are received by the query system is the same as their timestamp order • Under this assumption, “later arrival” means “larger timestamp” • What if events arrive out of order? • Out-of-order data arrival is common in distributed computing environments (e.g., due to network traffic) • Systems based on the total order assumption (e.g., SASE) miss qualified results and produce spurious results
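A minimal sketch of what breaking the total order assumption looks like: an arrival is out of order when its timestamp is smaller than the largest timestamp received before it. The function name `out_of_order_arrivals` and the received stream are hypothetical, chosen only for illustration.

```python
def out_of_order_arrivals(received):
    """Return the events whose timestamp is smaller than the largest
    timestamp received before them (they violate the total order)."""
    late = []
    clock = float("-inf")  # largest timestamp seen so far
    for etype, ts in received:
        if ts < clock:
            late.append(f"{etype}{ts}")
        clock = max(clock, ts)
    return late


# Hypothetical received order: b1 and d8 were delayed, e.g. by the network.
received = [("a", 3), ("b", 6), ("d", 10), ("b", 1), ("d", 15), ("d", 8)]
print(out_of_order_arrivals(received))  # ['b1', 'd8']
```

An event whose timestamp equals the current maximum is not counted as late, since it does not contradict “later arrival means larger or equal timestamp”.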
Preliminary: Query Language • EVENT <event pattern> [WHERE <qualification>] [WITHIN <window>] • Example: EVENT SEQ(A, B, D) WITHIN 10 seconds • Queries in SASE assume the above language structure • [Figure: query plan over the input event stream with operators SC: (A, B, D); SS: (A, B, D), i.e. SSC; PSSC: W = 10 secs; WD: D.ts – A.ts < 10 secs (ts: timestamp)]
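To pin down what a query such as EVENT SEQ(A, B, D) WITHIN 10 seconds should return, here is a brute-force reference sketch. It assumes a match takes one event per pattern position, in pattern order, with strictly increasing timestamps and a span below the window (mirroring WD: D.ts – A.ts < 10 secs). The name `seq_matches` and the sample stream are hypothetical; this is not how SASE evaluates queries, only a ground truth that ignores arrival order.

```python
def seq_matches(events, pattern, window):
    """Reference semantics of EVENT SEQ(pattern) WITHIN window.

    A match takes one event per pattern position, with strictly increasing
    timestamps and last.ts - first.ts < window. Arrival order is ignored,
    so the result is the same however the events were delivered.
    """
    by_type = {t: sorted(ts for et, ts in events if et == t) for t in pattern}
    matches = []

    def extend(chosen):
        pos = len(chosen)
        if pos == len(pattern):
            if chosen[-1] - chosen[0] < window:  # the WD condition
                matches.append(tuple(f"{t}{ts}" for t, ts in zip(pattern, chosen)))
            return
        for ts in by_type[pattern[pos]]:
            if not chosen or ts > chosen[-1]:
                extend(chosen + [ts])

    extend([])
    return matches


stream = [("b", 1), ("a", 3), ("c", 5), ("b", 6), ("a", 7),
          ("d", 10), ("b", 11), ("c", 13), ("d", 15), ("a", 16)]
print(seq_matches(stream, ["a", "b", "d"], 10))
# [('a3', 'b6', 'd10'), ('a7', 'b11', 'd15')]
```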
Preliminary: Finding Result Sequences • SSC (Sequence Scan and Construction): Sequence Scan employs an NFA to detect matches; Sequence Construction builds the result sequences • NFA with AIS (Active Instance Stack): the AIS associates a stack with each state of the NFA, storing the events that triggered the NFA transition into that state • RIP (Most Recent Instance in Previous Stack) field: records, for each instance, the most recent instance in the previous stack, capturing the temporal order relevant to the query
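Preliminary: Finding Result Sequences (Cont.) • Example: EVENT SEQ(A, B, D) WITHIN 10 Seconds • [Figure: NFA with states 0 → 1 → 2 → 3 on A, B, D (with * self-loops); AIS with RIP in brackets: S1 = [] a3, [] a7, [] a16; S2 = [a3] b6, [a7] b11; S3 = [b6] d10, [b11] d15; input stream with timestamps 1 to 18] • SSC output: a3 b6 d10 when d10 arrives; a3 b6 d15, a3 b11 d15, a7 b11 d15 when d15 arrives • WD then keeps only sequences with D.ts – A.ts < 10 secs, discarding a3 b6 d15 and a3 b11 d15 (span of 12 seconds)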
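The AIS and RIP mechanism of the two slides above can be sketched as follows, under the total order assumption. This is a hypothetical simplification (the name `sase_ssc`, storing RIP as an index into the previous stack, and constructing only when the final state is reached are our choices): an event of the i-th pattern type is pushed only if the previous stack is non-empty, its RIP is the top of the previous stack, and construction walks every instance at or below each RIP. The input stream is an assumed reading of the figure.

```python
def sase_ssc(received, pattern, window):
    """Total-order SSC sketch: AIS stacks with RIP pointers, append
    semantics, construction when the final state is reached, then WD."""
    stacks = [[] for _ in pattern]   # stacks[i] holds (ts, rip) pairs
    candidates = []                  # SSC output before the window filter

    def construct(level, k, suffix):
        ts, rip = stacks[level][k]
        if level == 0:
            candidates.append(tuple([ts] + suffix))
            return
        for j in range(rip + 1):     # every instance at or below the RIP
            construct(level - 1, j, [ts] + suffix)

    for etype, ts in received:
        if etype not in pattern:
            continue                  # irrelevant event type
        i = pattern.index(etype)
        if i > 0 and not stacks[i - 1]:
            continue                  # state i is not active yet
        rip = len(stacks[i - 1]) - 1 if i > 0 else -1  # top of previous stack
        stacks[i].append((ts, rip))   # append semantics
        if i == len(pattern) - 1:
            construct(i, len(stacks[i]) - 1, [])

    def name(seq):
        return tuple(f"{t}{s}" for t, s in zip(pattern, seq))

    results = [name(s) for s in candidates if s[-1] - s[0] < window]  # WD
    return [name(s) for s in candidates], results


stream = [("b", 1), ("a", 3), ("c", 5), ("b", 6), ("a", 7),
          ("d", 10), ("b", 11), ("c", 13), ("d", 15), ("a", 16)]
candidates, results = sase_ssc(stream, ["a", "b", "d"], 10)
print(candidates)
# [('a3', 'b6', 'd10'), ('a3', 'b6', 'd15'), ('a3', 'b11', 'd15'), ('a7', 'b11', 'd15')]
print(results)
# [('a3', 'b6', 'd10'), ('a7', 'b11', 'd15')]
```

Note that b1 is never stored: when it arrives, the A stack is still empty, so the B state is not active.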
Preliminary: Purging Operator States • Example: EVENT SEQ(A, B, D) WITHIN 10 Seconds • PSSC: on seeing d15, purge a3 (since 3 + 10 < 15, a3 can no longer share a 10-second window with any later arrival under total order), and so on • [Figure: NFA with states 0 → 1 → 2 → 3 on A, B, D; AIS with RIP in parentheses: S1 = () a3, () a7; S2 = (a3) b6, (a7) b11; S3 = (b6) d10, (b11) d15]
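A sketch of this total-order purge, assuming the condition e.ts + W < CLOCK (the K-slack condition used later with K = 0). The helper name `purge_total_order` is hypothetical, and the stacks are simplified to timestamp lists with RIP pointers omitted.

```python
def purge_total_order(stacks, clock, window):
    """PSSC sketch under total order: every later arrival has timestamp
    >= clock, so an instance with ts + window < clock can never fall in
    the same window as a new event and is dropped."""
    return [[ts for ts in stack if ts + window >= clock] for stack in stacks]


# AIS timestamps once d15 has arrived: S1 (A), S2 (B), S3 (D)
stacks = [[3, 7], [6, 11], [10, 15]]
print(purge_total_order(stacks, clock=15, window=10))
# [[7], [6, 11], [10, 15]]
```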
Problem with Out-of-Order at SSC: Incomplete Event Retrieval • EVENT SEQ(A, B, D) WITHIN 10 Seconds • [Figure: received order with a0, b1 and d2 arriving out of timestamp order; NFA with states 0 → 1 → 2 → 3 on A, B, D; AIS: S1 = () a3, () a7; S2 = (a3) b6, (a7) b11; S3 = (b6) d10, (b11) d15] • Produced result: a3 b6 d10, a7 b11 d15 • Correct result: a0 b1 d2, a3 b6 d10, a7 b11 d15 • a0 b1 d2 is missing!
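A minimal sketch of how retrieval becomes incomplete, assuming (as in the total-order SSC) that an event of the i-th pattern type is stored only when stack i-1 is non-empty. The arrival order below is hypothetical: a0 is delayed until after b1 and d2, so both are discarded because their NFA states are not yet active, and a0 b1 d2 can no longer be constructed.

```python
def sase_retrieve(received, pattern):
    """Event retrieval as in the total-order SSC: an event of the i-th
    pattern type is stored only if stack i-1 is non-empty (state active)."""
    stacks = {t: [] for t in pattern}
    dropped = []
    for etype, ts in received:
        if etype not in stacks:
            continue
        i = pattern.index(etype)
        if i > 0 and not stacks[pattern[i - 1]]:
            dropped.append(f"{etype}{ts}")   # state inactive: event is lost
            continue
        stacks[etype].append(ts)
    return stacks, dropped


# Hypothetical arrival order: a0 is delayed until after b1 and d2.
received = [("b", 1), ("d", 2), ("a", 3), ("b", 6), ("d", 10), ("a", 0)]
stacks, dropped = sase_retrieve(received, ["a", "b", "d"])
print(stacks)   # {'a': [3, 0], 'b': [6], 'd': [10]}
print(dropped)  # ['b1', 'd2']
```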
Problem with Out-of-Order at SSC: Event Misplacement • [Figure: received order with d8 arriving after d15; under append semantics (incorrect AIS appending), d8 is pushed on top of S3 with RIP [b11] instead of [b6]; AIS: S1 = [] a3, [] a7; S2 = [a3] b6, [a7] b11; S3 = [b6] d10, [b11] d15, [b11] d8] • Produced result: a3 b6 d8, a3 b11 d8 (wrong!) • Correct result: a3 b6 d8
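The misplacement can be reproduced with a small append-semantics sketch (hypothetical names; RIP stored as an index into the previous stack). Because d8 is appended on top of S3 with the top of S2 (b11) as its RIP, construction also pairs d8 with b11, yielding sequences whose timestamps are not increasing. Both spurious sequences also pass the window check (8 - 3 = 5 and 8 - 7 = 1 are below 10), so WD cannot remove them.

```python
def sase_append(received, pattern):
    """Total-order SSC with append semantics: each instance's RIP is the
    top of the previous stack at insertion time, whatever its timestamp."""
    stacks = [[] for _ in pattern]          # stacks[i]: list of (ts, rip)
    produced = []

    def construct(level, k, suffix):
        ts, rip = stacks[level][k]
        if level == 0:
            produced.append(tuple([ts] + suffix))
            return
        for j in range(rip + 1):            # instances at or below the RIP
            construct(level - 1, j, [ts] + suffix)

    for etype, ts in received:
        if etype not in pattern:
            continue
        i = pattern.index(etype)
        if i > 0 and not stacks[i - 1]:
            continue                         # state not active
        rip = len(stacks[i - 1]) - 1 if i > 0 else -1
        stacks[i].append((ts, rip))          # append, even if ts is old
        if i == len(pattern) - 1:
            construct(i, len(stacks[i]) - 1, [])
    return produced


received = [("a", 3), ("b", 6), ("d", 10), ("a", 7), ("b", 11), ("d", 15), ("d", 8)]
produced = sase_append(received, ["a", "b", "d"])
# Sequences whose timestamps are not increasing cannot be correct results.
wrong = [s for s in produced if any(x >= y for x, y in zip(s, s[1:]))]
print(wrong)  # [(3, 11, 8), (7, 11, 8)]
```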
Problem with Out-of-Order at PSSC • Purge in SS: on seeing d15, purge a3, and so on; after that, the out-of-order d8 arrives → missing result a3 b6 d8 (unauthorized AIS purge) • CLAIM: any purge of data from the active instance stack (AIS) is unauthorized unless total order holds on the data arrival of the input stream • Example 3: EVENT SEQ(A, B, D) WITHIN 10 Seconds • [Figure: NFA with states 0 → 1 → 2 → 3 on A, B, D; AIS: S1 = () a3, () a7; S2 = (a3) b6, (a7) b11; S3 = (b6) d10, (b11) d15; received order with d8 arriving after d15] • If a precise query result is required and memory resources are limited, WD in SS is not sufficient for handling out-of-order event arrival!
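A minimal sketch of the unauthorized purge in Example 3, assuming the total-order purge condition e.ts + W < CLOCK and RIP pointers omitted. The helper `matches_with` is hypothetical; it lists the (a, b, d) results the late d8 could still complete against a given AIS.

```python
W = 10
# AIS timestamps once d15 has arrived (RIP pointers omitted for brevity).
ais = {"a": [3, 7], "b": [6, 11], "d": [10, 15]}
clock = 15

# Purge that relies on total order: drop e when e.ts + W < CLOCK.
purged = {t: [ts for ts in s if ts + W >= clock] for t, s in ais.items()}


def matches_with(d, stacks):
    """(a, b, d) sequences that the late event d could still complete."""
    return [(a, b, d) for a in stacks["a"] for b in stacks["b"]
            if a < b < d and d - a < W]


late_d = 8                           # d8 arrives after the purge
print(matches_with(late_d, ais))     # [(3, 6, 8)]: the correct result
print(matches_with(late_d, purged))  # []: lost, because a3 was purged
```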
Solution in SSC • Event Retrieval Mechanism: to avoid incomplete retrieval, all states of the NFA need to be set active before retrieval over the event stream starts • [Figure: received order with a0, b1 and d2 arriving out of timestamp order; NFA with states 0 → 1 → 2 → 3 on A, B, D; AIS: S1 = () a0, () a3, () a7; S2 = (a0) b1, (a3) b6, (a7) b11; S3 = (b1) d2, (b6) d10, (b11) d15] • Produced result: a0 b1 d2, a3 b6 d10, a7 b11 d15, …
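A sketch of the retrieval fix, using the same hypothetical arrival order as in the incomplete-retrieval sketch: with every state active from the start, b1 and d2 are kept, so a0 b1 d2 becomes reachable once a0 arrives. The brute-force product at the end only checks that the needed events were retained; it stands in for RIP-based construction and is not the paper's method. The stacks are left in arrival order here; keeping them ordered is a separate concern.

```python
W = 10
pattern = ["a", "b", "d"]
# Hypothetical arrival order: a0 is delayed until after b1 and d2.
received = [("b", 1), ("d", 2), ("a", 3), ("b", 6), ("d", 10), ("a", 0)]

# Every NFA state is active from the start, so no relevant event is dropped.
stacks = {t: [] for t in pattern}
for etype, ts in received:
    if etype in stacks:
        stacks[etype].append(ts)

# Brute-force check (illustration only) that a0 b1 d2 is now reachable.
found = sorted((a, b, d) for a in stacks["a"] for b in stacks["b"]
               for d in stacks["d"] if a < b < d and d - a < W)
print(stacks)  # {'a': [3, 0], 'b': [1, 6], 'd': [2, 10]}
print(found)   # [(0, 1, 2), (3, 6, 10)]
```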
Solution in SSC (Cont.) • AIS Construction Mechanism: to avoid event misplacement, use sort semantics instead of append semantics • [Figure: the out-of-order b8 is inserted into S2 between b6 and b11 with RIP [a7], and d10's RIP becomes [b8]; AIS (correct AIS insertion): S1 = [] a3, [] a7; S2 = [a3] b6, [a7] b8, [a7] b11; S3 = [b8] d10, [b11] d15] • Sequences produced for b8: a3 b8 d10, a7 b8 d10, a3 b8 d15, a7 b8 d15
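A sketch of sort semantics on timestamp-sorted stacks. As an assumption of this sketch, RIP values are not stored but recomputed with binary search: the RIP of an instance is the most recent instance in the previous stack with a smaller timestamp. With that reading, inserting the late b8 in timestamp order gives it RIP a7 and moves d10's RIP from b6 to b8, matching the figure. `rip` is a hypothetical helper; `bisect_left` and `insort` are from Python's standard `bisect` module.

```python
from bisect import bisect_left, insort


def rip(stacks, level, ts):
    """Timestamp of the most recent instance in the previous stack that is
    older than ts (what the RIP field should point to), or None."""
    if level == 0:
        return None
    prev = stacks[level - 1]
    k = bisect_left(prev, ts)
    return prev[k - 1] if k > 0 else None


stacks = [[3, 7], [6, 11], [10, 15]]   # S1 (A), S2 (B), S3 (D), sorted
print(rip(stacks, 2, 10))              # 6: d10 points to b6

insort(stacks[1], 8)                   # sort semantics for the late b8
print(stacks[1])                       # [6, 8, 11], not [6, 11, 8]
print(rip(stacks, 1, 8), rip(stacks, 2, 10), rip(stacks, 2, 15))  # 7 8 11
```

Recomputing RIPs on demand trades the explicit reset step for a logarithmic lookup; an implementation that stores RIP fields must instead update the affected instances in the next stack.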
SSC Algorithm with Out-of-Order Handling • Out-of-Order Handling Incorporated SSC: • Input: (1) sequence query “EVENT SEQ(E1, E2, …, Em) WITHIN W”; (2) the AIS constructed from previously received events; (3) a newly received event ei (of event type Ei) • Output: (1) the updated AIS; (2) the sequence output of SSC • 1. IF event type Ei is among {E1, E2, …, Em} • 2. insert ei into stack Si (using “sort semantics”) • 3. set ei’s RIP • 4. check the RIP values of the instances in stack Si+1 and reset the ones affected by ei • 5. produce event sequences containing ei, if any
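The five steps above can be sketched end to end as follows. Assumptions of this sketch: stacks hold sorted timestamps, RIPs are recomputed by binary search (which covers steps 3 and 4 implicitly), and step 5 enumerates every sequence containing ei by combining earlier instances in lower stacks with later instances in higher stacks. Unlike the slide's SSC output, the WD window condition is applied inline here. The name `ooo_ssc` is hypothetical.

```python
from bisect import bisect_left, bisect_right, insort


def ooo_ssc(received, pattern, window):
    """SSC with out-of-order handling (sketch): all states active, sort
    semantics, and on each arrival the new sequences containing e_i.
    RIP pointers are recomputed by binary search instead of being stored."""
    index = {t: i for i, t in enumerate(pattern)}
    stacks = [[] for _ in pattern]           # sorted timestamps per state
    output = []

    def backward(level, upper):              # prefixes below timestamp upper
        if level < 0:
            yield []
            return
        stack = stacks[level]
        for k in range(bisect_left(stack, upper)):     # up to the RIP
            for prefix in backward(level - 1, stack[k]):
                yield prefix + [stack[k]]

    def forward(level, lower):               # suffixes above timestamp lower
        if level == len(pattern):
            yield []
            return
        stack = stacks[level]
        for k in range(bisect_right(stack, lower), len(stack)):
            for suffix in forward(level + 1, stack[k]):
                yield [stack[k]] + suffix

    for etype, ts in received:
        i = index.get(etype)
        if i is None:
            continue                          # step 1: irrelevant type
        insort(stacks[i], ts)                 # step 2 (steps 3-4: RIPs on demand)
        suffixes = list(forward(i + 1, ts))
        for prefix in backward(i - 1, ts):    # step 5: sequences with e_i
            for suffix in suffixes:
                seq = prefix + [ts] + suffix
                if seq[-1] - seq[0] < window:  # WD applied inline
                    output.append(tuple(f"{t}{s}" for t, s in zip(pattern, seq)))
    return output


late_b8 = [("a", 3), ("b", 6), ("d", 10), ("a", 7), ("b", 11), ("d", 15), ("b", 8)]
print(ooo_ssc(late_b8, ["a", "b", "d"], 10))
# [('a3', 'b6', 'd10'), ('a7', 'b11', 'd15'), ('a3', 'b8', 'd10'),
#  ('a7', 'b8', 'd10'), ('a7', 'b8', 'd15')]
```

Each result is emitted exactly once, at the arrival of whichever of its events arrives last, since only then are all its members in the AIS.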
Optimization: Out-of-Order Handling Incorporated SSC with AIS_CLOCK • Input and output: same as Algorithm 1 • 1. IF event type Ei is among {E1, E2, …, Em} • 2. IF ei.timestamp < AIS_CLOCK • 3. buffer ei • 4. insert ei into stack Si (using “sort semantics”) • 5. set ei’s RIP • 6. check the RIP values of the instances in stack Si+1 and reset the ones affected by ei • 7. produce event sequences containing ei, if any • 8. ELSE • 9. buffer ei • 10. insert ei into stack Si (using “append semantics”) • 11. set ei’s RIP • 12. IF Ei = Em • 13. produce event sequences containing ei, if any
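A sketch of the AIS_CLOCK shortcut. We assume AIS_CLOCK is the largest timestamp inserted into the AIS so far; the “buffer ei” steps are not modeled, and the class name `ClockedSSC` is hypothetical. An in-order event (timestamp at least AIS_CLOCK) is simply appended: no instance in Si+1 is newer, so no RIP needs resetting, and since it is the newest event it can only end a sequence, which is why results are produced only when Ei = Em. Out-of-order events take the full sort-and-search path.

```python
from bisect import bisect_left, bisect_right, insort


class ClockedSSC:
    """SSC with the AIS_CLOCK shortcut (sketch). AIS_CLOCK is assumed to be
    the largest timestamp inserted into the AIS so far."""

    def __init__(self, pattern, window):
        self.pattern = pattern
        self.index = {t: i for i, t in enumerate(pattern)}
        self.window = window
        self.stacks = [[] for _ in pattern]   # sorted timestamps per state
        self.ais_clock = float("-inf")
        self.paths = []                       # branch taken, for illustration

    def _backward(self, level, upper):        # prefixes below timestamp upper
        if level < 0:
            yield []
            return
        stack = self.stacks[level]
        for k in range(bisect_left(stack, upper)):
            for prefix in self._backward(level - 1, stack[k]):
                yield prefix + [stack[k]]

    def _forward(self, level, lower):         # suffixes above timestamp lower
        if level == len(self.pattern):
            yield []
            return
        stack = self.stacks[level]
        for k in range(bisect_right(stack, lower), len(stack)):
            for suffix in self._forward(level + 1, stack[k]):
                yield [stack[k]] + suffix

    def process(self, etype, ts):
        i = self.index.get(etype)
        if i is None:
            return []
        if ts < self.ais_clock:               # out of order: sort semantics
            self.paths.append("sort")
            insort(self.stacks[i], ts)
            suffixes = list(self._forward(i + 1, ts))
        else:                                 # in order: cheap append
            self.paths.append("append")
            self.stacks[i].append(ts)
            self.ais_clock = ts
            if i != len(self.pattern) - 1:
                return []                     # only E_m can complete a sequence
            suffixes = [[]]
        out = []
        for prefix in self._backward(i - 1, ts):
            for suffix in suffixes:
                seq = prefix + [ts] + suffix
                if seq[-1] - seq[0] < self.window:   # WD applied inline
                    out.append(tuple(f"{t}{s}" for t, s in zip(self.pattern, seq)))
        return out


ssc = ClockedSSC(["a", "b", "d"], 10)
received = [("a", 3), ("b", 6), ("a", 7), ("d", 10), ("b", 11), ("d", 15), ("b", 8)]
results = []
for etype, ts in received:
    results.extend(ssc.process(etype, ts))
print(ssc.paths)  # six 'append' entries, then 'sort' for the late b8
print(results)
```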
Solution for PSSC • Using K-Slack: we apply K-slack based on time units. It assumes that the out-of-orderness of event arrivals is within a range of K time units; that is, an event can be delayed for at most K time units • Example: SEQ(A, B, D), W = 10, K = 4: a3 is purged when f18 arrives, since 18 > 3 + 10 + 4 • [Figure: received order with d8 arriving late; AIS: S1 = [] a3, [] a7; S2 = [a3] b6, [a7] b8, [a7] b11; S3 = [b8] d10, [b11] d15; the late d8 still yields a3 b6 d8, because a3 has not been purged yet]
Purge condition: ei.timestamp + W + K < CLOCK (after waiting K time units, no out-of-order event with a timestamp less than ei.timestamp + W can arrive; thus ei can no longer contribute to forming a new candidate event sequence) • CLOCK: maintained as the largest timestamp seen so far among the received events
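The purge condition can be checked in a few lines. We read the K-slack assumption as: once CLOCK has been observed, every event received later has a timestamp of at least CLOCK - K. Let e be an instance in the AIS and f any event not yet received. If e and f belong to the same result, their timestamps differ by less than the span of that result, which WITHIN W bounds by W:

$$
\begin{aligned}
&\text{K-slack:} && f.ts \ge \mathit{CLOCK} - K \\
&\text{same result, WITHIN } W: && f.ts < e.ts + W \\
&\text{purge condition:} && e.ts + W + K < \mathit{CLOCK} \;\Longrightarrow\; e.ts + W < \mathit{CLOCK} - K \le f.ts
\end{aligned}
$$

The last line contradicts the second, so a purged e can never appear in a future result. With the slide's numbers (W = 10, K = 4), a3 becomes purgeable at f18 because 3 + 10 + 4 = 17 < 18, but not at CLOCK = 17.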
PSSC Algorithm with Out-of-Order Handling • Out-of-Order Incorporated SSC Purge (PSSC): • Input: (1) the current AIS; (2) a CLOCK trigger from SSC • Output: the updated AIS • 1. On receiving a CLOCK trigger • 2. for each event instance e in the AIS • 3. IF e.timestamp + W + K < CLOCK • 4. purge e
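A direct sketch of the PSSC loop above. The function name `pssc_purge` and the dict-of-lists AIS (state name to timestamps, RIP pointers omitted) are assumptions for illustration; the values follow the K-slack example with W = 10 and K = 4.

```python
def pssc_purge(ais, clock, window, k):
    """PSSC with K-slack (sketch): on a CLOCK trigger, purge every instance
    e in the AIS with e.ts + W + K < CLOCK. `ais` maps each NFA state to a
    list of timestamps; RIP pointers are left out for brevity."""
    for stack in ais.values():
        stack[:] = [ts for ts in stack if ts + window + k >= clock]
    return ais


ais = {"S1": [3, 7], "S2": [6, 8, 11], "S3": [10, 15]}
pssc_purge(ais, clock=17, window=10, k=4)
print(ais["S1"])  # [3, 7]: 3 + 10 + 4 = 17 is not < 17
pssc_purge(ais, clock=18, window=10, k=4)
print(ais["S1"])  # [7]: f18 triggers the purge of a3
```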
Optimization 1: AIS Partition • We can divide each stack in the AIS into two parts: outdated event instances (e.timestamp + W ≤ CLOCK < e.timestamp + W + K), which can only still combine with out-of-order arrivals, and up-to-date event instances (e.timestamp + W > CLOCK) • Example: SEQ(A, B, D), W = 7, K = 10 (large) • SSC output without the partition: a3 b5 d18, a3 b11 d18, a7 b11 d18, …, all of which fail the window (cost!) • [Figure: AIS with a divider between outdated and up-to-date instances: S1 = [] a3, [] a7; S2 = [] b1, [a3] b5, [a7] b11; S3 = [b5] d10, [b11] d18; received order with out-of-order arrivals]
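A sketch of the divider on a timestamp-sorted stack, using `bisect_right` from Python's `bisect` module. The boundary choice (an instance with e.ts + W = CLOCK counts as outdated) is our reading of the two conditions, and `partition` is a hypothetical name. With W = 7, K = 10 and CLOCK = 18, nothing is purgeable yet (b1: 1 + 7 + 10 = 18 is not < 18), but every A and B instance is outdated, so the in-order d18 only needs to scan the empty up-to-date parts instead of building a3 b5 d18 and similar sequences that WD would discard.

```python
from bisect import bisect_right


def partition(stack, clock, window):
    """Split a timestamp-sorted AIS stack at the divider into outdated
    instances (ts + W <= CLOCK) and up-to-date ones (ts + W > CLOCK)."""
    divider = bisect_right(stack, clock - window)
    return stack[:divider], stack[divider:]


W, K, CLOCK = 7, 10, 18
for name, stack in [("S1", [3, 7]), ("S2", [1, 5, 11]), ("S3", [10, 18])]:
    outdated, up_to_date = partition(stack, CLOCK, W)
    purgeable = [ts for ts in stack if ts + W + K < CLOCK]
    print(name, outdated, up_to_date, purgeable)
# S1 [3, 7] [] []
# S2 [1, 5, 11] [] []
# S3 [10] [18] []
```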
Optimization 2: Lazy Purge • For each CLOCK update, only the instances in the last AIS stack are checked for purging. For any instance purged from there, instances in the other AIS stacks can be purged by following its RIP path • [Figure: RIP paths d10 → [b6] → [a3] and d15 → [b11] → [a7]]
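One possible reading of the lazy purge, sketched under stated assumptions: stacks are timestamp-sorted lists, RIPs are recomputed by binary search, and “following the RIP path” means that in each earlier stack every instance at or below the RIP of the newest purged instance of the stack above is purged too. This is safe because those instances are all older than the purged last-stack instance, so they also satisfy e.ts + W + K < CLOCK. The function name `lazy_purge` is hypothetical.

```python
from bisect import bisect_left


def lazy_purge(stacks, clock, window, k):
    """Lazy purge sketch: only the last stack is checked against
    e.ts + W + K < CLOCK; purges then follow RIP paths backwards.
    `stacks` is a list of timestamp-sorted lists, one per NFA state."""
    last = stacks[-1]
    cut = 0
    while cut < len(last) and last[cut] + window + k < clock:
        cut += 1
    if cut == 0:
        return stacks                      # nothing to purge in the last stack
    newest = last[cut - 1]                 # most recent purged instance
    del last[:cut]
    for level in range(len(stacks) - 2, -1, -1):
        stack = stacks[level]
        rip = bisect_left(stack, newest)   # instances at or below the RIP
        if rip == 0:
            break
        newest = stack[rip - 1]            # follow the RIP path downwards
        del stack[:rip]
    return stacks


# S1 (A), S2 (B), S3 (D) with W = 10, K = 4: d10 expires once CLOCK > 24.
print(lazy_purge([[3, 7], [6, 11], [10, 15]], clock=25, window=10, k=4))
# [[7], [11], [15]]
```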
Experiment 1: Sequence Scan and Construction (SSC) • Query: SEQ(A, B, C, D, E, F) • CPU gain from applying AIS_CLOCK • Out-of-order data percentage: 90% • Y axis: cost of inserting events and resetting RIPs
Experiment 2: Applying AIS Partition during the SSC Purge • Performance gain on memory • Performance gain on CPU cost
Conclusion • In this work, we address the problem of processing event streams with out-of-order data arrival: • we analyze the problems that state-of-the-art event stream processing technology would experience when faced with out-of-order data arrival • we propose new implementation and optimization strategies for the core stream algebra operators • we conduct an experimental study that demonstrates the effectiveness of our proposed approach over existing solutions
Related Work • Some initial work uses K-slack to investigate the out-of-order problem for homogeneous-input stream systems • Aurora deals with out-of-order data at the operator level: order-sensitive operators wait a certain period of time before closing each window • The Cayuga system deals with out-of-order data by waiting K time units before any processing, which incurs higher latency than our approach • Stream punctuation confirms that a certain value or timestamp will no longer appear in the future input stream. It requires a certain service to first be created and appropriately associated with the stream