Real Time Streaming Pattern Detection for eCommerce
Authors: William Braik, Floréal Morandat, Jean-Rémy Falleri, Xavier Blanc
Presented by Kriti Narsapur (Student ID: 1294630)
Contents
• Introduction
• Background
• Pattern Detection
• Evaluation
• Conclusion
Introduction
• Pattern detection over event streams
• Challenges of real-time pattern detection
  • Efficiency
  • Scalability
• Existing approach – measure web traffic in batch fashion
Introduction (contd.)
• Experimented approach:
  • Domain Specific Language (DSL) – expresses customers' behaviours
  • DSL semantics – a compilation process transforms patterns into Deterministic Finite Automata (DFAs)
  • Spark – Big Data streaming platform – runs the pattern detection algorithm in real time
• Cdiscount requirement:
  • Detect customers' behaviours – 1 million customers each send around 400 events per day – latency < 1 sec
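As a rough sanity check on the Cdiscount requirement, the average event rate can be worked out from the two figures on the slide. This assumes events are spread uniformly over a 24-hour day, which is an illustrative simplification (real traffic peaks would be higher):

```latex
\[
\frac{10^{6}\ \text{customers} \times 400\ \text{events/day}}{86\,400\ \text{s/day}}
\approx 4.6 \times 10^{3}\ \text{events/s}
\]
```

This average is of the same order as the 5000 events per second used in the results.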
Background: Cdiscount architecture
• Event of the stream: e = (t, d)
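The slide does not spell out what t and d stand for. A minimal sketch, assuming t is a timestamp and d is a data payload of key/value pairs (both are assumptions, and the field names `customer` and `action` are hypothetical), could model an event as:

```python
from dataclasses import dataclass, field
from typing import Any, Mapping


@dataclass(frozen=True)
class Event:
    """One stream event e = (t, d); the meaning of both fields is assumed."""
    t: float  # assumed: timestamp, in seconds since the epoch
    d: Mapping[str, Any] = field(default_factory=dict)  # assumed: event data


# Hypothetical payload keys, chosen only for illustration.
example = Event(t=1_700_000_000.0, d={"customer": "c42", "action": "View"})
print(example.d["action"])
```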
Pattern detection: A DSL to express behaviour patterns
• Patterns: sequences of events
• An event is matched according to its action type
• The DSL also supports the complement of an action type
• Two non-contiguous operators, which ignore all events that do not match the pattern:
  • FollowedBy
  • KleenePlus (+)
• Time constraints: Interval (on an operator) and Window (on a pattern)
• Data constraint
• Negative Acceptation Condition (NAC)
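To illustrate non-contiguous matching combined with a Window constraint, here is a minimal sketch in Python. It is not the DSL itself: the function name, the `AddToCart` action type, and the reading of Window as "maximum time between the first and last matched event" are assumptions made for the example. Complement, KleenePlus, Interval, data constraints and NAC are left out, and `steps` is assumed to be non-empty with `events` sorted by timestamp.

```python
from typing import Sequence, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, action type)


def matches_within_window(events: Sequence[Event],
                          steps: Sequence[str],
                          window: float) -> bool:
    """Return True if `steps` occur in order, not necessarily contiguously,
    with at most `window` seconds between the first and last matched event."""
    for i, (t0, first) in enumerate(events):
        if first != steps[0]:
            continue  # a match can only start on the first step
        remaining = list(steps[1:])
        for t, action in events[i + 1:]:
            if not remaining or t - t0 > window:
                break  # pattern complete, or window exceeded for this start
            if action == remaining[0]:
                remaining.pop(0)  # matched the next step
            # any other event is ignored (non-contiguous semantics)
        if not remaining:
            return True
    return False


session = [(0, "View"), (5, "Search"), (20, "View"),
           (25, "AddToCart"), (28, "Exit")]
pattern = ["View", "AddToCart", "Exit"]
print(matches_within_window(session, pattern, window=10))  # True
print(matches_within_window(session, pattern, window=5))   # False
```

Every candidate start is tried because a later first event may satisfy the window even when an earlier one does not, as in the session above.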
Pattern detection: From patterns to automata
• DFAs are used to detect patterns (the NFA is only used as an intermediate representation)
• Translate each pattern into its corresponding DFA
• Run the DFA for each customer
• Memory usage is proportional to (number of simultaneous customers) × (number of patterns to detect)
• Two-step transformation:
  • Generate the NFA
  • Convert the NFA into the corresponding DFA
• Run the pattern detection using Spark
P : View + Exit
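The two-step transformation and the per-customer run can be sketched as follows. This is a minimal illustration using the textbook subset construction, not the authors' compiler: the NFA is written by hand for a hypothetical pattern "View followed by Exit", every other action type is mapped to a catch-all symbol `Other`, and Spark is replaced by a plain Python loop.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

ALPHABET = ("View", "Exit", "Other")

# Hand-written NFA: self-loops make FollowedBy non-contiguous.
NFA: Dict[Tuple[int, str], Set[int]] = {
    (0, "View"): {0, 1}, (0, "Exit"): {0}, (0, "Other"): {0},
    (1, "View"): {1}, (1, "Exit"): {1, 2}, (1, "Other"): {1},
}


def nfa_to_dfa(nfa, start: int, accepting: Set[int], alphabet: Iterable[str]):
    """Subset construction for an NFA without epsilon transitions."""
    start_set: FrozenSet[int] = frozenset([start])
    dfa, seen, todo = {}, {start_set}, [start_set]
    while todo:
        current = todo.pop()
        for symbol in alphabet:
            target = frozenset(t for s in current
                               for t in nfa.get((s, symbol), ()))
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return dfa, start_set, {s for s in seen if s & accepting}


def detect(stream, dfa, start, accepting) -> List[str]:
    """Run one DFA instance per customer; return customers in match order."""
    state = {}  # customer id -> current DFA state (one automaton each)
    hits = []
    for customer, action in stream:
        symbol = action if action in ("View", "Exit") else "Other"
        nxt = dfa[(state.get(customer, start), symbol)]
        if nxt in accepting:
            hits.append(customer)
            state.pop(customer, None)  # reset this customer's automaton
        else:
            state[customer] = nxt
    return hits


DFA, START, ACCEPTING = nfa_to_dfa(NFA, 0, {2}, ALPHABET)
stream = [("alice", "View"), ("bob", "Search"), ("alice", "Search"),
          ("bob", "View"), ("alice", "Exit"), ("bob", "Exit")]
print(detect(stream, DFA, START, ACCEPTING))  # ['alice', 'bob']
```

The `state` dictionary holds one entry per active customer for this single pattern, which is where the customers × patterns memory cost comes from.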
Evaluation
• Goal – assess whether a given cluster of machines, with a fixed pool of memory and CPU resources, is capable of detecting patterns efficiently
• Total number of automata that the engine runs: A = C * P
  • C : number of simultaneous customers handled by the system
  • P : number of patterns to be observed
• Throughput of events: T = C * E
  • E : number of events
• Measure the maximum values of A and T that are supported by the given cluster
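The two workload formulas can be tried out with a small helper. This is only an illustrative sketch: it assumes E is the number of events each customer sends per second (the slide just says "number of events"), and the values P = 1 and E = 0.005 are hypothetical inputs picked to land on round numbers, not figures from the study.

```python
def workload(customers: int, patterns: int,
             events_per_customer_per_second: float):
    """Return (A, T): automata count A = C * P and throughput T = C * E."""
    automata = customers * patterns
    throughput = customers * events_per_customer_per_second
    return automata, throughput


# Hypothetical inputs: 1 million customers, 1 pattern, 0.005 events/s each.
A, T = workload(1_000_000, 1, 0.005)
print(f"A = {A:,} automata, T = {T:,.0f} events/s")
# A = 1,000,000 automata, T = 5,000 events/s
```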
Evaluation (contd.): protocol
• Maximum of A – simulate the creation of new automata until the performance criterion is no longer met
• Maximum of T – generate a stream with a given throughput and check how the system performs under stress
• Phase I – creates as many automata as needed to reach A
  • Measure memory footprint
• Phase II – keeps the system working, without increasing the number of automata in memory
  • Measure maximum detection latency
Evaluation (contd.): results
• Ran a total of 30 configurations:
  • T ∈ {1000, 2500, 5000, 10000, 15000, 20000, 30000, 35000, 40000}
  • A ∈ {0.5M, 1M, 2M}
• Phase I – detection latency increases
• Phase II – curve stabilizes
T = 5000 events per second
Conclusion
• This study provided:
  • DSL – for expressing behaviour patterns
  • Compiler – translates them into DFAs
  • Detection engine
• Experimental results showed that, for 5000 events per second, it can handle:
  • 1 million customers with sub-second detection latency
  • 2 million customers with latency lower than 2 seconds