Experimenting with Complex Event Processing for Large Scale Internet Services Monitoring
Stephan Grell, Olivier Nano
Microsoft, Ritter Strasse 23, Aachen, 52072, Germany
Tel: +49 241 99784 533, Fax: +49 241 99784 77
{stgrell, onano}@microsoft.com
Overview
• Agenda:
  • SLA monitoring System
  • Scenarios
  • Language expressiveness
  • Offline analysis
  • Reliability
European Microsoft Innovation Center (EMIC) Overview
• Founded in May 2003 (under Craig Mundie)
• ~40 employees + students
• Goals:
  • Applied collaborative research with European partners (BT, Philips, etc.)
  • Participating in FP6, FP7 and other collaborative projects
  • Generating strong prototypes to drive interest at MS
  • Some internal projects for MS (port of CF for Symbian)
SLA monitoring System
• Developed as part of the FP6 SeCSE project
Scenarios
S1: Synthetic transactions
• Test applications ping service functionality regularly
• SLA evaluates success, response time and failure states
• The system takes appropriate actions depending on the state
• Requirements:
  • Single node CEP system
  • Pattern detection
  • State modeling
S2: User generated events
• Monitor local service instances
• Aggregate on higher level
  • Per service role
  • Over service roles / per service
• Requirements:
  • Distributed CEP system
  • Capacity management
  • High availability
Both scenarios: support for on-the-fly query adaptation and root cause analysis
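
The S1 flow (regular probes, an SLA that evaluates success, response time and failure states) can be illustrated with a small sliding-window evaluator. This is only a sketch under stated assumptions: `ProbeResult`, `SlaEvaluator`, the state names and all thresholds are hypothetical and do not come from the system presented here.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    """One result reported by a test application (hypothetical shape)."""
    timestamp: float    # seconds since start of monitoring
    success: bool       # did the probed service call succeed?
    response_ms: float  # measured response time in milliseconds


class SlaEvaluator:
    """Classify service state over the last `window` probe results.

    All thresholds are illustrative assumptions, not values from the slides.
    """

    def __init__(self, window=5, max_response_ms=500.0, max_failures=2):
        self.window = deque(maxlen=window)  # oldest result drops out automatically
        self.max_response_ms = max_response_ms
        self.max_failures = max_failures

    def observe(self, probe):
        """Record a probe result and return the resulting state name."""
        self.window.append(probe)
        failures = sum(1 for p in self.window if not p.success)
        if failures > self.max_failures:
            return "FAILED"
        slow = any(
            p.success and p.response_ms > self.max_response_ms
            for p in self.window
        )
        if failures > 0 or slow:
            return "DEGRADED"
        return "HEALTHY"
```

A caller would feed each probe result to `observe` and trigger an action whenever the returned state changes.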
Language expressiveness
• Detecting patterns?
  • Over available data
  • Over available data with temporal constraints
• Building state machines?
  • Needed: a simple way to formulate a state machine
Question: How to enable a non-expert to use the tools?
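
To make the two expressiveness questions concrete, here is a minimal sketch of (a) a pattern with a temporal constraint, "A followed by B within a maximum gap", and (b) a state machine written as a plain transition table. The function names, event names and states are assumptions chosen for illustration; they are not the query language discussed in this talk.

```python
def followed_by_within(events, first, second, max_gap):
    """Find `first` events followed by a `second` event within `max_gap` seconds.

    `events` is a time-ordered iterable of (timestamp, name) pairs.
    Returns a list of (first_timestamp, second_timestamp) matches.
    """
    pending = []  # timestamps of `first` events still waiting for a match
    matches = []
    for ts, name in events:
        # Discard candidates whose time window has expired.
        pending = [t for t in pending if ts - t <= max_gap]
        if name == first:
            pending.append(ts)
        elif name == second and pending:
            matches.append((pending.pop(0), ts))
    return matches


# A state machine as data: (current state, event name) -> next state.
# States and events are hypothetical examples.
TRANSITIONS = {
    ("OK", "failure"): "WARNING",
    ("WARNING", "failure"): "CRITICAL",
    ("WARNING", "success"): "OK",
    ("CRITICAL", "success"): "OK",
}


def run_machine(event_names, start="OK"):
    """Apply events in order; unknown (state, event) pairs leave the state unchanged."""
    state = start
    for name in event_names:
        state = TRANSITIONS.get((state, name), state)
    return state
```

Keeping the transitions in a table rather than in code is one possible answer to the non-expert question: the table can be edited without touching the evaluation logic.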
Offline analysis / debugging
• Required for debugging processing plans
• CEP simulation environment
  • Automated event generation based on the query
  • Step-by-step execution of the query
  • Conditional breakpoint setting
• Smart logging at runtime
  • Only the required traces of the query in question are stored
  • Only the data that produced a “bad” result is stored
• Support for building the right query from the available data
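
Step-by-step execution, conditional breakpoints and selective trace storage can be sketched as a generator that pushes each event through a chain of operators and records what every operator produced. This is a hypothetical illustration: `step_through`, the operator names and the convention that an operator returns `None` to filter an event out are all assumptions, not features of the actual tooling.

```python
def step_through(events, operators, break_when=None):
    """Run each event through `operators` one step at a time.

    `operators` is a list of (name, function) pairs applied in order.
    An operator returning None filters the event out (assumed convention).
    Yields (event, trace, hit), where `trace` lists (operator name, output)
    and `hit` tells whether the conditional breakpoint fired for this event.
    """
    for event in events:
        value = event
        trace = []
        for name, fn in operators:
            value = fn(value)
            trace.append((name, value))
            if value is None:  # event was filtered out, stop this pipeline run
                break
        hit = break_when is not None and break_when(event, value)
        yield event, trace, hit


# Hypothetical query: subtract a baseline, then keep only slow responses.
operators = [
    ("subtract_baseline", lambda ms: ms - 100),
    ("over_threshold", lambda ms: ms if ms > 500 else None),
]

# "Smart logging": keep traces only for events that produced an alert.
bad_traces = [
    trace
    for _event, trace, hit in step_through(
        [200, 900], operators, break_when=lambda _e, out: out is not None
    )
    if hit
]
```

Because `step_through` is a generator, a debugger front end could pause between events simply by not requesting the next item.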
Reliable Infrastructure
• Survive failures: High availability
  • Replication
  • Distributed storage
• Correct output: how to compare outputs?
• Deal with overload scenarios
  • Intelligent load shedding vs. delayed execution
Question: What is the required “quality of service”?
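
The trade-off between load shedding and delayed execution can be shown with a deliberately naive input buffer. Under the "shed" policy, events that arrive while the buffer is full are dropped, so latency stays bounded but results may be incomplete. Under the "delay" policy nothing is dropped, so results stay complete but the backlog, and with it the latency, grows. `OverloadBuffer` and its policies are assumptions for illustration; an "intelligent" shedder would choose which events to drop rather than always rejecting the newest.

```python
from collections import deque


class OverloadBuffer:
    """Input buffer with two hypothetical overload policies: 'shed' or 'delay'."""

    def __init__(self, capacity, policy="shed"):
        if policy not in ("shed", "delay"):
            raise ValueError("policy must be 'shed' or 'delay'")
        self.capacity = capacity
        self.policy = policy
        self.queue = deque()
        self.dropped = 0  # number of events lost to shedding

    def offer(self, event):
        """Try to enqueue an event; return False if it was shed."""
        if self.policy == "shed" and len(self.queue) >= self.capacity:
            self.dropped += 1
            return False
        self.queue.append(event)  # 'delay': accept and let the backlog grow
        return True

    def backlog(self):
        """Number of events waiting to be processed."""
        return len(self.queue)
```

Comparing `dropped` against `backlog()` for the same input burst gives a first, rough handle on the quality-of-service question: how much loss or delay is acceptable for a given query.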
Next Steps
• Engaging in new scenarios
• Development focuses on
  • High Availability
  • Debugging / Root Cause analysis
• Explore heterogeneous CEP system that spans
  • Servers
  • Embedded devices
  • Sensors
  • The cloud?