Model-based Validation of Streaming Data
Cheng Xu, Tore Risch, Dept. Information Technology, Uppsala University, Sweden
Daniel Wedlund, Martin Helgoson, AB Sandvik Coromant, Sweden
Talk Overview
• Motivation
• Approach and System Architecture
• Demonstrators
• Performance experiments
• Conclusion
• Related work
• Future work
Motivation
• Functional products: integrated provision of hardware, software, and services rather than hardware alone => the manufacturer is responsible for correct functioning
• In modern manufacturing, sensors installed on equipment in use generate many high-rate data streams
• Ensuring the productivity, reliability, and quality of functional products requires monitoring many streams for unexpected behavior
• When the number of machines grows and data rates are high, validating with low latency becomes challenging
• SVALI (Stream VALIdator): a general system that validates correct equipment behavior by analyzing streams on the fly
SVALI, Stream VALIdator
Two validation approaches:
• Model-and-validate
  • The user defines an analytical (mathematical) model of expected behavior based on streams from equipment sensors
  • The user also defines a validation model that identifies abnormal sensor readings by comparing the result of the analytical model with the measured sensor streams
  • A simple case is detecting when the difference between expected and measured power consumption exceeds some threshold
• Learn-and-validate
  • The user provides a (statistical) learning model based on a sampled sub-stream from correctly behaving equipment
  • As in model-and-validate, the user also provides a validation model
SVALI Architecture
[Figure: SVALI architecture. Equipment A and equipment B send data over TCP to stream wrappers A and B. The SVALI validation functions (model-n-validate, learn-n-validate) run on the EPIC DSMS and use the analytical and statistical models kept in the stream models DB. Continuous queries (CQ 1, CQ 2) feed visualizers and alerters, and a client sends updates such as "set threshold = 1.3".]
SVALI Validation functions
• Model-and-validate
  • model_n_validate(Bag of Stream s, Function modelfn, Function validatefn) -> Stream of (Number ts, Object me, Object ex)
  • modelfn(Object se) -> Object ex
  • validatefn(Object se, Object ex) -> (Number ts, Object me)
• Learn-and-validate
  • learn_n_validate(Bag of Stream s, Function learnfn, Integer n, Function validatefn) -> Stream of (Number ts, Object me, Object ex)
  • learnfn(Vector of Object sa) -> Object ex
  • validatefn(Object se, Object ex) -> (Number ts, Object me)
The two functions differ in how the model is defined.
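The two signatures can be read as higher-order functions over streams. The following is a minimal Python sketch of that reading, not SVALI's implementation. It assumes that a stream is any iterable, that a single stream stands in for the bag of streams, that validatefn returns None for a normal element and a (ts, me) pair for an abnormal one, and that learn_n_validate starts validating only after the first n sampled elements.

```python
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, List, Optional, Tuple

# Assumption: (ts, me) marks an abnormal element, None a normal one.
Hit = Optional[Tuple[float, Any]]


def model_n_validate(
    stream: Iterable[Any],
    modelfn: Callable[[Any], Any],
    validatefn: Callable[[Any, Any], Hit],
) -> Iterator[Tuple[float, Any, Any]]:
    """Compute the expected value per element and emit (ts, me, ex) on deviation."""
    for se in stream:
        ex = modelfn(se)          # expected value from the analytical model
        hit = validatefn(se, ex)  # compare measured against expected
        if hit is not None:
            ts, me = hit
            yield ts, me, ex


def learn_n_validate(
    stream: Iterable[Any],
    learnfn: Callable[[List[Any]], Any],
    n: int,
    validatefn: Callable[[Any, Any], Hit],
) -> Iterator[Tuple[float, Any, Any]]:
    """Learn the expected value from the first n elements, then validate the rest."""
    it = iter(stream)
    sample = list(islice(it, n))  # sampled sub-stream of correct behavior
    ex = learnfn(sample)          # expected value from the learning model
    for se in it:
        hit = validatefn(se, ex)
        if hit is not None:
            ts, me = hit
            yield ts, me, ex
```

Both functions are generators, so results are produced as elements arrive rather than after the stream ends, which matches the on-the-fly validation described earlier.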
Model-n-validate demonstrator
The analytical and validation models are entered into the SVALI system:

  create function validatePower(Record r, Number ex) -> (Number ts, Number me)
    as select ts(r), me
       where me = measuredPower(r)
         and abs(ex - me) > th("mill1");

  select model_n_validate(bagof(input), #'expectedPower', #'validatePower')
    from Stream input
   where input = corenetJsonWrapper("h1", 1337);

[Figure: the side milling process]
Learn-n-validate demonstrator
• Cyclic behavior is defined as predicate (dynamic) windows
• A vector of expected power consumptions is computed from the first n sampled predicate windows
• The learning model is the normalized average vector over the sampled windows
• Validation computes the normalized Euclidean distance between the learnt power consumptions and the current window's power consumptions and compares it with a threshold

The window starts when the trigger is 1:

  create function cycleStart(Record s) -> Boolean
    as s["trigger"] = 1;

The window ends when the trigger is 0 and the window was started:

  create function cycleStop(Record s, Record r) -> Boolean
    as r["trigger"] = 0 and s["trigger"] = 1;

  select learn_n_validate(bagof(sw), #'learnCycle', 2, #'validateCycle')
    from Stream s, Stream sw
   where s = corenetJsonWrapper("h2", 1338)
     and sw = pwindowize(s, #'cycleStart', #'cycleStop');

  create function extractPowerW(Window w) -> Vector of Number
    as vselect extractPower(r)
         from Record r
        where r in w;

  create function learnCycle(Vector of Window f) -> Vector of Number
    as navg(select extractPowerW(w)
              from Window w
             where w in f);

  create function validateCycle(Window w, Vector e) -> (Number ts, Vector of Number m)
    as select timestamp(w), m
        where neuclid(e, m) > th("machine2")
          and m = extractPowerW(w);

[Figure: cyclic behavior]
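The windowing and distance steps of this demonstrator can be sketched in Python. The slides do not define pwindowize, navg, or neuclid, so the versions below are assumptions made for illustration: the stop predicate receives the first record of the open window, the record that closes a window is not included in it, normalization means scaling a vector to unit Euclidean length, and vectors of unequal length are truncated to the shortest.

```python
import math
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, float]


def pwindowize(
    stream: Iterable[Record],
    start: Callable[[Record], bool],
    stop: Callable[[Record, Record], bool],
) -> Iterator[List[Record]]:
    """Cut a stream into predicate windows delimited by start and stop."""
    window: List[Record] = []
    for r in stream:
        if not window:
            if start(r):             # open a window on the start predicate
                window = [r]
        elif stop(window[0], r):     # close it on the stop predicate
            yield window
            window = []
        else:
            window.append(r)


def normalize(v: List[float]) -> List[float]:
    """Scale v to unit Euclidean length (assumed meaning of 'normalized')."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def navg(vectors: List[List[float]]) -> List[float]:
    """Element-wise average of the normalized vectors."""
    n = min(len(v) for v in vectors)
    unit = [normalize(v[:n]) for v in vectors]
    return [sum(col) / len(unit) for col in zip(*unit)]


def neuclid(e: List[float], m: List[float]) -> float:
    """Euclidean distance between the normalized vectors."""
    n = min(len(e), len(m))
    a, b = normalize(e[:n]), normalize(m[:n])
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

With these helpers, learning amounts to calling navg on the power vectors of the first n windows, and a later window is reported when neuclid between the learnt vector and its own power vector exceeds the threshold.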
Performance Experiments
• Experiment setup
  • Dell PowerEdge R815 NUMA computer with 4 CPUs of 16 2.3 GHz cores each; OS: Scientific Linux release 6.2
  • The performance of SVALI is measured as the average response time of two queries
    • Q1: model-and-validate over single stream events
    • Q2: model-and-validate of a moving average over 0.1-second stream windows
  • To scale up the number of machines, streams are generated from real data streams provided by the industrial partner, with different arrival rates (one event every 1 ms to 10 ms); each stream is tagged with a machine id
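As an illustration of the kind of aggregation Q2 performs, the sketch below averages a stream over 0.1-second windows. The slides do not say whether the windows overlap, so this assumes non-overlapping (tumbling) windows, integer millisecond timestamps, and events sorted by time; the function name is hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def windowed_average(
    events: Iterable[Tuple[int, float]], width_ms: int = 100
) -> Iterator[Tuple[int, float]]:
    """Yield (window_start_ms, mean value) for each non-empty tumbling window.

    Assumes events are (ts_ms, value) pairs in non-decreasing ts_ms order.
    """
    bucket = None
    total = 0.0
    count = 0
    for ts_ms, value in events:
        b = ts_ms // width_ms            # index of the window this event falls in
        if bucket is not None and b != bucket:
            yield bucket * width_ms, total / count
            total, count = 0.0, 0
        bucket = b
        total += value
        count += 1
    if count:                            # flush the last open window
        yield bucket * width_ms, total / count
```

The averaged stream can then be passed to a validation function in the same way as single events are in Q1.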
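Performance Experiments
• Central vs parallel validation
[Figure: two dataflow diagrams. Central validation: the streams from machine0 ... machinei are merged on ts and validated by a single validation query on one SVALI node. Parallel validation: each machine stream has its own validation query (validation0 ... validationi), and the results are merged on ts.]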
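The two dataflows differ only in where the merge on ts happens. A minimal Python sketch of both shapes follows, assuming events are (ts, machine_id, value) tuples and each machine stream is already ordered by ts. It runs sequentially, so it shows the dataflow only and not the speed-up; in SVALI the per-machine validations would run in parallel. All function names are hypothetical.

```python
import heapq
from typing import Callable, Iterable, Iterator, Sequence, Tuple

Event = Tuple[float, str, float]  # (ts, machine_id, value), assumed layout


def merge_on_ts(streams: Sequence[Iterable[Event]]) -> Iterator[Event]:
    """Merge streams that are each sorted by ts into one ts-ordered stream."""
    return heapq.merge(*streams, key=lambda e: e[0])


def keep_abnormal(
    stream: Iterable[Event], is_abnormal: Callable[[str, float], bool]
) -> Iterator[Event]:
    """Pass through only the events flagged by the validation predicate."""
    for ts, machine, value in stream:
        if is_abnormal(machine, value):
            yield ts, machine, value


def central_validation(streams, is_abnormal):
    # Merge first, then run one validation over the combined stream.
    return keep_abnormal(merge_on_ts(streams), is_abnormal)


def parallel_validation(streams, is_abnormal):
    # Validate each machine stream separately, then merge the results.
    return merge_on_ts([keep_abnormal(s, is_abnormal) for s in streams])
```

Both shapes return the same ts-ordered anomalies when no two events share a timestamp; the parallel shape moves the validation cost in front of the merge, where it can be spread over several nodes.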
Experiment Measurement Q1
Fig. 1 Average response time Q1 (central validation on one SVALI node vs. parallel validation)
Experiment Measurement Q2
• Central validation: validation includes a groupby on machine id
• Parallel validation: the streams are already grouped
Fig. 2 Average response time Q2 (plot annotation: around 2 ms)
Conclusion
• Two general approaches to validating stream behavior were presented: model-and-validate and learn-and-validate
• Two demonstrators show how they are applied to streams from real industrial applications
• Parallel execution enables stream validation with limited delay over many machines
Related work
• Jakubek, S. and Strasser, T.: Fault-diagnosis using neural networks with ellipsoidal basis functions. American Control Conference, Vol. 5, pp. 3846-3851, 2002
  • Uses a learning algorithm to reduce the number of measurements needed for fault detection, whereas we use parallel processing to enable low delays
• Tan, T., Gu, X., and Wang, H.: Adaptive system anomaly prediction for large-scale hosting infrastructures. PODC Conf., 2010
  • Prediction instead of detection
  • Low arrival rates, e.g. one sample every 2 seconds, so no parallelization is needed
• Wang, D., Rundensteiner, E., Ellison, R.: Active Complex Event Processing for Realtime Health Care. VLDB Conf., 3(2), pp. 1545-1548, 2010
  • Lower-level rule mechanism triggered by state changes during continuous query processing
• Zeitler, E. and Risch, T.: Massive scale-out of expensive continuous queries. Proceedings of the VLDB Endowment, ISSN 2150-8097, Vol. 4, No. 11, pp. 1181-1188, 2011
  • SVALI's underlying DSMS, EPIC, extends that work with e.g. sliding windows and incremental aggregation
  • SVALI provides validation functionality on top of EPIC
Future work
• Other strategies for automatic performance improvements
• Adaptive learning model by re-sampling
• Adaptive parallelization of expensive validation functions