Application of Maximum Entropy Principle to software failure prediction. Wu Ji, Software Engineering Institute, BeiHang University.
Agenda • Introduction • Problem and focus • Method and models • Results • Conclusions
Introduction • Failure prediction is one of the key problems in software quality (reliability) estimation. • Generally, failure prediction can be defined as y = f(x). • y is the failure-related variable • x is the basis on which the prediction works • As far as we know, x has so far been set as: • Software execution time, for reliability growth prediction • Software execution trace, for anomaly detection
Introduction (cont.) • Reliability has been a major concern for software with a high reliability requirement (HRR). • Reliability engineering is very costly, so reliability testing is seldom done for software without an HRR. • Anomaly detection is usually implemented as a built-in module of the software.
Introduction (cont.) • Generally, all managers strive for high quality. • What does a manager really care about in failure prediction? • Given a usage scenario, will the software survive? • How to predict software failure from the input is still a new problem.
Problem and focus How to predict failure from software input?
Problem and focus (cont.) [Figure: execution time line. Labels: execution start (s), left context, …, failure observation = ? (0/1) at t.]
Problem and focus (cont.) • If we can model the left context, we get the distribution {(lc, fo)}. [Figure labels: Software input, Failure observation, Failure Learning, Failure law {(lc, fo)}, Failure Prediction.]
Method and models • The whole left context is hard to model. • A probability model is used instead: po(y|x) • x: partial left context, y: failure observation. • The Maximum Entropy Principle (MEP) is applied to model po(y|x).
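As a rough illustration of what a conditional maximum entropy model looks like, the sketch below uses the standard exponential form p(y|x) = exp(Σ λ_i f_i(x, y)) / Z(x). The feature functions, the weights, and the event names in x are hypothetical and chosen only for illustration; the slides do not list the actual features or parameters.

```python
import math

def maxent_prob(weights, features, x, labels=(0, 1)):
    """Return {y: p(y|x)} for a conditional maximum entropy model.

    p(y|x) = exp(sum_i w_i * f_i(x, y)) / Z(x), with Z(x) summing over labels.
    """
    scores = {y: sum(w * f(x, y) for w, f in zip(weights, features))
              for y in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exp_scores.values())
    return {y: e / z for y, e in exp_scores.items()}

# Hypothetical features over a partial left context x (a tuple of input events)
# and a failure observation y (1 = failure, 0 = no failure).
features = [
    lambda x, y: 1.0 if y == 1 and "overflow" in x else 0.0,
    lambda x, y: 1.0 if y == 0 and "overflow" not in x else 0.0,
]
weights = [1.5, 0.5]  # assumed values, not learned from real data

probs = maxent_prob(weights, features, x=("open", "overflow"))
print(probs)
```

Here only the first feature fires (for y = 1), so the model assigns the higher probability to failure for this context.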
Method and models (cont.) • MEP is a well-known and widely used learning principle: • Great generalization ability • Dynamic and open • Adapts well to sparse data
Method and models (cont.) • Surface Viewer (Surface Model): failure can be modeled well from the input alone, together with its relations with failures. • Structure Viewer (Structure Model): failure cannot be modeled well without modeling the fault.
Method and models (cont.) • Surface Model: learns the statistical co-occurrence of the surface information. • Structure Model: learns the statistical cause-effect (fault-failure) relationship.
Method and models (cont.) The features applied in the surface model [Figure labels: Flr, Failure-Ftrs, SIU-Num-Ftrs, SIU-Seg-Ftrs.]
Method and models (cont.) The features applied in the structure model [Figure labels: Flr, Failure-Ftrs, (Flt -> Flr) Ftrs, Fault-Ftrs.]
Method and models (cont.) • Supervised training • Training data • Objective: maximize the likelihood function.
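Maximizing the likelihood of a conditional maximum entropy model can be sketched with plain gradient ascent, where the gradient for each weight is the observed feature count minus the count expected under the current model. The toy data, feature functions, step count, and learning rate below are all assumptions for illustration; the slides do not say which optimizer or training data the authors used.

```python
import math

LABELS = (0, 1)  # failure observation: 0 = no failure, 1 = failure

def predict(weights, feats, x):
    """Return [p(0|x), p(1|x)] under the exponential (maximum entropy) form."""
    scores = [sum(w * f(x, y) for w, f in zip(weights, feats)) for y in LABELS]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def log_likelihood(weights, feats, data):
    """Conditional log-likelihood of the labeled pairs (x, y)."""
    return sum(math.log(predict(weights, feats, x)[y]) for x, y in data)

def train(feats, data, steps=200, lr=0.5):
    """Gradient ascent on the conditional log-likelihood."""
    weights = [0.0] * len(feats)
    for _ in range(steps):
        grad = [0.0] * len(feats)
        for x, y in data:
            p = predict(weights, feats, x)
            for i, f in enumerate(feats):
                expected = sum(p[c] * f(x, c) for c in LABELS)
                grad[i] += f(x, y) - expected  # observed minus expected
        weights = [w + lr * g / len(data) for w, g in zip(weights, grad)]
    return weights

# Hypothetical training pairs: (partial left context, failure observation).
feats = [
    lambda x, y: 1.0 if y == 1 and x == "bad_input" else 0.0,
    lambda x, y: 1.0 if y == 1 and x == "good_input" else 0.0,
]
data = [("bad_input", 1), ("bad_input", 1), ("bad_input", 0),
        ("good_input", 0), ("good_input", 0), ("good_input", 1)]

before = log_likelihood([0.0, 0.0], feats, data)
weights = train(feats, data)
after = log_likelihood(weights, feats, data)
print(before, after, predict(weights, feats, "bad_input")[1])
```

On this toy data the learned p(failure | "bad_input") approaches the empirical frequency 2/3, which is what a maximum likelihood fit should recover.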
Method and models (cont.) • Model evaluation: • For a given test case: • The test engineer runs it and obtains the test_fo_sequence; • The prediction model returns the predicted pred_fo_sequence. • The model is evaluated by the match degree (precision) between test_fo_sequence and pred_fo_sequence.
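One simple reading of the match degree is the fraction of positions at which the two sequences agree. The sketch below assumes that definition and assumes both sequences have the same length; the slides do not give the exact formula, and the example sequences are made up.

```python
def match_precision(test_fo_sequence, pred_fo_sequence):
    """Fraction of positions where observed and predicted failure
    observations (0/1) agree. Assumes equal-length, non-empty sequences."""
    if len(test_fo_sequence) != len(pred_fo_sequence):
        raise ValueError("sequences must have the same length")
    if not test_fo_sequence:
        raise ValueError("sequences must not be empty")
    matches = sum(t == p for t, p in zip(test_fo_sequence, pred_fo_sequence))
    return matches / len(test_fo_sequence)

# Hypothetical sequences for one test case.
test_fo_sequence = [0, 0, 1, 0, 1]
pred_fo_sequence = [0, 0, 1, 1, 1]
score = match_precision(test_fo_sequence, pred_fo_sequence)
print(score)  # 4 of 5 positions match
```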
Results • Two groups of experiments, involving 5 software systems and 17 tests in total. • Open test method • The test data is kept separate from the training data and is unseen during training. • Surface Model: average precision: 0.876 • Structure Model: average precision: 0.858
Results (cont.) [Figure: Evaluation Score Distribution]
Results (cont.) • Potential applications of the prediction model • Test case prioritization • Reliability Estimation • Reliability Growth Modeling
Conclusions • A new failure prediction problem • Apply a statistical learning method to learn the failure law and then predict failure • Two models: surface model and structure model • Promising evaluation results (average precision): • Surface Model: 0.876 • Structure Model: 0.858
Conclusions (cont.) • Lessons learnt: • Design and start experiments as early as possible to verify the model. • A complex model does not always perform well; consider model simplification. • Do NOT make many assumptions about how the data is generated.
Thank you for your attention. Ready for questions!