A Case Study In Reliability Analysis
Lewis Sykalski
Background (cont.)
• Net Centric Warfare Data Collector
  • Approximately 180 KLOC
  • Written in Java; makes heavy use of JDBC and RMI from the J2EE package
  • CMMI Level 1
  • Uses the Oracle 9.2 EE OTS DBMS
• Reliability Required: Moderate
Background
[Network diagram: GLOBAL VISION NETWORK (GVN). Sites shown: LM – Mission Sys, Colorado Springs, CO; Light House, Suffolk, VA; Integrated Warfare Development Center, Fort Worth, TX; LM – Sim & Training, Orlando, FL. Node labels: CAOC, FUSION, DC, VBMS, WCS, JSAF, JTAC, JIMM, JABE, Other Simulators, Threat Sims.]
Design Diversity (Part I)
• Part I: Oracle DBMS Design Diversity
• Acquire 20 bug reports each from Oracle 9.2 & Oracle 10.0
• Bugs had to be Date Independent, Easy To Reproduce, & Type Independent
• Results would then be classified by self-evidence & divergence
Design Diversity: More Analysis
(S.E = self-evident failure, N.S.E = non-self-evident failure)

                             Oracle 9.2   Oracle 10.0   Oracle 10.0   Oracle 9.2
Total Bug Scripts                20            -             20            -
Failure Observed                 20            -             20            11
Performance/Hang    S.E           2            0              1             0
Internal Error      S.E          11            0             10             6
Engine Crash        S.E           0            0              2             2
Incorrect Result    S.E           0            0              0             0
                    N.S.E         7            0              6             2
Other               S.E           0            0              1             1
                    N.S.E         0            0              0             0
Design Diversity: Even More Analysis
Bottom Line:
• Not a Statistical Sample (Not Enough Time)
• 2/40 = 5% of Failures not detected across both products
• Out of the 20 failures for Oracle 10.0, 6 were N.S.E, and 4 of these 6 would be detected by running a past release in tandem with the newer release
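The bottom-line figures can be re-derived from the table counts. The snippet below is only a sketch of that arithmetic: the variable names are illustrative, and it assumes that a non-self-evident (N.S.E) failure on Oracle 10.0 counts as detected whenever the same script is not also N.S.E on Oracle 9.2.

```python
# Counts transcribed from the Design Diversity table.
total_bug_scripts = 20 + 20   # 20 bug scripts per Oracle release
nse_on_10 = 6                 # Oracle 10.0 bugs that fail non-self-evidently on 10.0
nse_on_both = 2               # of those, also non-self-evident on 9.2

# Assumption: a failure escapes detection only if it is N.S.E on both releases.
undetected_share = nse_on_both / total_bug_scripts
caught_by_pairing = nse_on_10 - nse_on_both

print(f"Undetected across both products: {undetected_share:.0%}")
print(f"N.S.E failures caught by pairing releases: {caught_by_pairing} of {nse_on_10}")
```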
Reliability Analysis (Part II)
• Part II: CASRE Reliability Analysis of NCW Data Collector
• Extract the following from Failure Logs using JavaScript:
  • Time of Program Start
  • Time of Program Termination
  • Time of Thread Terminations
  • Exception or Failure Messages
• Parse failures manually into CASRE input format
• Categorize by severity using the chart on the next slide
• Compare 2 consecutive events (CALOE08 & MAGTF08) as well as 2 consecutive lifecycle phases within the same event (Integration & Execution)
CASRE Input Format
TIME BETWEEN FAILURES FORMAT: N/A
FAILURE COUNT FORMAT:

Interval Number   Number of Errors   Interval Length   Error Severity
(int)             (float)            (float)           (int)

Example:
Hours
1   5.0   40.0   1
1   3.0   40.0   2
1   2.0   40.0   3
2   4.0   40.0   1
2   3.0   40.0   3
3   7.0   40.0   1
4   5.0   40.0   1
5   4.0   40.0   1
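As a rough illustration of how rows in the failure count layout above could be produced automatically, here is a minimal sketch. It assumes each failure has already been reduced to an (interval number, severity) pair; the function name and input shape are hypothetical, and intervals with no failures are not handled because the slide does not show how they are written.

```python
from collections import Counter


def casre_failure_count_rows(failures, interval_length, time_unit="Hours"):
    """Render failures as text in the failure count layout shown on the slide.

    failures: iterable of (interval_number, severity) pairs (hypothetical input shape).
    interval_length: length of every test interval, in time_unit.
    """
    counts = Counter(failures)  # (interval, severity) -> number of errors
    lines = [time_unit]         # first line names the time unit, as in the example
    for interval, severity in sorted(counts):
        errors = float(counts[(interval, severity)])
        lines.append(f"{interval} {errors} {float(interval_length)} {severity}")
    return "\n".join(lines)


# Illustrative input only: five severity-1 and three severity-2 failures in
# interval 1, then four severity-1 failures in interval 2.
sample = [(1, 1)] * 5 + [(1, 2)] * 3 + [(2, 1)] * 4
print(casre_failure_count_rows(sample, interval_length=40))
```

Grouping by (interval, severity) mirrors the example, where interval 1 appears once per severity level.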
CASRE Failure Counts [Plots: CALOE+MAGTF Execution; MAGTF Integration + Execution]
CASRE Time Between Failures [Plots: CALOE+MAGTF Execution; MAGTF Integration + Execution]
CASRE Failure Intensity [Plots: CALOE+MAGTF Execution; MAGTF Integration + Execution]
CASRE Cumulative Failures [Plots: CALOE+MAGTF Execution; MAGTF Integration + Execution]
CASRE Test Interval Length [Plots: CALOE+MAGTF Execution; MAGTF Integration + Execution]
Detecting Reliability Trends
• Running Average:
  • Not as Useful for Failure Count Data (unless test intervals are of equal length)
  • Computes the running average of the time between successive failures for time between failures data, or the running average of the number of failures per interval for failure count data
  • For failure count data, a running average that decreases with time (fewer failures per test interval) indicates reliability growth; for time between failures data, growth shows up as a running average that increases
• Laplace Test:
  • Not as Useful for Failure Count Data (unless test intervals are of equal length)
  • Null hypothesis: occurrences of failures follow a homogeneous Poisson process
  • If the test statistic decreases with increasing failure number, the null hypothesis can be rejected in favor of reliability growth at an appropriate significance level; if it increases with failure number, the null hypothesis can be rejected in favor of decreasing reliability
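Both trend checks are simple enough to sketch by hand. The code below is an illustration, not CASRE's implementation: the running average is the plain cumulative mean, and the Laplace factor uses a common textbook form for time between failures data with observation ending at the last failure. CASRE's exact computation may differ, and the function names are made up for this example.

```python
import math


def running_average(values):
    """Return the cumulative mean: element k is the mean of values[0..k]."""
    averages = []
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        averages.append(total / count)
    return averages


def laplace_factor(interfailure_times):
    """Laplace factor for time-between-failures data (textbook form).

    Negative values suggest reliability growth (failures cluster early),
    positive values suggest reliability decay.
    """
    n = len(interfailure_times)
    if n < 2:
        raise ValueError("need at least two failures")
    # Convert gaps between failures into cumulative failure times.
    cumulative = []
    elapsed = 0.0
    for gap in interfailure_times:
        elapsed += gap
        cumulative.append(elapsed)
    total_time = cumulative[-1]
    # Mean of the first n-1 failure times, compared with the midpoint of the
    # observation period and scaled by its standard deviation under the
    # homogeneous Poisson null hypothesis.
    mean_earlier = sum(cumulative[:-1]) / (n - 1)
    scale = total_time * math.sqrt(1.0 / (12.0 * (n - 1)))
    return (mean_earlier - total_time / 2.0) / scale


# Illustrative data only: gaps between failures that lengthen over time.
gaps = [1.0, 2.0, 3.0, 4.0]
print(running_average(gaps))   # rising average of time between failures
print(laplace_factor(gaps))    # negative factor, consistent with growth
```

Evaluating `laplace_factor` on successively longer prefixes of the data gives the trend curve described above: a statistic that keeps falling as failures accumulate points toward reliability growth.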
Running Average [Plots: CALOE+MAGTF Execution; MAGTF Integration + Execution]
Laplace Test [Plots: CALOE+MAGTF Execution; MAGTF Integration + Execution]
CASRE Cum Failure Predictions [Plots: CALOE+MAGTF Execution; MAGTF Integration + Execution]
CASRE Prediction Setup [Plots: CALOE+MAGTF Execution; MAGTF Integration + Execution]
CASRE Reliability Prediction [Plots: CALOE+MAGTF Execution; MAGTF Integration + Execution]
CASRE Prequential Likelihood [Plots: CALOE+MAGTF Execution; MAGTF Integration + Execution]
CASRE Model-Ranking [Plots: CALOE+MAGTF Execution; MAGTF Integration + Execution]
Reliability Models
• Haven't been able to get these to run yet
• Instruction manual says many of the built-in models only work with Time Between Failures data
• Doubt there would be much utility with Failure Count data
Conclusion/Follow-Up
• It would actually be quite easy to integrate auto-generation of Failure Count or Time Between Failures output into my environment
• This would facilitate quick trend analysis
• Reliability trends, not the actual numbers, are what matter
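One possible shape for the Time Between Failures auto-generation mentioned above is sketched here. It assumes failure timestamps are already available as ISO 8601 strings, which is a hypothetical format chosen for the example; the function name and the sample timestamps are made up.

```python
from datetime import datetime


def times_between_failures(start, failure_times):
    """Return the gaps, in hours, between successive failures.

    start: ISO 8601 timestamp of program start (assumed format).
    failure_times: ISO 8601 timestamps of failures, in any order.
    """
    previous = datetime.fromisoformat(start)
    gaps = []
    for stamp in sorted(datetime.fromisoformat(s) for s in failure_times):
        gaps.append((stamp - previous).total_seconds() / 3600.0)
        previous = stamp
    return gaps


# Placeholder timestamps for illustration, not taken from any real event log.
gaps = times_between_failures(
    "2000-01-01T08:00:00",
    ["2000-01-01T10:00:00", "2000-01-01T15:00:00"],
)
print(gaps)
```

Emitting these gaps after every run would be enough to refresh a trend plot without manual parsing.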