A Case Study In Reliability Analysis

Lewis Sykalski

Background (cont.)
• Net Centric Warfare Data Collector
  • Approximately 180 KLOC
  • Written in Java; makes heavy use of JDBC and RMI from the J2EE package
  • CMMI Level 1
  • Uses Oracle 9.2 EE as an OTS DBMS
• Reliability required: Moderate

Background
[Figure: GLOBAL VISION NETWORK (GVN) diagram. Sites shown: LM Mission Sys (Colorado Springs, CO); Light House (Suffolk, VA); Integrated Warfare Development Center (Fort Worth, TX); LM Sim & Training (Orlando, FL). Node labels: CAOC, FUSION, DC, VBMS, WCS, JSAF, JTAC, JIMM, JABE, Other Simulators, Threat Sims.]

Design Diversity (Part I)
• Part I: Oracle DBMS design diversity
• Acquire 20 bug reports each for Oracle 9.2 and Oracle 10.0
• Bugs had to be date independent, easy to reproduce, and type independent
• Results were then classified by self-evidence and divergence

Design Diversity: Results 9.2 Bugs

Design Diversity: Results 10.0 Bugs

Design Diversity: More Analysis

                           Oracle 9.2   Oracle 10.0   Oracle 10.0   Oracle 9.2
Total Bug Scripts              20            -             20            -
Failure Observed               20            -             20           11
Performance/Hang   S.E          2            0              1            0
Internal Error     S.E         11            0             10            6
Engine Crash       S.E          0            0              2            2
Incorrect Result   S.E          0            0              0            0
                   N.S.E        7            0              6            2
Other              S.E          0            0              1            1
                   N.S.E        0            0              0            0

S.E = self-evident failure; N.S.E = non-self-evident failure.

Design Diversity: Even More Analysis
Bottom Line:
• Not a statistical sample (not enough time)
• 2 of the 40 failures (5%) were not detected across both products
• Of the 20 failures for Oracle 10.0, 6 were N.S.E; 4 of these 6 would be resolved by using a past release in tandem with a future release

Reliability Analysis (Part II)
• Part II: CASRE reliability analysis of the NCW Data Collector
• Extract the following from failure logs using JavaScript:
  • Time of program start
  • Time of program termination
  • Time of thread terminations
  • Exception or failure messages
• Parse failures manually into CASRE input format
• Categorize by severity using the chart on the next slide
• Compare 2 consecutive events (CALOE08 & MAGTF08) as well as 2 consecutive lifecycle phases within the same event (Integration & Execution)

Severity

Using CASRE

Using CASRE (cont.)

CASRE Input Format
TIME BETWEEN FAILURES FORMAT: N/A
FAILURE COUNT FORMAT:

Interval Number   Number of Errors   Interval Length   Error Severity
(int)             (float)            (float)           (int)

Example:
Hours
1   5.0   40.0   1
1   3.0   40.0   2
1   2.0   40.0   3
2   4.0   40.0   1
2   3.0   40.0   3
3   7.0   40.0   1
4   5.0   40.0   1
5   4.0   40.0   1
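The failure count layout above could be generated rather than typed by hand. The following is a minimal Python sketch, assuming the file is plain whitespace-separated text with the time unit on the first line, as in the example. The function name `format_failure_counts` and the tuple layout are illustrative choices, not part of CASRE.

```python
def format_failure_counts(records, time_unit="Hours"):
    """Render failure count records as text in the layout shown on this slide.

    Each record is a tuple (interval_number, number_of_errors,
    interval_length, error_severity). The first output line names the
    time unit; this mirrors the slide's example and is an assumption
    about what the tool expects.
    """
    lines = [time_unit]
    for interval, errors, length, severity in records:
        lines.append(
            f"{int(interval)} {float(errors):.1f} {float(length):.1f} {int(severity)}"
        )
    return "\n".join(lines) + "\n"


# The example data from the slide: one row per (interval, severity) pair.
example_records = [
    (1, 5, 40, 1),
    (1, 3, 40, 2),
    (1, 2, 40, 3),
    (2, 4, 40, 1),
    (2, 3, 40, 3),
    (3, 7, 40, 1),
    (4, 5, 40, 1),
    (5, 4, 40, 1),
]

if __name__ == "__main__":
    print(format_failure_counts(example_records), end="")
```

Keeping the formatter separate from any log parsing means the same records can also feed a trend check before they are written out.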

CASRE Failure Counts [plots: CALOE+MAGTF Execution; MAGTF Integration + Execution]

CASRE Time Between Failures [plots: CALOE+MAGTF Execution; MAGTF Integration + Execution]

CASRE Failure Intensity [plots: CALOE+MAGTF Execution; MAGTF Integration + Execution]

CASRE Cumulative Failures [plots: CALOE+MAGTF Execution; MAGTF Integration + Execution]

CASRE Test Interval Length [plots: CALOE+MAGTF Execution; MAGTF Integration + Execution]

Detecting Reliability Trends
• Running Average:
  • Not as useful for failure count data (unless test intervals are of equal length)
  • Computes the running average of the time between successive failures for time between failures data, or the running average of the number of failures per interval for failure count data
  • For failure count data, a running average that decreases with time (fewer failures per test interval) indicates reliability growth; for time between failures data, growth is indicated by a running average that increases
• Laplace Test:
  • Not as useful for failure count data (unless test intervals are of equal length)
  • Null hypothesis: failures occur as a homogeneous Poisson process
  • If the test statistic decreases with increasing failure number, the null hypothesis can be rejected in favor of reliability growth at an appropriate significance level; if it increases, the null hypothesis can be rejected in favor of decreasing reliability
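Both trend checks are simple enough to compute outside the tool. The sketch below is illustrative only: `running_average` follows the definition given above, while `laplace_statistic` uses the common textbook form of the Laplace statistic for failure times observed over a fixed window (0, T], which is an assumption and may differ in detail from what CASRE computes. Under that form, negative values point toward reliability growth and positive values toward decreasing reliability.

```python
import math


def running_average(values):
    """Return the mean of the first k values, for k = 1..len(values)."""
    averages = []
    total = 0.0
    for k, value in enumerate(values, start=1):
        total += value
        averages.append(total / k)
    return averages


def laplace_statistic(failure_times, observation_end):
    """Laplace trend statistic for failure times observed on (0, observation_end].

    Textbook time-truncated form (an assumption, not taken from the slides):
        u = (mean(t_i) - T/2) / (T * sqrt(1 / (12 n)))
    Under a homogeneous Poisson process, u is approximately standard normal.
    """
    n = len(failure_times)
    if n == 0:
        raise ValueError("at least one failure time is required")
    if observation_end <= 0:
        raise ValueError("observation_end must be positive")
    mean_time = sum(failure_times) / n
    return (mean_time - observation_end / 2.0) / (
        observation_end * math.sqrt(1.0 / (12.0 * n))
    )


if __name__ == "__main__":
    # Failures per interval, summed over severities from the input format example.
    counts_per_interval = [10, 7, 7, 5, 4]
    print(running_average(counts_per_interval))

    # Hypothetical failure times (hours) clustered early in a 100-hour window.
    print(laplace_statistic([1.0, 2.0, 3.0, 4.0], 100.0))
```

With the example counts, the running average falls from interval to interval, which is the pattern described above as reliability growth for failure count data.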

Running Average [plots: CALOE+MAGTF Execution; MAGTF Integration + Execution]

Laplace Test [plots: CALOE+MAGTF Execution; MAGTF Integration + Execution]

CASRE Cum Failure Predictions [plots: CALOE+MAGTF Execution; MAGTF Integration + Execution]

CASRE Prediction Setup [screenshots: CALOE+MAGTF Execution; MAGTF Integration + Execution]

CASRE Reliability Prediction [plots: CALOE+MAGTF Execution; MAGTF Integration + Execution]

CASRE Prequential Likelihood [plots: CALOE+MAGTF Execution; MAGTF Integration + Execution]

CASRE Model-Ranking [screenshots: CALOE+MAGTF Execution; MAGTF Integration + Execution]

Reliability Models
• Haven’t been able to get these to run yet.
• The instruction manual says many of the built-in models only work with time between failures data.
• Doubt there would be much utility with failure count data.

Conclusion/Follow-Up
• It would actually be QUITE easy to integrate auto-generation of failure count or time between failures output into my environment
• This would facilitate quick trend analysis
• Reliability trends, not the actual numbers, are what matter
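As a rough illustration of what such auto-generation might involve, the Python sketch below derives both data shapes from a list of failure timestamps. It assumes the program start, program end, and failure times have already been pulled out of the logs; the function names and the sample timestamps are hypothetical and are not taken from the NCW Data Collector.

```python
import math
from datetime import datetime, timedelta


def time_between_failures(start, failures):
    """Hours between consecutive failures; the first gap is measured from start."""
    gaps = []
    previous = start
    for moment in sorted(failures):
        gaps.append((moment - previous).total_seconds() / 3600.0)
        previous = moment
    return gaps


def failure_counts(start, end, failures, interval):
    """Count failures in consecutive fixed-length intervals covering [start, end)."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    n_intervals = max(1, math.ceil((end - start) / interval))
    counts = [0] * n_intervals
    for moment in failures:
        if start <= moment < end:
            # Index of the interval this failure falls into.
            counts[int((moment - start) / interval)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical run: started 08:00, ended 13:00, three failures logged.
    run_start = datetime(2008, 1, 1, 8, 0)
    run_end = datetime(2008, 1, 1, 13, 0)
    logged = [
        datetime(2008, 1, 1, 9, 0),
        datetime(2008, 1, 1, 9, 30),
        datetime(2008, 1, 1, 12, 0),
    ]
    print(time_between_failures(run_start, logged))
    print(failure_counts(run_start, run_end, logged, timedelta(hours=2)))
```

Using equal-length intervals here is deliberate: the trend slides note that the running average and Laplace test are less useful for failure count data when interval lengths differ.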
