A Scalable Approach to Architectural-Level Reliability Prediction

Leslie Cheung, joint work with Leana Golubchik and Nenad Medvidovic

Motivation • Many design decisions are made early in the software development process • These decisions affect software quality • Need to assess software quality early • If problems are discovered later (e.g., after implementation), they may be costly to address

Motivation • In this talk, we focus on assessing software reliability using architectural models • Reliability: the fraction of time that the system operates correctly • Architectural models: describe system structure, behavior, and interactions

Case Study: MIDAS • Measures room temperature and adjusts it according to a user-specified threshold by turning the AC on or off • Sensor: measures the temperature and sends the measured data to a Gateway • Gateway: aggregates and translates the data and sends it to a Hub • Hub: determines whether to turn the AC on or off • AC: controls the air conditioner • GUI: lets the user view the current temperature and change thresholds

Motivation • Existing approaches for concurrent systems keep track of the states of all components • MIDAS example • State: (Sensor1, Sensor2, Gateway, Hub, GUI, AC) • e.g., (Taking Measurements, idle, idle, idle, Processing User Request, idle): Sensor1 is taking measurements while the GUI processes a user request • e.g., (Failed!, idle, idle, idle, Processing User Request, idle): Sensor1 has failed while the GUI processes a user request • Problem: scalability • e.g., 2 Gateways with 10 Sensors each yield more than 5000 states • What about real-world applications, which may have hundreds of Sensors and Gateways? • The models are too big to solve
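The scalability problem can be made concrete with a small sketch. If the global state is a tuple holding one state per component, the number of possible tuples is the product of the per-component state counts, so every added Sensor multiplies the state space. The per-component counts below (3 states per Sensor, 2 for every other component) are hypothetical placeholders, so the printed numbers are illustrative upper bounds and are not meant to reproduce the figure of more than 5000 states quoted above.

```python
from math import prod


def flat_state_count(states_per_component):
    """Upper bound on the size of a flat model that records every component's state.

    The global state is a tuple with one entry per component, so the number
    of possible tuples is the product of the per-component state counts.
    """
    return prod(states_per_component)


# Hypothetical per-component state counts (not taken from the talk).
SENSOR_STATES = 3  # e.g., idle, taking measurements, failed
OTHER_STATES = 2   # e.g., idle or busy, for each of Gateway, Hub, GUI, AC
NUM_OTHER = 4

for n_sensors in (2, 4, 8, 16):
    components = [SENSOR_STATES] * n_sensors + [OTHER_STATES] * NUM_OTHER
    print(n_sensors, "sensors:", flat_state_count(components), "states")
```

Reachable states may be fewer than this bound, but the product form is what makes the flat model intractable as Sensors are added.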

The SHARP Framework • SHARP: Scalable, Hierarchical, Architectural-Level Reliability Prediction framework • Idea: generate one part of the system model at a time by leveraging use-case scenarios • Solving many smaller models is more efficient than solving one huge model

MIDAS Use-Case Scenarios • MIDAS example • Sensor Measurement • GUI Request • Control AC
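To illustrate what analyzing one scenario in isolation can look like, the sketch below uses the simplest possible stand-in for a scenario submodel: it treats each scenario as a fixed sequence of components and multiplies per-component success probabilities, assuming independent steps. This path-based product is a swapped-in simplification, not SHARP's submodel. The reliability values are hypothetical; the Sensor Measurement path follows the case-study description, the Control AC path is inferred from the Hub's role, and the GUI Request path (GUI, then Hub) is an assumption, since the slides do not specify it.

```python
from math import prod

# Hypothetical per-component success probabilities (not from the talk).
STEP_RELIABILITY = {
    "Sensor": 0.999,
    "Gateway": 0.998,
    "Hub": 0.9995,
    "GUI": 0.999,
    "AC": 0.9999,
}

# Component sequences for the three MIDAS scenarios (assumed paths, see lead-in).
SCENARIOS = {
    "Sensor Measurement": ["Sensor", "Gateway", "Hub"],
    "GUI Request": ["GUI", "Hub"],
    "Control AC": ["Hub", "AC"],
}


def scenario_reliability(path, step_reliability):
    """Probability that every step on a fixed path succeeds, assuming independent steps."""
    return prod(step_reliability[component] for component in path)


for name, path in SCENARIOS.items():
    print(f"{name}: {scenario_reliability(path, STEP_RELIABILITY):.6f}")
```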

The SHARP Framework • Modeling concurrency: instances of scenarios may run simultaneously • MIDAS example • Processing a GUI request while processing sensor measurements: the Sensor Measurement and GUI Request scenarios run simultaneously • Multiple Sensors: multiple instances of the Sensor Measurement scenario run simultaneously

The SHARP Framework • Generate and solve submodels according to the system’s use-case scenarios • Generate and solve a coarser-level model for system reliability • Describe what happens when multiple instances of scenarios are running • Make use of results from the submodels

The SHARP Framework [Figure: one submodel per use-case scenario, yielding results (R1, m1), (R2, m2), (R3, m3)]

The SHARP Framework • Generate and solve submodels according to the system’s use-case scenarios • Generate and solve a coarser-level model for system reliability • Describe the number of active instances of each scenario • Make use of results from the submodels
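A minimal sketch of a coarser-level model in this spirit, under assumptions the slides do not state: R_i is taken to be the probability that one instance of scenario i completes without failure, m_i its mean duration, instances start as a Poisson process with a hypothetical rate, and at most a hypothetical number of instances may be active at once. For each scenario the model is a birth-death chain over the number of active instances, and an instance's outcome is treated as decided when it finishes. Following the no-contention assumption mentioned in the talk's conclusions, scenarios do not interact, so their failure rates add. `ScenarioResult` and all function names are hypothetical.

```python
from dataclasses import dataclass
from math import factorial


@dataclass
class ScenarioResult:
    """Hypothetical summary of one solved scenario submodel."""
    name: str
    reliability: float    # assumed R_i: P(one instance completes without failure)
    mean_duration: float  # assumed m_i: mean time for one instance to complete
    arrival_rate: float   # assumed Poisson rate at which new instances start
    max_instances: int    # assumed cap on concurrently active instances


def active_instance_distribution(s):
    """Stationary distribution of the number of active instances.

    Birth-death chain: n -> n+1 at rate arrival_rate (while n < max_instances),
    n -> n-1 at rate n / mean_duration. Detailed balance gives
    pi_n proportional to (arrival_rate * mean_duration)**n / n!.
    """
    load = s.arrival_rate * s.mean_duration
    weights = [load**n / factorial(n) for n in range(s.max_instances + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def scenario_failure_rate(s):
    """Long-run rate of failed instances: sum_n pi_n * (n / m) * (1 - R)."""
    pi = active_instance_distribution(s)
    return sum(p * (n / s.mean_duration) * (1 - s.reliability) for n, p in enumerate(pi))


def system_failure_rate(scenarios):
    """No contention: scenarios do not interact, so their failure rates add."""
    return sum(scenario_failure_rate(s) for s in scenarios)


# Usage with hypothetical values for the three MIDAS scenarios.
scenarios = [
    ScenarioResult("Sensor Measurement", 0.999, 0.5, 10.0, 20),
    ScenarioResult("GUI Request", 0.995, 2.0, 0.1, 1),
    ScenarioResult("Control AC", 0.9999, 1.0, 0.2, 1),
]
print(system_failure_rate(scenarios))
```

Each chain has only max_instances + 1 states, so in this sketch adding Sensors grows the coarse model linearly rather than multiplicatively.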

The SHARP Framework [Figure: the coarser-level model combines the submodel results (m1, R1), (m2, R2), (m3, R3) to compute the system reliability R]

Evaluation • To show… • SHARP has better scalability than a flat model that can be derived from existing approaches, and • SHARP is accurate, using results from the flat model as “ground truth” • Experiments • Computational cost in practice • Sensitivity analysis

Computational cost in practice • Example: the MIDAS system, varying the number of Sensor components (x-axis) • Y-axis: number of operations needed to solve the model
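A back-of-the-envelope sketch of why the operation counts diverge, assuming (hypothetically) that each model is solved with dense Gaussian elimination at roughly (2/3)s^3 operations for s states. Real solvers often exploit sparsity, and the state counts and submodel sizes below are illustrative placeholders, so this shows only the shape of the curves, not the measured numbers from the talk. All function names are hypothetical.

```python
def dense_solve_ops(num_states):
    """Approximate operation count for dense Gaussian elimination on an s x s system."""
    return 2 * num_states**3 / 3


def flat_ops(n_sensors, sensor_states=3, other_states=2, n_other=4):
    """Flat model: one global state tuple over all Sensors and the other components."""
    num_states = sensor_states**n_sensors * other_states**n_other
    return dense_solve_ops(num_states)


def hierarchical_ops(n_sensors, submodel_sizes=(12, 8, 8)):
    """Hypothetical SHARP-style cost: fixed-size scenario submodels plus a coarse model."""
    # Each scenario submodel describes one instance, so its size does not grow with n_sensors.
    submodels = sum(dense_solve_ops(s) for s in submodel_sizes)
    # Coarse model: 0..n_sensors active Sensor Measurement instances, times
    # 0..1 active GUI Request and 0..1 active Control AC instances (assumed caps).
    coarse_states = (n_sensors + 1) * 2 * 2
    return submodels + dense_solve_ops(coarse_states)


for n in (2, 4, 8, 16):
    print(f"{n:>2} sensors: flat {flat_ops(n):.3g} ops, hierarchical {hierarchical_ops(n):.3g} ops")
```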

Sensitivity Analysis • We are primarily interested in what-if analysis • e.g., Is Architecture A “better” than Architecture B? • but not • Will my system’s reliability be greater than 90%? • What is the probability that I can run my system for 100 hours without any failure? • Focusing on trends is meaningful at the architectural level

Sensitivity Analysis • “Ground truth”: results from the flat model • Vary the Sensor failure rate

Conclusions • Assessing software quality early is desirable • Scalability is a major challenge in reliability prediction of concurrent systems using architectural models • We address this challenge by leveraging a system’s use-case scenarios in SHARP • Future work: contention modeling • Work thus far assumes no contention • However, concurrency can lead to contention

The End

Defects • Architectural: mismatches between architectural models • e.g., an interaction protocol mismatch between two components • System: limitations of components • e.g., a Sensor has limited power • Allow system designers to evaluate how much reliability will improve if defects are addressed • Cost
