OS II: Dependability & Trust SWIFI-based OS Evaluations

Prof. Neeraj Suri, Stefan Winter
Dept. of Computer Science, TU Darmstadt, Germany
Dependable Embedded Systems & SW Group
www.deeds.informatik.tu-darmstadt.de

Fault Detection: Software Testing
• So far: Verification & Validation
  • Testing techniques: static vs. dynamic, black-box vs. white-box
• Last time: Fault Injection (FI)
  • Applications, techniques, some FI tools
• Today: Testing (SWIFI) of operating systems
  • WHERE: Error propagation in OSs [Johansson’05]
  • WHAT: Error selection for testing [Johansson’07]
  • WHEN: Injection trigger selection [Johansson’07]
• Next lecture: Profiling OS extensions (state changes at runtime)

FI Recap
Fault Injection (FI) is the process of either inserting bugs into your system or exposing your system to operational perturbations.
• FI applications for dependable system development
  • Defect Count Estimation (Fault Seeding)
  • Test Suite Evaluation (Mutation Testing)
  • Security Testing
  • Experimental Dependability Evaluations
• FI techniques
  • Physical FI
  • HW FI
  • Simulated FI
  • SWIFI

FI Recap (cont.)
• Where to apply the change (location, abstraction/system level)
• What to inject (what should be injected/corrupted?)
• Which trigger to use (event, instruction, timeout, exception, …?)
• When to inject (on the first/second/… trigger event)
• How often to inject (Heisenbugs vs. Bohrbugs)
• …
• What to record and interpret? For what purpose?
• How is the system loaded at the time of the injection?
  • Applications running and their load (workload)
  • System resources
  • Real, realistic, or synthetic workload

Outline for today's lecture
• Drivers: a major dependability issue in commodity OSs
  • An error propagation view
• FI-based robustness evaluations of the kernel
  • Black-box assumption
  • Fault representativeness vs. failure relevance
• Design and implementation issues of a suitable FI framework
  • Fault modeling
  • Failure modeling
  • Workloads

The problem: Drivers!
• Device drivers
  • Numerous: 250 installed (100 active) drivers in XP/Vista
  • Large & complex: 70% of the Linux code base
  • Immature: every day, 25 new and 100 revised versions of Vista drivers
  • Access rights: kernel-mode operation in monolithic OSs
• Device drivers are the dominant cause of OS failures despite sustained testing efforts
[Figures: Causes of WinXP outages; Causes of Win2k outages]

The problem (cont.)
• Problem statement: Driver failures lead to OS API failures
• Mitigation approaches
  • Harden OS robustness
  • Improve driver reliability

The problem (cont.)
[Figure: The problem in terms of error propagation]
[Figure: The effect of robustness hardening in terms of error propagation]
[Figure: The effect of testing in terms of error propagation]

Issues with the driver testing approach
• What if the driver is not the root cause?
• What if we cannot remove defects (e.g., in commercial OSs)?

Issues with the hardening approach
• What if we cannot remove robustness vulnerabilities?
• More issues with the hardening approach in next week's lecture...

FI-based robustness evaluations
• Fault containment wrappers are expensive
  • Additional code is an additional source of bugs
  • Runtime overhead for error checks
• Where should we add fault containment wrappers?
  • Where errors with critical effects are likely to occur
  • Where propagation is likely
  • Where critical errors propagate
• How do we know where which errors propagate?
  • Propagation analysis (cf. PROPANE)

Robustness Evaluations
[Figure: system components A to F; propagation graded "increasingly bad"; two spots flagged "!"]

Robustness Evaluations
• Experimental technique to ascertain “vulnerabilities”
  • Identify (potential) sources, error propagation paths, hot spots, etc.
  • Estimate their “effects” on applications
• Component enhancement with “wrappers”
  • if (X > 100 && Y < 30) Exception();
  • Location of wrappers
• Aspects
  • Metrics for error propagation profiles
  • Experimental analysis
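A fault containment wrapper of this kind can be sketched in C as a thin function that checks a predicate before forwarding the call. This is a minimal illustration under stated assumptions: `driver_write`, `wrapped_driver_write`, and the error code `WRAP_EINVAL` are hypothetical names, and the predicate simply reuses the example condition from the slide.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical error code returned when the wrapper rejects a call. */
#define WRAP_EINVAL (-22)

/* Hypothetical driver entry point being protected. */
static int driver_write(int x, int y)
{
    return x + y; /* stands in for real driver work */
}

/* Fault containment wrapper: validates arguments before they reach the
 * driver, turning a suspicious call into an error return instead of
 * letting the error propagate further. */
static int wrapped_driver_write(int x, int y)
{
    if (x > 100 && y < 30) { /* example predicate from the slide */
        fprintf(stderr, "wrapper: rejected call (x=%d, y=%d)\n", x, y);
        return WRAP_EINVAL;
    }
    return driver_write(x, y);
}
```

The check itself is the runtime overhead mentioned for wrappers, and the wrapper is extra code that can contain bugs, which is why wrapper placement matters.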

System Model
[Figure: layers Applications, Operating System, Drivers, annotated with "?"]

Device Driver
[Figure: Driver X and Hardware; exported services dsx.1 … dsx.m; imported services osx.1 … osx.n]
• Model the interfaces (defined in C)
  • Export (functions provided by the driver)
  • Import (functions used by the driver)

Metrics
Three metrics for profiling:
• Propagation: how errors flow through the OS
• Exposure: which OS services are affected
• Diffusion: which drivers are the sources
[Impact analysis • Metrics • Case study (WinCE) • Results]

Service Error Permeability
1. Service Error Permeability:
• Measures one driver's influence on one OS service
• Used to study service-driver relations
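One simple way to estimate permeability from experiments is to count, for each driver service, how many injections into it produced an observable failure in a given OS service. The sketch below assumes exactly that ratio; the array sizes and names are hypothetical, and the original study's precise definition may differ.

```c
#include <assert.h>
#include <stddef.h>

#define N_DRV_SVCS 3 /* hypothetical number of driver services injected */
#define N_OS_SVCS  4 /* hypothetical number of observed OS services */

/* Assumed estimator: perm[i][j] = (#injections into driver service i
 * that caused a failure in OS service j) / (#injections into i). */
static void estimate_permeability(unsigned failures[N_DRV_SVCS][N_OS_SVCS],
                                  const unsigned injections[N_DRV_SVCS],
                                  double perm[N_DRV_SVCS][N_OS_SVCS])
{
    for (size_t i = 0; i < N_DRV_SVCS; i++) {
        for (size_t j = 0; j < N_OS_SVCS; j++) {
            assert(failures[i][j] <= injections[i]); /* sanity check */
            perm[i][j] = injections[i]
                ? (double)failures[i][j] / injections[i]
                : 0.0; /* no injections: no evidence of propagation */
        }
    }
}
```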

OS Service Error Exposure
2. OS Service Error Exposure:
• An application uses certain services
• How are these services influenced by driver errors?
• Used to compare services

Driver Error Diffusion
3. Driver Error Diffusion:
• Which driver affects the system the most?
• Used to compare drivers
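Given per-pair permeability estimates, exposure and diffusion can be sketched as aggregations over the same table: exposure sums everything flowing into one OS service, diffusion sums everything flowing out of one driver. Plain, unweighted summation and the table dimensions are assumptions of this sketch, not necessarily the exact definitions used in the study.

```c
#include <assert.h>
#include <stddef.h>

#define N_DRIVERS  2 /* hypothetical, e.g. serial and Ethernet */
#define N_DRV_SVCS 2 /* hypothetical services per driver */
#define N_OS_SVCS  3 /* hypothetical observed OS services */

/* perm[x][i][j]: estimated permeability from service i of driver x
 * to OS service j, in [0, 1]. */
typedef double perm_table[N_DRIVERS][N_DRV_SVCS][N_OS_SVCS];

/* Exposure of OS service j: sum over all drivers and their services. */
static double exposure(perm_table perm, size_t j)
{
    assert(j < N_OS_SVCS);
    double sum = 0.0;
    for (size_t x = 0; x < N_DRIVERS; x++)
        for (size_t i = 0; i < N_DRV_SVCS; i++)
            sum += perm[x][i][j];
    return sum;
}

/* Diffusion of driver x: sum over its services and all OS services. */
static double diffusion(perm_table perm, size_t x)
{
    assert(x < N_DRIVERS);
    double sum = 0.0;
    for (size_t i = 0; i < N_DRV_SVCS; i++)
        for (size_t j = 0; j < N_OS_SVCS; j++)
            sum += perm[x][i][j];
    return sum;
}
```

Exposure ranks OS services (which ones an application should worry about), while diffusion ranks drivers (which ones deserve testing or wrapping first).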

Case Study: Windows CE
• Targeted drivers
  • Serial
  • Ethernet
• FI at the interface
  • Data-level errors
• Effects on OS services
  • 4 test applications
[Figure: experimental setup with Host, Manager, Test App, OS, Interceptor, Target Driver, and other Drivers]

Error Model
• Data-level errors in the OS-driver interface
• Wrong values
  • Based on the C type
    • Boundary
    • Special values
    • Offsets
• Transient
  • First occurrence
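The transient, first-occurrence injection can be sketched as an interceptor that sits on the OS-driver interface, replaces one parameter on the first call only, and passes every later call through unchanged. `drv_service`, `arm_injection`, and `intercepted_drv_service` are hypothetical names; a real interceptor has to hook the actual interface functions instead of calling a local stand-in.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical driver service at the OS-driver interface. */
static int drv_service(int value)
{
    return value; /* stands in for real driver behaviour */
}

/* Hypothetical injection state: the corrupted value chosen by the
 * error model for this experiment, and whether it is still pending. */
static int  inj_value = 0;
static bool inj_armed = false;

static void arm_injection(int corrupted_value)
{
    inj_value = corrupted_value;
    inj_armed = true;
}

/* Interceptor: corrupts the parameter on the first call only
 * (transient, first-occurrence trigger), then forwards unchanged. */
static int intercepted_drv_service(int value)
{
    if (inj_armed) {
        inj_armed = false; /* disarm: later calls are untouched */
        value = inj_value;
    }
    return drv_service(value);
}
```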

Impact Analysis
• Impact ascertained via failure mode analysis
• Failure classes:
  • Class NF: No visible effect
  • Class 1: Error, no violation
  • Class 2: Error, violation
  • Class 3: OS crash/hang
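Assigning an experiment outcome to one of these classes can be sketched as a small decision function. The observation fields are hypothetical, and reading "violation" as a breach of the affected service's specification is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

enum failure_class { CLASS_NF, CLASS_1, CLASS_2, CLASS_3 };

/* Hypothetical observations collected by the test applications. */
struct observation {
    bool os_crashed_or_hung; /* OS crash or hang detected */
    bool error_visible;      /* erroneous result seen at an OS service */
    bool spec_violated;      /* assumed: result breaks the service spec */
};

static enum failure_class classify(struct observation o)
{
    if (o.os_crashed_or_hung)
        return CLASS_3;
    /* A specification violation implies a visible error. */
    assert(!o.spec_violated || o.error_visible);
    if (!o.error_visible)
        return CLASS_NF;
    return o.spec_violated ? CLASS_2 : CLASS_1;
}
```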

Error Model
LONG RegQueryValueEx(
    [in]     HKEY    hKey,
    [in]     LPCWSTR lpValueName,
    [in]     LPDWORD lpReserved,
    [out]    LPDWORD lpType,
    [out]    LPBYTE  lpData,
    [in/out] LPDWORD lpcbData);

Service Error Permeability
• Ethernet driver
  • 42 imported services
  • 12 exported services
• Mostly Class 1 failures
• 3 crashes (Class 3)

OS Service Error Exposure
• Serial driver
  • 50 imported services
  • 10 exported services
• Clustering of failures

Driver Error Diffusion
• Higher diffusion for Ethernet
• Mostly Class NF outcomes
• Failures at boot-up

Error Models: “What to Inject?”
• FI's effectiveness depends on the chosen error model being (a) representative of actual errors and (b) effective at triggering “vulnerabilities”.
• Comparative evaluation of the “effectiveness” of different error models:
  • Fewest injections?
  • Most failures?
  • Best “coverage”?
• Propose a composite error model for enhancing FI effectiveness

Chosen Drivers & Error Models
Error models:
• Data-type (DT)
• Bit-flips (BF)
• Fuzzing (FZ)

Error Models – Data-Type (DT) Errors
int foo(int a, int b) {…}
int ret = foo(0x45a209f1, 0x00000000);  /* original call */
int ret = foo(0x80000000, 0x00000000);  /* DT error: a replaced by 0x80000000 */
• Varied #cases depending on the data type
• Requires tracking of the types for correct injection
• Complex implementation but scales well
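For a 32-bit `int` parameter, the data-type model can be sketched as a fixed list of boundary, special, and offset values, each replacing the original argument in one experiment. The exact set below is an illustrative assumption rather than the case list of the original study; it does include 0x80000000 (INT32_MIN), the value used in the DT example. `dt_cases_int32` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DT_MAX_CASES 7 /* size of this illustrative case set */

/* Fill out[] with data-type (DT) replacement values for a 32-bit int
 * parameter: boundary values, special values, and small offsets from
 * the original value. Returns the number of cases written. */
static size_t dt_cases_int32(int32_t orig, int32_t out[DT_MAX_CASES])
{
    size_t n = 0;
    out[n++] = INT32_MIN; /* boundary: 0x80000000 */
    out[n++] = INT32_MAX; /* boundary: 0x7fffffff */
    out[n++] = 0;         /* special values */
    out[n++] = 1;
    out[n++] = -1;
    /* Offsets, computed in unsigned arithmetic to avoid signed
     * overflow; the conversion back wraps on two's-complement targets. */
    out[n++] = (int32_t)((uint32_t)orig + 1u);
    out[n++] = (int32_t)((uint32_t)orig - 1u);
    assert(n == DT_MAX_CASES);
    return n;
}
```

Other C types (pointers, enums, buffer lengths) would need their own case tables, which is why DT injection requires tracking parameter types.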


Error Models – Bit-Flip (BF) Errors
int foo(int a, int b) {…}
int ret = foo(0x45a209f1, 0x00000000);  /* a = 01000101101000100000100111110001 */
int ret = foo(0x45a289f1, 0x00000000);  /* a = 01000101101000101000100111110001 (bit 15 flipped) */
• Typically 32 cases per parameter
• Easy to implement
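A bit-flip injector needs only an XOR with a one-bit mask, which is why the model is easy to implement. This sketch assumes 32-bit parameters; flipping bit 15 of 0x45a209f1 yields 0x45a289f1, the same transition as in the BF example. `flip_bit` and `bf_cases` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Return value with bit `bit` (0 = LSB, 31 = MSB) inverted. */
static uint32_t flip_bit(uint32_t value, unsigned bit)
{
    assert(bit < 32);
    return value ^ (UINT32_C(1) << bit);
}

/* Generate all 32 single-bit-flip variants of one parameter value;
 * out[b] holds the value with bit b flipped. */
static void bf_cases(uint32_t value, uint32_t out[32])
{
    for (unsigned bit = 0; bit < 32; bit++)
        out[bit] = flip_bit(value, bit);
}
```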

Error Models – Fuzzing (FZ) Errors
int foo(int a, int b) {…}
int ret = foo(0x45a209f1, 0x00000000);  /* original call */
int ret = foo(0x17af34c2, 0x00000000);  /* FZ error: a replaced by a random value */
• Selectable number of cases
• Simple implementation
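Fuzzing replaces the parameter with pseudo-random values, and the number of cases is simply how many values are drawn. This sketch uses Marsaglia's xorshift32 generator so that an experiment can be replayed from its seed; the generator choice and the function names are assumptions, not part of the original setup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* xorshift32 PRNG (Marsaglia): small, fast, and reproducible from a
 * seed, so a failing experiment can be replayed. State must be non-zero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    assert(x != 0);
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Generate n fuzzing (FZ) replacement values for one parameter. */
static void fz_cases(uint32_t seed, uint32_t out[], size_t n)
{
    uint32_t state = seed ? seed : 1u; /* xorshift must not start at 0 */
    for (size_t k = 0; k < n; k++)
        out[k] = xorshift32(&state);
}
```

Unlike DT and BF, the case count is a free parameter here, so it trades experimentation time against the chance of hitting a vulnerable value.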

Comparison
Compare error models on:
• Number of failures
• Effectiveness
• Experimentation time
• Identifying services
• Error propagation

Failure Classes & Driver Diffusion

Driver Diffusion: a measure of a driver's ability to spread errors.

Number of Failures (Class 3)

Driver Diffusion (Class 3)

Experimentation Time

Identifying Services (Class 3)
• Which OS services can cause Class 3 failures?
• Which error model identifies the most services (coverage)?
• Is some model consistently better/worse?
• Can we combine models?

Identifying Services (Class 3 + 2)
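One way to reason about combining models is to record, per error model, the set of OS services it identified and take the union as the composite model's coverage. The bitmask encoding (up to 64 services) and the names below are assumptions of this sketch; any masks used with it are illustrative, not measured data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit i set means "OS service i was identified" (e.g. caused a
 * Class 3 failure) by an error model; supports up to 64 services. */
typedef uint64_t service_set;

/* Number of services in a set (population count). */
static unsigned count_services(service_set s)
{
    unsigned n = 0;
    while (s) {
        s &= s - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Coverage of a composite model: services found by any member model. */
static service_set combine(const service_set models[], size_t n_models)
{
    service_set all = 0;
    for (size_t m = 0; m < n_models; m++)
        all |= models[m];
    return all;
}
```

Comparing `count_services(combine(...))` against each single model's count shows whether a composite model adds coverage or whether one model already subsumes the others.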

Bit-Flips: Sensitivity to Bit Position?
[Figure: results per flipped bit position, from MSB to LSB]

Bit-Flips: Bit Position Profile
[Figure: cumulative #services identified]
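A cumulative bit-position profile can be computed by adding bit positions one at a time, here from MSB to LSB, and counting the distinct services identified so far. The per-bit service sets are hypothetical inputs, the bitmask encoding (up to 64 services) is an assumption, and the MSB-first order is just one possible choice.

```c
#include <assert.h>
#include <stdint.h>

/* found[b]: set of OS services (bitmask) identified by flips of bit b.
 * cumulative[k]: number of distinct services identified after flipping
 * the k+1 most significant bits (bit 31 down to bit 31-k). */
static void cumulative_msb_first(const uint64_t found[32],
                                 unsigned cumulative[32])
{
    uint64_t seen = 0;
    for (unsigned k = 0; k < 32; k++) {
        unsigned bit = 31u - k;
        seen |= found[bit];
        unsigned n = 0;
        for (uint64_t s = seen; s; s &= s - 1) /* count set bits */
            n++;
        cumulative[k] = n;
    }
}
```

A curve that flattens early would suggest that flipping only a subset of bit positions already identifies most services, which could cut experimentation time.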

Fuzzing – Number of injections?
