Software Fault Tolerance (SWFT): SWIFI in OSs Prof. Neeraj Suri, Constantin Sârbu Dept. of Computer Science, TU Darmstadt, Germany Dependable Embedded Systems & SW Group www.deeds.informatik.tu-darmstadt.de
Fault Removal: Software Testing
So far:
• Verification & Validation
• Testing Techniques
• Static vs. Dynamic
• Black-box vs. White-box
Last time: Testing of dependable systems
• Modeling
• Fault-injection (FI / SWIFI)
• Some existing tools for fault injection
Today: Testing (SWIFI) of operating systems
• WHERE: Error propagation in OSs [Johansson’05]
• WHAT: Error selection for testing [Johansson’07]
• WHEN: Injection trigger selection [Johansson’07]
Next (last before mid-term exam!):
• Profiling the OS extensions (state change @ runtime)
Reminder: SWIFI • General SW • Manipulate bits in memory locations, registers, buses etc. • Emulation of HW faults • Change text segment of processes • Emulation of SW faults (bugs, defects) • Dynamic: E.g., Op-code switch during operation • Static: Change source code and recompile (a.k.a. mutation) • What is different in OSs? • OSs act as a mediator between HW and user SW applications • Kernel mode – low accessibility • A failure of the OS often means failure of the whole system • Often source code not available • Add-on kernel extensions written by parties other than the OS producer -> lack of experience • Etc.
OS Robustness Testing Efforts at DEEDS • Our research topics presented today: • Error propagation profiling • How errors propagate through OS to the user space • “Error Propagation Profiling of Operating Systems” (DSN’05) • Error selection • How an OS reacts to various types of injected errors • “On the Selection of Error Model(s) for OS Robustness Evaluation” (DSN’07) • Error trigger • How to choose the injection instant? • “On the Impact of Injection Triggers for OS Robustness Evaluation” (ISSRE’07) • Slides are the ones presented at each conference! http://www.deeds.informatik.tu-darmstadt.de/aja/
Error Propagation Profiling of Operating Systems Andréas Johansson & Neeraj Suri Department of Computer Science Technische Universität Darmstadt, Germany Presented at DSN 2005
Motivation [Figure: system stack with Applications, Libraries, Operating System, HW/Drivers] Paper Objectives • Investigate experimental error propagation profiling of OS interfaces/services • Quantitative and Metrics! • Dynamism & Operational Profiles • Black box with no internal access
Profiling [Figure: two views of a component graph with nodes A, B, C, D, E, F; one labeled "Increasingly bad", the other with two nodes flagged "!"]
Profiling • Experimental technique to ascertain “vulnerabilities” • Identify (potential) sources, error propagation & hot spots, etc. • Estimate their “effects” on applications • Component enhancement with “wrappers” • if (X > 100 && Y < 30) then Exception(); • Location of wrappers • Aspects • Metrics for error propagation profiles • Experimental analysis
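The wrapper bullet above can be read as an input check placed in front of a component. A minimal sketch in C, assuming a hypothetical `component_service` and an assumed error-return convention (neither is from the slides); only the predicate `X > 100 && Y < 30` is taken from the slide:

```c
#include <stdbool.h>

/* Hypothetical component entry point; stands in for any OS or driver service. */
static int component_service(int x, int y) { return x + y; }

/* Value returned when the wrapper rejects a call (assumed convention). */
#define WRAPPER_REJECTED (-1)

/* Predicate from the slide: the input combination treated as erroneous. */
static bool violates_assertion(int x, int y) { return x > 100 && y < 30; }

/* Wrapper: check the inputs first, so a bad call is stopped at this location
   instead of propagating into the component. */
static int wrapped_service(int x, int y, bool *rejected) {
    if (violates_assertion(x, y)) {
        *rejected = true;
        return WRAPPER_REJECTED;
    }
    *rejected = false;
    return component_service(x, y);
}
```

Where such a wrapper is worth placing is what the propagation profile is meant to tell you: it pays off at interfaces that errors are likely to pass through.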
System Model [Figure: layers Applications, Operating System, Drivers; a "?" marks the propagation path under study]
Device Driver [Figure: Driver X on top of Hardware, with exported services ds_x.1 … ds_x.m and imported services os_x.1 … os_x.n] • Model the interfaces (defined in C) • Export (functions provided by the driver) • Import (functions used by the driver)
Error Model • Data level errors in OS-Driver interface • Wrong values • Based on the C-type • Boundary • Special values • Offsets • Transient • First occurrence
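The "Transient, first occurrence" bullets can be illustrated with a small injector that corrupts a parameter only the first time the targeted call is seen. This is a sketch under assumptions: the struct and function names are hypothetical, and a 32-bit integer parameter is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical injector state for one targeted parameter. */
typedef struct {
    int32_t wrong_value;  /* value substituted for the real one */
    bool    fired;        /* true once the error has been injected */
} transient_injector;

/* Returns the value to pass on: corrupted on the first occurrence only,
   unchanged on every later call (transient error). */
static int32_t inject_first_occurrence(transient_injector *inj, int32_t actual) {
    if (!inj->fired) {
        inj->fired = true;
        return inj->wrong_value;
    }
    return actual;
}
```

Each experiment would start with a fresh injector, so exactly one error enters the system per run.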
Metrics Three metrics for profiling • Propagation - how errors flow through the OS • Exposure - which OS services are affected • Diffusion - which drivers are the sources • Impact analysis • Metrics • Case study (WinCE) • Results
Service Error Permeability 1. Service Error Permeability: • Measure one driver’s influence on one OS service • Used to study service-driver relations
OS Service Error Exposure 2. OS Service Error Exposure: • An application uses certain services • How are these services influenced by driver errors? • Used to compare services
Driver Error Diffusion 3. Driver Error Diffusion: • Which driver affects the system the most? • Used to compare drivers
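The formulas for the three metrics did not survive in this transcript. The sketch below shows one plausible way to estimate them from injection counts, and is not the paper's exact definition. It assumes permeability is the fraction of injected errors that become visible at a service, exposure is the sum of permeabilities over drivers for one service, and diffusion is the sum over services for one driver. It also collapses each driver's interface into a single entry. The array sizes and names are hypothetical.

```c
#include <stddef.h>

#define N_DRIVERS  2   /* hypothetical sizes, for illustration only */
#define N_SERVICES 3

/* Injection campaign result for one (driver, OS service) pair. */
typedef struct {
    unsigned injections;  /* errors injected at the driver interface */
    unsigned propagated;  /* of those, errors observed at the OS service */
} pair_result;

/* Permeability estimate: fraction of injected errors reaching the service. */
static double permeability(pair_result r) {
    return r.injections ? (double)r.propagated / r.injections : 0.0;
}

/* Exposure of service s: permeabilities summed over all drivers. */
static double exposure(const pair_result m[N_DRIVERS][N_SERVICES], size_t s) {
    double sum = 0.0;
    for (size_t d = 0; d < N_DRIVERS; d++) sum += permeability(m[d][s]);
    return sum;
}

/* Diffusion of driver d: permeabilities summed over all services. */
static double diffusion(const pair_result m[N_DRIVERS][N_SERVICES], size_t d) {
    double sum = 0.0;
    for (size_t s = 0; s < N_SERVICES; s++) sum += permeability(m[d][s]);
    return sum;
}
```

Under this reading the sums are relative scores, not probabilities: they only support ranking services against services and drivers against drivers.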
Impact Analysis • Impact ascertained via failure mode analysis • Failure classes: • Class NF: No visible effect • Class 1: Error, no violation • Class 2: Error, violation • Class 3: OS Crash/Hang ?
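The failure classes can be encoded as a small decision function. This is a sketch: the three observation flags are hypothetical, and "violation" is assumed to mean that the observed behavior violates the OS service specification.

```c
#include <stdbool.h>

typedef enum { CLASS_NF, CLASS_1, CLASS_2, CLASS_3 } failure_class;

/* What the test application observed after one injection (assumed fields). */
typedef struct {
    bool error_visible;  /* some deviation was observed */
    bool spec_violated;  /* the deviation violates the service specification */
    bool os_crashed;     /* the OS crashed or hung */
} observation;

/* The most severe outcome wins: crash/hang, then violation, then plain error. */
static failure_class classify(observation o) {
    if (o.os_crashed)     return CLASS_3;
    if (!o.error_visible) return CLASS_NF;
    return o.spec_violated ? CLASS_2 : CLASS_1;
}
```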
Case Study: Windows CE • Targeted drivers • Serial • Ethernet • FI at interface • Data level errors • Effects on OS services • 4 Test applications [Figure: experimental setup with Host, Manager, Test App, OS, Interceptor, Target Driver and other Drivers]
Error Model LONG RegQueryValueEx([in] HKEY hKey, [in] LPCWSTR lpValueName, [in] LPDWORD lpReserved, [out] LPDWORD lpType, [out] LPBYTE lpData, [in/out] LPDWORD lpcbData);
Service Error Permeability • Ethernet driver • 42 imported svcs • 12 exported svcs • Most Class 1 • 3 Crashes (Class 3)
OS Service Error Exposure • Serial driver • 50 imported svcs • 10 exported svcs • Clustering of failures
Driver Error Diffusion • Higher diffusion for Ethernet • Most Class NF • Failures at boot-up
On the Selection of Error Model(s) for OS Robustness Evaluation Andréas Johansson, Neeraj Suri (TU Darmstadt, Germany), Brendan Murphy (Microsoft Research, Cambridge, UK) Presented at DSN 2007
Objectives: “What to Inject?” • FI is only as effective as the chosen error model: the model must be (a) representative of actual errors, and (b) effective at triggering “vulnerabilities”. • Comparative evaluation of “effectiveness” of different error models: • Fewest injections? • Most failures? • Best “coverage”? • Propose a composite error model for enhancing FI effectiveness
Error Models Focus • Target errors arising in device drivers • Main source of OS failures [1, 2] • Developed by HW vendors • Continually evolving • Considered error models • Data-type • Bit-flips • Fuzzing [1] Ganapathi et al., LISA’06 [2] Chou et al., SOSP’01
System Model [Figure: Applications use OS-App services of the Operating System; the Operating System and Drivers interact through OS-Driver services]
Injection Methodology • OS reconfigured to use Interceptor • Interceptor intercepts function calls between OS and driver • Driver binary modified to use Interceptor • Implemented for Windows CE .Net [Figure: Interceptor placed between Operating System and Device Driver]
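The interception idea can be sketched in plain C with function pointers: the caller is redirected to an interceptor, which may corrupt a parameter before forwarding the call to the real implementation. This only illustrates the principle; how the Windows CE Interceptor is actually wired in is not shown in the slides, and every name below is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Signature of a hypothetical OS-driver service with one integer parameter. */
typedef int32_t (*service_fn)(int32_t arg);

/* Optional corruption hook: maps the real argument to the injected one. */
typedef int32_t (*corrupt_fn)(int32_t arg);

typedef struct {
    service_fn target;   /* real implementation behind the interface */
    corrupt_fn corrupt;  /* NULL means pass the call through unchanged */
    unsigned   calls;    /* number of intercepted calls, e.g. for logging */
} interceptor;

/* Every call crosses this function; injection happens here or not at all. */
static int32_t intercept_call(interceptor *ic, int32_t arg) {
    ic->calls++;
    if (ic->corrupt != NULL) arg = ic->corrupt(arg);
    return ic->target(arg);
}

/* Illustrative stand-ins for a real service and an injected error. */
static int32_t double_it(int32_t v)  { return 2 * v; }
static int32_t force_zero(int32_t v) { (void)v; return 0; }
```

With `corrupt` set to `NULL` the interceptor is transparent, which is useful for fault-free reference runs.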
Chosen Drivers & Error Models Error Models: • Data-type (DT) • Bit-flips (BF) • Fuzzing (FZ)
Error Models – Data-Type (DT) Errors
int foo(int a, int b) {…}
Original call: int ret = foo(0x45a209f1, 0x00000000);
Injected call (first parameter replaced by 0x80000000): int ret = foo(0x80000000, 0x00000000);
• Varied #cases depending on the data type
• Requires tracking of the types for correct injection
• Complex implementation but scales well
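A data-type injector picks replacement values based on the parameter's C type. The sketch below covers a 32-bit signed `int` only. The selection of values (boundaries, a few special values, offsets of ±1) is an assumption for illustration, not the list used in the paper; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fills out[] with data-type test values for a 32-bit signed int and returns
   how many were written (at most cap). The value selection is illustrative. */
static size_t int32_dt_cases(int32_t actual, int32_t *out, size_t cap) {
    const int32_t candidates[] = {
        INT32_MIN, INT32_MAX,               /* boundary values */
        0, 1, -1,                           /* special values */
        (int32_t)((uint32_t)actual + 1u),   /* offsets from the real value, */
        (int32_t)((uint32_t)actual - 1u),   /* computed unsigned to avoid overflow */
    };
    size_t n = sizeof candidates / sizeof candidates[0];
    if (n > cap) n = cap;
    for (size_t i = 0; i < n; i++) out[i] = candidates[i];
    return n;
}
```

`INT32_MIN` has the bit pattern 0x80000000, the value injected on the slide. Other types (pointers, handles, strings) would need their own value lists, which is why type tracking is required.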
Error Models – Bit-Flip (BF) Errors
int foo(int a, int b) {…}
Original call: int ret = foo(0x45a209f1, 0x00000000);
0x45a209f1 in binary (leading zero omitted): 1000101101000100000100111110001
After flipping bit 15: 1000101101000101000100111110001
Injected call: int ret = foo(0x45a289f1, 0x00000000);
• Typically 32 cases per parameter
• Easy to implement
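The bit-flip model comes down to one XOR per test case. A minimal sketch, assuming a 32-bit parameter; the function names are hypothetical:

```c
#include <stdint.h>

/* Flip a single bit (0 = LSB, 31 = MSB) of a 32-bit parameter value. */
static uint32_t flip_bit(uint32_t value, unsigned bit) {
    return value ^ (UINT32_C(1) << bit);
}

/* Generate all 32 single-bit-flip cases for one parameter. */
static unsigned bitflip_cases(uint32_t value, uint32_t out[32]) {
    for (unsigned b = 0; b < 32; b++) out[b] = flip_bit(value, b);
    return 32;
}
```

Calling `flip_bit(0x45a209f1, 15)` gives 0x45a289f1, the injected value shown on the slide.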
Error Models – Fuzzing (FZ) Errors
int foo(int a, int b) {…}
Original call: int ret = foo(0x45a209f1, 0x00000000);
Injected call (first parameter replaced by the random value 0x17af34c2): int ret = foo(0x17af34c2, 0x00000000);
• Selective #cases
• Simple implementation
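Fuzzing replaces the parameter with pseudo-random values, and the experimenter chooses how many cases to run. The sketch below uses a small xorshift generator so that a campaign can be repeated from its seed. Both the generator and the function names are assumptions, not the tool's actual implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* xorshift32: a tiny deterministic PRNG; the state must never be zero. */
static uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Fill out[] with n pseudo-random replacement values for one parameter.
   A zero seed is mapped to 1 to keep the generator valid. */
static void fuzz_cases(uint32_t seed, uint32_t *out, size_t n) {
    uint32_t state = seed ? seed : 1u;
    for (size_t i = 0; i < n; i++) out[i] = xorshift32(&state);
}
```

Logging the seed together with the results makes any failure-triggering value reproducible.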
Comparison Compare Error Models on: • Number of failures • Effectiveness • Experimentation Time • Identifying services • Error propagation
Failure Classes & Driver Diffusion Driver Diffusion [3]: a measure of a driver’s ability to spread errors: [3] Johansson, Suri, DSN’05
Failure Classes & Driver Diffusion Driver Diffusion (Class 3)
Identifying Services (Class 3) • Which OS services can cause Class 3 failures? • Which error model identifies most services (coverage)? • Is some model consistently better/worse? • Can we combine models?
Identifying Services (Class 3 + 2) • Which OS services can cause Class 3 failures? • Which error model identifies most services (coverage)? • Is some model consistently better/worse? • Can we combine models?
Bit-Flips: Sensitivity to Bit Position? [Figure: results per flipped bit position, from MSB to LSB]
Bit-Flips: Bit Position Profile [Figure: cumulative #services identified, by bit position]
Composite Error Model • Let’s take the best of bit-flips and fuzzing • Bit-flips: bit 0-9 and 31 • Fuzzing: 10 cases • ~50% fewer injections • Identifies the same service set
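The composite model can be sketched as a generator that emits the selected bit-flips followed by the fuzzing cases for one 32-bit parameter. The bit positions (0-9 and 31) and the 10 fuzzing cases come from the slide. The PRNG, the seed handling and all names are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define COMPOSITE_BITFLIPS 11  /* bits 0-9 plus bit 31 */
#define COMPOSITE_FUZZ     10  /* random replacement values */
#define COMPOSITE_CASES    (COMPOSITE_BITFLIPS + COMPOSITE_FUZZ)

/* xorshift32 step, used here only as a repeatable source of fuzz values. */
static uint32_t next_random(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Writes the composite test cases for one parameter and returns the count. */
static size_t composite_cases(uint32_t actual, uint32_t seed,
                              uint32_t out[COMPOSITE_CASES]) {
    size_t n = 0;
    for (unsigned b = 0; b <= 9; b++)          /* low-order bit-flips */
        out[n++] = actual ^ (UINT32_C(1) << b);
    out[n++] = actual ^ (UINT32_C(1) << 31);   /* MSB flip */
    uint32_t state = seed ? seed : 1u;         /* zero seed is not valid */
    for (int i = 0; i < COMPOSITE_FUZZ; i++)   /* fuzzing cases */
        out[n++] = next_random(&state);
    return n;
}
```

This gives 21 cases per 32-bit parameter, compared with 32 for the full bit-flip model alone.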