
Software Fault Tolerance (SWFT) SWIFI in OSs

Software Fault Tolerance (SWFT) SWIFI in OSs. Prof. Neeraj Suri Constantin Sârbu Dept. of Computer Science TU Darmstadt, Germany. Dependable Embedded Systems & SW Group www.deeds.informatik.tu-darmstadt.de. So far: Verification & Validation Testing Techniques Static vs. Dynamic


Presentation Transcript


  1. Software Fault Tolerance (SWFT) SWIFI in OSs Prof. Neeraj Suri, Constantin Sârbu Dept. of Computer Science, TU Darmstadt, Germany Dependable Embedded Systems & SW Group www.deeds.informatik.tu-darmstadt.de

  2. Fault Removal: Software Testing • So far: Verification & Validation; Testing Techniques; Static vs. Dynamic; Black-box vs. White-box • Last time: Testing of dependable systems; Modeling; Fault-injection (FI / SWIFI); Some existing tools for fault injection • Today: Testing (SWIFI) of operating systems • WHERE: Error propagation in OSs [Johansson’05] • WHAT: Error selection for testing [Johansson’07] • WHEN: Injection trigger selection [Johansson’07] • Next lecture: Profiling the OS extensions (state change @ runtime)

  3. Reminder: SWIFI • General SW • Manipulate bits in memory locations, registers, buses etc. • Emulation of HW faults • Change text segment of processes • Emulation of SW faults (bugs, defects) • Dynamic: E.g., Op-code switch during operation • Static: Change source code and recompile (a.k.a. mutation) • What is different in OSs? • The OS acts as a mediator between HW and user SW applications • Kernel mode – low accessibility • A failure of the OS often means failure of the whole system • Source code often not available • Add-on kernel extensions written by parties other than the OS producer -> lack of experience • Etc.

  4. OS Robustness Testing Efforts at DEEDS • Our research topics presented today: • Error propagation profiling • How errors propagate through OS to the user space • “Error Propagation Profiling of Operating Systems” (DSN’05) • Error selection • How an OS reacts to various types of injected errors • “On the Selection of Error Model(s) for OS Robustness Evaluation” (DSN’07) • Error trigger • How to choose the injection instant? • “On the Impact of Injection Triggers for OS Robustness Evaluation” (ISSRE’07) • Slides are the ones presented at each conference! • PhD thesis: http://tuprints.ulb.tu-darmstadt.de/epda/000943 • A. Johansson webpage: http://www.deeds.informatik.tu-darmstadt.de/aja

  5. Error Propagation Profiling of Operating Systems Andréas Johansson & Neeraj Suri Department of Computer Science Technische Universität Darmstadt, Germany Presented at DSN 2005

  6. Motivation [Figure: system stack with Applications, Libraries, Operating System, HW/Drivers] Paper Objectives • Investigate Experimental Error Propagation Profiling of OS Interfaces/Svcs • Quantitative and Metrics! • Dynamism & Operational Profiles • Black Box with no internal access

  7. Profiling [Figure: two copies of a component graph with nodes A to F; annotation “Increasingly bad”; two “!” markers]

  8. Profiling • Experimental technique to ascertain “vulnerabilities” • Identify (potential) sources, error propagation & hot spots, etc. • Estimate their “effects” on applications • Component enhancement with “wrappers” • if (X > 100 && Y < 30) then Exception(); • Location of wrappers • Aspects • Metrics for error propagation profiles • Experimental analysis
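The wrapper bullet on this slide can be sketched in C roughly as follows. This is a minimal illustration, not the authors' implementation: `component_service`, `wrapped_service` and `WRAPPER_REJECTED` are hypothetical names, and the slide's `X > 100 && Y < 30` condition is simply reused as the check.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical error code returned when the wrapper rejects a call. */
#define WRAPPER_REJECTED (-1)

/* Hypothetical wrapped component; stands in for a real service. */
int component_service(int x, int y)
{
    /* The component relies on the wrapper to filter this input region. */
    assert(!(x > 100 && y < 30));
    return x + y;
}

/* Wrapper: applies the slide's example check before forwarding the call. */
int wrapped_service(int x, int y)
{
    if (x > 100 && y < 30) {
        /* Plays the role of Exception() on the slide: stop the error here. */
        fprintf(stderr, "wrapper: rejected x=%d y=%d\n", x, y);
        return WRAPPER_REJECTED;
    }
    return component_service(x, y);
}
```

Callers use `wrapped_service` in place of the component, so an erroneous input is stopped at the wrapper location instead of propagating further.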

  9. System Model [Figure: Applications, Operating System, Drivers; “?” annotation]

  10. Device Driver • Model the interfaces (defined in C) • Export (functions provided by the driver): dsx.1 … dsx.m • Import (functions used by the driver): osx.1 … osx.n [Figure: Driver X with its exported and imported services, above the Hardware]

  11. Error Model • Data level errors in OS-Driver interface • Wrong values • Based on the C-type • Boundary • Special values • Offsets • Transient • First occurrence

  12. Metrics Three metrics for profiling • Propagation - how errors flow through the OS • Exposure - which OS services are affected • Diffusion - which drivers are the sources • Impact analysis • Metrics • Case study (WinCE) • Results

  13. Service Error Permeability 1. Service Error Permeability: • Measure one driver’s influence on one OS service • Used to study service-driver relations

  14. OS Service Error Exposure 2. OS Service Error Exposure: • An application uses certain services • How are these services influenced by driver errors? • Used to compare services

  15. Driver Error Diffusion 3. Driver Error Diffusion: • Which driver affects the system the most? • Used to compare drivers
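The formulas for the three metrics did not survive in this transcript. The sketch below shows one plausible way to compute them, assuming definitions along the lines of the DSN’05 paper: permeability is estimated as the fraction of injections that propagate, exposure sums permeabilities over all drivers for one OS service, and diffusion sums them over all OS services for one driver. The array sizes, function names, and the simplification that `perm[d][s]` is already aggregated over all functions of driver `d` are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define N_DRIVERS  2   /* hypothetical, e.g. serial and Ethernet */
#define N_SERVICES 3   /* hypothetical number of OS services */

/* Estimated permeability: fraction of injected errors that propagated. */
double permeability(unsigned propagated, unsigned injected)
{
    assert(injected > 0 && propagated <= injected);
    return (double)propagated / (double)injected;
}

/* Exposure of OS service s: sum of permeabilities over all drivers. */
double exposure(double perm[N_DRIVERS][N_SERVICES], size_t s)
{
    assert(s < N_SERVICES);
    double sum = 0.0;
    for (size_t d = 0; d < N_DRIVERS; d++)
        sum += perm[d][s];
    return sum;
}

/* Diffusion of driver d: sum of permeabilities over all OS services. */
double diffusion(double perm[N_DRIVERS][N_SERVICES], size_t d)
{
    assert(d < N_DRIVERS);
    double sum = 0.0;
    for (size_t s = 0; s < N_SERVICES; s++)
        sum += perm[d][s];
    return sum;
}
```

Under these assumptions the values are relative scores for ranking services and drivers against each other, not probabilities, since the sums can exceed 1.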

  16. Impact Analysis • Impact ascertained via failure mode analysis • Failure classes: • Class NF: No visible effect • Class 1: Error, no violation • Class 2: Error, violation • Class 3: OS Crash/Hang

  17. Case Study: Windows CE • Targeted drivers • Serial • Ethernet • FI at interface • Data level errors • Effects on OS services • 4 Test applications [Figure: Host, Manager, Test App, OS, Interceptor, Target Driver, Drivers]

  18. Error Model LONG RegQueryValueEx([in] HKEY hKey, [in] LPCWSTR lpValueName, [in] LPDWORD lpReserved, [out] LPDWORD lpType, [out] LPBYTE lpData, [in/out] LPDWORD lpcbData);

  19. Service Error Permeability • Ethernet driver • 42 imported svcs • 12 exported svcs • Most Class 1 • 3 Crashes (Class 3)

  20. OS Service Error Exposure • Serial driver • 50 imported svcs • 10 exported svcs • Clustering of failures

  21. Driver Error Diffusion • Higher diffusion for Ethernet • Most Class NF • Failures at boot-up

  22. On the Selection of Error Model(s) for OS Robustness Evaluation Andréas Johansson, Neeraj Suri (TU Darmstadt, Germany), Brendan Murphy (Microsoft Research, Cambridge, UK) Presented at DSN 2007

  23. Objectives: “What to Inject?” • FI’s effectiveness depends on the chosen error model being (a) representative of actual errors, and (b) effective at triggering “vulnerabilities”. • Comparative evaluation of “effectiveness” of different error models: • Fewest injections? • Most failures? • Best “coverage”? • Propose a composite error model for enhancing FI effectiveness

  24. Error Models Focus • Target errors arising in device drivers • Main source of OS failures [1, 2] • Developed by HW vendors • Continually evolving • Considered error models • Data-type • Bit-flips • Fuzzing References: [1] Ganapathi et al., LISA’06 [2] Chou et al., SOSP’01

  25. System Model Applications OS-App services Operating System OS-Driver services Drivers

  26. Injection Methodology • OS reconfigured to use Interceptor • Interceptor intercepts function calls between OS and driver • Driver binary modified to use Interceptor • Implemented for Windows CE .Net [Figure: Operating System, Interceptor, Device Driver]
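The interception idea can be illustrated with a small in-process analogue in C. This is only a sketch under assumptions: the real setup reconfigures the OS and modifies the driver binary, whereas here a function pointer is swapped, and `real_service`, `install_interceptor` and the single-parameter corruption are hypothetical. The error is injected once, on the first call, mirroring the transient, first-occurrence error model described earlier in the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*service_fn)(int a, int b);

/* Hypothetical service sitting on the OS-driver interface. */
int real_service(int a, int b) { return a - b; }

/* Interceptor state: where to forward, what to inject, whether it fired. */
struct interceptor {
    service_fn target;
    int injected_value;
    bool fired;
};

static struct interceptor g_icpt;

/* Stands between caller and target; corrupts `a` on the first call only. */
int intercepted_service(int a, int b)
{
    assert(g_icpt.target != NULL);
    if (!g_icpt.fired) {
        g_icpt.fired = true;          /* transient: inject exactly once */
        a = g_icpt.injected_value;
    }
    return g_icpt.target(a, b);
}

/* Arms the interceptor and returns the pointer the caller should use. */
service_fn install_interceptor(service_fn target, int injected_value)
{
    g_icpt.target = target;
    g_icpt.injected_value = injected_value;
    g_icpt.fired = false;
    return intercepted_service;
}
```

After `install_interceptor(real_service, 100)`, the first call through the returned pointer sees the corrupted argument and every later call passes through unchanged.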

  27. Chosen Drivers & Error Models Error Models: • Data-type (DT) • Bit-flips (BF) • Fuzzing (FZ)

  28. Error Models – Data-Type (DT) Errors int foo(int a, int b) {…} int ret = foo(0x45a209f1, 0x00000000);

  29. Error Models – Data-Type (DT) Errors int foo(int a, int b) {…} int ret = foo(0x45a209f1, 0x00000000); 0x80000000

  30. Error Models – Data-Type (DT) Errors int foo(int a, int b) {…} int ret = foo(0x80000000, 0x00000000); • Varied #cases depending on the data type • Requires tracking of the types for correct injection • Complex implementation but scales well
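A data-type injection for an `int` parameter might look like the sketch below. The table of test values is an assumption: only `0x80000000` (`INT_MIN` for a 32-bit `int`) appears on the slides, and the body of `foo` is a hypothetical stand-in for the targeted service.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Illustrative cases for type int. Only INT_MIN (0x80000000 for a 32-bit
   int) is taken from the slides; the other values are assumptions. */
static const int INT_CASES[] = { 0, 1, -1, INT_MIN, INT_MAX };
#define N_INT_CASES (sizeof INT_CASES / sizeof INT_CASES[0])

/* Hypothetical target service: reports -1 for a negative first argument. */
int foo(int a, int b)
{
    (void)b;
    return (a < 0) ? -1 : 0;
}

/* Replace the first argument with data-type case i, then call the target. */
int inject_dt_first_arg(size_t i, int a, int b)
{
    assert(i < N_INT_CASES);
    (void)a;  /* the original value is discarded under this error model */
    return foo(INT_CASES[i], b);
}
```

Each C type needs its own case table, which is why the number of cases varies with the data type and the injector has to track parameter types.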

  31. Error Models – Data-Type (DT) Errors

  32. Error Models – Bit-Flip (BF) Errors int foo(int a, int b) {…} int ret = foo(0x45a209f1, 0x00000000);

  33. Error Models – Bit-Flip (BF) Errors int foo(int a, int b) {…} int ret = foo(0x45a209f1, 0x00000000); 1000101101000100000100111110001

  34. Error Models – Bit-Flip (BF) Errors int foo(int a, int b) {…} int ret = foo(0x45a209f1, 0x00000000); 1000101101000100000100111110001 1000101101000101000100111110001

  35. Error Models – Bit-Flip (BF) Errors int foo(int a, int b) {…} int ret = foo(0x45a289f1, 0x00000000); 1000101101000101000100111110001 • Typically 32 cases per parameter • Easy to implement
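The example on these slides corresponds to flipping bit 15 (counting the least significant bit as bit 0), which turns `0x45a209f1` into `0x45a289f1`. A minimal C sketch of the operation, with `flip_bit` as a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Flip bit `pos` (0 = LSB, 31 = MSB) of a 32-bit parameter value. */
uint32_t flip_bit(uint32_t value, unsigned pos)
{
    assert(pos < 32);
    return value ^ (UINT32_C(1) << pos);
}
```

Calling `flip_bit` once for each position from 0 to 31 yields the 32 cases per 32-bit parameter mentioned on the slide.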

  36. Error Models – Fuzzing (FZ) Errors int foo(int a, int b) {…} int ret = foo(0x45a209f1, 0x00000000);

  37. Error Models – Fuzzing (FZ) Errors int foo(int a, int b) {…} int ret = foo(0x45a209f1, 0x00000000); 0x17af34c2

  38. Error Models – Fuzzing (FZ) Errors int foo(int a, int b) {…} int ret = foo(0x17af34c2, 0x00000000); • Selective #cases • Simple implementation
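Fuzzing replaces the parameter with a random value. The sketch below generates a reproducible sequence of replacement values; the choice of an xorshift32 generator and the names `xorshift32` and `fuzz_cases` are assumptions for illustration, since the slides do not say how the random values were drawn.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One step of the xorshift32 generator; the state must be non-zero. */
uint32_t xorshift32(uint32_t *state)
{
    assert(state != NULL && *state != 0);
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fill out[0..n-1] with n fuzzed replacement values for one parameter. */
void fuzz_cases(uint32_t seed, uint32_t *out, unsigned n)
{
    uint32_t state = seed;
    for (unsigned i = 0; i < n; i++)
        out[i] = xorshift32(&state);
}
```

A fixed seed makes a campaign repeatable, and the number of cases `n` is a free parameter, matching the "selective #cases" bullet.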

  39. Comparison Compare Error Models on: • Number of failures • Effectiveness • Experimentation Time • Identifying services • Error propagation

  40. Failure Classes & Driver Diffusion

  41. Failure Classes & Driver Diffusion Driver Diffusion [3]: a measure of a driver’s ability to spread errors: [3] Johansson, Suri, DSN’05

  42. Number of Failures (Class 3)

  43. Failure Classes & Driver Diffusion Driver Diffusion (Class 3)

  44. Experimentation Time

  45. Identifying Services (Class 3) • Which OS services can cause Class 3 failures? • Which error model identifies most services (coverage)? • Is some model consistently better/worse? • Can we combine models?

  46. Identifying Services (Class 3 + 2) • Which OS services can cause Class 3 failures? • Which error model identifies most services (coverage)? • Is some model consistently better/worse? • Can we combine models?

  47. Bit-Flips: Sensitivity to Bit Position? [MSB] [LSB]

  48. Bit-Flips: Bit Position Profile Cumulative #services identified

  49. Fuzzing – Number of injections?

  50. Composite Error Model • Let’s take the best of bit-flips and fuzzing • Bit-flips: bit 0-9 and 31 • Fuzzing: 10 cases • ~50% fewer injections • Identifies the same service set
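The composite model can be sketched as a generator that emits the selected bit-flips followed by the fuzzing cases for one 32-bit parameter. The function name, the use of `rand()` and the way three calls are combined into a 32-bit value are assumptions for illustration; only the bit positions (0 to 9 and 31) and the count of 10 fuzzing cases come from the slide.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define N_BF_CASES  11  /* bits 0-9 and bit 31 */
#define N_FZ_CASES  10
#define N_COMPOSITE (N_BF_CASES + N_FZ_CASES)

/* Fill out[] with the composite cases for one 32-bit parameter value. */
void composite_cases(uint32_t value, unsigned seed, uint32_t out[N_COMPOSITE])
{
    unsigned n = 0;

    /* Bit-flip part: low bits 0-9, then the most significant bit. */
    for (unsigned bit = 0; bit <= 9; bit++)
        out[n++] = value ^ (UINT32_C(1) << bit);
    out[n++] = value ^ (UINT32_C(1) << 31);

    /* Fuzzing part: rand() guarantees only 15 random bits per call,
       so three calls are combined to cover all 32 bits. */
    srand(seed);
    for (unsigned i = 0; i < N_FZ_CASES; i++) {
        uint32_t r = ((uint32_t)rand() << 30)
                   ^ ((uint32_t)rand() << 15)
                   ^ (uint32_t)rand();
        out[n++] = r;
    }
    assert(n == N_COMPOSITE);
}
```

This gives 21 cases per 32-bit parameter in the sketch; the "~50% fewer injections" figure is the slide's own result and is not derived here.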
