"Efficient Software-based Fault Isolation" by R. Wahbe, S. Lucco, T. E. Anderson, and S. L. Graham. Presenter: Tom Burkleaux
What is the problem? • With closely cooperating software modules, how do we protect against distrusted code? • What is distrusted code? • Code that might corrupt the memory of other cooperating modules • Code that is not adequately tested • Code perhaps written by third parties
Cooperating Modules • Complex systems are often broken down into separate (logical) components • This supports the following goals: • Independent development of individual components • Upgrade and enhance existing components • “Extensibility” – allow other programmers to add functionality
Examples: Cooperating Modules • Micro-kernel operating systems – elements of the OS are moved to user space • Postgres database manager – extensible type system • Microsoft's Object Linking & Embedding (OLE) – extensibility supported by the OS, linking together independently developed software modules
Structuring Cooperating Modules, 1 • How can we structure cooperating modules? • There are two basic choices, each with a distinct advantage: • Share the same address space Low communication cost between modules • Keep modules in separate address spaces Separate protection domains
Structuring Cooperating Modules, 2 • … and each has a disadvantage: • Share the same address space Distrusted code can cause hard-to-find bugs within a system • Keep modules in separate address spaces Cross-protection-domain calls are expensive (RPC), so overall application performance suffers
Structuring Cooperating Modules, 3 [Diagram: the two structures – modules in a shared address space communicating through shared memory, vs. modules in separate protection domains communicating via RPC]
Structuring Cooperating Modules, 4 [Diagram: in a shared address space, shared memory works ("Yes") but distrusted code is a risk ("?"); in separate protection domains, distrusted code is contained but RPC is slow!]
Trends in structuring? • In OSes, which method of structuring cooperating modules is more prevalent? • BSD systems • Mach • Mac OS X • Linux (I think?) • The tendency is to share an address space for performance reasons
Proposed Solution If we want fault isolation, the authors offer a tradeoff: Fault Isolation with – “substantially faster communication between fault domains, at a cost of slightly increased execution time for distrusted modules”
What Does the Solution Look Like? • We want the speed of shared memory • But we want to prevent distrusted code from corrupting the memory of other modules • "Fault Domains" [Diagram: shared memory with distrusted code – corruption of other modules is blocked ("NO") while sharing still works ("Yes")]
Overview of Techniques • The authors suggest two techniques for software-based fault isolation: • Segment Matching • Sandboxing • And two points at which these can be applied: • Compile Time • Object Linking
"Fault Domain" -- Definition [Diagram: a fault domain consists of a code segment and a data segment; all addresses within a segment share a unique pattern of upper bits, the "segment identifier"]
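The upper-bit scheme above can be sketched in a few lines. This is an illustrative model, not the paper's actual layout: the 32-bit addresses, 16-bit shift, and segment-id values are all assumptions.

```python
# Sketch: upper address bits act as the segment identifier.
# Address width, shift amount, and segment ids are illustrative assumptions.

SHIFT = 16                   # low 16 bits index within a segment

def segment_id(addr: int) -> int:
    """Extract the segment identifier (upper bits) from an address."""
    return addr >> SHIFT

DATA_SEG = 0x0043            # hypothetical data-segment identifier

addr_in_data = (DATA_SEG << SHIFT) | 0x1234   # inside the data segment
addr_outside = (0x0099 << SHIFT) | 0x1234     # some other fault domain

print(segment_id(addr_in_data) == DATA_SEG)   # True
print(segment_id(addr_outside) == DATA_SEG)   # False
```

Because every address in a segment shares the same upper bits, checking (or forcing) those bits is enough to confine a memory access to the fault domain.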
Examining Binary Code • For distrusted modules: what about modifying the binary to add a check on every load and store? ("binary patching") • Assume this could be done at load time • But addresses are referenced very frequently, and this method would add extra instructions for each address reference • Also, many such tools rely on identifying compiler-specific idioms to distinguish code from data
Software-Enforced Fault Isolation: Segment Matching • "unsafe instruction" – an instruction that jumps to or stores through an address that can't be statically verified • Jumps through a register are an example • The compiler can add checking code before each unsafe instruction • On typical RISC architectures, the check takes four instructions • It requires dedicated registers, to prevent the checks from being bypassed
Segment Matching, 2 Pseudo-code example. Without dedicated registers, code could jump directly past the check to the last instruction.
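The four-instruction check can be sketched as follows, with each comment naming the corresponding step (move, shift, compare, trap). Register names, the segment layout, and the `SegmentFault` exception are illustrative assumptions, not the paper's code.

```python
# Sketch of segment matching: the paper's four-instruction sequence
# (move, shift, compare, trap) simulated in Python. Layout is illustrative.

SHIFT = 16
DATA_SEG = 0x0043   # hypothetical segment identifier

class SegmentFault(Exception):
    """Stands in for the hardware trap raised on a mismatch."""

def checked_store(memory: dict, target_addr: int, value: int) -> None:
    dedicated_reg = target_addr              # 1. move target into dedicated reg
    scratch_reg = dedicated_reg >> SHIFT     # 2. extract the segment id
    if scratch_reg != DATA_SEG:              # 3. compare with segment register
        raise SegmentFault(hex(target_addr)) # 4. trap on mismatch
    memory[dedicated_reg] = value            # safe: store uses the checked reg

mem = {}
checked_store(mem, (DATA_SEG << SHIFT) | 0x10, 99)   # in-segment: succeeds
try:
    checked_store(mem, 0x00990010, 7)                # out-of-segment: traps
except SegmentFault as fault:
    print("trapped:", fault)
```

Because the address is moved into a dedicated register first, even a wild jump into the middle of the sequence cannot perform an unchecked store through an arbitrary register.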
Segment Matching, 3 With segment matching, we can pinpoint the source of the fault. How does this compare with hardware-based memory protection? [Diagram: distrusted code sharing memory – an illegal access triggers a trap]
Segment Matching, 4 What about the loss of registers? • We need four: an address in the data segment, an address in the code segment, the segment shift amount, and the segment identifier • The authors rely on most modern architectures having at least 32 registers
Software-Enforced Fault Isolation: Address Sandboxing The idea is that we can reduce run-time overhead by giving up information on the source of the fault. Sandboxing: before each unsafe instruction, we simply insert code that sets the upper bits of the target address to the correct segment identifier
Sandboxing, 2 • There is no trap • Only two extra instructions • Recall that segment matching adds four extra instructions
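The two extra instructions correspond to two ALU operations: an AND that clears the upper bits, then an OR that sets them to the segment identifier. A minimal sketch, again with an assumed 16-bit-shift layout:

```python
# Sketch of address sandboxing: two operations (and, or) force the upper
# bits of the target to the module's segment id. No check, no trap.
# Segment layout values are illustrative assumptions.

SHIFT = 16
LOW_MASK = (1 << SHIFT) - 1
DATA_SEG_BITS = 0x0043 << SHIFT    # hypothetical segment id, pre-shifted

def sandbox(addr: int) -> int:
    addr &= LOW_MASK        # instruction 1: clear the upper bits
    addr |= DATA_SEG_BITS   # instruction 2: set the segment identifier
    return addr

in_seg = DATA_SEG_BITS | 0x1234
out_seg = (0x0099 << SHIFT) | 0x1234

print(hex(sandbox(in_seg)))    # in-segment address: unchanged
print(hex(sandbox(out_seg)))   # silently redirected into the segment
```

Note the tradeoff: an out-of-segment address is not rejected, it is quietly rewritten to land inside the module's own segment.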
Sandboxing, 3 Any address access outside the module's segment is prevented. Instead, we access – and potentially corrupt – an incorrect address within our own segment. Execution continues, unaware of the error! [Diagram: the sandbox – a stray access ("?") is redirected back into the module's own segment]
Optimizations • Guard zones are one example of an optimization that the compiler can apply • Avoid address arithmetic for reg + offset instructions: we sandbox only the register, and handle the offset by creating guard zones around the segment
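The guard-zone invariant can be sketched numerically: if only the register is sandboxed, then reg + offset can stray at most one maximum-immediate-offset past either end of the segment, so guard zones of that size (unmapped pages) catch every stray access. The sizes below are illustrative assumptions.

```python
# Sketch of the guard-zone optimization for reg + offset addressing:
# sandbox only the register; guard zones around the segment absorb any
# immediate offset. Segment and offset sizes are illustrative.

SHIFT = 16
SEG_SIZE = 1 << SHIFT
MAX_OFFSET = 1 << 12       # assume immediate offsets fit in 12 bits
GUARD = MAX_OFFSET         # guard zone must cover the largest offset

def within_guarded_region(sandboxed_low: int, offset: int) -> bool:
    """After sandboxing the register alone, reg + offset stays within the
    segment plus its guard zones on either side."""
    ea = sandboxed_low + offset        # low bits are in [0, SEG_SIZE)
    return -GUARD <= ea < SEG_SIZE + GUARD

# Check the extremes: register at either end of the segment, offset at
# either limit -- every combination lands in segment or guard zone.
print(all(within_guarded_region(low, off)
          for low in (0, SEG_SIZE - 1)
          for off in (-MAX_OFFSET, 0, MAX_OFFSET)))   # True
```

This is why the compiler never needs to sandbox the full reg + offset sum: the hardware page protection on the guard zones does the rest.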
Process Resources • We need to prevent distrusted modules from corrupting resources allocated on a per-address-space basis • One idea: make the OS aware of fault domains • Instead, the authors require distrusted modules to access such resources through cross-fault-domain RPC
Implementation • The authors identify two implementation strategies: • 1. Have the compiler emit encapsulation code, and have a verifier confirm the object code at load time • 2. Modify object code at load time ("binary patching") They went with option 1. The problem with modifying object code is forcing the modified code to use only a subset of the registers.
Low Latency Cross Fault RPC • Because distrusted modules are isolated, we need something like LRPC • Efficient software isolation was the first part of the solution • The second part is fast communication across fault domains
Cross Fault RPC, 2 • Constraints on running distrusted code: • Distrusted code can't directly call a function outside its segment, nor return from an outside call via an address on the stack • Code running within a distrusted module has its own execution context
Cross Fault RPC, 3 • As in LRPC, the solution uses stubs • For each pair of fault domains, customized call and return stubs are created for each exported procedure • The stubs run unprotected, outside both domains • They are responsible for copying arguments • And for managing machine state
Cross Fault RPC, 4 Args are copied through a shared buffer. Jump Table: each entry is a control-transfer instruction to a legal entry point outside the domain
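A minimal sketch of the jump-table idea: a fixed, read-only table of legal entry points is the only way out of the domain. The stub names and table layout here are hypothetical, used only to illustrate the mechanism.

```python
# Sketch of a cross-fault-domain jump table: each slot holds a control
# transfer to one legal entry point outside the domain. Modeled as a
# read-only tuple of callables; stub names are hypothetical.

def call_stub(*args):
    """Hypothetical call stub: copies arguments via a shared buffer."""
    return ("copied", args)

def return_stub(result):
    """Hypothetical return stub: restores the caller's machine state."""
    return ("restored", result)

# Lives in the read-only code segment, so distrusted code cannot rewrite
# it -- it can only transfer control to one of these fixed entries.
JUMP_TABLE = (call_stub, return_stub)

def cross_domain_transfer(slot: int, *args):
    return JUMP_TABLE[slot](*args)   # only legal entry points are reachable

print(cross_domain_transfer(0, 1, 2))   # ('copied', (1, 2))
```

The key property is that the set of reachable outside addresses is fixed at link time; sandboxed jumps can only land on table entries, never on arbitrary code.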
Cross Fault RPC, 5 The stubs and the jump table are added to the code. The jump table is placed in the (read-only) code segment, so distrusted code can't modify it.
Evaluation • For evaluation, the authors looked at three questions: • What is overhead for software encapsulation? • How fast is XFD-RPC? • What effect does this solution have on end-user applications?
Evaluation – Software Encapsulation, 1 The authors developed an analytical model to predict cost. Expected overhead is: ((s-instructions − interlocks) / cycles-per-second) / original-execution-time-in-seconds s-instructions = sandbox instructions executed; interlocks = floating-point interlock cycles that the sandbox instructions fill for free
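The model above is a one-line computation; here it is as code, with made-up inputs (the instruction counts, clock rate, and run time below are not the paper's measurements):

```python
# Sketch of the authors' overhead model: extra seconds spent executing
# sandboxing instructions (minus cycles hidden in floating-point
# interlock slots), divided by the original run time. Inputs are made up.

def predicted_overhead(s_instructions: float,
                       interlocks: float,
                       cycles_per_second: float,
                       original_seconds: float) -> float:
    extra_seconds = (s_instructions - interlocks) / cycles_per_second
    return extra_seconds / original_seconds

# e.g. 50M sandbox instructions, 10M hidden in interlocks, a 100 MHz CPU,
# and a 10-second baseline run:
print(predicted_overhead(50e6, 10e6, 100e6, 10.0))   # ~0.04, i.e. ~4%
```

The interlock term is why floating-point-heavy programs show lower overhead: some sandbox instructions execute in cycles the pipeline would otherwise waste.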
Evaluation - Software Encapsulation, 2 The purpose of the model is to help identify 2nd-order effects. To get data, they ran various benchmarks, with the benchmark code encapsulated as "distrusted" modules. They found that the model predicted the average overhead very well. But individual benchmark tests sometimes showed NEGATIVE overhead!
Evaluation - Software Encapsulation, 4 • Explaining the results • Anomalies – 2nd-order effects? Conjecture: "instruction cache mapping conflicts" • Programs with more floating-point operations exhibited less overhead (2.5% vs. 5.6%) • These were compute-heavy benchmarks; they expect I/O-bound programs to show even less overhead Overall, the overhead is small!
Evaluation – Cross-Fault RPC, 1 • How to measure XFD-RPC? Their mechanism spends most of its time saving and restoring registers. • Table 2 (following) shows performance for a NULL cross-fault-domain RPC, compared to a C procedure call and to pipes. • Their call is about one order of magnitude slower than a C procedure call • Other optimized RPCs are, at best, two orders of magnitude slower than a C procedure call
Evaluation – Cross-Fault RPC, 3 Table 3 measures how their system works when applied to Postgres, using the Sequoia 2000 benchmark. Postgres has an extensible type system, which is a recognized safety problem. They compare their system with Postgres' built-in "untrusted function manager" and with traditional hardware protection domains.
Analysis • The authors' Postgres example shows savings over the other methods. • The general savings formula is: • Savings = (1 − r)·tc − h·td • tc: time spent crossing fault domains • td: time spent in distrusted code • h: overhead for encapsulation • r: ratio of their crossing time to hardware RPC crossing time
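The savings formula can be evaluated directly; the example numbers below are illustrative, not taken from the paper:

```python
# Sketch of the savings formula: savings = (1 - r) * t_c - h * t_d.
# Positive when cheaper domain crossings outweigh encapsulation overhead.
# The example inputs are illustrative, not the paper's measurements.

def savings(t_c: float, t_d: float, h: float, r: float) -> float:
    """t_c: seconds spent crossing fault domains
    t_d: seconds spent in distrusted code
    h:   encapsulation overhead (fraction)
    r:   their crossing time / hardware RPC crossing time"""
    return (1 - r) * t_c - h * t_d

# App spends 3 s crossing domains and 7 s in distrusted code, with 4%
# encapsulation overhead and software crossings at 10% of hardware cost:
print(savings(t_c=3.0, t_d=7.0, h=0.04, r=0.10))   # ~2.42 s saved
```

Setting the expression to zero and solving for the crossing fraction gives exactly the breakeven curves graphed in the figures that follow.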
Analysis, 2 • The savings formula can be graphed to illustrate the breakeven curve. • In figures following: X-axis: percentage of time spent crossing domains Y-axis: relative cost of software enforced fault-domain crossing vs hardware method
Breakeven curve
Analysis, 4 • The question is: do the savings from efficient XFD-RPC (over traditional RPC) make up for the encapsulation overhead? • The answer appears to be yes • The authors give an example: if an app spends 30% of its time crossing fault domains, their RPC mechanism needs to be only 10% better • Figure 5 was conservative and assumed everything was protected; usually most of an app is trusted. Figure 6 assumes only 50% of the time is spent in distrusted code.
Breakeven curve
Some Additional Points Why not patch object code? Because compiler-based tools already exist, and patching may lose compiler efficiencies. How many tools have been written for this? For which compiled languages? Any Questions?