Language-Based Safety Mechanisms Stanford University CS 444A, Autumn 99 Software Development for Critical Applications Armando Fox & David Dill {fox,dill}@cs.stanford.edu
Concepts Overview & Outline • Static approaches • “Safe by design” (limiting the language) • Static analysis/type-safe languages • Dynamic approaches • Interpreters and sandboxes • Dynamic dataflow analysis • A few examples (and problems) • Java, the Exokernel, VMware, SFI, Janus, Interface Compilation • As usual…each bullet is the subject of volumes of papers…this is just an introduction to the landscape
Contrast With David’s “Req Spec” • RS is about verifying a program (or FSM) in the abstract • SFI is about securing them in practice • The two are complementary • Ex: “Transitions in FSM cover all possibilities” • What is “all”, really? • Recall: dreaming up desired emergent properties • Compare: Intel P6 bus protocol verification vs. implementation validation
What Is “Safety” in this context? • Primary emphasis: prevent buggy/malicious app from doing harm to others • Don’t interfere with other apps directly (read/write their data or files) • Don’t interfere with other apps indirectly (hog OS resources so other apps are denied service) • Don’t crash or corrupt the OS • particularly important, since OS usually is the “trusted arbiter” of limited resources • Non-goal: stability of the isolated app.
Techniques • Two basic families of techniques: 1. Limit things at runtime 2. Limit things at compile time • Many schemes use a combination of both • Runtime schemes typically rely on some OS and/or hardware support
Background: “The Thin Red Line” [diagram: a column of user pages above the kernel pages, with user code on one side of the line and kernel code on the other] • Separates untrusted user space(s) from trusted kernel space • Kernel manages hardware, shared resources, … • If you can bend the kernel to your will, you can do serious damage • Typical implementation: hardware VM support • Each user process has its own page tables (managed by the kernel) • Certain addresses mapped to kernel pages (sketch below)
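A minimal user-space sketch of the line in action; the kernel address below is illustrative, not a guaranteed mapping. Dereferencing an address that the process's page tables reserve for the kernel simply faults:

    /* thin_red_line.c: user code cannot touch kernel pages.
     * The address is illustrative; on 64-bit Linux the kernel half
     * of the address space begins near 0xffff800000000000. */
    #include <stdio.h>

    int main(void) {
        volatile long *kernel_addr = (volatile long *)0xffff800000000000UL;
        /* No user-accessible page is mapped here, so the MMU faults
         * and the kernel delivers SIGSEGV: hardware enforces the line. */
        long v = *kernel_addr;    /* dies here */
        printf("read %ld\n", v);  /* never reached */
        return 0;
    }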
Call Gates • Call gates (or call descriptors, or traps, or…) • Controlled breach in the thin red line • Typically involve an address space change, which relies on VM; so they are slow and expensive • Implementation often uses exception-handling capability of processor [diagram: user code crossing through a gate into kernel code] (example below)
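As a concrete Linux-flavored illustration (one gate design among several), every system call is such a crossing; syscall(2) makes the trap explicit:

    /* call_gate.c: a system call is a controlled breach in the line.
     * On x86-64 Linux this executes a trap instruction that enters
     * the kernel only at a fixed, kernel-chosen entry point. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void) {
        long pid = syscall(SYS_getpid);  /* raw crossing, by trap number */
        printf("pid via call gate: %ld\n", pid);
        return 0;
    }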
Background: Virtual Machines • In practice, a VM provides a combination of a language execution environment and a “pseudo-OS” runtime system • “guest” VM may virtualize hardware resources differently from “host” OS • Safety is often not a primary goal of a VM • The “guest” and “host” OS’s may be the same or different with respect to… • Machine language/programmer-visible architecture • Virtualization of resources • Common flavor to various approaches: Control access to “unsafe” language/VM features
VM Examples • Java: artificial-machine-in-a-real-machine • Provides a language, a runtime, and OS-like abstractions (network, filesystems, etc.) • Centralized Java Security Manager enforces security policies • For the most part, runs in user mode • VMware: virtualize any x86 OS inside any other (well, almost) • Every VM “sees” x86 protected-mode environment • Within a VM, policies enforced by guest OS • Across VM’s, virtualized hardware is isolated • User must grant a certain level of trust to VMware host program
What Can You Do With This? • Limit what the language can express • “Unsafe” operations are defined out of existence • “Never put off till runtime what you can do at compile time” • Limit what can be done at runtime • Perhaps in combination with language limiting • Each approach has pros and cons
Static Analysis, Type-Safe Languages • Goal: To limit the damage a program can do, limit what can be expressed in the source language • Assumes binaries are tamper-evident • Assumes only trusted tools used to build binaries • Assumes trusted tools are working correctly! • Language features/limitations may allow you to prove some invariants • Example: Backward branching disallowed → finite-length programs finish in finite time • Example: Pointers disallowed → dangling pointer dereferences vanish • Contrast: SFI or inserting guard code
Example: Spin and Modula-3 • SPIN (Bershad et al., early 90’s): a user-extensible microkernel • Extension language: Modula-3, a type-safe, object-oriented language • Why type safety? • Why object oriented? • The extension checker and compiler
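One way to see “why type safety”: a C extension can forge a pointer into arbitrary kernel state, and Modula-3 simply cannot express that. A sketch of the unsafe idiom (the struct is hypothetical), which the SPIN extension checker therefore never has to consider:

    /* forge.c: what a type-UNSAFE extension language permits.
     * sched_state is a hypothetical kernel structure. */
    struct sched_state { int priority; };

    void hostile_extension(void) {
        /* Manufacture a pointer from a magic integer and scribble on
         * kernel state. Legal C; inexpressible in safe Modula-3,
         * whose references can only come from the allocator. */
        struct sched_state *s = (struct sched_state *)0x00c0ffee;
        s->priority = 255;
    }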
Pros & Cons of Static Analysis - Requires that code be written in that specific language • Sometimes it’s actually desirable to have a simpler language! (e.g. Exokernel generalized packet filter) • Other times languages may be too limited or awkward • May also rely on integrity of tool chain - Languages with rich type systems and class hierarchies confound this approach • Checking virtual function calls • Casting between “safe” types (e.g. int to enum)
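The cast hole is easy to demonstrate; a minimal C sketch, where a cast quietly defeats whatever the checker proved about the type:

    /* enum_hole.c: casting between "safe" types defeats static checks. */
    enum state { IDLE, RUNNING, DONE };

    const char *name(enum state s) {
        static const char *names[] = { "idle", "running", "done" };
        return names[s];                /* safe only for valid states */
    }

    int main(void) {
        enum state s = (enum state)42;  /* compiles without complaint */
        return name(s) != 0;            /* out-of-bounds read at runtime */
    }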
Static analysis, cont’d. - Relies on integrity of interpreter or binaries • What if the Java guys forgot some of the security checks? • VM interpreter may need semi-privileged access to get at the “real” resources controlled by the host OS • Or at least, OS must verify signed code segments (ActiveX does this) + May allow strong formal proofs of program safety • Usually done by showing that a particular high-level construct can never produce “unsafe” low-level code • Can prove from the source code, if transformations are “correctness-preserving” (or “semantics preserving”)
At Runtime: Classic SFI and Janus • SFI: “If program stays in its sandbox, it can’t damage other programs.” • Dangerous operations/references surrounded by interpolated “guard code” (sketch below) • Dangerous references can also be “pinned” to the sandbox by overwriting upper address bits • Note, this breaks program correctness! But the focus of SFI is preventing harm to others, not to oneself • Janus: “If program can’t make system calls, it can’t damage the OS [and therefore other programs].” • Some programs break because they don’t check system call results
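A sketch of the classic pinning guard; the segment base and mask are illustrative constants, and real SFI tools rewrite object code rather than source:

    /* sfi_guard.c: address "pinning" in the style of classic SFI. */
    #include <stdint.h>

    #define SEG_BASE 0x20000000UL  /* the sandbox segment's upper bits */
    #define SEG_MASK 0x000fffffUL  /* offset bits the program controls */

    static inline void guarded_store(uintptr_t addr, long val) {
        /* Force the upper bits: an out-of-segment address is silently
         * redirected into the sandbox. That may break the buggy
         * program's own correctness, but it cannot harm anyone else. */
        addr = SEG_BASE | (addr & SEG_MASK);
        *(long *)addr = val;
    }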
Pros & cons of runtime approaches + Use high-confidence machine-level mechanisms • Based on hardware-level mechanisms, e.g. VM, traps • In practice, hardware implementation errors for these are extremely rare (why?) + Can be used with arbitrary “legacy” code - No onus on programmer to make potential error conditions explicit (e.g. assertions) • So runtime has no idea what to do to “recover” - Doesn’t guarantee correct behavior--only safety to others
Dynamic Dataflow Analysis • Potentially unsafe operations must always be denied, to be conservative • If done statically, renders code impotent • Idea: quarantine the data that may be “contaminated” by user (taintperl works this way)

    print STDERR "Enter file name:";
    $x = <STDIN>;                 # $x is tainted (user input)
    …more code…
    $z = "/tmp/safe_file.txt";    # $z is clean
    $y = "$sysdir/$x";            # $y is tainted
    system("cat $y");             # disallowed!
    system("cat $z");             # OK
Interface Compilation • Problem: interfaces are a syntactic abstraction that usually carries no semantics • Semantics might be useful for… • Special-case optimizations (e.g. file I/O, specialization by call site) • Safety of called proc, or error handling in case of failure • Is the interface too narrow? • Semantic type info may be lost (Unix) • Semantic properties such as “liveness” are not preserved across the interface (hidden state) - example to follow
Exploiting Semantics • Example 1: File I/O

    fd = open(filename);
    /* …do some file operations… */
    close(fd);
    /* …more code… */
    read(fd, buf, 4096);   /* certain to fail! */

• Example 2: type impoverishment

    read(int fd, void *buf, size_t n);

• What if buf is unaligned or not big enough? • No way to tell from call syntax
Interface Compilation With MAGIK • Provides abstractions for dealing with interfaces • Iterators over the function calls • Accessors for the data structures manipulated by each call: what type? Compile-time constant? Access to internal fields of structure? Etc. • Allows programmer to write C-like code “extensions” using these functions and accessors • Original source and extensions are compiled together into common intermediate form • Intermediate form can be optimized using traditional methods before machine targeting
IC as an Orthogonal Mechanism • Can retrofit existing “legacy” code (provided source is available) • Admits of incremental improvements • Safety concerns/development can be kept separate from mainline logic for maintainability • Some cool implemented examples • Type-aware I/O for C • Safe signal handling (prevents calling non-reentrant library functions inside a signal handler) • Common thread: uses semantic information that cannot be extracted from source alone • Compare with “emergent properties” in req. spec.
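The signal-handling example is easy to state directly in C; a sketch of the reentrancy bug that the safe-signal-handling extension prevents (printf is not async-signal-safe; write is):

    /* signals.c: the reentrancy hazard flagged inside a handler. */
    #include <signal.h>
    #include <unistd.h>

    static void handler(int sig) {
        (void)sig;
        /* printf("got signal\n"); -- WRONG: printf may re-enter stdio
         * mid-update and deadlock or corrupt its buffers. */
        const char msg[] = "got SIGINT\n";
        (void)write(STDOUT_FILENO, msg, sizeof msg - 1);  /* safe */
    }

    int main(void) {
        signal(SIGINT, handler);
        pause();               /* wait for the signal to arrive */
        return 0;
    }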
Lessons? Anyone? • Limits of virtual machines and static analysis • Assumes tools are trustworthy, from a security standpoint • But…buggy == untrustworthy • End-to-end argument suggests falling back on runtime SFI?