Untrustworthy Programming Languages

Andrew Kennedy, MSR Cambridge

Do you trust your programming language?
• Modern programming platforms promise security:
  • The Java security model is based on a customizable "sandbox" in which Java software programs can run safely, without potential risk to systems or users (java.sun.com/security)
  • The .NET Common Language Runtime implements its own secure execution model that is independent of the host platform (Don Box, MSDN Magazine)
• Most articles emphasise type-safety (=> memory safety) of the JVM or CLR
• And of course, special-purpose mechanisms such as Code Access Security (stack-walking), permissions, crypto, etc.
• But that’s not the whole story…

The way it was
• In the past:
  • programming language abstractions made languages “high-level”, i.e. far from the raw metal of the machine
  • good software engineering
  • protected programmers from themselves & others
• If the language contained holes, it was “just” a programming problem
• In any case, nothing was enforced “underneath” except at coarse boundaries (machine, system/user, process)

But now...
• The programming model is part of the security model
  • in particular, its type system
  • but also, other aspects…
• Programmers will assume that abstractions are enforced underneath...
• ...and use them to write secure code.

Eiffel, 1989
• Cook, W.R. (1989). A Proposal for Making Eiffel Type-Safe. In Proceedings of ECOOP'89, S. Cook (ed.), pp. 57-70. Cambridge University Press.
• Bertrand Meyer, on the unsoundness of Eiffel: “Eiffel users universally report that they almost never run into such problems in real software development.”

Ten years later: Java

Secure programming platforms
• Java source → Java compiler → JVML (bytecodes), executed on the JVM (Java Virtual Machine)
• C# → C# compiler → CIL, executed on the .NET CLR (Common Language Runtime)
• C++ → C++ compiler → CIL, executed on the .NET CLR
• Visual Basic → VB compiler → CIL, executed on the .NET CLR

Type safety
• Ensures
  • data safety: can access memory only through typed objects
  • code safety: can access components only according to their interface
• Isolates software processes (“Application Domains” in .NET)
  • used for downloadable plug-ins for UI in next version of Windows
• Importance of type safety is now widely appreciated
  • Microsoft would issue an immediate “critical update” if a type safety bug were discovered
[Insert war stories here]

Type loophole => anything goes
• Exploit a type loophole to execute arbitrary code. Here’s a recipe.
• Define a delegate type D and create a delegate object from an empty method:
    delegate void D();
    public static void DoNothing() { }
    D d = new D(DoNothing);
• Define a SpoofD class with an int field spoofing the (internal) function pointer field of the delegate type:
    class SpoofD { public int fptr; ... }
• Now pretend that the delegate object has type SpoofD (via the type loophole):
    SpoofD sd = ...loophole magic...(d);
• Set the spoof function pointer field to the address of your malicious code:
    sd.fptr = my_bad_code;
• Invoke the delegate (d and sd refer to the same object):
    d();

Beyond type safety
• How do programmers reason about security properties of their code? Or about their code at all?
• We might hope that:
  • A C# programmer can reason about code armed only with the C# language spec and specs for libraries used by the code
• Unfortunately, it seems that a C# programmer also needs
  • Some understanding of how C# is translated into IL
  • Some understanding of the behaviour of IL
  • Some understanding of parts of the standard library not mentioned in the language spec or used by the program

Example 1: “Privacy through override”
• In C# (and Java), overridden methods cannot be invoked directly except by the overriding method
• This property has been used by programmers for security purposes:
    class InsecureWidget {
      // No checking of argument
      public virtual void Put(string s) { … }
      …
    }
    class SecureWidget : InsecureWidget {
      // Validate argument and pass on
      public override void Put(string s) { Validate(s); base.Put(s); }
    }
    …
    SecureWidget sw = new SecureWidget();
    // We can’t avoid validation of arguments to Put, can we?
    // Oh, yes we can! Direct call on superclass
    ldloc sw
    ldstr "Invalid string"
    call void InsecureWidget::Put(string)
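The C# half of this example can be made concrete with a small sketch. The list field, the `Count` property, the body of `Validate`, and the `OverrideDemo` helper are assumptions added for illustration; only the two class names and `Put` come from the slide. The sketch shows the guarantee the programmer is relying on: from C#, even a call through a base-typed reference dispatches to the validating override.

```csharp
using System;
using System.Collections.Generic;

public class InsecureWidget
{
    // Strings accepted so far (hypothetical state, not on the slide).
    protected readonly List<string> stored = new List<string>();

    // No checking of the argument.
    public virtual void Put(string s) { stored.Add(s); }

    public int Count { get { return stored.Count; } }
}

public class SecureWidget : InsecureWidget
{
    // Hypothetical validation rule, for illustration only.
    private static void Validate(string s)
    {
        if (string.IsNullOrEmpty(s))
            throw new ArgumentException("invalid string");
    }

    // Validate the argument, then pass it on to the base implementation.
    public override void Put(string s)
    {
        Validate(s);
        base.Put(s);
    }
}

public static class OverrideDemo
{
    // Returns true if an invalid string is rejected even though the call
    // is made through a reference of the base type: C# always compiles
    // this call as a virtual call, so the override runs.
    public static bool RejectedViaBaseReference()
    {
        InsecureWidget w = new SecureWidget();
        try { w.Put(""); }
        catch (ArgumentException) { return true; }
        return false;
    }
}
```

Nothing in this C# code can reach `InsecureWidget.Put` on a `SecureWidget` without going through `Validate`; the bypass on the slide exists only at the IL level.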

Analysis
• What went wrong?
  • In C#, overridden methods can only be invoked through “base” calls
  • In IL, they can be called directly
  • So there are programs in IL that can provoke behaviour not possible from C#
• What is a good way to characterize this?
  • Translation from C# to IL fails to be fully abstract
  • See “Protection in Programming Language Translation”, Abadi, 1998
• How can we fix it?
  • Not easily: IL was designed for multiple languages, with conflicting goals

An ideal: full abstraction
• Ensure that all abstractions of the programming language are enforced by the runtime
  • programmers don’t have to know what’s underneath
  • if they understand the programming language, they understand the platform programming model
• Ensure that translation from C# to IL is fully abstract
  • C# program: properties that hold here...
  • IL program: ...also hold here

Full abstraction
• Two programs are equivalent if they have the same behaviour in all contexts of the language, e.g.
    class Secret {
      private int f;
      public Secret(int fv) { f = fv; }
      public void Set(int fv) { f = fv; }
    }
  ≈
    class Secret {
      public Secret(int fv) { }
      public void Set(int fv) { }
    }
  (the private field f is written but never read, so no C# context can tell the two apart)
• A translation is “fully abstract” if it respects equivalence
• For us:
  • the “translation” is from source language (C# etc) to MSIL
  • if there exist contexts (e.g. other code) in MSIL that can distinguish equivalent source programs, then the translation fails to be fully abstract
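Stated symbolically, the definition might be sketched as follows. The notation is an assumption, not taken from the slides: S is the source language, T the target language, Ctx_L the program contexts of language L, obs whatever behaviour is considered observable, and \mathcal{T} the translation.

```latex
% Contextual equivalence in a language L (L = S or L = T):
\[
P \approx_L Q \;\iff\; \forall C \in \mathrm{Ctx}_L.\ \mathrm{obs}(C[P]) = \mathrm{obs}(C[Q])
\]
% The translation \mathcal{T} from S to T is fully abstract when:
\[
\forall P, Q.\quad P \approx_S Q \;\iff\; \mathcal{T}(P) \approx_T \mathcal{T}(Q)
\]
```

The failures catalogued in this talk are failures of the left-to-right direction: two source programs are equivalent in C#, but some IL context tells their translations apart.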

Full abstraction for Java
• Translation from Java to JVML is not quite fully abstract (Abadi, 1998)
• At least one failure: access modifiers in inner classes
  • a late addition to the language
  • not directly supported by the JVM
  • compiled by translation => impractical to make fully abstract without changing the JVM

Full abstraction for C#?
• A number of failures
• Excuse: multiple languages target the CLR, with different goals
• The JVM was designed for a single language: Java. (Almost) full abstraction was probably an accident, though in retrospect it’s a good thing.
• For C#/CLR, we can catalogue failures of full abstraction and propose fixes
  • either: change the translation from C# to IL
  • or: reduce expressivity of IL (fewer IL contexts)
  • or: increase the expressivity of C# (more C# contexts)
• At least: document the failures, educate programmers, provide tools to spot insecure programming patterns

Example 2: Encapsulation of object state
• Programmer expectation: instances of types whose API ensures immutability are immutable.
  • Ex: String, DateTime, Int32
• Boxing shouldn’t make any difference, should it?
    // A dictionary keyed on strings
    class StringDict {
      private Hashtable dict;
      public object Get(string s) { return dict[s]; }
      internal void Set(string s, object o) { … }
      …
    }
    static StringDict personalData;
    // In a module far away…
    // We cannot update from here
    object salary = personalData.Get("Salary");
    // Oh, yes we can! Just get pointer to interior
    ldloc salary
    unbox int32
    ldc.i4 1000000
    stind.i4

Example 2: Encapsulation of object state
• An equivalence that is not preserved:
    public static int Foo(int x) { object y = (object) x; Bar(y); return x; }
  ≈
    public static int Foo(int x) { object y = (object) x; Bar(y); return (int) y; }
• Fix?
  • In CLR type system: disallow update after unboxing
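A minimal sketch of why the two versions of Foo agree when Bar is written in C#. The class name, the method names, and the body of `Bar` are assumptions for illustration; the slide leaves `Bar` unspecified. In C#, unboxing copies the value out of the box, so `Bar` has no way to change the boxed integer itself.

```csharp
public static class BoxingDemo
{
    // Stand-in for arbitrary C# code that receives the boxed value.
    private static int Bar(object y)
    {
        int copy = (int)y;   // unboxing in C# copies the value out of the box
        copy = 1000000;      // changes only the local copy, not the box
        return copy;
    }

    // First version from the slide: returns the original argument.
    public static int FooReturnsArgument(int x)
    {
        object y = (object)x;
        Bar(y);
        return x;
    }

    // Second version from the slide: returns the value read back from the box.
    public static int FooReturnsUnboxed(int x)
    {
        object y = (object)x;
        Bar(y);
        return (int)y;
    }
}
```

Both methods return their argument unchanged here. The IL sequence on the previous slide is what separates them, because it writes through the pointer produced by `unbox`.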

Example 3: this is a valid object instance?
• Instance methods are always invoked on a valid instance, surely?
    class Foo {
      // Instance registered for privileged action
      private static Foo registered = null;
      // Only called from this module (internal access)
      internal void Register() { registered = this; }
      public void Bar() {
        if (this == registered) {
          // Perform privileged action
        }
      }
    }
    // We can’t execute privileged action from another module
    // Oh, yes we can! Just call-direct-with-null
    ldnull
    call void Foo::Bar()

Example 3: this is a valid object instance?
• An equivalence that is not preserved:
    class C { public bool Foo() { return true; } … }
  ≈
    class C { public bool Foo() { return this != null; } … }
• Fix?
  • In C# compiler: explicit check-for-null at start of method
  • In CLR: check-for-null at call-site (as with virtual call)
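The first proposed fix can be approximated by hand. The sketch below writes out the check that the slide suggests the compiler could insert; returning a bool from `Bar` and the choice of exception type are assumptions made so the behaviour is observable.

```csharp
using System;

public class Foo
{
    // Instance registered for the privileged action.
    private static Foo registered = null;

    // Only callable from this assembly (internal access).
    internal void Register() { registered = this; }

    // Returns true if the privileged action would be performed.
    public bool Bar()
    {
        // Hand-written version of the "explicit check-for-null at start
        // of method" fix: reject a direct IL call that passes null as this.
        if (ReferenceEquals(this, null))
            throw new InvalidOperationException("Bar called without an instance");

        return this == registered;
    }
}
```

With the check in place, the null `this` from the IL call no longer compares equal to the initial null value of `registered`. Initialising `registered` to a private sentinel object instead of null would be another way to close this particular hole.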

Example 4: Exceptions are instances of System.Exception?
    try {
      // perform some action, to completion
    } catch (Exception e) {
      // undo action whenever an exception was thrown in try-block
    }
    // Action either ran to completion, or was fully undone
    // Not necessarily! From IL, can throw any object
    newobj instance void System.Object::.ctor()
    throw
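One way to write the undo logic so that it does not depend on the type of the thrown object is a completion flag with a finally block. This is a sketch of a defensive pattern, not something proposed on the slide; the `Undoable` class and its parameter names are hypothetical.

```csharp
using System;

public static class Undoable
{
    // Runs action; if it does not run to completion for any reason,
    // runs undo before the thrown object continues to propagate.
    public static void Run(Action action, Action undo)
    {
        bool completed = false;
        try
        {
            action();
            completed = true;   // reached only if action returned normally
        }
        finally
        {
            // The finally block runs whatever was thrown, so this does not
            // rely on the thrown object deriving from System.Exception.
            if (!completed) undo();
        }
    }
}
```

Unlike the catch block on the slide, this version does not swallow the failure: the caller still sees whatever was thrown, after the undo step has run.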

Example 5: Booleans are two-valued?
    void Foo(bool b) {
      bool c = !b;
      if (!c != b) {
        Console.WriteLine("This cannot happen");
      }
    }
    // Oh yes it can!
    ldc.i4 2
    call void Foo(bool)

Example 5: Booleans are two-valued?
• An equivalence that is not preserved:
    static bool Foo(bool x, bool y) { return (x == false) == (y == false); }
  ≈
    static bool Foo(bool x, bool y) { return x == y; }
• Fix?
  • Change C# compilation of == and != for bool so that it cares only about zero/non-zero-ness

Weak abstractions
• Some abstractions aren’t broken; they’re just a bit weak
  • arrays are always mutable
    • developers forget this and define “readonly” properties with array types
  • run-time types break “privacy by subsumption”
    • solution to array problem would be to return array as an IEnumerable (a read-only enumeration interface)
    • but run-time types let programmer “cast” back to the array
• Other abstractions are broken not by IL but by library classes
  • e.g. delegates (closures) would “encapsulate” code & object state if it weren’t for the System.Delegate.Target and System.Delegate.Method properties.
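These weaknesses can all be shown in plain C#, with no IL involved. In the sketch below, `HostList`, `Counter`, and `WeakAbstractionDemo` are hypothetical names invented for illustration; `Array.Clone` and `Delegate.Target` are the only library members relied on. The defensive copy in `HostsCopy` is one possible mitigation, shown as an assumption rather than a recommendation from the talk.

```csharp
using System;
using System.Collections.Generic;

public class HostList
{
    private readonly string[] hosts = { "alpha", "beta" };

    // "Readonly" property: the reference cannot be replaced, but the
    // elements of the returned array can still be overwritten.
    public string[] Hosts { get { return hosts; } }

    // Upcast to IEnumerable<string>: the run-time type is still string[].
    public IEnumerable<string> HostsView { get { return hosts; } }

    // Defensive copy: callers receive an array that is not the internal one.
    public IEnumerable<string> HostsCopy { get { return (string[])hosts.Clone(); } }

    public string First { get { return hosts[0]; } }
}

public class Counter
{
    private int count;
    public void Increment() { count++; }
    public int Count { get { return count; } }
}

public static class WeakAbstractionDemo
{
    // Writes through the array-typed property.
    public static string OverwriteThroughProperty(HostList list)
    {
        list.Hosts[0] = "changed-via-property";
        return list.First;
    }

    // Casts the IEnumerable view back to the underlying array and writes.
    public static string OverwriteThroughCast(HostList list)
    {
        string[] raw = (string[])list.HostsView;
        raw[0] = "changed-via-cast";
        return list.First;
    }

    // The same cast succeeds on the copy, but the write hits the copy only.
    public static string OverwriteCopy(HostList list)
    {
        string[] raw = (string[])list.HostsCopy;
        raw[0] = "changed-copy-only";
        return list.First;
    }

    // A delegate bound to an instance method exposes that instance.
    public static Counter RecoverTarget(Action increment)
    {
        return (Counter)increment.Target;
    }
}
```

The first two methods change the internal state of `HostList` from outside the class; the third leaves it untouched. `RecoverTarget` hands back the very object the delegate was supposed to encapsulate.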

Why bother?
• Even if the translation from C# to IL were fully abstract, reasoning about C# programs would still be hard.
  • Programmers make mistakes in writing secure code
  • Tools for automating reasoning about programs are still in their infancy
  • There are many other pitfalls in the language
• So why bother about full abstraction?
• Because it’s a great starting point:
  • The ability to reason about C# programs “in C#” is hugely simplifying
  • Even better: if we could cut down to a subset of C# that suffices

Formalize?
• Proofs of full abstraction are hard
  • We don’t have a complete formal model of C#
  • We don’t have a complete formal model of IL
• So what to do?
  • Optimist: even if we can’t formalize, we can identify failures, and fix them all
  • Pessimist: we can never be sure that we have full abstraction. Instead, focus on certain patterns, prove that these are watertight. Example:
    • prove that integers are safe!
    • prove that private fields don’t leak

Conclusions
• The programming model is a vital part of the security story for .NET and Java
• Programmers need to know what they can trust
• “Full abstraction” is the ideal
  • My choice would be to fix the holes we know about
  • Might be hard to do
  • If we can’t or won’t, we should educate developers
• Type safety is now taken for granted as a necessity
  • In the future, full abstraction also?
