The secret life of typecheckers


Introduction
• This presentation is modeled on a paper by Luca Cardelli (Bell Labs, 1985)
• A general view of type-checking will be presented from the perspective of the programming language designer
• We will explore type systems past, present and future

A little history
• Type systems have been around longer than computers
• In the 1920s David Hilbert started a program to formalize mathematics as strings of symbols manipulated by logic/grammar rules
  • The idea was to be able to “mechanically” prove things
• Bertrand Russell, confronting the paradoxes of self-reference, had already attacked the formalization of mathematics by assigning entities to types
  • Entities of each type are built up from entities of the preceding type
• In 1931 Kurt Gödel proved that any consistent formal system expressive enough to encode arithmetic is incomplete, ending Hilbert’s program
• Application to programming languages
  • Computing involves representing and manipulating entities as strings of symbols
  • Problems of representation and self-reference crop up in numerous ways
  • We want to mechanically prove things about programs
  • Types support this

What are types, really?
• Types come into play whenever we have a universe of diverse things with a similar representation
  • Bits in a computer’s memory
  • XML strings
  • DNA
• If you consider these things in the absence of a type system, you have an “untyped universe”
  • This means there is really only one type (the memory word, the DNA base pair, etc.)
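To make the “untyped universe” concrete, here is a minimal sketch that reads the same 32 bits of memory as two different types. It assumes a 32-bit IEEE 754 `float`, which is near-universal but not guaranteed by the C++ standard; `bitsOf` and `floatFrom` are hypothetical helper names.

```cpp
#include <cstdint>
#include <cstring>

// Copy a float's raw representation into an unsigned integer of the same size.
// The bits themselves carry no type; only the type we read them as gives meaning.
std::uint32_t bitsOf(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// The reverse: reinterpret a 32-bit pattern as a float.
float floatFrom(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

Under the IEEE 754 assumption, `bitsOf(1.0f)` is `0x3F800000`, a perfectly ordinary integer that nothing in memory distinguishes from a float.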

Operations in untyped universes
• Any such universe has various operations that can be performed:
  • Adding and subtracting (bit strings)
  • Rendering HTML (XML)
  • Transcription/translation (DNA)
• But these operations are only valid on subsets of the untyped universe
  • Some XML strings represent HTML documents and some don’t
  • Some DNA sequences represent valid genes and some don’t
• What happens if you blow it?
  • Tumbolia, the land of dead hiccups and extinguished lightbulbs (Douglas Hofstadter)
• “The major purpose of a type system is to avoid embarrassing questions about representations, and to forbid situations where these questions might come up” (Cardelli)

Type-checking and programming languages
• Type-checking avoids these embarrassing questions
  • Assigns types to constants, operators, variables, and functions
  • Checks that every operation is performed on inputs of the correct type
  • Accepts only programs that can be proven to have no type errors
• A type-checker reads program code and says “ok” or “not ok, and here’s why”
• By comparison
  • An interpreter reads program code and executes the instructions
  • A compiler reads program code and translates it into a different representation of the same program
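A type-checker of the kind described above can be sketched in a few dozen lines. The toy language here (integer and boolean literals, `+`, and if-then-else) and every name in it (`Kind`, `Expr`, `typeOf`, `verdict`) are assumptions made for illustration; the point is only the shape: assign a type to each subexpression, check each operation against the types of its inputs, and answer “ok” or “not ok, and here’s why”.

```cpp
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// A toy language: int and bool literals, "+", and if-then-else.
enum class Type { Int, Bool };
enum class Kind { IntLit, BoolLit, Add, If };

struct Expr {
    Kind kind;
    std::vector<std::shared_ptr<Expr>> kids;  // operands; an If is assumed to have exactly 3
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr mk(Kind k, std::vector<ExprPtr> kids = {}) {
    return std::make_shared<Expr>(Expr{k, std::move(kids)});
}

// Either a type (the program is ok) or an explanation of why it is not.
struct Check {
    std::optional<Type> type;
    std::string why;
};

Check typeOf(const ExprPtr& e) {
    switch (e->kind) {
    case Kind::IntLit: return {Type::Int, ""};
    case Kind::BoolLit: return {Type::Bool, ""};
    case Kind::Add: {
        for (const ExprPtr& k : e->kids) {
            Check c = typeOf(k);
            if (!c.type) return c;  // propagate the first error found
            if (*c.type != Type::Int) return {std::nullopt, "'+' expects int operands"};
        }
        return {Type::Int, ""};
    }
    case Kind::If: {
        Check cond = typeOf(e->kids[0]);
        if (!cond.type) return cond;
        if (*cond.type != Type::Bool) return {std::nullopt, "'if' condition must be bool"};
        Check a = typeOf(e->kids[1]);
        if (!a.type) return a;
        Check b = typeOf(e->kids[2]);
        if (!b.type) return b;
        if (*a.type != *b.type) return {std::nullopt, "'if' branches differ in type"};
        return a;
    }
    }
    return {std::nullopt, "unknown expression"};
}

std::string verdict(const ExprPtr& e) {
    Check c = typeOf(e);
    return c.type ? std::string("ok") : "not ok: " + c.why;
}
```

Note that the checker never evaluates anything: `if true then 1 else 1 + 1` is accepted and `1 + true` is rejected purely from the types of the parts.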

Type systems in programming languages
• The term type system refers to the range of types that can be assigned to variables and values
  • Base types: int, float, double, etc.
  • User-defined types (e.g. classes, parameterized types, etc.)
• Type systems are somewhat arbitrary, and are inspired largely by the typical instruction sets of modern computers
• You can create different type systems for the same language that are more or less expressive
  • Inexpressive type systems are frustrating; they either accept too many erroneous programs or forbid too many correct ones
  • Expressive type systems are more precise, rejecting as many erroneous programs as possible while accepting a greater percentage of correct ones

Expressiveness and abstract data types
• Imagine a type system that supports only the types int and object
• Now you’re compiling this function:
    int foo (object o) { return o.bar(); }
• Does the type system say yes or no?
  • If yes, we’re overly permissive – the type-checker doesn’t know whether the “bar” method is really available
  • If no, we’re overly restrictive
• The type system needs to be more expressive – it needs to include separate types for each class, etc.
• Expressiveness means having a rich language of types, enabling the type-checker to determine as precisely as possible whether it should accept a program
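The two choices on this slide can be contrasted in a small sketch. The `Object` type below is a hypothetical stand-in for the inexpressive system’s single “object” type, modeled as a table of methods looked up by name; `fooPermissive`, `fooExpressive`, and `HasBar` are likewise illustrative names.

```cpp
#include <functional>
#include <map>
#include <optional>
#include <string>

// Hypothetical "object" type: every value has the same static type,
// and methods are looked up by name at run time.
struct Object {
    std::map<std::string, std::function<int()>> methods;
};

// Permissive choice: accept foo, and find out at run time whether bar exists.
std::optional<int> fooPermissive(const Object& o) {
    auto it = o.methods.find("bar");
    if (it == o.methods.end()) return std::nullopt;  // the "embarrassing question"
    return it->second();
}

// Expressive choice: a separate type per class, so the checker can see bar().
struct HasBar {
    int bar() const { return 7; }
};
int fooExpressive(const HasBar& o) { return o.bar(); }
// Passing a value whose type lacks bar() to fooExpressive is a compile-time error.
```

With the richer type, the question “is `bar` really available?” is answered once, by the compiler, instead of on every call.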

Polymorphism and type inference
• Polymorphism gives type-checkers an even bigger headache
  • Requires a major increase in expressiveness
  • What is the type of a generic List class?
  • What is the type of a generic Sort function?
• Type-checking is simplified by having programmers annotate programs with type information
  • However, this gets painful as the type system becomes more expressive
• The solution is type inference – let the computer figure out all the types
• The goal of type-checking research: maximize the expressiveness of type systems while minimizing the need for programmers to annotate programs with complex type information
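One answer to “what is the type of a generic Sort function?” is sketched below as a C++ template. Informally its type is something like forall (T) { [T] → [T] }, with the unstated assumption that `T` supports `<`; C++ only checks that assumption when `sorted` is used with a concrete `T`. The name `sorted` and the choice of insertion sort are illustrative.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One definition that works for many element types T.
// Takes the vector by value and returns a sorted copy (insertion sort).
template <typename T>
std::vector<T> sorted(std::vector<T> v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        for (std::size_t j = i; j > 0 && v[j] < v[j - 1]; --j) {
            std::swap(v[j], v[j - 1]);
        }
    }
    return v;
}
```

At a call such as `sorted(xs)` the programmer writes no type at all: the compiler infers `T` from the argument, a small instance of the annotation-free style the slide describes.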

Examples
• The best way to explore the subtleties of type systems is to work through examples
• Let’s try a few…

Subtyping
    class Base { };
    class Derived : public Base { };
    int main() {
        Base *b = new Derived();
        Derived *d = b;
    }
• Is this typesafe?
• Should it be accepted by the compiler?
• If you add a dynamic cast (i.e. add further annotations to help the compiler; in C++ this also requires Base to declare at least one virtual function), will the compiler add a runtime check? Should it?
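For reference, here is how standard C++ answers the last question. This sketch makes `Base` polymorphic (a virtual destructor), which `dynamic_cast` requires and which the slide’s empty `Base` lacks; `up` and `down` are hypothetical helper names.

```cpp
// Base must be polymorphic for dynamic_cast to apply.
struct Base {
    virtual ~Base() = default;
};
struct Derived : Base {};

// Upcast: always safe, so the compiler accepts it implicitly.
Base* up(Derived* d) { return d; }

// Downcast: dynamic_cast inserts a run-time check and yields nullptr
// when b does not actually point to a Derived.
Derived* down(Base* b) { return dynamic_cast<Derived*>(b); }
```

By contrast, `static_cast<Derived*>(b)` compiles without any run-time check, and using its result is undefined behavior if `b` is not really a `Derived`. So C++ answers “yes, if you ask for it”: the check exists only when the programmer writes `dynamic_cast`.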

Apples and oranges
    // from one header file
    struct Apple { int x; };
    void appleProcessingService (Apple* a) { }
    // from another header file
    struct Orange { int x; };
    // source file
    int main() {
        appleProcessingService(new Apple());
        appleProcessingService(new Orange());
    }
• Is this typesafe? Should it be accepted by the compiler? Why (or why not)?
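C++ compares class types nominally (by declaration), so the `Orange` call is rejected even though the two structs have identical members. A structural alternative can be sketched with a template that accepts any type whose `x` member supports the operations used. The function names here are illustrative, and the template is only an approximation of structural typing.

```cpp
#include <type_traits>

struct Apple { int x; };
struct Orange { int x; };

// Nominal typing: identical layout does not make Orange* convertible to Apple*.
static_assert(!std::is_convertible_v<Orange*, Apple*>);

int appleProcessingService(const Apple* a) { return a->x + 1; }

// Structural alternative: accept any type for which f->x + 1 compiles
// and converts to int, regardless of the type's name.
template <typename Fruit>
int structuralService(const Fruit* f) { return f->x + 1; }
```

The structural version accepts both fruits; whether it *should* is exactly the question the slide poses.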

How about this one?
    // from header file, US edition of software
    struct Apple { int x; };
    void appleProcessingService (Apple* a) { }
    // from header file, French edition of software
    struct Pomme { int x; };
    // source file
    int main() {
        appleProcessingService(new Apple());
        appleProcessingService(new Pomme());
    }
• Is this typesafe? Should it be accepted by the compiler? Why (or why not)?
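One way to see what is at stake: if the French edition had introduced `Pomme` as another *name* for the same type, C++ would accept the call. A separately declared struct, however, is a new type. `PommeStruct` below is a hypothetical name used to show the contrast.

```cpp
#include <type_traits>

struct Apple { int x; };

// An alias: Pomme and Apple are two names for one type.
using Pomme = Apple;
static_assert(std::is_same_v<Pomme, Apple>);

// A separate declaration with identical members is a distinct type.
struct PommeStruct { int x; };
static_assert(!std::is_same_v<PommeStruct, Apple>);

int appleProcessingService(const Apple* a) { return a->x; }
```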

Math expressions
    int main() {
        int x = 123;
        int y = 234;
        int z = x / y;
    }
• Is this typesafe? Should it be accepted by the compiler?
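The program above type-checks, but its result may surprise: with two `int` operands, C++ `/` is integer division, truncating toward zero, so `z` is 0. Converting one operand selects floating-point division instead. The helper names are illustrative.

```cpp
// int / int: integer division, truncated toward zero (123 / 234 == 0).
int intQuotient(int x, int y) { return x / y; }

// Converting one operand to double selects floating-point division.
double realQuotient(int x, int y) { return static_cast<double>(x) / y; }
```

The type system is doing its job here; it simply assigns `/` a different meaning than arithmetic class does.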

Wouldn’t it be cool if…
• We had a “rational” datatype?
    int main() {
        int w = 123;
        int x = 234;
        rational y = w / x;
        rational z = w ^ 0.5;
    }
• Any problems here?
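A rational type is easy to sketch for the first assignment. The `Rational`, `makeRational`, and `divide` names are hypothetical. The second assignment shows the catch: 123 ^ 0.5 would be √123, which is irrational, so no value of a rational type can hold it (and in real C++, `^` is bitwise XOR and does not accept a `double` operand at all).

```cpp
#include <numeric>
#include <stdexcept>

// A minimal rational: num/den in lowest terms, with den > 0.
struct Rational {
    long num;
    long den;
};

Rational makeRational(long num, long den) {
    if (den == 0) throw std::invalid_argument("zero denominator");
    if (den < 0) { num = -num; den = -den; }  // keep the sign in the numerator
    long g = std::gcd(num, den);              // gcd(0, den) == den, so 0 becomes 0/1
    return {num / g, den / g};
}

// Exact division of two ints: the meaning the slide wishes "/" had.
Rational divide(int w, int x) { return makeRational(w, x); }
```

Here `divide(123, 234)` yields 41/78, since both numbers share a factor of 3.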

What kind of error is this?
    int main() {
        int x = 1;
        int y = 0;
        int z = x / y;
    }
• Could type systems help us here?

What if we introduced…
• A “nonzero” datatype?
• Say the compiler requires the divisor to be of type “nonzero”:
    int main() {
        int x = 1;
        nonzero y = 0;
        int z = x / y;
    }
• Good idea? Or not?
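Here is one way to approximate the “nonzero” idea in today’s C++, under the assumption that a run-time check at construction is acceptable. `NonZero` and `divide` are hypothetical names. Because the constructor is `explicit`, the slide’s `NonZero y = 0;` would not even compile; the check moves to the single place where a `NonZero` is created, and `divide` never needs to repeat it.

```cpp
#include <stdexcept>

// The only way to obtain a NonZero is through a constructor that checks.
class NonZero {
public:
    explicit NonZero(int v) : value_(v) {
        if (v == 0) throw std::invalid_argument("NonZero cannot hold 0");
    }
    int value() const { return value_; }

private:
    int value_;
};

// The divisor's type rules out division by zero at every call site.
int divide(int x, NonZero y) { return x / y.value(); }
```

Rejecting the literal `0` at compile time would need more machinery than this sketch attempts; it only shows how a type can carry an invariant past the point where it was checked.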

Fibonacci strikes back
• Is this typesafe? Could a type-checker prove it?
    nonzero fib (int x) {
        if (x < 2) { return 1; }
        else { return fib(x-1) + fib(x-2); }
    }
• How about this?
    int inputAndParseNumberFromUser () { /* … */ }
    int main() {
        nonzero x = inputAndParseNumberFromUser();
    }
• Options?
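A checker cannot prove `fib` returns a nonzero value from “nonzero” alone, because nonzero + nonzero can be zero (1 + −1). A stronger type helps: if it knows that positive + positive is positive, the recursion goes through. The sketch below uses a hypothetical `Positive` type whose author vouches for that lemma in `operator+` (ignoring `int` overflow, which it does not handle). For user input, nothing is provable statically; one option is a checked conversion returning `std::optional`, which forces the caller to handle failure. All names here are illustrative.

```cpp
#include <optional>
#include <stdexcept>

class Positive {
public:
    explicit Positive(int v) : value_(v) {
        if (v <= 0) throw std::invalid_argument("not positive");
    }
    int value() const { return value_; }

    // Lemma encoded in the type: the sum of two positives is positive.
    // The unchecked constructor is trusted here; int overflow is ignored.
    friend Positive operator+(Positive a, Positive b) {
        return Positive(a.value_ + b.value_, Trusted{});
    }

private:
    struct Trusted {};
    Positive(int v, Trusted) : value_(v) {}
    int value_;
};

// Type-checks only because Positive + Positive is declared to be Positive.
Positive fib(int x) {
    if (x < 2) return Positive(1);
    return fib(x - 1) + fib(x - 2);
}

// For input known only at run time: a checked conversion the caller must inspect.
std::optional<Positive> asPositive(int parsed) {
    if (parsed <= 0) return std::nullopt;
    return Positive(parsed);
}
```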

User-constructed types
• Data abstraction implies the ability for programmers to create new types
• How do we express the type of variable foo in this example?
    struct { int x; float y; } foo;
• Type theorists usually write the type something like this: (int, float)
• The type of an array of integers would be: [int]
• An array of arrays of integers would be: [[int]]
• The type of a function with an int argument returning a float would be: int → float

User-constructed types
• The operators (), [], and → are type constructors
  • They take types as arguments and define new types
• Once you have type constructors, your type system can contain as many types as you like
• The type-checker has to cope with all of this, providing a syntax for programmers to write all these types if necessary
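In C++ the standard library templates play the role of these type constructors. The alias templates `Tuple`, `Array`, and `Fn` below are hypothetical names chosen to mirror the notation on the previous slide; composing them shows how the set of types becomes unbounded.

```cpp
#include <functional>
#include <tuple>
#include <type_traits>
#include <vector>

// Type constructors: each takes types as arguments and yields a new type.
template <typename... Ts> using Tuple = std::tuple<Ts...>;         // (int, float)
template <typename T> using Array = std::vector<T>;                // [int]
template <typename A, typename R> using Fn = std::function<R(A)>;  // int → float

// They compose freely.
using Foo = Tuple<int, float>;                   // (int, float)
using Matrix = Array<Array<int>>;                // [[int]]
using Transform = Fn<Array<int>, Array<float>>;  // [int] → [float]

static_assert(std::is_same_v<Matrix, std::vector<std::vector<int>>>);
```

A C++ `std::tuple` compares by its component types, so it is closer to the theorist’s (int, float) than the anonymous `struct` in the slide, which is a distinct nominal type.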

Polymorphism
• When introducing polymorphic constructs into the language, type constructors are not enough
• Type of the Length function for arrays of integers:
  • [int] → int
• Type of the Length function for arrays of anything:
  • forall (T) { [T] → int }
• Introducing polymorphic types into a type system is analogous to introducing functions into a programming language
• The above type could also be written:
  • forall (U) { [U] → int }
  • U is a type variable and “forall” provides type abstraction
  • Use of “forall” is called universal quantification because any type can be plugged in for U
• Polymorphic types can be specialized:
  • type V = forall (U) { [U] → int }
  • type W = V<string>
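The polymorphic Length function can be sketched as a C++ template, with an explicit instantiation standing in for the specialization V&lt;string&gt;. The analogy is loose: C++ has no way to name the type forall (U) { [U] → int } itself, only its instances. `length` and `lengthOfStrings` are illustrative names.

```cpp
#include <string>
#include <vector>

// forall (U) { [U] → int }: one definition usable at every element type U.
template <typename U>
int length(const std::vector<U>& xs) {
    return static_cast<int>(xs.size());
}

// Specializing at U = string: a pointer to that one instance,
// of type int (*)(const std::vector<std::string>&).
auto lengthOfStrings = &length<std::string>;
```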

Why have type notation?
• Why do we feel the need to write out these complicated types?
  • If you’re writing a function, you only need to write the types of the return value and arguments, not the type of the function itself
• Two reasons:
  • If you’re programming with higher-order functions (which we’ll be doing more of in the future), it’s helpful to write these types
  • These functions do have types, whether or not we write them out – it would be nice to have a standard notation
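Function composition is a small higher-order function whose type, forall (A, B, C) { (B → C) → (A → B) → (A → C) }, is already heavier than its body. The sketch below writes that type out with `std::function` and then lets the compiler work it out with C++14 generic lambdas. `compose` and `composeAuto` are illustrative names. One practical cost of the explicit version: passing raw lambdas to `compose` fails template deduction, because a lambda is not a `std::function`, so callers end up annotating even more.

```cpp
#include <functional>
#include <string>

// Type written out in full: (B → C) → (A → B) → (A → C).
template <typename A, typename B, typename C>
std::function<C(A)> compose(std::function<C(B)> g, std::function<B(A)> f) {
    return [g, f](A a) { return g(f(a)); };
}

// Same idea with the types left for the compiler to infer.
auto composeAuto = [](auto g, auto f) {
    return [g, f](auto a) { return g(f(a)); };
};
```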

Bounded quantification
• Bounded quantification is the idea that only some types can be plugged in for U
• For example, if you had a Length function which could only be used on arrays of different kinds of numbers, you could write:
  • type T = forall (U :: U <= number) { [U] → int }
  • U is constrained to be a number (or subtype thereof)
• But what if you do this?
  • type V = T<string>
  • Is that a type mismatch?

Types and kinds
• In the spirit of Russell, computer scientists generally like to keep these levels separate
  • Higher-level “types of types”, which ensure that type expressions are well formed, are called “kinds”
  • This level of checking is referred to as “kind checking”
• There are countless papers floating around with titles like “Is type a type?” and “A new programming language with type : type”
  • They explore the question of whether a type system can operate on itself, or whether the levels should be kept separate
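C++ has a rudimentary form of kind checking in template template parameters. In the sketch below, `Box` is a hypothetical one-parameter template (kind * → * in the usual notation), and `Apply`’s parameter list states that its first argument must have that kind; supplying a plain type such as `int` there is rejected by the compiler.

```cpp
#include <type_traits>

// A type constructor of kind * → *.
template <typename T>
struct Box {
    T value;
};

// F must be a template taking one type: a kind annotation, in effect.
template <template <typename> class F, typename T>
using Apply = F<T>;

static_assert(std::is_same_v<Apply<Box, int>, Box<int>>);
// Apply<int, int> would not compile: int has kind *, not * → *.
```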

C++ bounded quantification question
    template <typename T>
    class Copier {
        T myStruct;
    public:
        void copy () { myStruct.x = myStruct.y; }
    };
    struct IntPair { int x, y; };
    struct FloatPair { float x, y; };
    struct BogusPair { float x; char* y; };
    int main() {
        Copier<IntPair> cip; cip.copy();
        Copier<FloatPair> cfp; cfp.copy();
        Copier<BogusPair> cbp; // cbp.copy();
    }

Existential quantification
• The type of a function that takes an array of T and returns an integer, for some single type T:
  • exists (T) { [T] → int }
• At this point we have implicitly defined a type T
  • We know nothing about type T, except that…
  • A function of the above type could take an array of them and return an int
• T is intuitively a little like a class
  • It is a type, and we don’t know anything about how it works, but we know a way in which we can use it
• Universal and bounded quantification provide the theoretical basis for parameterized types
• Existential quantification provides the theoretical basis for information hiding
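Type erasure in C++ gives a working approximation of an existential package: some hidden type T, an array of T values, and a function [T] → int. Once packed, `T` no longer appears in the package’s type, so clients can only use the one operation provided. The `Package` class and its `run` method are hypothetical names for this sketch.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Roughly: exists (T) { values: [T], f: [T] → int }.
class Package {
public:
    template <typename T, typename F>
    Package(std::vector<T> values, F f)
        : run_([values = std::move(values), f = std::move(f)] { return f(values); }) {}

    // The only thing a client can do: apply the hidden function to the hidden values.
    int run() const { return run_(); }

private:
    std::function<int()> run_;  // T has been erased from the type here
};
```

Because `T` is hidden, packages built over different element types can live in the same `std::vector<Package>`, which is the information hiding the slide refers to.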

Type inference
• As type systems become more complicated, it becomes more burdensome for programmers to write out types
• Would you write expressions like this?
    forall (U :: U <= number) { [U] → int } myFunction (…) { … }
• No – you would just avoid higher-order functions
• The solution is type inference

Type inference
• Allows programmers to omit type declarations and have the compiler infer them
• Promises all of the expressiveness of dynamic languages, but with static type safety
• Research in this area has come a long way – but there are still valid, type-safe programs which type inference engines cannot handle
• Rudimentary type inference (local variables declared with var) is coming in C# 3.0, which ships with .NET Framework 3.5
• Given that type theory experts like Simon Peyton Jones are at Microsoft, we can expect to see this area of .NET evolve rapidly
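C++ offers the same rudimentary local inference as C#’s `var` through `auto` (C++11), plus generic lambdas (C++14) whose parameter types are inferred separately at each call site. The variable names below are illustrative.

```cpp
#include <map>
#include <string>
#include <type_traits>
#include <vector>

// Local variable inference: each type is deduced from its initializer.
auto answer = 42;                                    // int
auto names = std::vector<std::string>{"a", "b"};     // std::vector<std::string>
auto scores = std::map<std::string, int>{{"x", 1}};  // std::map<std::string, int>

static_assert(std::is_same_v<decltype(answer), int>);
static_assert(std::is_same_v<decltype(scores), std::map<std::string, int>>);

// Generic lambda: the parameter types are inferred at each call.
auto add = [](auto a, auto b) { return a + b; };
```

This is local inference only: the programmer still writes the types of non-generic function parameters, which is far from the whole-program inference the slide hopes for.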

Ideas for the future
• Continue improving type system expressiveness and type inference engines
• How about having the type-checker interact with the programmer? e.g.
  • “Can I assume this will always be an odd number?”
  • “Can I assume that no instances of this class are constructed outside of this source tree?”
• How about monitoring running programs to generate better type annotations for use in future compilations?
• How about a graphical interface for creating and manipulating type information?

Conclusion
• Type-checking is not a simple, tidy field
  • It’s a matter of tradeoffs and judgment
• More expressiveness means that programming languages can become more powerful and polymorphic without compromising type safety
  • However, more expressiveness = more pain for programmers
  • Working with higher-order functions is great, but not if you have to type 10 lines of type declarations for each line of code
• Type inference is a promising solution
• The holy grail is to provide all the power of dynamic languages like Lisp, Python, and Ruby with the type safety of C++ and no need to write a single type declaration
