460 likes | 890 Views
Programming Languages: The Essence of Computer Science. Robert Harper Carnegie Mellon University October, 2002. CS Is About Programming. How to build systems. Better, faster, cheaper. Reliable, maintainable, extensible. Evaluating and comparing systems. Performance, behavior, security.
E N D
Programming Languages: The Essence of Computer Science Robert HarperCarnegie Mellon University October, 2002
CS Is About Programming • How to build systems. • Better, faster, cheaper. • Reliable, maintainable, extensible. • Evaluating and comparing systems. • Performance, behavior, security. • Compliance, usability. • What can and can’t be done. • Algorithms and complexity.
Programming Is Linguistic • Programming is an explanatory activity. • To yourself, now and in the future. • To whoever has to read your code. • To the compiler, which has to make it run. • Explanations require language. • To state clearly what is going on. • To communicate ideas over space and time.
Programming Languages Are Essential • Therefore languages are of the essence in computer science. • Programming languages, in the familiar sense. • Description languages, design patterns, architecture languages, specification languages.
Some Conventional Wisdom • PL research is over. • The language of the future is X … for various values of X. • It’s not about the one, true language! • Anyone can design a PL. • Clearly, anyone does. • But look what you get! TCL, Perl, TeX, C++. • PL research is irrelevant to practice. • Seen as a purely academic pursuit. • But the tide is turning as successes accumulate.
Some Accomplishments • High-level languages. • Static type disciplines. • Automatic storage management. • Objects, classes, ADT’s. • Sophisticated compiler technology. • Specification and verification.
Why People Don’t Notice • It takes decades to go from research to practice. • Similar to other areas, such as algorithms, AI. • Large installed base of programmers. • Ideas become “common sense” long before they are widely used. • Even very simple things, such as lexical scope, were once controversial! • Not to mention data abstraction, OOP, ….
Some Important Trends • Safety with performance. • High-level languages with good compilers. • Low-level languages with good type systems. • Languages, not tools. • The design should live with the code. • Conformance should be checkable throughout the evolution of the code.
Some Important Trends • Interplay between theory and practice. • “Prove theorems and report numbers.” • Build systems to assess practicality and to discover new problems. • Full-spectrum coverage. • Functional and imperative. • High and low-level languages. • Design and implementation.
Some Important Trends • Type theory as the GUT of PL’s. • Provides a precise criterion for safety and sanity of a design. • “Features” correspond to types. • Close connections with logics and semantics.
What Is A Type System? • Static semantics: the well-formed programs. • Dynamic semantics: the execution model.
What is a Type System? • Safety theorem: types predict behavior. • Types describe the states of an abstract machine model. • Execution behavior must cohere with these descriptions. • Thus a type is a specification and a type checker is a theorem prover.
Types Are Specifications • Examples: • e : float means that e evaluates to a value in, say, floating point register 0. • e : float ! int means that e is a procedure that is called with arg in FR0 and returns with result in GR0. • e : queue means that e behaves like a queue.
Types Are Formal Methods • Type checking is the most successful formal method! • In principal there are no limits. • In practice there is no end in sight. • Examples: • Using types for low-level languages, say inside a compiler. • Extending the expressiveness of type systems for high-level languages.
Types in Compilation • Conventional compilers: Source = L1 L2 … Ln = Target : T1 • Types apply only to the source code. • Type check, then discard types. • If compiler is correct, target code is safe.
Typed Intermediate Languages • Generalize syntax-directed translation to type-directed translation. • Intermediate languages come equipped with a type system. • Compiler transformations translate both a program and its type. • Translation preserves typing: if e:T then e*:T* after translation
Typed Intermediate Languages • Type-directed translation: Source = L1 L2 … Ln = Target : : : T1 T2 … Tn • Transfers typing properties from source code to object code. • Check integrity of compiler. • Exploit types during code generation.
Certifying Compilers • Types on object code certify its safety. • Type checking ensures safety. • Type information ensures verifiability. • Examples of certified object code: • TAL = typed assembly language. • PCC = bare code + proof of safety. • Many variations are being explored.
TAL Example fact: 8.{r1:int, sp:{r1:int, }::} jgz r1, positive mov r1,1 ret positive: push r1 ; sp:int::{t1:int,sp:}:: sub r1,r1,1 call fact[int::{r1:int,sp:}::] imul r1,r1,r2 pop r2 ; sp:{r1:int,sp:}::ret
Types for Low-Level Languages • What is a good type system for a low-level language? • Should expose data representations. • Should allow for low-level “hacks”. • Should be verifiably safe. • Should not compromise efficiency. • Current systems make serious compromises. • Very weak safety properties. • Force atomicity of complex operations.
Example: Memory Allocation • Most type systems take an atomic view of constructors. • Allocate and initialize in “one step”. • Even HLL’s like Java impose restrictions. • We’d like to expose the “fine structure”. • Support code motion such as coalescing. • Allow incremental initialization. • But nevertheless ensure safety!
Example: Memory Allocation • An allocation protocol (used in TILT): • Reserve: obtain raw, un-initialized space. • Initialize: assign values to the parts. • Allocate: baptize as a valid object. • Current type systems cannot handle this. • Partially initialized objects. • In-place modification of parts. • Interaction with collected heap.
HP Example: Memory Allocation Heap ? 1 ? 2 ? 0 ? AP LP AP HP
A Low-Level Type System • Borrow two ideas from linear logic. • Restricted and unrestricted variables. • A modality to mediate between them. • Restricted variables are resources. • Think: pointer into middle of an object. • Unrestricted variables are standard. • Think: heap pointer.
A Low-Level Type System • Variables are bound to validobjects. • Can be used freely. • Garbage collected when inaccessible. • Resources are bound to parts of objects-in-creation. • Cannot be passed around freely. • Explicitly allocated and disposed.
Restrictions on Resources • Linearity:use resources exactly once. • Admits re-typing after initialization. • Ensure allocation before general usage. • Ordering:resource adjacency matters. • Admit “pointers into the middle” of objects. • Supports in-place, piecemeal initialization.
Variables and Resources • Typing judgments: • Ordering of x’s does not matter. • Abstract “mobile” locations. • Ordering and usage of a’s does matter. • Abstract “pinned” locations, with (a form of) pointer arithmetic.
Low-Level Type Constructors • Contiguous data: 1²2. • Two contiguous values. • Two adjacent words: int ² int. • Mobile data object: !. • A fully initialized object of type . • Example: 1£2 := ! (1²2). • A pointer to an adjacent pair of values.
Allocating a Pair Create names for parts. Resource a is used up! Initialize a1, using it up. Reserve space at a. Re-introduce a1. • Allocate (1,2): Fuse parts and allocate. Must be empty on return.
What Have We Gained? • The ordered type system ensures: • All reserved data is eventually allocated. • Initialization can happen in any order. • Cannot read un-initialized memory. • May also be used for inter-operability. • Adjacency requirements. • Marshalling and un-marshalling. • Precise control over representations.
Types for High-Level Languages • Recall: types express invariants. • Calling sequence, data representation. • Abstraction boundaries (polymorphism). • Theme: capture more invariants. • Representation invariants. • Protocol compliance. • Trade-off expressiveness for simplicity and convenience.
Data Structure Invariants • Example: bit strings. • nat: no leading 0’s. • pos : a non-zero Nat. • bits : any old thing. • Goal: check such properties of code. • Incr takes nat to pos, preserves pos. • Left shift preserves nat, pos.
Data Structure Invariants • Properties of interest: • pos ` nat ` bits • Operations: • : bits • add0 : bits ! bits Æ nat ! pos • add1 : bits ! bits Æ nat ! pos
Data Structure Invariants • Logical consequences: • add0 : nat ! pos Æ nat ! nat • add1 : pos ! pos • : bits • Type check code to check invariants! • Simple bi-directional algorithm suffices. • Example: increment.
Data Structure Invariants • Increment:inc: bits ! bits Æ nat ! posfun inc () )1 | inc (b0)) b1 | inc (b1)) (inc b)0
Data Structure Invariants • Fully mechanically checkable! • Type check inc twice, once for each conjunct. • First pass: assume argument is just bits, derive that result is therefore bits. • Second pass: assume argument is nat, derive that result is pos. • Requires checking entailment of properties. • Decidable for subtype-like behavior.
Value Range Invariants • Array bounds checking (a la Pascal): • [0..size(A)] • Null and non-null objects: • null, really(C) • Aliasing: • its(c): the object c itself
Watch Out! • Such types involve dynamic entities. • [0..size(A)] : A is an array. • its(o) : o is a run-time object. • But types are static! • What is an expression of type[0..size(if … then A else B)]??? • How to get static checking?
Dependent Types • Solution: compile-time proxies. • Track values insofar as possible. • Existentially quantify when its not. • Examples: • 0 : its(0) • + : its(1) £ its(2) ! its(4) • if … then 1 else 2 : 9 n its(n)
Types for the World • What about the state of the world? • The lock l is held. • The file f is open. • The contents of cell c is positive. • But such properties change as execution proceeds! • Here I’m holding the lock, there I’m not. • The file is open now, not later.
Types for the World • Want a simple, checkable type system for the “world”. • Types characterize the current state. • Types of the state change over time. • But what kind of type system? • Properties of the world are ephemeral. • Facts are no longer persistent. • Need: a logic of ephemera.
Types for the World • Ephemera behave strangely: • Cannot be replicated: holding twice is not the same as holding once. • Cannot be ignored: must not lose track of whether you hold a lock. • Once again linear logic does the trick! • Model parts of the world as resources. • Type changes track state changes.
Types for the World • A very simple locking protocol: • acquire :8 l its(l) / free(l) ! unit / held(l) • release : 8 l its(l) / held(l) ! free(l) • new : unit !9 l unit / free(l) Value / World A “linear” function type. Gets “used up”. “Replaces” premise.
Types for the World • What does this buy you? • Stating and enforcing simple protocol properties. • Locality of reasoning: focus only on what changes, not what stays the same. • It’s much harder than it sounds! • Separation logic: Reynolds and O’Hearn. • But it can be done: Vault Project at MSR.
Summary • PL research is providing creative, useful solutions to practical problems. • Building on decades of fundamental research, esp. in logic and type theory. • There is much, much more coming! • Many ripe fruit, lots of new developments. • Many good problems yet to be addressed.
Acknowledgements • Much of the work discussed here was done by or in collaboration withKarl Crary (CMU), Frank Pfenning (CMU), David Walker (Princeton), Greg Morrisett (Cornell). • Plus dozens of our current and former students at Carnegie Mellon.