Languages of the future: Ωmega, the 701st programming language

Languages of the future:mega the 701st programming language Tim Sheard Portland State University (formerly from OGI/OHSU)

What’s wrong with today’s languages? • The semantic gap • What does the programmer know about the program? How is this expressed? • The temporal gap • Systems are “configured” with new knowledge at many different times – compile-time, link-time, run-time. How is this expressed?

What will languages of the future be like? • Support reasoning about a program from within the programming language. • Within the reach of most programmers – No Ph.D. required. • Support all of today’s capabilities but organize them in different ways. • Separate powerful but risky features from the rest of the program, spell out obligations needed to control the risk, ensure that obligations are met. • Provide a flexible hierarchy of temporal stages. Track important attributes across stages.

How do we get there? • In small steps, I’m afraid . . . • Two small contributions • Putting the Curry-Howard isomorphism to work for regular programmers • Exploiting staged computation • In this talk, I’ll only talk about the first one

Step 1 - Putting Curry-Howard to work • Programming by manipulating proofs of important semantic properties • What is a proof? • How do we exploit proofs? • Ωmega is a new point in the design space, somewhere between • a programming language • a logic

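[Figure: a spectrum with formal systems (Isabelle, Coq, Elf, NuPRL, Alfa) at one extreme and programming languages (Haskell, Python, O’Caml, Pascal, Java, C++, C) at the other.] We need something in between the two extremes!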

Dimensions Formal methods systems • Have too few users. We can’t solve the world’s problems with a handful of users. And, for the most part, the users are “thinkers”, not “hackers”. • The systems themselves are used to reason about systems, but aren’t designed to execute programs. For the most part, they don’t have rich libraries, I/O, etc. • Have a steep learning curve. “It takes a Ph.D. to learn to effectively use these tools.”

Steps between the “concrete” and the “clouds” • Train more users to use formal systems, or add formal features to lower level languages so existing programmers can use formal methods. • Design practical extensions for formal systems and build robust compilers for them, or add formal extensions to practical languages.

[Figure: a spectrum from formal systems (Isabelle, Coq, Elf, NuPRL, Alfa) to programming languages (Haskell, Python, O’Caml, Pascal, Java, C++, C).]

Curry-Howard • Types are properties • Programs are proofs • A program with type T witnesses that there exists a program with type T. • If all we have is simple types, like Int or (Bool,String) or [Tree Bool], then the properties are too simple for the programs to count as very useful proofs.

What is a proof? Am I odd or even? 3 • [Figure: a stack of judgments: 3 is odd, if 2 is even, if 1 is odd, if 0 is even.] • Requirements for a legal proof • Even is always stacked above odd • Odd is always stacked below even • The numeral decreases by one in each stack • Every stack ends with 0

[Figure: the proof stack annotated with its decrements: 3 is odd (3 - 1 = 2), 2 is even (2 - 1 = 1), 1 is odd (1 - 1 = 0), 0 is even.]

Algebraic Datatypes • Inductively formed structured data • Generalizes enumerations, records & tagged variants • data Color = Red | Blue | Green • data Address = A Number Street Town Province MailCode • data Person = Teacher [Class] | Student Major • Types are used to prevent the construction of ill-formed data. • Pattern matching allows abstract high level (yet still efficient) access

ADTs provide an abstract interface to heap data. We can define parametric polymorphic data
data Tree a = Fork (Tree a) (Tree a) | Node a | Tip
• Fork :: Tree a -> Tree a -> Tree a • Node :: a -> Tree a • Tip :: Tree a
Inductively defined data allows structures of unbounded size. Functions are defined with pattern matching:
sum :: Tree Int -> Int
sum Tip = 0
sum (Node x) = x
sum (Fork m n) = sum m + sum n
Note that the “data” declaration introduces values and functions that construct instances of the new type.
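The tree type and its summing function can be tried directly in Haskell. This is a minimal sketch: the name `sumTree` is an assumption (chosen to avoid clashing with the Prelude's `sum`), and `example` is the tree `Fork (Fork (Node 5) Tip) Tip` used in the slides.

```haskell
-- A parametric binary tree, as declared on the slide.
data Tree a = Fork (Tree a) (Tree a) | Node a | Tip

-- sumTree is an assumed name; the slide calls this function sum.
sumTree :: Tree Int -> Int
sumTree Tip        = 0
sumTree (Node x)   = x
sumTree (Fork m n) = sumTree m + sumTree n

-- The example tree drawn in the slides.
example :: Tree Int
example = Fork (Fork (Node 5) Tip) Tip
```

Loading this in GHCi and evaluating `sumTree example` adds up the single `Node` in the tree.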

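Fork (Fork (Node 5) Tip) Tip [Figure: the corresponding tree. The root Fork has a left child Fork and a right child Tip; the inner Fork has a left child Node 5 and a right child Tip.]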

ADT Type Restrictions • data Tree a = Fork (Tree a) (Tree a) | Node a | Tip • Fork :: Tree a -> Tree a -> Tree a • Node :: a -> Tree a • Tip :: Tree a Restriction: the range of every constructor matches exactly the type being defined

Integer Indexed Type-Constructors
Z :: Even 0
O Z :: Odd 1
E (O Z) :: Even 2
O (E (O Z)) :: Odd 3
The constructors have the types
Z :: Even 0
E :: Odd m -> Even (m+1)
O :: Even m -> Odd (m+1)
so that O (E (O Z)) :: Odd (1+1+1+0). Note that Even and Odd are type constructors indexed by integers.

Generalized Algebraic Data Structures • Like ADT • Remove the range-type restriction • Allow type constructors to be indexed by things other than normal types.

The “kind” decl introduces new “types” • Allow algebraic definitions to define new “kinds” as well as new “data types” • Example of new type: data List a = Nil | Cons a (List a) • Nil and Cons are new values. • They are classified by the type List • Nil :: List a • Cons :: a -> List a -> List a • Example of new kind: kind Nat = Zero | Succ Nat • Zero and Succ are new types. • They are classified by the kind Nat • Zero :: Nat • Succ :: Nat ~> Nat • Succ Zero :: Nat

A hierarchy of values, types, kinds, sorts, … [Figure: the levels from the top down. Above the sorts: *2. Sorts: *1. Kinds: *, * ~> *, Nat, Nat ~> Nat. Types: Int, [Int], [ ], Zero, Succ. Values: 5, [5].]

GADTs in Ωmega. Zero and Succ encode the natural numbers at the type level.
kind Nat = Zero | Succ Nat
data Even n = Z where n = Zero
            | ex m . E (Odd m) where n = Succ m
data Odd n = ex m . O (Even m) where n = Succ m
Even and Odd are proof constructors.

Z :: Even 0
O Z :: Odd 1
E (O Z) :: Even 2
O (E (O Z)) :: Odd 3
Z :: Even Zero
E :: Odd m -> Even (Succ m)
O :: Even m -> Odd (Succ m)
• Note the different ranges in Z, E and O • The types enforce well-formedness.
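The Even/Odd proof constructors have a close analogue in GHC Haskell. The following is a sketch under assumptions, not Ωmega code: it uses the `DataKinds` extension in place of the `kind` declaration, and the helper functions `evenToInt` and `oddToInt` are hypothetical names added for illustration.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

import Data.Kind (Type)

-- Natural numbers; DataKinds lets Zero and Succ be used at the type level.
data Nat = Zero | Succ Nat

-- A value of type Even n is a proof that n is even.
data Even :: Nat -> Type where
  Z :: Even 'Zero
  E :: Odd m -> Even ('Succ m)

-- A value of type Odd n is a proof that n is odd.
data Odd :: Nat -> Type where
  O :: Even m -> Odd ('Succ m)

-- The proof that three is odd, as in the slides.
three :: Odd ('Succ ('Succ ('Succ 'Zero)))
three = O (E (O Z))

-- Hypothetical helpers: counting constructors recovers the number proved.
evenToInt :: Even n -> Int
evenToInt Z     = 0
evenToInt (E o) = 1 + oddToInt o

oddToInt :: Odd n -> Int
oddToInt (O e) = 1 + evenToInt e
```

An ill-formed stack such as `E Z` is rejected by the type checker, because `E` expects an `Odd m` and `Z` has type `Even 'Zero`.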

Removing the restriction allows indexed types • The parameter of a type constructor (e.g. the “a” in “T a”) says something about the values with type “T a” • phantom types • indexed types • Consider an expression language:
data Exp
  = Eint Int
  | Ebool Bool
  | Eplus Exp Exp
  | Eless Exp Exp
  | Eif Exp Exp Exp
  | Ex   -- Int variable
  | Eb   -- Bool variable
• The expression "if b then 3 else x+1" is encoded as
  (Eif Eb (Eint 3) (Eplus Ex (Eint 1)))
• But what about terms like:
  (Eif (Eint 3) (Eint 0) (Eint 9))

Imagine a type-indexed Term datatype Note the different range types! Int :: Int -> Term Int Bool :: Bool -> Term Bool Plus :: Term Int -> Term Int -> Term Int Less :: Term Int -> Term Int -> Term Bool If :: Term Bool -> Term a -> Term a -> Term a X :: Term Int B :: Term Bool

Type-indexed Data • Benefits • The type system disallows ill-formed Terms like: (If (Int 3) (Int 0) (Int 9)) • Documentation • With the right types, such objects act like proofs

Why is (Term a) like a proof? • A value “x” of type “Term a” is like a judgment Γ ⊢ x : a. The type system ensures that only valid judgments can be constructed. Having a value of type “Term a” guarantees (i.e. is a proof) that the term is well typed. • The expression "if b then 3 else x+1" is encoded as
  (If B (Int 3) (Plus X (Int 1)))
[Derivation: given Γ x = Int and Γ b = Bool, from Γ ⊢ x : Int and Γ ⊢ 1 : Int conclude Γ ⊢ x+1 : Int; then from Γ ⊢ b : Bool, Γ ⊢ 3 : Int and Γ ⊢ x+1 : Int conclude Γ ⊢ if b then 3 else x+1 : Int.]

Type-indexed Terms
data Term a
  = Int Int where a=Int
  | Bool Bool where a=Bool
  | Plus (Term Int) (Term Int) where a=Int
  | Less (Term Int) (Term Int) where a=Bool
  | If (Term Bool) (Term a) (Term a)
  | X where a = Int
  | B where a = Bool
Int :: forall a.(a=Int) => Int -> Term a
We can specialize this kind of type to the ones we want:
Int :: Int -> Term Int
Bool :: Bool -> Term Bool
Plus :: Term Int -> Term Int -> Term Int
Less :: Term Int -> Term Int -> Term Bool
If :: Term Bool -> Term a -> Term a -> Term a
X :: Term Int
B :: Term Bool

Problem: Type Checking. How do we type pattern matching?
case x of
  (Int n)::Term Int -> . . .
  (Bool b)::Term Bool -> . . .
What type is x? Is it Term Int, or is it Term Bool?

Obligations and Assumptions
data Term a = Int Int where a=Int | Bool Bool where a=Bool | . . .
Using a constructor incurs an Obligation
(Int 3)::Term a {Show a=Int}
(Bool True)::Term a {Show a=Bool}
Pattern matching allows the system to make some Assumptions
case x::Term a of
  (Int n)::Term Int -> {Assume a=Int} . . .
  (Bool b)::Term Bool -> {Assume a=Bool} . . .
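GHC's equality constraints (`a ~ Int`) behave much like the `where a=Int` clauses above, so the obligation/assumption reading can be tried in Haskell. This is a sketch under assumptions: the constructor names `IntLit` and `BoolLit` and the function `value` are hypothetical, and only the two literal constructors are shown.

```haskell
{-# LANGUAGE GADTs, TypeOperators #-}

-- Each constructor carries an equality constraint on the index a.
data Term a
  = (a ~ Int)  => IntLit Int
  | (a ~ Bool) => BoolLit Bool

-- Construction: using IntLit incurs the obligation a ~ Int,
-- which the annotation Term Int discharges.
three :: Term Int
three = IntLit 3

-- Pattern matching: in the IntLit branch the checker may assume a ~ Int,
-- so n can be returned at type a; likewise a ~ Bool in the BoolLit branch.
value :: Term a -> a
value (IntLit n)  = n
value (BoolLit b) = b
```

Annotating `three` as `Term Bool` instead would leave the obligation `Bool ~ Int` unsatisfied, and the program would be rejected.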

Programming
eval :: Term a -> (Int,Bool) -> a
eval (Int n) env = n
eval (Bool b) env = b
eval (Plus x y) env = eval x env + eval y env
eval (Less x y) env = eval x env < eval y env
eval (If x y z) env = if (eval x env) then (eval y env) else (eval z env)
eval X (n,b) = n
eval B (n,b) = b
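The evaluator carries over to GHC Haskell almost unchanged when `Term` is written as a GADT. In this sketch the literal constructors are renamed `IntT` and `BoolT` (an assumption, to keep them visually distinct from the types `Int` and `Bool`), and `example` is the term for "if b then 3 else x+1".

```haskell
{-# LANGUAGE GADTs #-}

-- Type-indexed terms; the index records the type of the object-level term.
data Term a where
  IntT  :: Int  -> Term Int
  BoolT :: Bool -> Term Bool
  Plus  :: Term Int -> Term Int -> Term Int
  Less  :: Term Int -> Term Int -> Term Bool
  If    :: Term Bool -> Term a -> Term a -> Term a
  X     :: Term Int   -- the Int variable
  B     :: Term Bool  -- the Bool variable

-- The environment supplies the values of X and B.
-- No tags and no run-time type errors: each branch is checked
-- under the assumption its constructor provides.
eval :: Term a -> (Int, Bool) -> a
eval (IntT n)   _      = n
eval (BoolT b)  _      = b
eval (Plus x y) env    = eval x env + eval y env
eval (Less x y) env    = eval x env < eval y env
eval (If c t e) env    = if eval c env then eval t env else eval e env
eval X          (n, _) = n
eval B          (_, b) = b

-- if b then 3 else x+1
example :: Term Int
example = If B (IntT 3) (Plus X (IntT 1))
```

For instance, `eval example (10, False)` takes the else branch and computes `x+1` with `x = 10`.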

Type Checking
eval :: Term a -> (Int,Bool) -> a
eval (Less x y) env = {Assume a=Bool} eval x env < eval y env
Less :: (a=Bool) => Term Int -> Term Int -> Term a
x :: Term Int
y :: Term Int
(eval x env) :: Int
(eval y env) :: Int
(eval x env < eval y env) :: Bool
Assume a=Bool in this context, so the Bool result has the required type a.

Basic approach • Data is a parameterized generalized-algebraic datatype • It is indexed by some semantic property • New Kinds introduce new types that are used as indexes • Programs use types to maintain semantic properties • We construct values that are proofs of these properties • The equality constrained types make it possible

Constructing proofs at runtime • Suppose we want to read a string from the user, and interpret that string as an expression. • What if the user types in an expression of the wrong type? • Build a proof that the term is well typed for the context in which we use it

data Exp = Eint Int | Ebool Bool | Eplus Exp Exp | Eless Exp Exp | Eif Exp Exp Exp | Ex | Eb
test :: IO ()
test = do { text <- readln
          ; exp::Exp <- parse text
          ; case typCheck exp of
              Pair Rint x -> print (show (eval x + 2))
              Pair Rbool y -> if (eval y) then print "True" else print "False"
              Fail -> error "Ill typed term" }
A dynamic test of a static property!

Representation Types data Rep t = Rint where t=Int | Rbool where t=Bool • “Rep” is a representation type. It is a normal first class value (at run-time) that represents a static (compile-time) type. • There is a 1-1 correspondence between Rint and Int, and Rbool and Bool. If x:: Rep t then • knowing the shape of x determines its type, • knowing its type determines its shape. • One can’t overemphasize the importance of this!
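The shape/type correspondence can be made concrete in GHC Haskell. This is a sketch under assumptions: `defaultOf` and `sameRep` are hypothetical helpers that do not appear in the slides, and `:~:` is the standard equality witness from `Data.Type.Equality`.

```haskell
{-# LANGUAGE GADTs, TypeOperators #-}

import Data.Type.Equality ((:~:)(Refl))

-- A run-time value whose shape determines a compile-time type.
data Rep t where
  Rint  :: Rep Int
  Rbool :: Rep Bool

-- Knowing the shape determines the type: each branch may return
-- a value of the type that its constructor represents.
defaultOf :: Rep t -> t
defaultOf Rint  = 0
defaultOf Rbool = False

-- Comparing two representations at run time yields a proof
-- that the two static types are equal, when they are.
sameRep :: Rep a -> Rep b -> Maybe (a :~: b)
sameRep Rint  Rint  = Just Refl
sameRep Rbool Rbool = Just Refl
sameRep _     _     = Nothing
```

Matching on the `Refl` returned by `sameRep` lets the type checker treat the two indices as the same type in that branch, which is the step a dynamic type test needs.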

Constructing a Proof
typCheck :: Exp -> Judgment
typCheck (Eint n) = Pair Rint (Int n)
typCheck (Ebool b) = Pair Rbool (Bool b)
typCheck Ex = Pair Rint X
typCheck Eb = Pair Rbool B
typCheck (Eplus x y) =
  case (typCheck x, typCheck y) of
    (Pair Rint a, Pair Rint b) -> Pair Rint (Plus a b)
    _ -> Fail

More cases …
typCheck (Eless x y) =
  case (typCheck x, typCheck y) of
    (Pair Rint a, Pair Rint b) -> Pair Rbool (Less a b)
    _ -> Fail
typCheck (Eif x y z) =
  case (typCheck x, typCheck y, typCheck z) of
    (Pair Rbool a, Pair Rint b, Pair Rint c) -> Pair Rint (If a b c)
    (Pair Rbool a, Pair Rbool b, Pair Rbool c) -> Pair Rbool (If a b c)
    _ -> Fail
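Putting the pieces together, the type checker that builds a proof at run time can be sketched in GHC Haskell. Several things here are assumptions rather than the slides' own definitions: `Judgment` is declared as an existential pairing of a `Rep t` with a `Term t` (the slides use it without showing its declaration), the literal constructors are renamed `IntT` and `BoolT`, and `run` is a hypothetical driver standing in for the `test` function.

```haskell
{-# LANGUAGE GADTs #-}

-- Untyped syntax, as a parser would produce it.
data Exp
  = Eint Int | Ebool Bool
  | Eplus Exp Exp | Eless Exp Exp
  | Eif Exp Exp Exp
  | Ex | Eb

-- Typed terms: only well-typed object programs can be built.
data Term a where
  IntT  :: Int  -> Term Int
  BoolT :: Bool -> Term Bool
  Plus  :: Term Int -> Term Int -> Term Int
  Less  :: Term Int -> Term Int -> Term Bool
  If    :: Term Bool -> Term a -> Term a -> Term a
  X     :: Term Int
  B     :: Term Bool

data Rep t where
  Rint  :: Rep Int
  Rbool :: Rep Bool

-- Assumed definition: a typed term packaged with a run-time
-- representation of its (hidden) type, or failure.
data Judgment where
  Pair :: Rep t -> Term t -> Judgment
  Fail :: Judgment

typCheck :: Exp -> Judgment
typCheck (Eint n)  = Pair Rint (IntT n)
typCheck (Ebool b) = Pair Rbool (BoolT b)
typCheck Ex        = Pair Rint X
typCheck Eb        = Pair Rbool B
typCheck (Eplus x y) = case (typCheck x, typCheck y) of
  (Pair Rint a, Pair Rint b) -> Pair Rint (Plus a b)
  _                          -> Fail
typCheck (Eless x y) = case (typCheck x, typCheck y) of
  (Pair Rint a, Pair Rint b) -> Pair Rbool (Less a b)
  _                          -> Fail
typCheck (Eif x y z) = case (typCheck x, typCheck y, typCheck z) of
  (Pair Rbool a, Pair Rint b,  Pair Rint c)  -> Pair Rint (If a b c)
  (Pair Rbool a, Pair Rbool b, Pair Rbool c) -> Pair Rbool (If a b c)
  _                                          -> Fail

eval :: Term a -> (Int, Bool) -> a
eval (IntT n)   _      = n
eval (BoolT b)  _      = b
eval (Plus x y) env    = eval x env + eval y env
eval (Less x y) env    = eval x env < eval y env
eval (If c t e) env    = if eval c env then eval t env else eval e env
eval X          (n, _) = n
eval B          (_, b) = b

-- Hypothetical driver: check dynamically, then evaluate with static guarantees.
run :: Exp -> (Int, Bool) -> Maybe String
run e env = case typCheck e of
  Pair Rint t  -> Just (show (eval t env))
  Pair Rbool t -> Just (show (eval t env))
  Fail         -> Nothing
```

Matching on `Rint` or `Rbool` is what reveals the hidden index, so each branch of `run` knows the type of the value it evaluates. The ill-typed `Eif (Eint 3) (Eint 0) (Eint 9)` yields `Nothing` rather than a run-time type error inside `eval`.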

Our Original Goals • Build heterogeneous meta-programming systems • Meta-language ≠ object-language • Type system of the meta-language guarantees semantic properties of object-language • Experiment with Omega • Finding new uses for the power of the type system • Translating existing language-based ideas into Omega • staged interpreters • proof carrying code • language-based security

Serendipity • Ωmega’s type system is good for statically guaranteeing all sorts of properties. • Lists with statically known length • Red–Black Trees • Binomial Heaps • Dynamic Typing • Proof Carrying Code
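As one illustration of the first item, lists with statically known length can be sketched in GHC Haskell with the same ingredients (a type-level `Nat` used as an index). The names `Vec`, `VNil`, `VCons`, `vhead`, `vzip` and `toList` are assumptions made for this sketch; they are not taken from the talk.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

import Data.Kind (Type)

data Nat = Zero | Succ Nat

-- A list whose length is part of its type.
data Vec (n :: Nat) (a :: Type) where
  VNil  :: Vec 'Zero a
  VCons :: a -> Vec n a -> Vec ('Succ n) a

-- Total: the type rules out the empty vector, so no VNil case is needed.
vhead :: Vec ('Succ n) a -> a
vhead (VCons x _) = x

-- Both arguments must have the same length n, so no elements are dropped.
vzip :: Vec n a -> Vec n b -> Vec n (a, b)
vzip VNil         VNil         = VNil
vzip (VCons x xs) (VCons y ys) = VCons (x, y) (vzip xs ys)

-- Forget the length index.
toList :: Vec n a -> [a]
toList VNil         = []
toList (VCons x xs) = x : toList xs
```

Applying `vhead` to `VNil`, or `vzip` to vectors of different lengths, is a compile-time error rather than a run-time failure.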

Conclusion • Stating static properties is a good way to think about programming • It may lead to more reliable programs • The compiler should ensure that programs maintain the stated properties • Generalizing algebraic datatypes makes it all possible • Ranges other than “T a” • “a” becomes an index describing a static property of x::T a • New kinds let “a” have arbitrary structure • Computing over “a” is sometimes necessary

Contributions • “Logical Framework” ideas translated into everyday programming idioms. • Manipulating strongly-typed object languages in a semantics-preserving manner. • Implementation of Cheney and Hinze’s equality qualified types in a functional programming language. • Use of new kinds to build new kinds of index sets. • Representation (or Singleton) Types as a way to seamlessly switch between static and dynamic typing. • Demonstration • Show some practical techniques • Lots of examples • Resource: www.cs.pdx.edu/~sheard • Including Emir Pasalic’s Thesis.

Related Work • Logical Frameworks: LF -- Bob Harper et al. • Refinement types -- Frank Pfenning • Inductive Families • In type theory -- Peter Dybjer • Epigram -- Zhaohui Luo, James McKinna, Paul Callaghan, and Conor McBride • First-class phantom types -- Cheney and Hinze • Guarded Recursive Data Types • Hongwei Xi and his students • Guarded Recursive Datatype Constructors • A Typeful Approach to Object-Oriented Programming with Multiple Inheritance • Meta-Programming through Typeful Code Representation • Constraint-based type inference for guarded algebraic data types -- Vincent Simonet and François Pottier • A Systematic Translation of Guarded Recursive Data Types to Existential Types -- Martin Sulzmann • Polymorphic typed defunctionalization -- Pottier and Gauthier • Towards efficient, typed LR parsers -- Pottier and Régis-Gianas • First Class Type Equality • A Lightweight Implementation of Generics and Dynamics -- Hinze and Cheney • Typing Dynamic Typing -- Baars and Swierstra • Type-safe cast: Functional pearl -- Weirich • Rogue-Sigma-Pi as a meta-language for LF -- Aaron Stump • Wobbly types: type inference for generalised algebraic data types -- Peyton Jones, Washburn and Weirich • Cayenne - A Language with Dependent Types -- Lennart Augustsson

Step 2 - Using Staging • Suppose you are writing a document retrieval system. • The user types in a query, and you want to retrieve all documents that meet the query. • The query contains information not known until run-time, but which is constant across all accesses in the document base. • E.g. Width - Indent < Depth && Keyword == "Naval"

Width - Indent < Depth && Keyword == "Naval" • Suppose Width and Indent are constant across all queries, but Depth and Keyword are fields of each document. • How can we efficiently build an execution engine that translates the user’s query (typed as a String) into executable code?

Code in Omega
prompt> [| 5 + 5 |]
[| 5 + 5 |] : Code Int
prompt> run [| 5 + 5 |]
10 : Int
prompt> let x = [| 23 |]
x
prompt> let y = [| 56 - $x |]
y
prompt> y
[| 56 - 23 |] : Code Int

Dynamic values
data Dyn x
  = Dint Int where x = Int
  | Dbool Bool where x = Bool
  | Dyn (Code x)
dynamize :: Dyn a -> Code a
dynamize (Dint n) = lift n
dynamize (Dbool b) = lift b
dynamize (Dyn x) = x

Translation
trans :: Term a -> (Dyn Int,Dyn Int) -> Dyn a
trans (Int n) (x,y) = Dint n
trans (Bool b) (x,y) = Dbool b
trans X (x,y) = x
trans Y (x,y) = y
trans (Plus a b) xy =
  case (trans a xy, trans b xy) of
    (Dint m,Dint n) -> Dint(m+n)
    (m,n) -> Dyn [| $(dynamize m) + $(dynamize n) |]
trans (If a b c) xy =
  case trans a xy of
    (Dbool test) -> if test then trans b xy else trans c xy
    (Dyn test) -> Dyn [| if $test then $(dynamize (trans b xy)) else $(dynamize (trans c xy)) |]

Applying the translation
-- if 3 < 5 then (x + (5 + 2)) else y
x1 = If (Less (Int 3) (Int 5)) (Plus X (Plus (Int 5) (Int 2))) Y
w term = [| \ x y -> $(dynamize(trans term (Dyn [| x |],Dyn [| y |]))) |]
-- w x1
-- [| \ x y -> x + 7 |] : Code (Int -> Int -> Int)
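The idea behind `trans` (compute what is already known, emit residual code for the rest) can be sketched in plain Haskell without code brackets. This is a deliberately simplified illustration under stated assumptions: `Code` here is a small hypothetical expression type standing in for Ωmega's `Code Int`, the terms are untyped and restricted to integer addition, and the literal constructor is renamed `IntT`.

```haskell
-- Hypothetical stand-in for generated code: a tiny residual expression type.
data Code = Lit Int | Var String | Add Code Code
  deriving (Eq, Show)

-- A value is either known now (Dint) or only available as residual code (Dyn).
data Dyn = Dint Int | Dyn Code

-- Turn any Dyn into code; known values become literals (the role of lift).
dynamize :: Dyn -> Code
dynamize (Dint n) = Lit n
dynamize (Dyn c)  = c

-- Object terms, restricted to Int expressions over two variables.
data Term = IntT Int | X | Y | Plus Term Term

-- Add now when both operands are known; otherwise emit an Add node.
trans :: Term -> (Dyn, Dyn) -> Dyn
trans (IntT n)   _      = Dint n
trans X          (x, _) = x
trans Y          (_, y) = y
trans (Plus a b) xy     = case (trans a xy, trans b xy) of
  (Dint m, Dint n) -> Dint (m + n)
  (m, n)           -> Dyn (Add (dynamize m) (dynamize n))

-- x + (5 + 2), translated with both variables left dynamic.
residual :: Code
residual = dynamize (trans (Plus X (Plus (IntT 5) (IntT 2))) (Dyn (Var "x"), Dyn (Var "y")))
```

Here the inner `5 + 2` is folded to a literal while the addition involving `x` is kept as residual code, mirroring the `x + 7` in the slide's output.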

Examples we have done • Typed, staged interpreters • For languages with binding, with patterns, algebraic datatypes • Type preserving transformations • Simplify :: Exp t -> Exp t • Cps:: Exp t -> Exp {trans t} • Proof carrying code • Data Structures • Red-Black trees, Binomial Heaps , Static length lists • Languages with security properties • Typed self-describing databases, where meta data in the database describes the database schema • Programs that slip easily between dynamic and statically typed sections. Type-case is easy to encode with no additional mechanism
