Scheming Cats?
What am I doing? I am developing a translator from Scheme to a strongly typed pure-functional stack-based language called Cat.
So what is Cat? Cat is a strongly typed functional stack-based language. Cat is designed primarily as an intermediate language for optimization and translation. Cat has no environment and no names.
Why convert to point-free form? Point-free form (sometimes called BMF or Squiggol) is specifically designed for ease of transformation. Global optimisations of point-free programs can be performed by applying sequences of very simple, provably correct, bi-directional rewrite rules. The language, its re-write rules, along with the existence of a simple cost model, makes it possible to frame optimisation of point-free form as a search problem. - Brad Alexander, Searching Squiggol Space
Why Translate Scheme? If I can support Scheme, I can support virtually any other language. Most "general-purpose" intermediate languages can't deal with functional languages very well. I believe that I can eventually use Cat to optimize Scheme.
Why would Schemers Care? Intermediate languages are always useful for developing optimizations. Translating from Scheme to Cat would give us improved type information at compile-time. This in turn leads to better performance, safety, and earlier bug detection.
Cat Inspiration The languages that most influenced the design of Cat are: Joy, MSIL, JVML, and Haskell. Cat superficially most closely resembles Joy.
Cat Contribution Cat extends Joy/PostScript by adding a static type system. Cat extends Forth/MSIL/Java bytecode by adding type-safe higher-order functions.
Why do I want HOFs in an intermediate language? Higher-order functions allow significantly more compact code. They also allow more abstract representation of algorithms. They make optimization easier.
Current Cat Implementation A relatively stable .NET interpreter/compiler. Macro-Cat term rewriting language for optimization. A rudimentary Cat to C++ translator. All public domain!
Cat Semantics
term ::= literal       // e.g. 42, "hello"
       | primitive     // e.g. dup, swap
       | [term]        // quotation
       | term term     // composition
Cat in a Nutshell
12 13 swap == 13 12
2 dup == 2 2
42 pop == no-op
12 [inc] apply == 13
5 true [inc] [dec] if == 6
42 quote == [42]
[1] [add] compose == [1 add]
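To make the equations above concrete, here is a minimal, untyped Python sketch of the evaluation model. It is an illustration only, not the .NET implementation. The stack is a Python list with its top at the end, and a quotation is a nested list. The primitive table covers just the words used on this slide. Strings that are not primitive names are treated as literals, which is a simplification assumed here.

```python
# Untyped sketch of Cat evaluation: quotations are Python lists,
# primitives are looked up by name, everything else is a literal.

def run(program, stack=None):
    """Evaluate a list of terms against a stack (top = last element)."""
    stack = [] if stack is None else stack
    for term in program:
        if isinstance(term, list):
            stack.append(term)           # quotation: push it unevaluated
        elif isinstance(term, str) and term in PRIMITIVES:
            PRIMITIVES[term](stack)      # primitive: transform the stack
        else:
            stack.append(term)           # literal: push it
    return stack

def _swap(s):
    s[-1], s[-2] = s[-2], s[-1]

def _apply(s):
    run(s.pop(), s)                      # pop a quotation and execute it

def _if(s):
    on_false = s.pop()
    on_true = s.pop()
    cond = s.pop()
    run(on_true if cond else on_false, s)

def _compose(s):
    second = s.pop()
    first = s.pop()
    s.append(first + second)             # [f] [g] compose == [f g]

PRIMITIVES = {
    "swap": _swap,
    "dup": lambda s: s.append(s[-1]),
    "pop": lambda s: s.pop(),
    "inc": lambda s: s.append(s.pop() + 1),
    "dec": lambda s: s.append(s.pop() - 1),
    "apply": _apply,
    "if": _if,
    "quote": lambda s: s.append([s.pop()]),
    "compose": _compose,
}
```

For example, `run([5, True, ["inc"], ["dec"], "if"])` leaves `[6]` on the stack. Composition is just list concatenation. That is why `[1] [add] compose` can be built without ever evaluating `add`.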
Cat Type Signatures Cat types are essentially stack-effect diagrams: (Consumption -> Production). Consumption and production are type vectors describing the configuration of types on the top of the stack before and after the function runs.
Side-Effects Cat distinguishes between impure and pure functions (those with and without side-effects). Impure functions are written with ~> instead of ->:
pop : ('a -> )
write : ('a ~> )
Examples of Types
42 : ( -> int)
[42] : ( -> ( -> int))
inc : (int -> int)
add_int : (int int -> int)
dup : ('a -> 'a 'a)
pop : ('a -> )
swap : ('a 'b -> 'b 'a)
More Types
Lowercase type variables (e.g. 'a) stand for single scalar types. Uppercase type variables (e.g. 'A) stand for type vectors.
apply : ('A ('A -> 'B) -> 'B)
if : ('A bool ('A -> 'B) ('A -> 'B) -> 'B)
dip : ('A 'b ('A -> 'C) -> 'C 'b)
quote : ('a -> ( -> 'a))
Why is Cat's Type System Relevant? Two reasons:
1. Inference of Cat types is compositional.
2. Cat can infer the types of recursive combinators.
Compositional Type Inference Is Good. "Principal typings allow compositional type inference, where the procedure of finding types for a term uses only the analysis results for its immediate sub-fragments, which can be analyzed independently in any order. Compositionality helps with such things as performing separate analysis of program modules." - From The Essence of Principal Typings by J.B. Wells
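The payoff of compositionality can be sketched for the simplest case. The Python below handles only monomorphic stack effects. It has no type variables and therefore no unification, which is a deliberate simplification of Cat's real inference. The type of a composition `f g` is computed from the types of `f` and `g` alone. It works by matching what `f` leaves on top of the stack against what `g` consumes. The names `Effect` and `compose` are illustrative.

```python
from typing import NamedTuple, Tuple

class Effect(NamedTuple):
    """A monomorphic stack effect; the last element of each tuple is the top."""
    consumes: Tuple[str, ...]
    produces: Tuple[str, ...]

def compose(f: Effect, g: Effect) -> Effect:
    """Type of the program `f g`, computed only from the types of f and g."""
    k = min(len(f.produces), len(g.consumes))
    # The top k values produced by f must be exactly what g consumes first.
    if k and f.produces[-k:] != g.consumes[-k:]:
        raise TypeError(f"cannot compose {f} with {g}")
    leftover_f = f.produces[:len(f.produces) - k]  # outputs of f that g leaves alone
    extra_g = g.consumes[:len(g.consumes) - k]     # inputs g needs from below f's inputs
    return Effect(extra_g + f.consumes, leftover_f + g.produces)

push42 = Effect((), ("int",))                 # 42 : ( -> int)
inc = Effect(("int",), ("int",))              # inc : (int -> int)
add_int = Effect(("int", "int"), ("int",))    # add_int : (int int -> int)
```

Composing `42 42 add_int` gives the same type whichever way the program is grouped. That is the property the quotation describes: sub-programs can be typed separately and in any order.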
The M combinator (self-application)
In Scheme: (define M (lambda (f) (f f)))
In Cat: define M : ('A self -> 'B) { dup apply }
Equirecursive Type Inference
The m combinator is "lambda x.x(x)".
define m { dup apply }
m : ('A self -> 'B)
Equirecursive means that a recursive type is equal to its unfolding, where self stands for the enclosing function type:
('A self -> 'B) == ('A ('A self -> 'B) -> 'B)
Cat Rejects Omega
Despite having equirecursive types, Cat will reject the omega combinator at compile-time:
define omega { [m] m }
Open question: Are infinite loops in imperative programs ever reducible to omega? What other infinite loops can Cat reject?
Partial Application
define partial-apply : ('a ('B 'a -> 'C) -> ('B -> 'C)) { swap quote swap compose }
For example: 1 [add] partial-apply == [1 add]
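Tracing the definition step by step helps. The Python sketch below models the stack as a list with its top at the end, and quotations as lists of terms. `swap`, `quote`, and `compose` mirror the Cat words. The tiny `execute` helper is a hypothetical stand-in for apply, used only to check the result. It understands nothing but integer literals and `add`.

```python
# Stack words as functions on a Python list (top of stack = last element);
# quotations are plain Python lists of terms.

def swap(s):
    s[-1], s[-2] = s[-2], s[-1]

def quote(s):
    s.append([s.pop()])

def compose(s):
    g = s.pop()
    f = s.pop()
    s.append(f + g)

def partial_apply(s):
    """Cat's definition: { swap quote swap compose }."""
    for word in (swap, quote, swap, compose):
        word(s)
    return s

def execute(quotation, s):
    """Hypothetical apply that only knows integer literals and `add`."""
    for term in quotation:
        if term == "add":
            s.append(s.pop() + s.pop())
        else:
            s.append(term)
    return s
```

Starting from `[1, ["add"]]`, the four words produce `[[1, "add"]]`. Running that quotation on a stack holding 41 yields 42, so the value has indeed been "baked into" the function.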
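Dipping Below The Top
define dip : ('A 'b ('A -> 'C) -> 'C 'b) { swap quote compose apply }
3 4 [2 *] dip == 6 4
dip creates auxiliary storage by composing functions: the top value is quoted and appended to the function, so it is pushed back after the function runs.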
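The same list-based model shows where the "auxiliary storage" lives: inside the composed quotation itself. This Python sketch is self-contained. Its `apply` is a hypothetical minimal evaluator that knows only integer literals and `*`.

```python
def swap(s):
    s[-1], s[-2] = s[-2], s[-1]

def quote(s):
    s.append([s.pop()])

def compose(s):
    g = s.pop()
    f = s.pop()
    s.append(f + g)

def apply(s):
    """Run the quotation on top of the stack; knows only ints and '*'."""
    for term in s.pop():
        if term == "*":
            b = s.pop()
            a = s.pop()
            s.append(a * b)
        else:
            s.append(term)

def dip(s):
    """Cat's definition: { swap quote compose apply }."""
    # Stack comments follow the example 3 4 [2 *].
    swap(s)      # 3 [2 *] 4
    quote(s)     # 3 [2 *] [4]
    compose(s)   # 3 [2 * 4]
    apply(s)     # 3 2 * 4  ->  6 4
    return s
```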
Factorial Function
define fac { dup 1 lteq [pop 1] [dup dec fac *] if }
Fix-Point Factorial
define y_fact_step { over 0 eq [pop2 1] [[dup dec] dip apply *] if }
define y_fact { [y_fact_step] y }
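The definition of Cat's y is not shown here, so the Python below is only an analogy. It illustrates the same idea in a strict language using the Z combinator, the eta-expanded call-by-value variant of Y. `fact_step` plays the role of y_fact_step: it never calls itself and instead receives the recursive call as an argument. The self-application `x(x)` is the M combinator from earlier.

```python
def Z(f):
    """Strict fixed-point combinator: Z(f) behaves like f(Z(f))."""
    def self_apply(x):
        # Eta-expansion (lambda n: ...) delays x(x) so Python does not recurse forever.
        return f(lambda n: x(x)(n))
    return self_apply(self_apply)

def fact_step(rec):
    """One unrolling of factorial; `rec` handles the recursive call."""
    return lambda n: 1 if n == 0 else n * rec(n - 1)

fact = Z(fact_step)
```

The plain Y combinator would evaluate `x(x)` eagerly and never return in Python. That is why the extra lambda is needed.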
Binary Recursion as a Function
bin_rec is a primitive instruction for "binary recursion". It accepts four functions as input:
• termination condition
• termination action
• split function
• merge function
Fibonacci
define fib : (int -> int) {
  // less than or equal to 1?
  [dup 1 lteq]
  // base case: do nothing
  []
  // split into (n - 1) and (n - 2)
  [dec dup dec]
  // add the two results
  [add_int]
  bin_rec
}
Quick Sort
define qsort : (list -> list) {
  // does the list have 0 or 1 elements?
  [small]
  // base case: do nothing
  []
  // split the list using the head as a pivot,
  // storing the pivot underneath for later use
  [uncons under [lt] bind split]
  // append the pivot to the first list,
  // then concatenate the two lists
  [[swap cons] dip cat]
  bin_rec
}
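As a sketch of what bin_rec does, here is a Python higher-order function taking the same four functions. One assumption differs from Cat. Cat keeps the qsort pivot on the stack underneath the two halves. Here, `split` instead returns it explicitly as a context value that is handed to `merge`, and fib simply passes None.

```python
def bin_rec(is_base, base_action, split, merge):
    """Build a binary-recursive function from four functions.

    split(x) returns (context, left, right); merge receives the context
    plus the results of recursing on left and right."""
    def go(x):
        if is_base(x):
            return base_action(x)
        context, left, right = split(x)
        return merge(context, go(left), go(right))
    return go

fib = bin_rec(
    lambda n: n <= 1,                  # less than or equal to 1?
    lambda n: n,                       # base case: do nothing
    lambda n: (None, n - 1, n - 2),    # split into (n - 1) and (n - 2)
    lambda _, a, b: a + b,             # add the two results
)

qsort = bin_rec(
    lambda xs: len(xs) <= 1,           # does the list have 0 or 1 elements?
    lambda xs: xs,                     # base case: do nothing
    lambda xs: (xs[0],                 # the head is the pivot, kept as context
                [x for x in xs[1:] if x < xs[0]],
                [x for x in xs[1:] if x >= xs[0]]),
    lambda pivot, lo, hi: lo + [pivot] + hi,   # rejoin around the pivot
)
```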
Map-Reduce
define map_reduce { [map flatten self_join] dip map }
A naive but working implementation of the Google MapReduce algorithm.
Map One of my favorite functions. Having map in an intermediate language would be very powerful. It is much better than worrying about micro-optimizing for loops.
Map is Compact
for (int i=0; i<1000000000; ++i) a[i] = a[i] * 3 / 2;
or
[3 * 2 /] map
Map can be Lazy
Consider this code:
define mid { count 2 / nth }
define input { 0 billion [] range_gen }
define test { input [inc] map mid }
If map is implemented lazily, this can run thousands of times more quickly: mid reads only the middle element, so inc is applied once instead of a billion times.
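The effect can be illustrated in Python, where `range` already supports `len` and indexing without materializing its elements. `LazyMap` is a hypothetical helper standing in for a lazy map. `count` reads only the length, and `nth` evaluates the mapped function for a single element. `range(billion)` stands in for Cat's range_gen.

```python
class LazyMap:
    """A mapped view of a sequence: f runs only on elements actually read."""
    def __init__(self, f, source):
        self.f = f
        self.source = source
        self.calls = 0                   # how many times f has run

    def __len__(self):
        return len(self.source)          # count needs no evaluation at all

    def __getitem__(self, i):
        self.calls += 1
        return self.f(self.source[i])

def mid(seq):
    """Cat's `count 2 / nth`: the element at index len // 2."""
    return seq[len(seq) // 2]

billion = 10**9
mapped = LazyMap(lambda x: x + 1, range(billion))   # input [inc] map
result = mid(mapped)                                 # ... mid
```

Only one increment is ever performed, and no billion-element list is allocated.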
Map is Optimizable
[3 *] map [2 /] map
is the same as
[3 * 2 /] map
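This is the map fusion law. Mapping f and then g equals mapping the composition "f, then g" once, which saves a traversal and an intermediate list. Below is a quick Python check of the slide's example. It uses integer division to match the C loop on ints, and the helper names are illustrative.

```python
def compose(f, g):
    """Cat-style composition: run f first, then g (like [f] [g] compose)."""
    return lambda x: g(f(x))

def times3(x):
    return x * 3          # [3 *]

def half(x):
    return x // 2         # [2 /], integer division as in the C loop

xs = list(range(10))
two_passes = list(map(half, map(times3, xs)))     # [3 *] map [2 /] map
one_pass = list(map(compose(times3, half), xs))   # [3 * 2 /] map
```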
Term Rewriting Language Doesn't this look like something that could easily be expressed as term rewriting rules, like those in Stratego or GHC's rewrite rules?
Here is Meta-Cat
macro { map [$A] map } == { [$A] compose map }
is a valid rewriting term in the Meta-Cat extension to Cat. Over 100 macros are currently available. Next step: adding types to term variables.
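Below is a minimal Python sketch of how a term rewriter could apply this particular macro. It is not Meta-Cat's matching engine, which is not described here. A program is a flat list of terms, and quotations are nested lists. The rule is hard-coded rather than parsed from Meta-Cat syntax. Each rewrite removes one `map`, so the loop always terminates.

```python
def rewrite_map_fusion(program):
    """Apply `map [$A] map  ==>  [$A] compose map` until no match remains.

    A program is a list of terms; quotations are Python lists.
    Only the top level is rewritten (quotations are not searched)."""
    changed = True
    while changed:
        changed = False
        for i in range(len(program) - 2):
            a, b, c = program[i:i + 3]
            if a == "map" and isinstance(b, list) and c == "map":
                program = program[:i] + [b, "compose", "map"] + program[i + 3:]
                changed = True
                break
    return program
```

For example, `[3 *] map [2 /] map` becomes `[3 *] [2 /] compose map`, which evaluates to the fused `[3 * 2 /] map`.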
Scheme to Cat
Challenges:
• Removing names.
• Emulating an environment.
• Implementing continuations in Cat efficiently.
• Maintaining useful types.
• Getting the recursive types correct.
Lack of Names
Well-known solution: lambda-lifting. The Cat implementation already supports lambda-lifting.
\a.[a a apply] == [dup apply] bind
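Lambda-lifting turns each free variable of a nested function into an explicit parameter. The function then no longer needs an environment and can be moved to the top level. Here is a small Python illustration with hypothetical names. `functools.partial` recovers the original closure. It plays roughly the role that bind appears to play in the Cat example above, where the value is pushed into the quotation.

```python
from functools import partial

# Before: `inner` refers to the free variable `a` from its environment.
def make_adder(a):
    def inner(x):
        return x + a
    return inner

# After lambda-lifting: the free variable becomes an explicit parameter
# and the function lives at the top level, with no environment needed.
def inner_lifted(a, x):
    return x + a

def make_adder_lifted(a):
    # Binding `a` in advance rebuilds the closure without a named environment.
    return partial(inner_lifted, a)
```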
Lack of Environment The challenge is to simulate the environment in a pure language. One solution: the environment is a hash table that stays on the top of the stack and is recreated whenever it is modified. This resembles a monad. More research is needed to figure out the relationship to category theory.
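One way to picture this proposal is the Python sketch below. It is an assumption about the design, not the actual translator. The environment is an immutable mapping kept on top of the stack. `define` builds a fresh mapping instead of mutating the old one. `lookup` pushes a value underneath the environment so that the environment stays on top. Threading the environment through every word this way gives a state-passing style, which is where the resemblance to a monad comes from.

```python
from types import MappingProxyType

def push_under(stack, value):
    """Push a value just below the environment on top of the stack."""
    stack.insert(len(stack) - 1, value)

def define(stack, name):
    """( value env -> env' ): bind name to the value under the environment."""
    env = stack.pop()
    value = stack.pop()
    updated = dict(env)                        # copy; the old env is never mutated
    updated[name] = value
    stack.append(MappingProxyType(updated))    # read-only view of the new env

def lookup(stack, name):
    """( env -> value env ): fetch name, leaving the environment on top."""
    push_under(stack, stack[-1][name])

stack = [MappingProxyType({})]   # start with an empty environment
push_under(stack, 42)
define(stack, "x")               # env: {x: 42}
old_env = stack[-1]
lookup(stack, "x")               # stack: 42 env
push_under(stack, 7)
define(stack, "y")               # stack: 42 env', where env' = {x: 42, y: 7}
```

Because `old_env` is never modified, any code still holding it sees the environment as it was. This is the property a pure language needs.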
Efficient Continuations Marc will have to help me out with that one. Exceptions are easy, though, for three reasons:
• They are only thrown once.
• They never refer to the stack below a try statement.
• Their lifetime is strictly delineated.
Cat Exceptions
[try_block] [catch_block] try
try : (( -> 'A) (any -> 'A) -> 'A)
Try is like a flag on the stack. We only ever need to unwind the stack. This is an example of an exit continuation.
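Here is a minimal Python sketch of why this is cheap under the conditions listed above for exceptions. `try` only records the current stack depth. A throw truncates the stack back to that depth and then runs the catch block on the thrown value. `CatError` and `cat_try` are hypothetical names. The trick is only valid because the try block never pops below the recorded depth.

```python
class CatError(Exception):
    """Carries the thrown value (the `any` in try's type)."""
    def __init__(self, value):
        super().__init__(value)
        self.value = value

def cat_try(stack, try_block, catch_block):
    """Run try_block; on a throw, unwind to the saved depth and run catch_block.

    Blocks are Python callables that take and mutate the stack."""
    mark = len(stack)               # the "flag": remember how deep the stack was
    try:
        try_block(stack)
    except CatError as err:
        del stack[mark:]            # unwind: drop everything pushed since the mark
        stack.append(err.value)     # the catch block receives the thrown value
        catch_block(stack)
    return stack

def risky(s):
    s.append(1)
    s.append(2)
    raise CatError("boom")          # throw after pushing some temporaries

def handler(s):
    s.append("caught " + s.pop())

caught = cat_try([10], risky, handler)
normal = cat_try([10], lambda s: s.append(5), handler)
```

No copy of the stack is ever taken. Unwinding is a single truncation, which is what makes this an exit continuation rather than a general one.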
Continuation Musings A continuation is simply a copy of the stack. If the stack is never popped below the capture point before the continuation is called, I don't need to copy it; I just need to unwind. In addition, I want to know how long continuations live and how often they are used. Perhaps the type system could be used for this?
Maintaining Useful Types Statically typing a dynamic language often leads to heavy use of runtime polymorphic variants. This can defeat the purpose, but it is sometimes unavoidable (e.g. "eval"). However, any amount of static typing is useful. Additionally, a compositional type system allows us to infer the types of code blocks independently of each other.
Getting Recursive Types Correct Inferring equirecursive types is quite hard. While preparing this presentation, I found a bug in the Y combinator.