An Introduction to the Mozart Abstract Machine

Per Brand and Konstantin Popov

The Mozart System - Overview
• Mozart Compiler
  • compiles Oz into an intermediate language
  • written in Oz
• Mozart Virtual Machine
  • executes intermediate code
  • written in C++
• Tcl/Tk interpreter (GUI)
• Emacs-based OPI (Emacs Lisp mode)

Virtual Machines - why?
• Portability
  • the same intermediate code runs everywhere
  • of course, the target platform must have a VM
• Easier to implement!
  • the so-called “semantic gap” between the source language and machine language is bridged by the intermediate language
  • the Mozart Compiler and the Mozart VM taken together are simpler than a potential “Oz to machine code” compiler

Virtual Machines around...
• Historically: Lisp, Smalltalk
• Low-level, stack-based: Forth
• Logic programming: Prolog, etc.
• Functional programming: ML, Haskell, Erlang
• Modern imperative: Java

The Mozart VM - the Idea
• The VM is a loop fetching and executing instructions.
• Instructions: creating data structures, conditionals, procedure calls, thread creation etc.
• Values are stored in the Store.
  • like in the language itself
• The VM has a program pointer and registers.
  • registers refer to values in the Store

Properties of the Mozart Virtual Machine
• Register-based virtual machine
  • temporaries and parameters are found in registers (so-called X registers)
  • the Java VM, by contrast, is stack-based
• Register-based vs stack-based
  • closer to machine architecture - less work for the JIT
  • X registers are either machine registers or at least in cache
  • instructions are longer in a register-based machine than in a stack-based one
• Multi-paradigm virtual machine

Terminology
• X-registers
  • a set of registers common to the whole virtual machine
• Y-registers
  • correspond to stack variables (local variables) in conventional programming languages
  • relative to the current frame
• G-registers
  • closure references
  • relative to the current procedure

The Mozart VM - the Idea (I)

    emulator() {
      ...
      while (1) {
        op = fetch(PC);
        switch (op) {
        case call(X):
          …
          inc(PC);
          continue;
        …
        }
      }
    }

[Diagram: the Emulator's PC points into the Code Area (inst(x(0)), inst(g(1)), inst ...); the Registers refer to values in the Store, e.g. the atom a.]
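The loop on this slide can be made concrete with a small sketch. The three-instruction set (`PutInt`, `Move`, `Halt`), the `Instr` layout and the function name `emulate` are assumptions made for illustration, and the X registers hold plain integers here instead of references into the Store; the real Mozart emulator is considerably richer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical three-instruction set; the real Mozart instruction set is far larger.
enum class Op { PutInt, Move, Halt };

struct Instr {
    Op op;
    int a;  // PutInt: the integer; Move: source X register
    int b;  // destination X register
};

// Runs the code until Halt and returns the final X registers.
std::vector<int> emulate(const std::vector<Instr>& code, std::size_t numRegs) {
    std::vector<int> x(numRegs, 0);  // X registers (plain ints in this sketch)
    std::size_t pc = 0;              // program counter
    while (true) {
        assert(pc < code.size());
        const Instr& in = code[pc];  // fetch
        switch (in.op) {             // decode and execute
        case Op::PutInt:
            x[in.b] = in.a;
            ++pc;
            continue;
        case Op::Move:
            x[in.b] = x[in.a];
            ++pc;
            continue;
        case Op::Halt:
            return x;
        }
    }
}
```

Each case advances the program counter itself and then continues the loop, mirroring the `inc(PC); continue;` pattern of the slide.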

‘Hello World’ example (1)

    declare P in
    proc {P}
       {System.show 'hello world'}
    end
    {P}

• ‘P’ is a procedure printing ‘hello world’
• ‘P’s closure contains a reference to the ‘System’ module!

‘Hello World’ example (simplified)

    declare P in
    proc {P}
       {System.show 'hello world'}
    end
    {P}

compiles to

    lbl(7)
    definition(x(0) 21 [g(122)])
    move(g(0) x(0))
    inlineDot(x(0) show x(1))
    putConstant('hello world' x(0))
    call(x(1))
    return
    endDefinition(7)
    lbl(21)
    unify(x(0) g(1024))
    call(g(1024))

‘Hello World’ example (2)

    definition(x(0) 21 [g(122)])

• Creates a procedure as a first-class value.
• x(0) is the register that will refer to the procedure
• 21 is the label after the definition
• g(122) is the register referring to the ‘System’ module

‘Hello World’ example (3)

    move(g(0) x(0))

• Moves the content of g(0) into x(0)
• g(0) contains a reference to the ‘System’ module
• g(0) is local to the procedure and initialized by ‘definition’ (discussed later!)

‘Hello World’ example (4)

    inlineDot(x(0) show x(1))

• Retrieves the ‘show’ procedure out of the ‘System’ module
• x(0) is initialised above
• x(1) becomes a reference to the ‘show’ procedure

‘Hello World’ example (5)

    putConstant('hello world' x(0))

• Creates an atom ‘hello world’ in the Store and puts a reference to it into x(0)

‘Hello World’ example (6)

    call(x(1))

• Now, x(1) refers to the ‘show’ procedure and x(0) refers to ‘hello world’.
• ‘show’ accesses its argument as x(0)!

‘Hello World’ example (7)

    return
    endDefinition(7)

• ‘return’ returns control to the place just after ‘call(g(0))’
• ‘endDefinition’ is not used for execution per se

‘Hello World’ example (8)

    lbl(21)
    unify(x(0) g(0))
    call(g(0))
    …                %% continue...

• Execution proceeds further...

Oz Data Types in VM
• A (partial) value in the VM is a graph such that:
  • nodes of primitive values (atoms, integers etc.) have no outgoing arcs
  • nodes of compound values (e.g. records) do have outgoing arcs; we call them references
  • variable nodes can be bound, after which they become transparent references

Primitive Values
• Atoms - objects with strings inside
• Integers - objects with integers inside
• Boolean - objects with 0-1 values inside…
• Conveniently, boxes are real C++ objects with operations relevant to their types.

Records
• A record in the VM is an object that refers to:
  • an atom (name) which is the record’s label
  • a sorted list of feature names
  • record subtrees (stored in an array)
• For the sake of efficiency, records also refer to hash tables that map feature names to array indexes

Records (II)

    R = label(f1: a f2: 1)

[Diagram: R refers to a record node with the label ‘label’, the feature list [f1 f2], a Hash Table, and the subtrees a and 1.]

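Records (III)

    R = label(f1: a f2: R)

[Diagram: R refers to a record node with the label ‘label’, the feature list [f1 f2], a Hash Table, and the subtree a; the f2 subtree refers back to the record itself.]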
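The record layout described on the Records slides (label, sorted feature names, subtree array, and a hash table from feature names to array indexes) could look roughly as follows. The names `Record`, `makeRecord` and `selectFeature` are hypothetical, and subtrees are plain strings here rather than references into the Store.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical record node; subtrees are strings here instead of store references.
struct Record {
    std::string label;
    std::vector<std::string> features;                   // sorted feature names
    std::vector<std::string> subtrees;                   // subtrees[i] belongs to features[i]
    std::unordered_map<std::string, std::size_t> index;  // feature name -> array index
};

Record makeRecord(std::string label,
                  std::vector<std::pair<std::string, std::string>> fields) {
    std::sort(fields.begin(), fields.end());  // pairs compare by feature name first
    Record r;
    r.label = std::move(label);
    for (std::size_t i = 0; i < fields.size(); ++i) {
        r.index[fields[i].first] = i;
        r.features.push_back(fields[i].first);
        r.subtrees.push_back(fields[i].second);
    }
    return r;
}

// Feature access goes through the hash table rather than scanning the feature list.
const std::string& selectFeature(const Record& r, const std::string& feature) {
    assert(r.index.count(feature) == 1);
    return r.subtrees[r.index.at(feature)];
}
```

With `makeRecord("label", {{"f2", "1"}, {"f1", "a"}})` the features end up sorted as f1, f2, and `selectFeature(r, "f1")` yields the subtree stored at index 0.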

Cells
• A Cell is just a box with a reference to a value.

    C = {Cell.new unit}

[Diagram: C refers to a cell node, which refers to the value unit.]

Abstractions
• (Remember) Oz procedures can refer to values in their lexical scope - they are closures
• the environment of a procedure is an array of references (G registers)

    declare EnvVar Proc in
    proc {Proc X}
       {EnvVar X}
    end

[Diagram: Proc refers to an abstraction node, which points to its code in the Code Area (… inst(…) inst(…) … return ...) and to an environment array whose entry refers to EnvVar.]

Representation of Types of Nodes
• Types of nodes in the store are represented by references which are typed. We call them tagged references (3 or 4 bits).

[Diagram: Emulator registers holding tagged references, with tags such as int (values 1 and 2) and list.]

Representation of Types of Nodes-2
• Sometimes a tagged reference (or pointer) has to be combined with
• a tagged object (each object knows its size)

[Diagram: Emulator registers holding references tagged obj and ext.]
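One common way to realise tagged references is to keep the tag in the low bits of a machine word. The sketch below assumes 3 tag bits and 8-byte-aligned store objects; the tag names and values, the `Obj` type and all helper functions are invented for illustration and are not taken from the Mozart sources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tag values; the real Mozart tag assignment is not given in the slides.
enum Tag : std::uintptr_t { TagRef = 0, TagInt = 1, TagList = 2, TagVar = 3 };

constexpr std::uintptr_t kTagBits = 3;
constexpr std::uintptr_t kTagMask = (std::uintptr_t{1} << kTagBits) - 1;

using TaggedRef = std::uintptr_t;

// Hypothetical store object; 8-byte alignment leaves the three low pointer bits zero.
struct alignas(8) Obj {
    int payload;
};

// Small integers are stored directly in the upper bits, next to the tag.
inline TaggedRef makeInt(std::intptr_t value) {
    return (static_cast<std::uintptr_t>(value) << kTagBits) | TagInt;
}

// Pointers keep their address bits; the free low bits hold the tag.
inline TaggedRef makePtr(Obj* p, Tag tag) {
    auto bits = reinterpret_cast<std::uintptr_t>(p);
    assert((bits & kTagMask) == 0);
    return bits | tag;
}

inline Tag tagOf(TaggedRef r) { return static_cast<Tag>(r & kTagMask); }

inline std::intptr_t intOf(TaggedRef r) {
    assert(tagOf(r) == TagInt);
    // Arithmetic right shift on mainstream compilers, so the sign is preserved.
    return static_cast<std::intptr_t>(r) >> kTagBits;
}

inline Obj* ptrOf(TaggedRef r) {
    return reinterpret_cast<Obj*>(r & ~kTagMask);
}
```

The emulator can then dispatch on `tagOf(r)` without touching memory, which is the main attraction of the scheme.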

Variables
• A variable is an object such that:
  • an unbound variable has no reference in it; thus, it looks like a primitive value
  • an unbound variable is recognised by the VM as such
  • a bound variable object refers to another value
  • a bound variable object is transparent for operations on values, i.e. it becomes a reference
• The VM can step through adjacent references. This is called dereferencing

Variables
• X Y Z in X=Y, X=Z, X=1

         initially     X=Y           X=Z           X=1
    X    sva home->    ref           ref           ref
    Y    sva home->    sva home->    ref           ref
    Z    sva home->    sva home->    sva home->    int 1
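Binding and dereferencing can be sketched as pointer chasing. The `Node` layout and the functions `deref` and `unify` below are assumptions for illustration: only variables and integers are modelled, and a binding stores a pointer to the other node, whereas the table above shows Z's cell holding int 1 directly.

```cpp
#include <cassert>

// Hypothetical node layout: a variable is unbound until 'ref' is set;
// an integer node carries its value directly.
struct Node {
    enum Kind { Var, Int } kind;
    Node* ref;  // Var: nullptr while unbound, otherwise the node it refers to
    int value;  // Int: the integer
};

// Dereferencing: step through bound variables until reaching
// an unbound variable or a non-variable node.
Node* deref(Node* n) {
    assert(n != nullptr);
    while (n->kind == Node::Var && n->ref != nullptr) n = n->ref;
    return n;
}

// Simplified unification of two nodes (variables and integers only).
bool unify(Node* a, Node* b) {
    a = deref(a);
    b = deref(b);
    if (a == b) return true;
    if (a->kind == Node::Var) { a->ref = b; return true; }
    if (b->kind == Node::Var) { b->ref = a; return true; }
    return a->value == b->value;  // both are integers
}
```

Running X=Y, X=Z, X=1 through `unify` leaves a chain X -> Y -> Z -> 1, so dereferencing any of the three variables ends at the integer node.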

Compiling Data Structures
• Values from a program text need to be constructed in the Store.
• Primitive values are constructed with ‘putInt’, ‘putConstant’, etc.

    X=atomEx

compiles to

    putConstant(atomEx x(2))

Compiling Records
• Records are constructed top-down, similar to Prolog’s WAM

    R = rec(f1:tup(a) f2:2)

compiles to

    putRecord(rec [f1 f2] x(2))
    unifyVariable(x(1))
    unifyNumber(2)
    getRecord(tup 1 x(1))
    unifyLiteral(a)

Compiling Records (2)

    R = rec(f1:tup(a) f2:2)

    putRecord(rec [f1 f2] x(2))

• Creates a record node with subtrees which are unbound variables

Compiling Records (3)

    R = rec(f1:tup(a) f2:2)

    unifyVariable(x(1))

• Unifies the first subtree (under ‘f1’) with a new variable in x(1)

Compiling Records (4)

    R = rec(f1:tup(a) f2:2)

    unifyNumber(2)

• Unifies the second subtree (under ‘f2’) with integer 2

Compiling Records (5)

    R = rec(f1:tup(a) f2:2)

    getRecord(tup 1 x(1))

• Unifies x(1) with a new tuple

Compiling Records (6)

    R = rec(f1:tup(a) f2:2)

    unifyLiteral(a)

• Unifies the first (and sole) subtree of tup(…) with ‘a’

Compiling Abstractions
• One specifies registers that are to be saved in the closure

    declare EnvVar Proc in
    proc {Proc X}
       {EnvVar X}
    end

compiles to

    …
    lbl(7)
    definition(x(0) 21 [g(122)])
    ...
    return
    endDefinition(7)
    lbl(21)
    ...

Conditionals
• Check condition(s) and proceed in one of two branches.
  • there is a ‘branch <label>’ instruction
  • very similar to C compiled for any RISC architecture!

Conditionals (II)

    declare X in
    if X == 1 then {Show ok} else {Show no} end

compiles to

    …                        %% x(1) contains ‘X’
                             %% x(2) contains ‘Show’
    testNumber(x(1) 1 22)
    putConstant(ok x(0))
    call(x(2))               % ‘ok’ (x(0)) passed
    branch 31                % skip ‘else’ clause
    lbl(22)
    putConstant(no x(0))
    call(x(2))               % ‘no’ (x(0)) passed
    lbl(31)
    ...

Procedure Application
• Arguments are passed in X registers
  • there is one single set of X registers
• The return point is saved in a task on the task stack
  • it is called a task stack because it also serves other purposes (e.g. exception handling)
• A procedure finishes with ‘return’
  • pops the task from the stack

Procedure Application (II)

[Diagram: PC points into the code of the running procedure (… inst(…) ... return ...); each task on the Task Stack points to the instruction following a call(…) in a caller's code.]
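The call/return protocol can be sketched with a task stack that holds only return points. The instruction set (`Call`, `Return`, `Inc`, `Halt`), the `Instr` layout and the function `run` are hypothetical; `Inc` merely counts how often the callee's body was executed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction set for illustration only.
enum class Op { Call, Return, Inc, Halt };

struct Instr {
    Op op;
    std::size_t target;  // Call: address of the callee's first instruction
};

// Runs the code from address 0 and returns how often Inc was executed.
int run(const std::vector<Instr>& code) {
    std::vector<std::size_t> taskStack;  // saved return points
    std::size_t pc = 0;
    int counter = 0;
    while (true) {
        assert(pc < code.size());
        const Instr& in = code[pc];
        switch (in.op) {
        case Op::Call:
            taskStack.push_back(pc + 1);  // return point: instruction after the call
            pc = in.target;
            continue;
        case Op::Return:
            assert(!taskStack.empty());
            pc = taskStack.back();  // pop the task and resume the caller
            taskStack.pop_back();
            continue;
        case Op::Inc:
            ++counter;
            ++pc;
            continue;
        case Op::Halt:
            return counter;
        }
    }
}
```

A program that calls the same two-instruction procedure twice and then halts makes `run` return 2, with the task stack empty again at the end.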

Local Variables
• Local variables are kept in Y registers
  • associated with tasks; only the topmost set is accessible for manipulation
  • explicitly allocated and deallocated through allocate <N> and deallocate instructions
  • Y registers are accessible through move <reg> <reg> instructions

Local Variables (II)

[Diagram: PC points into the running code (… move y(0) x(1) ... return ...), which uses the current Y registers; a task on the Task Stack refers to a caller's code (… call(…) ... move y(1) x(1) ...) together with that caller's Y registers.]

Tail-call optimization
• A task frame is only needed when a procedure contains either
  • 2 or more call instructions
  • 1 call instruction followed by other instructions
• Otherwise no frame is allocated
• Example: Partition (tail-recursive)

Accessing Closure Variables
• A pointer to the abstraction’s environment array is known to the VM as G registers
  • set by the call <reg> instruction
  • saved on the stack when a nested procedure is called
  • accessible by move <reg> <reg> instructions
• Thus, a task in the task stack is a triple <PCret,Yregs,Gregs>
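The triple <PCret,Yregs,Gregs> might be modelled as below. The types `Abstraction`, `Task` and `Machine` and their members are hypothetical, and `Value` is a plain integer standing in for a store reference.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Value = int;  // stand-in for a store reference

// An abstraction: a code address plus its environment (the G registers).
struct Abstraction {
    std::size_t entry;
    std::vector<Value> env;
};

// A task as described above: <PCret, Yregs, Gregs>.
struct Task {
    std::size_t pcRet;
    std::vector<Value> y;
    const std::vector<Value>* g;
};

struct Machine {
    std::size_t pc = 0;
    std::vector<Value> y;                   // current Y registers
    const std::vector<Value>* g = nullptr;  // current G registers
    std::vector<Task> taskStack;

    // call: save the caller's context, then switch G to the callee's environment.
    void call(const Abstraction& callee) {
        taskStack.push_back(Task{pc + 1, std::move(y), g});
        y.clear();  // the callee starts without Y registers until it allocates some
        g = &callee.env;
        pc = callee.entry;
    }

    // ret: pop the topmost task and restore the caller's context.
    void ret() {
        assert(!taskStack.empty());
        Task t = std::move(taskStack.back());
        taskStack.pop_back();
        pc = t.pcRet;
        y = std::move(t.y);
        g = t.g;
    }
};
```

Because only the topmost Y and G sets live in the machine itself, instructions address them relative to the current frame and procedure, as the Terminology slide puts it.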

Memory Management (overview)
• Values in the store can become garbage
  • e.g. when an array of Y registers is deallocated
• Garbage collection reclaims garbage: it traces all live data and frees unused space
• Mozart (so far) uses a ‘stop-and-copy’ collector: live data is copied into a new area, and the old area is freed (including garbage!)
• Nodes reachable through registers and stack in the VM are considered live (i.e. could be accessed in the future)
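A stop-and-copy collector can be sketched in a few lines. The heap model below (cells addressed by index, each with a payload and two references) and the names `Cell`, `copyCell` and `collect` are assumptions for illustration; the scan loop follows the classic Cheney scheme, which is not necessarily how Mozart's collector is organised.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kNil = -1;

// Hypothetical heap cell: an integer payload and up to two references (heap indices).
struct Cell {
    int payload;
    int left;
    int right;
    int forward;  // new index once the cell has been copied, kNil before
};

// Copies one cell into the new space unless it was copied before; returns its new index.
int copyCell(std::vector<Cell>& from, std::vector<Cell>& to, int idx) {
    if (idx == kNil) return kNil;
    assert(idx >= 0 && static_cast<std::size_t>(idx) < from.size());
    if (from[idx].forward == kNil) {
        from[idx].forward = static_cast<int>(to.size());
        to.push_back(Cell{from[idx].payload, from[idx].left, from[idx].right, kNil});
    }
    return from[idx].forward;
}

// Copies everything reachable from the roots; the old space can then be dropped whole.
std::vector<Cell> collect(std::vector<Cell>& from, std::vector<int>& roots) {
    std::vector<Cell> to;
    for (int& r : roots) r = copyCell(from, to, r);
    // Scan the copied cells and redirect their references into the new space.
    for (std::size_t scan = 0; scan < to.size(); ++scan) {
        int l = copyCell(from, to, to[scan].left);
        int r = copyCell(from, to, to[scan].right);
        to[scan].left = l;
        to[scan].right = r;
    }
    return to;
}
```

The forwarding index keeps shared and cyclic structures intact: a cell reached a second time is not copied again, its reference is simply redirected to the existing copy. Garbage is never visited at all.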

Threads
• Threads are created by means of the thread <lbl> instruction

    … thread E end ...

compiles to

    …
    thread(33)
    C[E]          % code for E
    lbl(33)
    …

• E is executed concurrently in a new thread

Threads (II)
• A thread consists essentially of its task stack
• A thread can be runnable, running, blocked or terminated.
  • Blocked: cannot advance because of lack of information in the Store
• The VM contains a scheduler and a pool of runnable threads
• Question: how to manage blocked threads?

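Threads (III)
• Blocked threads are associated with the variable(s) whose missing bindings caused the threads to block

    declare X in
    if X == 1 then … end

[Diagram: in the Store, the variable X refers to a list of suspensions (terminated by nil); a suspension refers to the blocked Thread and its stack.]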

Threads (IV)
• Suspensions are created by, e.g., the test instructions used for compiling conditionals
• There can be many threads blocked on the same variable!
• Binding a variable involves scheduling all blocked threads for execution
  • suspensions are deallocated
  • threads are entered into the thread pool
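The suspension mechanism can be illustrated with a variable that records the threads blocked on it. `Thread`, `Variable` and `Scheduler` are hypothetical types; a real suspension would also let the woken thread re-execute the test instruction that blocked it.

```cpp
#include <cassert>
#include <deque>
#include <vector>

struct Thread {
    int id;
    enum State { Runnable, Blocked } state;
};

// A variable keeps the suspensions of all threads blocked on it.
struct Variable {
    bool bound = false;
    int value = 0;
    std::vector<Thread*> suspensions;
};

struct Scheduler {
    std::deque<Thread*> runnable;  // pool of runnable threads

    // Called when a test instruction needs the value of an unbound variable.
    void suspend(Thread& t, Variable& v) {
        assert(!v.bound);
        t.state = Thread::Blocked;
        v.suspensions.push_back(&t);
    }

    // Binding wakes every thread blocked on the variable and drops the suspensions.
    void bind(Variable& v, int value) {
        v.bound = true;
        v.value = value;
        for (Thread* t : v.suspensions) {
            t->state = Thread::Runnable;
            runnable.push_back(t);
        }
        v.suspensions.clear();
    }
};
```

Attaching suspensions to the variable means the scheduler never polls blocked threads; they cost nothing until the binding they wait for arrives.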

Advanced Issues - Data Types
• Ports are objects with the ‘send’ method that just adds elements to the port’s stream
  • no additional synchronisation primitive(s) are needed
• Objects have a highly-optimised built-in implementation:
  • dedicated (C++) abstraction
  • specialised VM instructions for method application, etc.
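The port's ‘send’ can be pictured as binding the unbound tail of a stream. In the sketch below, `StreamCell` and `Port` are hypothetical types, elements are plain integers, and the port owns the whole stream for simplicity; in Oz the stream is an ordinary list whose tail is an unbound variable.

```cpp
#include <cassert>
#include <memory>

// One element of the stream; 'bound' is false for the still-unbound tail.
struct StreamCell {
    bool bound = false;
    int head = 0;
    std::unique_ptr<StreamCell> tail;
};

struct Port {
    std::unique_ptr<StreamCell> stream;  // start of the stream, owned here for simplicity
    StreamCell* end;                     // current unbound tail

    Port() : stream(new StreamCell()), end(stream.get()) {}

    // send: bind the unbound tail to a new element followed by a fresh unbound tail.
    void send(int value) {
        assert(end != nullptr && !end->bound);
        end->head = value;
        end->tail.reset(new StreamCell());
        end->bound = true;
        end = end->tail.get();
    }
};
```

A reader walks the stream from its start and stops (in Oz: suspends) at the first unbound cell, which is why no extra synchronisation primitive is needed.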
