Implementation of Dynamic Languages on the CLR John Gough john@SoftwareAutomata.com j.gough@qut.edu.au
Overview • What are dynamic languages? • What are the implementation issues? • Dynamic code generation • Dispatch efficiency • The Ruby.NET project • Tool support
What are Dynamic Languages? • The languages of interest in this talk are those used for scripting, including Perl, Python, Ruby, Lua and PHP. • What these languages have in common is dynamic binding of names, dynamic typing of values, and runtime evaluation of strings as program fragments. • Because of their use as scripting languages, most of them have extremely powerful pattern matching facilities. In most cases the pattern matching is based on (an extension of) regular expressions.
What are Dynamic Languages? • The standard path for implementing these languages is to compile the source program into some kind of intermediate form, usually an attributed abstract syntax tree. This form is then executed, at runtime, by an interpreter that is specific to the source language. • Library code may be stored in the pre-compiled, intermediate form, but the main program is usually run in “compile and go” mode. • It is a characteristic of these languages that some execution may happen during “compilation”, and some compilation may happen during “execution”.
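The “compile and go” path described on this slide can be sketched in a few lines of Python. This is an illustration only: the mini-language (arithmetic over named values) is hypothetical, Python's own `ast` parser stands in for the language front end, and `evaluate` and `compile_and_go` are names invented for the sketch.

```python
import ast
import operator

# Hypothetical mini-language: arithmetic expressions only.
# Python's own parser stands in for the language front end.
BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def evaluate(node, env):
    """Walk the syntax tree and compute a value (the 'go' phase)."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body, env)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]          # names are bound at run time
    if isinstance(node, ast.BinOp):
        op = BINARY_OPS[type(node.op)]
        return op(evaluate(node.left, env), evaluate(node.right, env))
    raise NotImplementedError(type(node).__name__)

def compile_and_go(source, env):
    tree = ast.parse(source, mode="eval")   # the 'compile' phase
    return evaluate(tree, env)

result = compile_and_go("price * quantity + 1", {"price": 3, "quantity": 4})
```

The tree is built and walked in the same process, which is why the compiler front end has to ship with the runtime.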
Background … • A number of languages that were “dynamic” according to this definition were part of Microsoft’s “Project 7”. The outcomes of these experiments were somewhat mixed. Several of them are reported in the appendices to “Programming in the .NET Environment”, by Damien Watkins et al. • It appeared that dynamic languages face difficult performance issues, which require investigation. • In 2002 my team at QUT investigated the possibility of implementing Perl as a fully verified language on the CLR.
Background … • The conclusion that we reached is that Perl can be implemented as a fully verified language. • However, in my opinion, Perl is not a very good example of a dynamic language for answering the generic questions about implementation techniques and efficiency. The problem is that Perl is a very large language, with a long history of development and extension, but is very poorly defined. Essentially the (single) Perl source code base defines the somewhat arbitrary semantics of the language. Reproducing these exact semantics in an independent implementation is a challenge.
Background … • Our focus for the current project was to try to “factor out” the common issues that apply to all of the languages of interest, so as to find widely applicable implementation solutions • In the event, we ended up convinced that no common framework can usefully support all of the languages, but that ideas and implementation techniques will be applicable across a range of languages • My current goal is to find solutions to some of these generic problems and make sure that they are shared across the community
What are the Implementation Issues? • Dynamic code generation (discussed later) • Efficiency of method dispatch (discussed later) • Various scope and visibility models • Closures (lexical binding)
Scope and Visibility Models • Dynamic languages typically have multiple name visibility models. Typically these will be • Static, that is, global scope • Lexical, that is, block-scoped names • Dynamic, that is, values are declared in code and are available at all later program points on the locus of control • Perl also has “my” variables, which may shadow globals; the shadowed global is visible again on leaving the lexical scope.
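The three visibility models can be contrasted in a short Python sketch. Python has global and lexical scope built in but no dynamic scope, so the dynamic case is emulated here; `dynamic_bindings` and `dynamic_let` are hypothetical helpers invented for the illustration, not part of any language runtime discussed in the talk.

```python
from contextlib import contextmanager

counter = 0                      # static (global) scope: one binding for the program

def make_adder(step):            # lexical scope: 'step' is visible only in this block
    def add(value):
        return value + step
    return add

# Dynamic scope is not built into Python, so this sketch emulates it
# with a hypothetical binding table plus save/restore.
dynamic_bindings = {"indent": 0}

@contextmanager
def dynamic_let(name, value):
    saved = dynamic_bindings[name]
    dynamic_bindings[name] = value
    try:
        yield
    finally:
        dynamic_bindings[name] = saved   # restored when control leaves the block

def show(text):
    # Sees whichever binding is current on the caller's path of control.
    return " " * dynamic_bindings["indent"] + text

outer = show("a")
with dynamic_let("indent", 2):
    inner = show("b")
after = show("c")
```

Note that `show` never mentions `dynamic_let`: which value it sees depends on who called it and when, which is what makes dynamic scope hard to analyse statically.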
Closures • Some languages have procedures that can access the lexical variables of the surrounding scopes. If such procedures are values that can be stored and invoked later, then closures must be created. • One way of implementing closures is to locate all lexical variables in activation records that are allocated on the garbage-collected heap. NOT “stack frames”! The point is that activation records do not get collected on scope exit if they are referenced by a closure. This has some consequences for efficiency. • In some situations the compiler may statically determine the subset of values that must be retained. If all data is heap allocated, the references to these values may be copied into the “closure record”.
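The heap-allocated activation record scheme can be made explicit in Python. This is a minimal sketch under stated assumptions: `ActivationRecord` and `Closure` are hypothetical classes written for the illustration, and a real compiler would emit the equivalent structures rather than use dictionaries.

```python
class ActivationRecord:
    """Hypothetical heap-allocated frame: holds the lexical variables of one call."""
    def __init__(self, parent=None):
        self.slots = {}
        self.parent = parent

    def lookup(self, name):
        # Search this record, then the enclosing lexical scopes.
        record = self
        while record is not None:
            if name in record.slots:
                return record
            record = record.parent
        raise NameError(name)

class Closure:
    """Code plus a reference to the record in which it was created."""
    def __init__(self, code, env):
        self.code = code
        self.env = env

    def __call__(self, *args):
        return self.code(self.env, *args)

def make_counter():
    frame = ActivationRecord()          # allocated on the heap, not a stack frame
    frame.slots["count"] = 0

    def increment(env):
        holder = env.lookup("count")
        holder.slots["count"] += 1
        return holder.slots["count"]

    return Closure(increment, frame)    # frame outlives make_counter's return

counter = make_counter()
first = counter()
second = counter()
other = make_counter()
```

Each call to `make_counter` allocates a fresh record, so `counter` and `other` keep independent counts; the record is reclaimed by the garbage collector only when no closure refers to it.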
Dynamic Code Generation • A characteristic idiom of many dynamic languages is the evaluation of strings as program fragments. The string, whose value may be a computed expression, is compiled into code and executed in the lexical environment in which the “eval” is embedded. This is the reason that the compiler must be present at “runtime”. (It also creates a security attack point that demands careful protection!)
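Python's built-in `compile` and `eval` show the idiom directly. In this sketch the function `apply_discount` and the expression string are hypothetical; the point is that the fragment is compiled at run time and sees the variables of the scope that evaluates it.

```python
def apply_discount(price, rate_expr):
    # 'rate_expr' is a string computed at run time; compile() turns it into code.
    code = compile(rate_expr, "<eval>", "eval")
    # Pass the local names explicitly so the fragment sees this function's variables.
    # Empty __builtins__ narrows what the fragment can reach; it is NOT a real sandbox.
    rate = eval(code, {"__builtins__": {}}, {"price": price})
    return price * (1 - rate)

result = apply_discount(200, "0.25 if price > 100 else 0.1")
```

The comment about `__builtins__` is the security concern from the slide in miniature: any string that reaches `eval` is code, so its source has to be trusted or its environment tightly restricted.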
Dynamic Code Generation, Experience • One of the libraries in the CLI framework is System.Reflection.Emit. This library allows for the generation of code at runtime, either for saving to a file, or for immediate execution, or both. • The new facilities for lightweight codegen in the Whidbey release are useful for this purpose. • One of our QUT libraries, PEAPI (and its successor PERWAPI), writes to a stream data type. We have had good experiences with writing to a memory stream, and then immediately loading the assembly for verification and execution.
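The “write to a memory stream, then load immediately” pattern has a rough Python analogue, shown here only as an analogy: it does not use System.Reflection.Emit, PEAPI or PERWAPI, and `emit_module` is a hypothetical helper. Generated source is written to an in-memory stream, compiled, and loaded as a module without touching the file system.

```python
import io
import types

def emit_module(name, field_names):
    """Write generated source to an in-memory stream, then load it at once."""
    stream = io.StringIO()
    stream.write("class Record:\n")
    stream.write("    def __init__(self, " + ", ".join(field_names) + "):\n")
    for field in field_names:
        stream.write(f"        self.{field} = {field}\n")

    module = types.ModuleType(name)
    code = compile(stream.getvalue(), f"<generated {name}>", "exec")
    exec(code, module.__dict__)          # "load" the generated code
    return module

generated = emit_module("points", ["x", "y"])
point = generated.Record(3, 4)
```

Because the generator writes to a stream, the same output could equally be saved to a file for later use, which mirrors the "save, execute, or both" choice mentioned on the slide.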
Dispatching Dynamic Subroutines • The efficiency of subroutine calls in dynamic languages is always problematic. The binding of names to code can be changed at runtime in ways that defy static analysis, so method dispatch is always indirect, via a binding table. • It is frustrating to realize that in almost every case the dictionary lookup is not required, since the bindings change only very infrequently. • Caching could lessen this overhead, but should the caching be done by the front end, or does it need JIT support?
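One possible front-end answer to the caching question is a per-call-site cache guarded by a version number. The Python sketch below is an assumption made for illustration, not the Ruby.NET design: `BindingTable` and `CallSite` are hypothetical names, and the table's `version` is bumped on every rebinding so that stale caches are detected with a single integer comparison.

```python
class BindingTable:
    """Name-to-code table; 'version' changes whenever a binding is replaced."""
    def __init__(self):
        self.methods = {}
        self.version = 0

    def define(self, name, function):
        self.methods[name] = function
        self.version += 1            # invalidates every call-site cache

class CallSite:
    """One per call in the source; remembers the last lookup result."""
    def __init__(self, name):
        self.name = name
        self.cached_version = -1
        self.cached_target = None
        self.misses = 0

    def invoke(self, table, *args):
        if self.cached_version != table.version:      # slow path: dictionary lookup
            self.cached_target = table.methods[self.name]
            self.cached_version = table.version
            self.misses += 1
        return self.cached_target(*args)              # fast path: direct call

table = BindingTable()
table.define("greet", lambda who: "hello " + who)
site = CallSite("greet")

results = [site.invoke(table, "a") for _ in range(1000)]
table.define("greet", lambda who: "goodbye " + who)   # rebinding at run time
rebound = site.invoke(table, "a")
```

Across the 1001 calls the dictionary is consulted only twice, once on first use and once after the rebinding. A single global version is the crudest choice, since any definition anywhere invalidates every site; finer-grained guards are possible at the cost of more bookkeeping.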
The Ruby.NET Project • Wayne Kelly and I are currently working on an implementation of the language Ruby for .NET • Ruby is an interesting language, with a nice object model and a clean implementation • Interestingly, the language has a continuations library, which will provide some additional challenges on top of the usual issues of closure implementation • The object model allows methods to be added to individual objects, or to all objects belonging to a class – the “type system” uses a concept of duck-typing
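The object-model features in the last bullet can be imitated in Python, which makes the implementation burden easier to see. This is a Python analogy rather than Ruby code; the classes `Duck` and `Robot` and the function `chorus` are hypothetical.

```python
import types

class Duck:
    def speak(self):
        return "quack"

class Robot:
    def speak(self):
        return "beep"

def chorus(things):
    # Duck typing: anything with a 'speak' method is acceptable.
    return [thing.speak() for thing in things]

first, second = Duck(), Duck()

# Add a method to ONE object (comparable to a Ruby singleton method).
first.fly = types.MethodType(lambda self: "flap", first)

# Add a method to the class: every instance, including existing ones, gains it.
Duck.swim = lambda self: "paddle"

voices = chorus([first, Robot()])
```

Since both kinds of extension can happen at any point during execution, the set of methods an object responds to cannot be fixed at compile time, which is the root of the dispatch problem discussed earlier.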
Ruby.NET, the design framework • The standard implementation of Ruby has an evaluator that traverses an AST form of the program • Despite this, we are translating through to IL, relying on the CLI to provide a single level of interpretation only • After some debate we are using a bottom-up parser built from the productions of the standard ruby.y grammar • It is not yet clear if we will use generics in the IL representation of Ruby programs, but the implementation will be hosted only on generics-capable frameworks
How to do continuations … • Continuations are hard, because almost all of the standard techniques used in the functional programming community are not applicable • The exception is a really interesting idea that has been explored by Joe Marshall at Northeastern University for their Scheme implementation • At the moment, this technique looks like the only reasonable attempt at implementing continuations on the CLI
Tool Support • Since the object of the exercise is to produce useful techniques and tools for the community, tool building is already a major focus • Last week we released a first public version of GPPG, the Gardens Point Parser Generator. This is a YACC-compatible LALR parser generator that produces (and is written in) C#. • Speed is comparable with native code versions of YACC and, as a first sanity check for correctness, it produces the same reduction sequence as YACC on all of the programs in the Ruby test-suite
Tool Support • The PE file Reader-Writer, PERWAPI, has been available for some time, but now supports all of the generic metadata • The tool is becoming more robust as a result of user feedback – for example it now reads the whole of the v2.0 mscorlib.dll without error, and has been used to bootstrap our (static language) compilers • PDB file support should appear in the next release version
Tool Support • Documentation for PERWAPI is now available. This is a 37-page pdf file covering the internal structure and use of the tool.
More Tools? And where to get’em. • I expect to have a C# version of our bottom-up tree rewriter released this quarter. Still arguing over what to call it … • All of these tools and compilers are released under a “FreeBSD” style open source license • GPPG is available from http://plas.fit.qut.edu.au/gppg/ • PERWAPI, and the documentation, are available from http://plas.fit.qut.edu.au/perwapi/