Dr Jekyll and Mr C
Rob Ennals, Intel Research Cambridge

C is holding us back
• Much important software is currently written in C
• Even if not the most lines of code, probably most of the cycles
• C is unsafe and unexpressive
• Consequences: security problems; hard to analyse; hard to debug; unreliable; hard to understand; hard to write; hard to parallelise
Dr Jekyll and Mr C (SRG Talk)

Functional Languages are Great!
• Features: safety, generic types, lambda expressions, controlled effects, type classes
• Benefits: easier to write, easier to understand, more reliable, more secure, easier to parallelise
So why does nobody use them?

A Problem: Language Switching Costs
• Much important software is currently written in C
• Moving to a new language incurs high switching costs
• Programmers, tools, libraries, and existing code are all tied to C
[Diagram: C, with Trust, Programmers, Libraries, Existing Code, Tools]

A Solution: Lossless Round Tripping
• Jekyll is a high level functional programming language
• Offering most of the features of Haskell, and more
• Jekyll can be translated losslessly to and from C
• Preserving layout, formatting, comments, everything
• C code is readable and editable
[Diagram: C Programmer, C File, Jekyll Programmer, Jekyll File]

Switching Costs are Reduced
• Programmers and tools can still use the C version
• Existing C code can stay in C
• Although there may be benefit to be had from modifying it
• If Jekyll ceases to be maintained, just use the C
[Diagram: Jekyll, with C Trust, C Programmers, C Libraries, Existing C Code, C Tools]

Jekyll is Transparent
• C programmers can edit programs without knowing about Jekyll
• This requires that:
• C programmers can understand C produced by the Jekyll translator
• The Jekyll translator can understand edits made by C programmers
• Jekyll is very tolerant of edits to C code. This is essential.
[Diagram: C File, C Programmer, Jekyll Translator]

We assume that C programmers do NOT KNOW ANYTHING about Jekyll
• But they still need to be able to edit Jekyll-encoded C files

Jekyll-Encoded C files are Unannotated
Jekyll:
    struct<%a> Node{
        %a *element;
        List<%a> *tail;
    };
Encoded C:
    struct Node{
        void *element;
        List *tail;
    };
• No funny macros, no weird comments, no restrictive naming rules
• Just good, readable, editable C
• All extra info is simply thrown away
• It is retrieved from the previous Jekyll version when converted back

Reconstruction based on previous version
• There are many ways to decode a C file as Jekyll
• Extra type info, different features being encoded, etc.
• We choose the decoding that matches the previous version
• Aiming to minimise the textual difference from the previous Jekyll file
This allows Jekyll to correctly decode unannotated C
[Diagram: Old Jekyll File, New C File, New Jekyll File]

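A toy sketch of the idea, not Jekyll's actual algorithm: it handles a single statement and only the "unsafe" keyword, and it restores the keyword by exact match rather than by minimising textual difference. The names encode_stmt and decode_stmt are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy encoder (Jekyll to C): drops a leading "unsafe " keyword, if present.
   Returns a pointer into the input string. */
static const char *encode_stmt(const char *jekyll) {
    static const char kw[] = "unsafe ";
    size_t n = sizeof kw - 1;
    return strncmp(jekyll, kw, n) == 0 ? jekyll + n : jekyll;
}

/* Toy decoder (C to Jekyll): the C text alone cannot say whether the
   statement was marked unsafe, so consult the previous Jekyll version.
   If that version encodes to exactly this C, reuse it; otherwise keep the C. */
static void decode_stmt(const char *c, const char *prev_jekyll,
                        char *out, size_t out_size) {
    assert(out != NULL && out_size > 0);
    if (prev_jekyll != NULL && strcmp(encode_stmt(prev_jekyll), c) == 0)
        snprintf(out, out_size, "%s", prev_jekyll);
    else
        snprintf(out, out_size, "%s", c);
}
```

With the previous Jekyll statement "unsafe *p++ = *q++;", the unannotated C "*p++ = *q++;" decodes back to the marked form. An edited statement the previous version does not explain is kept as plain C.
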
Encoding based on the previous version
• There are many ways to encode a Jekyll feature as C
• Temporary names, whitespace, different encodings, etc.
• We choose the encoding that matches the previous C version
• Aiming to minimise the textual difference from the previous file
This allows Jekyll to avoid modifying hand-edited C
[Diagram: Old C File, New Jekyll File, New C File]

Jekyll is another view of C
• Authoritative source code can stay as C
• But programmers and tools can also view it as Jekyll
• C programmers need not know Jekyll is even being used
[Diagram: C Repository, Jekyll Repository, C Programmer, Jekyll Programmer, C Files, Jekyll Files]

Jekyll Features
• All of C: unsafe features, imperative features, low-level features, C types, C expressions, pre-processor
• Most of O'Caml + Haskell: algebraic types, type classes, lambda expressions, pattern matching, generic types, type safety, optional GC. NOT LAZY!
• Use of unsafe features causes a warning unless marked as “unsafe”

What is Jekyll • Jekyll & its C Encoding • Lossless Translation • Demo

Superset of C
• All C programs are valid Jekyll programs, unless:
• They use extensions that Jekyll does not understand
• They use the pre-processor in a way that Jekyll does not understand
In future: Support everything GCC can compile
[Diagram: Jekyll, C]

A mix of Haskell, O'Caml, and Cyclone
Jekyll contains no original language features
• All features are present in either Haskell, O'Caml or Cyclone
• Features are usually implemented in the same way too
• Although the combination can be interesting…
We will focus on the encoding, rather than the language itself
[Diagram: Haskell, O'Caml, Cyclone]

Generic Types
Jekyll:
    struct<%a> Node{
        %a *element;
        List<%a> *tail;
    };
C:
    struct Node{
        void *element;
        List *tail;
    };
• All extra type info is thrown away: type parameters, type variables, type constraints
• The Jekyll translator restores them from the previous Jekyll file

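A small sketch of what working with the erased C form looks like. It assumes a simplified node whose tail points at another node rather than at a List, and sum_ints is a hypothetical helper. The cast inside it stands for the type information that C has lost and that Jekyll would restore from the previous Jekyll file.

```c
#include <assert.h>
#include <stddef.h>

/* Erased form of the generic node: the element type %a has become void*.
   Simplification (assumption): tail points to another Node, not a List. */
struct Node {
    void *element;
    struct Node *tail;
};

/* Hypothetical helper: sums a list whose elements the caller knows are ints.
   C cannot check that knowledge; the cast is trusted. */
static int sum_ints(const struct Node *n) {
    int total = 0;
    for (; n != NULL; n = n->tail) {
        assert(n->element != NULL);
        total += *(const int *)n->element;
    }
    return total;
}
```
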
Tagged Unions
Jekyll:
    tagged<%a> List{
        Node<%a> NODE;
        void EMPTY;
    };

    switch(*l){
        case EMPTY: return 0;
        case NODE n: return len(n);
    };
C:
    struct List{
        enum {NODE,EMPTY} _tag;
        union {
            Node NODE;
            void EMPTY;
        } _body;
    };

    switch(l->_tag){
        case EMPTY: return 0;
        case NODE: return len(l->_body.NODE);
    };
• No annotations here either
• Jekyll will attempt to decode any struct that has _tag and _body fields

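A compilable sketch of the _tag/_body encoding, under two assumptions: the payload-free EMPTY case is left out of the union, because standard C does not accept a void member, and len is a hypothetical list-length function that walks the tail.

```c
#include <assert.h>
#include <stddef.h>

struct List;

struct Node {
    void *element;
    struct List *tail;
};

/* Tagged union encoded as a tag enum plus a union body.
   EMPTY carries no payload, so only NODE appears in the union. */
struct List {
    enum { NODE, EMPTY } _tag;
    union {
        struct Node NODE;
    } _body;
};

/* Hypothetical length function: dispatch on the tag, then read the body. */
static int len(const struct List *l) {
    assert(l != NULL);
    switch (l->_tag) {
    case EMPTY:
        return 0;
    case NODE:
        return 1 + len(l->_body.NODE.tail);
    }
    return 0; /* not reached for well-formed tags */
}
```

The enum constant NODE and the union member NODE can share a name because C keeps struct and union members in their own name space.
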
Unsafe Operations
Jekyll:
    unsafe *p++ = *q++;
C:
    *p++ = *q++;
• All unsafe C operations are allowed
• Pointer arithmetic
• Unchecked array bounds
• Unsafe casts, etc.
Must be marked with the "unsafe" keyword to avoid a warning

Lambda Expressions
Jekyll:
    int plusthree(int z){return foo(3, x : x + z;);}
C:
    struct fe_env{ int *z;};
    int ff_lam(struct fe_env *_env, int x){return x+*(_env->z);}
    int plusthree(int z){
        struct fe_env ft0 = {&z};
        return foo(3,(void*)&ff_lam,&ft0);
    }
Programmers are free to change all generated names
• The fe and ft prefixes are the defaults, but are not required
• They are just used to reduce incidence of name clashes

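The C encoding on this slide is ordinary closure conversion: captured variables move into an environment struct, and the lambda body becomes a top-level function that takes that struct. Below is a compilable sketch. The slide does not define foo, so the version here is a hypothetical one that simply applies the closure to its first argument, and it takes a typed function pointer instead of the slide's void* cast.

```c
#include <assert.h>
#include <stddef.h>

/* Environment struct: one pointer per captured variable (here, z). */
struct fe_env {
    int *z;
};

/* Lifted lambda body: x is the lambda parameter, z is read via the env. */
static int ff_lam(struct fe_env *_env, int x) {
    return x + *(_env->z);
}

/* Hypothetical foo (not defined on the slide): applies the closure to n. */
static int foo(int n, int (*fn)(struct fe_env *, int), struct fe_env *env) {
    assert(fn != NULL && env != NULL);
    return fn(env, n);
}

static int plusthree(int z) {
    struct fe_env ft0 = {&z};   /* capture z by reference */
    return foo(3, ff_lam, &ft0);
}
```

The environment holds a pointer to z, not a copy, so the closure must not outlive the call to plusthree.
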
Type Classes (Haskell-Style) (1/2)
Jekyll:
    interface Print %a{
        void print(%a *x);
    };
C:
    struct Print {
        void (*print)(void* _env, _va *x);
    };
• Jekyll implements the full Haskell98 type class system
• Any struct that contains only functions can be decoded as a type class
• Type classes are a good match for C code
• They don't change the in-memory representation (unlike vtables)
• One can add methods to existing types (unlike vtables)

Type Classes (Haskell-Style) (2/2)
Jekyll:
    implement Print int {
        void print(int *x){print_int(*x);};
    };
C:
    implement(Print int);
    void int_print(void* _env,int *x){print_int(*x);};
    struct Print Print_int = {(void*)&int_print};
• Defining a new type class instance creates a new dictionary struct.

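A sketch of how such a dictionary might be used from plain C. It makes several assumptions: the erased type variable is written void* (the slide writes _va), print_int is a stand-in that records its argument instead of printing, and print_twice is a hypothetical generic function that receives the dictionary explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Dictionary struct for the Print interface: one function pointer per method. */
struct Print {
    void (*print)(void *_env, void *x);
};

/* Stand-in for real output (assumption): remember what was printed. */
static int last_printed;
static int print_count;

static void print_int(int v) {
    last_printed = v;
    print_count++;
}

/* Instance method for int, written against the erased signature. */
static void int_print(void *_env, void *x) {
    (void)_env;
    print_int(*(int *)x);
}

/* The instance itself: a dictionary value for Print int. */
static struct Print Print_int = { int_print };

/* Hypothetical generic function: works for any type that has a dictionary. */
static void print_twice(const struct Print *dict, void *x) {
    assert(dict != NULL && dict->print != NULL);
    dict->print(NULL, x);
    dict->print(NULL, x);
}
```

The int being printed carries no extra pointer. The dictionary travels separately, which is the contrast with vtables drawn on the previous slide.
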
Initialiser Expressions
Jekyll:
    return new Node{h,t}
C:
    List *tmp;
    tmp = (List*) jkl_GC_malloc(sizeof(List));
    tmp->_tag = Node;
    tmp->_body.Node.head = h;
    tmp->_body.Node.tail = t;
    return tmp;
• Safe, easy creation of values
• One can of course rename all temporaries

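A compilable sketch of the same expansion, with assumptions stated: malloc stands in for jkl_GC_malloc, so the caller must free the result. The field and tag names follow the earlier Tagged Unions encoding (NODE, element) rather than this slide's Node and head. new_node is a hypothetical wrapper.

```c
#include <assert.h>
#include <stdlib.h>

struct List;

struct Node {
    void *element;
    struct List *tail;
};

struct List {
    enum { NODE, EMPTY } _tag;
    union {
        struct Node NODE;
    } _body;
};

/* Hypothetical expansion of `new Node{h,t}`: allocate, set the tag,
   then fill in each field of the chosen variant.
   Returns NULL if allocation fails. */
static struct List *new_node(void *h, struct List *t) {
    struct List *tmp;
    assert(t != NULL);              /* the tail must be a list, possibly EMPTY */
    tmp = malloc(sizeof *tmp);      /* assumption: plain malloc, not a GC */
    if (tmp == NULL)
        return NULL;
    tmp->_tag = NODE;
    tmp->_body.NODE.element = h;
    tmp->_body.NODE.tail = t;
    return tmp;
}
```
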
Other Features
• Fat pointers: allow safe pointer arithmetic (like Cyclone)
• Macrotype: tell Jekyll how to interpret foreign macros (like Astec)

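The slides do not show how Jekyll represents fat pointers. The sketch below only illustrates the general idea of a pointer that carries its bounds, so that arithmetic can be checked when the pointer is used. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fat pointer to int: base address, element count, and the
   current offset. Keeping an offset instead of a raw pointer means an
   out-of-range position is never formed as an actual C pointer. */
struct FatIntPtr {
    int *base;
    size_t len;
    ptrdiff_t offset;
};

static struct FatIntPtr fat_from_array(int *arr, size_t n) {
    struct FatIntPtr p;
    assert(arr != NULL);
    p.base = arr;
    p.len = n;
    p.offset = 0;
    return p;
}

/* Pointer arithmetic only moves the offset; the bounds travel along. */
static struct FatIntPtr fat_add(struct FatIntPtr p, ptrdiff_t k) {
    p.offset += k;
    return p;
}

/* Checked read: returns 1 and stores the value on success,
   returns 0 and leaves *out untouched if the offset is out of bounds. */
static int fat_read(struct FatIntPtr p, int *out) {
    if (p.offset < 0 || (size_t)p.offset >= p.len)
        return 0;
    *out = p.base[p.offset];
    return 1;
}
```
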
What is Jekyll • Jekyll & its C Encoding • Lossless Translation • Demo

Simplified C->Jekyll Translation
• Ignoring parsing, transforms, analysis, typechecking, etc.
[Diagram: C File, Decode, Non-det Jekyll File, Select Closest (with Previous Jekyll File), Output, Jekyll File]

Simplified Jekyll->C Translation
• Ignoring parsing, transforms, analysis, typechecking, etc.
[Diagram: Jekyll File, Encode, Non-det C File, Select Closest (with Previous C File), Output, C File]

Expanded Jekyll->C Translation
[Diagram stages: Parse, Analysis, Encode, Pretty Print, Guesses, Select Closest, Whiteflow/Check, Output]
[Diagram data: Jekyll Tokens, Jekyll AST, Non-det Combined AST, Possible Jkl Tokens, Possible C Tokens, Previous C Tokens, C Tokens]

Encode/Decode: Non-deterministic
• Produce a non-deterministic AST describing all possibilities
• Encode: Produce C that could implement a Jekyll feature
• Decode: Look for C code that might implement a Jekyll feature
• Decode is very aggressive: it will even accept invalid encodings
• If it seems that that might have been what was intended
• User can be warned about these at check time
[Diagram: Encode takes a Jekyll AST to a Non-det C AST; Decode takes a C AST to a Non-det Jekyll AST]

Check: Ensure input was well formed
• Decode stage will accept illegal encodings
• By design: makes converting mangled C easier
Check: can our output be translated back to our input?
• If not, then warn the user to look at the diffs
[Diagram: Check, C Tokens, Possible Tokens]

Degrees of Conformity
• Cannot translate
• Translates but check fails
• Translates and check passes
• Translates and is canonical
[Diagram annotations: "All is good"; "Encoding stays as C"; "Best match is a decoded feature, but encoding was invalid: generate a file, but warn"]

Select Closest: Resolve Non-Determinism
• Choose encoding so as to minimise the textual difference from the previous file
• If AST did not change, new file will be bit-for-bit identical to old file
• Now: Line-by-line comparison
• Minimises differences as seen by "diff"
• Future: Burrows-Wheeler longest common substring
[Diagram: Previous File, Non-det File, Select Closest]

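A minimal sketch of line-by-line selection, assuming the non-deterministic file has already been expanded into a small array of complete candidate texts. That is a simplification, since the slides describe a non-deterministic AST. Distance is the number of lines a diff would report as added or removed, computed from the longest common subsequence of lines. MAX_LINES, line_distance and select_closest are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LINES 64   /* sketch limit: lines beyond this are ignored */

struct Lines {
    const char *start[MAX_LINES];
    size_t len[MAX_LINES];
    size_t count;
};

/* Record where each line starts and how long it is (newline excluded). */
static void split_lines(const char *text, struct Lines *out) {
    out->count = 0;
    while (*text != '\0' && out->count < MAX_LINES) {
        const char *nl = strchr(text, '\n');
        size_t n = nl ? (size_t)(nl - text) : strlen(text);
        out->start[out->count] = text;
        out->len[out->count] = n;
        out->count++;
        text += n + (nl ? 1 : 0);
    }
}

static int same_line(const struct Lines *a, size_t i,
                     const struct Lines *b, size_t j) {
    return a->len[i] == b->len[j] &&
           memcmp(a->start[i], b->start[j], a->len[i]) == 0;
}

/* Lines added plus lines removed: |a| + |b| - 2 * LCS(a, b). */
static size_t line_distance(const char *prev, const char *cand) {
    static size_t lcs[MAX_LINES + 1][MAX_LINES + 1];
    struct Lines a, b;
    size_t i, j;
    split_lines(prev, &a);
    split_lines(cand, &b);
    for (i = 0; i <= a.count; i++) lcs[i][0] = 0;
    for (j = 0; j <= b.count; j++) lcs[0][j] = 0;
    for (i = 1; i <= a.count; i++) {
        for (j = 1; j <= b.count; j++) {
            if (same_line(&a, i - 1, &b, j - 1))
                lcs[i][j] = lcs[i - 1][j - 1] + 1;
            else
                lcs[i][j] = lcs[i - 1][j] > lcs[i][j - 1]
                          ? lcs[i - 1][j] : lcs[i][j - 1];
        }
    }
    return a.count + b.count - 2 * lcs[a.count][b.count];
}

/* Index of the candidate closest to the previous file (first one on ties). */
static size_t select_closest(const char *prev,
                             const char *const *cands, size_t n) {
    size_t best = 0, best_d, i;
    assert(n > 0);
    best_d = line_distance(prev, cands[0]);
    for (i = 1; i < n; i++) {
        size_t d = line_distance(prev, cands[i]);
        if (d < best_d) { best = i; best_d = d; }
    }
    return best;
}
```

If a C programmer has renamed a generated function, the candidate that keeps the rename has distance 0 from the previous file and is selected over the candidate that uses the default name.
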
Twinned Token Printing
• Carry whitespace and comments between Jekyll and C
• Otherwise language comments would be entirely disconnected
Whitespace can come from input file or previous file
• Twinned token: Whitespace from input token that matches the twin
• Untwinned token: Whitespace from previous file version
[Diagram: Input Jekyll, Jekyll AST, Twins, Previous C, Printed C, Printed Jekyll]

What is Jekyll • Jekyll & its C Encoding • Lossless Translation • Demo

Demo

Conclusions
• Jekyll is a powerful functional programming language
• Lossless translation makes it practical to migrate C code
• Non-deterministic encoding makes it tolerant of C edits
• Download Jekyll now: http://jekyllc.sf.net