Stackless Python: programming the way Guido prevented it intended
Back To IPC9 developer’s day
Why Stackless is Cool • Microthreads • Generators (now obsolete) • Coroutines
Microthreads • Very lightweight (can support thousands) • Locks need not be OS resources • Not for blocking I/O • A comfortable model for people used to real threads
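The "can support thousands" point can be tried out without Stackless itself. The sketch below is not the Stackless API: it assumes plain CPython generators as stand-in microthreads, and `worker` and `run_round_robin` are hypothetical names made up for this illustration. Each `yield` plays the role of a voluntary switch.

```python
from collections import deque

def worker(ident, steps, log):
    """Stand-in microthread: does `steps` units of work, yielding after each."""
    for step in range(steps):
        log.append((ident, step))
        yield  # voluntary switch back to the scheduler

def run_round_robin(tasks):
    """Run generator-based tasks in turn until all have finished.

    Returns the number of switches that resumed a still-running task.
    """
    ready = deque(tasks)
    switches = 0
    while ready:
        task = ready.popleft()
        try:
            next(task)          # resume the task until its next yield
        except StopIteration:
            continue            # task finished; drop it
        switches += 1
        ready.append(task)      # still alive: back to the end of the queue
    return switches

# Ten thousand "microthreads", none of which owns an OS thread or an OS lock.
log = []
tasks = [worker(i, 3, log) for i in range(10_000)]
switches = run_round_robin(tasks)
```

Because only one task runs at a time and switches happen only at `yield`, shared data such as `log` needs no OS-level lock in this model.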
Coroutines
Various ways to look at them
• Peer to peer subroutines
• Threads with voluntary swapping
• Generators on steroids (args in, args out)
What’s so cool about them
• Both sides get to “drive”
• Often can replace a state machine with something more intuitive [1]
[1] Especially where the state machine features complex state but relatively simple events (or few events per state).
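A small sketch of the "args in, args out" idea and of replacing a state machine. It assumes standard Python generators with `send()`, which are asymmetric and therefore only approximate the peer-to-peer coroutines described here. `word_splitter` and `split_words` are hypothetical names for this illustration.

```python
def word_splitter():
    """Coroutine: takes one character per send() and hands back a completed
    word when a space arrives, otherwise None."""
    word = []          # the "state" lives in ordinary local variables
    completed = None
    while True:
        ch = yield completed      # args out (completed), args in (ch)
        completed = None
        if ch == " ":
            if word:
                completed = "".join(word)
                word = []
        else:
            word.append(ch)

def split_words(text):
    """Drive the coroutine with the characters of `text`."""
    co = word_splitter()
    next(co)                      # advance to the first yield
    words = []
    for ch in text + " ":         # the trailing space flushes the last word
        out = co.send(ch)
        if out is not None:
            words.append(out)
    co.close()
    return words

words = split_words("both sides  drive")
```

An explicit state machine would need "in word" and "between words" states plus a buffer stored somewhere; here the position inside the loop and the local `word` list carry that information.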
Three Steps To Stacklessness
• Get Python data off the C stack
• Give each frame its own (Python) stackspace
• Get rid of interpreter recursions
Result
• All frames are created equal
• Stack overflows become memory errors
• Pickling program state becomes conceivable (new: *has* been done)
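Why heap-resident frames make pickling conceivable can be sketched in plain Python. This is not how Stackless pickles anything; it only assumes that once a computation's "frames" are ordinary data (here a dict holding an explicit stack of lists), the standard `pickle` module can freeze and revive it mid-run. `new_state` and `step` are hypothetical helpers.

```python
import pickle

def new_state(tree):
    """Initial state for a resumable sum over a nested list of ints.

    Each frame is [node, index of the next child to visit].
    """
    return {"stack": [[tree, 0]], "total": 0}

def step(state):
    """Advance the walk by one step; return False once it has finished."""
    stack = state["stack"]
    if not stack:
        return False
    frame = stack[-1]
    node, index = frame
    if index == len(node):
        stack.pop()                      # this frame is done
    else:
        frame[1] = index + 1
        child = node[index]
        if isinstance(child, list):
            stack.append([child, 0])     # "call": push a new frame
        else:
            state["total"] += child
    return True

tree = [1, [2, 3], [[4], 5]]
state = new_state(tree)
for _ in range(4):                       # run part of the way
    step(state)

snapshot = pickle.dumps(state)           # freeze the program state mid-walk
resumed = pickle.loads(snapshot)         # could happen in another process
while step(resumed):
    pass
```

After the loop, `resumed["total"]` holds the sum of the whole tree, while the original `state` is still paused where the snapshot was taken.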
Getting rid of recursion is difficult
• Often there is “post” processing involved
• The C code (doing the recursing) may need its own “frame”
Possible Approaches
• Tail optimized recursion
• Transformation to loop
Either way, the “post” code needs to be separated from the “setup” code.
Ironic Note: This is exactly the kind of pain we seek to relieve the Python programmer of!
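The "transformation to loop" approach, with "setup" and "post" code pulled apart, can be sketched on a toy problem. This is an illustration in Python, not the interpreter's C code: it assumes a made-up task (nesting depth of a list), and `depth_recursive` and `depth_iterative` are hypothetical names. Each heap "frame" is a two-element list.

```python
_DONE = object()  # sentinel: "this frame has no more children"

def depth_recursive(node):
    """Reference version: nesting depth of a list, using ordinary recursion."""
    if not isinstance(node, list):
        return 0
    return 1 + max((depth_recursive(child) for child in node), default=0)

def depth_iterative(node):
    """Same result, but the recursion is replaced by a loop over heap frames."""
    if not isinstance(node, list):
        return 0
    # Each frame is [iterator over remaining children, deepest child so far].
    stack = [[iter(node), 0]]
    while True:
        frame = stack[-1]
        child = next(frame[0], _DONE)
        if child is _DONE:
            # "post" code: runs where the recursive call would have returned
            result = 1 + frame[1]
            stack.pop()
            if not stack:
                return result
            parent = stack[-1]
            parent[1] = max(parent[1], result)
        elif isinstance(child, list):
            # "setup" code: runs where the recursive call would have been made
            stack.append([iter(child), 0])
        # a non-list child has depth 0 and cannot raise the running maximum

# Nesting far beyond the default recursion limit is limited by memory only.
deep = []
for _ in range(20_000):
    deep = [deep]
deep_depth = depth_iterative(deep)
```

The recursive version would hit Python's recursion limit on `deep`; the loop version only grows a list on the heap, which is the "stack overflows become memory errors" effect in miniature.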
Stackless Reincarnate
• Completely different approach:
  • Nearly no changes to the Python core
  • Platform dependent
  • Few lines of assembly
• No longer fighting the Python implementation
• Orthogonal concepts
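Platform Specific Code
__forceinline static int
slp_switch(void)
{
    int *stackref, stsizediff;
    __asm mov stackref, esp;                /* remember the current stack pointer */
    SLP_SAVE_STATE(stackref, stsizediff);   /* save the old stack slice, compute stsizediff */
    __asm {
        mov eax, stsizediff
        add esp, eax                        /* shift stack and frame pointers by stsizediff */
        add ebp, eax
    }
    SLP_RESTORE_STATE();                    /* copy the target slice back; returns 0 */
}
Note: There are no arguments, in order to simplify the code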
Support Macros 1(2)
#define SLP_SAVE_STATE(stackref, stsizediff) \
{\
    PyThreadState *tstate = PyThreadState_GET();\
    PyCStackObject **cstprev = tstate->slp_state.tmp.cstprev;\
    PyCStackObject *cst = tstate->slp_state.tmp.cst;\
    int stsizeb;\
    if (cstprev != NULL) {\
        if (slp_cstack_new(cstprev, stackref) == NULL) return -1;\
        stsizeb = (*cstprev)->ob_size * sizeof(int*);\
        memcpy((*cstprev)->stack, (*cstprev)->startaddr - (*cstprev)->ob_size, stsizeb);\
        (*cstprev)->frame = tstate->slp_state.tmp.fprev;\
    }\
    else\
        stsizeb = (cst->startaddr - stackref) * sizeof(int*);\
    if (cst == NULL) return 0;\
    stsizediff = stsizeb - (cst->ob_size * sizeof(int*));\

Note: Arguments are passed via Threadstate for easy implementation
Support Macros 2(2)
#define SLP_RESTORE_STATE() \
    tstate = PyThreadState_GET();\
    cst = tstate->slp_state.tmp.cst;\
    if (cst != NULL)\
        memcpy(cst->startaddr - cst->ob_size, &cst->stack, (cst->ob_size) * sizeof(int*));\
    return 0;\
}\
Stacklessness via Stack Slicing • Pieces of the C stack are captured • Recursion limited by heap memory only • Stack pieces attached to frame objects • “One-shot continuation”
Tasklets • Tasklets are the building blocks • Tasklets can be switched • They behave like tiny threads • They communicate via channels
Tasklet Creation
import stackless

# a function that takes a channel as argument
def simplefunc(chan):
    chan.receive()

# a factory for some tasklets
def simpletest(func, n):
    c = stackless.channel()
    gen = stackless.taskoutlet(func)
    for i in range(n):
        gen(c).run()
    return c
Inside Tasklet Creation • Create frame “before call” • Abuse of generator flag • Use “initial stub” as a blueprint • slp_cstack_clone() • Parameterize with a frame object • Wrap into a tasklet object • Ready to run
Channels
• Known from OCCAM, Limbo, Alef
• Channel.send(x)
  • Activates a waiting tasklet with data
  • Blocks if no receiver is waiting
• y = Channel.receive()
  • Activates a waiting tasklet, returns data
  • Blocks if no sender is waiting
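The rendezvous behaviour of `send` and `receive` can be modelled in plain Python. The sketch below is not the Stackless implementation or API: it assumes generators as stand-in tasklets, and `Channel`, `send`, `receive`, and `run` are hypothetical names for this illustration. A task blocks by yielding a request, and the scheduler pairs senders with receivers.

```python
from collections import deque

class Channel:
    """Unbuffered channel: holds whichever side is currently blocked."""
    def __init__(self):
        self.senders = deque()     # blocked (task, value) pairs
        self.receivers = deque()   # blocked tasks

def send(chan, value):
    """Request object a task yields in order to send."""
    return ("send", chan, value)

def receive(chan):
    """Request object a task yields in order to receive."""
    return ("receive", chan)

def run(tasks):
    """Round-robin scheduler that services send/receive requests."""
    ready = deque((task, None) for task in tasks)   # (task, value to resume with)
    while ready:
        task, resume_value = ready.popleft()
        try:
            request = task.send(resume_value)
        except StopIteration:
            continue                                 # task finished
        op, chan = request[0], request[1]
        if op == "send":
            value = request[2]
            if chan.receivers:                       # a receiver is waiting: activate it
                ready.append((chan.receivers.popleft(), value))
                ready.append((task, None))
            else:                                    # nobody listening: block
                chan.senders.append((task, value))
        else:
            if chan.senders:                         # a sender is waiting: take its data
                sender, value = chan.senders.popleft()
                ready.append((task, value))
                ready.append((sender, None))
            else:                                    # nothing to receive: block
                chan.receivers.append(task)

def producer(chan, items):
    for item in items:
        yield send(chan, item)

def consumer(chan, count, out):
    for _ in range(count):
        value = yield receive(chan)
        out.append(value)

results = []
chan = Channel()
run([consumer(chan, 3, results), producer(chan, ["a", "b", "c"])])
```

In this sketch a task that is still blocked when the ready queue empties is simply left on its channel; a real scheduler would have to decide how to treat that as a deadlock.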
Planned Extensions • Async I/O in a platform independent way • Prioritized scheduling • High speed tasklets with extra stacks • Quick monitors which run between tasklets • Stack compression • Thread pickling • More channel features • Multiple wait on channel arrays
Thread pickling • Has been implemented by TwinSun • Unfortunately for old Stackless • Analysis of the C stack necessary • By platform, only • Lots of work? • Only a few contexts need stack analysis • Show it !!!
Stackless Sponsors
• Ironport
  • Email server with dramatic throughput
  • Integrating their code with the new Stackless
  • Async I/O
• CCP Games
  • Massive Multiplayer Online Game EVE
  • Porting their client code to new Stackless next week