Parallelizing the GAP Kernel. Reimer Behrends, University of St. Andrews.
The GAP Kernel • 170,000 lines of sequential C code. • Hundreds of global and static variables. • Custom generational garbage collector. • Goal: Allow multi-threaded execution.
Multiple Interpreter Instances • Interpreter state stored in global variables. • Objectify interpreter state? – or – • Use thread-local storage?
Objectify Interpreter State • Global variable use is pervasive. • The vast majority of functions and macros either need access to the state themselves or have to pass it to the functions they call. • Function tables. • Too invasive for the code base overall.
Thread-Local Storage • No portable solution. • Only some systems support a TLS ABI (__thread in gcc, .tls storage segment). • pthread_getspecific() is portable, but slow. • Use instead: SP/FP-relative addressing. • Each thread stack is allocated on a power-of-2 boundary. • Mask the lower bits of the stack or frame pointer to derive the base of the stack area. • pthread_attr_setstack(), alloca().
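The masking trick can be sketched in plain C with POSIX threads. This is a minimal illustration under stated assumptions, not the GAP kernel's code: the names `ThreadState`, `current_state`, `tls_worker` and `run_on_aligned_stack` are hypothetical, the 1 MiB area size and 64 KiB state reserve are arbitrary choices, and the sketch assumes the thread's locals live on the stack that was handed to `pthread_attr_setstack()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed layout: every thread owns a 1 MiB area aligned to 1 MiB. */
#define AREA_SIZE     ((size_t)1 << 20)
/* Assumed reserve at the low end of the area for per-thread state. */
#define STATE_RESERVE ((size_t)1 << 16)

/* Hypothetical stand-in for the interpreter's per-thread state. */
typedef struct {
    int thread_no;
} ThreadState;

/* Mask the low bits of a stack address to find the base of the area. */
static ThreadState *current_state(void) {
    char marker; /* any local variable lives on this thread's stack */
    uintptr_t addr = (uintptr_t)&marker;
    return (ThreadState *)(addr & ~((uintptr_t)AREA_SIZE - 1));
}

static void *tls_worker(void *arg) {
    (void)arg; /* the state is found through the stack, not the argument */
    return (void *)(intptr_t)current_state()->thread_no;
}

/* Runs one thread on an aligned stack and returns the number it read back,
   or -1 if setup failed. */
int run_on_aligned_stack(int thread_no) {
    void *area = NULL;
    pthread_attr_t attr;
    pthread_t tid;
    void *result = NULL;
    int status = -1;

    if (posix_memalign(&area, AREA_SIZE, AREA_SIZE) != 0)
        return -1;
    /* The area must start on an AREA_SIZE boundary for masking to work. */
    assert(((uintptr_t)area & (AREA_SIZE - 1)) == 0);
    ((ThreadState *)area)->thread_no = thread_no;

    if (pthread_attr_init(&attr) != 0) {
        free(area);
        return -1;
    }
    /* The thread's stack is the remainder of the area above the reserve. */
    if (pthread_attr_setstack(&attr, (char *)area + STATE_RESERVE,
                              AREA_SIZE - STATE_RESERVE) == 0 &&
        pthread_create(&tid, &attr, tls_worker, NULL) == 0) {
        pthread_join(tid, &result);
        status = (int)(intptr_t)result;
    }
    pthread_attr_destroy(&attr);
    free(area);
    return status;
}
```

Calling `run_on_aligned_stack(7)` starts a thread that recovers its state purely from a stack address. The lookup costs one mask operation, which is the appeal over `pthread_getspecific()`.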
Garbage Collection • Current “gasman” collector: • Difficult to adapt to multi-threaded environment. • Serialization bottlenecks (CHANGED_BAG). • Interim solution: BDW conservative collector. • Has thread support. • Largely plug-and-play. • Adaptation uses gasman API. • However: Problems with the 64-bit version. • Need finalization.
Synchronization • Programming model still “under construction”. • Build a set of basic thread manipulation and synchronization primitives.
Thread Management • Thread management primitives: • id := CreateThread(func, arg1, …, argn); • WaitThread(id); • Example: x := a; id := CreateThread(function(y) x := x + y; end, b); WaitThread(id);
Channels • Channels are FIFO queues. • SendChannel(channel, object); • object := ReceiveChannel(channel); • Blocking and polling versions. • Channels can be bounded or unbounded in size. • Multiplexing: • object := ReceiveAnyChannel(ch1, …, chn);
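To make the channel semantics concrete, here is a rough C sketch of a bounded FIFO channel with a blocking send, a blocking receive and a polling receive, built on a pthread mutex and two condition variables. It only illustrates the behaviour listed on the slide: all names (`Channel`, `channel_send`, `channel_receive`, `channel_try_receive`, `channel_demo`) and the capacity of 4 are assumptions, and unbounded channels and multiplexed receive are left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CHANNEL_CAPACITY 4 /* assumed bound for this sketch */

typedef struct {
    void *items[CHANNEL_CAPACITY]; /* ring buffer */
    size_t head;                   /* index of the oldest item */
    size_t count;                  /* number of queued items */
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
    pthread_cond_t not_full;
} Channel;

void channel_init(Channel *ch) {
    ch->head = 0;
    ch->count = 0;
    pthread_mutex_init(&ch->lock, NULL);
    pthread_cond_init(&ch->not_empty, NULL);
    pthread_cond_init(&ch->not_full, NULL);
}

/* Blocking send: waits while the channel is full. */
void channel_send(Channel *ch, void *obj) {
    pthread_mutex_lock(&ch->lock);
    while (ch->count == CHANNEL_CAPACITY)
        pthread_cond_wait(&ch->not_full, &ch->lock);
    ch->items[(ch->head + ch->count) % CHANNEL_CAPACITY] = obj;
    ch->count++;
    pthread_cond_signal(&ch->not_empty);
    pthread_mutex_unlock(&ch->lock);
}

/* Removes the oldest item; the caller must hold the lock and count > 0. */
static void *take_locked(Channel *ch) {
    void *obj = ch->items[ch->head];
    ch->head = (ch->head + 1) % CHANNEL_CAPACITY;
    ch->count--;
    pthread_cond_signal(&ch->not_full);
    return obj;
}

/* Blocking receive: waits while the channel is empty. */
void *channel_receive(Channel *ch) {
    pthread_mutex_lock(&ch->lock);
    while (ch->count == 0)
        pthread_cond_wait(&ch->not_empty, &ch->lock);
    void *obj = take_locked(ch);
    pthread_mutex_unlock(&ch->lock);
    return obj;
}

/* Polling receive: returns false at once if the channel is empty. */
bool channel_try_receive(Channel *ch, void **out) {
    pthread_mutex_lock(&ch->lock);
    bool ok = ch->count > 0;
    if (ok)
        *out = take_locked(ch);
    pthread_mutex_unlock(&ch->lock);
    return ok;
}

static void *channel_producer(void *arg) {
    for (intptr_t i = 0; i < 10; i++) /* more items than the capacity */
        channel_send((Channel *)arg, (void *)i);
    return NULL;
}

/* Returns 1 if ten items arrive in FIFO order and the channel ends empty. */
int channel_demo(void) {
    Channel ch;
    pthread_t tid;
    void *unused;
    int in_order = 1;

    channel_init(&ch);
    pthread_create(&tid, NULL, channel_producer, &ch);
    for (intptr_t i = 0; i < 10; i++)
        if ((intptr_t)channel_receive(&ch) != i)
            in_order = 0;
    pthread_join(tid, NULL);
    return in_order && !channel_try_receive(&ch, &unused);
}
```

Because the producer sends more items than the buffer holds, it has to block until the consumer drains entries, which exercises the bounded case.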
Barriers • StartBarrier(barrier, count); • WaitBarrier(barrier); • WaitBarrier(barrier, function);
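The first two barrier calls can be approximated in C as follows. This is a sketch under assumptions, not the HPC GAP implementation: `Barrier`, `barrier_start`, `barrier_wait` and `barrier_demo` are hypothetical names, and the `WaitBarrier(barrier, function)` form is not modelled because the slide does not say what the function does.

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t all_arrived;
    int count;           /* threads expected at the barrier */
    int waiting;         /* threads that have arrived so far */
    unsigned generation; /* incremented each time the barrier opens */
} Barrier;

/* Analogue of StartBarrier(barrier, count). */
void barrier_start(Barrier *b, int count) {
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->all_arrived, NULL);
    b->count = count;
    b->waiting = 0;
    b->generation = 0;
}

/* Analogue of WaitBarrier(barrier): blocks until `count` threads arrive. */
void barrier_wait(Barrier *b) {
    pthread_mutex_lock(&b->lock);
    unsigned gen = b->generation;
    if (++b->waiting == b->count) {
        /* Last arrival: open the barrier and wake everyone. */
        b->generation++;
        b->waiting = 0;
        pthread_cond_broadcast(&b->all_arrived);
    } else {
        /* The generation check guards against spurious wakeups. */
        while (gen == b->generation)
            pthread_cond_wait(&b->all_arrived, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}

#define NTHREADS 4

static Barrier demo_barrier;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int phase1_done;

static void *barrier_worker(void *arg) {
    int *seen = arg;

    /* Phase 1: announce completion. */
    pthread_mutex_lock(&counter_lock);
    phase1_done++;
    pthread_mutex_unlock(&counter_lock);

    barrier_wait(&demo_barrier);

    /* Phase 2: every thread should now see all phase-1 updates. */
    pthread_mutex_lock(&counter_lock);
    *seen = phase1_done;
    pthread_mutex_unlock(&counter_lock);
    return NULL;
}

/* Returns 1 if every thread saw all NTHREADS phase-1 updates. */
int barrier_demo(void) {
    pthread_t tids[NTHREADS];
    int seen[NTHREADS];
    int ok = 1;

    phase1_done = 0;
    barrier_start(&demo_barrier, NTHREADS);
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tids[i], NULL, barrier_worker, &seen[i]);
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(tids[i], NULL);
        if (seen[i] != NTHREADS)
            ok = 0;
    }
    return ok;
}
```

The generation counter is a design choice of this sketch. It lets the same barrier object be reused for later phases without late wakers confusing one phase with the next.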
Single Assignment Variables • WriteSyncVar(var, value); • Only one write permitted. • Subsequent writes result in errors. • value := ReadSyncVar(var); • Blocks if ‘var’ has not been written yet.
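The write-once, blocking-read behaviour can be sketched in C with a mutex and a condition variable. The names `SyncVar`, `syncvar_write`, `syncvar_read` and `syncvar_demo` are hypothetical, and reporting a second write by returning `false` is an assumption of this sketch. The slide only says that subsequent writes result in errors.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t written;
    bool has_value;
    void *value;
} SyncVar;

void syncvar_init(SyncVar *v) {
    pthread_mutex_init(&v->lock, NULL);
    pthread_cond_init(&v->written, NULL);
    v->has_value = false;
    v->value = NULL;
}

/* Analogue of WriteSyncVar: only the first write succeeds.
   Returns false if the variable had already been written. */
bool syncvar_write(SyncVar *v, void *value) {
    pthread_mutex_lock(&v->lock);
    bool first = !v->has_value;
    if (first) {
        v->value = value;
        v->has_value = true;
        pthread_cond_broadcast(&v->written); /* release blocked readers */
    }
    pthread_mutex_unlock(&v->lock);
    return first;
}

/* Analogue of ReadSyncVar: blocks until the variable has been written. */
void *syncvar_read(SyncVar *v) {
    pthread_mutex_lock(&v->lock);
    while (!v->has_value)
        pthread_cond_wait(&v->written, &v->lock);
    void *value = v->value;
    pthread_mutex_unlock(&v->lock);
    return value;
}

static void *syncvar_reader(void *arg) {
    return syncvar_read((SyncVar *)arg);
}

/* Returns 1 if the reader sees the first value and the second write fails. */
int syncvar_demo(void) {
    SyncVar v;
    pthread_t tid;
    void *result = NULL;

    syncvar_init(&v);
    pthread_create(&tid, NULL, syncvar_reader, &v);
    bool first = syncvar_write(&v, (void *)(intptr_t)42);
    bool second = syncvar_write(&v, (void *)(intptr_t)7);
    pthread_join(tid, &result);
    return first && !second && (intptr_t)result == 42;
}
```

Whether the reader thread starts before or after the write, it returns 42, since a read either blocks until the write or finds the value already present.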
Build Process • HPC GAP internal builds use SCons. • Automatic and clean dependency tracking for C. • Proper rebuilds for changes in build setup. • E.g., scons gmp=no. • Python is easier to write than m4 + /bin/sh + make.