GC16/3011 Functional Programming Lecture 22 The Four-Stroke Reduction Engine


Contents
• Motivation
• Model for Parallel Graph Reduction
• Parallelism and Tasks
• FSRE representation, synchronisation and scheduling
• Two-stroke reduction
• Four-stroke reduction
• Summary

Motivation
• Previous lectures: abstract, theoretical models of graph reduction
• This lecture: a real parallel graph reducer
• Details of the Four-Stroke Reduction Engine (FSRE)

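Model for Parallel Graph Reduction (PGR)
[Figure: five agents connected to a shared heap and a shared task pool; the heap holds a small example graph built from application (@) nodes, +, 3 and 4]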

Each task:
• has access to any part of the graph
• performs reductions in normal order
• reduces a subgraph to (weak head) normal form
  • overwrites the root node of the redex with an indirection to the result, as an indivisible operation
  • then simply “dies”
• may anticipate the need for the value of a subgraph
  • places a task for that subgraph in the task pool (sparking)
• is executed by an agent (a physical processor)
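The task life cycle above can be sketched in Python. This is a single-threaded illustration under several assumptions: the tag names (AP, PLUS, INT, IND), the names `Node`, `follow`, `overwrite_with_ind` and `task`, and the use of a lock to stand in for the indivisible overwrite are all hypothetical, and only ((PLUS x) y) redexes are handled. The recursive call on x stands in for the spine traversal a real task would perform.

```python
import threading
from dataclasses import dataclass
from typing import Any, List

@dataclass(eq=False)
class Node:
    tag: str            # "AP", "PLUS", "INT" or "IND" (hypothetical tag names)
    left: Any = None    # AP: function part; INT: value; IND: target
    right: Any = None   # AP: argument

UPDATE_LOCK = threading.Lock()     # held while the fields change: stands in for an indivisible write

def follow(n: Node) -> Node:
    """Skip over any indirection nodes."""
    while n.tag == "IND":
        n = n.left
    return n

def overwrite_with_ind(root: Node, result: Node) -> None:
    with UPDATE_LOCK:
        root.tag, root.left, root.right = "IND", result, None

def task(root: Node, pool: List[Node]) -> None:
    """Reduce the subgraph at root to WHNF, overwrite its root, then 'die'."""
    n = follow(root)
    if n.tag == "INT":
        return                     # already evaluated (perhaps by another task): just die
    assert n.tag == "AP", "only ((PLUS x) y) redexes are handled"
    fn = follow(n.left)
    assert fn.tag == "AP" and follow(fn.left).tag == "PLUS"
    x, y = fn.right, n.right
    pool.append(y)                 # spark: y's value will certainly be needed
    task(x, pool)                  # evaluate x within this task
    task(y, pool)                  # parent checks on y and does the work if still needed
    overwrite_with_ind(n, Node("INT", follow(x).left + follow(y).left))
```

A sparked child that is run after its parent has finished reaches an INT through the indirection and simply dies.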

Parallelism and tasks
• Sparking could be conservative (spark only subgraphs whose values will definitely be needed) or speculative (also spark subgraphs that might be needed)
• Speculative sparking needs careful management
• FSRE uses conservative sparking
• For an application (e1 e2), the function part e1 may not yet be evaluated
  • so e1 could be evaluated in parallel with e2 (provided e2 is known to be needed)
  • this extends to many arguments evaluated in parallel
  • but only those we know will be needed
• Parallelism annotations advise when and what to spark

We want to detect parallelism in three cases:
• f x y = x + y
  • f will always evaluate x and y
  • so we could annotate either the function f or the application nodes ((f x) y)
• ((if e1 f g) e2)
  • we don't know which function is used until runtime
  • so annotate the functions
• f x y = y 3 x
  • f is not strict in x if y doesn't use x
  • but in the application (f e (+)) the expression e WILL be used
  • so annotate the application nodes
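A minimal sketch of how the two kinds of annotation could combine to decide what to spark. The names `FUNCTION_ANNOTATIONS`, `Application` and `args_to_spark` are hypothetical, argument expressions are plain strings, argument positions count from 0, and f1 and f3 stand for the first and third example functions above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical function annotations: argument positions each function always evaluates.
FUNCTION_ANNOTATIONS: Dict[str, Set[int]] = {
    "f1": {0, 1},   # f1 x y = x + y : strict in x and y
    "f3": {1},      # f3 x y = y 3 x : strict in y only
}

@dataclass
class Application:
    fn: str                                           # function, known at runtime
    args: List[str]                                   # argument expressions (as text here)
    annotated: Set[int] = field(default_factory=set)  # annotations on the application nodes

def args_to_spark(app: Application) -> List[str]:
    """Conservative sparking: spark only arguments certain to be needed."""
    needed = FUNCTION_ANNOTATIONS.get(app.fn, set()) | app.annotated
    return [a for i, a in enumerate(app.args) if i in needed]
```

For the second case, ((if e1 f g) e2), `fn` is only known once e1 has been evaluated, which is why the annotation has to be looked up from the function rather than stored on the application node. In the third case the application-node annotation adds e to the sparks that f3's own annotation cannot justify.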

FSRE representation
• A node (or cell) has a tag, a left field and a right field
• Tags denote application, lambda, constant, parallelism annotations and “paint” (see later), etc.
• A “task” is just two pointers, B and F
• Graph traversal uses pointer reversal, so no stack is required
• The current state of a suspended task is held in the graph itself
• Reversed pointers are made inaccessible to other tasks (because those nodes are “painted”; see later)
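A minimal Python sketch of spine traversal by pointer reversal, reading B as the back pointer and F as the forward pointer (our interpretation of the names). The `Cell`, `unwind` and `rewind` names are hypothetical, and painting is omitted: while the spine is unwound its left fields point upwards, which is exactly why other tasks must be kept out.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple

@dataclass(eq=False)
class Cell:
    tag: str            # "AP" for application; any other tag is treated as a leaf here
    left: Any = None
    right: Any = None

def unwind(root: Cell) -> Tuple[Optional[Cell], Any, int]:
    """Walk down the spine, reversing left pointers. Returns (B, F, depth)."""
    b: Optional[Cell] = None       # B: the reversed chain back up to the root
    f: Any = root                  # F: the node currently being examined
    depth = 0
    while isinstance(f, Cell) and f.tag == "AP":
        nxt = f.left
        f.left = b                 # reverse: this cell now points back up the spine
        b, f = f, nxt
        depth += 1
    return b, f, depth

def rewind(b: Optional[Cell], f: Any, n: int) -> Tuple[Optional[Cell], Any, List[Any]]:
    """Walk back up n cells, restoring left pointers and collecting arguments."""
    args: List[Any] = []
    for _ in range(n):
        prev = b.left
        b.left = f                 # restore the original pointer
        args.append(b.right)       # the argument hanging off this application
        b, f = prev, b
    return b, f, args
```

After `unwind`, the only record of the path back to the root is in the reversed left fields, so the pair (B, F) is the whole state of the task.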

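FSRE synchronisation
• What if two tasks attempt to evaluate a common subgraph?
• Mutual exclusion is not required for correctness (both tasks would compute the same value), but it is desirable to prevent duplicated effort
[Figure: two tasks (marked *) working on graphs of application (@) nodes that share a common subgraph]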

FSRE synchronisation (2)
• As a task traverses the graph, it “paints” all the nodes it is working on (using special versions of the tags)
• After working on a section of the graph, it “unpaints” those nodes
• If a task attempts to access a node that has been “painted” by another task, it blocks until the node is unpainted
• Tasks are blocked and later resumed with no explicit communication between agents or tasks

FSRE synchronisation (3)
• A task (the parent) sparks a subtask (the child) to evaluate a subgraph
• Later, the parent accesses the subgraph to get its value. The subgraph might be in one of three states:
  • Already evaluated: the parent uses the value
  • Being evaluated: the subgraph is “painted”, and the parent blocks until it is “unpainted”
  • Not yet started: the parent evaluates the subgraph itself (“painting” the nodes), and the child will later block or die
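The three cases can be sketched as follows. This is single-threaded and the names (`Subgraph`, `evaluate_as`, `parent_access`, `child_start`) are hypothetical; in the real engine the check and the painting must happen indivisibly, and a blocked task is queued on the node rather than returning a string.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Subgraph:
    value: Optional[int] = None         # set once the subgraph is in WHNF
    painted_by: Optional[str] = None    # task currently working on it, if any

def evaluate_as(sub: Subgraph, task: str, compute: Callable[[], int]) -> None:
    sub.painted_by = task               # paint while working
    sub.value = compute()
    sub.painted_by = None               # unpaint when done

def parent_access(sub: Subgraph, compute: Callable[[], int]) -> str:
    """What a parent does when it needs the value of a subgraph it sparked."""
    if sub.value is not None:
        return "use value"              # already evaluated
    if sub.painted_by is not None:
        return "block"                  # being evaluated: wait until unpainted
    evaluate_as(sub, "parent", compute) # not started: the parent does the work itself
    return "evaluated by parent"

def child_start(sub: Subgraph, compute: Callable[[], int]) -> str:
    """What the sparked child does when an agent finally runs it."""
    if sub.value is not None:
        return "die"                    # the parent already did the work
    if sub.painted_by is not None:
        return "block"                  # the parent is working on it right now
    evaluate_as(sub, "child", compute)
    return "evaluated by child"
```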

FSRE synchronisation (4)
• A task is blocked when it accesses a “painted” node:
  • it is then placed on a queue of blocked tasks
  • this queue is attached to the node that caused the block
  • the queue uses a reversed pointer, so there is no extra memory overhead!
• When the node is “unpainted”, all tasks in the queue for this node are sent to the task pool
• Block on unwind, resume on rewind

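[Figure: a spine of application (@) nodes traversed by pointer reversal, showing one task's pointers B and F and another task's pointers B’ and F’]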

[Figure: a spine of application (@) nodes with queues (Q) of blocked tasks attached, and task pointers B, F, B’ and F’]
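A sketch of painting with per-node queues of blocked tasks, assuming hypothetical names (`PNode`, `access`, `unpaint`). The FSRE threads this queue through a reversed pointer field at no extra memory cost; an ordinary Python list is used here for clarity.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional

@dataclass(eq=False)
class PNode:
    tag: str
    painted_by: Optional[str] = None                  # task that painted this node
    blocked: List[str] = field(default_factory=list)  # tasks blocked on this node

def access(node: PNode, task: str) -> bool:
    """Paint node for task, or block task on it if another task painted it."""
    if node.painted_by in (None, task):
        node.painted_by = task
        return True
    node.blocked.append(task)       # block on unwind
    return False

def unpaint(node: PNode, pool: Deque[str]) -> None:
    """Unpaint node and send every task blocked on it back to the task pool."""
    node.painted_by = None
    pool.extend(node.blocked)       # resume on rewind
    node.blocked.clear()
```

No message passes between the tasks: the blocked task simply reappears in the task pool when the painting task rewinds past the node.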

FSRE scheduling
• Too many sparked tasks: the task pool fills up
  • Ignore new sparked tasks!
  • Discard already-sparked tasks!
  • This is safe because parents always check on their children and do the work themselves if a child hasn't
• NB resumed tasks can't be ignored or discarded (a resumed task may be a parent, and nobody else will finish its work)
  • Always schedule resumed tasks first
• Use LIFO/FIFO switching to control the amount of parallelism in the system (LIFO for less, FIFO for more)
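One way to express these scheduling rules, as a sketch: `TaskPool`, its fixed `capacity`, the `lifo` flag and the FIFO order among resumed tasks are assumptions for illustration, and dropping the oldest sparks in `discard_sparks` is an arbitrary choice.

```python
from collections import deque
from typing import Any, Deque, Optional

class TaskPool:
    def __init__(self, capacity: int, lifo: bool = True) -> None:
        self.capacity = capacity
        self.lifo = lifo                        # LIFO: less parallelism; FIFO: more
        self.sparked: Deque[Any] = deque()
        self.resumed: Deque[Any] = deque()      # never ignored or discarded

    def spark(self, task: Any) -> bool:
        """Add a sparked task; when the pool is full, ignore it."""
        if len(self.sparked) >= self.capacity:
            return False                        # the parent will do the work itself
        self.sparked.append(task)
        return True

    def discard_sparks(self, n: int) -> None:
        """Drop up to n already-sparked tasks (oldest first)."""
        for _ in range(min(n, len(self.sparked))):
            self.sparked.popleft()

    def resume(self, task: Any) -> None:
        self.resumed.append(task)

    def next_task(self) -> Optional[Any]:
        if self.resumed:
            return self.resumed.popleft()       # resumed tasks are always scheduled first
        if not self.sparked:
            return None
        return self.sparked.pop() if self.lifo else self.sparked.popleft()
```

Flipping `lifo` at runtime is the switching knob: LIFO tends to pick the most recently sparked work, FIFO spreads out across older sparks.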

Two-stroke reduction
• “Inlet”
  • Unwind down the spine to find the leftmost outermost function
  • Use pointer reversal and “paint” the nodes
  • If parallelism annotations are found in application nodes, spark tasks to evaluate those arguments
  • The task might block on the way down, so it doesn't remember the arguments
  • If the leftmost outermost function is a lambda (or a primitive with no strict arguments), use two-stroke reduction
  • If it is a primitive operator with strict arguments, use four-stroke reduction
• “Exhaust”
  • Get the parallelism information and the number of arguments
  • Rewind (and unpaint) up the spine to find the root of the redex
  • Overwrite the root with an IND (indirection) to the result of the reduction
• Then go to “Inlet” again!
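A single-threaded Python sketch of the two strokes, using pointer reversal on a spine of application cells. The tag names, the `Cell` and `Fun` records, the per-node `spark` flag and the `two_stroke` and `whnf` names are hypothetical; painting and blocking are omitted, and each function is represented by an arity and a Python callable that builds its result graph.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional

@dataclass(eq=False)
class Cell:
    tag: str                  # "AP", "FUN", "INT" or "IND" (hypothetical tag names)
    left: Any = None          # AP: function part; FUN: Fun record; INT: value; IND: target
    right: Any = None         # AP: argument
    spark: bool = False       # parallelism annotation on an application node

@dataclass
class Fun:
    arity: int                      # assumed >= 1 in this sketch
    body: Callable[..., Cell]       # builds the result graph from the argument cells

def follow(c: Cell) -> Cell:
    while c.tag == "IND":
        c = c.left
    return c

def two_stroke(root: Cell, pool: List[Cell]) -> bool:
    """One Inlet/Exhaust cycle on the spine at root; returns False if root is in WHNF."""
    b: Optional[Cell] = None
    f = follow(root)
    depth = 0
    while f.tag == "AP":                    # Inlet: unwind with pointer reversal
        if f.spark:
            pool.append(f.right)            # spark the annotated argument
        nxt = follow(f.left)
        f.left = b
        b, f, depth = f, nxt, depth + 1
    fun = f.left if f.tag == "FUN" else None
    n = fun.arity if fun is not None and depth >= fun.arity else 0
    args = []
    for _ in range(n):                      # Exhaust: rewind, picking up the arguments
        args.append(b.right)
        prev = b.left
        b.left = f
        b, f = prev, b
    if n:                                   # f is now the root of the redex
        result = fun.body(*args)
        f.tag, f.left, f.right, f.spark = "IND", result, None, False
    while b is not None:                    # rewind the rest of the spine
        prev = b.left
        b.left = f
        b, f = prev, b
    return n > 0

def whnf(root: Cell, pool: List[Cell]) -> Cell:
    while two_stroke(root, pool):           # "then go to Inlet again"
        pass
    return follow(root)
```

For example, with I = Fun(1, lambda x: x) and K = Fun(2, lambda x, y: x), reducing ((I K) 5) 7 takes two cycles: the first overwrites the (I K) node with an indirection to K, the second overwrites the root with an indirection to 5.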

Four-stroke reduction
• “Inlet”: same as before
• “Compression”
  • Get the parallelism information and the number of strict arguments
  • Rewind (and unpaint) up the spine to the topmost strict argument, sparking strict arguments on the way up
• “Power”
  • Unwind (and paint) the spine again, checking the evaluation of all strict arguments one at a time
• “Exhaust”: same as before
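The four strokes for a primitive with strict arguments can be sketched in the same style. Again this is single-threaded and hypothetical (`Cell`, `Prim`, `down`, `up`, `evaluate`); argument 0 is the one nearest the primitive, and where the real engine would block on a painted argument, this sketch simply evaluates it in the parent (or finds it already evaluated).

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional

@dataclass(eq=False)
class Cell:
    tag: str                 # "AP", "PRIM", "INT" or "IND" (hypothetical tag names)
    left: Any = None         # AP: function part; PRIM: Prim record; INT: value; IND: target
    right: Any = None        # AP: argument

@dataclass
class Prim:
    arity: int
    strict: List[bool]       # strict[i]: is argument i (0 = nearest the primitive) strict?
    op: Callable[..., Cell]  # combines the argument cells into a result cell

def follow(c: Cell) -> Cell:
    while c.tag == "IND":
        c = c.left
    return c

def down(b: Optional[Cell], f: Cell):
    """One unwind step: reverse f's left pointer and move to its function part."""
    nxt = follow(f.left)
    f.left = b
    return f, nxt

def up(b: Cell, f: Cell):
    """One rewind step: restore b's left pointer and move back up."""
    prev = b.left
    b.left = f
    return prev, b

def evaluate(root: Cell, pool: List[Cell]) -> Cell:
    """Four-stroke reduction of a saturated primitive application (sketch)."""
    b, f, depth = None, follow(root), 0
    while f.tag == "AP":                       # Inlet: unwind to the leftmost function
        b, f = down(b, f)
        depth += 1
    if f.tag != "PRIM" or depth < f.left.arity:
        while b is not None:                   # nothing to reduce: rewind and stop
            b, f = up(b, f)
        return f
    prim: Prim = f.left
    top = max((i + 1 for i, s in enumerate(prim.strict) if s), default=0)
    for i in range(top):                       # Compression: rewind, sparking strict args
        if prim.strict[i]:
            pool.append(b.right)
        b, f = up(b, f)
    for i in reversed(range(top)):             # Power: unwind, checking strict args in turn
        b, f = down(b, f)
        if prim.strict[i]:
            evaluate(b.right, pool)            # single-threaded: parent finishes unfinished work
    args = []
    for _ in range(prim.arity):                # Exhaust: rewind to the redex root
        args.append(b.right)
        b, f = up(b, f)
    result = prim.op(*(follow(a) for a in args))
    f.tag, f.left, f.right = "IND", result, None
    while b is not None:                       # rewind the rest of the spine
        b, f = up(b, f)
    return follow(root)
```

Any sparked child that runs after the parent has finished finds its subgraph already overwritten with an indirection to a value and returns immediately.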

Summary
• Motivation
• Model for Parallel Graph Reduction
• Parallelism and Tasks
• FSRE representation, synchronisation and scheduling
• Two-stroke reduction
• Four-stroke reduction
• Summary
