Expressing Pipeline Parallelism Using TBB Constructs
A Case Study on What Works and What Doesn't
Eric C. Reed, Nicholas Chen, Ralph E. Johnson
Motivation
• Goal: Identify core programming patterns used in pipeline parallelism
  • Convert "pipeline-ish" serial programs to parallel ones
  • Identifying transformations could lead to automation
• PARSEC & TBB pipelines
• REU project focused on just part of the bigger picture
  • Always some "pre-transformation" needed before TBB could be used
  • TBB performed on par with or better than pthreads, making library/framework-based approaches attractive
• TBB Flow Graph had not yet been released
  • It resolves some of the problems we found
• Our work provides empirical evidence for needing more complex constructs than are available in TBB pipelines
ferret: Content-based Image Search
• Read in image
• Break image into segments
• Extract feature vectors from segments
• Query database with feature vectors to find candidate images
• Rank candidate images based on similarity
• Output best-matching images
TBB filter
• A single stage of the pipeline
• Represented as a function object
• Input: void* to output of previous stage
• Output: void* to input of next stage
• First/last stage generates/consumes tokens
• Serial-in-order, serial-out-of-order, or parallel

  class foo : public tbb::filter {
  public:
    foo() : tbb::filter(tbb::filter::parallel) {}  // or serial_in_order / serial_out_of_order
    void* operator()(void* inp) override { … operate on token … }
  };
TBB pipeline
• A pipeline is a sequence of filters
• Specified max number of live tokens
• Calls first stage to get a new token
• A NULL pointer signifies no more input

  ReadFilter read; DoFilter work; WriteFilter write;
  tbb::pipeline pipe;
  pipe.add_filter(read);   // add_filter takes a filter&, not a pointer
  pipe.add_filter(work);
  pipe.add_filter(write);
  pipe.run(10);            // at most 10 live tokens
  pipe.clear();            // detach the filters from the pipeline
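The filter and pipeline contract above (void* in, void* out, first stage returns NULL when input is exhausted) can be modeled in plain C++ without TBB. What follows is a minimal serial sketch under that assumption, not the TBB API: `Filter`, `CountingSource`, `Doubler`, `Collector`, and `run_serial` are hypothetical names introduced only for illustration, and the driver ignores live-token limits and parallel stages entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for one pipeline stage (same shape as a filter,
// but NOT tbb::filter): takes the previous stage's output, returns the
// next stage's input.
struct Filter {
    virtual ~Filter() = default;
    virtual void* operator()(void* token) = 0;
};

// First stage: ignores its input and generates tokens.
// Returning nullptr signals that there is no more input.
struct CountingSource : Filter {
    std::vector<int> items;
    std::size_t next = 0;
    explicit CountingSource(std::vector<int> v) : items(std::move(v)) {}
    void* operator()(void*) override {
        if (next == items.size()) return nullptr;
        return &items[next++];
    }
};

// Middle stage: transforms the token in place and passes it on.
struct Doubler : Filter {
    void* operator()(void* token) override {
        int* value = static_cast<int*>(token);
        *value *= 2;
        return value;
    }
};

// Last stage: consumes the token; its return value is unused.
struct Collector : Filter {
    std::vector<int> out;
    void* operator()(void* token) override {
        out.push_back(*static_cast<int*>(token));
        return nullptr;
    }
};

// Serial driver: pull a token from the first stage, push it through the
// remaining stages, and stop when the first stage returns nullptr.
inline void run_serial(const std::vector<Filter*>& stages) {
    assert(!stages.empty());
    while (void* token = (*stages.front())(nullptr)) {
        for (std::size_t i = 1; i < stages.size(); ++i) {
            token = (*stages[i])(token);
        }
    }
}
```

Calling `run_serial({&src, &dbl, &sink})` with a `CountingSource` holding 1, 2, 3 leaves 2, 4, 6 in `sink.out`. A real pipeline would additionally overlap tokens across stages according to each stage's mode.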
ferret: Content-based Image Search
• Read in image (serial-in-order)
• Break image into segments (parallel)
• Extract feature vectors from segments (parallel)
• Query database with feature vectors to find candidate images (parallel)
• Rank candidate images by similarity (parallel)
• Output best-matching images (serial-out-of-order)
x264: H.264 Video Encoding
• Frame contents predicted from already encoded reference frames
• Frame processing cannot start until all reference frames are encoded
  • TBB cannot guarantee this ordering without blocking
• TBB pipelines are not a suitable representation
dedup: File (de)compression
• Write a file segment once and its hash every other time
1. Read in a block of the file (serial-in-order)
2. Split block into small segments (parallel)
3. Hash the segment and check database (parallel)
   • If hash found in database, go to step 5
   • Otherwise, go to step 4
4. Compress the segment's data (parallel)
5. Reorder segments into a block. Reorder blocks and write out data (serial-in-order)
• Token generating stage (step 2)
• Optional stage (step 4)
dedup: File (de)compression
• Read in a block from file (serial-in-order)
• Do the following on the block (parallel)
  • Split block into segments (serial-in-order)
  • Compute and check hash (parallel)
  • Compress segment (parallel)
    • Check flag to either compress data or immediately return
  • Reorder segments into block (serial-in-order)
    • TBB handles reordering, so we need only append the segment to the block data structure
• Write out block (serial-in-order)
  • TBB handles reordering, so we can just write out the block data
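The restructuring above, with an always-present compress stage guarded by a flag and an inner per-block pass over segments, might look roughly like the following. This is a serial, stdlib-only sketch and not the PARSEC or TBB code: `Segment`, `split_block`, `hash_stage`, `compress_stage`, and `process_block` are hypothetical names, `std::hash` stands in for the real hash, and wrapping the data in brackets is a placeholder for compression.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical token passed between the inner stages.
struct Segment {
    std::string data;
    bool needs_compress = false;  // flag set by the hash stage
    std::string output;           // what gets appended to the block
};

// Inner stage 1: one block generates many segment tokens.
std::vector<Segment> split_block(const std::string& block,
                                 std::size_t segment_size) {
    assert(segment_size > 0);
    std::vector<Segment> segments;
    for (std::size_t pos = 0; pos < block.size(); pos += segment_size) {
        Segment seg;
        seg.data = block.substr(pos, segment_size);
        segments.push_back(seg);
    }
    return segments;
}

// Inner stage 2: hash the segment and check the database.
// A new hash sets the flag; a known hash emits only the hash.
void hash_stage(Segment& seg, std::unordered_set<std::size_t>& seen_hashes) {
    const std::size_t h = std::hash<std::string>{}(seg.data);
    seg.needs_compress = seen_hashes.insert(h).second;
    if (!seg.needs_compress) seg.output = "#" + std::to_string(h);
}

// Inner stage 3: formerly optional, now always run.
// The flag decides between compressing and returning immediately.
void compress_stage(Segment& seg) {
    if (!seg.needs_compress) return;
    seg.output = "[" + seg.data + "]";  // placeholder for real compression
}

// Outer stage body: runs the inner stages over one block and appends
// each segment's output to the block in order.
std::string process_block(const std::string& block, std::size_t segment_size,
                          std::unordered_set<std::size_t>& seen_hashes) {
    std::vector<Segment> segments = split_block(block, segment_size);
    std::string block_out;
    for (Segment& seg : segments) {
        hash_stage(seg, seen_hashes);
        compress_stage(seg);
        block_out += seg.output;
    }
    return block_out;
}
```

For the block "aabbaa" with a segment size of 2, the first "aa" and "bb" are compressed and the second "aa" is replaced by its hash. In the parallel version the inner loop would be its own pipeline, with the in-order final stage doing the append.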
Summary
• Transformations
  • Recursive generators become iterators with stacks
    • Semi-automation with user identifying state
  • Optional stages become required stages with flags
    • Semi-automation with user identifying conditions
  • Token generating stages require nested pipelines
    • Semi-automation with user specifying how to convert between pipelines
• TBB pipeline unsuitability
  • Dynamically constructed pipeline
  • Waiting on earlier tokens to finish first
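The first transformation, turning a recursive generator into an iterator with an explicit stack, can be illustrated on a small tree walk. This is a sketch under assumptions rather than code from the case study: `Node`, `visit_leaves`, and `LeafIterator` are hypothetical names, and the tree stands in for whatever recursive structure the original program traverses. The point is that `next()` can be called once per token, the way a first pipeline stage is, returning a null pointer when input runs out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical recursive input structure (e.g. a directory tree).
struct Node {
    std::string name;
    std::vector<Node> children;
};

// Recursive generator: the traversal state lives on the call stack,
// so it cannot be paused after producing a single leaf.
void visit_leaves(const Node& node, std::vector<std::string>& out) {
    if (node.children.empty()) {
        out.push_back(node.name);
        return;
    }
    for (const Node& child : node.children) visit_leaves(child, out);
}

// Iterator form: the same traversal with the state moved into an
// explicit stack, so each call yields exactly one leaf.
class LeafIterator {
public:
    explicit LeafIterator(const Node& root) { pending_.push_back(&root); }

    // Returns the next leaf, or nullptr when the traversal is finished.
    const Node* next() {
        while (!pending_.empty()) {
            const Node* node = pending_.back();
            pending_.pop_back();
            if (node->children.empty()) return node;
            // Push children in reverse so the leftmost child is visited
            // first, matching the recursive order.
            for (auto it = node->children.rbegin();
                 it != node->children.rend(); ++it) {
                pending_.push_back(&*it);
            }
        }
        return nullptr;
    }

private:
    std::vector<const Node*> pending_;  // nodes still to be expanded
};
```

The user-identified state mentioned above corresponds here to `pending_`: everything the recursion kept implicitly in stack frames has to be named and stored so the traversal can resume on the next call.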