Intel Array Building Blocks

Intel Array Building Blocks By: Edward Jones

Background
• Intel Ct
  • Developed in 2007
  • Parallel programming model for multicore chips
  • Exploits Single Instruction, Multiple Data (SIMD)
• RapidMind
  • Started in 2004
  • Provided a software product that simplifies the use of multi-core processors and graphics processing units (GPUs)
• Intel acquired RapidMind on August 19, 2009

Intel ArBB
• Intel ArBB is a C++ API
• Promotes parallel programming
• Hides the intricacies of the hardware and the vector ISA
• Oriented to data-intensive mathematical computations
• Built-in protection
  • By default, an ArBB program cannot create race conditions or deadlocks

What is it used for?
• Bioinformatics
• Engineering Design
• Financial Analytics
• Oil and Gas
• Medical Imaging
• Visual Computing
• Signal and Image Processing
• Science and Research
• Enterprise

Extend C++
• Uses standard C++ features to create new types and operators
• Constructs of ArBB
  • Scalar types: equivalent to primitive C++ types
  • Vector types: parallel collections of scalar data
  • Operators: scalar and vector operators
  • Functions: user-defined code fragments
  • Control flow

Scalar Types

Dense Containers
• Very similar to vectors
• Can change size dynamically at run time
• Operations:
  • Element-wise scalar operations
  • Indexing
  • Reordering
  • Reductions
  • Property access
• Most operations run in parallel

Dense Containers Example

    #include <arbb.hpp>
    using namespace arbb;

    #define SIZE 1024

    // Element-wise sum of two dense containers; the result is written to c.
    void vecsum(dense<f32> a, dense<f32> b, dense<f32>& c) {
        c = a + b;
    }

    int main(int argc, char** argv) {
        float a[SIZE];
        float b[SIZE];
        float c[SIZE];

        // Bind each ArBB container to its C++ array.
        dense<f32> va; bind(va, a, SIZE);
        dense<f32> vb; bind(vb, b, SIZE);
        dense<f32> vc; bind(vc, c, SIZE);

        call(vecsum)(va, vb, vc);
    }

Element-wise and Vector-scalar Operators
• All standard C++ arithmetic, bitwise, and logical operators can be used in vector computations
• Because each element is computed independently, these operations can run in parallel, which shortens run time
• Other operators
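To make the element-wise and vector-scalar forms concrete, here is a minimal sketch in standard C++. It uses std::valarray as a stand-in for an ArBB dense container, so it shows the semantics only and runs sequentially; the names vecsum and scale_and_offset are illustrative and are not ArBB functions.

```cpp
#include <cassert>
#include <valarray>

// Element-wise sum of two equally sized arrays, analogous to `c = a + b`
// on dense containers. std::valarray is a stand-in here, not ArBB.
std::valarray<float> vecsum(const std::valarray<float>& a,
                            const std::valarray<float>& b) {
    assert(a.size() == b.size());  // element-wise operators need equal sizes
    return a + b;
}

// Vector-scalar form: the same scalar is applied to every element.
std::valarray<float> scale_and_offset(const std::valarray<float>& v,
                                      float scale, float offset) {
    std::valarray<float> out = v * scale;  // multiply each element by scale
    out += offset;                         // then add offset to each element
    return out;
}
```

Calling vecsum on {1, 2, 3} and {10, 20, 30} gives {11, 22, 33}; no loop appears in user code, which is the property that lets a runtime such as ArBB choose how to parallelize the work.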

Collective Operators
• Perform computations where the output(s) depend on all of the inputs
• Examples
  • Reduction: applies an operator over an entire vector to compute a distilled value or values
    add_reduce([1 0 2 -1 4]) yields 6
  • Scan: computes reductions on all prefixes of a collection
    add_iscan([1 0 2 -1 4]) yields [1 (1+0) (1+0+2) (1+0+2+(-1)) (1+0+2+(-1)+4)] = [1 1 3 2 6]
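The two collective operators can be sketched with the standard C++ library. This is only an illustration of what the results mean: add_reduce_sketch and add_iscan_sketch are hypothetical names, they are not the ArBB implementations, and they run sequentially.

```cpp
#include <numeric>
#include <vector>

// Reduction: fold the whole vector into one value with +.
int add_reduce_sketch(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0);
}

// Inclusive scan: element i of the result is the sum of v[0..i].
std::vector<int> add_iscan_sketch(const std::vector<int>& v) {
    std::vector<int> out(v.size());
    std::partial_sum(v.begin(), v.end(), out.begin());
    return out;
}
```

For the input [1 0 2 -1 4] used on the slide, the reduction returns 6 and the scan returns [1 1 3 2 6]. Because addition is associative, both operations can be evaluated as a tree rather than left to right, which is what makes them parallelizable.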

Other Types of Operators
• Permutation Operators
  • These operations alter the size and order of vectors
  • a = shift(b, -1, value);
  • a = rotate(b, -1);
• Facility Operators
  • Provide data processing features

Differences from C++

    _for(i32 i=0, i<=N, i++) {
        /* code */
    } _end_for;

    _if(condition){
        /* code */
    } _else {
        /* code */
    } _end_if;

    _while(condition){
        /* code */
    } _end_while;

Functions
• Calling an ArBB function is different from a normal function call
• Form: mfc fnct = call(my_function);
• Calling a function creates a closure for that function
  • Once the closure has been created the first time, it is reused rather than created again
• Allows for currying
• The ‘map’ function lets the programmer execute a function for every element in a vector
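The ideas of a closure, currying, and a per-element map can be sketched in standard C++ with lambdas. This is an analogy, not the ArBB API: map_sketch and curry_scale are hypothetical names, and ArBB's own call and map operate on its own types and compile the function at run time.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Apply a scalar function to every element of a vector, in the spirit of
// a per-element map. Sequential here; the elements are independent.
std::vector<float> map_sketch(const std::function<float(float)>& f,
                              const std::vector<float>& v) {
    std::vector<float> out(v.size());
    std::transform(v.begin(), v.end(), out.begin(), f);
    return out;
}

// Currying: fix the first argument of a two-argument multiply and return
// a one-argument closure that remembers it.
std::function<float(float)> curry_scale(float factor) {
    return [factor](float x) { return factor * x; };
}
```

For example, map_sketch(curry_scale(2.0f), {1, 2, 3}) yields {2, 4, 6}. The closure returned by curry_scale is built once and can then be applied to any number of vectors.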

Dynamic Execution Engine
• Array Building Blocks provides a dynamic execution engine that comprises three major services:
  • Threading Runtime
    • Provides a fine-grained model for data- and task-parallel threading
  • Memory Manager
    • Segregates normal C++ memory from ArBB memory
    • Set of lock-free memory interfaces that also serves as a garbage collector
  • Just-in-time Compiler/Dynamic Engine
    • Constructs an intermediate representation of computations, performs optimizations, and generates code

Monte Carlo Computation of Pi

Monte Carlo Computation of Pi: C/C++

    double computepi() {
        int cnt = 0;
        for (int i = 0; i < NEXP; i++) {
            float x = float(rand()) / float(RAND_MAX);
            float y = float(rand()) / float(RAND_MAX);
            float dst = sqrtf(x*x + y*y);
            if (dst <= 1.0f) {
                cnt++;
            }
        }
        return 4.0 * ((double) cnt) / NEXP;
    }

*NEXP = O(2p(n))

Monte Carlo Computation of Pi: ArBB

    void computepi(f64& pi) {
        random_generator rng;
        dense<f32> x = rng.randomize(NEXP);
        dense<f32> y = rng.randomize(NEXP);
        dense<f32> dist = sqrt(x*x + y*y);
        dense<boolean> mask = (dist <= 1.0f);
        dense<i32> cnt = select(mask, 1, 0);
        pi = 4.0 * add_reduce(cnt) / NEXP;
    }
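The whole-array structure of the ArBB version (generate, element-wise test, select, reduce) can be mirrored in standard C++. The sketch below is an assumption-laden stand-in: computepi_sketch, its nexp and seed parameters, and the use of std::mt19937 are illustrative choices, not part of the original slides or of ArBB, and the code runs sequentially.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Estimate pi from nexp random points in the unit square, written as
// whole-array steps rather than one scalar loop.
double computepi_sketch(std::size_t nexp, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> unit(0.0f, 1.0f);

    // Step 1: fill x and y with uniform samples in [0, 1).
    std::vector<float> x(nexp), y(nexp);
    std::generate(x.begin(), x.end(), [&] { return unit(rng); });
    std::generate(y.begin(), y.end(), [&] { return unit(rng); });

    // Step 2: element-wise mask and select, 1 if the point lies inside
    // the quarter circle. Comparing squared distances skips the sqrt.
    std::vector<int> cnt(nexp);
    std::transform(x.begin(), x.end(), y.begin(), cnt.begin(),
                   [](float a, float b) { return a * a + b * b <= 1.0f ? 1 : 0; });

    // Step 3: reduction, count the points that fell inside.
    const long long inside = std::accumulate(cnt.begin(), cnt.end(), 0LL);
    return 4.0 * static_cast<double>(inside) / static_cast<double>(nexp);
}
```

Each step touches every element independently (steps 1 and 2) or combines them with an associative operator (step 3), which is why the same program expressed on dense containers can be spread across cores and vector lanes.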

Evaluation of Monte Carlo

Intel ArBB Today
• Preview release August 25, 2011
  • 1.0 beta 6
• Project retired by Intel in October 2012
• Overshadowed by Intel Cilk Plus and Intel Threading Building Blocks

Sources
• http://www.drdobbs.com/parallel/array-building-blocks-a-flexible-paralle/227300084
• http://openlab-mu-internal.web.cern.ch/openlab-mu-internal/03_Documents/4_Presentations/Slides/2010-list/02_CERN_openLab_Workshop-2010_Hans_Pabst.pdf
