Intel Array Building Blocks By: Edward Jones
Background • Intel Ct: • Developed in 2007 • Parallel programming model for multicore chips • Exploits Single Instruction, Multiple Data (SIMD) • RapidMind: • Started in 2004 • Provided a software product that simplified the use of multi-core processors and graphics processing units (GPUs) • Intel acquired RapidMind on August 19, 2009
Intel ArBB • Intel ArBB is a C++ API • Promotes parallel programming • Hides the intricacies of the hardware and the vector ISA • Oriented to data-intensive mathematical computations • Built-in protection • By default, an ArBB program cannot create race conditions or deadlocks
What is it used for? • Bioinformatics • Engineering Design • Financial Analytics • Oil and Gas • Medical Imaging • Visual Computing • Signal and Image Processing • Science and Research • Enterprise
Extend C++ • Uses standard C++ features to create new types and operators • Constructs of ArBB • Scalar types – equivalent to primitive C++ types • Vector types – parallel collections of scalar data • Operators – scalar and vector operators • Functions – user-defined code fragments • Control flow
Dense Containers • Very similar to vectors • Size can change dynamically at run time • Operations: • Element-wise scalar operations • Indexing • Reordering • Reductions • Property access • Most operations run in parallel
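Dense Containers Example
#include <arbb.hpp>
using namespace arbb;

// Element-wise sum: c[i] = a[i] + b[i] for every i.
void vecsum(dense<f32> a, dense<f32> b, dense<f32>& c) {
    c = a + b;
}

int main(int argc, char** argv) {
    #define SIZE 1024
    float a[SIZE]; float b[SIZE]; float c[SIZE];
    // Bind each ArBB container to its C array.
    dense<f32> va; bind(va, a, SIZE);
    dense<f32> vb; bind(vb, b, SIZE);
    dense<f32> vc; bind(vc, c, SIZE);  // the result container is bound to c
    call(vecsum)(va, vb, vc);
}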
Element-wise and Vector-scalar Operators • All standard C++ arithmetic, bitwise, and logical operators can be used in vector computations • Each operator is applied to every element, so the work can run in parallel and reduce run time • Other operators
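To make the element-wise and vector-scalar idea concrete, here is a minimal sketch in plain standard C++. It uses `std::valarray` as a stand-in for an ArBB dense container; it is not the ArBB API, and the helper name `scaled_sum` is hypothetical. Unlike ArBB, `std::valarray` makes no promise of parallel execution; the sketch only shows the operator semantics.

```cpp
#include <cassert>
#include <valarray>

// Hypothetical helper: adds two arrays element by element, then multiplies
// every element of the result by a scalar.
// std::valarray stands in for ArBB's dense<f32>; this is not the ArBB API.
std::valarray<float> scaled_sum(const std::valarray<float>& a,
                                const std::valarray<float>& b,
                                float scale) {
    assert(a.size() == b.size());    // element-wise operators need equal lengths
    std::valarray<float> sum = a + b; // element-wise: sum[i] = a[i] + b[i]
    sum *= scale;                     // vector-scalar: sum[i] = sum[i] * scale
    return sum;
}
```

Calling `scaled_sum` on two three-element arrays returns a three-element array; no explicit loop appears in the user code, which is the property the slide describes.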
Collective Operators • Perform computations where the output(s) depend on all of the inputs • Reduction – applies an operator over an entire vector to compute a distilled value or values • Example: add_reduce([1 0 2 -1 4]) yields 6 • Scan – computes reductions on all prefixes of a collection • Example: add_iscan([1 0 2 -1 4]) yields [1 (1+0) (1+0+2) (1+0+2+(-1)) (1+0+2+(-1)+4)] = [1 1 3 2 6]
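The reduction and inclusive-scan examples on this slide can be reproduced with the C++ standard library. The sketch below is sequential and uses `std::accumulate` and `std::partial_sum`; the function names `add_reduce_sketch` and `add_iscan_sketch` are hypothetical stand-ins that only mirror the semantics of the ArBB operators, not their implementation.

```cpp
#include <numeric>
#include <vector>

// Reduction: collapses the whole vector into a single sum.
// Hypothetical stand-in for the semantics of add_reduce.
int add_reduce_sketch(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0);
}

// Inclusive scan: element i of the result is the sum of v[0] through v[i].
// Hypothetical stand-in for the semantics of add_iscan.
std::vector<int> add_iscan_sketch(const std::vector<int>& v) {
    std::vector<int> out(v.size());
    std::partial_sum(v.begin(), v.end(), out.begin());
    return out;
}
```

For the input [1 0 2 -1 4], the first function returns 6 and the second returns [1 1 3 2 6], matching the slide.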
Other Types of Operators • Permutation Operators • These operations alter the size and order of vectors • a = shift(b, -1, value); • a = rotate(b, -1); • Facility Operators • Provide data processing features
Differences from C++
_for (i32 i = 0, i <= N, i++) {
    /* code */
} _end_for;

_if (condition) {
    /* code */
} _else {
    /* code */
} _end_if;

_while (condition) {
    /* code */
} _end_while;
Functions • Calling ArBB functions is different from normal function calls • Form: mfc fnct = call(my_function); • Calling a function creates a closure for that function • The closure is created only on the first call and is reused afterwards • Allows for currying • The 'map' function lets the programmer execute a function for every element in a vector
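The ideas of a closure with a pre-bound argument and of mapping a function over every element can be illustrated in plain standard C++. This is only an analogy, not the ArBB API: `make_scaler` and `map_sketch` are hypothetical names, the first shows partial application (the practical effect of currying), and `std::transform` here runs sequentially rather than in parallel.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Returns a closure in which the 'scale' argument is already fixed.
// Hypothetical helper illustrating partial application.
std::function<float(float)> make_scaler(float scale) {
    return [scale](float x) { return scale * x; };
}

// Applies fn to each element independently and collects the results.
// Hypothetical helper illustrating what a 'map' operation does.
std::vector<float> map_sketch(const std::vector<float>& in,
                              const std::function<float(float)>& fn) {
    std::vector<float> out(in.size());
    std::transform(in.begin(), in.end(), out.begin(), fn);
    return out;
}
```

Because each output element depends on exactly one input element, a runtime is free to evaluate the elements in any order or in parallel, which is what makes a map operation attractive in a data-parallel model.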
Dynamic Execution Engine • Array Building Blocks provides a dynamic execution engine that comprises three major services: • Threading Runtime • Provides a fine-grained model for data- and task-parallel threading • Memory Manager • Segregates normal C++ memory from ArBB memory • Offers a set of lock-free memory interfaces and acts as a garbage collector • Just-in-time Compiler/Dynamic Engine • Constructs an intermediate representation of computations, performs optimizations, and generates code
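Monte Carlo Computation of Pi: C/C++
#include <cmath>    // sqrtf
#include <cstdlib>  // rand, RAND_MAX

// NEXP is the number of random experiments (see the note below the code).
double computepi() {
    int cnt = 0;
    for (int i = 0; i < NEXP; i++) {
        // Draw a random point in the unit square.
        float x = float(rand()) / float(RAND_MAX);
        float y = float(rand()) / float(RAND_MAX);
        float dst = sqrtf(x*x + y*y);
        // Count the points that fall inside the quarter circle.
        if (dst <= 1.0f) {
            cnt++;
        }
    }
    return 4.0 * ((double) cnt) / NEXP;
}
*NEXP = O(2p(n))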
Monte Carlo Computation of Pi: ArBB
void computepi(f64& pi) {
    random_generator rng;
    dense<f32> x = rng.randomize(NEXP);
    dense<f32> y = rng.randomize(NEXP);
    dense<f32> dist = sqrt(x*x + y*y);       // distance of every point at once
    dense<boolean> mask = (dist <= 1.0f);    // true where the point is inside
    dense<i32> cnt = select(mask, 1, 0);     // 1 for inside, 0 for outside
    pi = 4.0 * add_reduce(cnt) / NEXP;
}
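Since ArBB itself is no longer available, the following sketch restates the same whole-array pipeline (generate, distance, mask and select, reduce) in plain standard C++ so it can be compiled today. It is sequential and is not the ArBB API; the function name `compute_pi_sketch`, its parameters, and the use of `std::mt19937` are assumptions made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: estimates pi from n random points in the unit square,
// written as separate whole-array steps instead of one fused loop.
double compute_pi_sketch(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> unit(0.0f, 1.0f);

    // Step 1: fill the x and y arrays with random coordinates.
    std::vector<float> x(n), y(n);
    for (std::size_t i = 0; i < n; ++i) {
        x[i] = unit(gen);
        y[i] = unit(gen);
    }

    // Steps 2 and 3: per-element distance, then 1 if inside the quarter
    // circle and 0 otherwise (the role of mask and select on the slide).
    std::vector<int> cnt(n);
    for (std::size_t i = 0; i < n; ++i) {
        float dist = std::sqrt(x[i] * x[i] + y[i] * y[i]);
        cnt[i] = (dist <= 1.0f) ? 1 : 0;
    }

    // Step 4: sum reduction over the 0/1 array.
    long long inside = std::accumulate(cnt.begin(), cnt.end(), 0LL);
    return 4.0 * static_cast<double>(inside) / static_cast<double>(n);
}
```

With a million points the estimate typically agrees with pi to about two decimal places; the exact value depends on the seed and on the standard library's distribution implementation.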
Intel ArBB Today • Preview Release August 25, 2011 • 1.0 beta 6 • Project retired by Intel October 2012 • Overshadowed by Intel Cilk Plus and Intel Threading Building Blocks
Sources
http://www.drdobbs.com/parallel/array-building-blocks-a-flexible-paralle/227300084
http://openlab-mu-internal.web.cern.ch/openlab-mu-internal/03_Documents/4_Presentations/Slides/2010-list/02_CERN_openLab_Workshop-2010_Hans_Pabst.pdf