310 likes | 451 Views
Introduction to CBEA SDK. Veselin Dikov Moscow-Bavarian Joint Advanced Student School 19-29. March 2006 Moscow, Russia. Outline. Getting started SPU Language Extensions SPE Library SDK Libraries Remote Procedure Calls. Getting started. Two executables formats: ppu vs. spu
E N D
Introduction to CBEA SDK Veselin Dikov Moscow-Bavarian Joint Advanced Student School19-29. March 2006 Moscow, Russia
Outline • Getting started • SPU Language Extensions • SPE Library • SDK Libraries • Remote Procedure Calls
Getting started • Two executables formats: ppu vs. spu • Two different compilers: ppu-gcc and spu-gcc • Check the format of an executable • # file <executable> • Makefile headers • Reside in SDK root directory • make.header • make.footer • make.env
Getting started • Makefile - spu • Makefile - ppu # Subdirectories############### DIRS := spu # Target####################### PROGRAM_ppu:= simple # Local Defines################ IMPORTS:= spu/lib_simple_spu.a \ -lspe # imports the embedded simple_spu # libraryallows consolidation of # spu program into ppe binary # make.footer ################## include../make.footer # make.footer is in the top of # the SDK # Target####################### PROGRAMS_spu := simple_spu # created embedded library LIBRARY_embed:= lib_simple_spu.a # Local Defines ################################ IMPORTS = $(SDKLIB_spu)/libc.a # make.footer################################ include ../../make.footer # make.footer is in the top of # the SDK
Outline • Getting started • SPU Language Extensions • SPE Library • SDK Libraries • Remote Procedure Calls
SPU Language Extensions • Provides SPU functionality • Extended RISC instruction set • Extended Data types: vector • C/C++ intrinsic routines • Memory Flow Control (MFC)
SIMD Vectorization • vector - Reserved as a C/C++ keyword • 128 bits of size • 16 bytes, or • 4 32-bit words, or • etc. • Single instruction acts on the entire vector • vector unsigned int vec = • (vector unsigned int)(1,2,3,4); • vector unsigned int v_ones = • (vector unsigned int)(1); • vector unsigned int vdest = • spu_add(vec, v_ones);
SIMD Vectorization • Overridden C/C++ operators • sizeof() (=16 for any vector) • operator= • operator& • Various intrinsic functions for vectors • Arithmetical – spu_add, spu_mult, etc. • Logical – spu_and, spu_nor, etc. • Comparison – spu_cmpeq • Scalar – spu_extract, spu_insert, spu_promote • Etc.
MFC routines • Local Store (LS) vs. Effective Address (EA) • Data transport via DMA • Up to 16,384 bytes on DMA message • Asynchronous • Part of the instruction set, C/C++ intrinsics • mfc_get(&data,addr+16384*i,16384,20,0,0); • What is __attribute__ ((aligned (128)));? LS address EA address
MFC example • ppu main file int main() { … /* the array needs to be aligned on a 128-byte cache line */ data = (int *) malloc(127 + DATASIZE*sizeof(int)); while (((int) data) & 0x7f) ++data; … spe_create_thread(gid,&sample_spu,data,NULL,-1,0);
MFC example • ppu main file int main() { … /* the array needs to be aligned on a 128-byte cache line */ data = (int *)malloc_align(DATASIZE*sizeof(int),7); … spe_create_thread(gid,&sample_spu,data,NULL,-1,0);
MFC example • spu main file int databuf [DATASIZE]__attribute__ ((aligned (128))); int main(void * arg) { int *data = &databuf[0]; … mfc_get(data,arg,DATASIZE*sizeof(int),20,0,0); mfc_write_tag_mask(1<<20); mfc_read_tag_status_all();
Outline • Getting started • SPU Language Extensions • SPE Library • SDK Libraries • Remote Procedure Calls
Provides PPU functionality • Two sets of functions • Thread management – POSIX like • MFC access functions – access to mailboxes • Header: <libspe.h>
Thread Group 0 Thread Group 1 Th0 Th0 Th1 Th1 Th2 Th2 ThN ThN Thread management • Threads are organized in thread groups • Thread scheduling: • SCHED_RR • SCHED_FIFO • SCHED_OTHER • Priority: • RR and FIFO – 1 to 99 • OTHER – 0 to 99
Thread management • Threads are organized in thread groups spe_gid_t gid; // group handle speid_t speids[1]; // thread handle // Create an SPE group gid = spe_create_group (SCHED_OTHER, 0, 1); if (gid == NULL) exit(-1); // failed // allocate the SPE task speids[0] = spe_create_thread(gid,&sample_spu,0,0,-1,0); if (speids[0] == NULL) exit (-1); // wait for the single SPE to complete spe_wait(speids[0], &status[0], 0);
Thread management • spe_get_affinity, spe_set_affinity • 8-bit mask specifying SPE units where to start the thread • spe_get_ls • Gets access to SPU’s local store • Dirty!! • Not supported by all operating systems • Used for RPC communication • spe_get/set_context, spe_get_ps
Thread management • Two ways of communicating with threads • via signals • via mailboxes • POSIX-like Signals • spe_get_event - check for signals from a thread • spe_kill – send signal to a thread • spe_write_signal – writes a signal through MFC • MFC mailboxes • spe_read_out_mbox, spe_write_in_mbox • spe_stat_in_mbox, spe_stat_out_mbox, spe_stat_out_intr_mbox
Outline • Getting started • SPU Language Extensions • SPE Library • SDK Libraries • Remote Procedure Calls
SDK Libraries • The SDK comes with various applied libraries
SDK Libraries • The SDK comes with various applied libraries
Example typedef union { vector float fv; vector unsigned int uiv; unsigned int ui[4]; float f[4]; } v128; typedef vector float floatvec; … v128 A[4] = {(floatvec)(0.9501,0.8913,0.8214,0.9218), (floatvec)(0.2311,0.7621,0.4447,0.7382), (floatvec)(0.6068,0.4565,0.6154,0.1763), (floatvec)(0.4860,0.0185,0.7919,0.4057)}; v128 invA[4]; v128 x, y; x.fv = (floatvec)(1, 5, 6, 7); y.fv = (floatvec)(0); inverse_matrix4x4(&invA[0].fv, &A[0].fv); madd_vector_matrix(4,4,&invA[0].fv,4,(float *)&x,(float *)&y); • Solving from LargeMatrix library. Solves: y = A*x + y from Matrix library
Outline • Getting started • SPU Language Extensions • SPE Library • SDK Libraries • Remote Procedure Calls
Remote Procedure Calls • Function-Offload Model • SPE threads as services • PPE communicates with them thought RPC calls • Remote Procedure Calls • stubs
Remote Procedure Calls • Preparing stubs • Interface Description Language (IDL) • idl files • interface add • { • import "../stub.h"; • const int ARRAY_SIZE = 1000; • [sync] idl_id_t do_inv ([in] int array_size, • [in, size_is(array_size)] int array_a[], • [out, size_is(array_size)] int array_res[]); • … • } • idl compiler • # idl -p ppe_sample.c -s spe_sample.c sample.idl
Remote Procedure Calls • Preparing stubs
Example • .idl file interface add { import "../stub.h"; const int ARRAY_SIZE = 1000; [sync] idl_id_t do_add ([in] int array_size, [in, size_is(array_size)] int array_a[], [in, size_is(array_size)] int array_b[], [out, size_is(array_size)] int array_c[]); [sync] idl_id_t do_sub ([in] int array_size, [in, size_is(array_size)] int array_a[], [in, size_is(array_size)] int array_b[], [out, size_is(array_size)] int array_c[]); }
Example • spe do_add.c file (user file) #include "../stub.h" idl_id_t do_add ( int array_size,int array_a[], int array_b[], int array_c[]) { int i; for (i = 0; i < array_size; i++) { array_c[i] = array_a[i] + array_b[i]; } return 0; } idl_id_t do_sub ( int array_size, int array_a[], int array_b[], int array_c[]) { ... return 0; }
Example • ppe main.c file (user file) #include "../stub.h" int array_a[ARRAY_SIZE] __attribute ((aligned(128))); int array_b[ARRAY_SIZE] __attribute ((aligned(128))); int array_add[ARRAY_SIZE] __attribute ((aligned(128))); int main() { ... do_add (ARRAY_SIZE, array_a, array_b, array_add); ... }
Conclusion • Powerful SDK • vector data type • Extensive Matrix routines • Extensive Signal Processing routines • Game-industry driven • But applicable for scientific work as well
References • CBEA-Tutorial.pdf, SDK documentation • idl.pdf, SDK documentation • libraries_SDK.pdf, SDK Documentation • libspe_v1.0.pdf, SDK Documentation • SPU_language_extensions_v21.pdf • Sony online resources • http://cell.scei.co.jp/pdf/SPU_language_extensions_v21.pdf, 15.03.2006 Thank you for your attention!