OpenCL Ch. 5~7. Jongeun Lee. Fall 2013.
Ch. 5: OpenCL C built-in functions
• work-item functions
• math functions
• integer functions
• common functions
• geometric functions
• relational functions
• synchronization functions
• async copy and prefetch functions
• vector data load and store functions
• atomic functions
• miscellaneous vector functions
• image functions
Work-item functions
• example (1-D)
  • global work size = 16 items
  • work-group size = 8 items/group
• note: the mapping from global and local IDs to work-items is implementation dependent
cl_int clEnqueueNDRangeKernel(
    cl_command_queue command_queue,
    cl_kernel kernel,
    cl_uint work_dim,
    const size_t *global_work_offset,
    const size_t *global_work_size,
    const size_t *local_work_size,
    cl_uint num_events_in_wait_list,
    const cl_event *event_wait_list,
    cl_event *event)
• Note: the events in event_wait_list and command_queue must be associated with the same context
Functions
• uint get_work_dim()
  • returns work_dim
• size_t get_global_size(uint dimindx)
• size_t get_global_id(uint dimindx)
• size_t get_local_size(uint dimindx)
  • corresponds to local_work_size
• size_t get_local_id(uint dimindx)
• size_t get_num_groups(uint dimindx)
• size_t get_group_id(uint dimindx)
• size_t get_global_offset(uint dimindx)
  • the above is useful only for kernels enqueued with the clEnqueueNDRangeKernel function
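How the values returned by these functions relate to one another can be sketched in plain C on the host. For a 1-D range with a uniform work-group size, the OpenCL specification defines the global ID as group ID × work-group size + local ID + global offset. The helper names below (`global_id_from`, `group_id_of`, `local_id_of`) are hypothetical and are not part of OpenCL; they only reproduce that arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenCL API): the value get_global_id(0)
   would return for the work-item with the given group ID and local ID. */
static size_t global_id_from(size_t group_id, size_t local_id,
                             size_t local_size, size_t global_offset)
{
    assert(local_size > 0 && local_id < local_size);
    return group_id * local_size + local_id + global_offset;
}

/* Hypothetical helper: the group ID that corresponds to a global ID. */
static size_t group_id_of(size_t global_id, size_t local_size,
                          size_t global_offset)
{
    assert(local_size > 0 && global_id >= global_offset);
    return (global_id - global_offset) / local_size;
}

/* Hypothetical helper: the local ID that corresponds to a global ID. */
static size_t local_id_of(size_t global_id, size_t local_size,
                          size_t global_offset)
{
    assert(local_size > 0 && global_id >= global_offset);
    return (global_id - global_offset) % local_size;
}
```

With a global work size of 16 and a work-group size of 8, `global_id_from(1, 3, 8, 0)` is 11, and `group_id_of(11, 8, 0)` and `local_id_of(11, 8, 0)` give back 1 and 3. Which hardware thread executes work-item 11 remains implementation dependent.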
Synchronization functions
• void barrier(cl_mem_fence_flags flags)
  • may appear inside a conditional or a loop, provided every work-item in the work-group reaches it
  • also queues a memory fence (reads and writes) to ensure correct ordering of memory operations to local/global memory
  • flags: CLK_LOCAL_MEM_FENCE, CLK_GLOBAL_MEM_FENCE, or both
• memory consistency is enforced only between work-items in a work-group, not across work-groups
• example
  • global work size: 16 items
  • work-group size: 8 items/group
  • id[7] & id[8] undetermined!
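Why the elements at the work-group boundary are undetermined can be illustrated with a host-side emulation in plain C. The slide's own kernel is not reproduced in the text, so this sketch assumes a kernel in which every work-item writes `id[gid]`, calls `barrier()`, and then reads its left and right neighbours. The emulation runs the two work-groups one after the other, which is only one of the schedules an implementation might choose; `emulate` and `UNWRITTEN` are hypothetical names.

```c
#include <assert.h>

enum { GLOBAL_SIZE = 16, GROUP_SIZE = 8,
       NUM_GROUPS = GLOBAL_SIZE / GROUP_SIZE };
enum { UNWRITTEN = -1 };   /* marks an id[] slot that has not been written */

/* Emulates the assumed kernel with work-groups executed sequentially,
   starting with first_group. stale[gid] is set to 1 when work-item gid
   read a neighbour slot that had not been written yet. */
static void emulate(int first_group, int stale[GLOBAL_SIZE])
{
    int id[GLOBAL_SIZE];

    assert(first_group >= 0 && first_group < NUM_GROUPS);
    for (int i = 0; i < GLOBAL_SIZE; ++i) {
        id[i] = UNWRITTEN;
        stale[i] = 0;
    }

    for (int n = 0; n < NUM_GROUPS; ++n) {
        int group = (first_group + n) % NUM_GROUPS;
        int base = group * GROUP_SIZE;

        /* Phase 1 (before the barrier): each work-item writes its slot. */
        for (int l = 0; l < GROUP_SIZE; ++l)
            id[base + l] = base + l;

        /* The barrier would sit here. It orders only the work-items of
           this group; the other group may not have run phase 1 yet. */

        /* Phase 2 (after the barrier): each work-item reads its neighbours. */
        for (int l = 0; l < GROUP_SIZE; ++l) {
            int gid = base + l;
            if (gid > 0 && id[gid - 1] == UNWRITTEN)
                stale[gid] = 1;
            if (gid + 1 < GLOBAL_SIZE && id[gid + 1] == UNWRITTEN)
                stale[gid] = 1;
        }
    }
}
```

When group 0 runs first, work-item 7 reads an unwritten `id[8]`; when group 1 runs first, work-item 8 reads an unwritten `id[7]`. All other work-items read neighbours inside their own group, so the barrier is sufficient for them.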
Async copy & prefetch functions
• event_t async_work_group_copy(local T* dst, const global T* src, size_t n, event_t event)
  • also available in the other direction (global dst, local src)
  • must be encountered by all work-items in a work-group
  • if event != 0, the supplied event object is returned, so one event can be shared by several async copies
• event_t async_work_group_strided_copy
  • takes an additional parameter: size_t stride
  • the stride applies to the global address space
• void wait_group_events(int num_events, event_t *event_list)
  • must be encountered by all work-items in a work-group
• void prefetch(const global T* p, size_t n)
  • prefetches into the global cache
  • no effect on the functional behavior of the kernel
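The data movement performed by a strided copy can be sketched in plain C on the host. This is only an emulation of which elements end up where; it says nothing about the asynchronous execution or the event handling. The function names are hypothetical, and `float` stands in for the generic type T. Because the stride applies to the global side, a global-to-local copy gathers with a stride on the source, and a local-to-global copy scatters with a stride on the destination.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side stand-in for a global -> local strided copy:
   n elements are gathered from src_global, stride elements apart,
   and packed contiguously into dst_local. */
static void emulate_strided_copy_to_local(float *dst_local,
                                          const float *src_global,
                                          size_t n, size_t stride)
{
    assert(dst_local != NULL && src_global != NULL);
    for (size_t i = 0; i < n; ++i)
        dst_local[i] = src_global[i * stride];
}

/* Hypothetical host-side stand-in for a local -> global strided copy:
   n contiguous elements of src_local are scattered into dst_global,
   stride elements apart. */
static void emulate_strided_copy_to_global(float *dst_global,
                                           const float *src_local,
                                           size_t n, size_t stride)
{
    assert(dst_global != NULL && src_local != NULL);
    for (size_t i = 0; i < n; ++i)
        dst_global[i * stride] = src_local[i];
}
```

For example, gathering 4 elements with a stride of 2 from a global array holding 0 through 7 yields 0, 2, 4, 6 in the local array.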
Ch. 7: Buffers & sub-buffers
• memory objects
  • buffer: 1-D array of bytes
  • sub-buffer: 1-D view into a buffer
  • image: 2-D or 3-D structured data array
• memory objects are
  • allocated against a context (which may have multiple devices)
  • globally visible to all devices within the context
    • a write to a memory object may not be visible to a following read
  • read/written by enqueuing commands to a particular device
    • the commands may be blocking or non-blocking
Creating (sub-)buffers
• cl_mem clCreateBuffer(…, cl_mem_flags flags, size_t size, void* host_ptr, …)
• flags (bit-field)
  • CL_MEM_READ_WRITE
  • CL_MEM_WRITE_ONLY
  • CL_MEM_READ_ONLY
  • CL_MEM_USE_HOST_PTR
  • CL_MEM_ALLOC_HOST_PTR
  • CL_MEM_COPY_HOST_PTR
• meaning of the host-pointer flags
  • use: use the memory at host_ptr as the buffer
  • alloc: allocate the buffer in host-accessible memory
  • copy: allocate the buffer and copy from host_ptr
  • alloc | copy: allocate in host-accessible memory and copy from host_ptr
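Which flag combinations make sense can be sketched as a small check in plain C. The rules encoded here are the ones the OpenCL specification gives for clCreateBuffer: the three access flags are mutually exclusive, use cannot be combined with alloc or copy, and host_ptr must be non-NULL exactly when use or copy is set. The `SKETCH_MEM_*` constants, `sketch_flags_ok`, and `sketch_examples` are hypothetical stand-ins so that the sketch compiles without an OpenCL SDK; real code should use the `CL_MEM_*` values from `CL/cl.h` and let clCreateBuffer report errors.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the CL_MEM_* bit-field flags (assumed bit positions). */
typedef unsigned long long sketch_mem_flags;
#define SKETCH_MEM_READ_WRITE     (1ULL << 0)
#define SKETCH_MEM_WRITE_ONLY     (1ULL << 1)
#define SKETCH_MEM_READ_ONLY      (1ULL << 2)
#define SKETCH_MEM_USE_HOST_PTR   (1ULL << 3)
#define SKETCH_MEM_ALLOC_HOST_PTR (1ULL << 4)
#define SKETCH_MEM_COPY_HOST_PTR  (1ULL << 5)

/* Returns 1 if the flag combination and host_ptr are consistent, else 0. */
static int sketch_flags_ok(sketch_mem_flags flags, const void *host_ptr)
{
    sketch_mem_flags access = flags & (SKETCH_MEM_READ_WRITE |
                                       SKETCH_MEM_WRITE_ONLY |
                                       SKETCH_MEM_READ_ONLY);

    /* At most one access flag: more than one bit set is invalid. */
    if (access & (access - 1))
        return 0;

    /* "use" cannot be combined with "alloc" or "copy". */
    if ((flags & SKETCH_MEM_USE_HOST_PTR) &&
        (flags & (SKETCH_MEM_ALLOC_HOST_PTR | SKETCH_MEM_COPY_HOST_PTR)))
        return 0;

    /* host_ptr is required for "use" and "copy", and must be NULL otherwise. */
    int needs_ptr = (flags & (SKETCH_MEM_USE_HOST_PTR |
                              SKETCH_MEM_COPY_HOST_PTR)) != 0;
    return needs_ptr == (host_ptr != NULL);
}

/* Usage example: the combinations listed on the slide. */
static void sketch_examples(void)
{
    float data[4] = {0};
    assert(sketch_flags_ok(SKETCH_MEM_READ_WRITE, NULL));
    assert(sketch_flags_ok(SKETCH_MEM_USE_HOST_PTR, data));
    assert(sketch_flags_ok(SKETCH_MEM_ALLOC_HOST_PTR |
                           SKETCH_MEM_COPY_HOST_PTR, data));
    assert(!sketch_flags_ok(SKETCH_MEM_USE_HOST_PTR |
                            SKETCH_MEM_COPY_HOST_PTR, data));
}
```

Calling `sketch_examples()` exercises the use, alloc, and alloc | copy cases, plus one rejected combination (use | copy).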
Read/write/copy buffers
• clEnqueueWriteBuffer
  • host memory -> buffer
• clEnqueueReadBuffer
  • buffer -> host memory
• clEnqueueCopyBuffer
  • buffer1 -> buffer2
• rectangular segments of a buffer can also be copied using the …Rect versions
Map/unmap buffers
• void* clEnqueueMapBuffer
  • maps a region of a buffer into host memory
  • returns a host pointer, which can be passed to libraries (& other functions)
• clEnqueueUnmapMemObject
  • called when the buffer mapping is no longer needed
Ch. 6: Programs & Kernels
• program vs. kernel objects?
• arguments can be passed to a kernel
  • cl_int clSetKernelArg(cl_kernel kernel, cl_uint arg_index, size_t arg_size, const void *arg_value)