AMD-SPL Runtime Programming Guide. Jiawei.
Outline: What is AMD-SPL runtime; How to use AMD-SPL runtime.
What is AMD-SPL Runtime: The Core of SPL; Goal; What is in SPL Runtime.
How to use SPL Runtime: Pre-Requirements.
Outline
• What is AMD-SPL runtime
• How to use AMD-SPL runtime
Add Include Directories
• Add the include paths in VS2005:
  • CAL: "$(CALROOT)\include\"
  • SPL: "$(SPLROOT)\include\"
  • Runtime: "$(SPLROOT)\include\core\cal"
Note: $(SPLROOT) is the root folder of SPL.
Add Library Directories
• Add the library directories in VS2005:
  • CAL:
    • "$(CALROOT)\lib\lh32\" (Vista 32-bit)
    • "$(CALROOT)\lib\lh64\" (Vista 64-bit)
    • "$(CALROOT)\lib\xp32\" (XP 32-bit)
    • "$(CALROOT)\lib\xp64\" (XP 64-bit)
  • SPL: "$(SPLROOT)\lib"
Note: $(SPLROOT) is the root folder of SPL.
Add Library Dependencies
• Add the additional dependencies in VS2005:
  • CAL: aticalrt.lib, aticalcl.lib
  • SPL:
    • amd-spl_d.lib (Debug version)
    • amd-spl.lib (Release version)
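On MSVC, the same dependencies can also be requested from source with `#pragma comment(lib, ...)`, which embeds a linker directive in the object file. A minimal sketch, assuming the library names from this slide and that `_DEBUG` is defined for debug builds (as Visual Studio's debug configurations do); `SPL_LIBRARY` and `splLibraryName` are hypothetical names, and the library directories still have to be configured as described above.

```cpp
#include <cassert>
#include <cstring>

// Pick the SPL import library that matches the build configuration.
#ifdef _DEBUG
#define SPL_LIBRARY "amd-spl_d.lib"  // Debug version
#else
#define SPL_LIBRARY "amd-spl.lib"    // Release version
#endif

// #pragma comment(lib, ...) is MSVC-specific, so other compilers skip it.
#ifdef _MSC_VER
#pragma comment(lib, "aticalrt.lib")
#pragma comment(lib, "aticalcl.lib")
#pragma comment(lib, SPL_LIBRARY)
#endif

// Report which SPL library this configuration links against.
inline const char* splLibraryName() { return SPL_LIBRARY; }
```

With this in one source file, the "Additional Dependencies" entries become optional; the choice between the debug and release SPL library then follows the build configuration automatically.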
Header and Namespaces
• Include the proper header files:
    #include "cal.h"          // CAL header
    #include "amdspl.h"       // SPL header
    #include "RuntimeDefs.h"  // Runtime header
• Use the namespaces:
    using namespace amdspl;
    using namespace amdspl::core::cal;
IL Kernel Sample
    il_ps_2_0
    dcl_output_generic o0
    dcl_resource_id(0)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
    dcl_input_position_interp(linear_noperspective) v0.xy__
    dcl_cb cb0[1]
    sample_resource(0)_sampler(0) r1, v0.xy00
    add o0, r1, cb0[0]
    endmain
    end
The equivalent Brook+ kernel:
    kernel void k(out float o<>, float i<>, float c)
    {
        o = i + c;
    }
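Before comparing against the GPU, it helps to have a host-side version of the same computation. A minimal sketch of the kernel `o = i + c` in plain C++; `referenceKernel` is a hypothetical helper, and the row-major (y * width + x) layout of the 2D buffer is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// CPU reference for the sample kernel: o = i + c for every element.
// The GPU kernel samples the 2D input at each pixel position and adds the
// constant cb0[0]; here the 2D domain is walked as a row-major array.
void referenceKernel(float* out, const float* in, float c,
                     std::size_t width, std::size_t height) {
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            std::size_t idx = y * width + x;  // row-major linear index
            out[idx] = in[idx] + c;
        }
    }
}
```

For the sample, `referenceKernel(expected, cpuInBuf, constant, 1024, 512)` fills a host array that the GPU result can be compared with.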
IL Source String
    const char* __sample_program_src__ =
        "il_ps_2_0\n"
        "dcl_output_generic o0\n"
        "dcl_resource_id(0)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)\n"
        "dcl_input_position_interp(linear_noperspective) v0.xy__\n"
        "dcl_cb cb0[1]\n"
        "sample_resource(0)_sampler(0) r1, v0.xy00\n"
        "add o0, r1, cb0[0]\n"
        "endmain\n"
        "end\n";
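This works because the C++ compiler joins adjacent string literals into a single literal, so the kernel is one NUL-terminated string with one IL instruction per `\n`-terminated line. A small sketch that rebuilds the same string under a hypothetical name, `kSampleIl`, and counts its lines, a cheap way to catch a forgotten `\n` that would merge two instructions; `countIlLines` and `endsWithEnd` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstring>

// Adjacent literals are concatenated at compile time into one string.
const char* kSampleIl =
    "il_ps_2_0\n"
    "dcl_output_generic o0\n"
    "dcl_resource_id(0)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)\n"
    "dcl_input_position_interp(linear_noperspective) v0.xy__\n"
    "dcl_cb cb0[1]\n"
    "sample_resource(0)_sampler(0) r1, v0.xy00\n"
    "add o0, r1, cb0[0]\n"
    "endmain\n"
    "end\n";

// Count newline-terminated lines in the IL source.
std::size_t countIlLines(const char* src) {
    std::size_t lines = 0;
    for (const char* p = src; *p != '\0'; ++p) {
        if (*p == '\n') ++lines;
    }
    return lines;
}

// True if the source finishes with the "end" line, as the sample does.
bool endsWithEnd(const char* src) {
    std::size_t n = std::strlen(src);
    return n >= 4 && std::strcmp(src + n - 4, "end\n") == 0;
}
```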
Kernel Information
• Define the kernel using the template class ProgramInfo, which takes:
  • The kernel parameters as template arguments: the number of outputs, inputs and constants, and whether a global buffer is used
  • The ID of the kernel
  • The source of the kernel
    template <int outputsT, int inputsT = 0, int constantsT = 0, bool globalsT = false>
    class ProgramInfo
    {
        ProgramInfo(const char* ID, const char* source) {...}
        ...
    };
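Because the counts are non-type template parameters, each kernel's signature becomes part of its type and is fixed at compile time. The sketch below is a hypothetical stand-in named `ProgramInfoSketch`, not SPL's class (whose body is elided on the slide); it only illustrates how such parameters can be exposed and checked with `static_assert`.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in, not SPL's ProgramInfo: the template arguments
// record the kernel signature as compile-time constants of the type.
template <int outputsT, int inputsT = 0, int constantsT = 0, bool globalsT = false>
class ProgramInfoSketch {
public:
    static constexpr int outputs = outputsT;
    static constexpr int inputs = inputsT;
    static constexpr int constants = constantsT;
    static constexpr bool globals = globalsT;

    ProgramInfoSketch(const char* id, const char* source)
        : id_(id), source_(source) {}

    const char* id() const { return id_; }
    const char* source() const { return source_; }

private:
    const char* id_;      // human-readable kernel ID
    const char* source_;  // IL source text
};

// The sample kernel: 1 output, 1 input, 1 constant, no global buffer.
typedef ProgramInfoSketch<1, 1, 1, false> SampleProgramSketch;
static_assert(SampleProgramSketch::outputs == 1, "one output stream");
static_assert(SampleProgramSketch::inputs == 1, "one input stream");
```

A mismatch between the declared counts and what the code expects can then be caught by the compiler rather than at run time.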
Define the IL Kernel in SPL
• Define a global object for the kernel:
    // 1 output, 1 input, 1 constant, no global buffer
    typedef ProgramInfo<1, 1, 1, false> SampleProgram;
    SampleProgram sampleProgInfo = SampleProgram("Sample Program", __sample_program_src__);
Initialize SPL Runtime
    Runtime* runtime = Runtime::getInstance();
    assert(runtime);
    DeviceManager* devMgr = runtime->getDeviceManager();
    assert(devMgr);
    BufferManager* bufMgr = runtime->getBufferManager();
    assert(bufMgr);
    ProgramManager* progMgr = runtime->getProgramManager();
    assert(progMgr);
Assign Device to SPL
    bool r;
    r = devMgr->assignDevice(0);
    assert(r);
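The slides check each call with `assert`, which compiles to nothing when `NDEBUG` is defined, as it usually is in release builds, so a failed setup call would go unnoticed there. A minimal sketch of a check that stays active in every configuration; `checkOk` is a hypothetical helper, not part of SPL.

```cpp
#include <cassert>
#include <cstdio>

// Report a failed setup step on stderr; returns the success flag so the
// caller can bail out. Unlike assert(), this runs in release builds too.
inline bool checkOk(bool ok, const char* what) {
    if (!ok) {
        std::fprintf(stderr, "SPL setup failed: %s\n", what);
    }
    return ok;
}

// Overload for the manager getters, which return pointers. A pointer
// argument prefers this overload over the bool one.
inline bool checkOk(const void* p, const char* what) {
    return checkOk(p != nullptr, what);
}
```

For example, `if (!checkOk(runtime, "Runtime::getInstance")) return 1;` for the pointers, and `if (!checkOk(devMgr->assignDevice(0), "assignDevice(0)")) return 1;` for the boolean result.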
Initialize CPU Buffer
    void fillBuffer(float buf[], int size)
    {
        for (int i = 0; i < size; i++)
        {
            buf[i] = (float)i;
        }
    }

    float* cpuInBuf = new float[1024 * 512];
    float* cpuOutBuf = new float[1024 * 512];
    float constant = 3;
    fillBuffer(cpuInBuf, 1024 * 512);
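The raw `new[]` arrays must be matched by `delete[]` on every exit path. A sketch using `std::vector<float>` instead, assuming `readData` and `writeData` accept a plain `float*` as in the slides (pass `cpuInBuf.data()`); `kWidth` and `kHeight` are hypothetical names. Every index below 1024 * 512 = 524,288 is exactly representable as a float, because it is below 2^24.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dimensions of the sample's 2D buffers (hypothetical constant names).
const std::size_t kWidth = 1024;
const std::size_t kHeight = 512;

// Same fill pattern as fillBuffer(): element i holds the value i.
void fillBuffer(std::vector<float>& buf) {
    for (std::size_t i = 0; i < buf.size(); ++i) {
        buf[i] = static_cast<float>(i);
    }
}
```

For example, `std::vector<float> cpuInBuf(kWidth * kHeight); fillBuffer(cpuInBuf);` and later `inBuf->readData(cpuInBuf.data(), kWidth * kHeight);`. The vectors free their memory when they go out of scope, so the final `delete []` lines are no longer needed.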
Get Device
• Get the default device, or
• Get a device by ID
    Device* device = devMgr->getDefaultDevice();
OR
    Device* device = devMgr->getDeviceByID(0);
Load Program
• Load the program using the program manager
• Pass in a ProgramInfo instance
    Program* prog = progMgr->loadProgram(sampleProgInfo);
    assert(prog);
Create Buffers
• Create a local buffer for the input
• Create a remote buffer for the output
• Get a constant buffer from the constant buffer pool
    Buffer* inBuf = bufMgr->createLocalBuffer(device, CAL_FORMAT_FLOAT_1, 1024, 512);
    assert(inBuf);
    Buffer* outBuf = bufMgr->createRemoteBuffer(CAL_FORMAT_FLOAT_1, 1024, 512);
    assert(outBuf);
    ConstBuffer* constBuf = bufMgr->getConstBuffer(1);
    assert(constBuf);
CPU to GPU Data Transfer
• Read the CPU buffer into the input buffer
• Set the constant
    // r is the bool declared when assigning the device
    r = inBuf->readData(cpuInBuf, 1024 * 512);
    assert(r);
    r = constBuf->setConstant<0>(&constant);
    assert(r);
Bind Buffers
• Bind the buffers to the program: input, output, constant, global
    r = prog->bindOutput(outBuf, 0);
    assert(r);
    r = prog->bindInput(inBuf, 0);
    assert(r);
    r = prog->bindConstant(constBuf, 0);
    assert(r);
Execute Program
• Define the execution domain
• Run the program
• Check the execution event
    CALdomain domain = {0, 0, 1024, 512};
    Event* e = prog->run(domain);
    assert(e);
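The initializer `{0, 0, 1024, 512}` appears to give the origin (x, y) followed by the extent (width, height), so this launch covers every element of the 1024 x 512 buffers. A sketch with a local stand-in struct, `DomainSketch` (hypothetical, assuming that field order rather than quoting CAL's header), that counts the launched elements and checks that the domain fits inside a buffer:

```cpp
#include <cassert>
#include <cstddef>

// Local stand-in for the four values in the domain initializer.
struct DomainSketch {
    unsigned x;       // first column of the launch
    unsigned y;       // first row of the launch
    unsigned width;   // number of columns
    unsigned height;  // number of rows
};

// Number of output elements the kernel runs over.
std::size_t domainElements(const DomainSketch& d) {
    return static_cast<std::size_t>(d.width) * d.height;
}

// True if the domain lies entirely inside a bufWidth x bufHeight buffer.
bool domainFits(const DomainSketch& d, unsigned bufWidth, unsigned bufHeight) {
    // Widen before adding so x + width cannot wrap around.
    unsigned long long right = static_cast<unsigned long long>(d.x) + d.width;
    unsigned long long bottom = static_cast<unsigned long long>(d.y) + d.height;
    return right <= bufWidth && bottom <= bufHeight;
}
```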
GPU to CPU Data Transfer
• Write the output buffer into the CPU buffer
    r = outBuf->writeData(cpuOutBuf, 1024 * 512);
    assert(r);
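After `writeData`, `cpuOutBuf` can be checked against the expected result `cpuInBuf[i] + constant`. A minimal sketch; `countMismatches` is a hypothetical helper, and the relative tolerance is an assumption (for this sample the sums are integers below 2^24, which floats represent exactly).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Count elements where out[i] differs from in[i] + c by more than a
// relative tolerance. Zero means the GPU result matches the CPU result.
std::size_t countMismatches(const float* in, const float* out, std::size_t n,
                            float c, float relTol = 1e-6f) {
    std::size_t bad = 0;
    for (std::size_t i = 0; i < n; ++i) {
        float expected = in[i] + c;
        float diff = std::fabs(out[i] - expected);
        float limit = relTol * std::fmax(1.0f, std::fabs(expected));
        // Written as !(diff <= limit) so NaN results also count as mismatches.
        if (!(diff <= limit)) {
            ++bad;
        }
    }
    return bad;
}
```

For the sample: `std::size_t bad = countMismatches(cpuInBuf, cpuOutBuf, 1024 * 512, constant);`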
Unload Program
• Destroy the program object:
  • Unbind all the buffers by calling Program::unbindAllBuffers()
  • Unload the module from the context
    progMgr->unloadProgram(prog);
Destroy/Release Buffers
• Destroy the input and output buffers
• Release the ConstBuffer back to the pool
    bufMgr->destroyBuffer(inBuf);
    bufMgr->destroyBuffer(outBuf);
    bufMgr->releaseConstBuffer(constBuf);
Shutdown Runtime
• Optional: the runtime is destroyed automatically when the application exits.
    Runtime::destroy();
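Every early return between setup and cleanup would otherwise need its own copy of the unload, destroy and release calls. A sketch of a hypothetical `CleanupStack` helper (not part of SPL) that runs registered actions in reverse order of registration when it leaves scope:

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Runs registered cleanup actions in reverse order when destroyed, so
// resources are released even if the function returns early.
class CleanupStack {
public:
    CleanupStack() = default;
    CleanupStack(const CleanupStack&) = delete;
    CleanupStack& operator=(const CleanupStack&) = delete;

    ~CleanupStack() {
        for (auto it = actions_.rbegin(); it != actions_.rend(); ++it) {
            (*it)();
        }
    }

    // Register an action; the last one registered runs first.
    void push(std::function<void()> action) {
        actions_.push_back(std::move(action));
    }

private:
    std::vector<std::function<void()>> actions_;
};
```

To keep the order used in the slides (unload the program, then destroy the buffers), push the `destroyBuffer` and `releaseConstBuffer` calls first and the `unloadProgram` call last.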
The Whole Program
    #include "cal.h"
    #include "amdspl.h"
    #include "RuntimeDefs.h"

    using namespace amdspl;
    using namespace amdspl::core::cal;

    void fillBuffer(float buf[], int size)
    {
        for (int i = 0; i < size; i++)
        {
            buf[i] = (float)i;
        }
    }
The Whole Program
    const char* __sample_program_src__ =
        "il_ps_2_0\n"
        "dcl_output_generic o0\n"
        "dcl_resource_id(0)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)\n"
        "dcl_input_position_interp(linear_noperspective) v0.xy__\n"
        "dcl_cb cb0[1]\n"
        "sample_resource(0)_sampler(0) r1, v0.xy00\n"
        "add o0, r1, cb0[0]\n"
        "endmain\n"
        "end\n";

    typedef ProgramInfo<1, 1, 1, false> SampleProgram;
    SampleProgram sampleProgInfo = SampleProgram("Sample Program", __sample_program_src__);
The Whole Program
    int main(void)
    {
        float* cpuInBuf = new float[1024 * 512];
        float* cpuOutBuf = new float[1024 * 512];
        float constant = 3;
        fillBuffer(cpuInBuf, 1024 * 512);

        Runtime* runtime = Runtime::getInstance();
        DeviceManager* devMgr = runtime->getDeviceManager();
        BufferManager* bufMgr = runtime->getBufferManager();
        ProgramManager* progMgr = runtime->getProgramManager();

        devMgr->assignDevice(0);
        Device* device = devMgr->getDefaultDevice();
        ..........
The Whole Program
        ......
        Program* prog = progMgr->loadProgram(sampleProgInfo);

        Buffer* inBuf = bufMgr->createLocalBuffer(device, CAL_FORMAT_FLOAT_1, 1024, 512);
        Buffer* outBuf = bufMgr->createRemoteBuffer(CAL_FORMAT_FLOAT_1, 1024, 512);
        ConstBuffer* constBuf = bufMgr->getConstBuffer(1);

        inBuf->readData(cpuInBuf, 1024 * 512);
        constBuf->setConstant<0>(&constant);

        prog->bindOutput(outBuf, 0);
        prog->bindInput(inBuf, 0);
        prog->bindConstant(constBuf, 0);

        CALdomain domain = {0, 0, 1024, 512};
        Event* e = prog->run(domain);

        outBuf->writeData(cpuOutBuf, 1024 * 512);
        ......
The Whole Program
        .....
        progMgr->unloadProgram(prog);

        bufMgr->destroyBuffer(inBuf);
        bufMgr->destroyBuffer(outBuf);
        bufMgr->releaseConstBuffer(constBuf);

        Runtime::destroy();

        delete [] cpuInBuf;
        delete [] cpuOutBuf;
        return 0;
    }