TEXTURE MEMORY - IN CUDA PERSPECTIVE • VINAY MANCHIRAJU
TEXTURE MEMORY • Read-only memory available to CUDA programs. • Used in general-purpose computing for accuracy and efficiency. • Originally designed for DirectX and OpenGL rendering pipelines.
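WHY USE TEXTURES? • Texture caches can serve memory locations that are close together in 2D but not consecutive in address order, which typical CPU caching schemes would not cache together. • Designed to accelerate access patterns with spatial locality.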
Texture memory is cached on chip. • Provides higher effective bandwidth. • Reduces memory requests to the off-chip DRAM. • Improves performance of graphics applications whose memory access patterns exhibit a great deal of spatial locality.
PARALLELIZING PHYSICAL SIMULATIONS • Results are more accurate, with reduced computational complexity and less time to solve. • Textures play a significant role in simulation problems.
HEAT SIMULATION EXAMPLE • A rectangular room is divided into a grid of cells. • Heaters with fixed temperatures are scattered among the cells of the grid.
FLOW OF HEAT • Warmer cells tend to cool as their heat dissipates to cooler neighbors, and cooler cells tend to warm.
AS A FUNCTION OF HEAT LOSS/GAIN • Imagine that a given cell has 4 neighbors. • k: the rate of heat flow from one cell to another. • A large value of k drives the system to a constant temperature quickly, while a small value lets the solution retain large temperature gradients longer.
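The equation on this slide did not survive in the transcript. A common form of the update that is consistent with the description above (an assumption, not recovered from the slide itself) sums the scaled temperature differences over the neighbors. The subscript names are illustrative.

```latex
T_{\text{new}} = T_{\text{old}} + \sum_{n \in \text{neighbors}} k \,\bigl(T_{n} - T_{\text{old}}\bigr)

% With the four neighbors top, bottom, left and right, this becomes:
T_{\text{new}} = T_{\text{old}} + k \,\bigl(T_{\text{top}} + T_{\text{bottom}} + T_{\text{left}} + T_{\text{right}} - 4\,T_{\text{old}}\bigr)
```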
THREE STEPS TO COMPUTE TEMPERATURE UPDATES • Step 1, copy_const_kernel(): copy the heater temperatures into the corresponding cells of the input grid. This enforces the restriction that cells with heaters stay at a constant temperature. • Step 2, blend_kernel(): compute the output temperatures from the input temperatures of the grid using the update equation. • Step 3: swap the input and output buffers so that the next time step starts from the newly computed temperatures.
copy_const_kernel() • Convert threadIdx and blockIdx into an x and y coordinate. • Compute a linear offset into the constant and input buffers. • If the cell in the constant grid is nonzero, copy the heater temperature in cptr[] to the input grid in iptr[].
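The steps on this slide can be sketched as follows. This is a minimal version, not the presenter's listing: the host helper apply_heaters() and its dim parameter are hypothetical, and the launch assumes the grid width is a multiple of 16.

```cuda
#include <cuda_runtime.h>

// Writes heater temperatures into the input grid so heater cells stay fixed.
// cptr: constant (heater) grid, where 0 means "no heater here".
// iptr: input temperature grid for the next blend step.
__global__ void copy_const_kernel(float *iptr, const float *cptr) {
    // Convert threadIdx and blockIdx into an (x, y) cell coordinate.
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    // Linear offset shared by both buffers; row width is blockDim.x * gridDim.x.
    int offset = x + y * blockDim.x * gridDim.x;

    if (cptr[offset] != 0.0f) {
        iptr[offset] = cptr[offset];
    }
}

// Hypothetical host helper: launches one thread per cell of a dim x dim grid.
// Assumes dim is a multiple of 16 so the blocks tile the grid exactly.
cudaError_t apply_heaters(float *dev_in, const float *dev_const, int dim) {
    dim3 blocks(dim / 16, dim / 16);
    dim3 threads(16, 16);  // 256 threads per block
    copy_const_kernel<<<blocks, threads>>>(dev_in, dev_const);
    return cudaDeviceSynchronize();
}
```

Cells without a heater are left untouched, so the kernel can be run before every blend step without disturbing the rest of the grid.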
blend_kernel() • One thread for every cell. • The offsets of the neighbors in all 4 directions are computed so that their temperatures can be read. • Each thread reads its cell's temperature and the temperatures of its neighboring cells, performs the update computation described earlier, and writes the new value for its cell. • The updated temperature is the old temperature plus the scaled differences between the neighboring cell temperatures and the old temperature.
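A minimal sketch of such a kernel, reading from ordinary global memory, might look like this. The dim and k parameters and the host helper blend_step() are assumptions made for the example. At the edges of the grid, the cell itself stands in for the missing neighbor, which is one of several reasonable boundary choices.

```cuda
#include <cuda_runtime.h>

// One thread per cell: reads the cell and its four neighbors from inSrc and
// writes the updated temperature to outSrc.
// dim (grid width and height) and k (heat flow rate) are assumed parameters.
__global__ void blend_kernel(float *outSrc, const float *inSrc, int dim, float k) {
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    if (x >= dim || y >= dim) return;  // guard threads outside the grid
    int offset = x + y * dim;

    // Neighbor offsets; at an edge the cell stands in for the missing neighbor.
    int left   = (x == 0)       ? offset : offset - 1;
    int right  = (x == dim - 1) ? offset : offset + 1;
    int top    = (y == 0)       ? offset : offset - dim;
    int bottom = (y == dim - 1) ? offset : offset + dim;

    // Old temperature plus the scaled differences to the four neighbors.
    float c = inSrc[offset];
    outSrc[offset] = c + k * (inSrc[top] + inSrc[bottom] +
                              inSrc[left] + inSrc[right] - 4.0f * c);
}

// Hypothetical host helper: one blend step over a dim x dim grid.
cudaError_t blend_step(float *dev_out, const float *dev_in, int dim, float k) {
    dim3 threads(16, 16);  // 256 threads per block
    dim3 blocks((dim + 15) / 16, (dim + 15) / 16);
    blend_kernel<<<blocks, threads>>>(dev_out, dev_in, dim, k);
    return cudaDeviceSynchronize();
}
```

Reading from one buffer and writing to another means no thread can observe a neighbor's half-updated value within a step.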
anim_kernel() • A DataBlock holds the constant buffer of heaters and the updated temperatures. • Arguments: a pointer to a DataBlock and the number of animation ticks that have elapsed (not used). • We use a 16 x 16 grid and blocks of 256 threads.
anim_kernel() (continued) • After each iteration we swap the input and output buffers, so the most recent output holds the final temperatures. • The temperatures are converted into colors and the bitmap image is transferred from the GPU to the CPU.
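The buffer swap can be sketched on the host side as below. The function run_steps() and its step callable are hypothetical names for this example. In the heat simulation, step would launch copy_const_kernel() and then blend_kernel().

```cuda
#include <utility>

// Runs `steps` time steps with two buffers, swapping their roles after each one.
// `step` is a hypothetical callable: step(out, in) must fill `out` from `in`,
// for example by launching copy_const_kernel and then blend_kernel.
// Returns the buffer holding the most recent result.
template <typename Step>
float *run_steps(float *in, float *out, int steps, Step step) {
    for (int i = 0; i < steps; ++i) {
        step(out, in);
        std::swap(in, out);  // this step's output is the next step's input
    }
    return in;  // after the final swap, `in` points at the latest output
}
```

Only the pointers are exchanged, so no temperature data is copied between steps.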
USING TEXTURES • Declare the inputs as texture references. • Use references to floating-point textures. • Allocate GPU memory for these textures, then bind the references to it using cudaBindTexture().
cudaBindTexture() • Tells the CUDA runtime to use the specified buffer as a texture and the specified texture reference as that texture's name. • See the cudaBindTexture() documentation listed in the references.
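tex1Dfetch() • A compiler intrinsic function. • Used inside the blend method to read from the texIn, texOut, and texConstSrc textures. Texture references are declared at file scope, so they are not passed as kernel arguments. • Fetches the texture value at a given index into a floating-point variable.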
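The declare, bind, and fetch steps from the last three slides can be sketched together as below. This uses the legacy texture reference API that the slides describe, which was removed in CUDA 12.0, so the sketch assumes an older toolkit. The names blend_kernel_tex() and blend_step_tex(), and the dim and k parameters, are hypothetical.

```cuda
#include <cuda_runtime.h>

// Legacy texture reference: declared at file scope, not passed as an argument.
texture<float> texIn;

// Same update as a global-memory blend kernel, but every read goes through
// the texture cache via tex1Dfetch().
__global__ void blend_kernel_tex(float *dst, int dim, float k) {
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    if (x >= dim || y >= dim) return;
    int offset = x + y * dim;

    // At an edge the cell stands in for the missing neighbor.
    int left   = (x == 0)       ? offset : offset - 1;
    int right  = (x == dim - 1) ? offset : offset + 1;
    int top    = (y == 0)       ? offset : offset - dim;
    int bottom = (y == dim - 1) ? offset : offset + dim;

    float c = tex1Dfetch(texIn, offset);
    float t = tex1Dfetch(texIn, top);
    float b = tex1Dfetch(texIn, bottom);
    float l = tex1Dfetch(texIn, left);
    float r = tex1Dfetch(texIn, right);
    dst[offset] = c + k * (t + b + l + r - 4.0f * c);
}

// Hypothetical host helper: binds dev_in to texIn, runs one step, unbinds.
cudaError_t blend_step_tex(float *dev_out, const float *dev_in, int dim, float k) {
    size_t bytes = sizeof(float) * dim * dim;
    cudaError_t err = cudaBindTexture(NULL, texIn, dev_in, bytes);
    if (err != cudaSuccess) return err;

    dim3 threads(16, 16);
    dim3 blocks((dim + 15) / 16, (dim + 15) / 16);
    blend_kernel_tex<<<blocks, threads>>>(dst_guard(dev_out), dim, k);
    err = cudaDeviceSynchronize();
    cudaUnbindTexture(texIn);
    return err;
}
```

On CUDA 12.0 and later, texture objects replace texture references. That API is outside the scope of these slides.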
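USING 2D TEXTURES • Reference declaration: add a dimensionality argument of 2 to the texture reference, for example texture<float, 2> texIn. • Instead of computing offsets for left, right, top, and bottom, we use x and y directly to access the texture.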
USING 2D TEXTURES • Out-of-bounds coordinates at the edges of the grid are handled automatically. • If x or y is less than zero, tex2D() returns the value at zero. • If x or y is greater than or equal to the width, tex2D() returns the value at width - 1.
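A two-dimensional version might be sketched as follows, relying on the default clamp addressing described above instead of explicit edge checks. As before, this uses the legacy texture reference API (removed in CUDA 12.0). The names texIn2D, blend_kernel_tex2d(), and blend_step_tex2d() are hypothetical, and the row pitch is assumed to satisfy the device's texture pitch alignment.

```cuda
#include <cuda_runtime.h>

// Legacy 2D texture reference, declared at file scope.
texture<float, 2> texIn2D;

// Neighbors are addressed directly by (x, y). Out-of-range coordinates are
// clamped to the nearest valid texel by the default address mode.
__global__ void blend_kernel_tex2d(float *dst, int dim, float k) {
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    if (x >= dim || y >= dim) return;
    int offset = x + y * dim;

    float c = tex2D(texIn2D, x, y);
    float t = tex2D(texIn2D, x, y - 1);
    float b = tex2D(texIn2D, x, y + 1);
    float l = tex2D(texIn2D, x - 1, y);
    float r = tex2D(texIn2D, x + 1, y);
    dst[offset] = c + k * (t + b + l + r - 4.0f * c);
}

// Hypothetical host helper: binds dev_in as a dim x dim 2D texture and runs one step.
// Assumes sizeof(float) * dim meets the device's texturePitchAlignment.
cudaError_t blend_step_tex2d(float *dev_out, const float *dev_in, int dim, float k) {
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaError_t err = cudaBindTexture2D(NULL, texIn2D, dev_in, desc,
                                        dim, dim, sizeof(float) * dim);
    if (err != cudaSuccess) return err;

    dim3 threads(16, 16);
    dim3 blocks((dim + 15) / 16, (dim + 15) / 16);
    blend_kernel_tex2d<<<blocks, threads>>>(dev_out, dim, k);
    err = cudaDeviceSynchronize();
    cudaUnbindTexture(texIn2D);
    return err;
}
```

The kernel body no longer needs any edge handling for the neighbor reads, which is the simplification the slide refers to.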
Tradeoffs 1D vs 2D • From a performance standpoint, the decision between one- and two-dimensional textures is likely to be inconsequential. • For our particular application, the code is a little simpler with two-dimensional textures because we happen to be simulating a two-dimensional domain. In general this is not always the case, so we suggest making the decision between one- and two-dimensional textures on a case-by-case basis.
REFERENCES • http://http.developer.nvidia.com/Cg/tex1Dfetch.html • http://developer.download.nvidia.com/compute/cuda/4_1/rel/toolkit/docs/online/group__CUDART__HIGHLEVEL_g2aeb95eab6b9d90bb00b26406a27c515.html • http://developer.download.nvidia.com/compute/cuda/4_1/rel/toolkit/docs/online/group__CUDART__HIGHLEVEL_g67660ae3e9a1ff520575394f78087bea.html#g67660ae3e9a1ff520575394f78087bea