Parallel Image Processing

David Oldroyd

Outline: • Images • Basic Rendering: • Ray Casting/Ray Tracing • Rasterization • Manipulation: • Shaders • Reflection • Lighting and shadows • Antialiasing and Filtering • Graphics Pipeline • Graphics Hardware

Images • 2D pixel array • Raster format • Different pixel formats: • Single byte • RGB(A): 24 or 32 bits per pixel
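As a rough illustration of the 32-bit RGBA case, the sketch below packs four 8-bit channels into one integer and stores pixels in a row-major 2D array. The helper names `pack_rgba` and `unpack_rgba` and the R-G-B-A byte order are assumptions made for this example; real image formats differ in channel order.

```python
def pack_rgba(r, g, b, a=255):
    """Pack four 8-bit channels into one 32-bit integer (R-G-B-A order assumed)."""
    return (r << 24) | (g << 16) | (b << 8) | a


def unpack_rgba(pixel):
    """Split a packed 32-bit pixel back into its (r, g, b, a) channels."""
    return (pixel >> 24) & 0xFF, (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF


# A raster image as a row-major 2D array: image[y][x]
width, height = 4, 3
image = [[pack_rgba(0, 0, 0) for _ in range(width)] for _ in range(height)]

# Set one pixel to opaque orange
image[1][2] = pack_rgba(255, 128, 0)
```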

Rendering • Program Data -> Image Data • Two general methods to choose from • Rasterization • Ray Casting/Ray Tracing

Rasterization • Turn polygons into pixels • 3D approach:
    for each polygon p in the world:
        generate triangular faces for each surface of p
        for each triangle t:
            transform each vertex into 2D
            fill each pixel of the triangle with a texture
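The inner "fill each pixel of the triangle" step can be sketched with edge functions, one common way to test whether a pixel center lies inside a 2D triangle. This is a minimal sketch, not necessarily the method the slides have in mind: it assumes the vertices are already projected to 2D, and it writes a flat `color` value instead of sampling a texture. The function names are hypothetical.

```python
def edge(a, b, p):
    """Signed area term; its sign tells which side of the edge a->b the point p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def fill_triangle(image, v0, v1, v2, color):
    """Write color into every pixel whose center lies inside triangle v0-v1-v2."""
    height, width = len(image), len(image[0])
    # Only visit the triangle's bounding box, clamped to the image
    min_x = max(int(min(v0[0], v1[0], v2[0])), 0)
    max_x = min(int(max(v0[0], v1[0], v2[0])), width - 1)
    min_y = max(int(min(v0[1], v1[1], v2[1])), 0)
    max_y = min(int(max(v0[1], v1[1], v2[1])), height - 1)
    for y in range(min_y, max_y + 1):
        for x in range(min_x, max_x + 1):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            w0 = edge(v1, v2, p)
            w1 = edge(v2, v0, p)
            w2 = edge(v0, v1, p)
            # Accept either winding order: all three terms share a sign
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                image[y][x] = color


image = [[0] * 8 for _ in range(8)]
fill_triangle(image, (1, 1), (6, 1), (1, 6), 1)
```

Each pixel test reads only the three vertices, which is why this loop maps well onto SIMD hardware.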

Rasterization: Serial Performance • Naive serial performance = O(N_poly) • Performance boosters: • Backface culling • Cell and portal culling • Clipping partially visible triangles • Lower polygon count models for distant objects
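Backface culling, the first booster listed, can be sketched as a sign test between a triangle's normal and the direction to the camera. The sketch assumes counter-clockwise vertex order when a triangle is seen from its front side; the helper names are hypothetical.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def is_back_facing(v0, v1, v2, camera_pos):
    """True when the triangle's front side points away from the camera.

    Assumes counter-clockwise winding when viewed from the front.
    """
    normal = cross(sub(v1, v0), sub(v2, v0))
    to_camera = sub(camera_pos, v0)
    return dot(normal, to_camera) <= 0


# Triangle in the z = 0 plane whose front side faces +z
triangle = ((0, 0, 0), (1, 0, 0), (0, 1, 0))
visible = not is_back_facing(*triangle, camera_pos=(0, 0, 5))
culled = is_back_facing(*triangle, camera_pos=(0, 0, -5))
```

Triangles that fail the test can be skipped before any pixel work is done.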

Rasterization: Parallel Performance • Triangle interpolation is easily parallelizable • Some models are in front of others, so overlapping triangles must be depth-ordered • Great for SIMD • Requires very high memory bandwidth • Time complexity approximately O(N_poly/P)

Ray Casting • Given a ray, find the objects which it intersects
    for each pixel in the image:
        generate a ray from the camera through the pixel
        trace the ray through the world
        among the objects it intersects, find the closest to the camera
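The "find the closest object" step can be sketched for a world made only of spheres. This is an illustrative assumption: the slides do not say which primitives are used, and the linear scan over objects here is the naive version, not the sublinear search discussed under serial performance. The ray direction is assumed to be unit length.

```python
import math


def intersect_sphere(origin, direction, center, radius):
    """Return the nearest positive distance along the ray, or None on a miss.

    direction is assumed to be a unit vector.
    """
    oc = tuple(o - c for o, c in zip(origin, center))
    b = 2.0 * sum(d * e for d, e in zip(direction, oc))
    c = sum(e * e for e in oc) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0:
        return None  # the ray misses the sphere
    root = math.sqrt(disc)
    for t in ((-b - root) / 2.0, (-b + root) / 2.0):
        if t > 1e-9:  # ignore hits behind the ray origin
            return t
    return None


def cast_ray(origin, direction, spheres):
    """Return (name, distance) of the closest sphere hit, or (None, inf)."""
    closest_name, closest_t = None, math.inf
    for name, center, radius in spheres:
        t = intersect_sphere(origin, direction, center, radius)
        if t is not None and t < closest_t:
            closest_name, closest_t = name, t
    return closest_name, closest_t


spheres = [("far", (0, 0, 10), 1.0), ("near", (0, 0, 5), 1.0)]
hit = cast_ray((0, 0, 0), (0, 0, 1), spheres)
miss = cast_ray((0, 0, 0), (0, 1, 0), spheres)
```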

Ray Casting: Serial Performance • Single ray: • Finding the target object assumed to be sublinear, often O(log N) • Bounded by the number of objects in the world • E(cost(r)) = T_0 + p_tot T_s + p_avg O_tot T_h • Whole space: • Break down space into a grid of cells • Find which cells the ray travels through • Check the objects in those cells for a collision • total cost = T_p + N_r E(cost(r))

Ray Casting: Parallel Performance • Good news! • Embarrassingly parallel in shared memory systems • Algorithm: • Split image into regions • Each pixel is independent • Performance: • Minimal overhead, theoretical speedup of P • 2008: four quad-core Xeons ran Quake Wars at 15-20 fps • 2006: an ATI X1900 rendered at 1024x1024 resolution at 12-18 fps
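The "split image into regions" algorithm can be sketched by dividing the image into horizontal bands and handing each band to a worker. `shade_pixel` is a hypothetical stand-in for tracing one ray; what matters is that it depends only on the pixel coordinates. Threads are used to keep the sketch short. In CPython they show the decomposition but not the speedup, which would need processes or native code.

```python
from concurrent.futures import ThreadPoolExecutor


def shade_pixel(x, y):
    """Hypothetical stand-in for tracing one ray; depends only on (x, y)."""
    return (x * 31 + y * 17) % 256


def render_rows(y_start, y_end, width):
    """Render one horizontal band of the image."""
    return [[shade_pixel(x, y) for x in range(width)] for y in range(y_start, y_end)]


def render_parallel(width, height, workers):
    """Split the image into bands and render each band on its own worker."""
    rows_per_band = -(-height // workers)  # ceiling division
    bands = [(y, min(y + rows_per_band, height))
             for y in range(0, height, rows_per_band)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in band order, so rows come back top to bottom
        parts = pool.map(lambda band: render_rows(band[0], band[1], width), bands)
        return [row for part in parts for row in part]


serial = render_rows(0, 6, 8)
parallel = render_parallel(8, 6, workers=4)
```

Because no band reads another band's pixels, the workers need no synchronization beyond collecting the results.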

Ray Tracing • Extension of Ray Casting • Don’t stop at the first object you find • Optical effects: • Reflection • Refraction • Scattering • Dispersion

Ray Tracing • Can be much more lifelike than Rasterization • Poor performance • Often used in animated films • One frame can take over 15 hours to compute

Image Manipulation • Reflection • Transparency/Translucency • Shaders • Lighting and Shadows • Antialiasing and Filtering

Reflection • Ray Tracing: • Create a new ray from the point of reflection • Doubles the computation time of reflected rays • New rays depend on original rays, so they cannot be further parallelized • Rasterization: • True reflection: • Render the entire world where the mirror should be • If P equals the number of mirrors, T_seq = T_par • Fake reflection: • Use lighting to imitate reflection
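For the ray tracing case, the direction of the new ray follows the standard mirror formula r = d - 2(d·n)n, where d is the incoming direction and n the unit surface normal. A minimal sketch, with a hypothetical function name:

```python
def reflect(direction, normal):
    """Mirror an incoming direction about a unit surface normal.

    Implements r = d - 2 (d . n) n.
    """
    k = 2 * sum(d * n for d, n in zip(direction, normal))
    return tuple(d - k * n for d, n in zip(direction, normal))


# A ray heading down and to the right bounces off a floor whose normal points up
bounced = reflect((1, -1, 0), (0, 1, 0))
```

The reflected ray can only be generated after the original ray's hit point and normal are known, which is the dependency the slide refers to.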

Transparency and Translucency • Ray Tracing: • Find next closest object • Optionally use information from the first to modify result • Minimal performance hit • Rasterization: • Interpolate far objects first • Overlay Translucent objects on top • Adds more dependencies between near and far objects • Makes culling more difficult
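The far-to-near ordering in the rasterization case can be sketched with the usual "over" blend. The fragment layout `(depth, rgb, alpha)` and the function name are assumptions for this example, with larger depth meaning farther from the camera.

```python
def composite_back_to_front(fragments, background):
    """Blend fragments over a background, farthest first.

    fragments: list of (depth, (r, g, b), alpha); larger depth is farther away.
    """
    color = background
    for _depth, rgb, alpha in sorted(fragments, key=lambda f: f[0], reverse=True):
        # "Over" operator: the new fragment covers alpha of what is behind it
        color = tuple(alpha * c + (1.0 - alpha) * behind
                      for c, behind in zip(rgb, color))
    return color


fragments = [
    (1.0, (1.0, 0.0, 0.0), 0.5),  # near, half-transparent red
    (5.0, (0.0, 0.0, 1.0), 1.0),  # far, opaque blue
]
pixel = composite_back_to_front(fragments, background=(0.0, 0.0, 0.0))
```

The sort is where the extra dependency comes from: the near fragment cannot be blended until everything behind it has been resolved.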

Shaders • Refer to both a hardware concept and a software concept • Hardware (SIMD computation unit): • Designed for Rasterization • Can be used to trace rays • Software: • Type of filter to apply during the rendering process

Shaders • Post processing effects • Can operate on: • Pixels: • Lighting and shadows • Color variation • Antialiasing • Vertices: • Translation • Color • Textures • Geometry: • Create new objects • Tessellation

Lighting and Shadows • Angular Lighting: • Determine angle to light source(s) • Detect occluding objects • Determine net lighting value • Radiosity: • Divide world into patches • Determine view factors between patches to get light bounce • The resulting system can be solved iteratively • More computationally intensive • Limited to diffuse reflection
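The iterative radiosity solve can be sketched as repeated evaluation of B_i = E_i + rho_i * sum_j F_ij * B_j, where B is patch radiosity, E is emission, rho is reflectance, and F holds the view factors. This is a Jacobi-style sketch under assumed names; the slides do not specify which iteration scheme is used, and computing the view factors themselves is left out.

```python
def solve_radiosity(emission, reflectance, view_factors, iterations=50):
    """Iterate B_i = E_i + rho_i * sum_j F_ij * B_j for a fixed number of sweeps."""
    n = len(emission)
    radiosity = list(emission)
    for _ in range(iterations):
        # Every patch update reads only the previous sweep, so patches are independent
        radiosity = [
            emission[i] + reflectance[i] * sum(view_factors[i][j] * radiosity[j]
                                               for j in range(n))
            for i in range(n)
        ]
    return radiosity


# Two patches that see only each other; patch 0 emits light
emission = [1.0, 0.0]
reflectance = [0.5, 0.5]
view_factors = [[0.0, 1.0],
                [1.0, 0.0]]
result = solve_radiosity(emission, reflectance, view_factors)
```

For this two-patch case the exact answer is B_0 = 4/3 and B_1 = 2/3, which the iteration approaches quickly because each bounce loses half its energy.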

Lighting and Shadows • Either method can be used with either rendering paradigm • Angular Lighting: • Ray tracing: O(N_pixels) • Rasterization: O(N_poly) • Always pipelined with other shaders • Radiosity: • O(N_patches), but naive O(N_objects^4)

Polygon Lighting • Additional shader to apply different light values over the triangle’s surface • Several different methods • Gouraud:
    for each vertex:
        average the surface normals of the adjoining faces
        calculate the vertex intensity from the estimated surface normal
    for each pixel:
        interpolate the pixel intensity linearly from the vertex intensities
• Phong:
    for each vertex:
        average the surface normals of the adjoining faces
    for each pixel:
        interpolate and normalize the surface normal
        calculate the pixel intensity from the surface normal
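The difference between the two methods shows up when the brightest point lies between vertices. The sketch below compares them along a single edge between two vertices, using a simple Lambert (cosine) term as the intensity model. That model and all function names are assumptions for illustration; the slides do not say which lighting equation is applied.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def lambert(normal, light_dir):
    """Diffuse intensity: cosine of the angle between normal and light, clamped at 0."""
    return max(0.0, sum(n * l for n, l in zip(normal, light_dir)))


def gouraud(n0, n1, light_dir, t):
    """Light the two vertices, then interpolate the intensities."""
    i0, i1 = lambert(n0, light_dir), lambert(n1, light_dir)
    return (1.0 - t) * i0 + t * i1


def phong(n0, n1, light_dir, t):
    """Interpolate and renormalize the normal, then light the pixel."""
    blended = tuple((1.0 - t) * a + t * b for a, b in zip(n0, n1))
    return lambert(normalize(blended), light_dir)


# Vertex normals tilt away from each other; the light shines straight down -z toward the surface
n0 = normalize((-1.0, 0.0, 1.0))
n1 = normalize((1.0, 0.0, 1.0))
light = (0.0, 0.0, 1.0)

mid_gouraud = gouraud(n0, n1, light, 0.5)
mid_phong = phong(n0, n1, light, 0.5)
```

At the midpoint Gouraud shading returns the same value as at the vertices, while Phong shading recovers the full-intensity highlight, at the cost of a normalization and a lighting evaluation per pixel.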

Polygon Lighting • [Figure: comparison of flat shading, Gouraud shading, and Phong shading]

Antialiasing • Family of methods to reduce aliasing • Signal processing: • 2D FFT over final image: O(N^2 log N) • Only allow frequencies below a certain limit: O(N^2) • 2D inverse FFT: O(N^2 log N) • FFT can be done in parallel at O(N^2 log N / P) • Object-based antialiasing: • Only antialias pixels that fall on a vertex • Minimal performance gain with high polygon count

Antialiasing • More pixels = same parallel speedup • Subpixel rendering: • 3x more pixels must be rendered • Supersampling: • Render the image at a higher resolution (often 2x) • Downsample to the desired resolution • Time complexity increases by the resolution ratio squared • Multisampling: • Only supersample specific aspects of each pixel • OpenGL only supersamples depth and stencil values • Better performance than full supersampling for similar quality
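Supersampling can be sketched as rendering at a multiple of the target resolution and averaging each block of samples into one output pixel. The box filter, the `hard_edge` test scene, and the function names are assumptions for this example; other downsampling filters are possible.

```python
def render(width, height, scene):
    """Sample the scene once at the center of each pixel."""
    return [[scene((x + 0.5) / width, (y + 0.5) / height) for x in range(width)]
            for y in range(height)]


def downsample(image, factor):
    """Average each factor-by-factor block into one pixel (box filter)."""
    out_h, out_w = len(image) // factor, len(image[0]) // factor
    result = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            block = [image[y * factor + dy][x * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        result.append(row)
    return result


def supersample(width, height, scene, factor=2):
    """Render factor times larger in each dimension, then shrink back down."""
    return downsample(render(width * factor, height * factor, scene), factor)


def hard_edge(u, v):
    """Hypothetical scene: white on one side of the diagonal, black on the other."""
    return 1.0 if u > v else 0.0


aliased = render(4, 4, hard_edge)
smoothed = supersample(4, 4, hard_edge)
```

With `factor=2` the renderer evaluates four samples per output pixel, which matches the slide's point that cost grows with the square of the resolution ratio. Pixels on the diagonal come out as intermediate gray values instead of pure black.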

Graphics Pipeline • Step 1: Per-vertex lighting and shading • O(V) vertex shaders • Each vertex handled separately, speedup=P • Step 2: Clipping • O(N_poly) remove vertices of a polygon outside the viewing area • Each polygon handled separately, speedup=P • Step 3: Projection transformation • O(V) each vertex must be transformed to 2D • Each vertex handled separately, speedup=P • Step 4: Rasterization • O(N_pixels) convert to raster format, determine pixel values • Most pixels handled separately, speedup approaches P • Step 5: Texturing and pixel shaders • O(N_pixels) apply all the other pixel shaders to the final image • Shaders are usually pixel independent, speedup approaches P
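Step 3 can be sketched as a perspective divide followed by a viewport mapping. The sketch assumes a simple pinhole camera at the origin looking down +z with every vertex in front of it (z > 0); `focal_length` and the function names are hypothetical, and real pipelines use 4x4 matrices and clip before dividing.

```python
def project_vertex(vertex, focal_length, width, height):
    """Map a camera-space vertex to pixel coordinates.

    Assumes the camera looks down +z and that z > 0.
    """
    x, y, z = vertex
    # Perspective divide: farther points move toward the image center
    ndc_x = focal_length * x / z
    ndc_y = focal_length * y / z
    # Viewport mapping from [-1, 1] to pixels, with y flipped so +y is up
    pixel_x = (ndc_x + 1.0) * 0.5 * width
    pixel_y = (1.0 - ndc_y) * 0.5 * height
    return pixel_x, pixel_y


def project_all(vertices, focal_length, width, height):
    """Each vertex is transformed independently of every other vertex."""
    return [project_vertex(v, focal_length, width, height) for v in vertices]


vertices = [(0.0, 0.0, 5.0), (1.0, 1.0, 1.0), (1.0, 1.0, 2.0)]
projected = project_all(vertices, focal_length=1.0, width=640, height=480)
```

Since `project_vertex` reads nothing but its own vertex, the list comprehension could be split across P workers with no communication, which is the basis of the speedup=P claim for this step.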

Graphics Hardware • Top end GPUs can contain up to: • 2048 Unified Shaders • 128 Texture Mapping Units • 32 Render Output Units • Multi GPU rendering • P can be several thousand • Speedup is almost always close to P

