1 / 42

Real-Time Parallel Radiosity

Real-Time Parallel Radiosity. Matt Craighead May 8, 2002 6.338J/18.337J, Course 6 AUP. What is Radiosity?. A computer graphics technique for lighting Two types of lighting algorithms: Local: easy, fast, but not realistic Global: slow, difficult, but highest quality

rupp
Download Presentation

Real-Time Parallel Radiosity

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Real-Time Parallel Radiosity Matt Craighead May 8, 2002 6.338J/18.337J, Course 6 AUP

  2. What is Radiosity? • A computer graphics technique for lighting • Two types of lighting algorithms: • Local: easy, fast, but not realistic • Global: slow, difficult, but highest quality • Radiosity is a global algorithm • Global algorithms try to take into account interreflections in scenes.

  3. Radiosity in Real Time? • Local algorithms can easily run in real time, either in software or hardware. • Computational demands grow with number of surfaces and number of lights. 106 surfaces is not unreasonable! • Most global algorithms take quadratic time in number of surfaces (all interactions!).

  4. Local Doesn’t Mean Bad Id Software’s Doom 3 (video capture)

  5. …but Global is Better State-of-the-art radiosity rendering from 1988 (5 hours render time!)

  6. Local Lighting Math • Consider a light source at some point in three-dimensional space, and a surface at some other location. • How much does the light directly contribute to the brightness of the surface? • Note that global algorithms still need to do local lighting as a first step.

  7. Local Lighting Math • Set up a 3-dimensional coordinate system centered on the surface. Define unit vectors L (light), N (normal), E (eye): • L points towards the light. • N is perpendicular to the surface. • E points towards the viewer. N E L

  8. Local Lighting Math • If an object sits between the light and surface, the lighting contribution is zero. • Otherwise… • Surfaces generally reflect light in two ways: “diffuse” (dull) and “specular” (shiny). • We can see three colors: red, green, blue. • So let Md, Ms be 3-vectors indicating how much of [R,G,B] are reflected in each way.

  9. Local Lighting Math • Diffuse lighting is independent of view angle. It is brightest when N and L are most closely aligned, and falls off with the cosine of the angle between them. • All lighting also falls off with the square of the distance from the light source. • So, the diffuse term is d-2Md(N · L).

  10. Local Lighting Math • Specular lighting is view-dependent. In one simple formulation (Blinn shading), we determine the vector H halfway between E and L and evaluate d-2Ms(N · H)s, where s indicates the shininess of the surface. H N E L

  11. Local Lighting Math • The easiest way to think of H, the half-angle vector, is that if H and N were aligned, and the surface were a mirror, then the light would reflect straight to the eye. • (N · H)s can be thought of as representing a probability distribution of “microfacets,” whose normals are clustered around N but do vary. Smoother surfaces have higher s.

  12. Local Lighting Math • So, our full contribution from a light source is d-2(Md(N · L) + Ms(N · H)s). • This may also be multiplied by a light color. • We may also add in an emissive term Me for glowing objects. • If the lack of interreflection makes things too dark, we may add an “ambient” term LaMd.

  13. Approximations • We may not compute this formula at every pixel on the screen, but only at the vertices of the object instead. • The specular exponent may be evaluated using a power approximation rather than a real power function. • The specular formula is already a cheesy hack…

  14. More Approximations • Shadows are hard. At the cost of much realism, they can be omitted or faked. • Draw a little dark spot under an object • Ambient is itself an approximation of real global illumination. • N · L and (N · H)s are idealizations—in reality, this can be an arbitrary 4-dimensional function called a “BRDF.”

  15. Radiosity • Radiosity is a global algorithm that handles diffuse lighting only. • The term “global illumination” refers to global algorithms that handle specular lighting also. • Specular makes things much more difficult. • I will only discuss plain radiosity.

  16. Radiosity in a Nutshell • Suppose there are n surfaces in the scene. • Let Ai be the area of surface i. • Let Ei be the amount of light energy emitted from surface i per unit time and area. • Let Bi be the amount of light energy emitted and/or reflected from surface i per unit time and area. • Let i be the diffuse albedo of surface i.

  17. Radiosity in a Nutshell • Now, let Fij (called a “form factor”) be the fraction of light from surface i that reaches surface j. • Then, for all i, we must have: AiBi = AiEi + i j  [1,n] AjBjFji • This is just a linear system with n equations and n unknowns. Solve and you get the B’s.

  18. That Easy? • Well, not quite that easy. • Solving the system of equations is O(n3). • So we’ll iterate instead. • We still have to do local lighting. • These become the Ei’s. • We have to compute the Fij terms somehow. • This turns out to be expensive. • If the scene is static, precompute!

  19. Computing Form Factors • Fij turns out to be a big ugly integral over the area of both i and j. • Worse, one of the terms in the integral is whether the two dA’s can see one another! • So, no closed-form solution. • Standard numerical integration is no good. A raycast per sample takes too long.

  20. Computing Form Factors • The usual solution is called the “hemicube algorithm.” • Render the scene from the point of view of the surface, in all directions. In effect, you are projecting onto a hemicube. • Count up the number of times you can see each surface (weighted appropriately). • Takes advantage of 3D acceleration!

  21. Hemicube Algorithm in Action

  22. Simplified Radiosity Equation • It so happens that FjiAj = FijAi. • This is a simple property of the integral for F. • So we can simplify the radiosity equation: AiBi = AiEi + i j  [1,n] AjBjFji AiBi = AiEi + i j  [1,n] AiBjFij Bi = Ei + i j  [1,n] BjFij B = E + RB (where Rij = iFij)

  23. Solving the Radiosity Equation • B = E + RB is just a matrix/vector equation. • Direct solution: B = (I  R)-1E • Iterative solution: • If E is a local lighting solution, then call it B0. • Now let Bi+1 = B0 + RBi. • Then, Bi is simply the lighting solution after i bounces! Since i < 1 for all i (conservation of energy), the Bi’s converge to B.

  24. Iterative vs. Direct Radiosity • If F is a dense matrix, then direct solution takes time O(n3), while k steps of iteration takes time O(kn2). • Realistically, k is just a constant; say, 5. • Iterative solution is practical with n ranging up to the hundreds of thousands! • Iteration time is proportional to the number of nonzero entries of F.

  25. Sparsity of Form Factors • In a simple cube-shaped room, all form factors will be nonzero except for pairs on the same wall. • So F is 5/6 nonzero. Not very encouraging… • As more objects are added, F becomes sparser. • As the scene expands beyond one room, F becomes much sparser. • So iterative radiosity scales extremely well!

  26. Storage and Precision • Radiosity hogs memory. • If you grid a cube-shaped room 100x100 on each wall, and store F as a dense matrix of floats, that’s 14.4 GB. (!!!) • Storing F as sparse helps a lot. • Good for iteration speed too. • Using smaller values than floats helps too. • 16-bit fixed-point is good enough.

  27. RLE Encoding of Form Factors • F tends to have runs of zeros and nonzeros. • Smart traversal order of grids makes the runs longer. Bad Good

  28. RLE Encoding of Form Factors • My disk storage format: • 1 byte: run length, run type • Length up to 84, type is zero, 255, or 65535 • Then, variable # bytes with run data • Compression ratio for my scene: 5.97:1 • My memory storage format: • 2 bytes: run length up to 65535 • 2 bytes: run type (zero or 65535) • 2N bytes: run data • Compression ratio for my scene: 2.49:1

  29. Parallelization • Split up the surfaces among the CPUs. • Each CPU owns those rows of the form factor matrix. • Each CPU computes its surfaces’ local lighting. • Every iteration requires an all-to-all communication, so that each CPU has the full B vector. • At present, my storage of F is unbalanced; compression ranges from 4.3:1 to 1.6:1.

  30. Radiosity Iteration Kernel • Hand-written MMX assembly code (only main inner loop shown): inner_loop: prefetchnta [ebx+128] prefetchnta [eax+128] movq mm4, [eax] pshufw mm7, mm4, 0xFF pshufw mm6, mm4, 0xAA pshufw mm5, mm4, 0x55 pshufw mm4, mm4, 0x00 pmulhuw mm4, [ebx] pmulhuw mm5, [ebx+6] pmulhuw mm6, [ebx+12] pmulhuw mm7, [ebx+18] paddw mm0, mm4 paddw mm1, mm5 paddw mm2, mm6 paddw mm3, mm7 add eax, 8 add ebx, 24 dec esi jnz inner_loop

  31. Radiosity Kernel Performance • Timed 8 CPUs doing 500 iterations. • Portable C fixed-point kernel: 38.04 s • Optimized MMX kernel: 21.64 s • On most loaded CPU, works out to 528 million multiply-adds/s for the C version, 1.24 billion multiply-adds/s for MMX. • But MMX code wastes 25% of them, so real rate is 928 million.

  32. Local Lighting Implementation • One raycast is required for each quad, for each light source! This can be expensive. • To accelerate raycasts, I made a simplified version of my scene that was virtually indistinguishable for raycasting purposes. • 13028 quads reduced to 120 polys, 110 cylinders • Some cylinders used as geometry, some as bounding volumes

  33. Overall Performance • Again, 8 CPUs on 500 iterations: • Iteration only: 21.64 s • Communication only: 5.96 s • Iteration plus communication: 26.84 s • All computation: 65.7 s • All computation plus communication: 64.38 s

  34. Remarks on Performance • The communication overlaps very well with the computation, to the point that it is actually a speedup. (!) • MPI_Isend, MPI_Irecv are essential to achieving this. • The O(n) local lighting computation is actually taking much longer than the O(n2) radiosity computation. • Local lighting is only pseudo-O(n), because of the raycast cost—although for large scenes, raycast cost should still be much less than O(n2), due to other optimizations.

  35. Radiosity Frontend • Separate client application that runs on a Windows PC with OpenGL acceleration. • Radiosity solver running on cluster is server. • Original plan was that the frontend would send the scene to the server, and the server would use the scene provided. • Since the cluster has no OpenGL acceleration, I was reluctantly forced to precompute form factors. • All aspects of scene except form factors still sent to the server by the client; form factors are read from disk.

  36. Client/Server Architecture • User precomputes form factors and FTPs them to the Beowulf. • Server listens on beowulf.lcs.mit.edu:5353. • Client connects and sends scene information. • Server reads form factors off of disk. • Both open a network thread. • Server streams radiosity to client via TCP/IP. • Computation, rendering, and communication are completely decoupled.

  37. Frontend Features • Per-vertex or per-pixel local lighting, with local viewer. Optional specular. • Shadows implemented using stencil buffer. • Display radiosity input/output (E and B). • Bilinear filtering of radiosity solutions on grid-shaped surfaces. • Ultra-high-quality mode where radiosity is used for indirect lighting only.

  38. Demonstration of Full System

  39. Demonstration of Full System

  40. Demonstration of Full System

  41. Work Yet to be Done • More complex scene would be nice. This one is 13K quads, and I should be able to do 50K. • More optimization work on raycasting. • Better load balancing. • Optimization of some modes on frontend, so they run reasonably on my laptop’s GeForce2 Go, not just on a GeForce4 Ti… • Alleviation of ugly banding on certain lighting modes caused by 8-bit-per-component precision.

  42. Conclusions • Real-time radiosity is feasible. • Not tomorrow, but today. • If today’s cluster is tomorrow’s desktop, real-time radiosity could start showing up in real applications not too many years from now. • Biggest limitation may be the ability to compute form factors efficiently. • Faster graphics hardware will make this happen.

More Related