Shader generation and compilation for a programmable GPU
Student: Jordi Roca Monfort
Advisor: Agustín Fernández Jiménez
Co-advisor: Carlos González Rodríguez

Outline
• Introduction.
• Background.
• Goals.
• Design and implementation.
• Conclusions.
[Diagram: ATTILA simulation framework. Components: OpenGL Application, Vendor OpenGL API, Vendor Driver, GLInterceptor, OpenGL trace, GLPlayer, ATTILA OpenGL API, ATTILA Driver, ATTILA Simulator, Statistics.]
[Diagram: ATTILA simulation framework, annotated. Components: OpenGL Application, GLInterceptor, OpenGL trace, Vendor OpenGL API, Vendor Driver, GLPlayer, ATTILA OpenGL API, ATTILA Driver, ATTILA Simulator, Statistics.]
• ATTILA OpenGL API (my work): extend and complete the OpenGL API so that it can execute recent, advanced 3D applications (Doom3, Unreal Tournament, etc.).
• ATTILA Simulator: simulates the latest generation of 3D graphics boards (programmable GPUs).
Rendering (I)
• What is rendering?
  • Generating the pixels of the set of images (frames) that form an animated scene.
  • Goal: compute each pixel color as fast as possible, since this determines the frame rate (FPS).
• Which computations are required?
  • Given the database of scene objects, compute the color of the projected objects over the pixel area of the screen.
  • Each pixel color depends on the scene lighting and on the position of the viewer (camera).
Rendering (II)
[Diagram: rendering data and the screen area. Labels: view info (position), lighting info (position, color), geometry info.]
Rendering approaches
• For each pixel (x,y), compute the physical interaction between the lights and the objects in the scene:
  • Ray tracing, radiosity, photon mapping.
  • Very expensive per-pixel computation.
  • Global lighting (shadows, indirect reflections among objects).
• Direct rendering (3D graphics boards, 3D game consoles, etc.): the interaction between objects and lights is computed only at the vertices, and the value for each pixel (x,y) is approximated by interpolation.
  • Only direct illumination from light sources (each vertex color is independent).
Direct Rendering (I)
[Diagram: rendering data and the screen area. Labels: viewer info (position), lighting info (position, color), geometry info, color interpolation.]
Direct Rendering (II)
• The higher the density of vertices, the more realistic the lighting.
• More vertices are also required to improve the level of detail of surfaces.
• Thus: ▲realism → ▲vertices → ▲computation → ▼FPS
• Solution:
  • Specify surfaces using fewer vertices, and
  • specify surface details using textures.
Textures
[Diagram: rendering data and the screen area. Labels: viewer info (position), lighting info (position, color), geometry info, textures.]
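Texture mapping
[Diagram: a triangle in the screen area whose vertices carry the texture coordinates (0.63,0.86), (0.26,0.37) and (0.79,0.10); both texture axes run from 0 to 1.]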
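Texture mapping
[Diagram: the coordinate interpolator produces the texture coordinate (0.40,0.45) for a pixel inside the triangle with vertex coordinates (0.63,0.86), (0.26,0.37) and (0.79,0.10); the texture is sampled at that coordinate to obtain the texture sampled value. Both texture axes run from 0 to 1.]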
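The coordinate interpolator on the slide above can be illustrated with barycentric interpolation across a triangle. This is a minimal Python sketch under stated assumptions, not the ATTILA rasterizer: the function name and the weights are hypothetical, and only the three vertex texture coordinates come from the slide.

```python
def interpolate_texcoord(weights, texcoords):
    """Blend per-vertex (s, t) pairs using barycentric weights that sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("barycentric weights must sum to 1")
    s = sum(w * tc[0] for w, tc in zip(weights, texcoords))
    t = sum(w * tc[1] for w, tc in zip(weights, texcoords))
    return (s, t)


# Texture coordinates attached to the three triangle vertices (from the slide).
VERTEX_TEXCOORDS = [(0.63, 0.86), (0.26, 0.37), (0.79, 0.10)]

# Hypothetical weights for one fragment inside the triangle.
fragment_coord = interpolate_texcoord((0.2, 0.5, 0.3), VERTEX_TEXCOORDS)
```

A fragment that coincides with a vertex (weights such as `(1, 0, 0)`) simply receives that vertex's coordinate; every other fragment receives a weighted mix.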
3D Rendering Pipeline
[Diagram: 3D scene → vertex processing stage → rasterizer → fragment processing stage → final screen.]
• Inputs: vertex DB, lighting info, viewer info, textures.
• Vertex processing stage (VERTEX SHADING), a parallelizable process. Computes:
  • color
  • coordinates
  • vertex position in screen
• RASTERIZER: generates interpolated attributes (color, coordinates).
• Fragment processing stage (FRAGMENT SHADING), a parallelizable process: per-pixel texture mapping.
3D RP Implementation
• Implementations:
  • Software: Mesa 3D Graphics Library (OpenGL).
  • Software + hardware acceleration: vendor OpenGL, Direct3D, Xbox, PlayStation, etc.
• Work is distributed between the CPU and the graphics board transparently to the applications.
3D accelerators evolution
• 2D accelerators (pre-Voodoo), before 1996
• 3D accelerators (3Dfx Voodoo), 1996
• Graphics Processing Units (GeForce), 1999
• Programmable GPUs (GeForce 3), 2001
[Diagram: one pipeline per generation (DB, VS, rasterizer, FS, final screen), showing which stages run on the CPU and which on the graphics board (VGA, 3D accelerator, GPU, PGPU).]
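GPUs: applying 2 textures
[Diagram: fixed-function fragment unit 0. The rasterizer emits a fragment stream; each fragment F1 (x,y) carries an interpolated color, texture coordinate 1 and texture coordinate 2. The values fetched from texture memory are combined with the color through fixed + and * operators to produce the final color.]
• Uses:
  • Per-pixel lighting.
  • Shadow implementation.
  • Bump-mapping.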
Programmable GPUs: 2 textures
[Diagram: fragment shader 0, one of the shader processors. The rasterizer emits a fragment stream; each fragment F1 (x,y) carries an interpolated color and two texture coordinates. The shader uses texture memory, an ALU and temporaries to produce the final color.]
Shader program:
LDTEX t1, coord1, Text1
LDTEX t2, coord2, Text2
ADD t1, colorIn, t1
MUL t1, t1, t2
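To make the four-instruction program on the slide above concrete, the following Python sketch evaluates it on RGBA tuples. It is an illustration only: the function names are hypothetical, the two `LDTEX` fetches are replaced by texel values passed in directly, and no clamping is applied.

```python
def add(a, b):
    """Component-wise ADD of two RGBA tuples."""
    return tuple(x + y for x, y in zip(a, b))


def mul(a, b):
    """Component-wise MUL of two RGBA tuples."""
    return tuple(x * y for x, y in zip(a, b))


def two_texture_shader(color_in, texel1, texel2):
    """Evaluate (colorIn + Text1 sample) * Text2 sample for one fragment."""
    t1 = texel1            # LDTEX t1, coord1, Text1 (fetch assumed done)
    t2 = texel2            # LDTEX t2, coord2, Text2 (fetch assumed done)
    t1 = add(color_in, t1)  # ADD t1, colorIn, t1
    t1 = mul(t1, t2)        # MUL t1, t1, t2
    return t1


final_color = two_texture_shader(
    (0.25, 0.25, 0.25, 1.0),  # interpolated color
    (0.25, 0.5, 0.0, 0.0),    # sample from the first texture
    (0.5, 0.5, 0.5, 1.0),     # sample from the second texture
)
```

Changing the combination (for example, multiplying both textures) only requires a different instruction sequence, which is the flexibility that the fixed-function unit lacks.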
Shader Processors
• Shader processors (SPs) execute small programs (shaders), built from vector and scalar instructions, that define the computation in the following stages:
  • Vertex processing: Vertex Shader
    • Lighting computation.
    • On-screen vertex projection.
    • Texture coordinate generation.
  • Fragment processing: Fragment Shader
    • Texture color fetch and blending.
    • Fog.
• The result is like a GPU that supports "infinite visualization effects", which previous generations of graphics boards did not support.
Goals
• Implement all the modules that the OpenGL API needs in order to:
  • Support new, real 3D applications that use shaders in our simulation framework.
  • Also support old applications that use the Fixed Function (FF), and applications that combine shaders and FF.
• Idea: emulate the Fixed Function by generating equivalent shaders for the shader processors (SP).
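The idea of emulating the Fixed Function with generated shaders can be sketched as a function that turns fixed-function state into ARB_fragment_program text. The Python below is a hypothetical illustration, not the generator built in this work: the function name and its two boolean flags are assumptions, the state it handles is deliberately tiny, and the emitted text has not been validated against a driver.

```python
def generate_fragment_program(texture_enabled, exp_fog_enabled):
    """Return ARB_fragment_program text for a tiny, hypothetical FF state."""
    lines = ["!!ARBfp1.0", "TEMP color;"]
    if texture_enabled:
        # Modulate the interpolated color by the texture bound to unit 0.
        lines += [
            "TEMP texel;",
            "TEX texel, fragment.texcoord[0], texture[0], 2D;",
            "MUL color, texel, fragment.color;",
        ]
    else:
        lines.append("MOV color, fragment.color;")
    if exp_fog_enabled:
        # Assumes the application stores density/ln(2) in program.local[0].x.
        lines += [
            "PARAM fogParams = program.local[0];",
            "TEMP fogFactor;",
            "MUL fogFactor.x, fogParams.x, fragment.fogcoord.x;",
            "EX2_SAT fogFactor.x, -fogFactor.x;",
            "LRP result.color, fogFactor.x, color, state.fog.color;",
        ]
    else:
        lines.append("MOV result.color, color;")
    lines.append("END")
    return "\n".join(lines)


program = generate_fragment_program(texture_enabled=True, exp_fog_enabled=False)
```

Each combination of enabled features yields a different program, so an application that only uses FF calls can still run on hardware that exposes nothing but shader processors.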
Things to do
• Implement shader support in our OpenGL API:
  • Using the shader programming languages most used by 3D applications: ARB_vertex_program and ARB_fragment_program.
• Study how to express FF functions in terms of shaders (pre-study phase).
FF Emulation
[Diagram: DB → Vertex Shader → Rasterizer → Fragment Shader → final screen. Each shader stage is shown with an example program.]
Vertex shader:
!!ARBvp1.0
ATTRIB pos = vertex.position;
PARAM mat[4] = { state.matrix.mvp };
# Transform by concatenation of the
# MODELVIEW and PROJECTION matrices.
DP4 result.position.x, mat[0], pos;
DP4 result.position.y, mat[1], pos;
DP4 result.position.z, mat[2], pos;
DP4 result.position.w, mat[3], pos;
# Pass the primary color through
# w/o lighting.
MOV result.color, vertex.color;
END
Fragment shader:
!!ARBfp1.0
# first set of texture coordinates
ATTRIB tex = fragment.texcoord;
# interpolated color
ATTRIB col = fragment.color;
OUTPUT outColor = result.color;
TEMP tmp;
# sample the texture
TEX tmp, tex, texture, 2D;
# perform the modulation
MUL outColor, tmp, col;
END
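The four `DP4` instructions in the vertex program above amount to a matrix-vector product: each one takes the dot product of one row of the modelview-projection matrix with the vertex position. A minimal Python sketch of that equivalence follows; the function names and the example matrix are hypothetical.

```python
def dp4(a, b):
    """Four-component dot product, as computed by the DP4 instruction."""
    return sum(x * y for x, y in zip(a, b))


def transform_position(mvp_rows, pos):
    """Apply the four DP4 instructions: one matrix row per output component."""
    return tuple(dp4(row, pos) for row in mvp_rows)


# Hypothetical matrix: a pure translation by (2, 3, 4), given as four rows.
MVP_ROWS = [
    (1.0, 0.0, 0.0, 2.0),
    (0.0, 1.0, 0.0, 3.0),
    (0.0, 0.0, 1.0, 4.0),
    (0.0, 0.0, 0.0, 1.0),
]

clip_pos = transform_position(MVP_ROWS, (1.0, 1.0, 1.0, 1.0))
```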
FF emulation
• Implemented functions (according to OpenGL Spec 2.0):
  • Vertex Shading (85% of total):
    • Per-vertex standard OpenGL lighting:
      • Point, directional and spot lights.
      • Attenuation.
      • Local and infinite viewer.
    • Vertex transformation.
    • Automatic texture coordinate generation:
      • Object Plane and Eye Plane.
      • Normal Map, Reflection Map and Sphere Map.
    • Fog coordinate.
  • Fragment Shading (90% of total):
    • Multi-texturing and texture combine functions.
    • Fog application:
      • Linear, Exponential and Second Order Exponential.
FF emulation example
• Fog application:
  • Algorithm: for each pixel, linearly interpolate between the original color and the fog color according to the distance from the object to the viewer.
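Fog emulation
• Exponential fog mode:
  • $f = e^{-\mathrm{density} \cdot \mathrm{fogcoord}}$
  • $f = 2^{-(\mathrm{density} \cdot \mathrm{fogcoord}) / \ln 2}$, since $e = 2^{1/\ln 2}$
  • $\text{final color} = \text{pixel color} \cdot f + \text{fog color} \cdot (1 - f)$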
Fog emulation
!!ARBfp1.0
ATTRIB fogCoord = fragment.fogcoord;
OUTPUT oColor = result.color;
PARAM fogColor = state.fog.color;
PARAM fogParams = program.local[0];  # fogParams.x : density/ln(2)
TEMP fragmentColor, fogFactor;
# Texture applications.
...
# Fog Factor computing
...
MUL fogFactor.x, fogParams.x, fogCoord.x;  # fogFactor.x = density*fogcoord/ln(2)
EX2_SAT fogFactor.x, -fogFactor.x;  # fogFactor.x = 2^-(fogFactor.x)
# Fog color interpolation
LRP oColor, fogFactor.x, fragmentColor, fogColor;
END
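The three fog instructions of the program above can be checked numerically. The Python sketch below mirrors `MUL`, `EX2_SAT` and `LRP` step by step; the function names and sample values are hypothetical, and the code only illustrates the arithmetic, not the simulator's implementation.

```python
import math


def saturate(x):
    """Clamp to [0, 1], as the _SAT instruction suffix does."""
    return min(1.0, max(0.0, x))


def fog_factor_exp(density, fog_coord):
    """Exponential fog factor computed with a base-2 exponential."""
    fog_param_x = density / math.log(2.0)  # value assumed in program.local[0].x
    scaled = fog_param_x * fog_coord       # MUL fogFactor.x, fogParams.x, fogCoord.x
    return saturate(2.0 ** -scaled)        # EX2_SAT fogFactor.x, -fogFactor.x


def apply_fog(fragment_color, fog_color, density, fog_coord):
    """LRP: factor * fragment color + (1 - factor) * fog color."""
    f = fog_factor_exp(density, fog_coord)
    return tuple(f * c + (1.0 - f) * g for c, g in zip(fragment_color, fog_color))


# Hypothetical values: the base-2 form agrees with e^(-density * fogcoord).
difference = abs(fog_factor_exp(0.5, 3.0) - math.exp(-0.5 * 3.0))
fogged = apply_fog((1.0, 0.0, 0.0, 1.0), (0.5, 0.5, 0.5, 1.0), 0.5, 3.0)
```

Precomputing `density / ln(2)` on the CPU lets the shader use the base-2 exponential instruction directly, at the cost of a single multiplication per fragment.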
ARB compilers
Vertex program:
!!ARBvp1.0
ATTRIB pos = vertex.position;
PARAM mat[4] = { state.matrix.mvp };
# Transform by concatenation of the
# MODELVIEW and PROJECTION matrices.
DP4 result.position.x, mat[0], pos;
DP4 result.position.y, mat[1], pos;
DP4 result.position.z, mat[2], pos;
DP4 result.position.w, mat[3], pos;
# Pass the primary color through
# w/o lighting.
MOV result.color, vertex.color;
END
Fragment program:
!!ARBfp1.0
# first set of texture coordinates
ATTRIB tex = fragment.texcoord;
# interpolated color
ATTRIB col = fragment.color;
OUTPUT outColor = result.color;
TEMP tmp;
# sample the texture
TEX tmp, tex, texture, 2D;
# perform the modulation
MUL outColor, tmp, col;
END
The compilers' common architecture
[Diagram: !!ARBvp1.0 source → Lexical-Syntactic Analysis (Flex + Bison) → Semantic Analysis → Code generation (Generic, then GPU Specific) → binary code. Also shown: IR and symbol table.]
Example source shown on the slide:
!!ARBvp1.0
PARAM arr[5] = { program.env[0..4] };
#ADDRESS addr;
ATTRIB v1 = vertex.attrib[1];
PARAM par1 = program.local[0];
OUTPUT oPos = result.position;
OUTPUT oCol = result.color.front.primary;
OUTPUT oTex = result.texcoord[2];
ARL addr.x, v1.x;
MOV res, arr[addr.x - 1];
END
Example binary code shown on the slide:
Line: By0 By1 By2 By3 By4 By5 By6 By7 By8 By9 ByA ByB ByC ByD ByE ByF
011:  16  00  03  28  00  01  00  08  26  1b  6a  00  0f  1b  04  78
012:  09  00  03  00  00  00  02  08  24  1b  1b  00  08  1b  14  18
013:  09  00  04  00  00  00  02  08  24  1b  1b  00  04  1b  14  b8
014:  09  00  05  00  00  00  02  08  24  1b  1b  00  02  1b  04  58
015:  09  00  06  00  00  00  02  08  24  1b  1b  00  01  1b  04  f8
016:  16  00  01  00  00  00  02  30  24  1b  1b  00  08  1b  14  98
017:  16  00  02  00  00  01  02  30  24  1b  1b  00  08  1b  04  38
018:  16  00  00  00  00  00  03  30  24  00  1b  00  02  1b  04  d8
019:  16  00  01  00  00  00  03  30  24  00  1b  00  01  1b  14  78
020:  01  00  08  00  00  08  18  08  24  04  ae  00  0c  1b  04  18
021:  17  00  00  00  00  00  13  30  24  00  00  00  08  1b  04  b8
022:  17  00  01  00  00  00  13  30  24  00  00  00  04  1b  14  58
023:  01  00  08  00  00  09  18  08  24  04  04  00  0c  1b  14  f8
024:  01  00  08  00  00  0a  18  08  26  04  ae  00  0c  1b  04  98
025:  01  00  08  00  00  0b  18  08  26  04  04  00  0c  1b  14  38
Intermediate Representation
• Example:
!!ARBvp1.0
ATTRIB pos = vertex.position;
PARAM mat[4] = { state.matrix.mvp };
# Transform by concatenation of the
# MODELVIEW and PROJECTION matrices.
DP4 result.position.x, mat[0], pos;
DP4 result.position.y, mat[1], pos;
DP4 result.position.z, mat[2], pos;
DP4 result.position.w, mat[3], pos;
# Pass the primary color through
# w/o lighting.
MOV result.color, vertex.color;
END
[Diagram: IR tree for the program above.]
• IRProgram (header: "!!ARBvp1.0")
  • Program Statements
    • IRVP1ATTRIBStatement (name: pos, attrib: vertex.position)
    • IRInstruction (opcode: DP4, destination, sources)
      • IRDstOperand (destination: result.position, writeMask: x, isResultRegister: true)
      • IRSrcOperand (source: mat, swizzleMask: xyzw, isInputRegister: false)
      • IRSrcOperand (source: pos, swizzleMask: xyzw, isInputRegister: false)
Semantic analysis and generic code generation
• Features:
  • Implemented using the visitor pattern.
  • Decouples the IR from the different operations involved in each compiler phase.
  • Allows a common analyzer and a common code generator to be used for both program types.
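The decoupling that the visitor pattern provides can be sketched in a few lines. In the Python below, the IR classes know nothing about analysis or code generation; each phase is a separate visitor. The `IRProgram` and `IRInstruction` names echo the IR slide, but their fields, both visitor classes and the opcode list are hypothetical simplifications.

```python
class IRInstruction:
    def __init__(self, opcode, dest, sources):
        self.opcode, self.dest, self.sources = opcode, dest, sources

    def accept(self, visitor):
        return visitor.visit_instruction(self)


class IRProgram:
    def __init__(self, header, statements):
        self.header, self.statements = header, statements

    def accept(self, visitor):
        return visitor.visit_program(self)


class OpcodeChecker:
    """Stand-in for semantic analysis: reports opcodes it does not know."""
    KNOWN_OPCODES = {"DP4", "MOV", "MUL", "ADD"}  # illustrative subset only

    def __init__(self):
        self.errors = []

    def visit_program(self, program):
        for statement in program.statements:
            statement.accept(self)

    def visit_instruction(self, instr):
        if instr.opcode not in self.KNOWN_OPCODES:
            self.errors.append("unknown opcode: " + instr.opcode)


class TextEmitter:
    """Stand-in for code generation: prints the program back as text."""

    def __init__(self):
        self.lines = []

    def visit_program(self, program):
        self.lines.append(program.header)
        for statement in program.statements:
            statement.accept(self)
        self.lines.append("END")

    def visit_instruction(self, instr):
        operands = ", ".join([instr.dest] + list(instr.sources))
        self.lines.append(instr.opcode + " " + operands + ";")


ir = IRProgram("!!ARBvp1.0", [
    IRInstruction("DP4", "result.position.x", ["mat[0]", "pos"]),
    IRInstruction("MOV", "result.color", ["vertex.color"]),
])
checker, emitter = OpcodeChecker(), TextEmitter()
ir.accept(checker)
ir.accept(emitter)
```

Adding a new compiler phase means adding a new visitor class; the IR node classes stay untouched, which is what lets vertex and fragment programs share one analyzer and one generator.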
Code generation
• Phase 1: generate architecture-independent generic code, assuming unbounded machine resources.
• Phase 2: translate it to specific code, taking the constraints of the concrete GPU architecture into account.
[Diagram: GenericCode (a list of GenericInstruction) and a Machine File Descriptor are translated into Specific Code (a list of GPUInstruction).]
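The two phases can be illustrated with a toy lowering step. In the Python sketch below, generic code uses as many virtual temporaries as it likes (named `%t0`, `%t1`, ...), and the second phase maps them onto a bounded register file, failing when the limit is exceeded. The tuple encoding, the naming scheme and the `max_temps` parameter are all hypothetical; the sketch does no liveness analysis or register reuse.

```python
def lower_to_gpu(generic_code, max_temps):
    """Map virtual temporaries in generic code to at most max_temps registers."""
    mapping = {}

    def physical(name):
        if not name.startswith("%t"):
            return name  # inputs and outputs pass through unchanged
        if name not in mapping:
            if len(mapping) >= max_temps:
                raise ValueError("program needs more than %d temporaries" % max_temps)
            mapping[name] = "r%d" % len(mapping)
        return mapping[name]

    specific = []
    for opcode, dest, sources in generic_code:
        specific.append((opcode, physical(dest), tuple(physical(s) for s in sources)))
    return specific


# Phase 1 output (hypothetical): generic code with unbounded temporaries.
GENERIC_CODE = [
    ("MUL", "%t0", ("in0", "in1")),
    ("ADD", "%t1", ("%t0", "in2")),
    ("MOV", "out0", ("%t1",)),
]

# Phase 2: lower for a hypothetical GPU with two temporary registers.
specific_code = lower_to_gpu(GENERIC_CODE, max_temps=2)
```

Keeping the resource limit as a parameter of the second phase is what makes the first phase reusable across GPU configurations, in the spirit of the machine descriptor shown on the slide.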
Conclusions
• Achieved goals. The OpenGL API implementation now supports:
  • Fixed Function emulation of almost the entire set of functions of the VS and FS stages (the most important ones).
  • Shader compilation for the ARB_vertex_program and ARB_fragment_program specifications.
    • Both compilers share most of the implementation.
    • Clear separation between generic and specific stages.
Future work
• Turn other parts of the 3D rendering pipeline (e.g. interpolation) into programmable stages, to reduce hardware complexity and power consumption (embedded systems).
• Implement compilers for high-level shading languages (GLSlang, HLSL).