Cg: A System for Programming Graphics Hardware in a C-like Language • William R. Mark, The University of Texas at Austin • R. Steven Glanville, NVIDIA Corporation • Kurt Akeley, NVIDIA Corporation • Mark J. Kilgard, NVIDIA Corporation • SIGGRAPH 2003
Cg’s Model of GPU [Cg Toolkit]
The Graphics Pipeline [Programming Graphics Hardware]
Outline • Introduction • Background • Design Goals • Key Design Decisions • Cg Language Summary • Design Issues • CgFX • System Experiences • Conclusion
Introduction • Graphics architectures are now highly programmable and support application-specified assembly programs for both vertex processing and fragment processing • The most effective tool for programming these architectures is a high-level language • program portability, improved programmer productivity, easier incremental and interactive program development • particularly valuable for shader programs
Introduction • This work presents a system for programming graphics hardware that supports programs written in a new C-like language named Cg
The Evolution of GPU Programming Languages [NVIDIA] • C (AT&T, 1970s) • C++ (AT&T, 1980s) • Java (Sun, 1990s) • IRIS GL (SGI, 1982) • RenderMan (Pixar, 1988) • OpenGL (ARB, 1992) • Reality Lab (RenderMorphics, 1994) • Direct3D (Microsoft, 1995) • PixelFlow Shading Language (UNC, 1998) • Real-Time Shading Language (Stanford, 2001) • HLSL (Microsoft, 2002) • Cg (NVIDIA, 2002) • GLSL (ARB, 2003)
Background • In real-time rendering systems, support for user programmability has evolved with the underlying graphics hardware • For many years, mainstream commercial graphics hardware was configurable, but not user programmable • programmable shading was layered on top of it using multipass rendering techniques: SGI’s OpenGL shader system [2000] and Quake III’s shading language [1999]
Background • In response to this trend, graphics architects began to incorporate programmable processors into both the vertex-processing and fragment-processing stages of single-chip graphics architectures [2001] • The most recent generation of PC graphics hardware (DirectX 9 or DX9 hardware [2002]) continues the trend of adding programmable functionality to both the fragment and the vertex processors
DX9-class Architectures • Vertex processor • adds conditional branching functionality • Fragment processor • adds flexible support for floating-point arithmetic and computed texture coordinates
Design Goals • Ease of programming • programming in assembly language is slow and painful • easy reuse of code • Portability • across hardware from different companies • across hardware generations (DX8-class hardware or better) • across operating systems (Windows, Linux, MacOS) • across major 3D APIs (OpenGL, DirectX)
Design Goals • Complete support for hardware functionality • Performance • Minimal interference with application data • Ease of adoption • Extensibility for future hardware • Support for non-shading uses of the GPU • (some of these goals are in partial conflict with each other)
Key Design Decisions • A “general-purpose language”, not a domain-specific “shading language” • A program for each pipeline stage • Permit subsetting of the language • Modular system architecture
Domain-specific vs. General-purpose Language • Domain-specific languages • are tailored to shading computation • General-purpose languages • expose the fundamental capabilities of programmable graphics architectures
“General-purpose Language” • Considered together, our design goals led us to develop a hardware-focused general-purpose language • high performance • minimal management of application data • support for non-shading uses of GPUs
Cg Follows C's Philosophy • The C language succeeded in achieving goals for performance, portability, and generality of CPU programs that were very similar to our goals for a GPU language • Extend and modify C to support GPU architectures effectively → Cg • the language follows the syntax and philosophy of C • reserves all C and C++ keywords • selectively uses ideas from C++, Java, RenderMan, and RTSL • developed alongside the contemporaneous C-like languages from 3Dlabs, the OpenGL ARB (GLSL), and Microsoft (HLSL)
Programming Model • Choosing a programming model to layer on top of the stream-processing architecture • RTSL, RenderMan: single program • OpenGL, Direct3D: two separate programs • each program consumes an element of data from one stream and writes an element of data to another stream • The single-program model is not a natural match for the underlying dual-processor architecture
A Program for Each Pipeline Stage • The user-programmable processors in today's graphics architectures use a stream-processing model • Vertex program: executed once per vertex • Fragment program: executed once per fragment [Programming Graphics Hardware]
A Language for Expressing Stream Kernels • A single language specification for writing a stream kernel (i.e. a vertex program or a fragment program) • simplifies and generalizes the language by eliminating most of the distinctions between vertex and fragment programs • Particular processors are then allowed to omit support for some capabilities of the language • e.g. texture lookups: today’s vertex processors do not support them
A Language for Expressing Stream Kernels • The current Cg system can be thought of as a specialized stream-processing system • The Cg system relies on the established graphics pipeline dataflow of GPUs • it does not provide a way to connect stream-processing kernels together • Cg focuses on kernel programming • specialized for stream-kernel programming • could be extended to support other parallel programming models
A Data-flow Interface for Program Inputs and Outputs • Should the system allow any vertex program to communicate with any fragment program? • via the rasterizer / interpolator • How should the vertex program outputs and fragment program inputs be defined to ensure compatibility?
A Data-flow Interface for Program Inputs and Outputs • When programming GPUs at the assembly level • the interface between fragment programs and vertex programs is established at the register level • For example: the user can establish a convention for a given I/O register such as TEXCOORD3 (the vertex program writes a value to it and the fragment program reads that value from it) • The binding names must be chosen from a predefined namespace with predefined data types
A Data-flow Interface for Program Inputs and Outputs • Cg and HLSL: modified bind-by-name scheme • a predefined namespace is used instead of the user-defined identifier name • provides maximum control over the generated code • Cg also supports a bind-by-position scheme • requires that data be organized in an ordered list • a function-parameter list or a list of structure members • GLSL: pure bind-by-name • not supported by either Cg or HLSL
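The modified bind-by-name scheme can be illustrated with a minimal Cg sketch. The names VertexOutput, passNormal, and showNormal are hypothetical and not from the talk, and the fragment program assumes a profile that supports normalize (for example a DX9-class profile). The point is that the vertex output and the fragment input are matched by the predefined binding name TEXCOORD3, not by the identifiers normal and n.

```cg
// Hypothetical connector struct: the name after each colon (POSITION,
// TEXCOORD0, TEXCOORD3) is taken from the predefined binding namespace.
struct VertexOutput {
    float4 clipPosition : POSITION;
    float2 decalCoord   : TEXCOORD0;
    float3 normal       : TEXCOORD3;
};

// Vertex program: by convention, writes the normal to TEXCOORD3.
VertexOutput passNormal(float4 objectPosition : POSITION,
                        float3 objectNormal   : NORMAL,
                        float2 decalCoord     : TEXCOORD0,
                        uniform float4x4 modelViewProjection)
{
    VertexOutput OUT;
    OUT.clipPosition = mul(modelViewProjection, objectPosition);
    OUT.decalCoord   = decalCoord;
    OUT.normal       = objectNormal;
    return OUT;
}

// Fragment program: the parameter is called n, not normal, but it still
// receives the interpolated value because it is bound to TEXCOORD3.
float4 showNormal(float3 n : TEXCOORD3) : COLOR
{
    // Remap the unit normal from [-1, 1] to [0, 1] so it can be displayed.
    return float4(0.5 * normalize(n) + 0.5, 1.0);
}
```

Because the binding is expressed through predefined names, any vertex program that writes TEXCOORD3 can be paired with this fragment program, which is the compatibility property the slide asks about.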
Permit Subsetting of Language • Conflicting goals: portability and comprehensive support for hardware functionality • Major differences in functionality exist between the different graphics architectures that Cg supports • e.g. DX9: floating-point fragment arithmetic • We considered a variety of possible approaches to hiding or exposing these differences • minor architectural differences can be efficiently hidden by the compiler, and Cg does so • major architectural differences cannot be hidden by a compiler without sacrificing performance
Permit Subsetting of Language • Cg needed both • to support the existing installed base of DX8-class hardware • to provide access to the capabilities of the latest hardware • Cg: • exposes major architectural differences as differences in language capabilities • to minimize the impact on portability, Cg exposes the differences using a subsetting mechanism • each processor is defined by a profile • a profile specifies which subset of the full Cg specification is supported on that processor
No Mandatory Virtualization • Whether or not to automatically virtualize hardware resources using software-based multi-pass techniques? • Cg does not require it in the language specification (and the current release of Cg does not support it) • effective virtualization of this hardware is impossible • or too slow to be useful in a real-time application • it conflicts with our design goals (virtualization on current hardware requires global management of application data and hardware resources)
Layered Above an Assembly Language Interface • Whether or not to expose machine / assembly language as an additional interface for system users? • By providing access to the assembly code, the system allows users to • tune their code by studying the compiler output • manually edit the compiler output • even write programs entirely in assembly language • maximize performance
Explicit Program Parameters • All input parameters to a Cg program must be explicitly declared • using non-static global variables, or • by including the parameters on the entry function’s parameter list • Cg also provides a set of runtime API routines that allow parameters to be passed using their true names and types
Explicit Program Parameters • The Cg compiler prepends a header to its assembly code output to describe the mapping between program parameters and registers
#profile arbvp1
#program simpleTransform
#semantic simpleTransform.brightness
#semantic simpleTransform.modelViewProjection
#var float4 objectPosition : $vin.POSITION : POSITION : 0 : 1
#var float color : $vin.COLOR : COLOR : 1 : 1
….
#var float brightness :: c[0] : 8 : 1
#var float4x4 modelViewProjection :: c[1], 4 : 9 : 1
Example Program • Example Cg program for the vertex processor (float4 is a vector of four float values)
void simpleTransform(float4 objectPosition : POSITION,
                     float4 color : COLOR,
                     float4 decalCoord : TEXCOORD0,
                     out float4 clipPosition : POSITION,
                     out float4 oColor : COLOR,
                     out float4 oDecalCoord : TEXCOORD0,
                     uniform float brightness,
                     uniform float4x4 modelViewProjection)
{
    // Transform the vertex from object space to clip space.
    clipPosition = mul(modelViewProjection, objectPosition);
    // Scale the vertex color by the uniform brightness.
    oColor = brightness * color;
    // Pass the texture coordinate through unchanged.
    oDecalCoord = decalCoord;
}
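Since Cg uses one program per pipeline stage, simpleTransform would normally be paired with a fragment program. The following is a minimal sketch of such a companion; the name simpleDecal and the sampler parameter decalMap are assumptions made for illustration and do not appear in the talk.

```cg
// Hypothetical fragment program paired with simpleTransform. It reads the
// interpolated COLOR and TEXCOORD0 values that the vertex program wrote.
float4 simpleDecal(float4 color      : COLOR,
                   float4 decalCoord : TEXCOORD0,
                   uniform sampler2D decalMap) : COLOR
{
    // Modulate the interpolated vertex color by the decal texture.
    return color * tex2D(decalMap, decalCoord.xy);
}
```

The two programs agree only on the binding names COLOR and TEXCOORD0; the rasterizer and interpolator sit between them.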
Other Cg Functionality • Provides structures and arrays, arithmetic operators (+, *, /, etc.), a boolean type with boolean operators (||, &&, !, etc.), increment/decrement (++/--), the conditional operator (?:), and assignment expressions (+=, etc.) • Supports programmer-defined functions (recursive functions are not allowed) • Provides only a subset of C’s control flow constructs: (do, while, for, if, break, continue) are supported, (goto, switch) are not • Does not support pointers or bitwise operations • Supports #include, #define, #ifdef, etc. (matching the C preprocessor)
Design Issues • Support for hardware • User-defined interfaces between modules • Other language design decisions • Runtime API
Support for Hardware • The discussion below is organized around the characteristics of GPU hardware • Stream processor • Data types • Indirect addressing • Interaction with the rest of the graphics pipeline • Shading-specific hardware functionality
Stream Processor • A GPU program is executed many times, once for each vertex or fragment • To do this efficiently, the hardware distinguishes inputs that change with each execution from inputs that remain unchanged (the two kinds reside in different register sets) • A GPU language compiler must know the category to which an input belongs before it can generate assembly code
Stream Processor • Terminology for the two kinds of input • varying input • uniform input • Cg uses the uniform qualifier to mark uniform inputs • Computations that depend only on uniform parameters • do not need to be redone for every vertex or fragment
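A short Cg sketch may make the uniform / varying distinction concrete. The program name diffuseLight and its parameters are hypothetical, and a fragment profile with normalize, dot, and max is assumed.

```cg
// Hypothetical fragment program mixing both kinds of input.
float4 diffuseLight(float3 normal : TEXCOORD0,      // varying: one value per fragment
                    uniform float3 lightDirection,  // uniform: same for the whole batch
                    uniform float3 lightColor) : COLOR
{
    // Depends only on a uniform input, so the result is identical for every
    // fragment and in principle does not need to be recomputed each time.
    float3 L = normalize(lightDirection);

    // Depends on a varying input, so it must be evaluated per fragment.
    float diffuse = max(dot(normalize(normal), L), 0.0);

    return float4(diffuse * lightColor, 1.0);
}
```

Parameters without the uniform qualifier are treated as varying, which tells the compiler which register set each input belongs to.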
Data Types • Multiple numeric data types • float (32-bit), half (16-bit), fixed (12-bit) • Vector data types and operators • Matrix data types and operations • Integer data types are not supported • Adds a bool data type for conditional operations
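The data types listed above could be exercised as in the sketch below. The function name dataTypes and its parameters are made up for illustration; on profiles without native half or fixed support, the compiler is assumed to fall back to a wider type.

```cg
// Hypothetical fragment program touching each family of data types.
float4 dataTypes(float4 objectPosition : TEXCOORD0,
                 half4  tint           : COLOR,
                 uniform float4x4 transform) : COLOR
{
    float4 p       = mul(transform, objectPosition); // matrix-vector product
    half3  rgb     = tint.rgb * tint.a;              // swizzle, then vector * scalar
    fixed  alpha   = 0.5;                            // low-precision scalar
    bool   inFront = p.z > 0.0;                      // comparison yields a bool

    // The conditional operator selects between two vector values.
    return inFront ? float4(rgb, alpha) : float4(0.0, 0.0, 0.0, alpha);
}
```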
Indirect Addressing • Current graphics processors have very limited indirect addressing capability (uniform, sampler) • An array assignment in Cg performs a copy of the entire array • Cg currently forbids the use of pointers • Cg currently forbids recursive function calls • Supports call-by-value-result semantics • using the in and out parameter modifiers
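Call-by-value-result can be sketched with the in, out, and inout modifiers. The helper names splitColor and applyGain and the program premultiply are hypothetical. No addresses are passed: in values are copied into the callee, and out values are copied back to the caller when the function returns.

```cg
// 'in' parameters are copied in; 'out' parameters are copied back on return.
void splitColor(in float4 color, out float3 rgb, out float alpha)
{
    rgb   = color.rgb;
    alpha = color.a;
}

// 'inout' parameters are copied in on entry and copied back on return.
void applyGain(inout float3 rgb, in float gain)
{
    rgb = rgb * gain;
}

// Hypothetical fragment program using both helpers.
float4 premultiply(float4 color : COLOR) : COLOR
{
    float3 rgb;
    float  alpha;
    splitColor(color, rgb, alpha);  // rgb and alpha receive the results
    applyGain(rgb, alpha);          // rgb is read, scaled, and written back
    return float4(rgb, alpha);
}
```

This gives functions a way to return several results without pointers, which matches hardware that has almost no indirect addressing.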
Interaction with the Rest of the Graphics Pipeline • Some of the I/O registers are used to control the non-programmable parts of the graphics pipeline, rather than to pass general-purpose data • The Cg specification mandates that certain register identifiers (e.g. POSITION) be supported as an output by all vertex profiles, and that certain other identifiers be supported by all fragment profiles
Shading-specific Hardware Functionality • The latest generation of graphics hardware includes a variety of capabilities specialized for shading • Cg exposes these capabilities via its standard library functions • this maintains the general-purpose nature of the language • The Cg standard library supports a variety of mathematical, geometric, and specialized functions
User-defined Interfaces Between Modules • The general-purpose solution we chose is adopted from Java and C# • The programmer may define an interface, which specifies one or more function prototypes • The programmer implements the interface by defining a struct (i.e. a class) that contains definitions for the interface’s functions
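A minimal sketch of the interface mechanism, modeled on a light-shader pattern. The names Light, PointLight, DirectionalLight, and shade are illustrative assumptions, and a Cg release with interface support is assumed. The application is expected to bind a concrete struct to the uniform Light parameter before the program is compiled for a profile.

```cg
// Interface: one or more function prototypes.
interface Light {
    float3 illuminate(float3 surfacePosition, out float3 lightVector);
};

// A struct implements the interface by defining its functions.
struct PointLight : Light {
    float3 position;
    float3 color;
    float3 illuminate(float3 surfacePosition, out float3 lightVector) {
        lightVector = normalize(position - surfacePosition);
        return color;
    }
};

struct DirectionalLight : Light {
    float3 direction;
    float3 color;
    float3 illuminate(float3 surfacePosition, out float3 lightVector) {
        lightVector = normalize(direction);
        return color;
    }
};

// The surface code is written once against the interface and works with
// whichever implementation is bound to 'light'.
float4 shade(float3 surfacePosition : TEXCOORD0,
             float3 surfaceNormal   : TEXCOORD1,
             uniform Light light) : COLOR
{
    float3 L;
    float3 lightColor = light.illuminate(surfacePosition, L);
    float  diffuse    = max(dot(normalize(surfaceNormal), L), 0.0);
    return float4(diffuse * lightColor, 1.0);
}
```

This keeps surface code and light code in separate modules without adding shading-specific constructs to the language.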
Other Language Design Decisions • Function overloading by types and by profiles • Constants are typeless • No type checking for textures
Function Overloading by Types and by Profile • Supports function overloading by data type • the mechanism is similar to that of C++ (but less complex) • Also permits overloading by profile • it is possible to write multiple versions of a function that are optimized for different architectures • the compiler will automatically choose the version for the current profile
Overloading • Function overloading by hardware profile
// Assumed to be declared by the application: a cube map that stores normalized vectors
uniform samplerCUBE norm_cubemap;

// For ps_1_1 profile, use cubemap to normalize
ps_1_1 float3 mynormalize(float3 v) {
    return texCUBE(norm_cubemap, v.xyz).xyz;
}

// For ps_2_0 profile, use stdlib routine to normalize
ps_2_0 float3 mynormalize(float3 v) {
    return normalize(v);
}
Constants are Typeless • Cg changes the type promotion rules for constants • C: float x; 2.0*x → double precision • Cg: half y; 2.0*y → half precision • Internally, the new constant promotion rules are implemented by assigning a different type (cfloat or cint) to constants that do not have an explicit type suffix
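The effect of the constant promotion rule can be sketched as follows. The function names brightenHalf and brightenFloat are hypothetical; the f suffix is Cg's explicit float suffix for constants.

```cg
// Unsuffixed constant: 2.0 is typeless (cfloat), so it adopts the type of
// the other operand and the multiply is carried out at half precision.
half4 brightenHalf(half4 color : COLOR) : COLOR
{
    return 2.0 * color;
}

// Suffixed constant: 2.0f is explicitly a float, so color is promoted and
// the multiply is carried out at float precision.
float4 brightenFloat(half4 color : COLOR) : COLOR
{
    return 2.0f * color;
}
```

Under C's rules the first function would silently promote the arithmetic to a wider type, which is costly on hardware where half precision is faster than float.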