Sony Computer Entertainment Development Conference, 2nd - 3rd August 2001. GS Master class. Mark Breugelmans.
Sony Computer Entertainment Development Conference, 2nd - 3rd August 2001
GS Master class Mark Breugelmans
What we know about the GS
• GS memory is 4 MB
• GS fill rate is 1.2 gigapixels/sec (textured)
• GS input bus is 64 bits wide
• We can stream up to 1.2 gigabytes a second
• GS polygon throughput is determined by:
  • Set-up time (number of cycles per vertex)
  • Polygon size (number of pixels to draw)
Getting data in
• GS runs at 150 MHz but with only a 64-bit input
• That's around 24 megabytes/frame (PAL) to be shared between textures and geometry
• Geometry
  • Use strips for fastest geometry set-up
• Textures
  • Always pack 4, 8 and 16-bit textures into 32-bit format beforehand for fastest transfer
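The per-frame figure follows directly from the clock and bus width. A minimal sketch of the arithmetic, assuming a 50 Hz PAL refresh and 1 megabyte = 1,000,000 bytes (the constant and function names are illustrative only):

```python
GS_CLOCK_HZ = 150_000_000   # 150 MHz, from the slide
BUS_BYTES = 64 // 8         # 64-bit input path = 8 bytes per cycle
PAL_HZ = 50                 # assumption: PAL refresh rate

def input_bytes_per_second():
    """Peak bytes the GS input can accept per second."""
    return GS_CLOCK_HZ * BUS_BYTES

def input_bytes_per_frame(refresh_hz=PAL_HZ):
    """Peak bytes available per displayed frame at the given refresh rate."""
    return input_bytes_per_second() // refresh_hz

print(input_bytes_per_second())   # bytes per second
print(input_bytes_per_frame())    # bytes per PAL frame
```

At 60 Hz the same calculation gives a budget of 20 megabytes per frame.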
Texture Transfer Rates
• Theoretical rate is 1.2 gigabytes/sec
• Transfer rates
  Format           Rate (Megabyte/sec)   Measured* (Megabyte/sec)
  32, 24, 16bit    1200                  1065
  8bit             900                   799
  4bit             600                   383
• (* path3 measured values)
• Sample code shows you how to convert
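To see what the measured rates mean for one texture, here is a rough sketch that estimates upload time. It assumes the table rates apply to the bytes actually sent and that 1 Megabyte = 1,000,000 bytes; the 256x256 example texture and all names are hypothetical.

```python
# Measured path3 rates from the table above, in Megabyte/sec.
MEASURED_MB_PER_SEC = {"32bit": 1065, "8bit": 799, "4bit": 383}

def texture_bytes(width, height, bits_per_texel):
    """Size of a texture in bytes."""
    return width * height * bits_per_texel // 8

def upload_time_us(num_bytes, mb_per_sec):
    """Transfer time in microseconds at the given rate."""
    return num_bytes / mb_per_sec   # bytes / (1e6 bytes/s) * 1e6 us/s

size = texture_bytes(256, 256, 4)   # a 4-bit 256x256 texture
as_4bit = upload_time_us(size, MEASURED_MB_PER_SEC["4bit"])
as_32bit = upload_time_us(size, MEASURED_MB_PER_SEC["32bit"])
print(size, as_4bit, as_32bit)
```

Under these assumptions the same data moves in a little over a third of the time when sent as 32-bit.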
Small triangles and set-up time
• At most 8 textured pixels are drawn per cycle
• Up to 8x4 pixels can be drawn in the set-up time
• The GS is not very efficient for tiny triangles
Small triangles and Fill-rate
• Pixels are drawn by the GS in groups of 8
• Small triangles will not make use of this
  Triangle size   Pixels drawn/cycle
  1x1             0.12
  2x2             0.5
  4x4             2
  8x8             5.27
  16x16           6.13
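The table can be read as a utilisation figure against the 8 pixels/cycle peak. A small sketch, using only the numbers above plus the 150 MHz clock (names are illustrative):

```python
PEAK_PIXELS_PER_CYCLE = 8
GS_CLOCK_HZ = 150_000_000

# Pixels drawn per cycle by triangle size, from the table above.
PIXELS_PER_CYCLE = {"1x1": 0.12, "2x2": 0.5, "4x4": 2.0, "8x8": 5.27, "16x16": 6.13}

def utilisation(size):
    """Fraction of the peak pixel rate achieved for this triangle size."""
    return PIXELS_PER_CYCLE[size] / PEAK_PIXELS_PER_CYCLE

def effective_fill_rate(size):
    """Pixels per second actually drawn for this triangle size."""
    return PIXELS_PER_CYCLE[size] * GS_CLOCK_HZ

for size in PIXELS_PER_CYCLE:
    print(size, round(utilisation(size) * 100, 1), "%")
```

On these numbers a scene made of 1x1 triangles uses under 2% of the available fill rate.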
Fill rate factors
• Triangle size
• Texture to pixel size
• Texture filtering modes (Tri-linear, mip-maps)
• Fog
• Caches
  • Texture page buffer
  • Frame/Z page buffer
Frame/Z Buffer Page caches
• Frame and Z-Buffer: 8k
  • split into 2 buffers: 32x32x32bit = 4k each
• Page refill is very fast
  • 8192 bits per cycle (150 gigabytes/sec bandwidth!)
  • Whole 8k page buffer refilled in 8 cycles
[Figure: Frame 32x32 and Z Buffer 32x32 page buffers]
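The refill numbers can be checked with a few lines of arithmetic. A sketch using only the figures on the slide (names are illustrative):

```python
GS_CLOCK_HZ = 150_000_000
REFILL_BITS_PER_CYCLE = 8192
BUFFER_BYTES = 32 * 32 * 32 // 8        # one 32x32 buffer of 32-bit pixels
PAGE_BUFFER_BYTES = 2 * BUFFER_BYTES    # frame + Z

def refill_bytes_per_second():
    """Page refill bandwidth in bytes per second."""
    return REFILL_BITS_PER_CYCLE // 8 * GS_CLOCK_HZ

def refill_cycles():
    """Cycles needed to refill the whole frame/Z page buffer."""
    return PAGE_BUFFER_BYTES // (REFILL_BITS_PER_CYCLE // 8)

print(BUFFER_BYTES, refill_bytes_per_second(), refill_cycles())
```

The exact bandwidth works out at 153.6 gigabytes/sec, which the slide rounds to 150.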
Frame/Z Page Cache misses
• Frame/Z page cache gets filled line by line as drawing scans down
• Fill rate while varying height is roughly constant
• Fill rate while varying width varies with page misses
• Cache misses for the Frame/Z page don't drop fill-rate much below 1 gigapixel/sec
• Textures are usually more of a problem
Level of detail
• As polygon counts head into the millions, polygon sizes in pixels shrink rapidly
• PA scans of games suggest that better use of LOD would benefit some games significantly
• The back of a 5000 polygon car may result in just 50 visible pixels once projected onto the screen
• Similarly, there's no point having detailed textures that are going to be shrunk so much
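The car example is easy to put a number on. A minimal sketch, assuming a hypothetical target of at least one pixel per polygon (the threshold and function names are not from the talk):

```python
def pixels_per_polygon(visible_pixels, polygon_count):
    """Average screen pixels covered by each polygon of an object."""
    return visible_pixels / polygon_count

def max_useful_polygons(visible_pixels, min_pixels_per_polygon=1.0):
    """Largest polygon count that keeps the average at or above the target.

    min_pixels_per_polygon is an assumed tuning value, not a GS limit.
    """
    return int(visible_pixels / min_pixels_per_polygon)

print(pixels_per_polygon(50, 5000))   # the car from the slide
print(max_useful_polygons(50))
```

For the car on the slide, a model of around 50 polygons would already meet the one pixel per polygon target.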
A pixel density test
• Set all vertices to:
  • red=0, green=1, blue=0
  • alpha blend = destination + source
  • z test = disabled
  • texture = disabled
• Lighter areas show you where there is high density or overdraw
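The test amounts to counting how many times each pixel is written. A software sketch of the same idea, using axis-aligned rectangles in place of real primitives (the rectangle input and all names are assumptions made for illustration):

```python
def density_map(width, height, rects):
    """Additively 'draw' rectangles with green=1 and return the green channel.

    rects is a list of (x0, y0, x1, y1) with exclusive x1/y1.
    Each covered pixel gains 1, saturating at 255 like an 8-bit channel.
    """
    frame = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in rects:
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                frame[y][x] = min(frame[y][x] + 1, 255)
    return frame

frame = density_map(4, 4, [(0, 0, 2, 2), (1, 1, 3, 3)])
for row in frame:
    print(row)
```

Pixels with a value above 1 have been overdrawn. On the GS the same count appears as brighter green.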
Texture Page caches
• Texture cache: 8k
  Texture format                            Page size
  32bit (also used for 24, 8H, 4HL, 4HH)    64x32
  16bit                                     64x64
  8bit                                      128x64
  4bit                                      128x128
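Each page size is the number of texels of that depth that fill 8k. A quick check of the table (names are illustrative):

```python
# (width, height, bits per texel) for one texture page, from the table above.
PAGE_SIZES = {
    "32bit": (64, 32, 32),
    "16bit": (64, 64, 16),
    "8bit": (128, 64, 8),
    "4bit": (128, 128, 4),
}

def page_bytes(fmt):
    """Bytes of texel data held by one page of the given format."""
    width, height, bits = PAGE_SIZES[fmt]
    return width * height * bits // 8

for fmt in PAGE_SIZES:
    print(fmt, page_bytes(fmt))
```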
Texture Cache misses example
• 64x32 sprite, 24bit texture
  Texture size   Fill-rate   GS cycles
  64x32          1158        262
  65x32          596         514
• One pixel outside the page halves fill rate!
• Texture cache miss is based on the texture co-ordinates, not the original texture size
• Crossing texture pages also affects the cache
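Since the miss depends on the texture co-ordinates used, it helps to count how many pages a co-ordinate range touches. A sketch assuming texel co-ordinates with an exclusive upper bound and a page-aligned texture (the function is hypothetical, not an SDK call):

```python
def pages_touched(u0, v0, u1, v1, page_w, page_h):
    """Number of texture pages overlapped by the texel range [u0,u1) x [v0,v1)."""
    cols = (u1 - 1) // page_w - u0 // page_w + 1
    rows = (v1 - 1) // page_h - v0 // page_h + 1
    return cols * rows

# 24-bit textures use the 64x32 page.
print(pages_touched(0, 0, 64, 32, 64, 32))   # the 64x32 case
print(pages_touched(0, 0, 65, 32, 64, 32))   # the 65x32 case
```

The 65x32 case touches two pages, which matches the doubling in GS cycles in the table.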
Crossing Texture Pages efficiently
• The blocks in the pages are zig-zagged in quarters, sixteenths etc. for efficiency
• Use at most 1/2 page width and height to avoid crossing 3 quarters, which causes many block reloads / page misses
[Figure: a primitive that crosses 2 quarters and one that crosses 3 quarters]
Recommended subdivision
• PA scans showing GS wait for texture
• Suggested subdivision for each texture mode:
  Texture mode (page size)   Subdivision
  4bit (128x128)             64x64
  8bit (128x64)              64x32
  16bit (64x64)              32x32
  24/32bit (64x32)           32x16
[Figure: PA scans of a 256x256 (4bit) texture, not subdivided and subdivided]
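The suggested sizes are half a page in each direction. A sketch of splitting a texture's co-ordinate range into tiles of that size, assuming texel co-ordinates and a texture whose dimensions are multiples of the tile size (names are illustrative):

```python
# Suggested subdivision (width, height) per texture mode, from the table above.
SUBDIVISION = {"4bit": (64, 64), "8bit": (64, 32), "16bit": (32, 32), "32bit": (32, 16)}

def subdivide(tex_w, tex_h, mode):
    """Return (u0, v0, u1, v1) tiles covering the texture, one per sub-polygon."""
    tile_w, tile_h = SUBDIVISION[mode]
    return [
        (u, v, u + tile_w, v + tile_h)
        for v in range(0, tex_h, tile_h)
        for u in range(0, tex_w, tile_w)
    ]

tiles = subdivide(256, 256, "4bit")
print(len(tiles), tiles[0], tiles[-1])
```

Each tile becomes the texture co-ordinate range of one sub-polygon, so no single primitive reads from more than half a page in either direction.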
Reducing texture cache misses
• Use 4bit or 8bit textures
• Clamp texture to page size to keep in page
  • Bilinear may fetch 1 pixel outside your co-ordinate range
• Either:
  • Keep all textures within one page
• Or:
  • Sub-divide polygons until the ST co-ordinates of each polygon stay within half a cache page
Mip-maps
• Good for avoiding texture reduction
• Look better
• May help reduce texture transfers for distant drawing
• Watch out for performance on large polygons
  • Mip-maps in different pages can cause multiple texture cache reloads
Mip-maps on large primitives
• Primitive is drawn line by line
• Wall reloads all mip-maps for every line
• Road loads each mip-map only once
[Figure: mip-map levels 1 to 4 across a wall and a road]
Tri-linear performance
• Tri-linear fill rate is 1/2 the speed of bilinear
  • It's fetching twice the number of pixels
• When two mip-map levels are in different pages, tri-linear is 8x slower than bilinear
  • Due to multiple page loads per pixel
• Solutions
  • Keep smaller mip-maps in the same page
  • Disable tri-linear for near mip-map levels
  • Perhaps do tri-linear as 2 passes with alpha
Alternative FOG
• For larger textured primitives it is quicker to do fog as a second pass
• Technique
  • 1st pass: draw a textured primitive
  • 2nd pass: draw a Gouraud-shaded, alpha-blended primitive
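The second pass is a standard source/destination blend, with the fog amount carried in the interpolated vertex alpha. A per-pixel sketch in floating point, assuming colours and alpha in the range 0 to 1 (this is a software model of the idea, not GS register code):

```python
def fog_second_pass(textured_rgb, fog_rgb, fog_alpha):
    """Blend the fog colour over the pixel left by the textured first pass.

    fog_alpha = 0 leaves the textured pixel untouched, 1 gives pure fog colour.
    """
    return tuple(
        dst * (1.0 - fog_alpha) + src * fog_alpha
        for dst, src in zip(textured_rgb, fog_rgb)
    )

pixel = (1.0, 0.5, 0.0)   # result of pass 1 (textured)
fog = (0.5, 0.5, 0.5)     # fog colour set on the pass 2 vertices
print(fog_second_pass(pixel, fog, 0.5))
```

The per-vertex alpha would normally come from distance, which is an assumption here since the slide does not say how it is computed.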
Scissoring
• Early pixel reject
• Pixels discarded in lines
• Eliminates all page misses and texture loads
• Speed depends on location of triangle
[Figure: draw time by triangle location]
  16x16 triangle    64x64 triangle    128x128 triangle
   7    7    6      25   26   18      52    52   34
   9   12    6      36  280   18      79  1135   34
   4    4    2      12   12    2      25    25    2
Note: all timings in GS cycles
Context changes with TEX0_1
• TEX0_1 only takes 2 GS cycles if the CLUT isn't loaded and the texture address isn't changed
• TEX2_1 (CLUT) is no quicker than TEX0_1; it just masks some of the TEX0_1 fields
CLUTs
• Loading a new CLUT causes 2 things to happen
  • New CLUT must be loaded
  • Texture cache is invalidated
• Loading just a CLUT is no faster than loading both CLUT and texture
• However, selecting an already loaded CLUT is a zero cost operation
Fill-rates: Summary
• Texture page caches have the biggest effect on fill rate
  • Subdivide large texture co-ordinate ranges
  • Keep mip-maps in the same page
  • Texture reduction also costs fill rate as texel read becomes the bottleneck
• Frame buffer page misses aren't too bad
  • Cost for big polygons is not bad compared to texture penalties
Making the most of VRAM
• 4bit and 8bit palettised textures are the most compact
• Tiled textures with repeat and region repeat
• Multi-pass techniques
  • Alpha blending is zero cost
  • Useful for multi-pass techniques
• Useful blend types
  • Standard blend between SRC and FRAME
  • Multiply blend (using alpha channel)
Tiling textures
• Very easy way to add detail for little cost
• Repeat range
  • 0.10.4 UV (0 - 1024)
  • 1.11.4 ST (±2048), which is 4x the range
• Number of repeats reduces for larger textures
• Watch out when scissoring massively tiled polygons
  • Perspective errors
  • Recalculate smaller texture co-ordinates
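The link between texture size and repeat count is a simple division. A sketch that reads the two ranges above as texel spans of 1024 (UV) and 4096 (ST, from -2048 to +2048); that reading, and the names, are assumptions:

```python
UV_RANGE_TEXELS = 1024   # 0 - 1024
ST_RANGE_TEXELS = 4096   # -2048 to +2048, 4x the UV range

def max_repeats(texture_size, range_texels):
    """How many times a texture of this width fits in the addressable range."""
    return range_texels // texture_size

for size in (32, 64, 128, 256):
    print(size, max_repeats(size, UV_RANGE_TEXELS), max_repeats(size, ST_RANGE_TEXELS))
```

Doubling the texture size halves the number of repeats available across one primitive.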
Texture Compression • Monochrome textures can compress really well to 4bit
Texture Compression
• The eye is sensitive to gradual changes in luminance, so palettes look bad in this case
• In this case it would be better to reduce the image in size and use the GS bilinear filter to interpolate
Texture Compression
• You can add a low bit depth detail map to a low resolution interpolated image
• Total size of the 2 images is much less than a single 24bit image
• We can also use tiling
2 Pass Texture Compression
• Original: 24-bit or 32-bit image
• Colour map: 1/16 area of original, 8-bit CLUT up to 32-bit
• Detail map: full-size, 2-bit or 4-bit grayscale
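A worked size comparison for one image makes the saving concrete. This sketch assumes a 256x256 source, an 8-bit colour map at 1/4 width and 1/4 height, and ignores CLUT storage (all of these choices are illustrative):

```python
def image_bytes(width, height, bits_per_texel):
    """Size of an image in bytes."""
    return width * height * bits_per_texel // 8

def two_pass_bytes(width, height, detail_bits=4, colour_bits=8):
    """Colour map at 1/16 area plus a full-size detail map (CLUTs not counted)."""
    colour_map = image_bytes(width // 4, height // 4, colour_bits)
    detail_map = image_bytes(width, height, detail_bits)
    return colour_map + detail_map

original = image_bytes(256, 256, 24)
print(original, two_pass_bytes(256, 256), two_pass_bytes(256, 256, detail_bits=2))
```

With a 4-bit detail map the pair is under a fifth of the 24-bit original, and a 2-bit detail map roughly halves that again.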
Texture Compression
• Detail map CLUT is concentrated around the centre
• Eye is sensitive to small changes in luminance
• Detail map is calculated as:
  • original pixel / colour map pixel = alpha multiply, which is then mapped to a CLUT
[Figure: detail map CLUT scale running from 0.0 to 2.0, with 1.0 at the centre]
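The detail map calculation can be sketched for a single luminance value. The 16-entry CLUT below, with entries bunched around 1.0 by a cubic curve, is only one hypothetical way to concentrate it around the centre; the talk does not give the actual distribution.

```python
def make_detail_clut(entries=16):
    """Multipliers from 0.0 to 2.0, densest near 1.0 (assumed cubic spacing)."""
    clut = []
    for i in range(entries):
        t = -1.0 + 2.0 * i / (entries - 1)   # -1 .. +1
        clut.append(1.0 + t ** 3)
    return clut

def detail_multiplier(original, colour_map):
    """original / colour map pixel, clamped to the 0.0 - 2.0 CLUT range."""
    if colour_map == 0:
        return 1.0   # assumption: leave black colour-map pixels unscaled
    return max(0.0, min(2.0, original / colour_map))

def detail_index(original, colour_map, clut):
    """CLUT index whose multiplier is closest to the wanted one."""
    wanted = detail_multiplier(original, colour_map)
    return min(range(len(clut)), key=lambda i: abs(clut[i] - wanted))

clut = make_detail_clut()
print(detail_index(100, 50, clut), detail_index(0, 50, clut))
```

At draw time the looked-up multiplier scales the interpolated colour map pixel, which approximately recovers the original pixel.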
2-bit Luminance Textures
[Figure: a 4-bit image read through two CLUTs]
  CLUT 1: xx00, xx01, xx10, xx11
  CLUT 2: 00xx, 01xx, 10xx, 11xx
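One reading of the figure is that two 2-bit images share a 4-bit texture, and each CLUT ignores the other image's bits. A sketch of that reading, with four hypothetical luminance levels:

```python
LEVELS = [0, 85, 170, 255]   # assumed luminance for 2-bit values 0..3

def pack(low_image_value, high_image_value):
    """Store two 2-bit values in one 4-bit texel."""
    return (high_image_value << 2) | low_image_value

# CLUT 1 looks only at the low two bits (xx00 .. xx11).
CLUT_1 = [LEVELS[i & 3] for i in range(16)]
# CLUT 2 looks only at the high two bits (00xx .. 11xx).
CLUT_2 = [LEVELS[i >> 2] for i in range(16)]

texel = pack(1, 3)
print(texel, CLUT_1[texel], CLUT_2[texel])
```

Switching CLUT selects which of the two images is drawn, without changing the texture data.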
Texture Compression
• Decompressing the texture
  • Draw low resolution colour map normally
  • Draw detail map with alpha multiply
• Two alternatives for detail map drawing
  • Decompress to a new texture first
  • Draw directly using two passes
• Colour map can serve as a low-res mip-map
• Detail map can be faded in for close-ups
• Benefit is reduced GIF->GS data transfer
Interlace Flickering
• For high resolution you need to run the TV interlaced
• Odd and even lines are drawn on alternate frames
• Any image not drawn on both lines flickers
• Scan line blending solves the problem
• This flickering is much more of a problem than edge aliasing
Interlace Flickering - Solutions
• Choose appropriate mip-map textures
• For games not guaranteed to run in a frame
  • Use 2 circuit method (very easy)
• If you can run in a frame you can save some VRAM compared to the 2 circuit method
  • Sprite method: saves 1/2 a display buffer
  • Motion blur method: saves all VRAM
  • 2 pass method: saves all VRAM but 2x polygons
Super-sampling techniques and edge anti-aliasing
• Edge anti-aliasing is nice but you must sort your polygons and it's slower to draw
• Down-sampling is easy but expensive in VRAM
  • Draw objects to large off-screen buffers and down-sample (we can still Z test if we scale up Z first)
• An alternative method
  • Render 4x with 25% alpha and 1/2 pixel offset in 4 directions
  • Same effect using extra polygons rather than VRAM
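The alternative method averages four shifted copies of the scene. A per-pixel sketch, assuming the four offsets are half a pixel along +x, -x, +y and -y and that each pass adds 25% of its colour (the offsets and the test edge are assumptions):

```python
# Assumed reading of "1/2 pixel offset in 4 directions".
OFFSETS = [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.5), (0.0, -0.5)]

def supersample(render, x, y):
    """Sum four offset renders of one pixel, each weighted by 25% alpha."""
    return sum(0.25 * render(x + dx, y + dy) for dx, dy in OFFSETS)

def vertical_edge(x, y):
    """Hypothetical scene: white to the left of x = 10, black to the right."""
    return 1.0 if x < 10.0 else 0.0

print(supersample(vertical_edge, 5.0, 5.0))
print(supersample(vertical_edge, 9.75, 5.0))
print(supersample(vertical_edge, 20.0, 5.0))
```

Pixels near the edge end up with intermediate values, which is the anti-aliasing effect, and the cost is four times the polygons rather than a larger buffer.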
One last thing - Loading screens and framing out
• Framing out on loading
  • Use field mode perhaps
  • You could use 16bit field mode in the Z buffer?
  • Use a low res background with 2nd circuit text?
Summary
• Maximising GS input paths
  • Transfer textures as 32bit
  • Consider detail textures and texture tiling
• Keeping up fill-rates
  • Subdivide textures to within caches
  • Don't reduce textures
  • Make use of LOD to avoid <1 pixel area triangles
  • Watch out for penalties on fog and mip-maps