Status – Week 207 Victor Moya
Summary • Z Test box. • Z Compression. • Z Cache. • Stencil. • HZ Box. • HZ Test. • Traces.
Z Test box • Z Test box includes: • Z cache. • Z encoder (compress and reference value). • Z decoder (decompress). • Z test. • Z update. • Stencil test. • Stencil update.
[Diagram: Z Test box data flow. Labels: Fragments/Stamps (input), Reference Z value, Fetch, Enc, Z Cache, Read, Compressed Z Line/Block, Stencil Test, Dec, Z Test, Stencil Update, Write, Fragments/Stamps (output).]
Z Compression. • ATI HOT 3D in Eurographics 2000. • 8x8 pixel block (Z cache line). • DDPCM: differential differential pulse code modulation. • Two modes: • ½ of original size. • ¼ of original size. • Entropy encoder. • Entropy encoders? • Huffman. • Arithmetic encoder.
1D Z Compression
[Diagram: 8 input z values, a network of subtractors (-), Entropy Encoder.]
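The 1D case can be sketched as follows. This is a minimal illustration of second-order differencing (a difference of differences), not the ATI design: the function names are hypothetical, the Z values are assumed to be integers, and the entropy encoder that would follow is omitted.

```python
def ddpcm_encode(z):
    """Return [first value, first slope, second differences...] for a list of ints."""
    if len(z) < 2:
        return list(z)
    # First differences between neighbouring Z values.
    d1 = [z[i] - z[i - 1] for i in range(1, len(z))]
    # Differences of the differences; these are what an entropy encoder would see.
    d2 = [d1[i] - d1[i - 1] for i in range(1, len(d1))]
    return [z[0], d1[0]] + d2


def ddpcm_decode(code):
    """Invert ddpcm_encode by accumulating the slope, then the value."""
    if len(code) < 2:
        return list(code)
    z = [code[0], code[0] + code[1]]
    slope = code[1]
    for dd in code[2:]:
        slope += dd
        z.append(z[-1] + slope)
    return z


# Eight Z values with a constant slope, as along one row of a flat surface.
line = [1000, 1004, 1008, 1012, 1016, 1020, 1024, 1028]
print(ddpcm_encode(line))  # [1000, 4, 0, 0, 0, 0, 0, 0]
```

A run of Z values with constant slope leaves only zeros after the second differencing step, which is the kind of input an entropy encoder handles well.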
2D Z Compression
[Diagram: 64 pixels, 2D DDPCM, Entropy Encoder, Packer.]
Z Compression • ATI patent application 20030038803. • Two reference values MAX and MIN. • Offset values. • Windows. • Other method I don’t understand yet … • S3 patent 6,411,295. • Similar approach. • Others.
Z Compression • Method 1: • MIN and MAX per cache line/block. • 1-bit flag per pixel/Z value telling which reference value to use. • The offset from the MIN or MAX reference value is stored in the compressed output. • The offsets must be inside a window of T values (log2 T = bits per offset) from MIN and MAX.
[Diagram: Z axis from Z = 0 to Z = 1 with Zmin and Zmax marked. The MIN window ends at z = Zmin + T - 1; the MAX window starts at z = Zmax - T + 1.]
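Method 1 can be sketched as below. This is an illustrative reading of the slide, not the patented implementation: the function names are hypothetical, Z values are assumed to be integers, and a block with any value outside both windows is simply reported as not compressible.

```python
def minmax_compress(block, T):
    """Try to encode a block as (zmin, zmax, flags, offsets); return None on failure."""
    zmin, zmax = min(block), max(block)
    flags, offsets = [], []
    for z in block:
        if z - zmin < T:            # inside the window [zmin, zmin + T - 1]
            flags.append(0)
            offsets.append(z - zmin)
        elif zmax - z < T:          # inside the window [zmax - T + 1, zmax]
            flags.append(1)
            offsets.append(zmax - z)
        else:
            return None             # value outside both windows
    return zmin, zmax, flags, offsets


def minmax_decompress(zmin, zmax, flags, offsets):
    """Rebuild each Z value from its 1-bit flag and its offset."""
    return [zmin + o if f == 0 else zmax - o for f, o in zip(flags, offsets)]


block = [100, 101, 103, 900, 898, 899, 100, 900]
packed = minmax_compress(block, T=4)      # T = 4 means 2 bits per offset
print(packed)
print(minmax_decompress(*packed) == block)
```

With T = 4, each value costs 1 flag bit plus 2 offset bits, on top of the two reference values stored once per block.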
Z Compression • Method 2: • Z values are divided into upper and lower bits. • Keep UMAX and UMIN. • Calculate A = UMIN - 1, B = UMAX + 1. • 2-bit flag per pixel/Z value references the upper bits from {UMAX, UMIN, A, B}. • Lower bits per pixel/Z value are stored in the compressed output.
[Diagram: Z axis from Z = 0 to Z = 1 with Zmin, Zmax, A and B marked; labels Umin << a and Umax << a.]
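Method 2 can be sketched as below, under stated assumptions. The slide does not say how UMIN and UMAX are chosen, so this sketch takes them as inputs; A and B are computed as written on the slide. The function names and the `low_bits` parameter are hypothetical.

```python
def split_compress(block, low_bits, umin, umax):
    """Encode each Z as (2-bit flag, lower bits); return None if any upper part has no match."""
    # Candidate upper parts in flag order: {UMAX, UMIN, A, B}.
    candidates = [umax, umin, umin - 1, umax + 1]
    mask = (1 << low_bits) - 1
    flags, lows = [], []
    for z in block:
        upper = z >> low_bits
        if upper not in candidates:
            return None                       # block not compressible this way
        flags.append(candidates.index(upper))
        lows.append(z & mask)
    return flags, lows


def split_decompress(flags, lows, low_bits, umin, umax):
    """Rebuild each Z from the selected upper part and its stored lower bits."""
    candidates = [umax, umin, umin - 1, umax + 1]
    return [(candidates[f] << low_bits) | low for f, low in zip(flags, lows)]


# Hypothetical 8-bit Z values split into 4 upper and 4 lower bits.
block = [0x45, 0x5A, 0x93, 0xA1]
flags, lows = split_compress(block, low_bits=4, umin=0x5, umax=0x9)
print(flags, lows)
print(split_decompress(flags, lows, 4, 0x5, 0x9) == block)
```

Each value costs 2 flag bits plus its lower bits, so the saving depends on how many upper bits are shared.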
Z Compression • Reference values in the compressed output. • Compression flags on die. • Useful for fast clear too.
Z Cache • Normal cache? • Or ‘fetch’ cache? • Normal cache that supports a large number of active misses (miss on miss, miss on hit). • Or prefetching?
Z Cache • Fetch vs Prefetch. • Fetch needs additional state (bits) per cache line. • Fetch needs additional port to the cache tag file. • Fetch implies a large queue or stalls somewhere. • Prefetch requires a predictor. • Prefetch may request data that won’t be used (failed predictions).
Z Cache • Prefetching. • Very easy to predict next data inside a triangle (large). • Quite common (middle-small triangles). • Easy to predict next data inside a tristrip or triangle list batch. • Very common. • Hard to predict next data between batches (or meshes)? • But will happen rarely.
Z Cache • “Fetch cache” • In fact prefetching. • Texture Prefetching Architecture. • Akeley course. • Igehy, Eldridge, Proudfoot, Prefetching in a Texture Cache Architecture. • Not read yet. • Slightly different concept: • Our fetch cache accesses the tag file twice. • But in simulation the result is the same, as we are not charging for tag file accesses!! • Change the mechanism so that fetch returns a pointer to the cache line.
[Diagram: texture prefetching architecture. Labels: Rasterizer, Texture Memory, Request FIFO, FIFO, Cache Tags, Reorder Buffer, Stall, Cache Data, Texture Filter, Texture Apply.]
Stencil • Stencil and Z share a 32-bit word per pixel (stencil bits/Z bits): • 8/24. • 0/32. • 2x16 (Z only!!).
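The 8/24 and 2x16 layouts can be sketched as plain bit packing. The bit positions are an assumption (Z in the upper 24 bits, stencil in the lower 8); the slide only gives the field widths, and the function names are hypothetical.

```python
def pack_z24s8(z, stencil):
    """Pack a 24-bit Z and an 8-bit stencil into one 32-bit word (assumed layout)."""
    assert 0 <= z < (1 << 24) and 0 <= stencil < (1 << 8)
    return (z << 8) | stencil


def unpack_z24s8(word):
    """Return (z, stencil) from a 32-bit word packed by pack_z24s8."""
    return word >> 8, word & 0xFF


def pack_z16x2(z0, z1):
    """Pack two 16-bit Z values into one 32-bit word (Z-only mode, no stencil)."""
    assert 0 <= z0 < (1 << 16) and 0 <= z1 < (1 << 16)
    return (z1 << 16) | z0


word = pack_z24s8(0xABCDEF, 0x12)
print(hex(word))            # 0xabcdef12
print(unpack_z24s8(word))   # (11259375, 18)
```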
Stencil • Stencil compression: • If stencil is not active and is cleared: • Remove stencil field from compressed data. • If stencil is active or not cleared: • Compress stencil? • Independent of Z compression. • Needs more compression flag bits. • What is the average stencil value? Or log2 of the value? • How much can be saved? 8b to 1b, 2b, 4b? Is it worth it?
HZ Box • Hierarchical Z buffer. • Number of levels? • Size? • On die? • Includes: • Memory for storing the different levels. • Update mechanism. • Process requests and updates.
HZ Box • ATI model (from patents XXX, and XXX). • 2 levels. • 1st level is from original 8x8 blocks (Z cache line). • 2nd level is 2x2 (?) values from level 1. • Update mechanism: • Z Max (or Z Min) from the Z encoder (compressor) for an 8x8 block (cache line). • Combining cache for level 2 (?). • Write and update on eviction from combining cache (?).
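The two-level structure can be sketched as two max reductions. This assumes Z Max is the stored reference and takes the 2x2 grouping for level 2 at face value, although the slide marks it as uncertain; the function names are hypothetical, and the combining cache is not modeled.

```python
def reduce_max(grid, n):
    """Collapse each n x n tile of a 2D list into its maximum value."""
    rows, cols = len(grid), len(grid[0])
    return [[max(grid[y][x]
                 for y in range(by, min(by + n, rows))
                 for x in range(bx, min(bx + n, cols)))
             for bx in range(0, cols, n)]
            for by in range(0, rows, n)]


def build_hz(zbuf):
    """Return (level 1, level 2): per 8x8 block Z Max, then 2x2 groups of level 1."""
    l1 = reduce_max(zbuf, 8)
    l2 = reduce_max(l1, 2)
    return l1, l2


# A 16x16 Z buffer with integer depths; one far pixel in the top-right block.
zbuf = [[50] * 16 for _ in range(16)]
zbuf[3][12] = 90
l1, l2 = build_hz(zbuf)
print(l1)  # [[50, 90], [50, 50]]
print(l2)  # [[90]]
```

A single far pixel raises the Z Max of its level 1 block and of the level 2 entry above it, which is why level 2 rejects less often than level 1.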
HZ Test • Compares the incoming Z value from a graphics object to the reference Z value stored in one or more of the Hierarchical Z levels. • What can be tested: • Triangle Z (or 3 vertex Z). • Cull a whole triangle. • Blocks of fragments: • Good for recursive descent or tiled!! • Large blocks to level 2. • 8x8 (or less) blocks to level 1. • Stamps (2x2) or fragments: • Against level 1 (slow access? fast update?). • Against level 2 (fast access? slow update?).
[Diagram: HZ L2 and HZ L1.]
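A block-level HZ test against both levels can be sketched as below. The assumptions are a "pass if z < stored z" depth test, Z Max stored at each level, and a 2x2 grouping of level 1 entries under each level 2 entry; the function name and return strings are hypothetical.

```python
def hz_test_block(incoming_zmin, bx, by, l1, l2):
    """Test a block of fragments against level 2 first, then level 1.

    incoming_zmin is the nearest Z of the incoming block; (bx, by) index level 1.
    """
    if incoming_zmin >= l2[by // 2][bx // 2]:
        return "culled at level 2"    # behind everything in the 2x2 group
    if incoming_zmin >= l1[by][bx]:
        return "culled at level 1"    # behind everything in this 8x8 block
    return "needs per-fragment Z test"


# Example reference values: Z Max per 8x8 block and per 2x2 group of blocks.
l1 = [[50, 90], [50, 50]]
l2 = [[90]]
print(hz_test_block(95, 0, 0, l1, l2))
print(hz_test_block(60, 0, 0, l1, l2))
print(hz_test_block(60, 1, 0, l1, l2))
```

Testing the coarse level first rejects large occluded blocks with one comparison; anything that survives both levels still goes to the Z Test box.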
Traces • I stalled Carlos's work, so this is delayed until next week.
Web • I’m writing my web page. • GPU3D page? • Public/private.