Learn about lossy and lossless compression methods for reducing data size in game executables, map and geometry data, and chat text. Explore techniques such as Huffman coding, arithmetic coding, and ad hoc dictionary methods.
KIPA Game Engine Seminars Day 17 Jonathan Blow Ajou University December 14, 2002
Data Compression • You end up paying for the data you transmit • So you want to keep it small • Before we talked about high-level methods of doing this (deciding what to transmit) • Now we will do low-level methods (how to transmit)
Lossy/Lossless compression • Need lossless compression for game executables, map/geometry data, chat text • Need lossy compression for most world state (positions, orientations, etc)
Forms of Lossless Compression • Huffman Coding • Arithmetic Coding • ad hoc dictionary methods • Like bigram/trigram compression
Huffman Coding • Break input into some number of symbols • Build a binary tree based on symbol probabilities • Assign 1 code bit to each branch of the binary tree • This gives you an unambiguous set of code symbols of varying lengths • Disadvantage: need discrete input symbols!
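As a rough illustration of the tree-building steps on this slide, here is a minimal Python sketch. The function name `huffman_codes` and the choice of single characters as symbols are assumptions made for the example; it is not taken from the seminar source.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Return a dict mapping each symbol in text to its bit string."""
    freq = Counter(text)
    if len(freq) == 1:
        # A lone symbol still needs one bit so the output is decodable.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tiebreak, tree). A tree is either a symbol
    # (leaf) or a (left, right) tuple. The unique tiebreak keeps Python
    # from ever comparing two trees directly.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees into one node.
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, tiebreak, (t1, t2)))
        tiebreak += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")   # one code bit per branch
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix
    walk(heap[0][2], "")
    return codes
```

Frequent symbols end up near the root and get short codes; no code is a prefix of another, which is what makes the variable-length output unambiguous.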
Arithmetic Coding • Arithmetic coding can be thought of as a continuous version of Huffman coding • Won’t go into the math here, but it’s readily available on the Web • See Charles Bloom’s PPMZ page: • http://www.cbloom.com/src/ppmz.html
Probabilistic Modeling • Don’t just want to count the probability of individual symbols • Languages have patterns of probability that can be exploited • Examples in English of “qu”, “ssi”, “zx” • You want to model your symbol probabilities on this for good compression
ad hoc methods • “bigram/trigram compression” for English text exploits these common language patterns. • Sentence: • “This is a test of a compressor”. • Probably “thi”, “is”, “st”, “ss” get picked out as n-gram symbols
Generalized vs Specialized Compression • Generalized compression tries to make data small without knowing specific things about the data • With network data this can be a problem • Division of data into tokens/symbols, what if the source data is not all the same size? • By putting specialized compression into your game, you can do much better • We will talk about specialized compression for the rest of today • Lossy compression is usually highly specialized
Putting data into a network or file buffer • Network messages and files are handled with the byte as the basic data unit • You can’t have something be a fractional number of bytes • So we can store all our values in sizes of 1-byte multiples also • A value between 0-20000 takes 2 bytes • A value between 0-99 takes 1 byte
But actually… • If we want to avoid transmitting wasted bits, we should not pack them into the buffer in the first place • Our values can be sized in multiples of 1 bit, not 1 byte • Examples: 0-99 fits in 7 bits, 0-20000 fits in 15 bits
How do we know how many bits we need? • Compute ceil(log2(N)), where N is the number of distinct values (range_max + 1 for a range of 0 to range_max) • Earlier this week we talked about fast ways to compute this value
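A minimal sketch of the bit-count computation, in Python. It assumes the argument is the number of distinct values to encode (so the range 0-99 is 100 values); the name `bits_needed` is hypothetical. Integer `bit_length` is used here instead of a floating-point log2 to sidestep rounding trouble near powers of two.

```python
def bits_needed(num_values):
    """Bits required to tell num_values distinct values apart."""
    if num_values < 1:
        raise ValueError("need at least one possible value")
    # (n - 1).bit_length() equals ceil(log2(n)) for every n >= 1.
    return (num_values - 1).bit_length()
```

With this, the two running examples come out to 7 bits for 0-99 and 15 bits for 0-20000, instead of 8 and 16 when rounded up to whole bytes.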
Code review of Bit_Packer • and Bit_Unpacker • Functions that try to put variable-bit-size values into buffers, and get them out, as quickly as possible • “Use math, not strings!”
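The engine's Bit_Packer source is not reproduced in these slides, so the following is only a sketch of the general idea in Python. The class names `BitPacker` and `BitUnpacker` and their methods are assumptions. It follows the "use math, not strings" advice by shifting and masking integers; a production version would work on fixed-size words for speed.

```python
class BitPacker:
    """Accumulates variable-width values into one bit stream."""
    def __init__(self):
        self.acc = 0      # every bit packed so far, as one integer
        self.nbits = 0    # how many bits of acc are in use

    def pack(self, value, bits):
        if not 0 <= value < (1 << bits):
            raise ValueError("value does not fit in the requested bits")
        self.acc = (self.acc << bits) | value
        self.nbits += bits

    def to_bytes(self):
        # Pad with zero bits only once, at the very end of the stream.
        pad = (-self.nbits) % 8
        return (self.acc << pad).to_bytes((self.nbits + pad) // 8, "big")


class BitUnpacker:
    """Reads values back in the same order and widths they were packed."""
    def __init__(self, data):
        self.acc = int.from_bytes(data, "big")
        self.remaining = len(data) * 8

    def unpack(self, bits):
        if bits > self.remaining:
            raise ValueError("read past end of buffer")
        self.remaining -= bits
        return (self.acc >> self.remaining) & ((1 << bits) - 1)
```

Packing a 7-bit value and a 15-bit value this way takes 22 bits, which goes out as 3 bytes instead of the 4 bytes that byte-aligned storage would need.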
But packing whole bits is wasteful… • Example of a first person shooter with 5 object types and 3 player states • Player, armor, health, ammo, rocket • Alive, dead, observer • These quantities need 3 bits and 2 bits • 15 possibilities, but we have room for 32.
Another way to divide 15 possibilities • A 5x3 grid • One possibility in each grid square • Only one square wasted if we pack into 4 bits • How do we encode this?
Code review of Multiplication_Packer • Uses integer divide/mod to decode • Decoding slower than encoding • May be bad if you are talking about file loading • May be exactly what you want, if you want a server with many clients to run quickly • Because the labor of unpacking is distributed among all the clients, but the labor of packing is concentrated on the server
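The multiply-to-encode, divide/mod-to-decode idea can be sketched in a few lines of Python. This is an illustration under assumed names (`mul_pack`, `mul_unpack`), not the engine's Multiplication_Packer.

```python
def mul_pack(values_and_limits):
    """Combine (value, limit) pairs, each with 0 <= value < limit, into one integer."""
    acc = 0
    for value, limit in values_and_limits:
        if not 0 <= value < limit:
            raise ValueError("value out of range")
        acc = acc * limit + value      # encoding: one multiply and one add
    return acc

def mul_unpack(acc, limits):
    """Recover the values; limits must be given in the original packing order."""
    out = []
    for limit in reversed(limits):     # last value packed comes out first
        acc, value = divmod(acc, limit)  # decoding: integer divide and mod
        out.append(value)
    out.reverse()
    return out
```

For the shooter example, object type 4 (of 5) and player state 2 (of 3) pack to 4 * 3 + 2 = 14, which fits in 4 bits rather than the 5 bits needed when each field gets whole bits of its own.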
The Multiplication_Packer • is a sort of primitive version of an arithmetic coder (without the prediction component)
Transmitting scalars • Two choices: • Quantize them into integers, and pack the integers • With Multiplication_Packer or Bit_Packer or whatever • Store a reduced-accuracy floating point number
Code review of Float_Encoder • Class in the engine • Converts a 32-bit float to a smaller number of bits • Returns the result as an integer • Though it is really in floating-point format
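One simple way to get a reduced-accuracy float is to keep the sign and exponent and drop low mantissa bits. The Python sketch below does only that; the function names are hypothetical, and the engine's Float_Encoder may well differ (for instance, it could also shrink the exponent field).

```python
import struct

def encode_float(x, mantissa_bits):
    """Return x as an integer holding sign, 8 exponent bits, and the
    top mantissa_bits bits of the 23-bit IEEE mantissa (truncated)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # raw 32-bit pattern
    drop = 23 - mantissa_bits
    return bits >> drop

def decode_float(code, mantissa_bits):
    """Rebuild a float from encode_float output; dropped bits become zero."""
    drop = 23 - mantissa_bits
    return struct.unpack("<f", struct.pack("<I", code << drop))[0]
```

With `mantissa_bits=7` the result is a 16-bit integer that is still laid out like a floating-point number, so it can go straight into a bit packer.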
Rounding Scalars during Quantization • Proper rounding can save you 1 bit of data • Need to think about whether you want to preserve 0 or 1 • Or whether it’s okay to add energy (if not, don’t round!)
Figure 1: Four methods of quantizing real numbers. 1a: Method TL (truncate, reconstruct at left). 1b: Method TC (truncate, reconstruct at center). 1c: Method RL (round, reconstruct at left). 1d: Method RC (round, reconstruct at center; big screw-up!). The yellow notches represent the integers; red arrows show which real numbers map to each integer. Blue dots show which real numbers will be reconstructed.
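The four methods in Figure 1 differ only in two switches: truncate or round when encoding, and reconstruct at the left edge or the center of the interval when decoding. A small Python sketch, with hypothetical function names and a `step` parameter assumed for the quantization interval:

```python
import math

def quantize(x, step, rounding):
    """Map a real number to an integer index.
    rounding=False truncates (floor); rounding=True rounds to nearest."""
    q = x / step
    return math.floor(q + 0.5) if rounding else math.floor(q)

def reconstruct(i, step, center):
    """Map an integer index back to a real number.
    center=False returns the left edge; center=True the interval midpoint."""
    return (i + 0.5) * step if center else i * step
```

TC and RL both keep the worst-case error at half a step, which is the "1 bit saved" relative to TL. RL reconstructs 0 exactly while TC does not, and RC stacks both half-step offsets, so its error can reach a full step.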
Figure 2a: The format of a 32-bit IEEE floating-point number: sign (1 bit), exponent (8 bits), mantissa (23 bits). Figure 2b: Adding a rounding constant before truncating the mantissa: input mantissa 01100111010 + rounding constant 00000001000 = sum 01101000010, truncated sum 0110100. (The highlighted area represents the number of bits we want to truncate to.)
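The arithmetic in Figure 2b can be checked with a few lines of Python. The function name and parameters are assumptions for the example; the bit patterns are the ones from the figure (an 11-bit mantissa cut down to 7 bits).

```python
def round_mantissa(mantissa, total_bits, keep_bits):
    """Round a total_bits-wide mantissa to its top keep_bits bits."""
    drop = total_bits - keep_bits
    if drop == 0:
        return mantissa
    half = 1 << (drop - 1)           # rounding constant: half of the last kept bit
    # Note: the sum can carry out of the top bit. When this is done on the
    # full float bit pattern, that carry lands in the exponent field.
    return (mantissa + half) >> drop
```

For the figure's input, the rounding constant is 00000001000 and the result is 0110100; plain truncation without the constant would give 0110011 instead.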
Transmitting Vectors • 2D example • Want to minimize the mean error between input points and quantized points • This is the same thing we were doing with 1D quantization, but having two dimensions has complicated things
Discussion of error in 2D • Relation to vector-valued variance • Minimizing the error means finding a regular tessellation of the plane that is compact • A circle is the most compact shape for a given area • But we can’t tessellate a plane with equal-sized circles!
Demo of numerical optimization program • Just simulating a bunch of points that repel each other • What do we end up with?
Hexagonal Grid • Best way to quantize 2D space • An analogous form for 3D • Can be indexed as a projected cubic grid • See file hex_indexing.txt in the source distribution
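The contents of hex_indexing.txt are not shown here, so the sketch below uses a commonly used indexing scheme instead: axial coordinates (q, r) with a third coordinate s = -q - r, which treats the hex grid as the plane q + r + s = 0 sliced out of a cubic grid. Function names, the pointy-top orientation, and the `size` parameter (center-to-corner distance) are all assumptions for the example.

```python
import math

def hex_quantize(x, y, size):
    """Return the (q, r) index of the hexagon containing point (x, y)."""
    # Fractional axial coordinates for a pointy-top hex layout.
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    s = -q - r
    # Round each cube coordinate, then repair the one with the largest
    # rounding error so that q + r + s == 0 holds again.
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr

def hex_center(q, r, size):
    """Reconstruct the center point of hexagon (q, r)."""
    return size * math.sqrt(3) * (q + r / 2), size * 1.5 * r
```

Every input point reconstructs to a center no farther than `size` away, and the two small integers (q, r) are what would actually be packed and transmitted.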
What about unit vectors? • Unit vectors are confined to the surface of a sphere • That represents a curved space, which makes things more difficult • Descriptions of standard ideas • Quantized azimuth/elevation • Weird ways of slicing up a sphere
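Of the standard ideas listed, quantized azimuth/elevation is the easiest to sketch. The Python below is one possible version with hypothetical function names; it truncates when encoding and reconstructs at the cell center (method TC from Figure 1), and assumes z is the "up" axis.

```python
import math

def encode_direction(v, az_bits, el_bits):
    """Quantize a unit vector (x, y, z) into two small integers."""
    x, y, z = v
    az = math.atan2(y, x)                       # azimuth, -pi .. pi
    el = math.asin(max(-1.0, min(1.0, z)))      # elevation, -pi/2 .. pi/2
    az_n, el_n = 1 << az_bits, 1 << el_bits
    ia = int((az + math.pi) / (2 * math.pi) * az_n) % az_n   # wraps around
    ie = min(int((el + math.pi / 2) / math.pi * el_n), el_n - 1)
    return ia, ie

def decode_direction(ia, ie, az_bits, el_bits):
    """Rebuild a unit vector from the center of the quantization cell."""
    az_n, el_n = 1 << az_bits, 1 << el_bits
    az = (ia + 0.5) / az_n * 2 * math.pi - math.pi
    el = (ie + 0.5) / el_n * math.pi - math.pi / 2
    c = math.cos(el)
    return (c * math.cos(az), c * math.sin(az), math.sin(el))
```

The cells of this grid crowd together near the poles, so bits are spent unevenly over the sphere; that unevenness is the curved-space difficulty the slide refers to.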
This time I use energy minimization directly • Compute a set of dictionary vectors • Dot product to find the best dictionary vector for any input • You can BSP the dictionary if you want speed
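The lookup step can be sketched as a brute-force search in Python. The function name is hypothetical, and the six axis directions below are only a stand-in dictionary for the demo; a real dictionary would come from the energy minimization and be much larger.

```python
# Stand-in dictionary (assumption): the six axis directions.
AXES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def nearest_dictionary_index(v, dictionary):
    """Index of the dictionary vector with the largest dot product with v.
    For unit vectors, the largest dot product means the smallest angle."""
    best_index, best_dot = 0, float("-inf")
    for i, d in enumerate(dictionary):
        dot = sum(a * b for a, b in zip(v, d))
        if dot > best_dot:
            best_index, best_dot = i, dot
    return best_index
```

Only the index is transmitted, using ceil(log2(dictionary size)) bits; the receiver holds the same dictionary and looks the vector up. The linear scan is what a BSP over the dictionary would replace when speed matters.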
Quaternions • Are just 4D unit vectors • You can use the energy minimization solution directly • Though, maybe it takes too much RAM if you need a lot of precision • Remember you save 1 bit due to the double cover!
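The double-cover saving comes from the fact that q and -q describe the same rotation. A minimal Python sketch, assuming a (w, x, y, z) component order and a hypothetical function name:

```python
def canonicalize(q):
    """Return whichever of q and -q has a non-negative w component."""
    w, x, y, z = q
    if w < 0:
        return (-w, -x, -y, -z)
    return q
```

After this step only half of the 4D sphere is ever used, so the dictionary (or any other encoding) needs to cover just that half, which is the 1 bit saved.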
Using Probability Modeling • For walking around in an MMORPG