KIPA Game Engine Seminars Day 15 Jonathan Blow Seoul, Korea December 12, 2002
Bit Tricks • Generating Bit Masks • Is some number a power of two? • Avoiding ‘if’ statements (branch prediction) • Floating-point absolute value • Floating-point compare • Floating-point log2
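Generating Bit Masks • Suppose we want a mask covering the low n bits of a machine word • We can generate it with a loop that sets one bit per iteration • The loop computes the sum 2^0 + 2^1 + ... + 2^(n-1) • The identity 2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1 lets us do something faster: (1 << n) - 1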
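A minimal sketch of both approaches, assuming a 32-bit word. The function names `low_mask_loop` and `low_mask` are made up for illustration.

```cpp
#include <cstdint>

// Loop version: set one bit per iteration, i.e. sum 2^i for i in [0, n).
inline uint32_t low_mask_loop(unsigned n) {
    uint32_t mask = 0;
    for (unsigned i = 0; i < n && i < 32; ++i) {
        mask |= (uint32_t{1} << i);
    }
    return mask;
}

// Closed form: 2^n - 1.
// Shifting a 32-bit value by 32 is undefined in C++, so n >= 32 is special-cased.
inline uint32_t low_mask(unsigned n) {
    return n >= 32 ? 0xFFFFFFFFu : (uint32_t{1} << n) - 1u;
}
```

For example, `low_mask(5)` yields binary 11111.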
Is some number a power of two? • A power of two has exactly one bit set, somewhere in the word • The power of two minus one is a bit mask like the ones we just looked at • ANDing them together produces 0 (any other nonzero number gives a nonzero result)
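The test can be sketched as follows. The explicit zero check is an addition here, since 0 & (0 - 1) is also 0 even though zero is not a power of two. The name `is_power_of_two` is illustrative.

```cpp
#include <cstdint>

// True when exactly one bit of x is set.
// x & (x - 1) clears the lowest set bit, so the result is 0 only for
// a single-bit value (or for x == 0, which is rejected separately).
inline bool is_power_of_two(uint32_t x) {
    return x != 0 && (x & (x - 1)) == 0;
}
```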
Counting the number of set bits in a machine word • Slow loop version • “Trick” O(num set bits) version • Discussion of tree version
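The three versions might look like the sketch below, assuming a 32-bit word. The function names are made up, and the tree version shown is the usual pairwise-sum reduction, which may differ in detail from the one discussed in the talk.

```cpp
#include <cstdint>

// Slow version: inspect every bit position.
inline unsigned popcount_loop(uint32_t x) {
    unsigned count = 0;
    for (int i = 0; i < 32; ++i) {
        count += (x >> i) & 1u;
    }
    return count;
}

// O(number of set bits): x & (x - 1) clears the lowest set bit each pass.
inline unsigned popcount_clear_lowest(uint32_t x) {
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;
        ++count;
    }
    return count;
}

// Tree version: add neighbouring 1-bit fields, then 2-bit, 4-bit, 8-bit, 16-bit.
inline unsigned popcount_tree(uint32_t x) {
    x = (x & 0x55555555u) + ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x & 0x0F0F0F0Fu) + ((x >> 4) & 0x0F0F0F0Fu);
    x = (x & 0x00FF00FFu) + ((x >> 8) & 0x00FF00FFu);
    x = (x & 0x0000FFFFu) + (x >> 16);
    return x;
}
```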
Pentium 4 “fireball” • A 16-bit integer unit at the core of the chip that runs at very high clock speeds • 32-bit integer operations are pipelined through the fireball as multi-stage 16-bit operations • Pipeline is organized for bits to flow from bottom to top of the word (as with addition and subtraction) • Right-shifts require a dependency that goes in the opposite direction (slower!)
“How many bits does it take to store this range of values?” • Application: network or file I/O • Want ceil(log2(n_max + 1)) bits, assuming the values go from 0 to n_max • Slow floating-point versions • Fast bit-extraction versions
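Two ways to get the bit count without calling a log function, as a sketch only. Both function names are hypothetical, and the second version assumes `double` is a 64-bit IEEE-754 value; the talk's own versions are not reproduced in the slides.

```cpp
#include <cstdint>
#include <cstring>

// Shift until nothing is left: the answer is the index of the highest set bit, plus one.
inline unsigned bits_needed_shift(uint32_t n_max) {
    unsigned bits = 0;
    while (n_max != 0) {
        ++bits;
        n_max >>= 1;
    }
    return bits;
}

// Convert to double (exact for any 32-bit integer) and read the exponent field.
// Biased exponent minus 1023 is floor(log2(n_max)); adding 1 gives the bit count.
inline unsigned bits_needed_ieee(uint32_t n_max) {
    static_assert(sizeof(double) == sizeof(uint64_t), "assumes 64-bit IEEE-754 double");
    if (n_max == 0) return 0;  // a single possible value needs no bits
    double d = static_cast<double>(n_max);
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return static_cast<unsigned>((bits >> 52) & 0x7FFu) - 1022u;
}
```

Values 0 to 4 need 3 bits, which is why the count is based on n_max + 1 distinct values rather than on n_max itself.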
Floating-Point log2 • Show slow version • Fast version utilizing the IEEE-754 format
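A sketch of the idea, assuming `float` is a 32-bit IEEE-754 value and the input is positive, finite, and normalized. It returns only the integer part of log2; the function names are illustrative.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Slow reference: go through the math library.
inline int floor_log2_slow(float x) {
    return static_cast<int>(std::floor(std::log2(x)));
}

// Fast version: bits 23..30 hold the exponent with a bias of 127.
// Not valid for zero, negative numbers, denormals, infinities, or NaNs.
inline int floor_log2_fast(float x) {
    static_assert(sizeof(float) == sizeof(uint32_t), "assumes 32-bit IEEE-754 float");
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // well-defined way to read the bit pattern
    return static_cast<int>((bits >> 23) & 0xFFu) - 127;
}
```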
Fast absolute value • Utilizing IEEE-754 floating point format
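In IEEE-754 the sign lives in the top bit, so clearing it gives the absolute value. A minimal sketch, assuming a 32-bit `float`; `fast_fabs` is a made-up name.

```cpp
#include <cstdint>
#include <cstring>

// Clear bit 31 (the sign bit) and leave exponent and mantissa untouched.
inline float fast_fabs(float x) {
    static_assert(sizeof(float) == sizeof(uint32_t), "assumes 32-bit IEEE-754 float");
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits &= 0x7FFFFFFFu;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}
```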
Fast floating-point compare • Description of how x86 machines compare floating-point numbers • Get at least one of them onto the x87 register stack • Perform the ‘fcomp’ instruction • Store the floating-point status word, where ‘fcomp’ leaves its condition codes • Bit-mask it to see if the desired condition flag is set
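The slide does not spell out the faster alternative. One widely used option, shown here as an assumption rather than as the talk's method, is to compare the IEEE-754 bit patterns as integers. The names `ordered_key` and `float_less` are hypothetical, and NaNs are not handled.

```cpp
#include <cstdint>
#include <cstring>

// Map a float to a signed integer whose ordering matches the float ordering.
// Non-negative floats already sort correctly as integers. Negative floats are
// sign-magnitude, so they are reflected to make larger magnitudes sort lower.
inline int32_t ordered_key(float x) {
    static_assert(sizeof(float) == sizeof(int32_t), "assumes 32-bit IEEE-754 float");
    int32_t i;
    std::memcpy(&i, &x, sizeof i);
    return i < 0 ? INT32_MIN - i : i;  // -0.0f maps to 0, the same key as +0.0f
}

// Integer-only "a < b" for non-NaN floats.
inline bool float_less(float a, float b) {
    return ordered_key(a) < ordered_key(b);
}
```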
Decision-making without branching • (And without writing in assembly language, to use instructions like CMOV) • Build a mask based on whether some intermediate result is negative or not • Use that to mask values and add them, or whatever you want • Examples
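A sketch of the mask technique for 32-bit integers. All names are illustrative, and the examples chosen (minimum and absolute value) are assumptions, not necessarily the ones shown in the talk.

```cpp
#include <cstdint>

// All ones (-1) if x is negative, all zeros otherwise.
// The shift is done on the unsigned value so the result is portable.
inline int32_t negative_mask(int32_t x) {
    return -static_cast<int32_t>(static_cast<uint32_t>(x) >> 31);
}

// Pick a where the mask is all ones, b where it is all zeros.
inline int32_t mask_select(int32_t mask, int32_t a, int32_t b) {
    return (a & mask) | (b & ~mask);
}

// min(a, b) without a branch. Assumes a - b does not overflow.
inline int32_t branchless_min(int32_t a, int32_t b) {
    return mask_select(negative_mask(a - b), a, b);
}

// |x| without a branch: XOR with -1 flips the bits, subtracting -1 adds one.
// Assumes x != INT32_MIN.
inline int32_t branchless_abs(int32_t x) {
    int32_t mask = negative_mask(x);
    return (x ^ mask) - mask;
}
```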
Collision Detection • Speedbox and Schnitzel as alternatives to the “prevent tunneling” raycast
Collision Detection • Don’t forget to optimize mainly for the expected case! • To miss a lot, or to hit a lot? • Example of Shock Force and the “early hit test” • We expect to miss usually! • So the early hit test was not so effective
Collision Detection • More Shock Force examples • Hierarchy of tests: bounding sphere, OBB, simple plane divide, BSP “hard case”
Profiling • Motivation • You can’t optimize unless you profile. For some reason some people think they can… they’re wrong. • Demo of sample app • Goals: • Know where the overall CPU time is being spent • May depend on which kind of behavior is happening! • Know which routines are stable and which ones are not
Profiling • Example of getting the current time on Windows • At different accuracy levels • Description of how this is slow, and why • Too slow to call very often in code!
Profiling (2) • Using the rdtsc instruction • Converting this to realtime units by calling QueryPerformanceCounter once per frame
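The conversion can be sketched as plain arithmetic on two counter samples taken one frame apart. The `TscCalibrator` type is hypothetical. In a real Windows build the inputs would come from the `__rdtsc()` intrinsic, `QueryPerformanceCounter`, and `QueryPerformanceFrequency`; they are passed in as plain numbers here so the sketch stays portable.

```cpp
#include <cstdint>

// Estimates how many rdtsc cycles elapse per second, refreshed once per frame.
struct TscCalibrator {
    uint64_t last_tsc = 0;
    uint64_t last_qpc = 0;
    double cycles_per_second = 0.0;
    bool has_sample = false;

    // Call once per frame with fresh readings of both counters.
    void on_frame(uint64_t tsc, uint64_t qpc, uint64_t qpc_frequency) {
        if (has_sample && qpc > last_qpc && qpc_frequency > 0) {
            double seconds = static_cast<double>(qpc - last_qpc) /
                             static_cast<double>(qpc_frequency);
            cycles_per_second = static_cast<double>(tsc - last_tsc) / seconds;
        }
        last_tsc = tsc;
        last_qpc = qpc;
        has_sample = true;
    }

    // Convert a cycle count measured with rdtsc into seconds.
    // Returns 0 until two samples have been seen.
    double to_seconds(uint64_t cycles) const {
        return cycles_per_second > 0.0
                   ? static_cast<double>(cycles) / cycles_per_second
                   : 0.0;
    }
};
```

A large frame-to-frame change in `cycles_per_second` is one possible sign that the clock speed has shifted under the profiler.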
Profiling (3) • Define macros that put rdtsc calls into preambles and postambles for functions • Measure and categorize CPU time this way • Measure “self time” and “hierarchical time” • Code review of macros / constructors
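One way the macro and constructor pattern could look, as a sketch under assumptions: every name here (`Zone`, `Profiler`, `ScopedZone`, `PROFILE_SCOPE`) is made up, and the clock is an injected function pointer so that an rdtsc wrapper can be plugged in. Recursive zones would count their hierarchical time more than once in this simple form.

```cpp
#include <cstdint>
#include <vector>

using ClockFn = uint64_t (*)();

struct Zone {
    const char* name;
    uint64_t self_time = 0;          // time spent in the zone, excluding child zones
    uint64_t hierarchical_time = 0;  // time spent in the zone, including child zones
    explicit Zone(const char* n) : name(n) {}
};

struct Profiler {
    struct Frame {
        Zone* zone;
        uint64_t start;
        uint64_t child_time;
    };

    ClockFn clock;
    std::vector<Frame> stack;

    explicit Profiler(ClockFn c) : clock(c) {}

    // Preamble: remember when the zone was entered.
    void enter(Zone& z) { stack.push_back({&z, clock(), 0}); }

    // Postamble: charge elapsed time to the zone and report it to the parent.
    void leave() {
        Frame f = stack.back();
        stack.pop_back();
        uint64_t total = clock() - f.start;
        f.zone->hierarchical_time += total;
        f.zone->self_time += total - f.child_time;
        if (!stack.empty()) stack.back().child_time += total;
    }
};

// Constructor runs the preamble, destructor runs the postamble.
struct ScopedZone {
    Profiler& profiler;
    ScopedZone(Profiler& p, Zone& z) : profiler(p) { profiler.enter(z); }
    ~ScopedZone() { profiler.leave(); }
};

#define PROFILE_SCOPE(profiler, zone) ScopedZone profile_scope_guard_((profiler), (zone))
```

Placing `PROFILE_SCOPE(profiler, zone);` at the top of a function times the whole body, including early returns.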
Problem with rdtsc • There’s this SpeedStep thing on Intel laptops • Change the CPU’s clock speed based on performance / temperature demands • Does not adjust rdtsc to compensate • May spread beyond laptops in the future • Power consumption of CPUs is becoming an important concern for businesses
We can detect if rdtsc is screwing up profiling data • But we can’t fix the profiling data • Solution: just draw a big warning on the screen
Division of Profiler • Low-Level Profiler • High-Level Profiler
Walkthrough of first demo app • How it uses the macros • How it collects and draws the profiling data
Measuring variance of profiling data • To figure out how stable each function is • Draw which functions are “hot” in the realtime display
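Per-function variance can be tracked incrementally, one frame at a time. The sketch below uses Welford's running update, which is a substitution chosen here for numerical stability and not something the slides specify. `RunningStats` is a hypothetical name.

```cpp
#include <cstdint>

// Running mean and variance of one function's per-frame time.
struct RunningStats {
    uint64_t count = 0;
    double mean = 0.0;
    double m2 = 0.0;  // sum of squared deviations from the current mean

    void add(double x) {
        ++count;
        double delta = x - mean;
        mean += delta / static_cast<double>(count);
        m2 += delta * (x - mean);
    }

    // Population variance of the samples seen so far.
    double variance() const {
        return count > 0 ? m2 / static_cast<double>(count) : 0.0;
    }
};
```

A function whose variance is large relative to its mean is a candidate for being flagged as unstable in the display.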
Behaviors • We would like some better analysis of what the different behaviors are for our program • Just “eyeing” the results is not very scientific • Examples of different behaviors • Fill rate limited, AI limited, etc
Batch Profiling vs Interactive Profiling • Batch profiling averages a bunch of data together over a session • Maybe it provides a way to peek at individual samples but the processing is never very convenient • Interactive profiling is about seeing results as soon as they happen • But interactive profilers are usually hacked together • What if we made a good one?
Want to detect and analyze specific behaviors • But without preconceived ideas of what they might be • Treat incoming frames of profiling data as vectors, and cluster them • Description of k-means clustering
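A sketch of one batch k-means iteration (Lloyd's algorithm), treating each frame of profiling data as a vector of per-function times. The function names and the `Vec` alias are illustrative, and all vectors are assumed to have the same length.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

using Vec = std::vector<double>;

inline double squared_distance(const Vec& a, const Vec& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

inline std::size_t nearest_center(const std::vector<Vec>& centers, const Vec& x) {
    std::size_t best = 0;
    double best_dist = std::numeric_limits<double>::max();
    for (std::size_t k = 0; k < centers.size(); ++k) {
        double d = squared_distance(centers[k], x);
        if (d < best_dist) {
            best_dist = d;
            best = k;
        }
    }
    return best;
}

// One iteration: assign every frame to its nearest center, then move each
// center to the mean of its frames. Returns true if any center moved.
// Note that it walks the whole data set, which is why batch data is required.
inline bool kmeans_step(const std::vector<Vec>& frames, std::vector<Vec>& centers) {
    if (centers.empty()) return false;
    const std::size_t dim = centers[0].size();
    std::vector<Vec> sums(centers.size(), Vec(dim, 0.0));
    std::vector<std::size_t> counts(centers.size(), 0);
    for (const Vec& f : frames) {
        std::size_t k = nearest_center(centers, f);
        ++counts[k];
        for (std::size_t i = 0; i < dim; ++i) sums[k][i] += f[i];
    }
    bool moved = false;
    for (std::size_t k = 0; k < centers.size(); ++k) {
        if (counts[k] == 0) continue;  // leave empty clusters where they are
        for (std::size_t i = 0; i < dim; ++i) {
            double m = sums[k][i] / static_cast<double>(counts[k]);
            if (m != centers[k][i]) moved = true;
            centers[k][i] = m;
        }
    }
    return moved;
}
```

Calling `kmeans_step` repeatedly until it returns false gives the usual converged clustering.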
Clustering algorithms tend to be pretty slow • And they require batch data to process • k-means needs random access to the input! • Online k-means • Faster, non-batch. But quality?
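An online variant can be sketched as below: each incoming frame nudges only its nearest center, so no stored data set is needed. The 1/count learning rate (a running mean per cluster) is an assumption, since the slides do not say which update rule was used. `OnlineKMeans` is a hypothetical name.

```cpp
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

struct OnlineKMeans {
    std::vector<std::vector<double>> centers;
    std::vector<std::size_t> counts;  // samples absorbed by each center so far

    explicit OnlineKMeans(std::vector<std::vector<double>> initial)
        : centers(std::move(initial)), counts(centers.size(), 0) {}

    // Feed one sample; returns the index of the center it was assigned to.
    std::size_t update(const std::vector<double>& x) {
        std::size_t best = 0;
        double best_dist = std::numeric_limits<double>::max();
        for (std::size_t k = 0; k < centers.size(); ++k) {
            double dist = 0.0;
            for (std::size_t i = 0; i < x.size(); ++i) {
                double d = x[i] - centers[k][i];
                dist += d * d;
            }
            if (dist < best_dist) {
                best_dist = dist;
                best = k;
            }
        }
        // Move the winner toward the sample. With rate 1/count the center is
        // the running mean of its samples; the first sample replaces the seed.
        ++counts[best];
        double rate = 1.0 / static_cast<double>(counts[best]);
        for (std::size_t i = 0; i < x.size(); ++i) {
            centers[best][i] += rate * (x[i] - centers[best][i]);
        }
        return best;
    }
};
```

The result depends on the order in which samples arrive, which is one source of the quality concern raised on the slide.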
Self-Organizing Map • “Kohonen Self-Organizing Map” • Description of the algorithm • Much like online k-means • But with coherence in a separate space
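A minimal sketch of one SOM training step on a one-dimensional map. The winning node is found in input space, as in online k-means, but the update also pulls its neighbors on the map, weighted by distance in map space. The `Som1D` type, the Gaussian neighborhood, and the fixed `rate` and `radius` parameters are illustrative assumptions; practical SOMs usually use a 2D grid and shrink both parameters over time.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

struct Som1D {
    // nodes[j] is the weight vector (input space) of the node at map position j.
    std::vector<std::vector<double>> nodes;

    explicit Som1D(std::vector<std::vector<double>> initial) : nodes(std::move(initial)) {}

    // Index of the node whose weights are closest to x in input space.
    std::size_t best_matching_unit(const std::vector<double>& x) const {
        std::size_t best = 0;
        double best_dist = std::numeric_limits<double>::max();
        for (std::size_t j = 0; j < nodes.size(); ++j) {
            double dist = 0.0;
            for (std::size_t i = 0; i < x.size(); ++i) {
                double d = x[i] - nodes[j][i];
                dist += d * d;
            }
            if (dist < best_dist) {
                best_dist = dist;
                best = j;
            }
        }
        return best;
    }

    // Pull the winner and its map neighbors toward x. Requires radius > 0.
    std::size_t train(const std::vector<double>& x, double rate, double radius) {
        std::size_t bmu = best_matching_unit(x);
        for (std::size_t j = 0; j < nodes.size(); ++j) {
            double map_dist = j > bmu ? static_cast<double>(j - bmu)
                                      : static_cast<double>(bmu - j);
            double influence = std::exp(-(map_dist * map_dist) / (2.0 * radius * radius));
            for (std::size_t i = 0; i < x.size(); ++i) {
                nodes[j][i] += rate * influence * (x[i] - nodes[j][i]);
            }
        }
        return bmu;
    }
};
```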
Demo of SOM-enabled Profiling Tool • Visualizations are still early • Hopefully they will mature into something truly useful (people in other visualization fields like SOMs, so hopes are high)
Discussions of changes made to SOM to support online clustering