Visual C++ 2005 New Optimizations. Ayman Shoukry Program Manager Visual C++ Microsoft Corporation. How can your application run faster?. Maximize optimization for each file. Whole Program Optimization (WPO) goes beyond individual files.
Visual C++ 2005 New Optimizations Ayman Shoukry Program Manager Visual C++ Microsoft Corporation
How can your application run faster? • Maximize optimization for each file. • Whole Program Optimization (WPO) goes beyond individual files. • Profile Guided Optimization (PGO) tailors optimizations to your application. • New floating point model. • OpenMP. • 64-bit code generation.
Maximum Optimization for Each File • The compiler optimizes each source file individually for the best runtime performance • The only type of optimization available in Visual C++ 6 • Visual C++ 2005 has better optimization algorithms • Specialized support for newer processors such as Pentium 4 • Improved speed and better precision of floating point operations • New optimization techniques such as loop unrolling
Whole Program Optimization • Typically Visual C++ optimizes programs by generating code for each object file separately • Introducing whole program optimization • First introduced with Visual C++ 2002 and improved since then • Enabled with new compiler and linker options (/GL and /LTCG) • The compiler has freedom to do additional optimizations • Cross-module inlining • Custom calling conventions • Visual C++ 2005 supports this on all platforms • Whole program optimization is widely used for Microsoft products.
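To illustrate what cross-module inlining can buy, here is a minimal sketch. The two "files" are shown in one listing for brevity; the file names and the functions `square` and `sum_of_squares` are hypothetical and do not come from the talk. Compiled separately, the second file sees only a declaration of `square` and has to emit a call. With /GL on the compile step and /LTCG on the link step, code generation is deferred to link time, where the body is visible and may be inlined.

```cpp
#include <cassert>
#include <cstddef>

// ---- math_utils.cpp (hypothetical file) ----
int square(int x) {
    return x * x;
}

// ---- main.cpp (hypothetical file) ----
// Only this declaration is visible when main.cpp is compiled on its own.
int square(int x);

int sum_of_squares(const int* values, std::size_t count) {
    assert(values != nullptr || count == 0);
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        // Candidate for cross-module inlining under /GL + /LTCG.
        total += square(values[i]);
    }
    return total;
}
```

Calling `sum_of_squares` on the values 1, 2, 3 returns 14 with or without whole program optimization; only the generated code differs.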
Profile Guided Optimization • Static analysis leaves many optimization questions open for the compiler, leading to conservative optimizations • Visual C++ programs can be tuned for expected user scenarios by collecting information from the running application • Introducing profile guided optimization • Optimizes code based on how customers actually use the program • Runs optimizations at link time, like whole program optimization • Available in Visual Studio 2005 • Widely adopted at Microsoft • Example: is it common for p to be NULL? If not, the error code should be grouped with other infrequently used code: if (p != NULL) { /* Perform action with p */ } else { /* Error code */ }
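The NULL check can be made concrete with a small sketch. It does by hand, at source level, roughly what a profile might lead the optimizer to do: the rarely taken error branch is moved into its own function so the common path stays compact. The function names and the return value -1 are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstring>

// Cold path: assumed to run rarely, so it lives in its own function.
int report_null_pointer() {
    return -1;  // hypothetical error code
}

// Hot path: the common case (p is not NULL) falls straight through.
int name_length(const char* p) {
    if (p != nullptr) {
        return static_cast<int>(std::strlen(p));
    }
    return report_null_pointer();
}

// Small self-check that can be called from main().
void check_name_length() {
    assert(name_length("abc") == 3);
    assert(name_length(nullptr) == -1);
}
```

With PGO the programmer does not have to guess which branch is cold; the profile data answers the question.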
PGO: Instrumentation • We instrument with “probes” inserted into the code • Two main types of probes • Value probes • Used to construct histogram of values • Count (simple/entry) probes • Used to count number of times a path is taken • We try to insert the minimum number of probes to get full coverage • Minimizes the cost of instrumentation
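A rough, hand-written picture of the two probe types is sketched below. This is not how the instrumented image is actually implemented; `ProbeData`, `classify`, and the helper functions are hypothetical names. One entry probe and one path probe are enough here, because the count for the other path can be derived by subtraction, which mirrors the idea of inserting as few probes as possible.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

struct ProbeData {
    std::size_t entry_count = 0;                 // count (entry) probe
    std::size_t positive_path_count = 0;         // count (simple) probe
    std::map<int, std::size_t> value_histogram;  // value probe
};

// Instrumented function: behaves as before, but records profile data.
int classify(int i, ProbeData& probes) {
    ++probes.entry_count;
    ++probes.value_histogram[i];
    if (i > 0) {
        ++probes.positive_path_count;
        return 1;
    }
    return 0;  // no probe needed: derived from the two counters
}

// The uninstrumented path count follows from the other two.
std::size_t other_path_count(const ProbeData& probes) {
    return probes.entry_count - probes.positive_path_count;
}

// Returns the value seen most often (smallest value wins a tie).
int most_frequent_value(const ProbeData& probes) {
    assert(!probes.value_histogram.empty());
    int best_value = probes.value_histogram.begin()->first;
    std::size_t best_count = 0;
    for (const auto& entry : probes.value_histogram) {
        if (entry.second > best_count) {
            best_value = entry.first;
            best_count = entry.second;
        }
    }
    return best_value;
}
```

After feeding the inputs 10, 10, 10, 1, -2 through `classify`, the histogram identifies 10 as the dominant value, which is the kind of information switch expansion relies on.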
PGO Optimizations • Switch expansion • Better inlining decisions • Cold code separation • Virtual call speculation • Partial inlining
Profile Guided Optimization • Compile the source with /GL and optimizations on (e.g. /O2) to produce object files • Link the object files with /LTCG:PGI to produce an instrumented image • Run scenarios against the instrumented image to output profile data • Link the object files and profile data with /LTCG:PGO to produce the optimized image
PGO: Inlining Sample • Profile guided optimization uses call graph path profiling. • [Figure: call graph with nodes a, foo, bat, bar, and baz]
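PGO: Inlining Sample (Cont) • Profile guided optimization uses call graph path profiling. • [Figure: call paths a → bar → baz, foo → bar → baz, and bat → bar → baz, annotated with call counts; counts in slide order: 10, 75; 20, 50; 100, 15, 15]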
PGO – Inlining Sample (cont) • Inlining decisions are made at each call site. • [Figure: the call graph of a, foo, bat, bar, and baz after per-call-site inlining decisions; counts in slide order: 10, 20, 125, 100, 15, 15]
PGO – Switch Expansion • Most frequent values are pulled out. • Before: // 90% of the time i = 10; switch (i) { case 1: … case 2: … case 3: … default: … } • After: if (i == 10) goto default; switch (i) { case 1: … case 2: … case 3: … default: … }
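The transformation can be written out by hand to show that it preserves behavior. In this sketch the return values are made up for illustration, and a plain `return` stands in for the slide's `goto default`; the point is only that testing the hot value first skips the switch dispatch in the common case.

```cpp
#include <cassert>

// Original form: every value goes through the switch dispatch.
int handle_plain(int i) {
    switch (i) {
        case 1:  return 100;
        case 2:  return 200;
        case 3:  return 300;
        default: return -1;
    }
}

// Expanded form: the value assumed to be hot (10) is tested first.
int handle_expanded(int i) {
    if (i == 10) {
        return -1;  // same result as the default case
    }
    switch (i) {
        case 1:  return 100;
        case 2:  return 200;
        case 3:  return 300;
        default: return -1;
    }
}

// Both forms must agree for every input.
void check_switch_expansion() {
    for (int i = -5; i <= 20; ++i) {
        assert(handle_plain(i) == handle_expanded(i));
    }
}
```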
PGO – Code Separation • Basic blocks are ordered so that the most frequent path falls through. • Default layout: A, B, C, D • Optimized layout: A, B, D, C • [Figure: control flow graph with edge counts A → B 100, A → C 10, B → D 100, C → D 10]
PGO – Virtual Call Speculation • The profiles showed that the type of object A in function Func was almost always Foo • Class hierarchy: class Base { … virtual void call(); } class Foo:Base { … void call(); } class Bar:Base { … void call(); } • Call without speculation: void Bar(Base *A) { … while(true) { … A->call(); … } } • Call with speculation: void Func(Base *A) { … while(true) { … if(type(A) == Foo:Base) { // inline of A->call(); } else A->call(); … } }
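The slide's pseudocode can be approximated in standard C++ as follows. This is a sketch under assumptions: `typeid` stands in for whatever type test the compiler actually emits, the integer return values exist only so the two versions can be compared, and the function names `run_plain` and `run_speculated` are hypothetical.

```cpp
#include <cassert>
#include <typeinfo>

struct Base {
    virtual ~Base() = default;
    virtual int call() const { return 0; }
};
struct Foo : Base {
    int call() const override { return 1; }
};
struct Bar : Base {
    int call() const override { return 2; }
};

// Every iteration pays for a virtual dispatch.
int run_plain(const Base& a, int iterations) {
    assert(iterations >= 0);
    int total = 0;
    for (int i = 0; i < iterations; ++i) {
        total += a.call();
    }
    return total;
}

// Speculated version: if the object is exactly a Foo, use the inlined
// body of Foo::call(); otherwise fall back to the virtual call.
int run_speculated(const Base& a, int iterations) {
    assert(iterations >= 0);
    int total = 0;
    for (int i = 0; i < iterations; ++i) {
        if (typeid(a) == typeid(Foo)) {
            total += 1;  // inlined body of Foo::call()
        } else {
            total += a.call();
        }
    }
    return total;
}
```

Both functions return the same totals for any object; the speculated form only pays off when the profile shows that one type dominates.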
PGO – Partial Inlining • [Figure: control flow of the callee: Basic Block 1, then Cond, which branches to Hot Code or Cold Code, followed by More Code]
PGO – Partial Inlining (cont) • The hot path is inlined, but NOT the cold path • [Figure: control flow of the callee: Basic Block 1, then Cond, which branches to Hot Code or Cold Code, followed by More Code]
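A hand-written equivalent of partial inlining might look like the sketch below. All function names and the arithmetic are hypothetical; the caller that has been "partially inlined" copies the condition and the hot code from the callee, while the cold code remains an ordinary out-of-line call.

```cpp
#include <cassert>

// Cold code: assumed to be rarely reached, so it stays out of line.
int handle_negative(int) {
    return 0;
}

// Callee with a condition, a hot path, and a cold path.
int adjust(int x) {
    if (x >= 0) {
        return x + 1;           // hot code
    }
    return handle_negative(x);  // cold code
}

// Caller before the optimization: a full call to adjust().
int caller_with_call(int x) {
    return adjust(x) * 2;       // "more code" after the call
}

// Caller after partial inlining: condition and hot code are inline,
// the cold code is still reached through a call.
int caller_partially_inlined(int x) {
    int adjusted;
    if (x >= 0) {
        adjusted = x + 1;
    } else {
        adjusted = handle_negative(x);
    }
    return adjusted * 2;
}

// Both callers must agree.
void check_partial_inlining() {
    for (int x = -3; x <= 3; ++x) {
        assert(caller_with_call(x) == caller_partially_inlined(x));
    }
}
```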
Demo Optimizing applications with VC++ 2005
New Floating Point Model • /Op made your code run slow • No intermediate switch • New Floating Point Model • /fp:fast • /fp:precise (default) • /fp:strict • /fp:except
/fp:precise • The default floating point switch • Performance and Precision • IEEE Conformant • Round to the appropriate precision • At assignments, casts and function calls
/fp:fast • When performance matters most • You know your application does simple floating point operations • What can /fp:fast do? • Association • Distribution • Factoring inverse • Scalar reduction • Copy propagation • And others…
/fp:except • Reliable floating point exceptions • Thrown and not thrown when expected • Faults and traps, when reliable, should occur at the line that causes the exception • FWAITs on x86 might be added • Cannot be used with /fp:fast or in managed code
/fp:strict • The strictest FP option • Turns off contractions • Assumes floating point control word can change or that the user will examine flags • /fp:except is implied • Low double digit percent slowdown versus /fp:fast
What is the output? • #include <stdio.h> int main() { double x, y, z; double sum; x = 1e20; y = -1e20; z = 10.0; sum = x + y + z; printf ("sum=%f\n",sum); } • /fp:fast /O2: sum = 0.000 • /fp:strict /O2: sum = 10.0
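One way to see how the two results can arise is to write both evaluation orders explicitly. The sketch below assumes IEEE double-precision evaluation (for example SSE2 code on x64) and a compiler that is not allowed to reassociate, so the parentheses are honored. Whether /fp:fast picks exactly the second grouping is not stated on the slide; the function names are hypothetical.

```cpp
#include <cassert>

// Source order: x and y cancel first, then z is added.
double sum_left_to_right(double x, double y, double z) {
    return (x + y) + z;
}

// Reassociated order: z is absorbed into y before x cancels it.
double sum_reassociated(double x, double y, double z) {
    return x + (y + z);
}

// Values from the slide: 10.0 is far below the spacing between
// adjacent doubles near 1e20, so y + z rounds back to y.
void check_association() {
    const double x = 1e20;
    const double y = -1e20;
    const double z = 10.0;
    assert(sum_left_to_right(x, y, z) == 10.0);
    assert(sum_reassociated(x, y, z) == 0.0);
}
```

Floating point addition is not associative, which is why a model that permits association can change the printed sum.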
OpenMP • A specification for writing multithreaded programs • It consists of a set of simple #pragmas and runtime routines • Makes it very easy to parallelize loop-based code • Helps with load balancing, synchronization, etc… • In Visual Studio, only available in C++
OpenMP Parallelization • Can parallelize loops and straight-line code • Includes synchronization constructs • void test(int first, int last) { #pragma omp parallel for for (int i = first; i <= last; ++i) { a[i] = b[i] + c[i]; } } • With first = 1 and last = 1000, the iterations are split into chunks: 1 ≤ i ≤ 250, 251 ≤ i ≤ 500, 501 ≤ i ≤ 750, 751 ≤ i ≤ 1000
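A self-contained variant of the slide's loop is sketched here. The function name `add_arrays` and the use of `std::vector` are assumptions; the slide's arrays a, b, and c are passed as parameters instead of being globals. With Visual C++ the pragma takes effect when the file is compiled with /openmp; without OpenMP support enabled the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element-wise sum a[i] = b[i] + c[i]. Iterations are independent,
// so they can be divided among threads.
void add_arrays(const std::vector<double>& b,
                const std::vector<double>& c,
                std::vector<double>& a) {
    assert(b.size() == c.size() && a.size() == b.size());
    // A signed loop index keeps the loop in the form OpenMP expects.
    const int n = static_cast<int>(a.size());
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        a[i] = b[i] + c[i];
    }
}
```

How the 1000 iterations are divided depends on the number of threads and the schedule; four equal chunks, as on the slide, is what a static split across four threads would produce.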
64-bit Compiler in VC2005 • 64-bit Compiler Cross Tools • The compiler is a 32-bit binary but the resulting image is 64-bit • 64-bit Compiler Native Tools • The compiler and the resulting image are both 64-bit binaries • All previous optimizations apply to 64-bit as well.
Resources • Visual C++ Dev Center • http://msdn.microsoft.com/visualc • This is the place to go for all our news and whitepapers • Also VC2005 specific forums at http://forums.microsoft.com • Myself • http://blogs.msdn.com/aymans