1. How much does Exception Handling cost, really?
Kevin Frei
Visual C++ Code Generation & Tools
http://blogs.msdn.com/freik
2. Reasons for this talk (too many assumptions)
Pros of EH I've heard:
More centralized error handling & recovery
More robust code
More readable code
Cons of EH I've heard:
Can result in people not thinking about error conditions
Can make error recovery difficult (must put handler in the right place)
Enables abuse of exceptions
3. Summary of the previous Pros & Cons
They can all be dealt with:
Coding Convention enforcement
Code Reviews
Good initial architecture
Consistent API designs
4. #1 reason I hear to not use EH
"Exception handling makes my code too slow"
May be true, but may also be masking a more serious problem
Some Facts:
EH performance cost is dependent on the runtime, CPU architecture, and ABI/OS specifics.
You can't simply examine source code to determine performance impact.
Deciding whether to use EH should depend on the team, the libraries you're using, and a myriad of other issues.
5. Classes of Code Quality impact
Usage Penalty [EH tax]
General overhead of a function with any EH construct
Cost of entering a protected region
__try{}, try{}, C++ object with a destructor
Cleanup costs
__finally invocation
C++ object destructors
Optimization constraints
Cost of actually handling an exception
If you're really concerned about this, you're probably abusing exceptions.
6. EH tax for Structured Exception Handling
X86
All functions with SEH contain a complex prolog & epilog
X64
No required cost to the function itself
7. EH tax for C++ exception handling
X86
All functions with C++ EH contain a complex prolog & epilog
X64
1 additional DWORD allocated on stack, initialized to -2
Never used again in the function's code
It's used by the C++ runtime in the event of an exception being thrown or caught.
8. Protected Region entry & exit costs
X86
Entry & exit from any protected region requires a 1 or 4 byte constant value written to the stack
/EHs can reduce this cost
/EHa may be required by your code base, though
X64
If an entry or an exit is preceded by a call, there is a single byte NOP to properly identify region boundaries
Entry preceded by a call is pretty common for C++ EH (constructors)
9. Non-exception cleanup costs
X86
SEH: __finally clause is called
[current implementation, not required]
call/ret overhead
Some other minor register allocation issues
C++EH: Destructor invoked inline [C++ standard]
Destructor can be inlined, based on compiler (& user) decision
X64
SEH: __finally clause inlined [zero overhead]
[again, current implementation, not required]
C++EH: same as x86
10. Optimization Constraints Disclaimer
Consider the complete alternative solution!
HRESULT checking is messy, and error prone
The goto solution to handle termination can result in pessimized dataflow
Most optimizations that must be constrained for EH should be constrained for implementations that don't use EH.
11. Optimization constraints
Mandatory optimization constraints
Limitations required by the language standard
ABI specific limitations
Current Implementation constraints
I'll focus on UTC (current optimizer) in VC8
Code base from VC5 origins.
Many constraints that exist in earlier versions have been removed
12. Mandatory optimization constraints: Language specific limitations
The C++ language standard does not specify anything about non-C++ throw exceptions!
The C language standard does not specify anything about exceptions at all, really.
[I know nothing about C99]
13. Language specific limitations: C++
Flow from try to catch (and out):
Results in additional flow edges at call sites that may throw exceptions
Variable values must be updated accordingly
Slightly less constant propagation, common subexpression elimination, dead stores, etc
/EHs: assume only the C++ throw statement can cause an exception
Prior to VC8.0, you could compile /EHs, and even with an AV, most destructors would be invoked.
For VC8.0 /EHs:
If you throw a C++ exception, destructors will be run.
If any other exception occurs, no destructors will run.
Results in obvious problems, rather than only seeing the problem when your customer's computer starts crashing.
Discuss __declspec(nothrow) vs throw()
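A minimal sketch of the "C++ throw runs destructors" case, written in portable standard C++ so it does not exercise any compiler switch. The `events` log and the `run` helper are hypothetical names introduced only for illustration; `obj` mirrors the shape of the sample class used in the source listings later in the talk.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical event log so the order of operations can be inspected.
std::vector<std::string> events;

struct obj {
    obj()  { events.push_back("init"); }
    ~obj() { events.push_back("done"); }  // invoked while the throw unwinds
};

void foo() { throw std::runtime_error("foo failed"); }

// Returns true if the exception thrown by foo() was caught.
bool run() {
    try {
        obj a;
        foo();
        events.push_back("after foo");    // skipped because foo() throws
    } catch (const std::runtime_error&) {
        events.push_back("catch");
        return true;
    }
    return false;
}
```

After `run()` returns, `events` holds "init", "done", "catch" in that order: the destructor runs before the catch body does. What happens on a non-C++ exception such as an AV depends on the /EH switch, as described above, and is not modeled here.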
14. Language specific limitations: /EHa
/EHa: all exceptions should be considered when destroying C++ objects
Results in far more potential flow from a try block to a catch block
Less stack packing (no stack pack prior to VC8)
Much less constant propagation, common sub expression elimination, etc
15. Quick /EHc description
Only has impact with /EHs
Tells the compiler that any extern "C" function will not throw any C++ exceptions
Win32 API calls fall under this class
Sometimes true, sometimes not: be careful.
Only side effect is pruning a few additional edges in the flow graph
A few more opportunities for optimization
16. Mandatory Optimization Constraints: Win32/Win64 ABI specific limitations
Tail-call (call/return -> jump) is illegal inside a protected region
Instruction level performance hit is typically negligible
Stack usage increase (can be serious)
Instruction scheduling constraints
Scheduling into & out of handler regions is limited
Rarely worth doing, even if it is legal
Catch, filter, and finally are all funclets, and thus scheduling into & out of them is almost impossible, and not likely to matter.
Try blocks (& object lifetimes) can only
17. VC8.0 optimization constraints
No impact on any functions that do not contain some EH construct
Sometimes requires the programmer to add volatile to get the required constraints in a function invoked inside a try
Exception handling is only one of a large number of things that can artificially constrain optimizations
setjmp/longjmp (old school EH in C)
_alloca
__declspecs
/GS
/fp:except, /fp:precise, /fp:strict
Many many more.
18. VC8.0 optimization constraints: Specifics
Late flow optimizations for x64
Primarily head & tail merging
Loop optimizer disabled (all platforms) for any function with a try/__try
Loop unrolling/peeling
Induction variable creation
Some strength reduction
Doesn't impact functions with only C++ objects!
Stack Packing restrictions
Prior to VC8, all variables inside a try block were written back to the stack whenever their values were updated
With VC8, only variable values that may be visible outside of the try are written back to the stack.
19. Source code used for samples
SEH Version
void seh_finally() {
    init();
    __try {
        foo();
        bar();
        blah();
    } __finally {
        done();
    }
}
C++ Version
struct obj {
    obj() { init(); }
    ~obj() { done(); }
};
void cpp_dtor() {
    obj a;
    foo();
    bar();
    blah();
}
20. Generated code for x86 SEH /O2
push ebp
mov ebp, esp
push -1
push OFFSET __sehtable$?seh_finally@@YAXXZ
push OFFSET __except_handler3
mov eax, DWORD PTR fs:0
push eax
mov DWORD PTR fs:0, esp
sub esp, 8 ;End Prolog
call init
mov DWORD PTR __$SEHRec$[ebp+20], 0 ;Enter __try
call foo
call bar
call blah
mov DWORD PTR __$SEHRec$[ebp+20], -1 ;Exit __try
call $seh_finally_funclet ;Invoke __finally
mov ecx, DWORD PTR __$SEHRec$[ebp+8] ;Begin Epilogue
mov DWORD PTR fs:0, ecx
mov esp, ebp
pop ebp
ret 0
$seh_finally_funclet:
call done
ret 0
21. Generated code for x86 SEH /O1
push 8
push OFFSET __sehtable$seh_finally
call __SEH_prolog ;End Prologue
call init
and __$SEHRec$[ebp+20], 0 ;Entry __try
call foo
call bar
call blah
or __$SEHRec$[ebp+20], -1 ;Exit __try
call $seh_finally_funclet ;Invoke __finally
call __SEH_epilog ;Begin Epilogue
ret 0
$seh_finally_funclet:
call done
ret 0
22. Generated code for x86 C++ /O2
push -1
push __ehhandler$?cpp_dtor@@YAXXZ
mov eax, DWORD PTR fs:0
push eax
mov DWORD PTR fs:0, esp ;End Prologue
push ecx ;allocate space for obj
call init ;obj() inlined
mov DWORD PTR __$EHRec$[esp+24], 0 ;Enter try
call foo
call bar
call blah
mov DWORD PTR __$EHRec$[esp+24], -1 ;Exit try
call done ;~obj() inlined
mov ecx, DWORD PTR __$EHRec$[esp+16] ;Begin Epilogue
mov DWORD PTR fs:0, ecx
add esp, 16
ret 0
23. Generated code for x86 C++ /O1
mov eax, __ehhandler$?cpp_dtor@@YAXXZ
call __EH_prolog ;End Prologue
push ecx ;allocate space for obj
call init ;obj() inlined
and DWORD PTR __$EHRec$[ebp+8], 0 ;Entry try
call foo
call bar
call blah
or DWORD PTR __$EHRec$[ebp+8], -1 ;Exit try
call done ;~obj() inlined
mov ecx, DWORD PTR __$EHRec$[ebp] ;Begin Epilogue
mov DWORD PTR fs:0, ecx
leave
ret 0
24. Generated code for x86 No EH (/O1 & /O2 are basically identical)
push esi ;Save nonvolatile register for result
call init
call foo_err
mov esi, eax ;Save return code
test esi, esi ;Return code check
jne SHORT $fail
call bar_err
mov esi, eax ;Save return code
test esi, esi ;Return code check
jne SHORT $fail
call blah_err
mov esi, eax ;Save return code
$fail:
call done
mov eax, esi ;Return result
pop esi
ret 0
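The slides do not show the source for the No EH variant. The following is one plausible reconstruction, assuming `foo_err`, `bar_err`, and `blah_err` return 0 on success and a nonzero code on failure. The stub bodies, the `fail_at` knob, the `cleaned_up` flag, and the name `no_eh` are hypothetical, added only so the sketch is self-contained.

```cpp
// Assumed convention: 0 means success, nonzero is an error code.
int  fail_at    = 0;      // hypothetical knob: which call (1..3) fails, 0 = none
bool cleaned_up = false;  // hypothetical flag recording that done() ran

void init() { cleaned_up = false; }
void done() { cleaned_up = true; }

int foo_err()  { return fail_at == 1 ? 101 : 0; }
int bar_err()  { return fail_at == 2 ? 102 : 0; }
int blah_err() { return fail_at == 3 ? 103 : 0; }

// Error-code version of the sample: check after each call, skip the
// remaining work on failure, and always fall through to the cleanup.
int no_eh() {
    init();
    int result = foo_err();
    if (result == 0) result = bar_err();
    if (result == 0) result = blah_err();
    done();               // corresponds to the $fail: label in the listing
    return result;
}
```

The test-and-branch after each call is the cost this version pays on the success path, which is what the listing above makes visible.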
25. Generated code for x64 SEH
sub rsp, 40 ;End Prologue
call init
nop
call foo ;First instruction of __try
call bar
call blah
nop ;Last instruction of __try
call done ;__finally invoked inline
add rsp, 40 ;Begin Epilogue
ret 0
26. Generated code for x64 C++ EH
sub rsp, 56 ;End Prologue
mov QWORD PTR $T[rsp], -2 ; C++ setup
call init
nop
call foo ;First instruction of try
call bar
call blah
nop ;Last instruction of try
add rsp, 56 ;Begin Epilogue
jmp done ;~obj() inlined & tail called
27. Generated code for x64 No EH
push rbx ;Save nonvolatile register for result
sub rsp, 32 ;End Prologue
call init
call foo_err
mov ebx, eax ;Save return code
test eax, eax ;Return code check
jne SHORT $fail
call bar_err
mov ebx, eax ;Save return code
test eax, eax ;Return code check
jne SHORT $fail
call blah_err
mov ebx, eax ;Save return code
$fail:
call done
mov eax, ebx ;Get return code
add rsp, 32
pop rbx ;Restore nonvolatile register
ret 0
28. Costs of handling an exception
Disclaimer:
If you are really concerned about this, there is a good chance you're abusing or misusing exceptions.
Exceptions are not to deal with standard scenarios!
Performance of exceptions is generally stacked in favor of the non-exceptional case
There's a reason the term is "exception"!
29. Costs of handling an exception: X86 Win32 SEH & C++ EH
Without /SAFESEH (this is a big no-no: potential security hole)
O(n)
n is the number of frames on the stack with a protected region between throw & catch
Walk a linked list of elements on [fs:0]
Invoke filters to determine handler
C++ type check is just a special filter
Walk the list again, invoking __finally funclets & destructors
Finally, jump to __except block or call catch block
With /SAFESEH (this is good)
O(n log(m))
n is the number of frames on the stack with a protected region between throw & catch
m is the number of EH entry points in the entire program
For SEH, only 1. For C++ EH, one for each function!
Walk a linked list of elements of [fs:0]
For each element, verify the callback is in a list [O(log(m))]
Invoke the filter to determine the handler
Walk the list again, invoking __finally funclets, with callback verification [O(log(m))]
Stack unwinding is completed prior to jumping to the __except entry point
Stack is not yet unwound for invoking the catch block
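The two-pass walk described for x86 can be modeled in a few lines. This is a simplified illustration, not the actual OS dispatcher: `Registration`, `dispatch`, `always_handle`, and `never_handle` are hypothetical names, the list stands in for the chain rooted at [fs:0], and the /SAFESEH callback verification step is omitted.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one record on the registration chain.
struct Registration {
    Registration* next;     // next outer frame, nullptr at the end
    bool (*filter)();       // true if this frame handles it; may be nullptr
    const char* cleanup;    // label for __finally/destructor work; may be nullptr
};

bool always_handle() { return true; }
bool never_handle()  { return false; }

// Pass 1: walk outward calling filters until one accepts.
// Pass 2: walk again from the innermost frame, running cleanup for
// every frame below the handler, then transfer to the handler.
std::vector<std::string> dispatch(Registration* head) {
    std::vector<std::string> log;
    Registration* handler = nullptr;
    for (Registration* r = head; r != nullptr; r = r->next) {
        if (r->filter && r->filter()) { handler = r; break; }
    }
    if (handler == nullptr) {
        log.push_back("unhandled");
        return log;
    }
    for (Registration* r = head; r != handler; r = r->next) {
        if (r->cleanup) log.push_back(std::string("cleanup ") + r->cleanup);
    }
    log.push_back("handler");
    return log;
}
```

Both passes are linear in the number of registered frames, which is where the O(n) term comes from.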
30. Costs of handling an exception: x64 Win64 SEH & C++ EH
O(n log(m))
n is the number of functions on the stack between throw & catch (not just the number with EH code in them!)
m is the number of distinct regions in the image [.pdata size]
Not just a function count: hot/cold sections and register allocation regions can increase this pretty dramatically (1-4x)
Walk each function frame on the stack [O(n)]
Find its .pdata entry to get its unwind information [O(log(m))]
If it has a filter, call it to determine the handler
Restore nonvolatile registers as described in the unwind information
Once a handler has been determined
Walk the stack again (using .pdata lookup)
For each frame that has cleanup code, invoke the __finally blocks or destructors
Jump to handler (or call catch)
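The O(log(m)) term comes from finding the .pdata entry that covers a given instruction address. A sketch of that lookup as a binary search over a sorted table follows. `FunctionEntry` is a simplified model rather than the real on-disk record layout, and `lookup` and `unwind_id` are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Simplified model of one table entry: a code range plus a handle to
// its unwind information.
struct FunctionEntry {
    std::uint32_t begin;    // first address covered by the region
    std::uint32_t end;      // one past the last address covered
    int unwind_id;          // hypothetical handle to unwind info
};

// Finds the entry covering pc, or nullptr if none does.
// Precondition: table is sorted by begin and ranges do not overlap.
const FunctionEntry* lookup(const std::vector<FunctionEntry>& table,
                            std::uint32_t pc) {
    // First entry whose begin is strictly greater than pc.
    auto it = std::upper_bound(
        table.begin(), table.end(), pc,
        [](std::uint32_t value, const FunctionEntry& e) {
            return value < e.begin;
        });
    if (it == table.begin()) return nullptr;  // pc precedes every region
    --it;                                     // candidate: last begin <= pc
    return pc < it->end ? &*it : nullptr;
}
```

One such lookup happens per frame in each walk, giving O(n log(m)) overall. Splitting a function into hot/cold sections adds entries to the table, which is why m can exceed the function count.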
31. Cost of handling an exception: x86 WoW64 SEH & C++ EH
There is some degree of thunking between the 64 bit kernel and 32 bit subsystem, so performance really varies.
Worst case, it's as slow as x64 on Win64.
Best case, it's about the same as x86 on Win32.
If you use exception handling in performance sensitive areas of code, you may notice a difference in your application
If you do notice a difference, this should be a red flag regarding your use of exceptions.
32. Final gotchas (non-standard C++!)
#include <stdio.h>

int g; // add a volatile to fix the problem
int *p;
void func1() {
    g = 0;
    __try {
        g = 1;
        *p = 0;
        g = 2;
    } __except(1) {
        printf("%d\n", g);
    }
}
void update() {
    g = 1;
    *p = 0;
    g = 2;
}
void func2() {
    g = 0;
    __try {
        update();
    } __except(1) {
        printf("%d\n", g);
    }
}
33. Summary & Conclusions
Do not use exceptions for normal program flow.
Exception handling does have a performance cost
Not always measurable
Cost really depends on usage
Frequently similar to what correct code would be, without EH
[at least in VC8]
Do not use exceptions for normal program flow.
C++ is cheaper than SEH for cleanup in VC8.
Use common sense, and knowledge of your team's strengths/weaknesses, if you're mandating SEH/C++ EH/No EH
New hires rarely know about SEH.
Source level readability & visibility of performance
And finally, do not use exceptions for normal program flow.
34. More info
If you're looking for detailed ABI docs for X64, check my blog.
http://blogs.msdn.com/freik
Herb Sutter's got some good books on using exceptions with C++
He doesn't give me kickbacks.