How much does Exception Handling cost, really? Kevin Frei Visual C++ Code Generation & Tools http://blogs.msdn.com/freik
Reasons for this talk (too many assumptions)
Pros of EH I’ve heard • More centralized error handling & recovery • More robust code • More readable code
Cons of EH I’ve heard • Can result in people not thinking about error conditions • Can make error recovery difficult (must put handler in the “right” place) • Enables abuse of exceptions
Summary of the previous Pros & Cons • They can all be dealt with • Coding Convention enforcement • Code Reviews • Good initial architecture • Consistent API designs
#1 reason I hear to not use EH: • “Exception handling makes my code too slow” • May be true, but may also be masking a more serious problem • Some Facts: • EH performance cost is dependent on the runtime, CPU architecture, and ABI/OS specifics. • You can’t simply examine source code to determine performance impact. • Deciding whether to use EH should depend on the team, the libraries you’re using, and a myriad of other issues.
Classes of Code Quality impact • Usage Penalty [EH tax] • General overhead of a function with any EH construct • Cost of entering a protected region • __try{}, try{}, C++ object with a destructor • Cleanup costs • __finally invocation • C++ object destructors • Optimization constraints • Cost of actually handling an exception • If you’re really concerned about this, you’re probably abusing exceptions.
EH tax for Structured Exception Handling • X86 • All functions with SEH contain a complex prolog & epilog • X64 • No required cost to the function itself
EH tax for C++ exception handling • X86 • All functions with C++ EH contain a complex prolog & epilog • X64 • 1 additional DWORD allocated on stack, initialized to -2 • never again used in the function’s code • It’s used by the C++ runtime in the event of an exception being thrown or caught.
Protected Region entry & exit costs • X86 • Entry & exit from any protected region requires a 1 or 4 byte constant value written to the stack • /EHs can reduce this cost • /EHa may be required by your code base, though • X64 • If an entry or an exit is preceded by a call, there is a single byte NOP to properly identify region boundaries • Entry preceded by a call is pretty common for C++ EH (constructors)
Non-exception cleanup costs • X86 • SEH: __finally clause is called • [current implementation, not required] • call/ret overhead • Some other minor register allocation issues • C++EH: Destructor invoked inline [C++ standard] • Destructor can be inlined, based on compiler (& user) decision • X64 • SEH: __finally clause inlined [zero overhead] • [again, current implementation, not required] • C++EH: same as x86
Optimization Constraints Disclaimer • Consider the complete alternative solution! • HRESULT checking is messy, and error prone • The goto solution to handle termination can result in pessimized dataflow • Most optimizations that must be constrained for EH would also have to be constrained in an equivalent implementation that doesn’t use EH.
Optimization constraints • Mandatory optimization constraints • Limitations required by the language standard • ABI specific limitations • Current Implementation constraints • I’ll focus on UTC (current optimizer) in VC8 • Code base from VC5 origins. • Many constraints that existed in earlier versions have been removed
Mandatory optimization constraints: Language specific limitations • The C++ language standard does not specify anything about non-C++ throw exceptions! • The C language standard does not specify anything about exceptions at all, really. • [I know nothing about C99]
Language specific limitations: C++ • Flow from try’s to catch (and out): • Results in additional flow edges at call sites that may throw exceptions • Variable values must be updated accordingly • Slightly less constant propagation, common sub expression elimination, dead stores, etc… • /EHs – assume only the C++ throw statement can cause an exception • Prior to VC8.0, you could compile /EHs, and even with an AV, most destructors would be invoked. • For VC8.0 /EHs: • If you throw a C++ exception, destructors will be run. • If any other exception occurs, no destructors will run.
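The flow described above (a call that may throw, the destructor running during unwinding, then the catch) can be observed in portable C++. This is a minimal sketch using only the standard `throw` statement, which is the case /EHs covers; the names `Guard`, `may_throw`, `run_protected`, and the `events` log are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical event log, used only to make the control flow visible.
static std::vector<std::string> events;

// An object with a destructor: its lifetime forms an implicit protected region.
struct Guard {
    Guard()  { events.push_back("ctor"); }
    ~Guard() { events.push_back("dtor"); }
};

// A call site that may throw adds a flow edge from the try body to the catch.
static void may_throw(bool fail) {
    events.push_back("call");
    if (fail) throw std::runtime_error("boom");
}

// Returns true if the exception path was taken.
bool run_protected(bool fail) {
    events.clear();
    try {
        Guard g;
        may_throw(fail);
        events.push_back("after");   // skipped when may_throw throws
    } catch (const std::runtime_error&) {
        events.push_back("catch");   // runs only after ~Guard has completed
        return true;
    }
    return false;
}
```

Calling `run_protected(true)` records the destructor before the catch block; `run_protected(false)` records it at normal scope exit. Exceptions that do not come from a C++ `throw` (the /EHa case) cannot be reproduced in standard C++ and are outside this sketch.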
Language specific limitations: /EHa • /EHa – all exceptions should be considered when destroying C++ objects • Results in far more potential flow from a try block to a catch block • Less stack packing (no stack pack prior to VC8) • Much less constant propagation, common sub expression elimination, etc…
Quick /EHc description • Only has impact with /EHs • Tells the compiler that any extern “C” function will not throw any C++ exceptions • Win32 API calls fall under this class • Sometimes true, sometimes not – be careful. • Only side effect is pruning a few additional edges in the flow graph • A few more opportunities for optimization
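Standard C++ has a per-function analogue of the promise that /EHc makes for every extern "C" function: `noexcept`. The sketch below is only an analogy, not a description of how /EHc is implemented. The function names are hypothetical, and the two mechanisms differ when the promise is broken (a throw escaping a `noexcept` function calls `std::terminate`).

```cpp
#include <cassert>

// Hypothetical C-linkage function that explicitly promises not to throw.
extern "C" int c_style_add(int a, int b) noexcept { return a + b; }

// The same operation without the promise, for comparison.
int cpp_style_add(int a, int b) { return a + b; }

// The noexcept operator reports what the compiler is allowed to assume
// about each call expression.
static_assert(noexcept(c_style_add(1, 2)), "call is known not to throw");
static_assert(!noexcept(cpp_style_add(1, 2)), "call must be treated as potentially throwing");

// A caller mixing both kinds of call: only the second call needs an
// exception edge in the flow graph.
int sum_both(int a, int b) {
    return c_style_add(a, b) + cpp_style_add(a, b);
}
```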
Mandatory Optimization Constraints: Win32/Win64 ABI specific limitations • Tail-call (call/return -> jump) is illegal inside a protected region • Instruction level performance hit is typically negligible • Stack usage increase (can be serious) • Instruction scheduling constraints • Scheduling into & out of handler regions is limited • rarely worth doing, even if it is legal
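One way to see why a call inside a protected region cannot become a jump: work is still pending in the caller after the callee returns. The sketch below uses hypothetical names (`Cleanup`, `callee`, `order`) and only demonstrates the ordering. Whether a given compiler actually emits a tail call for `no_cleanup` depends on the compiler and its settings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical log recording the order of events.
static std::vector<std::string> order;

struct Cleanup {
    ~Cleanup() { order.push_back("cleanup"); }
};

static int callee() {
    order.push_back("callee");
    return 7;
}

// Nothing remains to be done after the call, so call/ret may become a jump.
int no_cleanup() {
    return callee();
}

// The destructor must run after callee() returns (or throws), so the
// caller's frame has to stay alive across the call: no tail call here.
int with_cleanup() {
    Cleanup c;
    return callee();
}
```

The retained frame is the source of the stack usage increase mentioned above: in deep or recursive call chains, every frame that could otherwise have been replaced stays on the stack.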
VC8.0 optimization constraints • No impact on any functions that do not contain some EH construct • The programmer must sometimes add volatile to get the required constraints applied to a function invoked inside a try • Exception handling is only one of a large number of things that can artificially constrain optimizations • setjmp/longjmp (old school EH in C) • _alloca • __declspecs • /GS • /fp:except, /fp:precise, /fp:strict • Many many more.
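For readers who have not seen the setjmp/longjmp style mentioned above, here is a minimal sketch of it. The function names are hypothetical. Note the `volatile`: the C standard leaves non-volatile locals modified between `setjmp` and `longjmp` indeterminate, which is the same kind of constraint on the optimizer that this slide is about. In C++ this technique is only safe when no objects with non-trivial destructors are skipped by the jump.

```cpp
#include <cassert>
#include <csetjmp>

// Hypothetical recovery point shared by the functions below.
static std::jmp_buf recovery_point;

// Reports failure by jumping back to the recovery point instead of returning.
static void risky(int value) {
    if (value < 0) std::longjmp(recovery_point, 1);
}

// Returns the stage reached (2) on success, or the negated stage at which
// the failure occurred.
int run_with_recovery(int value) {
    volatile int stage = 0;          // volatile: must survive the longjmp
    if (setjmp(recovery_point) == 0) {
        stage = 1;
        risky(value);                // may jump back to the setjmp above
        stage = 2;
    } else {
        return -stage;               // reached only through longjmp
    }
    return stage;
}
```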
VC8.0 optimization constraints: Specifics • Late flow optimizations for x64 • Primarily head & tail merging • Loop optimizer disabled (all platforms) for any function with a try/__try • Loop unrolling/peeling • Induction variable creation • Some strength reduction • Doesn’t impact functions with only C++ objects! • Stack Packing restrictions • Prior to VC8, all variables inside a try block were written back to the stack whenever their values were updated • With VC8, only variable values that may be visible outside of the try are written back to the stack.
Source code used for samples

SEH Version
    void seh_finally() {
        init();
        __try {
            foo();
            bar();
            blah();
        } __finally {
            done();
        }
    }

C++ Version
    struct obj {
        obj() { init(); }
        ~obj() { done(); }
    };
    void cpp_dtor() {
        obj a;
        foo();
        bar();
        blah();
    }

No EH Version
    int noeh_cleanup() {
        int result = 0;
        init();
        result = foo_err();
        if (result) goto fail;
        result = bar_err();
        if (result) goto fail;
        result = blah_err();
    fail:
        done();
        return result;
    }
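The slide does not define `init`, `foo`, `bar`, `blah`, or `done`. The sketch below supplies hypothetical stubs so that the C++ and No EH variants can be compiled and compared on any standard compiler (the SEH variant is MSVC-specific and is left out). The `trace` string, the `fail_at` switch, and the `step`/`step_err` helpers are assumptions made for this illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

static std::string trace;   // hypothetical record of which stubs ran
static int fail_at = 0;     // 0 = no failure, 1 = foo, 2 = bar, 3 = blah

static void init() { trace += 'i'; }
static void done() { trace += 'd'; }

// Error-code flavour of a step: returns nonzero on failure.
static int step_err(char name, int index) {
    trace += name;
    return fail_at == index ? -1 : 0;
}

// Exception flavour of the same step.
static void step(char name, int index) {
    if (step_err(name, index)) throw std::runtime_error("step failed");
}

struct obj {
    obj()  { init(); }
    ~obj() { done(); }
};

void cpp_dtor() {
    obj a;
    step('f', 1);
    step('b', 2);
    step('l', 3);
}

int noeh_cleanup() {
    int result = 0;
    init();
    result = step_err('f', 1);
    if (result) goto fail;
    result = step_err('b', 2);
    if (result) goto fail;
    result = step_err('l', 3);
fail:
    done();
    return result;
}

// Run each variant and return its trace; '!' marks a reported failure.
std::string trace_cpp(int fail) {
    trace.clear();
    fail_at = fail;
    try { cpp_dtor(); } catch (const std::runtime_error&) { trace += '!'; }
    return trace;
}

std::string trace_noeh(int fail) {
    trace.clear();
    fail_at = fail;
    if (noeh_cleanup() != 0) trace += '!';
    return trace;
}
```

Both variants produce the same trace for every failure point, which is the sense in which they are comparable: the cleanup always runs, and the steps after a failure are skipped.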
Generated code for x86 SEH /O2
    push ebp
    mov ebp, esp
    push -1
    push OFFSET __sehtable$?seh_finally@@YAXXZ
    push OFFSET __except_handler3
    mov eax, DWORD PTR fs:0
    push eax
    mov DWORD PTR fs:0, esp
    sub esp, 8                               ;End Prolog
    call init
    mov DWORD PTR __$SEHRec$[ebp+20], 0      ;Enter __try
    call foo
    call bar
    call blah
    mov DWORD PTR __$SEHRec$[ebp+20], -1     ;Exit __try
    call $seh_finally_funclet                ;Invoke __finally
    mov ecx, DWORD PTR __$SEHRec$[ebp+8]     ;Begin Epilogue
    mov DWORD PTR fs:0, ecx
    mov esp, ebp
    pop ebp
    ret 0
$seh_finally_funclet:
    call done
    ret 0
Generated code for x86 SEH /O1
    push 8
    push OFFSET __sehtable$seh_finally
    call __SEH_prolog                        ;End Prologue
    call init
    and __$SEHRec$[ebp+20], 0                ;Entry __try
    call foo
    call bar
    call blah
    or __$SEHRec$[ebp+20], -1                ;Exit __try
    call $seh_finally_funclet                ;Invoke __finally
    call __SEH_epilog                        ;Begin Epilogue
    ret 0
$seh_finally_funclet:
    call done
    ret 0
Generated code for x86 C++ /O2
    push -1
    push __ehhandler$?cpp_dtor@@YAXXZ
    mov eax, DWORD PTR fs:0
    push eax
    mov DWORD PTR fs:0, esp                  ;End Prologue
    push ecx                                 ;allocate space for obj
    call init                                ;obj() inlined
    mov DWORD PTR __$EHRec$[esp+24], 0       ;Enter try
    call foo
    call bar
    call blah
    mov DWORD PTR __$EHRec$[esp+24], -1      ;Exit try
    call done                                ;~obj() inlined
    mov ecx, DWORD PTR __$EHRec$[esp+16]     ;Begin Epilogue
    mov DWORD PTR fs:0, ecx
    add esp, 16
    ret 0
Generated code for x86 C++ /O1
    mov eax, __ehhandler$?cpp_dtor@@YAXXZ
    call __EH_prolog                         ;End Prologue
    push ecx                                 ;allocate space for obj
    call init                                ;obj() inlined
    and DWORD PTR __$EHRec$[ebp+8], 0        ;Entry try
    call foo
    call bar
    call blah
    or DWORD PTR __$EHRec$[ebp+8], -1        ;Exit try
    call done                                ;~obj() inlined
    mov ecx, DWORD PTR __$EHRec$[ebp]        ;Begin Epilogue
    mov DWORD PTR fs:0, ecx
    leave
    ret 0
Generated code for x86 No EH (/O1 & /O2 are basically identical)
    push esi                                 ;Save nonvolatile register for result
    call init
    call foo_err
    mov esi, eax                             ;Save return code
    test esi, esi                            ;Return code check
    jne SHORT $fail
    call bar_err
    mov esi, eax                             ;Save return code
    test esi, esi                            ;Return code check
    jne SHORT $fail
    call blah_err
    mov esi, eax                             ;Save return code
$fail:
    call done
    mov eax, esi                             ;Return result
    pop esi
    ret 0
Generated code for x64 SEH
    sub rsp, 40                              ;End Prologue
    call init
    nop
    call foo                                 ;First instruction of __try
    call bar
    call blah
    nop                                      ;Last instruction of __try
    call done                                ;__finally invoked inline
    add rsp, 40                              ;Begin Epilogue
    ret 0
Generated code for x64 C++ EH
    sub rsp, 56                              ;End Prologue
    mov QWORD PTR $T[rsp], -2                ;C++ setup
    call init
    nop
    call foo                                 ;First instruction of try
    call bar
    call blah
    nop                                      ;Last instruction of try
    add rsp, 56                              ;Begin Epilogue
    jmp done                                 ;~obj() inlined & tail called
Generated code for x64 No EH
    push rbx                                 ;Save nonvolatile register for result
    sub rsp, 32                              ;End Prologue
    call init
    call foo_err
    mov ebx, eax                             ;Save return code
    test eax, eax                            ;Return code check
    jne SHORT $fail
    call bar_err
    mov ebx, eax                             ;Save return code
    test eax, eax                            ;Return code check
    jne SHORT $fail
    call blah_err
    mov ebx, eax                             ;Save return code
$fail:
    call done
    mov eax, ebx                             ;Get return code
    add rsp, 32
    pop rbx                                  ;Restore nonvolatile register
    ret 0
Costs of handling an exception Disclaimer: If you are really concerned about this, there is a good chance you’re abusing or misusing exceptions. Exceptions are not for dealing with standard scenarios! Performance of exceptions is generally stacked in favor of the non-exceptional case. There’s a reason the term is “exception”!
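To make "standard scenario" concrete, here is one possible split between an expected failure and an exceptional one. The sketch assumes a hypothetical digit-string parser: malformed input is treated as routine and reported through the return value, while a throwing wrapper is kept for callers who consider bad input a genuine error.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Expected failure: bad input is a normal outcome, so report it by return
// value. Accepts 1 to 9 decimal digits so the result always fits in an int.
bool try_parse_digits(const std::string& text, int& value) {
    if (text.empty() || text.size() > 9) return false;
    int result = 0;
    for (char c : text) {
        if (c < '0' || c > '9') return false;
        result = result * 10 + (c - '0');
    }
    value = result;
    return true;
}

// Exceptional failure: for callers whose input is supposed to be valid
// already, so a failure here indicates a real error.
int parse_digits_or_throw(const std::string& text) {
    int value = 0;
    if (!try_parse_digits(text, value)) {
        throw std::invalid_argument("not a digit string: " + text);
    }
    return value;
}
```

A loop that validates user input would call `try_parse_digits` and never pay for a throw; `parse_digits_or_throw` suits data that was validated earlier.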
Costs of handling an exception: X86 – Win32 – SEH & C++ EH • Without /SAFESEH (this is a big no-no – potential security hole) • O(n) • n is the number of frames on the stack with a protected region between throw & catch • Walk a linked list of elements on [fs:0] • Invoke filters to determine handler • C++ type check is just a special filter • Walk the list again, invoking __finally funclets & destructors • Finally, jump to __except block or call catch block • With /SAFESEH (this is good) • O(n log(m)) • n is the number of frames on the stack with a protected region between throw & catch • m is the number of EH entry points in the entire program • For SEH, only 1. For C++ EH, one for each function! • Walk a linked list of elements on [fs:0] • For each element, verify the callback is in a list [O(log(m))] • Invoke the filter to determine the handler • Walk the list again, invoking __finally’s, with callback verification [O(log(m))]
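The two walks described above (first find a handler, then run cleanup) can be sketched as a toy model. This is not the Windows dispatcher: the `Frame` struct, its fields, and `dispatch` are hypothetical stand-ins for the registration records on the fs:0 chain, and the /SAFESEH callback verification is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for one registration record (hypothetical layout).
struct Frame {
    std::string name;
    bool handles;      // what this frame's filter would answer
    bool has_cleanup;  // whether it owns a __finally or a destructor
};

// frames[0] is the innermost frame. Returns a log of the actions taken.
std::vector<std::string> dispatch(const std::vector<Frame>& frames) {
    std::vector<std::string> log;

    // Pass 1: walk outward, asking each filter until one accepts. O(n).
    std::size_t handler = frames.size();
    for (std::size_t i = 0; i < frames.size(); ++i) {
        log.push_back("filter:" + frames[i].name);
        if (frames[i].handles) { handler = i; break; }
    }
    if (handler == frames.size()) {
        log.push_back("unhandled");
        return log;
    }

    // Pass 2: walk again, running cleanup for every frame being unwound.
    for (std::size_t i = 0; i < handler; ++i) {
        if (frames[i].has_cleanup) log.push_back("cleanup:" + frames[i].name);
    }

    // Finally, transfer control to the handler.
    log.push_back("handler:" + frames[handler].name);
    return log;
}
```

The model makes one property visible: every filter up to the handler runs before any cleanup code does, so a filter observes the stack before it has been unwound.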
Costs of handling an exception: x64 – Win64 – SEH & C++ EH • O(n log(m)) • n is the number of functions on the stack between throw & catch (not just the number with EH code in them!) • m is the number of distinct regions in the image [.pdata size] • Not just a function count – hot/cold sections and register allocation regions can increase this pretty dramatically (1-4x) • Walk each function frame on the stack [O(n)] • Find its .pdata entry to get its unwind information [O(log(m))] • If it has a filter, call it to determine the handler • Restore nonvolatile registers as described in the unwind information • Once a handler has been determined • Walk the stack again (using .pdata lookup) • For each frame that has cleanup code, invoke the __finally’s or destructors • Jump to handler (or call catch)
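The O(log(m)) term comes from searching a sorted table of code ranges for the entry that covers a given instruction address. The sketch below shows that search with a simplified, hypothetical `RangeEntry`; real .pdata records have a different layout, and `unwind_id` merely stands in for a reference to the unwind information.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified table entry covering the range [begin, end).
struct RangeEntry {
    std::uint32_t begin;
    std::uint32_t end;
    int unwind_id;
};

// Finds the entry covering pc in a table sorted by begin address.
// Returns nullptr when no entry covers pc.
const RangeEntry* find_entry(const std::vector<RangeEntry>& table, std::uint32_t pc) {
    // Debug-only sanity check (linear); the lookup itself is the binary search.
    assert(std::is_sorted(table.begin(), table.end(),
        [](const RangeEntry& a, const RangeEntry& b) { return a.begin < b.begin; }));

    // First entry whose begin is greater than pc; the candidate is the one before it.
    auto it = std::upper_bound(table.begin(), table.end(), pc,
        [](std::uint32_t value, const RangeEntry& e) { return value < e.begin; });
    if (it == table.begin()) return nullptr;
    --it;
    return pc < it->end ? &*it : nullptr;
}
```

An unwinder built on this lookup would call it once per frame on the stack, which gives the O(n log(m)) total quoted above.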
Cost of handling an exception: x86 – WoW64 – SEH & C++EH • There is some degree of thunking between the 64 bit kernel and 32 bit subsystem, so performance really varies. • Worst case, it’s as slow as x64 on Win64. • Best case it’s about the same as x86 on Win32. • If you use exception handling in performance sensitive areas of code, you may notice a difference in your application • If you do notice a difference, this should be a red flag regarding your use of exceptions.
Final gotchas (non-standard C++!)
• Some optimizations that are constrained inside of a try result in observable differences, based on program structure, compiler settings, and compiler implementation.

    #include <stdio.h>

    int g;   // add a volatile to fix the problem
    int *p;  // never assigned, so it stays null

    void func1() {
        g = 0;
        __try {
            g = 1;
            *p = 0;   // store through the null pointer faults here
            g = 2;
        } __except(1) {
            printf("%d\n", g);
        }
    }

    void update() {
        g = 1;
        *p = 0;       // same faulting store, but outside any __try in this function
        g = 2;
    }

    void func2() {
        g = 0;
        __try {
            update();
        } __except(1) {
            printf("%d\n", g);
        }
    }
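For contrast, here is a portable analogue in which the fault is replaced by an explicit C++ `throw`. In this form the language defines what the handler observes, because the compiler can see the potential exit from `update`. The slide's version relies on a hardware fault, which standard C++ does not model. The `fault` parameter and the function `observed_in_handler` are hypothetical additions.

```cpp
#include <cassert>
#include <stdexcept>

static int g = 0;

// Same shape as update() on the slide, with the faulting store replaced by
// an explicit, compiler-visible throw.
static void update(bool fault) {
    g = 1;
    if (fault) throw std::runtime_error("fault");
    g = 2;
}

// Returns the value of g seen after the try, or inside the handler.
int observed_in_handler(bool fault) {
    g = 0;
    try {
        update(fault);
    } catch (const std::runtime_error&) {
        return g;   // the store g = 1 is guaranteed to be visible here
    }
    return g;
}
```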
Summary & Conclusions • Do not use exceptions for normal program flow. • Exception handling does have a performance cost • Not always measurable • Cost really depends on usage • Frequently similar to what correct code would be, without EH • [at least in VC8] • Do not use exceptions for normal program flow. • C++ is cheaper than SEH for cleanup in VC8. • Use common sense, and knowledge of your team’s strengths/weaknesses if you’re mandating SEH/C++ EH/No EH • New hires rarely know about SEH. • Source level readability & visibility of performance • And finally, do not use exceptions for normal program flow.
More info • If you’re looking for detailed ABI docs for X64, check my blog. • http://blogs.msdn.com/freik • Herb Sutter’s got some good books on using exceptions with C++ • He doesn’t give me kickbacks