SYRCoSE: Spring Young Researchers Colloquium on Software Engineering, May 28-29, 2009, Higher School of Economics, Moscow, Russian Federation
Binary Compatibility of C++ Shared Libraries on GNU/Linux
Pavel Shved, Denis Silakov
Institute for System Programming, RAS
Who cares?
• C++: 3rd place in the TIOBE index
• Shared libraries: libstdc++, Qt, KDE-libs
• GNU/Linux
  • Component approach to system architecture
  • Tends to follow standards
  • Open source
• The problem: the desire to improve a library vs. the need to retain compatibility
Existing research
• KDE compatibility guide
• Nokia's summary of practical experience with Qt
• Scattered messages on forums and mailing lists
• Badly outdated references
Mostly in the form of "Thou shalt not" do anything.
Our research
[Diagram: library specifications; set of library changes; set of programs]
Aims of the research
• Formally study C++'s compatibility issues
  • A model of the build process
  • Standards
    • Itanium C++ ABI, rev. 1.86
    • ISO/IEC 14882:2003 (Programming languages: C++)
    • The GCC compiler itself
• Constrain the set of possible library changes for different specifications of the library
Source and binary compatibility
p: program; L: library v.1; L*: library v.2
p.exe = p.compile(L.h).link(L.so)
Given that p.compile(L.h).link(L.so).dynlink(L.so) is defined:
• Source compatibility: p.compile(L*.h).link(L*.so).dynlink(L*.so) is defined
• Binary compatibility: p.compile(L.h).link(L.so).dynlink(L*.so) is defined
Symbol versioning
Modified L.h listing:
    namespace lib_L {
    #include <L.h>
    }
    using namespace lib_L;
Modified L*.h listing:
    namespace lib_L {
    #include <L.h>
    }
    namespace lib_L* {
    #include <L*.h>
    }
    using namespace lib_L*;
Code duplication: 25 minor releases of Qt 4 so far, 50 MB each. A nightmare.
Dynamic linker errors
• Declaration → symbol(s): "name mangling"
• Keep every symbol that can be referenced.
void X(int, float)  →  _Z1Xif
void X(int, int)    →  _Z1Xii
class Klass { Klass(); virtual ~Klass(); };  →
    _ZN5KlassC1Ev _ZN5KlassC2Ev
    _ZN5KlassD0Ev _ZN5KlassD1Ev _ZN5KlassD2Ev
    _ZTI5Klass _ZTS5Klass _ZTV5Klass
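The declaration-to-symbol mapping can be checked in the opposite direction by demangling. The sketch below assumes GCC or Clang on an Itanium-ABI platform, where `<cxxabi.h>` provides `abi::__cxa_demangle`. The helper names `demangle` and `check_demangle` are illustrative and not part of the talk.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cassert>
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI symbol,
// or the input unchanged if the demangler rejects it.
std::string demangle(const char* symbol) {
    int status = 0;
    char* text = abi::__cxa_demangle(symbol, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return symbol;
    }
    std::string result(text);
    std::free(text);  // the buffer is malloc-allocated by __cxa_demangle
    return result;
}

// Small self-check using a free function and the symbols emitted for Klass.
void check_demangle() {
    assert(demangle("_Z1Xif") == "X(int, float)");
    assert(demangle("_ZN5KlassC1Ev") == "Klass::Klass()");   // complete-object ctor
    assert(demangle("_ZN5KlassC2Ev") == "Klass::Klass()");   // base-object ctor
    assert(demangle("_ZTV5Klass") == "vtable for Klass");
    assert(demangle("_ZTI5Klass") == "typeinfo for Klass");
}
```

Several distinct symbols (C1/C2, D0/D1/D2) demangle to the same source-level name, which is why one declaration can account for more than one exported symbol.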
Declarations that emit symbols
• Non-inline, non-template functions
• Non-inline explicit template instantiations
• Non-template or explicitly instantiated dynamic classes (vtable(s), VTT)
A problem arises only when the function is called from user code.
void X(int, float)       →  _Z1Xif
inline void X(int, int)  →  Ø (no symbol)
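Name mangling caveats
• Typedefs are resolved before mangling:
    typedef float Type; void X(int, Type);  →  _Z1Xif
    typedef int Type;   void X(int, Type);  →  _Z1Xii
• The contents of a class do not affect the names that mention it:
    class Klass{ … }; void X(Klass*);  →  _Z1XP5Klass
• Default arguments are not encoded, but the extra parameter is:
    void X(int);           →  _Z1Xi
    void X(int, int = 0);  →  _Z1Xii
• Template arguments and the return type are encoded (with Type being int):
    template<class Param, class Arg> Param* X();
    template<> Klass* X<Klass,Type>(){…}  →  _Z1XI5KlassiEPT_v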
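Two of these caveats follow from what is and is not part of a function's type, which can be checked at compile time. The sketch below is an illustration only: the `v1`/`v2` namespaces and the `constexpr` flag names are assumptions introduced so that both versions of the declaration can coexist in one file (real namespaces would also change the mangled names).

```cpp
#include <type_traits>

typedef float Type;  // switching this alias to int would change the signature

// A typedef is only an alias: the signature, and hence the mangled name,
// is built from the underlying type.
constexpr bool typedef_is_transparent =
    std::is_same<void(int, Type), void(int, float)>::value;

namespace v1 { void X(int); }           // as declared in the old header
namespace v2 { void X(int, int = 0); }  // as declared in the new header

// The default argument is not part of the function type...
constexpr bool default_arg_not_in_type =
    std::is_same<decltype(v2::X), void(int, int)>::value;

// ...but the added parameter is, so the two declarations have different
// types and therefore different symbols.
constexpr bool signature_changed =
    !std::is_same<decltype(v1::X), decltype(v2::X)>::value;

static_assert(typedef_is_transparent, "typedef names do not appear in the type");
static_assert(default_arg_not_in_type, "default arguments do not appear in the type");
static_assert(signature_changed, "adding a defaulted parameter changes the type");
```

Existing calls such as `X(1)` still compile against the new header, so the change is source compatible, yet a binary built earlier still asks the dynamic linker for the one-parameter symbol.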
Implementation change
• L* must give the same results as L on the domain of L's version
• Bugfixes break compatibility!
[Diagram: input and output domains of L and L*]
Compiled code notions
Two ways of "misunderstanding":
• Program p expects L's layout, but gets L*'s
• The implementation of L* expects p to supply data in L*'s layout, but gets L's
What ends up compiled into p (pseudocode):
    Klass* ptr = new Klass();  →  Klass* ptr = malloc(sizeof_Klass); Klass_Ctor_1(ptr)
    class MyClass : public Klass { … };  →  MyClass* ptr = malloc(sizeof_MyClass); Klass_Ctor_2(ptr)
(Ctor_1 is the complete-object constructor, symbol C1; Ctor_2 is the base-object constructor, symbol C2.)
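The first kind of misunderstanding can be made concrete by placing two versions of a class side by side. This is a minimal sketch: the members `id`, `weight`, and `flags`, the `v1`/`v2` namespaces, and the constant names are all hypothetical and only stand in for "Klass as declared in L.h" and "Klass as declared in L*.h".

```cpp
#include <cstddef>  // offsetof, std::size_t

namespace v1 {  // Klass as p saw it in L.h
struct Klass {
    int id;
    int flags;
};
}  // namespace v1

namespace v2 {  // Klass as L*.so was built: a member was inserted
struct Klass {
    int id;
    double weight;  // new in v.2
    int flags;
};
}  // namespace v2

// Constants the compiler bakes into p when it is built against L.h.
constexpr std::size_t size_seen_by_p = sizeof(v1::Klass);
constexpr std::size_t flags_offset_seen_by_p = offsetof(v1::Klass, flags);

// Constants the implementation inside L*.so was built with.
constexpr std::size_t size_used_by_lib = sizeof(v2::Klass);
constexpr std::size_t flags_offset_used_by_lib = offsetof(v2::Klass, flags);

static_assert(size_seen_by_p < size_used_by_lib,
              "p allocates less memory than L*'s constructor will write to");
static_assert(flags_offset_seen_by_p != flags_offset_used_by_lib,
              "p and L* disagree about where 'flags' is stored");
```

Neither constant is looked up at run time, so relinking against L*.so cannot correct them; only recompiling p against L*.h does.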
What changes size/layout?
• Changing the order of non-empty classes in the hierarchy
• Sometimes, changing the order of zero-sized classes
• Adding a member to any class except the one whose members come last in the layout, or a class derived from it
Use a d_ptr and do not change the hierarchy.
[Diagram: object layout. Primary base; next base; virtual bases that are primary for no one]
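The d_ptr advice can be sketched as follows. This is an illustrative outline in the style Qt uses, not code from the talk: `KlassPrivate`, `value()`, and the private members are hypothetical names, and the public and private parts are shown in one file only for brevity.

```cpp
#include <string>

class KlassPrivate;  // defined only inside the library

// Public header: the only data member is one opaque pointer, so the
// size and layout of Klass stay the same across library versions.
class Klass {
public:
    Klass();
    ~Klass();
    Klass(const Klass&) = delete;
    Klass& operator=(const Klass&) = delete;
    int value() const;

private:
    KlassPrivate* d_ptr;
};

// Library source file: free to change between L.so and L*.so.
class KlassPrivate {
public:
    int value = 42;
    std::string name = "added in v.2";  // new member; sizeof(Klass) unchanged
};

Klass::Klass() : d_ptr(new KlassPrivate) {}
Klass::~Klass() { delete d_ptr; }
int Klass::value() const { return d_ptr->value; }

static_assert(sizeof(Klass) == sizeof(KlassPrivate*),
              "the public class holds nothing but the opaque pointer");
```

The private object is allocated and accessed only by non-inline functions inside the library, so its size is never compiled into p.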
Virtual tables
• Pointer to Run-Time Type Information (RTTI)
• Offset-to-top from the subobject that the sub-vtable corresponds to
• Function pointers
• "Grows" around the point of origin (offset 0)
• Used in certain virtual function calls
Class* ptr = . . .; ptr->virtual_func();
[Diagram: vtable layout. Each sub-vtable holds vcall & vbase offsets, then RTTI & offset, then (from offset 0) virtual function pointers. Order of sub-vtables: the primary base's; the next class's (with its primary bases); those of virtual bases that are primary for no one]
Vtable properties
• Emitted for all dynamic classes
• Does not reference other vtables
• The constructor fills in the sub-vtable pointers
• The offset of a particular function is compiled into the caller
• Any sub-vtable may be used in a virtual function call
[Diagram: vtable layout, as on the "Virtual tables" slide]
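The consequence of the offset being compiled into the caller can be shown with a simplified model. The sketch below deliberately replaces the real vtable with a plain array of function pointers, so it says nothing about the actual Itanium layout; `draw`, `resize`, and every other name in it are hypothetical.

```cpp
#include <cassert>
#include <string>

// A hand-written stand-in for a vtable: an array of function pointers.
using Slot = std::string (*)();

std::string draw()   { return "draw"; }
std::string resize() { return "resize"; }

// L.h declares a single virtual function, draw().
const Slot vtable_v1[] = { &draw };

// L*.h declares resize() before draw(), which moves draw() to slot 1.
const Slot vtable_v2[] = { &resize, &draw };

// p was compiled against L.h, so "slot 0" is baked into its machine code.
constexpr int draw_slot_compiled_into_p = 0;

std::string p_calls_draw(const Slot* vtable) {
    return vtable[draw_slot_compiled_into_p]();
}

// Small self-check of the model.
void check_vtable_model() {
    assert(p_calls_draw(vtable_v1) == "draw");    // old library: intended call
    assert(p_calls_draw(vtable_v2) == "resize");  // new library: wrong function runs
}
```

No linker error is reported in the second case, because the caller never refers to the virtual function by name.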
Arbitrary vtable change
• Force the user to derive from the class (this emits a copy of the vtable in the application)
• Do not call virtual functions through a pointer to the base class, especially in L*'s code (or can we? A tricky wrapper?)
• Calls via a pointer to the derived class will use the copy, not L*'s vtable
• Keep an eye on thunks
• You still cannot change the memory layout
[Diagram: vtable layout, as on the "Virtual tables" slide]
Results
• The most complete (and therefore the least useful) compatibility guide
• Restricting how the library may be used does not relax the constraints much
  • Non-derivable classes without ctors and new()
  • Classes that must be derived from in p
• The C++ ABI needs a redesign to be more compatible (Qt has actually made some effort)
Further research
• A new ABI?
• A shim layer for virtual functions (thinner than Qt's MOC)?
• Try to build a useful "preprocessor" to implement the vtable trick?
Thank you:-) Pavel Shved <shved@ispras.ru> Denis Silakov <uragan@ispras.ru>