ROOT Team Meeting, CERN. Leandro Franco (joint work with Diego Marcos). 18-06-07. CInt Function Stub Removal.
Modifying CInt? A.K.A. the Piñata paradigm
[Figure labels: CInt, Experienced Programmers, Newbie]
Goal: obtain the candies from the piñata... without breaking anybody's head.
Simple Idea
• The dictionaries are big: around 52% of the total library size.
• Why don't we just wipe them off the face of the earth?
• Short answer: we can't do it yet, but we will try.
• Long answer: the whole topic of these slides ;)
First steps
• One good way to shrink the dictionaries is to remove the stub functions.
• Stub functions exist for two reasons: CInt needs a generic way to call any function, and it cannot do proper name mangling to find that function in the library (i.e. CInt must behave like a compiler but doesn't have the means to do so).
Stub Functions
• To solve the name mangling problem, a traditional approach was taken:
“Any problem in computer science can be solved with another layer of indirection” (Wheeler's Law)
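To make the layer of indirection concrete, here is a minimal sketch of what a stub and its dictionary entry could look like. The uniform signature `Stub` and the names `stub_A_HiA`, `dictionary` and `call_by_name` are illustrative assumptions, not CInt's actual generated code.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Library code: an ordinary class whose method an interpreter wants to call.
struct A {
    int calls = 0;
    int HiA(int x) { ++calls; return x + 1; }
};

// Hypothetical uniform signature shared by every stub: the object, an
// untyped argument list, and a slot for the result.
using Stub = void (*)(void* self, const std::vector<void*>& args, void* result);

// One generated stub per callable function: unpack, call, store the result.
static void stub_A_HiA(void* self, const std::vector<void*>& args, void* result) {
    assert(args.size() == 1);
    int x = *static_cast<int*>(args[0]);
    *static_cast<int*>(result) = static_cast<A*>(self)->HiA(x);
}

// The "dictionary": interpreter-visible names mapped to stubs.
inline const std::map<std::string, Stub>& dictionary() {
    static const std::map<std::string, Stub> table = {
        {"A::HiA(int)", &stub_A_HiA}};
    return table;
}

// Roughly what an interpreter would do to execute `obj.HiA(arg)` by name.
inline int call_by_name(const std::string& name, A& obj, int arg) {
    int result = 0;
    std::vector<void*> args = {&arg};
    dictionary().at(name)(&obj, args, &result);
    return result;
}
```

Every callable function needs one such stub compiled into the dictionary, which is why removing the stubs shrinks the dictionaries.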
Stub Functions
[Diagram labels: Compiler, mangling (compile time), library function; CInt, dictionary, pseudo-mangling (run time), library function]
The dictionary can be seen as a bijective function that maps C++ function declarations to strings (the strings that the compiler associates with the symbols).
Stub Functions
• The idea is to avoid that layer of indirection.
• We still don't know how to do the mangling.
• But we know how to do the demangling (or at least, we know who to call to do it ;) ).
• Go from set Y to set X for all y in Y, instead of going from set X to set Y for a given x in X.
  function header (X)    library (Y)
  A::A()                 _ZN1AC1Ev
  A::HiA()               _ZN1A3HiAEv
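The "Y to X" direction can be sketched with `abi::__cxa_demangle`, the demangler that GCC and Clang expose through `<cxxabi.h>`. This assumes an Itanium-ABI toolchain; the slides do not say which demangler is actually called. The helper names `demangle` and `build_lookup` are hypothetical.

```cpp
#include <cxxabi.h>
#include <cassert>
#include <cstdlib>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Demangle one symbol; returns an empty string if it is not a mangled name.
inline std::string demangle(const char* mangled) {
    assert(mangled != nullptr);
    int status = 0;
    // __cxa_demangle returns a malloc'ed buffer that the caller must free.
    std::unique_ptr<char, void (*)(void*)> text(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status), std::free);
    return (status == 0 && text) ? std::string(text.get()) : std::string();
}

// Go from set Y (symbols) to set X (declarations) for every y in Y, and
// keep the inverse so that a declaration can be looked up afterwards.
inline std::map<std::string, std::string>
build_lookup(const std::vector<const char*>& symbols) {
    std::map<std::string, std::string> by_declaration;
    for (const char* sym : symbols) {
        std::string decl = demangle(sym);
        if (!decl.empty()) by_declaration[decl] = sym;
    }
    return by_declaration;
}
```

With the two symbols from the slide, `build_lookup({"_ZN1AC1Ev", "_ZN1A3HiAEv"})` maps `A::A()` and `A::HiA()` back to their mangled names, so no stub is needed to find the function.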
Stub Functions
• This approach has one big, unavoidable side effect:
  • We need to demangle ALL the symbols in a library just to be able to call one function.
  • A single demangling might not be too expensive, but what happens when a library has thousands and thousands of symbols?
Efficiency
• Since we have to demangle all the symbols from the library at least once, we could cache the result.
  • Expensive approach: libCore has 21000 symbols with an average length of 46 characters when demangled (i.e. 614 KB in cache).
• Try to demangle as little as possible. Don't do it more than once or twice, and don't even try if the symbols have already been registered.
• I'm not even mentioning the parsing needed between the demangling and the registering.
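The two rules above (demangle the library only once, and skip classes that are already registered) can be sketched as follows. This is only an illustration under assumptions: `SymbolCache` and its members are hypothetical names, the symbol list stands in for a real library's symbol table, and the "parsing" is a naive split on `::` that would not survive templates or nested scopes.

```cpp
#include <cxxabi.h>
#include <cassert>
#include <cstdlib>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

class SymbolCache {
public:
    explicit SymbolCache(std::vector<std::string> symbols)
        : symbols_(std::move(symbols)) {}

    // Register a class on first use. Returns how many member symbols this
    // call registered (0 if the class was registered before or is unknown).
    std::size_t register_class(const std::string& class_name) {
        if (!registered_.insert(class_name).second) return 0;  // seen before
        if (!filled_) fill();                                   // demangle once
        auto it = by_class_.find(class_name);
        return it == by_class_.end() ? 0 : it->second.size();
    }

    int demangle_passes() const { return passes_; }

private:
    // Demangle every symbol of the library exactly once and group by class.
    void fill() {
        assert(!filled_);
        ++passes_;
        for (const std::string& sym : symbols_) {
            int status = 0;
            char* text = abi::__cxa_demangle(sym.c_str(), nullptr, nullptr, &status);
            if (status == 0 && text) by_class_[owner_of(text)].push_back(sym);
            std::free(text);
        }
        filled_ = true;
    }

    // Naive parse: "A::HiA()" -> "A". Free functions map to "".
    static std::string owner_of(const std::string& decl) {
        std::string name = decl.substr(0, decl.find('('));
        std::size_t pos = name.rfind("::");
        return pos == std::string::npos ? std::string() : name.substr(0, pos);
    }

    std::vector<std::string> symbols_;
    std::map<std::string, std::vector<std::string>> by_class_;
    std::set<std::string> registered_;
    bool filled_ = false;
    int passes_ = 0;
};
```

Registering the same class twice, or a second class from the same library, does not trigger another demangling pass; the price is the memory held by the cache.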
Are we winning the fight?
• CVS version of ROOT
  • Libs size: 74.67 MB
  • Objects size (dictionaries): 47.71 MB
  • Source size (dictionaries): 50.37 MB
• Current status of the pre-experimental version
  • Libs size: 65.46 MB (-9.21 MB, 12%)
  • Objects size (dictionaries): 36.42 MB (-11.29 MB, 24%)
  • Source size (dictionaries): 37.25 MB (-13.12 MB, 26%)
In every war, sacrifices must be made: space and time overhead
Let's start with a “normal” session.
Real time: 0.37 s | Real time: 21.72 s | Rootmarks: 341.97
First Algorithm: be stupid
Initial attempt: demangle all the symbols in a library for every class used.
Real time: 0.76 s | Real time: 38.95 s | Rootmarks: 184
Spikes due to the silliness of the algorithm: first demangle everything and then register it.
Second Algorithm: don't be so stupid
At least remember the classes that have already been registered.
Real time: 0.77 s | Real time: 37.29 s | Rootmarks: 183.68
Spikes due to the silliness of the algorithm: first demangle everything and then register it.
Third Algorithm: use the RAM
Demangle the symbols once and keep them in a cache.
Real time: 0.69 s | Real time: 28.48 s | Rootmarks: 200.95
Fourth Algorithm: Axel's idea
Keep a pointer to the mangled name and demangle twice (when needed).
Real time: 0.68 s | Real time: 26.97 s | Rootmarks: 205.16
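The slide gives no detail, so the following is only one possible reading of this idea, not the actual implementation: the index stores pointers to the mangled names instead of the demangled text, demangling once to find the owning class and a second time only for the class that is really used. `PointerIndex`, `demangle_now` and `owner_of` are hypothetical names, and the class-name parsing is deliberately naive.

```cpp
#include <cxxabi.h>
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// Demangle on demand; nothing is kept beyond the returned string.
inline std::string demangle_now(const char* mangled) {
    int status = 0;
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    std::string out = (status == 0 && text) ? text : "";
    std::free(text);
    return out;
}

// Naive owner extraction: "A::HiA()" -> "A" (free functions give "").
inline std::string owner_of(const std::string& decl) {
    std::string name = decl.substr(0, decl.find('('));
    std::size_t pos = name.rfind("::");
    return pos == std::string::npos ? std::string() : name.substr(0, pos);
}

class PointerIndex {
public:
    // First demangling: only to learn which class each symbol belongs to.
    // The index keeps the pointer to the mangled name, not the demangled text.
    explicit PointerIndex(const std::vector<const char*>& symbol_table) {
        for (const char* sym : symbol_table) {
            assert(sym != nullptr);
            std::string decl = demangle_now(sym);
            if (!decl.empty()) by_class_[owner_of(decl)].push_back(sym);
        }
    }

    // Second demangling: only for the class that is actually being used.
    std::vector<std::string> declarations_of(const std::string& class_name) const {
        std::vector<std::string> out;
        auto it = by_class_.find(class_name);
        if (it == by_class_.end()) return out;
        for (const char* sym : it->second) out.push_back(demangle_now(sym));
        return out;
    }

private:
    std::map<std::string, std::vector<const char*>> by_class_;
};
```

Compared with caching every demangled string, this trades a second demangling of the used symbols for a much smaller resident index (one pointer per symbol).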
Fifth Algorithm: some tuning
A bit of optimization of the data structures.
Real time: 0.56 s | Real time: 26.51 s | Rootmarks: 200.1
Algorithms Comparison
How much are we willing to pay for this feature?
Demangling takes 15% of the time at startup (100 ms), which means there is still some room for improvement.
Problems so far... a plethora
• Easy ones:
  • ellipsis
  • default parameters
  • free-standing functions
  • weird types like va_list
  • many more...
• Not so easy:
  • virtual functions... a real pain in the neck
  • constructors, destructors (in-charge, deleting, etc.)
  • inline functions
  • non-member operators
  • ...
Work to be done
• Certain stub functions are not out of the dictionary yet:
  • Constructors and destructors (Diego is working on it)
  • Non-member operators
  • Certain cases for std templates
• Without stubs we can also take the setup_memfunc calls out of the dictionary.
• What else can we take out?
  • Shadow classes? Show members? Streamers?
  • Class inheritance info? typedef? data members info? ...?
Future is always bright (dict source)
• CVS version: 50.37 MB
• Current status: 37.25 MB (-13.12 MB, 26.0%)
• No cons, dests: 30.09 MB (-20.28 MB, 40.2%)
  • Should be there soon enough.
• No memfuncs: 17.40 MB (-32.97 MB, 65.4%)
  • We still need the info (in a ROOT file, for instance).
• No memvars: 14.72 MB (-35.65 MB, 70.7%)
• No inline issue: 13.89 MB (-36.47 MB, 72.4%)
Conclusions
• We have gained a better understanding of C++.
• As my mother used to say:
  • He who knows not the way, walks with desperation. (Fortunately, we finally have an idea of what we are doing and where we want to go.)
• A lot of tuning is being done to bring times and memory down to something acceptable.
• We need a considerable amount of time to deal with a myriad of small (and not so small) issues.