A couple of slides on containers… Federico Carminati Offline Week 10-05
Generalities
• The purpose of a container is to hold several instances of similar information
• Elements in a container are accessed via an index, an iterator or both
• Three kinds of containers will be considered
  • “C-style” arrays
  • ROOT containers
  • STL containers
C-style containers

    #include <stdio.h>
    #include <Rtypes.h>   // Float_t

    void cont1 ()
    {
      struct point {
        Float_t x;
        Float_t y;
      };
      point P[100];
      // sizeof yields a size_t: cast it to match the %ld format
      printf("Sizeof P = %ld\n", (long)sizeof(P));
    }

    root [5] .x cont1.C++
    Sizeof P = 800
C-style containers
• Advantages
  • Minimum size overhead
  • Fast access (direct and sequential)
  • Very clear semantics
• Drawbacks
  • Lack of “encapsulation”
  • Minimal functionality (I/O, browsing…)
  • Fixed dimension
  • No safety against out-of-bounds addressing
• Where to use
  • Data structures within algorithms
• Where to avoid
  • For dynamic data structures
  • When I/O and inspection are required
  • For publicly accessible data (i.e. outside a single method)
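The "no safety against out-of-bounds addressing" drawback can be softened inside an algorithm with a thin checked accessor. What follows is only a sketch in standard C++: `Point`, `ArraySize` and `CheckedAt` are hypothetical names introduced here, not part of ROOT or AliRoot.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical plain point, standing in for the "struct point" of the slide.
struct Point {
    float x;
    float y;
};

// Number of elements of a C-style array, deduced at compile time.
template <typename T, std::size_t N>
constexpr std::size_t ArraySize(T (&)[N]) {
    return N;
}

// Checked access: throws instead of silently addressing memory past the end.
template <typename T, std::size_t N>
T& CheckedAt(T (&array)[N], std::size_t index) {
    if (index >= N) {
        throw std::out_of_range("index outside C-style array");
    }
    return array[index];
}
```

The check costs one comparison per access and only works where the array type (with its dimension) is still visible, which is one more reason to keep such arrays inside a single method.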
Array of classes

    #include <stdio.h>
    #include <TObject.h>

    void cont1 ()
    {
      class Cpoint : public TObject {
      public:
        Cpoint() {}
        ~Cpoint() {}
        Float_t X() const {return fX;}
        Float_t Y() const {return fY;}
      private:
        Float_t fX;
        Float_t fY;
      };
      Cpoint CP[100];
      // each element carries its own TObject part
      printf("Sizeof CP = %ld\n", (long)sizeof(CP));
    }

    root [7] .x cont1.C++
    Sizeof CP = 2000
Array of classes
• Advantages
  • Fast access (direct and sequential)
  • Very clear semantics
  • Full “ROOT” functionality (I/O, browsing)
  • C++ “encapsulation”
• Drawbacks
  • 12 bytes overhead per object
  • Fixed dimension
  • No safety against out-of-bounds addressing
• Where to use
  • Data structures within algorithms
  • Class members with fixed dimensions
• Where to avoid
  • For dynamic data structures
  • When objects are small
Classes of arrays

    #include <stdio.h>
    #include <TObject.h>

    void cont1 ()
    {
      class CApoint : public TObject {
      public:
        CApoint() {}
        ~CApoint() {}
        Float_t X(Int_t i) const {return fX[i];}
        Float_t Y(Int_t i) const {return fY[i];}
      private:
        Float_t fX[100];
        Float_t fY[100];
      };
      CApoint CAP;
      // a single TObject part for the whole set of points
      printf("Sizeof CAP = %ld\n", (long)sizeof(CAP));
    }

    root [20] .x cont1.C++
    Sizeof CAP = 812
Classes of arrays

    #include <stdio.h>
    #include <TObject.h>

    void cont1 ()
    {
      class Cpoint : public TObject {
      public:
        Cpoint() {}
        ~Cpoint() {}
        Float_t X() const {return fX;}
        Float_t Y() const {return fY;}
        void Set(Float_t x, Float_t y) {fX=x; fY=y;}
      private:
        Float_t fX;
        Float_t fY;
      };

      class CBpoint : public TObject {
      public:
        CBpoint() {}
        ~CBpoint() {}
        // copy point i into a caller-supplied Cpoint
        void GetPoint(Cpoint &p, Int_t i) const {p.Set(fX[i], fY[i]);}
      private:
        Float_t fX[100];
        Float_t fY[100];
      };
      CBpoint CBP;
      printf("Sizeof CBP = %ld\n", (long)sizeof(CBP));
    }
Classes of arrays
• Advantages
  • Fast access (direct and sequential)
  • Very clear semantics
  • Full “ROOT” functionality (I/O, browsing)
  • C++ “encapsulation”
  • Possibility to add your own memory management
  • Low overhead (12 bytes for the whole array!)
• Drawbacks
  • “Roll-your-own” management of dynamic dimensions
  • No safety against out-of-bounds addressing
• Where to use
  • Class members with fixed dimensions
• Where to avoid
  • For highly dynamic data structures
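The difference between "12 bytes per object" and "12 bytes for the whole array" can be checked without ROOT. The sketch below is an illustration only: `Base` is a hypothetical stand-in (a virtual table pointer plus two 32-bit words), it is not TObject, and the exact byte counts depend on the platform and compiler.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a ROOT-like base class. NOT TObject itself.
class Base {
public:
    virtual ~Base() {}
    std::uint32_t UniqueID() const { return fUniqueID; }
    std::uint32_t Bits() const { return fBits; }
private:
    std::uint32_t fUniqueID = 0;
    std::uint32_t fBits = 0;
};

// The payload alone: what a C-style array of points stores.
struct PlainPoint {
    float x;
    float y;
};

// "Array of classes": every point carries its own Base part.
class ObjPoint : public Base {
public:
    float X() const { return fX; }
    float Y() const { return fY; }
private:
    float fX = 0;
    float fY = 0;
};

// "Class of arrays": one Base part shared by all the points.
class PointArray : public Base {
public:
    float X(std::size_t i) const { return fX[i]; }
    float Y(std::size_t i) const { return fY[i]; }
private:
    float fX[100] = {};
    float fY[100] = {};
};

constexpr std::size_t kN = 100;
constexpr std::size_t kPayload = kN * sizeof(PlainPoint);

// Bytes spent on top of the payload for 100 points, in each layout.
constexpr std::size_t OverheadArrayOfObjects() {
    return kN * sizeof(ObjPoint) - kPayload;
}
constexpr std::size_t OverheadObjectOfArrays() {
    return sizeof(PointArray) - kPayload;
}
```

Printing the two overheads on your own platform shows how the per-object cost is paid once per element in the first layout and once in total in the second.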
ROOT containers I - TObjArray

    #include <stdio.h>
    #include <TObject.h>
    #include <TObjArray.h>

    void cont2 ()
    {
      class Cpoint : public TObject {
      public:
        Cpoint() {}
        ~Cpoint() {}
        Float_t X() const {return fX;}
        Float_t Y() const {return fY;}
        void Set(Float_t x, Float_t y) {fX=x; fY=y;}
      private:
        Float_t fX;
        Float_t fY;
      };

      TObjArray CP(100);
      // the array stores pointers: every element is a separate heap object
      for (Int_t i=0; i<100; ++i) CP[i]=new Cpoint();
    }
ROOT containers I - TObjArray
• Advantages
  • Fast direct access, sequential may be slower
  • Polymorphic container
  • Full “ROOT” functionality (I/O, browsing)
  • C++ “encapsulation”
  • Fully automated dynamic management
  • Overhead is 40+<n>*4 bytes
• Drawbacks
  • Have to use TObjects, with their own overhead
  • Object creation is expensive
  • Object ownership has to be handled carefully to avoid leaks
• Where to use
  • Dynamic data structures with direct access
  • Need for polymorphism
• Where to avoid
  • Where the above conditions are not met
  • When you need to recreate objects frequently
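Why ownership "has to be handled carefully" is easier to see on a toy pointer array. This is a sketch in standard C++ under stated assumptions: `Shape`, `Square`, `Rect` and `PtrArray` are hypothetical classes invented for the illustration, and the ownership flag only mimics the general idea of a container that may or may not delete what it points to.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical polymorphic base that counts live instances, so leaks show up.
class Shape {
public:
    Shape() { ++fgLive; }
    Shape(const Shape&) { ++fgLive; }
    virtual ~Shape() { --fgLive; }
    virtual double Area() const = 0;
    static int Live() { return fgLive; }
private:
    static int fgLive;
};
int Shape::fgLive = 0;

class Square : public Shape {
public:
    explicit Square(double side) : fSide(side) {}
    double Area() const override { return fSide * fSide; }
private:
    double fSide;
};

class Rect : public Shape {
public:
    Rect(double w, double h) : fW(w), fH(h) {}
    double Area() const override { return fW * fH; }
private:
    double fW;
    double fH;
};

// Toy pointer array with an explicit ownership flag (illustration only).
class PtrArray {
public:
    explicit PtrArray(bool owner) : fOwner(owner) {}
    PtrArray(const PtrArray&) = delete;
    PtrArray& operator=(const PtrArray&) = delete;
    ~PtrArray() { if (fOwner) DeleteAll(); }

    void Add(Shape* s) { fItems.push_back(s); }
    Shape* At(std::size_t i) const { return fItems.at(i); }
    std::size_t Size() const { return fItems.size(); }

    // Delete every pointed-to object and empty the array.
    void DeleteAll() {
        for (Shape* s : fItems) delete s;
        fItems.clear();
    }
private:
    bool fOwner;
    std::vector<Shape*> fItems;
};
```

With `PtrArray(true)` the objects disappear with the array; with `PtrArray(false)` somebody else must delete them, and forgetting to do so is exactly the leak the slide warns about.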
ROOT containers II - TClonesArray

    #include <stdio.h>
    #include <TObjArray.h>
    #include <TClonesArray.h>
    #include <TStopwatch.h>

    void cont3 (Int_t nrep)
    {
      class Cpoint : public TObject {
      public:
        Cpoint() {}
        ~Cpoint() {}
        Float_t X() const {return fX;}
        Float_t Y() const {return fY;}
        void Set(Float_t x, Float_t y) {fX=x; fY=y;}
      private:
        Float_t fX;
        Float_t fY;
      };

      const Int_t size=20000;
      TStopwatch t;

      // TObjArray: every pass allocates and deletes each object on the heap
      t.Start();
      TObjArray a(size);
      for(Int_t i=0; i<nrep; ++i) {
        for(Int_t j=0; j<size; ++j) a[j]=new Cpoint();
        a.Delete();
      }
      t.Print();

      // TClonesArray: objects are constructed in place in the array's slots
      t.Reset();
      t.Start();
      TClonesArray b("Cpoint", size);
      for(Int_t i=0; i<nrep; ++i) {
        for(Int_t j=0; j<size; ++j) new(b[j]) Cpoint();
        b.Clear();
      }
      t.Print();
    }

    root [23] .x cont3.C++(1000)
    Real time 0:01:11, CP time 62.600
    Real time 0:00:22, CP time 20.230
ROOT containers II - TClonesArray
• Advantages
  • Fast direct and sequential access
  • Polymorphic container
  • Full “ROOT” functionality (I/O, browsing)
  • C++ “encapsulation”
  • Fully automated dynamic management
  • Overhead is 48+<n>*8 bytes
  • Very cheap object creation
• Drawbacks
  • Have to use TObjects, with their overhead
  • Array owns the objects
• Where to use
  • Dynamic data structures with direct access which are recreated several times
  • Need for polymorphism
• Where to avoid
  • Where the above conditions are not met
  • When you do not need to recreate objects frequently
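The "very cheap object creation" comes from constructing objects with placement new in memory that is kept between passes. The sketch below shows that idea in standard C++ only; `Point` and `PointPool` are hypothetical names, and the pool is loosely modelled on the concept, not on the actual TClonesArray implementation.

```cpp
#include <cstddef>
#include <new>

// Hypothetical small object, standing in for the Cpoint of the slides.
class Point {
public:
    Point(float x, float y) : fX(x), fY(y) {}
    float X() const { return fX; }
    float Y() const { return fY; }
private:
    float fX;
    float fY;
};

// Toy pool: one allocation up front, objects built in place afterwards.
class PointPool {
public:
    explicit PointPool(std::size_t capacity)
        : fSlots(static_cast<Point*>(::operator new(capacity * sizeof(Point)))),
          fCapacity(capacity),
          fUsed(0) {}
    ~PointPool() {
        Clear();
        ::operator delete(fSlots);
    }
    PointPool(const PointPool&) = delete;
    PointPool& operator=(const PointPool&) = delete;

    // Construct a Point in the next free slot; no heap allocation happens here.
    // Returns nullptr when the pool is full.
    Point* Emplace(float x, float y) {
        if (fUsed == fCapacity) return nullptr;
        return new (fSlots + fUsed++) Point(x, y);
    }

    // Destroy the objects but keep the memory for the next pass.
    void Clear() {
        for (std::size_t i = 0; i < fUsed; ++i) fSlots[i].~Point();
        fUsed = 0;
    }

    std::size_t Size() const { return fUsed; }
    const Point& At(std::size_t i) const { return fSlots[i]; }

private:
    Point* fSlots;
    std::size_t fCapacity;
    std::size_t fUsed;
};
```

After `Clear()` the next `Emplace` lands in the same memory as before, which is why refilling the container many times avoids the repeated allocate/delete cycle of the pointer array.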
Trees
• Trees are not containers
• Trees simulate containers for collections of similar objects written on a file
  • When the collection is small, it is convenient to read it all in memory
  • When it is large, Trees give you the “look and feel” of a container in memory with a sophisticated “behind your back” management of I/O
• Trees have a very nice “player” interface that you do not have for normal containers
  • Unless you implement it!
Maps

    #include <cstring>    // strcmp
    #include <map>
    #include <iostream>

    using namespace std;

    // strict weak ordering on C strings: compare contents, not addresses
    struct ltstr {
      bool operator()(const char* s1, const char* s2) const
        { return strcmp(s1, s2) < 0;}
    };

    void cont4()
    {
      map<const char*, int, ltstr> months;
      const char *mname[12]={"january", "february", "march", "april",
                             "may", "june", "july", "august",
                             "september", "october", "november", "december"};
      int days[12]={31,28,31,30,31,30,31,31,30,31,30,31};
      for (int i=0; i<12; i++) months[mname[i]]=days[i];

      cout << "june -> " << months["june"] << endl;

      // neighbours of "june" in the map's (alphabetical) order
      map<const char*, int, ltstr>::iterator cur = months.find("june");
      map<const char*, int, ltstr>::iterator prev = cur;
      map<const char*, int, ltstr>::iterator next = cur;
      ++next;
      --prev;
      cout << "Previous (in alphabetical order) is " << (*prev).first << endl;
      cout << "Next (in alphabetical order) is " << (*next).first << endl;
    }

    june -> 30
    Previous (in alphabetical order) is july
    Next (in alphabetical order) is march
Maps
• Map is a Sorted Associative Container that associates objects of type Key with objects of type Data
• Map is a Pair Associative Container, meaning that its value type is pair<const Key, Data>
• It is also a Unique Associative Container, meaning that no two elements have the same key
• Map has the important property that inserting a new element into a map does not invalidate iterators that point to existing elements
• Erasing an element from a map also does not invalidate any iterators, except, of course, for iterators that actually point to the element that is being erased
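The iterator guarantees listed above can be tried out directly. A minimal sketch, assuming `std::string` keys instead of the `const char*` keys of the earlier slide; `IteratorSurvivesUpdates` and `KeysInOrder` are hypothetical helper names.

```cpp
#include <map>
#include <string>
#include <vector>

// True if an iterator taken before some inserts and an erase of ANOTHER
// element still points at the same key/value pair afterwards.
inline bool IteratorSurvivesUpdates() {
    std::map<std::string, int> days;
    days["june"] = 30;
    days["july"] = 31;

    std::map<std::string, int>::iterator june = days.find("june");

    days["march"] = 31;   // inserting does not invalidate 'june'
    days["april"] = 30;
    days.erase("july");   // erasing a different element does not either

    return june != days.end() && june->first == "june" && june->second == 30;
}

// Keys in iteration order: a map always walks its keys sorted.
inline std::vector<std::string> KeysInOrder(const std::map<std::string, int>& m) {
    std::vector<std::string> keys;
    for (std::map<std::string, int>::const_iterator it = m.begin(); it != m.end(); ++it) {
        keys.push_back(it->first);
    }
    return keys;
}
```

Only an iterator to the erased element itself becomes unusable; everything else keeps working, which is what makes a map convenient for data that is updated while being traversed.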
Maps
• Advantages
  • Fast direct and sequential access (but no indexing)
  • Supported by ROOT
  • Fully automated dynamic management (see before)
• Drawbacks
  • Large overhead (I could not calculate it EXACTLY, but each element lives in its own node of a balanced tree, with several pointers per node)
  • Using “AliRoot-forbidden STL’s”
• Where to use
  • Need for quick access to data with non-integer keys
• Where to avoid
  • Where you do NOT desperately need the above
  • Where you can use TMap
  • For integer keys the overhead of building the tree is massive and unjustified: you are using a bazooka to kill a fly!
… and if I had more time …
• I would have told you about all the rest
• … but
Conclusion
• It might be tempting to use the “most functional” container to do the job
• Functionality comes at a cost
• AliRoot is already too slow and too big to afford this
• So please use a judicious blend of brain and the simplest collection that does the job
• Don’t delude yourself with 10-line benchmarks: they can be tuned to provide any result with a bit of skill