60 likes | 190 Views
Flat AOD. Gheata ALICE weekly meeting 2 Dec 2013. Motivation. Important I/O for analysis jobs, decreasing job efficiency Big file size -> storage problems Format highly dependent on AliRoot types Other experiments are using ntuple -like data in end-user analysis
E N D
Flat AOD Gheata ALICE weekly meeting 2 Dec 2013
Motivation • Important I/O for analysis jobs, decreasing job efficiency • Big file size -> storage problems • Format highly dependent on AliRoot types • Other experiments are using ntuple-like data in end-user analysis • Investigations and tests at small scale already hint for improvements • Disabling non-needed branches -> x5 speed • Tests on analysis specific data skimming -> huge factors to gain • Discussed flattening/reducing AOD format • Useful for improving I/O performance • Useful for Run2/Run3 • Useful for data preservation (smaller, simpler, easier to share) • Useful for different levels of parallelism in analysis
Exercise • Use AODtree->MakeClass() to generate a skeleton, then rework • Keep all AOD info, but restructure the format Int_tfTracks.fDetPid.fTRDncls -> Int_t *fTracks_fDetPid_fTRDncls; //[ntracks_] • More complex cases to support I/O: typedefstruct { Double32_t x[10]; } vec10dbl32; Double32_tfTracks.fPID[10] -> vec10dbl32 *fTracks_fPID; //[ntracks_] Double32_t *fV0s.fPx //[fNprongs] -> TArrayF *fV0s.fPx; //[nv0s_] • Convert AliAODEvent-> FlatEvent • Try to keep the full content AND size • Write FlatEvent on file • Compare file size and read speed
Results • Tested on AOD PbPb: AliAOD.root ****************************************************************************** *Tree :aodTree : AliAOD tree * *Entries : 2327 : Total = 2761263710 bytes File Size = 660491257 * * : : Tree compression factor = 4.18 * ****************************************************************************** ****************************************************************************** *Tree :AliAODFlat: Flattened AliAODEvent * *Entries : 2327 : Total = 2248164303 bytes File Size = 385263726 * * : : Tree compression factor = 5.84 * ****************************************************************************** • Data smaller (no TObject overhead, TRef->int) • 30% better compression • Reading speed • Old: CPU time= 103s , Real time=120s • New: CPU time= 54s , Real time= 64s
Implications • User analysis more simple, working mostly with basic types (besides the event) • Simplified access to data, highly reducing number of (virtual) calls • ROOT-only analysis • No problem with schema evolution, supported as before • Filtering hierarchical to flat AOD’s straightforward • Much better vectorizable track loops • Deltas would have to be also flattened
Conclusions • Flattening the event structure has benefits • Sizeable gains in size and speed (>50%) • Many benefits for user code performance • Conversion straightforward • Does not rule out further skimming of data content per analysis type • An approach we will have to consider seriously for Run2 and specially Run3