220 likes | 346 Views
Performance Improvements with ATLAS AOD files. Rene Brun 3 November 2009. Main Points. Typical problems with Trees Branch buffers not clustered by entry Forward/backward seeks when reading Too many network transactions Expensive object model ( cpu time) Solutions TTreeCache
E N D
Performance Improvementswith ATLAS AOD files Rene Brun 3 November 2009
Main Points • Typical problems with Trees • Branch buffers not clustered by entry • Forward/backward seeks when reading • Too many network transactions • Expensive object model (cpu time) • Solutions • TTreeCache • Readaheadbuffer • Reclustering online or a-posteriori with TTree::OptimizeBaskets • Cheaper object model • Monitoring with TTreePerfStats
See Doctor Too many reads Small blocks
Use TTreePerfStats void taodr(Int_tcachesize=10000000) { gSystem->Load("aod/aod"); //shared lib generated with TFile::MakeProject TFile *f = TFile::Open("AOD.067184.big.pool.root"); TTree *T = (TTree*)f->Get("CollectionTree"); Long64_t nentries = T->GetEntries(); T->SetCacheSize(cachesize); if (cachesize > 0) { T->SetCacheEntryRange(0,nentries); T->AddBranchToCache("*",kTRUE); } TTreePerfStatsps("ioperf",T); for (Long64_t i=0;i<nentries;i++) { T->GetEntry(i); } ps.SaveAs("aodperf.root"); ps.Draw(); ps.Print(); } Root >TFilef(“aodperf.root”) Root >ioperf.Draw()
Test conditions • Because both the TreeCacheand Readaheadare designed to minimize the difference RealTime-CpuTime, care has been taken to run the tests with “cold” files, making sure that system buffers were dropped before running a new test. • Note that increasing the TreeCache size reduces also the CpuTime. • Note that running OptimizeBasketsalso reduces substantially the CpuTime because the number of baskets is in general reduced by several factors.
Test conditions 2 • Using one of the AOD files the class headers have been generated automatically via TTree::MakeProject. • The corresponding shared library is linked such that the same object model is used in my tests and in Atlas persistent model. • The tests read systematically all entries in all branches. Separate tests have been run to check that the optimal performance is still obtained when reading either a subset of branches, a subset of entries or both. This is an important remark because we have seen that sometimes proposed solutions are good when reading everything and very bad in the other mentioned use cases that are typical of the physics analysis scenarios.
What is the TreeCache • It groups into one buffer all blocks from the used branches. • The blocks are sorted in ascending order and consecutive blocks merged such that the file is read sequentially. • It reduces typically by a factor 1000 the number of transactions with the disk and in particular the network with servers like xrootdor dCache. • The small blocks in the buffer can be unzipped in parallel on a multi-core machine. • The typical size of the TreeCacheis 10 Mbytes, but higher values will always give better results. If you have no memory problem, set large values like 200 Mbytes.
TreeCache size impact 10 0 200 30
File with 203 branches and split=0 TreeCache is an overhead In case of a local disk It is essential with xrootd
TreeCache results table Original Atlas file (1266MB), 9705 branches split=99 Reclust: OptimizeBaskets 30 MB (1147 MB), 203 branches split=0 Reclust: OptimizeBaskets 30 MB (1086 MB), 9705 branches split=99
What is the readahead cache • The readaheadcache will read all non consecutive blocks that are in the range of the cache. • It minimizes the number of disk accesses. This operation could in principle be done by the OS, but the fact is that the OS parameters are not tuned for many small reads, in particular when many jobs read concurrently from the same disk. • When using large values for the TreeCacheor when the baskets are well sorted by entry, the readaheadcache is not necessary. • Typical (default value) is 256 Kbytes, although 2 Mbytes seems to give better results on Atlas files, but not with CMS or Alice. • The readaheadcache should not be used in several use cases (see 2 examples later)
Readaheadreading all branches, all entries Read ahead excellent
Reading only 2 branchesout of 9705 Read ahead very bad
Reading all branches in 1% random entries Read ahead very bad TreeCache is bad If it is not used with a TEntryList
comments • It is not because we get this pattern that the TreeCacheor readaheadshould be used in all cases. • The control must be on the application side. • Ideally one should be able to activate/deactivate the readahead automatically (working on this) • Hints could be generated automatically following the results collected by TTreePerfStats(requires more work)
Comments 2 • These tests have been done with files on a local disk. • In case of client-server mode with xrootd, dCacheor httpd, the TreeCacheis vital. • Using the TreeCache, ROOT can read efficiently files on WANs with very high latencies. • The readahead algorithm is currently implemented only for TFile. It could be implemented in the xrootd and dCache servers too.
Comments 3 • I have discussed only techniques improving the RealTime. Don’t forget other optimizations improving the CpuTime • Minimize inheritance levels with split=0 • Do not abuse of std::stringor similar small objects that contribute to the memory fragmentation. • Use std::vector<T*> in situations where the Ts derive from a common class. This is better than increasing the number of branches. • TClonesArrayis still the most performing collection of identical objects (see test program bench)
Comments 4 • ROOT version 5.25/02 includes a new and simpler API to the TreeCache. • It also includes the new readahead algorithm. • Xrootd in this version contains several new developments and optimisations. • These new features CANNOT be backportedto 5.24 or worst to 5.22. • 5.25/02 and the coming 5.26 are back compatible with 5.22. We expect collaborations to move to this new version.
Comments 5 • I am convinced that OptimizeBaskets, TreeCacheand Readaheadalgorithms can be further improved. • We need the cooperation of the experiments, testing many more use cases and giving feedback (eg sending the results of TTreePerfStatsfile. • Do not keep the results of your test under the carpet. Our priority is to help the experiments improving the situation .
Situation with CMSmail from Brian Bocklelmann (today) - OptimizeBasketsdid see an improvement in reads of the resulting file. The strange "tails" that you see in the plots I sent out become less pronounced (although definitely do not disappear) - Paul and Philippe are now convinced that there are no re-reads in the TTreeCache. - TTreeCachewith 20MB seems to be sufficiently large; covers about 2500 events - Readaheadimplemented in ReadBuffersis effective; a smaller size of 64KB provides a marked speed increase over 256KB readahead. Depending on the bandwidth of the data channel, the over-read percentage in the 256KB readahead case can become significant One thing that I've learned tonight is that the readv interface in dCacheis not well-implemented (on the server side, it's a for-loop over calls to read) and that you can get speed increases by removing TDcacheFile'simplementation of ReadBuffers. I believe that, after 2 days of working with Philippe, the current set of recommendations is: 1) Apply the remaining fix to make TTreeCachework in CMSSW, enable it wherever possible, and set it to around 20-30MB 2) Utilize ROOT's implementation of ReadBuffers 3) When we migrate to ROOT 5.26, enable calls to OptimizeBasketswhen writing out initial files (as it has no effect for fast merging). 4) Remove TDcacheFile'simplementation of ReadBuffers*or* implement the optimization equivalent to ROOT's in the dCacheserver code. I believe 1, 2, and 4 are relatively easy.