Some ideas for possible future developments

LCG Applications Area meeting, 8 March 2006
René Brun, CERN

Motivation
• Not the new ROOT logo!
• I am not reading a crystal ball.
• Simplify users' lives.

ROOT person power
CERN + FNAL + JINR. Only people working full time on the project.

Observations (1)
• ROOT, like experiment software, is becoming bigger and bigger.
• Users have to run on many different machines.
• Applications are becoming more and more distributed.
• At the LHC, the ratio of developers to users is changing. Users require an improved UI, more robustness, or anything else that simplifies their life.

Observations (2)
• A considerable amount of time is spent installing software (up to one day for an expert).
• Porting to a new platform is nontrivial.
• Dependency problems arise when many packages must be installed.
• Only a small subset of the software is used.
• The installation may require a huge amount of disk space. Users are scared to download a new version.
• This does not fit well with the GRID concept.
• The GRID should be used to simplify this process, not to make it more complex.

LHC software

Problem with dictionaries
• Today CINT/Reflex dictionaries are machine dependent.
• They represent a very substantial fraction of the total code.
• We could make a very large fraction machine independent.
• The interface to functions could be reduced with standard ABIs.
• The dictionary data structures could be saved to a ROOT file instead of generating the code that produces these data structures.
• In this case, one would import only the data structures for the classes really used (I/O or interpreter).
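
A minimal sketch of the last two bullets, in standard C++ only. It is not Reflex or CINT code: `MemberInfo`, `ClassInfo`, `serialize`, and `loadUsed` are hypothetical names, and a text string stands in for the ROOT file. The point it illustrates is that class descriptions stored as plain data can be read back selectively, so that only the classes a job really uses are imported.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical description of one data member (names assumed to contain no spaces).
struct MemberInfo {
    std::string name;
    std::string type;
};

// Hypothetical description of one class, kept as data rather than generated code.
struct ClassInfo {
    std::string name;
    std::vector<MemberInfo> members;
};

// Write one class per line: "ClassName memberCount name type name type ...".
std::string serialize(const std::vector<ClassInfo>& classes) {
    std::ostringstream out;
    for (const ClassInfo& c : classes) {
        out << c.name << ' ' << c.members.size();
        for (const MemberInfo& m : c.members)
            out << ' ' << m.name << ' ' << m.type;
        out << '\n';
    }
    return out.str();
}

// Read the text back, keeping only the classes listed in `used`.
std::map<std::string, ClassInfo> loadUsed(const std::string& text,
                                          const std::set<std::string>& used) {
    std::map<std::string, ClassInfo> result;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        ClassInfo c;
        std::size_t count = 0;
        if (!(fields >> c.name >> count)) continue;  // skip malformed lines
        if (used.count(c.name) == 0) continue;       // class not needed: do not import it
        for (std::size_t i = 0; i < count; ++i) {
            MemberInfo m;
            if (!(fields >> m.name >> m.type)) break;
            c.members.push_back(m);
        }
        result[c.name] = c;
    }
    return result;
}
```

Calling `loadUsed(text, used)` with a set containing only one class name returns a map with at most that one entry, whatever else the stored text describes.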

ROOT source, bins, dict, libs
Paired values are given in the order SLC3/gcc3.2.3, Windows/vc++7.1.
Sources:
  *.h 153 kl, 6.4 Mb
  *.cxx 855 kl, 100 Mb
Dictionary generation:
  rootcint -cint 56s, 71s (Xdict_c.cxx 704 kl)
  rootcint -reflex 58s, 71s (Xdict_r.cxx 623 kl)
  rootcint -gccxml 300s, 100s (Xdict_g.cxx 623 kl)
Compilation (c++), in slide order:
  338s, 90s
  420s, 417s
  427s, 421s
  2640s, 1614s
Object files:
  *.o 41 Mb, 114 Mb
  Xdict_c.o 44 Mb, 53 Mb
  Xdict_r.o 51 Mb, 65 Mb
  Xdict_g.o 51 Mb, 65 Mb
Link:
  ld 15s, 45s
Libraries:
  *.so, .lib 88 Mb, 71 Mb


Problem with STL Inlining
• STL containers are very nice. However, they have a very high cost in a real large environment.
• Compiling code with STL is much slower because of inlining (STL is only in header files). The situation improves a bit with precompiled headers (e.g. in gcc4), but not much.
• Object modules are bigger.
• The compiler or linker is able to eliminate duplicate code in ONE object file or shared lib, not across libraries.
• If you have 100 shared libs, it is likely that you have the code for std::vector push_back or iterators 100 times!
• Inlining is nice if used with care. It may have the opposite effect, generating more cache misses in a real application.
• Templates are statically defined and difficult to use in a dynamic interactive environment.

Sources of inefficiency with Shared Libs
• -fPIC (Position Independent Code) introduces a degradation of about 20 per cent (10 to 30%).
• With many shared libs, the percentage of classes and code actually used is small, which leads to swapping (20%).
• Because shared libs are generated for maximum portability, one cannot use the advanced features of the local processor when compiling. The same optimization level is used everywhere.
• But a very large fraction of the code does not need to be optimized: no gain at execution, big loss when compiling.
• A small fraction of the code should be compiled with the highest possible optimization (10%).
• Maybe a factor 2 loss!

Shared Libs vs Archive Libs
• In the Fortran era, there was often one subroutine per file.
• The loader takes only the subroutines really referenced. However, the percentage of code that is referenced but not used has increased with time.
• Shared libs were efficient at a time when code could be shared between different tasks on time-sharing systems.
• Shared libs have partially solved the link time problem.
• Shared libs are not a solution for the long term.
• Archive libs are unusable in a large system, but nice for building static modules.
• What to do?

Fraction of ROOT code really used in a batch job
[Figure; label: shared lib size in bytes]

Fraction of ROOT code really used in a job with graphics
[Figure]

Fraction of code really used in one program
[Figure; labels: % functions used, % classes used]

Can we gain with better packaging?
• Yes and no.
• One shared lib per class implies more administration, more dictionaries, more dependencies.
• 80 shared libs for ROOT is already a lot. 500 would be nonsense.
• The Plug-in Manager helps.

Proposal for a new scenario
Introducing BOOT: a software bootstrap system

What is BOOT?
• A small system to make life easier for the many users who mainly do data analysis with ROOT and their own classes (users + experiment).
• It is a very small subset of ROOT (5 to 10 per cent).
• The same idea could be extended to other domains, like simulation and reconstruction.

What is BOOT?
• A small, easy to install, standalone executable module (< 5 Mbytes).
• One click in the web browser.
• It must be a stable system that can cope with old and new versions of other packages, including ROOT itself.
• It will include:
  • A subset of ROOT I/O, network and Core classes
  • A subset of Reflex
  • A subset of CINT (could also have a python flavor)
  • Possibly a GUI object browser
• From the BOOT GUI or command line, the referenced software (URL) will be automatically downloaded and locally compiled/cached in a transparent way.
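
A minimal sketch of the "download once, then reuse the local copy" behaviour described in the last bullet. Everything here is an assumption made for illustration: `SourceCache` and its `Fetcher` callback are hypothetical names, the cache lives in memory rather than on disk, and the download itself is left to whatever function the caller supplies.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical cache of downloaded sources, keyed by URL.
class SourceCache {
public:
    // The fetcher performs the real download; it is supplied by the caller.
    using Fetcher = std::function<std::string(const std::string&)>;

    explicit SourceCache(Fetcher fetch) : fetch_(std::move(fetch)) {}

    // Return the content for `url`, downloading it only on the first request.
    const std::string& get(const std::string& url) {
        auto it = cache_.find(url);
        if (it == cache_.end()) {
            ++downloads_;
            it = cache_.emplace(url, fetch_(url)).first;
        }
        return it->second;
    }

    // Number of real downloads performed so far.
    int downloads() const { return downloads_; }

private:
    Fetcher fetch_;
    std::map<std::string, std::string> cache_;
    int downloads_ = 0;
};
```

Requesting the same URL twice through `get` calls the fetcher once; the second request is served from the cache, which is what makes the download transparent to the user.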

BOOT and existing applications
• BOOT must be able to run with the existing codes, possibly with reduced functionality.
• The next slides give a few use cases to illustrate the ideas.
• Do not take the syntax as a final word.

BOOT: Use Case 1
• Assumes BOOT is already installed on your machine user@xxx.yyy.zzz.
• Nothing else on the machine except the compiler (no ROOT, etc).
• Import a ROOT file containing histograms, Trees and other classes (usecase1.root).
• Browse the contents of the file.
• Draw a histogram.

Use Case 1
• http://root.cern.ch/source.root: a compressed ROOT file containing the full ROOT source tree automatically built from CVS (25 Mbytes) + the ROOT classes dictionary data structures generated by Reflex (5 Mbytes) + the full classes documentation objects generated by the source parser (5 Mbytes).
• usecase1.root (2 Mbytes): contains references (URL) to classes in namespace ROOT.
• Local cache: the source of the classes really used + binaries for the classes or functions that are automatically generated from the interpreter (like the ACLiC mechanism).
• Machines shown in the diagram: user@xxx.yyy.zzz and pcroot@cern.ch.

Use Case 1 pictures
[Figure; labels: http://root.cern.ch/source.root, usecase1.root]

h.Draw() local mode
Thanks to system.rootmap, we know the libs used by any class.
Libraries shown in the diagram, with links labelled pm (Plug-in Manager):
• CINT
• libX11: … drawline, drawtext …
• libCore: … I/O, TSystem …
• libGpad: … TPad, TFrame …
• libGraf: … TGraph, TGaxis, TPave …
• libHist: … TH1, TH2 …
• libHistPainter: … THistPainter, TPainter3DAlgorithms …
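
A minimal sketch of the lookup that a rootmap makes possible: given a class name, find the library that implements it and load that library only on first use. This is not the ROOT Plug-in Manager or the real system.rootmap format; `RootMap` and `LazyLoader` are hypothetical names, and "loading" is reduced to recording the library name where a real system would call the platform's dynamic loader.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a rootmap: class name -> library implementing it.
using RootMap = std::map<std::string, std::string>;

// Hypothetical loader that records which libraries have been requested.
class LazyLoader {
public:
    explicit LazyLoader(RootMap map) : map_(std::move(map)) {}

    // Make sure the library for `className` is loaded.
    // Returns false if the class is not listed in the map.
    bool require(const std::string& className) {
        auto it = map_.find(className);
        if (it == map_.end()) return false;
        // insert().second is true only the first time a library is seen.
        if (loaded_.insert(it->second).second)
            order_.push_back(it->second);
        return true;
    }

    // Libraries in the order they were first needed.
    const std::vector<std::string>& loadOrder() const { return order_; }

private:
    RootMap map_;
    std::set<std::string> loaded_;
    std::vector<std::string> order_;
};
```

With a map that sends TH1 and TH2 to libHist and TPad to libGpad, requiring all three classes records two libraries, and a job that never touches TPad never records libGpad.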

Faster ACLiC
We are wasting a lot of time writing and reading .o or .so files to and from disk.
Diagram labels: memory; CINT 10000 l/s; c++ 800 l/s; ld; myapp; *.cxx, *.h 100 Mb; *.o 110 Mb; *.so 76 Mb

Use Case 2
• BOOT already installed.
• Want to write the shortest possible program using some classes in namespace ROOT and some classes from another namespace YYYY.

  // This code can be interpreted line by line,
  // executed as a script, or compiled with C/C++
  // after corresponding code generation
  use ROOT=http://root.cern.ch/root5.10/source.root
  use YYYY=http://cms.cern.ch/packages/yyyy
  h = new TH1F("h","example",100,0,1);
  v = new LorentzVector(….);
  gener = new myClass(v.x());
  h.Fill(gener.Something());
  h.Draw();

Use Case 3
• A variant of Use Case 2.
• A bug has been found in class LorentzVector of ROOT and fixed in the new version ROOT6.

  use ROOT, YYYY=http://cms.cern.ch/packages/yyyy
  use ROOT6=http://root.cern.ch/root6/code.root
  use ROOT6::LorentzVector
  h = new TH1F("h","example",100,0,1);
  v = new LorentzVector(….);
  gener = new myClass(v.x());
  h.Fill(gener.Something());
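
A minimal sketch of how the `use` lines in this use case might be resolved: each class comes from a default package unless a later statement pins that one class to another package. The slide does not define these semantics, so the rules below are assumptions: `PackageResolver`, `addPackage`, `pin`, and `resolve` are hypothetical names, and the choice that the first registered package wins by default is made only for illustration.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical resolver deciding which package a class is taken from.
class PackageResolver {
public:
    // Register a package (as after `use NAME=URL`) with the classes it provides.
    // Assumption: when several packages provide a class, the first one registered wins.
    void addPackage(const std::string& package,
                    const std::vector<std::string>& classes) {
        for (const std::string& c : classes)
            defaults_.emplace(c, package);  // emplace keeps an existing entry
    }

    // Pin a single class to a package (as after `use ROOT6::LorentzVector`).
    void pin(const std::string& package, const std::string& className) {
        pinned_[className] = package;
    }

    // Package for `className`, or an empty string if no package provides it.
    std::string resolve(const std::string& className) const {
        auto p = pinned_.find(className);
        if (p != pinned_.end()) return p->second;
        auto d = defaults_.find(className);
        return d != defaults_.end() ? d->second : std::string();
    }

private:
    std::map<std::string, std::string> defaults_;
    std::map<std::string, std::string> pinned_;
};
```

Under these assumptions, registering ROOT and then ROOT6 with the same classes and pinning LorentzVector to ROOT6 leaves TH1F resolved to ROOT while LorentzVector resolves to ROOT6, which is the effect the use case asks for.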

Use Case 4: Specialized Code Generators

  use ATLFAST=http://atlas.cern.ch/atlfast/atlfastcode.root
  TFile f("mcrun.root");
  for each entry in f.T
     for each electron in Electrons
        if(electron.m_Eta > 1) h.Fill(electron.m_Pt);
  h.Draw()

• High Level ROOT Selector understanding named collections in memory (ROOT, STL) or collections in ROOT files.
• PROOF compliant.
• Extension of the TTree::MakeProxy code generator.
• Do not read referenced but unused branches.
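
A minimal sketch of the plain C++ that a generator might emit for the two nested "for each" loops in this use case, written against in-memory collections only. It is not output of TTree::MakeProxy and does no file or branch access: `Electron`, `Event`, `Hist1D`, and `fillPt` are hypothetical stand-ins, with only the member names `m_Eta` and `m_Pt` taken from the pseudocode.

```cpp
#include <cstddef>
#include <vector>

// Stand-in types; member names follow the pseudocode.
struct Electron {
    double m_Eta;
    double m_Pt;
};

struct Event {
    std::vector<Electron> Electrons;
};

// Minimal fixed-binning histogram; a stand-in for TH1F, not its API.
class Hist1D {
public:
    Hist1D(std::size_t nbins, double lo, double hi)
        : lo_(lo), hi_(hi), bins_(nbins, 0) {}

    void Fill(double x) {
        if (x < lo_ || x >= hi_) return;  // values outside [lo, hi) are ignored
        std::size_t i = static_cast<std::size_t>(
            (x - lo_) / (hi_ - lo_) * static_cast<double>(bins_.size()));
        if (i >= bins_.size()) i = bins_.size() - 1;  // guard against rounding at the edge
        ++bins_[i];
    }

    unsigned long bin(std::size_t i) const { return bins_[i]; }

private:
    double lo_;
    double hi_;
    std::vector<unsigned long> bins_;
};

// Expanded form of: for each entry, for each electron, if (m_Eta > 1) fill m_Pt.
void fillPt(const std::vector<Event>& entries, Hist1D& h) {
    for (const Event& entry : entries)
        for (const Electron& electron : entry.Electrons)
            if (electron.m_Eta > 1) h.Fill(electron.m_Pt);
}
```

The loop touches only `m_Eta` and `m_Pt`, which is the information a generator would need in order to skip every other branch when reading from a file.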

Use Case 5: Dynamic HELP, Dynamic html
• Source files and scripts are browsable in html format generated dynamically.
• Combination of a new version of THtml and the new GUI widget TGHtml.
• Both classes make extensive use of the Reflex dictionary and the pre-digested documentation.

Use Case 6: Event Displays
• In general, Event Displays require the full experiment infrastructure (Pacific, Obelix, WonderLand, Crocodile).
• This is complex and not good for users and OUTREACH.
• A data file with the visualization scripts is far more powerful.
• This implies that the GUI must be fully scriptable. This is the case for the ROOT GUI.
[Diagram labels: event data in a Tree, C++ scripts]

Requirements: work to do
• libCore already has all the infrastructure for client-server communications and for accessing remote files on the GRID.
• We must understand how to use subsets of the compilers and linkers to bypass disk I/O.
• We must understand how to emulate a dynamic linker using pre-compiled objects in memory.
• We have to investigate various code generation tools and the coupling with an extended version of CINT (and possibly python).
• We must understand how to use the STL functionality without its penalty. Dynamic templates are also necessary.

Summary
• Just ideas.
• Making a firm proposal requires more investigation and prototyping.
• Many of these ideas could be implemented gradually even without BOOT.
• It must be clear that the top priority is the consolidation of ROOT to be ready for LHC data taking. This should not be an excuse not to look forward.
• This work will continue as a background activity.
• Your comments are welcome.
