Some thoughts for the future (maybe)
ROOT team meeting, 27 January 2006
René Brun, CERN

Observations
• A considerable amount of time is spent installing software (up to one day for an expert).
• Porting to a new platform is nontrivial.
• Dependency problems arise when many packages must be installed.
• Only a small subset of the software is used.
• The installation may require a huge amount of disk space. Users are scared to download a new version.
• This does not fit well with the GRID concept.
• The GRID should be used to simplify this process, not to make it more complex.

ATLAS packages with > 10000 lines
lines   package              breakdown by language
211677  dice                 fortran=211641
187691  atrecon              fortran=138126,cpp=49354
129793  MuonSpectrometer     fortran=121321,python=3715,csh=2613,sh=2136
118504  Tools                cpp=67337,ansic=19012,python=13770,sh=7373,yacc=5659,fortran=3024,lex=1971
116327  PhysicsAnalysis      cpp=107348,python=6070,sh=1649,csh=1260
115143  geant3               fortran=115040,ansic=67
112445  TileCalorimeter      cpp=108580,python=2209,csh=920,sh=736
108200  atutil               fortran=108000,ansic=164
80866   Applications         fortran=71764,cpp=6961,ansic=1865
74721   Calorimeter          cpp=65917,python=7854,sh=490,csh=460
67822   atlfast              fortran=67786
64838   Tracking             cpp=60255,python=2092,csh=1380,sh=1104
59429   Generators           fortran=28136,cpp=25538,python=4123,sh=872,csh=760
49926   graphics             java=40719,cpp=8312,python=321,sh=255,csh=220
40058   AtlasTest            cpp=25159,python=5131,sh=4815,perl=4145,csh=517
39576   Control              cpp=22030,python=15904,sh=907,csh=693
31192   DetectorDescription  ansic=29540,csh=680,sh=562,python=343
29500   TestBeam             cpp=27433,python=1491,csh=320,sh=256
25001   Reconstruction       sh=10297,fortran=7559,python=5393,csh=1667
18989   atlsim               fortran=17561,cpp=1380
18328   InnerDetector        python=11466,csh=2860,sh=2641,ansic=1343
17291   Simulation           python=13653,sh=2126,csh=1302,fortran=169
16139   Database             perl=8310,sh=4299,java=2209,csh=709,python=566
14250   Event                cpp=13522,python=296,csh=240,sh=192
12930   gcalor               fortran=12894
11955   Trigger              python=7860,csh=1780,sh=1673,perl=634
11195   LArCalorimeter       python=6133,ansic=2045,csh=1620,sh=1347
3 million lines of code, 1200 packages

ALICE packages with > 10000 lines
lines   package   breakdown by language
398742  PDF       fortran=398729,ansic=13
146414  PYTHIA6   fortran=140748,cpp=5413,ansic=153,pascal=100
128337  HLT       cpp=127601,ansic=605,sh=100,csh=31
128103  ITS       cpp=128010,sh=93
105763  MUON      cpp=105673,sh=90
94548   DPMJET    fortran=94267,cpp=281
72400   STEER     cpp=72400
52443   HBTAN     cpp=51260,fortran=1183
51489   TPC       cpp=51479,sh=10
50932   PHOS      cpp=50639,csh=293
46176   TRD       cpp=46176
41998   ISAJET    fortran=40483,cpp=1494,pascal=21
39407   RALICE    cpp=29764,ansic=9355,sh=288
35916   EMCAL     cpp=35410,fortran=383,csh=123
31820   ANALYSIS  cpp=31820
27751   HERWIG    fortran=27246,cpp=477,ansic=28
27025   FMD       cpp=27021,sh=4
26667   TOF       cpp=26667
24258   EVGEN     cpp=24258
21588   HIJING    fortran=21099,cpp=489
20562   JETAN     cpp=19687,fortran=875
18344   RAW       cpp=18344
15232   STRUCT    cpp=15232
13142   PMD       cpp=13142
12945   RICH      cpp=12945
10966   FASTSIM   cpp=10966
10944   MONITOR   cpp=10944
10659   ZDC       cpp=10659
1.5 million lines of code

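The per-package breakdowns in these two listings can be aggregated by language to see how much Fortran and C++ is involved. The following is a minimal sketch, not part of the original talk: the function names are invented for illustration, and only the first three ALICE rows are used as input.

```python
from collections import Counter


def parse_row(row):
    """Parse 'TOTAL NAME lang=count,lang=count' into (name, total, counts)."""
    total, name, breakdown = row.split(maxsplit=2)
    counts = {}
    # Some rows contain a stray space after a comma, so strip spaces first.
    for item in breakdown.replace(" ", "").split(","):
        lang, value = item.split("=")
        counts[lang] = int(value)
    return name, int(total), counts


def language_totals(rows):
    """Sum the per-language line counts over all rows."""
    totals = Counter()
    for row in rows:
        _, _, counts = parse_row(row)
        totals.update(counts)
    return totals


# First three rows of the ALICE listing, copied verbatim.
ROWS = [
    "398742 PDF fortran=398729,ansic=13",
    "146414 PYTHIA6 fortran=140748,cpp=5413,ansic=153,pascal=100",
    "128337 HLT cpp=127601,ansic=605,sh=100,csh=31",
]

totals = language_totals(ROWS)
for lang, lines in totals.most_common():
    print(f"{lang:8s} {lines}")
```

Feeding in the complete listings gives the language mix per experiment. Note that the per-language counts in a row do not always add up exactly to the row total.
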
Fraction of code really used in one program
[Figure: % of functions used and % of classes used]

LHC software
[Figure]

ROOT source, bins, dict, libs
[Diagram of the build chain. Platforms: SLC3/gcc3.2.3, Windows/vc++7.1 (values below appear in pairs)]
Sources: *.h 153 kl, 6.4 Mb; *.cxx 855 kl, 100 Mb
Dictionary generation: rootcint -cint 56s, 71s; rootcint -reflex 58s, 71s; rootcint -gccxml 300s, 100s
Dictionary sources: Xdict_c.cxx 704 kl; Xdict_r.cxx 623 kl; Xdict_g.cxx 623 kl
Compilation (c++): 338s, 90s; 420s, 417s; 427s, 421s; 2640s, 1614s
Object files: *.o 41 Mb, 114 Mb; Xdict_c.o 44 Mb, 53 Mb; Xdict_r.o 51 Mb, 65 Mb; Xdict_g.o 51 Mb, 65 Mb
Link: ld 15s, 45s
Libraries: *.so, .lib 88 Mb, 71 Mb

Source of inefficiencies when compiling
• Always compile your dictionaries with -O0. It makes no difference at execution time whether dictionaries are compiled with -O0, -O1 or -O2.
• Example: time to compile G__Base1.cxx
  • -O0 20s
  • -O1 40s
  • -O2 60s
• Always use local files.
• Use forward declarations as much as possible.
• Abuse of templates/STL is a real killer (see later).

Serious problem with STL
• STL containers are nice. However, they have a high cost in a really large environment.
• Compiling code with STL is much, much slower.
• Object modules are bigger.
• The compiler or linker is able to eliminate duplicate code in ONE object file or shared lib, not across libraries.
• If you have 100 shared libs, it is likely that you have the code for std::vector push_back or iterators 100 times!
• Inlining is nice if used with care (or in toy benchmarks). It may have the opposite effect, generating more cache misses in a real application.
• Templates are statically defined and difficult to use in a dynamic interactive environment.

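The duplication across libraries can be made visible by comparing the symbols each library defines. The sketch below is only an illustration: the library names and symbol lists are invented, and in practice they would have to be extracted from real libraries with a symbol-listing tool.

```python
from collections import Counter


def cross_library_duplicates(libraries):
    """Return {symbol: number of libraries defining it} for symbols
    that are defined in more than one library."""
    counts = Counter()
    for symbols in libraries.values():
        # A set makes each library count at most once per symbol,
        # mirroring the fact that duplicates inside ONE library are merged.
        counts.update(set(symbols))
    return {symbol: n for symbol, n in counts.items() if n > 1}


# Hypothetical symbol lists, invented for illustration only.
LIBRARIES = {
    "libA.so": ["std::vector<int>::push_back", "A::Run"],
    "libB.so": ["std::vector<int>::push_back", "B::Run"],
    "libC.so": ["std::vector<int>::push_back",
                "std::vector<int>::push_back", "C::Run"],
}

duplicates = cross_library_duplicates(LIBRARIES)
print(duplicates)
```

With 100 libraries instead of three, the same template instantiation would be reported 100 times, which is the situation the slide describes.
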
Source of inefficiencies with shared libs
• -fPIC (position-independent code) introduces a degradation of about 20% (10 to 30%).
• With many shared libs, the percentage of classes and code actually used is small => swapping (20%).
• Because shared libs are generated for maximum portability, one cannot use the advanced features of the local processor when compiling.
• The same optimization level is used everywhere.
  • But a very large fraction of the code does not need to be optimized: no gain at execution, big loss when compiling.
  • A small fraction of the code should be compiled with the highest possible optimization (10%).
• Maybe a factor 2 loss!

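The cost of using one optimization level everywhere can be estimated with simple arithmetic. The sketch below reuses the G__Base1.cxx timings quoted earlier (20s, 40s and 60s for -O0, -O1 and -O2) and assumes, purely for illustration, that every file compiles at the same cost as that one and that the project has 100 files; neither assumption comes from the talk.

```python
# Seconds to compile one file at each level (the G__Base1.cxx example).
COMPILE_SECONDS = {"-O0": 20, "-O1": 40, "-O2": 60}


def build_time(n_files, n_hot, hot_level="-O2", cold_level="-O0"):
    """Total compile time when n_hot files use hot_level and the
    remaining files use cold_level."""
    hot = n_hot * COMPILE_SECONDS[hot_level]
    cold = (n_files - n_hot) * COMPILE_SECONDS[cold_level]
    return hot + cold


# Hypothetical project of 100 files.
uniform = build_time(100, 100)    # everything at -O2
selective = build_time(100, 10)   # only 10% at -O2, the rest at -O0

print(uniform, selective, uniform / selective)
```

Under these assumptions, optimizing only the 10% that matters cuts the compile time by a factor 2.5.
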
Can we gain something with better packaging?
• Yes and no.
• One shared lib per class implies more administration, more dictionaries, more dependencies.
• 80 shared libs for ROOT is already a lot.
• 500 would be nonsense.
• The Plug-in Manager helps.

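The idea behind a plug-in manager is that a library is loaded only when one of its classes is first requested. The sketch below illustrates that idea only; it is not ROOT's plug-in manager interface. The class `PluginManager` and its methods are invented here, and two Python standard-library modules stand in for shared libraries.

```python
import importlib


class PluginManager:
    """Maps class names to modules and imports a module on first use."""

    def __init__(self):
        self._handlers = {}  # class name -> module name
        self._loaded = {}    # module name -> module loaded by this manager

    def register(self, class_name, module_name):
        """Declare where a class lives without loading anything."""
        self._handlers[class_name] = module_name

    def is_loaded(self, class_name):
        """True if this manager has already loaded the class's module."""
        return self._handlers[class_name] in self._loaded

    def get(self, class_name):
        """Return the class, loading its module on first request."""
        module_name = self._handlers[class_name]
        if module_name not in self._loaded:
            self._loaded[module_name] = importlib.import_module(module_name)
        return getattr(self._loaded[module_name], class_name)


manager = PluginManager()
manager.register("Fraction", "fractions")  # stand-ins for shared libs
manager.register("Decimal", "decimal")

half = manager.get("Fraction")(1, 2)  # triggers the load of 'fractions'
print(half, manager.is_loaded("Fraction"), manager.is_loaded("Decimal"))
```

Only the module behind the class actually used gets loaded through the manager; the other registration costs nothing until it is needed.
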
Shared libs vs archive libs
• In the Fortran era, there was often one subroutine per file.
• The loader takes only the subroutines really referenced. However, the percentage of referenced but unused code has increased with time.
• Shared libs were efficient at a time when code could be shared between different tasks on time-sharing systems.
• Shared libs have partially solved the link time problem.
• Shared libs are not a solution for the long term.
• Archive libs are unusable in a large system, but nice for building static modules.
• What to do?

[Diagram: building myapp in memory]
CINT 10000 l/s; c++ 800 l/s; ld
*.cxx, *.h 70 Mb; *.o 110 Mb; *.so 76 Mb

Proposal for a new scenario
Introducing BOOT
A Software Bootstrap system

What is BOOT?
• A small, easy to install, standalone executable module (< 5 Mbytes)
• One click in the web browser
• It must be a stable system that can cope with old and new versions of other packages, including ROOT itself.
• It includes:
  • A subset of ROOT I/O, network and Core classes
  • A subset of Reflex
  • A subset of CINT (could also have a python flavour)
  • Possibly a GUI object browser

BOOT and existing applications
• BOOT must be able to run with the existing systems, maybe with reduced possibilities.
• In the next slides, I show a few use cases to illustrate the ideas.
• Do not take the syntax as a final word.

BOOT: Use Case 1
• Assumes BOOT is already installed on your machine user@xxx.yyy.zzz
• Nothing else on the machine except the compiler (no ROOT, etc.)
• Import a ROOT file containing histograms, Trees and other classes (usecase1.root)
• Browse the contents of the file
• Draw a histogram

h.Draw() in local mode
[Diagram: CINT and the libraries involved in drawing a histogram; several links between them are labelled "pm"]
• libX11: … drawline, drawtext …
• libCore: … I/O, TSystem …
• libGpad: … TPad, TFrame …
• libGraf: … TGraph, TGaxis, TPave …
• libHist: … TH1, TH2 …
• libHistPainter: … THistPainter, TPainter3DAlgorithms …

Use Case 1
[Diagram: machines user@xxx.yyy.zzz and pcroot@cern.ch]
• usecase1.root (2 Mbytes): contains references (URL) to classes in namespace ROOT
• http://root.cern.ch/coderoot.root: a compressed ROOT file containing
  • the full ROOT source tree, automatically built from CVS (25 Mbytes)
  • the ROOT classes dictionary DS generated by Reflex (5 Mbytes)
  • the full classes documentation: objects generated by the source parser (5 Mbytes)
• Local cache with the source of the classes really used, plus binaries for the classes or functions that are automatically generated from the interpreter (like the ACLiC mechanism)

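The local cache in this use case holds only the classes really used, fetched on demand from the remote code file. A minimal sketch of that behaviour follows. Everything in it is hypothetical: `SourceCache`, the `fetch` callback and the in-memory dictionary standing in for the remote file are illustrations, not part of BOOT.

```python
class SourceCache:
    """Keeps a local copy of each class source the first time it is needed."""

    def __init__(self, fetch):
        self._fetch = fetch    # callable: class name -> source text
        self._cache = {}       # class name -> source text
        self.fetch_count = 0   # number of remote fetches performed

    def get(self, class_name):
        """Return the source of class_name, fetching it only once."""
        if class_name not in self._cache:
            self._cache[class_name] = self._fetch(class_name)
            self.fetch_count += 1
        return self._cache[class_name]

    def cached(self):
        """Names of the classes currently held in the local cache."""
        return sorted(self._cache)


# Stand-in for the remote code file; contents invented for illustration.
REMOTE = {
    "TH1F": "class TH1F { /* ... */ };",
    "TGraph": "class TGraph { /* ... */ };",
    "TPad": "class TPad { /* ... */ };",
}

cache = SourceCache(REMOTE.__getitem__)
cache.get("TH1F")
cache.get("TH1F")   # served from the local cache, no second fetch
cache.get("TPad")
print(cache.cached(), cache.fetch_count)
```

The remote file may hold the whole source tree, but the local machine only ever stores what the session touched.
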
Use Case 1 pictures
[Pictures: usecase1.root, code.root]

Use Case 2
• BOOT already installed
• Want to write the shortest possible program using some classes in namespace ROOT and some classes from another namespace YYYY

  //This code can be interpreted line by line,
  //executed as a script, or compiled with C/C++
  //after corresponding code generation
  use ROOT, YYYY=http://cms.cern.ch/packages/yyyy
  h = new TH1F("h","example",100,0,1);
  v = new LorentzVector(….);
  gener = new myClass(v.x());
  h.Fill(gener.Something());
  h.Draw();

Use Case 3
• A variant of Use Case 2
• A bug has been found in class LorentzVector of ROOT and fixed in the new version ROOT6

  use ROOT, YYYY=http://cms.cern.ch/packages/yyyy
  use ROOT6=http://root.cern.ch/root6/code.root
  use ROOT6::LorentzVector
  h = new TH1F("h","example",100,0,1);
  v = new LorentzVector(….);
  gener = new myClass(v.x());
  h.Fill(gener.Something());

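The `use` statements in Use Cases 2 and 3 amount to a name-resolution rule: classes are looked up in the declared namespaces, and a single class can be pinned to another version. The sketch below is one possible reading, under the assumption that the first declared namespace wins unless a class has been pinned explicitly. `NameResolver` and the class lists are invented for illustration; the talk does not specify these semantics.

```python
class NameResolver:
    """Resolves bare class names against declared namespaces."""

    def __init__(self):
        self._namespaces = {}  # namespace -> set of class names
        self._order = []       # namespaces in declaration order
        self._pinned = {}      # class name -> namespace chosen explicitly

    def use(self, namespace, class_names):
        """Declare a namespace, as in 'use ROOT' or 'use YYYY=<url>'."""
        if namespace not in self._namespaces:
            self._order.append(namespace)
        self._namespaces[namespace] = set(class_names)

    def use_class(self, namespace, class_name):
        """Pin one class to a namespace, as in 'use ROOT6::LorentzVector'."""
        if class_name not in self._namespaces.get(namespace, ()):
            raise LookupError(f"{namespace} has no class {class_name}")
        self._pinned[class_name] = namespace

    def resolve(self, class_name):
        """Return the fully qualified name for a bare class name."""
        if class_name in self._pinned:
            return f"{self._pinned[class_name]}::{class_name}"
        for namespace in self._order:  # assumption: first declared wins
            if class_name in self._namespaces[namespace]:
                return f"{namespace}::{class_name}"
        raise LookupError(f"unknown class {class_name}")


# Hypothetical class lists for each namespace.
resolver = NameResolver()
resolver.use("ROOT", ["TH1F", "LorentzVector"])
resolver.use("YYYY", ["myClass"])
resolver.use("ROOT6", ["TH1F", "LorentzVector"])
resolver.use_class("ROOT6", "LorentzVector")

print(resolver.resolve("TH1F"))           # still taken from ROOT
print(resolver.resolve("LorentzVector"))  # taken from the fixed version
```

Under this reading, the bug fix is picked up for one class while everything else keeps coming from the version already in use.
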
Use Case 4
• A high-level ROOT selector that understands named collections in memory (ROOT, STL) or collections in ROOT files.

  use ROOT
  use ATLFAST=http://atlas.cern.ch/atlfast/atlfastcode.root
  TFile f("mcrun.root");
  for each entry in f.Tree
    for each electron in Electrons
      h.Fill(electron.m_Pt);
  h.Draw();

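The nested "for each" loops of this use case can be mimicked with plain data structures. The sketch below is an illustration only: the entries are invented, a list of bin counts stands in for a histogram, and `fill_histogram` is a hypothetical helper, not a ROOT function.

```python
def fill_histogram(entries, collection, attribute, nbins, low, high):
    """Loop over entries, then over a named collection in each entry,
    and count the attribute values in equal-width bins over [low, high)."""
    bins = [0] * nbins
    width = (high - low) / nbins
    for entry in entries:                 # for each entry in the tree
        for item in entry[collection]:    # for each electron in Electrons
            value = item[attribute]
            if low <= value < high:
                index = min(int((value - low) / width), nbins - 1)
                bins[index] += 1
    return bins


# Invented events; each entry holds a named collection of electrons.
ENTRIES = [
    {"Electrons": [{"m_Pt": 12.0}, {"m_Pt": 37.5}]},
    {"Electrons": []},
    {"Electrons": [{"m_Pt": 18.0}, {"m_Pt": 120.0}]},
]

histogram = fill_histogram(ENTRIES, "Electrons", "m_Pt",
                           nbins=4, low=0.0, high=100.0)
print(histogram)
```

The point of the use case is that the user names a collection and an attribute, and the system works out how to iterate; values outside the range are simply not counted in this sketch.
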
Use Case 5: Event Displays
• In general, Event Displays require the full experiment infrastructure (Pacific, Obelix, WonderLand, Crocodile).
• This is complex and not good for users and OUTREACH.
• A data file with the visualization scripts is far more powerful.
• This implies that the GUI must be fully scriptable. This is the case for the ROOT GUI.
[Diagram: data, scripts]

Requirements: work to do
• libCore already has all the infrastructure for client-server communications and for accessing remote files on the GRID.
• We must understand how to use subsets of the compilers and linkers to bypass disk I/O.
• We must understand how to emulate a dynamic linker using pre-compiled objects in memory.
• We have to investigate various code generation tools and the coupling with an extended version of CINT (and possibly python).
• We must understand how to use the STL functionality without its penalty. Dynamic templates are also necessary.

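Compiling and loading code without touching the disk is easy to demonstrate in an interpreted language, which gives a feel for what the requirement asks of the C++ toolchain. The sketch below is a Python analogy only and says nothing about how to achieve this with real compilers and linkers; `load_in_memory` and the sample source are invented for illustration.

```python
import types


def load_in_memory(module_name, source):
    """Compile source text and attach the result to a fresh module
    object, without writing any file to disk."""
    code = compile(source, f"<memory:{module_name}>", "exec")
    module = types.ModuleType(module_name)
    exec(code, module.__dict__)  # populate the module's namespace
    return module


# Source text that could just as well have come from a remote code file.
SOURCE = """
def something(x):
    return 2 * x
"""

gener = load_in_memory("gener", SOURCE)
result = gener.something(21)
print(result)
```

The compiled code lives only in memory and is callable like any other module, which is the behaviour the emulated dynamic linker would have to provide for pre-compiled objects.
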
Procedure
• These are just ideas. Making a firm proposal requires more investigation and prototyping.
• It must be clear that the top priority is the consolidation of ROOT to be ready for LHC data taking. This should not be an excuse not to look forward.
• It is my intention to continue this work as a background activity.