From ROOT to BOOT


  1. From ROOT to BOOT. René Brun, CERN

  2. Plan
     • Observations: we are seriously overweight
     • What shape do we want to get back to?
     • Slimming plans

  3. Observations
     • Installing the software of our large experiments takes a considerable amount of time.
     • Porting to a new platform is not trivial.
     • There are dependency problems between libraries.
     • Only a small fraction of the software is actually used.
     • Installation is costly in time and disk space.
     • Users hesitate before installing a new version.
     • This contradicts the original goal of grids.
     • The GRID should be used to simplify the problem, not to make it worse.

  4. Observations 2
     • The user profile is changing.
     • Experiment frameworks are evolving (in principle towards more modularity).
     • C++ is by far the dominant language.
     • The use of parameterized types (templates) is steadily increasing.
     • The importance of object dictionaries is recognized:
         • for input/output
         • for interpreters
         • for GUIs (signals & slots)
     • The size of the dictionaries is becoming a problem.
     • The size of the executable modules is problematic.

  5. Some parameters for LHC software

  6. Experiment Frameworks: Starting point
     [Diagram labels: monolithic simulation; monolithic reconstruction; user; simulation toolkit; analysis toolkit; PAW, or ROOT used like PAW]

  7. Experiment Frameworks: End point
     [Diagram labels: simulation toolkit; user loads only what he needs; hierarchy of simulation & reconstruction libraries; core framework with plug-in manager, persistency, dictionary, folders, graphics, GUI and general utilities]

  8. Change in the user profile
     • The developer-to-user ratio is changing rapidly in the case of the LHC.
     • Applications are becoming more and more distributed.
     • Operating systems and machines evolve rapidly.
     [Figure labels: users; developers. "They require improved UI, more robustness, or anything simplifying their life."]

  9. Program Size (RAM) ?

  10. Program Size (lines of code)
      [Figure labels: MS Windows; public libraries; experiment code base; one user]

  11. Time to compile
      [Figure labels: ADA; F77/90; C++; C]

  12. Problem with STL Inlining
      • STL containers are very nice. However, they have a very high cost in a real, large environment.
      • Compiling code that uses the STL is much slower because of inlining (the STL lives only in header files). The situation improves a bit with precompiled headers (e.g. in gcc4), but not much.
      • Object modules are bigger.
      • The compiler or linker can eliminate duplicated code within ONE object file or shared lib, not across libraries.
      • If you have 100 shared libs, it is likely that you have the code for std::vector push_back or iterators 100 times!
      • Inlining is nice if used with care (or in toy benchmarks). It may have the opposite effect, generating more cache misses in a real application.
      • Templates are statically defined and difficult to use in a dynamic interactive environment.
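One way to limit the duplication described on this slide is explicit template instantiation. This technique is not mentioned in the talk; it is shown here only as a possible mitigation. The sketch below uses a hypothetical `Accumulator` template (not a ROOT class): the `extern template` declaration, which would normally sit in the shared header, tells every client not to instantiate `Accumulator<double>` itself, and the single `template class` definition, which would normally sit in one source file of one library, provides the only copy. It affects only the members of `Accumulator` itself; the `std::vector` code used internally is still instantiated implicitly.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container used only for illustration (not a ROOT class).
template <typename T>
class Accumulator {
public:
    void Add(const T& value) { fValues.push_back(value); }

    T Sum() const {
        T total = T();
        for (const T& v : fValues) total += v;
        return total;
    }

    std::size_t Size() const { return fValues.size(); }

private:
    std::vector<T> fValues;
};

// Would live in the shared header: clients must not instantiate
// Accumulator<double> implicitly, because one library provides it.
extern template class Accumulator<double>;

// Would live in exactly one source file: the single explicit instantiation.
template class Accumulator<double>;

// Client code uses the type as usual.
double SumOfFirst(int n) {
    Accumulator<double> acc;
    for (int i = 1; i <= n; ++i) acc.Add(i);
    return acc.Sum();
}
```

Both declarations appear in one block here only to keep the example self-contained; in a real build they would be split between a header and a source file.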

  13. Example with #include <string>
      • This includes more than 20000 lines of C++ code!!!
      • <string>, and also <vector> and <list>, are used by nearly every C++ file in Atlas and CMS.
      • On many systems (e.g. Solaris/CC) <string> includes many other headers, which in turn include other headers!!
          /opt/SUNWspro/WS6U1/include/CC/std/stdio.h
          /usr/include/sys/feature_tests.h
          /usr/include/sys/isa_defs.h
          /usr/include/stdio.h
          /usr/include/iso/stdio_iso.h
          /usr/include/sys/feature_tests.h
          /usr/include/sys/va_list.h
          /usr/include/stdio_tag.h
          /usr/include/stdio_impl.h
          /usr/include/sys/isa_defs.h
          /opt/SUNWspro/WS6U1/include/CC/std/string.h
          /usr/include/sys/feature_tests.h
          /usr/include/string.h
          /usr/include/iso/string_iso.h
          /usr/include/sys/feature_tests.h
          /opt/SUNWspro/WS6U1/include/CC/std/ctype.h
          /usr/include/sys/feature_tests.h
          /usr/include/ctype.h
          /usr/include/iso/ctype_iso.h
          /usr/include/sys/feature_tests.h
          /usr/include/sys/types.h
          /usr/include/sys/isa_defs.h
          /usr/include/sys/feature_tests.h
          /usr/include/sys/machtypes.h
          /usr/include/sys/feature_tests.h
          /usr/include/sys/int_types.h
          /usr/include/sys/isa_defs.h
          /usr/include/fcntl.h
          /usr/include/sys/feature_tests.h
          /usr/include/sys/types.h
          /usr/include/sys/fcntl.h
          /usr/include/sys/feature_tests.h
          /usr/include/sys/types.h
          /usr/include/sys/stat.h
          /usr/include/sys/feature_tests.h
          /usr/include/sys/types.h
          /usr/include/sys/time_std_impl.h
          /usr/include/sys/feature_tests.h
          /usr/include/sys/stat_impl.h
          /usr/include/sys/feature_tests.h
          /usr/include/sys/types.h
          ……

  14. Problem with dictionaries
      • Today CINT/Reflex dictionaries are machine dependent.
      • They represent a very substantial fraction of the total code.
      • We could make a very large fraction machine independent.
      • The interface to functions could be reduced with standard ABIs.
      • The dictionary data structures (DS) could be saved to a ROOT file instead of generating the code that produces them.
      • In this case, one would import only the data structures for the classes really used (I/O or interpreter).

  15. ROOT source, bins, dict, libs
      [Diagram; each pair of values is given as SLC3/gcc3.2.3, Windows/vc++7.1]
      Headers:      *.h 153 kl, 6.4 Mb
      rootcint:     -cint 56s, 71s | -reflex 58s, 71s | -gccxml 300s, 100s
      Sources:      *.cxx 855 kl, 100 Mb | Xdict_c.cxx 704 kl | Xdict_r.cxx 623 kl | Xdict_g.cxx 623 kl
      c++ (times):  338s, 90s | 420s, 417s | 427s, 421s | 2640s, 1614s
      Objects:      *.o 41 Mb, 114 Mb | Xdict_c.o 44 Mb, 53 Mb | Xdict_r.o 51 Mb, 65 Mb | Xdict_g.o 51 Mb, 65 Mb
      ld:           15s, 45s
      Libraries:    *.so, .lib 88 Mb, 71 Mb

  16. De ROOT a BOOT

  17. Shared libs
      • Shared libs are essential for today's large applications.
      • They optimize development time if inter-library dependencies are correctly managed.
      • The plug-in manager is an essential component that minimizes the number of libraries linked at the start of an application.
      • However, a large number of libs may be a killer, in particular for interactive applications.
      • Because of long compilation times, most experiments export pre-compiled shared libs.
      • These libs are compiled for maximum portability and do not always make efficient use of the local processor's capabilities.

  18. Exported Symbols
      • The time to load a shared lib is roughly
            time = size * n * log(N)
          • size = shared lib size in bytes (mapped I/O)
          • n = number of exported symbols in the lib
          • N = number of existing exported symbols in previously loaded shared libs
      • A good compromise must be found between the number of libraries and their size (modularity vs performance).
      • GCC4 and Windows allow selecting which symbols are accessible from outside the shared lib ("exported").
      • Currently most applications export all C++ symbols!
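As a rough illustration of the rule of thumb above, the sketch below evaluates time = size * n * log(N) in arbitrary units and shows how a symbol can be explicitly marked as exported. The names `LoadCost`, `DEMO_EXPORT` and `PublicEntryPoint` are hypothetical. The export macro relies on the GCC visibility attribute and the Windows `__declspec(dllexport)` keyword; with GCC it only has an effect when the library is built with hidden default visibility (for example with `-fvisibility=hidden`).

```cpp
#include <cmath>

// Mark a symbol as visible from outside the shared library.
// Everything not marked stays internal when the default visibility is hidden.
#if defined(_WIN32)
#  define DEMO_EXPORT __declspec(dllexport)
#else
#  define DEMO_EXPORT __attribute__((visibility("default")))
#endif

// Hypothetical public entry point of a library.
DEMO_EXPORT int PublicEntryPoint() { return 42; }

// Relative load cost in arbitrary units, following the slide's model:
//   time = size * n * log(N)
// sizeBytes      : shared lib size in bytes
// nExported      : number of symbols exported by this lib
// nAlreadyLoaded : exported symbols in previously loaded libs
// Assumption: the model is only meaningful for nAlreadyLoaded > 1.
double LoadCost(double sizeBytes, double nExported, double nAlreadyLoaded) {
    return sizeBytes * nExported * std::log(nAlreadyLoaded);
}
```

Because the model is linear in n, exporting one tenth of the symbols of a library cuts its estimated load cost by a factor of ten, and it also lowers N for every library loaded afterwards.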

  19. Fraction of ROOT code really used in a batch job
      [Figure label: shared lib size in bytes]

  20. Fraction of ROOT code really used in a job with graphics

  21. Can we gain with a better packaging?
      • Yes and no.
      • One shared lib per class implies more administration, more dictionaries, more dependencies.
      • 80 shared libs for ROOT is already a lot. 500 would be nonsense.
      • A CORE library is essential. However, some developers do not like this and penalize/complicate the life of the vast majority of users.
      • The Plug-in Manager helps.

  22. Atlas packages with > 10000 lines
       Lines  Package              Breakdown by language
      211677  dice                 fortran=211641
      187691  atrecon              fortran=138126,cpp=49354
      129793  MuonSpectrometer     fortran=121321,python=3715,csh=2613,sh=2136
      118504  Tools                cpp=67337,ansic=19012,python=13770,sh=7373,yacc=5659,fortran=3024,lex=1971
      116327  PhysicsAnalysis      cpp=107348,python=6070,sh=1649,csh=1260
      115143  geant3               fortran=115040,ansic=67
      112445  TileCalorimeter      cpp=108580,python=2209,csh=920,sh=736
      108200  atutil               fortran=108000,ansic=164
       80866  Applications         fortran=71764,cpp=6961,ansic=1865
       74721  Calorimeter          cpp=65917,python=7854,sh=490,csh=460
       67822  atlfast              fortran=67786
       64838  Tracking             cpp=60255,python=2092,csh=1380,sh=1104
       59429  Generators           fortran=28136,cpp=25538,python=4123,sh=872,csh=760
       49926  graphics             java=40719,cpp=8312,python=321,sh=255,csh=220
       40058  AtlasTest            cpp=25159,python=5131,sh=4815,perl=4145,csh=517
       39576  Control              cpp=22030,python=15904,sh=907,csh=693
       31192  DetectorDescription  ansic=29540,csh=680,sh=562,python=343
       29500  TestBeam             cpp=27433,python=1491,csh=320,sh=256
       25001  Reconstruction       sh=10297,fortran=7559,python=5393,csh=1667
       18989  atlsim               fortran=17561,cpp=1380
       18328  InnerDetector        python=11466,csh=2860,sh=2641,ansic=1343
       17291  Simulation           python=13653,sh=2126,csh=1302,fortran=169
       16139  Database             perl=8310,sh=4299,java=2209,csh=709,python=566
       14250  Event                cpp=13522,python=296,csh=240,sh=192
       12930  gcalor               fortran=12894
       11955  Trigger              python=7860,csh=1780,sh=1673,perl=634
       11195  LArCalorimeter       python=6133,ansic=2045,csh=1620,sh=1347
      Total: 3 million lines of code, 1200 packages

  23. Alice packages with > 10000 lines
       Lines  Package   Breakdown by language
      398742  PDF       fortran=398729,ansic=13
      146414  PYTHIA6   fortran=140748,cpp=5413,ansic=153,pascal=100
      128337  HLT       cpp=127601,ansic=605,sh=100,csh=31
      128103  ITS       cpp=128010,sh=93
      105763  MUON      cpp=105673,sh=90
       94548  DPMJET    fortran=94267,cpp=281
       72400  STEER     cpp=72400
       52443  HBTAN     cpp=51260,fortran=1183
       51489  TPC       cpp=51479,sh=10
       50932  PHOS      cpp=50639,csh=293
       46176  TRD       cpp=46176
       41998  ISAJET    fortran=40483,cpp=1494,pascal=21
       39407  RALICE    cpp=29764,ansic=9355,sh=288
       35916  EMCAL     cpp=35410,fortran=383,csh=123
       31820  ANALYSIS  cpp=31820
       27751  HERWIG    fortran=27246,cpp=477,ansic=28
       27025  FMD       cpp=27021,sh=4
       26667  TOF       cpp=26667
       24258  EVGEN     cpp=24258
       21588  HIJING    fortran=21099,cpp=489
       20562  JETAN     cpp=19687,fortran=875
       18344  RAW       cpp=18344
       15232  STRUCT    cpp=15232
       13142  PMD       cpp=13142
       12945  RICH      cpp=12945
       10966  FASTSIM   cpp=10966
       10944  MONITOR   cpp=10944
       10659  ZDC       cpp=10659
      Total: 1.5 million lines of code

  24. Fraction of code really used in one program
      [Figure labels: % functions used; % classes used]

  25. Consequences
      • The fact that only a very small fraction of the total code base is used has important consequences.
      • We must turn this apparent problem into a great feature.
      • BOOT: a proposal to solve this problem.

  26. h.Draw() local mode
      [Diagram: CINT and the libraries below, connected through pm (Plug-in Manager) nodes]
      libX11:         … drawline, drawtext …
      libCore:        … I/O, TSystem …
      libGpad:        … TPad, TFrame …
      libGraf:        … TGraph, TGaxis, TPave …
      libHist:        … TH1, TH2 …
      libHistPainter: … THistPainter, TPainter3DAlgorithms …
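The idea behind the pm nodes in this diagram, loading a library only when one of its classes is first requested, can be sketched in a few lines. This is a minimal illustration, not ROOT's actual plug-in manager: `PluginManagerDemo`, `Painter` and `HistPainterDemo` are hypothetical names, and the "load" step only records the library name where a real implementation would open the shared library.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical abstract interface implemented by plug-ins.
struct Painter {
    virtual ~Painter() = default;
    virtual std::string Name() const = 0;
};

// Hypothetical concrete plug-in, standing in for a class in libHistPainter.
struct HistPainterDemo : Painter {
    std::string Name() const override { return "HistPainterDemo"; }
};

class PluginManagerDemo {
public:
    using Factory = std::function<std::unique_ptr<Painter>()>;

    // Declare which library provides a handler for `key`. Nothing is loaded yet.
    void Register(const std::string& key, const std::string& library, Factory factory) {
        fEntries[key] = Entry{library, std::move(factory), false};
    }

    // Create a handler; the owning library is "loaded" only on first use.
    std::unique_ptr<Painter> Create(const std::string& key) {
        auto it = fEntries.find(key);
        if (it == fEntries.end()) return nullptr;
        if (!it->second.loaded) {
            // A real implementation would open the shared library here.
            fLoaded.push_back(it->second.library);
            it->second.loaded = true;
        }
        return it->second.factory();
    }

    const std::vector<std::string>& LoadedLibraries() const { return fLoaded; }

private:
    struct Entry {
        std::string library;
        Factory factory;
        bool loaded;
    };
    std::map<std::string, Entry> fEntries;
    std::vector<std::string> fLoaded;
};
```

With this arrangement an application that never draws a histogram never pays for the painter library, which is the behaviour the slide attributes to the plug-in manager.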

  27. Experience with C++
      • Very powerful but complex language.
      • Easy to make a complex system with a lot of class dependencies. Changing one class forces a recompilation of many other classes.
      • No garbage collector. Only one heap.
      • The ABI (Application Binary Interface) is not yet standardized: a mess on Linux/gcc (C is OK).
      • No introspection: you have to develop your own.
      • Too much coupling between data and code.
      • Templates are defined statically at compilation time, i.e. difficult to use in an interactive environment.
      • Slow compilation if templates and the STL are abused.

  28. Missing features in C++
      • Introspection
      • Not possible to compile a class from a dictionary
      • Multi-heap (like Zebra divisions)
      • Would require a garbage collector and a Handle type like in C++/CLI from MS
      • Possibility to add one or more functions without recompiling the class, although this can be easily done in C.
      • Dynamic creation of templated types

  29. Introspection systems
      • Meta information describing all types and functions.
      • Not necessary for languages like f77 that have only basic types. I/O in f77 is implemented via simple switch statements.
      • Vital for languages supporting derived types, for automatic I/O, inspectors, browsers and interpreters.
      • CINT, Java, Python, Ruby, CINT/ROOT/Reflex
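To make "meta information describing all types" concrete, here is a minimal hand-written dictionary for one class, together with a generic dump routine that knows nothing about the class and only walks the dictionary. Everything in it (`Track`, `MemberInfo`, `TrackDictionary`, `Dump`) is hypothetical and far simpler than what CINT or Reflex store; it is meant only to show why automatic I/O and browsers need such a description.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical event class; names and members are illustrative only.
struct Track {
    double px;
    double py;
    int charge;
};

enum class MemberType { Double, Int };

// One dictionary entry: enough meta information to locate and
// interpret a data member inside an object.
struct MemberInfo {
    std::string name;
    MemberType type;
    std::size_t offset;
};

// Hand-written dictionary for Track. A dictionary generator would
// produce this kind of description from the header file.
std::vector<MemberInfo> TrackDictionary() {
    return {
        {"px", MemberType::Double, offsetof(Track, px)},
        {"py", MemberType::Double, offsetof(Track, py)},
        {"charge", MemberType::Int, offsetof(Track, charge)},
    };
}

// Generic writer: it has no compile-time knowledge of Track.
std::string Dump(const void* object, const std::vector<MemberInfo>& dict) {
    std::ostringstream out;
    const char* base = static_cast<const char*>(object);
    for (const MemberInfo& m : dict) {
        out << m.name << '=';
        if (m.type == MemberType::Double)
            out << *reinterpret_cast<const double*>(base + m.offset);
        else
            out << *reinterpret_cast<const int*>(base + m.offset);
        out << ';';
    }
    return out.str();
}
```

The same `Dump` function would work for any other class once a dictionary for it exists, which is the property that makes dictionaries vital for I/O, inspectors and interpreters.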

  30. Why not Java or Python
      • Java strong candidate in 1996->2000
      • Why experiments moved to C++?
      • Speed, Geant4, ROOT ?
      [Boxes labelled "Microsoft view" and "Computer scientist view":]
      Java is more productive than C/C++. Use C/C++ only when speed or bare metal access is called for.
      Python/Ruby is more productive than Java and more pleasant to code in.

  31. Language comparisons (1)
      See for example: http://fishbowl.pastiche.org/2002/10/21/an_empirical_comparison_of_programming_languages

  32. Main software problems seen by large experiments
      • Move to C++ completed (well, nearly!)
      • Complex experiment framework
      • Too many dependencies
      • Difficult to install (SCRAM, CMT)
      • Installation time far too long
      • The wheel is reinvented many times
      • Several unwanted features (e.g. Atlas Storegate)
      • Coding conventions not followed
      • A code checker is essential
      • Undocumented classes and modules

  33. Dictionaries: situation in 2006
      [Diagram labels: Python; CINT; Root meta C++; Reflex/Cint DS; ROOT; CINT/Reflex API; rootcint -cint; XDictcint.cxx; rootcint -reflex; X.h; rootcint -gccxml]

  34. Interpreter & Compiler integration
      root > .x script.C          execute file script.C
      root > DoSomething(…);      execute function DoSomething
      root > .x script.C++        compile file script.C and execute it
      root > .x script.C+         compile file script.C if the file has been modified, then execute it
      Same from compiled or interpreted code:
      gROOT->ProcessLine(".L script.C+");
      gROOT->ProcessLine("DoSomething(…)");

  35. Possible Progress with Interpreters
      • Eliminate the stub interface to call C/C++ functions.
      • This is already possible in CINT with C libraries.
      • It will be possible with C++ when a standard ABI is available; otherwise it is compiler and linker dependent.
      • If the compiler is fast enough (e.g. C), use the interpreter only for organizing the top level.
      • If the next C++ provides introspection, one could eliminate
          • the header files parser
          • 95 per cent of the dictionary structure in memory
      • A good argument for having the interpreted and compiled code in the same language!
      • But WHEN ???????

  36. Proposal for a new scenario
      Introducing BOOT, a Software Bootstrap system

  37. What is BOOT?
      • A small system to facilitate the life of many users doing mainly data analysis with ROOT and their own classes (users + experiment).
      • It is a very small subset of ROOT (5 to 10 per cent).
      • The same idea could be extended to other domains, like simulation and reconstruction.

  38. What is BOOT (2)?
      • A small, easy to install, standalone executable module (< 5 Mbytes)
      • One click in the web browser
      • It must be a stable system that can cope with old and new versions of other packages, including ROOT itself.
      • It will include:
          • A subset of ROOT I/O, network and Core classes
          • A subset of Reflex
          • A subset of CINT (could also have a Python flavor)
          • Possibly a GUI object browser
      • From the BOOT GUI or command line, the referenced software (URL) will be automatically downloaded and locally compiled/cached in a transparent way.

  39. What is BOOT (3)?
      • No binary files or shared libs
      • Always start from the source URL
      • Compile into a local cache and reuse at the next session.
      • A tool is provided to convert a CVS source tree into a compact file that also includes the dictionary data structures and the classes/functions documentation.
      • Compile with the best options for the local hardware.
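The "compile into a local cache and reuse" step can be pictured as a lookup keyed by the source URL. The sketch below is an assumption about how such a cache might decide when to recompile; the slides do not specify the mechanism. It uses a content hash (FNV-1a) of the downloaded source to detect changes, and the names `SourceHash` and `CompileCacheDemo` are hypothetical.

```cpp
#include <cstdint>
#include <map>
#include <string>

// FNV-1a 64-bit hash of the source text, used as a cheap change detector.
std::uint64_t SourceHash(const std::string& text) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : text) {
        h ^= c;
        h *= 1099511628211ULL;                  // FNV prime
    }
    return h;
}

class CompileCacheDemo {
public:
    // Returns true when the source behind `url` must be (re)compiled,
    // i.e. it is new or its content changed since the last call.
    // The new hash is recorded, so the caller is expected to compile now.
    bool NeedsCompile(const std::string& url, const std::string& source) {
        const std::uint64_t h = SourceHash(source);
        auto it = fSeen.find(url);
        if (it != fSeen.end() && it->second == h) return false;  // cache hit
        fSeen[url] = h;
        return true;
    }

private:
    std::map<std::string, std::uint64_t> fSeen;  // url -> hash of compiled source
};
```

A real cache would also persist the table between sessions and key it on compiler version and options, since the slide asks for compilation with the best options for the local hardware.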

  40. BOOT and existing applications
      • BOOT must be able to run with the existing codes, possibly with reduced functionality.
      • The next slides give a few use cases to illustrate the ideas.
      • Do not take the syntax as the final word.

  41. BOOT: Use Case 1
      • Assumes BOOT is already installed on your machine user@xxx.yyy.zzz
      • Nothing else on the machine except the compiler (no ROOT, etc.)
      • Import a ROOT file containing histograms, Trees and other classes (usecase1.root)
      • Browse the contents of the file
      • Draw a histogram

  42. Use Case 1
      [Diagram: pcroot@cern.ch and user@xxx.yyy.zzz]
      • http://root.cern.ch/source.root: a compressed ROOT file containing
          • the full ROOT source tree automatically built from CVS (25 Mbytes)
          • the ROOT classes dictionary DS generated by Reflex (5 Mbytes)
          • the full classes documentation objects generated by the source parser (5 Mbytes)
      • usecase1.root (2 Mbytes): contains references (URL) to classes in namespace ROOT
      • Local cache with the source of the classes really used, plus binaries for the classes or functions that are automatically generated from the interpreter (like the ACLIC mechanism)

  43. Use Case 1 pictures
      [Figure labels: http://root.cern.ch/source.root; usecase1.root]

  44. Use Case 2
      • BOOT already installed
      • Want to write the shortest possible program using some classes in namespace ROOT and some classes from another namespace YYYY

      //This code can be interpreted line by line,
      //executed as a script, or compiled with C/C++
      //after corresponding code generation
      use ROOT=http://root.cern.ch/root5.10/source.root
      use YYYY=http://cms.cern.ch/packages/yyyy
      h = new TH1F("h","example",100,0,1);
      v = new LorentzVector(….);
      gener = new myClass(v.x());
      h.Fill(gener.Something());
      h.Draw();

  45. Use Case 3
      • A variant of Use Case 2
      • A bug has been found in class LorentzVector of ROOT and fixed in the new version ROOT6

      use ROOT, YYYY=http://cms.cern.ch/packages/yyyy
      use ROOT6=http://root.cern.ch/root6/code.root
      use ROOT6::LorentzVector
      h = new TH1F("h","example",100,0,1);
      v = new LorentzVector(….);
      gener = new myClass(v.x());
      h.Fill(gener.Something());
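The `use` lines in Use Cases 2 and 3 imply some table that maps each class name to the source it should be fetched from, with `use ROOT6::LorentzVector` overriding the default for a single class. The sketch below shows one possible shape for that table. It is purely illustrative: `SourceResolverDemo` and its methods are hypothetical, and the list of classes provided by each source is passed in by hand, whereas BOOT would presumably read it from the dictionary stored with the source file.

```cpp
#include <map>
#include <string>
#include <vector>

class SourceResolverDemo {
public:
    // Models "use NAME=URL": declare a source and the classes it provides.
    // If two sources provide the same class, the first declaration wins.
    void Use(const std::string& ns, const std::string& url,
             const std::vector<std::string>& classes) {
        fUrl[ns] = url;
        for (const std::string& c : classes) fOwner.emplace(c, ns);
    }

    // Models "use NS::Class": force one class to come from a given source.
    void Override(const std::string& ns, const std::string& cls) {
        fOwner[cls] = ns;
    }

    // URL the class would be fetched from, or an empty string if unknown.
    std::string UrlFor(const std::string& cls) const {
        auto owner = fOwner.find(cls);
        if (owner == fOwner.end()) return "";
        auto url = fUrl.find(owner->second);
        return url == fUrl.end() ? "" : url->second;
    }

private:
    std::map<std::string, std::string> fUrl;    // namespace -> source URL
    std::map<std::string, std::string> fOwner;  // class name -> namespace
};
```

In Use Case 3 terms, `TH1F` keeps resolving to the default ROOT source while `LorentzVector` alone is redirected to the ROOT6 source, so a single bug fix can be picked up without switching the whole installation.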

  46. Use Case 4: Specialized Code Generators

      use ATLFAST=http://atlas.cern.ch/atlfast/atlfastcode.root
      TFile f("mcrun.root");
      for each entry in f.T
         for each electron in Electrons
            if(electron.m_Eta > 1) h.Fill(electron.m_Pt);
      h.Draw

      • High-level ROOT Selector understanding named collections in memory (ROOT, STL) or collections in ROOT files.
      • PROOF compliant
      • Extension of the TTree::MakeProxy code generator.
      • Do not read referenced but unused branches.

  47. Use Case 5: Dynamic HELP, Dynamic html
      • Source files and scripts are browsable in html format generated dynamically.
      • Combination of the new version of THtml and the new GUI widget TGHtml.
      • Both classes make extensive use of the Reflex dictionary and the pre-digested documentation.

  48. Use Case 6: Event Displays
      • In general, Event Displays require the full experiment infrastructure (Pacific, Obelix, WonderLand, Crocodile).
      • This is complex and not good for users and OUTREACH.
      • A data file with the visualization scripts is far more powerful.
      • This implies that the GUI must be fully scriptable. This is the case for the ROOT GUI.
      [Diagram labels: event data in a Tree; C++ scripts]

  49. BOOT: Reality or Dream?
      Where do we stand? What developments are needed?

  50. Problem 1: efficient access across the web
      • Access to source files on the web over high-latency networks (> 30 ms)
      • Reduce the number of messages between client and server
      • Increase the size of the messages
      • This problem is partly solved, and the work has led us to fundamental improvements in the efficiency of I/O in ROOT in general.
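The two remedies on this slide, fewer messages and larger messages, amount to coalescing many small read requests into a few big ones. The following sketch shows that idea in isolation under a simple cost assumption (one network round trip per message plus payload time). It is not ROOT's implementation, and `ByteRange`, `Coalesce` and `TransferSeconds` are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// A read request: `length` bytes starting at `offset` in the remote file.
struct ByteRange {
    std::uint64_t offset;
    std::uint64_t length;
};

// Merge requests that overlap or lie within `maxGap` bytes of each other,
// trading a few unneeded bytes for fewer round trips.
std::vector<ByteRange> Coalesce(std::vector<ByteRange> requests, std::uint64_t maxGap) {
    std::sort(requests.begin(), requests.end(),
              [](const ByteRange& a, const ByteRange& b) { return a.offset < b.offset; });
    std::vector<ByteRange> merged;
    for (const ByteRange& r : requests) {
        if (!merged.empty()) {
            ByteRange& last = merged.back();
            const std::uint64_t lastEnd = last.offset + last.length;
            if (r.offset <= lastEnd + maxGap) {
                // Extend the previous message to cover this request as well.
                const std::uint64_t end = std::max(lastEnd, r.offset + r.length);
                last.length = end - last.offset;
                continue;
            }
        }
        merged.push_back(r);
    }
    return merged;
}

// Rough estimate: one round trip per message plus time to move the payload.
double TransferSeconds(const std::vector<ByteRange>& messages,
                       double latencySeconds, double bytesPerSecond) {
    double bytes = 0.0;
    for (const ByteRange& m : messages) bytes += static_cast<double>(m.length);
    return static_cast<double>(messages.size()) * latencySeconds + bytes / bytesPerSecond;
}
```

With a 30 ms latency the round-trip term dominates for small reads, so merging nearby requests pays off even though the merged message carries some bytes that were never asked for.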
