The ROOT Project in the multi-core CPU era
CHEP06, Mumbai, 15 February 2006
René Brun, CERN
Plan of talk
• ROOT: 11 years old!
• Still many developments
• Multi-core CPUs: parallelism
• ROOT, Software Obesity and the GRID
ROOT in the multi-core cpu era
ROOT: a long story
• Started in January 1995. ROOT had to face many sociological obstacles at a time when most users were changing experiments and languages, and were caught up in many fights. “Every problem has its root in failure of a relationship” (The Times of India, Tuesday 14 February)
• This initial opposition was a key element in the success of the project. By spotting the inevitable weaknesses of some early designs, it forced the team to react quickly. A development method involving more and more users has been essential for getting feedback. Designing a large system like ROOT is an iterative process, and this process has involved many people in many experiments.
• ROOT is now strongly supported at CERN and FNAL. Many thanks to the management and my colleagues in the LCG project for facilitating a convergent process.
ROOT project: some numbers
• The ROOT project is comparable in size and complexity to the software of each LHC experiment. See, for instance, the evaluation by the sloccount tool.
• sloccount (by David A. Wheeler) output:
Total Physical Source Lines of Code (SLOC) = 1,709,170
Development Effort Estimate, Person-Years (Person-Months) = 495.97 (5,951.63)
  (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months) = 5.66 (67.97)
  (Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers = 87.57
Total Estimated Cost to Develop = $ 66,998,665
  (average salary = $56,286/year, overhead = 2.40)
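The sloccount figures above follow directly from the two Basic COCOMO formulas quoted with them. The short Python sketch below recomputes them from the line count; the variable names are illustrative, and the cost formula (person-years times salary times overhead) is an assumption about how the tool combines the quoted salary and overhead.

```python
# Recompute the Basic COCOMO estimates quoted on the slide.
SLOC = 1_709_170        # total physical source lines of code (from the slide)
SALARY = 56_286         # average salary in $/year (from the slide)
OVERHEAD = 2.40         # overhead multiplier (from the slide)

ksloc = SLOC / 1000.0
effort_months = 2.4 * ksloc ** 1.05               # person-months
schedule_months = 2.5 * effort_months ** 0.38     # calendar months
developers = effort_months / schedule_months      # average team size
# Assumed cost model: person-years * salary * overhead.
cost = (effort_months / 12.0) * SALARY * OVERHEAD

print(f"effort     = {effort_months:,.2f} person-months")
print(f"schedule   = {schedule_months:.2f} months")
print(f"developers = {developers:.2f}")
print(f"cost       = ${cost:,.0f}")
```

Up to rounding, the printed values should agree with the effort, schedule, team size and cost listed on the slide.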
ROOT person power
CERN + FNAL. Only people working full time on the project.
Presentations about ROOT & co at CHEP06
• 98 - PROOF - The Parallel ROOT Facility. Distributed Data Analysis - Monday 13 February 15:00. Presenter: GANIS, Gerardo (CERN)
• 187 - ROOT GUI, General Status. Software Tools and Information Systems - Monday 13 February 16:40. Presenter: RADEMAKERS, Fons (CERN)
• 188 - From Task Analysis to the Application Design. Software Tools and Information Systems - Monday 13 February 17:00. Presenter: Mr. RADEMAKERS, Fons (CERN)
• 129 - ROOT I/O for SQL databases. Software Components and Libraries - Monday 13 February 17:40. Presenter: Dr. LINEV, Sergey (GSI DARMSTADT)
• 185 - Reflex, reflection for C++. Software Components and Libraries - Tuesday 14 February 14:00. Presenter: Dr. ROISER, Stefan (CERN)
• Xxx - Recent Developments in the ROOT I/O and TTrees. Software Components and Libraries - Monday 13 February 16:00. Presenter: Dr. Canal, Philippe (FNAL)
• 227 - New Developments of ROOT Mathematical Software Libraries. Software Components and Libraries - Tuesday 14 February 16:00. Presenter: Dr. MONETA, Lorenzo (CERN)
• 383 - New features in ROOT geometry modeller for representing non-ideal geometries. Software Components and Libraries - Wednesday 15 February 14:00. Presenter: CARMINATI, Federico (CERN)
• 93 - ROOT 3D graphics. Software Components and Libraries - Wednesday 15 February 16:00. Presenter: BRUN, Rene (CERN)
• 407 - Performance and Scalability of xrootd. Distributed Data Analysis - Wednesday 15 February 17:00. Presenter: HANUSHEVSKY, Andrew (Stanford Linear Accelerator Center)
• 92 - ROOT 2D graphics visualisation techniques. Poster - Monday 13 February 11:00
• 91 - ROOT 3D graphics overview and examples. Poster - Monday 13 February 11:00
• 189 - Recent User Interface Developments. Poster - Monday 13 February 11:00
• 186 - ROOT/CINT/Reflex integration. Poster - Monday 13 February 11:00
• 228 - The structure of the new ROOT Mathematical Software Libraries. Poster - Wednesday 15 February 09:00
• 249 - XrdSec - A high-level C++ interface for security services in client-server applications. Poster - Wednesday 15 February 09:00
• 408 - xrootd Server Clustering. Poster - Wednesday 15 February 09:00
Multi Core cpus Impact on ROOT
Multi Core CPUs
http://www.intel.com/technology/computing/archinnov/platform2015/
This is going to affect the evolution of ROOT in many areas.
Moore’s law revisited
Your laptop in 2016: 32 processors, 16 Gbytes RAM, 16 Tbytes disk, i.e. more than 50 times today’s laptop.
Impact on ROOT
• There are many areas in ROOT that can benefit from a multi-core architecture. Because the hardware is becoming available on commodity laptops, it is urgent to implement the most obvious ones as soon as possible.
• Multi-core often implies multi-threading. There are several areas to be made not only thread-safe but also thread-aware.
• PROOF is an obvious candidate. By default, a ROOT interactive session should run in PROOF mode. It would be nice if this could be made totally transparent to the user.
• Speed up I/O with multi-threaded I/O and read-ahead
• Buffer compression in parallel
• Minimization function in parallel
• Interactive compilation with ACLIC in parallel
• etc.
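As an illustration of the "buffer compression in parallel" item, here is a minimal Python sketch, not ROOT code: independent byte buffers are compressed concurrently by a small thread pool and returned in their original order. The function name, worker count and the synthetic buffers are assumptions made for the example. It relies on CPython's zlib module releasing the interpreter lock while compressing, which is what lets the threads use several cores.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_buffers(buffers, workers=4, level=6):
    """Compress independent byte buffers concurrently, preserving order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(lambda b: zlib.compress(b, level), buffers))

# Synthetic stand-ins for buffers waiting to be written to a file.
buffers = [bytes([i % 251]) * 100_000 for i in range(8)]
compressed = compress_buffers(buffers)

# Round trip: decompressing must give back the original buffers.
restored = [zlib.decompress(c) for c in compressed]
print(restored == buffers)
```

Because each buffer is compressed independently, the writer can still store the results sequentially; only the CPU-bound step is spread over the cores.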
CPU/Node hierarchy
• Laptop node: 1 -> 32 -> ??N cpus
• Local cluster: 1000 x N cpus
• GRID(s): 100 x 1000 nodes
• Latency scale: 100 nanos, 100 micros, 100 millis
• Interactive jobs run on the laptop and use processors on the GRID. Real time is important for short/medium queries.
• Batch jobs are pushed to the GRID. Maximum number of jobs run in one week/month.
• Analysis mainly on laptop and ONE cluster on the GRID.
Software Obesity
Use local power as much as possible. Can we simplify software installation on the GRID?
A proposal
Observations
• A considerable amount of time is spent installing software (up to one day for an expert).
• Porting to a new platform is non-trivial.
• Dependency problems arise when many packages must be installed.
• Only a small subset of the software is used.
• The installation may require a huge amount of disk space. Users are scared to download a new version.
• This does not fit well with the GRID concept.
• The GRID should be used to simplify this process, not to make it more complex.
LHC software
Source of inefficiencies with Shared Libs
• fPIC (Position Independent Code) introduces a degradation of about 20 per cent (10 to 30%).
• With many shared libs, the percentage of classes and code actually used is small => swapping (20%).
• Because shared libs are generated for maximum portability, one cannot use the advanced features of the local processor when compiling. The same optimization level is used everywhere.
• But a very large fraction of the code does not need to be optimized: no gain at execution, big loss when compiling.
• A small fraction of the code should be compiled with the highest possible optimization (10%).
• Maybe a factor 2 loss overall!
Shared Libs vs Archive Libs
• In the Fortran era, there was often one subroutine per file.
• The loader takes only the subroutines really referenced. However, the percentage of referenced but unused code has increased with time.
• Shared libs were efficient at a time when code could be shared between different tasks on time-sharing systems.
• Shared libs have partially solved the link-time problem.
• Shared libs are not a solution for the long term.
• Archive libs are unusable in a large system, but nice for building static modules.
• What to do?
Fraction of ROOT code really used in a batch job
(Shared lib size in bytes)
Fraction of ROOT code really used in a job with graphics
Fraction of code really used in one program
(% functions used, % classes used)
We are wasting a lot of time in writing/reading .o or .so files to/from disk
• Processing speed: Cint 10000 l/s, c++ 800 l/s
• File sizes: *.cxx, *.h 100 Mb; *.o 110 Mb; *.so 76 Mb
• Diagram labels: memory, ld, myapp
Proposal for a new scenario
Introducing BOOT: A Software Bootstrap system
What is BOOT?
• A small system to facilitate the life of many users doing mainly data analysis with ROOT and their own classes (users + experiment).
• It is a very small subset of ROOT (5 to 10 per cent).
• The same idea could be extended to other domains, like simulation and reconstruction.
What is BOOT?
• A small, easy to install, standalone executable module (< 5 Mbytes)
• One click in the web browser
• It must be a stable system that can cope with old and new versions of other packages, including ROOT itself.
• It will include:
  • A subset of ROOT I/O, network and Core classes
  • A subset of Reflex
  • A subset of CINT (could also have a python flavor)
  • Possibly a GUI object browser
• From the BOOT GUI or command line, the referenced software (URL) will be automatically downloaded and locally compiled/cached in a transparent way.
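The "download on reference, then cache locally" idea in the last bullet can be sketched in a few lines of Python. This is only an illustration of the pull-and-cache pattern, not a BOOT design: the class `PullCache`, the `fetch` callable, the example URLs and the offline `fake_fetch` stand-in are all hypothetical, and the compile step is left out.

```python
import hashlib
import tempfile
from pathlib import Path

class PullCache:
    """Fetch each referenced URL at most once and keep a local copy."""

    def __init__(self, cache_dir, fetch):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch      # callable: url -> bytes (e.g. an HTTP download)
        self.downloads = 0      # number of real fetches performed

    def path_for(self, url):
        # Hash the URL so any reference maps to a safe local file name.
        return self.cache_dir / hashlib.sha256(url.encode()).hexdigest()

    def get(self, url):
        path = self.path_for(url)
        if not path.exists():   # cache miss: pull from the remote source
            path.write_bytes(self.fetch(url))
            self.downloads += 1
        return path.read_bytes()

# Stand-in for a network download, so the sketch runs offline.
def fake_fetch(url):
    return f"// source pulled from {url}\n".encode()

cache = PullCache(tempfile.mkdtemp(), fake_fetch)
url = "http://example.org/code/TH1.cxx"    # hypothetical class reference
first = cache.get(url)                      # downloaded
second = cache.get(url)                     # served from the local cache
print(cache.downloads)
```

Only the classes actually referenced are ever fetched, which is the incremental behaviour the slide asks for.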
BOOT and existing applications
• BOOT must be able to run with the existing code, maybe with reduced possibilities.
• The next slides give a few use cases to illustrate the ideas.
• Do not take the syntax as the final word.
BOOT: Use Case 1
• Assumes BOOT is already installed on your machine, user@xxx.yyy.zzz
• Nothing else on the machine except the compiler (no ROOT, etc.)
• Import a ROOT file containing histograms, Trees and other classes (usecase1.root)
• Browse the contents of the file
• Draw a histogram
Use Case 1
• http://root.cern.ch/coderoot.root: a compressed ROOT file containing
  • the full ROOT source tree, automatically built from CVS (25 Mbytes)
  • the ROOT classes dictionary DS generated by Reflex (5 Mbytes)
  • the full classes documentation: objects generated by the source parser (5 Mbytes)
• usecase1.root (2 Mbytes): contains references (URL) to classes in namespace ROOT
• Local cache: the source of the classes really used, plus binaries for the classes or functions that are automatically generated from the interpreter (like the ACLIC mechanism)
• Machines shown: user@xxx.yyy.zzz, pcroot@cern.ch
Use Case 1 pictures
(usecase1.root, code.root)
Use Case 2
• BOOT already installed
• Want to write the shortest possible program using some classes in namespace ROOT and some classes from another namespace YYYY

    // This code can be interpreted line by line,
    // executed as a script, or compiled with C/C++
    // after corresponding code generation
    use ROOT, YYYY=http://cms.cern.ch/packages/yyyy
    h = new TH1F("h","example",100,0,1);
    v = new LorentzVector(….);
    gener = new myClass(v.x());
    h.Fill(gener.Something());
    h.Draw();
Use Case 3
• A variant of Use Case 2
• A bug has been found in class LorentzVector of ROOT and fixed in the new version ROOT6

    use ROOT, YYYY=http://cms.cern.ch/packages/yyyy
    use ROOT6=http://root.cern.ch/root6/code.root
    use ROOT6::LorentzVector
    h = new TH1F("h","example",100,0,1);
    v = new LorentzVector(….);
    gener = new myClass(v.x());
    h.Fill(gener.Something());
Use Case 4
• High Level ROOT Selector understanding named collections in memory (ROOT, STL) or collections in ROOT files.

    use ROOT
    use ATLFAST=http://atlas.cern.ch/atlfast/atlfastcode.root
    TFile f("mcrun.root");
    for each entry in f.Tree
       for each electron in Electrons
          h.Fill(electron.m_Pt);
    h.Draw
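The nested "for each entry / for each electron" selection of Use Case 4 can be mimicked in plain Python to show what the loop is meant to do. This is a toy sketch, not ROOT or BOOT code: the events are invented dictionaries standing in for the contents of mcrun.root, and the fixed-width binning function and its parameters are assumptions.

```python
def fill_pt_histogram(entries, nbins=10, lo=0.0, hi=100.0):
    """Loop over entries, then over each entry's electrons, binning m_Pt."""
    bins = [0] * nbins
    width = (hi - lo) / nbins
    for entry in entries:                      # "for each entry in f.Tree"
        for electron in entry["Electrons"]:    # "for each electron in Electrons"
            pt = electron["m_Pt"]
            if lo <= pt < hi:                  # out-of-range values are ignored
                bins[int((pt - lo) / width)] += 1
    return bins

# Invented events standing in for the contents of mcrun.root.
entries = [
    {"Electrons": [{"m_Pt": 12.5}, {"m_Pt": 47.0}]},
    {"Electrons": []},
    {"Electrons": [{"m_Pt": 15.0}, {"m_Pt": 250.0}]},
]
hist = fill_pt_histogram(entries)
print(hist)
```

The point of the use case is that the user writes only the two loops and the fill; locating the Electrons collection and its class definition is left to the system.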
Use Case 5: Event Displays
• In general, Event Displays require the full experiment infrastructure (Pacific, Obelix, WonderLand, Crocodile).
• This is complex and not good for users and OUTREACH.
• A data file with the visualization scripts is far more powerful.
• This implies that the GUI must be fully scriptable. This is the case for the ROOT GUI.
• Diagram labels: Event data in a Tree, C++ scripts
Requirements: work to do
• libCore already has all the infrastructure for client-server communications and for accessing remote files on the GRID.
• We must understand how to use subsets of the compilers and linkers to bypass disk I/O.
• We must understand how to emulate a dynamic linker using pre-compiled objects in memory.
• We have to investigate various code generation tools and the coupling with an extended version of CINT (and possibly python).
• We must understand how to use the STL functionality without its penalty. Dynamic templates are also necessary.
Procedure
• These are just ideas. Making a firm proposal requires more investigation and prototyping.
• It must be clear that the top priority is the consolidation of ROOT to be ready for LHC data taking. This should not be an excuse not to look forward.
• This work will continue as a background activity.
Conclusions
• After more than 10 years of intensive development, the CORE work packages are consolidated.
• Important developments in PROOF, Math, CINT, Reflex, 3-D graphics.
• All packages must be adapted to a multi-threading environment, made necessary by the multi-core CPUs.
• Instead of pushing gigabytes of source or shared libs to the GRID working nodes, BOOT could greatly optimize and simplify the use of the GRID. BOOT will use a PULL technique to download, incrementally, only the software (source) necessary to run an application.
• Hoping to show a working BOOT at the next CHEP.
“Classic” approach (G. Ganis, CHEP06, 15 Feb 2006)
Diagram labels: catalog, Storage, files, query, data file splitting, myAna.C, jobs, submit, Batch farm, queues, manager, outputs, merging, final analysis
• “static” use of resources
• jobs frozen, 1 job / worker node
• “manual” splitting, merging
• limited monitoring (end of single job)
The PROOF approach (G. Ganis, CHEP06, 15 Feb 2006)
Diagram labels: catalog, Storage, files, scheduler, query, PROOF farm, MASTER; PROOF query: data file list, myAna.C; feedbacks (merged); final outputs (merged)
• farm perceived as extension of local PC
• more dynamic use of resources
• real time feedback
• automated splitting and merging
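The "automated splitting and merging" that distinguishes the PROOF approach from the classic one can be sketched as follows. This Python example does not use PROOF and does not reproduce its scheduler: the round-robin split, the thread pool standing in for the farm, the toy `analyse` function standing in for myAna.C, and the file names are all assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def analyse(files):
    """Stand-in for myAna.C: return a partial histogram for a packet of files."""
    hist = Counter()
    for name in files:
        run_number = int(name[3:7])     # e.g. "run0042.root" -> 42
        hist[run_number % 10] += 1      # toy "measurement": last digit of the run
    return hist

def run_query(file_list, workers=4):
    # Automated splitting: deal the files out round-robin, one packet per worker.
    packets = [file_list[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(analyse, packets))
    # Automated merging: add the partial histograms into one final output.
    merged = Counter()
    for part in partials:
        merged.update(part)
    return merged

files = [f"run{i:04d}.root" for i in range(100)]
result = run_query(files)
print(sorted(result.items()))
```

The user supplies only the file list and the analysis function; splitting and merging happen inside `run_query`, which is the contrast with the "manual" splitting and merging of the classic batch approach.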
Atlas packages with > 10000 lines
Lines   Package   Lines by language
211677  dice  fortran=211641
187691  atrecon  fortran=138126,cpp=49354
129793  MuonSpectrometer  fortran=121321,python=3715,csh=2613,sh=2136
118504  Tools  cpp=67337,ansic=19012,python=13770,sh=7373,yacc=5659,fortran=3024,lex=1971
116327  PhysicsAnalysis  cpp=107348,python=6070,sh=1649,csh=1260
115143  geant3  fortran=115040,ansic=67
112445  TileCalorimeter  cpp=108580,python=2209,csh=920,sh=736
108200  atutil  fortran=108000,ansic=164
80866  Applications  fortran=71764,cpp=6961,ansic=1865
74721  Calorimeter  cpp=65917,python=7854,sh=490,csh=460
67822  atlfast  fortran=67786
64838  Tracking  cpp=60255,python=2092,csh=1380,sh=1104
59429  Generators  fortran=28136,cpp=25538,python=4123,sh=872,csh=760
49926  graphics  java=40719,cpp=8312,python=321,sh=255,csh=220
40058  AtlasTest  cpp=25159,python=5131,sh=4815,perl=4145,csh=517
39576  Control  cpp=22030,python=15904,sh=907,csh=693
31192  DetectorDescription  ansic=29540,csh=680,sh=562,python=343
29500  TestBeam  cpp=27433,python=1491,csh=320,sh=256
25001  Reconstruction  sh=10297,fortran=7559,python=5393,csh=1667
18989  atlsim  fortran=17561,cpp=1380
18328  InnerDetector  python=11466,csh=2860,sh=2641,ansic=1343
17291  Simulation  python=13653,sh=2126,csh=1302,fortran=169
16139  Database  perl=8310,sh=4299,java=2209,csh=709,python=566
14250  Event  cpp=13522,python=296,csh=240,sh=192
12930  gcalor  fortran=12894
11955  Trigger  python=7860,csh=1780,sh=1673,perl=634
11195  LArCalorimeter  python=6133,ansic=2045,csh=1620,sh=1347
Total: 3 million lines of code, 1200 packages
Alice packages with > 10000 lines
Lines   Package   Lines by language
398742  PDF  fortran=398729,ansic=13
146414  PYTHIA6  fortran=140748,cpp=5413,ansic=153,pascal=100
128337  HLT  cpp=127601,ansic=605,sh=100,csh=31
128103  ITS  cpp=128010,sh=93
105763  MUON  cpp=105673,sh=90
94548  DPMJET  fortran=94267,cpp=281
72400  STEER  cpp=72400
52443  HBTAN  cpp=51260,fortran=1183
51489  TPC  cpp=51479,sh=10
50932  PHOS  cpp=50639,csh=293
46176  TRD  cpp=46176
41998  ISAJET  fortran=40483,cpp=1494,pascal=21
39407  RALICE  cpp=29764,ansic=9355,sh=288
35916  EMCAL  cpp=35410,fortran=383,csh=123
31820  ANALYSIS  cpp=31820
27751  HERWIG  fortran=27246,cpp=477,ansic=28
27025  FMD  cpp=27021,sh=4
26667  TOF  cpp=26667
24258  EVGEN  cpp=24258
21588  HIJING  fortran=21099,cpp=489
20562  JETAN  cpp=19687,fortran=875
18344  RAW  cpp=18344
15232  STRUCT  cpp=15232
13142  PMD  cpp=13142
12945  RICH  cpp=12945
10966  FASTSIM  cpp=10966
10944  MONITOR  cpp=10944
10659  ZDC  cpp=10659
Total: 1.5 million lines of code
h.Draw() local mode
Diagram: libraries involved, linked through pm (Plug-in Manager)
• CINT
• libCore: … I/O, TSystem …
• libHist: … TH1, TH2 …
• libHistPainter: … THistPainter, TPainter3DAlgorithms …
• libGraf: … TGraph, TGaxis, TPave …
• libGpad: … TPad, TFrame …
• libX11: … drawline, drawtext …
Problem with STL Inlining
• STL containers are very nice. However, they have a very high cost in a real, large environment.
• Compiling code with STL is much slower because of inlining (STL is only in header files). The situation improves a bit with precompiled headers (e.g. in gcc4), but not much.
• Object modules are bigger.
• The compiler or linker is able to eliminate duplicate code in ONE object file or shared lib, not across libraries.
• If you have 100 shared libs, it is likely that you have the code for std::vector push_back or iterators 100 times!
• Inlining is nice if used with care (or in toy benchmarks). It may have the opposite effect, generating more cache misses in a real application.
• Templates are statically defined and difficult to use in a dynamic interactive environment.
Can we gain with a better packaging?
• Yes and no
• One shared lib per class implies more administration, more dictionaries, more dependencies.
• 80 shared libs for ROOT is already a lot. 500 would be nonsense.
• A CORE library is essential. However, some developers do not like this, and penalize/complicate the life of the vast majority of users.
• The Plug-in Manager helps.
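Why a plug-in manager helps with packaging can be shown with a small Python sketch of the pattern: a service name is registered against a library, and the library is loaded only when the service is first requested. This is not ROOT's Plug-in Manager API; the class, its method names and the service names are hypothetical, and standard-library modules stand in for shared libs such as libHistPainter.

```python
import importlib

class PluginManager:
    """Map a service name to a module that is imported only on first use."""

    def __init__(self):
        self._handlers = {}     # service name -> module name
        self._loaded = {}       # service name -> imported module

    def add_handler(self, service, module_name):
        # Registration is cheap: nothing is loaded yet.
        self._handlers[service] = module_name

    def load(self, service):
        if service not in self._loaded:
            module_name = self._handlers[service]
            self._loaded[service] = importlib.import_module(module_name)
        return self._loaded[service]

pm = PluginManager()
# Standard-library modules stand in for shared libs loaded on demand.
pm.add_handler("painter", "colorsys")
pm.add_handler("io", "json")

painter = pm.load("painter")    # imported now, on first request
print(painter.__name__)
```

A batch job that never draws anything never requests the "painter" service, so the corresponding library is never loaded, while the core library stays small.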