This proposal suggests improvements to the relationship between ROOT and LCG/AA, recommending better integration and less duplication of effort.
LCG/AA/ROOT Relationship: Proposal for Improvements
Rene Brun, 14 January 2004
Some slides of this talk were presented at the Architects Forum on 30 October 2003.
Applications Area Organisation (slide by Torre)
[Diagram labels: Architects Forum and Applications Manager (decisions, strategy); Applications Area meeting (consultation); POOL, SEAL, PI, Simulation and SPI projects; ROOT linked to the projects as user/provider.]
• The 'user/provider' relationship is working.
• Good ROOT/POOL cooperation: POOL gets the modifications it needs, ROOT gets debugging/development input.
• ROOT will be the principal analysis tool; full access to its capabilities is a blueprint requirement.
• ALICE is directly using ROOT as its framework.
User/Provider relationship
• It works, in the sense that the teams have not been unwilling to cooperate.
• The cooperation is ambiguous: the wheel is reinvented many times.
• The duplication of effort will cause problems in the near future (dictionaries, plug-in managers, collections and many more; see the coming slides).
• The manpower would be better used in improving critical areas.
• ALICE has not joined the train.
User/Provider relationship (continued)
• The current orientation is OK if the idea is to use ROOT as a back-end in a few places, with alternative solutions seriously considered and having clear deliverables.
• If ROOT is the choice for the two essential areas, event storage and interactive data analysis, this has important implications.
• In that case the user/provider relationship is not appropriate:
• ROOT must be better integrated in the LCG. This has implications for the LCG/AA plans and also for the ROOT planning.
Motivation for this presentation
• We have two options in front of us:
• Continue the current process, assuming that everything is for the best in the best of worlds: ROOT is happy, LCG/AA is happy.
• Take advantage of the useful internal review to rethink the general orientation.
• We now have a unique opportunity, with enough experience with all the projects, to take the necessary actions to decrease the entropy in the interest of both LHC and non-LHC users.
• We must capitalize on one year of useful experience in AA to set up a convergent and coherent process.
Main motivation
• Make it simpler for our users.
• The current system is too complex.
• Far too many layers.
Plan of talk
• In the following slides, I review the main projects (POOL, SEAL, SIMU, PI and ARDA) with a proposal for better integration with ROOT.
• I start with a few slides indicating where we are with ROOT. Our current developments are relevant to the LCG work.
• SEAL: single dictionary, plug-in manager, mathlibs
• POOL: collections, performance, goals
• SIMU: VMC, geometry and interfaces to geometries
• PI: what next?
• ARDA: distributed analysis and ROOT/PROOF
• SPI: using/moving to the infrastructure
ROOT status
• Version 3.05/07 released in July 2003.
• Version 3.10/02 released in December 2003.
• Working on the major new version 4.0.
ROOT version 4 highlights (1)
• Support for automatic schema evolution for foreign classes without using a class version number.
• Support for large files (> 2 GBytes).
• New data type Double32_t (double in memory, saved as float on the file).
• Native support for STL: automatic Streamers, no code generation anymore.
• Tree split mode with STL vector/list.
• Plug-in Manager upgrade (rlibmap) with automatic library/class discovery/load.
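The Double32_t idea can be made concrete with a minimal sketch in plain C++. It does not use ROOT. The alias `Double32Sketch` and the functions `WriteDouble32`/`ReadDouble32` are hypothetical names, and the byte buffer stands in for a file. The sketch only illustrates the principle stated above: full double precision in memory, 4 bytes per value on disk, and a round trip limited to float precision.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical alias illustrating the Double32_t idea: a plain double in memory.
using Double32Sketch = double;

// Append the value to a byte buffer (standing in for a file) as a 32-bit float.
void WriteDouble32(std::vector<unsigned char>& buf, Double32Sketch value) {
    const float onFile = static_cast<float>(value);  // precision is reduced here
    unsigned char bytes[sizeof(float)];
    std::memcpy(bytes, &onFile, sizeof(float));
    buf.insert(buf.end(), bytes, bytes + sizeof(float));
}

// Read a float back from the buffer and widen it to double for use in memory.
Double32Sketch ReadDouble32(const std::vector<unsigned char>& buf, std::size_t offset) {
    assert(offset + sizeof(float) <= buf.size());
    float onFile;
    std::memcpy(&onFile, buf.data() + offset, sizeof(float));
    return static_cast<double>(onFile);
}
```

Values that are exactly representable as a float (such as 0.5) survive the round trip unchanged. Others (such as pi) come back with about 7 significant digits.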
ROOT version 4 highlights (2)
• PROOF/AliEn in production.
• Xrootd (collaboration with BaBar).
• New linear algebra package.
• Geometry interface to OpenGL/Coin3D.
• Support for Qt (alternative option to X11).
• GUI builder with GUI code generation.
• New GUI histogram editor.
• Interface with Ruby.
First development release just before the ROOT workshop (25 February, SLAC); final PRO release scheduled for June.
ROOT and SPI
• If the model evolves from a “user-provider” relationship to a real and effective integration of ROOT in the LCG plans, it will become obvious that ROOT should use the same infrastructure (SPI).
• The current work by Torre is an essential ingredient for simplifying the development and build procedures, a prerequisite for convergence.
• It is too early to take a practical decision, as it depends on the acceptance of this plan and on real achievements.
SEAL: Duplications
• For well-known historical reasons, SEAL duplicates systems already provided by ROOT, e.g.:
• Object dictionary
• Plug-in manager
• Regular expressions
• Compression algorithms
• In the following, I discuss only the dictionary and the plug-in manager.
SEAL libraries: size and dependencies
[Dependency diagram; library sizes:]
SealBase 6.60 MB, Reflection 2.40 MB, ReflectionBuilder 1.02 MB, SealUtil 0.85 MB, PluginManager 1.28 MB, IOTools 1.29 MB, SealSTLdict 5.13 MB, CLHEP 1.50 MB, SealKernel 1.62 MB, SealZIP 2.15 MB, SealCLHEPdict 4.09 MB, SealServices 1.58 MB, GMinuit 2.45 MB
SEAL Dictionary: Reminder
[Diagram: dictionary generation. Labels: inputs .h and .xml; GCC-XML with a code generator producing LCG dictionary code; ROOTCINT producing CINT dictionary code; in memory, the CINT dictionary and the LCG dictionary connected by a gateway, serving I/O, data I/O, reflection and other clients; boxes marked "technology dependent".]
Hmm! All boxes are technology dependent!
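A toy example shows the role a dictionary plays in I/O. The sketch below is hypothetical and far simpler than either the CINT or the LCG dictionary. `MemberInfo`, `Dictionary`, `Stream` and `Unstream` are invented names, and only `int` and `double` members are handled. It shows why the I/O layer depends on the dictionary: a generic writer can serialize any registered class member by member, using only names, types and offsets, with no class-specific code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Hypothetical, heavily simplified dictionary: for each data member of a
// class, its name, a type tag and its byte offset inside the object.
enum class MemberType { Int, Double };

struct MemberInfo {
    std::string name;
    MemberType type;
    std::size_t offset;
};

using Dictionary = std::map<std::string, std::vector<MemberInfo>>;

std::size_t SizeOf(MemberType t) { return t == MemberType::Int ? sizeof(int) : sizeof(double); }

// Generic writer: copies each member's bytes in dictionary order.
std::vector<unsigned char> Stream(const Dictionary& dict, const std::string& cls, const void* obj) {
    auto it = dict.find(cls);
    assert(it != dict.end() && "no dictionary for this class");
    const auto* base = static_cast<const unsigned char*>(obj);
    std::vector<unsigned char> buf;
    for (const MemberInfo& m : it->second)
        buf.insert(buf.end(), base + m.offset, base + m.offset + SizeOf(m.type));
    return buf;
}

// Generic reader: the inverse of Stream, again driven only by the dictionary.
void Unstream(const Dictionary& dict, const std::string& cls,
              const std::vector<unsigned char>& buf, void* obj) {
    auto it = dict.find(cls);
    assert(it != dict.end() && "no dictionary for this class");
    auto* base = static_cast<unsigned char*>(obj);
    std::size_t pos = 0;
    for (const MemberInfo& m : it->second) {
        assert(pos + SizeOf(m.type) <= buf.size());
        std::memcpy(base + m.offset, buf.data() + pos, SizeOf(m.type));
        pos += SizeOf(m.type);
    }
}

// Example user class and its hand-written dictionary entry (a generator such
// as rootcint or lcgdict would normally produce this information).
struct Track {
    int charge;
    double pt;
};

Dictionary MakeDictionary() {
    Dictionary d;
    d["Track"] = {{"charge", MemberType::Int, offsetof(Track, charge)},
                  {"pt", MemberType::Double, offsetof(Track, pt)}};
    return d;
}
```

With two dictionaries in memory, this member information exists twice and must be kept consistent. That is the duplication discussed in the following slides.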
SEAL: The dictionary saga
• There were four reasons to develop an alternative dictionary:
• Make it independent of ROOT/CINT.
• Make it available to other languages.
• Remove the parsing limitations of rootcint.
• It is necessary for the POOL alternative back-end.
• The alternative-language argument is a false problem. All collaborations are investing heavily in C++, and in any case the SEAL dictionary is not appropriate for languages that come with their own introspection/reflection capabilities.
• The other three reasons must be seen from a different angle if ROOT is the choice for storage manager and analysis engine.
• Everybody agrees that having two dictionaries is a nightmare, a source of more and more conflicts and new problems.
LCG Dictionary size: ATLAS (November version)
• In November, we investigated the size of the LCG dictionary for ATLAS, CMS and ROOT itself. LHCb was not in a position to estimate the size because they did not yet have the code generator.
• As a positive effect of this exercise, the SEAL team has meanwhile been able to reduce the size of the dictionary on disk by a factor 3, but there is no estimate of the gain (if any) in memory.
ATLAS (27 classes); object-file sizes on disk, ratio = LCGdict.o / Classes.o:
Library                   Classes.o   LCGdict.o   Ratio   CINTdict.o
SimpleTrack               10.7k       144k        13.45
EventHeader               12.7k       89k          7.00
FourMom                   49k         13k          0.26
GenerateObject (HepMC)    388k        326k         0.84
LArSimEvent               26k         88k          3.38
EventInfo                 33k         120k         3.63   65k
Mean ratio: 4.7 +- 4.4
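The summary figure of the table can be recomputed from the Classes.o and LCGdict.o columns. The sketch below computes the per-library ratio, its mean and its population standard deviation. Using the population (rather than sample) standard deviation is an assumption, chosen because it reproduces the quoted spread. With the tabulated sizes the mean comes out at about 4.77 (quoted as 4.7) and the spread at about 4.46.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One row of the ATLAS table: object-file sizes in kB.
struct SizeRow {
    double classesKB;  // Classes.o
    double lcgdictKB;  // LCGdict.o
};

// Ratio LCGdict.o / Classes.o for each library.
std::vector<double> Ratios(const std::vector<SizeRow>& rows) {
    std::vector<double> r;
    for (const SizeRow& row : rows) r.push_back(row.lcgdictKB / row.classesKB);
    return r;
}

double Mean(const std::vector<double>& v) {
    assert(!v.empty());
    double sum = 0.0;
    for (double x : v) sum += x;
    return sum / static_cast<double>(v.size());
}

// Population standard deviation (divides by N, not N - 1).
double StdDev(const std::vector<double>& v) {
    const double m = Mean(v);
    double sq = 0.0;
    for (double x : v) sq += (x - m) * (x - m);
    return std::sqrt(sq / static_cast<double>(v.size()));
}

// The six ATLAS libraries, in table order.
std::vector<SizeRow> AtlasRows() {
    return {{10.7, 144.0}, {12.7, 89.0}, {49.0, 13.0},
            {388.0, 326.0}, {26.0, 88.0}, {33.0, 120.0}};
}
```

The large spread relative to the mean is what makes a single "dictionary overhead" number hard to quote for ATLAS.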
LCG Dictionary size: CMS (November version)
• Bill compared the sizes on disk of the same CMS dictionary object files (*.o) (COBRA+ORCA) produced by lcgdict and by rootcint.
• Total number of dictionaries = 30
• Total number of classes = 359
• Average data members per class = 435/359 = 1.2
• Average functions per class = 1868/359 = 5.2
• All were compiled with gcc 3.2.3 with the -O2 option, and all symbols were stripped (with strip) for the purpose of this comparison.
• The size ratios are quite consistent across dictionaries, so we give only the total sizes:
• ROOT: 3.45 MB
• POOL: 5.37 MB
• So the LCG dictionary files are approx. 55% larger (5.37/3.45 ≈ 1.56).
• Note that the CMS classes above are only the base classes of the framework. It would have been interesting to have more statistics based on concrete application classes with more data members and functions.
LCG Dictionary size: ROOT (November version)
• It was easy to generate the dictionaries for about half of all ROOT classes (320/600).
• To evaluate the impact of the LCG dictionary in memory, I linked the dictionaries with the ROOT executable module.
• Full ROOT process memory size = 12.30 MBytes
• Same + LCG dictionary = 28.30 MBytes
• Remark 1: the LCG dictionary for half of the ROOT classes (28.30 - 12.30 = 16.00 MBytes) is 1.3 times bigger than ROOT itself.
• Remark 2: the LCG dictionary does not contain all the information available in the CINT dictionary.
ROOT dictionary size
If all classes have a dictionary, the dictionary may become a large fraction of the executable module!
The CINT dictionary
• The CINT library is small: 1.5 MBytes.
• CINT is more than just a parser and an API to the dictionary.
[Diagram labels: C++ parser(s) (rootcint); data structures; G__ClassInfo API (data members, functions); byte-code generator; interpreter; byte code.]
The CINT dictionary evolution
• Data members
• Already supports all C++ features (no important feature, such as typedef or enum, is missing).
• The future is to look into XTI, in case there is progress in the C++ committee.
• Parser/code generator
• The number of failing cases has dropped considerably. We treat parsing failures with high priority; they are generally fixed in the “next week” CINT release.
Dictionary: how to make progress
• Review as soon as possible the functionality provided by LCGdict and CINT.
• Collect information from CMS, ATLAS and others on the size of dictionaries.
• Investigate how many classes (*.h) can be parsed by gccxml but not by rootcint.
• Compare the two APIs and data structures.
• Investigate the feasibility of supporting two parsers with one single dictionary in memory.
• Investigate the portability of gccxml to all ROOT-supported platforms.
Dictionary: which options?
• Start from the LCG dictionary
• Requires lcgdict to be available on all platforms where CINT runs today.
• Requires deep changes in the byte code and in the interpreter.
• Start from the CINT dictionary
• Improve the API.
• Keep/improve rootcint.
• Adapt gccxml as an alternative parser.
• Both options
• Following discussions in November/December, a proposal for a common C++ API to the CINT dictionary is in preparation. Because the user must see only C++ objects, this also requires a mini C++ data structure (which must be small compared to CINT).
Dictionaries: ROOT only
[Diagram labels: X.h, rootcint, XDictcint.cxx, CINT data structures (CINT DS), CINT API, ROOT meta classes (C++), ROOT.]
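Dictionaries: situation today
[Diagram labels, two parallel chains. LCG chain: X.h, gccxml, X.xml, lcgdict, XDictlcg.cxx, LCGDICT data structures, LCG API, POOL. ROOT chain: X.h, rootcint, XDictcint.cxx, CINT data structures, CINT API, ROOT meta classes (C++), ROOT.]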
Dictionaries: step 1, gain space
[Diagram labels. LCG chain: gccxml, X.xml, lcgdict, XDictlcg.cxx, LCG2 API, LCGDICT data structures (C++), POOL. ROOT chain: X.h, rootcint, XDictcint.cxx, CINT data structures, CINT API, ROOT meta classes (C++), ROOT.]
Dictionaries: step 2, simplification
[Diagram labels: common LCG/ROOT API over a meta data structure (C++), used by POOL and ROOT; X.h, rootcint, XDictcint.cxx, CINT data structures, CINT API.]
Dictionaries: step 3, coherency
[Diagram labels: X.h parsed by gccxml or by rootcint into XDict.cxx; common LCG/ROOT API over a meta data structure (C++), used by POOL and ROOT; CINT data structures, CINT API.]
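The end point of this sequence can be sketched as follows: several parsers fill one dictionary, and every client reads it through one C++ API. All names (`DataMember`, `ClassEntry`, `SingleDictionary`, `DictionaryFiller` and the two fillers) are hypothetical. Real front-ends would parse X.h with rootcint or gccxml, whereas these fillers simply register hard-coded entries.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical mini C++ data structure: the single in-memory dictionary.
struct DataMember {
    std::string name;
    std::string type;
};

struct ClassEntry {
    std::string name;
    std::vector<DataMember> members;
};

// The one API that all clients (I/O, POOL, the interpreter) would query.
class SingleDictionary {
public:
    void Add(const ClassEntry& entry) {
        assert(!entry.name.empty());
        classes_[entry.name] = entry;  // a later definition replaces an earlier one
    }
    const ClassEntry* Find(const std::string& name) const {
        auto it = classes_.find(name);
        return it == classes_.end() ? nullptr : &it->second;
    }
    std::size_t Size() const { return classes_.size(); }

private:
    std::map<std::string, ClassEntry> classes_;
};

// Any front-end parser fills the same dictionary through this interface.
class DictionaryFiller {
public:
    virtual ~DictionaryFiller() = default;
    virtual void Fill(SingleDictionary& dict) const = 0;
};

// Stand-ins for a rootcint-based and a gccxml-based front-end.
class RootcintLikeFiller : public DictionaryFiller {
public:
    void Fill(SingleDictionary& dict) const override {
        dict.Add({"Track", {{"fCharge", "int"}, {"fPt", "double"}}});
    }
};

class GccxmlLikeFiller : public DictionaryFiller {
public:
    void Fill(SingleDictionary& dict) const override {
        dict.Add({"Vertex", {{"fX", "double"}, {"fY", "double"}, {"fZ", "double"}}});
    }
};
```

In this layout, supporting a second parser adds a filler but no second copy of the data structures, which is the point of the "coherency" step.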
Plug-in manager(s)
• A plug-in manager is an essential tool for making a system more modular.
• It simplifies dynamic linking and unlinking.
• It would be nice to converge on one single manager to minimize side effects.
• The ROOT plug-in manager is very powerful and simple to use (see next slide).
• It does not require an object-factory machinery: the interpreter already does this for free.
• It is being extended to automate/simplify several operations, such as automatic discovery of the shared library containing a class.
Definition of plug-ins in ROOT
Columns: base class, URI pattern or name, plug-in class, shared library, how to call (constructor).
Plugin.TFile: ^rfio: TRFIOFile RFIO "TRFIOFile(const char*,Option_t*,const char*,Int_t)"
+Plugin.TFile: ^castor: TCastorFile RFIO "TCastorFile(const char*,Option_t*,const char*,Int_t,Int_t)"
+Plugin.TFile: ^dcache: TDCacheFile DCache "TDCacheFile(const char*,Option_t*,const char*,Int_t)"
+Plugin.TFile: ^chirp: TChirpFile Chirp "TChirpFile(const char*,Option_t*,const char*,Int_t)"
Plugin.TSystem: ^rfio: TRFIOSystem RFIO "TRFIOSystem()"
Plugin.TSQLServer: ^mysql: TMySQLServer MySQL "TMySQLServer(const char*,const char*,const char*)"
+Plugin.TSQLServer: ^pgsql: TPgSQLServer PgSQL "TPgSQLServer(const char*,const char*,const char*)"
+Plugin.TSQLServer: ^sapdb: TSapDBServer SapDB "TSapDBServer(const char*,const char*,const char*)"
+Plugin.TSQLServer: ^oracle: TOracleServer Oracle "TOracleServer(const char*,const char*,const char*)"
Plugin.TGrid: ^alien TAlien RAliEn "TAlien(const char*,const char*,const char*,const char*)"
Plugin.TVirtualPad: * TPad Gpad "TPad()"
Plugin.TVirtualHistPainter: * THistPainter HistPainter "THistPainter()"
Plugin.TVirtualTreePlayer: * TTreePlayer TreePlayer "TTreePlayer()"
Plugin.TVirtualTreeViewer: * TTreeViewer TreeViewer "TTreeViewer(const TTree*)"
Plugin.TVirtualGeoPainter: * TGeoPainter GeomPainter "TGeoPainter()"
Plugin.TVirtualUtil3D: * TUtil3D Graf3d "TUtil3D()"
Plugin.TVirtualUtilHist: * TUtilHist Hist "TUtilHist()"
Plugin.TVirtualUtilPad: * TUtilPad Gpad "TUtilPad()"
Plugin.TVirtualFitter: Minuit TFitter Minuit "TFitter(Int_t)"
+Plugin.TVirtualFitter: Fumili TFumili Fumili "TFumili(Int_t)"
Plugin.TVirtualPS: ps TPostScript Postscript "TPostScript()"
+Plugin.TVirtualPS: svg TSVG Postscript "TSVG()"
Plugin.TViewerX3D: x11 TViewerX3D X3d "TViewerX3D(TVirtualPad*,Option_t*)"
+Plugin.TViewerX3D: qt TQtViewerX3D QtX3d "TQtViewerX3D(TVirtualPad*,Option_t*)"
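The lookup step of such a table can be sketched in a few lines of standard C++. This is not ROOT's implementation, where the role is played by the `TPluginManager` class. `PluginHandlerSketch` and `PluginRegistry` are hypothetical names. Given a base class and a URI, the registry returns the first entry whose pattern matches. It treats `*` as "always matches" and other patterns as regular expressions. Loading the shared library and calling the constructor, which ROOT delegates to the interpreter, is left out.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory form of one "Plugin.<base>:" line of the table.
struct PluginHandlerSketch {
    std::string base;       // e.g. "TFile"
    std::string pattern;    // e.g. "^rfio:" or "*"
    std::string className;  // e.g. "TRFIOFile"
    std::string library;    // e.g. "RFIO"
    std::string ctor;       // constructor call, e.g. "TRFIOFile(...)"
};

class PluginRegistry {
public:
    void Add(PluginHandlerSketch h) {
        assert(!h.base.empty() && !h.pattern.empty());
        handlers_.push_back(std::move(h));
    }

    // First handler for `base` whose pattern matches `uri`; nullptr if none.
    // "*" matches anything; other patterns are treated as regular expressions.
    const PluginHandlerSketch* Find(const std::string& base, const std::string& uri) const {
        for (const PluginHandlerSketch& h : handlers_) {
            if (h.base != base) continue;
            if (h.pattern == "*" || std::regex_search(uri, std::regex(h.pattern)))
                return &h;
        }
        return nullptr;
    }

private:
    std::vector<PluginHandlerSketch> handlers_;
};
```

For example, after registering the `^rfio:` and `^dcache:` entries for `TFile`, a lookup with `"dcache://host/data/run1.root"` returns the `TDCacheFile` handler and its library `DCache`. A URI with no matching pattern returns nullptr.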
MathLibs
• It is important for HEP to have one well-identified math library (source, libs), with:
• Full control of the source.
• Portability to as many platforms as possible.
• A good test suite and documentation.
• This does not mean that we have to develop new algorithms/classes/functions.
• In November/December we had a few meetings to discuss a proposal for a Mathlib in C++, an alternative to a proposal by SEAL.
Mathlibs (2)
[Diagram: sources for a new Mathlib (open source, not restricted to HEP/LCG):]
• CERNLIB Kernlib/Mathlib: convert, only on demand, what is not already converted by TCL.
• ROOT (TMath, TMatrix, TCL): the ROOT linear algebra is being extended and improved.
• GSL: import functions not found elsewhere; wrap C functions in classes as in TMath; give our modifications back to GSL as C/GSL functions.
• CLHEP: take a small subset and freeze it.
Mathlibs proposals • A: SEAL proposal: Install GSL, collaborate with the GSL team. • B: Rene/Eddy proposal: copies available
Why a Mathlib in C++
• We want to interact with real objects (data and algorithms), not just algorithms.
• We want to provide higher-level interfaces hiding the implementation details (algorithms). A true object-oriented API should remain stable if the internal storage or algorithms change. One can imagine the Mathlib classes being improved over time, or adapted to standard algorithms that could come with new C++ versions.
• Many classes require a good graphics interface. A large subset of CERNLIB or GSL deals with functions. Visualizing a function requires knowing some of its properties, e.g. singularities or asymptotic behavior. This does not mean that the function classes must have built-in graphics, but they must be able to call graphics service classes that can exploit the algorithms in the function library.
• Many objects need operators (matrices, vectors, physics vectors, etc.).
• We want to embed these objects in a data model. Users have started to request that the math library take care of memory management and/or persistency of the objects. See for instance the LHC feedback [5], where persistency of CLHEP objects was requested: the user would like to save and restore random-generator seeds, etc.
• We want an interactive interface from our interpreters, hence a dictionary.
C/Fortran/GSL versus C++: object-oriented API vs procedural API
GSL style:
    double gsl_sf_gamma(double x)
    int gsl_sf_gamma_e(double x, gsl_sf_result* result)
ROOT style:
    TF1 gamma(TMath::Gamma,0,1)
    gamma.Eval(x)
    gamma.Derivative(x)
    gamma.Integral(from,to)
    gamma.GetRandom()
    gamma.Draw()
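The contrast between the two styles can be illustrated without ROOT. The class below is a hypothetical, much reduced analogue of the object-oriented style; it is not TF1. It wraps any C-style function and offers algorithms on it. Here `std::tgamma` from the C++ standard library stands in for the gamma function. The central-difference derivative and composite Simpson integration are simple choices made for the sketch, and `GetRandom` and `Draw` are not reproduced.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <utility>

// Hypothetical, much reduced analogue of an object-oriented function class
// (not ROOT's TF1): the object owns a function and a range and offers algorithms.
class Function1D {
public:
    Function1D(std::function<double(double)> f, double xmin, double xmax)
        : f_(std::move(f)), xmin_(xmin), xmax_(xmax) {
        assert(f_ && xmin_ < xmax_);
    }

    double Eval(double x) const { return f_(x); }

    // Central finite difference with step h.
    double Derivative(double x, double h = 1e-5) const {
        return (f_(x + h) - f_(x - h)) / (2.0 * h);
    }

    // Composite Simpson rule on [a, b] with an even number n of intervals.
    double Integral(double a, double b, int n = 1000) const {
        assert(n > 0 && n % 2 == 0);
        const double h = (b - a) / n;
        double sum = f_(a) + f_(b);
        for (int i = 1; i < n; ++i) sum += (i % 2 == 1 ? 4.0 : 2.0) * f_(a + i * h);
        return sum * h / 3.0;
    }

    double XMin() const { return xmin_; }
    double XMax() const { return xmax_; }

private:
    std::function<double(double)> f_;
    double xmin_;
    double xmax_;
};
```

Usage: `Function1D gamma([](double x) { return std::tgamma(x); }, 0.1, 5.0);` then `gamma.Eval(5.0)`, `gamma.Derivative(1.0)` or `gamma.Integral(1.0, 2.0)`. The caller never sees which integration or differentiation algorithm is used, so it could be replaced without changing the interface.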
Mathlib proposal picture
• libGSL++.so: contains the full standard GSL + CINT dictionary; callable from the interpreter(s).
• TMath and/or TMath-like C++ static functions: contains the most used math functions.
• ROOT libraries: high-level C++ classes: functions (à la TF1), physics vectors, linear algebra, random numbers, minimisation, persistency.
Summary of proposal B
• Install the standard GSL: libGSL.so.
• Provide a CINT front-end (say libGSL++.so). Nearly done, thanks to Lorenzo.
• Extend TMath with more static functions from CERNLIB, GSL, ...
• New linear algebra from Eddy (see later).
• Extend function classes such as TF1 with more algorithms.
• 2/3 of the estimated total work is already done.
• The main work is the development of a test/benchmark suite.
CLHEP linear algebra problems
• CLHEP inversion:
• sizes <= 6: limited-precision Cramer algorithm
• sizes > 6: unscaled LU factorization (CERNLIB DFACT)
• Take the Hilbert matrix A(i,j) = 1/(i+j+1), i,j = 0,...,4, and calculate E = A * A^-1:
• Cramer: for i != j, |E(i,j)| < 10e-7, while
• scaled LU: for i != j, |E(i,j)| < 10e-13.
• Of course the inaccuracy is worse for larger matrices. Scaling the matrix by a large or small number will make Cramer under/overflow. Unscaled LU factorization can also under/overflow: for example, for a Hilbert matrix of size > 12 the routine returns an error.
• CLHEP is not thread-safe.
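The Hilbert-matrix check can be reproduced for the scaled-LU case. The sketch below builds A(i,j) = 1/(i+j+1) for i,j = 0,...,4 and inverts it with an LU factorization using scaled (implicit) partial pivoting. It then returns the largest off-diagonal |E(i,j)| of E = A * A^-1. It is a generic textbook implementation, not the ROOT 4 or CLHEP code, and the exact residual depends on compiler and platform. The Cramer side of the comparison is not reproduced.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hilbert matrix with the slide's convention A(i,j) = 1/(i+j+1), i,j = 0..n-1.
Matrix Hilbert(std::size_t n) {
    Matrix a(n, std::vector<double>(n));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            a[i][j] = 1.0 / static_cast<double>(i + j + 1);
    return a;
}

// Inverse via LU factorization with scaled (implicit) partial pivoting: the pivot
// row maximizes |a(i,k)| divided by the largest |a(i,j)| of its original row.
Matrix InverseScaledLU(Matrix a) {
    const std::size_t n = a.size();
    std::vector<std::size_t> perm(n);
    std::vector<double> scale(n);
    for (std::size_t i = 0; i < n; ++i) {
        perm[i] = i;
        double big = 0.0;
        for (double v : a[i]) big = std::max(big, std::fabs(v));
        assert(big > 0.0 && "singular matrix");
        scale[i] = 1.0 / big;
    }
    for (std::size_t k = 0; k < n; ++k) {
        std::size_t p = k;
        double best = 0.0;
        for (std::size_t i = k; i < n; ++i) {
            const double r = std::fabs(a[i][k]) * scale[i];
            if (r > best) { best = r; p = i; }
        }
        assert(best > 0.0 && "singular matrix");
        std::swap(a[k], a[p]);  // rows carry their L multipliers along
        std::swap(scale[k], scale[p]);
        std::swap(perm[k], perm[p]);
        for (std::size_t i = k + 1; i < n; ++i) {
            a[i][k] /= a[k][k];  // L multiplier stored below the diagonal
            for (std::size_t j = k + 1; j < n; ++j) a[i][j] -= a[i][k] * a[k][j];
        }
    }
    Matrix inv(n, std::vector<double>(n));
    for (std::size_t c = 0; c < n; ++c) {  // solve A x = e_c, i.e. L U x = P e_c
        std::vector<double> x(n);
        for (std::size_t i = 0; i < n; ++i) {  // forward: L y = P e_c
            double s = (perm[i] == c) ? 1.0 : 0.0;
            for (std::size_t j = 0; j < i; ++j) s -= a[i][j] * x[j];
            x[i] = s;
        }
        for (std::size_t i = n; i-- > 0;) {  // backward: U x = y
            double s = x[i];
            for (std::size_t j = i + 1; j < n; ++j) s -= a[i][j] * x[j];
            x[i] = s / a[i][i];
        }
        for (std::size_t i = 0; i < n; ++i) inv[i][c] = x[i];
    }
    return inv;
}

// Largest |E(i,j)|, i != j, of E = A * B.
double MaxOffDiagonalOfProduct(const Matrix& a, const Matrix& b) {
    const std::size_t n = a.size();
    double worst = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            if (i == j) continue;
            double e = 0.0;
            for (std::size_t k = 0; k < n; ++k) e += a[i][k] * b[k][j];
            worst = std::max(worst, std::fabs(e));
        }
    return worst;
}
```

The exact inverse of the 5x5 Hilbert matrix has integer entries (25 in the first corner, 44100 in the last), which gives an independent accuracy check besides E.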
Features found only in ROOT 4.0
• In-place matrix multiplication
• Passing of lazy matrices (a recipe without instantiation)
• Eigenvector/eigenvalue analysis for symmetric and non-symmetric matrices
• Condition number for an arbitrary matrix (Hager algorithm)
• Many decomposition classes: LU, Cholesky, QRH, SVD, each allowing repeated solutions without decomposing again
• Thread safety
• Persistency
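The "decompose once, solve many times" point can be shown with a Cholesky factorization. The class below is a hypothetical sketch, not ROOT's decomposition class. The constructor factors a symmetric positive-definite matrix as A = L L^T in O(n^3). Each later `Solve` call reuses L with two triangular substitutions in O(n^2).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical decomposition object: factor once, solve many times.
class CholeskyDecomp {
public:
    // Factor a symmetric positive-definite matrix as A = L L^T (done once).
    explicit CholeskyDecomp(const std::vector<std::vector<double>>& a)
        : n_(a.size()), l_(n_, std::vector<double>(n_, 0.0)) {
        for (std::size_t j = 0; j < n_; ++j) {
            assert(a[j].size() == n_ && "matrix must be square");
            double d = a[j][j];
            for (std::size_t k = 0; k < j; ++k) d -= l_[j][k] * l_[j][k];
            assert(d > 0.0 && "matrix is not positive definite");
            l_[j][j] = std::sqrt(d);
            for (std::size_t i = j + 1; i < n_; ++i) {
                double s = a[i][j];
                for (std::size_t k = 0; k < j; ++k) s -= l_[i][k] * l_[j][k];
                l_[i][j] = s / l_[j][j];
            }
        }
    }

    // Solve A x = b reusing L: forward then backward substitution.
    std::vector<double> Solve(std::vector<double> b) const {
        assert(b.size() == n_);
        for (std::size_t i = 0; i < n_; ++i) {  // L y = b
            for (std::size_t k = 0; k < i; ++k) b[i] -= l_[i][k] * b[k];
            b[i] /= l_[i][i];
        }
        for (std::size_t i = n_; i-- > 0;) {  // L^T x = y
            for (std::size_t k = i + 1; k < n_; ++k) b[i] -= l_[k][i] * b[k];
            b[i] /= l_[i][i];
        }
        return b;
    }

private:
    std::size_t n_;
    std::vector<std::vector<double>> l_;
};
```

For A = [[4, 2], [2, 3]], the right-hand sides (2, 1) and (6, 5) give x = (0.5, 0) and x = (1, 1). Both come from the same stored factor, without refactoring.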
More tests and benchmarks
• As for the linear algebra classes, similar test suites and benchmarks should be implemented for:
• Basic algorithms (TMath-like)
• Statistical analysis and probabilities
• Functions: integrals, derivatives, root finding
• Interpolations, approximations
• Random numbers: basic, functions, histograms
• Physics vectors
• Minimization algorithms
POOL Objectives (Dirk's slide)
• To allow the multi-PB of experiment data and associated metadata to be stored in a distributed and Grid-enabled fashion:
• various types of data of different volumes (event data, physics and detector simulation, detector data and bookkeeping data)
• Hybrid technology approach, combining:
• C++ object streaming technology, such as ROOT I/O, for the bulk data
• transaction-safe relational database (RDBMS) services, such as MySQL, for catalogs, collections and metadata
• In particular, it provides:
• Persistency for C++ transient objects
• Transparent navigation from one object across file and technology boundaries
• Integration with an external file catalog to keep track of the files' physical location, allowing files to be moved or replicated
[Annotations on Dirk's slide: "Source of problems and misunderstanding"; "Two catalogs?"]
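The last point, an external file catalog that lets files be moved or replicated, can be sketched as a mapping from a stable file identifier to a list of physical locations. `FileCatalogSketch`, its methods and the example identifiers are hypothetical. POOL's real catalogs (e.g. the EDG-based one) are backed by databases or Grid services rather than an in-memory map. The idea illustrated is that stored references keep only the identifier, so moving a file means updating the catalog, not the data.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical minimal file catalog: a stable file identifier maps to the
// physical locations (replicas) of that file.
class FileCatalogSketch {
public:
    // Register one more physical location for a file.
    void AddReplica(const std::string& fileId, const std::string& pfn) {
        assert(!fileId.empty() && !pfn.empty());
        replicas_[fileId].push_back(pfn);
    }

    // A file was moved: update its location; references to fileId stay valid.
    bool Move(const std::string& fileId, const std::string& oldPfn, const std::string& newPfn) {
        auto it = replicas_.find(fileId);
        if (it == replicas_.end()) return false;
        auto pos = std::find(it->second.begin(), it->second.end(), oldPfn);
        if (pos == it->second.end()) return false;
        *pos = newPfn;
        return true;
    }

    // All known physical locations for a file (empty if unknown).
    std::vector<std::string> Lookup(const std::string& fileId) const {
        auto it = replicas_.find(fileId);
        return it == replicas_.end() ? std::vector<std::string>{} : it->second;
    }

private:
    std::map<std::string, std::vector<std::string>> replicas_;
};
```

Usage: register `"guid-0001"` with a local path and an `rfio:` replica. After `Move`, `Lookup("guid-0001")` returns the new path, while every object reference holding `"guid-0001"` is unchanged.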
POOL Objectives
• Hybrid technology approach, combining:
• C++ object streaming technology, such as ROOT I/O, for the bulk data
• transaction-safe relational database (RDBMS) services, such as MySQL, for catalogs, collections and metadata
• If an alternative solution is in mind, it must be a complete solution. In particular, an automatic schema evolution algorithm has to be part of POOL itself.
• An alternative solution prevents exploiting more features of the current back-end.
• Concentrating on one back-end will eliminate unnecessary overheads and duplicated code.
• It is urgent to come back to the blueprint objective: combining ROOT as an event store with an RDBMS-based catalog.
POOL Objectives
• Hybrid technology approach, combining:
• C++ object streaming technology, such as ROOT I/O, for the bulk data
• transaction-safe relational database (RDBMS) services, such as MySQL, for catalogs, collections and metadata
ROOT I/O is much more than a simple object streaming technology:
• It supports automatic schema evolution (a large fraction of the code).
• It supports collections (directories of keys, Trees with containers appropriate for queries in interactive analysis).
• It supports “object streaming” with sockets and shared memory.
• It supports access to remote files and is Grid-aware.
• Collections are designed to work in a parallel/Grid setup with PROOF.
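Schema evolution by member name, the first feature listed above, can be illustrated with a toy reader. In this hypothetical sketch, which is not ROOT's algorithm, a stored record is a map from member name to value written with an older class layout. The reader fills the members that still exist, skips members that were removed, and leaves members added since then at their default values. The member table stands in for the information a dictionary would provide.

```cpp
#include <map>
#include <string>

// What an (old) file holds for one object: member name -> value.
using StoredRecord = std::map<std::string, double>;

// Current in-memory layout of the class.
struct TrackV2 {
    double pt = 0.0;
    double eta = 0.0;
    double phi = -999.0;  // member added after old files were written
};

// Member table for the current version (hypothetical stand-in for dictionary data).
const std::map<std::string, double TrackV2::*>& TrackV2Members() {
    static const std::map<std::string, double TrackV2::*> members = {
        {"pt", &TrackV2::pt}, {"eta", &TrackV2::eta}, {"phi", &TrackV2::phi}};
    return members;
}

// Fill matching members by name; names no longer in the class are skipped and
// counted, members missing from the record keep their defaults.
TrackV2 ReadTrack(const StoredRecord& rec, int* skipped = nullptr) {
    TrackV2 t;
    int nSkipped = 0;
    for (const auto& entry : rec) {
        auto it = TrackV2Members().find(entry.first);
        if (it != TrackV2Members().end())
            t.*(it->second) = entry.second;
        else
            ++nSkipped;
    }
    if (skipped) *skipped = nSkipped;
    return t;
}
```

Reading an old record containing `pt`, `eta` and a removed member `oldFlag` yields a `TrackV2` with `pt` and `eta` filled and `phi` left at its default, with one member skipped.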
POOL libraries: size and dependencies
[Dependency diagram; library sizes:]
• SEAL: Reflection 2.40 MB, SealKernel 1.62 MB, PluginManager 1.28 MB, SealBase 6.60 MB, ReflectionBuilder 1.02 MB
• POOL: AttributeList 0.15 MB, PoolCore 0.43 MB, FileCatalog 0.13 MB, StorageSvc 1.97 MB, PersistencySvc 0.29 MB, EDGCatalog 3.46 MB, RootStorageSvc 2.22 MB, DataSvc 0.21 MB, Collection 0.96 MB, RootCollection 0.18 MB
• ROOT: libCore 6.40 MB, libCint 1.40 MB, libTree 1.24 MB
• Totals (MB): SEAL 12.92, POOL 6.54, ROOT 9.04, total 28.72