830 likes | 841 Views
Evolution of data access and analysis with the ROOT system from HEPVIS 2001 by René Brun at CERN. Explore the history, solutions, models, and evolution in data access, analysis, and user interfaces.
E N D
Evolution of Data Access & Analysiswith the ROOT system HEPVIS 2001 René Brun CERN http://root.cern.ch HEPVIS 2001
Fortran90 ??? Project History C++, Commercial Software • Jan 95: Thinking/writing/rewriting/??? • November 95: Public seminar, show Root 0.5 • Spring 96: decision to use CINT • Jan 97: Root version 1.0 • Jan 98: Root version 2.0 • Mar 99: Root version 2.21/08 (1st Intl Root workshop FNAL) • Feb 00: Root version 2.23/12 (2nd Intl Root workshop CERN) • Sep 00: Root version 2.25/03 • Dec 00: Root version 3.00/01 • Jun 01: 3rd International Root workshop at FNAL In 1994, fundamental divergence of opinions in Application Software group in IT. The PAW/Geant3 team is dismantled. C++, Open Source ROOT Evolution in Data Access & Analysis
Wrong Directions Solutions to Object Persistency should have been thought with data analysis in mind. At the contrary, too much emphasis was put on the use of OODBMS systems to store event raw data or results of the first pass reconstruction. A few experts were designing systems to create large data bases. Processing data in the data base was assumed to be an easy task. The official policy was to delegate this function to a set of commercial tools. An unfortunate confusion was made between a data analysis system and a 3-D interactive visualization system. As a result, too much emphasis was put on the pure visualization side of the problem and tools capable of making queries in large OODBMS were not developed. Evolution in Data Access & Analysis
The PAW Model When PAW was created in 1985, histograms were the lowest common denominator for data types. The HBOOK system had been used for many years and was a de-facto standard. The introduction of the so-called row-wise ntuples was immediately successful and called for one more step. Column-Wise Ntuples(CWN) were introduced in 1989 to optimize the access time to larger and larger files. CWNs were very similar to tables in relational data bases. An ad-hoc query mechanism was introduced and was improved multiple times to support more and more complex queries. The data types, however, were pretty simple, integers, floats and characters. The CWNs and histograms were created from data stored in sequential files with systems such as ZEBRA. Evolution in Data Access & Analysis
The User Interface Problem One of the main tools in PAW was the command interpreter KUIP. Many man years had been invested in this package. Designed initially as a simple command line interface, KUIP was further developed to include many features of a programming language with control statements, loops, local and global variables. We had substantially underestimated the effort to develop a coherent system. With a growing number of users, it was clear that all the features of a programming language were requested at the interpreter level. We had seen so many KUIP macros with hundred or even thousands of lines that we were convinced that a modern data analysis system requires all the power of a high level programming language. It was out of question to develop our own language. It was also out of question to use a scripting language different from the main programming language. Because most users were going to write large scripts to analyze large data sets, we were convinced that the scripts had to be written in the same programming language than the main language used in the other stages of data acquisition, simulation and reconstruction Evolution in Data Access & Analysis
A Scripting Language ? With today desktop machines, it takes between a few seconds and one minute to compile and dynamically link a realistic analysis script. With computers becoming faster and faster, one may hope that in a few years from now, dynamic compilation and linking will become affordable for an increasing number of tasks. Having the same language for the interpreted and the compiled codes will be a tremendous advantage. On the other hand, nobody will trust results produced by a pure interpreted language. An interpreted language is fundamental for tasks that must be executed rapidly, such as short scripts edited very frequently or all the tasks called via the graphical user interface Evolution in Data Access & Analysis
GUI Compiled scripts Interpreted scripts With faster computers we hope that this line can be extended downwards Commands Evolution in Data Access & Analysis
The CINT interpreter HEPVIS 2001
Rootcint Preprocessor UserClass1.h UserClass1.h UserCint.C C++ code to create the RTTI Interface for CINT interpreter Streamers UserClass1.h UserClass1.h UserClass1.h UserClass1.h rootcint Evolution in Data Access & Analysis
Any User class library with the RTTI info can be plugged into a ROOT executable and its functions called interactively Evolution in Data Access & Analysis
GUI and Basic Graphics HEPVIS 2001
Three User Interfaces • GUIwindows, buttons, menus • Root Command lineCINT (C++ interpreter) • Macros, applications, libraries (C++ compiler and interpreter) Evolution in Data Access & Analysis
Getting Started with the GUI The ROOT browser The Command line interface Evolution in Data Access & Analysis
Basic widgets Evolution in Data Access & Analysis
GUI User example Example of GUI based on ROOT tools Each element is clickable Evolution in Data Access & Analysis
Graphical user Interface Element Hydrogen has been selected, etc. Evolution in Data Access & Analysis
Example of ROOT-based GUI in a DAQ histogram presenter Evolution in Data Access & Analysis
Thread and GUI Example –Go4 in Action objects in display task lists with asynchronous information from analysis or daq Go4 Prototype objects in analysis task Click here to start analysis Root prompt Graphics canvas Evolution in Data Access & Analysis
The Graphics Event Loop • The ROOT event handler supports: • keyboard interrupts • system signals • X11, Xt, Xm events • Sockets interrupts • Special messages (shared memory, threads..) • Foreign systems (eg Inventor, X3d..) can easily be integrated in the ROOT loop. • ROOT will dispatch the foreign events using Timers. • Signals and Slots like in Qt • Qt and ROOT can work together Evolution in Data Access & Analysis
2-D Graphics • Basic primitives (lines, text, markers, box, polygones, fills) • Graphs, annotators (pave, pavetext, pavelabel) • LateX support (screen and PostScript) • Graphics Editor • Pad Graphics via abstract class TVirtualPad • Basic graphics via abstract class TVirtualX • (TGX11, TGWin32) • TObject::Draw/Paint • TObject::DistancetoPrimitive/ExecuteEvent Evolution in Data Access & Analysis
Object Oriented graphics with embedded picking and actions Evolution in Data Access & Analysis
Full LateXsupport on screen and postscript Formula or diagrams can be edited with the mouse TCurlyArc TCurlyLine TWavyLine and other building blocks for Feynmann diagrams Evolution in Data Access & Analysis
3-D Graphics • Basic primitives • TPolyLine3D, TPolyMarker3D, THelix, TMarker3DBox,TAxis3D • Geant primitives • Support for all Geant3 volumes + a few new volume types • TBRIK,TCONE,TCONS,TCTUB,TELTU,TGTRA,THYPE,TPARA,TPCON, TPGON,TSPHE,TTUBE,TTUBS,TTRAP,TTRD1,TTRD2,TXTRU • Rendering with: • TPad • X3D (very fast. Unix only. Good on networks) • OpenGL • OpenInventor (new addition in 3.01) Evolution in Data Access & Analysis
3-D views CMS CMS in a pad Babar ATLAS Evolution in Data Access & Analysis
ROOT + OpenInventor Open Inventor in ROOT TGFrames Evolution in Data Access & Analysis
ROOT + OpenInventor CMS with Geant3 converted via g2root to ROOT TNodes Evolution in Data Access & Analysis
ROOT and the WEB • An Apache Web-server plug-in module is being developed (more at FNAL workshop). • Provides interactive access to ROOT files, CINT macros and all the graphics. Web pages generated on the fly. • Interesting alternative to PHP using C++ as an embedded scripting system with full access to user classes dynamically Evolution in Data Access & Analysis
Apache plug-in TApache Click here to execute CINT script Click here to browse this ROOT file Evolution in Data Access & Analysis
Apache plug-in TApache TCanvas interface URLs of ROOT files Evolution in Data Access & Analysis
The Histogram Package HEPVIS 2001
The Histogram Classes • Structure Not in PAW/Hbook 1-Dim 2-Dim 3-Dim Evolution in Data Access & Analysis
Random Numbers and Histograms • TH1::FillRandom can be used to randomly fill an histogram using • the contents of an existing TF1 analytic function • another histogram (for all dimensions). • For example the following two statements create and fill an histogram 10000 times with a default gaussian distribution of mean 0 and sigma 1: • TH1F h1("h1","histo from a gaussian",100,-3,3); • h1.FillRandom("gaus",10000); • TH1::GetRandom can be used to return a random number distributed according the contents of an histogram. Evolution in Data Access & Analysis
Fitting Histograms with Minuit • Histograms (1-D,2-D,3-D and Profiles) can be fitted with a user specified function via TH1::Fit. Two Fitting algorithms are supported: Chisquare method and Log Likelihood • The user functions may be of the following types: • standard functions: gaus, landau, expo, poln • combination of standard functions; poln + gaus • A C++ interpreted function or a C++ precompiled function • An option is provided to compute the integral of the function bin by bin instead of simply compute the function value at the center of the bin. • When an histogram is fitted, the resulting function with its parameters is added to the list of functions of this histogram. If the histogram is made persistent, the list of associated functions is also persistent. • One can retrieve the function/fit parameters with calls such as: • Double_t chi2 = myfunc->GetChisquare(); • Double_t par0 = myfunc->GetParameter(0); //value of 1st parameter • Double_t err0 = myfunc->GetParError(0); //error on first parameter Evolution in Data Access & Analysis
Fitting Demo Look at • FittingDemo.C • Unnamed Macro • fitf.C • Named Macro Run FittingDemo.C More info on fitting: http://root.cern.ch/root/html/examples/fit1.C.html http://root.cern.ch/root/html/examples/myfit.C.html http://root.cern.ch/root/html/examples/backsig.C.html Evolution in Data Access & Analysis
Combining Functions y(E) = a1 +a2E + a3E2 + AP( / 2 )/( (E-)2 + (/2)2)background lorenzianPeak par[0] = a1 par[0] = AP par[1] = a2 par[1] = par[2] = a3 par[2] = fitFunction = background (x, par ) + lorenzianPeak (x, &par[3]) par[0] = a1 par[1] = a2 par[2] = a3 par[3] = Ap par[4] = par[5] = Functions with many parameters (> 200) can be minimized. No limitations like in Minuit Evolution in Data Access & Analysis
Drawing Histograms • When you call the Draw method of a histogram for the first time (TH1::Draw), it creates a THistPainter object and saves a pointer to painter as a data member of the histogram. • The THistPainter class specializes in the drawing of histograms. It is separate from the histogram so that one can have histograms without the graphics overhead, for example in a batch program. The choice to give each histogram have its own painter rather than a central singleton painter, allows two histograms to be drawn in two threads without overwriting the painter's values. • When a displayed histogram is filled again you do not have to call the Draw method again. The image is refreshed the next time the pad is updated. • The same histogram can be drawn with different graphics options in different pads. • When a displayed histogram is deleted, its image is automatically removed from the pad. Evolution in Data Access & Analysis
1-D drawing Options Any object in the canvas is clickable and editable Evolution in Data Access & Analysis
1-D drawing Options Simple text or LateX can be typed interactively Evolution in Data Access & Analysis
2-D drawing Options Many different views from the same object concurrently Evolution in Data Access & Analysis
2-D drawing Options Number and type of contours can be changed with the mouse. Each contour is an object Evolution in Data Access & Analysis
2-D drawing Options All these plots can be rotated with the mouse Evolution in Data Access & Analysis
2-D drawing Options Same output on the screen and with vector Postscript Evolution in Data Access & Analysis
ROOT Data Base Approach HEPVIS 2001
Data Bases: model-1 • Put everything in an Object Data base • like Objectivity • Choice of RD45 project • Many experiments initially following this line • Abandonned by most experiments recently • Interesting experience with Babar • Solution not suited for PAW-like analysis • Very likely, the same mistake will be repeated with Oracle 9i Evolution in Data Access & Analysis
Data Bases: model-2 • Put write-once data in an object store • like ROOT in Streamer mode • Use a RDBMS for : • Run/Event catalogs • Geometry, calibrations • eg with ROOT<->Oracle interface • http://www.phenix.bnl.gov/WWW/publish/onuchin/rooObjy/ • or with ROOT <-> Objectivity interface • http://www.phenix.bnl.gov/WWW/publish/onuchin/RDBC/ • Use ROOT split/no-split mode for phys analysis Combining 2 technologies ROOT Oracle Evolution in Data Access & Analysis
ROOT + RDBMS Model ROOT files Oracle MySQL Calibrations Event Store histograms Run/File Catalog Trees Geometries Evolution in Data Access & Analysis
Experience with Root I/O • Many experiments are using Root I/O today or planning to use it in the near future: • RHIC (started last summer) • STAR 100 TB/year + MySQL • PHENIX 100 TB/year + Objy • PHOBOS 50 TB/year + Oracle • BRAHMS 30 TB/year • JLAB (starting this year) • Hall A,B,C, CLASS >100 TB/year • FNAL (starting this year) • CDF 200 TB/year + Oracle • MINOS Evolution in Data Access & Analysis
Experience with Root I/O • DESY • H1 moving from BOS to Root for DSTs and microDSTs 30 TB/year DSTs + Oracle • HERA-b extensive use of Root + RDBMS • HERMES moving to Root • TESLA Test beam facility have decided for Root, expect many TB/year • GSI • HADES Root everywhere + Oracle • PISA • VIRGO > 100 TB/year in 2002 (under discussion) Evolution in Data Access & Analysis
Experience with Root I/O • SLAC • BABAR >5 TB microDSTs, discussing upgrades + Objy • CERN • NA49 > 1 TB microDSTs + MySQL • ALICE MDC1 7 TB, MDC2 23 TB + MySQL • ALICE MDC3 83 TB in 10 days 120 MB/s (DAQ->CASTOR) • NA6i starting • AMS Root + Oracle • ATLAS, CMS test beams • ATLAS,LHCb, Opera evaluating Root I/O against Objy • + several thousand people using Root like PAW Evolution in Data Access & Analysis
ROOT working with Objectivity Objy and ROOT can work together An interactive interface developed by Phenix Evolution in Data Access & Analysis