Future of Analysis Environments: Personal views
Rene Brun, CERN
[Overview diagram; labels in reading order]
Type of data? Any type? PAW-like ntuple?
No restrictions. Data. Restricted to histogramming & visualisation?
Analysis. Structure? What is modularity? Abstract interfaces? Languages? Parallelism?
Coherent Framework of Cooperating systems. I/O + UI. Object Bus. Packages.
Type of Data in the past
• Event data managed by data structure (bank) managers (ZEBRA, BOS, ...)
  • a bank is like an object
• Final physics data in ntuple format (PAW)
  • an ntuple is like a table in an RDBMS
• Run/file catalog with ad hoc tools (FATMEN)
• Calibrations, geometry, etc. with ad hoc tools (HEPDB)
Type of data: trends-1
• Put everything in an object database
  • like Objectivity
  • Choice of the RD45 project
• Many experiments initially followed this line
• Abandoned by most experiments recently
• Interesting experience with BaBar
• Solution not suited for PAW-like analysis
Type of data: trends-2
• Put write-once data in an object store
  • like ROOT in Streamer mode
• Use an RDBMS for:
  • Run/Event catalogs
  • Geometry, calibrations
  • e.g. with ROOT <-> Oracle interface
    • http://www.phenix.bnl.gov/WWW/publish/onuchin/rooObjy/
  • or with ROOT <-> Objectivity interface
    • http://www.phenix.bnl.gov/WWW/publish/onuchin/RDBC/
• Use ROOT split/no-split mode for physics analysis
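The split/no-split distinction can be pictured with a small sketch that does not use ROOT at all. Every name in it (`Event`, `RowStore`, `SplitStore`, `sumPt`) is hypothetical and chosen only for illustration; this is not how ROOT implements its trees. In the no-split picture each event is stored as one unit, while in the split picture each data member goes to its own column, so a query on one member does not have to touch the others.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical event with three members; real events are far richer.
struct Event {
    int   run;
    float pt;
    float eta;
};

// "No-split" picture: whole events are stored one after another.
struct RowStore {
    std::vector<Event> events;
    void fill(const Event& e) { events.push_back(e); }
};

// "Split" picture: one column per data member.
struct SplitStore {
    std::vector<int>   run;
    std::vector<float> pt;
    std::vector<float> eta;
    void fill(const Event& e) {
        run.push_back(e.run);
        pt.push_back(e.pt);
        eta.push_back(e.eta);
    }
    std::size_t entries() const { return pt.size(); }
};

// Reads only the pt column; run and eta are never accessed.
inline double sumPt(const SplitStore& s) {
    double total = 0.0;
    for (float v : s.pt) total += v;
    return total;
}

// Has to walk over every complete event to reach its pt member.
inline double sumPt(const RowStore& s) {
    double total = 0.0;
    for (const Event& e : s.events) total += e.pt;
    return total;
}
```

Both overloads return the same sum; the difference is only in how much data has to be traversed, which is why a column-wise layout tends to suit PAW-like analysis on a few variables.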
Framework basic requirements
• Dynamic linking AND unlinking of user shared libs
• User can define new classes interactively
• Interpreted code can call compiled code
• Compiled code can call interpreted code
• Scripts can be dynamically compiled/linked
This is the normal operation mode
Interesting feature for GUIs & event displays
Script compiler: Root > .x file.C++
Fundamental features of an Object-Oriented Framework
[Diagram; labels: OO World, Procedural World, Persistency services, Data, DDL, RTTI, Functions, KUIP, CDF, User Interface, C++, ROOT, Java]
Automatic Code generation
[Diagram; labels: Algorithms, Meta information, Automatically generated code, 40 per cent in ROOT, Hand-written code]
Used by I/O, GUI, inspectors, browsers, interpreter, html, etc.
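How one set of meta information can serve several services at once can be sketched without ROOT. Everything named below (`MemberInfo`, `Track`, `trackDictionary`, `inspect`, `stream`, `unstream`) is hypothetical. The table is typed by hand here; it stands in for the part a generator would emit automatically, and it handles only `int` and `float` members.

```cpp
#include <cstddef>
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

// One entry of meta information: member name, byte offset, size and kind.
struct MemberInfo {
    std::string name;
    std::size_t offset;
    std::size_t size;
    bool        isFloat;   // this sketch knows only int and float
};

struct Track {
    int   id;
    float px;
    float py;
};

// Hand-written stand-in for an automatically generated dictionary.
inline const std::vector<MemberInfo>& trackDictionary() {
    static const std::vector<MemberInfo> dict = {
        {"id", offsetof(Track, id), sizeof(int),   false},
        {"px", offsetof(Track, px), sizeof(float), true},
        {"py", offsetof(Track, py), sizeof(float), true},
    };
    return dict;
}

// Inspector service: prints every member using only the dictionary.
inline std::string inspect(const void* obj, const std::vector<MemberInfo>& dict) {
    std::ostringstream out;
    const char* base = static_cast<const char*>(obj);
    for (const MemberInfo& m : dict) {
        out << m.name << "=";
        if (m.isFloat) { float v; std::memcpy(&v, base + m.offset, sizeof v); out << v; }
        else           { int   v; std::memcpy(&v, base + m.offset, sizeof v); out << v; }
        out << ";";
    }
    return out.str();
}

// I/O service: copies the members into a byte buffer in dictionary order.
inline std::vector<char> stream(const void* obj, const std::vector<MemberInfo>& dict) {
    std::vector<char> buf;
    const char* base = static_cast<const char*>(obj);
    for (const MemberInfo& m : dict)
        buf.insert(buf.end(), base + m.offset, base + m.offset + m.size);
    return buf;
}

// I/O service: restores the members from a buffer written by stream().
inline void unstream(const std::vector<char>& buf, void* obj,
                     const std::vector<MemberInfo>& dict) {
    char* base = static_cast<char*>(obj);
    std::size_t pos = 0;
    for (const MemberInfo& m : dict) {
        std::memcpy(base + m.offset, buf.data() + pos, m.size);
        pos += m.size;
    }
}
```

Neither `inspect` nor `stream` mentions `Track`: adding a new class only requires a new dictionary, which is the reason such code lends itself to automatic generation.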
Java - ROOT interface(s)
• Read ROOT files from a Java program
  • see Tony Johnson
  • will be simpler with new ROOT 2.26 supporting automatic schema evolution
• Call ROOT classes from a Java program
  • work by Subir Sarkar (hand-coded JNI interface)
  • could use JACO (see Tony Johnson)
  • or better use a variant of rootcint (rootjava)
• Generate ROOT-Java data classes
  • TTree::MakeJava like TTree::MakeClass
Java - ROOT interface(s)
import root.*;
TROOT troot = new TROOT("simple", "Simple Java to root interface");
TApplication app = new TApplication("ROOT Application");
System.out.println("TApplication .....");
TBenchmark bench = new TBenchmark();
bench.Start("Hsum");
TRandom random = new TRandom();
TH1F total = new TH1F("total", "total distribution", 100, -4.0F, 4.0F);
TH1F main = new TH1F("main", "Main contributor", 100, -4.0F, 4.0F);
TH1F s1 = new TH1F("s1", "first signal", 100, -4.0F, 4.0F);
TH1F s2 = new TH1F("s2", "second signal", 100, -4.0F, 4.0F);
total.Sumw2(); // this makes sure that the sum of squares of weights will be stored
total.SetMarkerStyle(21);
total.SetMarkerSize(0.7F);
main.SetFillColor(16);
s1.SetFillColor(42);
s2.SetFillColor(46);
TCanvas canvas = new TCanvas("c1", "The HSUM example", 200, 10, 600, 400);
canvas.SetGrid();
and so on.
Java - ROOT interface(s)
• It is important to cooperate to:
  • facilitate the Java/C++ integration
• Could be interesting for applications where performance is not an issue (event display)
• However, I do not believe in a solution where the bulk of data is stored as C++ objects and analyzed with a Java-based system.
  • It must be fun but very inefficient
  • what do you gain?
Languages for data analysis
• Data analysis requires an efficient access to objects (both data and functions).
• It requires a powerful programming language:
  • in interpreted mode
  • in compiled mode
• Transition from interpreted mode to compiled mode must be smooth and transparent.
• A scripting language is not the solution
  • Python is not a solution
[Diagram; labels: GUI, Compiled scripts, Interpreted scripts, Commands]
A role for commercial components?
• Databases
  • Oracle very likely, others NO
• Graphics/UI
  • NO
  • but YES for interfaces to commercial systems
• Special algorithms like fitting
  • strong doubts
• I strongly believe in the advantages of
  • Open Source systems
  • Large news/discussion groups
Our current work
• Continuous consolidation of the system
• Automatic schema evolution
• Common GUI between Unix and Windows
• Upgrade UI to new style GUI
• Tree query processor reimplemented using the new TSelector facility
• PROOF (Parallel ROOT Facility) (see next)
• Interface with other systems, e.g. G3, G4
• Support thousands of users
The OODBMS dreams
[Diagram; labels: Selection Parameters, CPU, Local, Remote, OODB, Federation, DB1 to DB6]
ROOT/PROOF and GRIDs
[Diagram; labels: Selection Parameters, Procedure, TagDB, RDB, PROOF, Local, Remote, Proc.C on each CPU, DB1 to DB6]
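The pattern suggested by the diagram labels (one procedure, `Proc.C`, applied next to each of several databases, with the results combined) can be sketched in plain C++. This is an assumption-laden illustration, not PROOF itself: `Hist`, `Partition`, `process` and `runAll` are hypothetical names, the partitions are in-memory vectors rather than remote files, and the loop runs sequentially where a real system would hand each partition to a separate CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fixed-range 1-D histogram; hypothetical and far simpler than a ROOT histogram.
struct Hist {
    double lo, hi;
    std::vector<long> bins;
    Hist(std::size_t n, double lo_, double hi_) : lo(lo_), hi(hi_), bins(n, 0L) {}
    void fill(double x) {
        if (x < lo || x >= hi) return;                    // ignore out-of-range values
        std::size_t i = static_cast<std::size_t>((x - lo) / (hi - lo) * bins.size());
        if (i >= bins.size()) i = bins.size() - 1;        // guard against rounding
        ++bins[i];
    }
    // Merge step: partial results are combined bin by bin.
    void add(const Hist& other) {
        assert(bins.size() == other.bins.size());
        for (std::size_t i = 0; i < bins.size(); ++i) bins[i] += other.bins[i];
    }
};

// Stand-in for the contents of one database (DB1, DB2, ...).
using Partition = std::vector<double>;

// Stand-in for Proc.C: the same selection and fill run on every partition.
inline Hist process(const Partition& data, double cut) {
    Hist h(4, 0.0, 4.0);
    for (double x : data)
        if (x > cut) h.fill(x);
    return h;
}

// Sequential here; each call to process() is independent of the others,
// so the calls could be distributed over workers without changing the result.
inline Hist runAll(const std::vector<Partition>& dbs, double cut) {
    Hist total(4, 0.0, 4.0);
    for (const Partition& db : dbs) total.add(process(db, cut));
    return total;
}
```

The point of the sketch is that only the selection parameters and the small partial histograms travel; the bulk data stays where it is stored.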
What is a modular system?
• Modularity is a nice word.
• Everybody claims to be modular.
• a system with many small and independent modules?
  • where is the object bus?
  • what is the cost of assembling all the pieces in a real application?
• a hierarchical system with easily replaceable components?
  • but with many internal dependencies
What is a modular system? (continued)
• a system with well defined interfaces?
  • where is the object bus?
  • passing data by reference or value? Collections/Folders?
• a system easy to understand (user view)?
  • end users like monolithic systems doing everything
• a system easy to maintain (developer view)?
• a system that can easily be integrated into other systems?
• a theoretical system and no implementation?
Modularity is difficult to achieve in a growing system.
Modularity and Dependencies in ROOT
By dependency, we mean binary dependency: one module (shared library) forces the loading of another library. In the past this was a weak point of the system. For example, if you wanted to produce some histograms in a batch program, you were required to link your application with all ROOT graphics libraries up to X11, as with PAW.
This problem was rightly pointed out by many users as something to be fixed, and we fixed it. In the current system only a small set of base libraries is needed when creating, e.g., histograms in batch mode.
Besides the decoupling of the graphics system, many more abstract layers were introduced to decouple other parts of the system: the histogram from its painter, the tree storage system from its query mechanism (treeplayer), fitting from Minuit, etc. Following this reorganization, none of the lower-level libraries depends any longer on higher-level libraries. Besides modularity, these changes also improved overall system performance.
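The kind of abstract layer described here, separating a histogram from its painter, can be sketched as follows. The class names (`VirtualPainter`, `Histogram`, `TextPainter`) and the registration mechanism are assumptions made for this illustration and are not ROOT's actual classes. The idea is that the base library sees only an abstract interface, and a graphics library, if it is ever loaded, installs a concrete painter at run time.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class Histogram;

// Abstract layer: the only thing the base library knows about drawing.
class VirtualPainter {
public:
    virtual ~VirtualPainter() = default;
    virtual std::string paint(const Histogram& h) const = 0;
};

// Lives in the base library; has no dependency on any graphics code.
class Histogram {
public:
    explicit Histogram(std::size_t nbins) : counts_(nbins, 0) {}
    void fill(std::size_t bin) { if (bin < counts_.size()) ++counts_[bin]; }
    const std::vector<int>& counts() const { return counts_; }

    // A graphics library would call this when it is loaded.
    static void setPainter(std::unique_ptr<VirtualPainter> p) { painter() = std::move(p); }

    // In batch mode no painter is installed and draw() returns an empty string.
    std::string draw() const {
        return painter() ? painter()->paint(*this) : std::string();
    }

private:
    static std::unique_ptr<VirtualPainter>& painter() {
        static std::unique_ptr<VirtualPainter> p;
        return p;
    }
    std::vector<int> counts_;
};

// Would live in a separate, optional graphics library.
class TextPainter : public VirtualPainter {
public:
    std::string paint(const Histogram& h) const override {
        std::string out;
        for (int c : h.counts())
            out += std::string(static_cast<std::size_t>(c), '*') + "\n";
        return out;
    }
};
```

A batch program that only fills histograms links against `Histogram` and `VirtualPainter`; `TextPainter` and anything it needs are pulled in only by programs that actually draw.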
ROOT Quality assurance
A growing users base
Summary
• We are implementing a powerful system designed for large scale data analysis with parallel architectures in a GRID context.
• The ROOT system is a framework providing a coherent object bus in DAQs, simulation, reconstruction and analysis phases.
• We have learnt a lot in the past 5 years, also following our 10 years of experience with PAW.
• Developing the system and at the same time supporting a rapidly growing users base is a demanding but also rewarding job.