A Uniform and Coherent Approach to Object Persistency
Vincenzo Innocente
HEP Data
[Figure labels: User Tag (N-tuple), Tracker Alignment, Ecal calibration, Tracks, Event Collection, Collection Data, Electrons, Event]
• Environmental data
• Detector and Accelerator status
• Calibrations, Alignments
• Event-Collection Data (luminosity, selection criteria, …)
• …
• Event Data, User Data
Navigation is essential for an effective physics analysis.
Complexity requires coherent access mechanisms.
Software Strategy
[Figure annotations: Later selected DAQ; Not in original design; Later more filters to DVNs and N-tuple]
CMS Experiment-Data Analysis
[Figure: data-flow diagram. Components: Quasi-online Reconstruction, Environmental data, Detector Control, Online Monitoring, Event Filter Object Formatter, Persistent Object Store Manager, Object Database Management System, Data Quality, Calibrations, Group Analysis, Simulation (G3 or G4), User Analysis on demand. Arrow labels: store, Request part of event, Store rec-Obj, Store rec-Obj and calibrations.]
Uniform approach
• Coherent data access model
  • same mechanisms, same language, same transaction model
• Save effort
  • A single team of experts
  • A single team of administrators
• Leverage experience
  • developers can easily move from one application to another (from event-data to calibration-data applications)
• Reuse design and code
  • Basic requirements are often the same
  • We can use the same code to manage event data, calibrations, “n-tuple”
• Main road in producing better and higher quality software
Reconstruction Sources
CMS Reconstruction Model
[Figure labels: Geometry, Conditions, Sim Hits, Raw Data, Detector Element, Event, Digis, Rec Hits, Algorithm, Rec Objs]
Raw Event
RawData are identified by the corresponding ReadOut.
RawData belonging to different “detectors” are clustered into different containers. The granularity will be adjusted to optimize I/O performance.
An index at RawEvent level is used to avoid accessing all containers in search of a given RawData. A range index at RawData level could be used for fast random access in complex detectors.
The index is implemented as an ordered vector of pairs.
[Figure labels: RawEvent, Index, ReadOut, RawData, Vector of Digi]
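An index kept as an ordered vector of pairs can be searched by bisection, so a lookup never has to open a container that does not hold the requested RawData. The following is a minimal sketch of that idea, not the CARF implementation: the names `ReadOutId`, `ContainerSlot` and `RawEventIndex` are hypothetical, and it assumes that each ReadOut appears at most once in the index.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical identifiers: the slides do not specify these types.
using ReadOutId = std::uint32_t;     // identifies a ReadOut unit
using ContainerSlot = std::size_t;   // where the matching RawData is stored
using Entry = std::pair<ReadOutId, ContainerSlot>;

class RawEventIndex {
public:
    // Sort once at construction so that every later lookup can bisect.
    explicit RawEventIndex(std::vector<Entry> entries)
        : entries_(std::move(entries)) {
        std::sort(entries_.begin(), entries_.end());
        // Assumption of this sketch: one entry per ReadOut.
        assert(std::adjacent_find(entries_.begin(), entries_.end(),
                                  [](const Entry& a, const Entry& b) {
                                      return a.first == b.first;
                                  }) == entries_.end());
    }

    // Binary search by ReadOut id. Returns a pointer to the stored slot,
    // or nullptr when the ReadOut is not part of this event.
    const ContainerSlot* find(ReadOutId id) const {
        auto it = std::lower_bound(
            entries_.begin(), entries_.end(), id,
            [](const Entry& e, ReadOutId key) { return e.first < key; });
        if (it == entries_.end() || it->first != id) {
            return nullptr;
        }
        return &it->second;
    }

private:
    std::vector<Entry> entries_;
};
```

A sorted vector is compact and contiguous, which suits an index that is written once per event and then only read; a tree-based map would give the same lookup complexity at a higher storage cost.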
Reconstruction Object Model
All persistent objects are managed by CARF.
Physics Modules access them through standard C++ pointers.
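One way to picture access through pointer syntax is physics code written once against `->` and instantiated for either plain pointers or a database handle. The sketch below is only an illustration under stated assumptions: `Cluster`, `PersistentRef` and `totalEnergy` are hypothetical names, and the `std::map` merely stands in for an object store; none of this is the CARF or Objectivity/DB interface.

```cpp
#include <cassert>
#include <map>
#include <vector>

struct Cluster { double energy; };   // hypothetical reconstructed object

// Hypothetical stand-in for a database handle: it resolves an object id
// against a store when dereferenced, but looks like a pointer to callers.
class PersistentRef {
public:
    PersistentRef(const std::map<int, Cluster>& store, int id)
        : store_(&store), id_(id) {}

    const Cluster& operator*() const {
        std::map<int, Cluster>::const_iterator it = store_->find(id_);
        assert(it != store_->end());   // the id must exist in the store
        return it->second;
    }
    const Cluster* operator->() const { return &**this; }

private:
    const std::map<int, Cluster>* store_;
    int id_;
};

// Physics code: uses pointer syntax only, so it does not care whether
// the objects are transient or come from the store.
template <typename ClusterPtr>
double totalEnergy(const std::vector<ClusterPtr>& clusters) {
    double sum = 0.0;
    for (const ClusterPtr& c : clusters) {
        sum += c->energy;
    }
    return sum;
}
```

Calling `totalEnergy` with a `std::vector<const Cluster*>` or with a `std::vector<PersistentRef>` compiles the same function body for both cases.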
CMS Reconstructed Objects
Reconstructed Objects produced by a given “algorithm” are managed by a Reconstructor.
A Reconstructed Object (Track) is split into several independent persistent objects to allow their clustering according to their access patterns (physics analysis, reconstruction, detailed detector studies, etc.). The top-level object acts as a proxy.
Intermediate reconstructed objects (RHits) are cached by value in the final objects.
[Figure labels: RecEvent, S-Track Reconstructor, “esd”, “rec”, “aod”, Track SecInfo, Track Constituents, Vector of RHits, S Track]
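The proxy idea can be sketched as a top-level object that carries a small summary and fetches the bulkier part only when it is first requested. This is an illustrative sketch under stated assumptions, not the CARF class design: `TrackSummary`, `TrackConstituents`, `Track` and the `Loader` callback are hypothetical names, and the callback stands in for a read from the object store.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// All names are hypothetical; the slides do not show the real classes.
struct RHit { double x, y, z; };

struct TrackSummary {          // small part, read by most analyses
    double pt;
    double eta;
};

struct TrackConstituents {     // bulky part, hits cached by value
    std::vector<RHit> hits;
};

// Top-level proxy: always holds the summary, loads the constituents lazily.
class Track {
public:
    using Loader = std::function<std::shared_ptr<const TrackConstituents>()>;

    Track(TrackSummary summary, Loader loadConstituents)
        : summary_(summary), load_(std::move(loadConstituents)) {
        assert(load_ && "a loader is required");
    }

    const TrackSummary& summary() const { return summary_; }

    // The first call triggers the (simulated) read; later calls reuse it.
    const TrackConstituents& constituents() const {
        if (!constituents_) {
            constituents_ = load_();
            assert(constituents_ && "loader returned no object");
        }
        return *constituents_;
    }

    bool constituentsLoaded() const { return constituents_ != nullptr; }

private:
    TrackSummary summary_;
    Loader load_;
    mutable std::shared_ptr<const TrackConstituents> constituents_;
};
```

An analysis that only reads `summary()` never pays for the hits, which is the point of clustering the parts separately according to their access patterns.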
CARF2000 Event Structure
CMS Event Structure
In case of re-reconstruction the original structure is kept. Event objects are cloned and new collections are created.
[Figure labels: Persistent, Transient, Event Collection, Run, RawEvent, RecEvent]
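Re-reconstruction that keeps the original structure can be pictured as building a second collection of cloned event objects while the first collection stays untouched. The sketch below makes two assumptions that the slide does not state: a clone keeps linking to the same raw data, and each reconstruction pass is identified by a version string. `Event`, `EventCollection` and `reReconstruct` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for the persistent classes named on the slide.
struct RawEvent { int id; };
struct RecEvent { std::string algoVersion; };

struct Event {
    std::shared_ptr<const RawEvent> raw;   // assumed shared between clones
    std::shared_ptr<const RecEvent> rec;   // one per reconstruction pass
};

using EventCollection = std::vector<Event>;

// Builds a new collection of cloned events; the input is not modified.
EventCollection reReconstruct(const EventCollection& original,
                              const std::string& newVersion) {
    EventCollection result;
    result.reserve(original.size());
    for (const Event& ev : original) {
        assert(ev.raw && "every event needs raw data");
        Event clone = ev;   // copies the links, not the raw data
        clone.rec = std::make_shared<RecEvent>(RecEvent{newVersion});
        result.push_back(clone);
    }
    return result;
}
```

Because the original collection is only read, analyses that were using it can continue unchanged while the new collection is being produced.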
Physical clustering
CMS needs a real DBMS
• An experiment lasting 20 years cannot rely just on ASCII files and file systems for its production bookkeeping, “condition” database, etc.
• Even today at LEP, the management of all real and simulated data-sets (from raw-data to n-tuples) is a major enterprise
  • Multiple models used (DST, N-tuple, HEPDB, FATMAN, ASCII)
• A DBMS is the modern answer to such a problem
• An ODBMS provides a coherent and scalable solution for managing all kinds of data
  • seamless integration with OO languages
  • internal navigation capability
CMS Experience
• CMS has used Objectivity/DB for the current prototype activity in close contact with IT in the context of the RD45 project
• Database Developers (just OO and C++):
  • Designing and implementing persistent classes is not harder than for native C++ classes.
• Physics Software Developers (do not see Objectivity):
  • Persistent objects are accessed using standard C++
  • The same code can access either persistent or transient objects
• Framework (easy to manage DB):
  • Flexible and transparent distinction between logical associations and physical clustering.
  • Fully transparent I/O with performance essentially limited by the disk speed (random access).
CMS Experience
• Administration (essentially file management):
  • Very flexible file-level management (localization, archival, replication) using AMS features
  • Several tools available to monitor activities and performance
  • File size overhead (5% for realistic CMS object sizes) not larger than for other “products”
• Physicists (easy to use):
  • Personal Databases are invaluable and in common use
  • Analysis performance and flexibility improved by shallow (link) and deep (data) local copies of a selected event sample
  • Physicists use the same type of event catalog as production
  • Framework and CMS tools hide all details
• All our tests show that Objectivity/DB can satisfy CMS requirements in terms of performance, scalability and flexibility for all kinds of data
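The difference between a shallow (link) copy and a deep (data) copy of a selected sample can be sketched with shared links to event data. This is a minimal illustration, not the CMS tooling: `EventData`, `EventSample`, `Selection`, `shallowCopy` and `deepCopy` are hypothetical names, and the in-memory vector stands in for a production event catalog.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

struct EventData {             // hypothetical event payload
    int run;
    double missingEt;
};

using EventLink = std::shared_ptr<const EventData>;
using EventSample = std::vector<EventLink>;
using Selection = std::function<bool(const EventData&)>;

// Shallow copy: the personal sample only links to the production events.
EventSample shallowCopy(const EventSample& production, const Selection& keep) {
    EventSample out;
    for (const EventLink& link : production) {
        assert(link && "null link in the production sample");
        if (keep(*link)) {
            out.push_back(link);
        }
    }
    return out;
}

// Deep copy: the data of each selected event are duplicated locally.
EventSample deepCopy(const EventSample& production, const Selection& keep) {
    EventSample out;
    for (const EventLink& link : production) {
        assert(link && "null link in the production sample");
        if (keep(*link)) {
            out.push_back(std::make_shared<EventData>(*link));
        }
    }
    return out;
}
```

Both functions return the same sample type, which mirrors the point that a personal sample uses the same kind of event catalog as production; only the ownership of the underlying data differs.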
Alternatives: other ODBMS
• Versant is a viable commercial alternative to Objectivity
  • do we have time to build an effective partnership (e.g. MSS interface)?
• Espresso (by IT/DB) should be able to produce a fully fledged ODBMS in a couple of years once the proof-of-concept prototype is ready
• Migrate CARF from Objectivity to another ODBMS
  • We expect that it would take about one year
  • Will not affect the basic principles of CMS software architecture and data model
  • Will involve only the core CARF development team
  • Will not disrupt production and physics analysis
Alternatives: ORDBMS
• ORDBMS (Relational DB with OO interface) are appearing on the market. Up to now they have looked targeted at those who already have a relational system and wish to make a transition to OO
• A new ORACLE product has all the appearances of a fully fledged ODBMS
• IT/DB is in the process of evaluating this new product as an event store. If it looks promising, CMS will join this evaluation next year.
• We will consider the impact of ORDBMS on the CMS Data Model and on the migration effort before the end of 2001
Fallback Solution: Hybrid Models
• We believe that this solution could seriously compromise our ability to perform our physics program competitively
• (R)DBMS for Event Catalog, Calibration, etc.
• Object-Stream files for event data
• Ad-hoc networked data-server and MSS interface
• Less flexible
  • Rigid split between DBMS and event data
  • One-way navigation from DBMS to event data
• More complex
  • Two different I/O systems
  • More effort to learn
  • More resources for developing and maintaining our application software
• This approach will be used by several experiments at BNL and FermiLab (RDBMS not directly accessible from user applications)
  • CMS is closely following these experiences.
Conclusion
• CMS has chosen to follow a uniform and coherent approach for the development of Experiment-Data Analysis Software
• Today a Functional Prototype exists and includes
  • A modular Object Oriented Framework
  • A Service and Utility Toolkit
  • A Persistent Object Service based on Objectivity/DB
  • Specialized applications for DAQ, Simulation, Reconstruction and Visualization
  • A set of plug-in modules for detector and physics simulation, reconstruction and analysis
• CMS is currently reviewing the present architecture, the software design and the technical choices to prepare for the next software development cycle