Persistent User Data using Objectivity
The missing Milestone
Vincenzo Innocente, CERN/EP/CMC
Introduction
• Last RD45 milestone was about private persistent data and classes
• Although a model was developed and prescriptions were provided, there was no evidence that it would work in a HEP-experiment production environment
• In CMS, following and extending the RD45 model, we have developed procedures which allow any physicist
  • to develop and test private persistent classes
  • to manage their own private persistent objects
User Collections
HEP Data
• Environmental data
  • Detector and Accelerator status
  • Calibrations, Alignments
• Event-Collection Meta-Data (luminosity, selection criteria, …)
• …
• Event Data, User Data
Navigation is essential for an effective physics analysis
Complexity requires coherent access mechanisms
[Diagram labels: User Tag (N-tuple), Tracker Alignment, Ecal calibration, Tracks, Event Collection, Collection Meta-Data, Electrons, Event]
Requirements
• Software Development:
  • Physics reconstruction developers should be able to develop, test and integrate persistent classes without interfering with other developments (same as for transient classes)
  • “End Users” should be able to develop and use private persistent classes
• Data:
  • Physicists (“End Users”) should be able to access any kind of data without interfering with its production
  • Physicists should be able to populate private databases, using and referencing “common objects”, without interfering with production activities
• Environment:
  • Development and running environment should be the same for system (experiment-wide) and user data
  • Access mechanisms should be the same for system and user data
Technical Solutions
• FD-Shallow-copy: a “federation shallow-copy” is a local copy of .boot and .FDDB, ooinstalled -nocatalog, with all original database files made read-only
• Development:
  • A few named schemas (5 or so) are used to avoid interference and ease integration
  • Development and tests are performed against the fd-shallow-copy
  • Schema is exchanged using ooschemadump/upgrade
  • Standard scripts (today making use of SCRAM, tomorrow integrated into SCRAM) are provided to parse DDL
  • A rich middleware of C++ classes, often templated, is provided to reduce (to zero?) the Objectivity-specific code physicists need to know
  • In particular, a user development environment is provided to develop “concrete-Tags” of simple structure
Technical Solutions
• Object shallow-copy: local copy with (one-way) references to constituents
• Object deep-copy: local copy with local copy of constituents
• Data:
  • Users always start with a local federation-shallow-copy
  • Events are never modified in place: reconstruction always generates a new event collection and a new event-data structure with a shallow copy of the parent event
  • Users can produce a deep copy of (part of) the event for a selected sample and generate a “user collection”
  • Concrete Tags (user private persistent objects) can be added to a user collection
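The difference between the two kinds of object copy can be illustrated with a minimal in-memory sketch. It does not use Objectivity; `Track`, `Event`, `shallow_copy` and `deep_copy` are hypothetical names chosen for illustration, and Python object identity stands in for a persistent one-way reference.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Track:
    # Hypothetical constituent object.
    pt: float


@dataclass
class Event:
    # Hypothetical event holding references to its constituents.
    event_id: int
    tracks: list = field(default_factory=list)


def shallow_copy(event, new_id):
    # New event object whose list refers to the SAME constituent objects.
    return Event(new_id, list(event.tracks))


def deep_copy(event, new_id):
    # New event object with its own local copies of the constituents.
    return Event(new_id, copy.deepcopy(event.tracks))


parent = Event(1, [Track(12.5), Track(40.0)])
by_ref = shallow_copy(parent, 2)  # parent is left untouched
by_val = deep_copy(parent, 3)     # parent is left untouched
```

In this sketch `by_ref.tracks[0] is parent.tracks[0]` holds, while `by_val.tracks[0]` is an equal but distinct object, which is the property that lets a deep-copied sample live apart from the original data.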
Navigation
• Top Level:
  • User sees and navigates a Unix-like tree structure through a C++ or Python API (Shell)
  • Implementation is by Objy naming (root is a database system name) or any other object-containment mechanism mapped to a Unix-like tree by the “Shell”
• Collections:
  • We use a fully hierarchical composite collection system with metadata associated with each component
  • It allows sequential and random access with full support for fast user selection on MetaData
  • It can be used to organize any kind of objects that need indexing but are updated only slowly
• Event:
  • Navigation in the event structure and from the event to the configuration is implemented using one-way references (pure ooRefs)
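A composite collection with per-component metadata and path-style navigation might look roughly like the sketch below. This is an illustration only, not the CMS or Objectivity API: the `Collection` class, its `lookup` and `iterate` methods, and the `luminosity` metadata key are all assumptions made for the example.

```python
class Collection:
    """Hypothetical composite collection: items plus named sub-collections."""

    def __init__(self, name, metadata=None, items=None):
        self.name = name
        self.metadata = dict(metadata or {})
        self.items = list(items or [])
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

    def lookup(self, path):
        # Resolve a Unix-like relative path such as "run2" or "a/b".
        node = self
        for part in filter(None, path.split("/")):
            for child in node.children:
                if child.name == part:
                    node = child
                    break
            else:
                raise KeyError(path)
        return node

    def iterate(self, select=None):
        # Depth-first sequential access; a component whose metadata fails
        # `select` is pruned together with its whole subtree.
        if select is not None and not select(self.metadata):
            return
        yield from self.items
        for child in self.children:
            yield from child.iterate(select)


root = Collection("jets")
root.add(Collection("run1", {"luminosity": 0.5}, ["ev1", "ev2"]))
root.add(Collection("run2", {"luminosity": 2.0}, ["ev3", "ev4"]))

# Components without a luminosity entry (here, the root) are kept.
high_lumi = lambda md: md.get("luminosity", float("inf")) > 1.0
selected = list(root.iterate(high_lumi))
```

Because the selection is evaluated on component metadata, a rejected component is skipped without touching any of its items, which is one plausible reading of "fast user selection on MetaData".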
[Diagram labels: Owner Name, DataSet Name, Dataset Collection, MetaData & “User Tag”, “Run” Collection, Rec Event]
User Collection “By Reference”
• User Collections are populated by User Filters
• Multiple User Filters (each populating a different User Collection) are allowed in a single ORCA job
[Diagram labels: MetaData & “User Tag”, DB Name (physical location), Context Name, Collection Name, “Run” Collection, Original RecEvent]
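The idea of several filters each filling its own by-reference collection in one pass can be sketched as follows. The function `run_filters`, the dictionary-based events and the filter names are hypothetical, and plain Python references stand in for persistent references to the original RecEvent.

```python
def run_filters(events, filters):
    """Single pass over the input; each named filter fills its own collection."""
    collections = {name: [] for name in filters}
    for event in events:
        for name, accept in filters.items():
            if accept(event):
                # Store a reference to the original event, not a copy.
                collections[name].append(event)
    return collections


events = [
    {"id": 1, "n_muons": 2, "n_electrons": 0},
    {"id": 2, "n_muons": 0, "n_electrons": 1},
    {"id": 3, "n_muons": 1, "n_electrons": 1},
]

user_collections = run_filters(
    events,
    {
        "dimuon": lambda ev: ev["n_muons"] >= 2,
        "electron": lambda ev: ev["n_electrons"] >= 1,
    },
)
```

An event accepted by more than one filter appears in several collections, but every entry still points at the same original object.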
RecApplication I/O
• Output Run is a new event collection containing new “data” (digis & RecObjs) and a reference to or replica of the input data
• Output User Collections are unmodified sub-samples of the input collection
[Diagram labels: Federation, Dataset Collection or User Collection, Histograms & Tags, Create/extend User Collections, Append new Run to a Dataset, Store, RecReader, Request]
Top Level Event Structure (COBRA5)
[Diagram labels: Run, EvId, RecEvent, DigiEvent, SimEvent, Crossing, Trigger, Pile-up]
Raw Event
• RawData are identified by the corresponding ReadOut.
• RawData belonging to different “detectors” are clustered into different containers. The granularity will be adjusted to optimize I/O performance.
• An index at RawEvent level is used to avoid accessing all containers in search of a given RawData.
• A range index at RawData level could be used for fast random access in complex detectors.
• Index implemented as an ordered vector of pairs
[Diagram labels: RawEvent, Index, ReadOut, RawData, Vector of Digi]
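An index kept as an ordered vector of pairs can be searched by bisection instead of scanning every container. The sketch below assumes integer ReadOut identifiers and uses strings as stand-ins for the location of the RawData; the `ReadoutIndex` class and its `find` method are hypothetical names, not part of the CMS software.

```python
from bisect import bisect_left


class ReadoutIndex:
    """Hypothetical index: (readout_id, location) pairs kept sorted by id."""

    def __init__(self, pairs):
        self._pairs = sorted(pairs, key=lambda pair: pair[0])
        self._keys = [key for key, _ in self._pairs]

    def find(self, readout_id):
        # Binary search on the ordered keys: O(log n) per lookup.
        i = bisect_left(self._keys, readout_id)
        if i < len(self._keys) and self._keys[i] == readout_id:
            return self._pairs[i][1]
        return None


index = ReadoutIndex([(30, "ecal"), (10, "tracker"), (20, "muon")])
location = index.find(20)
missing = index.find(25)
```

A sorted vector is compact and cheap to read, at the price of slower insertion, which fits data that are written once and read many times.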
CMS Reconstructed Objects
• Reconstructed Objects produced by a given “algorithm” are managed by a Reconstructor.
• A Reconstructed Object (Track) is split into several independent persistent objects to allow their clustering according to their access patterns (physics analysis, reconstruction, detailed detector studies, etc.). The top level object acts as a proxy.
• Intermediate reconstructed objects (RHits) are cached by value into the final objects.
[Diagram labels: RecEvent, S-Track Reconstructor, S Track, Track SecInfo, Track Constituents, Vector of RHits, “esd”, “rec”, “aod”]
Re-Reconstruction & Clones
[Diagram labels: Run, Id-0, Id-1, Id-2, RecEvent, DigiEvent, SimEvent, Local Replica, Crossing, Trigger, Pile-up]
Collection “By Value”
• New RecEvent with new or cloned Digis & RecObjs
[Diagram labels: MetaData & “User Tag”, New Owner Name, DataSet Name, Run Collection]
Physical clustering