
Navigation Requirements

This presentation sets out the navigation requirements of CMS data management: the ability to locate a persistent object even if it has been relocated or exists in several locations. The requirements cover scattering and gathering of data, apply to event data, calibrations, conditions, geometry and the metadata themselves, and demand full event navigability at every stage of analysis. The talk proposes an object identifier (OID) scheme and sketches how data product IDs could be resolved.

Presentation Transcript


  1. Navigation Requirements: CMS View of OIDs and Refs
     Vincenzo Innocente, Lassi Tuura
     CMS, Persistency RTAG

  2. Driving Requirements
     • Ability to locate a persistent object even if it is relocated or multiply located
       • Scattering: write in one file, relocate to many
       • Gathering: collect interesting objects in one file
     • The solution should, if possible, address all kinds of data
       • Event data
       • Calibrations, conditions, geometry
       • Metadata themselves

  3. Scenarios
     • Production (keep it simple)
       • Write all data from a single process into a single file
       • Later, split the data according to a clustering strategy
     • Access (transfer as little as possible)
       • Select and reprocess events from a large sample
       • Get a local persistent copy of just what is needed
       • Ensure full event navigability at every stage of analysis
     • Typical query
       • Give me the closest (in practice, the fastest to reach) collection of tracks compatible with this configuration, belonging to the events that satisfy these criteria
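The production and access scenarios can be illustrated with a minimal sketch, assuming a toy in-memory model: `StoredProduct`, `split_by_cluster` and the `"<kind>.db"` file names are hypothetical. It writes everything to one "file", later splits it with a caller-supplied clustering strategy (here, by product kind), and then takes a local copy of only what a job needs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredProduct:
    """A data product as written by production (hypothetical model)."""
    event: int
    kind: str        # e.g. "TrackerDigis", "Tracks"
    payload: bytes


def split_by_cluster(products, cluster_of):
    """Redistribute products from one file into per-cluster output files.

    `cluster_of` maps a product to the name of its output file; which
    strategy to use (by detector, by access pattern, ...) is left open.
    """
    files = defaultdict(list)
    for product in products:
        files[cluster_of(product)].append(product)
    return dict(files)


# Production: everything written by one process goes into a single file.
single_file = [
    StoredProduct(1, "TrackerDigis", b"..."),
    StoredProduct(1, "Tracks", b"..."),
    StoredProduct(2, "TrackerDigis", b"..."),
    StoredProduct(2, "Tracks", b"..."),
]

# Later: cluster by product kind, so a job reading only "Tracks"
# does not have to transfer the digis.
by_kind = split_by_cluster(single_file, lambda p: f"{p.kind}.db")

# Access: a local copy of just what is needed (tracks of event 2).
local_copy = [p for p in by_kind["Tracks.db"] if p.event == 2]
```

Passing the strategy in as a function keeps the production step ignorant of how the data will later be clustered, matching the "write once, split later" idea.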

  4. Event Model
     • (Event) Data Product (100-1000 per event)
       • Chunk of (event) data managed as a single unit
         • Collection of Digis belonging to a part of a detector
         • Collection of RecObjs (tracks, calo-clusters, jets) produced by a given algorithm
       • Currently identified by (its Objy OID and federation)
         • Event
         • ASCII string
       • "Metadata" describing how it was produced
         • Its "transient" type
         • Physical location
       • Inter-data-product dependencies tracked
         • Consistency ensured
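The event model above can be sketched as a small data structure. This is a minimal illustration, not the CMS implementation: it assumes a data product is keyed by (event, ASCII name), carries its transient type and physical location as metadata, and lists the keys of the products it was derived from. The hypothetical `EventStore` enforces consistency by refusing a product whose inputs are absent, and can walk the dependency graph.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical record for one event data product."""
    event: str                    # event identifier
    name: str                     # ASCII string naming the product
    transient_type: str           # metadata: its "transient" type
    location: str                 # metadata: physical location
    depends_on: list = field(default_factory=list)  # keys of input products

    @property
    def key(self):
        return (self.event, self.name)


class EventStore:
    """Keeps data products consistent: inputs must exist before outputs."""

    def __init__(self):
        self._products = {}

    def add(self, product):
        missing = [k for k in product.depends_on if k not in self._products]
        if missing:
            raise ValueError(f"{product.key} depends on missing {missing}")
        self._products[product.key] = product

    def provenance(self, key):
        """Keys of every product that `key` was (transitively) derived from."""
        seen, stack = [], list(self._products[key].depends_on)
        while stack:
            k = stack.pop()
            if k not in seen:
                seen.append(k)
                stack.extend(self._products[k].depends_on)
        return seen


store = EventStore()
digis = DataProduct("run1/evt7", "TrackerDigis", "vector<Digi>", "a.db")
tracks = DataProduct("run1/evt7", "Tracks", "vector<Track>", "b.db",
                     depends_on=[digis.key])
store.add(digis)
store.add(tracks)
```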

  5. Production 2002, Complexity

  6. HEP Data
     [Diagram labels: User Tag (N-tuple); Tracker Alignment; Ecal calibration; Tracks; Electrons; Event; Event Collection; Collection Meta-Data]
     • Event-Collection Meta-Data
     • Environmental data
     • Detector and Accelerator status
     • Calibrations, Alignments (luminosity, selection criteria, …)
     • …
     • Event Data, User Data
     Navigation is essential for an effective physics analysis.
     Complexity requires coherent access mechanisms.

  7. Re-Reconstruction & Clones
     [Diagram labels: Production; User; Local Replica; Run and Config.; DigiEvent; RecEvent; Id-1; Id-2; (Partial) Re-reconstruction; Tracker; Ecal; Hcal]

  8. CMS Reconstructed Objects
     Reconstructed Objects produced by a given “algorithm” are managed by a Reconstructor.
     A Reconstructed Object (Track) is split into several independent persistent objects so that they can be clustered according to their access patterns (physics analysis, reconstruction, detailed detector studies, etc.). The top-level object acts as a proxy.
     Intermediate reconstructed objects (RHits) are cached by value in the final objects.
     It is possible to recalibrate the “aod” part (and generate a new “version”) without modifying or copying the “esd” and “rec” parts.
     [Diagram labels: RecEvent; S-Track Reconstructor; S Track; Track SecInfo; Track Constituents; Vector of RHits; “esd”; “rec”; “aod”; calibration dependent; CPU intensive]
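The proxy-and-parts layout described on this slide can be sketched as follows, under simplifying assumptions: `PartStore` is a hypothetical write-once dictionary standing in for persistent storage, and the recalibration is a stand-in that just rescales a `pt` field. The point it shows is that the top-level `TrackProxy` only holds references, so a new "aod" version can be written while the "rec" and "esd" parts are shared, not copied.

```python
class PartStore:
    """Toy persistent store: objects are written once and addressed by key."""

    def __init__(self):
        self.objects = {}

    def write(self, key, value):
        if key in self.objects:
            raise ValueError(f"{key} already written")  # write once
        self.objects[key] = value
        return key

    def read(self, key):
        return self.objects[key]


class TrackProxy:
    """Top-level track object: holds only references to its parts."""

    def __init__(self, store, name, refs):
        self.store = store
        self.name = name
        self.refs = dict(refs)  # part name ("aod", "rec", "esd") -> store key

    def get(self, part):
        # Only the requested part is read: an analysis using "aod"
        # never touches the "rec" or "esd" objects.
        return self.store.read(self.refs[part])

    def recalibrate(self, version, scale):
        """Write a new "aod" version; "rec" and "esd" are shared, not copied."""
        aod = dict(self.get("aod"))
        aod["pt"] = aod["pt"] * scale  # stand-in for a real recalibration
        new_key = self.store.write((self.name, "aod", version), aod)
        return TrackProxy(self.store, self.name, {**self.refs, "aod": new_key})


store = PartStore()
v0 = TrackProxy(store, "track-1", {
    "aod": store.write(("track-1", "aod", 0), {"pt": 10.0}),
    "rec": store.write(("track-1", "rec", 0), {"rhits": [(1.0, 2.0), (1.5, 2.5)]}),
    "esd": store.write(("track-1", "esd", 0), {"sec_info": "..."}),
})
v1 = v0.recalibrate(version=1, scale=1.5)
```

Both versions stay readable: `v0` still sees the original calibration, and the store grows by one object rather than three.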

  9. Raw Event
     RawData are identified by the corresponding ReadOut.
     RawData belonging to different “detectors” are clustered into different containers. The granularity will be adjusted to optimize I/O performance.
     An index at RawEvent level avoids having to access every container when searching for a given RawData.
     A range index at RawData level could be used for fast random access in complex detectors.
     Index implemented as an ordered vector of pairs.
     [Diagram labels: RawEvent; Index; ReadOut, ReadOut, ...; RawData, RawData; Vector of Digi]
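An index "implemented as an ordered vector of pairs" can be searched by bisection. The sketch below assumes, for illustration only, that the RawEvent index maps an integer ReadOut id to the name of the container holding its RawData, and that a RawData range index is a channel-sorted vector of `(channel, value)` digis; the class names are hypothetical. It uses Python's standard `bisect` module.

```python
from bisect import bisect_left


class RawEventIndex:
    """RawEvent-level index: ordered vector of (readout_id, container) pairs.

    Finding one RawData touches only the container the index points to,
    instead of searching every detector container.
    """

    def __init__(self, pairs):
        self._pairs = sorted(pairs, key=lambda p: p[0])  # ordered by readout id
        self._keys = [readout for readout, _ in self._pairs]

    def container_of(self, readout_id):
        i = bisect_left(self._keys, readout_id)  # binary search, O(log n)
        if i == len(self._keys) or self._keys[i] != readout_id:
            raise KeyError(readout_id)
        return self._pairs[i][1]


class RawDataRangeIndex:
    """RawData-level range index over (channel, value) digis."""

    def __init__(self, digis):
        self._digis = sorted(digis, key=lambda d: d[0])  # ordered by channel
        self._channels = [channel for channel, _ in self._digis]

    def in_range(self, lo, hi):
        """Digis with lo <= channel < hi, without scanning the whole vector."""
        start = bisect_left(self._channels, lo)
        stop = bisect_left(self._channels, hi)
        return self._digis[start:stop]


index = RawEventIndex([(120, "Ecal"), (7, "Tracker"), (300, "Hcal")])
ecal = RawDataRangeIndex([(5, 0.3), (1, 1.2), (9, 0.7), (3, 2.1)])
```

A sorted vector is compact and cheap to write once, which suits write-once raw data better than a pointer-based tree would.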

  10. An OID Proposal
     • We propose an object identifier composed of three fields:
       • Navigation-scope (Sea??) identifier
         • Always implicit: explicit use is limited to cross-references among disjoint stores (for instance, from an event to a calibration)
         • Nothing prevents using a context for a dataset or even an event
         • The concrete implementation of the Sea is a file catalog
       • Data Product Id (dp-id)
         • Unique and immutable identifier (within a given sea) of a data product
         • To simplify lookup in case of scattering/gathering, we suggest it include a field identifying the logical file (lf-id)
           • When writing, one can easily stream all logical files into the same physical file
           • Within a given sea, a physical file can map to multiple logical files
           • In small seas (lakes), such as a local replica of selected events, even an m-to-n mapping could be affordable
       • Object index
         • Used to identify single objects within the data product
         • If the data product is WORM (write once, read many), indexing will work whatever data structure is used below the data product
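The three-field OID can be sketched as plain value types plus a catalog, assuming integer lf-ids and a simple dictionary from lf-id to physical file name; `DataProductId`, `Oid` and `Sea` here are hypothetical names, and this sketch does not follow cross-sea references (it only rejects them). It shows why embedding the lf-id helps: relocating a logical file changes one catalog entry, while every stored OID stays valid.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataProductId:
    lf_id: int      # logical-file id, part of the dp-id to ease scatter/gather
    local_id: int   # rest of the dp-id: unique within the logical file


@dataclass(frozen=True)
class Oid:
    sea: Optional[str]   # navigation scope; None = implicit (current sea)
    dp_id: DataProductId
    index: int           # object index within the data product


class Sea:
    """Navigation scope whose concrete implementation is a file catalog."""

    def __init__(self, name):
        self.name = name
        self.catalog = {}  # lf_id -> physical file

    def physical_file(self, oid):
        if oid.sea not in (None, self.name):
            # A real system would hand off to the other sea; not modelled here.
            raise LookupError(f"cross reference to sea {oid.sea!r}")
        return self.catalog[oid.dp_id.lf_id]


production = Sea("production")
# Writing: stream all logical files into the same physical file.
production.catalog.update({1: "run42.db", 2: "run42.db"})
track = Oid(None, DataProductId(lf_id=2, local_id=17), index=3)
before = production.physical_file(track)

# Scattering: relocate logical file 2; only the catalog entry changes,
# the OID stored in other objects stays valid.
production.catalog[2] = "tracks.db"
after = production.physical_file(track)
```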

  11. Data Product Id Resolution
     A possible implementation: the Sea is responsible for mapping an lf-id to a given strategy for resolving a data-product id.
     • The same dp-id can be resolved differently depending on which sea we are navigating
     • Physical resolution strategy
       • The lf-id identifies a file; the rest of the dp-id is a physical location (Objy-like)
     • Local mapping
       • The lf-id identifies a file (not necessarily the original one); the rest of the dp-id is used as a key into a table contained in the file itself
     • Global mapping
       • The lf-id identifies a table; the rest of the dp-id is used to look up the physical location of the data product in that table
     • …
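The three resolution strategies can be expressed as interchangeable objects selected per lf-id by the sea. This is a minimal sketch, not the proposal's implementation: it assumes a dp-id is an `(lf_id, rest)` pair and that a physical location is a `(file, position)` tuple; all class names and file names are hypothetical. Resolving one dp-id in three different seas yields three different locations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProductId:
    lf_id: str   # logical-file id
    rest: int    # remainder of the dp-id


class PhysicalStrategy:
    """lf-id identifies a file; the rest of the dp-id is the physical location."""

    def __init__(self, file):
        self.file = file

    def resolve(self, dp_id):
        return (self.file, dp_id.rest)


class LocalMappingStrategy:
    """lf-id identifies a file (maybe a copy); a table inside it maps the rest."""

    def __init__(self, file, table_in_file):
        self.file, self.table = file, table_in_file

    def resolve(self, dp_id):
        return (self.file, self.table[dp_id.rest])


class GlobalMappingStrategy:
    """lf-id identifies a table giving each data product's physical location."""

    def __init__(self, table):
        self.table = table

    def resolve(self, dp_id):
        return self.table[dp_id.rest]


class Sea:
    """Maps each lf-id to the strategy used to resolve dp-ids in this sea."""

    def __init__(self, strategies):
        self.strategies = strategies

    def resolve(self, dp_id):
        return self.strategies[dp_id.lf_id].resolve(dp_id)


dp = DataProductId("lf-7", 42)

production = Sea({"lf-7": PhysicalStrategy("prod/run7.db")})
replica = Sea({"lf-7": LocalMappingStrategy("local/sel.db", {42: 3})})
gathered = Sea({"lf-7": GlobalMappingStrategy({42: ("lake/a.db", 0)})})
```

Keeping the strategy per lf-id rather than per sea lets one sea mix, for example, untouched production files resolved physically with gathered files resolved through a table.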
