
Navigation and References / DataHeader

Less than I hoped to have. Peter Van Gemmeren (Argonne National Laboratory (US)).


Presentation Transcript


  1. Navigation and References / DataHeader: Less than I hoped to have. Peter Van Gemmeren (Argonne National Laboratory (US))

  2. References
     • POOL uses a string-able Token as a reference to persistent objects.
     • A Token contains:
        • [DB=<uuid_string>]
        • [CNT=<string = “Tree[(Branch)]”>]
        • [CLID=<uuid>]
        • [TECH=00000000]
        • [OID=<00000000 = “container id”>-<00000000 = “entry number”>]
     • Internally, POOL/APR uses a short Token:
        • oid1: an index into the ##Links table.
        • oid2: an index into the container.
     • When POOL follows an internal reference, it uses the first index (oid1) to get the string part of the Token from the ##Links table and the second index (oid2) to locate the entry in the container.
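The bracketed layout above is regular enough to split with a regular expression. The following is a minimal sketch of that idea, not POOL's own parser: the function name `parse_token` is hypothetical, and reading the two OID halves as hexadecimal is an assumption.

```python
import re

# One bracketed field: [KEY=value]. The keys are the ones listed on the slide.
_FIELD = re.compile(r"\[([A-Z]+)=([^\]]*)\]")


def parse_token(text):
    """Split a stringified Token into a dict of its bracketed fields.

    Hypothetical helper for illustration only; it is not part of POOL/APR.
    """
    fields = dict(_FIELD.findall(text))
    # OID is "<container id>-<entry number>"; hexadecimal is an assumption.
    oid1, oid2 = fields["OID"].split("-")
    fields["OID"] = (int(oid1, 16), int(oid2, 16))
    return fields
```

Called on a full Token string, `parse_token` returns the DB, CNT, CLID and TECH values as strings and the OID as a pair of integers.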

  3. External
     • POOL Tokens can easily be externalized as a string:
        • [DB=5235CC95-1886-DF11-9711-001E4F3E5C33][CNT=CollectionTree(EventInfo_p3/McEventInfo)][CLID=3E240CA8-5124-405B-9059-FAFC4C5954C6][TECH=00000202][OID=00000003-00000012]
        • This increases the size from 2 x 4 bytes to over 100 bytes.
     • Athena never really used the ##Links table:
        • POOL hides it.
        • File metadata came much later than the use of references.
     • Athena uses PoolTokenAddress to store the Token struct values directly (without string translation):
        • An optimization for reading objects from POOL (where one needs a Token struct).
        • Derived from GenericAddress, plus a POOL::Token struct and some functions to allow on-demand string generation.
     • One could almost use GenericAddress (tech, clid, 3 strings, 2 longs) directly, but:
        • Currently, one string is taken for the StoreGate key.
        • One would have to store the uuids (file and class id) as strings.
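The size difference between the short and the externalized form can be made concrete. This is a toy sketch under stated assumptions: the `LINKS` dict stands in for the ##Links table, and using the same number as both the ##Links index and the printed container id is an assumption made only so the example reproduces the string on the slide. `pack_short` and `externalize` are hypothetical names.

```python
import struct

# Toy stand-in for the ##Links table: the string part shared by all
# objects of one container. Keyed by oid1 (assumption, see lead-in).
LINKS = {
    3: "[DB=5235CC95-1886-DF11-9711-001E4F3E5C33]"
       "[CNT=CollectionTree(EventInfo_p3/McEventInfo)]"
       "[CLID=3E240CA8-5124-405B-9059-FAFC4C5954C6]"
       "[TECH=00000202]",
}


def pack_short(oid1, oid2):
    """Short form: two unsigned 32-bit integers, 2 x 4 bytes."""
    return struct.pack("<II", oid1, oid2)


def externalize(links, oid1, oid2):
    """Expand a short (oid1, oid2) pair into the full string form."""
    return f"{links[oid1]}[OID={oid1:08X}-{oid2:08X}]"


short = pack_short(3, 0x12)
full = externalize(LINKS, 3, 0x12)
print(len(short), len(full.encode("ascii")))
```

The short form always occupies 8 bytes, while the string form repeats the file uuid, container name, class id and technology for every reference.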

  4. Relocation
     • Objects cannot easily be relocated; typically the DB uuid and oid2 need to be updated or redirected.
     • This can be done in POOL using the ##Sections table:
        • [CNT=CollectionTree(EventInfo_p3/McEventInfo)][OFF=0][START=0][LEN=20]
        • Each row stores the entry range of the section and the offset into the ##Links table.
     • POOL relocation in ATLAS is mainly used for fast/hybrid merge.
     • But there may be potential for more.
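One plausible reading of a ##Sections row is sketched below. It assumes that START and LEN delimit the entries a merged input occupies in the output container and that OFF is the shift to apply to ##Links indices written before the merge; the slide does not spell out these semantics, so treat both as assumptions. The second section's values and the names `Section`, `find_section` and `redirect` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    container: str
    offset: int  # OFF: shift into the ##Links table (assumed meaning)
    start: int   # START: first entry of the section (assumed meaning)
    length: int  # LEN: number of entries in the section


def find_section(sections, container, entry):
    """Return the section of `container` whose entry range covers `entry`."""
    for section in sections:
        in_range = section.start <= entry < section.start + section.length
        if section.container == container and in_range:
            return section
    raise LookupError(f"no section of {container!r} covers entry {entry}")


def redirect(section, oid1, oid2):
    """Shift a pre-merge (oid1, oid2) reference into merged-file coordinates."""
    return oid1 + section.offset, oid2 + section.start


CNT = "CollectionTree(EventInfo_p3/McEventInfo)"
sections = [
    Section(CNT, offset=0, start=0, length=20),   # the row shown on the slide
    Section(CNT, offset=4, start=20, length=30),  # hypothetical second input
]
```

With these rows, a reference `(1, 5)` written in the second input would be redirected to `(5, 25)` in the merged file.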

  5. DataHeader
     • ATLAS software does not require an overall event data organization in the transient store.
     • Writing an event means persistifying a list of data objects and their StoreGate state.
     • A DataHeader serves as the generic entry point to event data:
        • As each data object is written, its persistent location and StoreGate state are recorded in a DataHeaderElement that is added to the DataHeader.
     • Class diagrams for DataHeader and DataHeaderElement: the DataHeader and DataHeaderElement classes provide the layout to save the StoreGate state and the persistent address for all data objects.
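The relationship between the two classes can be sketched as a header that accumulates one element per written object. This is an illustration only: the field names (`key`, `clid`, `token`) and the methods `record` and `find` are assumptions, not the actual Athena class layout.

```python
from dataclasses import dataclass, field


@dataclass
class DataHeaderElement:
    key: str    # StoreGate key of the object (assumed field)
    clid: str   # class id of the object (assumed field)
    token: str  # persistent location, here a stringified Token


@dataclass
class DataHeader:
    elements: list = field(default_factory=list)

    def record(self, key, clid, token):
        """Called as each object is written: remember where it went."""
        self.elements.append(DataHeaderElement(key, clid, token))

    def find(self, key):
        """Entry point for reading: look up the element for a key."""
        for element in self.elements:
            if element.key == key:
                return element
        raise KeyError(key)
```

Reading an event then starts from the DataHeader: a lookup by key yields the Token needed to fetch the object itself.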

  6. POOL dependency of the DataHeader
     • As discussed yesterday, the DataHeader only depends on POOL for optimization.
     • The dependency could be removed,
     • but once the POOL Guid, Token (and Placement) are pulled from APR into Athena, it will be resolved even with the optimization in place.

  7. DataHeader size / speed
     • The DataHeader stores an independent Token for every DataObject in every event.
        • Very, very flexible, maybe too much so.
     • Typically there is a large amount of duplication:
        • File guid, type, container, key and StoreGate state typically do not change from event to event.
        • File guid and technology tend to be the same for all containers in the stream.
           • Except provenance.
     • Some of the duplicated information is filtered out in persistency by using the DataHeaderForm, which is read only if needed.
        • The DataHeader is still written for each event.
     • That, together with having 300 to 500 collections, makes the DataHeader slow to read for jobs that access only a few DataObjects.
        • Those %#%%^# missing ET guys…
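The deduplication idea can be sketched by splitting each element into a part that repeats from event to event and a part that does not. This is a toy model of the principle, not the DataHeaderForm implementation: the tuple layout and the names `split_event` and `expand_event` are assumptions.

```python
def split_event(elements, form):
    """Factor the repeating part of each element out into `form`.

    elements: list of (key, prefix, entry) tuples, where `prefix` holds the
              file guid, container, class id and technology as one string.
    form:     shared list of (key, prefix) pairs, extended in place.
    Returns the compact per-event record: a list of (form_index, entry).
    """
    compact = []
    for key, prefix, entry in elements:
        shared = (key, prefix)
        if shared not in form:
            form.append(shared)
        compact.append((form.index(shared), entry))
    return compact


def expand_event(compact, form):
    """Rebuild the full elements from a compact record and the form."""
    return [(*form[index], entry) for index, entry in compact]


form = []
event0 = [("McEventInfo", "[DB=A][CNT=Tree(Info)]", 0), ("Jets", "[DB=A][CNT=Tree(Jets)]", 0)]
event1 = [("McEventInfo", "[DB=A][CNT=Tree(Info)]", 1), ("Jets", "[DB=A][CNT=Tree(Jets)]", 1)]
compact0 = split_event(event0, form)
compact1 = split_event(event1, form)
```

After both events, `form` still holds only two entries, and each event contributes just a small index and an entry number per object. The per-event record still has one slot per collection, which is why hundreds of collections remain costly to read.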

  8. Hierarchical / Satellite DataHeader
     • An additional DataHeader that only contains elements for a (small) subset of the collections in the event:
        • Smaller
        • Faster
        • Incomplete
        • No provenance info, other than a pointer to the full DataHeader.
     • Use the BackNavigation mechanism to allow retrieval of collections not included in the satellite:
        • Requires reading the full DataHeader.
        • No more speed-up.
        • Should be rare.
     • Currently there is only one satellite:
        • The most ‘basic’ one, containing only EventInfo.
        • Great speed-up when, e.g., creating simple TAGs.
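The fallback from the satellite to the full DataHeader can be sketched as a lookup that only loads the full header on a miss. This is a minimal illustration of the principle, not the Athena BackNavigation code: the class `SatelliteHeader`, its `load_full` callback and the `full_reads` counter are all hypothetical.

```python
class SatelliteHeader:
    """Small header holding Tokens for a subset of collections."""

    def __init__(self, tokens, load_full):
        self._tokens = dict(tokens)   # key -> Token string
        self._load_full = load_full   # callable returning the full key -> Token map
        self._full = None             # full header, read lazily
        self.full_reads = 0           # how often the expensive read happened

    def token_for(self, key):
        # Fast path: the collection is listed in the satellite.
        if key in self._tokens:
            return self._tokens[key]
        # Slow path: navigate back to the full DataHeader (read at most once).
        if self._full is None:
            self._full = self._load_full()
            self.full_reads += 1
        return self._full[key]


full_header = {"EventInfo": "[OID=00000003-00000012]", "Jets": "[OID=00000004-00000012]"}
satellite = SatelliteHeader({"EventInfo": full_header["EventInfo"]}, lambda: full_header)
```

A job that only asks for `EventInfo` never triggers the full read; asking for `Jets` does, at which point the speed advantage is lost for that event.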
