Navigation and References / DataHeader
Less than I hoped to have
Peter Van Gemmeren (Argonne National Laboratory (US))
References
• POOL uses a string-able Token as the reference to persistent objects.
• A Token contains:
  • [DB=<uuid_string>]
  • [CNT=<string = “Tree[(Branch)]”>]
  • [CLID=<uuid>]
  • [TECH=00000000]
  • [OID=00000000 = “container id”-00000000 = “entry number”]
• Internally, POOL/APR uses a short Token:
  • oid1: an index into the ##Links table.
  • oid2: an index into the Container.
• When POOL follows an internal reference, it uses oid1 to fetch the Token string from the ##Links table and takes oid2 as the entry index within the Container.
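As a rough illustration of the two forms described on this slide, the sketch below splits a bracketed Token string into its fields and resolves a short Token through a stand-in for the ##Links table. The function names, the Python list used as the table, and the assumption that each table entry already holds a ready-made Token string are illustrative only; they are not POOL's API or its on-disk layout.

```python
import re

# Matches one "[NAME=value]" field of an externalized Token string.
_FIELD = re.compile(r"\[(\w+)=([^\]]*)\]")


def parse_token(text):
    """Return a dict mapping field name (DB, CNT, ...) to its raw string value."""
    return dict(_FIELD.findall(text))


def resolve_short(links, oid1, oid2):
    """Resolve a short Token: oid1 selects the string in the links table,
    oid2 is passed through as the entry index into the container."""
    return links[oid1], oid2


# Hypothetical stand-in for the ##Links table: one Token string per entry.
links = [
    "[DB=5235CC95-1886-DF11-9711-001E4F3E5C33]"
    "[CNT=CollectionTree(EventInfo_p3/McEventInfo)]"
    "[CLID=3E240CA8-5124-405B-9059-FAFC4C5954C6]"
    "[TECH=00000202]",
]

link_string, entry = resolve_short(links, 0, 18)
fields = parse_token(link_string)
```

Here `fields["CNT"]` holds the tree and branch name, and `entry` is the index that would be used to read the object from that container.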
External
• POOL Tokens can easily be externalized as strings:
  • [DB=5235CC95-1886-DF11-9711-001E4F3E5C33][CNT=CollectionTree(EventInfo_p3/McEventInfo)][CLID=3E240CA8-5124-405B-9059-FAFC4C5954C6][TECH=00000202][OID=00000003-00000012]
  • This increases the size from 2 x 4 Bytes to over 100 Bytes.
• Athena never really used the ##Links table:
  • POOL hides it.
  • File MetaData came much later than the use of references.
• Use Athena PoolTokenAddress to store Token struct values directly (without string translation):
  • Optimization for reading objects from POOL (where one needs a Token struct).
  • Derived from GenericAddress:
    • Plus a POOL::Token struct and some functions to allow on-demand string generation.
  • One could almost use GenericAddress (tech, clid, 3 strings, 2 longs) directly, but:
    • Currently, one string is taken for the StoreGate key.
    • One would have to store the uuids (file and class id) as strings.
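The size argument and the idea of keeping Token values in binary form, with the string generated only on demand, can be sketched as follows. `TokenValues` is a hypothetical class written for this illustration, not PoolTokenAddress or POOL::Token, and the upper-case, zero-padded hexadecimal formatting of TECH and OID is an assumption inferred from the example string on the slide.

```python
import struct
import uuid
from dataclasses import dataclass


@dataclass
class TokenValues:
    """Hypothetical container for Token values kept in binary form."""
    db: uuid.UUID      # file id
    cnt: str           # container name, "Tree(Branch)"
    clid: uuid.UUID    # class id
    tech: int          # technology code
    oid1: int          # first OID field
    oid2: int          # second OID field (entry number)

    def to_string(self):
        """Build the bracketed string only when it is requested.
        Upper-case hex, zero-padded to 8 digits, is an assumption."""
        return (
            f"[DB={str(self.db).upper()}]"
            f"[CNT={self.cnt}]"
            f"[CLID={str(self.clid).upper()}]"
            f"[TECH={self.tech:08X}]"
            f"[OID={self.oid1:08X}-{self.oid2:08X}]"
        )


tok = TokenValues(
    db=uuid.UUID("5235CC95-1886-DF11-9711-001E4F3E5C33"),
    cnt="CollectionTree(EventInfo_p3/McEventInfo)",
    clid=uuid.UUID("3E240CA8-5124-405B-9059-FAFC4C5954C6"),
    tech=0x202,
    oid1=0x3,
    oid2=0x12,
)

short_form = struct.pack("<II", tok.oid1, tok.oid2)  # 2 x 4 bytes
long_form = tok.to_string()                          # well over 100 characters
```

Comparing `len(short_form)` with `len(long_form)` reproduces the jump from 8 bytes to more than 100 that the slide mentions.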
Relocation
• Objects cannot easily be relocated; typically the DB uuid and oid2 need to be updated or redirected.
• This can be done in POOL using the ##Sections table:
  • [CNT=CollectionTree(EventInfo_p3/McEventInfo)][OFF=0][START=0][LEN=20]
  • Stores the entry range of the section and the offset into the ##Links table.
• POOL relocation in ATLAS is mainly used for fast/hybrid merge.
  • But there may be potential for more.
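One plausible reading of the ##Sections entry is sketched below: after a merge, each input file contributes one section, and a short Token written in the original file is redirected by adding the section's offsets. The `Section` class, the `redirect` function, and the interpretation of OFF as a shift of oid1 and START as a shift of oid2 are assumptions made for illustration, and the second section's values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """One row of a ##Sections-like table (field meanings are assumed)."""
    cnt: str     # container the section belongs to
    off: int     # offset added to oid1 (index into the merged ##Links table)
    start: int   # first entry of this section in the merged container
    length: int  # number of entries in this section


def redirect(section, oid1, oid2):
    """Map a short Token written in the original file to the merged file."""
    if not 0 <= oid2 < section.length:
        raise IndexError("entry lies outside this section")
    return oid1 + section.off, oid2 + section.start


cnt = "CollectionTree(EventInfo_p3/McEventInfo)"
first = Section(cnt, off=0, start=0, length=20)    # values from the slide
second = Section(cnt, off=4, start=20, length=15)  # made-up second input file
```

With these values, a reference from the first input file is unchanged by `redirect`, while one from the second file is shifted past the first file's 20 entries.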
DataHeader
• ATLAS software does not require overall event data organization in the transient store.
• Writing an event means persistifying a list of data objects and their StoreGate state.
• A DataHeader serves as the generic entry point to event data:
  • As each data object is written, its persistent location and StoreGate state are recorded in a DataHeaderElement that is added to the DataHeader.
Class Diagrams for DataHeader and DataHeaderElement: The DataHeader and DataHeaderElement classes provide the layout to save the StoreGate state and the persistent address for all data objects.
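The bookkeeping described on this slide can be sketched with two small classes. The field and method names here are hypothetical and much simpler than the real Athena classes, and the keys, class ids and token strings are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class DataHeaderElement:
    """Hypothetical element: StoreGate state plus persistent address."""
    key: str     # StoreGate key
    clid: int    # StoreGate class id (placeholder values below)
    token: str   # persistent location of the object


@dataclass
class DataHeader:
    elements: list = field(default_factory=list)

    def record(self, key, clid, token):
        """Called as each data object is written."""
        self.elements.append(DataHeaderElement(key, clid, token))

    def find(self, key):
        """Return the token recorded for a key, or None if absent."""
        for element in self.elements:
            if element.key == key:
                return element.token
        return None


header = DataHeader()
header.record("McEventInfo", 1, "token-for-McEventInfo")
header.record("SomeCollection", 2, "token-for-SomeCollection")
```

A reader would then start from the header and call something like `header.find("McEventInfo")` to learn where the object lives, without needing any other organization of the event.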
POOL dependency of the DataHeader
• As discussed yesterday, DataHeader only depends on POOL for optimization.
  • Could be removed,
  • But when pulling POOL Guid, Token (and Placement) from APR to Athena, this dependency will be resolved even with the optimization in place.
DataHeader size / speed
• DataHeader stores an independent Token for all DataObjects and every event.
  • Very, very flexible, maybe too much so.
• Typically a large amount of duplication:
  • File guid, type, container, key and StoreGate state do not typically change event to event.
  • File guid and technology tend to be the same for all containers in the stream.
    • Except provenance.
• Some of the duplicated information is filtered out in persistency, by using DataHeaderForm, which is read only if needed.
  • Still written for each event.
• That, and having 300 – 500 collections, makes the DataHeader slow to read for jobs that only access a few DataObjects.
  • Those %#%%^# missing ET guys…
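To make the duplication concrete, the sketch below factors the parts of each element that repeat from event to event into a shared table and keeps only an index and the entry number per event. This illustrates the general idea only; it is not the actual DataHeaderForm layout, and the function name, tuple layout and sample values are invented for the example.

```python
def factor(events):
    """Split per-event element lists into a shared form and compact entries.

    events: list of events, each a list of (guid, container, key, entry).
    Returns (form, compact) where form holds each distinct
    (guid, container, key) once and compact holds (form_index, entry) pairs.
    """
    form = []
    index_of = {}
    compact = []
    for elements in events:
        row = []
        for guid, container, key, entry in elements:
            shared = (guid, container, key)
            if shared not in index_of:
                index_of[shared] = len(form)
                form.append(shared)
            row.append((index_of[shared], entry))
        compact.append(row)
    return form, compact


# Three made-up events with the same two collections; only the entry changes.
events = [
    [("FILE-GUID", "CollectionTree(A)", "KeyA", n),
     ("FILE-GUID", "CollectionTree(B)", "KeyB", n)]
    for n in range(3)
]
form, compact = factor(events)
```

The shared table stays at two rows however many events are added, while each event contributes only small integer pairs.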
Hierarchical / Satellite DataHeader
• Additional DataHeaders that contain elements for only a (small) subset of collections in the event:
  • Smaller
  • Faster
  • Incomplete
    • No provenance info, other than a pointer to the full DataHeader.
• Use the BackNavigation mechanism to allow retrieval of collections not included in the satellite.
  • Requires reading a full DataHeader
  • No more speed up
  • Should be rare
• Currently there is only one satellite:
  • Most ‘basic’, containing only EventInfo
  • Great speed up when e.g. creating simple TAGs.
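The trade-off between the satellite and the full DataHeader can be sketched as a lookup with a lazy fallback. `SatelliteHeader`, `token_for` and `load_full` are hypothetical names for this illustration and do not reflect the real BackNavigation interface; the keys and token strings are placeholders.

```python
class SatelliteHeader:
    """Hypothetical satellite: a few elements plus a way to reach the full header."""

    def __init__(self, elements, load_full):
        self._elements = dict(elements)  # key -> token for the subset
        self._load_full = load_full      # callable returning the full key -> token map
        self._full = None                # filled on first back navigation

    def token_for(self, key):
        if key in self._elements:
            return self._elements[key]   # fast path, no extra read
        if self._full is None:
            self._full = self._load_full()  # expensive: reads the full DataHeader
        return self._full[key]


reads = []  # records each time the full header is read


def load_full():
    reads.append("full")
    return {"EventInfo": "tok-EventInfo", "Jets": "tok-Jets"}


satellite = SatelliteHeader({"EventInfo": "tok-EventInfo"}, load_full)
```

Asking the satellite for `"EventInfo"` never triggers `load_full`, whereas asking for `"Jets"` reads the full header once, which is the case the slide expects to be rare.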