Hybrid Event Store ATLAS Software Week Database Session David Adams BNL March 7, 2002
Contents • What does hybrid mean? • Files and their contents • File • Event data object (EDO) • Object ID • Event • Placement category (PC) • File, PC and EDO associations • File interface • Catalogs • Reading and writing • Input stream • Output stream • Store view • HES components • Tasks and schedule Hybrid Event Store SW week – DB session
What does “hybrid” mean? • Hybrid merges • Files that manage event data objects (EDO’s) and references to EDO’s with • Relational DB’s used to catalog the files and EDO’s. • Files are self-describing • The data in a file can be traversed without consulting any file catalog. • References between objects in files can be resolved without consulting file catalogs.
File • File type • HES supports files of different types (formats). • Each file type is responsible for providing its own means to write and read data. • File replica • A file replica contains the same data as the original file. • The replica may be a simple bitwise copy or • It may be of a type different from the original.
File (cont) • File ID and names • Each file carries • A unique ID • A unique logical name • A locally unique physical name • A file replica carries the ID and logical name of its original. • Normally any replica may be used in place of the original file.
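A minimal sketch of the file identity rules above, assuming a simple in-memory record; the names `FileRecord`, `make_replica` and `can_substitute` are hypothetical and not part of HES. A replica keeps the original's ID and logical name, gets its own physical name, and may use a different file type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical record of a HES file's identity (field names are illustrative)."""
    file_id: str        # unique ID; a replica carries the original's ID
    logical_name: str   # unique logical name; a replica carries the original's
    physical_name: str  # only locally unique; differs for every copy
    file_type: str      # storage format (file type)


def make_replica(original: FileRecord, physical_name: str,
                 file_type: Optional[str] = None) -> FileRecord:
    """Copy the identity of the original; the replica may use a different file type."""
    return FileRecord(original.file_id, original.logical_name, physical_name,
                      file_type or original.file_type)


def can_substitute(a: FileRecord, b: FileRecord) -> bool:
    """Normally any replica may be used in place of the original: compare IDs only."""
    return a.file_id == b.file_id
```

Comparing only the file ID reflects the rule that any replica may normally stand in for the original, whatever its physical location or format.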
Event data object (EDO) • What is an EDO? • Collection of data associated with a particular beam crossing (event ID) • Typically a homogeneous collection, e.g. tracks, jets or electrons • EDO’s in HES • HES doesn’t care what is in an EDO. • HES provides an interface for file types that • write transient EDO’s to files and • read them back in from files.
EDO (cont) • EDO ID • Each EDO is assigned a unique ID. • The ID specifies the: • ID of the file that owns the EDO • Event ID • EDO type (and version?) • String key • Any type-key may appear no more than once for any event ID in any file. • An EDO is retrieved from a file with its ID.
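The EDO ID composition and the "one type-key per event per file" rule can be sketched as follows. This assumes the ID is a plain tuple of the four listed fields; the version field the slide leaves open is omitted, and `EdoId` and `EdoIdAllocator` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdoId:
    """Hypothetical EDO ID; the version field queried on the slide is left out."""
    file_id: str   # file that owns the EDO
    event_id: int
    edo_type: str  # e.g. "TrackCollection"
    key: str       # string key

    @property
    def type_key(self):
        return (self.edo_type, self.key)


class EdoIdAllocator:
    """Hands out EDO IDs for one file, rejecting a repeated type-key within an event."""

    def __init__(self, file_id):
        self.file_id = file_id
        self._issued = set()  # (event_id, edo_type, key) already used in this file

    def assign(self, event_id, edo_type, key):
        slot = (event_id, edo_type, key)
        if slot in self._issued:
            raise ValueError(f"type-key {edo_type}/{key} already used for event {event_id}")
        self._issued.add(slot)
        return EdoId(self.file_id, event_id, edo_type, key)
```

The same type-key may recur for different event IDs, and the owning file ID inside the ID is what lets any reader locate the original later.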
EDO (cont) • EDO reference • A file holds an EDO by reference if it holds an ID for that EDO but does not hold the data. • The file owning an EDO must hold that EDO by value, not just by reference.
EDO (cont) • EDO replica • A file which does not own an EDO may hold a replica. • The replica has a copy of the EDO data and may be used in place of the original. • The replica carries the same ID as the original. • A reference to an EDO may be satisfied by the file owning the EDO or any file carrying a replica.
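Reference resolution can be sketched with an in-memory stand-in for open files, assuming each file is a dict from EDO ID to the data it holds by value (a file holding the EDO only by reference has no entry). The function name `resolve_edo` is hypothetical; trying the owner first is one possible policy, not one stated on the slide.

```python
def resolve_edo(edo_id, files):
    """Find the data for an EDO held by value in the owning file or in any replica.

    edo_id: tuple (owning_file_id, event_id, edo_type, key)
    files:  hypothetical in-memory stand-in, file_id -> {edo_id: data}; a file
            holding the EDO only by reference simply has no entry for it.
    Returns (file_id, data), or None if neither the original nor a replica is found.
    """
    owner = edo_id[0]
    candidates = [owner] + [fid for fid in files if fid != owner]
    for fid in candidates:
        held = files.get(fid, {})
        if edo_id in held:
            return fid, held[edo_id]
    return None
```

Because the replica carries the same ID as the original, the lookup key never changes; only the set of files searched does.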
Object ID • Requirement • EDO’s contain objects • Objects in one EDO need to reference those in another EDO • Pointer or reference in the transient world • Solution • HES defines an object ID: • ID of the EDO holding the referenced object plus • Index indicating the location of the referenced object in its EDO
Object ID (cont) • Size considerations • The EDO ID carries a lot of information and is fairly large (~200 bytes). • There may be very many object ID’s. • EDO’s in files may store a small EDO index in place of the EDO ID. • The index is only valid in the context of the EDO that holds it. • Probably 8 bits, allowing up to 256 referenced EDO’s. • Converted to the full EDO ID when the object is converted to transient form.
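The size optimization above amounts to a per-EDO lookup table. A minimal sketch, assuming a persistent object reference is stored as (local EDO index, object index) and the holding EDO keeps one list of full EDO IDs; `EdoRefTable` is a hypothetical name and the index width is not modeled.

```python
class EdoRefTable:
    """Per-EDO table that lets persistent object IDs use a small local EDO index.

    Hypothetical sketch: full object ID = (EDO ID, object index); the persistent
    form stores (local index, object index) plus one table of full EDO IDs per EDO.
    The index width is not modeled here.
    """

    def __init__(self):
        self.edo_ids = []   # position in this list is the local index
        self._local = {}    # EDO ID -> local index

    def to_persistent(self, edo_id, obj_index):
        """Replace the large EDO ID by a small index valid only within this EDO."""
        if edo_id not in self._local:
            self._local[edo_id] = len(self.edo_ids)
            self.edo_ids.append(edo_id)
        return (self._local[edo_id], obj_index)

    def to_transient(self, ref):
        """Expand a compact reference back to the full object ID."""
        local, obj_index = ref
        return (self.edo_ids[local], obj_index)
```

Each referenced EDO ID is stored once per EDO, so the cost of the large ID is paid per referenced EDO rather than per object reference.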
Event • Events in HES files • Each file holds data for a specified collection of event ID’s. • Each EDO in a file is associated with exactly one event ID. • The event ID plus type and key specify an EDO. • “Event holds an EDO” means the EDO is associated with the ID of that event. • There are no event objects.
Placement category • PC’s in events • Each EDO in a file (by value or reference) is associated with a named placement category (PC) in an event. • This is a hint to the file that EDO’s in the same PC are likely to be accessed together. • Files can share data at the level of a PC. • Each event in a file is associated with (“holds”) the same collection of PC names (types)
Placement Category (cont) • PC’s in events (cont) • Each PC holds a collection of EDO ID’s indexed by type-key. • The file may choose to organize EDO data by PC. • The union of these type-keys (or EDO ID’s) for all PC’s in an event constitutes the view of the event for that file. • Users may restrict this view to a subset of the PC’s.
Placement Category (cont) • PC type • Each PC is an instance of a PC type • The type defines • the PC name and • the allowed type-keys • (the type-keys in the PC will be a subset of these) • The file holds the definition of all types that appear in that file • Each event “holds” one PC of each type
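PC types, PC instances and the event view described on the last two slides can be sketched like this. It assumes a type-key is a (type, key) pair and an EDO ID is any hashable value; `PcType`, `PlacementCategory` and `event_view` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PcType:
    name: str
    allowed: frozenset  # allowed type-keys, each (edo_type, key)


@dataclass
class PlacementCategory:
    pc_type: PcType
    event_id: int
    edo_ids: dict = field(default_factory=dict)  # type-key -> EDO ID

    def add(self, type_key, edo_id):
        """Type-keys in a PC must be a subset of those its type allows."""
        if type_key not in self.pc_type.allowed:
            raise ValueError(f"{type_key} is not allowed in PC type {self.pc_type.name}")
        self.edo_ids[type_key] = edo_id


def event_view(pcs, accepted=None):
    """Union of the type-key -> EDO ID maps of one event's PCs.

    accepted: optional set of PC names restricting the view (None = all PCs).
    """
    view = {}
    for pc in pcs:
        if accepted is None or pc.pc_type.name in accepted:
            view.update(pc.edo_ids)
    return view
```

Restricting the view is just a filter on PC names, which is why a file can organize its storage by PC without affecting what the user sees.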
Placement Category (cont) • Sharing categories • The ATLAS DB architecture design distinguishes between “placement categories” and “sharing categories”. • We have merged the two into PC. • This was agreed to at an ANL meeting last October and no objections were raised there or subsequently. • We will go back and make this separation in HES if the need arises.
Placement Category (cont) • PC references • Any PC in an event in a file may be replaced with a PC reference. • The referenced PC has the same name and event ID as the PC reference. • The referenced PC must be held by value • The file holding the referenced PC must be accessed to construct the view of the event • (To know which type-keys are included.) • Reference may only be satisfied in the original file (no PC replicas).
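Following a PC reference can be sketched as below, assuming the reference records the ID of the file holding the PC by value and that files are modeled as dicts keyed by (event ID, PC name). `PcRef` and `resolve_pc` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcRef:
    """Hypothetical PC reference: names the file that holds the PC by value.

    The referenced PC has the same PC name and event ID as the reference itself.
    """
    target_file: str


def resolve_pc(files, file_id, event_id, pc_name):
    """Return the type-key -> EDO ID map of a PC, following a PC reference if present.

    files: hypothetical stand-in, file_id -> {(event_id, pc_name): dict or PcRef}.
    Only the original (target) file can satisfy a reference; there are no PC replicas,
    so a missing target file means the event view cannot be constructed (KeyError).
    """
    entry = files[file_id][(event_id, pc_name)]
    if isinstance(entry, PcRef):
        entry = files[entry.target_file][(event_id, pc_name)]
        if isinstance(entry, PcRef):
            raise ValueError("a referenced PC must be held by value")
    return entry
```

Since the referenced PC must be held by value, at most one hop is needed, and a chain of references is rejected.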
File, PC and EDO associations • The following figure illustrates some allowed associations between files, PC’s and EDO’s. • The first event in the first file holds all EDO’s by value. • The second file holds only references. • The first PC holds an EDO by reference. • The second PC is held by reference. • The EDO reference in the second event may be satisfied by the original EDO in the third file or its replica in the first.
File, PC and EDO associations (cont)
File interface • The following figure illustrates the file structure implied by the file interface. • Ovals on the right indicate data that can be obtained from the file on the left. • Labels on the lines indicate the key required to specify the data. • Blue indicates data which is not specific to an event. • Yellow indicates the collection of event ID’s. • The remaining data is associated with an event ID.
File interface
Catalogs • File location catalog • Also known as replica catalog • Enables the user to locate the physical file(s) corresponding to a logical file name • The table at right is a crude first pass • Expect this to be implemented in the GRID environment
Catalogs (cont) • File content catalog • Enables users to locate the logical file name from a file ID • Enables users to locate logical files based on stream type, event and production • Example at right
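The two file catalogs map naturally onto relational tables. The sketch below uses Python's built-in `sqlite3` as a stand-in for the catalog RDB; the table and column names are hypothetical, and storing an event range per file is a simplification, since a HES file may hold an arbitrary collection of event ID's.

```python
import sqlite3


def make_catalog_db():
    """In-memory stand-in for the file location and file content catalogs.

    Table and column names are hypothetical; the event range per file is a
    simplification (a HES file holds an arbitrary collection of event IDs).
    """
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE file_location (
            logical_name  TEXT NOT NULL,
            physical_name TEXT NOT NULL
        );
        CREATE TABLE file_content (
            file_id      TEXT PRIMARY KEY,
            logical_name TEXT NOT NULL UNIQUE,
            stream_type  TEXT NOT NULL,
            production   TEXT NOT NULL,
            first_event  INTEGER NOT NULL,
            last_event   INTEGER NOT NULL
        );
    """)
    return db


def physical_names(db, logical_name):
    """File location (replica) catalog: logical name -> physical file names."""
    rows = db.execute(
        "SELECT physical_name FROM file_location WHERE logical_name = ? "
        "ORDER BY physical_name", (logical_name,))
    return [r[0] for r in rows]


def logical_name_for_id(db, file_id):
    """File content catalog: file ID -> logical name (None if unknown)."""
    row = db.execute("SELECT logical_name FROM file_content WHERE file_id = ?",
                     (file_id,)).fetchone()
    return row[0] if row else None


def logical_names_for_event(db, stream_type, production, event_id):
    """File content catalog: stream type, production and event -> logical names."""
    rows = db.execute(
        "SELECT logical_name FROM file_content "
        "WHERE stream_type = ? AND production = ? "
        "AND ? BETWEEN first_event AND last_event ORDER BY logical_name",
        (stream_type, production, event_id))
    return [r[0] for r in rows]
```

A lookup typically chains the two: the content catalog yields a logical name, and the location catalog turns it into one or more physical replicas.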
Catalogs (cont) • Stream catalog • Specifies which placement category types are included in which stream types. • Example at right. • PC catalog • Specifies which type-keys are included in which PC types. • Example at right.
Catalogs (cont) • EDO catalog • Enables users to locate the file holding a particular EDO. • Unlikely to be created for all data, but would be used for subsets such as datasets. • Example at right. • The original EDO ID is relevant for regenerated data
Input stream • Collection of files to define events • All files have the same stream type • Stream type = set of PC types • Any event ID appears at most once • Placement categories • Specify which PC’s are accepted or omitted • Next event ID • Can be externally specified • The stream provides a means to generate it
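The input stream constraints can be checked when the stream is built. A minimal sketch, assuming each file is summarized as (file ID, stream type, event ID's); `InputStream` is a hypothetical class, and generating events in ascending order is an arbitrary choice for illustration.

```python
class InputStream:
    """Hypothetical sketch of an input stream built from file summaries.

    files: list of (file_id, stream_type, event_ids) tuples.
    """

    def __init__(self, stream_type, files):
        self.stream_type = stream_type
        self.file_of_event = {}  # event ID -> file providing its PCs
        for file_id, ftype, event_ids in files:
            if ftype != stream_type:
                raise ValueError(f"{file_id} has stream type {ftype}, expected {stream_type}")
            for ev in event_ids:
                if ev in self.file_of_event:
                    raise ValueError(f"event {ev} in both {self.file_of_event[ev]} and {file_id}")
                self.file_of_event[ev] = file_id
        self._pending = sorted(self.file_of_event)  # ascending order is an arbitrary choice

    def next_event_id(self):
        """Generate the next event ID, or return None when the stream is exhausted."""
        return self._pending.pop(0) if self._pending else None

    def file_for(self, event_id):
        """Look up an externally specified event ID; None if the stream lacks it."""
        return self.file_of_event.get(event_id)
```

Because each event ID appears at most once, the mapping from event ID to file is unambiguous whether the next ID comes from outside or from the stream itself.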
Input stream (cont) • Event • The input data for an event in a stream includes all EDO’s in accepted PC’s for the event ID • PC’s and PC references are taken from one file • PC and EDO references can be satisfied in a separate collection of “reference files” • The event cannot be defined (set of type-keys discovered) if any PC’s cannot be found • If neither an EDO nor any of its replicas is found, the event is defined but the data for that EDO is inaccessible
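The two failure modes on this slide differ in severity, which a sketch makes concrete. It assumes PC references have already been resolved (a PC that could not be found maps to `None`) and that found EDO data is available in a dict; `define_event` is a hypothetical name.

```python
def define_event(pcs, accepted, edo_data):
    """Build the input data for one event of one stream.

    pcs:      PC name -> (type-key -> EDO ID), already resolved through any PC
              references; a PC that could not be found maps to None
    accepted: names of the accepted PCs
    edo_data: EDO ID -> data, for EDOs found by value (original or replica)
    Returns type-key -> data, where None marks a defined but inaccessible EDO.
    """
    event = {}
    for name in sorted(accepted):
        pc = pcs.get(name)
        if pc is None:
            # Without the PC the set of type-keys is unknown.
            raise LookupError(f"PC {name} not found: event cannot be defined")
        for type_key, edo_id in pc.items():
            event[type_key] = edo_data.get(edo_id)
    return event
```

A missing PC aborts the whole event, while a missing EDO only leaves a hole in an otherwise well-defined event.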
Output stream • Type • Each output stream is of a named type which specifies the included PC types • Each event added to the stream will include one PC of each type • Each PC type specifies the allowed type-keys • User (see view) may choose whether or not to write an EDO of an allowed type-key
Output stream (cont) • Files • An output stream includes a series of files to which data is added for each accepted event • The stream has policies for • Deciding when a file is full and opening a new file for the stream • Providing ID’s and logical and physical names for these files
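One possible file policy is sketched below: a file is "full" after a fixed number of events. Both the policy and the naming scheme are assumptions for illustration, and `OutputStream` is a hypothetical class; HES leaves these choices to the stream.

```python
class OutputStream:
    """Hypothetical output stream with a 'file full after N events' policy.

    The naming scheme for file IDs, logical and physical names is illustrative only.
    """

    def __init__(self, stream_type, max_events_per_file, site_dir):
        self.stream_type = stream_type
        self.max_events = max_events_per_file
        self.site_dir = site_dir
        self.files = []  # one dict per file opened by this stream

    def _open_new_file(self):
        file_id = f"{self.stream_type}-{len(self.files) + 1:04d}"
        self.files.append({
            "id": file_id,
            "logical_name": f"lfn:{file_id}",
            "physical_name": f"{self.site_dir}/{file_id}.hes",
            "events": [],
        })

    def add_event(self, event_id):
        """Append an accepted event, opening a new file if the current one is full."""
        if not self.files or len(self.files[-1]["events"]) >= self.max_events:
            self._open_new_file()
        self.files[-1]["events"].append(event_id)
        return self.files[-1]["id"]
```

A size-based or time-based policy would only change the test inside `add_event`.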
Store view • Contents • One or more input streams • Collection of files to be used for chasing PC and EDO references • One or more output streams • Event selection • The view can assign the ID for the next event • By iterating over a user-defined list or • Asking one of its streams to make this event
Store view (cont) • Reading the event • Data extracted using the same event ID for all input streams • The input event is defined as the union of the input events in each stream • No type-key may be duplicated
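Merging the per-stream input events is a union that rejects duplicate type-keys. A minimal sketch, assuming each stream's event is already a dict from type-key to data; `merge_input_events` is a hypothetical name.

```python
def merge_input_events(stream_events):
    """Union of the input events read with the same event ID from several streams.

    stream_events: stream name -> (type-key -> data). A type-key supplied by
    more than one stream is an error.
    """
    merged, source = {}, {}
    for stream, event in stream_events.items():
        for type_key, data in event.items():
            if type_key in merged:
                raise ValueError(f"type-key {type_key} from both {source[type_key]} and {stream}")
            merged[type_key] = data
            source[type_key] = stream
    return merged
```

Recording which stream supplied each type-key makes a duplicate easy to diagnose.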
Store view (cont) • Writing the event • User specifies which streams are accepted for each event • Event data is written for all accepted streams • View assigns a stream to own each new EDO that is to be written • View has policy for deciding for each stream: • Whether each PC is written by value or reference • Which EDO’s are written by value • Which EDO’s are written by reference
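A possible EDO write policy is sketched below; the PC value-or-reference decision is not modeled. It assumes the owning stream writes the EDO by value, other accepted streams whose type allows the type-key write a reference unless an optional replica list says otherwise. `plan_edo_writes` and its `replicate` parameter are hypothetical.

```python
def plan_edo_writes(type_keys, owner, streams, replicate=frozenset()):
    """Decide, per accepted stream, how each new EDO is written.

    type_keys: type-keys of the new EDOs for this event
    owner:     type-key -> name of the stream the view assigned to own it
    streams:   accepted stream name -> set of type-keys its stream type allows
    replicate: hypothetical policy knob, (stream, type-key) pairs to write as replicas
    Returns stream -> (type-key -> "value" | "reference").
    """
    plan = {name: {} for name in streams}
    for tk in type_keys:
        own = owner[tk]
        # The owning file must hold the EDO by value, so the owner must accept it.
        if own not in streams or tk not in streams[own]:
            raise ValueError(f"owner {own} of {tk} is not an accepted stream allowing it")
        for name, allowed in streams.items():
            if tk not in allowed:
                continue  # this stream type cannot hold the EDO at all
            by_value = name == own or (name, tk) in replicate
            plan[name][tk] = "value" if by_value else "reference"
    return plan
```

Writing references by default keeps secondary streams small; the replica list trades space for fewer files to open when reading.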
HES components
Tasks and schedule • Plan: • To deliver an initial version of HES that • is sufficient to meet the needs of DC1-2 • and serves as prototype for the LCG common hybrid event store • Attempt a design that can evolve to meet the long-term goals of both ATLAS and the LCG • Cooperate with the LCG • to whatever extent possible in the short term • fully in the long term
Tasks and schedule (cont) • DC1-2 functionality • HES core • Base (ID’s, PC, …) • File interface • Simple implementation of input and output streams • Simple implementation of view • Athena/StoreGate integration • See talk for EDM meeting • ROOT storage type with HES interface • Sufficient cataloging
Tasks and schedule (cont) • First release • Deliver June 1, 2002 • In time for users to test and discover any design flaws well in advance of DC1-2 • Effort required is 20 FTE-weeks • Plus testing • 2X contingency implied by work thus far
Tasks and schedule (cont) • Completed to date: • Design sufficient to begin the first implementation • See the HES page at http://www.usatlas.bnl.gov/~dladams/hybrid • HES ID’s, placement category and file interface have been implemented (see HES page) • ROOT persistency (but not HES interface) is far along
Tasks and schedule (cont) • Resources • The BNL PAS group is focusing on HES core and ROOT • Outside help is welcome • Need allocation of priority (and volunteers) to implement Athena/SG integration • BNL can provide some of this • Cataloging (RDB) needs to be better understood • Again BNL would like to be involved but expects to share the effort