Learn how using an object-relational model can solve the dependency issues in event data, leading to shorter compilation and link times, and improved code maintenance.
The Event as an Object-Relational Database: Avoiding the Dependency Nightmare Christopher D. Jones Cornell University, USA
Introduction
• Data within an event often relate to one another
  • E.g., tracks matching to EM showers
• Simple object-oriented design has these data items containing pointers to one another
• Unfortunately, this causes serious dependency issues
  • Large compilation times
  • Extremely long link times
  • Broken code affects more systems
• Using an object relational model avoids these problems and allows new possibilities
C. Jones CHEP03
Object Oriented Approach
• Design
  • Related data are grouped into a class
    • E.g., Track, EM Shower
  • Data values are stored in objects of the appropriate class
  • Links between objects are accomplished by embedding pointers into the objects
• Appeal
  • Easy for users to navigate the relationships between objects
Object Oriented Approach
[Diagram: Track, Hit, and EM Shower objects connected directly to one another]
Problems With OOA: Interface
• Adding new relationships means changing the classes
  • Must recompile all code that uses those classes
• Published objects must be mutable
  • Need to be able to change object to set relation to other object
• How to handle links in the case where multiple algorithms produce the same data?
  • E.g., tracks from different track-finders to same EM showers
• Where to put the data that describes the relationship?
• How do two people refer to the same object if each has made a sub-selection of a list?
  • Use the index in the original list?
Problems With OOA: Compile & Link
• In highly coupled systems, if one piece of code is broken the whole system can break
  • E.g., if tracking is broken you may not be able to do EM shower work
• To avoid excess compilation dependencies you must only forward declare data in header files
• To avoid excess linking dependencies, associated objects cannot internally access member functions of each other
  • E.g., cannot have a function that calculates energy of EM shower divided by momentum of track
  • Can relax this requirement if you organize your code so that the associated routines are in a separate object file
• Reference counting smart pointers cause strong compile- and link-time dependencies
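The "separate object file" point above can be sketched as follows. This is a minimal illustration, not code from the talk: the class layouts, the file names in the comments, and the function name `energyOverMomentum` are all assumptions. The idea is that neither data class mentions the other, and the one routine that needs both lives in its own translation unit.

```cpp
#include <cassert>

// --- Track.h (hypothetical): knows nothing about EM showers ---
class Track {
public:
    explicit Track(double momentum) : momentum_(momentum) {}
    double momentum() const { return momentum_; }
private:
    double momentum_;
};

// --- EMShower.h (hypothetical): knows nothing about tracks ---
class EMShower {
public:
    explicit EMShower(double energy) : energy_(energy) {}
    double energy() const { return energy_; }
private:
    double energy_;
};

// --- EOverP.h (hypothetical): forward declarations of Track and
// EMShower would be enough to declare this free function ---
double energyOverMomentum(const EMShower& shower, const Track& track);

// --- EOverP.cc (hypothetical): compiled into its own object file, so
// only code that calls this routine links against both data classes ---
double energyOverMomentum(const EMShower& shower, const Track& track) {
    assert(track.momentum() > 0.0);  // guard against division by zero
    return shower.energy() / track.momentum();
}
```

Code that only handles tracks never includes or links the shower code; only callers of `energyOverMomentum` pay for both.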
Problems With OOA: Storage
• Direct references in objects complicate storage
  • Need to convert pointers to/from persistent values
  • If using bidirectional links, must construct both objects before linking them
• Often causes developer to couple objects directly to storage system
  • Reading/writing causes compile/link/runtime dependencies
  • Occurs even if object only holds pointers to other objects
• Can avoid dependencies by reading back unlinked objects
  • User must specify when to make link
  • Burdens user with responsibility to be sure the link is made before she tries to use it
Event as Object Relational Database
• No objects have pointers to objects outside ‘atomic’ storage boundaries
  • E.g., MC Particles can hold pointers to their children if all MC Particles are stored together
• Objects in lists have a unique identifier
  • Physicists use the identifier when talking with other physicists
  • In our system, we use our own templated Table class to hold lists of objects; it sorts the objects via their identifier method
  • In our system, lists are identified via unique keys based on the type of object in the list and two character strings
• Relationships are defined via a separate object: Lattice
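A list of identified objects kept in identifier order might look like the sketch below. It is not the talk's actual Table class: the interface (`insert`, `find`, `size`), the `unsigned int` identifier type, and the example `Track` struct are assumptions made for illustration. The only property taken from the slide is that the container orders objects through their `identifier()` method.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a Table: T must provide identifier().
template <class T>
class Table {
public:
    using Identifier = unsigned int;

    // Insert while keeping the list ordered by identifier.
    // Identifiers are required to be unique within one Table.
    void insert(const T& item) {
        auto pos = lowerBound(item.identifier());
        assert(pos == items_.end() || pos->identifier() != item.identifier());
        items_.insert(pos, item);
    }

    // Binary search by identifier; returns nullptr when absent.
    const T* find(Identifier id) const {
        auto pos = lowerBound(id);
        return (pos != items_.end() && pos->identifier() == id) ? &*pos : nullptr;
    }

    std::size_t size() const { return items_.size(); }

private:
    // First element whose identifier is not less than id.
    typename std::vector<T>::const_iterator lowerBound(Identifier id) const {
        return std::lower_bound(
            items_.begin(), items_.end(), id,
            [](const T& lhs, Identifier value) { return lhs.identifier() < value; });
    }

    std::vector<T> items_;
};

// Example payload (hypothetical): a track with an identifier and a momentum.
struct Track {
    unsigned int id;
    double momentum;
    unsigned int identifier() const { return id; }
};
```

Because objects are looked up by identifier rather than by address, two users holding different sub-selections of the same list can still name the same object unambiguously.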
Lattice
• Links relationship data to the identifiers of two different objects (denoted by Left and Right)
• Supports 16 different configurations
  • 1 or many Lefts per Link
  • 1 or many Rights per Link
  • 1 or many Links per Left
  • 1 or many Links per Right
[Diagram: Left 1, Link A, Right 1; Left 2, Link B, Right 2]
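A Lattice of this kind can be approximated as a list of links, each holding relationship data plus sets of Left and Right identifiers. The sketch below is an assumption-laden illustration, not the talk's implementation: the member names (`connect`, `addRight`, `rightsOf`) are invented, and it always allows "many" on every side instead of enforcing one of the 16 configurations (two choices for each of the four bullets).

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical Lattice: relationship data lives in the link, and the
// related objects are referenced only by identifier, never by pointer.
template <class LinkData>
class Lattice {
public:
    using Id = unsigned int;

    struct Link {
        std::set<Id> lefts;   // identifiers on the Left side
        std::set<Id> rights;  // identifiers on the Right side
        LinkData data;        // data describing the relationship
    };

    // Create a link between one Left and one Right; returns the link index.
    std::size_t connect(Id left, Id right, LinkData data) {
        links_.push_back(Link{{left}, {right}, std::move(data)});
        return links_.size() - 1;
    }

    // Attach a further Right identifier to an existing link.
    void addRight(std::size_t link, Id right) {
        assert(link < links_.size());
        links_[link].rights.insert(right);
    }

    // All Right identifiers reachable from the given Left, in ascending order.
    std::vector<Id> rightsOf(Id left) const {
        std::set<Id> found;
        for (const Link& link : links_) {
            if (link.lefts.count(left) > 0) {
                found.insert(link.rights.begin(), link.rights.end());
            }
        }
        return std::vector<Id>(found.begin(), found.end());
    }

    const std::vector<Link>& links() const { return links_; }

private:
    std::vector<Link> links_;
};
```

Since the Lattice stores only identifiers, it can be compiled, stored, and read back without the headers or libraries of the classes it relates.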
Object Relational Approach
[Diagram: Track:1 to Track:2, Hit:1 to Hit:8, and EM Shower:1 to EM Shower:3 drawn as separate identified objects, connected only through link entries labeled in the form id:data:id(s), e.g. 1:data:1, 1:data:1,2, 8:data:2]
Improving Usability
• Easier to use objects that directly link to related objects
• Created ‘Navigation’ objects that give direct access to related objects
  • Internally look up relationship in appropriate Lattice
  • Related objects obtained using regular data access mechanism
  • I.e., Navigation objects just do what users would have to do
• NOTE: To avoid interdependencies in crucial software, only analysis code is allowed to use Navigation objects
• Special care was taken so that you become compile/link-time dependent on an object only if you access it via Navigation
  • E.g., if you do not use EM showers then you do not need to link them
• Only one library is allowed interdependencies
  • Makes maintenance easier
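What a Navigation object does internally can be sketched roughly as below. Everything here is hypothetical: `NavTrack`, `matchedShowers`, and the use of `std::map` and `std::multimap` as stand-ins for the shower list and the track-to-shower Lattice are assumptions chosen to keep the example self-contained. The point is only that the wrapper performs the identifier lookup a user would otherwise write by hand.

```cpp
#include <cassert>
#include <map>
#include <vector>

struct EMShower {
    unsigned int id;
    double energy;
};

// Stand-ins (assumed): shower id -> shower, and track id -> shower id.
using ShowerTable = std::map<unsigned int, EMShower>;
using TrackShowerLattice = std::multimap<unsigned int, unsigned int>;

// Hypothetical Navigation object for one track. It owns no data; it only
// joins the lattice and the shower list on request.
class NavTrack {
public:
    NavTrack(unsigned int trackId,
             const TrackShowerLattice& lattice,
             const ShowerTable& showers)
        : trackId_(trackId), lattice_(&lattice), showers_(&showers) {}

    // Look up the linked shower identifiers, then fetch each shower
    // through the ordinary data access path.
    std::vector<const EMShower*> matchedShowers() const {
        std::vector<const EMShower*> result;
        auto range = lattice_->equal_range(trackId_);
        for (auto it = range.first; it != range.second; ++it) {
            auto found = showers_->find(it->second);
            assert(found != showers_->end());  // a link must name an existing shower
            result.push_back(&found->second);
        }
        return result;
    }

private:
    unsigned int trackId_;
    const TrackShowerLattice* lattice_;
    const ShowerTable* showers_;
};
```

Only code that includes this wrapper depends on both tracks and showers, which is consistent with restricting Navigation objects to analysis code.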
Advantages
• Shortens link times
  • Usually less than 30 seconds on a moderate machine
  • We use dynamic loading, so you only have to link to the libraries your module directly needs
• Simplifies storage code
  • Easier to support many specialized storage formats
• Speeds up data read-back
  • Only have to retrieve data the user actually uses
  • E.g., can ask if a Track is matched to an EM Shower without having to construct the EM Showers
• Can use multiple data sources on read-back
  • E.g., build event by combining physicist’s data skim and experiment’s event database
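The read-back advantage can be illustrated with a small sketch in which the match question is answered from the link data alone. The names `ShowerSource`, `TrackShowerLinks`, and `isMatched` are hypothetical, and the lazily built shower list with hard-coded values merely stands in for an expensive read from storage.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

struct EMShower {
    unsigned int id;
    double energy;
};

// Hypothetical lazy source: showers are built only on first request.
class ShowerSource {
public:
    const std::vector<EMShower>& showers() {
        if (!loaded_) {
            // Placeholder values standing in for a read from storage.
            showers_.push_back(EMShower{1u, 0.8});
            showers_.push_back(EMShower{2u, 1.9});
            loaded_ = true;
            ++buildCount_;
            assert(!showers_.empty());
        }
        return showers_;
    }
    int buildCount() const { return buildCount_; }

private:
    std::vector<EMShower> showers_;
    bool loaded_ = false;
    int buildCount_ = 0;
};

// Assumed link storage: track identifier -> matched shower identifiers.
using TrackShowerLinks = std::map<unsigned int, std::set<unsigned int>>;

// Answers "is this track matched?" from identifiers alone,
// without touching any EMShower object.
bool isMatched(const TrackShowerLinks& links, unsigned int trackId) {
    auto it = links.find(trackId);
    return it != links.end() && !it->second.empty();
}
```

In this sketch `buildCount()` stays at zero until some caller actually asks for the showers, so a job that only tests for matches never pays for constructing them.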
Conclusion
• Compile/Link/Run-time dependencies make code less robust
• Avoid unnecessary dependencies by encapsulating relationships in a separate object
• Provide directly linked objects only to analysis users
• User productivity and satisfaction will increase
  • Shorter compile times
  • Shorter run times