Object Databases as Data Stores for HEP Dirk Düllmann CERN, IT/ASD & RD45 INFN School on Modern OO s/w Applications and Methodologies
Outline Part I
• Intro
  • HEP Data Models
  • Data Management for LHC
• Object Database Features
  • What makes ODBMS suitable for HEP Data Models?
• Objectivity/DB Features
  • Scalability
  • Data Management and Distribution
• Alternative products and approaches
• HEP Projects using Objectivity/DB
  • Prototypes and production systems
ALICE
• Heavy ion experiment at LHC
• Studying ultra-relativistic nuclear collisions
• Relatively short running period
  • 1 month/year = 1 PB/year
• Extremely high data rates
  • 1.5 GB/s
ATLAS
• General-purpose LHC experiment
• High data rates
  • 100 MB/second
• High data volume
  • 1 PB/year
• Test beam projects using Objectivity/DB in preparation
  • Calibration database
  • Expect 600 GB of raw and analysis data
CMS
• General-purpose LHC experiment
• Data rates of 100 MB/second
• Data volume of 1 PB/year
• Two test beam projects based on Objectivity/DB successfully completed
• Database used in the complete chain: test beam DAQ, reconstruction and analysis
LHCb
• Dedicated experiment looking for CP violation in the B-meson system
• Lower data rates than the other LHC experiments
• Total data volume around 400 TB/year
HEP Data Models
[Figure: example object model with Event, Tracker, Calor., TrackList, HitList, Track and Hit objects]
• HEP data models are complex!
  • Typically hundreds of structure types (classes)
  • Many relations between them
  • Different access patterns
• LHC experiments rely on OO technology
  • OO applications deal with networks of objects
  • Pointers (or references) are used to describe relations
Data Management at LHC
• LHC experiments will store huge amounts of data
  • 1 PB of data per experiment per year
  • 100 PB over the whole lifetime
• Distributed, heterogeneous environment
  • Some 100 institutes distributed world-wide
  • (Nearly) any available hardware platform
  • Data at regional centres?
• Existing solutions do not scale
• Solution suggested by RD45: ODBMS coupled to a Mass Storage System
  • e.g. HPSS
Object Database Features
Object Persistency
• Persistency
  • Objects retain their state between two program contexts
• Storage entity is a complete object
  • State of all data members
  • Object class
• OO Language Support
  • Abstraction
  • Inheritance
  • Polymorphism
  • Parameterised Types (Templates)
OO Language Binding
• Without a tight language binding
  • User has to deal with copying between program and I/O representations of the same data
  • User has to traverse the in-memory structure
  • User has to write and maintain specialised code for I/O of each new class/structure type
• Tight Language Binding
  • An ODBMS allows persistent objects to be used directly as variables of the OO language
  • C++, Java and Smalltalk (heterogeneity)
• I/O on demand
  • No explicit store & retrieve calls
Navigational Access
[Figure: OID layout: Database#, Cont.#, Page#, Slot#]
• Unique Object Identifier (OID) per object
  • Direct access to any object in the distributed store
  • Natural extension of the pointer concept
• OIDs make it possible to implement networks of persistent objects (“associations”)
  • Cardinality: 1:1, 1:n, n:m
  • uni- or bi-directional (referential integrity!)
• OIDs are used via so-called “smart pointers”
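The four-part OID layout in the figure can be illustrated with a minimal sketch. It assumes four 16-bit fields packed into one 8-byte value; the struct name, the function names and the bit order are hypothetical and are not taken from the Objectivity/DB API.

```cpp
#include <cstdint>

// Hypothetical illustration of a four-part object identifier.
// Field names follow the figure (Database#, Cont.#, Page#, Slot#);
// the 16-bit width of each field is an assumption.
struct Oid {
    std::uint16_t database;
    std::uint16_t container;
    std::uint16_t page;
    std::uint16_t slot;
};

// Pack the four fields into one 64-bit value, database number in the top bits.
inline std::uint64_t packOid(const Oid& oid) {
    return (static_cast<std::uint64_t>(oid.database)  << 48) |
           (static_cast<std::uint64_t>(oid.container) << 32) |
           (static_cast<std::uint64_t>(oid.page)      << 16) |
            static_cast<std::uint64_t>(oid.slot);
}

// Recover the four fields from the packed 64-bit value.
inline Oid unpackOid(std::uint64_t raw) {
    return Oid{
        static_cast<std::uint16_t>(raw >> 48),
        static_cast<std::uint16_t>((raw >> 32) & 0xFFFF),
        static_cast<std::uint16_t>((raw >> 16) & 0xFFFF),
        static_cast<std::uint16_t>(raw & 0xFFFF)};
}
```

Because each component addresses one level of the storage hierarchy, an OID of this kind can locate an object without any search, which is what makes it a natural extension of a pointer.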
How do “smart pointers” work?
• d_Ref<Track> is a smart pointer to a Track
• The database automatically locates objects as they are accessed and reads them
• User does not need to know about physical locations
  • No host or file names in the code
  • Allows de-coupling of logical and physical model
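The load-on-access behaviour described above can be sketched in plain C++. This is a toy model, not the ODMG d_Ref implementation: ToyStore, ToyRef and the Hit struct are hypothetical names, and an in-memory map stands in for the database.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for the database: maps object identifiers to
// stored objects and counts how often an object is actually fetched.
template <typename T>
class ToyStore {
public:
    void put(std::uint64_t oid, const T& value) { objects_[oid] = value; }

    std::shared_ptr<T> fetch(std::uint64_t oid) {
        auto it = objects_.find(oid);
        if (it == objects_.end()) throw std::out_of_range("unknown OID");
        ++fetchCount_;
        return std::make_shared<T>(it->second);
    }

    int fetchCount() const { return fetchCount_; }

private:
    std::map<std::uint64_t, T> objects_;
    int fetchCount_ = 0;
};

// Toy smart pointer: holds only the OID until the first dereference,
// then fetches the object once and keeps it cached.
template <typename T>
class ToyRef {
public:
    ToyRef(ToyStore<T>& store, std::uint64_t oid) : store_(&store), oid_(oid) {}

    T* operator->() { load(); return cached_.get(); }
    T& operator*()  { load(); return *cached_; }

private:
    void load() { if (!cached_) cached_ = store_->fetch(oid_); }

    ToyStore<T>* store_;
    std::uint64_t oid_;
    std::shared_ptr<T> cached_;
};

// Hypothetical persistent type used for the illustration.
struct Hit { double charge; };
```

The calling code only ever writes `ref->charge`; where the object lives and when it is read are hidden behind the dereference operators.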
A Code Example
[Figure: object model with Event, Tracker, Calor., TrackList, HitList, Track and Hit objects]
    Collection<Event> events;           // an event collection
    Collection<Event>::iterator evt;    // a collection iterator

    // loop over all events in the input collection
    for (evt = events.begin(); evt != events.end(); evt++) {
        // access the first track in the track list
        d_Ref<Track> aTrack;
        aTrack = evt->tracker->trackList[0];

        // print the charge of all its hits
        for (int i = 0; i < aTrack->hits.size(); i++)
            cout << aTrack->hits[i]->charge << endl;
    }
Object Clustering
• Goal: Transfer only “useful” data
  • from disk server to client
  • from tape to disk
• Physically cluster objects according to main access patterns
• Clustering by type
  • e.g. Track objects are “always” accessed with their hits
• Main access patterns may change over time
  • Performance may profit from re-clustering
• Clustering of individual objects
  • e.g. All Higgs events should reside in one file
Physical Model and Logical Model
• Physical model may be changed to optimise performance
• Existing applications continue to work
Concurrent Access
• Support for multiple concurrent writers
  • e.g. Multiple parallel data streams
  • e.g. Filter or reconstruction farms
  • e.g. Distributed simulation
• Data changes are part of a Transaction
  • ACID: Atomic, Consistent, Isolated, Durable
• Access is co-ordinated by a lock server
  • MROW: Multiple Reader, One Writer per container (Objectivity/DB)
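A minimal sketch of the MROW rule follows. It assumes that readers are always admitted and that at most one writer may hold a given container at a time; ToyLockTable is a hypothetical class and does not model the actual Objectivity/DB lock server protocol or transaction isolation.

```cpp
#include <map>
#include <string>

// Toy lock table illustrating "Multiple Reader, One Writer" per container.
class ToyLockTable {
private:
    struct State {
        int readers = 0;
        bool writerActive = false;
    };
    std::map<std::string, State> state_;

public:
    // Readers are admitted even while a writer holds the container (assumption).
    void acquireRead(const std::string& container) { ++state_[container].readers; }

    void releaseRead(const std::string& container) {
        State& s = state_[container];
        if (s.readers > 0) --s.readers;
    }

    // At most one writer per container; returns false if another writer holds it.
    bool acquireWrite(const std::string& container) {
        State& s = state_[container];
        if (s.writerActive) return false;
        s.writerActive = true;
        return true;
    }

    void releaseWrite(const std::string& container) {
        state_[container].writerActive = false;
    }

    int readers(const std::string& container) const {
        auto it = state_.find(container);
        return it == state_.end() ? 0 : it->second.readers;
    }
};
```

Since the lock granularity is the container, parallel writers avoid contention by writing into different containers, which is why container layout matters for filter and reconstruction farms.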
Objectivity Specific Features
Objectivity/DB Architecture
[Figure: storage hierarchy: Federation, Database, Container, Page, Object]
• Architectural limitations: OID size 8 bytes
  • 64K databases
  • 32K containers per database
  • 64K logical pages per container
    • 4 GB containers for 64 kB page size
    • 0.5 GB containers for 8 kB page size
  • 64K object slots per page
• Theoretical limit: 10 000 PB
  • assuming database files of 128 TB
• RD45 model assumes 6.5 PB
  • assuming database files of 100 GB
• Extension or re-mapping of the OID has been requested
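The size limits above follow from simple multiplication, sketched here as a worked calculation. The function names are hypothetical; binary units are assumed for page sizes and decimal units for file sizes.

```cpp
#include <cstdint>

constexpr std::uint64_t kKiB = 1024;
constexpr std::uint64_t kGiB = kKiB * kKiB * kKiB;
constexpr std::uint64_t k64K = 65536;

// Maximum container size = logical pages per container * page size.
constexpr std::uint64_t maxContainerBytes(std::uint64_t pageSizeBytes) {
    return k64K * pageSizeBytes;
}

// Maximum federation size in PB = number of databases * size of one database file.
constexpr double maxFederationPB(double databaseFileBytes) {
    return static_cast<double>(k64K) * databaseFileBytes / 1e15;
}

// 64K pages of 64 kB give 4 GB; 64K pages of 8 kB give 0.5 GB.
static_assert(maxContainerBytes(64 * kKiB) == 4 * kGiB, "4 GB container");
static_assert(maxContainerBytes(8 * kKiB) == kGiB / 2, "0.5 GB container");
```

With 100 GB database files, `maxFederationPB(100e9)` gives about 6.55 PB, matching the RD45 figure. With 128 TB files, `maxFederationPB(128e12)` gives roughly 8 400 PB in decimal units, the same order of magnitude as the 10 000 PB quoted as the theoretical limit.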
Scalability Tests
• Federated databases of 500 GB have been demonstrated
  • Multiple federations of 20-80 GB are used in production
  • 32 filter nodes writing in parallel into one federated database
  • 200 parallel readers (Caltech Exemplar)
• Objectivity/DB shows expected scalability
  • Overflow conditions on architectural limits are handled gracefully
• Only minor problems found and reported back
  • 2 GB file limit; fixed and tested up to 25 GB
• Federations of hundreds of TB are possible with the current version
A Distributed Federation
[Figure: a distributed federation. Application hosts run Objy clients; an application and disk server, a disk server and a data server connected to HPSS (HPSS client and HPSS server) each run an Objy server; a single Objy lock server co-ordinates access.]
Data Replication
[Figure: Site 1, Site 2 and Site 3 connected by a wide area network]
• Objects in a replicated DB exist in all replicas
  • Multiple physical copies of the same object
  • Copies are kept in sync by the database
• Enhance performance
  • Clients access a local copy of the data
• Enhance availability
  • Disconnected sites may continue to work on a local replica
Schema Evolution
• Evolve the object model over the experiment lifetime
  • migrate existing data after schema changes
  • minimise impact on existing applications
• Supported operations
  • add, move or remove attributes within classes
  • change inheritance hierarchy
• Migration of existing objects
  • immediate: all objects are converted using an upgrade application
  • lazy: objects are upgraded as they are accessed
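The difference between immediate and lazy migration can be sketched as follows. This is an illustration only: StoredTrack, the explicit schemaVersion tag and the function names are hypothetical, and a real ODBMS performs the conversion inside the database layer rather than in user code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stored record: schema version 1 has only `phi`;
// schema version 2 adds the attribute `theta`.
struct StoredTrack {
    int schemaVersion;
    double phi;
    double theta;  // meaningful only from schema version 2 on
};

constexpr int kCurrentSchema = 2;

// Upgrade one object in place; returns true if a conversion happened.
inline bool upgrade(StoredTrack& t) {
    if (t.schemaVersion >= kCurrentSchema) return false;
    t.theta = 0.0;  // default value for the newly added attribute
    t.schemaVersion = kCurrentSchema;
    return true;
}

// Immediate migration: convert every object in one pass.
inline int migrateAll(std::vector<StoredTrack>& tracks) {
    int converted = 0;
    for (StoredTrack& t : tracks) {
        if (upgrade(t)) ++converted;
    }
    return converted;
}

// Lazy migration: convert an object only when it is accessed.
inline StoredTrack& access(std::vector<StoredTrack>& tracks, std::size_t index) {
    StoredTrack& t = tracks.at(index);
    upgrade(t);
    return t;
}
```

Lazy migration spreads the conversion cost over normal accesses and leaves untouched objects in the old format; immediate migration pays the full cost once, up front.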
Object Versioning
• Maintain multiple versions of a single “logical” object
• Example: Versions of calibration data in the BaBar calibration DB package
Other O(R)DBMS Products
• Versant
  • Unix and Windows platforms
  • Scalable, distributed architecture
  • Independent databases
  • Currently the most suitable fallback product
• O2
  • Unix and Windows platforms
  • Incomplete heterogeneity support
  • Recently bought by Unidata (RDBMS vendor) and merged with VMARK (data warehousing)
Other O(R)DBMS Products II
• ObjectStore (Object Design Inc.)
  • Unix and Windows platforms
  • Scalability problems
  • Proprietary compiler, kernel driver
  • ODI re-focussed on web applications
• POET
  • Windows platform
  • Low end, scalability problems
• What will the big object-relational vendors provide?
Object Relational Databases
• Present data either as objects or as tables
  • Table rows translate into objects
  • New data type “Row Id” allows navigation
  • Abstract data types (ADTs)
• C++ or Java based DB API
• Some ODBMS features still missing
  • Inheritance, polymorphism, templates
  • Clustering of individual objects
ODBMS Architectures
• ODBMS are less server-centric than RDBMS
  • ODBMS client cache keeps frequently used objects close to the CPU
  • ODBMS client implements a large part of the functionality
  • Improved scalability
• Main design parameters
  • Granularity of data exchange
  • Granularity of locking
  • Implementation of OID (logical vs. physical)
Object Server & Object Locking
• Versant DB
• Object exchange between client and server
  • Only requested data is sent
  • Every object needs a network transfer
• Server “knows” about objects
  • Object level queries may be run on the server
  • Fat server, thin client
• Single object locking
  • Simpler model for the application programmer
  • Degrades scalability and performance
Page Server & Container Locking
• Objectivity/DB
• Page exchange between client and server
  • A page contains more than just the requested data
  • With good clustering, it contains other objects that will be requested soon
• Server only “knows” about I/O pages
  • Thin server, fat client
  • Improved scalability
• Locking on container level
  • All objects in one container are locked at once
  • Improved scalability and performance
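The effect of clustering on a page server can be illustrated by counting transfers. This is a simplified model under stated assumptions: objects are numbered by storage position, `objectsPerPage` is a hypothetical layout parameter that must be greater than zero, the client cache never evicts pages, and the object server variant is assumed to have no client cache.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Count the network transfers needed to read the given objects when the
// server ships whole pages and the client keeps every received page cached.
inline std::size_t pageTransfers(const std::vector<std::size_t>& objectIds,
                                 std::size_t objectsPerPage) {
    std::set<std::size_t> cachedPages;
    std::size_t transfers = 0;
    for (std::size_t id : objectIds) {
        std::size_t page = id / objectsPerPage;
        // insert() reports whether the page was new to the client cache
        if (cachedPages.insert(page).second) ++transfers;
    }
    return transfers;
}

// With an object server, each requested object costs one transfer.
inline std::size_t objectTransfers(const std::vector<std::size_t>& objectIds) {
    return objectIds.size();
}
```

Reading eight consecutively stored objects with four objects per page costs two page transfers instead of eight object transfers, whereas four objects scattered over four pages cost four page transfers and move mostly unwanted data. This is why the page server model depends on good clustering.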
The ODMG Standard
• Standardise on
  • Data definition language ODL
  • Data interchange format OIF
  • Language bindings for C++, Java and Smalltalk
• Goal: Simplify code migration from one database vendor to another
  • Most frequently used API elements are covered by the ODMG 2.0 standard
• Performance and feature sets of different ODBMS products differ significantly!
  • Any large scale migration will still require a significant effort!
HEP Projects based on Objectivity/DB
Production - BaBar
• BaBar at SLAC, due to start taking data in 1999
• Objectivity/DB is used to store event, simulation, calibration and analysis data
• Expected amount 200 TB/year, majority of storage managed by HPSS
• Mock Data Challenge 2:
  • Production of 3-4 million events in August/September
  • Partly distributed to remote institutes
• Cosmic runs starting in October
Production - ZEUS
• ZEUS is a large detector at the DESY electron-proton collider HERA
• Since 1992: study of interactions between electrons and protons
• Analysis environment: mainly FORTRAN code based on ADAMO
• Objectivity/DB is used for event selection in the analysis phase
• Store ~20 GB of “tag data”, with plans to extend to 200 GB
• Reported a significant gain in performance and flexibility compared to the old system
Production - AMS
• The Alpha Magnetic Spectrometer already took data on a NASA space shuttle flight this year and will be used later on the International Space Station
• Search for antimatter and dark matter
• Data amount 100 GB
• Objectivity/DB is used to store production data, slow control parameters and NASA auxiliary data
Production - CERES/NA45
• Heavy ion experiment at the SPS
• Study of e+e- pairs in relativistic nuclear collisions
• Successful use of Objectivity/DB from a reconstruction farm (32 Meiko CS2 nodes)
• Expect to write 30 TB of raw data during 30 days of data taking
• Reconstructed and filtered data will be stored using the Objectivity production service
Production - CHORUS
• Searching for neutrino oscillations
• Using Objectivity/DB for an online emulsion scanning database
• Plans to deploy this application at outside sites
• Also studying Objectivity/DB for TOSCA, a proposed follow-on experiment
COMPASS
• COMPASS expects to begin full data taking in 2000 with a preliminary run in 1999
• Some 300 TB of raw data will be acquired per year at rates up to 35 MB/second
• Analysis data is expected to be stored on disk, requiring some 3-20 TB of disk space
• Some 50 concurrent users and many passes through the data are expected
• Will rely on the Objectivity production service at CERN
Summary
• Objectivity/DB based data stores provide
  • a single logical view of complex object models
  • integration with multiple OO languages
  • support for physical clustering of data
  • scaling up to PB distributed data stores
  • seamless integration with MSS like HPSS
• Adopted by a large number of HEP experiments
  • even FORTRAN based experiments evaluate Objectivity/DB for analysis and data conservation
• Entering production phase at CERN now
  • Objectivity service has been set up for ATLAS, CMS, COMPASS and NA45
Tutorial Example 1
• Objective: Populate a database with persistent Events
• Define all involved classes
  • Simple object model consisting of: Event, Tracker, Track, Calo, Cluster
• Create federated db, databases and containers
  • tracking and calo data are kept in separate databases (files)
• Create events
  • Events contain randomly generated tracks and clusters
Defining a persistent class
• Define a C++ class in a .ddl file
  • very similar to a normal C++ header file
  • some restrictions apply (see next slides)
  • some additional features are available
• Inherit from the persistent base class
    class Event : public d_Object {
    public:
        int eventNr;
        ...
    };
• Introduce the new class to the database schema
  • Run the Objectivity schema processor ooddlx
DDL Restrictions
• Persistent classes may not:
  • contain other persistent classes as data members
    • They may contain references to other persistent classes, though
    • Late (multiple) inheritance from d_Object helps to keep transient and persistent classes in sync
  • contain C++ pointers or references
    • Neither directly nor through embedded classes
    • Replace C++ pointers by database smart pointers
• Replacing pointers is the more intrusive change
  • Type declarations of pointers referencing persistent objects have to be changed for all clients of a persistent-capable class
  • Code that uses these variables stays untouched
DDL - Additional Features
• Persistent classes may use in addition:
  • variable length arrays as data members
    • Example: A Tracker object contains a variable number of Track objects
        ooVarray(Track) trackList;
  • bi-directional associations
    • Example: Each Event has one Tracker, each Tracker belongs to one Event
        ooHandle(Tracker) itsTracker <-> itsEvent;
  • 1-to-N or N-to-M associations
    • Example: One Run object keeps links to all its “N” events
        ooHandle(Event) itsEvents[];
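What a bi-directional 1:1 association has to guarantee can be sketched with ordinary transient C++ objects. This is not DDL and not generated Objectivity code: the raw pointers and the `link` helper are hypothetical stand-ins that show the bookkeeping the database would otherwise do to keep both ends consistent.

```cpp
struct Tracker;

// Transient stand-ins for the persistent classes in the example.
struct Event {
    Tracker* itsTracker = nullptr;
};

struct Tracker {
    Event* itsEvent = nullptr;
};

// Link both directions in one step. Any previous partner on either side is
// unlinked first, so the two ends can never disagree (referential integrity).
inline void link(Event& e, Tracker& t) {
    if (e.itsTracker) e.itsTracker->itsEvent = nullptr;
    if (t.itsEvent) t.itsEvent->itsTracker = nullptr;
    e.itsTracker = &t;
    t.itsEvent = &e;
}
```

With a declared bi-directional association, the user sets one end and the inverse end is maintained automatically, so the manual bookkeeping shown in `link` does not have to be written or kept correct by hand.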
Schema Handling
• Definitions of persistent-capable classes are made in .ddl files
• The ooddlx processor generates the appropriate headers and source code
• The schema is added to the federated database
• Applications are built using the generated files and the Objectivity library
HepODBMS Layer
• Goal:
  • Independence from vendor and/or release changes
• Naming indirection of most prominent API classes
• Provide missing features of the ODMG standard
• HEP specific high level classes
  • Session set-up and diagnostics
  • Transaction control
  • Clustering Hint classes
  • Very large collections (> 10^9 objects)
  • Hierarchical Object Naming
Database Session Control
• HepDbApplication encapsulates db session control
  • Start/Commit/Abort transactions
  • Set lock handling options, lock wait time, number of retries
• High level interface to open/create/find FDBs, DBs and containers
• Provides job or transaction level diagnostics for
  • cache efficiency
  • disk I/Os
  • object accesses and updates
  • container and object extension
  • steered by API and/or environment variables
• Heavily based on the ooSession class from Objectivity
  • small changes for Solaris, NT and transaction abort
Setting up a DB session using the HepDbApplication class
    int main() {
        HepDbApplication dbApp;     // create an application object
        dbApp.init("MyFD");         // init FD connection
        dbApp.startUpdate();        // update mode transaction
        dbApp.db("analysis");       // switch to db "analysis"

        // create a new container
        ContRef histCont = dbApp.container("histos");

        // create a histogram in this container
        HepRef(Histo1D) h = new(histCont) Histo1D(10, 0, 5);

        dbApp.commit();             // commit all changes
    }
Object Clustering
• The “new” operator provided by Objectivity accepts a clustering hint
  • may be a db, container or object reference
  • specifies in which db, in which container or close to which other object the new object should go
• HepODBMS contains classes to encapsulate the clustering strategy in “Clustering Hint” objects
  • clustering into single physical containers (< 0.5 GB for 8 kB pages)
  • clustering into logical containers (unlimited size, spread over several db files)
  • parallel writing without lock contention
  • parallel load-balanced reading
  • definition of class based clustering through persistent objects
Clustering by Class
    // class definition in Track.ddl
    class Track : public d_Object {
        d_Double phi;
        d_Double theta;
        d_ULong noOfHits;
        // more stuff
    public:
        static HepContainerHint clustering;
    };
    […]
    // define clustering at startup
    Track::clustering = dbApp.container("tracks");
    […]
    // use the clustering defined for tracks
    HepRef(Track) aTrack = new (Track::clustering()) Track;
ODBMS Stores on a Larger Scale