IWLSC – CHEP 2006. The Evolution of Databases in HEP: A Time-Traveller's Tale. Jamie Shiers, CERN ~ ~ ~ …, DD-CO, DD-US, CN-AS, CN-ASD, IT-ASD, IT-DB, IT-GD, ¿¿-??, …
Abstract • The past decade has been an era of sometimes tumultuous change in the area of Computing for High Energy Physics. • This talk addresses the evolution of databases in HEP, starting from the LEP era and the visions presented during the CHEP 92 panel "Databases for High Energy Physics" (D. Baden, B. Linder, R. Mount, J. Shiers). • It then reviews the rise and fall of Object Databases as a "one size fits all solution" in the mid to late 90's and finally summarises the more pragmatic approaches that are being taken in the final stages of preparation for LHC data taking. • The various successes and failures (depending on one's viewpoint) regarding database deployment during this period are discussed, culminating in the current status of database deployment for the Worldwide LCG.
Once upon a time… Databases for HEP Panel at CHEP 1992: • Should we buy or build database systems for our calibration and book-keeping needs? • Will database technology advance sufficiently in the next 8 to 10 years to be able to provide byte-level access to petabytes of SSC/LHC data? Drew Baden, University of Maryland (PASS Project); B. Linder, Oracle Corporation; Richard Mount, Caltech; Jamie Shiers, CERN.
Buy or build? – Calibration & Bookkeeping • Is it technically possible to use a commercial system? • Would it be manageable administratively and financially? • Questions already investigated during LEP planning phase • Computer Physics Communications 45 (1987) 299-310 • “Possibility” in 1984 (technically but not Q2); • “Probability” in 1992 (technically and conceivably also Q2).
Calibration & Bookkeeping in 1992 • Many experiments had independently developed solutions to these problems, with largely overlapping functionality • Two (of the) efforts at producing common solutions: • FATMEN – file catalog and more (used by DELPHI, L3, OPAL etc.); • HEPDB – calibration database based on DBL3 and OPCAL • Both of these were based on ZEBRA-RZ • ZEBRA-FZ for “updates”, with zftp/zserv for distribution • FATMEN also had an Oracle backend • Later dropped…
Arguments - 1992 • A significant amount of HEP-specific code would need to be added – roughly comparable in size (within a factor of 2) to existing home-grown solutions • Commercial systems require well-trained support staff at every site. (Home-grown too, but experience shows that this is typically much less than for commercial solutions.) • Licensing and support for large, diverse HEP collaborations clearly a concern. We shall revisit these questions later in the show…
¿Use of existing packages for SSC/LHC? • Could / should Zebra be used in 2001? • Hope to exploit language features of Fortran90 (?) and/or C++ • File catalog for multi-PB of data? • Move towards nameserver approach (LFC = castor ns) • Existing data access and management tools? • Away from sequential access… • Where do we store the data? • Actually in a database; • Assumed for metadata; less likely for data itself • In some ‘super filesystem’ – e.g. based on IEEE MSS reference model
Zebra RZ and Scalability Issues • Some fields within RZ were limited to 16 bits • Fine for storage of HBOOK histograms etc but a limitation for larger catalogs • Move from 16-bit fields to 32-bit (Sunanda Banerjee) • A disconcertingly recurrent theme over the next decade… • (Also introduction of ‘eXchange’ format – makes zftp redundant in favour of binary ftp) (Burkhard Holl)
CHEP 92 – the Birth of OO in HEP? • Wide-ranging discussions on the future of s/w development in HEP • A number of proposals presented – including ZOO (I preferred “OOZE”) – leading to: • RD41 – MOOSE • The applicability of OO to offline particle physics code • RD44 – GEANT4 • Produce a global object-oriented analysis and design of an improved GEANT simulation toolkit for HEP • RD45 – A Persistent Object Manager for HEP • (and later also LHC++ (subsequently ANAPHE)) • ROOT
LHC++ • A re-write of CERNLIB in C++? • NO • A commercial-(only) solution? • NO • An attempt to provide the common core application libraries required by the LHC experiments (in C++) based on a review of their actual (rather than historical) requirements? • YES • But in the mid-90’s environment… • Strict manpower constraints…
LHC++ : continued • Designed to evolve with time – and (de-facto / de-jure) standards • For example: • Tools.h++ → ObjectSpace libraries (STL etc.) → STL • Commercial OpenGL → ‘supplied’ OpenGL • Look back at KERNLIB functionality – much now provided ‘with’ language • MATHLIB : early 90’s study showed NAG had best coverage of HEP needs • All MATHLIB functionality now covered – including some CERNLIB code! • Outstanding issues – certainly not dissimilar to those for DBs at CHEP 92 • Support, licensing, … for multitudinous, heterogeneous, distributed sites
RD45 – Initial Milestones • [The project] should be approved for an initial period of one year. The following milestones should be reached by the end of the 1st year. • A requirements specification for the management of persistent objects typical of HEP data together with criteria for evaluating potential implementations. [ Later dropped – experiments far from ready ] • An evaluation of the suitability of ODMG's Object Definition Language for specifying an object model describing HEP event data. • Starting from such a model, the development of a prototype using commercial ODBMSes that conform to the ODMG standard. The functionality and performance of the ODBMSes should be evaluated. • It should be noted that the milestones concentrate on event data. Studies or prototypes based on other HEP data should not be excluded, especially if they are valuable to gain experience in the initial months.
RD45 – Initial Steps • Contacts with the main ODBMS vendors of that time • O2, ObjectStore, Objectivity, Versant, Poet, … • Many presentations scheduled at CERN • Training on O2, Objectivity/DB, a few licenses acquired… • Prototyping focussed on Objectivity/DB with Versant (later) as primary fall-back • Scalability of architecture
Objectivity/DB Scalability • 16 bits again! • 2^16 databases (files) of 100GB = 6.5PB • CERN has requested extended OID
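The 6.5PB ceiling quoted on this slide can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 PB = 1,000,000 GB) and that the limit comes from a 16-bit database number; the function and constant names are illustrative only and are not taken from Objectivity/DB.

```python
# Capacity ceiling implied by a 16-bit database-number field.
# Assumptions: decimal units (1 PB = 1,000,000 GB); all names are hypothetical.

DB_ID_BITS = 16       # width of the field that numbers the databases (files)
DB_SIZE_GB = 100      # size of each database file, as quoted on the slide
GB_PER_PB = 1_000_000


def max_federation_size_pb(id_bits: int, db_size_gb: int) -> float:
    """Return the largest total size, in PB, addressable with id_bits."""
    n_databases = 2 ** id_bits
    return n_databases * db_size_gb / GB_PER_PB


if __name__ == "__main__":
    n = 2 ** DB_ID_BITS
    size = max_federation_size_pb(DB_ID_BITS, DB_SIZE_GB)
    print(f"{n} databases of {DB_SIZE_GB} GB -> {size:.2f} PB")
```

Calling the same function with a wider field shows why an extended OID removes the ceiling for any realistic LHC data volume.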
LCRB review, March 1996 • The RD45 project has made excellent progress in identifying and applying solutions for object persistence for HEP based on standards and commercial products • RD45 should be approved for a further year • The LCRB agrees with the program of future work outlined in the RD45 status report and regards the following activities (below) and milestones (next) as particularly important: • Provide the object persistence services needed for the first release of GEANT4 in early 1997 • Collaborate with ATLAS and CMS in the development of those aspects of the Computing Technical Proposals which may be affected by the nature of object persistence services
RD45 Milestones - 96 • Identify and analyse the impact of using an ODBMS for event data on the Object Model, the physical organisation of the data, coding guidelines and the use of third party class libraries • Investigate and report on ways that Objectivity/DB features for replication, schema evolution and object versions can be used to solve data management problems typical of the HEP environment • Make an evaluation of the effectiveness of an ODBMS and MSS as the query and access method for physics analysis. The evaluation should include performance comparisons with PAW and Ntuples
RD45 Milestones - 97 • Demonstrate, by the end of 1997, the proof of principle that an ODBMS can satisfy the key requirements of typical production scenarios (e.g. event simulation and reconstruction), for data volumes up to 1TB. The key requirements will be defined, in conjunction with the LHC experiments, as part of this work, • Demonstrate the feasibility of using an ODBMS + MSS for Central Data Recording, at data rates sufficient to support ATLAS and CMS test-beam activities during 1997 and NA45 during their 1998 run, • Investigate and report on the impact of using an ODBMS for event data on end-users, including issues related to private and semi-private schema and collections, in typical scenarios including simulation, (re-)reconstruction and analysis.
RD45 Milestones - 98 • Provide, together with the IT/PDP group, production data management services based on Objectivity/DB and HPSS with sufficient capacity to solve the requirements of ATLAS and CMS test beam and simulation needs, COMPASS and NA45 tests for their '99 data taking runs. • Develop and provide appropriate database administration tools, (meta-)data browsers and data import/export facilities, as required for (1). • Develop and provide production versions of the HepOODBMS class libraries, including reference and end-user guides. • Continue R&D, based on input and use cases from the LHC collaborations to produce results in time for the next versions of the collaborations' Computing Technical Proposals (end 1999).
Toward the 2001 Milestone: “Choice of ODBMS vendor” “If the ODBMS industry flourishes it is very likely that by 2005 CMS will be able to obtain products, embodying thousands of man-years of work, that are well matched to its worldwide data management and access needs. The cost of such products to CMS will be equivalent to at most a few man-years. We believe that the ODBMS industry and the corresponding market are likely to flourish. However, if this is not the case, a decision will have to be made in approximately the year 2000 to devote some tens of man-years of effort to the development of a less satisfactory data management system for the LHC experiments.” (CMS Computing Technical Proposal, section 3.2, page 22) And the rest, as they say, is History…
Risk Analysis: Issues • Choice of Technology • ODBMS, ORDBMS, RDBMS, “light-weight” Persistency, files + meta-data, ... • Choice of Vendor (historically) • #1 Objectivity, #2 Versant • Size of market • Did not take off as anticipated; unlikely to grow significantly in short-medium term CERN - Computing Challenges
Risk Analysis and Actions • ODBMS market has not grown as predicted • Need to understand alternatives • Possibilities include: • “Open Source” (?) ODBMS solution • ORDBMS-based solution (also for event data) • “Hybrid solutions”, incl. Meta-data + files • RD45 investigating & directly • Based on experience at FNAL / BNL ... • Essential to consider all requirements • And not just file I/O…
Espresso • Espresso is a proof-of-concept prototype built to answer questions from Risk Analysis • Could we build an alternative to Objectivity/DB? • How much manpower would be required? • Can we overcome limitations of Objy’s current architecture? • Support for VLDBs, multi-FD work-arounds etc. • Test / validate important architectural choices
Espresso – Lessons Learnt • Initial prototype suggests that building a full ODBMS is technically feasible • Discussions with other sites suggest that interest goes well beyond HEP • Manpower estimates / possible resources indicate “project” would have to start “soon” • 2002: 3-year project with full system end-2004
ODBMS – In Retrospect • Used – in production – by several experiments at CERN, SLAC and other labs for a total of a few PB of data for just under a decade • It was – for some extended period – the baseline of ATLAS and CMS • Enhancements to the product obtained (with some effort) • MSS interface (actually much more general xrootd); • Linux port • VLDB support • (ODMG compliance) • Much experience with ODBMS-like solutions obtained • “Risk analysis” clearly identified need for fallback, proof-of-concept prototype (Espresso) and eventually full solution (POOL) • Migration to Oracle+DATE (COMPASS) / POOL (LHC expts) successfully reported on at CHEP 2004 (scale of several hundred TB)
The Story So Far… • 1992: CHEP – DB panel, CLHEP K/O, CVS … • 1994: start of OO projects • 1997: proposal of ODBMS+MSS; BaBar • 2001: CMS change of baseline Objy • Now: LCG Persistency Framework RTAG • Resulted in POOL project…
15. Observations – IT “Eloise” Retreat 2000 • Large volume event data storage and retrieval is a complex problem that the particle physics community has had to face for decades. • The LHC data presents a particularly acute problem in the cataloguing and sparse retrieval domains, as the number of recorded events is very large and the signal to background ratios are very small. All currently proposed solutions involve the use of a database in one way or another. • A satisfactory solution has been developed over the last years based on a modular interface complying with the ODMG standard, including C++ binding, and the Objectivity/DB object database product. • The pure object database market has not had strong growth and the user and provider communities have expressed concerns. The “Espresso” software design and partial implementation, performed by the RD-45 collaboration, has provided an estimate of 15 person-years of qualified software engineers for development of an adequate solution using the same modular interface. This activity has completed, resulting in the recent snapshot release of the Espresso proof-of-concept prototype. No further development or support of this prototype is foreseen by DB group. • Major relational database vendors have announced support for Object-Relational databases, including C++ bindings. • Potentially this could fulfil the requirements for physics data persistency using a mainstream product from an established company. • CERN already runs a large Oracle relational database service
Recommendation • The conclusion of the Espresso project, that a HEP-developed object database solution for the storage of event data would require more resources than available, should be announced to the user community. • The possibility of a joint project between Oracle and CERN should be explored to allow participation in the Oracle 9i beta test with the goals of evaluating this product as a potential fallback solution and providing timely feedback on physics-style requirements. Non-staff human resources should be identified such that there is no impact on current production services for Oracle and Objectivity. [ Fellow, later also openlab resources ]
Oracle for Physics Data • Work on LHC Computing started ~1992 (some would say earlier…) • Numerous projects kicked off 1994/5 to look at handling multi-PB of data; move from Fortran to OO (C++) etc. • Led to production solutions from ~1997 • Always said that ‘disruptive technology’, like Web, would have to be taken into account • In 2002, major project started to move 350TB of data out of ODBMS solution; >100MB/s for 24-hour periods • Now ~2TB of physics data stored in Oracle on Linux servers • A few % of total data volume; expected to double in 2004 • [ I guess it's 10 x this by now? ] Oracle 10g launch, ZH
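To put the migration figures on this slide in perspective, the following sketch converts a sustained transfer rate into a daily volume and a total duration. It assumes decimal units and a constant rate of exactly 100 MB/s, which is the lower bound quoted above; the actual duration of the migration is not stated on the slide, and the function names are hypothetical.

```python
# Back-of-the-envelope estimate for moving 350 TB at a sustained 100 MB/s.
# Assumptions: decimal units (1 TB = 1,000,000 MB) and a constant rate.

SECONDS_PER_DAY = 86_400
MB_PER_TB = 1_000_000


def daily_volume_tb(rate_mb_s: float) -> float:
    """Volume moved in 24 hours at a sustained rate given in MB/s."""
    return rate_mb_s * SECONDS_PER_DAY / MB_PER_TB


def days_to_move(volume_tb: float, rate_mb_s: float) -> float:
    """Days needed to move volume_tb at a sustained rate given in MB/s."""
    return volume_tb / daily_volume_tb(rate_mb_s)


if __name__ == "__main__":
    print(f"{daily_volume_tb(100):.2f} TB per day")
    print(f"{days_to_move(350, 100):.1f} days for 350 TB")
```

At that rate a 24-hour period moves a little under 9 TB, so the full 350 TB corresponds to roughly six weeks of uninterrupted running.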
LCG and Oracle • Current thinking is that bulk data will be streamed to [ ROOT ] files • RDBMS backend also being studied for ‘analysis data’ • File catalog (10^9 files) and file-level metadata will be stored in Oracle in a Grid-aware catalog • [ This was the “RLS” family ] • In longer term, event level metadata may also be stored in the database, leading to much larger data volumes • A few PB, assuming total data volume of 100-200PB • [ This was probably an overestimate – TAG : RAW ratio? ] • Current storage management system – CASTOR at CERN – also uses an [Oracle] database to manage the naming / location of files • Bulk data stored in tape silos and faulted in to huge disk caches
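The idea of keeping the file catalog and file-level metadata in a relational database, while the bulk data stays in files, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: SQLite stands in for Oracle, and the two-table schema, the function names, the file name and the site names are hypothetical and do not reproduce the RLS, LFC or CASTOR schemas.

```python
import sqlite3

# Minimal file-catalog sketch: logical file names map to physical replicas.
# SQLite stands in for the production database; the schema is hypothetical.


def make_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog with logical-file and replica tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE lfn (
            guid       TEXT PRIMARY KEY,
            name       TEXT UNIQUE NOT NULL,
            size_bytes INTEGER NOT NULL      -- file-level metadata
        );
        CREATE TABLE replica (
            guid TEXT NOT NULL REFERENCES lfn(guid),
            site TEXT NOT NULL,
            pfn  TEXT NOT NULL,              -- physical location of the copy
            PRIMARY KEY (guid, site)
        );
    """)
    return conn


def register_file(conn, guid, name, size_bytes):
    """Record a logical file and its file-level metadata."""
    conn.execute("INSERT INTO lfn VALUES (?, ?, ?)", (guid, name, size_bytes))


def add_replica(conn, guid, site, pfn):
    """Record one physical copy of a logical file at a site."""
    conn.execute("INSERT INTO replica VALUES (?, ?, ?)", (guid, site, pfn))


def locate(conn, name):
    """Return (site, pfn) pairs for a logical file name, ordered by site."""
    rows = conn.execute(
        "SELECT r.site, r.pfn FROM replica r "
        "JOIN lfn l ON l.guid = r.guid "
        "WHERE l.name = ? ORDER BY r.site",
        (name,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    catalog = make_catalog()
    register_file(catalog, "guid-1", "/grid/demo/run001.root", 1024)
    add_replica(catalog, "guid-1", "SITE_A", "srm://a.example.org/run001.root")
    add_replica(catalog, "guid-1", "SITE_B", "srm://b.example.org/run001.root")
    print(locate(catalog, "/grid/demo/run001.root"))
```

Only names, sizes and locations pass through the database in this arrangement; the event data itself is read directly from the files that the catalog points to.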
Physics Research - Prospects • Re-engineering of all DBMS services for physics on Oracle 10g RAC • Goals: • Isolation – 10g ‘services’ and / or physical separation • Scalability - both in computing power for database processing and in storage devices • Reliability – automatic failover in case of problems • Manageability – simplified administration • We will return to this question later, in the ‘Enterprise Grids’ section …
Physics Activities - Futures • Re-engineering all DB services for Physics on Oracle 10g RAC • Goals are: • Isolation – 10g ‘services’ and / or physical separation • Scalability - in both database processing power and storage • Reliability – automatic failover in case of problems • Manageability – significantly easier to administer than now • Will revisit this under ‘Enterprise Grids’ later… Oracle Grid Tech. day, Moscow
CERN & Oracle • Share a common vision regarding the future of high performance computing • Widespread use of commodity dual-processor PCs running Linux; • Focus on Grid computing • CERN has managed to influence the Oracle product • Oracle 10g features: • Support for native IEEE floats & doubles; • Support for “Ultra large” Databases (ULDB); 16-bit fields again! • Cross-platform transportable tablespaces; • Instant-client developer etc.
LHC DB Applications • Clear that many “LHC construction / operations” applications will use Oracle • This is true also for detector construction / monitoring / calibration applications • But the main change is closer to “physics applications” • There was no “general purpose” DB service for the physics community at the time of LEP • Some applications certainly (OPAL online tape DB, …) • But these are legion at the time of LHC… • See hidden slides for some more info… • Moreover…
WLCG and Database Services • Many ‘middleware’ components require a database: • dCache – PostgreSQL (CNAF porting to Oracle?) • CASTOR / DPM / FTS* / LFC / VOMS – Oracle or MySQL • Some MySQL only: RB, R-GMA#, SFT# • Most of these fall into the ‘Critical’ or ‘High’ category at Tier0 • See definitions below; T0 = C/H, T1 = H/M, T2 = M/L • Implicit requirement for ‘high-ish service level’ • (to avoid using a phrase such as H/A…) • At this level, no current need beyond site-local+ services • Which may include RAC and / or DataGuard • [ TBD together with service provider ] • Expected at AA & VO levels • Footnotes: * gLite 1.4, end October; # Oracle version foreseen; + R/O copies of LHCb FC?
Those Questions again… • Should we buy or build database systems for our calibration and book-keeping needs? • It now seems to be accepted that we build our calibration & book-keeping systems on top of a database system. • Both commercial and open-source databases are supported. • Will database technology advance sufficiently in the next 8 to 10 years to be able to provide byte-level access to petabytes of SSC/LHC data? • We (HEP) have run production database services up to the PB level. The issues related to licensing, and – perhaps more importantly – support, to cover the full range of institutes participating in an LHC experiment, remain. • Risk analysis suggests a more cautious – and conservative – approach, such as that currently adopted. (Who today are the concrete alternatives to the market leader?)
If you want to know more… • Visit http://hepdb.blogspot.com/ • And also: • http://wwwasd.web.cern.ch/wwwasd/cernlib/rd45/ • http://wwwasd.web.cern.ch/wwwasd/lhc++/indexold.html • http://hep-proj-database.web.cern.ch/hep-proj-database/