BABAR Operational Experience with Objectivity ODBMS David R. Quarrie Lawrence Berkeley National Laboratory for BABAR Experiment DRQuarrie@LBL.GOV
Database Goals • Provide storage and access for event data • Event store • Provide storage and access for detector conditions data • Environmental conditions that vary with time • Conditions & Ambient databases • Configuration Management • Keyed access to unique configurations • Trigger • Detector setpoints • Configuration Database • Not production management • Handle distribution and access across whole collaboration • Wide area as well as local area access David R. Quarrie: BaBar Operational Experience with Objectivity ODBMS
Experiment Characteristics
Performance Requirements • Online Prompt Reconstruction • Baseline of 200 processing nodes • 100 Hz total (physics plus backgrounds) • 30 Hz of Hadronic Physics • Fully reconstructed • 70 Hz of backgrounds, calibration physics • Not necessarily fully reconstructed • Physics Analysis • DST Creation • 2 users at 10^9 events in 10^6 secs (1 month) • DST Analysis • 20 users at 10^8 events in 10^6 secs • Interactive Analysis • 100 users at 100 events/sec
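The rates implied by these requirements can be worked out directly. The sketch below is illustrative arithmetic only: it reads the slide's figures as powers of ten (10^9 and 10^8 events in 10^6 seconds), and the function names are made up for this example.

```python
# Illustrative arithmetic for the requirements above.
# Function names are hypothetical; the figures are the slide's, read as powers of ten.

def per_node_budget_s(total_hz, nodes):
    """Seconds each node may spend per event while the farm keeps up with total_hz."""
    return nodes / total_hz

def aggregate_rate_hz(users, events_per_user, seconds):
    """Combined event rate the event store must serve to all users."""
    return users * events_per_user / seconds

# Online Prompt Reconstruction: 100 Hz shared across 200 nodes
opr_budget = per_node_budget_s(total_hz=100, nodes=200)

# Physics analysis: aggregate read rates in events/sec
dst_creation = aggregate_rate_hz(users=2, events_per_user=10**9, seconds=10**6)
dst_analysis = aggregate_rate_hz(users=20, events_per_user=10**8, seconds=10**6)
interactive = 100 * 100  # 100 users at 100 events/sec each

print(opr_budget, dst_creation, dst_analysis, interactive)
```

Under this reading, each reconstruction node has about 2 seconds per event, and both DST workloads require the same aggregate rate of roughly 2000 events/sec, while interactive analysis requires about 10000 events/sec.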
Functionality Summary • Basic design/functionality ok • No performance or scaling problems with conditions, ambient and configuration databases • Security and data protection APIs added • Internal to a federation • Access to different federations • Problems • Significant performance/scaling problems with event store • Online Prompt Reconstruction • Physics Analysis • Data Distribution problems • Internal within SLAC • External to/from remote Institutions • Focus of the remainder of the talk
Computing Review 2-4 Aug 1999 • Identified database performance as major technical concern • Recommended database reviews in Feb and Aug 2000 • Recommended development of limited-function short-term non-Objy solution for micro-DST analysis • Recommended setting up of a dedicated Objectivity testbed in order to perform detailed scaling and performance tests
Production Federations • Two groups • Physics • Online • Analysis • Reprocessing • Simulation • Generation • Analysis • Reprocessing • Motivations • Minimization of interference (particularly with online) • Increase the available number of databases • Operational experience caused the Online to be split • IR2 • OPR
SLAC Design Hardware Configuration
SLAC Configuration at time of Review [Diagram not reproduced; six items in it were marked with an X]
Testbed Hardware Configuration • Testbed hardware available from about 7th August • Two datamovers (450) • 100+ bronco clients (Ultra-5) • Conditions & catalog servers (250) • Journal servers (250) • Lock servers • Two sets of tests • Online Prompt Reconstruction (OPR) • Physics Analysis • Initial tests have focussed on OPR • Already well instrumented • Expect any performance improvements to apply to analysis as well • Dedicated analysis performance tests later
Baseline Configuration • We baselined the testbed against the production system to ensure that we started off with the same performance • Turned off filtering • All input events are being fully reconstructed • Easier to understand event rate • Will turn it back on again later on in the testing • Some of the tests are preliminary and we need to go back & redo them • Don’t fully understand all the numbers yet • The tests are still underway • Numbers are not final
Baseline Results at time of Review [Plot not reproduced; annotations: asymptotic limit, production set point]
Knobs to twiddle (tests so far) • Minimize catalog operations • Separate conditions DB server • Separate catalog server • Tune AMS server • Client file descriptors • Client cache sizes • Initial container sizes • Transaction lengths • TCP configuration • Multiple AMS processes • Database clustering • Autonomous partitions • Disable filters • Singleton Federations • Veritas Filesystem optimization • Decrease payload per event • LM starvation? • Loadbalance across datamovers • More datamovers • Database pre-creation • Gigabit lockserver • Caching handles • Local bootfile • Unlock instead of mini-transaction • Run OPR with no output • Run on shire to bypass AMS
Results so far [Plot not reproduced; annotation: 4 datamovers]
Significant Items • Minimize Catalog operations • e.g. Named containers • Linkable AMS server slow (~3-4 Mbytes/sec) • Not the normal AMS - the special one allowing migration/staging • Inefficiency in handling 16k file descriptors • Located in Objy code • First improvement by Andy Hanushevsky • Probably more improvements to come • Extending containers is expensive • During persistent object creation • Contrary to advice from Objy engineer • For a single process it’s low overhead • Causes locking • Presize to 50% of average final size
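The effect of presizing containers can be illustrated with a toy model. This is a sketch under stated assumptions, not Objectivity's actual allocation policy: it assumes a container grows by a fixed percentage of its current size each time it fills, and that every such extension takes a lock. The function name, the 1000-page final size and the 10% growth step are all hypothetical.

```python
# Toy model (assumed growth policy, not Objectivity's documented behaviour):
# count how many lock-taking extensions a container needs to reach its final size.

def count_extensions(initial_pages, final_pages, growth_pct):
    """Number of extensions needed to grow from initial_pages to at least final_pages."""
    size = initial_pages
    extensions = 0
    while size < final_pages:
        # Grow by growth_pct of the current size, at least one page per extension.
        size += max(1, size * growth_pct // 100)
        extensions += 1
    return extensions

FINAL_PAGES = 1000  # hypothetical average final container size

small_start = count_extensions(initial_pages=4, final_pages=FINAL_PAGES, growth_pct=10)
presized = count_extensions(initial_pages=FINAL_PAGES // 2, final_pages=FINAL_PAGES, growth_pct=10)

print("extensions from a small container:", small_start)
print("extensions when presized to 50%:", presized)
```

In this model, presizing to half the final size removes most of the extensions. With many processes writing concurrently, each avoided extension is also an avoided lock operation, which is where the cost reported above arises.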
Significant Items (2) • Database clusters • Grouping of nodes to databases • Reduce the number of processes accessing each database • Undocumented locking operation to extend containers • Multiple AMS processes per server • Currently single threaded • Definite improvement with 4 – we’ll try 8 • N.B. Most servers have 4 cpus • Won’t be necessary in 5.2 - the AMS is (finally) multi-threaded • Veritas filesystem configuration • Single-threaded tests show 40MB/sec read & write • Random-write tests (non-Objy) show 7MB/sec throughput • We’re seeing about this with 180 nodes • Work in progress on optimization • Managed 8 MB/sec • More datamovers
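The database clustering idea, grouping nodes so that each database is written by only a bounded number of processes, can be sketched as follows. The helper name, the node naming and the cluster size of 20 are assumptions made for illustration; the slides do not state how nodes were actually assigned.

```python
# Minimal sketch of grouping client nodes into clusters, one database per cluster.
# assign_clusters, the node names and the cluster size are hypothetical.

def assign_clusters(node_ids, nodes_per_cluster):
    """Map cluster index -> list of nodes that write to that cluster's database."""
    clusters = {}
    for position, node in enumerate(node_ids):
        clusters.setdefault(position // nodes_per_cluster, []).append(node)
    return clusters

nodes = [f"node{n:03d}" for n in range(180)]  # 180 nodes, as in the test above
clusters = assign_clusters(nodes, nodes_per_cluster=20)

# Without clustering, all 180 processes contend for the same database;
# with clustering, each database sees at most nodes_per_cluster writers.
print(len(clusters), max(len(members) for members in clusters.values()))
```

A smaller cluster size lowers contention per database at the cost of more open databases, so the choice trades lock contention against the total number of database files.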
Problem - Payload per event • Problem is our poor implementation, not Objectivity overhead • Work is underway to redesign/reimplement this
Future Prompt Reconstruction Tests • Reduce payload per event • Autonomous partitions • Slight hint of lock server saturation (cpu load) • Veritas filesystem optimization • TCP configuration • About to try 250 nodes • The bottom line: • We’ve met the design goals (with filtering re-enabled) • Still lots of possibilities for improvements
Physics Analysis • No quantitative tests yet • Expect that improvements shown by prompt reconstruction will also improve performance for physics analysis • Also expect to find and apply read-only optimizations • 3 “typical” jobs being set up • CPU bound • Medium cpu “skim” • Fast physics analysis • Testing about to start • Also using shire (E10000) as database server • Objy 5.2 (with SLAC extensions) will support dynamic load-balancing across multiple servers • 20MB/sec per server?
Data Distribution Issues • Internal to SLAC • Sweeps of data between production federations • Database id allocation scheme works well • HPSS catalog used as primary location • Shadowing of databases as well as copying • Bookkeeping is biggest outstanding problem • Getting better but a ways to go… • External to SLAC • Use of 10GB databases has caused major problems • Lots of unexpected infrastructure problems (perl, tcsh, etc.) • Bugs in size calculation have caused some nominally 2 GB databases to exceed this limit • Fix being installed into production now • Bandwidth of tools • File copies between computers at SLAC
Scaling Problems • Total number of database files • Being addressed by longRefs in future release • Avoids the current need for database files >2GB • These cause significant infrastructure problems • Timescale “6-9 months” • Number of nodes for parallel loading • We’re essentially there • In process of applying lessons from testbed to production • Administration tools operate slower • Still an issue • Update “starvation” • Administration problem since multiple read accesses prevent updates from being applied • MROW access expected to solve this
Reliability Problems • Lock collisions • Better understanding of lock management • Avoid leaving lock trails behind • Automatic cleanup at end of job • Automatic cleanup at regular intervals • Separate Online and OPR federations • Separated for reliability & OPR lock “firestorms” • Unable to provide full calibration feedback • Firestorms not in fact an interference between Online and OPR • Solved by Objy bug fix and lock optimization • New design allows closed loop calibration feedback with separate federations • We’re gaining operational experience in production • Earlier tests (e.g. MDC2) didn’t scale
Lack of automation problems • Goal was to achieve understanding and hence reliability using manual procedures, then install automatic procedures • Automatic procedures only work once we understand the issues and achieve reliable operation • Most of the underlying tools now in place • e.g. Sweeping of data from one federation to another • Still lack necessary bookkeeping • Automatic procedures and logging mechanisms (e.g. web pages) slowly being put into place • More personnel now available to work on this • Still a lot of learning to be done in this area
Risk Analysis - Alternatives to Objectivity? • Should we be looking into an alternative? • We have attempted to minimize direct dependency on Objectivity • Successful for reconstruction/analysis code • Not successful for infrastructure • Makefiles • Administration tools • Data distribution • MicroDST based on ROOT I/O • Takes advantage of Converters & Modules classes for Objy.
Objectivity Usage Statistics • >30 sites using Objectivity • USA, UK, France, Italy, Germany • ~650 licensees • People who have signed the license agreement • ~400 users • People who have created a test federation • >100 simultaneous users • Monitoring distributed oolockmon statistics • 60 developers • Have created or modified a persistent class • A wide range of expertise • 10-15 experts • 485 persistent classes
Conclusions • Basic design and technology ok • Serious performance/scaling problems at startup • Lots of learning about how to manage production environment • Dedicated testbed has demonstrated good results • Prompt Reconstruction now achieving design performance • Similar improvements in physics analysis expected • Not all these improvements have been fed back into production environments • Underway now • Is Objectivity suitable for use within HEP? • Yes • Is it the only solution? • No