Status of the BaBar Databases
Jacek Becla, BaBar Database Group
BaBar Is in Production
• Run 1: May 1999 – Oct 2000
  • ~24.2 fb⁻¹ (~1.3 fb⁻¹ per month)
• Run 2: Feb 2001 – July 2002
  • up to 12.6 fb⁻¹ so far (~2.5 fb⁻¹ per month)
  • expect ~100 fb⁻¹ by July 2002
• Already well over design luminosity
Changes
• 4 → 21 streams
  • >5 times more files and locks
  • no data duplication (streams are not self-contained)
• Smaller files: 2 → 0.5 GB and 10 → 2 GB
• Using Objy 6.1; read-only dbs
• Clustering hint server and conditions OID server
• Migrating production to Linux (now)
• Introducing multi-fds (now)
• Can no longer afford a large test-bed
OPR
• In general keeps up with the data
  • ~150 pb⁻¹ per day
  • faster than at the end of Run 1, in spite of 5x the load
  • will have to deal with 300 pb⁻¹ per day soon
Current OPR Configuration
• Hardware
  • 6 4-CPU data servers; lock, journal, and catalog servers; clustering hint server plus conditions OID server
  • 220 clients
• Software
  • Objy 6.1 on Solaris 7
  • about to migrate to Linux
OPR – Short-Term Future
• Use multi-fds
  • 2 event store fds, 1 conditions fd
  • 6 + 6 data servers
  • new federation approximately every week
• Migrate clients to Linux
  • ~2.2x faster CPUs, more memory
• Use a faster machine for the lock servers
  • now: Sun Netra T1, 440 MHz
  • planned: Sun Blade 1000, 750 MHz UltraSPARC-III
• Discussions about storing all digis in Objy and reprocessing from Objy rather than from xtc
REPRO
• Hardware configuration similar to OPR
• Occasionally up to 3 repro farms (150 + 150 + 200 nodes)
  • over 300 pb⁻¹ on a good day
  • conditions merging is a nightmare
REPRO – Near Future
• Use multi-fds
  • 2 event store fds, 1 conditions fd
  • 5 + 5 data servers
  • new federation ~every other week
  • same (slow) lock servers
• Move to Linux
• Run in Italy; timescale ~mid-2002
Robustness
• Db creation (a weak point) removed from the clients
  • precreation in the background by the CHS, automatic recovery (retry sketch below), new C++ API in 6.0
• AMS crash
  • 3/4 of the farm continues, unless it is the "default" AMS (the one used by the CHS)
• CHS – a new central point of failure
  • entirely in our hands; very stable so far
• One event store fd down (e.g. lock server crash)
  • the second should finish processing the current run
• Cleanup server – being worked on
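The "automatic recovery" above is essentially retry-with-fallback on the client side. A minimal sketch of that pattern, where requestHint() and Hint are hypothetical stand-ins for the real CHS client interface:

```cpp
// Minimal sketch of client-side automatic recovery when requesting a
// precreated db/container from the CHS. requestHint() and Hint are
// hypothetical stand-ins, not the actual BaBar CHS API.
#include <chrono>
#include <optional>
#include <thread>

struct Hint { unsigned long db, cont; };   // illustrative pieces of an OID

// Stub for the sketch: the real call would go over CORBA and may fail.
std::optional<Hint> requestHint() { return Hint{1, 1}; }

std::optional<Hint> requestHintWithRetry(int attempts = 3)
{
    for (int i = 0; i < attempts; ++i) {
        if (auto h = requestHint())
            return h;                                   // success
        std::this_thread::sleep_for(std::chrono::seconds(i + 1)); // backoff
    }
    return std::nullopt;   // caller falls back (e.g. creates the db itself)
}
```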
Analysis
• 200 CPUs (~Sun Netra T1 class)
• 17 servers, 24 TB of disk cache
• On-demand staging turned off
• Read-only dbs
  • starting to see the effect now
• Disk space – always a problem
  • micro – 5.4 KB/event (aod, col, tag, evt, evshdr)
  • mini – 4.7 KB/event (esd)
Analysis – cont.
• Veritas File System reconfiguration
  • direct I/O instead of buffered I/O (see the sketch after this list)
  • more than doubles the effective data rate
• Lock server memory leak
  • grows to 600 MB within a week
  • we switch lock servers every week
• Kanga (ROOT-based) will become deprecated
  • recent computing model: enhance Objy, deprecate Kanga (freeze by mid-2002, produce files until late 2002)
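To illustrate the direct-I/O switch, here is a hedged sketch that requests direct I/O per file via the VxFS caching-advisory ioctl (VX_SETCACHE/VX_DIRECT from the vxfsio interface). The mechanism is an assumption: the production reconfiguration may equally have been done at mount time with options such as mincache=direct,convosync=direct.

```cpp
// Hedged sketch: ask VxFS to bypass the buffer cache for one file.
// VX_SETCACHE / VX_DIRECT come from the VxFS ioctl interface (vxfsio);
// the header path and availability are platform-specific assumptions.
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/fs/vx_ioctl.h>   // VxFS-only header
#include <cstdio>

int openDirect(const char* path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return -1; }
    // Reads now go straight from disk into the caller's (suitably aligned)
    // buffer instead of being staged through the page cache.
    if (ioctl(fd, VX_SETCACHE, VX_DIRECT) < 0)
        perror("ioctl(VX_SETCACHE)");  // not fatal: falls back to buffered I/O
    return fd;
}
```

Skipping the extra copy through the page cache is what makes the large sequential reads of analysis jobs so much faster.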
AMS
• Known (but not yet fixed) problem
  • a file used immediately after being closed crashes the AMS (in 6.1 it kills the client instead)
• Ported to Linux
  • no performance figures yet
• New feature – compression
• Redesigning the front-end part
  • got the OK from Objectivity
A Word on Conditions
• Using the OID server to find the time interval (see the sketch after this list)
  • only in REPRO so far; about to put it into OPR
• Staircase problem
  • caused by an incorrect design
  • purging every 2 weeks, ~15 min per rolling calibration (35 in total), run in parallel
• Finalize problem
  • based on a genealogy object (all objects named); result of iterating in unpredictable order; just slow
• Conditions merging problem
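Conceptually, the OID-server lookup maps a timestamp to the conditions object whose validity interval contains it. A minimal sketch of such an interval lookup; Oid and ConditionsIndex are illustrative names, not the actual BaBar classes:

```cpp
// Minimal sketch of a validity-interval lookup, the way a conditions OID
// server might resolve "which object is valid at time t".
#include <cstdint>
#include <map>
#include <optional>
#include <utility>

using Oid  = std::uint64_t;   // stand-in for an Objectivity object ID
using Time = std::uint64_t;   // validity time

class ConditionsIndex {
public:
    // Register an object valid on the half-open interval [begin, end).
    void add(Time begin, Time end, Oid oid) { index_[begin] = {end, oid}; }

    // Return the OID whose interval contains t, in O(log n).
    std::optional<Oid> find(Time t) const {
        auto it = index_.upper_bound(t);   // first interval starting after t
        if (it == index_.begin()) return std::nullopt;
        --it;                              // last interval starting at or before t
        if (t < it->second.first) return it->second.second;
        return std::nullopt;               // t falls in a gap
    }
private:
    std::map<Time, std::pair<Time, Oid>> index_;  // begin -> (end, oid)
};
```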
Conditions – cont.
• Index problem
  • occasionally an index becomes inconsistent (does not return all objects in a given range)
  • solution – rebuild; happens ~once every 2 months; not reported to Objectivity yet
• Index scaling
  • range queries (the way we use them) do not scale: response time is linear in the number of entries (100K entries → 0.5 s); see the sketch after this list for the expected behaviour
• Will extend the OID server
  • currently read-only access
• Will redesign and re-implement conditions
  • and address all of the problems; timescale: end of 2001
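For contrast with the linear behaviour above: over a sorted index a range query can locate its window by binary search and then pay only for the hits it returns. A hedged sketch of that expected O(log n + k) behaviour (illustrative only, not the Objectivity index implementation):

```cpp
// Sketch of a range query that scales as O(log n + k) over a sorted
// index: the two bounds are found by binary search, then only the k
// matching entries are visited.
#include <cstdint>
#include <map>
#include <vector>

using Oid  = std::uint64_t;
using Time = std::uint64_t;

std::vector<Oid> rangeQuery(const std::multimap<Time, Oid>& index,
                            Time lo, Time hi)
{
    std::vector<Oid> hits;
    const auto end = index.upper_bound(hi);                 // O(log n)
    for (auto it = index.lower_bound(lo); it != end; ++it)  // k iterations
        hits.push_back(it->second);
    return hits;
}
```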
Data Distribution
• Micro-level data mirrored at in2p3
  • Run 2 – mirror raw data as well
• Current tools do not scale with the increased data volume
  • a lot of manual work
• Will try data-grid-based tools soon
Operations
• 2 DBAs, with a 3rd coming soon
• Many manual tasks are slowly being automated
Some Numbers
• Total size of data – 300+ TB
• Number of files – 128K
• Number of users in analysis – ~220
• 10 active production federations
  • including 5 analysis fds
• Conditions dbs – 12 GB
4 → 20 Streams Was Non-trivial
• 4 streams: ~60 Hz on 100 nodes, ~115 Hz on 200 nodes
• 20 streams with duplication: 8 Hz on a 160-node run
Clustering Hint Server
• CORBA-based, multithreaded
• Precreates dbs and containers in the background and distributes OIDs to clients
• Many other features:
  • containers reused
  • full integration with HPSS (precreated files pinned in the cache, full dbs immediately migrated)
  • file disparsification
  • file transfer to tape: 1 MB/s → 15-25 MB/s now
  • db creation done locally, with pre-sizing
  • no container extensions on the client side
  • round-robin load balancing (see the sketch after this list)
  • automatic recovery, and so on
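Round-robin load balancing here just means handing each new placement request to the next data server in turn. A minimal sketch; class and member names are illustrative, not the BaBar CHS code:

```cpp
// Minimal sketch of round-robin assignment of requests to data servers,
// as a clustering hint server might place precreated dbs/containers.
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class RoundRobin {
public:
    explicit RoundRobin(std::vector<std::string> servers)
        : servers_(std::move(servers)) {}

    // Hand out servers in rotation so the load spreads evenly.
    const std::string& nextServer() {
        const std::string& s = servers_[next_];
        next_ = (next_ + 1) % servers_.size();
        return s;
    }
private:
    std::vector<std::string> servers_;
    std::size_t next_ = 0;
};
```

Since the real CHS is multithreaded, the rotating counter would additionally need a mutex or an atomic.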
Others
• commitAndHold
  • significant reduction in lock traffic
• Initial transaction for conditions
  • one transaction instead of 50
• Cache authorization rather than checking on every event (see the sketch after this list)
• Tuned the client file descriptor limit
  • hit the 8K limit on the AMS side; reduced the client fd limit 196 → 32; AMS response improved and AMS CPU usage decreased
• Increased transaction granularity
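The authorization cache amounts to memoizing the expensive check so it runs once per user instead of once per event. A hedged sketch with a stubbed-in check; checkWithServer() is a hypothetical stand-in, not the real BaBar call:

```cpp
// Hedged sketch of caching an authorization decision instead of
// re-checking on every event.
#include <map>
#include <string>

static bool checkWithServer(const std::string& user)
{
    return !user.empty();   // stub: real code would query the federation
}

class AuthCache {
public:
    // First call per user pays for the real check; later calls are cheap
    // map lookups, so the per-event cost disappears.
    bool isAuthorized(const std::string& user) {
        auto it = cache_.find(user);
        if (it != cache_.end()) return it->second;
        const bool ok = checkWithServer(user);
        cache_.emplace(user, ok);
        return ok;
    }
private:
    std::map<std::string, bool> cache_;
};
```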
Bottlenecks
• Lock server
  • first signs of saturation at ~200 nodes
  • use a faster CPU
  • use Objy 7 (33% lock traffic reduction), scheduled for October 2001
  • more event store fds per farm
• CPU on the data servers
  • buying more is expensive
  • improve the AMS, reduce the event size
Miscellaneous
• 64 K pages?
  • unfortunately not working with multi-fds
• Maybe precreate/purge dbs only in between runs?
• David is stepping down as head of the BaBar DB group
Future Looks Bright
• Lock server bottleneck
  • multi-fds – we can always add one more event store fd
  • Objy 7 will feature a faster lock server
  • CPUs are getting faster
• Data server CPU saturation
  • AMS redesign should help
  • event (rec) size is being reduced by ~10% now; looking for more
  • we can always buy more servers
Summary
• No serious problems
  • conditions need to be redesigned
• OPR will likely keep up
• Working in the BaBar DB group is fun!