BaBar Tier A @ CC-IN2P3. Jean-Yves Nief, CC-IN2P3, Lyon. HEPiX-HEPNT, Fermilab, October 22nd – 25th, 2002.
Talk’s outline • Overview of BaBar: motivation for a Tier A. • Hardware available for the CC-IN2P3 Tier A (servers, storage, batch workers, network). • Software issues (maintenance, data import). • Resource usage (CPU used…). • Problems encountered (hardware, software). • BaBar-Grid and future developments.
BaBar: a short overview • Study of CP violation using B mesons; the experiment is located at SLAC. • Since 1999, more than 88 million B-B events collected; ~660 TB of data stored (real data + simulation). How is it handled? • Object-oriented techniques: C++ software and an OO database system (Objectivity). • For data analysis @ SLAC: 445 batch workers (500 CPUs), 127 Objy servers + ~50 TB of disk + HPSS. • But: heavy user demand (> 500 physicists) => saturation of the system; collaborators spread world-wide (America, Europe). • Idea: creation of mirror sites where data analysis / simulation production can be done.
CC-IN2P3 Tier A: hardware (I) • 19 Objectivity servers: Sun machines. - 8 Sun Netra 1405T (4 CPUs). - 2 Sun 4500 (4 CPUs). - 1 Sun 1450 (4 CPUs). - 8 Sun 250 (2 CPUs). • 9 servers for data access for analysis jobs. • 2 database catalog servers. • 6 servers for database transaction handling. • 1 server for Monte-Carlo production. • 1 server for data import/export. • 20 TB of disk.
Hardware (II): Storage system • Mass storage system: 20% available on disk => automatic staging required. • Storage for private use: • Temporary storage: 200 GB of NFS space. • Permanent storage: - For small files (log files…): Elliot archiving system. - For large files (ntuples…) > 20 GB: HPSS (2% of the total occupancy). • > 100 TB in HPSS.
Hardware (III): the network • Massive data import from SLAC (~80 TB in one year). • Data needs to be available in Lyon within a short time (max: 24 - 48 hours). • Large bandwidth between SLAC and IN2P3 required. • 2 routes: • CC-IN2P3 -> Renater -> US: 100 Mb/s. • CC-IN2P3 -> CERN -> US: 155 Mb/s (until this week). • CC-IN2P3 -> Geant -> US: 1 Gb/s (from now on). • Full potential never reached (reason not understood).
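A quick back-of-the-envelope check (plain arithmetic on the figures above, not from the original slides) of the average sustained rate implied by importing ~80 TB per year, compared with the quoted link capacities:

```python
# Back-of-the-envelope: sustained rate needed to import ~80 TB/year,
# versus the nominal link capacities quoted above (illustrative only).

TB = 1e12                       # bytes (decimal terabyte)
data_per_year_bytes = 80 * TB
seconds_per_year = 365 * 24 * 3600

avg_rate_mbps = data_per_year_bytes * 8 / seconds_per_year / 1e6
print(f"Average sustained rate: {avg_rate_mbps:.1f} Mb/s")   # ~20 Mb/s

for name, capacity_mbps in [("Renater", 100), ("CERN link", 155), ("Geant", 1000)]:
    print(f"{name}: {avg_rate_mbps / capacity_mbps:.0%} of nominal capacity")
```

Even the 100 Mb/s route would suffice on average; the real constraint is the 24 - 48 hour turnaround and the erratic throughput mentioned later.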
Hardware (IV): the batch and interactive farm • The batch farm (shared): • 20 Sun Ultra 60 dual-processor. • 96 Linux PIII-750 MHz dual-processor, NetFinity 4000R. • 96 Linux PIII-1 GHz dual-processor, IBM X-series. • => 424 CPUs. • The interactive farm (shared): • 4 Sun machines. • 12 Linux machines.
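For reference, a trivial consistency check of the 424-CPU total quoted above (all batch nodes are dual-processor):

```python
# CPU count of the shared batch farm quoted above: all nodes are dual-processor.
sun_ultra60, netfinity, xseries = 20, 96, 96
total_cpus = 2 * (sun_ultra60 + netfinity + xseries)
print(total_cpus)   # 424
```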
Software (I): BaBar releases, Objectivity • BaBar releases: • Need to keep up with the evolution of the BaBar software at SLAC: new BaBar software releases have to be installed as soon as they are available. • Objectivity and related issues: • Development of tools: • To monitor the servers' activity, HPSS and batch resources. • To watch the Objectivity processes on the servers ("sick" daemons, transaction locks…); a minimal watchdog sketch follows below. • Maintenance: software upgrades, load balancing of the servers. • Debugging Objectivity problems on both the client and server side.
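The slide mentions home-grown tools that watch the Objectivity processes for "sick" daemons and stuck transaction locks. Below is a minimal sketch of that idea; the daemon names and the simple presence check are illustrative assumptions, not the actual CC-IN2P3 tools.

```python
#!/usr/bin/env python
# Minimal watchdog sketch: report Objectivity-related daemon processes that are
# missing on a server. The process names below are assumptions for illustration;
# the real CC-IN2P3 tools also tracked stuck transactions and lock problems.

import subprocess

WATCHED_DAEMONS = ["ooams", "oolockserver"]   # assumed Objectivity daemon names

def running_processes():
    """Return the command names reported by `ps` (Unix)."""
    out = subprocess.run(["ps", "-eo", "comm"], capture_output=True, text=True)
    return out.stdout.splitlines()

def check_daemons():
    procs = running_processes()
    for daemon in WATCHED_DAEMONS:
        if any(daemon in p for p in procs):
            print(f"OK: {daemon} is up")
        else:
            print(f"ALERT: {daemon} not running on this server")

if __name__ == "__main__":
    check_daemons()
```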
Software (II): data import mechanism • Two transfer routes: (1) SLAC -> CERN -> IN2P3, (2) SLAC -> Renater -> IN2P3. • Data catalog available to users through a MySQL database. • Average database size: ~500 MB. • Multi-stream transfers (bbftp: designed for big files). • Extraction at SLAC when new or updated databases are available. • Import in Lyon launched when the extraction @ SLAC is finished. (A sketch of this import loop follows below.)
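A rough sketch of the import loop described above: query the MySQL-based catalog for new or updated database files, then fetch each one with a multi-stream bbftp transfer. The catalog schema, host names and the exact bbftp invocation are assumptions for illustration; the real SLAC/IN2P3 scripts are not shown on the slide.

```python
# Sketch of the data-import loop described above (illustrative only).
# The catalog schema and the bbftp command line are assumptions; bbftp is the
# multi-stream transfer tool named on the slide, but flags may differ per site.

import subprocess
import MySQLdb   # assumed MySQL client library for the catalog

def new_or_updated_dbs(conn):
    """Return database files flagged as new/updated in the (hypothetical) catalog."""
    cur = conn.cursor()
    cur.execute("SELECT path FROM db_catalog WHERE status = 'ready_for_import'")
    return [row[0] for row in cur.fetchall()]

def import_db(remote_path, streams=10):
    """Fetch one ~500 MB database file with a multi-stream bbftp transfer."""
    cmd = ["bbftp", "-u", "babar",
           "-p", str(streams),              # assumed flag: number of parallel streams
           "-e", f"get {remote_path}",
           "slac-export-host.example"]      # hypothetical remote host
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    conn = MySQLdb.connect(host="catalog-host.example", db="babar_import")  # hypothetical
    for path in new_or_updated_dbs(conn):
        import_db(path)
```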
Resource usage (I) • Tier A officially opened last fall. • ~200 - 250 analysis jobs running in parallel (the batch system can handle up to 600 parallel jobs). • ~60 - 70 MC production jobs running in parallel; ~50 million events already produced in Lyon, now representing ~10-15% of the total weekly BaBar MC production. • ~1/3 of the running jobs are BaBar jobs. • Up to 4500 jobs in the queue during the busiest periods.
Resource usage (II) • [CPU usage plot not reproduced here (*)] (*) 1 unit = 1/8 hour on a 1 GHz PIII. • BaBar: top CPU-consuming group over the last 4 months at IN2P3. • Second-largest CPU consumer since the beginning of the year. • MC production represents 25 - 30% of the total CPU time used; ~25 - 30% of the analysis CPU is used by remote users.
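For reference, a small helper showing how the accounting unit in the footnote (1 unit = 1/8 hour on a 1 GHz PIII) converts to CPU hours; the example value is arbitrary.

```python
# Accounting-unit conversion from the footnote: 1 unit = 1/8 hour on a PIII 1 GHz.

HOURS_PER_UNIT = 1.0 / 8.0   # normalised to a 1 GHz PIII

def units_to_cpu_hours(units):
    """Convert accounting units to PIII-1GHz CPU hours."""
    return units * HOURS_PER_UNIT

# Example (arbitrary figure): 1 000 000 units = 125 000 PIII-1GHz CPU hours.
print(units_to_cpu_hours(1_000_000))
```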
Resource usage (III) • 20% of the data on disk => dynamic staging via HPSS (RFIO interface). • ~80 s per staging request. • Up to 3000 staging requests possible per day. • Not a limitation for CPU efficiency (see the arithmetic below). • Needs less disk space, which saves money.
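A quick check (arithmetic on the numbers above, not from the original slides) of why the ~80 s staging latency is not a CPU-efficiency limit, even at the 3000-requests-per-day ceiling:

```python
# Why ~80 s per staging request is not a CPU-efficiency problem (illustrative arithmetic).

requests_per_day = 3000
seconds_per_request = 80
parallel_jobs = 200            # lower end of the 200 - 250 jobs quoted earlier

total_wait_hours = requests_per_day * seconds_per_request / 3600
wait_per_job_minutes = requests_per_day * seconds_per_request / parallel_jobs / 60

print(f"Cumulative staging wait: {total_wait_hours:.1f} h/day")       # ~66.7 h
print(f"Average wait per job:    {wait_per_job_minutes:.0f} min/day") # ~20 min out of 24 h
```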
Problems encountered • A few problems with the availability of data in Lyon due to the complexity of the export/import procedure. • Network bandwidth for data import a bit erratic; the maximum was never reached. • Objectivity-related bugs (most of them due to Objy server problems). • Some HPSS outages, system overloaded (software-related + hardware limitations): solved, better performance now. • During peak activity (e.g. before the summer conference), huge backlog on the batch system.
The Tier A and the outer world: BaBar Grid @ IN2P3 • BaBar is committed to using Grid technologies. • Storage Resource Broker (SRB) and MetaCatalog (MCAT) software installed and tested @ IN2P3: • Allows access to data sets and resources based on their attributes rather than their physical locations. • The future for data distribution between SLAC and IN2P3. • Tests @ IN2P3 of the EDG software using BaBar analysis applications: possible to remotely submit a job @ IN2P3 to RAL and SLAC. • Prototype of a tool for remote job submission: December 2002.
CC-IN2P3 Tier A: future developments • 2 new Objy servers + new disks (near future): • 1 allocated to MC production; goal: double the MC production. • Fewer staging requests to HPSS. • 72 new Linux batch workers (PIII, 1.4 GHz) => CPU power increased by 50% (shared with others). • Compression of the databases on disk (client- or server-side decompression on the fly) => HPSS load decreased (a sketch of the idea follows below). • Installation of a dynamic load-balancing system on the Objy servers => more efficient (next year).
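The compression item above is only mentioned on the slide. A minimal illustration of the idea, using gzip as a stand-in codec (the actual Objectivity/HPSS integration is not described), could look like this:

```python
# Sketch of on-disk database compression with on-the-fly decompression.
# gzip is used here as a stand-in codec; the real Objectivity/HPSS integration
# is not described on the slide.

import gzip
import shutil

def compress_db(path):
    """Write a compressed copy of a database file (path -> path + '.gz')."""
    with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
        shutil.copyfileobj(src, dst)

def open_db_readonly(path):
    """Transparently serve either the plain or the compressed copy."""
    try:
        return open(path, "rb")
    except FileNotFoundError:
        return gzip.open(path + ".gz", "rb")   # decompressed on the fly while reading
```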
Conclusion • BaBar Tier A in Lyon running at full steam. • ~25 - 30% of the CPU consumed by analysis jobs is used by remote users. • Significant resources at CC-IN2P3 dedicated to BaBar (CPU: 2nd-biggest user this year; HPSS: top staging requester). • Contribution to the overall BaBar effort increasing thanks to: • New Objy servers and disk space. • New batch workers (72 new Linux this year, ~200 next year). • New HPSS tape drives. • Database compression and dynamic load balancing of the servers.