
BaBar Storage at Lyon


Presentation Transcript


  1. BaBar Storage at Lyon. HEPIX and Mass Storage, SLAC, California, U.S.A., 8 October 1999. Rolf Rumler, John O’Neall, Philippe Gaillardon, Internal Group, IN2P3 Computing Center, Villeurbanne, France. URL: http://www.in2p3.fr/CC

  2. BABAR Experiment • High-energy-physics experiment, started at SLAC in July 1999 • The IN2P3 Computing Center is the “mirror” computing site for BaBar computing • We will receive a copy of (almost) all BaBar data • We will also produce simulated data, which will be stored here as well as sent to SLAC • The estimated data rate is on the order of 350 TB per year (a rough bandwidth estimate follows this slide) • SLAC has chosen HPSS to store these data; CCIN2P3 is following their example • Our initial goal is to do the same thing as SLAC for BABAR • File sizes are roughly 2 GB and larger
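
The 350 TB/year figure works out to a fairly modest sustained bandwidth; a quick back-of-the-envelope check (only the yearly volume is taken from the slide, the rest is plain arithmetic):

    # Rough average-bandwidth estimate for the quoted BaBar data volume.
    TB = 10**12                       # decimal terabyte, in bytes
    yearly_volume = 350 * TB          # figure quoted on the slide
    seconds_per_year = 365 * 24 * 3600

    avg_rate = yearly_volume / seconds_per_year
    print(f"Average ingest rate: {avg_rate / 10**6:.1f} MB/s")   # about 11 MB/s
    # Peaks (reprocessing, simulation import) will be much higher, which is
    # why later slides look at per-mover and per-tape throughput.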

  3. How it works (diagram) • Data flow between Objectivity and HPSS, with transfers carried over pftp • Components shown: ooss_Mig (migration), ooss_Pur (purge), ooss_Stage (staging), amshpss, and a file / file.lock pair used for control • Operations labelled in the diagram: C = Creation, L = Lecture (read), M = Migration, P = Purge, R(1)–R(3) = Recovery steps
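
The slide itself is only a diagram; as a rough, purely illustrative sketch of the lifecycle it depicts (the daemon names ooss_Mig, ooss_Pur, ooss_Stage, the pftp transport and the file/file.lock convention come from the slide, while the paths and the use of a local copy instead of a real pftp transfer are assumptions):

    # Hypothetical sketch of the Objectivity <-> HPSS lifecycle of slide 3.
    import os
    import shutil  # stand-in for the real pftp transfer

    DISK_POOL = "/objy/databases"   # hypothetical Objectivity server disk
    HPSS_POOL = "/hpss/babar"       # hypothetical HPSS name-space path

    def migrate(db_file: str) -> None:
        """ooss_Mig: copy a closed database file into HPSS (pftp in reality)."""
        shutil.copy(os.path.join(DISK_POOL, db_file),
                    os.path.join(HPSS_POOL, db_file))

    def purge(db_file: str) -> None:
        """ooss_Pur: free disk space once the HPSS copy exists, leaving a
        .lock file behind so readers know the data must be staged back."""
        open(os.path.join(DISK_POOL, db_file + ".lock"), "w").close()
        os.remove(os.path.join(DISK_POOL, db_file))

    def stage(db_file: str) -> None:
        """ooss_Stage: on a read miss, fetch the file back from HPSS and
        remove the lock (recovery steps R(1)-R(3) in the diagram)."""
        shutil.copy(os.path.join(HPSS_POOL, db_file),
                    os.path.join(DISK_POOL, db_file))
        os.remove(os.path.join(DISK_POOL, db_file + ".lock"))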

  4. HPSS Configuration • For the moment, BaBar only ==> same approach as SLAC • A single Storage Class in a single Class of Service (COS) • Tape only (Storagetek Redwoods; 9840 and MAGSTAR drives under study) • No mirroring • All access to data via pftp_client (a transfer sketch follows this slide) • Additional tools from SLAC (Andy Hanushevsky)
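
Since all data access goes through the parallel FTP client, a transfer script might look roughly like the following; only the name pftp_client comes from the slide, while the host, port, paths and the exact pput/setpwidth commands are assumptions about a typical HPSS installation:

    # Hedged sketch: storing a ~2 GB file into HPSS via the parallel FTP client.
    import subprocess

    PFTP_CLIENT = "pftp_client"       # client named on the slide
    HPSS_HOST = "hpsscore.in2p3.fr"   # hypothetical core-server host
    HPSS_PORT = "4021"                # hypothetical PFTP port

    def pftp_put(local_path: str, hpss_path: str, stripes: int = 4) -> None:
        """Drive an interactive pftp_client session from a script."""
        commands = "\n".join([
            f"setpwidth {stripes}",               # assumed: parallel stripe count
            f"pput {local_path} {hpss_path}",     # assumed: parallel put command
            "quit",
            "",
        ])
        subprocess.run([PFTP_CLIENT, HPSS_HOST, HPSS_PORT],
                       input=commands, text=True, check=True)

    # Example: pftp_put("/objy/db/run1234.db", "/hpss/babar/run1234.db")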

  5. Objectivity Configuration Summary • 1 SUN E4500 (4 CPUs) + 2 SUN A3500 arrays, about 1.1 TB in total, RAID 5 under Veritas VM/FS, holding the actual BaBar data • 1 SUN E4500 + 2 SUN A3500 as above, no data yet • 1 SUN E450 (4 CPUs) attached to IBM VSS disk space, about 400 GB RAID 5, with Veritas; tests starting next week • Intention: to have different Objy servers for different types of data

  6. Core Server

  7. HPSS Core Server • RS/6000 F50 • 4 CPUs, 1 GB memory • 2 x 4.5 GB mirrored system disks • 24 GB internal SSA disks for SFS (mirrored) • AIX 4.3.2 • Ethernet (control network) • DCE, Encina, SAMMI • OMI driver for Redwoods • Access to Storagetek ACL by ACSLS

  8. Mover Stations

  9. HPSS Movers • Preliminary configuration, while waiting for the choice of the best machine to use with Gigabit Ethernet; the BaBar usage profile is also still unknown • (Historical problem: we changed from ATM to high-speed Ethernet just as HPSS was arriving) • RS/6000 390, replacement under study (43P260?) • 1 CPU, 256 MB memory • 2 x 4.5 GB mirrored system disks • AIX 4.3.2 • Ethernet control network, Fast Ethernet data network

  10. Storagetek 4400 Silos (6)

  11. Performance • Reminder: temporary mover/network configuration • Performance limited by: • Fast Ethernet data path (100 Mbps ==> < 8 MB/sec) • Mover CPUs: ~50 % occupied • Single-transfer rate: ~ 5 MB/sec per tape • Overall rate per tape is lower because of cartridge mount and positioning time, ~ 3.5 MB/sec • Aggregate maximum transfer rate: > 16 MB/sec (write), ~ 3 MB/sec (read)
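
A quick consistency check on the figures above (all numbers are taken from the slide; the remark about parallel drives is an inference, since the drive count is not given):

    # Back-of-the-envelope checks on the quoted performance figures.
    fast_ethernet_bps = 100e6                  # Fast Ethernet, bits per second
    wire_limit = fast_ethernet_bps / 8 / 1e6   # 12.5 MB/s raw on one mover link
    print(f"Raw Fast Ethernet limit per mover: {wire_limit:.1f} MB/s")
    # With TCP/IP and pftp overhead the slide quotes < 8 MB/s of usable payload.

    per_tape_stream = 5.0    # MB/s once the tape is mounted and positioned
    per_tape_global = 3.5    # MB/s averaged over mount and positioning time
    print(f"Mount/positioning overhead: "
          f"{(1 - per_tape_global / per_tape_stream) * 100:.0f}% of throughput")
    # The aggregate write rate (> 16 MB/s) therefore implies several tape
    # drives streaming in parallel across several movers.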

  12. Errors during 2nd test (5 days)

  13. Particular problem: Tape errors • HPSS and Redwood cartridges, at least with our test usage pattern, do not seem to work well together, especially for random reading of ~ 2-GB files. • Redwoods need regular maintenance (every 100 hours of use or less) ==> maintenance needs to be scheduled; we need statistics from the controllers (a bookkeeping sketch follows this slide). • Need effective maintenance from Storagetek. • Need tools to monitor volume and drive errors. • Need HPSS to react automatically to volume and drive errors. (Examples: when a cartridge cannot be dismounted, HPSS keeps retrying indefinitely; drive errors during writing can turn a drive into a “black hole”.)
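
No such monitoring tool existed at the time; the following is a purely illustrative sketch of the bookkeeping the slide asks for (only the ~100-hour maintenance interval is taken from the slide; the names and the error threshold are hypothetical):

    # Hypothetical tracking of drive hours and errors for maintenance planning.
    from dataclasses import dataclass

    MAINTENANCE_HOURS = 100    # Redwood maintenance interval from the slide
    ERROR_THRESHOLD = 5        # hypothetical: flag a drive after 5 errors

    @dataclass
    class DriveStats:
        hours_used: float = 0.0
        errors: int = 0

    drives: dict[str, DriveStats] = {}

    def record_transfer(drive_id: str, hours: float, had_error: bool) -> None:
        stats = drives.setdefault(drive_id, DriveStats())
        stats.hours_used += hours
        stats.errors += int(had_error)

    def drives_needing_attention() -> list[str]:
        """Drives due for scheduled maintenance or showing repeated errors."""
        return [d for d, s in drives.items()
                if s.hours_used >= MAINTENANCE_HOURS
                or s.errors >= ERROR_THRESHOLD]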

  14. The good(?) news • Storagetek is taking our problems seriously • It has adopted several measures to “minimize our dissatisfaction” (through the end of 1999): • Maintenance presence > 1 hour/day • Checking our cartridges against known-bad batches • A problem case opened at “PINNACLE” (maximum) severity to track our issues • A procedure to follow up on all tapes and drives sent to Storagetek for analysis or repair • A permanent spare SD-3 drive at IN2P3 + replacement priority • Daily log analysis, to monitor errors and report them back to us • Goal: anticipate bad volumes or drives and replace them before they fail

  15. Other problem: HPSS manageability • SAMMI is not sufficient for our needs. • We need to receive a user-configurable subset of the “alarms and events” messages in a script, which can then take the appropriate actions (an illustrative sketch follows this slide). • The “appropriate actions” require that appropriate commands be available in command-line form: • lock a volume or device; • forward a message via e-mail, Patrol, beeper or other means. • Many messages are insufficiently precise, or necessary information is missing.
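
No such interface existed in HPSS at the time; the following is only an illustration of the kind of alarm-filtering script the slide describes (the message format, the keywords, the volume-locking placeholder and the mail notification are all assumptions):

    # Illustrative alarm filter of the kind slide 15 asks for.
    import subprocess
    import sys

    WATCHED_KEYWORDS = {"tape error", "dismount failed", "mover down"}

    def notify(text: str) -> None:
        """Forward an alarm by e-mail (Patrol or a beeper gateway would be
        alternatives, as listed on the slide)."""
        subprocess.run(["mail", "-s", "HPSS alarm", "hpss-admin@example.fr"],
                       input=text, text=True, check=False)

    def lock_volume(volume: str) -> None:
        """Placeholder for the 'lock a volume' command the slide says is
        missing from HPSS; here it only logs what it would do."""
        print(f"would lock volume {volume}")

    for line in sys.stdin:          # hypothetical feed of alarms-and-events text
        lower = line.lower()
        if any(key in lower for key in WATCHED_KEYWORDS):
            notify(line)
            if "tape error" in lower and "volume" in lower:
                tokens = lower.split()
                idx = tokens.index("volume")
                if idx + 1 < len(tokens):
                    lock_volume(tokens[idx + 1])   # crude parse of the volume name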

  16. Summary • Greatest current problem is due to errors from Redwood drives; we are studying this problem with Storagetek France. This problem is exacerbated by the next one. • Greatest long-term problem is manageability, specifically, the lack of adequate non-graphic interfaces to HPSS to permit effective, automatic error detection, performance monitoring and alarm propagation.
