Mass Storage For BaBar at SLAC
http://www.slac.stanford.edu/~abh/HEPiX99-MSS/
Andrew Hanushevsky, Stanford Linear Accelerator Center
Produced under contract DE-AC03-76SF00515 between Stanford University and the Department of Energy
BaBar & The B-Factory
• Use big-bang energies to create B meson particles
• Look at collision decay products
• Answer the question "where did all the anti-matter go?"
• 800 physicists collaborating from >80 sites in 10 countries
  • USA, Canada, China, France, Germany, Italy, Norway, Russia, UK, Taiwan
• Data reconstruction & analysis requires lots of CPU power
  • Need over 250 Ultra 5s just to find particle tracks in the data
• The experiment also produces large quantities of data (see the sketch below)
  • 200 - 400 TBytes/year for 10 years
  • Data stored as objects using Objectivity
  • Backed up offline on tape in HPSS
  • Distributed to regional labs across the world
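Those rates imply a petabyte-scale archive. A minimal back-of-the-envelope sketch in Python, using only the 200-400 TB/year figure quoted above (everything else is plain arithmetic), makes the total volume and sustained ingest rate explicit:

```python
# Back-of-the-envelope check of the data volumes quoted on this slide.
# Only the 200-400 TB/year range comes from the talk; the rest is arithmetic.
YEARS = 10
SECONDS_PER_YEAR = 365 * 24 * 3600

for tb_per_year in (200, 400):
    total_tb = tb_per_year * YEARS
    mb_per_sec = tb_per_year * 1e6 / SECONDS_PER_YEAR   # 1 TB = 1e6 MB (decimal)
    print(f"{tb_per_year} TB/year -> {total_tb} TB over {YEARS} years, "
          f"~{mb_per_sec:.1f} MB/s sustained ingest")
```

That is roughly 2-4 PB over the life of the experiment and a sustained ingest of about 6-13 MB/s before any reprocessing or analysis traffic is added.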
HPSS Milestones
• Production HPSS 4.1 deployed in May, 1999
• B-factory data taking begins
• Solaris mover is working
• To date, ~12 TB of data stored
  • Over 10,000 files written
• STK 9840 tapes used exclusively
  • Over 300 tapes written
HPSS Core Server
• RS6000/F50 running AIX 4.2.1
  • 4 CPUs
  • 1 GB RAM
  • 12 x 9 GB disks for Encina/SFS, etc.
• Use tape-only storage hierarchy
• Only use pftp to access data (illustrated below)
• One problem with BFS
  • symptom: pftp_client file open failures
  • two circumventions added to BFS
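Since pftp is the only data path into HPSS here, client access amounts to an FTP-style session against the HPSS PFTP server. The following is a minimal sketch using Python's standard ftplib; the host name, credentials, and file path are hypothetical placeholders, and the HPSS-specific parallel transfer commands (e.g. pget/pput) are not shown:

```python
# Hedged sketch: plain-FTP-style retrieval from the HPSS PFTP server.
# Host, credentials, and paths below are illustrative, not actual SLAC values.
from ftplib import FTP

def fetch_from_hpss(host, user, password, remote_path, local_path):
    """Retrieve one file from the HPSS PFTP server over the control/data ports."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.set_pasv(True)                  # passive mode for the data connection
        with open(local_path, "wb") as out:
            ftp.retrbinary(f"RETR {remote_path}", out.write)

if __name__ == "__main__":
    fetch_from_hpss("hpss-pftp.example.slac.stanford.edu", "babar", "secret",
                    "/hpss/babar/run1/event-collection.db", "event-collection.db")
```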
Solaris Tape Movers
• SLAC port of the mover using HPSS version 4.1
• Solaris machine configuration
  • Ultra-250 with 2 CPUs, 512 MB RAM, Gigabit Ethernet
  • Solaris 2.6, DCE 2.0, Encina TX4.2
  • Three 9840 tape drives, each on a separate Ultra SCSI bus
• Observed peak load (see the arithmetic below)
  • CPU 60% busy
  • Aggregate I/O 26 MB/sec
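The observed peak can be sanity-checked against the drive count. A minimal sketch of that arithmetic, assuming the STK 9840's published native rate of roughly 10 MB/s per drive (a spec-sheet figure, not a measurement from this deployment):

```python
# Hedged sanity check of the tape-mover peak quoted above.
DRIVES_PER_MOVER = 3               # three 9840 drives per mover (from the slide)
AGGREGATE_MB_PER_SEC = 26          # observed peak aggregate I/O (from the slide)
NATIVE_MB_PER_SEC = 10             # assumed 9840 native transfer rate (spec sheet)

per_drive = AGGREGATE_MB_PER_SEC / DRIVES_PER_MOVER
utilization = per_drive / NATIVE_MB_PER_SEC
print(f"~{per_drive:.1f} MB/s per drive, ~{utilization:.0%} of the assumed native rate")
```

Under that assumption, a single mover is driving its three drives at close to 90% of their native streaming rate at peak.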
Solaris Disk Movers
• Does not use the HPSS disk cache
  • Performance & reliability
  • HPSS latency too high for small block transfers
  • Disk cache maintenance rather complex
• Solaris machine configuration
  • E4500 & Ultra 450, 4 CPUs, 1 GB RAM, Gigabit Ethernet
    • A3500s, RAID-5, 5-way striped, 2 controllers, 500 GB to 1 TB
  • Ultra 250, 2 CPUs, 512 MB RAM, Gigabit Ethernet
    • A1000s, RAID-5, 5-way striped, 2 controllers, 100 to 200 GB
  • Solaris 2.6, DCE 2.0, Encina TX4.2 (DCE/Encina not necessary)
• Observed peak load
  • CPU 65% busy
  • Aggregate I/O 10 MB/sec (no migration or staging at the time)
Mass Storage Architecture
[Architecture diagram] Components: file & catalog request gateway, gateway management daemon, prestage daemon, purge daemon, staging manager, HPSS PFTP server (control path), HPSS mover (Solaris), migration daemon, AMS (Unix fs I/O), tape robot, disk pool, and the PFTP data path.
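Functionally, the gateway-side daemons in the diagram form a small cache manager in front of HPSS: files are staged from tape into the disk pool on demand (or ahead of time by the prestage daemon), and the purge daemon evicts cold files when the pool fills. A minimal sketch of that logic follows; the pool path, thresholds, and the pftp_client invocation are illustrative placeholders, not the actual SLAC implementation:

```python
# Hedged sketch of the staging/purge logic implied by the architecture diagram.
# Pool location, water marks, and the pftp_client command syntax are assumptions.
import os
import shutil
import subprocess

DISK_POOL = "/pool/babar"          # hypothetical disk-pool mount point
PURGE_HIGH_WATER = 0.90            # start purging when the pool is 90% full
PURGE_LOW_WATER = 0.75             # stop once usage drops below 75%

def stage(hpss_path: str) -> str:
    """Copy a file from HPSS into the disk pool via pftp if it is not already there."""
    local = os.path.join(DISK_POOL, os.path.basename(hpss_path))
    if not os.path.exists(local):
        # Illustrative pftp_client batch invocation; real option syntax may differ.
        subprocess.run(["pftp_client", "-c", f"get {hpss_path} {local}"], check=True)
    os.utime(local)                # mark as recently used for the purge policy
    return local

def purge() -> None:
    """Evict least-recently-used pool files until usage falls below the low-water mark."""
    usage = shutil.disk_usage(DISK_POOL)
    if usage.used / usage.total < PURGE_HIGH_WATER:
        return
    candidates = sorted(
        (p for p in (os.path.join(DISK_POOL, f) for f in os.listdir(DISK_POOL))
         if os.path.isfile(p)),
        key=os.path.getatime)      # oldest access time first
    for path in candidates:
        os.remove(path)            # safe to drop: the master copy lives on tape in HPSS
        usage = shutil.disk_usage(DISK_POOL)
        if usage.used / usage.total < PURGE_LOW_WATER:
            break
```

Because the master copy always remains on tape in HPSS, the purge daemon can evict aggressively, which is what keeps the disk pool simpler to manage than an HPSS-managed disk cache.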
Summary
• HPSS is very stable
• Mass storage architecture has proven to be highly flexible
• Solaris mover is a success
• 9840 working well for new technology
• Software upgrades will be a problem
• Disk space is always an issue
  • Will be getting 1 TB/month for the next year (total of about 25 TB)
• Tape drive contention concerns
  • Will be getting 12 more drives this year (for a total of 24)