PetaByte Storage Facility at RHIC
Razvan Popescu - Brookhaven National Laboratory
Who are we?
• Relativistic Heavy-Ion Collider @ BNL.
• Four experiments: Phenix, Star, Phobos, Brahms.
• 1.5PB per year.
• ~500MB/sec.
• >20,000 SpecInt95.
• Startup in May 2000 at 50% capacity, ramping up to nominal parameters within 1 year.
Overview
• Data types:
  • Raw: very large volume (1.2PB/yr.), average bandwidth (50MB/s).
  • DST: average volume (500TB), large bandwidth (200MB/s).
  • mDST: low volume (<100TB), large bandwidth (400MB/s).
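As a quick back-of-the-envelope check, the quoted annual volumes can be compared with the quoted average bandwidths. This is a minimal sketch; only the per-stream numbers come from the slide, the single-pass interpretation is ours.

```python
# Rough consistency check between per-stream annual volume and average bandwidth.
# Volumes and bandwidths are taken from the slide; everything else is illustrative.

SECONDS_PER_YEAR = 365 * 24 * 3600  # ~3.15e7 s

streams = {
    # name: (volume in TB/year, quoted average bandwidth in MB/s)
    "raw":  (1200, 50),
    "DST":  (500, 200),
    "mDST": (100, 400),
}

for name, (tb_per_year, quoted_mb_s) in streams.items():
    # Sustained rate needed to move the whole yearly volume exactly once.
    sustained_mb_s = tb_per_year * 1e6 / SECONDS_PER_YEAR
    print(f"{name:5s}: {sustained_mb_s:5.1f} MB/s sustained vs "
          f"{quoted_mb_s} MB/s quoted")
```

Raw comes out near ~38MB/s sustained, consistent with the quoted 50MB/s average; the DST/mDST bandwidths are far above what a single pass over the data requires, reflecting repeated reads during analysis rather than archiving volume.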
Data Flow (generic)
[Flow diagram relating RHIC, the Reconstruction Farm (Linux), the Archive, the File Servers (DST/mDST), and the Analysis Farm (Linux). Labeled rates: raw 35MB/s, raw 10MB/s, Archive 50MB/s, DST 200MB/s, mDST 400MB/s, mDST 10MB/s.]
The Data Store
• HPSS (ver. 4.1.1, patch level 2).
• Deployed in 1998.
• After overcoming some initial growing pains, we consider the present implementation successful.
• One major/total reconfiguration to adapt to new hardware (and improved system understanding).
• Flexible enough for our needs. One shortcoming: no preemptable priority scheme.
• Very high performance.
The HPSS Archive
• Constraints - large capacity & high bandwidth:
  • Two types of tape technology: SD-3 (best $/GB) & 9840 (best $/(MB/s)).
  • Two-tape-layer hierarchies, for easy management of migration.
• Reliable and fast disk storage:
  • FC-attached RAID disk.
• Platform compatible with HPSS:
  • IBM, SUN, SGI.
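The rationale for mixing two drive types can be illustrated with a small sizing sketch. Only the structure of the argument (SD-3 wins on $/GB, 9840 wins on $/(MB/s)) comes from the slide; all prices and the capacity target below are hypothetical placeholders.

```python
# Hypothetical sizing sketch: provision capacity on the cheaper-per-GB medium
# and bandwidth on the cheaper-per-MB/s drives.
# All dollar figures and the per-cartridge capacity are made-up placeholders.
import math

capacity_target_tb = 1500      # roughly one year of RHIC data (from the slides)
bandwidth_target_mb_s = 50     # average archive bandwidth (from the slides)

sd3_cart_gb, sd3_cart_cost = 50, 80               # SD-3 cartridge: capacity, cost (placeholder cost)
n9840_drive_mb_s, n9840_drive_cost = 10, 30_000   # 9840 drive: rate (from the talk), cost (placeholder)

carts = math.ceil(capacity_target_tb * 1000 / sd3_cart_gb)
drives = math.ceil(bandwidth_target_mb_s / n9840_drive_mb_s)

print(f"capacity:  {carts} SD-3 cartridges -> ~${carts * sd3_cart_cost:,}")
print(f"bandwidth: {drives} 9840 drives    -> ~${drives * n9840_drive_cost:,}")
```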
Present Resources
• Tape storage:
  • (1) STK Powderhorn silo (6000 cart.).
  • (11) SD-3 (Redwood) drives.
  • (10) 9840 (Eagle) drives.
• Disk storage:
  • ~8TB of RAID disk.
  • 1TB for HPSS cache.
  • 7TB Unix workspace.
• Servers:
  • (5) RS/6000 H50/70 for HPSS.
  • (6) E450 & E4000 for file serving and data mining.
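For scale, a rough native-capacity estimate for the silo; the slot count is from the slide, while the 50GB/cartridge figure is the nominal SD-3 capacity and the uniform-media assumption is ours.

```python
# Rough native-capacity estimate for the Powderhorn silo.
# 6000 slots is from the slide; 50GB is the nominal SD-3 cartridge capacity,
# and assuming every slot holds an SD-3 cartridge is our simplification.
slots = 6000
sd3_cartridge_gb = 50
print(f"~{slots * sd3_cartridge_gb / 1000:.0f} TB native capacity")   # ~300 TB
```

Note that this is well below the quoted 1.5PB/yr, so the cartridge population of the silo has to turn over (or grow) as data accumulates.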
HPSS Structure
• (1) Core Server:
  • RS/6000 Model H50.
  • 4x CPU, 2GB RAM.
  • Fast Ethernet (control).
  • OS-mirrored storage for metadata (6 pv.).
HPSS Structure
• (3) Movers:
  • RS/6000 Model H70.
  • 4x CPU, 1GB RAM.
  • Fast Ethernet (control).
  • Gigabit Ethernet (data), 1500 & 9000 MTU.
  • 2x FC-attached RAID - 300GB - disk cache.
  • (3-4) SD-3 "Redwood" tape transports.
  • (3-4) 9840 "Eagle" tape transports.
HPSS Structure
• To guarantee availability of resources for a specific user group → separate resources → separate PVRs & movers.
• One mover per user group → total exposure to single-machine failure.
• To guarantee availability of resources for the Data Acquisition stream → separate hierarchies.
• Result: 2 PVRs & 2 COSs & 1 mover per group (see the sketch below).
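A minimal sketch of the "2 PVRs & 2 COSs & 1 mover per group" partitioning; the group, PVR, class-of-service, and mover names are hypothetical, only the allocation pattern and its single-point-of-failure trade-off come from the slide.

```python
# Illustrative resource map: each experiment gets dedicated PVRs and classes
# of service plus a single dedicated mover.  All names are hypothetical.
from dataclasses import dataclass

@dataclass
class GroupResources:
    pvrs: tuple    # dedicated physical volume repositories
    cos: tuple     # dedicated classes of service (e.g. raw, DST)
    mover: str     # single dedicated mover host

allocation = {
    "star":   GroupResources(("pvr-star-sd3", "pvr-star-9840"),
                             ("cos-star-raw", "cos-star-dst"), "mover1"),
    "phenix": GroupResources(("pvr-phenix-sd3", "pvr-phenix-9840"),
                             ("cos-phenix-raw", "cos-phenix-dst"), "mover2"),
}

# The downside called out on the slide: losing one mover takes out one group.
def groups_affected_by(failed_mover: str) -> list:
    return [g for g, r in allocation.items() if r.mover == failed_mover]

print(groups_affected_by("mover1"))   # ['star']
```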
HPSS Structure
[Architecture diagram]
HPSS Topology
[Topology diagram: Net 1 - Data (1000baseSX); Net 2 - Control (100baseT); Core server; movers M1, M2, M3; N x PVR; pftpd; STK silo; clients (10baseT, routed).]
HPSS Performance
• 80MB/sec for the disk subsystem.
• ~1 CPU per 40MB/sec of Gbit TCP/IP traffic @ 1500 MTU, or per 90MB/sec @ 9000 MTU.
• >9MB/sec per SD-3 transport.
• ~10MB/sec per 9840 transport.
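These per-unit figures translate directly into a provisioning rule of thumb. A minimal sketch: the per-unit rates are from this slide, and the 50MB/s and 200MB/s targets we size against are the archive and DST figures quoted earlier; the sizing itself is ours.

```python
# How many CPUs and tape transports do the quoted per-unit rates imply?
import math

cpu_per_mb_s_1500mtu = 1 / 40   # ~1 CPU per 40 MB/s of Gbit TCP/IP @ 1500 MTU
cpu_per_mb_s_9000mtu = 1 / 90   # ~1 CPU per 90 MB/s @ 9000 MTU (jumbo frames)
sd3_mb_s, n9840_mb_s = 9, 10    # per-transport tape rates

dst_target = 200  # MB/s (DST serving target from the Overview slide)
print("CPUs for 200 MB/s of network traffic:",
      math.ceil(dst_target * cpu_per_mb_s_1500mtu), "@1500 MTU vs",
      math.ceil(dst_target * cpu_per_mb_s_9000mtu), "@9000 MTU")

archive_target = 50  # MB/s (archive ingest target)
print("9840 transports for the 50 MB/s archive stream:",
      math.ceil(archive_target / n9840_mb_s))
print("SD-3 transports for the same stream:",
      math.ceil(archive_target / sd3_mb_s))
```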
I/O Intensive Systems
• Mining and analysis systems.
• High I/O & moderate CPU usage.
• To avoid heavy network traffic → merge file servers with HPSS movers:
  • Major problem: HPSS support on non-AIX platforms.
• Several (Sun) SMP machines or a large (SGI) modular system.
Problems
• Short lifecycle of the SD-3 heads:
  • ~500 hours, i.e. <2 months at average usage (6 of 10 drives in 10 months).
  • Built a monitoring tool to try to predict transport failure, based on soft-error frequency (see the sketch below).
• Low-throughput interface (F/W) for SD-3: high slot consumption.
• SD-3 production discontinued?!
• 9840 ???
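The monitoring tool is not described in the talk; the following is a minimal sketch of the idea only, with window size, threshold, and data layout chosen arbitrarily: flag a transport when its recent soft-error rate climbs, since rising soft errors tend to precede head failure.

```python
# Sketch of a drive-health monitor: track soft (recovered) error counts per
# transport over a sliding window and flag drives whose rate is climbing.
# The window length and alert threshold are arbitrary illustrative values.
from collections import defaultdict, deque

WINDOW = 20        # number of recent mounts to consider
ALERT_RATE = 0.5   # soft errors per mount that triggers a warning

history = defaultdict(lambda: deque(maxlen=WINDOW))

def record_mount(drive: str, soft_errors: int) -> None:
    """Record the soft-error count observed during one mount of `drive`."""
    history[drive].append(soft_errors)

def drives_at_risk() -> list:
    """Return drives whose recent soft-error rate exceeds the threshold."""
    return [d for d, h in history.items()
            if len(h) >= WINDOW // 2 and sum(h) / len(h) > ALERT_RATE]

# Example: one healthy drive and one whose error rate is steadily worsening.
for i in range(15):
    record_mount("SD3-04", 0 if i % 5 else 1)   # occasional soft error
    record_mount("SD3-07", i // 3)              # climbing error count
print(drives_at_risk())                          # ['SD3-07']
```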
Issues
• Tested the two-tape-layer hierarchies:
  • Cartridge-based migration.
  • Manually scheduled reclaim.
• Work with large files: ~1GB is preferable, >200MB is tolerable (a bundling sketch follows below).
  • Is this still true with 9840 tape transports?
• Don't think about NFS. Wait for DFS/GPFS?
• We use pftp exclusively.
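Since small files are the main performance killer, one common workaround is to bundle many small files into ~1GB archives on the client side before transferring them into HPSS with pftp. This is an illustrative sketch only, not a description of the RHIC procedure; the target size and naming scheme are arbitrary choices.

```python
# Illustrative client-side bundling: pack small files into ~1GB tar archives
# so that what actually lands in HPSS matches the "preferably ~1GB" guidance.
import os
import tarfile

TARGET_BYTES = 1_000_000_000   # aim for ~1GB bundles

def bundle(files, out_prefix="bundle"):
    """Group `files` into tar archives of roughly TARGET_BYTES each."""
    groups, current, size = [], [], 0
    for path in files:
        current.append(path)
        size += os.path.getsize(path)
        if size >= TARGET_BYTES:
            groups.append(current)
            current, size = [], 0
    if current:
        groups.append(current)

    names = []
    for i, group in enumerate(groups):
        name = f"{out_prefix}-{i:04d}.tar"
        with tarfile.open(name, "w") as tar:
            for path in group:
                tar.add(path)
        names.append(name)
    return names   # these ~1GB tar files are what gets transferred via pftp
```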
Issues
• Guaranteeing availability of resources for specific user groups:
  • Separate PVRs & movers.
  • Total exposure to single-machine failure!
• Reliability:
  • Distribute resources across movers → share movers (acceptable?).
• Inter-mover traffic:
  • 1 CPU per 40MB/sec of TCP/IP per adapter: expensive!!!
Inter-Mover Traffic - Solutions
• Affinity:
  • Limited applicability.
• Diskless hierarchies (not for DFS/GPFS):
  • Not for SD-3; not enough tests on 9840.
• High-performance networking: SP switch (this is your friend):
  • IBM only.
• Lighter protocol: HIPPI:
  • Expensive hardware.
• Multiply-attached storage (SAN): most promising! See STK's talk. Requires HPSS modifications.
Summary
• HPSS works for us.
• Buy an SP2 and the SP switch:
  • Simplified admin. Fast interconnect. Ready for GPFS.
• Keep an eye on STK's SAN/RAIT.
• Avoid SD-3 (not a risk anymore).
• Avoid small-file access, at least for the moment.
Thank you!
Razvan Popescu
popescu@bnl.gov