
PetaByte Storage Facility at RHIC

PetaByte Storage Facility at RHIC. Razvan Popescu, Brookhaven National Laboratory. Who are we? Relativistic Heavy-Ion Collider @ BNL. Four experiments: Phenix, Star, Phobos, Brahms. 1.5PB per year. ~500MB/sec. >20,000 SPECint95.


Presentation Transcript


  1. PetaByte Storage Facility at RHIC. Razvan Popescu, Brookhaven National Laboratory

  2. Who are we?
     • Relativistic Heavy-Ion Collider @ BNL
     • Four experiments: Phenix, Star, Phobos, Brahms.
     • 1.5PB per year.
     • ~500MB/sec.
     • >20,000 SPECint95.
     • Startup in May 2000 at 50% capacity, ramping up to nominal parameters within 1 year.

  3. Overview
     • Data Types:
       • Raw: very large volume (1.2PB/yr.), average bandwidth (50MB/s).
       • DST: average volume (500TB), large bandwidth (200MB/s).
       • mDST: low volume (<100TB), large bandwidth (400MB/s).
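As a rough cross-check of the raw-data figures above, the yearly volume can be converted to an average rate. This is only a sketch: it assumes decimal units (1 PB = 10^9 MB) and data spread evenly over a full calendar year, neither of which the slide states. If data taking covers only part of the year, the rate while running would be correspondingly higher.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: a full, non-leap calendar year


def average_rate_mb_s(volume_pb_per_year: float) -> float:
    """Convert a yearly volume in PB (decimal units) to an average rate in MB/s."""
    megabytes = volume_pb_per_year * 1e9  # 1 PB = 1e9 MB in decimal units
    return megabytes / SECONDS_PER_YEAR


raw_rate = average_rate_mb_s(1.2)  # raw volume quoted on the slide
print(f"1.2 PB/yr averages to about {raw_rate:.1f} MB/s over a full year")
```

Under these assumptions the result comes out somewhat below the 50MB/s quoted for the raw stream, so the two slide figures are at least mutually plausible.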

  4. Data Flow (generic) [diagram; layout lost in extraction]
     • Nodes: RHIC, Reconstruction Farm (Linux), Archive, File Servers (DST/mDST), Analysis Farm (Linux).
     • Link labels, in extraction order: raw 35MB/s; DST; raw 10MB/s; 50MB/s; DST 200MB/s; mDST; mDST 400MB/s; 10MB/s.

  5. The Data Store
     • HPSS (ver. 4.1.1 patch level 2)
     • Deployed in 1998.
     • After overcoming some growing pains, we consider the present implementation successful.
     • One major (total) reconfiguration to adapt to new hardware (and to a better understanding of the system).
     • Flexible enough for our needs. One shortcoming: the lack of a preemptable priority scheme.
     • Very high performance.

  6. The HPSS Archive
     • Constraints (large capacity & high bandwidth):
       • Two types of tape technology: SD-3 (best $/GB) & 9840 (best $/MB/s).
       • Two-tape-layer hierarchies. Easy management of the migration.
     • Reliable and fast disk storage:
       • FC-attached RAID disk.
     • Platform compatible with HPSS:
       • IBM, SUN, SGI.

  7. Present Resources
     • Tape Storage:
       • (1) STK Powderhorn silo (6000 cart.)
       • (11) SD-3 (Redwood) drives.
       • (10) 9840 (Eagle) drives.
     • Disk Storage:
       • ~8TB of RAID disk.
         • 1TB for HPSS cache.
         • 7TB Unix workspace.
     • Servers:
       • (5) RS/6000 H50/70 for HPSS.
       • (6) E450 & E4000 for file serving and data mining.

  8. [Image slide; no text in transcript]

  9. [Image slide; no text in transcript]

  10. [Image slide; no text in transcript]

  11. HPSS Structure
     • (1) Core Server:
       • RS/6000 Model H50
       • 4x CPU
       • 2GB RAM
       • Fast Ethernet (control)
       • OS mirrored storage for metadata (6pv.)

  12. HPSS Structure
     • (3) Movers:
       • RS/6000 Model H70
       • 4x CPU
       • 1GB RAM
       • Fast Ethernet (control)
       • Gigabit Ethernet (data) (1500 & 9000 MTU)
       • 2x FC-attached RAID (300GB) as disk cache
       • (3-4) SD-3 “Redwood” tape transports
       • (3-4) 9840 “Eagle” tape transports

  13. HPSS Structure
     • Guarantee availability of resources for a specific user group → separate resources → separate PVRs & movers.
     • One mover per user group → total exposure to single-machine failure.
     • Guarantee availability of resources for the Data Acquisition stream → separate hierarchies.
     • Result: 2 PVRs & 2 COS & 1 mover per group.

  14. HPSS Structure [diagram; no text in transcript]

  15. HPSS Topology [diagram; layout lost in extraction]
     • Networks: Net 1 - Data (1000baseSX); Net 2 - Control (100baseT); 10baseT.
     • Nodes and labels: Client, STK, Core, M1, M2, M3, (Routing), N x PVR, pftpd.

  16. HPSS Performance
     • 80MB/sec for the disk subsystem.
     • ~1 CPU per 40MB/sec of TCP/IP Gbit traffic @ 1500 MTU, or per 90MB/sec @ 9000 MTU.
     • >9MB/sec per SD-3 transport.
     • ~10MB/sec per 9840 transport.
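The per-transport rates above can be combined with figures from earlier slides (50MB/s for the raw stream, 11 SD-3 and 10 9840 drives) for a back-of-the-envelope sizing. This is an illustrative sketch, not a calculation from the talk: it treats the ">9MB/sec" SD-3 figure as exactly 9MB/s and ignores mount, seek, and rewind time, so real requirements would be higher.

```python
import math

# Per-transport rates quoted on the slide.
SD3_MB_S = 9.0     # ">9MB/sec per SD-3 transport", used here as a lower bound
T9840_MB_S = 10.0  # "~10MB/sec per 9840 transport"


def transports_needed(target_mb_s: float, per_transport_mb_s: float) -> int:
    """Smallest number of drives whose combined streaming rate meets the target."""
    return math.ceil(target_mb_s / per_transport_mb_s)


def aggregate_mb_s(n_sd3: int, n_9840: int) -> float:
    """Combined streaming rate if every drive runs at its quoted rate at once."""
    return n_sd3 * SD3_MB_S + n_9840 * T9840_MB_S


raw_stream_drives = transports_needed(50.0, SD3_MB_S)  # raw stream from slide 3
installed_aggregate = aggregate_mb_s(11, 10)           # drive counts from slide 7

print(f"SD-3 drives to stream 50 MB/s: {raw_stream_drives}")
print(f"Aggregate of installed drives: {installed_aggregate:.0f} MB/s")
```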

  17. I/O Intensive Systems
     • Mining and Analysis systems.
     • High I/O & moderate CPU usage.
     • To avoid large network traffic, merge file servers with HPSS movers:
       • Major problem with HPSS support on non-AIX platforms.
       • Several (Sun) SMP machines or one large (SGI) modular system.

  18. Problems
     • Short lifecycle of the SD-3 heads.
       • ~500 hours, i.e. < 2 months at average usage (6 of 10 drives in 10 months).
       • Built a monitoring tool to try to predict transport failure (based on soft error frequency).
     • Low throughput interface (F/W) for SD-3: high slot consumption.
     • SD-3 production discontinued?!
     • 9840 ???
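The slide gives no details of the monitoring tool beyond its use of soft error frequency. The sketch below shows one way such a predictor could work; it is not the authors' implementation. The class name, the "errors per GB" input, the window size, and the threshold are all hypothetical choices made for illustration.

```python
from collections import deque


class SoftErrorMonitor:
    """Hypothetical per-transport monitor: flags a drive when its recent
    soft error rate rises well above the rate seen when monitoring began."""

    def __init__(self, window: int = 5, threshold: float = 3.0):
        self.window = window              # samples per averaging window (assumed)
        self.threshold = threshold        # alarm ratio recent/baseline (assumed)
        self.baseline = []                # first `window` samples for this drive
        self.recent = deque(maxlen=window)

    def add_sample(self, errors_per_gb: float) -> bool:
        """Record one measurement; return True if the drive looks suspect."""
        if len(self.baseline) < self.window:
            self.baseline.append(errors_per_gb)  # still learning the baseline
            return False
        self.recent.append(errors_per_gb)
        if len(self.recent) < self.window:
            return False                         # not enough recent data yet
        base = sum(self.baseline) / len(self.baseline)
        now = sum(self.recent) / len(self.recent)
        # Guard against a zero baseline so any sustained errors still trip it.
        return now > self.threshold * max(base, 1e-9)


monitor = SoftErrorMonitor(window=3, threshold=2.0)
flags = [monitor.add_sample(x) for x in [1, 1, 1, 1, 1, 1, 5]]
print(flags)
```

Comparing each drive against its own early baseline, rather than a global limit, is a design choice of this sketch; a real tool would also need a source for the error counts, which the slide does not describe.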

  19. Issues
     • Tested the two-tape-layer hierarchies:
       • Cartridge-based migration.
       • Manually scheduled reclaim.
     • Work with large files. Preferable ~1GB. Tolerable >200MB.
       • Is this true with 9840 tape transports?
     • Don’t think about NFS. Wait for DFS/GPFS?
       • We use pftp exclusively.
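The preference for large files can be illustrated with a simple model in which every file transfer pays a fixed overhead (mount, positioning, and so on) before streaming starts. This is a sketch under stated assumptions: the 9MB/s streaming rate is the SD-3 figure from the performance slide, but the 60-second overhead is a hypothetical placeholder, not a value from the talk.

```python
def effective_rate_mb_s(file_mb: float,
                        stream_mb_s: float = 9.0,   # SD-3 rate from slide 16
                        overhead_s: float = 60.0):  # hypothetical fixed cost
    """Average throughput when each file pays a fixed per-access overhead."""
    transfer_s = file_mb / stream_mb_s
    return file_mb / (overhead_s + transfer_s)


for size_mb in (10, 200, 1000):
    rate = effective_rate_mb_s(size_mb)
    print(f"{size_mb:>5} MB file: about {rate:.2f} MB/s effective")
```

Whatever the real overhead is, the model shows the same trend: the larger the file, the closer the effective rate gets to the drive's streaming rate, and small files leave the drive mostly waiting.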

  20. Issues
     • Guarantee availability of resources for specific user groups:
       • Separate PVRs & movers.
       • Total exposure to single-machine failure!
     • Reliability:
       • Distribute resources across movers → share movers (acceptable?).
     • Inter-mover traffic:
       • 1 CPU per 40MB/sec of TCP/IP per adapter: expensive!
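To see why the CPU cost of inter-mover traffic is called expensive, the quoted per-CPU rates can be turned into a CPU count for a given data rate. This is a minimal sketch that assumes the cost scales linearly with throughput, which the slides imply but do not state; the 200MB/s example rate is the DST bandwidth from the overview slide, and the 4-CPU mover comes from the HPSS Structure slide.

```python
# MB/s of TCP/IP Gbit traffic one CPU can drive, keyed by MTU (from slide 16).
MB_S_PER_CPU = {1500: 40.0, 9000: 90.0}


def cpus_for_traffic(rate_mb_s: float, mtu: int) -> float:
    """CPUs consumed by network traffic alone, assuming linear scaling."""
    return rate_mb_s / MB_S_PER_CPU[mtu]


CPUS_PER_MOVER = 4  # RS/6000 H70 movers have 4 CPUs each

for mtu in (1500, 9000):
    cpus = cpus_for_traffic(200.0, mtu)
    print(f"200 MB/s @ {mtu} MTU: {cpus:.1f} CPUs "
          f"({cpus / CPUS_PER_MOVER:.0%} of one mover)")
```

At 1500 MTU the example rate would consume more CPUs than a single mover has, before any other work is done, which is consistent with the interest in jumbo frames and lighter-weight interconnects.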

  21. Inter-Mover Traffic - Solutions
     • Affinity.
       • Limited applicability.
     • Diskless hierarchies (not for DFS/GPFS).
       • Not for SD-3. Not enough tests on 9840.
     • High performance networking: SP switch. (This is your friend.)
       • IBM only.
     • Lighter protocol: HIPPI.
       • Expensive hardware.
     • Multiply attached storage (SAN). Most promising! See STK’s talk.
       • Requires HPSS modifications.

  22. Summary
     • HPSS works for us.
     • Buy an SP2 and the SP switch.
       • Simplified admin. Fast interconnect. Ready for GPFS.
     • Keep an eye on STK’s SAN/RAIT.
     • Avoid SD-3. (Not a risk anymore.)
     • Avoid small file access, at least for the moment.

  23. Thank you! Razvan Popescu, popescu@bnl.gov
