Tier-2 storage


Presentation Transcript


  1. Tier-2 storage: a hardware view

  2. HEP Storage • dCache • needs care and feeding, although setup is now easier • DPM • easier to deploy • xrootd (as a system) is also in the picture, but has no SRM • dCache and DPM use a DB for metadata (a single point of failure) • scalability is not much of an issue for a T2 • although this depends on the access pattern • any analysis experience? • File systems • mostly XFS, but it has its flaws • many look at ZFS • GridKa uses GPFS • ext4: > 16 TB filesystems, has extents (still in development)

  3. Disk arrangements • For CMS in 2008, all T2s combined: • 19.3 MSI2k (~800 kSI2k per average T2) • 4.9 PB (~200 TB per average T2) • RAID groups of 8 data disks at 750 GB/disk = 340 disks in 34 RAID6 groups (34 * 8 * 50 = 13600 IOs/s) • 800 kSI2k / 2 kSI2k per core → 400 cores • available: 13600 / 400 = 34 IOs/s per core • writes reduce this by 50% → 17 IOs/s per core • 50 MB/s / 17 IOs/s per core → ~3 MB per IO per core • 1-3 MB/s per core → 1200 MB/s → ~24 data servers • given the 34 RAID groups above, use 34 data servers • assume 50 MB/s per server, although today dCache tops out at around 30 MB/s per Java virtual machine
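
The sizing on this slide can be reproduced with a short back-of-envelope script. A minimal sketch in Python, using only the slide's own inputs (the 3 MB/s per core and 50 MB/s per server figures are the slide's assumptions, not measurements):

    import math

    # Back-of-envelope sizing for an average CMS T2 in 2008, using the slide's numbers.
    cpu_ksi2k        = 800    # kSI2k at an average T2
    disk_tb          = 200    # TB of disk at an average T2
    disk_tb_each     = 0.75   # 750 GB data disks
    data_disks_group = 8      # data disks per RAID6 group
    ios_per_disk     = 50     # random IOs/s of a SATA disk
    ksi2k_per_core   = 2
    mb_per_core      = 3      # upper end of the 1-3 MB/s per core estimate
    mb_per_server    = 50     # assumed per data server (dCache today nearer 30 MB/s per JVM)

    groups    = math.ceil(disk_tb / (data_disks_group * disk_tb_each))  # ~34 RAID6 groups
    ios_total = groups * data_disks_group * ios_per_disk                # ~13600 IOs/s
    cores     = cpu_ksi2k // ksi2k_per_core                             # 400 cores
    ios_core  = ios_total / cores                                       # ~34 IOs/s per core (~17 with writes)
    servers   = math.ceil(cores * mb_per_core / mb_per_server)          # ~24 data servers

    print(f"{groups} RAID6 groups, {ios_total} IOs/s, {cores} cores, "
          f"{ios_core:.0f} IOs/s per core, ~{servers} data servers")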

  4. Disk thumb rules • access from different cores is random (even though the files are large, > 2 GB!) • average access time of SATA is ~15 ms: ~50 IOs/s • average access time of FC/SAS disks is ~5 ms: ~150 IOs/s • SATA read/write mix (buffers!): 1 write per 20 read accesses. End of story. • SATA reliability is OK. Expect ~800 euro/TB (incl. system) • RAID6 is suggested, along with proper support (hot swap, alerts, failover) • experience != experience: see the summary at http://hepix.caspur.it/storage/ (hepix/hepix) • budget for some servers that need to be HA (> 3000 euro)
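
As a quick illustration of the cost thumb rules, a hedged sketch: the 200 TB capacity comes from the previous slide, and the count of four HA servers is a placeholder assumption, not a recommendation.

    # Rough cost estimate from the thumb rules above.
    capacity_tb    = 200    # average T2 disk capacity (previous slide)
    eur_per_tb     = 800    # euro per TB including the surrounding system
    ha_servers     = 4      # assumed number of servers that must be HA
    eur_per_server = 3000   # ">3000 euro" each, per the slide

    total_eur = capacity_tb * eur_per_tb + ha_servers * eur_per_server
    print(f"~{total_eur / 1000:.0f} kEUR for the average T2")   # ~172 kEUR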

  5. Disk configurations • Storage in a box (NAS) • 16 to 48 disks with server nodes in one case • popular example: Sun Thumper, 48 disks, dual Opteron • DAS: storage and server separate • the required IO rates do not apply to big servers • but the random access from many servers does apply • may use some compute nodes to do the work • which would need SAS or FC to the storage • Resilient dCache • probably good for "read-mostly" data • from the earlier core-to-disk estimate • need 20 big NAS boxes • could be done with 4 servers, but not with 4 links

  6. Use cases (thanks to Thomas Kress for the input) • MC and pile-up • mostly CPU bound; events are merged into large files before transfer to the T1 via an output buffer • 12 MB/s • how many write and read streams on the buffer? • suggestion: 1 write stream per 20 read streams • the pile-up sample is 100-200 GB • random access by how many cores? • suggestion: spread it over many RAID groups • Calibration • storage area of 400 GB • read only? random or streaming? • suggestion: at most 50 cores per disk (group) • Analysis • 100-200 TB per month with random access!! • average flow of ~80 MB/s from T1 to T2! (3000-4000 files) • following the 1:20 ratio above, this means a system that sustains 1600 MB/s of reads • ? "a large part of the data stays available for a longer time" (TK)
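
The analysis numbers follow from the monthly volume and the 1:20 write-to-read thumb rule. A short sketch, assuming a 30-day month and the upper end of the 100-200 TB/month estimate:

    # Analysis use case: monthly import volume -> average inbound rate -> required read bandwidth.
    tb_per_month  = 200                               # upper end of the 100-200 TB/month estimate
    secs_month    = 30 * 24 * 3600                    # assumed 30-day month
    inbound_mb_s  = tb_per_month * 1e6 / secs_month   # ~77 MB/s, consistent with ~80 MB/s T1 -> T2
    reads_per_wr  = 20                                # 1 write : 20 reads thumb rule
    read_mb_s     = 80 * reads_per_wr                 # ~1600 MB/s of sustained reads

    print(f"inbound ~{inbound_mb_s:.0f} MB/s, required read bandwidth ~{read_mb_s} MB/s")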

  7. ToDo • Analyze access patterns • Simulate data/disk loss • Iterate results • Join forces for HW procurement
