Data management for ATLAS, ALICE and VOCE in the Czech Republic

  1. Data management for ATLAS, ALICE and VOCE in the Czech Republic L. Fiala, J. Chudoba, J. Kosina, J. Krasova, M. Lokajicek, J. Svec, J. Kmunicek, D. Kouril, L. Matyska, M. Ruda, Z. Salvet, M. Mulac

  2. Overview • Supported VOs (VOCE, ATLAS, ALICE) • DPM as a choice of SRM-based Storage Element • Issues encountered with DPM • Results of transfers • Conclusion

  3. VOCE • Virtual Organization for Central Europe • in the scope of the EGEE project • provision of distributed Grid facilities to non-HEP scientists • Austrian, Czech, Hungarian, Polish, Slovak and Slovenian resources involved • the design and implementation of the VOCE infrastructure done solely on Czech resources. ALICE, ATLAS • Virtual Organizations for the LHC experiments

  4. Storage Elements • Classical disk-based SEs • Participating in Service Challenge 4 → need for an SRM-enabled SE • No tape storage available for the Grid at the moment – DPM chosen as the SRM-enabled SE • 1 head node, 1 disk server on the same machine • Separate nodes with disk servers planned • 5 TB on 4 filesystems (3 local, 1 NBD)

  5. DPM issues – srmCopy() • DPM does not currently support the srmCopy() method (work in progress) • When copying from a non-DPM SRM SE to a DPM SE using srmcp, the pushmode=true flag must be used (see the sketch below) • Local temporary storage or globus-url-copy can be used to avoid a direct SRM-to-SRM 3rd-party transfer via srmCopy()
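
A minimal sketch of the workaround, assuming the dCache srmcp client is installed; only the pushmode=true flag comes from the slide, the SURLs and hostnames below are hypothetical examples.

```python
# Sketch: copy from a non-DPM SRM SE to a DPM SE with srmcp in push mode,
# since DPM has no srmCopy() yet. SURLs are hypothetical placeholders.
import subprocess

source = "srm://dcache-se.example.org:8443/pnfs/example.org/data/atlas/file.root"  # hypothetical
dest = "srm://dpm-head.example.cz:8443/dpm/example.cz/home/atlas/file.root"        # hypothetical

cmd = [
    "srmcp",
    "-pushmode=true",  # flag named on the slide; spelling follows the dCache srmcp client
    source,
    dest,
]

subprocess.run(cmd, check=True)  # raises CalledProcessError if the transfer fails
```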

  6. DPM issues – pools on NFS (1) • Our original setup – disk array attached to NFS server (64bit Opteron, Fedora Core OS with 2.6 kernel) • Disk array NFS mounted on DPM disk server (no need to install disk server on Fedora) • Silent file truncation when copying files from pools located on NFS

  7. DPM issues – pools on NFS (2) • Using strace we found that at some point the copying process receives an EACCES error from read() • Unable to reproduce with standard utilities (cp, dd, simple read()/write() programs) • The problem occurs only with a 2.4 kernel client and a 2.6 kernel server (verified on various versions)
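
A hypothetical reconstruction of the kind of simple read() probe mentioned above (not the original test program): it reads a file from the NFS-mounted pool in fixed-size chunks and reports any mid-stream error, which is exactly the condition a copier must not ignore if silent truncation is to be avoided.

```python
# Sketch of a chunked-read probe. A copier that ignores a mid-stream error
# such as EACCES would silently truncate the output; this probe makes the
# failure visible instead.
import os
import sys

CHUNK = 1 << 20  # 1 MiB

def probe(path: str) -> None:
    fd = os.open(path, os.O_RDONLY)
    total = 0
    try:
        while True:
            try:
                buf = os.read(fd, CHUNK)
            except OSError as err:
                # e.g. EACCES, as seen via strace during the NFS copies
                print(f"read() failed after {total} bytes: {err}", file=sys.stderr)
                raise
            if not buf:
                break
            total += len(buf)
    finally:
        os.close(fd)
    print(f"read {total} bytes without error")

if __name__ == "__main__":
    probe(sys.argv[1])
```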

  8. DPM issues – pools on NFS (3) • Problem reported to the DPM developers • Verified to be an issue also with the new VDT 1.3 (Globus 4, GridFTP 2) • Our workaround – use NBD instead of NFS • Important: DPM requires every filesystem in the pool to be a separate partition (free space calculation; see the sketch below) • NBD is a suitable solution for the shared-filesystem case
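
An illustration (not DPM code) of the free-space point: pool capacity is derived from per-filesystem statistics, so two pool filesystems sharing one partition would report, and therefore double-count, the same space. The mount points below are hypothetical.

```python
# Illustration of why each pool filesystem must be its own partition:
# free space is taken per mount point, so directories on a shared partition
# would be double-counted.
import os

def free_bytes(mount_point: str) -> int:
    st = os.statvfs(mount_point)
    return st.f_bavail * st.f_frsize

# Hypothetical DPM pool filesystems, e.g. 3 local partitions + 1 NBD device
pool_filesystems = ["/data01", "/data02", "/data03", "/data04"]

total_free = sum(free_bytes(fs) for fs in pool_filesystems)
print(f"pool free space: {total_free / 1e12:.2f} TB")
```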

  9. DPM issues – rate limiting • The SRM implementation in DPM currently does not support rate limiting of concurrent new SRM requests (unlike dCache or CASTOR2) • On the DPM TODO list • Despite these issues we have quite good results using DPM as an SE for the ATLAS, ALICE and VOCE VOs …

  10. ATLAS CSC • Golias100 receives data from the ATLAS CSC production • Defined in some lexor (ATLAS LCG executor) instances as a reliable storage element

  11. Data transfers via FTS • CERN – FZU, tested in April using FTS server at CERN

  12. Data transfers via srmcp • FTS channel available only to the associated Tier1 (FZK) • Tests to another Tier1 possible only via transfers issued “by hand” • Tests SARA – FZU: bulk copy from SARA to FZU, now with only one srmcp command (see the sketch below) • 10 files: max speed 200 Mbps, average 130 Mbps • 200 files: only 66 finished, the rest failed due to a “Too many transfers” error • Speed OK
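
A sketch of how such a bulk copy might be driven with a single srmcp invocation, assuming the dCache srmcp client's copy-job file option (-copyjobfile, one "source destination" SURL pair per line); the hostnames, paths and file names are hypothetical.

```python
# Sketch: bulk SARA -> FZU copy with one srmcp call using a copy-job file.
# The -copyjobfile option follows the dCache srmcp client (assumption);
# SURLs below are hypothetical placeholders.
import subprocess

SRC_BASE = "srm://srm.grid.sara.nl:8443/pnfs/grid.sara.nl/data/atlas/test"       # hypothetical
DST_BASE = "srm://dpm-head.example.cz:8443/dpm/example.cz/home/atlas/sara-test"  # hypothetical

files = [f"file{i:03d}.root" for i in range(200)]

with open("copyjob.txt", "w") as job:
    for name in files:
        job.write(f"{SRC_BASE}/{name} {DST_BASE}/{name}\n")

# A single srmcp invocation handles the whole list.
subprocess.run(["srmcp", "-copyjobfile=copyjob.txt"], check=True)
```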

  13. Tests Tier1 – Tier2 via FTS • FZU (Prague) is a Tier2 associated with the Tier1 FZK (GridKa, Karlsruhe, Germany) • FTS (File Transfer Service) operated by the Tier1; channels FZK-FZU and FZU-FZK managed by FZK and FZU • Tunable parameters (see the sketch below): • Number of files transferred simultaneously • Number of streams • Priorities between different VOs (ATLAS, ALICE, DTEAM)
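
The channel parameters listed above can be pictured as a small per-channel configuration record; the sketch below is only an illustration of what gets tuned on the FZK-FZU and FZU-FZK channels, with hypothetical values, not the FTS admin interface itself.

```python
# Illustration only (not the FTS admin API): the per-channel knobs named on
# the slide, as they might be recorded for the FZK-FZU and FZU-FZK channels.
from dataclasses import dataclass, field

@dataclass
class ChannelConfig:
    name: str
    concurrent_files: int          # files transferred simultaneously
    streams_per_file: int          # parallel streams per file
    vo_shares: dict = field(default_factory=dict)  # relative priorities per VO

channels = [
    ChannelConfig("FZK-FZU", concurrent_files=10, streams_per_file=5,
                  vo_shares={"atlas": 50, "alice": 30, "dteam": 20}),  # hypothetical values
    ChannelConfig("FZU-FZK", concurrent_files=10, streams_per_file=5,
                  vo_shares={"atlas": 50, "alice": 30, "dteam": 20}),  # hypothetical values
]

for ch in channels:
    print(ch)
```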

  14. Results not stable • Transfer of 50 files, each file 1 GB • Starts fast, then timeouts occur • Transfer of 100 files, each file 1 GB • Started when the load on the Tier1 disk servers was low

  15. ATLAS Tier0 test – part of SC4 • Transfers of RAW and AOD data from Tier0 (CERN) to the 10 ATLAS Tier1s and to associated Tier2s • Managed by the ATLAS DQ2 system, which uses the FTS at Tier0 for Tier0 – Tier1 transfers and the Tier1s’ FTS for Tier1 – Tier2 transfers • First data copied to FZU this Monday • ALICE plans an FTS transfer test in July

  16. Conclusion • DPM is the only available “light-weight” Storage Element with an SRM frontend • It has issues, but none of them are “show stoppers” and the code is under active development • Using DPM, we were able to achieve significant and non-trivial transfer results in the scope of LCG SC4
