Data management for ATLAS, ALICE and VOCE in the Czech Republic L. Fiala, J. Chudoba, J. Kosina, J. Krasova, M. Lokajicek, J. Svec, J. Kmunicek, D. Kouril, L. Matyska, M. Ruda, Z. Salvet, M. Mulac
Overview • Supported VOs (VOCE, ATLAS, ALICE) • DPM as a choice of SRM-based Storage Element • Issues encountered with DPM • Results of transfers • Conclusion
VOCE • Virtual Organization for Central Europe • within the scope of the EGEE project • provides distributed Grid facilities to non-HEP scientists • Austrian, Czech, Hungarian, Polish, Slovak and Slovenian resources involved • the design and implementation of the VOCE infrastructure were done solely on Czech resources
ALICE, ATLAS • Virtual Organizations for the LHC experiments
Storage Elements • Classical disk-based SEs • Participation in Service Challenge 4 → need for an SRM-enabled SE • No tape storage available for the Grid at the moment – DPM chosen as the SRM-enabled SE • 1 head node and 1 disk server on the same machine • Separate disk-server nodes planned • 5 TB on 4 filesystems (3 local, 1 NBD)
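A minimal sketch of how such a pool layout is registered with the standard DPM administration tools (dpm-addpool, dpm-addfs); the pool name, hostname and mount points below are hypothetical:

  # Create a pool and add the four filesystems served by the combined head node / disk server
  # (pool name, hostname and paths are examples only)
  dpm-addpool --poolname dpmpool --def_filesize 200M
  dpm-addfs --poolname dpmpool --server dpm.example.cz --fs /storage1
  dpm-addfs --poolname dpmpool --server dpm.example.cz --fs /storage2
  dpm-addfs --poolname dpmpool --server dpm.example.cz --fs /storage3
  dpm-addfs --poolname dpmpool --server dpm.example.cz --fs /nbd1    # the NBD-backed filesystem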
DPM issues – srmCopy() • DPM does not currently support the srmCopy() method (work in progress) • When copying from a non-DPM SRM SE to a DPM SE with srmcp, the pushmode=true flag must be used • Local temporary storage or globus-url-copy can be used to avoid a direct SRM-to-SRM third-party transfer via srmCopy()
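For illustration, a push-mode copy into DPM with the dCache srmcp client might look as follows; all hostnames, ports and SURLs are hypothetical:

  # Force push mode so the transfer is driven without srmCopy()
  srmcp -pushmode=true \
    srm://dcache.example.org:8443/pnfs/example.org/data/atlas/file1 \
    srm://dpm.example.cz:8443/dpm/example.cz/home/atlas/file1

  # Alternative: stage through local temporary storage with globus-url-copy
  globus-url-copy gsiftp://dcache.example.org/pnfs/example.org/data/atlas/file1 file:///tmp/file1
  globus-url-copy file:///tmp/file1 gsiftp://dpm.example.cz/storage1/atlas/file1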
DPM issues – pools on NFS (1) • Our original setup – disk array attached to an NFS server (64-bit Opteron, Fedora Core with a 2.6 kernel) • Disk array NFS-mounted on the DPM disk server (no need to install the disk server software on Fedora) • Silent file truncation when copying files from pools located on NFS
DPM issues – pools on NFS (2) • Using strace we found that at some point the copying process receives an EACCES error from read() • Unable to reproduce with standard utilities (cp, dd, simple read()/write() programs) • The problem appears only with a 2.4-kernel client and a 2.6-kernel server (verified on various versions)
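The failing read() was visible in a system-call trace; a sketch of such a trace, with the actual copy command left as a placeholder:

  # Log file-related system calls of the transfer; the EACCES return from read()
  # shows up in the trace output (the traced command is a placeholder)
  strace -f -e trace=open,read,close -o nfs-copy.trace <copy command>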
DPM issues – pools on NFS (3) • Problem reported to the DPM developers • Verified to be an issue also with the new VDT 1.3 (Globus 4, GridFTP 2) • Our workaround – NBD used instead of NFS • Important: DPM requires every filesystem in a pool to be a separate partition (free-space calculation) • NBD is a suitable solution for the shared-filesystem case (see the sketch below)
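A rough sketch of the NBD-based setup, using the classic nbd-server/nbd-client invocation; hostnames, ports, devices and mount points are illustrative:

  # On the machine holding the disk array: export the block device over NBD
  nbd-server 2000 /dev/sdb1

  # On the DPM disk server: attach the export, create a filesystem (first use only) and mount it
  modprobe nbd
  nbd-client nfsbox.example.cz 2000 /dev/nbd0
  mkfs.ext3 /dev/nbd0        # only when the device is initialised for the first time
  mount /dev/nbd0 /nbd1      # appears to DPM as an ordinary separate partition

Because the NBD export shows up as a local block device, each pool filesystem remains a separate partition and DPM's free-space calculation stays correct.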
DPM issues – rate limiting • The SRM implementation in DPM currently does not support rate limiting of concurrent new SRM requests (unlike dCache or CASTOR2) • On the DPM TODO list • Apart from these issues, we have quite good results using DPM as an SE for the ATLAS, ALICE and VOCE VOs …
ATLAS CSC • Golias100 receives data from the ATLAS CSC production • Defined as a reliable storage element in some lexor (ATLAS LCG executor) instances
Data transfers via FTS • CERN – FZU, tested in April using the FTS server at CERN
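For reference, an individual transfer on such a channel is submitted with the gLite FTS command-line client roughly as below; the service endpoint and SURLs are placeholders:

  # Submit one file transfer to the FTS server and note the returned job ID
  glite-transfer-submit -s https://fts.example.cern.ch:8443/<fts-service-path> \
    srm://srm.example.cern.ch:8443/castor/cern.ch/grid/atlas/file1 \
    srm://dpm.example.cz:8443/dpm/example.cz/home/atlas/file1

  # Poll the returned job ID for its state
  glite-transfer-status -s https://fts.example.cern.ch:8443/<fts-service-path> <job-id>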
Data transfers via srmcp • FTS channel available only to the associated Tier1 (FZK) • Tests to another Tier1 possible only via transfers issued “by hand” • Tests SARA – FZU: bulk copy from SARA to FZU, now with only one srmcp command (see the sketch below) • 10 files: max speed 200 Mbps, average 130 Mbps • 200 files: only 66 finished, the rest failed with a “Too many transfers” error • Speed OK
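The single-command bulk copy can be driven by an srmcp copy-job file (one source/destination SURL pair per line); the file contents and endpoints below are illustrative:

  # copyjob.txt – one "source destination" pair per line (example entries)
  #   srm://dcache.example.nl:8443/pnfs/example.nl/data/atlas/file001 srm://dpm.example.cz:8443/dpm/example.cz/home/atlas/file001
  #   ...

  # Hand the whole list to a single srmcp invocation in push mode
  srmcp -pushmode=true -copyjobfile=copyjob.txt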
Tests Tier1 – Tier2 via FTS • FZU (Prague) is a Tier2 associated with the Tier1 FZK (GridKa, Karlsruhe, Germany) • FTS (File Transfer Service) operated by the Tier1; the channels FZK-FZU and FZU-FZK are managed by FZK and FZU • Tunable parameters: • number of files transferred simultaneously • number of streams • priorities between different VOs (ATLAS, ALICE, DTEAM)
Results not stable: • Transfer of 50 files, each file 1 GB • Starts fast, then timeouts occur • Transfer of 100 files, each file 1 GB • Started when the load on the Tier1 disk servers was low
ATLAS Tier0 test – part of SC4 • Transfers of RAW and AOD data from Tier0 (CERN) to the 10 ATLAS Tier1s and to the associated Tier2s • Managed by the ATLAS DQ2 system, which uses the FTS at Tier0 for Tier0 – Tier1 transfers and the Tier1s’ FTS for Tier1 – Tier2 transfers • First data copied to FZU this Monday • ALICE plans an FTS transfer test in July
Conclusion • DPM is the only available “light-weight” Storage Element with an SRM frontend • It has issues, but none of them are “show stoppers” and the code is under active development • Using DPM, we were able to achieve substantial transfer results within the scope of LCG SC4