dCache Deployment at the RAL Tier1A
UK HEP Sysman, April 2005
dCache at RAL 1
Mid 2003: deployed a non-grid version for CMS. It was never used in production.
End of 2003 / start of 2004: RAL offered to package a production-quality dCache. The work stalled due to bugs and went back to the dCache and LCG developers.
dCache at RAL 2
Mid 2004: small deployment for EGEE JRA1, intended for gLite I/O testing.
End of 2004: CMS instance. 3 disk servers, ~10 TB of disk space.
Disk served via NFS to the pool nodes, each pool node running a GridFTP door.
Published in the LCG information system.
dCache at RAL 4
Start of 2005: new production instance supporting the CMS, DTeam, LHCb and Atlas VOs. 22 TB of disk space.
The CMS instance was decommissioned and its hardware reused.
Separate gdbm file for each VO.
Uses directory-pool affinity to map areas of the file system to each VO's assigned disk.
dCache at RAL 5
Early 2005: Service Challenge 2. 4 disk servers, ~12 TB of disk space.
UKLight connection to CERN.
Pools run directly on the disk servers; standalone GridFTP and SRM doors.
The SRM was not used in the challenge due to software problems at CERN.
Interfaced to the Atlas Data Store.
SC2 instance
[Diagram: SC2 dCache layout. UKLight (2 x 1 Gb/s) and SJ4 (2 x 1 Gb/s) connections via a Summit 7i switch; a head node hosting the SRM and its database; 8 diskless GridFTP door nodes; a Nortel 5510 stack (80 Gb/s) connecting 8 dCache pools on 4 x 3 TB disk servers.]
SC2 results
Achieved 75 MB/s to disk and 50 MB/s to tape.
Seen faster: 3000 Mb/s to disk over the LAN.
The network was delivered at the last minute and was under-provisioned.
Odd iperf results, with high UDP packet loss.
Future Developments
Interface the ADS to the production dCache.
Considering a second SRM door.
Implement a script to propagate deletes from dCache to the ADS (see the sketch below).
Service Challenge 3: still planning. Use the production dCache; experiments may want to retain data.
Avoid multi-homing if possible; connect UKLight into the site network.
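The delete-propagation script does not exist yet; below is a minimal Python sketch of one way it could work, assuming both the dCache namespace and the ADS catalogue can be dumped to flat lists of file paths. The dump files and the ads_delete helper are hypothetical placeholders, not real interfaces.

# Hypothetical sketch: propagate deletes from dCache to the ADS.
# Assumes two flat dump files with one file path per line; both the dump
# format and the ads_delete() helper are placeholders, not real interfaces.
import sys

def read_paths(filename):
    # Return the set of file paths listed one per line in the dump file.
    with open(filename) as f:
        return {line.strip() for line in f if line.strip()}

def ads_delete(path):
    # Placeholder: issue whatever ADS command removes this catalogue entry.
    print("would delete from ADS:", path)

if __name__ == "__main__":
    dcache_dump, ads_dump = sys.argv[1], sys.argv[2]
    in_dcache = read_paths(dcache_dump)
    in_ads = read_paths(ads_dump)
    # Anything still catalogued in the ADS but gone from dCache is stale.
    for path in sorted(in_ads - in_dcache):
        ads_delete(path)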
Production Setup
[Diagrams: the current test configuration (DTeam only for now) and the proposed production layout.]
VO Support
A bit of a hack: dCache has no concept of VOs.
The gridmap file is periodically run through a Perl script to produce a mapping of DN to Unix UID/GID (a sketch of this step follows below).
Each VO member is mapped to the first pool account of its VO, so all of a VO's files are owned by that account.
VOMS support is coming.
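The Perl script itself is not reproduced in the talk; the following is a minimal Python sketch of the same mapping step, assuming standard grid-mapfile lines of the form "DN" account and a hypothetical first-pool-account naming scheme such as cms001.

# Minimal sketch (not the actual RAL Perl script): map each DN in the
# grid-mapfile to the first pool account of its VO and record the Unix
# UID/GID. Assumes grid-mapfile lines of the form: "DN" voname, and a
# hypothetical pool-account naming scheme such as "cms001".
import pwd
import re

MAPFILE = "/etc/grid-security/grid-mapfile"

def first_pool_account(vo):
    # e.g. cms -> cms001 (assumed scheme); strip any leading '.' used to
    # mark pool-account groups in the grid-mapfile.
    return "%s001" % vo.lstrip(".")

def build_mapping(mapfile=MAPFILE):
    mapping = {}
    line_re = re.compile(r'^"(?P<dn>[^"]+)"\s+(?P<vo>\S+)')
    with open(mapfile) as f:
        for line in f:
            m = line_re.match(line.strip())
            if not m:
                continue
            account = first_pool_account(m.group("vo"))
            try:
                pw = pwd.getpwnam(account)
            except KeyError:
                continue  # no such pool account on this host
            mapping[m.group("dn")] = (pw.pw_uid, pw.pw_gid)
    return mapping

if __name__ == "__main__":
    for dn, (uid, gid) in sorted(build_mapping().items()):
        print(dn, uid, gid)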
Postgres
The Postgres SRM database is a CPU hog; this is being worked on.
The current recommendation is to run PostgreSQL on a separate host.
The database can also be used to store dCache transfer information for monitoring (see the example below).
In future it may be possible to use it for the pnfs databases too.
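As an illustration of the monitoring use, here is a hedged sketch that connects to the separate database host and pulls recent transfer records. The host name, database name, table and column names are assumptions made for the example; check them against the actual dCache schema on the installation.

# Hedged sketch: read recent transfer records from the PostgreSQL server
# that dCache writes to (running on a separate host, as recommended above).
# The host, database, table and column names below are assumptions for
# illustration only.
import psycopg2

conn = psycopg2.connect(host="dcache-db.example.ac.uk",  # separate DB host
                        dbname="dcache", user="srmdcache")
cur = conn.cursor()
cur.execute("""
    SELECT datestamp, client, transfersize
    FROM billinginfo                      -- hypothetical table name
    WHERE datestamp > now() - interval '1 hour'
    ORDER BY datestamp
""")
for stamp, client, size in cur.fetchall():
    print(stamp, client, size)
cur.close()
conn.close()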
SRM requests
Each SRM request lasts (by default) 24 hours if it is not finished properly.
With too many outstanding requests, the SRM door queues new ones until a slot becomes available.
Educate users to run lcg-sd after an lcg-gt, and not to Ctrl-C an lcg-rep (see the sketch below).
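A sketch of the recommended get-then-release pattern, driven from Python: the SURL is made up, and the assumption that lcg-gt prints the TURL followed by request/file identifiers which lcg-sd then takes alongside the SURL should be checked against the installed lcg-utils version.

# Sketch of the "get a TURL, use it, then release it" pattern with lcg-utils.
# Assumption: lcg-gt prints the TURL followed by the request/file ids, and
# lcg-sd accepts the SURL plus those ids; verify the exact argument order
# against your lcg-utils version before relying on this.
import subprocess

surl = "srm://dcache.gridpp.rl.ac.uk/pnfs/gridpp.rl.ac.uk/data/dteam/somefile"

# Ask the SRM for a gsiftp TURL; this occupies a request slot on the door.
out = subprocess.run(["lcg-gt", surl, "gsiftp"],
                     capture_output=True, text=True, check=True)
turl, *ids = out.stdout.split()
print("transfer from", turl)

# ... perform the transfer using the TURL ...

# Release the request so it does not sit on the door for 24 hours.
subprocess.run(["lcg-sd", surl] + ids, check=True)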
SRM-SRM copies
Pull mode: if dCache is the destination, the destination pool initiates the GridFTP transfer from the source SRM.
The pools need the dcache-opt RPM installed (a GridFTP door does not need to be running there).
Each pool node needs a host certificate and a GLOBUS_TCP_PORT_RANGE open to incoming connections.
lcg-utils do not do this, but srmcp does (see the sketch below).
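A sketch of kicking off such a copy with srmcp; the endpoints are illustrative rather than actual RAL paths, and the prerequisites listed above (dcache-opt RPM, host certificate, open GLOBUS_TCP_PORT_RANGE) apply to the pool nodes rather than to the client shown here.

# Sketch: trigger an SRM-to-SRM copy with srmcp, which (unlike lcg-utils)
# lets the destination dCache pool pull the file itself. The endpoints are
# illustrative; the pull-mode prerequisites (dcache-opt RPM, host
# certificate, open GLOBUS_TCP_PORT_RANGE) live on the pool nodes.
import subprocess

src = "srm://source-se.example.org:8443/pnfs/example.org/data/dteam/file1"
dst = "srm://dcache.gridpp.rl.ac.uk:8443/pnfs/gridpp.rl.ac.uk/data/dteam/file1"

# srmcp <source SURL> <destination SURL>
subprocess.run(["srmcp", src, dst], check=True)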
Quotas
If two VOs can access the same pool, there is no way to stop one VO grabbing all of it.
There are no global quotas; they are hard to do because pools can come and go.
The only way to restrict disk usage is to limit the pools a VO can write to, but then the space available per VO cannot be reported.
Links
http://ganglia.gridpp.rl.ac.uk/?c=DCache
http://ganglia.gridpp.rl.ac.uk/?c=SC