An overview of the Tier1A storage infrastructure at RAL: the history, current status and plans for the disk and tape systems (including dCache), deployments for ATLAS, CMS, DTeam and LHCb, disk servers, file-system replication, data protocols and use cases, and the challenges, deployment techniques and future expansion plans.
Storage at RAL Tier1A Jeremy Coles j.coles@rl.ac.uk
Outline • Disk • Current Status and Plans • dCache • Tape • Current Status and History • SRM etc • Plans • Hardware • Software
Tier1A Disk • 2002-3 (80TB): dual-processor servers, dual-channel SCSI interconnect, external IDE/SCSI RAID arrays (Accusys and Infortrend), ATA drives (mainly Maxtor) – cheap and (fairly) cheerful • 2004 (140TB): Infortrend EonStor SATA/SCSI RAID arrays, 16 x 250GB Western Digital SATA drives per array, two arrays per server
Implementation • Used by BaBar and other experiments as well as LHC • 60 disk servers NFS-exporting their filesystems • Potential scaling problems if every CPU node wants to use the same disk • Servers allocated to VOs, so no contention or interference • Need a means of pooling servers, so we looked at dCache
Why we tried dCache • Gives you a virtual file space across many file systems, optionally spanning several nodes • Allows replication within the file space to increase redundancy • Allows a tape system to be interfaced at the back to further increase redundancy and available storage • Data protocols are scalable: one GridFTP interface per server is easy and transparent • It was the only SRM available for disk pools
dCache Doors • Doors (interfaces) can be created into the system: GridFTP, SRM, GSIDCAP • GFAL gives you a POSIX interface on top of these (see the sketch below) • All of these are GSI-enabled, but Kerberos doors also exist • Everything remains consistent regardless of which door is used
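As a rough illustration of the GFAL POSIX route, the sketch below opens a file through a dCache door and reads one block. It assumes the GFAL 1.x C API (gfal_open/gfal_read/gfal_close from gfal_api.h); the host name and PNFS path are hypothetical examples, not RAL's real endpoints.

```c
/* Minimal sketch of POSIX-style access to a dCache door via GFAL.
 * Assumes the GFAL 1.x C API; the SRM host and path are hypothetical. */
#include <fcntl.h>
#include <stdio.h>
#include "gfal_api.h"

int main(void)
{
    char buf[4096];
    int fd, n;

    /* GFAL resolves the SRM (or gsidcap) URL to the appropriate door */
    fd = gfal_open("srm://dcache.example.ac.uk/pnfs/example.ac.uk/data/cms/file1",
                   O_RDONLY, 0);
    if (fd < 0) {
        perror("gfal_open");
        return 1;
    }

    /* Read the first block of the file through the door */
    n = gfal_read(fd, buf, sizeof(buf));
    if (n < 0)
        perror("gfal_read");
    else
        printf("read %d bytes\n", n);

    gfal_close(fd);
    return 0;
}
```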
History of dCache at RAL • Mid 2003: deployed a non-grid version for CMS; it was never used in production • End of 2003/start of 2004: RAL offered to package a production-quality dCache; this stalled due to bugs and holidays, and work went back to the dCache and LCG developers • September 2004: redeployed dCache into the LCG system for the CMS and DTeam VOs; dCache also deployed within the JRA1 testing infrastructure for gLite I/O daemon testing
dCache at RAL today • Now deployed for ATLAS, CMS, DTeam and LHCb • 5 disk servers made up of 16 x 1.7TB partitions • CMS, the only serious users of dCache at RAL, have stored 2.5TB in the system • They are accessing byte ranges via the GSIDCAP POSIX interface (see the sketch below)
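A minimal sketch of the kind of byte-range access described above, assuming the libdcap C API (dc_open/dc_lseek/dc_read/dc_close from dcap.h); the door host, port and PNFS path are hypothetical examples, not the actual RAL configuration.

```c
/* Minimal sketch of byte-range access through a (GSI)DCAP door.
 * Assumes the libdcap C API; host, port and path are hypothetical. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include "dcap.h"

int main(void)
{
    char buf[1024];
    ssize_t n;

    int fd = dc_open("gsidcap://dcache.example.ac.uk:22128/pnfs/example.ac.uk/data/cms/file1",
                     O_RDONLY);
    if (fd < 0) {
        perror("dc_open");
        return 1;
    }

    /* Seek to an arbitrary offset and read only that byte range --
     * the whole file never has to be copied to the worker node. */
    if (dc_lseek(fd, 1024 * 1024, SEEK_SET) < 0) {
        perror("dc_lseek");
        dc_close(fd);
        return 1;
    }
    n = dc_read(fd, buf, sizeof(buf));
    if (n < 0)
        perror("dc_read");
    else
        printf("read %zd bytes at offset 1MB\n", n);

    dc_close(fd);
    return 0;
}
```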
Pool Group Per VO • We found we could not apply quotas to file space between VOs • Following advice, dCache was redeployed with a pool group per VO • Still only one SRM front-end; data channels to it will be switched off, as we found that data transfers kill the head node • Now unable to publish space per VO…
Current Installation Technique • The Tier-1 now has its own cookbook to follow, but it is not generic at this time • Prerequisites: VDT for the certificate infrastructure; edg-mkgridmap for the grid-mapfile; J2RE; a host certificate for every node with a GSI door
Unanswered Questions • How do we drain a node for maintenance? CHEP papers and statements from the developers say this is possible • How do we support small VOs? 1.7TB is our standard partition size, and pools fill a whole partition
Other interfaces • SRB: RAL supports (and develops) SRB for other communities; ran the MCAT for worldwide CMS simulation data; SRB is interfaced to the Atlas Datastore; committed to supporting SRB • xrootd: an xrootd interface to the BaBar data held on disk and in the ADS at RAL – ~15TB of data, of which about 10TB is in the ADS; both will expand to about 70TB in the next few months; BaBar is planning to use xrootd access for background and conditions files for Monte Carlo production on LCG; basic tests of WAN access to xrootd have been run in Italy, and RAL will be involved in more soon
Tape Overview • General-purpose, multi-user data archive in use for over 20 years, through four major upgrades • Current capacity 1PB – the largest (non-dedicated) multi-user system in UK academia? • History: M860 (110GB) → STK 4400 (1.2TB) → IBM 3494 (30TB) → STK 9310 (1PB)
[Diagram, 4 November 2004: the ADS test and production systems. The STK 9310 robot and its 8 x 9940 tape drives connect via two Brocade FC switches (four drives per switch) to the AIX data servers florence, ermintrude, zebedee and dougal, with brian (AIX) running flfsys against the catalogue arrays, basil as a test data server, mchenry1 running a test flfsys, dylan handling import/export, and buxton (SunOS) running ACSLS robot control. Redhat front-ends ADS0CNTR (counter), ADS0PT01 (pathtape) and ADS0SB01 (SRB interface, with cache) take user pathtape and SRB requests; sysreq (UDP) commands, ACSLS commands, VTP and SRB data transfers link the components. Connections shown for dougal also apply to the other data servers but are left out for clarity.]
[Diagram, 28 Feb 03 (B Strong): Atlas Datastore architecture. Users and SEs issue flfsys user, farm and admin commands over sysreq. The Catalogue Server (brian) runs flfsys (+libflf) against the catalogue (copies A, B and C, plus a backup maintained by flfdoback/flfdoexp and queried by flfqryoff, a copy of the flfsys code). The Pathtape Server (rusty) maps short pathtape names to long names via servesys. The Robot Server (buxton) runs ACSLS, with CSI/SSI and flfstk handling mount/dismount control of the tape robot. Farm servers move data between cache disk and the IBM and STK tape drives using flfaio and flfscan, and the import/export server (dylan) transfers data to user programs over vtp (libvtp).]
Hardware upgrade – completed Jun 2003 • STK 9310 “Powderhorn” with 6000 slots (1.2PB) • 4 IBM 3590 B drives (10GB native, 10MB/s transfer) now phased out • 8 new STK 9940B drives: 200GB native, 30MB/s transfer per drive, 240MB/s theoretical maximum bandwidth • 4 RS6000 data servers (+ 4 “others”) • 1Gbit networking (expected to become 10Gbit by 2005) • Data migration to new media completed ~Feb 2004
Strategy • De-couple user and application from storage media: upgrades and media migration occur “behind the scenes” • High resilience – very few single points of failure • High reliability and high availability (99.9986%, see below) • Constant environmental monitoring linked to alarm/call-out • Easy to exploit (endless) new technology • Lifetime data integrity checks in hardware and software • Fire-safe and off-site backups; tested disaster recovery procedures; media migration and recycling • Technology watch to monitor the future technology path
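For scale, and assuming the 99.9986% figure is read as an annual availability (an interpretation, not something stated on the slide), it corresponds to roughly seven minutes of downtime per year:

$$(1 - 0.999986) \times 365.25 \times 24 \times 60\ \text{min} \approx 7.4\ \text{min of downtime per year}$$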
Supported interfaces • We have successfully implemented a variety of layers on top of ADS to support standard interfaces • FTP, OOFS, Globus IO, SRB, EDG SE, SRM, xrootd • so we can probably support others
Overall Storage Goals – GridPP2 • Provide SRM interfaces to: the Atlas Petabyte Storage facility at RAL; disk (for Tier 1 and 2 in the UK); disk pools (for Tier 1 and 2 in the UK) • Deploy and support the interface to the Atlas Datastore • Package and support the interfaces to disk
Current status • EDG SE interface to ADS: published as an SE in LCG; supported by edg-rm • SRM v1.1 interface to ADS: tested with GFAL (earlier versions, <1.3.7); tested with srmcp (the dCache SRM client); based on the EDG Storage Element; also interfaces to disk • Also working with the RAL Tier 1 on dCache: install and support, including the SRM
(Short Term) Timeline • Provide a release of SRM to disk and disk array by the end of January 2005, to coincide with the EGEE gLite “release” • Plan to match the path toward the full gLite release
(Short Term) Strategy • Currently considering both the EDG SE (SRM to the ADS, SRM to disk) and dCache + dCache-SRM (SRM to a disk pool) • Looking at both to meet all goals: some duplicated effort, but it helps mitigate risk, and we can fall back to only one (which may be dCache) • In the long term we will probably have a single solution
Acceptance tests • SRM tests – the SRM interface must work with: srmcp (the dCache SRM client); GFAL; gLite I/O • Disk pool test – must work with dccp (dCache-specific), plus the SRM interface on top
Questions • What is the future of CERN’s DPM? • We want to test it • Should we start implementing SRM 3? • Will dCache ever go Open Source?
Planned Tape Capacity • Don’t believe the 2008 figures – we will be reviewing storage in this timeframe
ADS Plans • Planning a wider UK role in Data Curation and Storage (potentially 10-20PB by 2014) • Review software layer – use of Castor possible • Capacity plans based on adding STK Titanium 1 in 2005/06 and Titanium 2 in 2008/09
Summary • Working implementation of dCache for disk pools (main user is CMS) • Some outstanding questions • Plan to involve some Tier-2s shortly • We will review other implementations as they become available • RAL ADS supports SRB and xrootd for other communities.