CMS: T1 Disk/Tape separation
Nicolò Magini, CERN IT/SDC
Oliver Gutsche, FNAL
November 11th 2013
Outline
• Motivation: gains in operations
• Impact on data federation
• Progress and technical issues
• Changes in operations and procedures
Introduction
• CMS asked the Tier-1 sites to change their storage setup to gain more flexibility and control over the available disk and tape resources
• Old setup:
  • One MSS system controlling both disk and tape
  • Automatic migration of new files to tape
  • Disk pool automatically purges unpopular files to make room for more popular files
  • Automatic recall of files from tape when accessing files without a disk copy
• Several disadvantages:
  • Pre-staging was needed for organized processing, and was not 100% efficient because the system was still allowed to automatically purge files if needed
  • User analysis was not allowed at Tier-1 sites, to protect the tape drives from chaotic user access patterns
Disk/Tape separation
• CMS asked the Tier-1 sites to separate disk and tape and to base the management of both on PhEDEx
• Sites were asked to deploy two independent [*] PhEDEx endpoints:
  • a "large" [**] persistent disk endpoint
  • a tape archive with a "small" [**] disk buffer
• All file access will be restricted to the disk endpoint
• All processing will write only to the disk endpoint
• [*] A file can be written to or deleted from disk-only, tape-only, or both simultaneously
• [**] "small" ~ 10% of "large", but can be sized according to the expected rates to tape
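With the separation, the disk and tape areas of one Tier-1 appear as two independent nodes in PhEDEx, so standard PhEDEx tools can be pointed at either one. Below is a minimal sketch, assuming the PhEDEx data service "blockreplicas" call; the node names and the dataset are illustrative placeholders, not actual site names.

```python
# Minimal sketch: list where a dataset's blocks are replicated, on the disk
# endpoint vs. the tape (MSS) endpoint of the same Tier-1.
# Assumes the PhEDEx data service "blockreplicas" call; node names and the
# dataset are illustrative placeholders.
import json
import urllib.parse
import urllib.request

PHEDEX = "https://cmsweb.cern.ch/phedex/datasvc/json/prod"
DATASET = "/SomePrimaryDataset/SomeEra-SomeProcessing/AOD"  # placeholder

def block_replicas(node):
    """Return the block replicas of DATASET hosted at one PhEDEx node."""
    query = urllib.parse.urlencode({"node": node, "dataset": DATASET})
    with urllib.request.urlopen(f"{PHEDEX}/blockreplicas?{query}") as resp:
        return json.load(resp)["phedex"]["block"]

# After the separation, the same Tier-1 shows up as two independent endpoints:
for node in ("T1_XX_Example_Disk", "T1_XX_Example_MSS"):  # illustrative names
    blocks = block_replicas(node)
    print(f"{node}: {len(blocks)} block replicas of {DATASET}")
```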
Motivation
• Increase flexibility for Tier-1 processing
• Enable user analysis at Tier-1s
• Enable remote access of Tier-1 data
Processing at Tier-1s: Location independence
• Use case:
  • Organized processing needs to access input samples stored custodially on tape at one of the Tier-1 sites
• Old model:
  • Jobs had to run close to the tape endpoint hosting the input and output data (the custodial location)
• New model:
  • Jobs can run against any disk endpoint, not necessarily close to the tape endpoint hosting the input or output data
• Benefit of the new model:
  • Custodial distribution optimizes tape space utilization, taking into account the processing capacities of the Tier-1 sites
  • Not all data is accessed at the same time, which causes uneven utilization of processing resources
  • Location independence makes it possible to use both tape and processing resources efficiently at the same time
Processing at Tier-1s: Pre-staging and Pinning
• Use case:
  • Staging and pinning input files to local disk is required for organized processing, to optimize CPU efficiency
  • Input files need to be released from disk when processing is done
• Old model:
  • Pre-staging via SRM or Savannah tickets was used to convince the MSS to keep input files available on disk
  • Release of the input relied on the automatic purge within the MSS
• New model (sketched below):
  • CMS will centrally subscribe, and therefore pre-stage, input files so that they are available on disk before jobs start
  • CMS will keep input files on disk permanently for regular activities
• Benefit of the new model:
  • CMS is in control of what is on disk at the Tier-1 sites and can optimize disk utilization (CMS will have to actively manage the disk space through PhEDEx)
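A minimal sketch of the new pre-staging flow, assuming the PhEDEx data service "subscribe" and "delete" calls; the node name and dataset are placeholders, the real "data" payload is a PhEDEx XML structure rather than a bare dataset name, and the X.509 proxy authentication a real request needs is omitted.

```python
# Sketch of the new model: CMS centrally subscribes the input dataset to the
# Tier-1 *disk* endpoint before processing, and deletes the disk replica when
# the workflow is done. Names are placeholders; authentication is omitted.
import urllib.parse

PHEDEX = "https://cmsweb.cern.ch/phedex/datasvc/json/prod"
DISK_NODE = "T1_XX_Example_Disk"                            # illustrative
DATASET = "/SomePrimaryDataset/SomeEra-SomeProcessing/AOD"  # placeholder

def prestage_params(dataset, node):
    """Parameters for a non-custodial disk subscription (pre-stage + pin)."""
    return urllib.parse.urlencode({
        "node": node,
        "data": dataset,   # real calls pass a PhEDEx XML data structure here
        "custodial": "n",  # the disk copy is never the custodial copy
        "move": "n",       # replicate, do not move
        "comments": "pre-stage input for organized processing",
    })

def cleanup_params(dataset, node):
    """Parameters for deleting the disk replica once processing is done."""
    return urllib.parse.urlencode({
        "node": node,
        "data": dataset,
        "comments": "input no longer needed on disk",
    })

# An authenticated POST of these parameters to .../subscribe and .../delete
# would place, and later remove, the disk replica.
print(f"POST {PHEDEX}/subscribe?{prestage_params(DATASET, DISK_NODE)}")
print(f"POST {PHEDEX}/delete?{cleanup_params(DATASET, DISK_NODE)}")
```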
Processing at Tier-1s: Output from central processing
• Use case:
  • Central processing produces output which needs to be archived on tape
• Old model:
  • Output of an individual workflow could only be produced at one site, the site of the custodial location
• New model:
  • Output can be produced at one or more disk endpoints, then migrated to tape only at the single final custodial location (see the sketch below)
• Benefit of the new model:
  • CMS can optimize processing resource utilization
  • Tier-1s with no free tape are no longer idle
  • CMS can validate data before the final tape migration, reducing unnecessary tape usage
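A minimal sketch of the corresponding archival step, again assuming the PhEDEx "subscribe" call described above; the node and dataset names are placeholders, and the approval workflow is only indicated in the parameters.

```python
# Sketch: once the output is validated on disk, a single custodial
# subscription to the tape (MSS) endpoint archives it; the site data manager
# then approves the pending request. Names are placeholders.
import urllib.parse

archive_params = urllib.parse.urlencode({
    "node": "T1_XX_Example_MSS",                  # final custodial location
    "data": "/SomeOutputDataset/SomeEra-v1/AOD",  # placeholder dataset
    "custodial": "y",       # this copy is the custodial (tape) copy
    "request_only": "y",    # stays pending until the data manager approves
    "comments": "archive validated output to tape",
})
print("POST .../subscribe?" + archive_params)
```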
Impact on data federation
• CMS would like to benefit from a fully deployed CMS data federation
  • Tier-1s need to publish the files on their disk endpoints in the Xrootd federation
  • Eventually, all popular data will be accessible through the federation
• Benefits (see the sketch below):
  • Further optimize processing resource utilization by processing input files without the need to relocate samples through PhEDEx
  • Enables processing not only at remote Tier-1 sites through the LHCOPN but also at Tier-2 sites
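A minimal sketch of a federated read, assuming the Xrootd client tools (xrdcp) are installed and a valid grid proxy is available; the redirector hostname and the file name are illustrative placeholders.

```python
# Sketch: read a file through the Xrootd federation instead of relocating it
# with PhEDEx first; the redirector locates a disk endpoint publishing the
# file. Redirector hostname and LFN below are placeholders.
import subprocess

REDIRECTOR = "xrootd-redirector.example.org"  # illustrative hostname
LFN = "/store/data/SomeEra/SomePrimaryDataset/AOD/v1/000/file.root"  # placeholder

# Copy the file locally via the federation (xrdcp ships with the Xrootd client).
subprocess.run(
    ["xrdcp", f"root://{REDIRECTOR}/{LFN}", "/tmp/input.root"],
    check=True,
)
# CMSSW jobs can instead open the same root:// URL directly, without a local copy.
```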
Technical implementation
• Sites and storage providers are free to choose the implementation
• Two possibilities identified in practice:
  • Two independent storage endpoints
    • CERN, FNAL
  • A single storage endpoint with two different trees in the namespace
    • RAL, KIT, CNAF, CCIN2P3, PIC
Internal transfers
• Currently using standard tools for disk ↔ tape buffer transfers at all sites (see the sketch below)
  • e.g. FTS, xrdcp
• No bottleneck seen so far
• If needed, internal optimizations are possible with a single endpoint
  • e.g. on a single dCache endpoint, the internal data flow can be delegated to the pools
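A minimal sketch of such an internal transfer using the standard tools mentioned above; the hostnames, paths and FTS endpoint are illustrative, a valid grid proxy is assumed, and real setups may instead let the storage system move the data internally.

```python
# Sketch: copy a file from the disk endpoint to the tape-buffer endpoint.
# All URLs are placeholders.
import subprocess

SRC = "root://disk-se.example.org//store/data/SomeEra/file.root"
DST = "root://tapebuffer-se.example.org//store/data/SomeEra/file.root"

# Option 1: direct copy with the Xrootd client (data streams through the
# client unless third-party copy is enabled on both endpoints).
subprocess.run(["xrdcp", SRC, DST], check=True)

# Option 2 (assumption: the FTS3 command-line client is available): queue the
# same copy asynchronously through FTS, which handles retries and scheduling.
# subprocess.run(
#     ["fts-transfer-submit", "-s", "https://fts.example.org:8446", SRC, DST],
#     check=True,
# )
```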
Site concerns
• The main site concern has been the duplication of space used between disk and the tape buffer
  • This should not be a big effect given the "small" size of the buffer in front of tape
• For dCache, a solution is planned:
  • a "flush-on-demand" command creating a hard link in the tape namespace instead of a copy
  • the development schedule will depend on need; for now, sites gather experience with the current version
Current status
• DONE
  • RAL, CNAF
  • KIT (in commissioning last week)
• ~ DONE
  • CERN (except for Tier-0 streamers and user)
• IN PROGRESS
  • PIC, CCIN2P3, FNAL
Issues
• At sites
  • No blocking technical issues
  • Not stress-tested yet: challenge in 2014?
• In CMS software
  • Minor update needed in PhEDEx to handle disk → tape moves
  • Need to settle the data location used for job matching
    • PhEDEx node vs. SE…
    • CMS internal, in progress
Changes in operations and procedures
• The Tier-1 disk endpoint is a central space
  • CMS will manage subscriptions and deletions on disk
  • Tape endpoint subscriptions are subject to approval by the Tier-1 data managers (a role held by site-local colleagues)
  • CMS would like to auto-approve disk subscription and deletion requests in order to reduce latencies
Changes in operations and procedures
• Tape families:
  • Together with the Tier-1 sites, CMS optimized the placement of files on tape for reading by requesting tape families
  • In the old model, tape family requests had to be made before processing started, which could lead to complications if forgotten
  • The new model allows processing on disk endpoints without the need for tape families
    • A PhEDEx subscription archives the output to tape: it needs to be approved by the site-local data manager
    • Tape family requests by CMS are no longer needed; sites can create tape families before approving the archival PhEDEx subscriptions
  • CMS is available to work with the sites to optimize the rules for tape family creation
    • CMS would like to evolve the tape family procedure from requesting individual families to a dialogue with the sites defining tape family setups and rules
Changes in site readiness
• Site readiness metrics for Tier-1s will evolve to take the separated disk and tape PhEDEx endpoints into account
  • SAM tests only on CEs close to disk
  • SAM tests for SRM on both the disk and the tape endpoints
• More links to monitor:
  • disk ↔ WAN
  • tape ↔ WAN
  • disk ↔ tape
Conclusions
• Hosting Tier-1 data on disk will increase flexibility in all computing workflows
• Technical solutions have been identified for all sites
• Deployment is in progress with no blocking issues; completion expected at all sites by the beginning of 2014
• For more details:
  • https://twiki.cern.ch/twiki/bin/view/CMSPublic/CompProjDiskTape
  • https://indico.cern.ch/conferenceDisplay.py?confId=249032