CMS data transfer tests
• Data transfer tests:
  • A focused, short-term programme of work
  • Objective: apply existing knowledge and tools to the problem of sustained data throughput for LHC and running experiments
  • Typical goal: CMS DC04; other goals exist and will likely run in parallel
    • 60 Mbps aggregate into/out of MSS
    • Up to 200 Mbps across the WAN, data exchange with T0 / T1 peers
    • Sustained for months, not hours (a rough volume estimate follows this slide)
• People: (at least)
  • Tier-1: M. Bly, A. Sansum, N. White ++
  • Net: P. Clarke, R. Hughes-Jones, Y. Li, M. Rio, R. Tasker ++
  • CMSUK: T. Barrass, O. Maroney, S. Metson, D. Newbold
  • CMS: I. Fisk (UCSD), N. Sinanis (CERN), T. Wildish (CERN) ++
Dave Newbold, University of Bristol, 16/12/2002
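For scale, here is a rough back-of-the-envelope estimate (my arithmetic, not from the slides) of the data volume the stated throughput targets would imply if sustained for a month; it assumes decimal units (1 Mbps = 10^6 bit/s, 1 TB = 10^12 bytes).

```python
# Rough volume implied by the throughput targets, assuming decimal units
# (1 Mbps = 1e6 bit/s, 1 TB = 1e12 bytes) and a perfectly sustained rate.

SECONDS_PER_DAY = 86_400

def volume_tb(rate_mbps: float, days: float) -> float:
    """Data volume in TB moved at a sustained rate over a number of days."""
    bits = rate_mbps * 1e6 * days * SECONDS_PER_DAY
    return bits / 8 / 1e12

for rate_mbps in (60, 200):   # MSS and WAN targets from the slide
    print(f"{rate_mbps} Mbps for 30 days ~ {volume_tb(rate_mbps, 30):.1f} TB")
# 60 Mbps -> ~19.4 TB/month; 200 Mbps -> ~64.8 TB/month
```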
Programme of work
• The overall CMS programme for the next few months:
  • A: infrastructure tests / optimisation (the current stage in the UK)
  • B: replica management system functional / stress tests (starting now); test (break) a selection of ‘products’ at different layers
    • Globus RM, mysql-based EDG RM, SRB; dcache, EDG SE, etc.
  • C: large-scale deployment of the chosen combination with existing data (ties to GridPP milestones); around 20 TB to publish
  • D: use as the baseline replica management service for DC04
• Short-term goals, next few weeks:
  • Measure what the real throughput situation is (within the UK and between T1s); repeat measurements regularly
  • Attack the problem at bottleneck points, typically through hardware and software improvements at the endpoints
  • Start to deploy selected high-level replica management tools
First results
• First attempts at controlled monitoring
  • Endpoints: RAL, Bristol, CERN
  • Simple RTT and throughput measurements over 24 hours
  • Throughput: measured with iperf, 8 streams, up to 256 kB buffers (a sketch of this kind of measurement follows this slide)
  • Clearly this can be done more effectively by deploying a ‘real’ monitoring package
• ‘Real world’ tests (not run in parallel!)
  • Instrumented the existing production data movement tools
  • Averaged figures for a 1 TB data copy (more detail soon)
    • Disk -> disk, RAL to CERN
    • Disk -> MSS, RAL
• Hardware available:
  • RAL, Bristol: dedicated machines (which we can tweak)
  • CERN: ‘fast-path’ but shared server; new hardware coming
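A minimal sketch of the kind of periodic measurement described above, assuming iperf (v2-style command line) and ping are available and an iperf server is already listening at the far end; the hostname, repeat interval, and test duration are illustrative placeholders, not the actual setup used here.

```python
# Hedged sketch of a simple 24-hour RTT + throughput measurement loop.
# Assumes an iperf (v2) server is listening at TARGET; hostname, interval
# and durations below are illustrative, not the real configuration.
import subprocess
import time

TARGET = "endpoint.example.org"   # hypothetical remote endpoint

def measure_once() -> None:
    # RTT: 10 ICMP echo requests
    subprocess.run(["ping", "-c", "10", TARGET], check=False)
    # Throughput: 8 parallel TCP streams, 256 kB socket buffers, 30 s test,
    # results reported in Mbit/s
    subprocess.run(
        ["iperf", "-c", TARGET, "-P", "8", "-w", "256K", "-t", "30", "-f", "m"],
        check=False,
    )

if __name__ == "__main__":
    for _ in range(24):          # repeat hourly over a 24-hour window
        measure_once()
        time.sleep(3600)
```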
Results: RTT [plot not reproduced]
Results: mem-mem copy [plot not reproduced]
Results: ‘real world’ copy
• Disk -> disk
  • csfnfs15.rl.ac.uk -> cmsdsrv08.cern.ch
  • 1 TB dataset transferred from disk to disk (and thence to Castor)
  • bbcp using 8 streams and a CRC check; not CPU- or disk-limited
  • 65 Mbps aggregate throughput; somewhat ‘lumpy’ (rough wall-clock estimates for both rates follow this slide)
  • Castor not yet a bottleneck (but did we fill up the stage pool?)
• Disk -> tape
  • csfnfs15.rl.ac.uk -> csfb.rl.ac.uk (NFS mount) -> datastore
  • Same dataset as above; average file size ~1.5 GB
  • Three parallel write processes; volumes created on the fly
  • 40 Mbps aggregate throughput
  • Try in-order and out-of-order readback next
  • Reason to believe resources were not ‘shared’ in this case
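As a sanity check (my arithmetic, not from the slides), the wall-clock time these aggregate rates imply for the 1 TB dataset, assuming 1 TB = 10^12 bytes and a perfectly sustained rate:

```python
# Wall-clock time implied by the measured aggregate rates for the 1 TB dataset.
# Assumes 1 TB = 1e12 bytes and a perfectly sustained rate (so lower bounds).

DATASET_BYTES = 1e12

def transfer_hours(rate_mbps: float) -> float:
    """Hours to move DATASET_BYTES at a sustained rate of rate_mbps."""
    return DATASET_BYTES * 8 / (rate_mbps * 1e6) / 3600

print(f"disk -> disk at 65 Mbps: ~{transfer_hours(65):.0f} h")   # ~34 h
print(f"disk -> tape at 40 Mbps: ~{transfer_hours(40):.0f} h")   # ~56 h
```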
Upcoming work
• Monitoring:
  • Deploy ‘real’ tools at some stage soon (at many sites)
• Hardware / infrastructure:
  • High-spec dedicated server obtained at CERN (US generosity); online soon
  • HW upgrades / OS and config tweaks at all sites as necessary
  • Investigate disk and MSS performance at RAL in more detail (“experts working”)
• Replica management:
  • Installation of the latest EDG tools awaits the end of the CMS stress test
    • Have achieved unprecedented levels of stress 8-)
  • SRB / dcache work going on in the US; we will follow later
  • Need to understand how the SE system fits into this
  • Make use of the short-term MSS interface at RAL (talk today?)
Other points
• ‘Horizontal’ approach
  • A focused ‘task force’ to solve a well-understood (?) problem
  • An interesting contrast to the ‘vertical’ EDG approach
  • We will see if it works; it looks good so far
• Hardware, etc.
  • The CERN hardware situation is somewhat embarrassing
  • We have been bailed out by a US institute with ‘spare’ hardware
  • How do we leverage the UK LCG contribution for resources to attack this kind of problem? (A little goodwill goes a very long way)
• EDG situation
  • It has become very clear that many parts of the system do not scale (yet)
• Data challenge planning
  • We are solving the problem for one (two?) experiments
  • We have no idea what the scaling factor for other DCs in 03/04 is
  • New areas will become bottlenecks, so we need to know!