CMS data transfer tests
Dave Newbold, University of Bristol, 16/12/2002

• Data transfer tests:
  • Focused, short-term programme of work
  • Objective: apply existing knowledge and tools to the problem of sustained data throughput for LHC and running experiments
  • Typical goal: CMS DC04. Others exist (will likely run in parallel)
    • 60 Mbps aggregate into/out of MSS
    • Up to 200 Mbps across the WAN, data exchange with T0 / T1 peers
    • Sustained for months, not hours (see the back-of-envelope conversion after this slide)
• People (at least):
  • Tier-1: M. Bly, A. Sansum, N. White ++
  • Net: P. Clarke, R. Hughes-Jones, Y. Li, M. Rio, R. Tasker ++
  • CMS UK: T. Barrass, O. Maroney, S. Metson, D. Newbold
  • CMS: I. Fisk (UCSD), N. Sinanis (CERN), T. Wildish (CERN) ++
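To put the throughput targets above in perspective, here is a back-of-envelope conversion of the quoted rates into data volume. This is only a sketch: it assumes the rates are fully sustained with no protocol or operational overhead.

```python
# Back-of-envelope: what sustained 60 Mbps (MSS) and 200 Mbps (WAN) mean in volume.
# Assumes fully sustained rates with no protocol or operational overhead.
SECONDS_PER_DAY = 86400

def volume_tb_per_day(rate_mbps: float) -> float:
    """Convert a sustained rate in Mbit/s to TB (10^12 bytes) moved per day."""
    bytes_per_day = rate_mbps * 1e6 / 8 * SECONDS_PER_DAY
    return bytes_per_day / 1e12

for label, rate in [("MSS, 60 Mbps", 60), ("WAN, 200 Mbps", 200)]:
    per_day = volume_tb_per_day(rate)
    print(f"{label}: ~{per_day:.2f} TB/day, ~{per_day * 30:.0f} TB/month")
```

At these rates the targets correspond to roughly 0.65 TB/day (MSS) and 2.2 TB/day (WAN), i.e. tens of TB per month when sustained.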
Programme of work

• The overall programme (of CMS), next few months:
  • A: infrastructure tests / optimisation (at this stage now in the UK)
  • B: replica management system functional / stress tests (starting now); test (break) a selection of 'products' at different layers
    • Globus RM, MySQL-based EDG RM, SRB; dCache, EDG SE, etc.
  • C: large-scale deployment of the chosen combination with existing data (ties to GridPP milestones); around 20 TB to publish
  • D: use as the baseline replica management service for DC04
• Short-term goals, next few weeks:
  • Measure what the real throughput situation is (within the UK and between T1s); repeat measurements regularly
  • Attack the problem at bottleneck points, typically through hardware and software improvements at the endpoints
  • Start to deploy selected high-level replica management tools
First results

• First attempts at controlled monitoring
  • Endpoints: RAL, Bristol, CERN
  • Simple RTT and throughput measurements over 24 hours
  • Throughput: measured with iperf, 8 streams, up to 256 kB buffers (see the iperf sketch after this slide)
  • Clearly this can be done more effectively by deploying a 'real' monitoring package
• 'Real world' tests (not in parallel!)
  • Instrumented existing production data movement tools
  • Averaged figures for a 1 TB data copy (more detail soon)
    • Disk -> disk, RAL to CERN
    • Disk -> MSS, RAL
• Hardware available:
  • RAL, Bristol: dedicated machines (which we can tweak)
  • CERN: 'fast-path' but shared server; new hardware coming
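A minimal sketch of the kind of throughput probe described above (classic iperf, 8 parallel streams, 256 kB buffers), wrapped in Python so it can be repeated over a 24-hour period. The endpoint hostname and run length are placeholders rather than the actual test configuration, and it assumes an iperf server (e.g. `iperf -s -w 256k`) is already listening at the far end.

```python
# Sketch of a throughput probe in the spirit of the tests above:
# iperf client with 8 parallel TCP streams and 256 kB socket buffers.
# The hostname below is a placeholder, not one of the actual test machines.
import subprocess

ENDPOINT = "remote.example.ac.uk"   # placeholder endpoint

result = subprocess.run(
    ["iperf", "-c", ENDPOINT,   # client mode, connect to ENDPOINT
     "-P", "8",                 # 8 parallel streams
     "-w", "256k",              # 256 kB TCP buffer per stream
     "-t", "300",               # 5-minute run; repeat over 24 h for the plots
     "-f", "m"],                # report in Mbit/s
    capture_output=True, text=True, check=False)
print(result.stdout)
```

Run at regular intervals (e.g. from cron) against each endpoint pair, this gives the kind of 24-hour RTT/throughput picture shown on the next two slides, until a 'real' monitoring package is deployed.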
Results: RTT  [plot slide]
Results: mem-mem copy  [plot slide]
Results: ‘real world’ copy • Disk -> disk • csfnfs15.rl.ac.uk -> cmsdsrv08.cern.ch • 1TB dataset transferred from disk -> disk (and thence to castor) • bbcp using 8 streams and CRC check; not CPU or disk limited • 65Mbps aggregate throughput; somewhat ‘lumpy’ • Castor not yet a bottleneck (but did we fill up the stage pool?) • Disk -> tape • csfnfs15.rl.ac.uk -> csfb.rl.ac.uk (NFS mount) -> datastore • Same dataset as above; average file size ~1.5GByte. • Three parallel write processes; volumes created on the fly • 40Mbps aggregate throughput • Try in-order and out-of-order readback next • Reason to believe resources were not ‘shared’ in this case Dave Newbold, University of Bristol 16/12/2002
Upcoming work

• Monitoring:
  • Deploy 'real' tools at some stage soon (at many sites)
• Hardware / infrastructure:
  • High-spec dedicated server obtained at CERN (US generosity); online soon
  • HW upgrades / OS and config tweaks at all sites as necessary
  • Investigate disk and MSS performance at RAL in more detail ("experts working")
• Replica management:
  • Installation of the latest EDG tools awaits the end of the CMS stress test
    • Have achieved unprecedented levels of stress 8-)
  • SRB / dCache work going on in the US; we will follow later
  • Need to work to understand how the SE system fits into this
  • Make use of the short-term MSS interface at RAL (talk today?)
Other points

• 'Horizontal' approach
  • A focused 'task force' to solve a well-understood (?) problem
  • Interesting contrast to the 'vertical' EDG approach
  • We will see if it works; it looks good so far
• Hardware, etc.
  • The CERN hardware situation is somewhat embarrassing
  • We have been bailed out by a US institute with 'spare' hardware
  • How do we leverage the UK LCG contribution for resources to attack this kind of problem? (A little goodwill goes a very long way)
• EDG situation
  • It has become very clear that many parts of the system do not scale (yet)
• Data challenge planning
  • We are solving the problem for one (two?) experiments
  • We have no idea what the scaling factor for other DCs in 03/04 is
  • New areas will become bottlenecks, so we need to know!