AGLT2 and MWT2 Networking Requirements
Rob Gardner
Computation and Enrico Fermi Institutes, University of Chicago
AGLT2-MWT2 Networking Meeting, June 3, 2013
Scale of resources
• MWT2 capacity: 6392 compute cores, 3614 TB
• AGLT2 capacity: 4756 job slots, 3510 TB
• More compute nodes will be added this summer
• Over the next 3 years, more CPU and storage will be added according to ATLAS physics needs
• Resources to upgrade connectivity to 100 Gbps have been approved
MWT2, from the 5-year proposal
• Capacities are adjusted annually to match requirements
• CPU capacity in SPECint 2006
• Job slots in 2015 > 10K (original plan; expect to reach this in 2014)
• BW/job slot ~ 10-500 Mbps depending on workload
• Difficult to model the resulting network load
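Why the load is hard to model follows directly from the spread of the per-slot figure. A back-of-the-envelope sketch in Python, using only the slot count and bandwidth range quoted above:

```python
# Aggregate load implied by the slide's figures: 10K job slots,
# each consuming 10-500 Mbps depending on workload.
job_slots = 10_000

for bw_mbps in (10, 500):
    aggregate_gbps = job_slots * bw_mbps / 1000  # Mbps -> Gbps
    print(f"{bw_mbps:>3} Mbps/slot -> {aggregate_gbps:,.0f} Gbps aggregate")

# 10 Mbps/slot -> 100 Gbps aggregate
# 500 Mbps/slot -> 5,000 Gbps aggregate
```

The aggregate spans nearly two orders of magnitude depending on the workload mix, which is exactly why the resulting network load is difficult to model.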
Evolving computing model & federation
• Over the past few years the rigid T0/T1/T2/T3 hierarchy has been changing into a flatter, mesh-like infrastructure
• Made possible by faster, more reliable, and more affordable networks, and by new virtual peering projects such as LHCONE
• Offers the opportunity to create new ways of accessing data from production and analysis jobs
• E.g., remove the restriction that CPU and data be located at the same site
• Tools to allow jobs to flexibly reach datasets beyond the local site storage are being developed
• A file is searched for locally using its unique global name
• If not found at the site, the search is expanded to the region using a network of redirectors
• The file may be copied to local storage, or read directly over the WAN
• Network latency requires intelligent caching by the client if the file is read directly
[Diagram: control and data paths through the redirector network]
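A minimal sketch of this lookup cascade in Python, driving the XRootD command-line client; the endpoint hostnames and the example global name are hypothetical placeholders, and real federations chain more redirector levels than shown:

```python
import subprocess

# Hypothetical endpoints: the site-local XRootD door is tried first,
# then the search widens to a regional redirector.
ENDPOINTS = [
    "root://xrootd.local-site.example",   # local site storage
    "root://redirector.region.example",   # regional redirector network
]

def fetch(global_name: str, dest: str) -> str:
    """Resolve a file by its unique global name, widening the search
    from the local site to the region; copies it to local storage."""
    for endpoint in ENDPOINTS:
        url = f"{endpoint}/{global_name}"
        # xrdcp stages the file locally; a job could instead open the
        # URL directly and read over the WAN, relying on client-side
        # caching to hide the network latency.
        if subprocess.run(["xrdcp", url, dest]).returncode == 0:
            return url  # report which endpoint served the file
    raise FileNotFoundError(global_name)
```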
Federation traffic
• Modest levels now; will grow when in production
[Plot: federation transfer rates over time, peaking near 700 MB/s; peak attributed to early Tier 3 users plus measurement jobs]
Comparing local to wide area performance
[Plots: ping time (ms) and read time (s) for local vs. wide area access]
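A sketch of how the read-time half of such a comparison could be gathered, assuming hypothetical local and remote mount points for the same test file (latency would be measured separately with ping):

```python
import time

# Hypothetical paths to the same test file: one on site-local storage,
# one reached over the wide area (e.g. a federated remote site).
PATHS = {
    "local": "/data/local/sample.root",
    "wide area": "/mnt/remote-site/sample.root",
}

def read_time(path: str, chunk: int = 1 << 20) -> float:
    """Time a full sequential read of the file in 1 MiB chunks."""
    start = time.monotonic()
    with open(path, "rb") as f:
        while f.read(chunk):
            pass
    return time.monotonic() - start

for label, path in PATHS.items():
    print(f"{label}: {read_time(path):.2f} s")
```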
Types of ATLAS network traffic (I)
• Intra-Tier2
  • Both AGLT2 and MWT2 are multi-site federations
  • Datasets are resident at each site
  • Jobs at one site can read data at another
  • Three transfer modes have been used (sketched below):
    • Direct read access (the file is opened over the network)
    • The file is copied from the remote site to the worker node's local scratch disk
    • A file read triggers a pool-to-pool replication to a local cache
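The three modes differ in where the data lands and when the network is touched. A schematic sketch in Python; the helper functions stand in for site storage tooling (an XRootD client, a pool manager) and are hypothetical, not real ATLAS APIs:

```python
from enum import Enum, auto
import os, shutil

class TransferMode(Enum):
    DIRECT_READ = auto()        # file opened in place over the network
    COPY_TO_SCRATCH = auto()    # staged to worker-node scratch, then read locally
    CACHE_REPLICATION = auto()  # read triggers pool-to-pool copy to a local cache

# Hypothetical stand-ins for real site storage tooling, stubbed so the
# control flow of the three modes is the focus of the sketch.
def open_over_network(url):
    raise NotImplementedError("would open the remote URL directly")

def replicate_to_local_pool(url):
    raise NotImplementedError("would trigger pool-to-pool replication")

def open_input(url: str, mode: TransferMode, scratch: str = "/tmp"):
    if mode is TransferMode.DIRECT_READ:
        # Every read crosses the network: no staging cost, latency-sensitive.
        return open_over_network(url)
    if mode is TransferMode.COPY_TO_SCRATCH:
        # One bulk transfer up front, then purely local reads.
        local = os.path.join(scratch, os.path.basename(url))
        shutil.copy(url, local)  # placeholder for the real copy tool
        return open(local, "rb")
    if mode is TransferMode.CACHE_REPLICATION:
        # First access populates a local cache; subsequent jobs read locally.
        return open(replicate_to_local_pool(url), "rb")
```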
Types of ATLAS network traffic (II)
• Tier1-Tier2
  • Replication of input datasets
  • Output of production datasets
• Tier2-Tier2
  • Driven by managed production tasks
• (Tier3, OSG, Cloud, HPC)-Tier2
  • Expect this mode to grow as jobs draw on other resources such as Campus Grids, OSG, Cloud, and HPC centers
AGLT2-MWT2
• Creation of a low-latency multi-federation
• 20k jobs reading 15000 TB
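For scale, the input volume per job implied by these figures (straight arithmetic on the slide's numbers; job durations are not given, so the read times at the earlier 10-500 Mbps per-slot range are illustrative only):

```python
# Average input per job implied by the slide: 20k jobs reading 15000 TB.
jobs = 20_000
total_tb = 15_000

gb_per_job = total_tb * 1000 / jobs  # TB -> GB
print(f"~{gb_per_job:.0f} GB of input per job on average")  # ~750 GB

# Time to read that volume at the 10-500 Mbps per-slot range quoted earlier:
for mbps in (10, 500):
    hours = gb_per_job * 8_000 / mbps / 3_600  # GB -> megabits, seconds -> hours
    print(f"  at {mbps:>3} Mbps: ~{hours:.1f} h")
# at  10 Mbps: ~166.7 h;  at 500 Mbps: ~3.3 h
```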