LHCONE – Linking Tier 1 & Tier 2 Sites: Background and Requirements Richard Hughes-Jones, DANTE (Delivery of Advanced Network Technology to Europe) LHCONE Planning Meeting, RENATER, Paris, 5 April 2011
Introduction: • Describe some of the changes in the computing model of the LHC experiments. • Demonstrate the importance and usage of the network. • Show the relation between LHCONE and LHCOPN. • Bring together and present the user requirements for future LHC physics analysis. • Provide the information to facilitate the presentations on the Architecture and the Implementation of LHCONE.
A Little History • Requirements paper from K. Bos (ATLAS) and I. Fisk (CMS) in autumn 2010. • The experiments had devised new compute and data models for LHC data evaluation, essentially assuming a high-speed network connecting the T2s worldwide. • Ideas & proposals were discussed at a workshop held at CERN in Jan 2011, which provided input from the networking community. • An "LHCONE Architecture" document was finalised in Lyon in Feb 2011. • There K. Bos proposed to start with a prototype based on the commonly agreed architecture. • K. Bos and I. Fisk produced a "Use Case" note with a list of sites for the prototype. • In Rome in late Feb 2011 some NRENs & DANTE formed ideas for the "LHCONE prototype planning" document.
LHC: Changing Data Models (1) • The LHC computing model based on MONARC has served well for more than 10 years. • ATLAS was strictly hierarchical; CMS less so. • The successful operation of the LHC accelerator & the start of data analysis brought a re-evaluation of the computing and data models. • Flatter hierarchy: any site might in the future pull data from any other site hosting it. [Diagram: LHCOPN hierarchy vs. flatter model – Artur Barczyk]
LHC: Changing Data Models (2) • Data caching: a bit like web caching. Analysis sites will pull datasets from other sites “on demand”, including from Tier 2s in other regions, then make them available for others (a minimal sketch of this pattern follows below). • Possible strategic pre-placement of data sets: datasets put close to the physicists studying that data and to suitable CPU power. Use of continental replicas. • Remote data access: jobs executing locally, using data cached at a remote site in quasi-real time. • Traffic patterns are changing – more direct inter-country data transfers.
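To make the on-demand caching idea concrete, here is a minimal sketch, assuming a hypothetical get_dataset() helper, cache directory and replica list; it is not part of any experiment framework, and a real deployment would use gridftp/xrootd transfers and a proper data catalogue rather than plain file copies.

```python
# Minimal sketch of "pull on demand, then serve locally" (hypothetical names).
import shutil
from pathlib import Path

CACHE_DIR = Path("/data/cache")  # hypothetical local cache area at the analysis site

def get_dataset(name: str, remote_replicas: list[Path]) -> Path:
    """Return a local path for `name`, pulling it over the WAN only on a cache miss."""
    local = CACHE_DIR / name
    if local.exists():                      # cache hit: later users are served locally
        return local
    for replica in remote_replicas:         # cache miss: pull from any site hosting it,
        if replica.exists():                # possibly a Tier 2 in another region
            CACHE_DIR.mkdir(parents=True, exist_ok=True)
            shutil.copy(replica, local)     # stand-in for a real wide-area transfer
            return local
    raise FileNotFoundError(name)
```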
ATLAS Data Transfers Between all Tier levels • Average: ~ 2.3 GB/s (daily average) • Peak: ~ 7 GB/s (daily average) • Data available on site within a few hours. • 70 Gbit/s on LHCOPN [Plot annotation: ATLAS reprocessing peak – Daniele Bonacorsi]
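As a quick unit check (assuming the usual 1 GB/s = 8 Gbit/s conversion; the numbers below simply restate the rates quoted above), the byte rates translate to link-level bit rates as follows:

```python
# Unit check only: convert the quoted byte rates to bit rates (1 GB/s = 8 Gbit/s).
average_GBps = 2.3   # daily-average transfer rate between all Tier levels
peak_GBps = 7.0      # peak daily average

print(f"average: {average_GBps * 8:.0f} Gbit/s")  # ~18 Gbit/s
print(f"peak:    {peak_GBps * 8:.0f} Gbit/s")     # ~56 Gbit/s, the same order as the
                                                  # ~70 Gbit/s observed on the LHCOPN
```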
Data Flow EU – US ATLAS Tier 2s • The example above is from US Tier 2 sites. • Exponential rise in April and May, after LHC start. • Changed data distribution model at the end of June – caching ESD and DESD. • Much slower rise since July, even as luminosity grows rapidly. Kors Bos
LHC: Evolving Traffic Patterns • One example of data coming from the US: 4 Gbit/s for ~ 1.5 days (11 Jan 2011), seen on the transatlantic link, the GÉANT backbone and the NREN access link. • Not an isolated case. • Often made up of many data flows. • Users are getting good at running gridftp.
Data Transfers over RENATER • Peak rates are a substantial fraction of 10 Gbit/s, often for hours. • Several LHC sites are involved. • Demand is variable, depending on user work. Francois-Xavier Andreu
Data Transfers over DFN Two different weeks from GÉANT to Aachen • Peak rates saturate one of the 10 Gigabit DFN–GÉANT links. • Demand is variable, depending on user work. Christian Grimm
Data Transfers from GARR - CNAF: T0-T1 + T1-T1 + T1-T2 • Peak rates 14-18 Gbit/s. • Traffic shows diurnal demand & is variable depending on user work. • Sustained growth over the last year. Marco Marletta
CMS Data Transfers: Data Placement for Physics Analysis • Once data is on the WLCG, it must be made accessible to analysis applications. • The largest fraction of analysis computing at the LHC is at the Tier 2s. • The new flexibility reduces latency for end users. [Plot annotation: T1-T2 traffic dominates; T2-T2 traffic emerges – Daniele Bonacorsi]
Data Transfer Performance: Site or Network? 1 Gbit bottleneck at the receiver • Test from NorthGrid to the GÉANT PoP in London. • UDP throughput from the SE: 990 Mbit/s, with 75% packet loss. • Data transmitted by the SE at 3.8 Gbit/s over four 1 Gigabit interfaces. • TCP transmits in bursts at 3.8 Gbit/s; packet loss & retries mean low throughput. • Even more data moved once the end-hosts were fixed. • Classic packet loss from a bottleneck (the arithmetic is sketched below).
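The loss figure follows directly from the rate mismatch. Below is a back-of-the-envelope sketch, assuming the burst simply overruns the bottleneck and that only the bottleneck's share of each burst gets through; the rates are the ones quoted above, not new measurements.

```python
# Why bursting at 3.8 Gbit/s into a ~1 Gbit/s bottleneck shows roughly 75% loss:
# during a burst, only the bottleneck's share of the offered traffic is delivered.
send_rate_gbps = 3.8      # SE transmitting over four 1 Gigabit interfaces
bottleneck_gbps = 0.99    # ~990 Mbit/s usable through the 1 Gbit receiver link

delivered_fraction = min(1.0, bottleneck_gbps / send_rate_gbps)
loss_fraction = 1.0 - delivered_fraction
print(f"expected loss during a burst: {loss_fraction:.0%}")  # ~74%, matching the
                                                             # observed 75% UDP loss
```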
LHCOPN linking Tier 0 to Tier 1s; LHCONE for Tier 1s and Tier 2s • LHCONE prototype in Europe. • T1s are connected, but the LHCOPN itself is not part of LHCONE. [Diagram: LHCONE linking T2s in a country with other regions]
Requirements for LHCONE • LHCOPN provides the infrastructure to move data T0-T1 and T1-T1. • New infrastructure is required to improve transfers T1-T2 & T2-T2: • Analysis is mainly done at the Tier 2s, so data is required from any T1 or any T2. T2-T2 is very important. • Work done at a Tier 2: simulations & physics analysis (roughly 50:50). • Network bandwidth needs of a T2 include (worked out in the sketch below): • Re-processing efforts: 400 TByte refresh in a week = 5 Gbit/s • Data bursts from user analysis: 25 TByte in a day = 2.5 Gbit/s • Feeding a 1000-core farm with LHC events: ~ 1 Gbit/s • Note this implies timely delivery of data, not just average rates! • Access link “available bandwidth” for Tier 2 sizes: Large 10 Gbit/s; Medium 5 Gbit/s; Small 1 Gbit/s.
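A minimal sketch reproducing the slide's bandwidth arithmetic, assuming decimal units (1 TByte = 10^12 bytes) and a full 7-day or 24-hour transfer window; the 1000-core farm figure depends on an event size and processing rate not given here, so it is left out rather than guessed.

```python
# Reproduce the Tier 2 bandwidth figures quoted above (decimal units assumed).
def sustained_gbps(terabytes: float, days: float) -> float:
    """Average rate needed to move `terabytes` of data within `days`."""
    bits = terabytes * 1e12 * 8
    seconds = days * 24 * 3600
    return bits / seconds / 1e9

print(f"400 TB refresh in a week : {sustained_gbps(400, 7):.1f} Gbit/s")  # ~5.3, quoted as 5
print(f"25 TB burst in a day     : {sustained_gbps(25, 1):.1f} Gbit/s")   # ~2.3, quoted as 2.5
```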
Requirements for LHCONE • Sites are free to choose the way they wish to connect. • Flexibility & extensibility are required: T2s change, and the analysis usage pattern is more chaotic – dynamic networks are of interest. • World-wide connectivity is required for LHC sites. • There is concern about LHC traffic swamping other disciplines. • Monitoring & fault-finding support should be built in. • A cost-effective solution is required – this may influence the architecture. • Sites must not become isolated. • No interruption of data-taking or physics analysis. • A prototype is needed.
Requirements: Fitting in with LHC 2011 Data Taking • Machine development & Technical Stops provide pauses in the data taking. • This does not mean there is plenty of time. • The LHCONE prototype might grow in phases.