Proposal: LHC_07 "R&D for ATLAS Grid computing"
Tetsuro Mashimo
International Center for Elementary Particle Physics (ICEPP), The University of Tokyo
on behalf of the LHC_07 project team
2013 Joint Workshop of FKPPL and FJPPL(TYL)
June 4, 2013 @ Yonsei University, Seoul
LHC_07 "R&D for ATLAS Grid computing"
• Cooperation between French and Japanese teams on R&D for ATLAS distributed computing, to face the important challenges of the coming years (in preparation for the LHC 14 TeV runs starting in 2015)
• Challenges ahead: a new computing model, plus hardware, software, and networking issues
• Participants: the International Center for Elementary Particle Physics (ICEPP), The University of Tokyo (the WLCG Japanese Tier-2 center) and French Tier-2 centers
LHC_07: members — [table of project members; * marks the project leader]
LHC_07 "R&D for ATLAS Grid computing"
Successor of the project LHC_02 "ATLAS computing" (2006–2012)
• The LHC_02 project started as a collaboration between the IN2P3 computing center in Lyon, France (a Tier-1 center) and the ICEPP Tier-2 center (associated with the Lyon Tier-1 in the ATLAS "cloud" computing model)
• LHC_02 carried out various R&D studies, in particular on how to efficiently exploit the available bandwidth of the long-distance international network connection
Network between Lyon and Tokyo
[Diagram: 10 Gb/s path Tokyo–New York–Lyon over SINET, GEANT, and RENATER, RTT = 300 ms; Tokyo also connects to ASGC (Taiwan), BNL (USA, Long Island), and TRIUMF (Canada, Vancouver)]
• Exploiting the bandwidth is not trivial: packet loss at various places, directional asymmetry in transfer performance, performance changes over time, …
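To see the scale of the problem, the bandwidth-delay product of this path can be worked out directly. The figures below are taken from the slide above; the arithmetic is a minimal sketch, not a measurement:

```python
# Bandwidth-delay product for the Lyon-Tokyo link: a single TCP stream can
# sustain at most window_size / RTT, so saturating the path needs a window
# as large as the BDP (or many parallel streams).

bandwidth_bps = 10e9   # 10 Gb/s link
rtt_s = 0.300          # 300 ms round-trip time

bdp_bytes = bandwidth_bps / 8 * rtt_s
print(f"BDP = {bdp_bytes / 1e6:.0f} MB")   # -> BDP = 375 MB

# Conversely, the throughput achievable with a given TCP window:
window_bytes = 4 * 2**20            # e.g. a 4 MiB window
throughput_bps = window_bytes * 8 / rtt_s
print(f"4 MiB window -> {throughput_bps / 1e6:.0f} Mb/s")   # ~112 Mb/s
```

A single stream therefore needs a very large window, or the transfer must use many parallel streams, to fill the pipe; this is why tuning is non-trivial at RTT = 300 ms.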
Difficulties in finding the source of network problems
[Plots: iperf throughput, May 28, 2011, Lyon→Tokyo vs Tokyo→Lyon (Mbps); gridftp transfer rates, May 28, 2011, Lyon→Tokyo, for large/medium/small files (MB/s)]
• Many parties are involved: sites (security policies), projects/VOs (computing model), network providers (support areas)
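For illustration, a minimal sketch of the kind of pairwise probe behind such plots, assuming iperf3 with a server ("iperf3 -s") running at the remote endpoint; the host name is a placeholder:

```python
# Minimal sketch of a pairwise throughput probe in the spirit of the plots
# above. The host name is a placeholder, not a real endpoint.
import json
import subprocess

def measure_mbps(server: str, reverse: bool = False, seconds: int = 10) -> float:
    """Run one iperf3 test and return the achieved rate in Mb/s."""
    cmd = ["iperf3", "-c", server, "-t", str(seconds), "-J"]  # -J: JSON output
    if reverse:
        cmd.append("-R")  # server sends, client receives
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    report = json.loads(out)
    return report["end"]["sum_sent"]["bits_per_second"] / 1e6

# Probe both directions, since directional asymmetry was one of the problems:
server = "iperf.lyon.example.org"   # hypothetical endpoint
print("to Lyon:  ", f"{measure_mbps(server):.0f} Mb/s")
print("from Lyon:", f"{measure_mbps(server, reverse=True):.0f} Mb/s")
```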
Case with CC-IN2P3
• A misconfiguration of LBE (less-than-best-effort) packets related to QoS caused packet loss; cleared on Nov. 17, 2011
[Plots: Lyon→Tokyo and Tokyo→Lyon throughput around Nov. 17, 2011]
• Further improved by a LAN reconfiguration in CC-IN2P3, Jan. 21, 2012
LHC_07 "R&D for ATLAS Grid computing"
• The LHC_02 project was successful
• Continue the collaboration to face the important challenges of the coming years: the new ATLAS computing model, plus hardware, software, and networking issues
Implementation of the ATLAS computing model: tiers and clouds
• Hierarchical tier organization based on the MONARC network topology
• Sites are grouped into clouds for organizational reasons
• Possible communications:
  • Optical Private Network: T0-T1, T1-T1
  • National networks: intra-cloud T1-T2
• Restricted communications (general public network):
  • Inter-cloud T1-T2
  • Inter-cloud T2-T2
These rules are sketched as a simple predicate below.
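As an illustration (not ATLAS code), the strict rules above can be captured in a few lines; the site list is a toy topology, not the real one:

```python
# Toy model of the strict MONARC-style transfer rules. Site and cloud
# names are illustrative, not the real ATLAS topology.

SITES = {
    "CERN":  {"tier": 0, "cloud": None},
    "LYON":  {"tier": 1, "cloud": "FR"},
    "BNL":   {"tier": 1, "cloud": "US"},
    "TOKYO": {"tier": 2, "cloud": "FR"},   # ICEPP belongs to the Lyon cloud
    "GRIF":  {"tier": 2, "cloud": "FR"},
}

def allowed(src: str, dst: str) -> bool:
    """True if the strict model allows a direct src -> dst transfer."""
    s, d = SITES[src], SITES[dst]
    tiers = {s["tier"], d["tier"]}
    if tiers <= {0, 1}:                          # T0-T1 and T1-T1 over the OPN
        return True
    if tiers == {1, 2} and s["cloud"] == d["cloud"]:
        return True                              # intra-cloud T1-T2 only
    return False                                 # inter-cloud T1-T2, any T2-T2

print(allowed("LYON", "TOKYO"))   # True  (intra-cloud T1-T2)
print(allowed("BNL", "TOKYO"))    # False (inter-cloud T1-T2)
```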
Detector Data Distribution
• O(2–4 GB) files (with exceptions)
• RAW and reconstructed data are generated at CERN (Tier-0) and dispatched to T1s
• Reconstructed data are further replicated downstream to T2s of the SAME cloud
[Diagram: Tier-0 → Tier-1s → Tier-2s]
Data distribution after Reprocessing and Monte Carlo Reconstruction
• O(2–4 GB) files (with exceptions)
• RAW data are re-processed at T1s to produce a new version of the derived data
• Derived data are replicated to T2s of the same cloud
• Derived data are also replicated to a few other T1s (or CERN)
• And, from there, to the T2s of the same (receiving) cloud
[Diagram: Tier-0, Tier-1s, Tier-2s]
Monte Carlo production
• Simulation (and some reconstruction) runs at T2s
• Input data hosted at T1s are transferred to (and cached at) T2s
• Output data are copied and stored back at T1s
• For reconstruction, derived data are replicated to a few other T1s (or CERN)
• And, from there, to the T2s of the same (receiving) cloud
[Diagram: input from Tier-1s to Tier-2s, output back to Tier-1s]
A toy sketch of this replication fan-out follows below.
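The replication pattern of the last two slides can be sketched as follows; the topology is again a toy one, and `extra_clouds` stands in for ATLAS's actual choice of "a few other T1s":

```python
# Sketch of the fan-out just described: derived data produced in one cloud
# go to the T2s of that cloud, to a few other T1s, and from there to the
# T2s of those clouds. CLOUDS is a toy topology, not the real one.

CLOUDS = {                      # cloud -> (its T1, its T2s)
    "FR": ("LYON", ["TOKYO", "GRIF"]),
    "US": ("BNL",  ["SLAC"]),
}

def replica_destinations(producing_cloud: str, extra_clouds: list[str]) -> set[str]:
    dests = set(CLOUDS[producing_cloud][1])   # T2s of the producing cloud
    for c in extra_clouds:                    # "a few other T1s" ...
        t1, t2s = CLOUDS[c]
        dests.add(t1)
        dests.update(t2s)                     # ... and, from there, their T2s
    return dests

print(replica_destinations("FR", extra_clouds=["US"]))
# -> {'TOKYO', 'GRIF', 'BNL', 'SLAC'}
```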
Analysis
• The paradigm is "jobs go to data", i.e.
  • Jobs are brokered to sites where data have been pre-placed
  • Jobs access data only from the local storage of the site where they run
  • Jobs store their output in the storage of the site where they run
• No WAN involved (a toy broker is sketched below)
(by Simone Campana, ATLAS TIM, Tokyo, May 2013)
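A toy broker illustrating the paradigm; the catalog contents, queue lengths, and dataset names are made up:

```python
# "Jobs go to data": a toy broker that only considers sites already holding
# a replica of the input dataset. All entries here are illustrative.

REPLICA_CATALOG = {
    "data12_8TeV.NTUP": ["TOKYO", "GRIF"],
    "mc12_8TeV.AOD":    ["LYON"],
}
QUEUE_LENGTH = {"TOKYO": 12, "GRIF": 40, "LYON": 3}   # pending jobs per site

def broker(dataset: str) -> str:
    """Pick the least-loaded site among those hosting the input locally."""
    candidates = REPLICA_CATALOG[dataset]   # no WAN access: local data only
    return min(candidates, key=QUEUE_LENGTH.get)

print(broker("data12_8TeV.NTUP"))   # -> TOKYO (12 pending jobs vs 40 at GRIF)
```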
Issues - I
• You need data at some T2 (normally "your" T2), but the inputs are at some other T2 in a different cloud
• Examples:
  • Outputs of analysis jobs
  • Replication of particular samples on demand
• According to the model, the data must then be routed through the T1s of both clouds (T2 → T1 → T1 → T2)
(by Simone Campana, ATLAS TIM, Tokyo, May 2013)
Issues - II
• You need to process data available only at a given T1, but all sites of that cloud are very busy
• So you assign jobs to some T2 of a different cloud
• According to the model, the input must then be shipped from the T1 to the other cloud's T2, and the output back again, crossing cloud boundaries
(by Simone Campana, ATLAS TIM, Tokyo, May 2013)
Evolution of the ATLAS computing model
• ATLAS decided to relax the MONARC model
  • Allow T1-T2 and T2-T2 traffic between different clouds (enabled by the growth of network bandwidth)
  • Any site can exchange data with any site if the system believes it is convenient
• So far ATLAS has asked (large) T2s:
  • To be well connected to their T1
  • To be well connected to the T2s of their cloud
• Now ATLAS is asking large T2s:
  • To be well connected to all T1s
  • To foresee non-negligible traffic from/to other (large) T2s
A sketch of the relaxed rules follows below.
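Continuing the earlier toy predicate (`allowed` and `SITES` from the sketch above), the relaxation might look like this; the "well-connected" set is illustrative only:

```python
# Sketch of the relaxed model layered on the strict predicate defined in
# the earlier sketch. WELL_CONNECTED is an illustrative set of large,
# well-networked sites, not a real ATLAS list.

WELL_CONNECTED = {"LYON", "BNL", "TOKYO", "GRIF"}

def allowed_relaxed(src: str, dst: str) -> bool:
    if allowed(src, dst):            # everything the strict model permits
        return True
    # inter-cloud T1-T2 / T2-T2 now permitted between well-connected sites
    return src in WELL_CONNECTED and dst in WELL_CONNECTED

print(allowed_relaxed("BNL", "TOKYO"))   # True: inter-cloud, both well connected
```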
Evolution of the model
[Diagram: multi-cloud Monte Carlo production and analysis output flowing directly between Tier-1s and Tier-2s of different clouds]
(by Simone Campana, ATLAS TIM, Tokyo, May 2013)
LHC_07: R&D for ATLAS Grid computing
• Networking therefore remains a very important issue
• Other topics addressed by the collaboration:
  • Use of virtual machines for operating WLCG services
  • Improvement of the reliability of the storage middleware
  • Performance of data access from analysis jobs through various protocols (see the sketch below)
  • Investigation of federated Xrootd storage
  • Optimization and monitoring of data transfers between remote sites
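For the data-access studies, a minimal timing harness around the standard `xrdcp` client (part of the XRootD client tools) might look like the following; the redirector URL and file path are placeholders, and a real comparison would repeat this across protocols, sites, and file sizes:

```python
# Hedged sketch: time one remote read through an Xrootd door. The URL is a
# placeholder; it assumes the XRootD client tools (xrdcp) are installed.
import subprocess
import time

def time_xrdcp(url: str) -> float:
    """Copy a remote file to /dev/null via xrdcp and return elapsed seconds."""
    t0 = time.monotonic()
    subprocess.run(["xrdcp", "--force", url, "/dev/null"], check=True)
    return time.monotonic() - t0

url = "root://redirector.example.org//atlas/some/dataset/file.root"
print(f"read took {time_xrdcp(url):.1f} s")
```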
WAN for TOKYO
[Diagram: SINET links from TOKYO and OSAKA across the Pacific (10/40 Gbps via LA and WIX) to BNL and TRIUMF, and across the Atlantic to Amsterdam (SARA/NIKHEF, NDGF, RAL) and Geneva (CERN, CC-IN2P3, CNAF, PIC); ASGC reached over the Pacific; dedicated 10 Gbps lines marked]
• An additional new 10 Gbps line since the end of March 2013 (20 Gbps in total)
• LHCONE: a new dedicated (virtual) network for Tier-2 centers, etc.
• The "perfSONAR" tool has been put in place for network monitoring
Cost of the project
• The project uses the existing computing facilities at the Tier-1 and Tier-2 centers in France and Japan, and the existing network infrastructure provided by the NRENs, GEANT, etc. No hardware cost is therefore needed in this project.
• Communication between the members is mainly by e-mail and TV conferences, but face-to-face meetings (a small workshop) are needed about once per year, hence the cost for travel and accommodation.