
Proposal: LHC_07 R&D for ATLAS Grid computing

Proposal: LHC_07 R&D for ATLAS Grid computing. Tetsuro Mashimo, International Center for Elementary Particle Physics (ICEPP), The University of Tokyo, on behalf of the LHC_07 project team. 2013 Joint Workshop of FKPPL and FJPPL(TYL), June 4, 2013 @ Yonsei University, Seoul.


Presentation Transcript


  1. Proposal: LHC_07 R&D for ATLAS Grid computing. Tetsuro Mashimo, International Center for Elementary Particle Physics (ICEPP), The University of Tokyo, on behalf of the LHC_07 project team. 2013 Joint Workshop of FKPPL and FJPPL(TYL), June 4, 2013 @ Yonsei University, Seoul

  2. LHC_07 “R&D for ATLAS Grid computing” • Cooperation between French and Japanese teams on R&D for ATLAS distributed computing, to face the important challenges of the coming years (in preparation for the LHC 14 TeV runs from 2015 onwards) • Important challenges of the coming years: new computing model, hardware, software, and networking issues • Partners: the International Center for Elementary Particle Physics (ICEPP), The University of Tokyo (the WLCG Japanese Tier-2 center) and French Tier-2 centers

  3. LHC_07: members [table of project members; * marks the leader]

  4. LHC_07 “R&D for ATLAS Grid computing”: successor of the project LHC_02, “ATLAS computing” (years 2006~2012) • The LHC_02 project started as a collaboration between the IN2P3 computing center in Lyon, France (CC-IN2P3, Tier-1 center) and the ICEPP Tier-2 center (associated with the Lyon Tier-1 in the ATLAS “cloud” computing model) • LHC_02: various R&D studies, in particular on how to efficiently exploit the available bandwidth of the long-distance international network connection

  5. Network between Lyon and Tokyo: 10 Gb/s, RTT = 300 ms, via RENATER, GEANT and SINET through New York [diagram: the Lyon-Tokyo path, with the other sites connected to Lyon also shown: ASGC (Taiwan), BNL (USA, Long Island), TRIUMF (Canada, Vancouver)]. Exploiting the bandwidth is not a trivial thing: packet loss at various places, directional asymmetry in transfer performance, performance changes over time, …
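
As an aside on why the quoted figures make this hard: at 10 Gb/s with a 300 ms round-trip time the bandwidth-delay product is very large, so a single TCP stream with a modest window cannot fill the pipe. A minimal back-of-the-envelope sketch in Python, using only the numbers from the slide (the 4 MiB per-stream window is an assumption for illustration):

```python
# Back-of-the-envelope: TCP window needed to fill a 10 Gb/s, 300 ms RTT path.
# The bandwidth and RTT are the figures quoted on the slide; the 4 MiB
# per-stream window is an assumption used only to illustrate why parallel
# streams (as used by gridftp) are needed on this path.
bandwidth_bps = 10e9     # 10 Gb/s Lyon-Tokyo link
rtt_s = 0.300            # 300 ms round-trip time

bdp_bytes = bandwidth_bps * rtt_s / 8
print(f"bandwidth-delay product: {bdp_bytes / 2**20:.0f} MiB")      # ~358 MiB

window_bytes = 4 * 2**20            # hypothetical 4 MiB TCP window per stream
print(f"streams needed to fill the pipe: {bdp_bytes / window_bytes:.0f}")  # ~89
```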

  6. Difficulties in finding the source of network problems: the path crosses several domains with different responsibilities (site security policies, the computing models of projects and VOs, and the support areas of the network providers) [plots: iperf throughput (Mbps) Lyon→Tokyo and Tokyo→Lyon, and gridftp transfer rates (MB/sec) Lyon→Tokyo for large, medium and small files, measured on May 28, 2011]
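
The directional iperf comparison shown on this slide can be scripted; below is a rough sketch assuming iperf3 is installed locally, an iperf3 server is already running on the remote host, and the JSON output of the installed version exposes end.sum_received.bits_per_second (worth double-checking). The host name is a placeholder, not one of the endpoints from the slide.

```python
# Rough sketch of a directional throughput check of the kind shown on the
# slide, assuming iperf3 is installed locally and an iperf3 server is already
# running on the remote host.  The JSON field names match recent iperf3
# releases but should be checked against the installed version.
import json
import subprocess

def iperf3_mbps(server: str, seconds: int = 10, reverse: bool = False) -> float:
    """Return the received throughput in Mb/s; reverse=True measures the
    server -> local direction, exposing any directional asymmetry."""
    cmd = ["iperf3", "-c", server, "-t", str(seconds), "--json"]
    if reverse:
        cmd.append("--reverse")
    result = json.loads(subprocess.run(cmd, capture_output=True, text=True,
                                       check=True).stdout)
    return result["end"]["sum_received"]["bits_per_second"] / 1e6

if __name__ == "__main__":
    host = "iperf.example.org"   # placeholder, not an endpoint from the slide
    print("outbound:", iperf3_mbps(host), "Mb/s")
    print("inbound: ", iperf3_mbps(host, reverse=True), "Mb/s")
```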

  7. Case with CC-IN2P3: a misconfiguration of the LBE (less-than-best-effort) packet marking related to QoS caused packet loss; the loss was cleared on Nov. 17, 2011, and throughput further improved after a LAN reconfiguration in CC-IN2P3 on Jan. 21, 2012 [plots: Lyon→Tokyo and Tokyo→Lyon throughput around the two dates]

  8. LHC_07 “R&D for ATLAS Grid computing” • The LHC_02 project was successful • Continue the collaboration to face the important challenges of the coming years: the new ATLAS computing model, hardware, software, and networking issues

  9. ATLAS Computing Model - Tiers

  10. Implementation of the ATLAS computing model: tiers and clouds • Hierarchical tier organization based on the MONARC network topology • Sites are grouped into clouds for organizational reasons • Possible communications: T0-T1 and T1-T1 over the Optical Private Network; intra-cloud T1-T2 over national networks • Restricted communications (general public network): inter-cloud T1-T2 and inter-cloud T2-T2
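
To make the strict model concrete, here is a toy Python sketch of the tier/cloud layout and of which transfers it permits; the site and cloud names are illustrative and the rule set is a simplification of the bullet points above:

```python
# Toy model of the strict MONARC-style tier/cloud topology described above.
# Site and cloud names are illustrative, not the real ATLAS configuration,
# and the rule set is a simplification of the bullet points on the slide.
from dataclasses import dataclass

@dataclass(frozen=True)
class Site:
    name: str
    tier: int    # 0, 1 or 2
    cloud: str   # e.g. "FR", "US"

def allowed_strict(src: Site, dst: Site) -> bool:
    """Transfers permitted in the strict model: T0-T1 and T1-T1 over the
    Optical Private Network, everything else only within the same cloud
    (so inter-cloud T1-T2 and T2-T2 traffic is restricted)."""
    if {src.tier, dst.tier} <= {0, 1}:
        return True
    return src.cloud == dst.cloud

cern  = Site("CERN", 0, "CERN")
lyon  = Site("CC-IN2P3", 1, "FR")
tokyo = Site("ICEPP-Tokyo", 2, "FR")   # associated with the Lyon T1 (FR cloud)
bnl   = Site("BNL", 1, "US")

print(allowed_strict(lyon, tokyo))   # True:  intra-cloud T1-T2
print(allowed_strict(bnl, tokyo))    # False: inter-cloud T1-T2, restricted
print(allowed_strict(cern, bnl))     # True:  T0-T1 over the OPN
```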

  11. Detector Data Distribution • O(2 to 4 GB) files (with exceptions) • RAW and reconstructed data are generated at CERN (Tier-0) and dispatched to T1s • Reconstructed data are further replicated downstream to the T2s of the SAME cloud [diagram: Tier-0 → Tier-1s → Tier-2s]

  12. Data distribution after reprocessing and Monte Carlo reconstruction • O(2 to 4 GB) files (with exceptions) • RAW data are re-processed at T1s to produce a new version of the derived data • Derived data are replicated to the T2s of the same cloud • Derived data are also replicated to a few other T1s (or CERN) • And, from there, to other T2s of the same cloud [diagram: Tier-0, Tier-1s and Tier-2s with the replication flows]
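
A toy version of these replication rules, computing where a derived dataset produced in one cloud ends up; the cloud layout and the number of "other T1s" are invented for illustration:

```python
# Toy version of the derived-data replication just described: replicas go to
# the T2s of the producing cloud, to a few other T1s, and from there to the
# T2s of those clouds.  The cloud layout and the number of extra T1 copies
# are invented for illustration.
from typing import Dict, List

clouds: Dict[str, Dict[str, List[str]]] = {
    "FR": {"t1": ["CC-IN2P3"], "t2": ["ICEPP-Tokyo", "GRIF"]},
    "US": {"t1": ["BNL"],      "t2": ["AGLT2"]},
    "DE": {"t1": ["FZK"],      "t2": ["DESY-HH"]},
}

def replication_targets(producing_cloud: str, n_extra_t1: int = 1) -> List[str]:
    targets = list(clouds[producing_cloud]["t2"])        # T2s of the same cloud
    for c in [c for c in clouds if c != producing_cloud][:n_extra_t1]:
        targets += clouds[c]["t1"]                       # a few other T1s ...
        targets += clouds[c]["t2"]                       # ... then their T2s
    return targets

print(replication_targets("FR"))
# ['ICEPP-Tokyo', 'GRIF', 'BNL', 'AGLT2']
```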

  13. Monte Carlo production • Simulation (and some reconstruction) runs at T2s • Input data hosted at T1s are transferred to (and cached at) T2s • Output data are copied and stored back at T1s • For reconstruction, derived data are replicated to a few other T1s (or CERN) and, from there, to other T2s of the same cloud [diagram: input flows from Tier-1s to Tier-2s, output flows back to Tier-1s]

  14. Analysis • The paradigm is “jobs go to data”, i.e. • Jobs are brokered to sites where the data have been pre-placed • Jobs access data only from the local storage of the site where they run • Jobs store their output in the storage of the site where they run • No WAN involved (by Simone Campana, ATLAS TIM, Tokyo, May 2013)
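
A minimal sketch of this "jobs go to data" brokering, with a made-up replica catalogue, dataset names and free-slot counts (the real ATLAS brokerage in PanDA is of course far more elaborate):

```python
# Minimal sketch of "jobs go to data": a job is brokered only to sites whose
# local storage already hosts its input dataset, so no WAN access is needed
# at run time.  The replica catalogue, dataset and site names, and free-slot
# counts are all made up for illustration.
from typing import Dict, List, Optional

replica_catalogue: Dict[str, List[str]] = {
    "data13.example.AOD": ["CC-IN2P3", "ICEPP-Tokyo"],
    "mc13.example.HITS":  ["BNL"],
}

site_free_slots: Dict[str, int] = {"CC-IN2P3": 120, "ICEPP-Tokyo": 300, "BNL": 0}

def broker(input_dataset: str) -> Optional[str]:
    """Pick the site with the most free slots among those holding the input."""
    candidates = [s for s in replica_catalogue.get(input_dataset, [])
                  if site_free_slots.get(s, 0) > 0]
    if not candidates:
        return None   # in the real system the data would first be replicated
    return max(candidates, key=site_free_slots.get)

print(broker("data13.example.AOD"))   # 'ICEPP-Tokyo' (most free slots)
print(broker("mc13.example.HITS"))    # None: the only replica is at a busy site
```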

  15. Issues - I • You need data at some T2 (normally “your” T2) • The inputs are at some other T2 in a different cloud • Examples: outputs of analysis jobs, replication of particular samples on demand • According to the model, the transfer has to be routed through the T1s of the two clouds [diagram: T2 → T1 → T1 → T2] (by Simone Campana, ATLAS TIM, Tokyo, May 2013)

  16. Issues - II • You need to process data available only at a given T1 • All sites of that cloud are very busy • You assign jobs to some T2 of a different cloud • According to the model, the input and the output have to be routed through the T1 of the T2's own cloud [diagram: INPUT and OUTPUT paths between the Tier-1s and a Tier-2 of another cloud] (by Simone Campana, ATLAS TIM, Tokyo, May 2013)

  17. Evolution of the ATLAS computing model • ATLAS decided to relax the MONARC model • Allow T1-T2 and T2-T2 traffic between different clouds (thanks to the growth of network bandwidth) • Any site can exchange data with any site if the system believes it is convenient • So far ATLAS asked (large) T2s: to be well connected to their T1, and to be well connected to the T2s of their cloud • Now ATLAS is asking large T2s: to be well connected to all T1s, and to foresee non-negligible traffic from/to other (large) T2s
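
A sketch of what the relaxed policy amounts to in code: cloud membership no longer decides, only whether the system considers the link between the two sites good enough (the single-throughput criterion and the threshold are invented):

```python
# Sketch of the relaxed policy: cloud membership no longer decides whether two
# sites may exchange data; the system uses an observed measure of link quality
# instead.  The single-throughput criterion and the threshold are invented
# purely for illustration.
def allowed_relaxed(measured_mbps: float, min_mbps: float = 100.0) -> bool:
    """Any site pair is acceptable if the measured link between them is good
    enough; no reference to clouds at all."""
    return measured_mbps >= min_mbps

print(allowed_relaxed(800.0))   # True: a well-connected inter-cloud pair is fine
print(allowed_relaxed(20.0))    # False: a weak link is avoided, same cloud or not
```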

  18. Evolution of the model: multi-cloud Monte Carlo production and analysis output [diagram: a Tier-2 exchanges data directly with the Tier-1s of several clouds] (by Simone Campana, ATLAS TIM, Tokyo, May 2013)

  19. LHC_07: R&D for ATLAS Grid computing • Networking therefore remains a very important issue • Other topics addressed by the collaboration: • Use of virtual machines for operating WLCG services • Improvement of the reliability of the storage middleware • Performance of data access from analysis jobs through various protocols • Investigation of federated Xrootd storage • Optimization and monitoring of data transfer between remote sites
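
As a flavour of the last item, a toy transfer monitor: time a copy and record the achieved rate. A local shutil copy stands in for the real remote transfer (gridftp, Xrootd, ...), and the file paths and sample size are placeholders:

```python
# Toy transfer monitor: time a copy and log the achieved MB/s.  A local
# shutil.copyfile stands in for the real remote transfer (gridftp, Xrootd, ...);
# the file paths and the 100 MB sample size are placeholders.
import os
import shutil
import time

def timed_copy(src: str, dst: str) -> float:
    """Copy src to dst and return the achieved rate in MB/s."""
    size_mb = os.path.getsize(src) / 1e6
    start = time.monotonic()
    shutil.copyfile(src, dst)
    elapsed = time.monotonic() - start
    return size_mb / elapsed if elapsed > 0 else float("inf")

if __name__ == "__main__":
    src, dst = "/tmp/sample_input.dat", "/tmp/sample_output.dat"
    with open(src, "wb") as f:            # throw-away 100 MB sample file
        f.write(os.urandom(100 * 1024 * 1024))
    print(f"achieved {timed_copy(src, dst):.1f} MB/s")
```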

  20. WAN for TOKYO [diagram: SINET and international links connecting TOKYO and OSAKA to ASGC, TRIUMF, BNL, NDGF, SARA/NIKHEF, RAL, CC-IN2P3, CERN, CNAF and PIC via LA, New York, WIX, Amsterdam and Geneva, over Pacific and Atlantic routes with 10, 20 and 40 Gbps links, including a 10 Gbps dedicated line] • Additional new line (10 Gbps) since the end of March 2013 • LHCONE: new dedicated (virtual) network for Tier-2 centers, etc. • The “perfSONAR” tool has been put in place for network monitoring

  21. Budget plan in the year 2013

  22. Cost of the project • The project uses the existing computing facilities at the Tier-1 and Tier-2 centers in France and Japan and the existing network infrastructure provided by the NRENs, GEANT, etc. No hardware cost is therefore requested in this project. • For communication between the members, e-mails and video conferences are mainly used, but face-to-face meetings (a small workshop) are necessary, usually once per year; hence the request for travel and accommodation costs.
