LHC Network Issues

Presentation Transcript


  1. LHC Network Issues. Joint Tech Meeting, February 14, 2007. Ian Fisk

  2. Large Hadron Collider
  • The LHC is in the final year of construction at CERN in Geneva, Switzerland

  3. LHC Schedule
  • LHC will have a pilot run in December of 2007
    • Beam energy will be only the injector energy, 900 GeV (~1/15th of the design energy)
    • Machine luminosity will be low, but data volumes may be high for the run as the detectors try to take as much calibration data as possible
    • There is no discovery potential in the pilot run, but the data will be critical for calibration and preparations
  • High-energy running begins in the early summer of 2008
    • 14 TeV collisions for the remainder of the year
    • Enormous commissioning work in the presence of the new energy frontier
    • Luminosity and data volume increase in 2009 and 2010
  • Before the pilot run, both ATLAS and CMS have preparation challenges
    • ATLAS has a "dress rehearsal" starting in the summer
    • CMS has a 50% computing challenge for offline computing

  4. Data Volume and Distribution
  • Each detector is capable of producing a raw data sample of 2PB-3PB in a nominal year of data taking
  • A similarly sized sample of simulated events will be produced at 25-50 Tier-2 computing centers
  • The collaborations are the largest ever attempted, with ~2000 scientists each
    • Large, potentially dynamic samples of data for analysis need to get into a lot of hands in a lot of places
  • Only ~20% of the computing capacity is located at CERN (a rough sense of scale is sketched below)
  • The detector and the distributed computing facility need to be commissioned concurrently
    • Nearly all running high energy physics experiments have some distributed computing (some have even reached a majority offsite)
    • Most started with a large, dedicated local computing center
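As a very rough sketch of the scale implied by these numbers (the midpoint values and the arithmetic below are illustrative, not taken from the computing models):

```python
# Back-of-envelope scale from the figures above (illustrative values only).
raw_pb = 2.5          # midpoint of the 2-3 PB raw sample per detector per year
sim_pb = 2.5          # "similarly sized" simulated sample
cern_capacity = 0.20  # ~20% of the computing capacity is at CERN

total_pb = raw_pb + sim_pb
print(f"Nominal yearly volume per experiment: ~{total_pb:.0f} PB")
print(f"Computing capacity outside CERN:      ~{1 - cern_capacity:.0%}")
# With ~80% of the capacity away from CERN, most of this data has to be
# moved, and re-moved, over wide-area networks.
```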

  5. • Both ATLAS and CMS chose distributed computing models from early on
  • Variety of motivating factors (infrastructure, funding, leverage)
  • [Diagram: a Tier-0 at CERN feeding multiple Tier-1 centers, each serving several Tier-2 centers. Flow labels: Tier-0 to Tier-1, predictable and high priority; Tier-1 to Tier-1, burst with re-reconstruction load balancing; Tier-1 to Tier-2, burst with user needs (from one Tier-1 in the case of ATLAS); Tier-2 to Tier-1, predictable simulation data. The flow characteristics are also summarized in the sketch below.]
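A minimal summary of those flow characteristics in code, for reference; the dictionary layout and wording are ad hoc, while the characteristics themselves come from the diagram labels above.

```python
# Inter-tier flows and their traffic character, as labeled on the slide above.
INTER_TIER_FLOWS = {
    ("Tier-0", "Tier-1"): "predictable, high priority",
    ("Tier-1", "Tier-1"): "burst, with re-reconstruction load balancing",
    ("Tier-1", "Tier-2"): "burst, with user needs (from one Tier-1 in ATLAS)",
    ("Tier-2", "Tier-1"): "predictable simulation data",
}

for (src, dst), character in INTER_TIER_FLOWS.items():
    print(f"{src} -> {dst}: {character}")
```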

  6. Responsibilities of the Tiers
  • Tier-0
    • Primary reconstruction
    • Partial reprocessing
    • First archive copy of the raw data
  • Tier-1s
    • Share of the raw data for custodial storage
    • Data reprocessing
    • Analysis tasks
    • Data serving to Tier-2 centers for analysis
    • Archive of simulation from the Tier-2s
  • Tier-2s
    • Monte Carlo production
    • Primary analysis facilities

  7. Network Estimates
  • From the CMS Computing Model (ATLAS is similar, though slightly higher):
  • The network requirements for Tier-0 to Tier-1 transfers are driven by the trigger rate and the event size (the shape of this calculation is sketched below)
    • Estimates are ~2.5Gb/s for a nominal Tier-1 center
    • This is the Tier-1's event share, with a factor of 2 recovery factor and a factor of 2 provisioning factor
  • The Tier-1 to Tier-1 transfers are driven by the desire to synchronize re-reconstruction samples within a short period of time
    • Replicating the newly created reconstructed data and AOD between Tier-1 centers within a week requires 1Gb/s, before the safety and provisioning factors
  • The Tier-1 to Tier-2 transfers are less predictable
    • They are driven by user activities
    • The CMS model estimates this at 50-500MB/s (including safety factors)
  • Tier-2 to Tier-1 transfers are predictable in rate and low in volume
    • Averaged over the entire year, it is ~1TB per day
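To make the shape of that Tier-0 to Tier-1 estimate concrete, here is a minimal sketch. The trigger rate, event size, and per-Tier-1 share used below are illustrative assumptions, not numbers from the slides, so the result differs from the ~2.5Gb/s quoted above (which also covers reconstructed output).

```python
def tier1_ingest_gbps(trigger_rate_hz, event_size_mb, tier1_share,
                      recovery_factor=2.0, provisioning_factor=2.0):
    """Estimated Tier-0 -> Tier-1 rate in Gb/s for one Tier-1 center."""
    total_mb_s = trigger_rate_hz * event_size_mb            # whole-experiment raw rate
    share_mb_s = total_mb_s * tier1_share                   # this Tier-1's event share
    padded_mb_s = share_mb_s * recovery_factor * provisioning_factor
    return padded_mb_s * 8 / 1000                           # MB/s -> Gb/s

# Illustrative inputs: ~150 Hz trigger rate, ~1.5 MB raw event, 1/7 event share.
print(f"~{tier1_ingest_gbps(150, 1.5, 1/7):.1f} Gb/s")      # ~1.0 Gb/s for these inputs
```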

  8. CERN to Tier-1 Connectivity
  • Connectivity from CERN to the Tier-1 centers is provided by the OPN (the LHC Optical Private Network)
  • The OPN also provides Tier-1 to Tier-1 connectivity

  9. Tier-1 to Tier-2 Connectivity
  • In order to satisfy their mission as a primary resource for experiment analysis, the Tier-2 centers need good connectivity to the Tier-1 centers
  • Data is served from the Tier-1 computing centers
    • In CMS, each Tier-1 is assigned a share of the data
    • In ATLAS, each Tier-1 holds a complete analysis dataset
  • The connectivity between the Tier-1 and Tier-2 centers can be substantially higher than the Tier-0 to Tier-1 rates
    • Already in the computing challenges, the incoming rate to FNAL is half the outgoing rate to Tier-2 centers
  • The network that carries the Tier-2 traffic is going to be instrumental to the experiments' success
  • The Tier-2 traffic is a more difficult networking problem
    • The number of connections is large
    • There is a diverse set of locations and setups

  10. Tier-2 and Tier-3 Centers
  • A Tier-2 center in ATLAS and CMS is approximately 1MSI2k of computing
    • Tier-3 centers belong to university groups and can be of comparable size
  • A Tier-2 center in ATLAS and CMS has ~200TB of disk
    • Currently, procuring and managing this volume of storage is expensive and operationally challenging
    • It requires a reasonable virtualization layer
  • A Tier-2 center has between 2.5Gb/s and 10Gb/s of connectivity in the US
    • This is similar between Tier-2 and Tier-3 centers
    • The speed of connection to the local sites has increased rapidly
  • In the US-CMS planning, a Tier-2 supports 40 physicists performing analysis (the nominal figures are collected in the sketch below)
  • The primary difference between a Tier-2 and a Tier-3 is that Tier-2 centers are a funded effort of the experiment
    • The central project has expectations of them
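For convenience, the nominal Tier-2 figures above can be collected into a single record; the class and field names are ad hoc, invented only for this illustration.

```python
from dataclasses import dataclass

@dataclass
class NominalTier2:
    """Nominal US Tier-2 parameters quoted above (field names are ad hoc)."""
    cpu_msi2k: float = 1.0            # ~1 MSI2k of computing
    disk_tb: int = 200                # ~200 TB of disk
    wan_gbps_min: float = 2.5         # 2.5-10 Gb/s of connectivity in the US
    wan_gbps_max: float = 10.0
    supported_physicists: int = 40    # US-CMS planning figure

print(NominalTier2())
```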

  11. Surviving the first years
  • The computing for either experiment is hardest while the detector is still being understood
  • The analysis object data (AOD) for CMS is estimated at 0.05MB per event
    • An entire year's data and reconstruction are only 300TB
    • Data is divided into ~10 trigger streams and ~50 offline streams
    • A physics analysis should rely on 1 trigger stream
    • A Tier-2 could potentially maintain all the analysis objects for the majority of the analysis streams
  • Unfortunately, until the detector and reconstruction are completely understood, the AOD is not useful for most analyses and access to the raw data will be more frequent
    • The full raw data is 35 times bigger
    • Given the experience of the previous generation of detectors, we should expect about 3 years to stabilize
  • People working at Tier-2 centers can place substantial, but bursty, demands on data transfers

  12. Analysis Selections
  • When going back to the raw data and complete simulation, analysis selections run over complete trigger streams
    • A 1% selection on data and MC would be 4TB; a 10% selection would be 40TB
    • Smaller by a factor of 5 if only the offline stream can be used
  • There are an estimated 40 people working at a Tier-2
    • If half the people perform the small selections at a rate of twice a month, this is already 50MB/s on average, and everyone is working asynchronously
    • The original analysis estimates were once a week
    • 10% selections will happen
    • 100MB/s x 7 Tier-2s would be ~6Gb/s from a Tier-1
  • The size of selections, the number of active people, and the frequency of selections all have a significant impact on the total network requirements
    • Bursts can easily reach 500MB/s (the arithmetic is sketched below)
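A minimal reproduction of that arithmetic, using only the numbers quoted on the slide; the function and variable names are invented for this sketch.

```python
def average_rate_mb_s(n_people, fraction_active, selection_tb, selections_per_month):
    """Average Tier-1 -> Tier-2 rate implied by periodic analysis selections."""
    tb_per_month = n_people * fraction_active * selection_tb * selections_per_month
    seconds_per_month = 30 * 24 * 3600
    return tb_per_month * 1e6 / seconds_per_month     # TB -> MB, spread over a month

# Numbers quoted on the slide: 40 people per Tier-2, half of them active,
# 4 TB per 1% selection, two selections per month each.
avg = average_rate_mb_s(40, 0.5, 4, 2)
print(f"average per Tier-2: ~{avg:.0f} MB/s")          # ~62 MB/s, i.e. the ~50 MB/s scale

# Aggregate burst seen by a Tier-1 feeding 7 such Tier-2s at 100 MB/s each:
print(f"aggregate burst:    ~{7 * 100 * 8 / 1000:.1f} Gb/s")   # ~5.6 Gb/s, i.e. ~6 Gb/s
```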

  13. Tier-3 Connectivity
  • Tier-2s are a resource for the whole physics community
    • Even people with significant university clusters at home have the opportunity to use a Tier-2
  • The use of Tier-3s for analysis is foreseen in the model
    • These are not resources for the whole experiment and can have lower priority for access to common resources
  • The number of active physicists supported at a Tier-3 center is potentially much smaller than at a Tier-2
    • 4-8 people
    • This leads to smaller sustained network use
    • But the requirements are similar to those of Tier-2s, to enable similar turn-around times/latencies for physics datasets copied to Tier-3 sites for analysis
  • For CMS, the Tier-3s are similar to Tier-2s in analysis functionality but not in capacity
    • Data connectivity to all Tier-1 sites is expected

  14. Tier-2 and Tier-3 Connectivity to Sites
  • The network from CERN to the Tier-1s is a dedicated resource
  • The Tier-2 and Tier-3 connections to Tier-1 centers are expected to go over the normal backbones (Internet2, ESnet)
  • In the ATLAS model, the Tier-2 centers connect primarily to one Tier-1; in the case of the US centers, this is Brookhaven
  • In the CMS model, the Tier-2 centers connect to the Tier-1 hosting the data (both association schemes are sketched below)
    • About 50% of the data will be resident at FNAL
    • The other 50% is hosted at 5 European Tier-1s and 1 Asian Tier-1
    • The European Tier-2s will also connect to FNAL for roughly half their connections
  • A number of the US Tier-2s either own connectivity to StarLight or participate in projects like UltraLight
  • Connections to Europe will be over shared resources and peerings with European providers
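The difference between the two association schemes can be summarized in a few lines of code; the site and dataset names beyond FNAL and Brookhaven, and the function names, are placeholders for illustration only.

```python
# ATLAS model (as described above): each Tier-2 is statically associated with one Tier-1.
ATLAS_TIER1_FOR_TIER2 = {
    "us_atlas_tier2_site": "BNL",      # US ATLAS Tier-2s connect primarily to Brookhaven
}

# CMS model (as described above): a Tier-2 fetches each dataset from whichever
# Tier-1 hosts it, so the source Tier-1 is chosen per dataset, not per site.
CMS_DATASET_HOST_TIER1 = {
    "dataset_A": "FNAL",               # ~50% of the data resident at FNAL
    "dataset_B": "european_tier1_X",   # remainder spread over 5 European + 1 Asian Tier-1
}

def source_tier1_atlas(tier2_site: str) -> str:
    """Tier-1 an ATLAS Tier-2 is associated with (placeholder lookup)."""
    return ATLAS_TIER1_FOR_TIER2[tier2_site]

def source_tier1_cms(dataset: str) -> str:
    """Tier-1 a CMS Tier-2 would pull this dataset from (placeholder lookup)."""
    return CMS_DATASET_HOST_TIER1[dataset]

print(source_tier1_atlas("us_atlas_tier2_site"), source_tier1_cms("dataset_A"))
```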

  15. Examples of Transfers
  • Tier-1 to Tier-2 transfers within the US with SRM disk-to-disk

  16. Examples of Transfers (Cont.)
  • Transfers from FNAL to European Tier-2s

  17. Outlook
  • The volume of data and the level of distribution present new challenges for LHC computing
    • Distributed computing is an integral part of the experiments' success
    • Making efficient use of the network to move large quantities of data is critical to the success of distributed computing
  • The Tier-1 centers have custodial responsibility for the raw data
    • They are a natural extension of the online system, and the network rates are predictable
    • The network from CERN to the Tier-1s is dedicated
  • The Tier-2 centers are resources for analysis
    • Both experiments are learning to perform efficient data transfers
    • Transfers are driven by user needs, and demands can be high
  • There is a lot of work to do in the last year
