The Promise of Computational Grids in the LHC Era

Presentation Transcript


1. The Promise of Computational Grids in the LHC Era
Paul Avery, University of Florida, Gainesville, Florida, USA
avery@phys.ufl.edu
http://www.phys.ufl.edu/~avery/
CHEP 2000, Padova, Italy, Feb. 7-11, 2000

2. LHC Computing Challenges
• Complexity of LHC environment and resulting data
• Scale: petabytes of data per year
• Geographical distribution of people and resources
  Example: CMS has 1800 physicists, 150 institutes, 32 countries

3. Dimensioning / Deploying IT Resources
• LHC computing scale is "something new"
  • Solution requires directed effort, new initiatives
  • Solution must build on existing foundations
• Robust computing at national centers essential
• Universities must have resources to maintain intellectual strength, foster training, engage fresh minds
• Scarce resources are / will be a fact of life ⇒ plan for it
• Goal: get new resources, optimize deployment of all resources to maximize effectiveness
  • CPU: CERN / national lab / region / institution / desktop
  • Data: CERN / national lab / region / institution / desktop
  • Networks: international / national / regional / local

4. Deployment Considerations
• Proximity of datasets to appropriate IT resources
  • Massive datasets ⇒ CERN & national labs
  • Data caches ⇒ regional centers
  • Mini-summary data ⇒ institution
  • Micro-summary data ⇒ desktop
• Efficient use of network bandwidth
  • Local > regional > national > international
• Utilizing all intellectual resources
  • CERN, national labs, universities, remote sites
  • Scientists, students
  • Leverage training, education at universities
• Follow lead of commercial world
  • Distributed data, web servers

5. Solution: A Data Grid
• A hierarchical grid is the best deployment option
  • Hierarchy ⇒ optimal resource layout (MONARC studies)
  • Grid ⇒ unified system
• Arrangement of resources (sketched in code after this slide):
  • Tier 0: central laboratory computing resources (CERN)
  • Tier 1: national center (Fermilab / BNL)
  • Tier 2: regional computing center (university)
  • Tier 3: university group computing resources
  • Tier 4: individual workstation / CPU
• We call this arrangement a "Data Grid" to reflect the overwhelming role that data plays in deployment
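To make the tier arrangement concrete, the following is a minimal, purely illustrative Python sketch of the hierarchy as a tree of sites. The class and its fields are my own invention; the CPU figures echo the US-model numbers on slide 8, while the Tier 3 capacity is a placeholder.

```python
# Illustrative sketch of the Tier 0-4 hierarchy described above.
# Class layout is hypothetical; capacities are taken from slide 8 except the
# Tier 3 value, which is a placeholder.
from dataclasses import dataclass, field

@dataclass
class Site:
    name: str
    tier: int
    cpu_si95: int               # aggregate CPU capacity in SpecInt95 units
    children: list = field(default_factory=list)

def add_child(parent: Site, child: Site) -> None:
    """Attach a lower-tier site to the tier that serves it."""
    assert child.tier == parent.tier + 1, "each tier is served by the tier above"
    parent.children.append(child)

cern  = Site("CERN (Tier 0)", 0, 350_000)
fnal  = Site("Fermilab (Tier 1)", 1, 70_000)
tier2 = Site("Regional center (Tier 2)", 2, 20_000)
group = Site("University group (Tier 3)", 3, 2_000)   # placeholder capacity

add_child(cern, fnal)
add_child(fnal, tier2)
add_child(tier2, group)
```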

6. Layout of Resources
• Want a good "impedance match" between Tiers
  • Tier N-1 serves Tier N
  • Tier N big enough to exert influence on Tier N-1
  • Tier N-1 small enough not to duplicate Tier N
• Resources roughly balanced across Tiers
  Reasonable balance? (see the rough check after slide 8)

7. Data Grid Hierarchy (Schematic)
[Schematic: Tier 0 (CERN) at the center, linked to Tier 1 national centers; each Tier 1 serves several Tier 2 (T2) regional centers, which in turn fan out to Tier 3 and Tier 4 sites.]

8. US Model Circa 2005
[Network diagram, approximately reconstructed: CERN (CMS/ATLAS), 350k SI95, 350 TBytes disk, tape robot, connected at 622 Mbits/s (plus optional air freight for bulk data) to Tier 1 (FNAL/BNL), 70k SI95, 70 TBytes disk, tape robot; Tier 1 connected at 2.4 Gbps to Tier 2 centers, 20k SI95, 25 TBytes disk, tape robot each; Tier 2 connected at 622 Mbits/s to Tier 3 university working groups 1, 2, ..., M.]
(A rough balance check using these figures follows this slide.)
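The "reasonable balance?" question from slide 6 can be checked roughly with the capacities above. The count of five US Tier 2 centers below is my assumption for illustration, not a figure from the talk.

```python
# Rough balance check across tiers, using the SI95 capacities on this slide.
# The number of Tier 2 centers is an assumption for illustration only.
cern_si95  = 350_000      # Tier 0 (shared by CMS/ATLAS)
tier1_si95 = 70_000       # one Tier 1 national center (FNAL or BNL)
tier2_si95 = 20_000       # one Tier 2 regional center
n_tier2    = 5            # assumed number of US Tier 2 centers

tier2_total = n_tier2 * tier2_si95
print(f"Tier 1 capacity:           {tier1_si95:>7,} SI95")
print(f"Aggregate Tier 2 capacity: {tier2_total:>7,} SI95")
# With ~5 centers the Tier 2 aggregate (100k SI95) is comparable to a Tier 1
# center, i.e. the "roughly balanced" layout the talk argues for.
```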

9. Data Grid Hierarchy (CMS)
[Diagram: CMS data flow from detector to desktop.]
• Online System: bunch crossings every 25 nsec, 100 triggers per second, each event ~1 MByte; ~PBytes/sec off the detector, ~100 MBytes/sec to the Offline Farm (~20 TIPS)
• Tier 0: CERN Computer Center; ~622 Mbits/sec to Tier 1
• Tier 1: Fermilab (~4 TIPS) plus France, Germany, and Italy Regional Centers; ~2.4 Gbits/sec to Tier 2
• Tier 2: centers of ~1 TIPS each, with HPSS mass storage; ~622 Mbits/sec to Tier 3
• Tier 3: institute servers (~0.25 TIPS); physicists work on analysis "channels", each institute has ~10 physicists working on one or more channels, and data for these channels is cached by the institute server (physics data cache, 1-10 Gbits/sec to desktops)
• Tier 4: workstations
• Units: 1 TIPS = 25,000 SpecInt95; a PC (today) = 10-20 SpecInt95
(A back-of-the-envelope data-volume check follows this slide.)
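As a quick sanity check (my arithmetic, not a figure from the talk), the trigger rate and event size above are consistent with the "petabytes per year" scale quoted on slide 2, assuming the conventional accelerator live time of order 10^7 seconds per year.

```python
# Back-of-the-envelope raw data volume from the numbers on this slide.
# The live time of ~1e7 seconds/year is an assumed conventional value, not a
# figure from the talk.
trigger_rate_hz  = 100      # events written per second
event_size_bytes = 1e6      # ~1 MByte per event
live_seconds     = 1e7      # assumed live time per year

raw_bytes_per_year = trigger_rate_hz * event_size_bytes * live_seconds
print(f"Raw data rate:   {trigger_rate_hz * event_size_bytes / 1e6:.0f} MB/s")
print(f"Raw data volume: {raw_bytes_per_year / 1e15:.1f} PB/year")
# -> 100 MB/s and about 1 PB/year of raw data, before simulated and derived
#    datasets are added.
```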

10. Why a Data Grid: Physical
• Unified system: all computing resources part of the grid
  • Efficient resource use (manage scarcity)
  • Averages out spikes in usage
  • Resource discovery / scheduling / coordination truly possible (a toy example follows this slide)
  • "The whole is greater than the sum of its parts"
• Optimal data distribution and proximity
  • Labs are close to the data they need
  • Users are close to the data they need
  • No data or network bottlenecks
• Scalable growth
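The "resource discovery / scheduling" bullet is the heart of the grid idea. Below is a deliberately simplified sketch of choosing the nearest tier that already holds the needed dataset and has spare CPU; it is my illustration only, not the behavior of any actual grid middleware, and all site names and figures are made up or loosely echo earlier slides.

```python
# Toy replica selection: prefer the closest site that holds the dataset and
# has free CPU. Purely illustrative; real grid schedulers are far richer.
sites = [
    # (name,              tier, datasets held,              free CPU in SI95)
    ("desktop",              4, {"micro-summary"},              10),
    ("institute server",     3, {"mini-summary"},            1_000),
    ("regional center",      2, {"mini-summary", "cache"},   5_000),
    ("national lab",         1, {"cache", "raw"},           20_000),
    ("CERN",                 0, {"raw"},                    50_000),
]

def choose_site(dataset: str, cpu_needed: float):
    """Pick the highest-numbered (closest-to-user) tier with the data and enough CPU."""
    candidates = [s for s in sites if dataset in s[2] and s[3] >= cpu_needed]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s[1])   # larger tier number = closer to user

print(choose_site("mini-summary", 500))   # -> institute server
print(choose_site("raw", 30_000))         # -> CERN
```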

11. Why a Data Grid: Political
• Central lab cannot manage / help 1000s of users
  • Easier to leverage resources, maintain control, assert priorities regionally
• Cleanly separates functionality
  • Different resource types in different Tiers
  • Funding complementarity (NSF vs DOE)
  • Targeted initiatives
• New IT resources can be added "naturally"
  • Additional matching resources at Tier 2 universities
  • Larger institutes can join, bringing their own resources
  • Tap into new resources opened by the IT "revolution"
• Broaden community of scientists and students
  • Training and education
  • Vitality of the field depends on the University / Lab partnership

12. Tier 2 Regional Centers
[Sidebar: Tier 0 through Tier 4, trending from more organization (Tier 0) to more flexibility (Tier 4).]
• Possible model: CERN : National : Tier 2 ≈ 1/3 : 1/3 : 1/3
• Complementary role to Tier 1 lab-based centers
  • Less need for 24 × 7 operation ⇒ lower component costs
  • Less production-oriented ⇒ respond to analysis priorities
  • Flexible organization, e.g. by physics goals, subdetectors
  • Variable fraction of resources available to outside users
• Range of activities includes
  • Reconstruction, simulation, physics analyses
  • Data caches / mirrors to support analyses
  • Production in support of parent Tier 1
  • Grid R&D
  • ...

13. Distribution of Tier 2 Centers
• Tier 2 centers arranged regionally in the US model
  • Good networking connections to move data (caches)
  • Location independence of users always maintained
• Increases collaborative possibilities
• Emphasis on training, involvement of students
• High-quality desktop environment for remote collaboration, e.g. next-generation VRVS system

14. Strawman Tier 2 Architecture
• Linux farm of 128 nodes                  $0.30 M
• Sun data server with RAID array          $0.10 M
• Tape library                             $0.04 M
• LAN switch                               $0.06 M
• Collaborative infrastructure             $0.05 M
• Installation and infrastructure          $0.05 M
• Net connect to Abilene network           $0.14 M
• Tape media and consumables               $0.04 M
• Staff (ops and system support)           $0.20 M *
• Total estimated cost (first year)        $0.98 M
• Cost in succeeding years, for evolution, upgrade and ops: $0.68 M
* 1.5 - 2 FTE support required per Tier 2; physicists from the institute also aid in support.
(A quick check of the total follows this slide.)
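A trivial check (mine, not from the talk) that the line items above do sum to the quoted first-year total:

```python
# Sum of the strawman Tier 2 first-year line items quoted above (in $M).
items = {
    "Linux farm (128 nodes)":       0.30,
    "Sun data server + RAID":       0.10,
    "Tape library":                 0.04,
    "LAN switch":                   0.06,
    "Collaborative infrastructure": 0.05,
    "Installation/infrastructure":  0.05,
    "Net connect to Abilene":       0.14,
    "Tape media and consumables":   0.04,
    "Staff (ops and support)":      0.20,
}
total = sum(items.values())
print(f"First-year total: ${total:.2f} M")   # -> $0.98 M, matching the slide
```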

15. Strawman Tier 2 Evolution
                                  2000                        2005
• Linux farm                      1,500 SI95                  20,000 SI95 *
• Disks on CPUs                   4 TB                        20 TB
• RAID array                      1 TB                        20 TB
• Tape library                    1 TB                        50 - 100 TB
• LAN speed                       0.1 - 1 Gbps                10 - 100 Gbps
• WAN speed                       155 - 622 Mbps              2.5 - 10 Gbps
• Collaborative infrastructure    MPEG2 VGA (1.5 - 3 Mbps)    Realtime HDTV (10 - 20 Mbps)
RAID disk used for "higher availability" data.
* Reflects lower Tier 2 component costs due to less demanding usage, e.g. simulation.
(The implied CPU growth rate is worked out after this slide.)
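For context (my arithmetic, based only on the CPU row above), growing the farm from 1,500 to 20,000 SI95 over five years implies a capacity doubling time of roughly 16 months, broadly in line with the price/performance gains commodity hardware was delivering at the time.

```python
import math

# Implied capacity doubling time from the Linux-farm row above.
si95_2000, si95_2005, years = 1_500, 20_000, 5
growth = si95_2005 / si95_2000                     # ~13x over five years
doubling_months = years * 12 * math.log(2) / math.log(growth)
print(f"Growth factor: {growth:.1f}x")
print(f"Implied doubling time: {doubling_months:.0f} months")
```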

16. The GriPhyN Project
• Joint project involving
  • US-CMS, US-ATLAS
  • LIGO (gravity-wave experiment)
  • SDSS (Sloan Digital Sky Survey)
  • http://www.phys.ufl.edu/~avery/mre/
• Requesting funds from NSF to build the world's first production-scale grid(s)
  • Sub-implementations for each experiment
  • NSF pays for Tier 2 centers, some R&D, some networking
• Realization of a unified Grid system requires research
  • Many common problems across the different implementations
  • Requires partnership with CS professionals

17. R & D Foundations I
• Globus (Grid middleware)
  • Grid-wide services
  • Security
• Condor (see M. Livny paper)
  • General language for service seekers / service providers (a toy matchmaking sketch follows this slide)
  • Resource discovery
  • Resource scheduling, coordination, (co)allocation
• GIOD (networked object databases)
• Nile (fault-tolerant distributed computing)
  • Java-based toolkit, running on CLEO
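Condor's "general language for service seekers / service providers" refers to matchmaking between resource requests and resource offers. The sketch below is a stripped-down Python illustration of that idea only; it is not Condor's actual ClassAd syntax or API, and the attribute names are invented.

```python
# Toy matchmaking in the spirit of Condor's request/offer model.
# Attribute names and the matching rule are illustrative only.
def matches(request: dict, offer: dict) -> bool:
    """An offer matches if it meets every minimum requirement in the request."""
    return (offer["os"] == request["os"]
            and offer["memory_mb"] >= request["min_memory_mb"]
            and offer["free_cpu_si95"] >= request["min_cpu_si95"])

request = {"os": "linux", "min_memory_mb": 256, "min_cpu_si95": 50}
offers = [
    {"name": "farm-node-01", "os": "linux",   "memory_mb": 512,  "free_cpu_si95": 80},
    {"name": "farm-node-02", "os": "linux",   "memory_mb": 128,  "free_cpu_si95": 90},
    {"name": "sun-server",   "os": "solaris", "memory_mb": 2048, "free_cpu_si95": 300},
]

for offer in offers:
    if matches(request, offer):
        print("match:", offer["name"])     # -> farm-node-01
```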

18. R & D Foundations II
• MONARC
  • Construct and validate architectures
  • Identify important design parameters
  • Simulate an extremely complex, dynamic system (a toy flavor of such a simulation follows this slide)
• PPDG (Particle Physics Data Grid)
  • DOE / NGI funded for 1 year
  • Testbed systems
  • Later program of work incorporated into GriPhyN
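To give a flavor of what "simulating the system" means here (this is my toy illustration, not the MONARC simulation itself), even a few lines of queueing logic show how job load and CPU capacity interact at a single regional center.

```python
import random

# Toy single-site load simulation: work arrives at a Tier 2 center and queues
# for CPU. All parameters are made up for illustration; MONARC models far more
# detail (networks, data access patterns, many interacting sites, ...).
random.seed(1)
capacity = 20_000          # SI95-hours of work the site can complete per hour
backlog = 0.0              # SI95-hours of work still waiting
backlog_hours = []

for hour in range(24 * 7):                        # one simulated week, hourly steps
    submitted = random.expovariate(1 / 5_000)     # bursty arrivals, mean 5,000 SI95-hours
    backlog = max(backlog + submitted - capacity, 0.0)
    backlog_hours.append(backlog / capacity)      # backlog expressed in hours of work

print(f"Mean backlog: {sum(backlog_hours) / len(backlog_hours):.2f} hours of work")
print(f"Peak backlog: {max(backlog_hours):.2f} hours of work")
```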

19. The NSF ITR Initiative
• Information Technology Research program
  • Aimed at funding innovative research in IT
  • $90M in funds authorized
  • Maximum of $12.5M for a single proposal (5 years)
  • Requires extensive student support
• GriPhyN submitted preproposal Dec. 30, 1999
  • Intend that ITR fund most of our Grid research program
  • Major costs for people, esp. students / postdocs
  • Minimal equipment
  • Some networking
• Full proposal due April 17, 2000

20. Summary of Data Grids and the LHC
• Develop an integrated distributed system, while meeting LHC goals
  • ATLAS/CMS: production, data-handling oriented
  • (LIGO/SDSS: computation, "commodity component" oriented)
• Build and test the regional center hierarchy
  • Tier 2 / Tier 1 partnership
  • Commission and test software, data handling systems, and data analysis strategies
• Build and test the enabling collaborative infrastructure
  • Focal points for student-faculty interaction in each region
  • Realtime high-resolution video as part of the collaborative environment
• Involve students at universities in building the data analysis, and in the physics discoveries at the LHC
