
LHC Computing Grid Project


Presentation Transcript


  1. LHC Computing Grid Project Les Robertson CERN - IT Division les.robertson@cern.ch

  2. Background

  3. CERN Data Handling and Computation for Physics Analysis [Diagram of the data flow: detector → event filter (selection & reconstruction) → raw data and event summary data; event reprocessing; event simulation; batch physics analysis producing analysis objects (extracted by physics topic); interactive physics analysis of the processed data.]

  4. HEP Computing Characteristics
  • Large numbers of independent events
    • trivial parallelism (see the sketch below)
  • Large data sets
    • smallish records
    • mostly read-only
  • Modest I/O rates
    • few MB/sec per fast processor
  • Modest floating point requirement
    • SPECint performance
  • Very large aggregate requirements – computation, data
  • Scaling up is not just big – it is also complex
  • …and once you exceed the capabilities of a single geographical installation ………?
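
  A minimal sketch of what "trivial parallelism" over independent events means in practice: each event can be processed on its own, with no communication between workers. The event structure and the reconstruct() body below are invented placeholders, not experiment code.

```python
# Hypothetical illustration: events are independent, so per-event work
# parallelises with a plain worker pool and no inter-process communication.
from multiprocessing import Pool

def reconstruct(event):
    # Placeholder for per-event selection & reconstruction.
    return {"id": event["id"], "energy_sum": sum(event["hits"])}

def main():
    # Toy "raw data": 10,000 independent events with a few hits each.
    events = [{"id": i, "hits": [0.1 * i, 0.2 * i, 0.3 * i]} for i in range(10_000)]
    with Pool() as pool:                      # one worker per CPU core by default
        summaries = pool.map(reconstruct, events)
    print(len(summaries), "events reconstructed independently")

if __name__ == "__main__":
    main()
```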

  5. History
  • 1960s thru 1980s
    • The largest scientific supercomputers & mainframes (Control Data, Cray, IBM, Siemens/Fujitsu)
    • Time-sharing interactive services on IBM & DEC-VMS
    • Scientific workstations from 1982 (Apollo) for development, final analysis
  • 1989 -- First batch services on RISC - joint project with HP (Apollo DN10000)
  • 1990 -- Simulation service - 4 x mainframe capacity
  • 1991 -- SHIFT - data-intensive applications, distributed model
  • 1993 -- First central interactive service on RISC
  • 1996 -- Last of the mainframes decommissioned
  • 1997 -- First batch services on PCs
  • 1998 -- NA48 records 70 TeraBytes of data
  • 2000 -- >75% of capacity from PCs

  6. The SHIFT Software Model (1990)
  • all data available to all processes – via an API which can be implemented over IP
  • replicated component model – scalable, heterogeneous, distributed
  • standard APIs – disk I/O; mass storage; job scheduler
  • mass storage model – active data cached on disk (stager) – sketched below
  • physical implementation transparent to the application/user (implementations on SMPs, SP2, clusters, WAN clusters)
  • flexible evolution – scalable capacity; multiple platforms; smooth integration of new technologies
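
  To make the "stager" idea concrete, here is a hedged sketch of the disk-cache behaviour it describes: the application asks for a file through one call, and the file is recalled from mass storage only on a cache miss. The paths, file name and function name are invented for illustration and are not the SHIFT interfaces.

```python
# Illustrative stager-style cache (not SHIFT code): active data live on a
# disk cache; a miss triggers a recall from the mass storage system.
import os
import shutil

DISK_CACHE = "/tmp/stager_cache"      # assumed cache location
TAPE_STORE = "/tmp/fake_tape_store"   # stand-in for the mass storage system

def stage_in(filename):
    """Return a disk-cache path, recalling the file from 'tape' on a cache miss."""
    cached = os.path.join(DISK_CACHE, filename)
    if not os.path.exists(cached):                                # cache miss
        os.makedirs(DISK_CACHE, exist_ok=True)
        shutil.copy(os.path.join(TAPE_STORE, filename), cached)   # recall
    return cached

# Demo: create a fake raw-data file on the "tape" store, then access it twice;
# the second call is served directly from the disk cache.
os.makedirs(TAPE_STORE, exist_ok=True)
with open(os.path.join(TAPE_STORE, "run1234.raw"), "wb") as f:
    f.write(b"\x00" * 1024)
print(stage_in("run1234.raw"))
print(stage_in("run1234.raw"))
```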

  7. Generic computing farm [Diagram: application servers, a disk data cache and mass storage, with a WAN connection.]

  8. LHC Computing Fabric – Can we scale up the current commodity-component based approach?

  9. Not everything has been commoditised yet

  10. CERN Physics Data Handling [Chart: evolution of capacity and cost through the nineties, with CPU capacity annotated at 50% annual growth and 80% annual growth, and LEP startup marked.]

  11. LHC Offline Computing Scale, Cost and the Model

  12. CERN's Users in the World – Europe: 267 institutes, 4603 users; elsewhere: 208 institutes, 1632 users

  13. On-line System
  • Multi-level trigger
  • Filter out background
  • Reduce data volume
  • 24 x 7 operation
  Trigger chain (see the sketch below): 40 MHz (1000 TB/sec) → Level 1 (special hardware) → 75 kHz (75 GB/sec) → Level 2 (embedded processors) → 5 kHz (5 GB/sec) → Level 3 (farm of commodity CPUs) → 100 Hz (100 MB/sec) → data recording & offline analysis
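
  A short back-of-the-envelope sketch of the trigger chain above: each level cuts the event rate, and the bandwidth at each stage is simply rate times event size. The ~1 MB average event size is inferred from the quoted numbers (100 Hz corresponding to 100 MB/sec), so treat it as an assumption.

```python
# Rate and bandwidth at each trigger level, assuming ~1 MB per event
# (inferred from the slide's 100 Hz -> 100 MB/sec figure).
EVENT_SIZE_MB = 1.0

levels = [                                  # (name, output rate in Hz)
    ("Level 1 - special hardware",     75_000),
    ("Level 2 - embedded processors",   5_000),
    ("Level 3 - commodity CPU farm",      100),
]

for name, rate_hz in levels:
    bandwidth_mb_s = rate_hz * EVENT_SIZE_MB
    print(f"{name:<32} {rate_hz:>6} Hz  ->  {bandwidth_mb_s:>8.0f} MB/s")
# Level 3 output (~100 MB/s) is what goes to data recording & offline analysis.
```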

  14. How Much Data is Involved? [Chart: Level-1 trigger rate (Hz, 10² to 10⁶) versus event size (bytes, 10⁴ to 10⁷) for LEP, UA1, H1, ZEUS, NA49, KLOE, CDF, CDF II, HERA-B, ALICE, LHCb, ATLAS and CMS. Annotations: high Level-1 trigger rate (1 MHz), high number of channels and high bandwidth (500 Gbit/s), high data archive (PetaByte); for scale, "1 billion people surfing the Web".]

  15. The Large Hadron Collider Project – 4 detectors (ATLAS, CMS, LHCb, ALICE)
  • Storage – raw recording rate 0.1 – 1 GBytes/sec, accumulating at 5-8 PetaBytes/year, 10 PetaBytes of disk
  • Processing – 200,000 of today's fastest PCs
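
  As a rough consistency check of the storage numbers above, the sketch below converts the raw recording rate into an annual volume. The 10^7 seconds of effective data-taking per year is an assumed figure commonly used for such estimates, not something stated on the slide.

```python
# Convert the quoted raw recording rate into PetaBytes per year,
# assuming ~1e7 seconds of effective data-taking per year (an assumption).
LIVE_SECONDS_PER_YEAR = 1e7

for rate_gb_per_s in (0.1, 1.0):
    petabytes = rate_gb_per_s * LIVE_SECONDS_PER_YEAR / 1e6   # GB -> PB
    print(f"{rate_gb_per_s} GB/s x 1e7 s  ->  {petabytes:.0f} PB/year")
# 0.1 GB/s gives ~1 PB/year and 1 GB/s gives ~10 PB/year,
# bracketing the 5-8 PB/year accumulation quoted above.
```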

  16. Worldwide distributed computing system
  • Small fraction of the analysis at CERN
  • ESD analysis – using 12-20 large regional centres
    • how to use the resources efficiently
    • establishing and maintaining a uniform physics environment
  • Data exchange – with tens of smaller regional centres, universities, labs

  17. Planned capacity evolution at CERN [Charts: planned CPU, disk and mass storage capacity, split into LHC and other experiments, with Moore's law shown for comparison.]

  18. IT Division - LTP Planning - Materials
  Importance of cost containment
  • components & architecture
  • utilisation efficiency
  • maintenance, capacity evolution
  • personnel & management costs
  • ease of use (usability efficiency)

  19. The MONARC Multi-Tier Model (1999)
  [Diagram: CERN as Tier 0, linked at 622 Mbps - 2.5 Gbps to Tier 1 centres such as IN2P3, RAL and FNAL; Tier 1 centres linked at 155 - 622 Mbps to Tier 2 centres (universities and labs), which in turn serve departments and desktops.]
  MONARC report: http://home.cern.ch/~barone/monarc/RCArchitecture.html
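
  A small illustrative sketch of the tier idea: each link in the hierarchy has a nominal bandwidth, and the time to move data down the chain follows directly. The pairing of bandwidths to tiers below is a loose reading of the figures on this slide, not an exact reproduction of the MONARC model.

```python
# Illustrative only: nominal uplink bandwidths per tier, loosely taken from
# the figures above (622 Mbps - 2.5 Gbps towards Tier 1, 155 - 622 Mbps
# towards Tier 2), and the resulting bulk-transfer times.
UPLINK_MBPS = {
    "Tier 1 regional centre": 622,
    "Tier 2 university/lab":  155,
}

def transfer_hours(gigabytes, mbps):
    """Hours to move a dataset over a link running at its full nominal rate."""
    return gigabytes * 8 * 1000 / mbps / 3600

for tier, mbps in UPLINK_MBPS.items():
    print(f"1 TB to a {tier} at {mbps} Mbps: ~{transfer_hours(1000, mbps):.1f} h")
# Roughly 3.6 h at 622 Mbps and 14.3 h at 155 Mbps, ignoring protocol overheads.
```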

  20. The opportunity of Grid technology – LHC Computing Model 2001 (evolving)
  [Diagram: "The LHC Computing Centre" spanning CERN (Tier 0/Tier 1), national Tier 1 centres (France, Germany, Italy, UK, USA, ...), Tier 2 centres, Tier 3 physics-department resources, desktops, universities and labs, with regional groups and physics groups cutting across sites.]

  21. What has to be done

  22. Major Activities
  • Computing Fabric Management
  • Networking
  • Grid Technology
  • Software
  • Prototyping & Data Challenges
  • Deployment
  • Regional Centre Coordination & Planning

  23. Computing Fabric Management – Key Issues
  • scale
  • efficiency & performance
  • resilience – fault tolerance
  • cost – acquisition, maintenance, operation
  • usability
  • security

  24. Working assumptions for Computing Fabric at CERN
  • single physical cluster – Tier 0, Tier 1, 4 experiments – partitioned by function, (maybe) by user
  • an architecture that accommodates mass-market components and supports cost-effective and seamless capacity evolution
  • new level of operational automation; novel style of fault tolerance – self-healing fabrics (a toy sketch follows) – where are the industrial products?
  • plan for active mass storage (tape) .. but hope to use it only as an archive
  • one platform – Linux, Intel
  • ESSENTIAL to remain flexible on all fronts
  [Diagram: application servers, data cache and mass storage, with a WAN connection to the Grid.]
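
  To give the "self-healing fabric" phrase some shape, here is a toy sketch of the automation it implies: nodes are health-checked, and anything unhealthy is drained and queued for automated repair rather than handed to an operator. The node fields, checks and actions are invented for illustration; this is not an actual CERN fabric tool.

```python
# Toy self-healing loop (illustrative only): detect unhealthy nodes,
# stop scheduling work on them, and queue an automated repair action.
def is_healthy(node):
    """Placeholder health check: no disk errors and a sane load."""
    return node["disk_errors"] == 0 and node["load"] < 50

def heal(nodes):
    for node in nodes:
        if not is_healthy(node):
            node["state"] = "draining"     # no new jobs scheduled here
            node["action"] = "reinstall"   # automated repair, no operator call
    return nodes

nodes = [
    {"name": "lxb001", "disk_errors": 0, "load": 12, "state": "ok"},
    {"name": "lxb002", "disk_errors": 3, "load": 8,  "state": "ok"},
]
for node in heal(nodes):
    print(node["name"], node["state"], node.get("action", "-"))
```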

  25. Grid Technology
  • wave of interest in grid technology as a basis for revolutionising e-Science and e-Commerce
  • LHC offers an ideal testbed, and will gain major usability benefits – a win-win situation?
  • DataGrid & associated national initiatives have placed HEP at the centre of the action in Europe and the US
  • important to stay mainline, embrace standards and industrial solutions
  • important to get the DataGrid testbed going now – integrate successive LHC Grid Prototypes and get to work with Data Challenges driven by the experiments' needs – attack the real, not the theoretical, problems

  26. DataGrid Testbed Sites (>40)
  [Map of testbed sites, distinguishing HEP sites from ESA sites: Dubna, Lund, Moscow, Estec, KNMI, RAL, Berlin, IPSL, Prague, Paris, Brno, CERN, Lyon, Santander, Milano, Grenoble, PD-LNL, Torino, Madrid, Marseille, BO-CNAF, Pisa, Lisboa, Barcelona, ESRIN, Roma, Valencia, Catania]
  Francois.Etienne@in2p3.fr - Antonia.Ghiselli@cnaf.infn.it

  27. Grid Technology Coordination
  Significant coordination issues: DataGrid, INFN Grid, GridPP, PPDG, NorduGrid, GriPhyN, CrossGrid, Globus, GGF, Dutch Grid, Hungarian Grid, ……… "InterGrid" Europe-US committee, DataTag, …………
  • LHC ↔ HEP regional and national initiatives
  • HEP national initiatives ↔ national grids
  • DataGrid
    • includes earth observation & biology applications but is dominated by HEP
    • came from a HEPCCC initiative that included the coordination of national HEP grid activities
    • key role in founding of GGF
    • close relationship with Globus
  • The LHC Project should
    • invest in and support DataGrid and associated projects to deliver the grid technology for the LHC prototype
    • support GGF for long-term standardisation
    • and keep a close watch on industry

  28. Software
  • basic environment – worldwide deployment
    • libraries, compilers, development tools, webs, portals
  • common frameworks & tools
    • simulation, analysis, ..
    • what is common – 4, 3, 2?
  • essential that developments are collaborative efforts between the experiments and the labs
    • need clear requirements from SC2
    • and clear guidelines from the POB
  • adaptation of collaboration software to grid middleware
    • every "job" must run on 100s of processors at many different sites (see the splitting sketch below)
    • anticipate strong pressure for a standard environment in regional centres – and the same on the desktop/notebook/palmtop
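
  A hedged sketch of the splitting problem flagged above: one logical job over a large event sample becomes hundreds of sub-jobs spread across many sites. The site names, chunk size and round-robin placement are invented for illustration; a real setup would hand this to the grid middleware's scheduler.

```python
# Illustrative only: split one logical analysis "job" over a large event
# sample into sub-jobs and spread them over several sites round-robin.
def split_job(n_events, events_per_subjob, sites):
    """Yield (site, first_event, last_event) tuples covering the whole sample."""
    first, i = 0, 0
    while first < n_events:
        last = min(first + events_per_subjob, n_events)
        yield sites[i % len(sites)], first, last - 1
        first, i = last, i + 1

subjobs = list(split_job(n_events=1_000_000,
                         events_per_subjob=5_000,
                         sites=["CERN", "Tier1-A", "Tier1-B", "Tier2-X"]))
print(len(subjobs), "sub-jobs")          # 200 sub-jobs of 5,000 events each
print("first:", subjobs[0], " last:", subjobs[-1])
```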

  29. Prototyping & Data Challenges
  • local, network, grid testing
  • usability, performance, reliability
  • operating the Data Challenges driven by the needs of the collaborations
  Deployment
  • system integration, distribution, maintenance and support
  • grid operation → LHC computing service operation
  • registration, accounting, reporting
  Regional Centre Coordination
  • fostering common solutions – standards & strategies
  • prototyping planning

  30. Schedule

  31. Time constraints [Timeline 2001-2006: a continuing R&D programme with prototypes 1, 2 and 3; technology selection; pilot service; system software selection, development and acquisition; hardware selection and acquisition; leading to the 1st production service.]
