Explore the history, characteristics, and evolution of CERN computing, from mainframes to the modern LHC Computing Grid: data handling and processing, physics analysis, event simulation, and more.
LHC Computing Grid Project
Les Robertson
CERN - IT Division
les.robertson@cern.ch
CERN Data Handling and Computation for Physics Analysis
[data-flow diagram: detector → event filter (selection & reconstruction) → raw data → event reprocessing → event summary data (processed data) → batch physics analysis → analysis objects (extracted by physics topic) → interactive physics analysis; event simulation also feeds the chain]
HEP Computing Characteristics
• Large numbers of independent events – trivial parallelism
• Large data sets – smallish records, mostly read-only
• Modest I/O rates – a few MB/sec per fast processor
• Modest floating-point requirement – SPECint performance is what matters
• Very large aggregate requirements – computation and data
• Scaling up is not just big – it is also complex
• … and what happens once you exceed the capabilities of a single geographical installation?
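The "trivial parallelism" point is the key architectural simplification: events are independent, so throughput scales by adding processors rather than by building a tightly coupled supercomputer. A minimal illustrative sketch (not CERN code; the event records and the reconstruct() logic are invented placeholders):

```python
# Illustrative sketch only: processing a batch of independent events in
# parallel. Each event is a smallish, read-only record; no synchronisation
# between workers is needed.
from multiprocessing import Pool

def reconstruct(event):
    """Process one event; the real selection/reconstruction would go here."""
    return {"id": event["id"], "summary": sum(event["hits"])}

def main():
    # Fake events standing in for smallish, read-only records.
    events = [{"id": i, "hits": [i, i + 1, i + 2]} for i in range(1000)]
    with Pool() as pool:                 # one worker per CPU core
        summaries = pool.map(reconstruct, events)
    print(len(summaries), "events processed")

if __name__ == "__main__":
    main()
```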
History
• 1960s through 1980s – the largest scientific supercomputers & mainframes (Control Data, Cray, IBM, Siemens/Fujitsu); time-sharing interactive services on IBM & DEC-VMS; scientific workstations from 1982 (Apollo) for development and final analysis
• 1989 – first batch services on RISC, a joint project with HP (Apollo DN10000)
• 1990 – simulation service, 4 × mainframe capacity
• 1991 – SHIFT: data-intensive applications, distributed model
• 1993 – first central interactive service on RISC
• 1996 – last of the mainframes decommissioned
• 1997 – first batch services on PCs
• 1998 – NA48 records 70 TeraBytes of data
• 2000 – more than 75% of capacity delivered by PCs
The SHIFT Software Model (1990)
• all data available to all processes – via an API which can be implemented over IP
• replicated component model – scalable, heterogeneous, distributed
• standard APIs – disk I/O, mass storage, job scheduler
• mass storage model – active data cached on disk (stager)
• physical implementation transparent to the application/user (implementations on SMPs, SP2, clusters, WAN clusters)
• flexible evolution – scalable capacity, multiple platforms, smooth integration of new technologies
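The stager idea is the heart of the model: applications always see files on disk, and recall from mass storage happens transparently behind a simple API. A minimal sketch of that interface, with invented paths and names (this is not the actual SHIFT/RFIO API):

```python
# Minimal sketch of a disk-cache "stager": applications ask for a file by name
# and always receive a path on the disk cache; data not already cached is
# recalled from mass storage first. Paths below are placeholders.
import shutil
from pathlib import Path

DISK_CACHE = Path("/tmp/disk_cache")      # fast pool of disk servers (stand-in)
MASS_STORE = Path("/tmp/mass_storage")    # tape archive (stand-in)

def stage_in(filename: str) -> Path:
    """Return a disk-resident copy of `filename`, recalling it if necessary."""
    cached = DISK_CACHE / filename
    if not cached.exists():
        DISK_CACHE.mkdir(parents=True, exist_ok=True)
        shutil.copy(MASS_STORE / filename, cached)   # the "tape recall"
    return cached

# The application never needs to know whether the data was on disk or tape:
#   path = stage_in("run1234.raw"); data = path.read_bytes()
```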
Generic computing farm
[diagram: application servers and a disk data cache in front of mass storage, connected to the WAN]
LHC Computing Fabric – can we scale up the current commodity-component-based approach?
CERN Physics Data Handling
[chart: evolution of CPU capacity and cost through the nineties; annotations: 50% annual growth, 80% annual growth, LEP startup]
LHC Offline Computing Scale, Cost and the Model
CERN's Users in the World
• Europe: 267 institutes, 4603 users
• Elsewhere: 208 institutes, 1632 users
On-line System
• Multi-level trigger – filter out background, reduce data volume
• 24 x 7 operation
• Trigger chain: 40 MHz (1000 TB/sec) → Level 1, special hardware → 75 kHz (75 GB/sec) → Level 2, embedded processors → 5 kHz (5 GB/sec) → Level 3, farm of commodity CPUs → 100 Hz (100 MB/sec) → data recording & offline analysis
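A quick consistency check, using only the rates and bandwidths quoted above, shows the implied event sizes (bandwidth divided by rate) and the overall rate reduction of the trigger chain:

```python
# Back-of-the-envelope check of the trigger-chain numbers quoted above.
# The "event size" is simply bandwidth / rate at each stage.
levels = [
    # (stage,                          rate in Hz, bandwidth in bytes/s)
    ("Collision rate (detector)",      40e6,       1000e12),
    ("Level 1 - special hardware",     75e3,       75e9),
    ("Level 2 - embedded processors",  5e3,        5e9),
    ("Level 3 - commodity CPU farm",   100,        100e6),
]

for name, rate, bandwidth in levels:
    size_mb = bandwidth / rate / 1e6
    print(f"{name:32s} {rate:10.0f} Hz  ->  ~{size_mb:.0f} MB/event")

reduction = levels[0][1] / levels[-1][1]
print(f"Overall rate reduction: {reduction:,.0f} : 1")   # 400,000 : 1
```

The front end implies roughly 25 MB per bunch crossing, each level after Level 1 works with events of about 1 MB, and the chain as a whole reduces the rate by a factor of 400,000.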
How Much Data is Involved?
[chart: Level-1 trigger rate (Hz) versus event size (bytes) for LEP, UA1, H1, ZEUS, NA49, ALICE, CDF, CDF II, KLOE, HERA-B, LHCb, ATLAS and CMS – the LHC experiments combine a high Level-1 trigger rate (up to ~1 MHz), a high number of channels, high bandwidth (~500 Gbit/s) and a PetaByte-scale data archive; the data rate is compared with 1 billion people surfing the Web]
The Large Hadron Collider Project
• 4 detectors – ATLAS, CMS, LHCb, ALICE
• Storage – raw recording rate 0.1–1 GBytes/sec, accumulating at 5–8 PetaBytes/year, with 10 PetaBytes of disk
• Processing – 200,000 of today's fastest PCs
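The recording rate and the yearly accumulation are consistent if one assumes roughly 10^7 seconds of effective data taking per year (an assumption, not a figure from the slide):

```python
# Rough consistency check of the accumulation figure. The effective running
# time of ~1e7 seconds per year is an assumption, not a number from the slide.
seconds_per_year = 1e7                      # assumed effective data-taking time
for rate_gb_per_s in (0.1, 0.5, 1.0):       # raw recording rates quoted above
    petabytes = rate_gb_per_s * 1e9 * seconds_per_year / 1e15
    print(f"{rate_gb_per_s:.1f} GB/s  ->  {petabytes:4.0f} PB/year")
# A sustained 0.5-0.8 GB/s matches the quoted 5-8 PB/year.
```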
Worldwide distributed computing system
• Small fraction of the analysis at CERN
• ESD analysis using 12–20 large regional centres
  – how to use the resources efficiently
  – establishing and maintaining a uniform physics environment
• Data exchange with tens of smaller regional centres, universities, labs
Planned capacity evolution at CERN
[chart: projected CPU, disk and mass-storage capacity for LHC and other experiments, compared against Moore's law]
Importance of cost containment
• components & architecture
• utilisation efficiency
• maintenance, capacity evolution
• personnel & management costs
• ease of use (usability efficiency)
[chart: IT Division – LTP Planning – Materials]
The MONARC Multi-Tier Model (1999)
[diagram: CERN as Tier 0, connected over 2.5 Gbps, 622 Mbps and 155 Mbps links to Tier 1 regional centres (IN2P3, RAL, FNAL), then Tier 2 centres (universities and labs), departments and desktops]
MONARC report: http://home.cern.ch/~barone/monarc/RCArchitecture.html
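One way to see why the multi-tier model pushes analysis out to regional centres rather than shipping bulk data on demand: moving a petabyte-scale dataset over the link speeds shown in the diagram takes weeks to years. The 1 PB dataset size below is an assumed example, not a slide figure:

```python
# Transfer times for a petabyte-scale dataset over the link speeds shown in
# the MONARC diagram. The 1 PB dataset size is an assumed example; real
# transfers would also lose time to protocol overhead and competing traffic.
dataset_bytes = 1e15                        # assumed: ~1 PB of event summary data

links_mbps = {"2.5 Gbps link": 2500,
              "622 Mbps link": 622,
              "155 Mbps link": 155}

for name, mbps in links_mbps.items():
    seconds = dataset_bytes * 8 / (mbps * 1e6)
    print(f"{name:14s}: {seconds / 86400:5.0f} days at full line rate")
```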
The opportunity of Grid technology
LHC Computing Model 2001 – evolving
[diagram: the LHC Computing Centre at CERN (Tier 0 / Tier 1) connected through a grid to Tier 1 national centres (France, Germany, Italy, UK, USA, …), Tier 2 regional groups, Tier 3 physics departments, labs, universities and the desktops of physics groups]
Major Activities
• Computing Fabric Management
• Networking
• Grid Technology
• Software
• Prototyping & Data Challenges
• Deployment
• Regional Centre Coordination & Planning
Computing Fabric Management
Key issues –
• scale
• efficiency & performance
• resilience – fault tolerance
• cost – acquisition, maintenance, operation
• usability
• security
Working assumptions for Computing Fabric at CERN
• single physical cluster – Tier 0, Tier 1 and the 4 experiments – partitioned by function, and (maybe) by user
• an architecture that accommodates mass-market components and supports cost-effective and seamless capacity evolution
• new level of operational automation; novel style of fault tolerance – self-healing fabrics (where are the industrial products?)
• plan for active mass storage (tape), but hope to use it only as an archive
• one platform – Linux on Intel
• ESSENTIAL to remain flexible on all fronts
[diagram: application servers and a disk data cache in front of mass storage, with a WAN connection to the Grid]
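"Self-healing fabric" here means a control loop that compares each node's observed state with its desired state and repairs drift automatically, instead of paging an operator for every failure. A minimal sketch of the idea, with stubbed-out probe and repair actions invented purely for illustration:

```python
# Minimal sketch (not an existing product) of a self-healing control loop.
# The node model, health probe and repair action are stubs for illustration.
import random

def probe_node(node):
    """Stub health probe; a real fabric would check daemons, disks, configuration."""
    return random.choice(["ok", "ok", "ok", "degraded"])

def repair(node):
    """Stub standing in for: drain the node, reimage it, return it to production."""
    print(f"{node}: draining, reinstalling, returning to production")

def healing_pass(nodes):
    """One pass of the loop; a real fabric manager would run this continuously."""
    for node in nodes:
        if probe_node(node) != "ok":
            repair(node)

healing_pass([f"node{i:03d}" for i in range(10)])
```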
Grid Technology
• wave of interest in grid technology as a basis for revolutionising e-Science and e-Commerce
• LHC offers an ideal testbed, and will gain major usability benefits – a win-win situation?
• DataGrid & associated national initiatives have placed HEP at the centre of the action in Europe and the US
• important to stay mainline, embrace standards and industrial solutions
• important to get the DataGrid testbed going now – integrate successive LHC Grid prototypes and get to work with Data Challenges driven by the experiments' needs
• attack the real, not the theoretical, problems
DataGrid Testbed Sites (>40)
[map: HEP and ESA sites including Dubna, Lund, Moscow, ESTEC, KNMI, RAL, Berlin, IPSL, Prague, Paris, Brno, CERN, Lyon, Santander, Milano, Grenoble, PD-LNL, Torino, Madrid, Marseille, BO-CNAF, Pisa, Lisboa, Barcelona, ESRIN, Roma, Valencia, Catania]
Contacts: Francois.Etienne@in2p3.fr – Antonia.Ghiselli@cnaf.infn.it
Grid Technology Coordination
Significant coordination issues: DataGrid, INFN Grid, GridPP, PPDG, NorduGrid, GriPhyN, CrossGrid, Globus, GGF, Dutch Grid, Hungarian Grid, ……… "InterGrid" Europe-US committee, DataTag, …………
• LHC – HEP regional and national initiatives
• HEP national initiatives – national grids
• DataGrid
  – includes earth observation & biology applications but is dominated by HEP
  – came from a HEPCCC initiative that included the coordination of national HEP grid activities
  – key role in the founding of GGF
  – close relationship with Globus
• The LHC Project should
  – invest in and support DataGrid and associated projects to deliver the grid technology for the LHC prototype
  – support GGF for long-term standardisation
  – and keep a close watch on industry
Software
• basic environment for worldwide deployment – libraries, compilers, development tools, webs, portals
• common frameworks & tools – simulation, analysis, …
  – what is common to 4, 3, 2 experiments?
  – essential that developments are collaborative efforts between the experiments and the labs
  – need clear requirements from the SC2, and clear guidelines from the POB
• adaptation of collaboration software to grid middleware – every "job" must run on 100s of processors at many different sites
• anticipate strong pressure for a standard environment in regional centres – and the same on the desktop / notebook / palmtop
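The requirement that every "job" runs on hundreds of processors at many sites means the software must split an analysis over many input files into sub-jobs bound to the sites holding the data. A minimal illustrative sketch; the site names, file names and round-robin placement are invented, not part of any LHC framework:

```python
# Sketch of splitting one logical analysis "job" into per-site sub-jobs.
# Site and file names are placeholders; a real system would place sub-jobs
# according to where the data is actually cached.
from itertools import cycle

def split_job(input_files, sites, files_per_subjob=10):
    """Return a list of (site, files) sub-jobs covering all input files."""
    site_cycle = cycle(sites)
    subjobs = []
    for i in range(0, len(input_files), files_per_subjob):
        subjobs.append((next(site_cycle), input_files[i:i + files_per_subjob]))
    return subjobs

files = [f"esd_{n:05d}.root" for n in range(100)]
for site, chunk in split_job(files, ["CERN", "Tier1-A", "Tier1-B"]):
    print(f"{site}: {len(chunk)} files, first = {chunk[0]}")
```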
Prototyping & Data Challenges
• local, network and grid testing – usability, performance, reliability
• operating the Data Challenges, driven by the needs of the collaborations
Deployment
• system integration, distribution, maintenance and support
• grid operation
• LHC computing service operation – registration, accounting, reporting
Regional Centre Coordination
• fostering common solutions – standards & strategies
• prototyping
• planning
Time constraints
[timeline 2001–2006: continuing R&D programme; prototypes 1, 2 and 3; technology selection; system software selection, development and acquisition; hardware selection and acquisition; pilot service; 1st production service around 2005–2006]