Grid computing for CMS
Alain Romeyer (Mons - Belgium)
• What is the Grid?
• Let’s start with an analogy
• How does it work? (Some basic ideas)
• Grid for LHC and the CMS computing model
• Conclusion
Alain Romeyer - Dec. 2004
What is the Grid?
• An integrated, advanced cyber infrastructure that delivers:
  • Computing capacity
  • Data capacity
  • Communication capacity
• Coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organisations
  • No centralized control
  • Uses standard and open protocols and interfaces
  • Delivers nontrivial qualities of service
• What is not a Grid?
  • A cluster, a network-attached storage device, a scientific instrument, a network, etc.
  • Each may be an important component of a Grid, but by itself does not constitute a Grid
For us: a new way of doing science!
An analogy: electrical power (on-demand access)
[Figure: evolution over time; labels: "Quality, economies of scale"; axis: Time]
By analogy
• Decouple production and consumption
  • Enable on-demand access
  • Achieve economies of scale
  • Enhance consumer flexibility
  • Enable new devices
• On a variety of scales
  • Department
  • Campus
  • Enterprise
  • Internet
Not a perfect analogy…
• I import electricity but must export data
• “Computing” is not interchangeable but highly heterogeneous
  • Computers, data, sensors, services, …
• So the story is more complicated
• But more significantly, the sum can be greater than the parts
  • Dynamic allocation of resources
  • Access to distributed services
  • Virtualization & distributed service management
How does it work? Grid responsibilities
• Security Infrastructure
  • Authentication (identity)
  • Authorization (rights)
• Management:
  • Information Management
    • Soft-state, registration, discovery, selection, monitoring
  • Resource Management
    • Remote service invocation, reservation, allocation
    • Resource specification
  • Data Management
    • High-performance, remote data access
    • Cataloguing, replication, staging
How does it work? Security - Authentication
• Grid Security Infrastructure (GSI)
  • Public key infrastructure (asymmetric)
  • Users need to be associated with a Virtual Organisation (VO)
  • Users need a certificate delivered by a Certification Authority (CA)
• A certificate (X.509 international standard) is a digitally signed document attesting to the binding of a public key to an individual entity
• It contains:
  • A subject name (identifies the user/person)
  • The user's public key
  • The identity of the CA
  • The digital signature of the CA
How does it work? Security - Authentication
[Diagram: Certificate Request -> hash -> Message Digest -> Encrypt -> Digital Signature -> Public Certificate; cert signing by the CA, registration with the VO]
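The hash, encrypt and sign steps in the diagram can be sketched in a few lines. This is a toy illustration only, not how GSI or a real CA is implemented. It assumes textbook RSA with tiny primes (61 and 53), SHA-256 as the hash, and a made-up subject name. The function names `message_digest`, `ca_sign` and `verify` are hypothetical. The only fact it relies on is the general principle that a signature is produced with the signer's private key and checked with the public key.

```python
import hashlib

# Toy RSA key pair for a hypothetical CA (assumption: textbook primes 61 and 53).
# Real CAs use much larger keys and padding schemes; this is for illustration only.
CA_N = 61 * 53   # modulus n = 3233
CA_E = 17        # public exponent
CA_D = 2753      # private exponent, since (17 * 2753) % 3120 == 1


def message_digest(request: str) -> int:
    """Hash the certificate request and reduce it to fit the toy modulus."""
    digest = hashlib.sha256(request.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % CA_N


def ca_sign(request: str) -> int:
    """'Encrypt' the message digest with the CA private key: the digital signature."""
    return pow(message_digest(request), CA_D, CA_N)


def verify(request: str, signature: int) -> bool:
    """Recover the digest with the CA public key and compare with a fresh hash."""
    return pow(signature, CA_E, CA_N) == message_digest(request)


# Hypothetical certificate request: subject name plus a placeholder public key.
request = "subject=CN=Some User,O=Some VO;public_key=PLACEHOLDER"
signature = ca_sign(request)
print(verify(request, signature))               # signature produced by the CA
print(verify(request, (signature + 1) % CA_N))  # altered signature is rejected
```

Anyone holding the CA public key can run the `verify` step, which is why the certificate carries the identity of the CA alongside its signature.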
How does it work? Management
[Diagram labels: Global Manager, Network Server, Workload Manager, Resource Location Service, Information Service, Computing Element (LRMS), Storage Element]
• Request (JDL) sent to the Network Server
• Workload Manager: best actions to satisfy the request
  • Match-making
  • Where to submit
  • Grid status
  • Decision
• Resource Location Service: "Where?"
• Information Service: "Status?"; resources publish their characteristics, status, available services…
• Job submission; job control by CONDOR-G, which hands the job to the LRMS
• Data transfer with the Storage Element
• End of job: outputs are stored in your "sandbox"; ask to download them
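The match-making step can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the actual Workload Manager algorithm. The class names, the fields (`free_cpus`, `vos`, `close_datasets`), the site names and the ranking rule (most free CPUs wins) are all hypothetical. A real request would be written in JDL and evaluated against what the Information Service publishes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ComputingElement:
    """What a resource might publish to the Information Service (hypothetical fields)."""
    name: str
    free_cpus: int
    vos: Tuple[str, ...]             # Virtual Organisations allowed to run here
    close_datasets: Tuple[str, ...]  # datasets held on a nearby Storage Element


@dataclass
class JobRequest:
    """A simplified stand-in for a JDL request."""
    vo: str
    cpus: int
    dataset: str


def match(job: JobRequest, elements: List[ComputingElement]) -> Optional[ComputingElement]:
    """Keep the resources that satisfy the request, then rank them by free CPUs."""
    candidates = [
        ce for ce in elements
        if job.vo in ce.vos
        and ce.free_cpus >= job.cpus
        and job.dataset in ce.close_datasets
    ]
    if not candidates:
        return None  # nothing matches: the job has to wait
    return max(candidates, key=lambda ce: ce.free_cpus)


# Hypothetical Grid status
grid = [
    ComputingElement("ce.site-a.example", 4, ("cms",), ("dataset-1",)),
    ComputingElement("ce.site-b.example", 40, ("cms", "atlas"), ("dataset-1", "dataset-2")),
    ComputingElement("ce.site-c.example", 200, ("atlas",), ("dataset-2",)),
]
chosen = match(JobRequest(vo="cms", cpus=2, dataset="dataset-1"), grid)
print(chosen.name if chosen else "no match")
```

Sending the job to where the data already sits is one possible ranking criterion; the sketch folds it into the filter for simplicity.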
Some Grid e-science projects
• Sloan Digital Sky Survey
• ALMA
• LHC: LHCb, ATLAS, ALICE, CMS
EGEE (www.eu-egee.org)
• Enabling Grids for E-science in Europe (2-year project)
• Funded by the EU, 3 core areas:
  • 1) Build a consistent, robust and secure Grid network that will attract additional computing resources.
  • 2) Continuously improve and maintain the middleware in order to deliver a reliable service to users.
  • 3) Attract new users from industry as well as science and ensure they receive the high standard of training and support they need.
• Two pilot applications selected:
  • Biomedical Grids (bioinformatics and healthcare data)
  • Large Hadron Collider Computing Grid (LCG)
LHC Computing Grid (LCG)
• 2-phase project:
  • Phase I (2002 - 2005): development phase + series of computing data challenges
  • Phase II (2006 - 2008): real production and deployment phase
• LCG goal: prepare the computing infrastructure for the simulation, processing and analysis of LHC data for the 4 experiments
• 6,000 physicists working together
• 12-14 PetaBytes of data will be generated each year (20 million CDs == a 20 km stack)
• Analysing this will require the equivalent of 70,000 of today's fastest PC processors (~192 years)
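The CD comparison can be checked with a quick calculation. Two inputs are assumptions that do not appear on the slide: a CD capacity of 700 MB and a thickness of about 1 mm per disc (a bare CD is nominally 1.2 mm). The function names are illustrative.

```python
BYTES_PER_PB = 10**15
CD_CAPACITY_BYTES = 700 * 10**6  # assumption: 700 MB per CD
CD_THICKNESS_MM = 1.0            # assumption: about 1 mm per disc in the stack


def cds_needed(petabytes: float) -> float:
    """Number of CDs required to hold the given volume of data."""
    return petabytes * BYTES_PER_PB / CD_CAPACITY_BYTES


def stack_height_km(n_cds: float) -> float:
    """Height of a stack of n_cds discs, in kilometres."""
    return n_cds * CD_THICKNESS_MM / 1e6  # 1 km = 1e6 mm


for yearly_pb in (12, 14):
    n = cds_needed(yearly_pb)
    print(f"{yearly_pb} PB/year: {n / 1e6:.1f} million CDs, stack of {stack_height_km(n):.1f} km")
```

With these assumptions, the upper figure of 14 PB corresponds to the 20 million CDs and 20 km quoted above; 12 PB comes out slightly lower.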
LCG status 22/09/2004
• Total Sites: 82
• Total CPUs: 7269
• Total Storage: 6558 TB
CMS data production at LHC
[Diagram: trigger and data-recording chain]
• p-p collisions: 1 bunch crossing every 25 ns, i.e. 40 MHz (1000 TB/sec)
• Level 1 Trigger: 75 kHz (50 GB/sec)
• High Level Trigger: 100 Hz (100 MB/sec)
  • Cluster for the trigger: ~1000-2000 PCs
• Data recording & offline analysis
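The rates and bandwidths quoted for each stage imply an average data size per event and a reduction factor per trigger level. The short sketch below derives them using only the numbers on the slide; the helper names are illustrative and decimal units (1 MB = 10^6 bytes) are assumed.

```python
# (stage name, event rate in Hz, data rate in bytes/sec), as quoted on the slide
STAGES = [
    ("Bunch crossings", 40e6, 1000e12),
    ("Level 1 Trigger", 75e3, 50e9),
    ("High Level Trigger", 100.0, 100e6),
]


def bytes_per_event(rate_hz: float, bytes_per_sec: float) -> float:
    """Average data volume per event at a given stage."""
    return bytes_per_sec / rate_hz


def rejection_factor(rate_in_hz: float, rate_out_hz: float) -> float:
    """How many input events there are for each event kept."""
    return rate_in_hz / rate_out_hz


for name, rate, bandwidth in STAGES:
    print(f"{name}: {bytes_per_event(rate, bandwidth) / 1e6:.2f} MB per event")

for (name_in, rate_in, _), (name_out, rate_out, _) in zip(STAGES, STAGES[1:]):
    print(f"{name_in} -> {name_out}: keeps 1 in {rejection_factor(rate_in, rate_out):.0f}")
```

Taken together, the two trigger levels keep about one bunch crossing in 400,000 (40 MHz down to 100 Hz), at roughly 1 MB per recorded event.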
CMS computing model
[Diagram: tiered data flow]
• Experiment -> Online System: ~PByte/sec
• Online System -> CERN Center: ~100-1500 MBytes/sec
• Tier 0+1: CERN Center (PBs of disk; tape robot), 2.5-10 Gbps
• Tier 1: FNAL Center, IN2P3 Center, INFN Center, RAL Center, ~2.5-10 Gbps
• Tier 2: Tier2 Centers, ~2.5-10 Gbps
• Tier 3: Institutes, each with a physics data cache, 0.1 to 10 Gbps
• Tier 4: Workstations
• Physicists work on analysis “channels”; data for these channels should be cached by the institute server
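To get a feeling for what these link speeds mean, here is a rough estimate of how long a dataset takes to move between tiers. The 1 TB dataset size is a hypothetical example, and the calculation assumes the full nominal bandwidth is available with no protocol overhead, which is optimistic.

```python
def transfer_time_seconds(size_bytes: float, link_gbps: float) -> float:
    """Ideal transfer time: size in bits divided by link speed in bits per second."""
    return size_bytes * 8 / (link_gbps * 1e9)


DATASET_BYTES = 1e12  # hypothetical 1 TB dataset

for gbps in (0.1, 2.5, 10.0):  # link speeds quoted in the diagram
    seconds = transfer_time_seconds(DATASET_BYTES, gbps)
    print(f"{gbps:>4} Gbps: {seconds:.0f} s ({seconds / 3600:.2f} h)")
```

At 0.1 Gbps the same dataset takes about a day, which illustrates why data for an analysis channel should be cached by the institute server.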
DC04 Data Challenge (March-April 2004)
[Map: T0 at CERN and T1 centres: FNAL Chicago, RAL Oxford, FZK Karlsruhe, CNAF Bologna, IN2P3 Lyon, PIC Barcelona]
• T0 at CERN in DC04
  • 25 Hz input event rate
  • Reconstruct quasi-realtime
  • Events filtered into streams
  • Distribute data to T1’s
• T1 centres in DC04
  • Pull data from T0 to T1 and store
  • Make data available to PRS
  • Demonstrate quasi-realtime “fake” analysis
DC04 Processing Rate
[Plots: T0 events processed vs. days; T0 event processing rate (Hz)]
• Got above 25 Hz on many short occasions
• Only one full day >25 Hz with full system
• Processed about 30M events
• DC04 demonstrated that the system can work… at least for well-controlled data flow / analysis, and for a few expert users
Next challenge: make it usable by average physicists… and demonstrate that the performance scales acceptably
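For scale, the total of about 30M events can be compared with the 25 Hz target rate. The sketch below assumes continuous running at exactly 25 Hz, which the bullets above show was not achieved in practice, so it only gives the ideal minimum duration.

```python
SECONDS_PER_DAY = 86_400


def days_to_process(n_events: float, rate_hz: float) -> float:
    """Days needed to process n_events at a constant rate."""
    return n_events / rate_hz / SECONDS_PER_DAY


def events_per_day(rate_hz: float) -> float:
    """Events processed in one full day at a constant rate."""
    return rate_hz * SECONDS_PER_DAY


print(f"One day at 25 Hz: {events_per_day(25) / 1e6:.2f} million events")
print(f"30M events at 25 Hz: {days_to_process(30e6, 25):.1f} days")
```

At a sustained 25 Hz, 30M events would take about two weeks.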
Conclusion
• The Grid is becoming a reality
• Management is the crucial issue and is not yet fully implemented
  • Will be done by the EGEE project
• For HEP, LCG II is already available and working
• CMS DC04 has shown that the system is starting to work
• The next data challenge will be crucial:
  • Usable by the average physicist
  • Reasonable performance for LHC
Conclusion
• Belgrid project (www.belgrid.be) « a Belgian Grid initiative »
  • Brings together academic, public and private partners
  • Goal: share the local computing resources using Grid technologies
  • Status: GridFTP between sites is working
  • Plan: distributed computing
• BEgrid (Belnet): grid computing for Belgian research
  • Belnet: official CA -> certificate also valid for use in EGEE
  • 5 universities connected (KULeuven, UA, UG, ULB and VUB)
  • LCG II, following the EGEE middleware