Learn about the computational challenges faced by physicists in the CMS experiment, and how CMS computing enables them to analyze a very large data volume. Gain an understanding of data sets, types, volumes, resource requirements, and the user analysis tool CRAB.
Computing Basics within CMS • J-Term at LPC • 01 / 12 / 06
Introduction
• CMS:
  • Physics challenge
  • Experimental challenge (detector, machine)
  • But also: computational challenge
• CMS computing
  • Goal: enable physicists to analyze CMS data
  • Task: provide analysis access to a very large data volume
• “Analysis of CMS data will be different to what we know from previous HEP experiments”
• This talk gives a brief overview of CMS computing from the user perspective
01/12/06 Oliver Gutsche - Computing Basics within CMS
Outline
• CMS, the computational challenge
  • Data sets
  • Data types and volumes
  • Resource requirements
• The GRID and the Tiers
  • The CMS Tier structure
  • USCMS contribution
• How is the user going to analyze CMS data?
  • The user analysis tool CRAB
Analysis from the user perspective
• Approach of user analysis:
  • Compare measurements of physics quantities to theoretical predictions
  • Physics quantities are not measured directly
    • Reconstruct detector measurement
    • Use Monte Carlo simulations (MC) for efficiency determinations, etc.
• User:
  • User code:
    • Combine reconstructed quantities into new objects
    • Reconstruct new quantities or re-reconstruct quantities
  • Input sources: data and MC
• Number of physicists in CMS: ~2000 in total, ~1000 actively doing analysis
CMS computing
• Provide all CMS users with
  • Access to reconstructed data
  • Access to simulated and reconstructed MC
  • Access to sufficient computing power to execute user code
• The computational infrastructure should be available
  • Independent of the location of the user (international collaboration)
  • On a fair-share basis
DataSets
• “Analyses rarely make use of more than a well defined number of trigger channels”
• Split data coming from the trigger (HLT) on the reconstruction farm (RECO) into ~50 primary datasets
• A typical analysis needs to access only one dataset
• Overlap between datasets: ~10%
Figure: data flow from HLT through RECO into the primary datasets
First approach
• Concentrate computing at the origin of the data (CERN)
• Provide network access to the computing infrastructure
• In the following: overview of required computing resources (order of magnitude) in terms of
  • Data volume (mass storage)
  • Reconstruction resources (CPU)
  • Simulation resources (CPU)
  • Analysis resources (CPU and mass storage)
Data volume
• Detector output rate after trigger: 150 Hz for all running conditions (low luminosity, high luminosity)
• Extrapolated beam time:

| year | beam time | number of events |
|------|-----------|------------------|
| 2007 | 2.5 × 10^6 s | 3.75 × 10^8 |
| 2008 | 10^7 s | 1.5 × 10^9 |

• Event data tiers and sizes:

| data tier | event size | total size in 2007 | total size in 2008 |
|-----------|------------|--------------------|--------------------|
| RAW | 1.50 MB | 0.50 PB | 2.10 PB |
| Reco | 0.25 MB | 0.09 PB | 0.35 PB |
| AOD | 0.05 MB | 0.02 PB | 0.07 PB |
| total | 1.8 MB | 0.6 PB | 2.5 PB |

• Remarks:
  • AOD: Analysis Object Data
  • PB: PetaByte = 1024 TeraByte = 1024*1024 GigaByte = 1024*1024*1024 MegaByte
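The arithmetic behind the data-volume table can be sketched as follows. This is a minimal illustration, not the original calculation: the function and constant names are made up for this example, and the inputs are the trigger rate, beam times and event sizes quoted on the slide. The results agree with the table approximately; the slide rounds to order-of-magnitude values.

```python
# Sketch: number of events and stored volume per data tier and year.
# All inputs are the values quoted on the slide; names are illustrative.
TRIGGER_RATE_HZ = 150
BEAM_TIME_S = {2007: 2.5e6, 2008: 1.0e7}
EVENT_SIZE_MB = {"RAW": 1.50, "Reco": 0.25, "AOD": 0.05}
MB_PER_PB = 1024 ** 3  # the slide defines 1 PB = 1024*1024*1024 MB


def number_of_events(year):
    """Events recorded in a year = trigger rate x beam time."""
    return TRIGGER_RATE_HZ * BEAM_TIME_S[year]


def volume_pb(data_tier, year):
    """Stored volume in PB for one data tier in one year."""
    return number_of_events(year) * EVENT_SIZE_MB[data_tier] / MB_PER_PB


if __name__ == "__main__":
    for year in sorted(BEAM_TIME_S):
        print(year, f"{number_of_events(year):.3g} events")
        for tier in EVENT_SIZE_MB:
            print(f"  {tier}: {volume_pb(tier, year):.2f} PB")
```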
Simulation data volume
• Simulated event size
• Assumption: simulate the same number of events as recorded from the detector

| data tier | event size | total size in 2007 | total size in 2008 |
|-----------|------------|--------------------|--------------------|
| SimEvent | 2.0 MB | 0.70 PB | 2.79 PB |
| RecoSimEvent | 0.4 MB | 0.14 PB | 0.56 PB |
| total | 2.4 MB | 0.84 PB | 3.35 PB |

• Total sample size:

| year | total data volume | DVD stack height |
|------|-------------------|------------------|
| 2007 | 1.44 PB | 0.5 km |
| 2008 | 5.85 PB | 1.8 km |

• Stack of 50 DVDs: height 6 cm
CPU requirements
• All requirements on computing power (number of CPUs) are given in SI2K
  • Assume CPU in 2007: 4000 SI2K (optimistic)
  • Pentium IV 3 GHz ≈ 1300 SI2K
• Time to reconstruct an event
  • 25 kSI2K s/ev
  • On-demand reconstruction at 150 Hz
  • 3.75 MSI2K ≈ 1000 CPUs
• Time to simulate and reconstruct an event
  • 70 kSI2K s/ev
  • 1 CPU ≈ 1,800,000 ev/year
  • Simulated events for 2007: ≈ 140 CPUs
  • Simulated events for 2008: ≈ 830 CPUs
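The reconstruction and 2008 simulation estimates can be reproduced with a few lines. This is a sketch under stated assumptions: one CPU delivers 4 kSI2K as on the slide, a CPU runs for a full calendar year without downtime, and the 2008 event count of 1.5 × 10^9 is taken from the data-volume slide. The function names are illustrative only.

```python
# Sketch: CPU counts from per-event processing cost in kSI2K*s.
CPU_POWER_KSI2K = 4.0             # assumed 2007 CPU (4000 SI2K)
RECO_COST_KSI2K_S = 25.0          # reconstruction cost per event
SIM_COST_KSI2K_S = 70.0           # simulation + reconstruction cost per event
TRIGGER_RATE_HZ = 150
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: no downtime


def reco_cpus():
    """CPUs needed to reconstruct events as fast as they arrive."""
    return RECO_COST_KSI2K_S * TRIGGER_RATE_HZ / CPU_POWER_KSI2K


def sim_events_per_cpu_year():
    """Events one CPU can simulate and reconstruct in a year."""
    return SECONDS_PER_YEAR * CPU_POWER_KSI2K / SIM_COST_KSI2K_S


def sim_cpus(n_events):
    """CPUs needed to simulate n_events within one year."""
    return n_events / sim_events_per_cpu_year()


if __name__ == "__main__":
    print(f"reconstruction: {reco_cpus():.0f} CPUs")
    print(f"simulation throughput: {sim_events_per_cpu_year():.3g} ev/CPU/year")
    print(f"simulation 2008: {sim_cpus(1.5e9):.0f} CPUs")
```

The reconstruction figure comes out slightly below the rounded "≈ 1000 CPUs" on the slide, which is consistent with an order-of-magnitude estimate.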
Analysis requirements
• A single analysis has to access one primary dataset
• For now, neglect MC (but keep it in mind :-) )
• Assume:
  • Analysis needs access to the AOD of the dataset for selection
    • Every 3 days
    • Selection has to be finished within 3 days at the latest
  • Analysis needs access to the RECO of the dataset
    • Every 7 days
    • Analysis has to be finished within 7 days at the latest
  • Selection time: 0.25 kSI2K s/ev
  • Analysis time: 0.25 kSI2K s/ev
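Analysis requirements

| year | data | number of events per dataset | CPUs for single user | CPUs for 1000 users | total CPUs |
|------|------|------------------------------|----------------------|---------------------|------------|
| 2007 | AOD | 5.5 million | 1.3 | 1330 | 1900 |
| 2007 | RECO | | 0.6 | 570 | |
| 2008 | AOD | 33 million | 8 | 8000 | 11400 |
| 2008 | RECO | | 3.4 | 3400 | |

| year | data | size per dataset | time to deliver dataset | mass storage output rate for single user | mass storage output rate for 1000 users |
|------|------|------------------|-------------------------|------------------------------------------|-----------------------------------------|
| 2007 | AOD | 0.45 TB | 3 days | 1.8 MB/s | 1.8 GB/s |
| 2007 | RECO | 2.03 TB | 7 days | 3.5 MB/s | 3.4 GB/s |
| 2007 | Total | | | 5.3 MB/s | 5.2 GB/s |
| 2008 | AOD | 1.58 TB | 3 days | 6.4 MB/s | 6.2 GB/s |
| 2008 | RECO | 7.88 TB | 7 days | 13.7 MB/s | 13.3 GB/s |
| 2008 | Total | | | | 19.5 GB/s |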
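The per-user numbers in these tables follow from the assumptions on the previous slide (0.25 kSI2K s/ev, AOD pass within 3 days, RECO pass within 7 days) and a 4 kSI2K CPU. A minimal sketch, with illustrative function names and the 2008 dataset figures from the tables as inputs:

```python
# Sketch: CPUs and mass-storage rate one user needs for one dataset pass.
CPU_POWER_KSI2K = 4.0       # assumed CPU, as on the CPU requirements slide
COST_KSI2K_S = 0.25         # selection or analysis cost per event
SECONDS_PER_DAY = 24 * 3600
MB_PER_TB = 1024 ** 2


def cpus_for_pass(n_events, window_days):
    """CPUs needed to process n_events within window_days."""
    power_ksi2k = n_events * COST_KSI2K_S / (window_days * SECONDS_PER_DAY)
    return power_ksi2k / CPU_POWER_KSI2K


def delivery_rate_mb_s(size_tb, window_days):
    """Average read rate needed to deliver a dataset within window_days."""
    return size_tb * MB_PER_TB / (window_days * SECONDS_PER_DAY)


if __name__ == "__main__":
    # 2008 values: 33 million events, 1.58 TB AOD, 7.88 TB RECO
    print(f"AOD  CPUs: {cpus_for_pass(33e6, 3):.1f}")
    print(f"RECO CPUs: {cpus_for_pass(33e6, 7):.1f}")
    print(f"AOD  rate: {delivery_rate_mb_s(1.58, 3):.1f} MB/s")
    print(f"RECO rate: {delivery_rate_mb_s(7.88, 7):.1f} MB/s")
```

Multiplying the single-user values by 1000 gives the columns for 1000 users.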
Distributed Computing
• Computing requirements are nearly impossible to fulfill at one place (CERN)
• Distributed computing
  • Distribute world-wide into a tier structure:
    • Data storage
    • Re-reconstruction
    • Bulk user analysis
    • MC production
  • The 50 primary datasets can be distributed separately
  • Users will have access to all tiers using GRID tools
CMS Grid Tier Structure
Figure: T0 at the center, surrounded by the T1 centers (USA, Italy, France, Germany, Spain, UK, Taiwan), each with attached T2 centers
Tiers and Functionality

| tier | number | functions | data |
|------|--------|-----------|------|
| T0 | 1 (CERN) | accepts data from detector; archives data (full copy); performs first pass reconstruction | RAW (full copy), RECO (full copy), AOD (full copy) |
| T1 | 7 (FNAL, ...) | archives data (subset of primary datasets); re-reconstruction, calibration; skimming, analysis tasks | RAW (subset), RECO (subset), AOD (full copy), MC samples |
| T2 | 25 | analysis; calibration; MC simulation | skimmed datasets, AOD (subset), MC samples |

• T0 distributes RAW and reconstructed data to the T1s
  • Subset of the primary datasets, full AOD copy
• T2s are associated with a specific T1, which provides support and distributes data
  • Simulated MC is transferred back to the associated T1
Data flow
• Substantial computing resources are provided by the T1s and T2s
• CMS-CAF performs latency-critical activities like detector problem diagnostics, trigger performance service, derivation of calibration and alignment data
Figure: data flow to the 7 T1 and 25 T2 centers
US contribution to CMS Tier structure
• T1 at FNAL
• 7 attached T2 sites: Wisconsin, MIT, Nebraska, Purdue, Caltech, San Diego, Florida
Back to the user ...
• After the discussion of the complex computing structure:
  • How does the user in the end do analysis?
• Basic requirements:
  • Local account on workgroup cluster or private machine
  • CERN account (including CERN CMS registration)
  • GRID certificate
• Access to workgroup cluster or private machine with:
  • Installed CMS software environment for user code development
  • Access to local datasets (on hard disk or on local mass storage system)
  • Installed GRID tools for submission of user code to T1 and T2
Some remarks: User code
• Importance of the local working environment:
  • “Prevent errors at the source before finding out the hard way”
• Develop user code locally and compile it
• Test user code locally on small test samples
  • Bug discovery and fix
• Run a short test job on the sample (data or MC) you plan to use in your analysis (locally or via the GRID)
  • Test compatibility between your user code and the sample
  • Batch systems are prioritized to provide fair access for all users; a short test job avoids needlessly lowering your priority
• In the end, you want to produce “plots” to get your physics results. Your local environment will be the place where everything comes together.
Remarks
• “Because data are not yet available, the following is described using an MC analysis as an example.”
• “In the next talks, you will hear “everything” about the new framework CMSSW. As the framework is new and not yet complete, and MC samples exist only in tests, the computing tools have not been fully adapted either. In the following, the example MC analysis is described for the old framework; it shows the basic steps, which will not change dramatically for the new framework.”
CRAB
• Access to datasets for distributed analysis
• CRAB - CMS Remote Analysis Builder
• Provides CMS users with
  • A framework to run their analysis on datasets hosted by T1 and T2 centers
  • No detailed knowledge of GRID infrastructures necessary
• Uses GRID infrastructure
  • Authentication by GRID certificates and virtual organizations (VOs)
  • Job interaction (submission, status request, output retrieval) using GRID tools (middleware):
    • LHC Computing Grid (LCG) / European Data Grid (EDG): European GRID infrastructure
    • Open Science Grid (OSG): US GRID infrastructure
    • Common basic framework: Virtual Data Toolkit (VDT) including GLOBUS, CONDOR
Authentication by GRID certificate
• A GRID certificate is an “electronic fingerprint” based on public-key (asymmetric) cryptography
• Each resource of the GRID (including you as a user) has a key pair: a public and a private key (the private key is password protected)
  • Encryption and authorization use the public key
  • Decryption and digital signatures use the private key
• Generating a key pair is not sufficient: a Certificate Authority (CA) has to confirm your identity and sign your public key
• The certificate and the GLOBUS toolkit:
  • The certificate has to be available in $HOME/.globus
  • userkey.pem = private key, protected by a passphrase (password)
  • usercert.pem = certificate containing the public key
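Before trying any GRID tool, it can help to check that the two files named above are where they are expected. The following is a minimal sketch, not part of any GRID toolkit: the function name is hypothetical, and the permission check rests on the assumption that the private key should not be accessible to group or other users.

```python
# Sketch: sanity check of the certificate directory described on the slide.
import stat
from pathlib import Path


def check_globus_dir(globus_dir=None):
    """Return a list of problems found in the certificate directory."""
    directory = Path(globus_dir) if globus_dir else Path.home() / ".globus"
    problems = []
    for name in ("usercert.pem", "userkey.pem"):
        if not (directory / name).is_file():
            problems.append(f"missing {name}")
    key = directory / "userkey.pem"
    if key.is_file():
        mode = stat.S_IMODE(key.stat().st_mode)
        # Assumption: the private key must be private to its owner.
        if mode & (stat.S_IRWXG | stat.S_IRWXO):
            problems.append("userkey.pem is accessible by group or others")
    return problems


if __name__ == "__main__":
    for problem in check_globus_dir() or ["no problems found"]:
        print(problem)
```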
Virtual Organizations
• Access to the GRID is steered by assigning resources or fractions of resources to Virtual Organizations (VOs)
  • Your VO: CMS
  • Additional flag to tell that you are from USCMS: the USCMS role
• When you want to run a job on a GRID resource, you sign your job with your private key using your passphrase
• The resource checks your signature using your public key, checks which VO you are associated with, and runs your job according to the resource allocation of your VO
• Application for a GRID certificate is described at:
  • http://www.uscms.org/SoftwareComputing/Grid/GettingStarted/index.html
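The idea of signing with the private key and checking with the public key can be illustrated with a toy RSA example. This is only a sketch of the principle: the key is built from tiny textbook primes and is completely insecure, real GRID middleware uses certificates and standard cryptographic libraries, and all names below are made up for the illustration.

```python
# Toy illustration of "sign with the private key, check with the public key".
import hashlib

# Textbook-sized RSA key pair (p = 61, q = 53). Never use keys like this.
MODULUS = 61 * 53        # 3233, part of both keys
PUBLIC_EXPONENT = 17     # public key
PRIVATE_EXPONENT = 2753  # private key: 17 * 2753 = 1 (mod 3120)


def digest(message: bytes) -> int:
    """Hash the message and reduce it to the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % MODULUS


def sign(message: bytes) -> int:
    """Only the holder of the private key can produce this value."""
    return pow(digest(message), PRIVATE_EXPONENT, MODULUS)


def verify(message: bytes, signature: int) -> bool:
    """Anyone holding the public key can check the signature."""
    return pow(signature, PUBLIC_EXPONENT, MODULUS) == digest(message)


if __name__ == "__main__":
    job = b"run my analysis on dataset X"
    signature = sign(job)
    print("valid signature accepted:", verify(job, signature))
    print("altered signature accepted:", verify(job, (signature + 1) % MODULUS))
```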
CRAB - a short introduction
• CRAB splits user interaction into steps:
  • Creation of jobs
  • Submission of jobs
  • Status check of jobs
  • Retrieval of job output
• CRAB takes care of user code and user output:
  • Packing of user executable and libraries
  • Shipping of user code to GRID resource for execution
  • Retrieval of the user code output back to the submitter
Creation: data discovery
• User: request to analyze a dataset with user code
  1. Resolve the requested dataset into an identifier (RefDB)
  2. Inquire which centers publish the requested dataset
  3. Contact the centers (T1, T2) and inquire about the dataset locally (PubDB, local catalog)
• Jobs are created locally
  • On the user’s submission computer
  • Each job is able to run on all centers from the request list
• All the user has to know: which data samples can I use:
  • http://cmsdoc.cern.ch/cms/production/www/PubDB/GetPublishedCollectionInfoFromRefDB.mod.php
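The discovery and creation logic on this slide can be sketched in a few lines. Everything here is hypothetical: the dictionaries merely stand in for RefDB and the per-site PubDB catalogs, the dataset, identifier and site names are invented, and splitting by a fixed number of events per job is an assumption for illustration, not a description of how CRAB is implemented.

```python
# Sketch: resolve a dataset, find the sites publishing it, create jobs.
# Hypothetical stand-ins for RefDB and the local PubDB catalogs.
REFDB = {"example_mc_sample": "collection-0001"}
PUBDB = {
    "T1_A": {"collection-0001"},
    "T2_B": {"collection-0001"},
    "T2_C": set(),
}


def discover_sites(dataset):
    """Return the sites that publish the requested dataset."""
    collection_id = REFDB[dataset]
    return sorted(site for site, catalog in PUBDB.items()
                  if collection_id in catalog)


def create_jobs(dataset, total_events, events_per_job):
    """Create job descriptions locally; each job may run at any listed site."""
    sites = discover_sites(dataset)
    jobs = []
    for first_event in range(0, total_events, events_per_job):
        jobs.append({
            "dataset": dataset,
            "first_event": first_event,
            "n_events": min(events_per_job, total_events - first_event),
            "allowed_sites": sites,
        })
    return jobs


if __name__ == "__main__":
    for job in create_jobs("example_mc_sample", 1000, 300):
        print(job)
```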
Submission, Status inquiry and Output retrieval
• Resource Broker (RB)
  • Brokers jobs between the requested centers (T1, T2)
  • Provides input and output file handling
• User’s submitter
  • Provides created jobs to the RB (ships user code to the GRID)
  • Checks the status of jobs
  • Retrieves output (retrieves user code output)
CRAB usage
Figure: all CRAB jobs last year
Summary & Outlook
• Computing for CMS in the LHC era is a challenge of its own
• The tier structure will provide CMS users with access to data and MC samples independent of their location
• The user will have to know very little about the GRID and its tools
• Most important: Apply for your GRID certificate!
One more thing ...
• Mailing lists for support
  • helpdesk@fnal.gov
  • LPC-HOWTO@listserv.fnal.gov
  • cms-wm-crab-feedback@cern.ch
• Webpages:
  • User Computing at FNAL: http://www.uscms.org/SoftwareComputing/UserComputing/UserComputing.html
  • CRAB tutorial: http://www.uscms.org/SoftwareComputing/UserComputing/Tutorials/Crab.html
The end