1999 Summer Student Lectures
Computing at CERN
Lecture 2 — Looking at Data
Tony Cass — Tony.Cass@cern.ch
Data and Computation for Physics Analysis
[Data flow diagram: the detector feeds the event filter (selection & reconstruction), which writes raw data; event reconstruction turns raw data into event summary data; batch physics analysis produces analysis objects (extracted by physics topic); interactive physics analysis works on these analysis objects and other processed data; event simulation feeds simulated events into the same chain.]
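To make the flow in this diagram concrete, here is a minimal, purely illustrative Python sketch of the chain from raw data to analysis objects. All names (RawEvent, EventSummary, reconstruct, batch_analysis, the cut value) are invented for illustration and do not correspond to any real CERN software.

```python
# Illustrative sketch of the physics data flow (hypothetical names throughout).
from dataclasses import dataclass
from typing import Iterable, Iterator

@dataclass
class RawEvent:            # what the detector / event filter writes
    event_id: int
    adc_counts: list[int]  # raw detector readings

@dataclass
class EventSummary:        # "event summary data" produced by reconstruction
    event_id: int
    tracks: list[float]    # e.g. reconstructed track momenta

def reconstruct(raw: RawEvent) -> EventSummary:
    """Event reconstruction: turn detector information into physics information."""
    momenta = [counts / 100.0 for counts in raw.adc_counts]   # stand-in calibration
    return EventSummary(raw.event_id, momenta)

def batch_analysis(summaries: Iterable[EventSummary]) -> Iterator[EventSummary]:
    """Batch physics analysis: scan all events, keep those interesting for one topic."""
    for ev in summaries:
        if any(p > 10.0 for p in ev.tracks):   # toy selection cut
            yield ev

# Toy run: detector -> reconstruction -> batch analysis -> analysis objects.
raw_data = [RawEvent(i, [i * 37 % 2000, i * 91 % 2000]) for i in range(1000)]
analysis_objects = list(batch_analysis(map(reconstruct, raw_data)))
print(f"{len(analysis_objects)} events selected out of {len(raw_data)}")
```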
Central Data Recording
[Diagram detail: detector → event filter (selection & reconstruction) → raw data]
• CDR marks the boundary between the experiment and the central computing facilities.
• It is a loose boundary which depends on an experiment’s approach to data collection and analysis.
• CDR developments are also affected by
  • network developments, and
  • event complexity.
Monte Carlo Simulation
[Diagram detail: event simulation]
• From a physics standpoint, simulation is needed to study
  • detector response,
  • signal vs. background, and
  • sensitivity to physics parameter variations.
• From a computing standpoint, simulation
  • is CPU intensive, but
  • has low I/O requirements.
• Simulation farms are therefore good testbeds for new technology:
  • CSF for Unix and now PCSF for PCs and Windows/NT.
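As a rough illustration of why simulation is CPU-heavy but I/O-light, the sketch below generates many toy events, spends nearly all of its time in the random sampling loop, and writes only a few bytes of summary per event. It is a minimal sketch, not CERN's simulation code; every name and number in it is invented.

```python
# Toy Monte Carlo: lots of CPU per event, very little output per event.
import math
import random

def simulate_event(n_particles: int = 1000) -> float:
    """Spend CPU sampling particle energies; return one small summary number."""
    total_energy = 0.0
    for _ in range(n_particles):
        # Toy "detector response": exponential energy spectrum with Gaussian smearing.
        true_energy = random.expovariate(1.0)
        measured = random.gauss(true_energy, 0.1 * math.sqrt(true_energy + 1e-9))
        total_energy += measured
    return total_energy

with open("mc_summary.txt", "w") as out:          # tiny output file
    for event_id in range(10_000):                # CPU time dominates
        out.write(f"{event_id} {simulate_event():.3f}\n")
```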
Data Reconstruction
[Diagram detail: raw data → event reconstruction → event summary data]
• The event reconstruction stage turns detector information into physics information about events. This involves
  • complex processing, i.e. lots of CPU capacity,
  • reading all raw data, i.e. lots of input, possibly read from tape, and
  • writing processed events, i.e. lots of output which must be written to permanent storage.
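A minimal sketch of the I/O pattern described above: read every raw event sequentially (in real life possibly staged from tape), spend CPU reconstructing it, and write every processed event to permanent storage. The record layout, file names and "physics quantities" here are invented for illustration.

```python
# Sketch of the reconstruction I/O pattern: read all raw data, process, write all results.
import random
import struct

RECORD = struct.Struct("<i8d")   # toy raw-event record: event id + 8 detector words

def reconstruct(event_id: int, words: tuple[float, ...]) -> str:
    """Toy 'reconstruction': reduce raw detector words to a few physics quantities."""
    energy = sum(words)
    n_hits = sum(1 for w in words if w > 0.5)
    return f"{event_id} {energy:.3f} {n_hits}\n"

def reconstruct_run(raw_path: str, esd_path: str) -> None:
    """Read every raw event (lots of input), write every processed event (lots of output)."""
    with open(raw_path, "rb") as raw, open(esd_path, "w") as esd:
        while chunk := raw.read(RECORD.size):
            event_id, *words = RECORD.unpack(chunk)
            esd.write(reconstruct(event_id, tuple(words)))

# Create a small toy raw-data file, then reconstruct it.
with open("run001.raw", "wb") as f:
    for i in range(100):
        f.write(RECORD.pack(i, *(random.random() for _ in range(8))))
reconstruct_run("run001.raw", "run001.esd")
```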
Batch Physics Analysis
[Diagram detail: event summary data → batch physics analysis → analysis objects (extracted by physics topic)]
• Physics analysis teams scan over all events to find those that are interesting to them.
• Potentially enormous input: at least the data from the current year.
• CPU requirements are high.
• Output is “small”, O(10²) MB,
  • but there are many different teams and the output must be stored for future studies, so large disk pools are needed.
Symmetric MultiProcessor Model
[Diagram: the experiment, tape storage and terabytes of disks all attached to a single large SMP machine.]
Scalable model—SP2/CS2
[Diagram: the experiment, tape storage and terabytes of disks attached to a scalable SP2/CS2 system.]
Distributed Computing Model
[Diagram: the experiment and tape storage connected through a switch to CPU servers and disk servers.]
Today’s CORE Computing Systems (1998)
[Diagram of the CORE Physics Services on the CERN network:]
• SHIFT, data intensive services: 70 computers, 250 processors (DEC, H-P, IBM, SGI, SUN), 8 TeraBytes embedded disk.
• Simulation Facility: CSF - RISC servers (46 H-P PA-RISC); PCSF - PCs & NT (20 PentiumPro, 50 Pentium II).
• Central Data Services: Shared Tape Servers (3 tape robots, 100 tape drives: Redwood, DLT, Sony D1, IBM 3590, 3490, 3480, EXABYTE, DAT); Shared Disk Servers (2 TeraByte disk, 10 SGI, DEC, IBM servers); Home directories & registry (SUN & DEC servers).
• DECPLUS, HPPLUS, RSPLUS, WGS Interactive Services: 66 systems (HP, SUN, IBM, DEC).
• RSBATCH + PaRC Public Batch Service: 32 IBM, DEC, SUN servers.
• NAP - accelerator simulation service: 15-node IBM SP2, 36 PowerPC 604, 10-CPU DEC 8400, 10 DEC workstations.
• CS-2 Service - Data Recording & Event Filter Farm: QSW CS-2, 64 nodes (128 processors), 2 TeraBytes disk.
• Consoles & monitors.
Today’s CORE Computing Systems
[Diagram of the CORE Physics Services on the CERN network:]
• SHIFT, data intensive services: 200 computers, 550 processors (DEC, H-P, IBM, SGI, SUN, PC), 25 TeraBytes embedded disk.
• Simulation Facility: CSF - RISC servers (25 H-P PA-RISC); PCSF - PCs & NT (10 PentiumPro, 25 Pentium II).
• Central Data Services: Shared Tape Servers (4 tape robots, 90 tape drives: Redwood, 9840, DLT, IBM 3590, 3490, 3480, EXABYTE, DAT, Sony D1); Shared Disk Servers (2 TeraByte disk, 10 SGI, DEC, IBM servers); Home directories & registry.
• PC Farms - Data Recording, Event Filter and CPU Farms for NA45, NA48, COMPASS: 60 dual processor PCs.
• DXPLUS, HPPLUS, RSPLUS, LXPLUS, WGS Interactive Services: 70 systems (HP, SUN, IBM, DEC, Linux).
• RSBATCH Public Batch Service: 32 IBM, DEC, SUN servers.
• NAP - accelerator simulation service: 32 PowerPC 604, 10-CPU DEC 8400, 10 DEC workstations.
• PaRC Engineering Cluster: 13 DEC workstations, 3 IBM workstations.
• Consoles & monitors.
Interactive Physics Analysis
[Diagram detail: analysis objects (extracted by physics topic)]
• Interactive systems are needed to enable physicists to develop and test programs before running lengthy batch jobs.
• Physicists also
  • visualise event data and histograms,
  • prepare papers, and
  • send email.
• Most physicists use workstations—either private systems or central systems accessed via an X terminal or PC.
• We need an environment that provides access to specialist physics facilities as well as to general interactive services.
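Histogramming is the bread and butter of this interactive work. Below is a minimal sketch, using numpy and matplotlib as modern stand-ins (the lecture-era tool would have been PAW), of filling and drawing a histogram of a quantity taken from analysis objects. The toy data and all numbers are invented.

```python
# Minimal interactive-analysis sketch: histogram a reconstructed quantity.
import numpy as np
import matplotlib.pyplot as plt

# Toy "analysis objects": invariant masses (GeV) of selected track pairs.
rng = np.random.default_rng(42)
signal = rng.normal(loc=91.2, scale=2.5, size=2_000)        # toy resonance peak
background = rng.uniform(low=70.0, high=110.0, size=5_000)  # toy flat background
masses = np.concatenate([signal, background])

plt.hist(masses, bins=80, range=(70, 110), histtype="step")
plt.xlabel("invariant mass [GeV]")
plt.ylabel("entries / 0.5 GeV")
plt.title("Toy invariant mass distribution")
plt.show()
```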
Unix based Interactive Architecture
[Diagram: X terminals, private workstations and PCs connect over the CERN internal network to the PLUS clusters and work group server clusters (with X-terminal support); these rely on the CORE services: AFS home directory services, ASIS replicated AFS binary servers, reference environments, backup & archive, central services (mail, news, ccdb, etc.) and optimized access to a general staged data pool.]
PC based Interactive Architecture
Event Displays
• Event displays, such as this ALEPH display, help physicists to understand what is happening in a detector. [Standard X-Y view]
• Clever processing of events can also highlight certain features—such as in the V-plot views of ALEPH TPC data. [V-plot view]
• A Web based event display, WIRED, was developed for DELPHI and is now used elsewhere.
Data Analysis Work
• Most of the time, though, physicists will study event distributions rather than individual events.
• By selecting a dE/dx vs. p region on this scatter plot, a physicist can choose tracks created by a particular type of particle.
• RICH detectors provide better particle identification, however. This plot shows that the LHCb RICH detectors can distinguish pions from kaons efficiently over a wide momentum range.
• Using RICH information greatly improves the signal/noise ratio in invariant mass plots.
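To illustrate what "selecting a dE/dx vs. p region" means in practice, here is a minimal sketch that keeps only tracks falling inside a rectangular cut in the dE/dx-momentum plane. The cut values and the track sample are invented; a real analysis would cut around the expected dE/dx curve for the particle hypothesis rather than a simple box.

```python
# Toy selection of tracks in a dE/dx vs. momentum region (hypothetical cut values).
from dataclasses import dataclass
import random

@dataclass
class Track:
    p: float       # momentum [GeV]
    dedx: float    # specific ionisation, arbitrary units

def in_kaon_box(track: Track) -> bool:
    """Keep tracks inside a rectangular dE/dx vs. p window (toy 'kaon' region)."""
    return 0.5 < track.p < 2.0 and 1.3 < track.dedx < 2.0

# Toy sample: most tracks outside the box, a few inside.
tracks = [Track(p=random.uniform(0.2, 5.0), dedx=random.uniform(0.8, 2.5))
          for _ in range(10_000)]
kaon_candidates = [t for t in tracks if in_kaon_box(t)]
print(f"selected {len(kaon_candidates)} of {len(tracks)} tracks")
```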
CERN’s Network Connections
[Diagram of CERN’s external links: the SWITCH and RENATER national research networks, a 2 Mb/s mission oriented link to IN2P3, the TEN-155 Trans-European Network at 155 Mb/s (public link 39/155 Mb/s), a C&W (US) link at 12/20 Mb/s plus a test link, ATM test beds at 155 Mb/s, and commercial connections via C-IXP including WHO at 2 Mb/s; other link speeds shown range from 6 to 100 Mb/s.]
CERN’s Network Traffic, May - June 1999
[Diagram showing, for each external link (TEN-155, IN2P3, C&W (US), RENATER, SWITCH), the incoming and outgoing data rates against the link bandwidth; per-link rates range from 0.1 Mb/s to a few Mb/s (the largest figures being 4.5 Mb/s out and 3.7 Mb/s in) on bandwidths of 2 to 100 Mb/s. In total, CERN exchanges roughly 1 TB/month in each direction; 1 TB/month = 3.86 Mb/s, 1 Mb/s = 10 GB/day.]
Outgoing Traffic by Protocol, May 31st - June 6th 1999
[Bar chart: GigaBytes transferred (0-350) to Europe, the USA and elsewhere, broken down by protocol: ftp, www, X, afs, int, rfio, mail, news, other, and the total.]
Incoming Traffic by Protocol, May 31st - June 6th 1999
[Bar chart: GigaBytes transferred (0-350) from Europe, the USA and elsewhere, broken down by protocol: ftp, www, X, afs, int, rfio, mail, news, other, and the total.]
European & US Traffic Growth, Feb ’97 - Jun ’98 (1998 figures)
[Chart: traffic to the USA and to the EU over time, with the start of the TEN-34 connection marked.]
European & US Traffic Growth, Feb ’98 - Jun ’99
[Chart: traffic to the USA and to the EU over time.]
Traffic Growth, Jun 98 - May/Jun 99
[Four bar charts (Total, EU, Other, US) comparing outgoing and incoming traffic growth by protocol: ftp, www, X, afs, int, rfio, mail, news, other, and total.]
Round Trip Times and Packet Loss Rates
• Round trip times for packets to SLAC (1998 figures), reaching as much as 5 seconds! [This is measured with ping: a packet must arrive and be echoed back; if it is lost, it does not give a round trip time value.]
• Packet loss rates to/from the US on the CERN link. [But traffic to, e.g., SLAC passes over other links in the US, and these may also lose packets.]
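For readers who want to reproduce this kind of measurement, below is a minimal sketch that drives the system ping command from Python and derives the average round trip time and the packet loss fraction from the replies. It assumes a Unix-like ping with a -c count option; the host name is only an example, and ping output is not standardised, so the parsing here is illustrative rather than robust.

```python
# Rough RTT / packet-loss measurement by driving the system "ping" (Unix-like -c flag assumed).
import re
import subprocess

def measure(host: str, count: int = 20) -> tuple[float | None, float]:
    """Return (average RTT in ms, or None if all packets were lost; packet loss fraction)."""
    result = subprocess.run(["ping", "-c", str(count), host],
                            capture_output=True, text=True)
    # Each echo reply line usually contains "time=<ms>"; lost packets produce no such line.
    rtts = [float(m.group(1)) for m in re.finditer(r"time=([\d.]+)", result.stdout)]
    loss = 1.0 - len(rtts) / count
    avg_rtt = sum(rtts) / len(rtts) if rtts else None
    return avg_rtt, loss

avg, loss = measure("www.slac.stanford.edu")   # example host only
print(f"average RTT: {avg} ms, packet loss: {loss:.0%}")
```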
Looking at Data—Summary
• Physics experiments generate data!
  • and physicists need to simulate real data to model physics processes and to understand their detectors.
• Physics data must be processed, stored and manipulated.
• [Central] computing facilities for physicists must be designed to take into account the needs of the data processing stages,
  • from generation through reconstruction to analysis.
• Physicists also need to
  • communicate with outside laboratories and institutes, and to
  • have access to general interactive services.