ACADEMIC TRAINING B. Panzer – CERN/IT, F. Rademakers – CERN/EP, P. Vande Vyvre - CERN/EP Academic Training CERN
Outline • Day 1 (Pierre VANDE VYVRE) • Outline, main concepts • Requirements of LHC experiments • Data Challenges • Day 2 (Bernd PANZER) • Computing infrastructure • Technology trends • Day 3 (Pierre VANDE VYVRE) • Trigger and Data Acquisition • Day 4 (Fons RADEMAKERS) • Simulation, Reconstruction and Analysis • Day 5 (Bernd PANZER) • Computing Data Challenges • Physics Data Challenges • Evolution CERN Academic Training 12-16 May 2003
Trigger and Data Acquisition • Dataflow, Trigger and DAQ architectures • Trigger • Data transfer • Event building • Storage • Software framework • Simulation • Conclusion CERN Academic Training 12-16 May 2003
Online dataflow Detector → Digitizers → Front-end Pipeline/Buffer (decision from Trigger Level 0,1) → Readout Buffer (decision from Trigger Level 2) → Subevent Buffer → Event-Building Network → Event Buffer (decision from High-Level Trigger) → Storage Network → Transient Storage → Permanent Storage CERN Academic Training 12-16 May 2003
TRG/DAQ/HLT Overview of the ALICE TRG/DAQ/HLT architecture: the Central Trigger Processor (CTP, with a Rare/All selection) issues the L0, L1a and L2 decisions, distributed through the Local Trigger Units (LTU) and the TTC to the detector front-end read-out electronics (FE/FERO), which return busy (BSY) signals. Event fragments travel over the Detector Data Links (DDL) into the RORCs of the LDC/FEP machines and to the HLT farm; sub-events are load-balanced through the Event Building Network (EDM) to the GDCs, and event files are written via the Storage Network to the transient (TDS) and permanent (PDS) data storage. CERN Academic Training 12-16 May 2003
ALICE • Trigger levels: 4 • LV-1 rate: 500 Hz • Readout: 25 GB/s • Storage: 1250 MB/s CERN Academic Training 12-16 May 2003
ATLAS • Trigger levels: 3 • LV-1 rate: 100 kHz • Readout: 100 GB/s • Storage: 100 MB/s Trigger/DAQ overview: the 40 MHz calorimeter, muon trigger chamber and other detector data feed LVL1 (front-end pipelines, 2.5 µs latency, LVL1 accept = 75 kHz); the Read-Out Drivers (ROD) deliver 120 GB/s over the ROD-ROB/ROS connection to the Read-Out Buffers; the RoI Builder, L2 Supervisor, L2 network and L2 Processing Units use Region-of-Interest data (about 2% of the event) to reach a LVL2 accept of ~2 kHz in ~10 ms; the Dataflow Manager and Event Building network (~3+3 GB/s) feed the Sub-Farm Inputs, the Event Filter processors (latency of the order of seconds, EF accept ~0.2 kHz) and the Sub-Farm Outputs writing ~300 MB/s. CERN Academic Training 12-16 May 2003
CMS • Trigger levels: 2 • LV-1 rate: 100 kHz • Readout: 100 GB/s • Storage: 100 MB/s CERN Academic Training 12-16 May 2003
LHCb • Trigger levels: 3 • LV-1 rate: 1 MHz • Readout: 4 GB/s • Storage: 40 MB/s CERN Academic Training 12-16 May 2003
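The four sets of numbers above already fix the average event size after LV-1 and the overall reduction that the higher trigger levels must still provide. A minimal back-of-the-envelope sketch in Python, using only the rates quoted on these slides (the derived figures are indicative only):

```python
# Back-of-the-envelope check of the trigger/DAQ parameters quoted on the four
# experiment slides above. The derived quantities (average event size after
# LV-1, overall reduction to storage) are indicative only.

experiments = {
    # name: (LV-1 rate [Hz], readout [MB/s], storage [MB/s])
    "ALICE": (500,        25_000, 1250),
    "ATLAS": (100_000,   100_000,  100),
    "CMS":   (100_000,   100_000,  100),
    "LHCb":  (1_000_000,   4_000,   40),
}

for name, (lv1_rate, readout, storage) in experiments.items():
    event_size_mb = readout / lv1_rate     # average event size after LV-1
    reduction = readout / storage          # rejection still needed before storage
    print(f"{name:5s}: ~{event_size_mb:.3f} MB/event after LV-1, "
          f"reduction factor ~{reduction:.0f} to storage")
```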
Trigger • Multi-level trigger system • Reject background • Select the most interesting collisions • Reduce the total data volume CERN Academic Training 12-16 May 2003
Multi-level trigger • Multi-level trigger system to optimize • Rate and granularity • System speed and size • Technology required CERN Academic Training 12-16 May 2003
Trigger • Trigger Level 0 • Custom logic • Trigger Level 1 • Custom logic • Special architectures • Computing farm • Trigger Level 2 • Special architectures • Computing farm • High Level Trigger (HLT) • Computing farm • Custom logic is HEP-specific: home-made developments with custom building blocks, fast but rigid, programmable by “a few experts” • Computing farms are general-purpose: home-made software on commodity building blocks, slower but flexible, programmable by “all” CERN Academic Training 12-16 May 2003
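To see how the successive levels multiply up, here is a small sketch of the rate cascade, using the ATLAS chain quoted earlier (40 MHz collisions, 75 kHz LVL1, ~2 kHz LVL2, ~200 Hz Event Filter) purely as an illustration:

```python
# Rate reduction through a multi-level trigger, using the ATLAS chain quoted
# on the earlier slide (40 MHz -> 75 kHz -> ~2 kHz -> ~200 Hz) as an example.

levels = [
    ("Bunch crossings",     40_000_000.0),
    ("LVL1 accept",             75_000.0),
    ("LVL2 accept",              2_000.0),
    ("Event Filter accept",        200.0),
]

previous = None
for name, rate in levels:
    if previous is None:
        print(f"{name:22s} {rate:12,.0f} Hz")
    else:
        rejection = previous / rate
        print(f"{name:22s} {rate:12,.0f} Hz  (rejection ~1/{rejection:.0f})")
    previous = rate

total = levels[0][1] / levels[-1][1]
print(f"Overall reduction: ~1/{total:,.0f}")
```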
Trigger & Timing distribution • Transfer from TRG to electronics • One to many • Massive broadcast (100’s to 1000’s) • Optical, Digital • HEP-specific components • HEP developments CERN Academic Training 12-16 May 2003
Trigger & Timing distribution CERN Academic Training 12-16 May 2003
Detector & Readout Data Link (1) • Transfer from detector to DAQ • Point-to-point • Massive parallelism (100’s to 1000’s) • Interface detector/readout • Analog • HEP-specific components • Digital • HEP developments based on commodity components • Fibre Channel or Gigabit Ethernet • 2.1 or 2.5 Gb/s CERN Academic Training 12-16 May 2003
Detector & Readout Data Link (2) DDL interface block diagram: the protocol device (Altera APEX20KE FPGA) drives a TLK2501 serializer/de-serializer at 2.5 Gb/s through an optical transceiver over 850 nm 50/125 µm multi-mode fibre, delivering 250 MB/s on the serial side and (1-LP)×250 MB/s at the SPI; TX/RX clocks are derived from a crystal oscillator and PLLs. Legend: OT – optical transceiver, PM – power monitor circuit, OSC – crystal oscillator, SERDES – serializer/de-serializer, ID – identification memory (on I2C). CERN Academic Training 12-16 May 2003
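The 250 MB/s figure in the block diagram follows from the 2.5 Gb/s serial line rate once the 8b/10b encoding used by SERDES devices such as the TLK2501 is accounted for; a one-line check (protocol overhead beyond the encoding is ignored here):

```python
# Relation between serial line rate and usable payload bandwidth for an
# 8b/10b-encoded link such as the DDL (approximate; protocol overhead ignored).

line_rate_gbps = 2.5          # serial line rate in Gb/s
payload_bytes_per_s = line_rate_gbps * 1e9 * (8 / 10) / 8   # 8b/10b, 8 bits/byte
print(f"{payload_bytes_per_s / 1e6:.0f} MB/s usable payload")  # -> 250 MB/s
```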
Optical link source CERN Academic Training 12-16 May 2003
Link Adapters • Transfer from one or several links to the memory or I/O bus of the computer • Many-to-one • Massive parallelism (100’s to 1000’s) • Interface detector/readout • Physical interface realized by • Custom chip • IP core (VHDL code synthesized in an FPGA) CERN Academic Training 12-16 May 2003
PCI evolution • Initiative of Intel • Public from the start, “imposed” on industry • Industry de-facto standard for local I/O: PCI (PCI SIG) • 1992: origin, 32 bits, 33 MHz, 133 MB/s • 1993: V2.0, 32 bits • 1994: V2.1 • 1996: V2.2, 64 bits, 66 MHz, 512 MB/s • 1999: PCI-X 1.0, 64 bits, 133 MHz, 1 GB/s • 2002: PCI-X 2.0, 64 bits, 533 MHz effective, 4 GB/s CERN Academic Training 12-16 May 2003
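The quoted bandwidths are simply the bus width times the (effective) transfer rate; a small sketch reproducing them (the PCI-X 2.0 entry uses the effective transfer rate reached through double/quad data rate signalling):

```python
# Theoretical peak PCI bandwidth = (bus width in bytes) x (effective transfer rate).
# Clock values are rounded to integers, so the results differ slightly from the
# commonly quoted figures; for PCI-X 2.0 the value below is the effective
# transfer rate, not a base clock.

pci_generations = [
    # (name, width in bits, effective MT/s)
    ("PCI (1992)",  32,  33),
    ("PCI 2.2",     64,  66),
    ("PCI-X 1.0",   64, 133),
    ("PCI-X 2.0",   64, 533),
]

for name, width, mts in pci_generations:
    mb_per_s = width / 8 * mts
    print(f"{name:12s} {width} bit x {mts} MT/s = {mb_per_s:.0f} MB/s")
```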
Optical link destination & PCI adapter CERN Academic Training 12-16 May 2003
Link and adapter performance (1) • Example of the ALICE DDL and RORC • PCI 32-bit 33 MHz interface with a custom chip • No local memory; fast transfer to PC memory • DDL bandwidth saturates at 101 MB/s for block sizes above 5 kB • Event rate saturates at 35,000 events/s for block sizes below 5 kB • RORC handling overhead in the LDC: 28 µs CERN Academic Training 12-16 May 2003
Link and adapter performance (2) • PCI 32-bit 66 MHz with a commercial IP core • No local memory; fast transfer to PC memory • Reaches 200 MB/s for block sizes above 2 kB • Total PCI load: 92% • Data transfer PCI load: 83% CERN Academic Training 12-16 May 2003
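The saturation behaviour on these two slides is what a fixed per-event overhead in series with the link bandwidth predicts. A rough model, using the 28 µs overhead and the ~101 MB/s DDL bandwidth quoted above (the model is an approximation, not the measured curve):

```python
# Simple throughput model for a link with a fixed per-event handling overhead:
#   time per event = overhead + block_size / link_bandwidth
# Overhead and link bandwidth are taken from the DDL/RORC slide above; the
# model itself only approximates the measured behaviour.

OVERHEAD_S = 28e-6            # RORC handling overhead in the LDC (28 us)
LINK_BW = 101e6               # saturated DDL bandwidth, bytes/s

for block_kb in (0.5, 1, 2, 5, 10, 50):
    size = block_kb * 1000
    t_event = OVERHEAD_S + size / LINK_BW
    rate = 1.0 / t_event                     # events/s
    throughput = size * rate / 1e6           # MB/s
    print(f"{block_kb:5.1f} kB: {rate:8.0f} events/s, {throughput:6.1f} MB/s")
```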
Subevent & event buffer • Baseline: • Adopt commodity component (PC) • Develop fast dual-port memories • Key parameters: • Cost/performance • Performance: memory bandwidth CERN Academic Training 12-16 May 2003
PC Memory Bandwidth Measured with the STREAM benchmark v4.0 (www.cs.virginia.edu/stream) compiled with gcc 2.96-103; results shown for Pentium II & III machines, AMD-based machines and a Xeon machine. CERN Academic Training 12-16 May 2003
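For readers who want to reproduce such a measurement approximately, here is a very rough numpy analogue of the STREAM copy kernel; it is not the official STREAM benchmark used for the plot, only an illustration of the idea:

```python
# Very rough analogue of the STREAM "copy" kernel (a[:] = b) using numpy, to
# estimate sustained memory bandwidth on a host. This is NOT the official
# STREAM benchmark used for the measurements on this slide.

import time
import numpy as np

N = 20_000_000                      # 8-byte doubles, far larger than any cache
b = np.random.rand(N)
a = np.empty_like(b)

best = float("inf")
for _ in range(5):
    t0 = time.perf_counter()
    np.copyto(a, b)                 # one read stream (b) + one write stream (a)
    best = min(best, time.perf_counter() - t0)

bandwidth_gb_s = 2 * N * 8 / best / 1e9   # copy moves 2 * N * 8 bytes per pass
print(f"Approximate copy bandwidth: {bandwidth_gb_s:.1f} GB/s")
```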
Event Building Network (1) • Baseline: • Adopt broadly exploited standards: switched Ethernet (ALICE, ATLAS, CMS) • Adopt a performant commercial product: Myrinet (CMS) • Motivations for switched Ethernet: • Performance of currently available Gigabit Ethernet switches is already adequate for most DAQ systems at the LHC • Use of commodity items: network switches and interfaces • Easy (re)configuration and reallocation of resources • Same technology also used for DAQ services CERN Academic Training 12-16 May 2003
Event Building Network (2) ALICE event-building network layout: the detector read-out is concentrated by switches (TPC sector switches 1-18 for sectors 1-2 to 35-36, Pixel-Strips switch 19, Drift switch 20, Muon-PMD-TRG switch 21, TOF-HM-PHOS switch 22) feeding the LDCs, with indicative bandwidths of 200 MB/s, 600 MB/s (TRD) and 2500 MB/s (TPC) on the detector side; the GDCs (GDC 1-4) and the TDS (TDS 1-2, 60 MB/s streams) sit behind the event-building switch, with a 1250 MB/s data link to the computing center. CERN Academic Training 12-16 May 2003
Event Building Network (3) • Baseline: • Adopt broadly exploited standards: TCP/IP transport (ALICE event building) • Adopt an efficient protocol: raw packets (LHCb L1 trigger) • Motivations for TCP/IP: • Reliable and stable transport service: • Flow control handling • Lost packet handling • Congestion control • Can be verified during the ALICE Data Challenges • Industry mainstream: • Guaranteed support from present and future industrial providers: operating systems, switches, interfaces • Constant improvements CERN Academic Training 12-16 May 2003
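As an illustration of TCP-based event building, here is a toy sender/receiver pair shipping length-prefixed sub-event fragments over a socket; it is only a sketch of the idea, not the DATE event-building protocol:

```python
# Toy illustration of TCP-based event building: an "LDC"-like sender ships
# length-prefixed sub-event fragments to a "GDC"-like receiver over a socket.
# Only a sketch of the idea; this is not the DATE event-building protocol.

import socket
import struct
import threading

HOST, PORT = "127.0.0.1", 50007
HEADER = struct.Struct("!II")        # (event number, payload length)

def gdc_receiver(srv, n_events):
    """Accept one connection and collect n_events fragments."""
    conn, _ = srv.accept()
    with conn, srv:
        for _ in range(n_events):
            hdr = conn.recv(HEADER.size, socket.MSG_WAITALL)
            event_nb, length = HEADER.unpack(hdr)
            payload = conn.recv(length, socket.MSG_WAITALL)
            print(f"GDC: event {event_nb}, {len(payload)} bytes")

def ldc_sender(n_events, fragment_size=4096):
    """Connect and send n_events fixed-size dummy fragments."""
    with socket.create_connection((HOST, PORT)) as s:
        for event_nb in range(n_events):
            payload = bytes(fragment_size)            # dummy sub-event data
            s.sendall(HEADER.pack(event_nb, len(payload)) + payload)

if __name__ == "__main__":
    srv = socket.create_server((HOST, PORT))          # Python 3.8+
    receiver = threading.Thread(target=gdc_receiver, args=(srv, 3))
    receiver.start()
    ldc_sender(3)
    receiver.join()
```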
Ethernet NIC Performance • Fast Ethernet copper • Intel 82557, 82550, 82559 with the eepro100 driver, mostly on-board • around 11 MB/s, stable • 3Com 3C980*, 3C905 with the 3c59x driver, mostly on-board • around 11 MB/s, stable • Gigabit Ethernet • NetGear GA620 with the acenic driver • up to 78 MB/s • 3Com 3C996 with the bcm5700 or tg3 driver • up to 88 MB/s (150% of one CPU) • Intel Pro/1000* (82545EM) with the e1000 driver • up to 95 MB/s (56% -> 75% of one CPU) CERN Academic Training 12-16 May 2003
ADC IV: DATE Event Building (1) CERN Academic Training 12-16 May 2003
ADC IV: DATE Event Building (2) Event building, no recording • 5 days non-stop • 1750 MB/s sustained (goal was 1000 MB/s) CERN Academic Training 12-16 May 2003
Transient Data Storage • Transient data storage at Point 2 before archiving (migration to tape), if any, in the computing center • Several options being tested by the ALICE DAQ • Technologies • Disk attachment: • DAS: IDE (commodity), SCSI • NAS: disk server • SAN: Fibre Channel • RAID level • Key selection criteria: cost/performance & bandwidth/box CERN Academic Training 12-16 May 2003
Storage: file & record size (file cache active) CERN Academic Training 12-16 May 2003
Storage: file & record size (file cache inactive) CERN Academic Training 12-16 May 2003
Storage: effect of connectivity CERN Academic Training 12-16 May 2003
Storage: effect of SCSI RAID CERN Academic Training 12-16 May 2003
Transient Data Storage • Disk storage does not scale well • To achieve high bandwidth: 1 stream, 1 device, 1 controller, 1 bus • Under these conditions: • 15-20 MB/s with 7.5 kRPM IDE disks • 18-20 MB/s with 10 kRPM SCSI disks • To reach 1.25 GB/s with commodity solutions: • Footprint too big • Infrastructure cost too high • Investigate ways to obtain more compact performance • RAID (Redundant Array of Inexpensive Disks) • RAID 5, large caches, intelligent controllers • HP, 3 SCSI devices: 30 MB/s with 10 kRPM disks • HP, 6 SCSI devices: 40 MB/s with 10 kRPM disks • EMC, 7 FC devices: 50 MB/s with 10 kRPM disks (4U) • IBM, 5 FC devices: 70 MB/s with 15 kRPM disks • Dot Hill SANnet II: 90 MB/s with 15 kRPM disks (2U) CERN Academic Training 12-16 May 2003
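The footprint argument can be made quantitative with the per-stream and per-box rates quoted above; a small sketch (the grouping into "units" is only meant to illustrate the scale, not a proposed configuration):

```python
# How many independent streams or boxes are needed to sustain the ALICE goal
# of 1.25 GB/s to transient storage, using the rates quoted on this slide.
# The packing into "units" only illustrates the footprint argument.

import math

GOAL_MB_S = 1250

options = [
    # (description, MB/s per stream or per box, from the slide above)
    ("single 7.5 kRPM IDE disk",     17),   # midpoint of 15-20 MB/s
    ("single 10 kRPM SCSI disk",     19),   # midpoint of 18-20 MB/s
    ("HP RAID-5, 6 SCSI disks",      40),
    ("EMC RAID-5, 7 FC disks (4U)",  50),
    ("Dot Hill SANnet II (2U)",      90),
]

for desc, rate in options:
    n = math.ceil(GOAL_MB_S / rate)
    print(f"{desc:30s}: {n:3d} units needed for {GOAL_MB_S} MB/s")
```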
Permanent Data Storage (1) • Infinite storage • At very low cost • Must be hidden behind a Mass Storage System (MSS) • Critical area • Small market • Limited competition • Not (yet) commodity • Solution demonstrated since 2002 CERN Academic Training 12-16 May 2003
Permanent Data Storage (2) Tape drives: • STK 9940A: 10 MB/s, 60 GB/volume, SCSI • STK 9940B: 30 MB/s, 200 GB/volume, Fibre Channel • Tape library: several tape drives of both generations CERN Academic Training 12-16 May 2003
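A rough sizing of the permanent-storage stage for the ALICE goal of 1250 MB/s, using the two drive generations above (duty cycle, mount times and compression are ignored, so real numbers would be higher):

```python
# Rough sizing of the permanent-storage stage for 1250 MB/s, using the two
# STK drive generations quoted above. Duty cycle, mounting time and
# compression are ignored; real installations would need more.

import math

GOAL_MB_S = 1250
SECONDS_PER_DAY = 86_400

drives = {
    "STK 9940A": (10, 60),     # (MB/s, GB per volume)
    "STK 9940B": (30, 200),
}

for name, (rate, volume_gb) in drives.items():
    n_drives = math.ceil(GOAL_MB_S / rate)
    data_per_day_tb = GOAL_MB_S * SECONDS_PER_DAY / 1e6
    volumes_per_day = math.ceil(data_per_day_tb * 1000 / volume_gb)
    print(f"{name}: {n_drives} drives in parallel, "
          f"~{data_per_day_tb:.0f} TB/day = ~{volumes_per_day} volumes/day")
```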
Permanent Data Storage (3) CERN Academic Training 12-16 May 2003
DAQ Software Framework • DAQ software framework • Common interfaces for detector-dependent applications • Target the complete system from the start • ALICE DATE (Data Acquisition and Test Environment) • Complete ALICE DAQ software framework: • Data flow: detector readout, event building • System configuration and control (100’s of programs to start, stop, synchronize) • Performance monitoring • Evolving with requirements and technology • Key issues • Scalability • Very small configurations (1 PC), used in test beams • Verified at the scale of the final system (100’s of PCs) during the Data Challenges • Support and documentation CERN Academic Training 12-16 May 2003
Run Control (1) CERN Academic Training 12-16 May 2003
Run Control (2) State of one node CERN Academic Training 12-16 May 2003
Performance monitoring - AFFAIR AFFAIR fabric monitoring collects DATE performance data from the LDCs, the event-building switch and the GDCs, ROOT I/O performance from the disk servers and CASTOR performance from the tape servers; the data are stored in a round-robin database, in a ROOT database and in ROOT I/O files, and published as ROOT plots for the Web. CERN Academic Training 12-16 May 2003
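A round-robin database keeps a fixed amount of history by overwriting the oldest samples. A toy illustration of that idea (not the actual AFFAIR implementation):

```python
# Toy illustration of the "round-robin database" idea used for performance
# monitoring: a fixed-size circular buffer of (timestamp, value) samples in
# which the oldest entries are overwritten. Not the AFFAIR implementation.

import time
from collections import deque

class RoundRobinSeries:
    def __init__(self, capacity):
        self.samples = deque(maxlen=capacity)   # old samples drop off the left

    def add(self, value, timestamp=None):
        self.samples.append((timestamp or time.time(), value))

    def average(self):
        if not self.samples:
            return 0.0
        return sum(v for _, v in self.samples) / len(self.samples)

# Example: keep only the last 5 readings of a (fake) GDC throughput in MB/s.
gdc_throughput = RoundRobinSeries(capacity=5)
for reading in (110, 120, 95, 130, 125, 140, 150):
    gdc_throughput.add(reading)

print(len(gdc_throughput.samples), "samples kept,",
      f"average {gdc_throughput.average():.1f} MB/s")
```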
Control Hierarchy The ECS sits above the Detector Control System, the Trigger Control System and the DAQ Run Control; each is partitioned per detector (Pixel, Muon, TPC, ...), down to the HV and gas systems, the LTC, and the individual LDCs (LDC 1, LDC 2, ... LDC 216). CERN Academic Training 12-16 May 2003
Experiment Control System ECS functions • Configuration and booking • Synchronize subsystems • Operator console • Automated procedures • State machines • Command/status interface The ECS configuration spans the DCS, TRG and DAQ operators and their per-detector trees (Pixel, Muon, TPC, ...). CERN Academic Training 12-16 May 2003
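Run control and the ECS are built around state machines driven by commands and reporting status. A minimal sketch of such a machine (the states and transitions are illustrative, not the actual DATE/ECS state diagram):

```python
# Minimal run-control state machine, in the spirit of the command/status
# interaction described above. The states and transitions are illustrative
# only; they are not the actual DATE/ECS state diagram.

TRANSITIONS = {
    # (current state, command) -> next state
    ("IDLE",       "configure"): "CONFIGURED",
    ("CONFIGURED", "start"):     "RUNNING",
    ("RUNNING",    "stop"):      "CONFIGURED",
    ("CONFIGURED", "reset"):     "IDLE",
}

class RunControl:
    def __init__(self, name):
        self.name = name
        self.state = "IDLE"

    def command(self, cmd):
        key = (self.state, cmd)
        if key not in TRANSITIONS:
            print(f"{self.name}: command '{cmd}' refused in state {self.state}")
            return
        self.state = TRANSITIONS[key]
        print(f"{self.name}: {cmd} -> {self.state}")

# The ECS would drive many such machines (one per LDC, GDC, detector, ...).
node = RunControl("LDC 1")
for cmd in ("configure", "start", "configure", "stop", "reset"):
    node.command(cmd)
```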
Partition: Physics Run Detectors A and B run together in a single physics-run partition: the CTP and the LTCs distribute triggers through the TTC to the detector read-out (TTCrx), data flow over the DDL (SIU/DIU) into the RORCs of the LDC/FEPs, and through the Event Building Network to the GDCs and storage. CERN Academic Training 12-16 May 2003
2 Partitions: Physics Run & Standalone The same infrastructure operated as two independent partitions, a physics-run partition and a standalone partition, each with its own LTC, TTC, DDLs and LDC/FEPs in front of the Event Building Network, GDCs and storage. CERN Academic Training 12-16 May 2003
ADC IV: DATE Scalability test CERN Academic Training 12-16 May 2003
Simulation conditions/results Conditions: • 8000 Hz total trigger rate, with 1600 Hz each of CE, MB, EL and MU • EL and MU considered rare • 50% rejection at the LDC • HLT rejects 80% of EL • Realistic event sizes, distributions, buffer numbers and transfer rates Results (original count, count after past/future (P/F) protection, final count): huge and unacceptable decrease of the EL and MU triggers due to detector busy CERN Academic Training 12-16 May 2003
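A back-of-the-envelope version of the rate bookkeeping on this slide, applying only the two explicit reductions listed (50% rejection at the LDC, HLT rejecting 80% of EL); the detector busy and backpressure that actually cause the loss of the rare triggers are not modelled here:

```python
# Back-of-the-envelope version of the rate bookkeeping above. Only the two
# explicit reductions quoted on the slide are applied; the detector busy /
# backpressure responsible for the loss of the rare EL and MU triggers is
# NOT modelled in this sketch.

input_rates = {"CE": 1600, "MB": 1600, "EL": 1600, "MU": 1600}  # Hz
LDC_ACCEPT = 0.5            # 50% rejection at the LDC
HLT_EL_ACCEPT = 0.2         # HLT rejects 80% of EL

for cls, rate in input_rates.items():
    after_ldc = rate * LDC_ACCEPT
    after_hlt = after_ldc * (HLT_EL_ACCEPT if cls == "EL" else 1.0)
    print(f"{cls}: {rate} Hz -> {after_ldc:.0f} Hz after LDC "
          f"-> {after_hlt:.0f} Hz after HLT")
```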
The problem We would like to accept all the rare decays (electron EL and muon MU triggers). The detectors deliver ~50 GB/s (dominated by CE and MB, after P/F protection and the time to read events into the detector buffers), reduced to ~25 GB/s into the DAQ (limited by the DDL rates) and to ~1.25 GB/s into the PDS (after compression by a factor 0.5). The various sources of backpressure cause a huge reduction of the original rare-decay (EL and MU) triggers. CERN Academic Training 12-16 May 2003