
ATLAS Trigger & Data Acquisition system: architecture & status



Presentation Transcript


  1. ~25 minimum-bias events (>2k particles) every 25 ns; Higgs → 2e+2μ at O(1/hr). ATLAS Trigger & Data Acquisition system: architecture & status. Kostas KORDAS, INFN – Frascati. HEP2006, Ioannina, Greece, 13-17 Apr. 2006

  2. ATLAS Trigger & DAQ: architecture
[Architecture diagram: p-p collisions → Level 1 (Calo, MuTrCh, other detectors) → detector read-out (RODs → Read-Out Systems with Read-Out Buffers) → RoI Builder (ROIB) + LVL2 Supervisor (L2SV) → Level 2 Processors (L2P) on the LVL2 network (L2N) → Event Builder (DFM, EBN, SFIs) → Event Filter (EFN, EFPs) → Sub-Farm Outputs (SFOs)]
Key numbers: LVL1 sees the full 40 MHz interaction rate with a 2.5 μs latency and accepts ~100 kHz (160 GB/s into the read-out); LVL2 works on RoI data only (~2% of each event), takes ~10 ms per event and accepts ~3.5 kHz; the Event Builder input is ~3+6 GB/s; the Event Filter takes ~1 s per event and accepts ~0.2 kHz. With ~1.6 MB of full information per event, the output to storage is ~300 MB/s at ~200 Hz.
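A quick consistency check of the rates quoted on this slide (a worked illustration, not part of the presentation): the three trigger levels together reduce the 40 MHz interaction rate by roughly five orders of magnitude, and the final rate times the event size reproduces the quoted storage bandwidth.

\[
\frac{40\ \mathrm{MHz}}{100\ \mathrm{kHz}} \times \frac{100\ \mathrm{kHz}}{3.5\ \mathrm{kHz}} \times \frac{3.5\ \mathrm{kHz}}{200\ \mathrm{Hz}} = \frac{40\ \mathrm{MHz}}{200\ \mathrm{Hz}} = 2\times 10^{5},
\qquad
200\ \mathrm{Hz} \times 1.6\ \mathrm{MB} \approx 320\ \mathrm{MB/s} \approx 300\ \mathrm{MB/s}.
\]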

  3. From the detector into the Level-1 Trigger
[Diagram: ATLAS detector (22 m × 44 m, weight 7000 t), front-end pipelines, Level 1 running at the 40 MHz bunch-crossing rate with 2.5 μs latency]
• Interactions every 25 ns: in 25 ns, particles travel only 7.5 m.
• Cable lengths are ~100 m: in 25 ns, signals travel only ~5 m in cable.
• Total Level-1 latency = 2.5 μs (time of flight + cables + processing + distribution).
• For those 2.5 μs, all detector signals must be stored in electronic (front-end) pipelines.
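One immediate consequence of these numbers (implied by the slide, spelled out here for clarity): the depth of the front-end pipelines must cover the LVL1 latency expressed in bunch crossings.

\[
\text{pipeline depth} \;\geq\; \frac{\text{LVL1 latency}}{\text{bunch spacing}} \;=\; \frac{2.5\ \mu\mathrm{s}}{25\ \mathrm{ns}} \;=\; 100\ \text{bunch crossings}.
\]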

  4. Upon LVL1 accept: buffer data & get RoIs
[Diagram: L1 accept (100 kHz) → Read-Out Drivers (RODs) → Read-Out Links (S-LINK) → Read-Out Systems (ROSs) hosting the Read-Out Buffers (ROBs); in parallel, the LVL1 RoI information goes to the Region of Interest Builder (ROIB); total read-out throughput 160 GB/s at 100 kHz]
• On average, LVL1 finds ~2 Regions of Interest (in η-φ) per event (a sketch of an RoI record follows below).
• The data inside the RoIs is only a few % of the total Level-1 throughput.
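To make the RoI mechanism concrete, here is a minimal sketch of what an η-φ Region-of-Interest record could look like as it travels from LVL1 through the RoI Builder to LVL2. The field names, types and encodings are illustrative assumptions, not the actual ATLAS RoI data format.

```cpp
#include <cstdint>
#include <vector>

// Illustrative only: a minimal RoI descriptor as LVL1 might produce it.
// Real ATLAS RoI words are packed bit-fields defined by the LVL1/RoIB interfaces.
struct RoIDescriptor {
  float eta;          // pseudorapidity of the RoI centre
  float phi;          // azimuthal angle of the RoI centre
  uint8_t type;       // e.g. e/gamma, muon, tau/hadron, jet (assumed encoding)
  uint8_t threshold;  // LVL1 threshold that fired (assumed encoding)
};

// The RoI Builder collects the descriptors of one L1-accepted event into a
// single record; on average it contains ~2 RoIs (see the slide above).
struct RoIRecord {
  uint32_t l1EventId;               // extended LVL1 event identifier
  std::vector<RoIDescriptor> rois;  // typically ~2 entries per event
};
```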

  5. LVL2: work with the "interesting" ROSs/ROBs
[Diagram: ROIB + LVL2 Supervisor (L2SV) dispatch RoIs to the LVL2 Processing Units (L2PUs), which request RoI data (~2% of each event, ~3 GB/s) from the Read-Out Systems over the LVL2 network (L2N)]
• For each detector there is a simple correspondence: η-φ Region of Interest → ROB(s) (sketched below).
• LVL2 Processing Units: for each RoI, the list of ROBs holding the corresponding data from each detector is identified quickly.
• RoI-based Level-2 trigger: a much smaller read-out network, at the cost of higher control traffic.
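The statement that for each detector there is a simple correspondence "η-φ Region of Interest → ROB(s)" amounts to a per-detector lookup table. Below is a minimal sketch of such a table under an assumed 0.1 × 0.1 η-φ binning; the class and its granularity are hypothetical, not the real LVL2 region-selection code.

```cpp
#include <cmath>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical sketch: map a coarse (eta, phi) bin to the ROB IDs holding the
// corresponding detector data; one such table would exist per sub-detector.
class RegionToRobMap {
public:
  void add(int etaBin, int phiBin, uint32_t robId) {
    table_[{etaBin, phiBin}].push_back(robId);
  }

  // ROBs overlapping the RoI centre; real code would also handle the RoI extent.
  const std::vector<uint32_t>& robsFor(double eta, double phi) const {
    static const std::vector<uint32_t> empty;
    auto it = table_.find({toEtaBin(eta), toPhiBin(phi)});
    return it != table_.end() ? it->second : empty;
  }

private:
  static int toEtaBin(double eta) { return static_cast<int>(std::floor(eta / 0.1)); }
  static int toPhiBin(double phi) { return static_cast<int>(std::floor(phi / 0.1)); }
  std::map<std::pair<int, int>, std::vector<uint32_t>> table_;
};
```

Because each RoI maps to only a handful of ROBs, an L2PU requests just those buffers, which is why the RoI data amount to only ~2% of the event.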

  6. After LVL2: build full events
[Diagram: LVL2 accepts (~3.5 kHz) are passed to the Dataflow Manager (DFM); the Sub-Farm Inputs (SFIs) collect the event fragments from the Read-Out Systems over the Event Building network (EBN), ~3+6 GB/s in total]
• The Dataflow Manager assigns each accepted event to a Sub-Farm Input (SFI), which assembles the full event (see the sketch below).
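The Dataflow Manager's role on this slide is essentially to pick one SFI per LVL2-accepted event and to clear the event from the read-out buffers afterwards. A minimal round-robin sketch of that decision, with a hypothetical class name and without the real DFM's load tracking:

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch of the Dataflow Manager's core decision: choose a
// Sub-Farm Input (SFI) node for each LVL2-accepted event. The real DFM also
// tracks SFI load and sends "clear" commands to the ROSs once an event has
// been built (or rejected at LVL2).
class DataflowManagerSketch {
public:
  explicit DataflowManagerSketch(std::vector<uint32_t> sfiIds)
      : sfis_(std::move(sfiIds)) {}  // assumes a non-empty list of SFIs

  uint32_t assignSfi(uint32_t /*l1EventId*/) {
    uint32_t sfi = sfis_[next_];
    next_ = (next_ + 1) % sfis_.size();  // simple round-robin
    return sfi;
  }

private:
  std::vector<uint32_t> sfis_;
  std::size_t next_ = 0;
};
```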

  7. LVL3: the Event Filter deals with full event information
[Diagram: the Sub-Farm Inputs (SFIs) hold the fully built events; the Event Filter network (EFN) serves them to a farm of Event Filter Processors (EFPs), which take ~1 s per event and accept ~200 Hz]

  8. From the Event Filter to local (TDAQ) storage
[Diagram: the Event Filter Processors (EFPs) send accepted events (EF accept ~0.2 kHz) to the Sub-Farm Outputs (SFOs): ~200 Hz and ~300 MB/s into local storage]

  9. TDAQ, High Level Trigger & DataFlow
[Diagram: the same architecture as slide 2, with Level 2 and the Event Filter grouped as the High Level Trigger and the read-out/event-building chain grouped as DataFlow]
High Level Trigger (HLT):
• Algorithms developed offline (with the HLT in mind).
• HLT infrastructure (the TDAQ job) "steers" the order of algorithm execution.
• Alternating steps of "feature extraction" and "hypothesis testing" give fast rejection (minimum CPU); a sketch of such a steering loop follows below.
• Reconstruction in Regions of Interest → minimum processing time and network resources.
DataFlow:
• Buffers and serves data to the HLT.
• Acts according to the HLT result, but otherwise the HLT is a "black box" that gives answers.
• Software framework based on C++ code and the STL.
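The "steering" described above, alternating feature extraction and hypothesis testing so that events are rejected as early as possible, can be sketched as a simple loop over steps. The step and algorithm interfaces below are assumptions for illustration only, not the actual HLT steering API.

```cpp
#include <functional>
#include <string>
#include <vector>

// Illustrative sketch of RoI-seeded HLT steering: each step first extracts
// features inside the RoI, then tests a hypothesis; the first failed
// hypothesis rejects the candidate, so CPU time is spent only on candidates
// that keep looking interesting (early rejection -> minimum CPU).
struct RoI { double eta, phi; };

struct HltStep {
  std::string name;
  std::function<void(const RoI&)> extractFeatures;  // e.g. calo clustering, tracking
  std::function<bool(const RoI&)> testHypothesis;   // e.g. "compatible with an electron?"
};

bool processRoI(const RoI& roi, const std::vector<HltStep>& steps) {
  for (const auto& step : steps) {
    step.extractFeatures(roi);   // only data inside the RoI is touched
    if (!step.testHypothesis(roi)) {
      return false;              // reject early and skip the remaining steps
    }
  }
  return true;                   // all hypotheses satisfied -> accept the RoI
}
```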

  10. High Level Trigger & DataFlow: PCs running Linux
[Diagram: the same architecture annotated with farm sizes: ~150 Read-Out System nodes, ~500 LVL2 nodes, ~100 Event Builder nodes, ~1600 Event Filter nodes]
Plus infrastructure: control, communication, databases.

  11. Dealing with large-scale DAQ/HLT systems
Large Scale Test farm sizes by year: 2001: 100 nodes; 2002: 220; 2003: 220; 2004: 230; 4/2005: 300/800; 7/2005: 700 (tested components ranged from LVL2 and EF with algorithms to the integrated EB + LVL2 + EF system with its infrastructure).
• LXBATCH test bed at CERN: 5 weeks in June/July 2005, 100 to 700 dual-CPU nodes, with the farm size increased in steps.
• Goal: verify the functionality of the integrated DAQ/HLT software system at large scale and find problems not seen at small scale.
• Configurations were tested up to ~30% of the final ATLAS TDAQ.

  12. TDAQ at the ATLAS site
[Diagram of the TDAQ layout: UX15 (ATLAS detector, first-level trigger, Timing Trigger Control), USA15 (Read-Out Drivers on VME, Read-Out Subsystems on ~150 PCs, RoI Builder), and the SDX1 surface building with dual-CPU nodes (~500 LVL2 farm, ~100 Event Builder SFIs, ~1600 Event Filter, ~30 SFOs), connected by 1600 Read-Out Links, dedicated links and Gigabit Ethernet switches, with a pROS storing the LVL2 output and a DataFlow Manager issuing event data requests and delete commands. Data of events accepted by the first-level trigger is pushed at ≤100 kHz in 1600 fragments of ~1 kByte each; event data is pulled as partial events at ≤100 kHz and full events at ~3 kHz; accepted events (event rate ~200 Hz) go to local storage and on to the CERN computer centre]
"Pre-series" system: ~10% of the final TDAQ is in place.

  13. Read-Out Systems: 150 PCs with special cards
[Plot: sustainable LVL1 accept rate (kHz) vs. LVL2 accept rate (% of input) for (1) the "hottest" ROS from the paper model and (2) measurements on real ROS hardware, with the low- and high-luminosity operating regions marked]
• A ROS unit is implemented as a 3.4 GHz PC housing 4 custom PCI-X cards (ROBINs) and contains 12 Read-Out Buffers; 150 units are needed for ATLAS (~1600 ROBs). 12 ROSs are in place, more are arriving.
• Not all ROSs see the same rate of data requests; ROD→ROS re-mapping can reduce the requirements on the busiest ("hottest") ROS.
• The performance of the final ROS (PC + ROBIN) is above requirements.
• Note: we also have the ability to access individual ROBs if wanted/needed.
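A quick consistency check of the counts quoted above (spelled out here, not on the slide): 150 ROS units with 12 Read-Out Buffers each comfortably cover the ~1600 ROBs, and the 12 buffers per unit correspond to 3 per ROBIN card.

\[
150\ \text{ROS units} \times 12\ \text{ROBs/unit} = 1800 \;\gtrsim\; {\sim}1600\ \text{ROBs},
\qquad
\frac{12\ \text{ROBs/unit}}{4\ \text{ROBIN cards/unit}} = 3\ \text{ROBs per ROBIN card}.
\]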

  14. Event Building needs
Throughput requirements:
• 100 kHz LVL1 accept rate and a 3.5% LVL2 acceptance → 3.5 kHz event-building rate.
• 1.6 MB event size → 3.5 kHz × 1.6 MB = 5600 MB/s total input.
Network-limited (the CPUs are fast enough):
• Event building uses 60-70% of a Gbit network → ~70 MB/s into each Event Building node (SFI).
So we need:
• 5600 MB/s into the EB system / (70 MB/s per EB node) → ~80 SFIs for the full ATLAS.
• When an SFI also serves the EF, its throughput decreases by ~20% → we actually need 80/0.80 = 100 SFIs.
Status: 6 prototypes are in place and PCs are being evaluated now; we expect big event-building needs from day 1: >50 PCs by the end of the year.
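The sizing argument of this slide in one line (the same numbers, collected into a single expression):

\[
3.5\ \mathrm{kHz} \times 1.6\ \mathrm{MB} = 5600\ \mathrm{MB/s},
\qquad
\frac{5600\ \mathrm{MB/s}}{70\ \mathrm{MB/s\ per\ SFI}} = 80\ \mathrm{SFIs},
\qquad
\frac{80}{0.80} = 100\ \mathrm{SFIs}.
\]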

  15. Tests of LVL2 algorithms & RoI collection
[Test setup: 1 L2SV, 1 DFM, 1 L2PU, 1 pROS and 8 emulated ROSs, plus 1 Online Server and 1 MySQL database server]
Di-jet, μ and e simulated events were preloaded on the ROSs, with the RoI information on the L2SV (the electron sample is pre-selected). Results:
1) The majority of events is rejected fast.
2) Processing takes nearly all of the latency: the RoI data-collection time is small.
3) The RoI data request per event is small.
Note: neither the trigger menu nor the data files are a representative mix of ATLAS; that is the aim for a late-2006 milestone.

  16. Scalability of the LVL2 system
• The L2SV gets the RoI information from the RoIB, assigns an L2PU to work on the event, and load-balances its L2PU sub-farm.
• Can this scheme cope with the LVL1 rate? Test: RoI information preloaded into the RoIB triggers the TDAQ chain, emulating LVL1.
• The LVL2 system is able to sustain the LVL1 input rate:
  - 1 L2SV system for a LVL1 rate of ~35 kHz,
  - 2 L2SV system for a LVL1 rate of ~70 kHz (50%-50% sharing).
• The rate per L2SV is stable to within 1.5%. ATLAS will have a handful of L2SVs → it can easily manage the 100 kHz LVL1 rate.
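The two measured points (≈35 kHz with one L2SV, ≈70 kHz with two) suggest a linear extrapolation; under that assumption (an illustration, not shown on the slide), three supervisors would already exceed the 100 kHz LVL1 design rate:

\[
R(n\ \mathrm{L2SVs}) \approx n \times 35\ \mathrm{kHz}
\;\;\Rightarrow\;\;
R(3) \approx 105\ \mathrm{kHz} > 100\ \mathrm{kHz}.
\]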

  17. CPU power for the HLT: L2PU performance
Test with a dual-core, dual-CPU AMD machine at 1.8 GHz with 4 GB of memory in total; the ROS was preloaded with muon events and the muFast algorithm was run at LVL2.
At the TDR we assumed:
• 100 kHz LVL1 accept rate,
• 500 dual-CPU PCs for LVL2,
• 8 GHz per CPU at LVL2.
So each L2PU does 100 Hz, i.e. a 10 ms average latency per event in each L2PU.
8 GHz CPUs will not come, but dual-core dual-CPU PCs show the required scaling: we should reach the necessary performance per PC at the cost of higher memory needs and latency (a shared-memory model would be better here).
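The TDR assumptions translate into the per-processor budget quoted above as follows:

\[
\frac{100\ \mathrm{kHz\ LVL1\ accept}}{500\ \mathrm{PCs} \times 2\ \mathrm{CPUs}} = 100\ \mathrm{Hz\ per\ CPU}
\;\;\Longleftrightarrow\;\;
\langle t_{\mathrm{LVL2}} \rangle = \frac{1}{100\ \mathrm{Hz}} = 10\ \mathrm{ms\ per\ event}.
\]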

  18. Cosmics in ATLAS, in the pit
• Last September: cosmics in the Tile hadronic calorimeter, read out via the pre-series system (monitoring algorithms).
• This July: cosmic run with the LAr EM + Tile hadronic calorimeters (+ muon detectors?).

  19. Summary
ATLAS TDAQ:
• A 3-level trigger hierarchy.
• Regions of Interest from the previous level are reused: small data movement.
• Feature extraction + hypothesis testing: fast rejection → minimum CPU power.
• The architecture has been validated via deployment on testbeds:
  - Large Scale Tests to test/improve infrastructure performance,
  - the pre-series system (~10% of the final TDAQ) in use.
• Dataflow performs according to specs and the modeling matches the measurements → we are confident in the extrapolations to the full ATLAS TDAQ.
High Level Trigger:
• Region-of-Interest collection at LVL2: low cost in time and data transfer.
• The LVL2 system is proven to scale: it will meet the LVL1 rate.
• Event Filter: the data throughput scales with farm size.
• Multi-core, multi-CPU PCs should be able to provide the equivalent of our original "dual CPU, 8 GHz each" PC requirement.
TDAQ will be ready in time for LHC data taking:
• We are in the installation phase of the system.
• A cosmic run with the central calorimeters (+ muon system?) takes place this summer.

  20. Thank you

  21. EF performance scales with farm size
[Plots: event rate vs. EF farm size for a dummy algorithm (always accept, with a fixed delay) and for the real muon algorithms; event size 1 MB]
• Test of the e/γ and μ selection algorithms; the HLT algorithms are seeded by the L2Result.
• Simulated e and μ events were pre-loaded on one SFI emulator serving the EF farm.
• Results shown here are for muons: initially CPU-limited, but eventually bandwidth-limited.
• Running the muon algorithms, the rate scales with the EF farm size (still CPU-limited with 9 nodes).
• The previous Event Filter I/O protocol limited the rate for small event sizes (e.g. partially built events); this has been changed in the current TDAQ software release.
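A simple way to read the "initially CPU-limited, eventually bandwidth-limited" behaviour is the schematic rate model below (an illustration, not a formula from the presentation):

\[
R_{\mathrm{EF}} \;\approx\; \min\!\left( \frac{N_{\mathrm{EF\ nodes}}}{\langle t_{\mathrm{proc}} \rangle},\;
\frac{B_{\mathrm{SFI \to EF}}}{\text{event size}} \right).
\]

With 1 MB events and the ~70 MB/s per SFI figure from slide 14, the second term saturates at roughly 70 Hz per SFI emulator, which corresponds to the bandwidth-limited regime seen with the dummy algorithm.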

  22. ATLAS Trigger & DAQ: philosophy
[Diagram: Muon/Calo/Inner detectors at 40 MHz → pipeline memories → LVL1 (latency 2.5 μs, output ~100 kHz) → Read-Out Drivers (RODs) → Read-Out Subsystems hosting the Read-Out Buffers (ROBs) → RoI-based LVL2 (latency ~10 ms, output ~3 kHz) → Event Builder cluster → Event Filter farm (latency ~1 s, output ~200 Hz) → local storage at ~300 MB/s]
• LVL1: hardware-based (FPGA, ASIC); calorimeter/muon information at coarse granularity.
• LVL2: software with specialised algorithms; uses the LVL1 Regions of Interest; all sub-detectors at full granularity; emphasis on early rejection.
• Event Filter: offline algorithms, seeded by the LVL2 result; works with the full event; full calibration/alignment information.
LVL2 and the Event Filter together form the High Level Trigger.

  23. Data Flow and Message Passing

  24. Data Collection application example: the Event Builder (SFI)
[Diagram: internal structure of the SFI (Event Builder) application, built from activities: Event Input, Request, Assignment, Event Assembly, Event Handler and Event Sampler. Event fragments and data requests are exchanged with the ROS & pROS, assignments with the Data Flow Manager, built events are served to the Trigger (Event Filter), and sampled events go to Event Monitoring]
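To make the "Event Assembly" activity concrete, here is a minimal sketch of fragment collection inside an SFI; the class, its method names and the indexing by ROS are illustrative assumptions, not the actual SFI implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Illustrative sketch of the SFI's Event Assembly activity: collect one
// fragment per ROS for a given event and report when the event is complete,
// so that an Event Handler activity can serve it to the Event Filter.
class EventAssemblySketch {
public:
  explicit EventAssemblySketch(std::size_t nExpectedFragments)
      : nExpected_(nExpectedFragments) {}

  // Store a fragment; returns true once all expected fragments have arrived.
  bool addFragment(uint32_t l1Id, uint32_t rosId, std::vector<uint8_t> data) {
    auto& frags = events_[l1Id];
    frags[rosId] = std::move(data);     // keep the fragment, keyed by ROS id
    return frags.size() == nExpected_;  // complete when every ROS has answered
  }

  // Hand the fully built event over (e.g. to the Event Handler) and forget it.
  std::map<uint32_t, std::vector<uint8_t>> takeEvent(uint32_t l1Id) {
    auto full = std::move(events_[l1Id]);
    events_.erase(l1Id);
    return full;
  }

private:
  std::size_t nExpected_;
  // event id -> (ROS id -> fragment payload)
  std::map<uint32_t, std::map<uint32_t, std::vector<uint8_t>>> events_;
};
```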
