The ATLAS Event Builder: Design and Performance during the integration and commissioning of the final 1/3 system
Kostas Kordas, LHEP, University of Bern
XXV Workshop on Recent Developments in High Energy Physics & Cosmology, EESFUE, NTU Athens, 30 Mar. 2007
ATLAS Trigger & DAQ: the need
The total p-p cross section at 14 TeV is ~100 mb, while the "interesting" physics lies at ~1 nb to ~1 pb: a ratio of 1:10^8 to 1:10^11.
[Plot: p-p cross sections vs. centre-of-mass energy, marking the Tevatron and the LHC]
ATLAS Trigger & DAQ: the job
The Trigger and Data AcQuisition system selects and stores the "most interesting" events for offline analysis, using 3 trigger levels.
• p-p bunch crossing rate: 40 MHz (~1 GHz of p-p interactions)
• Full event info: ~1.6 MB, i.e. ~60 TB/s at the 40 MHz crossing rate
• Data storage: ~200 Hz (300 MB/s)
[Diagram: detector in the UX15 cavern, read-out electronics in USA15, TDAQ farms in the SDX1 surface building 100 m above]
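As a quick cross-check of the quoted bandwidths (worked arithmetic, not on the original slide):

\[
40\,\mathrm{MHz}\times 1.6\,\mathrm{MB} = 6.4\times 10^{13}\,\mathrm{B/s}\approx 60\,\mathrm{TB/s},
\qquad
200\,\mathrm{Hz}\times 1.6\,\mathrm{MB} = 320\,\mathrm{MB/s}\approx 300\,\mathrm{MB/s}.
\]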
ATLAS Trigger & DAQ: LVL1
The first-level trigger selects events in custom hardware using muon and calorimeter information; the Timing, Trigger and Control (TTC) system distributes its decision to the detector Read-Out Drivers (RODs).
[Diagram: ATLAS detector and RODs in UX15; dedicated links carry the data of first-level-accepted events to the VME Read-Out Subsystems (ROSs, ~150 PCs) in USA15; data storage sits in SDX1]
ATLAS Trigger & DAQ: ReadOut Systems
• Event data is pushed at ≤ 100 kHz, as 1600 fragments of ~1 kB each
• 1600 point-to-point Read-Out Links (< 160 MB/s each) feed the ROSs
• The ROSs receive ~160 GB/s in total
[Diagram: as before, with the Read-Out Links between the RODs (UX15) and the ROSs (USA15) highlighted]
ATLAS Trigger & DAQ: ReadOut Buffers
Each ROS PC hosts 4 ROBIN cards, with 3 Read-Out Buffers per ROBIN card; the buffers absorb the input from the detector Read-Out Drivers, pushed at ≤ 100 kHz as 1600 fragments of ~1 kB each.
[Diagram: as before, zooming into the ROS PCs and their ROBIN cards]
ATLAS Trigger & DAQ: RoIs
On average, LVL1 finds ~2 Regions of Interest (in η-φ) per event; the RoI Builder collects them for the second-level trigger.
[Diagram: as before, with the RoI Builder added between the first-level trigger and the SDX1 farms]
ATLAS Trigger & DAQ: LVL2
• The LVL2 farm pulls ~3 GB/s from the Read-Out Buffers: ~2% of the event data (the RoIs), pulled at ≤ 100 kHz
• The LVL2 Supervisor receives the Regions of Interest from the RoI Builder and assigns events to the farm
• A pseudo-ROS (pROS) stores the LVL2 result
[Diagram: LVL2 farm, LVL2 Supervisor and network switches in SDX1, connected to the ROSs in USA15]
ATLAS Trigger & DAQ: Event Builder
• The Event Builder farm (SFIs) pulls full event data from the Read-Out Buffers at ~3.5 kHz, i.e. ~6 GB/s
• The DataFlow Manager (DFM) steers the event building
[Diagram: Event Builder farm and DFM added next to the second-level trigger in SDX1]
ATLAS Trigger & DAQ: Event Filter ("LVL3")
• The Event Filter (EF) farm is the third trigger level; it receives fully built events from the Event Builder SubFarm Inputs (SFIs) over Gigabit Ethernet
• Event data pulled: partial events @ ≤ 100 kHz (LVL2), full events @ ~3.5 kHz (EB); delete commands clear the ROSs for finished events
[Diagram: EF farm behind a second layer of network switches in SDX1]
ATLAS Trigger & DAQ: Data Storage
Events accepted by the Event Filter are written to Local Storage (the SubFarm Outputs, SFOs) at an event rate of ~200 Hz.
[Diagram: SFOs added at the end of the chain in SDX1]
ATLAS Trigger & DAQ: need ~2500 PCs
Multi-CPU nodes: ~500 in the LVL2 farm, ~100 Event Builder SubFarm Inputs (SFIs), ~1600 in the Event Filter (EF), 5 Local Storage SubFarm Outputs (SFOs), plus ~150 ROS PCs.
Event Building, LVL2, Event Filter, online and monitoring infrastructure: ~100 racks on 2 floors of SDX1; the stored data then go to the CERN computer centre.
Focus: Event Building (EB) in the ATLAS site
[Diagram: Read-Out Subsystems (ROSs) feeding network switches; above them the Event Builder nodes (SFIs) and the Data Flow Manager. Full event data is pulled at ~3.5 kHz over Gigabit Ethernet: event data requests go down, requested event data comes back, delete commands clear the ROSs.]
Event Building functionality (1)
DFM (Data Flow Manager):
• Triggers Event Building using either an internal trigger or L2Decision messages received via Gbit LAN
• Load-balances the Event Building farm
• Sends "Clear" messages to the ROSs for events rejected by LVL2 or completed by the Event Builder
Event Building functionality (2)
SFI: the Event Builder node
• A multi-threaded C++ application running on Linux
• Assembles full events and serves them to the Event Filter or to local disk
• In parallel, serves events to monitoring tasks in spying mode (e.g., for an Event Display)
DFM: the Data Flow Manager
• Triggers Event Building using either an internal trigger or L2Decision messages via Gbit LAN
• Load-balances the Event Building farm (a sketch of this assignment logic follows below)
• Sends "Clear" messages to the ROSs for events rejected by LVL2 or completed by the Event Builder
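To make the DFM's role concrete, here is a minimal sketch of its assignment and clear logic, under assumed, hypothetical names (not the actual TDAQ code): assign each accepted event to the least-loaded SFI, and batch "Clear" messages to the ROSs for rejected or completed events.

```cpp
#include <cstdint>
#include <queue>
#include <vector>

struct Sfi { int id = 0; unsigned eventsInFlight = 0; };

class DataFlowManager {
public:
    explicit DataFlowManager(int nSfis) : sfis_(nSfis) {
        for (int i = 0; i < nSfis; ++i) sfis_[i].id = i;
    }

    // Called on an internal trigger or an L2Decision "accept".
    int assign(uint32_t l1id) {
        // Load balancing: pick the SFI with the fewest events in flight.
        Sfi* best = &sfis_[0];
        for (auto& s : sfis_)
            if (s.eventsInFlight < best->eventsInFlight) best = &s;
        ++best->eventsInFlight;
        return best->id;   // send an EventAssignment(l1id) to this SFI
    }

    // Called on an L2 "reject" (sfiId < 0) or an End-Of-Event from an SFI.
    void release(uint32_t l1id, int sfiId) {
        if (sfiId >= 0) --sfis_[sfiId].eventsInFlight;
        clears_.push(l1id);
        if (clears_.size() >= kClearBatch) flushClears();
    }

private:
    static constexpr size_t kClearBatch = 100;
    void flushClears() {   // broadcast the batched Clear(l1ids) to all ROSs
        while (!clears_.empty()) clears_.pop();
    }
    std::vector<Sfi> sfis_;
    std::queue<uint32_t> clears_;
};
```

Batching the clears matters because they go over UDP (see the protocol slide further on): one lost batch is recovered by garbage collection in the ROS rather than by retransmission.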
Event Building needs: bandwidth decides
Throughput requirements:
• LVL2 accept rate into the EB: ~3.5 kHz; event size: 1.6 MB
• 5.6 GB/s total input over the Gbit links from the ROSs, through the network switches, to the Event Builder (SFIs)
Event Building needs: bandwidth decides
Network limited (the CPUs are fast enough):
• We want event building to use 60-70% of the Gbit network, i.e. ~70 MB/s into each Event Building node (SFI)
• With 5600 MB/s of total input, we need ~100 SFIs for the full ATLAS (worked check below)
• 32 Event Building nodes (SFIs) are installed: 1/3 of the final EB is in place (we will get another ~20 SFIs by the end of the year, covering 50% of the final EB needs)
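The SFI count follows directly from these two numbers (worked arithmetic, not on the original slide):

\[
3.5\,\mathrm{kHz}\times 1.6\,\mathrm{MB} = 5.6\,\mathrm{GB/s},
\qquad
\frac{5600\,\mathrm{MB/s}}{\sim 70\,\mathrm{MB/s\ per\ SFI}} \approx 80\ \mathrm{SFIs}
\;\;(\sim 100\ \text{with headroom}).
\]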
First 1/3 of the EB system installed
• 32 SFIs, with an extra NIC installed in each SFI
• 12 DFMs, one per sub-detector zone, to be able to run 12 partitions in parallel
EB Commissioning: 14-16 Feb. '07 Tech. Run
Final network, final components:
• ~60 ROSs, producing fake events either at the ROS PC level or at the ROBIN card level
• 32 SFIs, writing to local disk; ~10 of them had their data files pulled to the CERN storage system (CASTOR)
• 5 DFMs; exercised triggering without a LVL2 system, and running parallel partitions of ATLAS
EB Commissioning: Output functionality
SFI writing to disk:
• Limited speed when writing to a single disk: ~30 MB/s per SFI
• When the disk reaches 90% full, the SFI goes busy; it restarts building events when the disk is again below 90% full (a sketch of this back-pressure follows below)
• Files from the SFIs are pulled to CASTOR; currently the line from the SFIs to CASTOR is 100 Mbit/s
Events to monitoring tasks:
• Event Dump to check event content, Event Display, etc.
• Typically the SFI limits the rate to 1-out-of-100 events, and to not more than 20 Hz in any case
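A minimal sketch of the disk-busy back-pressure, assuming hypothetical names (the real SFI code differs): stop accepting assignments when the output disk is ~90% full, resume once it drops back below the threshold.

```cpp
#include <sys/statvfs.h>

// Fraction of the filesystem in use, 0.0 - 1.0.
double diskUsage(const char* path) {
    struct statvfs vfs{};
    if (statvfs(path, &vfs) != 0) return 1.0;   // be safe: report "full"
    return 1.0 - double(vfs.f_bavail) / double(vfs.f_blocks);
}

bool sfiBusy = false;                    // hypothetical busy flag seen by DFM
constexpr double kBusyThreshold = 0.90;  // the 90% level from the slide

// Called periodically by the SFI's output activity.
void updateBusy(const char* outputDir) {
    double use = diskUsage(outputDir);
    if (!sfiBusy && use >= kBusyThreshold) sfiBusy = true;    // tell DFM: busy
    else if (sfiBusy && use < kBusyThreshold) sfiBusy = false; // resume building
}
```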
EB Commissioning: Building functionality
Network protocols for application communication; default scenario: mixed protocols (UDP & TCP):
• SFI-ROS: UDP (the SFI re-asks for missing fragments as needed; see the sketch below)
• DFM-ROS: UDP (garbage collection in the ROS copes with many lost "Clear" messages)
• DFM-SFI: TCP (assignment and End-Of-Event messages)
Used assorted ROS fragment sizes (0.6-120 kB) to emulate data from the detectors, with an event size of 1.5 MB:
• Had to use TCP for the large fragments, and it works fine (UDP does not work with fragments > 64 kB, the datagram size limit)
• The SFIs were perfectly balanced in Event Building rate
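A minimal sketch of the "re-ask on UDP" idea, with a hypothetical message layout (the real DataCollection messages differ): UDP may drop packets, so the requester simply asks again after a timeout.

```cpp
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <cstdint>

struct DataRequest { uint32_t l1id; uint32_t rosId; };

// Returns bytes received, or -1 once all retries have failed.
ssize_t requestFragment(int sock, const sockaddr_in& ros,
                        uint32_t l1id, uint32_t rosId,
                        char* buf, size_t bufLen, int maxRetries = 3) {
    DataRequest req{l1id, rosId};
    timeval tmo{0, 100000};                          // 100 ms per attempt
    setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tmo, sizeof(tmo));
    for (int attempt = 0; attempt < maxRetries; ++attempt) {
        sendto(sock, &req, sizeof(req), 0,
               reinterpret_cast<const sockaddr*>(&ros), sizeof(ros));
        ssize_t n = recv(sock, buf, bufLen, 0);
        if (n > 0) return n;                         // fragment arrived
        // timeout: re-ask for the missing fragment
    }
    return -1;                                       // give up; flag the event
}
```

This is also why fragments above ~64 kB force TCP: a single UDP datagram cannot carry them, while TCP streams them transparently.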
EB commissioning: Controlling the EB rate
Setup: 42 ROSs (453 kB event size), 29 SFIs, mixed network protocols.
SFI.TrafficShaping controls the EB rate; the parameter to tune is the number of outstanding requests for ROS fragments. A low number of outstanding requests under-uses the network (a sketch of such a credit scheme follows below).
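A minimal sketch of what such traffic shaping amounts to, under assumed names (not the actual SFI code): a credit pool capping the number of outstanding ROS fragment requests, so one SFI neither floods nor under-uses its Gbit link.

```cpp
#include <condition_variable>
#include <mutex>

class TrafficShaper {
public:
    explicit TrafficShaper(unsigned credits) : credits_(credits) {}

    // Take a credit before sending a fragment request; blocks while the
    // configured number of requests is already in flight.
    void acquire() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return credits_ > 0; });
        --credits_;
    }

    // Return the credit when the fragment (or a timeout) comes back.
    void release() {
        { std::lock_guard<std::mutex> lk(m_); ++credits_; }
        cv_.notify_one();
    }

private:
    unsigned credits_;
    std::mutex m_;
    std::condition_variable cv_;
};

// e.g. TrafficShaper shaper(10);  // SFI.TrafficShaping = 10, as in the tests
```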
EB commissioning: Scaling with the EB farm
Setup: 42 ROSs (453 kB event size), SFI.TrafficShaping = 10, mixed network protocols.
The EB rate scales with the number of SFIs: ~114 MB/s per extra SFI.
EB commissioning: Scaling with Event Size
Setup: 42 ROSs (event size of 0.5, 1.0 or 2.0 × 453 kB), 29 SFIs, SFI.TrafficShaping = 10, mixed network protocols.
The EB rate scales with the event size, but small event fragments need a re-tuning of the Traffic Shaping.
EB commissioning: Trigger EB without LVL2?
Pull-event model: we have verified that the SFIs are balanced in Event Building rate. But we need LVL2 Decisions to trigger Event Building in data-driven mode. How to do this before LVL2 is available (e.g., now, in commissioning setups)?
[Diagram: the EB chain with the DFM's "L2 Decisions" input marked with a question mark]
LVL1 Trigger Distribution (1)
Local Trigger Processor (LTP), hosted in a sub-detector TTC crate:
• Accepts triggers from outside (e.g., a detector NIM signal or the Central Trigger Processor) or generates its own (e.g., random triggers internal to the LTP)
• Generates LVL1 IDs (L1ids)
• Distributes the trigger to other LTPs and to the Trigger, Timing & Control (TTCvi) modules for further distribution to the detector Front End and Read-Out Drivers; the data end up in the ROSs, tagged with the L1id
LVL1 Trigger Distribution (2)
The Central Trigger Processor distributes the trigger to the various TTC crates. In each crate, a Local Trigger Processor controls the trigger distribution to its sub-detector zone, and the data end up in the corresponding ROSs.
[Diagram: the Central Trigger Processor fanning out trigger signals to the LTPs in TTC crates 1 and 2, each driving its own TTCvi modules, Front End, and RODs]
EB commissioning: Trigger EB by faking L2 (1)
One Local Trigger Processor (LTP) has a special role, TTC2LAN (Trigger, Timing & Control to Local Area Network):
• An application running on the same crate's single-board computer
• Polls the LTP for triggers (L1ids)
• For each trigger, distributes an L2Decision (with the L1id) to the DFM via the LAN
• LVL2 accepts are a (configurable) fraction of the LVL1 accepts
• The DFM can throttle it with XOFF when event building cannot keep up (a sketch follows below)
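A minimal sketch of the TTC2LAN idea, with hypothetical interfaces (the stubs below stand in for the real LTP read-out and the TCP link to the DFM): poll the LTP for L1ids and forward a fake L2Decision for a configurable fraction of them, honouring XON/XOFF back-pressure.

```cpp
#include <atomic>
#include <cstdint>
#include <optional>

std::atomic<bool> xoff{false};   // set on XOFF from the DFM, cleared on XON

// Assumed hooks, stubbed here for illustration only:
std::optional<uint32_t> pollLtpForL1id() {          // next L1id, if any
    static uint32_t id = 0;
    return ++id;
}
void sendL2Decision(uint32_t /*l1id*/, bool /*accept*/) { /* TCP to DFM */ }

void ttc2lanLoop(unsigned prescale) {               // e.g. 15: accept 1-in-15
    uint64_t nSeen = 0;
    while (true) {
        if (xoff.load()) continue;                  // DFM busy: hold triggers
        auto l1id = pollLtpForL1id();
        if (!l1id) continue;                        // no new trigger yet
        bool accept = (++nSeen % prescale) == 0;    // L2Accepts = L1Accepts / prescale
        sendL2Decision(*l1id, accept);              // DFM builds or clears
    }
}
```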
EB commissioning: Trigger EB by faking L2 (2)
Example:
• LTP internal trigger with L1Accepts at 100 kHz and L2Accepts of 1/15
• 12 ROSs, event size ~117 kB
• EB rate (10 SFIs): ~60 MB/s per SFI
When the EB is not able to catch up, we send XON/XOFF from the DFM to the TTC2LAN: this works fine.
EB Commissioning: Partitioning OK
Using 2 TTC crates, two partitions, each triggered from a different LTP, work OK (each partition with its own TTC2LAN, DFM, ROSs, and SFIs).
Data from Detectors without the final EB
Dec. '06: ATLAS Phase 1 commissioning run (M1). Event Building was done at the ROS level.
[Diagram: the CTP and a sub-detector TTC crate (LTPs, TTCvi, Front End, RODs) feeding the ROSs, with the data tagged by L1id]
Note: the Central Trigger Processor (CTP) has all its data internally and sends them to its own ROS.
EB integration: Data from Detectors (1)
28 Feb-12 Mar. '07: ATLAS Phase 2 commissioning run (M2).
• SFI data went to local disks and to the CERN central storage (CASTOR); data integrity OK
• Used TCP for SFI-ROS, and in fact everywhere (big ROS fragment sizes)
• Took calibration events from the EM calorimeter as big as 10 MB (event fragments up to 0.5 MB!)
[Diagram: as before, now with ECR (Event Counter Reset) signals distributed alongside the triggers]
EB integration: Data from Detectors (2)
Cosmic muon triggers from the muon trigger chambers and the Tile hadronic calorimeter.
• Used ~50 ROSs, event size ~1 MB
• Participating systems: Central Trigger Processor, Muon CTP Interface (MuCTPI), EM calorimeter, HAD calorimeter, Muon Tracking Chambers, Muon Trigger Chambers
EB integration: Assembling data and serving monitors
The Event Builder provides events for the Event Display: events sampled at the SFI are given to Processing Tasks running Event Filter calorimeter algorithms.
EB commissioning & integration: summary (1)
• The commissioning and integration of the first 1/3 of the final Event Builder was successful, with the detectors taking cosmic muon events
• Functionality & performance as expected (e.g., the aggregate Event Building capacity scales with the EB farm size)
• The full ReadOut System is installed and commissioned
• 50% of the final EB needs will be available by the end of the year
• Problems/bugs were discovered and solved: there is nothing more useful than running in the real conditions
EB commissioning & integration: summary (2)
There is nothing more useful than running in realistic conditions: not only for debugging, but also for building teams!
Thank you
Extras
EB commissioning: Performance summary
System parameter scans (42 ROSs, 29 SFIs, mixed network protocols), nominal event size 453 kB.

Perfect scaling of the EB rate with the number of SFIs (ROS fragment size 12 kB, SFI.TrafficShaping 10):
  #SFIs   EB rate
  29      7.3 kHz (~114 MB/s per SFI)
  20      5.0 kHz (~114 MB/s per SFI)
  10      2.5 kHz (~114 MB/s per SFI)

EB rate vs. event size scales (SFI.TrafficShaping 10, 29 SFIs):
  ROB fragment size (× nominal)   EB rate
  0.5                             11.0 kHz (~ 86 MB/s per SFI)
  1                                7.3 kHz (~114 MB/s per SFI)
  2                                3.5 kHz (~114 MB/s per SFI)

SFI.TrafficShaping controls the EB rate (ROS fragment size 12 kB, 29 SFIs):
  SFI.TrafficShaping   EB rate
  14                   7.2 kHz (~113 MB/s per SFI)
  10                   7.3 kHz (~114 MB/s per SFI)
  6                    6.2 kHz (~ 97 MB/s per SFI)
Linearity of Event Building @ TestBeds
Each SFI adds 1 Gbit/s, until #SFIs = #ROSs. Then the EB rate reaches a plateau: it is input-limited.
Note: the Gbit load is controlled with the SFI Traffic Shaping credits.
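In compact form (my notation, not on the slide): with a per-node link bandwidth of b ≈ 1 Gbit/s,

\[
R_{\mathrm{EB}}(N_{\mathrm{SFI}})\;\approx\;\frac{\min\!\left(N_{\mathrm{SFI}},\,N_{\mathrm{ROS}}\right)\times b}{S_{\mathrm{event}}},
\]

so the rate grows linearly with the number of SFIs and saturates once the ROS output links become the bottleneck.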
Trigger EB by faking L2
[Diagram: as in the main slides, but the DFM's "L2 Decisions" input is replaced by an "L1Accept message via LAN"; the SFIs write to local disks and CASTOR]
For the HLT, CPU power is important
At the TDR we assumed:
• 100 kHz LVL1 accept rate
• 500 dual-CPU PCs for LVL2, so each CPU has to handle 100 Hz, i.e. a 10 ms average latency per event in each CPU (see the arithmetic below)
• 8 GHz per CPU at LVL2
Test with an AMD dual-core, dual-CPU machine @ 1.8 GHz, 4 GB total: preloaded the ROSs with muon events and ran muFast @ LVL2.
8 GHz per CPU will not come (soon), but dual-core dual-CPU PCs show scaling; we should reach the necessary performance per PC (the more we wait, the better machines we'll get).
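The latency budget follows directly (worked arithmetic, not on the original slide):

\[
\frac{100\,\mathrm{kHz}}{500\ \mathrm{PCs}\times 2\ \mathrm{CPUs}} = 100\,\mathrm{Hz\ per\ CPU}
\;\Longrightarrow\;
\langle t_{\mathrm{LVL2}}\rangle \le \frac{1}{100\,\mathrm{Hz}} = 10\,\mathrm{ms}.
\]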
ATLAS Trigger & DAQ: concept
Full info per event: ~1.6 MB every 25 ns, i.e. ~60 TB/s at 40 MHz.
• LVL1 (hardware based, no dead time): 40 MHz → 100 kHz, 160 GB/s
• LVL2 (algorithms on PC farms, seeded by the previous level; decide fast, work with a minimal data volume): 100 kHz → ~3.5 kHz, pulling ~3+6 GB/s (LVL2 + Event Building)
• Event Filter: ~3.5 kHz → ~200 Hz, ~300 MB/s to storage
(Details: Data Flow and Message Passing)
SFI: a multi-threaded C++ application
The Event Builder node is decomposed into activities (a sketch of this decomposition follows below):
• Request Activity: sends data requests to the ROSs & pROS, following the event assignments from the Data Flow Manager
• Event Input Activity: receives the event fragments
• Event Assembly Activity: assembles the fragments into full events
• Event Handler Activity: serves complete events to the trigger (Event Filter)
• Event Sampler Activity: serves events to Event Monitoring
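A minimal sketch of the assembly part of this decomposition, with hypothetical names (not the real SFI code): activities run as threads coupled by thread-safe queues, and an event is complete once a fragment from every ROS has arrived.

```cpp
#include <condition_variable>
#include <cstdint>
#include <map>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct Fragment { uint32_t l1id; uint32_t rosId; std::vector<char> data; };

template <typename T>
class SyncQueue {                        // tiny thread-safe queue
public:
    void push(T v) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(v)); }
        cv_.notify_one();
    }
    T pop() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty(); });
        T v = std::move(q_.front()); q_.pop(); return v;
    }
private:
    std::queue<T> q_; std::mutex m_; std::condition_variable cv_;
};

SyncQueue<Fragment> inputQueue;          // filled by the Event Input activity

// Event Assembly activity: collect all fragments of an event, then hand
// the complete event to the output (Event Filter, local disk, or sampler).
void assemblyActivity(size_t nRos) {
    std::map<uint32_t, std::vector<Fragment>> building;  // keyed by L1ID
    while (true) {
        Fragment f = inputQueue.pop();
        const uint32_t id = f.l1id;
        auto& frags = building[id];
        frags.push_back(std::move(f));
        if (frags.size() == nRos) {      // all ROS fragments arrived
            // ... serve the full event, then send End-Of-Event to the DFM
            // so it can broadcast the Clear to the ROSs
            building.erase(id);
        }
    }
}

// e.g. std::thread assembler(assemblyActivity, 150);   // ~150 ROSs
```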