“More reliable than an airline*” * Or the GRID DØ Level 3 Trigger/DAQ System Status G. Watts (for the DØ L3/DAQ Group)
Overview of DØ Trigger/DAQ
Standard HEP Tiered Trigger System
[Diagram: trigger chain Level 1 → Level 2 → DAQ → L3 Trigger Farm → Online System. Rates: 1.7 MHz into Level 1, 2 kHz into Level 2, 1 kHz (300 MB/sec) into the L3 farm, 100 Hz (30 MB/sec) out. Implementation labels: Firmware, FW + SW, Commodity, Commodity.]
• Full Detector Readout After Level 2 Accept
• Single Node in L3 Farm makes the L3 Trigger Decision
• Standard Scatter/Gather Architecture
• Event size is now about 300 KB/event.
• First full detector readout
  • L1 and L2 use some fast-outs
G. Watts (UW)
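The bandwidth figures on this slide follow from the quoted rates and event size. A minimal arithmetic check, assuming 1 MB = 1000 KB and using constant and function names that are illustrative only:

```python
# Back-of-the-envelope check of the rates quoted on the slide.
EVENT_SIZE_KB = 300         # from the slide: about 300 KB/event
L2_ACCEPT_RATE_HZ = 1_000   # rate into the L3 farm
L3_ACCEPT_RATE_HZ = 100     # rate out of the L3 farm


def bandwidth_mb_per_s(rate_hz: float, event_size_kb: float) -> float:
    """Return the data rate in MB/s, taking 1 MB = 1000 KB (an assumption)."""
    return rate_hz * event_size_kb / 1000.0


into_farm = bandwidth_mb_per_s(L2_ACCEPT_RATE_HZ, EVENT_SIZE_KB)
out_of_farm = bandwidth_mb_per_s(L3_ACCEPT_RATE_HZ, EVENT_SIZE_KB)
l3_rejection = L2_ACCEPT_RATE_HZ / L3_ACCEPT_RATE_HZ  # events in per event kept

print(into_farm, out_of_farm, l3_rejection)
```

The two bandwidths reproduce the 300 MB/sec and 30 MB/sec labels, and the ratio of the rates is the rejection factor Level 3 has to deliver.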
Overview Of Performance
System has been fully operational since March 2002.
[Diagram labels: Tevatron Increases Luminosity; Physicists Get Clever; # of Multiple Interactions Increase; Trigger List Changes; Increased Noise Hits; More CPU Time Per Event; Rejection Moves to L1/L2; Increased Data Size; Trigger HW Improvements]
• Trigger software written by a large collection of physicists who are not real-time programmers.
• CPU time/event has more than tripled.
• Continuous upgrades since operation started
  • Have added about 10 new crates
  • Started with 90 nodes, now have almost 250, none of them original
  • Single core at start; latest purchase is dual 4-core.
• No major unplanned outages
An Overwhelming Success
24/7
• Over an order of magnitude increase in peak luminosity
• Constant pressure: L3 deadtime shows up in this gap! [refers to a gap marked on the slide's plot]
Basic Operation
Data Flow
• Directed, unidirectional flow
• Minimize copying of data
• Buffered at origin and at destination
Per Event Control Flow
• 100% TCP/IP
• Bundle small messages to decrease network overhead
• Compress messages via configured lookup tables
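The two control-flow tricks, bundling small messages and compressing them through configured lookup tables, can be sketched as follows. The slides do not give the wire format, so the byte layout, `ROUTE_TABLE`, and both function names are assumptions made for illustration:

```python
import struct

# Hypothetical lookup table, agreed on at configuration time:
# a small index stands in for a full list of destination nodes.
ROUTE_TABLE = {0: ("node01", "node02"), 1: ("node03",)}


def pack_bundle(decisions):
    """Pack (event_number, route_index) pairs into a single message.

    Assumed layout: uint16 count, then uint32 event number + uint16 index
    per decision, all in network byte order.
    """
    payload = struct.pack("!H", len(decisions))
    for event_number, route_index in decisions:
        payload += struct.pack("!IH", event_number, route_index)
    return payload


def unpack_bundle(payload):
    """Reverse pack_bundle, expanding each index through the lookup table."""
    (count,) = struct.unpack_from("!H", payload, 0)
    offset = 2
    decisions = []
    for _ in range(count):
        event_number, route_index = struct.unpack_from("!IH", payload, offset)
        offset += 6
        decisions.append((event_number, ROUTE_TABLE[route_index]))
    return decisions


bundle = pack_bundle([(1001, 0), (1002, 1)])
print(len(bundle), unpack_bundle(bundle))
```

One TCP send carries several decisions, and each decision costs six bytes instead of a list of node names.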
The DAQ/L3 Trigger End Points
Read Out Crates (ROCs) are VME crates that receive data from the detector.
• Most data is digitized on the detector and sent to the Movable Counting House
• Detector specific cards in the ROC
• DAQ HW reads out the cards and makes the data format uniform
Farm Nodes are located about 20m away (electrically isolated)
• Event is built in the Farm Node
• There is no event builder
• Level 3 Trigger Decision is rendered in the node.
Between the two is a very large CISCO switch…
Hardware
• ROCs contain a Single Board Computer (SBC) to control the readout.
  • VMIC 7750s, PIII, 933 MHz
  • 128 MB RAM
  • VME via a PCI Universe II chip
  • Dual 100 Mb Ethernet
  • 4 have been upgraded to Gb Ethernet due to increased data size
• Farm Nodes: 288 total, 2 and 4 cores per pizza box
  • AMD and Xeon CPUs of differing classes and speeds
  • Single 100 Mb Ethernet
  • Less than last CHEP!
• CISCO 6509 switch
  • 16 Gb/s backplane
  • 9 module slots, all full
  • 8 port Gb
  • 112 MB shared output buffer per 48 ports
Data Flow
• The Routing Master (RM) Coordinates All Data Flow
  • The RM is an SBC installed in a special VME crate interfaced to the DØ Trigger Framework (TFW)
  • The TFW manages the L1 and L2 triggers
  • The RM receives an event number and trigger bit mask of the L2 triggers.
  • The TFW also tells the ROCs to send that event’s data to the SBCs, where it is buffered.
  • The data is pushed to the SBCs
[Diagram labels: ROCs, Farm Nodes, Routing Master, DØ Trigger Framework, L2 Accept]
Data Flow
• The RM Assigns a Node
  • RM decides which Farm Node should process the event
  • Uses trigger mask from TFW
  • Uses run configuration information
  • Takes into account how busy a node is. This automatically accounts for the node’s processing ability.
  • 10 decisions are accumulated before being sent out
    • Reduces network traffic.
[Diagram labels: ROCs, Farm Nodes, Routing Master, DØ Trigger Framework]
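A minimal sketch of the assignment step described on this slide. The real Routing Master's algorithm is not shown in the slides; the class, its methods, and the "most free buffers wins" rule are assumptions. Only the inputs (trigger mask, run configuration, node busyness) and the batch of 10 come from the slide.

```python
from collections import defaultdict


class RoutingMaster:
    """Toy model of node assignment; all names here are hypothetical."""

    BATCH_SIZE = 10  # decisions accumulated before sending (from the slide)

    def __init__(self, nodes_for_bit):
        # nodes_for_bit: trigger bit number -> nodes configured to handle it
        self.nodes_for_bit = nodes_for_bit
        self.free_slots = defaultdict(int)  # node -> free event buffers
        self.pending = []                   # decisions not yet sent out

    def node_ready(self, node, slots=1):
        """A farm node reports that it has room for more events."""
        self.free_slots[node] += slots

    def assign(self, event_number, trigger_mask):
        """Pick the least busy eligible node, or None if none has room."""
        candidates = {
            node
            for bit, nodes in self.nodes_for_bit.items()
            if trigger_mask & (1 << bit)
            for node in nodes
            if self.free_slots[node] > 0
        }
        if not candidates:
            return None
        # Most free slots wins; the name breaks ties deterministically.
        node = min(candidates, key=lambda n: (-self.free_slots[n], n))
        self.free_slots[node] -= 1
        self.pending.append((event_number, node))
        return node

    def take_batch(self):
        """Return a full batch of decisions to send, or [] if not ready."""
        if len(self.pending) < self.BATCH_SIZE:
            return []
        batch = self.pending[:self.BATCH_SIZE]
        self.pending = self.pending[self.BATCH_SIZE:]
        return batch
```

Because a fast node frees its buffers sooner and therefore calls `node_ready` more often, it is offered more events without the RM needing to know anything about CPU speed.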
Data Flow
• The Data Moves
  • The SBCs send all event fragments to their proper node
  • Once all event fragments have been received, the farm node will notify the RM (if it has room for more events).
[Diagram labels: ROCs, Farm Nodes, Routing Master, DØ Trigger Framework]
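Event building in the farm node can be sketched as collecting fragments until every expected crate has reported. This is a simplified illustration, not the DØ code: `EventBuilder`, its fields, and the buffer accounting are all assumptions.

```python
class EventBuilder:
    """Toy farm-node event builder; names and bookkeeping are hypothetical."""

    def __init__(self, expected_crates, buffer_slots):
        self.expected = frozenset(expected_crates)  # crates required per event
        self.buffer_slots = buffer_slots            # events the node can hold
        self.partial = {}    # event_number -> {crate: fragment bytes}
        self.complete = []   # fully built events waiting for the L3 filter

    def add_fragment(self, event_number, crate, data):
        """Store one fragment.

        Returns True when this fragment completes an event AND the node
        still has room, i.e. when the node should notify the Routing Master.
        """
        fragments = self.partial.setdefault(event_number, {})
        fragments[crate] = data
        if set(fragments) != self.expected:
            return False  # still waiting for other crates
        self.complete.append((event_number, self.partial.pop(event_number)))
        return len(self.partial) + len(self.complete) < self.buffer_slots


builder = EventBuilder(["cal", "muon"], buffer_slots=2)
print(builder.add_fragment(7, "cal", b"\x01"))   # one crate still missing
print(builder.add_fragment(7, "muon", b"\x02"))  # event 7 is now complete
```

The set of expected crates corresponds to the static configuration item "what read out crates to expect on every event".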
Configuration
• DØ Online Configuration Design
• Level 3 Supervisor
• Configuration Performance
The Static Configuration
• Farm Nodes
  • What read out crates to expect on every event
  • What is the trigger list programming
  • Where to send the accepted events and how to tag them
• Routing Master
  • What read out crates to expect for a given set of trigger bits
  • What nodes are configured to handle particular trigger bits
• Front End Crates (SBCs)
  • List of all nodes that data can be sent to
Much of this configuration information is cached in lookup tables for efficient communication at run time
The DØ Control System
Standard Hierarchical Layout
[Diagram: DØ Shifter at the top; “COOR”dinate (master Run Control) with the Configuration Database below; then Calorimeter, Level 2, Level 3, Online Examines; under Level 3 the Routing Master, Farm Nodes, and SBCs.]
• Intent flows from the shifter down to the lowest levels of the system
• Errors flow in reverse
Level 3 Supervisor
Input
• COOR sends a state description down to Level 3
  • Trigger List
  • Minimum # of nodes
  • Where to send the output data
  • What ROCs should be used
• L3 Component Failures, crashes, etc.
  • Supervisor updates its state and reconfigures if necessary.
State Configuration (Current State → Desired State)
• Calculates the minimum number of steps to get from the current state to the desired state.
• Complete sequence calculated before the first command is issued
• Takes ~9 seconds for this calculation.
Output (Commands to Effect the Change)
• Commands are sent to each L3 component to change the state of that component.
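The idea of computing the full command sequence from the difference between current and desired state can be sketched as a dictionary diff. The Supervisor's actual algorithm and state representation are not given in the slides; `plan_commands`, the command names, and the node and trigger-list labels below are hypothetical.

```python
def plan_commands(current, desired):
    """Return the commands needed to move from `current` to `desired`.

    Both arguments map a component name to its configuration (a
    hypothetical representation). Components whose configuration already
    matches get no command at all.
    """
    commands = []
    # Components no longer wanted are released first.
    for name in sorted(current.keys() - desired.keys()):
        commands.append(("unconfigure", name))
    # New or changed components are (re)configured.
    for name in sorted(desired):
        if name not in current:
            commands.append(("configure", name, desired[name]))
        elif current[name] != desired[name]:
            commands.append(("reconfigure", name, desired[name]))
    return commands


current = {"node01": "trig_v1", "node02": "trig_v1", "node03": "trig_v1"}
desired = {"node01": "trig_v1", "node02": "trig_v2", "node04": "trig_v2"}
for command in plan_commands(current, desired):
    print(command)
```

The whole list is built before anything is sent, matching the slide's point that the complete sequence is calculated before the first command is issued.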
General Comments
Boundary Conditions
• Dead time: beam in the machine and no data flowing
  • Run change contributes to downtime!
• Current operating efficiency is 90-94%
  • Includes 3-4% trigger deadtime
  • That translates to less than 1-2 minutes per day of downtime.
• Any configuration change means a new run
  • Prescales for luminosity changes, for example.
  • A system fails and needs to be reset
• Clearly, run transitions have to be very quick!
Design Principles
• Push responsibility down a level
• Don’t touch configuration unless it must be changed
We didn’t start out this way!
[Diagram: “COOR”dinate (master Run Control) → Level 3 → Routing Master, Farm Nodes, SBCs]
Some Timings
• Start → Configured: Configure, 26 sec
• Configured → Running: Start Run #nnnnn, 1 sec
• Running → Paused: Pause, 1 sec
• Paused → Running: Resume, 1 sec
• Running → Configured: Stop Run #nnnnn, 1 sec
• Configured → Start: Unconfigure, 4 sec
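The timings above can be read as a small run-control state machine. The sketch below assumes four states (Start, Configured, Running, Paused) and assigns each quoted transition a source and destination state; that assignment and the command names are reconstructions, not taken verbatim from the slide.

```python
# (state, command) -> (next state, seconds); timings are from the slide,
# the state graph is an assumed reconstruction.
TRANSITIONS = {
    ("Start", "configure"): ("Configured", 26),
    ("Configured", "start_run"): ("Running", 1),
    ("Running", "pause"): ("Paused", 1),
    ("Paused", "resume"): ("Running", 1),
    ("Running", "stop_run"): ("Configured", 1),
    ("Configured", "unconfigure"): ("Start", 4),
}


def run_sequence(state, commands):
    """Apply commands in order; return the final state and total seconds."""
    total = 0
    for command in commands:
        try:
            state, seconds = TRANSITIONS[(state, command)]
        except KeyError:
            raise ValueError(f"{command!r} not allowed in state {state!r}") from None
        total += seconds
    return state, total


# A cold start costs the full configure; a run change while configured does not.
print(run_sequence("Start", ["configure", "start_run"]))
print(run_sequence("Running", ["stop_run", "start_run"]))
```

The second call shows why leaving the configuration untouched matters: changing runs without reconfiguring costs about 2 seconds rather than 27.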
Caching
“COOR”dinate (master Run Control)
• COOR caches the complete state of the system
• Single point of failure for the whole system.
  • But it never crashes!
  • Recovery is about 45 minutes
• Can re-issue commands if L3 crashes
Level 3
• L3 Supervisor caches the desired state of the system (COOR’s view) and the actual state
• COOR is never aware of any difficulties unless L3 can’t take data due to a failure
• Some reconfigurations require minor interruption of data flow (1-2 seconds)
Farm Node
• Farm nodes cache the trigger filter programming
• If a filter process goes bad no one outside the node has to know
• Event that caused the problem is saved locally
Fast Response to problems == less downtime
Future & Conclusions
Upgrades
Farm Nodes
• We continue to purchase farm nodes in small increments as old nodes pass their 3-4 year lifetimes.
• 8 core CPUs run 8-9 parallel processes filtering events with no obvious degradation in performance.
• Original plan called for 90 single processor nodes
• “Much easier to purchase extra nodes than re-write the tracking software from scratch.”
SBCs
• Finally used up our cache of spares
• No capability upgrades required
• But we have not been able to make the new model SBCs operate at the efficiency that is required by the highest occupancy crates.
Other New Ideas
• We have had lots, of varying degrees of craziness.
• Management very reluctant to make major changes at this point
• Management has been very smart.
Conclusion
• This DØ DAQ/L3 Trigger has taken every single physics event for DØ since it started taking data in 2002.
  • 63 VME sources powered by Single Board Computers sending data to 328 off-the-shelf commodity CPUs.
  • Data flow architecture is push, and is crash and glitch resistant.
• Has survived all the hardware, trigger, and luminosity upgrades smoothly
  • Upgraded farm size from 90 to 328 nodes with no major change in architecture.
• Evolved control architecture means minimal deadtime incurred during normal operations
• Primary responsibility is carried out by 3 people (who also work on physics analysis)
  • This works only because Fermi Computing Division takes care of the farm nodes...