Process and Data Flow Control in KLOE
E. Pasqualucci (INFN - Roma) enrico.pasqualucci@roma1.infn.it
Outline
• System overview
• Process structure and local communication
• SNMP and remote communication
• Process control
• Data Flow Control system
• DFC monitor
DAQ system architecture
• ~23000 FEE channels @ 2.5 kHz φ + background (~10 kHz)
• Bandwidth: ~50 Mbyte/s (5 kbyte/event)
• Storage: 200 Tbyte/year
• Tested with peak rates of 10 kHz in multibunch mode
• Tested at the maximum required throughput using non-zero-suppressed calorimeter data
[Figure: DAQ system architecture. Labeled elements: front-end crates (ADC 1 ... ADC 16, ROCK, ROCKM, AUXM), CBUS, VIC, Level-2 crates (CPU, VIC, FDDI), trigger chain, DFC system, FDDI switch, Run Control, Monitor System, CPU servers, storage system.]
DAQ software organization
[Figure: DAQ software organization. Labeled elements: Level 1 chain, DFC system, VME chain tools, Collector, simulation, GeoVme map, CmdSrv, Level 2, Monitor system, Circ, SpyBuff, Sender, dmap, RSpyD, Didone, FDDI switch, Receiver, RunCtl, SpyD, Farm, Builder, Recorder, Circ (Ybos), to disk/tape, Spy dump, Farm status, SlowCtl system. Legend: data, map data, messages, traps.]
Process structure
• Initialization
  • Msg Q creation
  • Shmem subscription
  • Shmem space allocation for variables
• Main Loop
  • Process Event
  • Process Command
  • Idle time
• Interrupt Handler
  • Extract command from Msg Q
[Figure: shared memory layout (Id, Contents, Mapping). Header: process number, pointer to 1st process, pointer to 2nd process, pointer to 3rd process, ... Each process slot (Proc. 1, Proc. 2, ... All Processes): process name, process id, message queue id, process status, last command, last command status, number of variables, variable 1, variable 2, ...]
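The process structure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a plain Python list stands in for the shared-memory segment, `queue.Queue` stands in for the UNIX message queue, and the names `ProcessEntry`, `ProcessTable`, `subscribe`, `locate` and `run_once` are hypothetical, not taken from the KLOE software. The field names follow the slot layout listed on the slide.

```python
from dataclasses import dataclass, field
from queue import Queue, Empty


@dataclass
class ProcessEntry:
    """One process slot of the shared table (fields follow the slide)."""
    name: str
    pid: int
    msgq: Queue                       # stand-in for the message queue id
    status: str = "idle"
    last_command: str = ""
    last_command_status: str = ""
    variables: dict = field(default_factory=dict)


class ProcessTable:
    """Stand-in for the shared-memory segment: a header plus process slots."""

    def __init__(self):
        self.entries = []             # the "process number" is len(entries)

    def subscribe(self, name, pid, variables):
        """Initialization: create the queue and allocate space for variables."""
        entry = ProcessEntry(name, pid, Queue(), variables=dict(variables))
        self.entries.append(entry)
        return entry

    def locate(self, name):
        """Find a process slot by name."""
        for entry in self.entries:
            if entry.name == name:
                return entry
        raise KeyError(name)


def run_once(entry, handlers):
    """One pass of the main loop: process a pending command, else idle."""
    try:
        command = entry.msgq.get_nowait()
    except Empty:
        return False                  # idle time
    entry.last_command = command
    entry.last_command_status = "executing"
    handler = handlers.get(command)
    ok = handler(entry) if handler else False
    entry.last_command_status = "success" if ok else "fault"
    return True
```

A process would call `subscribe` once at start-up and then call `run_once` repeatedly; any other process can inspect its state through `locate`.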
[Figure sequence: the shared process table (Header, Proc. 1, Proc. 2, ... All Processes) as seen by one process. The process finds its own slot (My process, My process id, My Q id) and sets "My variable = value". A "Stop !" command then arrives with a signal; the slot shows last command "Stop !" with status "Executing", and afterwards "Success".]
Local communication
• Sending a command:
  • The sender:
    • Locates the process
    • Gets its id and message Q
    • Puts the command into the Q
    • Sends an interrupt
    • Polls on the command status
  • The receiver:
    • Reads the Q
    • Writes the command and status and executes it
    • Writes the command status (acknowledgement)
• Getting a variable:
  • Locate the process
  • Locate the variable
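The sender/receiver handshake can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: a dictionary replaces the shared-memory table, `queue.Queue` replaces the message queue, a `threading.Event` replaces the interrupt, and the process name `builder`, the variable `run_number` and the status strings are hypothetical.

```python
import threading
import time
from queue import Queue

# Hypothetical in-memory stand-in for the shared process table.
table = {
    "builder": {
        "queue": Queue(),
        "interrupt": threading.Event(),
        "last_command": "",
        "last_command_status": "",
        "variables": {"run_number": 42},
    },
}


def receiver(name):
    """Receiver: wait for the interrupt, read the Q, acknowledge, execute."""
    slot = table[name]
    slot["interrupt"].wait()
    command = slot["queue"].get()                 # reads the Q
    slot["last_command"] = command
    slot["last_command_status"] = "executing"     # command and status written
    slot["variables"]["last_seen"] = command      # placeholder for real work
    slot["last_command_status"] = "success"       # final command status


def send_command(name, command, timeout=1.0, poll=0.001):
    """Sender: locate, enqueue, interrupt, then poll on the command status."""
    slot = table[name]                            # locate the process
    slot["last_command_status"] = "pending"
    slot["queue"].put(command)                    # put command to Q
    slot["interrupt"].set()                       # send an interrupt
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:            # poll on command status
        if slot["last_command_status"] in ("success", "fault"):
            return slot["last_command_status"]
        time.sleep(poll)
    return "timeout"


def get_variable(name, variable):
    """Getting a variable: locate the process, then locate the variable."""
    return table[name]["variables"][variable]
```

Polling with a timeout keeps the sender from blocking forever if the receiver never updates the status.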
Managing the DAQ network
• SNMP (Simple Network Management Protocol)
  • Widely used to manage network devices
  • Defined as a standard by the IETF (Internet Engineering Task Force)
  • Implemented using a reliable protocol on top of UDP
  • Used to retrieve and/or set information about:
    • network configuration
    • traffic
    • faults
    • accounting
• Managed objects are defined in a Management Information Base (MIB) defined by the IETF
  • Private extensions of the standard MIB are allowed
• Public-domain software allows the implementation of:
  • dedicated agents
  • utilities for remote access
SNMP client-server policy
• MIB
  • Variables organized as a tree
• Primitives:
  • get, get-next, set
• Each device runs a daemon able to:
  • Understand MIB requests
  • Obtain the required information
  • Execute the required actions
• Trap mechanism
• KLOE uses SNMP to:
  • Control DAQ devices and network
  • Implement message distribution
  • Implement process control
  • Implement Data Flow Control (DFC)
The command server and the KLOE MIB sub-tree
iso.org.dod.internet.mgmt.mib-2
  system(1)
    sysDescr(1)
    sysObjectID(2)
    sysUpTime(3)
    sysContact(4)
    sysName(5)
    sysLocation(6)
    sysServices(7)
  ….
  KLOE(13)
    kprocesses(1)
      kprocNumber(1)
      kprocTable(2)
        kprocEntry(1)
          kprocIndex(1)
          kprocName(2)
          kprocId(3)
          kprocMsgQId(4)
          kprocStatus(5)
          kprocLastCommand(6)
          kprocLastCommandStatus(7)
          kprocVarNumber(8)
      kprocVarTable(3)
        kprocVarEntry(1)
          kprocVarProcIndex(n,1)
          kProcVarIndex(n,2)
          kprocVarName(n,3)
          kprocVarSize(n,4)
          kprocVarType(n,5)
          kprocVarValue(n,6)
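To make the get, get-next and set primitives concrete, here is a toy MIB in Python. It is a sketch, not an SNMP agent: there is no network layer, `ToyMib` and `walk` are hypothetical names, and the object identifiers simply take the slide's numbering at face value (mib-2 is 1.3.6.1.2.1, with KLOE(13) drawn below it). Appending the process index as the last OID component is an assumption about how table rows are addressed.

```python
import bisect

MIB2 = (1, 3, 6, 1, 2, 1)                 # iso.org.dod.internet.mgmt.mib-2
KPROC_ENTRY = MIB2 + (13, 1, 2, 1)        # KLOE.kprocesses.kprocTable.kprocEntry
KPROC_NAME, KPROC_STATUS = 2, 5           # column numbers from the slide


class ToyMib:
    """Variables organized as a tree, stored as OIDs in sorted order."""

    def __init__(self):
        self._values = {}
        self._oids = []                   # kept sorted lexicographically

    def set(self, oid, value):
        oid = tuple(oid)
        if oid not in self._values:
            bisect.insort(self._oids, oid)
        self._values[oid] = value

    def get(self, oid):
        return self._values[tuple(oid)]

    def get_next(self, oid):
        """Return the (oid, value) pair that follows `oid`, or None at the end."""
        i = bisect.bisect_right(self._oids, tuple(oid))
        if i == len(self._oids):
            return None
        nxt = self._oids[i]
        return nxt, self._values[nxt]


def walk(mib, prefix):
    """Visit every variable below `prefix` using repeated get-next calls."""
    prefix = tuple(prefix)
    oid = prefix
    while True:
        nxt = mib.get_next(oid)
        if nxt is None or nxt[0][:len(prefix)] != prefix:
            return
        yield nxt
        oid = nxt[0]
```

Because Python compares tuples lexicographically, the sorted list reproduces the usual column-by-column order of a table walk: all names first, then all statuses.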
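Message system implementation
[Figure: command path from Run Control on Node A, through SNMP, to the Command Server and the DAQ Process on Node B.]
• Run Control (Node A): locate process, send command, get process variables
• SNMP exchange with the Command Server (Node B): first ack req, first ack, second ack req, second ack
• Command Server (Node B): put command into the Msg Q, interrupt (INT) the DAQ Process
• DAQ Process (Node B): get command from the Msg Q, write last command and status "executing" to Shared Memory, execute command, write command status (success, fault)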
Remarks and performance
• Command server
• DAQ process
• receives commands and shares variables
• Command distributor
• Run and process control tools
• Tcl/Tk commands implemented
  • get variable, send message
• Fortran interface for old-fashioned software
• Portable
  • AIX, OSF1, HP-UX, Solaris, Linux, LynxOS supported
• Optimized library
  • Parallel message distribution implemented
• Performance
  • Local command: ~1.2 ms
  • Remote variable reading: ~1.2 ms
  • Remote command completion: ~4 ms
Production process control
[Figure: control node and production node. Labeled elements: OffCtl, cmdsrv, locpc, pcd, Proc_1, Proc_2, Shmem (variables). Labeled arrows: command, command + start, trap, signal, check.]
DAQ system architecture
[Figure: DAQ system architecture, repeated from the system overview.]
The DFC System
• Changes the packet distribution sequence
• Avoids slow-down in data transmission and blocking timeouts
• Keeps latency under control
[Figure: DFC system. Labeled elements: VIC bus, TS, DFC status, shmem, Flow table, DFCd, DFC, latmon, Collector, Receiver, RunCtl. Labeled flows: network and trigger statistics, performance statistics, statistics, commands, traps, flow table data.]
Receiver protocol
• Receives event sub-packets through the GigaSwitch
• Puts packets into multiple circular buffers
• Implements DFC and LatMon farm interface
• Dynamic thresholds
[Figure: receiver flow. Sub-event packets arrive over TCP/IP on FDDI (0.5 MB/s per input) and are selected and copied for EVB (1) ... EVB (n). If the last sub-packet of # has arrived, a LatMon trap (#) is sent to LatMon. The maximum occupancy is read: if "full", a "full" trap is sent to the DFC system; if "empty" after "full", an "empty" trap is sent.]
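The full/empty trap logic amounts to a threshold check with hysteresis. The Python sketch below shows one way to express it; the class name `OccupancyMonitor`, the use of fractional occupancies between 0 and 1 and the example threshold values are assumptions, and `send_trap` is a hypothetical callback standing in for the SNMP trap.

```python
class OccupancyMonitor:
    """Send "full" once, then "empty" only after a previous "full"."""

    def __init__(self, full_threshold, empty_threshold, send_trap):
        if not 0.0 <= empty_threshold < full_threshold <= 1.0:
            raise ValueError("need 0 <= empty_threshold < full_threshold <= 1")
        # Both thresholds are plain attributes, so they can be changed at
        # run time (the slide mentions dynamic thresholds).
        self.full_threshold = full_threshold
        self.empty_threshold = empty_threshold
        self.send_trap = send_trap
        self.is_full = False

    def update(self, occupancies):
        """Check the buffers' occupancies and send a trap on a state change."""
        worst = max(occupancies)                  # get max occupancy
        if not self.is_full and worst >= self.full_threshold:
            self.is_full = True
            self.send_trap("full")
        elif self.is_full and worst <= self.empty_threshold:
            self.is_full = False
            self.send_trap("empty")
        return worst
```

Keeping the two thresholds apart avoids a burst of alternating traps when the occupancy hovers around a single limit.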
DFC Protocol
• Initialization:
  • Builds Network Map
  • Builds DFC map (ordered list of RECV IP addresses)
  • Creates the first table with infinite trigger-number validity
• Main Loop:
  • Waits for a trap
  • On trap (full/empty):
    • Reads the last trigger number from the Trigger Supervisor
    • Creates the next table
    • Modifies the validity of the previous table
    • Sends auto-test traps
[Figure: DFC data in VME shared memory: max number of tables, number of RECV nodes, DFC map (IP addresses), and the flow tables, each with a validity trigger and a row of flags (e.g. 111111…1111, 111101…1111).]
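The table bookkeeping described above can be sketched in Python as follows. This is an illustration under assumptions, not the DFC implementation: `FlowTables` and its methods are hypothetical names, the IP addresses in the usage are placeholders, the `margin` argument stands for however many triggers the validity is pushed past the last trigger number, and the round-robin choice of destination is an assumption since the slides do not state the distribution rule.

```python
import math


class FlowTables:
    """Ordered flow tables, each valid up to a given trigger number."""

    def __init__(self, receivers):
        self.receivers = list(receivers)          # DFC map: ordered RECV list
        # First table: every receiver enabled, infinite validity.
        self.tables = [{"flags": [True] * len(self.receivers),
                        "valid_until": math.inf}]

    def on_trap(self, receiver, kind, last_trigger, margin):
        """On a full/empty trap: close the current table, create the next."""
        flags = list(self.tables[-1]["flags"])
        flags[self.receivers.index(receiver)] = (kind == "empty")
        self.tables[-1]["valid_until"] = last_trigger + margin
        self.tables.append({"flags": flags, "valid_until": math.inf})

    def table_for(self, trigger):
        """Return the first table whose validity covers this trigger."""
        for table in self.tables:
            if trigger < table["valid_until"]:
                return table
        raise LookupError(trigger)

    def destination(self, trigger):
        """Pick a receiver among the enabled ones (round robin, assumed)."""
        flags = self.table_for(trigger)["flags"]
        enabled = [r for r, ok in zip(self.receivers, flags) if ok]
        if not enabled:
            raise RuntimeError("no receiver enabled")
        return enabled[trigger % len(enabled)]
```

Triggers below the switch-over number keep using the old table, so packets already in flight are not redirected.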
DFC algorithm and performance
• Validity:
  • $v = t_0 + (t_{tr} + (t_{dfc} + k\sigma_{dfc}))\,(n + k\sigma_n) + t$
  • $k = 5$
  • autotest
• DFCd reaction time (trap): 1.2 ms
• DFC reaction time:
  • $t_{local}$ ~ 1.2 ms
  • trigger interaction ~ 6-7 ms
  • $t_{dfc}$ ~ $O(10^{-2})$ ms
  • total 10 ms
• DFC-L2 interaction rate: ~ 1 table / 50 ms (sustained)
• DFC "dead time" implemented
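As a worked example, the validity expression can be evaluated numerically. The slide does not define its symbols, so the reading used here is an assumption: t0 is taken as the last trigger number, t_tr and t_dfc as times in seconds, n as a trigger rate in Hz, the "s" terms as the spreads of t_dfc and n, and the final t as an extra offset in triggers. The input values in the usage note are illustrative, not measurements.

```python
def validity_trigger(t0, t_tr, t_dfc, sigma_dfc, n, sigma_n, t, k=5):
    """Evaluate v = t0 + (t_tr + (t_dfc + k*sigma_dfc)) * (n + k*sigma_n) + t.

    All symbol meanings are assumptions (see the text above); the function
    only reproduces the arithmetic of the expression on the slide.
    """
    reaction_time = t_tr + (t_dfc + k * sigma_dfc)   # padded reaction time
    rate = n + k * sigma_n                           # padded trigger rate
    return t0 + reaction_time * rate + t
```

For instance, with t0 = 1000, t_tr = 7 ms, t_dfc = 0.01 ms, n = 10 kHz and no spread or offset, the expression gives about 1070.1, that is, roughly 70 triggers after the last one read.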
Packet latency
• Latency measurements:
  • SNMP traps sent to LatMon:
    • Collector trap when packet # is released to the sender
    • Receiver trap when all sub-packets of # have arrived
• Test for receiver's buffers
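The latency measurement pairs the two traps by packet number. A minimal Python sketch follows, assuming that each trap carries the packet number and a timestamp in seconds; the class `LatencyMonitor` and its method names are hypothetical and only mimic the role of LatMon.

```python
class LatencyMonitor:
    """Match collector and receiver traps by packet number."""

    def __init__(self):
        self._released = {}       # packet number -> release timestamp
        self.latencies = {}       # packet number -> measured latency

    def collector_trap(self, packet, timestamp):
        """Packet released for the sender: remember when."""
        self._released[packet] = timestamp

    def receiver_trap(self, packet, timestamp):
        """All sub-packets arrived: return the latency, or None if unmatched."""
        start = self._released.pop(packet, None)
        if start is None:
            return None
        self.latencies[packet] = timestamp - start
        return self.latencies[packet]
```

Unmatched receiver traps are ignored rather than raising an error, since a trap can be lost or arrive for a packet released before monitoring started.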
Summary
• A fast and reliable message system has been implemented using standard UNIX mechanisms and the SNMP protocol
  • Very simple to use
    • process template + command definition
    • Fortran and Tcl/Tk interfaces
  • Allows full process control
• A Data Flow Control system has been developed using the message system and SNMP traps
  • It redirects network traffic, taking into account the dynamics of the whole system
  • Dynamic redefinition of thresholds
  • It ran successfully during KLOE data acquisition