Control and Monitoring of Front-end and Readout Electronics in ALICE Peter Chochula
Outline • Front-End and ReadOut electronics (FERO) access strategy will be described using the SPD as an example • Live demonstration of a prototype solution • Discussion of related problems This is not the first presentation on FERO issues. The proposed solution should now be adopted as the ALICE standard – your feedback is therefore essential. Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
Basic Interaction Between FERO and DCS • FERO control ranges from the downloading of parameters to high-level tasks such as calibration • Monitoring includes direct reading of parameters provided by the electronics or indirect surveillance using associated equipment (power supplies etc.) • Corrective actions are expected from the control system in case of anomalous values (errors, excessive readings etc.) Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
The Front-end Device (FED) • The FED makes implementation details transparent to the higher software layers • The FED encapsulates hardware and software into a unit: • Accepting standard and detector-specific commands • Publishing standard and detector-specific data Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
Control and Monitoring Strategies in ALICE • Four different FERO architecture classes should be transparent to the upper software layers: • Class A: FERO configuration is implemented via DDL; monitoring is based on different technologies • Class B: FERO configuration can be performed via DDL and optionally using an alternative technology (Profibus, Ethernet, etc.) • Class C: an alternative technology such as Profibus, Ethernet, Easynet etc. is used to configure FERO • Class D: configuration and monitoring share the same (non-DDL) access path to FERO • FERO should be treated as any other device: • accept commands • publish status and gathered data Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
The Front-end Device (FED) [architecture diagram, built up over three slides]: the ECS sits above the DAQ/RC and the DCS. Class A+B control goes from the DAQ/RC through the DDL SW (on a Control CPU) and the DDL down to the FERO in the hardware layer. Class B, C, D control and the monitoring of all classes go from the DCS through the FED Client to the FED Server (on a Control CPU), which accesses the FERO via Profibus, JTAG, etc. The FED Client and FED Server communicate using DIM. Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
SPD Readout Architecture [diagram]: the DCS reaches the VME Router Card through a PCI-MXI-II-VME bridge; JTAG, clock and trigger signals and the detector data travel ~100 m between the router and the detector, and the DAQ reads out through the router. One router services 6 halfstaves; the SPD contains 20 routers. Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
ALICE SPD as an example for FED Implementation • The basic SPD block seen by the online systems is a half-stave • The MCM carries the Analog Pilot, the Digital Pilot, the GOL and the laser and PIN diodes of the optical link • GOL: translates data into a G-Link compatible 800 Mbit/s stream and drives the optical laser component • Digital Pilot: receives trigger signals, controls the chips, reads the analog pilot • Analog Pilot: provides reference voltages and ADCs for monitoring • ALICE1/LHCB Chip: reads signals from the bump-bonded pixel sensors Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
SPD Operation Modes • Configuration Mode: • JTAG Bus is used to configure ALICE1 chips • Monitoring of MCM is suspended • Operation Mode: • ALICE1 chips are taking data • JTAG Bus is reconfigured and services only the MCM • MCM is monitored via JTAG Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
Monitoring of ALICE SPD [diagram]: monitoring commands are sent to the half-stave and data (temperatures, currents, voltages, flags) is returned. Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
Configuration of ALICE SPD [diagram]: new configuration data is shifted into the JTAG chain while the current configuration data is shifted out; while CONFIGURING, the output still contains old data. Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
Implementation of SPD FERO Control • An application called "SPD FED Server", written in C++ (Windows XP) • Various tasks implemented as threads: • Configuration • Monitoring • Readout tests, etc. • Operation of the threads is synchronized by the main application • The threads are called AGENTS Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
SPD Agents [diagram: SPD FED Server with Monitoring Agent (MA) and Control Agent (CA)] • The Monitoring Agent continuously reads the MCM parameters • During the execution of a control task the operation of the MA must be suspended • Once the control task has finished, the MA is allowed to resume its operation Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
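The MA/CA interplay can be sketched in C++ with standard threads. This is a minimal illustration only, with invented names (MonitoringAgent, configureFero), not the actual SPD FED Server code; in the real server the control agent would additionally wait until the MA has finished its current read before touching the bus.

```cpp
// Minimal sketch of suspending the Monitoring Agent while a control task runs.
// All names are illustrative; error handling and hardware access are omitted.
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

class MonitoringAgent {
public:
    void run() {                                      // body of the MA thread
        while (!stop_) {
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return !suspended_ || stop_; });
            }
            if (stop_) break;
            readMcmParameters();                      // e.g. T, I, V via JTAG
            std::this_thread::sleep_for(std::chrono::seconds(1));
        }
    }
    void suspend() { std::lock_guard<std::mutex> lk(m_); suspended_ = true; }
    void resume()  { { std::lock_guard<std::mutex> lk(m_); suspended_ = false; } cv_.notify_one(); }
    void stop()    { { std::lock_guard<std::mutex> lk(m_); stop_ = true; } cv_.notify_one(); }

private:
    void readMcmParameters() { /* hardware access would go here */ }
    std::mutex m_;
    std::condition_variable cv_;
    bool suspended_ = true;                           // agents start suspended
    std::atomic<bool> stop_{false};
};

// A control task brackets its hardware access with suspend()/resume(),
// so monitoring never interferes with the JTAG configuration.
void configureFero(MonitoringAgent& ma) {
    ma.suspend();
    /* ... download configuration via JTAG ... */
    ma.resume();
}
```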
Architecture of SPD Control and Monitoring Software [diagram]: the client software (PVSS) contains a DIM client exchanging commands and data with the DIM server embedded in the FED Server. Inside the FED Server, the Control Agents (CA1, CA2) react to commands received by the server, the Monitoring Agents (MA), implemented as separate threads, publish data as DIM services, and a database is attached to the server. Hardware access goes through VISA and the PCI-MXI-VME bridge to the Router and on to the SPD halfstave MCM. Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
Generic architecture of the FED software [diagram]: • the DIM interface (DIM client in the client software (PVSS), DIM server in the FED Server) allows for communication with the higher levels of software – commands flow down, data flows up • the application layer contains the detector control and monitoring code (agents CA1, CA2, MA1, ...) and the database connection • the hardware access layer contains the device driver(s) talking to the hardware Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
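To make the DIM layer concrete, below is a minimal C++ sketch of a FED-style server publishing one service and accepting one command with the CERN DIM library (dis.hxx). The service and command names are placeholders invented for this sketch; they are not an agreed ALICE naming convention, which is one of the open points discussed later.

```cpp
// Minimal DIM server sketch (assumes the CERN DIM C++ library, dis.hxx).
// Service/command names and data layout are placeholders, not a standard.
// Older DIM headers may require char* casts for the string literals.
#include <dis.hxx>
#include <iostream>
#include <unistd.h>

static float temperatures[6];                        // e.g. one value per half-stave (invented)

// Commands arrive as strings and would be dispatched to the control agents.
class FedCommand : public DimCommand {
public:
    FedCommand() : DimCommand("SPD_FED/COMMAND", "C") {}
    void commandHandler() override {
        std::cout << "received command: " << getString() << std::endl;
        // ... hand over to the appropriate control agent here ...
    }
};

int main() {
    FedCommand command;
    // Publish the whole temperature array as a single structured service ("F" = floats).
    DimService tempService("SPD_FED/TEMPERATURES", "F",
                           temperatures, sizeof(temperatures));
    DimServer::start("SPD_FED");                     // register with the DIM name server

    while (true) {
        // ... monitoring agents would refresh 'temperatures' here ...
        tempService.updateService(temperatures, sizeof(temperatures));
        sleep(5);
    }
}
```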
SPD FED Server • The SPD FED Server is the one and only access point to the hardware • The FED server accepts commands and executes them • It arbitrates access to the hardware • It synchronizes the internal execution of the agents • The main (DCS) role of the server is monitoring of sensitive detector parameters • Acquired data is published as DIM services Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
FED Server commands and services • The communication can be divided into standard and sub-detector specific parts • Standard commands include setup and monitoring of the FED server itself, initialization and actions common to all architectures • The SPD specific part includes procedures used only by the SPD Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
DIM Interface of the SPD C/M Server • Recognized commands: • General: Initialize, Re-initialize, Calibrate, Modify running parameters, Publish active settings • SPD specific: Test Readout, Test JTAG, Test SEU, Start/Stop Agents • Published services: • General: Server Status, Agent Status, Messages • Detector data: Temp, I, V, Status and error flags Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
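A possible way to split general and SPD-specific commands inside the command handler is sketched below; the command strings are made up for illustration.

```cpp
// Illustrative dispatch of general vs. SPD-specific commands inside the
// DIM command handler (the command strings are invented for this example).
#include <string>

enum class FedCmd { Initialize, ReInitialize, Calibrate, ModifyParams, PublishSettings,
                    TestReadout, TestJtag, TestSeu, StartAgents, StopAgents, Unknown };

FedCmd parse(const std::string& s) {
    // General commands, common to all architectures:
    if (s == "INITIALIZE")       return FedCmd::Initialize;
    if (s == "RE_INITIALIZE")    return FedCmd::ReInitialize;
    if (s == "CALIBRATE")        return FedCmd::Calibrate;
    if (s == "MODIFY_PARAMS")    return FedCmd::ModifyParams;
    if (s == "PUBLISH_SETTINGS") return FedCmd::PublishSettings;
    // SPD specific commands:
    if (s == "TEST_READOUT")     return FedCmd::TestReadout;
    if (s == "TEST_JTAG")        return FedCmd::TestJtag;
    if (s == "TEST_SEU")         return FedCmd::TestSeu;
    if (s == "START_AGENTS")     return FedCmd::StartAgents;
    if (s == "STOP_AGENTS")      return FedCmd::StopAgents;
    return FedCmd::Unknown;
}
```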
FED Server operation • The server first needs to be configured (standard built-in settings exist). This includes: • setting of the logging method • setting of the readout frequencies for the agents • setting of the deadbands for the agents • On request the agents start their operation. The states of the individual agents are visible to all server components and are used for internal synchronization • Monitoring agents typically read data in predefined intervals (auto-triggering). External triggers are possible Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
SPD Agent States • Executing: the agent accesses the hardware and performs its task (FERO configuration, DCS data readout, etc.); the agent fully relies on the main thread – during its execution all potentially interfering agents must be suspended, and the status of other threads is signalled to individual agents; execution of complex agents can be interrupted by an external command (e.g. a calibration run can be stopped and the FERO reloaded) • Idle/Alive: the agent is ready and waits for a command or auto-trigger; an idle agent sends a heartbeat to the main thread • Suspended: the agents are suspended at the server's startup time (to avoid interference, e.g. during crash recovery when the state of the bus is not known); agents can also be suspended on request (e.g. to disable auto-triggering for a short period); on wakeup the agent automatically resumes its previous task • Transitions: Execute (or auto-trigger), Suspend, Wakeup Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
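As an illustration of the built-in settings and the auto-triggered readout described above, here is a hedged sketch; all field names, defaults and units are invented.

```cpp
// Hypothetical built-in settings and auto-trigger loop of a monitoring agent
// (all names, defaults and units are invented for this sketch).
#include <chrono>
#include <functional>
#include <thread>

struct AgentSettings {
    std::chrono::milliseconds readoutInterval{5000};  // readout frequency per agent
    float deadband = 0.5f;                            // publish only on significant change
    int   logLevel = 1;                               // logging method / verbosity
};

// Auto-triggered monitoring: perform one readout cycle in predefined intervals
// until the agent is suspended; an external trigger could call readOnce() directly.
void autoTriggerLoop(const AgentSettings& cfg,
                     const std::function<void()>& readOnce,
                     const std::function<bool()>& isSuspended) {
    while (!isSuspended()) {
        readOnce();
        std::this_thread::sleep_for(cfg.readoutInterval);
    }
}
```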
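The agent life cycle could be represented by a small state type plus a transition check, as in the illustrative sketch below; the helper and the exact set of allowed transitions are indicative only, not taken from the real server code.

```cpp
// Illustrative representation of the agent states and transitions above.
enum class AgentState { Suspended, Idle, Executing };

// Allowed transitions, following the slide: Execute (or auto-trigger) takes an
// idle agent to Executing, Suspend can be requested at any time, and Wakeup
// lets a suspended agent resume (its previous task may restart immediately).
bool transitionAllowed(AgentState from, AgentState to) {
    if (to == AgentState::Suspended) return true;                              // Suspend
    if (from == AgentState::Suspended) return true;                            // Wakeup, resumes previous task
    if (from == AgentState::Idle && to == AgentState::Executing) return true;  // Execute / auto-trigger
    if (from == AgentState::Executing && to == AgentState::Idle) return true;  // task finished
    return false;
}
```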
FED Server Setup [screenshot: server operation parameters, debugging level control, debugging output control, internal agent status, monitoring settings] • The Server Setup Panel allows tuning of server operation and debugging: • sets monitoring limits and refresh rates • controls the complexity of published messages • controls the publishing of debugging information Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
FED Server Control [screenshot: server commands; server status info: Operational, Initializing, Calibrating, Checking JTAG, ...; agent status info: Suspended, Executing, Idle] • The Server Control Panel allows sending commands to the FED • On receipt of a command the C/M Server: • suspends the monitoring agents (if needed) • performs the requested task • resumes agent operation Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
Messenger Service and Message Viewer [screenshot: example of a PVSSII message viewer (based on a DIM service) showing message severity, time & date, caller and detector] • The Messenger Service provides information about the server's operation • The complexity of published messages can be remotely tuned • Message destinations: • DIM based viewers • logfiles/screen • Windows Event Logger (ActiveX) Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
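One conceivable way for the messenger service to publish severity-tagged messages over DIM is sketched below; the service name and message layout are assumptions, not the actual SPD implementation.

```cpp
// Sketch of a messenger service publishing formatted messages via DIM
// (service name and message layout are assumptions for illustration).
#include <dis.hxx>
#include <cstdio>
#include <ctime>

static char messageBuffer[256];
static DimService messageService("SPD_FED/MESSAGE", messageBuffer);

void publishMessage(const char* severity, const char* caller, const char* text) {
    std::time_t now = std::time(nullptr);
    char stamp[32];
    std::strftime(stamp, sizeof(stamp), "%Y-%m-%d %H:%M:%S", std::localtime(&now));
    // Severity | time & date | detector | caller | message, as shown in the viewer.
    std::snprintf(messageBuffer, sizeof(messageBuffer),
                  "%s | %s | SPD | %s | %s", severity, stamp, caller, text);
    messageService.updateService();                  // DIM pushes the new string to viewers
}
```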
Detector Data Subscriber • PVSSII is the main DCS operation tool • The PVSS client subscribes to data published by the C/M server • Gathered data is treated according to DCS recipes • Data in this panel is updated according to the server settings (frequency, deadbands, etc.) Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
Automatic evaluation of communication status [screenshots of three cases: OK; server operational but no data published; connection to server lost] Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
Optional DIM clients • Although PVSSII is the main DCS operation tool, it is not required to have it running in order to monitor the data provided by the C/M Server • Additional clients such as custom C++ clients, DimTree or DID can connect to the C/M server in parallel to the PVSS client • Uncontrolled operation of custom clients presents a serious risk to the DCS: during running, extra clients will be allowed for monitoring purposes only, and sending commands from them will be strictly forbidden Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
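For completeness, a minimal monitoring-only custom C++ DIM client (dic.hxx) could look like the sketch below; the service name is the same placeholder used earlier, and such a client only listens, never sends commands.

```cpp
// Minimal monitoring-only DIM client sketch (CERN DIM library, dic.hxx).
// The service name is a placeholder; this client only listens, never commands.
#include <dic.hxx>
#include <iostream>
#include <unistd.h>

class TempListener : public DimInfo {
public:
    // -273.0 is the "no link" value delivered when the server is unreachable.
    TempListener() : DimInfo("SPD_FED/TEMPERATURES", -273.0f) {}
    void infoHandler() override {
        float* t = static_cast<float*>(getData());
        int n = getSize() / sizeof(float);
        std::cout << "received " << n << " temperature values, first = "
                  << (n > 0 ? t[0] : -273.0f) << std::endl;
    }
};

int main() {
    TempListener listener;          // DIM delivers updates in its own thread
    while (true) pause();           // keep the process alive
}
```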
Integration of the FED with the DCS • The FED can be described in terms of state machines and integrated with the DCS using the SMI++ interface • The aim is to identify common operational aspects and set them as the ALICE standard • Each detector will use its own additional, detector-specific states Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
Modelling the FED as an FSM: standard part [state diagram: states STANDBY, INITIALIZING, READY, RUNNING; transitions: Initialize (STANDBY → INITIALIZING), INITIALIZING → READY, Re-Initialize (READY → INITIALIZING), Go STDBY (READY → STANDBY), Stop Run (RUNNING → READY)] Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
Modelling the FED as an FSM: sub-detector specific part (the READY state is taken from the previous example) [state diagram: from READY the commands Calibrate, Check SEU, Check JTAG and Check Readout lead to the states CALIBRATING, CHECKING SEU, CHECKING JTAG and CHECKING R/O] Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
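Purely as an illustration, the combined standard and SPD-specific state machine can be written down as a C++ transition table; the real model would be expressed in SMI++, and the command labels below are indicative only (two labels that did not survive in the slide text are marked as assumed).

```cpp
// Indicative C++ version of the FED state machine from the two slides above.
// The real implementation would be an SMI++ object; names are illustrative.
#include <string>

enum class FedState { Standby, Initializing, Ready, Running,
                      Calibrating, CheckingSeu, CheckingJtag, CheckingRO };

// Map a command received in a given state to the next state; unknown
// combinations leave the state unchanged.
FedState nextState(FedState s, const std::string& cmd) {
    if (s == FedState::Standby      && cmd == "INITIALIZE")    return FedState::Initializing;
    if (s == FedState::Initializing && cmd == "DONE")          return FedState::Ready;     // assumed label
    if (s == FedState::Ready        && cmd == "START_RUN")     return FedState::Running;   // assumed label
    if (s == FedState::Running      && cmd == "STOP_RUN")      return FedState::Ready;
    if (s == FedState::Ready        && cmd == "RE_INITIALIZE") return FedState::Initializing;
    if (s == FedState::Ready        && cmd == "GO_STDBY")      return FedState::Standby;
    // SPD specific branches hanging off READY:
    if (s == FedState::Ready        && cmd == "CALIBRATE")     return FedState::Calibrating;
    if (s == FedState::Ready        && cmd == "CHECK_SEU")     return FedState::CheckingSeu;
    if (s == FedState::Ready        && cmd == "CHECK_JTAG")    return FedState::CheckingJtag;
    if (s == FedState::Ready        && cmd == "CHECK_READOUT") return FedState::CheckingRO;
    return s;
}
```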
Applications of the described approach outside the FERO Domain • In principle the described concept can be used to access any equipment requiring remote supervision [diagram: two examples, each using DIM on top of a C/M server with VISA – a stepper motor controller reached over RS-232 (mirrors adjustable by stepper motors) and a non-standard power supply reached over RS-232 or Ethernet] Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
Partial Conclusions • The described model allows for monitoring and control of non-standard devices in ALICE – not only FERO • The FERO is treated as a device. The developed software makes all hardware details transparent to the higher software layers • We need your feedback in order to be able to standardize the commands Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
A working FED server is just the beginning of the story… • Implementation of the FERO software is a very complicated task requiring a lot of expertise – however, it is still the easier part • Tuning of the software will require much more time and experience • The DCS team is happy to discuss and to help, at least by transferring knowledge between different groups (maybe someone has already solved the problem you are dealing with…) Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
Configuration Errors [example plots: wrong configuration data; non-working chip] Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
What is the source of erroneous data? • Wrong configuration record in the database • Wrong configuration version retrieved from the database • Data transmission errors: • interference with DAQ, TRG, crosstalk, … • single event effects, … • Software errors: • interference between monitoring and control • interference between control tasks • interference between several clients – PVSS should be the ONLY client allowed to send commands to the FED Server Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
Communication between online systems and sub-systems • There are many cases where the online systems need to be synchronized – in fact this is probably the most complicated FERO related problem • The procedures depend on the hardware architecture as well as on the detector operational modes • Spotting problems and implementing the correct procedures requires close collaboration between different specialists • The first step is to understand the operation of the sub-detector and then to analyze the consequences for each case Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
… just a few examples • Some operations require additional actions: • A power sequence might temporarily lead to a very high power consumption – software interlock levels must be raised until the situation stabilizes (e.g. until FERO initialization). In addition a RESET must be sent to the FERO and some dummy configuration loaded, so an action on the LV triggers an action on the FERO as well • Additional actions from other systems might be necessary: • (Re-)Initialization might require stopping the trigger (downloading of data sometimes introduces unwanted noise, and a running DAQ sequence might corrupt the data currently being downloaded) Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
… just a few examples • A typical initialization sequence is not a single-step procedure and can even be split between several systems: • the DCS must assure that power and cooling are present • the DCS related circuitry is loaded and checked • only then can the DAQ proceed with the configuration • additional adjustments and checks performed by the DCS might be needed Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
… a few more examples • The DCS can detect and correct problems which are not immediately visible to the other online systems • Problems in a DCS sub-system can affect the other online systems • In some cases the DAQ and TRG will be asked to perform additional actions On the following slide we use a generic "CLASS A" front-end to demonstrate the problem Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
Typical problem requiring synchronization between online systems [diagram]: a VR failure (e.g. due to an SEU) causes the FERO to lose its configuration; the DCS performs the recovery action on the VR and informs the DAQ and TRG via the ECS; the DAQ then reloads the FERO Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
Propagation of hardware related problems • Problems can propagate between sub-systems and affect different parts of the detector: • e.g. recovery of a VR failure could lead to corruption of the configuration in a neighboring sector (spikes, etc.) • Such problems can remain hidden (e.g. a change in threshold settings) and would be discovered only by offline analysis • If we are aware of the potential problems, we could perhaps use a simple procedure to correct them before they badly affect the data Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
Things to be understood • In principle every single chip can be accessed individually. Allowing for such access would, however, lead to complications for some architectures • Example: rather than creating a service for each individual temperature sensor we could create groups for sub-detector modules and publish all the data together • This requires balancing the number of services against update rates (we should not update a service with 1000 values if only one value changes, but we also do not want to have 1000 individual services instead) • Partitioning of the sub-detectors should be done very carefully (taking into account both control and monitoring aspects) Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
A few more things to be understood • What should the structure of the DIM commands be? • The command structure should include the target and the data to be transferred • Should we define standards for the payload or only set weak rules? The segmentation and naming differ between sub-detectors; a common standard would not be intuitive and would therefore lead to errors • It is essential to create a naming convention for each detector and to understand its segmentation in terms of control, DAQ readout and monitoring Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
Still some question marks • What is the amount of data to be exchanged between DCS and FERO, and at what frequencies? • What is the structure of this data? • Example: temperatures for a given sector can be sent as an array containing values for each probe. Another approach would be to send a data structure containing only the values which have changed and to decode this information in PVSSII Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
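The two payload options mentioned above could look as follows in C++; the structure layouts and sizes are invented for illustration.

```cpp
// Two possible payload layouts for DCS monitoring data (both invented
// for illustration): a full per-probe array vs. a changed-values list.
#include <cstdint>
#include <vector>

// Option 1: always send the full array, one value per temperature probe.
struct SectorTemperaturesFull {
    float probe[60];                 // e.g. 6 half-staves x 10 probes (made-up numbers)
};

// Option 2: send only the probes whose value changed, plus their indices;
// the PVSSII client decodes the indices and updates only those datapoints.
struct ChangedValue {
    std::uint16_t probeIndex;
    float value;
};
using SectorTemperaturesDelta = std::vector<ChangedValue>;  // serialized before sending
```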
…yet another one • What should the granularity of the agents be? • Should we set common deadbands for all values acquired by an individual agent? (e.g. update the temperature readings if any of them changes by more than 0.5 °C) – PVSSII will of course provide another level of filtering of these values • Or should the deadbands be set more precisely? (e.g. allowing for masking of a faulty sensor at the level of the FED server) • All answers depend on the individual architectures Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
How to proceed? • A working and stable FED server is the starting point. It can be developed in advance – using emulators in the hardware access layer • Once the hardware arrives, one can fully concentrate on the operational aspects. New procedures will be discovered with time and can be implemented in the software • Communication with the other sub-detectors is essential. The DCS team is happy to assist you on this point. Peter Chochula – ALICE DCS DCS Workshop – March 2004, Geneva
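A per-channel deadband filter with sensor masking at the FED server level could be sketched as follows; thresholds and names are illustrative only.

```cpp
// Illustrative per-channel deadband filter with sensor masking,
// applied before an agent publishes its values (names and values invented).
#include <cmath>
#include <vector>

struct Channel {
    float lastPublished = 0.0f;
    float deadband      = 0.5f;   // e.g. 0.5 C for a temperature probe
    bool  masked        = false;  // faulty sensor excluded at FED server level
};

// Returns true if at least one unmasked channel moved outside its deadband,
// i.e. the service should be updated (PVSSII may filter further downstream).
bool shouldPublish(std::vector<Channel>& channels, const std::vector<float>& readings) {
    bool publish = false;
    for (std::size_t i = 0; i < channels.size() && i < readings.size(); ++i) {
        Channel& c = channels[i];
        if (c.masked) continue;
        if (std::fabs(readings[i] - c.lastPublished) > c.deadband) {
            c.lastPublished = readings[i];
            publish = true;
        }
    }
    return publish;
}
```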