Complexity Management Solutions for High Energy Physics Control Systems: The CMS experiment. Ildefons Magrans de Abril, CMS Trigger Software Technical Coordinator, CERN, Geneva. Zurich (IBM Research Laboratory), 23rd January 2008.
Outline 1 CERN and the LHC 2 The CMS experiment 3 Enhancing complexity management with web services 3.1 Software environment model: XSEQ 3.2 Concrete architecture: The CMS Trigger Supervisor
CERN, European Organization for Nuclear Research Large Hadron Collider CERN provides research facilities to particle physicists worldwide
Large Hadron Collider (LHC)
• Largest superconducting installation:
  • 27 km ring
  • 3 billion euros
• CMS and ATLAS detect collision information (events):
  • 40 million events/second
Compact Muon Solenoid (CMS)
• Numeric complexity:
  • 21.6 m long, 15 m diameter, 12,500 tonnes
  • 4 Tesla solenoid (100,000 times the Earth's magnetic field)
  • 1200 m³/hour of water for cooling (~Geneva Jet d'Eau: 1800)
  • 10 MW required for operation (~10,000 houses)
• Human complexity:
  • 39 countries
  • 182 institutes (CERN is 1)
  • 3330 people, ~800 students!
• Time complexity:
  • Design started 20 years ago!
  • 7-8 years for construction
  • 15 years of expected operational lifetime
  • Already developing upgrades
The CMS “sensor”
• Electromagnetic Calorimeter: measures the energy of particles interacting electromagnetically
• Hadronic Calorimeter: measures the energy of particles interacting via the strong nuclear force (heavy neutral particles)
• Silicon Tracker: finds charged particle tracks and momentum
• Muon detector: finds muon tracks
• [Figure label: Particle physicist]
The CMS Trigger and Data Acquisition System
• Solution based on two filter levels:
  • Level-1 Trigger (HW)
  • High Level Trigger (SW)
• 40 million events/second
• ~55 million channels
• ~1 Mbyte per event
• We can store only 100 events per second
• Control system coordinates experiment operation
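The figures on this slide fix how much the two trigger levels must reject between them. A quick back-of-the-envelope check, using only the numbers quoted above and assuming a decimal megabyte:

```python
# Back-of-the-envelope check of the rates quoted on the slide.
EVENT_RATE_HZ = 40_000_000      # 40 million events/second at the detector
EVENT_SIZE_BYTES = 1_000_000    # ~1 Mbyte per event (assumed decimal MB)
STORAGE_RATE_HZ = 100           # events/second that can be stored

raw_bandwidth = EVENT_RATE_HZ * EVENT_SIZE_BYTES        # bytes/second before filtering
stored_bandwidth = STORAGE_RATE_HZ * EVENT_SIZE_BYTES   # bytes/second written out
rejection_factor = EVENT_RATE_HZ // STORAGE_RATE_HZ     # events seen per event kept

print(f"raw: {raw_bandwidth / 1e12:.0f} TB/s")
print(f"stored: {stored_bandwidth / 1e6:.0f} MB/s")
print(f"keep 1 event in {rejection_factor:,}")
```

Under these assumptions the Level-1 and High Level triggers together have to keep roughly one event in 400,000.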
About this talk L1 Decision Loop and detector front-ends. HARDWARE CMS Control System. SOFTWARE
Context complexity
• Human & political dimension:
  • Large number of independent research institutes with similar requirements using different technologies (e.g. FPGA vs ASIC, VME vs PCI vs tiny …)
  • Most people are particle physicists with only a few % of their time dedicated to SW development. ~20% students
• Numeric dimension:
  • Thousands of hardware modules and the same order of electronic links
• Time dimension:
  • L1 Trigger development started in 2000
  • L1 Trigger design for the SLHC has already started!
  • CERN Linux platform upgrades every 2 years
  • → Periodic software & hardware upgrades
XSEQ: A software environment model
[Diagram labels: Devices; ControlSequence (XSEQ), DeviceDescription and DeviceData (XML); Interpreter: processes platform-independent control sequences]
1. XML as uniform data representation format for both data and code
  • Long-term technological investment (XML is here to stay)
  • Maximize usage of standard tools
  • Simplify software configuration management
2. Interpreted approach for the code
  • Execute code independently of the platform
XSEQ example 1: Hello world
• XSEQ language (XML-based sequencer):
  • Syntax specified in XSD documents + extensions: file system, SOAP, DOM, HW access (PCI & VME)
  • Exception handling with error recovery mechanism
  • Other: object-oriented and design-by-contract extensions
• Code callouts: XSEQ syntax core definition; every tag is a function; exception handling
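The idea that every tag is a function can be illustrated with a toy interpreter that dispatches on the element name. This is only a sketch of the principle: the tag names (`seq`, `set`, `str`, `var`, `print`) are hypothetical and are not the real XSEQ vocabulary, which is not reproduced in this transcript.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag set, for illustration only; not the real XSEQ vocabulary.
def eval_node(node, env):
    """Evaluate one XML element by calling the function registered for its tag."""
    return TAGS[node.tag](node, env)

def tag_seq(node, env):
    result = None
    for child in node:                  # run child elements in document order
        result = eval_node(child, env)
    return result

def tag_set(node, env):
    env[node.get("name")] = eval_node(node[0], env)
    return env[node.get("name")]

def tag_str(node, env):
    return node.text or ""

def tag_var(node, env):
    return env[node.text]

def tag_print(node, env):
    value = eval_node(node[0], env)
    env.setdefault("_out", []).append(value)   # collected here rather than written to stdout
    return value

TAGS = {"seq": tag_seq, "set": tag_set, "str": tag_str, "var": tag_var, "print": tag_print}

PROGRAM = """
<seq>
  <set name="greeting"><str>Hello world</str></set>
  <print><var>greeting</var></print>
</seq>
"""

env = {}
result = eval_node(ET.fromstring(PROGRAM), env)
print(result)
```

Adding a new tag means adding one entry to the dispatch table, which is roughly what an extension mechanism needs to do.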
XSEQ example 2: hardware access
• Decoupling syntax and semantics enhances code sharing between sub-systems with similar requirements
• Common tools for processing code and data. Simplifies core development
• Code callouts: <extend> extends the interpreter in order to execute a new syntactic extension; <variable> declares a scoped variable, not accessible in upper hierarchical levels
Device specifications:
  <PciAddressTable …>
    <Item name="MMU" address="04040404" …/>
    <Item name="STAT" address="10100" …/>
    …
  </PciAddressTable>
XSEQ program:
  <?xml version="1.0" encoding="UTF-8"?>
  <xseq xmlns="http://xdaq.cern.ch/xseq/basic" xmlns:hwa="http://xdaq.cern.ch/xseq/hwaext" …>
    <extend ns="http://xdaq.cern.ch/xseq/hwaext" url="http://xdaq.cern.ch/xseq" module="halx86"/>
    <variable name="my_device">
      <hwa:pcidevice>
        <url>http://xseq.cern.ch/register_table.xml</url>
        <hwa:busadapter>PCIi386BusAdapter</hwa:busadapter>
        <hwa:vendorid format="hex">ecd6</hwa:vendorid>
        <hwa:deviceid format="hex">fd05</hwa:deviceid>
        <hwa:index format="dec">0</hwa:index>
      </hwa:pcidevice>
    </variable>
    <out>
      <var>my_device</var>
      <hwa:item>CTRL</hwa:item>
      <var>my_data</var>
    </out>
  </xseq>
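The role of the address table can be sketched in a few lines: item names are resolved to addresses, so the control sequence never hard-codes an address. The sketch below assumes the addresses are hexadecimal and uses a hypothetical `FakeDevice` class backed by a dictionary; it performs no real PCI access and is not the XSEQ hardware extension itself.

```python
import xml.etree.ElementTree as ET

# Address table modelled on the slide's fragment; the elided attributes are left out.
ADDRESS_TABLE = """
<PciAddressTable>
  <Item name="MMU" address="04040404"/>
  <Item name="STAT" address="10100"/>
</PciAddressTable>
"""

def load_address_table(xml_text):
    """Map each item name to its address, assuming addresses are hexadecimal."""
    root = ET.fromstring(xml_text)
    return {item.get("name"): int(item.get("address"), 16) for item in root.iter("Item")}

class FakeDevice:
    """Stand-in for a PCI device: a dict of address -> value, no real bus access."""
    def __init__(self, table):
        self.table = table
        self.memory = {}

    def write(self, item, value):
        self.memory[self.table[item]] = value

    def read(self, item):
        return self.memory.get(self.table[item], 0)

device = FakeDevice(load_address_table(ADDRESS_TABLE))
device.write("STAT", 0x1)
print(hex(device.table["STAT"]), device.read("STAT"))
```

Swapping the address table for another board's table leaves the calling code unchanged, which is the code-sharing benefit the slide points at.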
Online software integration
• XDAQ framework: CMS in-house developed C++ middleware
• SOAP message specifies the URL of the XSEQ program or embeds the program itself
• The running XSEQ program can access the original SOAP message and retrieve parameters
• Return message generated by the XSEQ program
• [Diagram labels: XDAQ executive (one per host computer); XDAQ application; Peer transport (SOAP); Interpreter plug-in; XSEQ program (URL)]
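The two ways of handing a program to the interpreter (by URL or embedded in the message) could look roughly like this. The envelope namespace is the standard SOAP 1.1 one; the `<run>` element, its `url` attribute and the example URL are hypothetical placeholders, not the actual XDAQ or XSEQ message format.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 envelope namespace

def build_request(program_url=None, program_xml=None):
    """Build a SOAP envelope that references a program by URL or embeds it.

    The <run> element and its 'url' attribute are hypothetical names.
    """
    if (program_url is None) == (program_xml is None):
        raise ValueError("give exactly one of program_url or program_xml")
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    run = ET.SubElement(body, "run")
    if program_url is not None:
        run.set("url", program_url)                 # program fetched by the receiver
    else:
        run.append(ET.fromstring(program_xml))      # program travels inside the message
    return ET.tostring(envelope, encoding="unicode")

by_reference = build_request(program_url="http://example.org/hello.xml")
embedded = build_request(program_xml="<xseq><out>hello</out></xseq>")
print(by_reference)
print(embedded)
```

Referencing by URL keeps messages small, while embedding makes the request self-contained; the slide indicates both are supported.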
XSEQ example 3: distributed system
[Diagram labels: Client (CERN, Geneva): standalone interpreter; Remote server (HEPHY, Vienna): Global Trigger server, XDAQ executive, SOAP pt, interpreter plug-in; SOAP message sent by client; SOAP message returned to the client; SOAP extension; bus protocol]
XSEQ program (receives the SOAP message as rpc_msg):
  <xseq … rpc_msg="msg">
    <variable name="soapPart">
      <xoap:getSOAPPart>
        <var>msg</var>
      </xoap:getSOAPPart>
    </variable>
    …
    <gt:configure>
      <var>board</var>
      <var>fname</var>
      <var>chip</var>
    </gt:configure>
    …
    <return>
      <var>my_msg</var>
    </return>
  </xseq>
XSEQ program (error handling with secure/rescue/retry):
  <xseq …>
    …
    <secure>
      <while>
        …
        <switch>
          <case value="1">
            <call url="http://…script1.xml"/>
          </case>
          …
        </switch>
      </while>
      <rescue>
        …
        <retry/>
      </rescue>
    </secure>
  </xseq>
Device specifications:
  <VmeAddressTable …>
    <Item name="CTRL" address="10000" …/>
    <Item name="STAT" address="10100" …/>
    …
  </VmeAddressTable>
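The `<secure>`/`<rescue>`/`<retry/>` construct resembles the rescue-and-retry pattern known from design-by-contract languages. A minimal sketch of that pattern follows; the exact XSEQ semantics are an assumption here, and the `max_attempts` guard and all function names are hypothetical additions for illustration.

```python
class Retry(Exception):
    """Raised by a rescue handler to ask for the secured body to run again."""

def secure(body, rescue, max_attempts=3):
    """Run body(); on failure call rescue(error) and rerun body if rescue raises Retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return body()
        except Exception as error:
            try:
                rescue(error)
            except Retry:
                if attempt == max_attempts:
                    raise error          # give up after the last allowed attempt
                continue
            raise                        # rescue did not ask for a retry: propagate

calls = {"n": 0}

def flaky_configure():
    """Hypothetical operation that fails twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("board not ready")
    return "configured"

def rescue(error):
    raise Retry()                        # always ask for another attempt

result = secure(flaky_configure, rescue)
print(result, calls["n"])
```

Bounding the number of attempts is a design choice of this sketch, made so that a permanently failing board cannot loop forever.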
XSEQ conclusions
• “Good”:
  • Suitable technological investment (XML is here to stay)
  • Reduces in-house development (large asset of standard tools)
  • Enhances code sharing among sub-systems (extension mechanism)
  • Eases platform evolution (interpreted approach)
  • Simplifies software configuration management (uniform usage of XML for code/data)
• “Bad”:
  • XML is verbose (programming with XSEQ is not fun), but:
    • An XML editor could help
    • XSEQ could serve as the underlying syntax to store virtual instrumentation developed with graphical tools like LabVIEW
  • Just a prototype. It is not being used for production
CMS Trigger Supervisor Context CMS Control System. SOFTWARE L1 Decision Loop. HARDWARE ~55 Million Channels, ~1 Mbyte per event
HW context: L1-Trigger Decision Loop
• L1 decision loop operation ~ “business”
• Configuration:
  • 64 crates
  • O(10^3) boards
  • Firmware ~15 MB/board
  • O(10^2) regs/board
• Testing:
  • O(10^3) links
• Integration coordination:
  • 27 research institutes
• Time:
  • Research: 1992-2000
  • Development: 2000-present
  • Fully replaced by 2010
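To get a feel for the configuration volume these figures imply, here is a rough estimate. It treats the order-of-magnitude values as exactly 1000 boards and 100 registers per board, which is an assumption made only for the arithmetic.

```python
# Order-of-magnitude estimate; O(10^3) and O(10^2) are taken as exactly 1000 and 100.
BOARDS = 10**3
FIRMWARE_MB_PER_BOARD = 15
REGISTERS_PER_BOARD = 10**2

total_firmware_gb = BOARDS * FIRMWARE_MB_PER_BOARD / 1000   # decimal GB
total_registers = BOARDS * REGISTERS_PER_BOARD

print(f"firmware to load: ~{total_firmware_gb:.0f} GB")
print(f"registers to configure: ~{total_registers:,}")
```

So a full configuration is on the order of 15 GB of firmware and 100,000 register settings.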
SW context: Experiment control system
• Experiment control system ~ “business” IT infrastructure
• L1-Trigger Control and Hardware Monitoring System: provides machine and human interfaces to operate, test and monitor the Level-1 decision loop hardware components
• Run Control and Monitoring System (RCMS):
  • Overall experiment control and monitoring
  • RCMS framework implemented in Java (8)
• Cross-platform Data AcQuisition middleware (XDAQ):
  • C++ component-based distributed programming framework
  • Used to implement the distributed event builder
• Detector Control System (DCS):
  • Detector safety, gas and fluid control, cooling system, rack and crate control, high and low voltage control, and detector calibration
  • DCS is implemented with PVSSII
Project phases and terminology New “business capabilities”: e.g. configuration Services Architecture Proof of concept System Framework Services and core developments Prototype • Business needs • Project team Concept SW Context HW Context “Business” software infrastructure “Business”: To filter the “best” events
Business needs and project team
• Business need: coordinate operation of CMS subsystems (e.g. configuration and test)
• TS team (2 + 1 or 2 students):
  • Services + core developments
  • Architecture
  • Business capabilities
  • Sub-system developers coordination & support
  • Communication
• 1 developer per subsystem:
  • Uses services to develop the subsystem architecture
  • Customizes subsystem architecture as required by TS team
• [Diagram labels: Trigger Supervisor GUI; Experiment control system; Trigger Supervisor; pattern comp.; G. Cal. Trigger; R. Cal. Trigger; DT TF; CSC TF; G. Muon Trigger; GT/TCS; HCAL energy; ECAL energy; HF energy; CSC hits; DT hits; RPC hits; multiplicities 0..n and 1]
Baseline service infrastructure
• CMS official software frameworks to develop distributed systems: DCS, RCMS, XDAQ
• Subsystems' online software infrastructure needs to be integrated
• Infrastructure should be oriented to developing SCADA systems
• XDAQ-based baseline solution + additional development to reach a SCADA framework
Core development: The Cell
• Cell plug-ins (FSM, commands, control panels) hide HW and SW platform evolution
• HTTP/CGI: automatically generated
• Control panel plug-ins: e.g. DTTF panel, e.g. GT panel
• FSM plug-ins: e.g. Cell FSM operation
• Synchronous and asynchronous SOAP API
• Xhannel infrastructure:
  • Designed to simplify access to web services (SOAP and HTTP/CGI) from operation transition methods
  • Tstore (DB)
  • Monitor collector
  • Cells
• Other plug-ins:
  • Command: RPC method. SOAP API extensions
  • Monitoring items
Service providers: building blocks
• Architecture based uniquely on these components:
  • Tstore: DB interface. Exposes SOAP. 1 per system.
  • Monitor sensor: Cell interface to poll monitoring information. 1 per cell.
  • Mon. Collector: Polls all cell sensors. 1 per system.
  • Mstore: Interfaces the Mon. Collector with Tstore. 1 per system.
  • XS: Reads the logging database. 1 per cell.
  • Job control: Remote startup of XDAQ applications. 1 per host.
  • Cell: Facilitates subsystem integration and operation (additional development, next slide). 1 per crate.
  • Log Collector: Collects log statements from cells and forwards them to consumers. 1 per system.
• [Diagram label: RCMS components]
Architecture
[Diagram labels: Building blocks; Usage model proposal (user's guide, workshops, support); Subsystem; Control system; Monitoring system; Logging system; Start-up system]
Control architecture
• Stable infrastructure on top of which new “business” capabilities can be defined
• Hierarchical control system
• 1 crate ~ 1 cell
• Multicrate subsystems ~ 2 levels of subsystem cells (1 subsystem central cell)
• Centralized access to DBs
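The hierarchy described here (a central node, one cell per crate, and a subsystem central cell for multicrate subsystems) can be sketched as a tree in which each node forwards commands to its children. The `Cell` class below and all cell names are hypothetical; this is not the Trigger Supervisor API.

```python
class Cell:
    """Hypothetical control node: handles a command locally, then forwards it to its children."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def execute(self, command):
        """Return a trace of every cell that handled the command, parents first."""
        handled = [f"{self.name}:{command}"]
        for child in self.children:
            handled.extend(child.execute(command))
        return handled

# A multicrate subsystem: one subsystem central cell above one cell per crate.
subsystem = Cell("subsystem-central", [Cell("crate-1"), Cell("crate-2")])
# A single-crate subsystem needs only one cell.
central = Cell("central", [subsystem, Cell("single-crate-subsystem")])

trace = central.execute("configure")
print(trace)
```

A command issued at the top reaches every crate cell without the caller knowing how many crates each subsystem has.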
Monitoring architecture
• Infrastructure that facilitates hardware monitoring
• Centralized system: 1 Collector, 1 Mstore
• 1 cell ~ 1 sensor
• Centralized access to DBs
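The monitoring layout (one sensor per cell, one collector and one store for the whole system) amounts to a polling loop. In the sketch below, the `Sensor` and `Collector` classes, the item name `temperature` and its values are all hypothetical; a plain list stands in for Mstore.

```python
class Sensor:
    """Hypothetical per-cell sensor exposing the cell's current monitoring items."""
    def __init__(self, cell_name, read_items):
        self.cell_name = cell_name
        self.read_items = read_items      # callable returning {item_name: value}

    def poll(self):
        return self.read_items()

class Collector:
    """Single system-wide collector that polls every sensor and hands results to a store."""
    def __init__(self, sensors, store):
        self.sensors = sensors
        self.store = store                # stand-in for Mstore: any list-like sink

    def poll_all(self):
        for sensor in self.sensors:
            for item, value in sensor.poll().items():
                self.store.append((sensor.cell_name, item, value))

store = []
collector = Collector(
    [Sensor("crate-1", lambda: {"temperature": 31}),
     Sensor("crate-2", lambda: {"temperature": 29})],
    store,
)
collector.poll_all()
print(store)
```

Because only the collector talks to the store, database access stays centralized, as the slide requires.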
Logging and start-up architecture
• Auxiliary infrastructure
• Centralized system: 1 Collector
• 1 cell ~ 1 XS
• 1 host ~ 1 JC
New business capabilities: “How to”
• New “business” capabilities can be coordinated by particle physicist managers without SW expertise
• [Diagram labels: Particle physicist manager; Subsystem SW developer; Entry cell; Operation states (S1, S2, S3, S4); Operation transitions (e12(), e23(), e34(), e43()); Operation transition methods; Service test]
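An operation of this kind is a finite state machine in which each transition runs a method supplied by the subsystem developer. The sketch below reuses the state and transition names from the diagram (S1 to S4, e12, e23, e34, e43); the `Operation` class and the logged method bodies are hypothetical.

```python
class Operation:
    """Finite state machine whose transitions each run a developer-supplied method."""
    def __init__(self, initial, transitions):
        self.state = initial
        self.transitions = transitions    # {event: (source, target, method)}

    def fire(self, event):
        source, target, method = self.transitions[event]
        if self.state != source:
            raise RuntimeError(f"{event} not allowed in state {self.state}")
        method()                          # the state changes only if the method succeeds
        self.state = target

log = []
operation = Operation("S1", {
    "e12": ("S1", "S2", lambda: log.append("e12")),
    "e23": ("S2", "S3", lambda: log.append("e23")),
    "e34": ("S3", "S4", lambda: log.append("e34")),
    "e43": ("S4", "S3", lambda: log.append("e43")),
})

for event in ["e12", "e23", "e34", "e43"]:
    operation.fire(event)
print(operation.state, log)
```

The manager only has to agree on the states and transitions; what each transition method does is left to the subsystem developer.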
Trigger Supervisor conclusions
Design: services, architecture and “business” capabilities
1 Services:
  • Reduced number of building blocks, already developed in-house (except the Cell)
  • Main building block: Cell
    • Isolates hardware/software evolution from architecture implementation
    • Adapts sub-system integration tasks to the academic background of the people involved (non-SW experts)
2 Architecture:
  • Based uniquely on 7 building blocks
  • Simplifies sub-system integration coordination
  • Stable infrastructure
  • Isolates services evolution from the implementation of business capabilities
3 New “business” capabilities:
  • Coordination methodology associated with the architecture
  • Facilitates the implementation of new “business capabilities”, taking into account the academic background of managers (non-SW experts)
Summary
• Enhancing control systems design & development with web-services technologies:
  • XML-based programming language:
    • Maximizes usage of existing XML standards and tools, good tech. investment, max. code sharing and platf. evolution
  • Control system design example:
    • Services: hide HW/SW evolution
    • Architecture: hides services evolution, stable infrastructure
    • Business capabilities: developed on top of the architecture
Thank you very much! … For more information: Ildefons.magrans@cern.ch http://triggersupervisor.cern.ch