
Complexity Management Solutions for High Energy Physics Control Systems: The CMS experiment

Ildefons Magrans de Abril, CMS Trigger Software Technical Coordinator, CERN, Geneva. Zurich (IBM Research Laboratory), 23rd January 2008.


Presentation Transcript


  1. Complexity Management Solutions for High Energy Physics Control Systems: The CMS experiment. Ildefons Magrans de Abril, CMS Trigger Software Technical Coordinator, CERN, Geneva. Zurich (IBM Research Laboratory), 23rd January 2008

  2. Outline 1 CERN and the LHC 2 The CMS experiment 3 Enhancing complexity management with web services 3.1 Software environment model: XSEQ 3.2 Concrete architecture: The CMS Trigger Supervisor

  3. Outline 1 CERN and the LHC 2 The CMS experiment 3 Enhancing complexity management with web services 3.1 Software environment model: XSEQ 3.2 Concrete architecture: The CMS Trigger Supervisor

  4. CERN, European Organization for Nuclear Research • Large Hadron Collider • CERN provides research facilities to particle physicists worldwide

  5. Large Hadron Collider (LHC) • Largest superconducting installation: • 27 km ring • 3 billion euros • CMS and ATLAS detect collision information (events): • 40 million events/second

  6. Outline 1 CERN and the LHC 2 The CMS experiment 3 Enhancing complexity management with web services 3.1 Software environment model: XSEQ 3.2 Concrete architecture: The CMS Trigger Supervisor

  7. Compact Muon Solenoid (CMS) • Numeric complexity: 21.6 m long; 15 m diameter; 12,500 tonnes; 4 Tesla solenoid (100,000 times the Earth's magnetic field); 1200 m³/hour of water for cooling (~Geneva Jet d’Eau: 1800); 10 MW required for operation (~10,000 houses) • Human complexity: 39 countries; 182 institutes (CERN is 1); 3330 people; ~800 students! • Time complexity: design started 20 years ago! 7-8 years for construction; 15 years of expected operational lifetime; already developing upgrades

  8. The CMS “sensor” • Electromagnetic Calorimeter: measures the energy of particles interacting electromagnetically • Hadronic Calorimeter: measures the energy of particles interacting via the strong nuclear force (heavy neutral particles) • Silicon Tracker: finds charged particle tracks and momentum • Muon detector: finds muon tracks

  9. The CMS Trigger and Data Acquisition System • Solution based on two filter levels: Level-1 Trigger (HW) and High Level Trigger (SW) • 40 million events/second • ~55 million channels • ~1 Mbyte per event • We can store only 100 events per second • The control system coordinates experiment operation
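The figures on this slide imply a large gap between what the detector produces and what can be stored. The sketch below just works through that arithmetic; it assumes "1 Mbyte" means 10^6 bytes and that every event has the same size, which the slide does not state.

```python
# Figures quoted on the slide.
COLLISION_RATE_HZ = 40_000_000   # events per second seen by the detector
EVENT_SIZE_BYTES = 1_000_000     # ~1 Mbyte per event (assumed to be 10^6 bytes)
STORAGE_RATE_HZ = 100            # events per second that can be stored


def data_rate_bytes_per_s(rate_hz: int, event_size: int) -> int:
    """Data volume per second for a given event rate and event size."""
    return rate_hz * event_size


def rejection_factor(input_rate_hz: int, output_rate_hz: int) -> float:
    """How many input events there are for each event that is kept."""
    return input_rate_hz / output_rate_hz


raw = data_rate_bytes_per_s(COLLISION_RATE_HZ, EVENT_SIZE_BYTES)
stored = data_rate_bytes_per_s(STORAGE_RATE_HZ, EVENT_SIZE_BYTES)
factor = rejection_factor(COLLISION_RATE_HZ, STORAGE_RATE_HZ)

print(f"raw: {raw / 1e12:.0f} TB/s, stored: {stored / 1e6:.0f} MB/s, "
      f"keep 1 event in {factor:,.0f}")
```

Under these assumptions the two trigger levels together have to discard all but one event in 400,000.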

  10. About this talk • L1 Decision Loop and detector front-ends: HARDWARE • CMS Control System: SOFTWARE

  11. Outline 1 CERN and the LHC 2 The CMS experiment 3 Enhancing complexity management with web services 3.1 Software environment model: XSEQ 3.2 Concrete architecture: The CMS Trigger Supervisor

  12. Context complexity • Human & political dimension: • Large number of independent research institutes with similar requirements using different technologies (e.g. FPGA vs ASIC, VME vs PCI vs tiny …) • Most people are particle physicists with only a few % of their time dedicated to SW development. ~20% students • Numeric dimension: • Thousands of hardware modules and the same order of electronic links • Time dimension: • L1 Trigger development started in 2000 • L1 Trigger design for the SLHC has already started! • CERN Linux platform upgrades every 2 years • → Periodic software & hardware upgrades

  13. Outline 1 CERN and the LHC 2 The CMS experiment 3 Enhancing complexity management with web services 3.1 Software environment model: XSEQ 3.2 Concrete architecture: The CMS Trigger Supervisor

  14. XSEQ: A software environment model • Diagram labels: Devices; ControlSequence (XSEQ); XML; DeviceDescription; DeviceData; Interpreter (processes platform-independent control sequences) • 1. XML as uniform representation format for both data and code • Long-term technological investment (XML is here to stay) • Maximize usage of standard tools • Simplify software configuration management • 2. Interpreted approach for the code • Execute code independently of the platform

  15. XSEQ example 1: Hello world • XSEQ language (XML-based sequencer): • Syntax specified in XSD documents + extensions: file system, SOAP, DOM, HW access (PCI & VME) • Exception handling with error recovery mechanism • Other: object-oriented and design-by-contract extensions • Slide annotations: XSEQ syntax core definition; every tag is a function; exception handling
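The "every tag is a function" idea can be illustrated with a tiny interpreter. This is a Python sketch of the general technique, not the XSEQ implementation: the tag names (`seq`, `print`, `add`, `num`) and the handler table are hypothetical and are not taken from the XSEQ schema.

```python
import xml.etree.ElementTree as ET


def run(node, handlers, output):
    """Evaluate one element by calling the handler registered for its tag."""
    handler = handlers.get(node.tag)
    if handler is None:
        raise ValueError(f"unknown tag: {node.tag}")
    return handler(node, handlers, output)


def h_seq(node, handlers, output):
    # Evaluate children in document order; return the last result.
    result = None
    for child in node:
        result = run(child, handlers, output)
    return result


def h_num(node, handlers, output):
    return int(node.text)


def h_add(node, handlers, output):
    return sum(run(child, handlers, output) for child in node)


def h_print(node, handlers, output):
    # With a child element: print its value. Without: print the text content.
    value = run(node[0], handlers, output) if len(node) else (node.text or "")
    output.append(str(value))
    return value


# Hypothetical tag set; registering a new handler plays the role of an extension.
HANDLERS = {"seq": h_seq, "num": h_num, "add": h_add, "print": h_print}

PROGRAM = """
<seq>
  <print>Hello world</print>
  <print><add><num>2</num><num>3</num></add></print>
</seq>
"""

output = []
run(ET.fromstring(PROGRAM), HANDLERS, output)
print(output)
```

Because the program is ordinary XML, it can be parsed, validated and edited with standard XML tools, and the same text runs wherever the interpreter runs. Those are the two properties the slides emphasise.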

  16. XSEQ example 2: hardware access • Decoupling syntax and semantics enhances code sharing between sub-systems with similar requirements • Extends the interpreter in order to execute a new syntactic extension • Common tools for processing code and data; simplifies core development • Scoped variable: not accessible in upper hierarchical levels • Device specifications: <PciAddressTable …> <Item name="MMU" address="04040404" …/> <Item name="STAT" address="10100" …/> … </PciAddressTable> • XSEQ program: <?xml version="1.0" encoding="UTF-8"?> <xseq xmlns="http://xdaq.cern.ch/xseq/basic" xmlns:hwa="http://xdaq.cern.ch/xseq/hwaext" …> <extend ns="http://xdaq.cern.ch/xseq/hwaext" url="http://xdaq.cern.ch/xseq" module="halx86"/> <variable name="my_device"> <hwa:pcidevice> <url>http://xseq.cern.ch/register_table.xml</url> <hwa:busadapter>PCIi386BusAdapter</hwa:busadapter> <hwa:vendorid format="hex">ecd6</hwa:vendorid> <hwa:deviceid format="hex">fd05</hwa:deviceid> <hwa:index format="dec">0</hwa:index> </hwa:pcidevice> </variable> <out> <var>my_device</var> <hwa:item>CTRL</hwa:item> <var>my_data</var> </out> </xseq>
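The address table lets a program refer to registers by name instead of by raw address. The Python sketch below shows that lookup against an in-memory stand-in for a device. It assumes the `address` attributes are hexadecimal and uses a table with `CTRL` and `STAT` items; `FakeDevice` and `load_address_table` are hypothetical helpers, not part of XSEQ or of any hardware access library.

```python
import xml.etree.ElementTree as ET

# Illustrative table in the style of the slides; addresses assumed to be hex.
ADDRESS_TABLE = """
<PciAddressTable>
  <Item name="CTRL" address="10000"/>
  <Item name="STAT" address="10100"/>
</PciAddressTable>
"""


def load_address_table(xml_text):
    """Map each item name to its numeric address."""
    root = ET.fromstring(xml_text)
    return {item.get("name"): int(item.get("address"), 16)
            for item in root.iter("Item")}


class FakeDevice:
    """In-memory stand-in for a PCI/VME device; no real hardware is touched."""

    def __init__(self, table):
        self.table = table
        self.registers = {}

    def write(self, item, value):
        # Resolve the symbolic item name, then store the value at its address.
        self.registers[self.table[item]] = value

    def read(self, item):
        # Registers that were never written read back as 0 in this sketch.
        return self.registers.get(self.table[item], 0)


table = load_address_table(ADDRESS_TABLE)
device = FakeDevice(table)
device.write("CTRL", 0x1)
print(hex(table["CTRL"]), device.read("CTRL"), device.read("STAT"))
```

Keeping the name-to-address mapping in a separate XML document means a register move only changes the table, not the control sequences that use the names.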

  17. Online software integration • Diagram labels: Interpreter plug-in; Peer transport (SOAP); XDAQ application; XDAQ executive (one per host computer); XSEQ program (URL) • XDAQ framework: C++ middleware developed in-house by CMS • The SOAP message specifies the URL of the XSEQ program or embeds the program itself • The running XSEQ program can access the original SOAP message and retrieve parameters • The return message is generated by the XSEQ program

  18. XSEQ example 3: distributed system • Client: CERN, Geneva • Remote server: HEPHY, Vienna (Global Trigger server) • Diagram labels: SOAP pt; SOAP message sent by client; SOAP; XDAQ executive; Standalone interpreter; Interpreter plug-in; Bus protocol; SOAP extension; SOAP message returned to the client • First XSEQ program: <xseq … rpc_msg="msg"> <variable name="soapPart"> <xoap:getSOAPPart> <var>msg</var> </xoap:getSOAPPart> </variable> … <gt:configure> <var>board</var> <var>fname</var> <var>chip</var> </gt:configure> … <return> <var>my_msg</var> </return> </xseq> • Second XSEQ program: <xseq …> … <secure> <while> … <switch> <case value="1"> <call url="http://…script1.xml"/> </case> … </switch> </while> <rescue> … <retry/> </rescue> </secure> </xseq> • Device specifications: <VmeAddressTable …> <Item name="CTRL" address="10000" …/> <Item name="STAT" address="10100" …/> … </VmeAddressTable>
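The `<secure>` / `<rescue>` / `<retry/>` tags in the second program suggest an error recovery pattern: run a body, hand any failure to a rescue handler, and re-run the body if recovery succeeded. The slides do not spell out the exact semantics, so the Python sketch below is one plausible reading; the retry limit, the boolean returned by the rescue handler and all function names are assumptions.

```python
def secure(body, rescue, max_retries=3):
    """Run body(); on failure call rescue(error) and retry while it returns True.

    Returns (result, attempts). Re-raises the last error once the retry budget
    is used up or the rescue handler declines to retry.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return body(), attempts
        except Exception as error:
            if attempts > max_retries or not rescue(error):
                raise


# Hypothetical body: fails twice, then succeeds (stands in for a remote call).
calls = {"n": 0}


def flaky_configure():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("board not ready")
    return "configured"


recovered = []


def rescue(error):
    # A real handler would repair state here; this one only records the error.
    recovered.append(str(error))
    return True


result, attempts = secure(flaky_configure, rescue)
print(result, attempts, recovered)
```

Bounding the number of retries is a design choice made for this sketch, so that a permanently failing device cannot loop forever.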

  19. XSEQ conclusions • “Good”: • Suitable technological investment (XML is here to stay) • Reduces in-house development (large base of standard tools) • Enhances code sharing among sub-systems (extension mechanism) • Enhances platform evolution (interpreted approach) • Simplifies software configuration management (uniform usage of XML for code/data) • “Bad”: • XML is verbose (programming with XSEQ is not fun), but: • An XML editor could help • XSEQ could serve as the underlying syntax to store virtual instrumentation developed with graphical tools like LabVIEW • Just a prototype. It is not being used for production

  20. Outline 1 CERN and the LHC 2 The CMS experiment 3 Enhancing complexity management with web services 3.1 Software environment model: XSEQ 3.2 Concrete architecture: The CMS Trigger Supervisor

  21. CMS Trigger Supervisor context • CMS Control System: SOFTWARE • L1 Decision Loop: HARDWARE • ~55 million channels, ~1 Mbyte per event

  22. HW context: L1-Trigger Decision Loop • L1 decision loop operation ~ “business” • Configuration: • 64 crates • O(10^3) boards • Firmware ~ 15 MB/board • O(10^2) registers/board • Testing: • O(10^3) links • Integration coordination: • 27 research institutes • Time: • Research: 1992-2000 • Development: 2000-present • Fully replaced by 2010
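The configuration figures on this slide can be combined into rough totals. The sketch below takes the order-of-magnitude values literally (10^3 boards, 10^2 registers per board), which is an assumption; the results are therefore order-of-magnitude estimates, not measured numbers.

```python
# Slide figures, with O(10^n) taken literally as 10^n for a rough estimate.
CRATES = 64
BOARDS = 10**3
FIRMWARE_MB_PER_BOARD = 15
REGISTERS_PER_BOARD = 10**2

total_firmware_gb = BOARDS * FIRMWARE_MB_PER_BOARD / 1000   # MB -> GB
total_registers = BOARDS * REGISTERS_PER_BOARD
boards_per_crate = BOARDS / CRATES

print(f"firmware: ~{total_firmware_gb:.0f} GB, "
      f"registers: ~{total_registers:,}, "
      f"boards per crate: ~{boards_per_crate:.0f}")
```

With these values, one full configuration touches on the order of 15 GB of firmware and 100,000 registers spread over roughly 16 boards per crate.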

  23. SW context: Experiment control system • Experiment control system ~ “business” IT infrastructure • L1-Trigger Control and Hardware Monitoring System: provides machine and human interfaces to operate, test and monitor the Level-1 decision loop hardware components • Run Control and Monitoring System (RCMS): • Overall experiment control and monitoring • RCMS framework implemented in Java (8) • Cross-platform Data AcQuisition middleware (XDAQ): • C++ component-based distributed programming framework • Used to implement the distributed event builder • Detector Control System (DCS): • Detector safety, gas and fluid control, cooling system, rack and crate control, high and low voltage control, and detector calibration • DCS is implemented with PVSS II

  24. Project phases and terminology New “business capabilities”: e.g. configuration Services Architecture Proof of concept System Framework Services and core developments Prototype • Business needs • Project team Concept SW Context HW Context “Business” software infrastructure “Business”: To filter the “best” events

  25. Business needs and project team • Business need: coordinate operation of CMS subsystems (e.g. configuration and test) • TS team (2 + 1 or 2 students): • Services + core developments • Architecture • Business capabilities • Sub-system developers coordination & support • Communication • 1 developer per subsystem: • Uses services to develop the subsystem architecture • Customizes subsystem architecture as required by TS team • Diagram labels (with multiplicities 0..n and 1): Trigger Supervisor GUI; Experiment control system; Trigger Supervisor; pattern comp.; G. Cal. Trigger; R. Cal. Trigger; DT TF; CSC TF; G. Muon Trigger; GT/TCS; HCAL energy; ECAL energy; HF energy; CSC hits; DT hits; RPC hits

  26. Baseline service infrastructure • CMS official software frameworks to develop distributed systems: DCS, RCMS, XDAQ • Subsystems' online software infrastructure needs to be integrated • Infrastructure should be oriented to develop SCADA systems • XDAQ-based baseline solution + additional development to reach a SCADA framework

  27. Core development: The Cell • HTTP/CGI: automatically generated • Control panel plug-ins: e.g. DTTF panel, e.g. GT panel, e.g. Cell FSM operation • Synchronous and asynchronous SOAP API • Xhannel infrastructure: • Designed to simplify access to web services (SOAP and HTTP/CGI) from operation transition methods • Tstore (DB) • Monitor collector • Cells • FSM plug-ins • Cell plug-ins (FSM, commands, control panels) hide HW and SW platform evolution • Other plug-ins: • Command: RPC method. SOAP API extensions • Monitoring items

  28. Service providers: building blocks • Tstore: DB interface. Exposes SOAP. 1 per system • Monitor sensor: Cell interface to poll monitoring information. 1 per cell • Mon. Collector: polls all cell sensors. 1 per system • Mstore: interfaces the Mon. Collector with Tstore. 1 per system • XS: reads the logging database. 1 per cell • Job control: remote start-up of XDAQ applications. 1 per host • RCMS components • Cell: facilitates subsystem integration and operation (additional development, next slide). 1 per crate • Log Collector: collects log statements from cells and forwards them to consumers. 1 per system • Architecture based uniquely on these components

  29. Architecture • Diagram: Building blocks + Usage model proposal (• User’s guide • Workshops • Support) = Subsystem: Control system + Monitoring system + Logging system + Start-up system

  30. Control architecture • Stable infrastructure on top of which new “business” capabilities can be defined • Hierarchical control system • 1 crate ~ 1 cell • Centralized access to DBs • Multicrate subsystems ~ 2 levels of subsystem cells (1 subsystem central cell)
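A hierarchical control system of this kind can be pictured as a tree of cells in which a command given to the top cell is forwarded down to the crate cells. The Python sketch below is a toy model only: the `Cell` class, the cell names and the children-first ordering are assumptions made for illustration, not the Trigger Supervisor API.

```python
class Cell:
    """Toy control node: one per crate, plus central cells that group others."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = list(children or [])
        self.state = "halted"

    def configure(self, key):
        """Configure all children first, then this cell; return the visit order."""
        visited = []
        for child in self.children:
            visited.extend(child.configure(key))
        self.state = f"configured:{key}"
        visited.append(self.name)
        return visited


# A multicrate subsystem has two levels of cells: crate cells under one
# subsystem central cell. A single-crate subsystem needs only one cell.
central = Cell("central", [
    Cell("subsystemA", [Cell("A-crate1"), Cell("A-crate2")]),
    Cell("subsystemB"),
])

order = central.configure("run1")
print(order)
```

The caller only talks to the top cell, so adding a crate to a subsystem changes that subsystem's subtree and nothing above it.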

  31. Monitoring architecture • Infrastructure that facilitates hardware monitoring • Centralized access to DBs • Centralized system: 1 Collector, 1 Mstore • 1 cell ~ 1 sensor

  32. Logging and start-up architecture • Auxiliary infrastructure • 1 cell ~ 1 XS • Centralized system: 1 Collector • 1 host ~ 1 JC
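The "1 cell ~ 1 sensor, 1 Collector" layout amounts to a single poller that visits every cell's sensor and writes the readings to one store. A minimal Python sketch follows; the class names, the item names and the list standing in for the Mstore/Tstore path are all hypothetical.

```python
class Sensor:
    """Per-cell interface that returns the cell's current monitoring items."""

    def __init__(self, cell_name, read_items):
        self.cell_name = cell_name
        self._read_items = read_items   # callable returning {item: value}

    def poll(self):
        return self._read_items()


class Collector:
    """Single system-wide poller that forwards every reading to one store."""

    def __init__(self, sensors, store):
        self.sensors = sensors
        self.store = store              # a plain list stands in for the DB path

    def poll_all(self):
        for sensor in self.sensors:
            for item, value in sensor.poll().items():
                self.store.append((sensor.cell_name, item, value))
        return len(self.store)


store = []
sensors = [
    Sensor("cellA", lambda: {"temperature": 31, "link_errors": 0}),
    Sensor("cellB", lambda: {"temperature": 29}),
]
collector = Collector(sensors, store)
count = collector.poll_all()
print(count, store)
```

Centralising the polling keeps database access in one place, which matches the "centralized access to DBs" point on the slide.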

  33. New business capabilities: “How to” • New “business” capabilities can be coordinated by particle physicist managers without SW expertise • Roles: particle physicist manager; subsystem SW developer • Diagram labels: entry cell; operation states (S1, S2, S3, S4); operation transitions (e12(), e23(), e34(), e43()); operation transition methods • Service test
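An operation on this slide is a finite state machine: states S1 to S4, transitions such as e12() and e43(), and a transition method that runs when a transition fires. The Python sketch below models that structure; the `Operation` class and the logging transition methods are hypothetical, and the wiring of e43 from S4 back to S3 is inferred from the transition names.

```python
class Operation:
    """Finite state machine whose transitions run a method before moving on."""

    def __init__(self, initial):
        self.state = initial
        self._transitions = {}

    def add_transition(self, event, source, target, method):
        self._transitions[(event, source)] = (target, method)

    def fire(self, event):
        key = (event, self.state)
        if key not in self._transitions:
            raise ValueError(f"event {event} not allowed in state {self.state}")
        target, method = self._transitions[key]
        method()              # transition method: where subsystem code would go
        self.state = target
        return self.state


log = []
op = Operation("S1")
op.add_transition("e12", "S1", "S2", lambda: log.append("e12"))
op.add_transition("e23", "S2", "S3", lambda: log.append("e23"))
op.add_transition("e34", "S3", "S4", lambda: log.append("e34"))
op.add_transition("e43", "S4", "S3", lambda: log.append("e43"))

op.fire("e12")
op.fire("e23")

# Firing an event that is not defined for the current state is rejected.
try:
    op.fire("e12")
    rejected = False
except ValueError:
    rejected = True

print(op.state, log, rejected)
```

The split of roles follows from this structure: the states and transitions can be agreed by managers, while the transition methods are filled in by subsystem developers.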

  34. Trigger Supervisor conclusions • Design: services, architecture and “business” capabilities • 1 Services: • Reduced number of building blocks, all already developed in-house except the Cell • Main building block: Cell • Isolates hardware/software evolution from architecture implementation • Adapts sub-system integration tasks to the academic background of the people involved (non-SW experts) • 2 Architecture: • Uniquely based on 7 building blocks • Simplifies sub-system integration coordination • Stable infrastructure • Isolates services evolution from the implementation of business capabilities • 3 New “business” capabilities: • Coordination methodology associated with the architecture • Facilitates the implementation of new “business capabilities” taking into account the academic background of managers (non-SW experts)

  35. Summary • Enhancing control systems design & development with web-services technologies: • XML-based programming language: • Maximizes usage of existing XML standards and tools; good technological investment; maximizes code sharing and platform evolution • Control system design example: • Services: hide HW/SW evolution • Architecture: hides services evolution, stable infrastructure • Business capabilities: developed on top of the architecture

  36. Thank you very much! … For more information: Ildefons.magrans@cern.ch http://triggersupervisor.cern.ch
