
First year experience with the ATLAS online monitoring framework



Presentation Transcript


  1. First year experience with the ATLAS online monitoring framework Alina Corso-Radu, University of California Irvine, on behalf of the ATLAS TDAQ Collaboration CHEP 2009, March 23rd-27th, Prague

  2. Outline • ATLAS trigger and data acquisition system at a glance • Online monitoring framework • Readiness of the online monitoring for runs with cosmic rays and first LHC beam • Conclusions

  3. ATLAS Trigger/DAQ • Interaction rate ~1 GHz; bunch-crossing rate 40 MHz • Hardware-based LVL1 trigger (<100 kHz): coarse-granularity data, calorimeter- and muon-based, identifies Regions of Interest • Software-based High Level Triggers: LVL2 trigger (~3 kHz) performs partial event reconstruction in the Regions of Interest with full-granularity data and algorithms optimized for fast rejection; Event Filter (~200 Hz) performs full event reconstruction seeded by LVL2, with trigger algorithms similar to offline • Read Out Systems (ROSs): 150 PCs; Event Builder (EB): ~100 PCs • HLT runs on 900 farm nodes, 1/3 of the final system • Detectors in the diagram: Muon Spectrometer (MDT, RPC, TGC, CSC), Inner Detector (Pixel, SCT, TRT), Calorimeter (TileCal, LAr)

  4. Online monitoring framework (data flow from ROD/LVL1/HLT) • Complexity and diversity in terms of the monitoring needs of the sub-systems • About 35 dedicated machines • Event Analysis Frameworks: analyze event content and produce histograms; analyze operational conditions of the hardware and software elements of the detector, trigger and data acquisition systems • Information Service: distributes operational data and event samples • Data Quality Analysis Framework: automatic checks of histograms and operational data; visualizes and saves results; produces visual alerts • Visualization Tools: a set of tools to visualize information, aimed at the shifters • Monitoring Data Archiving: automatic archiving of histograms • Web Service: monitoring data available remotely

  5. Data Quality Monitoring Framework • Distributed framework that provides the mechanism to execute automatic checks on histograms and to produce results according to a particular user configuration • Input and output classes can be provided as plug-ins; custom plug-ins are supported • About 40 predefined algorithms exist (histogram empty, mean values, fits, reference comparison, etc.); custom algorithms are allowed • Writes DQResults automatically to the Conditions Database • Histograms come from the Information Service and event samplers; the configuration comes from the Configuration Database; DQResults go to the Conditions Database and to the Data Quality monitoring display
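The check-and-flag logic described above can be illustrated with a minimal sketch. Python is used here for brevity and every name is hypothetical; the real DQMF is part of the ATLAS TDAQ C++ software, and its actual algorithms and thresholds are set in the user configuration.

```python
# Illustrative sketch of a DQMF-style predefined algorithm: compare a
# histogram's mean against configured thresholds and emit a color-coded
# result, as the "Mean values" check does. All names are hypothetical.

def check_mean(bin_contents, bin_centers, green_max, red_min):
    """Return a DQ status based on the histogram mean."""
    total = sum(bin_contents)
    if total == 0:
        return "Undefined"          # empty histogram: no statement possible
    mean = sum(c * x for c, x in zip(bin_contents, bin_centers)) / total
    if abs(mean) <= green_max:
        return "Green"              # within the nominal band
    if abs(mean) < red_min:
        return "Yellow"             # suspicious, warn the shifter
    return "Red"                    # clearly out of range

# Example: a pedestal-like distribution centered near zero
contents = [1, 5, 20, 5, 1]
centers = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(check_mean(contents, centers, green_max=0.5, red_min=1.5))  # Green
```

A reference-comparison or fit-based check would follow the same pattern: one function per algorithm, parameterized entirely by the configuration, returning a color-coded result.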

  6. DQM Display • Summary panel shows the overall color-coded DQ status produced by DQMF per sub-system, the Run Control conditions and log information • Details panel offers access to the detailed monitored information: checked histograms and their references, configuration information (algorithms, thresholds, etc.) • History tab displays the time evolution of DQResults • About 17,000 histograms are checked • Shifter attention is focused on the bad histograms
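One plausible way to derive the per-sub-system summary flag from many individual results is worst-case propagation: a single Red histogram turns the whole sub-system Red. The sketch below illustrates that idea; it is an assumption for illustration, not the actual DQMF roll-up code.

```python
# Hypothetical sketch of a summary-flag roll-up: the sub-system status is
# the worst status among its checked histograms (worst-case propagation).

SEVERITY = {"Green": 0, "Undefined": 1, "Yellow": 2, "Red": 3}

def summarize(results):
    """Roll a list of individual DQResults up into one sub-system flag."""
    return max(results, key=lambda s: SEVERITY[s], default="Undefined")

print(summarize(["Green", "Green", "Yellow"]))  # Yellow
print(summarize(["Green", "Red", "Yellow"]))    # Red
```

With this scheme the shifter only needs to drill down into sub-systems whose summary flag is not Green, which is exactly the "attention focused on bad histograms" behavior described above.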

  7. DQM Display - layouts • DQM Display allows for a graphical representation of the sub-systems and their components using detector-like pictorial views • Bad histograms are spotted even faster • The expert tool DQM Configurator is used for editing the configuration, aimed at layouts and shapes: from an existing configuration one can attach layouts and shapes; these layouts are created and displayed online the same way they will appear in the DQM Display; experts can tune layout/shape parameters until they look as required

  8. Online Histogram Presenter • Main shifter tool for checking histograms manually • Supports a hierarchy of tabs, each containing a predefined set of histograms • Reference histograms can be displayed as well • Sub-systems normally have several tabs with the most important histograms that have to be watched closely
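A tab hierarchy of this kind can be modeled as a mapping from tab names to histogram-name patterns. The sketch below is a hypothetical stand-in for the real OHP configuration format (tab names, patterns and histogram names are invented for illustration):

```python
import fnmatch

# Hypothetical sketch of an OHP-style tab configuration: each tab lists
# histogram-name patterns, and the histograms published online are matched
# against them to decide what the tab displays.
TABS = {
    "Pixel/Overview": ["Pixel/hits_*", "Pixel/occupancy"],
    "SCT/Errors": ["SCT/err_*"],
}

published = ["Pixel/hits_barrel", "Pixel/hits_endcap",
             "Pixel/occupancy", "SCT/err_rodBusy", "TRT/drift_time"]

def histograms_for(tab):
    """Return the published histograms that the given tab should show."""
    return [h for h in published
            if any(fnmatch.fnmatch(h, p) for p in TABS[tab])]

print(histograms_for("Pixel/Overview"))
# ['Pixel/hits_barrel', 'Pixel/hits_endcap', 'Pixel/occupancy']
```

Pattern-based selection keeps the shifter configuration short even when a sub-system publishes thousands of histograms.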

  9. Trigger Presenter • Presents trigger-specific information in a user-friendly way: trigger rates, trigger chains information, HLT farms status • Reflects the status of the HLT sub-farms using DQMF color codes • Implemented as an OHP plug-in

  10. Histogram Archiving • Almost 100,000 histograms are currently saved at the end of a run (~200 MB per run) • Reads histograms from IS according to the given configuration and saves them to ROOT files • Registers the ROOT files with the Collection and Cache service • Accumulates files into large ZIP archives and sends them to CDR • Archiving is done asynchronously with respect to the Run states/transitions • Archived histograms can be browsed by a dedicated tool
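The end-of-run step above (snapshot the published histograms, write one per-run file, accumulate files into a larger archive) can be sketched as follows. This is a toy illustration only: JSON and an in-memory ZIP stand in for the ROOT files and the CDR hand-off, and all names are hypothetical.

```python
import io
import json
import zipfile

# Toy sketch of the archiving flow: serialize one run's histograms into a
# per-run file and append that file to an accumulating ZIP archive.

def archive_run(run_number, histograms, archive):
    """Serialize one run's histograms and add the file to the archive."""
    payload = json.dumps(histograms).encode()
    with zipfile.ZipFile(archive, "a") as zf:
        zf.writestr(f"run_{run_number}.json", payload)

buf = io.BytesIO()  # in-memory stand-in for the archive file
archive_run(90001, {"Pixel/occupancy": [3, 1, 4]}, buf)
archive_run(90002, {"Pixel/occupancy": [2, 7, 1]}, buf)
with zipfile.ZipFile(buf) as zf:
    print(zf.namelist())  # ['run_90001.json', 'run_90002.json']
```

Because each run is appended independently, the accumulation step can run asynchronously with respect to the run transitions, as the slide notes.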

  11. Operational Monitoring Display • Each process in the system publishes its status and running statistics into the Information Service => O(1M) objects • Reads IS information according to the user configuration and displays it as time-series graphs and bar charts • Analyzes distributions against thresholds • Groups and highlights the information for the shifter • Mostly used for the HLT farms status: CPU, memory, event distribution
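The threshold-analysis step can be illustrated with a short sketch: sample the latest operational value per node and flag the nodes that exceed a configured limit. Node names, the data shape and the threshold are hypothetical.

```python
# Hypothetical sketch of an OMD-style threshold check: operational values
# sampled from the Information Service are compared against a limit, and
# the offending nodes are grouped and highlighted for the shifter.

def flag_nodes(samples, threshold):
    """Return the nodes whose latest sampled value exceeds the threshold."""
    return sorted(node for node, series in samples.items()
                  if series and series[-1] > threshold)

cpu_usage = {  # % CPU per HLT node, most recent sample last
    "pc-hlt-001": [55, 60, 62],
    "pc-hlt-002": [70, 85, 97],
    "pc-hlt-003": [40, 41, 39],
}
print(flag_nodes(cpu_usage, threshold=90))  # ['pc-hlt-002']
```

Grouping the output per sub-farm, rather than listing every node, is what keeps an O(1M)-object system digestible for a single shifter.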

  12. Event Displays • Atlantis: Java-based 2D event display • VP1: 3D event display running in the offline reconstruction framework • Both Atlantis and VP1 have been used during commissioning runs and LHC start-up • Both can be used in remote monitoring mode, capable of browsing recent events via an HTTP server

  13. Remote access to the monitoring information • Public - monitoring via the Web Interface: information is updated periodically; no access restrictions • Expert and Shifter - monitoring via the mirror partition: quasi-real-time information access; restricted access

  14. Remote monitoring (CERN GPN) via the Web Monitoring Interface • Generic framework which runs at Point 1 (ATCN) and publishes information periodically to the Web • The published information is provided by plug-ins, currently two: • Run Status shows the status and basic parameters of all active partitions at P1 • Data Quality shows the same information as the DQM Display (histograms, results, references, etc.) with an update interval of a few minutes

  15. Remote monitoring (CERN GPN) via the mirror partition • Almost all information from the Information Service at Point 1 (ATCN) is replicated to the mirror partition • The information is available in the mirror partition with O(1) ms delay • Remote users can open a remote session on one of the dedicated machines located on the CERN GPN: the environment looks exactly like at P1, and all monitoring tool displays are available and work exactly as at P1 • The production system setup supports up to 24 concurrent remote users
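The mirroring idea above is essentially publish/subscribe: every update published to the Information Service is also forwarded to a read-only replica, so remote tools see the same content without touching the Point 1 system. The sketch below is a hypothetical toy model, not the real IS API.

```python
# Toy sketch of IS mirroring: the mirror is just another subscriber that
# receives every published update, so remote readers query the replica
# instead of the primary service at Point 1.

class InfoService:
    def __init__(self):
        self._data = {}
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, name, value):
        self._data[name] = value
        for cb in self._subscribers:  # forward the update to the mirror
            cb(name, value)

primary, mirror = InfoService(), {}
primary.subscribe(lambda name, value: mirror.__setitem__(name, value))
primary.publish("RunParams.run_number", 90001)
print(mirror)  # {'RunParams.run_number': 90001}
```

Since the mirror is updated asynchronously as messages arrive, the replica lags the primary only by the forwarding latency, consistent with the millisecond-scale delay quoted above.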

  16. Performance achieved • The Online Monitoring Infrastructure is in place and functioning reliably: • More than 150 event monitoring tasks are started per run • Handles more than 4 million histogram updates per minute • Almost 100,000 histograms are saved at the end of a run (~200 MB) • Data Quality statuses are calculated online (about 10,000 histograms checked per minute) and stored in the database • Several Atlantis event displays are always running in the ATLAS Control Room and satellite control rooms, showing events from several data streams • Monitoring data is replicated in real time to the mirror partition running outside P1 (with a few ms delay) • The remote monitoring pilot system was deployed successfully

  17. Conclusions • The tests performed on the system indicate that the online monitoring framework architecture meets ATLAS requirements • The monitoring tools have been successfully used during data taking in detector commissioning runs and during LHC start-up • Further details on the DQM Display, the Online Histogram Presenter and the Gatherer are given on dedicated posters

  18. Monitoring framework components (data flow from ROD/LVL1/HLT) • Core services: OH (Online Histogramming), IS (Information Service), Gatherer, EMON (Event Monitoring) • Users have to provide: configuration files (C), plug-ins in C++ (P), Job Option files (JO) • Tools built on these services: DQMF (Data Quality Monitoring Framework), OHP (Online Histogram Presenter), TriP (Trigger Presenter), OMD (Operational Monitoring Display), MDA (Monitoring Data Archiving), WMI (Web Monitoring Interface), GNAM, MonALISA, Event Filter PT (Processing Task), Event Displays (Atlantis, VP1) • In the original diagram each tool is marked with the C/P/JO items its users must provide
