Workshop overview on integrating and running DQM applications globally, focusing on producers, collectors, and clients. Discusses challenges faced and improvements needed for efficient data analysis.
DQM (Local and FNAL ROC) & Event Display experience at MTCC
DQM Workshop during February 2007 CMS Week
Kaori Maeshima (FNAL), Ianna Osborne (Northeastern University), Ilaria Segoni (CERN)
Preliminary Remarks on DQM
• Goals during MTCC:
  • Integrate and run for the first time DQM applications that were normally used for individual detector runs (and mostly off-line)
  • Establish a centralized DQM effort (global shifts, global shifts run remotely, offline task force…)
  • Identify issues/problems
• All groups have DQM module(s) in the central DQM structure within CMSSW (ported or newly developed in the last ~1.5 years)
DQM Architecture:
[Diagram: Producers, Collectors, Clients]
• “DQM PRODUCERS”: (event) data analysis and production of the “monitor elements” (e.g. histograms). It is a CMSSW event analyzer module
• “DQM COLLECTORS”: source-client connection tasks. It is a standalone xdaq application
• “DQM CLIENTS”: histogram-level analysis, displaying of histograms. It is a standalone application (xdaq or iguana)
DQM Workshop, CMS Week
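The three roles above can be illustrated with a minimal Python sketch. It is a toy model and not the CMSSW/xdaq implementation: the class names, the `hit_time` histogram, the binning and the event layout are all hypothetical, chosen only to show that producers see events while clients see only monitor elements.

```python
class MonitorElement:
    """A named 1-D histogram with fixed-width bins (hypothetical stand-in)."""

    def __init__(self, name, nbins, lo, hi):
        self.name, self.nbins, self.lo, self.hi = name, nbins, lo, hi
        self.counts = [0] * nbins

    def fill(self, x):
        if self.lo <= x < self.hi:
            width = (self.hi - self.lo) / self.nbins
            # min() guards against floating-point rounding at the upper edge
            self.counts[min(int((x - self.lo) / width), self.nbins - 1)] += 1


class Producer:
    """Event-level analysis: turns event data into monitor elements."""

    def __init__(self, detector):
        self.detector = detector
        self.me = MonitorElement(f"{detector}/hit_time", nbins=4, lo=0.0, hi=100.0)

    def analyze(self, event):
        for t in event.get(self.detector, []):
            self.me.fill(t)


class Collector:
    """Source-client connection: keeps the latest monitor elements by name."""

    def __init__(self):
        self.elements = {}

    def receive(self, me):
        self.elements[me.name] = me

    def subscribe(self, prefix=""):
        return [me for name, me in sorted(self.elements.items())
                if name.startswith(prefix)]


class Client:
    """Histogram-level analysis: works on monitor elements, never on events."""

    def __init__(self, collector):
        self.collector = collector

    def summary(self):
        return {me.name: sum(me.counts) for me in self.collector.subscribe()}


# Two toy events with hit times per detector (made-up values).
events = [{"DT": [12.0, 55.0], "CSC": [80.0]}, {"DT": [30.0], "CSC": []}]
producers = [Producer("DT"), Producer("CSC")]
collector = Collector()
for event in events:
    for producer in producers:
        producer.analyze(event)
for producer in producers:
    collector.receive(producer.me)
summary = Client(collector).summary()
print(summary)
```

Several producers can be combined in one loop over events, which mirrors the point made for the global producer; the clients here are decoupled from the event data entirely.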
Towards DQM Integration: Producer
• Very easy by construction to build a global DQM producer: all existing modules can be included in a single cmsRun (standard CMSSW application)
• For MTCC, two implementations for on-line DQM:
• Online Implementation #1: DQM Producer running in the Filter Farm, which stably included:
  • CTC (unpacking & DQM)
  • FED (unpacking & DQM)
  • DT (unpacking & DQM)
  • On occasion, unpacking & DQM of other systems (e.g. CSC)
Towards DQM Integration: Producer
[Diagram: Disk Storage, Event Display, DQM Producer]
• Online Implementation #2: DQM Producer consuming events from the Storage Manager (in parallel to the Event Display)
• AKA “Global DQM”; included all available DQM modules relevant to phase 2:
  • DT (UNPK+DQM)
  • RPC (UNPK+DQM)
  • CSC (UNPK+DQM)
  • HCAL (UNPK+DQM)
  • CTC (UNPK+DQM)
  • FED (UNPK+DQM)
  • Physics Objects (UNPK + Muon Reconstruction + DQM)
• Not running on all events (as it would if run on all FF units), but nicely decoupled from DAQ: changes/upgrades requested by detector groups could be tested harmlessly (e.g. upgrade to start using ORCON DB for HCAL, improved analysis after first tests on MTCC data for DT and RPC, optimized/faster analysis for HCAL, bug fixes)
• In general, future tests should use software synchronized with the official MTCC (or any future commissioning test project) software; we now have enough data for debugging ahead…
Towards DQM Integration: Client
• Two types of Client available:
  • Iguana client: local application, customizable layout of display
  • Xdaq client: has a web interface, customizable analysis (e.g. quality tests)
• The Xdaq client was customized by groups, but:
  • By construction, clients cannot be integrated in a single application
  • No detector-specific individual client is suitable for global DQM, so a simple Iguana client was used for central shifts
[Diagram: DQM Architecture at MTCC, showing Producer, Collectors and Clients, including a Client at the Remote Operation Centre @ Fermilab]
Towards DQM Integration: Client
[Diagram: Producer, Collector, Clients]
• Requests for transforming the Client into a CMSSW module have been made by detector groups for other reasons (accessing parameter sets, calibrations, geometry)
• This would allow simple integration of detector-specific Clients into a single application, as is done for Producers
• It is not necessary to run a single Client, though; an alternative could have been running all the detector-specific clients in parallel
  • But the issue of extracting a unified view of the run status would remain
Main Problems Encountered
• DQM analyses are generally not designed to be high-performance on-line applications
  • Reason: they were developed for detailed, mostly off-line studies at individual beam/cosmic tests, so some adjustments are needed (e.g. groups are thinking of a “DQM light” version of their full analyses; some implemented it during MTCC)
• A huge amount of DQM information is produced (extremely high granularity): not suitable for a non-expert shifter, and not suitable for a human to process. Automated quality tests can do it, but we need a simple way of presenting results.
  • TrackerMap: a brainstorm on how to export the Tracker graphical representation of DQM results to all detectors started before MTCC and might lead to a solution.
[Figure: a pre-MTCC estimate; if anything, the numbers have grown!]
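As an illustration of the kind of automated quality test mentioned above, the following Python sketch compares the shape of a histogram with a reference and reduces it to a single flag. It is only a sketch under stated assumptions: the function name, the flag names, the tolerance and the "more than half the bins" rule are hypothetical and are not taken from the DQM quality-test code.

```python
def quality_test(counts, reference, tolerance=0.05):
    """Compare a histogram's shape with a reference shape.

    Both histograms are normalized to unit area; a bin is "bad" when its
    fraction differs from the reference fraction by more than `tolerance`.
    Returns (flag, bad_bins). All thresholds here are assumptions.
    """
    if len(counts) != len(reference):
        raise ValueError("histogram and reference must have the same binning")
    total, ref_total = sum(counts), sum(reference)
    if total == 0 or ref_total == 0:
        return "NO_DATA", []
    bad_bins = [
        i for i, (c, r) in enumerate(zip(counts, reference))
        if abs(c / total - r / ref_total) > tolerance
    ]
    if not bad_bins:
        return "GOOD", bad_bins
    # Assumed rule: more than half of the bins off means BAD, otherwise WARNING.
    flag = "BAD" if len(bad_bins) > len(counts) / 2 else "WARNING"
    return flag, bad_bins


reference = [10, 40, 40, 10]            # made-up reference shape
good = quality_test([12, 38, 41, 9], reference)
warning = quality_test([10, 50, 30, 10], reference)
bad = quality_test([50, 0, 0, 50], reference)
empty = quality_test([0, 0, 0, 0], reference)
print(good, warning, bad, empty)
```

With thousands of high-granularity histograms, only the flags (and the list of offending bins) need to reach the shifter, which is the quantity a graphical summary would display.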
DQM Data Content
• For global DQM runs the granularity of the available histograms was very high: very good for detailed analyses, but it is hard to extract a reasonable number of histograms summarizing global conditions
• The amount of information, and the difficulty in processing it, is driven by the fraction of the detector that is running: expect more as we move to the entire CMS geometry
• One attempt made during MTCC: priority was given during shifts to
  • Trigger plots (to be maintained for the future)
  • Muon reconstruction plots (to be redirected to the physics-oriented shifts @ ROCs, e.g. CERN, FNAL, especially when LHC events are produced)
Comments on shifts
• The shift was combined with the EventDisplay shift: very good in terms of manpower, already on the road towards minimizing the shift crew…
• The shifter was required to compare plots with reference plots and/or descriptions posted on a twiki page.
• In addition to Trigger/Muon reco plots, all plots from detector groups were available to the shifter (plots with the same layout as used by detector groups for individual detector shifts). This could be useful when only one expert per detector, or fewer, is present in the control room: he/she can use the DQM station to investigate his/her own histograms (or connect with another client).
• Once the DQM content is optimized for global & non-expert shifts (more global status conveyed in a few plots, high-granularity tests taken care of by automated tests and displayed graphically, …), global shifts will be more effective.
• During the final days, the Function Manager would start the DQM Producer & Client at the beginning of each run: useful for remotely operated shifts.
• During the final days, VNC was installed on the DQM PC, so it could be controlled remotely by experts: useful for remotely operated shifts, especially when technical problems are encountered.
Additional off-line Information
• Off-line DQM: the online and quasi on-line DQM applications ran on a fairly limited number of events, and root files were not stored for these applications
  • An additional off-line analysis (with all relevant DQM modules, as in Producer #2) was run in an automated way on data stored in Castor (express stream, i.e. about 10% of all data)
  • Root files were stored and accessible through a web browser (tool provided by FNAL)
  • In principle this should relieve detector groups of a lot of workload, provided there is constant interaction to ensure that the code being run is what is desired
• The ELOG section “EventDisplay & DQM” was used for two purposes:
  1) Shift reports and storage of reference histograms produced during “good” runs
  2) Logging software commissioning messages
  • The same section was used for both; we will ask for separate sections in the future
• Crucial off-line tool that must be implemented: the run information/summary should contain quality flags produced by DQM, which implies that we can extract such flags in an automated way from the DQM analyses
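A minimal Python sketch of what attaching DQM quality flags to a run summary could look like is given below. Everything in it is an assumption made for illustration: the flag names, their severity ordering, the record layout and the run number are hypothetical, and no such tool is described in the slides.

```python
# Assumed severity ordering: the worst per-detector flag decides the run flag.
SEVERITY = {"GOOD": 0, "NO_DATA": 1, "WARNING": 2, "BAD": 3}


def run_summary(run_number, detector_flags):
    """Build a run-summary record from per-detector DQM quality flags."""
    unknown = set(detector_flags.values()) - set(SEVERITY)
    if unknown:
        raise ValueError(f"unknown quality flags: {sorted(unknown)}")
    worst = max(detector_flags.values(), key=SEVERITY.__getitem__,
                default="NO_DATA")
    return {
        "run": run_number,
        "global_flag": worst,
        "detectors": dict(sorted(detector_flags.items())),
    }


# Hypothetical run number and flags.
summary = run_summary(1234, {"DT": "GOOD", "HCAL": "WARNING", "CSC": "GOOD"})
empty = run_summary(1235, {})
print(summary)
print(empty)
```

The point of the sketch is only that the flags must be machine-readable outputs of the DQM analyses; a record like this could then be stored alongside the other run information.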
ROC Activities Overview
• Common tools: development/implementation/testing of common tools which aid better integrated CMS operation as a whole, such as:
  • WBM [Web Based Monitoring] (CMS IN-2006/044), available @ MTCC
  • WBM server (HW), DQM results viewing & archiving tools, available @ MTCC
  • SSS [Snap Shot Service], available now
  • various DQM services in SM [DQM data server functionality in Storage Manager], in progress
  • Integrated Trigger Monitoring system (see Trigger talk by Jeff)
• Aims: better integrated CMS operation; in the long term, more effective ROC (our current focus) operation
These tools are useful to have anyway, whether we have a ROC operation or not. These activities typically cross many boundaries between sub-detector systems, sub-trigger systems, DB, DAQ, DQM, software, computing, etc. We are looking forward to working closely with (as part of) the commissioning team.
ROC Activities Overview (Cont.)
• Active participation from ROC at FNAL (timeline):
  • 2004, 5, 6: HCAL Test Beam (CMS-IN-2006/053)
  • 2006: MTCC-I, MTCC-II global running (CMS IN-2007/008)
  • 2007 winter-spring:
    • SiTracker test,
    • Trigger commissioning,
    • playback data transfer & book-keeping test (from P5 -T0/T1- ROC)
  • 2007 May onwards: global commissioning runs and MTCC-III
  • End of 2007 onwards: commissioning and beam runs
• Commissioning of the new ROC room at FNAL Wilson Hall 1st Floor - LHC@FNAL
  • 3 components: CMS, LHC & Outreach.
  • close collaboration with accelerator people is expected (for example, beam halo study, etc.)
• Working closely with the planning team of the CMS Centre at CERN now
WBM (Web Based Monitoring)
WBM allows information (from the database, etc.) to be easily viewable via web pages in real time, with dynamically created plots, etc.
• These tools were developed by Bill Badgett (FNAL) over the years of CDF running/monitoring. They have been found extremely useful (especially by the shift crew and by Trigger/DAQ/subdetector experts commissioning and debugging problems from any location).
• Bill, Zongru Wan (Kansas State), Steve Murray (FNAL at CERN) (Page 1, SM) and Alan Stone (FNAL) are the main people working on the implementation of WBM for CMS.
• RunSummary has already been heavily used during MTCC I and II. A RunSummary for the SiTracker test at TIF has been created and is currently being commissioned.
• WBM also provides viewing of the DCS information interfaced via meta-tables. Work is in progress to make WBM for SiTracker DCS information a useful tool.
• Having good (and accessible) information in the DB is very, very important.
Run Summary (WBM Example)
[Screenshot: Run Summary page from MTCC-II, annotated “Link to DQM results” and “click to plot magnet current history”]
SSS (Snap Shot Service)
SSS, a screen sharing application: takes a snapshot of what's on a screen and puts it on a web page with auto update (frequency can be adjusted) for easy viewing from anywhere (not “control” sharing).
• This tool was developed by Kurt Biery (FNAL).
• Similar tools exist (VNC, WebEx, etc.); each has its own advantages.
• Advantages of SSS: free; no installation of tools needed at the viewing end (just view from a web page); installation of SSS at the source is very simple (just needs Java and a web browser) and secure.
• Possible useful usage: viewing of PVSS pages
• Used in CDF for remote monitoring shifts for the past several weeks.
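The viewing side of such a service can be pictured as a web page that shows one image and reloads itself at a fixed interval. The Python sketch below generates such a page using the standard HTML `meta` refresh mechanism. It is not the SSS implementation (which the slide says is Java-based); the function name, file name and title are hypothetical.

```python
import html


def snapshot_page(image_url, refresh_seconds=10, title="Snapshot"):
    """Return an HTML page that shows one image and reloads itself periodically."""
    if not isinstance(refresh_seconds, int) or refresh_seconds <= 0:
        raise ValueError("refresh_seconds must be a positive integer")
    safe_title = html.escape(title, quote=True)
    safe_url = html.escape(image_url, quote=True)
    return (
        "<!DOCTYPE html>\n"
        "<html><head>"
        # Standard HTML mechanism: the browser reloads the page every N seconds.
        f'<meta http-equiv="refresh" content="{refresh_seconds}">'
        f"<title>{safe_title}</title>"
        "</head><body>"
        f'<img src="{safe_url}" alt="{safe_title}">'
        "</body></html>\n"
    )


# Hypothetical snapshot file, overwritten periodically by the capturing side.
page = snapshot_page("snapshot.png", refresh_seconds=30, title="PVSS panel")
print(page)
```

Because the viewer only fetches an image, nothing needs to be installed at the viewing end and no control of the source screen is shared, which matches the advantages listed on the slide.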
FNAL ROC and MTCC I & II
Coordinated effort with CERN MTCC Operation/Computing/Software groups.
MTCC-Phase 1 Goal and Strategy (DQM was not running consistently at Point 5):
• transfer events to FNAL using the T0/T1 facility
• locally run available DQM programs and event display systematically
• make results easily accessible to everyone as fast as possible (use WBM tool)
• take shifts to contribute to the MTCC operation by doing quasi-online monitoring
MTCC-Phase 2 Goal and Strategy (DQM programs are running more systematically at Point 5):
• do real-time Data Quality Monitoring by looking at DQM results running at Point 5, and take official DQM shifts
• run Event Display locally on events transferred in real time
• continue to do quasi-online monitoring as in Phase 1 with the transferred data. This has the advantage of running on every event, and it is possible to do reprocessing with improved programs with good constants.
We have achieved both the Phase 1 & 2 goals! However:
- issues with book-keeping of files, data transfer originating from P5
- DB access is non-trivial (difficulty of running code is often related)
SiTracker test @ TIF from WH1 ROC
The first main customer to use the newly created ROC room at Wilson Hall 1, which became available this month (Feb. 2007). The room itself is still being commissioned (working on the PC settings, etc.), and work continues on the contents of the SiTracker test itself.
• coordinated effort between the Tracker group & ROC people
• at FNAL, shift operation planned with ~15 people from several institutions
• period: TIF test (Feb – mid May)
• Monitor:
  • DQM
  • Data transfer/book-keeping
  • Event display
  • DCS, ...
Summary – ROC at FNAL
• ROC @ FNAL has been working on various tools which are useful to commission and debug/monitor CMS as a whole, keeping in mind that remote operation can be done in a rather transparent manner.
• We are currently actively working on the SiTracker test, and plan to work on trigger commissioning, the playback test, upcoming commissioning running and MTCC-III, towards the physics runs in the near future.
MTCC Event Display
• Goals for MTCC: deploy generic and specific on-line (off-line) event displays;
  • see “MTCC Event Display” presentation by I. Osborne during CMS Week in March 2006
• Provide a homogeneous, well-integrated solution for monitoring and event display;
  • see “Visualization Plan for 2005” presentation by I. Osborne during CMS Week in June 2005
Online, quasi-online and offline event displays received data from an http event proxy, a binary raw data file or a POOL source (local files or at T0/T1). The event displays run at P5 and offsite.
[Diagram labels: Cacti, Logbook, Web, VNC, TV, iCMS]
• The central event display at P5 produces snapshots which are published on the Web and transmitted to iCMS and TV;
• The load on the event display computer is monitored with Cacti (number of processes over a day, week or month);
• Remote control of the event display via VNC;
• Observed problems are reported to the LogBook.
MTCC Event Display: Summary
[Plot: central event display load during MTCC II 24/7 shifts, from November 23rd, 2006 to November 28th, 2006. The number of event displays running simultaneously ranges from 1 to 8; the longest run is more than 48 hours.]
pccmscr14: a dedicated event display computer (SLC3, 2Gb RAM, nVidia graphics card Quadro FX 13/PCI/SSE2, local user iguana, VNC, + 250 Gb disk, startup/login scripts)
• The event display is installed and configured and is ready to be used for commissioning (provided that synchronization with the CMSSW version used by central DAQ is done);
• Fairly straightforward to use: an online event display starts automatically, and an offline event display can be started by a run script which generates a suitable parameter set file on the fly:
  • it has been operated by people on shift without prior training or experience with the event display, by following 3-step instructions;
• It has proven to be useful in rapid debugging and solving problems:
  • more than 100 entries in the Event Display and DQM and the sub-detector subsystems of the Logbook;
  • more than 30 entries in the General Shift Logbook.
Summary and Outlook
• Path towards centralized and integrated operations started during MTCC:
  • Global DQM analysis running at all times in on-line and offline modes (phase 2)
  • Centralized shifts established (and integrated with EventDisplay)
  • Operations were successfully integrated with ROC at FNAL
  • Exercised permanent storage of DQM data (but it needs to be redirected to other storage systems)
• Main problems encountered:
  • hard to extract a reasonable number of histograms summarizing global conditions
  • Current high-granularity histograms can be checked in an automated way, but a simple graphical tool is needed for visualizing the results