ATLAS Off-Grid sites (Tier-3) monitoring A. Petrosyan on behalf of the ATLAS collaboration GRID’2012, 17.07.12, JINR, Dubna
Goals of the project • Provide a reasonable monitoring solution for 'off-grid' sites (unplugged, geographically close computing resources) • Monitoring of the computing facilities of local groups with a co-located storage system (Tier-1+Tier-3, Tier-2+Tier-3) • Present Tier-3 site activity at the global level • Data transfer monitoring across the XRootD federation
Tier-3 site monitoring levels • Monitoring of the local infrastructure for site administration • Central system for monitoring of the VO activities at Tier-3 sites
Objectives of the local monitoring system at a Tier-3 site • Detailed monitoring of the local fabric • Monitoring of the batch system • Monitoring of job processing • Monitoring of the mass storage system • Monitoring of the VO computing activities at the local site
Objectives of the global Tier-3 monitoring • Monitoring of the VO usage of Tier-3 resources in terms of data transfer, data access, and job processing • Quality of the provided service, based on job processing and data transfer monitoring metrics
Site monitoring • Based on the Ganglia monitoring system • Collects basic metrics using Ganglia sensors • Plugin system for monitoring specific metrics • PostgreSQL to aggregate data • More details for each package at https://svnweb.cern.ch/trac/t3mon/wiki/T3MONHome • Monitoring modules available for Condor, Lustre, PBS, Proof, XRootD; each has a plugin to deliver data to the global level (see the gmond module sketch below) • Examples of the UI for different systems at http://vm01.jinr.ru/ganglia/
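As an illustration of how a site-specific metric can be plugged into Ganglia, the sketch below follows the standard gmond Python module interface (metric_init/metric_cleanup plus a callback). The metric name and the way its value is obtained are assumptions for the sketch, not the actual T3Mon module.

# Minimal sketch of a Ganglia gmond Python metric module for a hypothetical
# "active PROOF sessions" metric; only the module interface is standard.
import subprocess

def proof_active_sessions(name):
    """Callback invoked by gmond; returns the current metric value."""
    try:
        # Hypothetical probe; a real module would query PROOF or the local DB.
        out = subprocess.check_output(["pgrep", "-c", "proofserv"])
        return int(out.strip())
    except Exception:
        return 0

def metric_init(params):
    """Called once by gmond; returns the list of metric descriptors."""
    return [{
        "name": "proof_active_sessions",
        "call_back": proof_active_sessions,
        "time_max": 90,
        "value_type": "uint",
        "units": "sessions",
        "slope": "both",
        "format": "%u",
        "description": "Number of active PROOF sessions on this node",
        "groups": "t3mon",
    }]

def metric_cleanup():
    """Called by gmond on shutdown; nothing to release here."""
    pass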
Data flow for the site monitoring • Common UI for various data sources • Small core with separate modules allows installing only the needed software • Delivery to the global level can be switched off
Global monitoring • Ganglia as executor • MSG as the transmitting system • Publisher on the local site: executed by gmond, communicates with the local DB and sends information to the MSG system • Backend: message consumer(s) at CERN; data popularity and job statistics presented via Dashboard
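As an illustration of the publisher step, the sketch below sends a JSON payload to the MSG/ActiveMQ broker over STOMP using stomp.py. The broker address, queue name and the absence of credentials are assumptions for the sketch; the real publisher is invoked by gmond and reads its payload from the local aggregation DB.

# Minimal sketch of a site-level publisher, assuming a STOMP-reachable broker.
import json
import stomp

BROKER = [("msg.example.cern.ch", 61613)]   # hypothetical MSG endpoint
QUEUE = "/queue/t3mon.site.summary"         # hypothetical destination

def publish(payload):
    """Send one monitoring record as a JSON message to the MSG system."""
    conn = stomp.Connection(BROKER)
    conn.connect(wait=True)                  # credentials omitted in this sketch
    conn.send(destination=QUEUE, body=json.dumps(payload))
    conn.disconnect()

if __name__ == "__main__":
    publish({"site": "JINR-T3", "metric": "heartbeat", "value": 1})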
Data flow for the global monitoring
Data flow for Proof, Condor • PostgreSQL for data aggregation on the local site • Ganglia UI to present data popularity at the site level • Ganglia gmond executes the summary gathering • Summary is delivered to Dashboard historical views once per hour • Data sent to the global level: • Job status: OK, stopped, aborted • Site name • Time of report • Number of processed events • Bytes read • Number of active users
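An illustrative Python dict for the hourly summary record described above; the field names are assumptions for the sketch (the actual format is agreed between T3Mon and the Dashboard). Such a record could be passed to the publisher sketched earlier.

# Hypothetical hourly Proof/Condor summary record.
summary = {
    "site": "JINR-T3",                                 # site name
    "report_time": "2012-07-17T12:00:00Z",             # time of report
    "jobs": {"ok": 120, "stopped": 3, "aborted": 5},   # job status counts
    "events_processed": 2500000,                       # processed events
    "bytes_read": 734003200,                           # bytes read
    "active_users": 14,                                # active users
}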
Data flow for XRootD • Both the summary and the detailed event gatherers are implemented as Linux daemons • Summary data goes directly to Ganglia • File transfer data can be stored in the local PostgreSQL DB and then presented via Ganglia • Detailed data can be delivered to ActiveMQ directly • Data sent to the global level: • Domain from, host and IP address • Domain to, host and IP address • User • File, size • Bytes read, written • Time transfer started and finished
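A minimal sketch of the site-level storage step: one XRootD file-transfer record, with the fields listed above, inserted into the local PostgreSQL DB via psycopg2. The DSN, table and column names are illustrative, not the actual T3Mon schema.

# Hypothetical local storage of one XRootD transfer record.
import psycopg2

record = {
    "src_domain": "jinr.ru", "src_host": "xrd01.jinr.ru", "src_ip": "192.0.2.10",
    "dst_domain": "cern.ch", "dst_host": "eosatlas.cern.ch", "dst_ip": "192.0.2.20",
    "user_name": "apetrosyan",
    "file_name": "/atlas/data12/AOD.root", "file_size": 2147483648,
    "bytes_read": 1073741824, "bytes_written": 0,
    "t_start": "2012-07-17 11:55:02", "t_end": "2012-07-17 12:03:41",
}

conn = psycopg2.connect("dbname=t3mon user=t3mon")   # assumed DSN
cur = conn.cursor()
cur.execute(
    """INSERT INTO xrootd_transfers
       (src_domain, src_host, src_ip, dst_domain, dst_host, dst_ip,
        user_name, file_name, file_size, bytes_read, bytes_written,
        t_start, t_end)
       VALUES (%(src_domain)s, %(src_host)s, %(src_ip)s, %(dst_domain)s,
               %(dst_host)s, %(dst_ip)s, %(user_name)s, %(file_name)s,
               %(file_size)s, %(bytes_read)s, %(bytes_written)s,
               %(t_start)s, %(t_end)s)""",
    record)
conn.commit()
conn.close()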
Tier-3 monitoring status • The full development chain from a Tier-3 site to the Dashboard was completed • Site-level presentation via Ganglia Web 2.0 • Global-level presentation of Proof jobs via Dashboard Historical Views • Tier-3 site to DQ2 popularity: formats agreed, data is being delivered, the consumer on the DQ2 side is in the testing stage • T3Mon software was installed on pilot sites • Distribution is available via our repository: https://svnweb.cern.ch/trac/t3mon/wiki/YumConfigure • We welcome more sites to try it and send their feedback to our support list: t3mon-jinr-@googlegroups.com
XRootD transfers monitoring • Goal: present transfers between servers and sites in the federation via one UI • Messages from XRootD servers are collected by the T3Mon UDP collector and then sent to ActiveMQ • Data is stored in HBase • Hadoop processing is used to prepare data summaries • Web services for data export • Dashboard transfer interface as UI
Data flow for the XRootD federation monitoring
T3Mon UDP message collector • Can be installed anywhere, implemented as a Linux daemon • Extracts transfer info from several messages and composes a complete file transfer message • Sends the complete transfer message to ActiveMQ • The message includes: • Domain from, host and IP address • Domain to, host and IP address • User • File, size • Bytes read/written • Time transfer started/finished
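A minimal sketch of the collector loop, assuming XRootD monitoring packets arrive on a configurable UDP port, are matched into complete transfer records and forwarded to ActiveMQ over STOMP (stomp.py). The packet decoding is reduced to a placeholder; the real collector parses the binary XRootD monitoring format. Port, broker and queue names are assumptions.

# Hypothetical UDP-to-ActiveMQ collector loop.
import json
import socket
import stomp

LISTEN = ("0.0.0.0", 9930)               # assumed UDP port
BROKER = [("amq.example.org", 61613)]    # hypothetical ActiveMQ endpoint
QUEUE = "/queue/xrootd.transfers"        # hypothetical destination

def decode(packet):
    """Placeholder: turn one monitoring packet into a partial-transfer dict."""
    # The real collector decodes the binary XRootD monitoring stream here.
    return None

def main():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(LISTEN)
    amq = stomp.Connection(BROKER)
    amq.connect(wait=True)
    transfers = {}                       # file id -> partially assembled record
    while True:
        packet, _addr = sock.recvfrom(65535)
        part = decode(packet)
        if part is None:
            continue
        rec = transfers.setdefault(part["file_id"], {})
        rec.update(part)
        if "t_end" in rec:               # transfer closed: message is complete
            amq.send(destination=QUEUE, body=json.dumps(rec))
            del transfers[part["file_id"]]

if __name__ == "__main__":
    main()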
AMQ2Hadoop collector • Can be installed anywhere, implemented as a Linux daemon • Listens to the ActiveMQ queue • Extracts messages • Inserts them into the HBase raw table
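A minimal sketch of this step, assuming stomp.py (4.x listener API) on the ActiveMQ side and happybase (HBase Thrift gateway) for storage. Broker, queue, table and column-family names are illustrative.

# Hypothetical ActiveMQ-to-HBase raw-table writer.
import json
import time
import uuid

import happybase
import stomp

BROKER = [("amq.example.org", 61613)]    # hypothetical ActiveMQ endpoint
QUEUE = "/queue/xrootd.transfers"        # hypothetical queue
HBASE_HOST = "hbase-thrift.example.org"  # hypothetical Thrift gateway

class RawTableWriter(stomp.ConnectionListener):
    def __init__(self, table):
        self.table = table

    def on_message(self, headers, message):      # stomp.py 4.x signature
        rec = json.loads(message)
        row_key = "%s-%s" % (rec.get("t_start", "0"), uuid.uuid4().hex)
        # Store the whole record in one column of the raw table.
        self.table.put(row_key.encode(), {b"raw:json": json.dumps(rec).encode()})

def main():
    hbase = happybase.Connection(HBASE_HOST)
    table = hbase.table("xrootd_raw")            # assumed raw table name
    conn = stomp.Connection(BROKER)
    conn.set_listener("", RawTableWriter(table))
    conn.connect(wait=True)
    conn.subscribe(destination=QUEUE, id=1, ack="auto")
    while True:                                  # keep the daemon alive
        time.sleep(60)

if __name__ == "__main__":
    main()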
Hadoop processing • Reads the raw table • Prepares data summaries: 10-minute statistics with the structure: • From • To • Sum of bytes read • Sum of bytes written • Number of files read • Number of files written • Inserts summary data into the summary table • MapReduce: we use Java; we are also working on enabling Pig routines
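The production MapReduce job is written in Java; the sketch below only illustrates the same 10-minute aggregation logic as a Hadoop Streaming mapper/reducer pair in Python. Input is assumed to be one JSON transfer record per line (e.g. an export of the raw table), with t_end in epoch seconds; field names follow the transfer message above.

# Illustrative streaming mapper/reducer for 10-minute transfer summaries.
import json
import sys

BUCKET = 600  # 10-minute summary interval, in seconds

def mapper():
    """Emit (from|to|bucket) -> bytes_read, bytes_written, files_read, files_written."""
    for line in sys.stdin:
        rec = json.loads(line)
        bucket = int(rec["t_end"]) // BUCKET * BUCKET
        key = "%s|%s|%d" % (rec["src_domain"], rec["dst_domain"], bucket)
        files_read = 1 if rec["bytes_read"] else 0
        files_written = 1 if rec["bytes_written"] else 0
        print("%s\t%d\t%d\t%d\t%d" % (key, rec["bytes_read"],
                                      rec["bytes_written"],
                                      files_read, files_written))

def reducer():
    """Sum the per-transfer values for each (from, to, bucket) key."""
    current, totals = None, [0, 0, 0, 0]
    for line in sys.stdin:
        key, *values = line.rstrip("\n").split("\t")
        if key != current:
            if current is not None:
                print("%s\t%s" % (current, "\t".join(map(str, totals))))
            current, totals = key, [0, 0, 0, 0]
        totals = [t + int(v) for t, v in zip(totals, values)]
    if current is not None:
        print("%s\t%s" % (current, "\t".join(map(str, totals))))

if __name__ == "__main__":
    (mapper if sys.argv[1] == "map" else reducer)()

Such a script (here hypothetically named summary.py) would be run with the Hadoop Streaming jar, passing "summary.py map" as the mapper and "summary.py reduce" as the reducer.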
Storage2UI data export • Web service • Extracts data from the storage • Feeds the Dashboard XBrowse UI
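A minimal sketch of a web service exporting the summary table as JSON for the Dashboard XBrowse UI. Flask and happybase are used here only for brevity; the framework, table layout, URL and query parameters are assumptions, not the actual T3Mon service.

# Hypothetical summary-export web service.
import happybase
from flask import Flask, jsonify, request

app = Flask(__name__)
HBASE_HOST = "hbase-thrift.example.org"   # hypothetical Thrift gateway

@app.route("/transfers/summary")
def summary():
    src = request.args.get("src")          # optional ?src=<domain> filter
    table = happybase.Connection(HBASE_HOST).table("xrootd_summary")
    rows = []
    for key, data in table.scan():
        rec = {k.decode().split(":", 1)[1]: v.decode() for k, v in data.items()}
        rec["key"] = key.decode()
        if src is None or rec.get("src_domain") == src:
            rows.append(rec)
    return jsonify({"transfers": rows})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)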
Status • In the prototype stage: • Hadoop processing is executed manually • Simulated data • UI: http://xrdfedmon-dev.jinr.ru/ui/#date.from=201206210000&date.interval=0&date.to=201206220000&grouping.dst=(host)&grouping.src=(host) • We are ready to start testing on a real federation
Thank you for your attention