Using xrootd monitoring flows in various contexts
Artem Petrosyan, Sergey Belov, Danila Oleynik, Sergey Mitsyn, Julia Andreeva
Use cases for xrootd monitoring
• Integration of xrootd transfers into the WLCG Transfer Dashboard, which currently tracks only FTS transfers: http://dashb-wlcg-transfers.cern.ch/ui/
• Data popularity
• Xrootd federation monitoring
Data flow for the WLCG data transfer Dashboard and Popularity application
• For the first two use cases, monitoring reports for accomplished transfers generated from the detailed xrootd data flow are enough.
• Per-file monitoring reports are generated by the UCSD collector, which consumes UDPs from the xrootd servers (detailed data flow).
• A separate component consumes the generated per-file reports (UDPs) and publishes them to ActiveMQ (a sketch of such a publisher follows below).
• This information is then consumed by the Dashboard and Popularity collectors.
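A minimal sketch of such a publisher, assuming stomp.py as the STOMP client; the broker endpoint, topic name, UDP port and report format are placeholders, since the slides do not specify them.

```python
# Minimal sketch of an ActiveMQ publisher for per-file transfer reports.
# Assumptions (not from the slides): stomp.py is the STOMP client; the broker
# host/port, topic name, UDP port and report format are placeholders.
import json
import socket

import stomp

BROKER = [("dashb-mb.cern.ch", 61113)]          # hypothetical broker endpoint
DESTINATION = "/topic/xrootd.transfer.reports"  # hypothetical destination

conn = stomp.Connection(BROKER)
conn.connect("user", "password", wait=True)     # credentials are placeholders

# Listen for per-file report UDP datagrams emitted by the UCSD collector.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 9931))                    # hypothetical report port

while True:
    datagram, _addr = sock.recvfrom(65535)
    # The per-file report is assumed to be a parsable text record here;
    # in reality the UCSD collector defines the exact wire format.
    report = {"raw": datagram.decode("utf-8", errors="replace")}
    conn.send(destination=DESTINATION, body=json.dumps(report))
```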
[Diagram] Data flow for the WLCG data transfer Dashboard and Popularity application: xrootd servers send UDPs to the UCSD collector; an ActiveMQ publisher forwards the per-file reports to the ActiveMQ message broker at CERN, which feeds the Dashboard DB and the Popularity DB.
[Diagram] Possible improvement: the UCSD collector publishes directly to the ActiveMQ message broker at CERN (no separate ActiveMQ publisher), which feeds the Dashboard DB and the Popularity DB.
Federation monitoring
• This work is a spin-off of the ATLAS T3Mon project. It is being performed by the JINR (Dubna) team, whose monitoring task was agreed with the xrootd consortium. The architecture described in the following slides was presented at the xrootd meeting in Lyon, several ATLAS meetings and the CHEP WLCG workshop.
• The current implementation uses only monitoring information for accomplished transfers retrieved from the detailed flow.
• The UI is similar to that of the ATLAS DDM Dashboard and the WLCG Transfer Dashboard and is therefore familiar to the ATLAS community.
• The first prototype runs on a simulated data flow. Take a look:
• New URL: http://xrdfedmon-dev.jinr.ru/ui//#date.from=201206210000&date.interval=0&date.to=201206220000&grouping.dst=%28host%29&grouping.src=%28site%29&m.content=%28efficiency,successes,throughput%29
• Old URL: http://fizmat-work.dyndns.org/ui/#date.from=201205140000&date.interval=0&date.to=201205150000&tab=src_plots
Architectural principles
• Asynchronous communication between information sources and components of the system at various hierarchical levels (allowing hierarchical aggregation: collecting, processing, publishing, then collecting, processing, publishing again, and so on) through ActiveMQ.
• A common way of handling the summary and detailed flows of the native xrootd monitoring (unlike the current CMS approach).
• Flexibility in terms of deployment scenarios, which can depend on the size of the federation and the requirements regarding the number of metrics and aggregation granularity. This allows a scalable system to be built but, unlike the ALICE approach, does not necessarily require per-site deployment of any components.
• Choosing technologies which can be deployed at the federation level (no Oracle, for example) but which scale under heavy load (Hadoop and HBase for persistency, MapReduce for data processing).
• A common monitoring UI at all levels (federation, VO, WLCG global), but with different levels of detail and with additional views at the federation level.
• A common naming convention (with a choice of VO-specific or GOCDB/OIM names), which should ideally be provided by the experiment topology system.
[Diagram] Data flow for the xrootd federation monitoring: xrootd servers send UDPs to a UDP collector (two implementations currently exist: the UCSD collector plus an ActiveMQ publisher, or the T3Mon collector_publisher; both provide the needed functionality). The collector publishes to the ActiveMQ message bus; the AMQ2Hadoop collector stores the messages in HBase, MapReduce processing produces summaries, and the results are exported as JSON to an ATLAS DDM Dashboard-like UI.
[Diagram] Same data flow for the xrootd federation monitoring, with the possibility to republish the aggregated metrics to the ActiveMQ message bus for further consumption at the higher level; the JSON export feeds an ATLAS DDM Dashboard-like federation UI.
T3Mon UDP messages collector
• Can be installed anywhere, implemented as a Linux daemon
• Listens on a UDP port
• Extracts transfer info from several messages and composes a complete file-transfer message
• Sends the complete transfer message to ActiveMQ (a sketch of this step is given below)
• Message data:
  • Source domain, host and IP address
  • Destination domain, host and IP address
  • User
  • File, size
  • Bytes read/written
  • Date transfer started/finished
• Functionality is similar to the UCSD collector; either of the two implementations can be used
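The real xrootd detailed-stream packets are binary and session-oriented; the Python sketch below only illustrates the "extract info from several messages and compose one file-transfer message" step, assuming the packets have already been decoded into simple dicts. The broker address, queue name and field names are placeholders.

```python
# Minimal sketch of composing a file-transfer message and publishing it to ActiveMQ.
# Assumptions (not from the slides): incoming xrootd UDP data has already been
# decoded into dicts with an event type; field names, broker address and queue
# name are placeholders. Real xrootd monitoring packets are binary and more complex.
import json

import stomp

conn = stomp.Connection([("activemq.example.org", 61613)])  # hypothetical broker
conn.connect("user", "password", wait=True)

open_transfers = {}  # transfer id -> partially filled message

def handle_event(event):
    """Merge decoded monitoring events into one message per file transfer."""
    tid = event["transfer_id"]
    if event["type"] == "open":
        open_transfers[tid] = {
            "domain_from": event["client_domain"],
            "host_from": event["client_host"],
            "ip_from": event["client_ip"],
            "domain_to": event["server_domain"],
            "host_to": event["server_host"],
            "ip_to": event["server_ip"],
            "user": event["user"],
            "file": event["path"],
            "size": event["file_size"],
            "started": event["timestamp"],
        }
    elif event["type"] == "close" and tid in open_transfers:
        msg = open_transfers.pop(tid)
        msg["bytes_read"] = event["bytes_read"]
        msg["bytes_written"] = event["bytes_written"]
        msg["finished"] = event["timestamp"]
        conn.send(destination="/queue/xrootd.federation.transfers",
                  body=json.dumps(msg))
```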
AMQ2Hadoop collector
• Can be installed anywhere, implemented as a Linux daemon
• Listens to an ActiveMQ queue
• Extracts the messages
• Inserts them into the raw table in HBase (a sketch is given below)
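A minimal sketch of such a collector, assuming stomp.py for the ActiveMQ side and happybase for HBase access via the Thrift gateway; the broker, queue, table and column-family names are placeholders.

```python
# Minimal sketch of an AMQ2Hadoop-style collector: consume messages from an
# ActiveMQ queue and store them in an HBase "raw" table.
# Assumptions (not from the slides): stomp.py (recent listener API) and happybase;
# broker, queue, table and column-family names are placeholders.
import json
import time
import uuid

import happybase
import stomp

hbase = happybase.Connection("hbase-thrift.example.org")    # hypothetical host
raw_table = hbase.table("xrootd_raw")                       # hypothetical table

class RawWriter(stomp.ConnectionListener):
    def on_message(self, frame):
        record = json.loads(frame.body)
        # Row key: coarse time bucket plus a random suffix to spread the writes.
        row_key = "%d-%s" % (int(time.time()), uuid.uuid4().hex[:8])
        raw_table.put(row_key, {
            "raw:%s" % k: str(v) for k, v in record.items()
        })

conn = stomp.Connection([("activemq.example.org", 61613)])  # hypothetical broker
conn.set_listener("raw-writer", RawWriter())
conn.connect("user", "password", wait=True)
conn.subscribe(destination="/queue/xrootd.federation.transfers", id=1, ack="auto")

while True:          # keep the daemon alive; real code would handle signals
    time.sleep(60)
```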
Hadoop processing
• At the prototype stage, routines are executed manually
• Reads the raw table
• Prepares a summary: 10-minute statistics (from, to, bytes read, bytes written)
• Inserts the summary data into the summary table in HBase
• MapReduce: we use Java; Pig routines are being prepared (the aggregation logic is sketched below)
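The production routines are Java MapReduce, with Pig versions in preparation; the following Python sketch only illustrates the 10-minute aggregation logic in Hadoop Streaming style, assuming a tab-separated raw record layout that is not specified in the slides.

```python
# Minimal sketch of the 10-minute aggregation (from, to, bytes read, bytes written).
# Assumption (not from the slides): raw records arrive as tab-separated lines of
# finish_timestamp, domain_from, domain_to, bytes_read, bytes_written.
import sys

BIN = 600  # 10-minute bins, in seconds

def mapper(lines):
    for line in lines:
        ts, src, dst, read, written = line.rstrip("\n").split("\t")
        bucket = int(ts) // BIN * BIN
        print("%d\t%s\t%s\t%s\t%s" % (bucket, src, dst, read, written))

def reducer(lines):
    totals = {}  # (bucket, src, dst) -> [bytes_read, bytes_written]
    for line in lines:
        bucket, src, dst, read, written = line.rstrip("\n").split("\t")
        acc = totals.setdefault((bucket, src, dst), [0, 0])
        acc[0] += int(read)
        acc[1] += int(written)
    for (bucket, src, dst), (read, written) in sorted(totals.items()):
        print("%s\t%s\t%s\t%d\t%d" % (bucket, src, dst, read, written))

if __name__ == "__main__":
    # Hypothetical usage with Hadoop Streaming:
    #   -mapper "summary.py map" -reducer "summary.py reduce"
    (mapper if sys.argv[1] == "map" else reducer)(sys.stdin)
```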
HBase data export
• Web service
• Extracts data from the storage
• Feeds the Dashboard XBrowse UI (a sketch of such a service is given below)
• XBrowse is a JavaScript framework used for both the ATLAS DDM Dashboard and the WLCG Transfer Dashboard; it is backend-agnostic and requires JSON input, which makes it easy to integrate with various underlying data sources
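A minimal sketch of such an export service, assuming Flask for the HTTP layer and happybase for HBase access; the table, column and endpoint names are placeholders, and the exact JSON schema that XBrowse expects is not reproduced here.

```python
# Minimal sketch of a web service exporting summary data from HBase as JSON.
# Assumptions (not from the slides): Flask and happybase; placeholder table,
# column and endpoint names; the output is a plain list of summary rows rather
# than the exact JSON layout consumed by XBrowse.
import happybase
from flask import Flask, jsonify, request

app = Flask(__name__)
hbase = happybase.Connection("hbase-thrift.example.org")  # hypothetical host
summary = hbase.table("xrootd_summary")                   # hypothetical table

@app.route("/transfers")
def transfers():
    # Optional ?prefix=<time bucket> to narrow the scan (placeholder convention).
    prefix = request.args.get("prefix", "").encode()
    rows = []
    for key, data in summary.scan(row_prefix=prefix or None):
        rows.append({
            "key": key.decode(),
            "from": data.get(b"sum:from", b"").decode(),
            "to": data.get(b"sum:to", b"").decode(),
            "bytes_read": int(data.get(b"sum:read", b"0")),
            "bytes_written": int(data.get(b"sum:written", b"0")),
        })
    return jsonify({"transfers": rows})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```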
Why did we ask for raw data reporting to the CERN server?
• The current implementation of the federation monitor uses data about accomplished transfers. We assume this is not enough, since we would not be able to provide a close-to-real-time picture with accurate I/O measurements.
• Certainly, the architecture does not foresee sending raw UDPs to a central server. UDP collectors should be deployed as close as possible to the xrootd servers of the federation. How many of them are required should be understood by considering the amount of monitoring data and the size of the federation. Initially we foresaw per-server or per-site deployment (an ALICE-like model), but this might require additional deployment/maintenance effort. Some reasonable compromise should be found.
• Currently, raw data is required for:
- cross-checks of the correctness of the aggregated data flow. Our intention was to generate per client-server snapshots from the raw data and to compare them with similar snapshots based on data consumed from ActiveMQ (a sketch of such a cross-check is given after this list);
- implementing a collector which would provide a close-to-real-time picture not based only on accomplished transfers (process raw UDPs and republish the results to the message bus, or even publish just the raw UDPs to the message bus?);
- estimating the amount of transferred data and suggesting recommendations for various deployment scenarios in the future.
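A minimal sketch of the cross-check idea, assuming the raw-data and ActiveMQ-based records have already been parsed into dicts with hypothetical field names.

```python
# Minimal sketch of the cross-check: build per client-server snapshots from the
# raw data and from the ActiveMQ-consumed data, then compare the totals.
# Assumption (not from the slides): both inputs are lists of dicts with
# "client_domain", "server_domain" and "bytes_read" fields.
from collections import defaultdict

def snapshot(records):
    """Aggregate bytes read per (client domain, server domain) pair."""
    totals = defaultdict(int)
    for rec in records:
        totals[(rec["client_domain"], rec["server_domain"])] += rec["bytes_read"]
    return totals

def compare(raw_records, amq_records, tolerance=0.01):
    """Report pairs whose totals differ by more than the relative tolerance."""
    raw, amq = snapshot(raw_records), snapshot(amq_records)
    for pair in sorted(set(raw) | set(amq)):
        a, b = raw.get(pair, 0), amq.get(pair, 0)
        if abs(a - b) > tolerance * max(a, b, 1):
            print("mismatch %s: raw=%d activemq=%d" % (pair, a, b))
```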
Issues and open questions
• How to resolve the federation topology, in particular for clients (mapping an IP domain to a particular site in either the ATLAS or the GOCDB/OIM naming convention, though this might not always be possible)? Is it foreseen to provide such a mapping through AGIS?
• For the moment we did not manage to install the UCSD collector using the existing documentation (questions sent to Matevz).
• Any new monitoring system requires validation (functionality, scalability, performance, reliability of information). Finding ATLAS pilot sites for validating any T3Mon components on a real data flow is almost impossible (relevant not only for xrootd but also, for example, for PROOF monitoring).
• Does ATLAS need more accurate and close-to-real-time monitoring than what can be provided based only on reports about accomplished transfers?
• Does ATLAS need a single monitoring display for federation monitoring, similar to the ATLAS DDM Dashboard, with all necessary data available through a single entry point (no separate displays and completely different implementations for the summary and detailed xrootd data flows)?