WLCG Transfers Dashboard
WLCG Workshop in conjunction with CHEP 2012, 20.05.2012, New York
Julia Andreeva, David Tuckett, Daniel Dieguez, Danila Oleynik, Artem Petrosyan, Gunnar Roe, Michail Salichos, Alexandr Uzhinskiy
Contents
• Motivation
• Overview of the key concepts of the WLCG transfer monitoring system
• Current status and issues
• Dashboard UI
• Integration of XRootD monitoring
• Summary
Julia Andreeva, WLCG Workshop
Motivation
• Currently there is no tool that provides an overall view of data transfers at the WLCG scope (across LHC experiments, across the various technologies used, for example FTS and XRootD, across multiple local FTS instances, etc.)
• Every LHC experiment follows its own data transfers through a VO-specific monitoring system.
• There is a clear similarity between the tasks performed by all VO-specific transfer monitoring systems. Operations like aggregation of the FTS transfer statistics are done by every VO separately, though they could be done once, centrally, and then served to all experiments via a well-defined set of APIs.
• To organize data transfers in the most efficient way, experiments need more information than is currently available, for example correlations of data transfers between experiments, latencies related to SRM operations during data transfers, etc.
Concept (1)
• WLCG transfer monitoring is a common solution which provides a cross-VO, cross-technology view not coupled with any VO-specific data management system
• VO transfer monitoring integration
  - Transfer events via MSG broker: avoids polling and screen-scraping local FTS instances
  - Transfer statistics via Dashboard API: avoids redundant event storage and statistics generation
  - Transfer plots via Dashboard UI: avoids redundant development of common plots
[Diagram: FTS instance (currently the main technology for CMS, ATLAS and LHCb), XRootD etc. (currently the main technology for ALICE), MSG Broker, Dashboard (API, UI), VO Monitoring]
Concept (2)
• WLCG transfer monitoring is a common solution which provides a cross-VO, cross-technology view not coupled with any VO-specific data management system
• Implementation started with FTS monitoring
[Diagram: FTS instance, XRootD etc., MSG Broker, Dashboard (API, UI), VO Monitoring]
Current status (1)
• Required deployment of FTS 2.2.8, in which transfer status reporting via MSG was enabled (GT group)
• The prototype has been up and running for more than half a year
• Example of excellent collaboration between several groups in CERN IT (ES, GT, PES, DB), between IT and PH (active participation of the CMS and ATLAS computing teams), and between CERN and JINR (Dubna)
Current status (2)
• The WLCG Transfer Dashboard was developed using a schema and UI similar to those of the ATLAS DDM Dashboard. This allowed a prototype to be put in place in a short time (~2 months).
• Full production setup is in place:
  - The schema was validated by Oracle experts from CERN IT-DB and was deployed in production
  - Production collectors and UIs are running in a redundant mode (2 hosts)
  - 2 production message brokers are set up (many thanks to Lionel Cons (CERN IT-GT) and the CERN IT-PES group)
• Testing and integration environment created (integration DB, test message broker, VMs for collectors and UIs)
• Alarms are enabled in case any of the production FTS instances does not report for longer than 2 hours
• First cycle of validation was performed by the CMS colleagues (special thanks to Jozep Flix) and all reported bugs were fixed
• No problems with UIs or collectors were detected over the last months
• Announcing the system as being in production was delayed because of the results of the consistency checks
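The 2-hour reporting alarm mentioned above can be sketched as follows. This is only an illustration under stated assumptions, not the Dashboard's actual alarm code: the function name `silent_instances`, the instance names and the timestamps are all hypothetical; only the 2-hour threshold comes from the slide.

```python
from datetime import datetime, timedelta

# Threshold from the slide: alarm when an instance is silent for longer than 2 hours.
REPORT_TIMEOUT = timedelta(hours=2)


def silent_instances(last_report, now, timeout=REPORT_TIMEOUT):
    """Return the sorted names of FTS instances whose last report is older than `timeout`."""
    return sorted(name for name, seen in last_report.items() if now - seen > timeout)


# Hypothetical data: instance name -> time of the last message received from it.
now = datetime(2012, 5, 20, 12, 0)
last_report = {
    "fts-a.example.org": datetime(2012, 5, 20, 11, 30),  # reported 30 minutes ago
    "fts-b.example.org": datetime(2012, 5, 20, 9, 15),   # silent for 2 h 45 min
}
alarms = silent_instances(last_report, now)
print(alarms)
```

In this sketch an instance that last reported exactly 2 hours ago does not yet raise an alarm, since the slide says "longer than 2 hours".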
Current status (3)
• The most important validation step is the consistency checks, performed in order to understand how trustworthy the data is. Data is compared between the WLCG Transfer Dashboard and PhEDEx and the ATLAS DDM Dashboard.
• First results of consistency checks with the pilot FTS server were very promising. However, after deployment of FTS 2.2.8 to all T1s, consistency checks showed a big discrepancy, for ATLAS in particular up to 50%.
• The problem was understood, thanks to Michail Salichos (CERN IT-GT). It is caused by a bug in the activeMQ-cpp client used by the FTS publisher.
• A workaround was found (Michail Salichos). A fixed version of the FTS publisher was deployed to the Triumf and ASGC FTSs 3 weeks ago. Permanent consistency checks show perfect agreement.
• Tentative schedule for the service to be in production: 2-3 weeks from now. This depends on patching all FTS services for the activeMQ-cpp client bug.
Consistency checks
[Plots: WLCG Transfers Dashboard plot for Triumf; ATLAS DDM plot for Triumf]
Dashboard UI: overview
• Key features
  - Flexible filtering and grouping
  - Statistics matrix & error samples
  - Customizable plots
  - Web API: JSON, XML
• Implementation
  - Uses the common xbrowse UI framework originally developed for ATLAS DDM Dashboard 2.0
Dashboard UI: filtering & grouping
• Filtering by sliding/fixed interval
• Filtering by VO
• Filtering by FTS server
• Filtering and grouping of sources / destinations by country, site, host, token
  - GOCDB naming for cross-VO view
  - VO-specific naming for single-VO view
Dashboard UI: matrix & error samples
• Matrix: source × destination, showing efficiency, throughput, successes, failures
• Error samples
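A minimal sketch of how such a source × destination matrix could be built from individual transfer events. Everything here is an assumption made for illustration: the event field names (`src`, `dst`, `ok`, `bytes`), the definition of efficiency as successes / (successes + failures), and counting only bytes of successful transfers towards throughput. The slides do not specify how the Dashboard computes these values.

```python
from collections import defaultdict


def transfer_matrix(events):
    """Aggregate transfer events into per-(source, destination) statistics."""
    cells = defaultdict(lambda: {"successes": 0, "failures": 0, "bytes": 0})
    for ev in events:
        cell = cells[(ev["src"], ev["dst"])]
        if ev["ok"]:
            cell["successes"] += 1
            cell["bytes"] += ev["bytes"]  # assumption: only successful bytes count
        else:
            cell["failures"] += 1
    matrix = {}
    for key, cell in cells.items():
        total = cell["successes"] + cell["failures"]
        # Assumed definition: efficiency = successes / all attempts.
        matrix[key] = dict(cell, efficiency=cell["successes"] / total)
    return matrix


def throughput_mb_per_s(cell, interval_seconds):
    """Average throughput of one matrix cell over the selected time interval."""
    return cell["bytes"] / 1e6 / interval_seconds


# Hypothetical events: three successful 1 GB transfers and one failure.
events = [
    {"src": "CERN", "dst": "KIT", "ok": True, "bytes": 10**9},
    {"src": "CERN", "dst": "KIT", "ok": True, "bytes": 10**9},
    {"src": "CERN", "dst": "KIT", "ok": True, "bytes": 10**9},
    {"src": "CERN", "dst": "KIT", "ok": False, "bytes": 0},
]
matrix = transfer_matrix(events)
cell = matrix[("CERN", "KIT")]
print(cell["efficiency"], throughput_mb_per_s(cell, 3600))
```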
Dashboard UI: plots
• Plots: source, destination, VO × efficiency, throughput, successes, failures
• Different kinds of plots are available
• Possibility to customize plots (time bins, number of shown items, etc.)
• See backup slides
Dashboard UI: consistency
• Throughput side-by-side: PhEDEx v. Dashboard
• Throughput difference: relative & absolute
• In development
  - Automated cross-checking with alarms
[Plots: 12 hours, CERN KIT; 24 hours, CERN RAL]
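The relative and absolute throughput differences listed above could be computed along these lines. This is a hedged sketch, not the Dashboard's implementation: the function names, the 5% tolerance and the sample numbers are hypothetical, and both series are assumed to use the same time bins and units.

```python
def throughput_difference(reference, dashboard):
    """Return (absolute, relative) differences per time bin.

    `reference` could be the PhEDEx series and `dashboard` the Dashboard series.
    The relative difference is None where the reference value is zero.
    """
    if len(reference) != len(dashboard):
        raise ValueError("series must cover the same time bins")
    rows = []
    for ref, dash in zip(reference, dashboard):
        absolute = dash - ref
        relative = absolute / ref if ref else None
        rows.append((absolute, relative))
    return rows


def inconsistent_bins(rows, tolerance=0.05):
    """Indices of bins whose relative difference exceeds `tolerance` (assumed 5%)."""
    flagged = []
    for index, (absolute, relative) in enumerate(rows):
        if relative is None:
            # No reference traffic: flag only if the other side saw some.
            if absolute != 0:
                flagged.append(index)
        elif abs(relative) > tolerance:
            flagged.append(index)
    return flagged


# Hypothetical throughput per time bin, in MB/s.
rows = throughput_difference([100.0, 200.0, 80.0], [100.0, 100.0, 82.0])
flagged = inconsistent_bins(rows)
print(rows, flagged)
```

A list of flagged bins like this is one possible input for the automated cross-checking with alarms that the slide lists as being in development.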
Next steps (FTS monitoring)
• System developers work in close contact with the VOs. Many thanks to CMS and ATLAS for their active participation. Many feature requests were received, which will be addressed in future development:
  - Filter by FTS channel
  - FTS channel status: current and evolution
  - Status of the FTS queues; correlations between transfer performance metrics and the status of the queue
  - Transfer part statistics: SRM overhead, GridFTP
  - Ranking plots and quality map plots
Integration of XRootD transfers
[Diagram: Dashboard, FTS instance, MSG Broker, API, UI, XRootD, VO Monitoring]
• The XRootD federation monitoring part is under development
• It is being developed mainly by JINR (Dubna)
XRootD monitoring
• Is being implemented with 3 levels of hierarchy:
  - local site
  - federation
  - global
XRootD monitoring architecture
[Architecture diagram; user groups labelled on it:]
• Users: VO computing teams, federation support teams
• Users: site administrators and VO support teams at the site
• Users: VO computing teams, site administrators, VO management, WLCG management
XRootD monitoring architecture
[Architecture diagram]
XRootD monitoring (local site)
• There are two implementations:
  - based on MonALISA (used by ALICE and, with some extensions, by CMS)
  - developed in the framework of the Tier3 monitoring project for ATLAS (Ganglia)
• Both approaches use XRootD monitoring data (summary and detailed flows) reported by XRootD redirectors over UDP. The content is not event-like.
• CMS and ATLAS developed readers that reformat these flows into event-like data containing: event time, source and destination domains, path and filename, username, file size, number of bytes read/written
• There is no knowledge about federation topology at the site level
• Event-like data, complemented with the name of the site which hosts the publisher, is published to MSG
• UI: MonALISA or Ganglia
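The event-like record described above might look like the following sketch. The field list follows the slide; the class name `TransferEvent`, the attribute names, the JSON encoding and all sample values are assumptions made for illustration, since the actual message format is not given here.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TransferEvent:
    """One event-like record; fields as listed on the slide, names are hypothetical."""
    event_time: str
    source_domain: str
    destination_domain: str
    path: str
    filename: str
    username: str
    file_size: int
    bytes_read: int
    bytes_written: int


def to_message(event, site_name):
    """Complement the event with the publishing site's name and serialize it."""
    payload = asdict(event)
    payload["site"] = site_name  # the site hosting the publisher
    return json.dumps(payload, sort_keys=True)


# Hypothetical sample values.
event = TransferEvent(
    event_time="2012-05-20T12:00:00Z",
    source_domain="example.org",
    destination_domain="example.net",
    path="/store/data",
    filename="file.root",
    username="someuser",
    file_size=2_000_000,
    bytes_read=1_500_000,
    bytes_written=0,
)
message = to_message(event, "EXAMPLE-SITE")
print(message)
```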
XRootD monitoring (federation)
• At the federation level, data published by the sites will be consumed from MSG.
• Events coming from different sites will be aggregated and complemented with topology information. Data processing at the federation level is currently planned to be implemented with MapReduce (under development).
• Transfers handled by the federation will be exposed through the federation UI.
• Implementation of the federation UI is similar to the UI of the global Transfer Dashboard. Adapting the global WLCG Transfer UI is straightforward, since it is a JavaScript client application which expects data in JSON format and is fully decoupled from the data source.
• First prototype should be ready by the end of June.
• Federation data, in a format similar to FTS transfer status messages, will be published to MSG for the global monitoring system
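The aggregation step above, in which events from different sites are complemented with topology information and summed, can be illustrated with a small in-process map/reduce sketch. It only shows the pattern, not the planned implementation (which is described as still under development): the topology table, the field names and the choice of summing bytes read are all hypothetical.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical topology: site name -> federation it belongs to.
TOPOLOGY = {"SITE-A": "fed-1", "SITE-B": "fed-1", "SITE-C": "fed-2"}


def map_event(event):
    """Map step: add the federation from the topology and emit (key, value)."""
    federation = TOPOLOGY.get(event["site"], "unknown")
    return (federation, event["site"]), event["bytes_read"]


def reduce_pair(totals, pair):
    """Reduce step: sum the values that share the same key."""
    key, value = pair
    totals[key] += value
    return totals


def aggregate(events):
    """Run the map and reduce steps over a list of site-level events."""
    return dict(reduce(reduce_pair, map(map_event, events), defaultdict(int)))


# Hypothetical events as they might arrive from the sites.
events = [
    {"site": "SITE-A", "bytes_read": 100},
    {"site": "SITE-A", "bytes_read": 50},
    {"site": "SITE-B", "bytes_read": 10},
    {"site": "SITE-X", "bytes_read": 5},  # site missing from the topology
]
totals = aggregate(events)
print(totals)
```

Keeping the topology lookup in the map step mirrors the statement that sites themselves have no knowledge of the federation topology.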
XRootD monitoring
• At the global level, the implementation done for FTS (collectors and UI) should to a large extent be reused for XRootD
• Plan to have the full chain enabled by the end of the year
Summary
• The FTS monitoring part is ready and will be announced as being in production as soon as all FTS instances are patched for the activeMQ-cpp client bug. Further development follows the requirements of the experiments.
• The XRootD monitoring part is in the active development phase and progressing well. Hopefully the first prototype will be ready by the end of June. Full functionality should be enabled by the end of the year.
• Having FTS and XRootD monitoring covered by a global monitoring system would provide a fairly complete picture of the WLCG transfers.
• Example of excellent collaboration between several groups in CERN IT, between IT and PH, and between CERN and JINR
Links
• Dashboard UI (prototype): http://dashb-wlcg-transfers.cern.ch/ui/
• Twiki: https://twiki.cern.ch/twiki/bin/view/LCG/WLCGTransferMonitoring and https://twiki.cern.ch/twiki/bin/view/LCG/WLCGTransfersDashboard
• Feedback: wlcg-transfer-monitor@cern.ch
• Please see the poster during the CHEP poster session
Backup slides
Dashboard UI: plot types (3 slides of plots)
Dashboard UI: plot customisation (3 slides)