Update on federated XrootD monitoring. R. Gardner ADC Weekly Operations July 10, 2012. Federated monitoring discussion. Last week we convened a meeting with several folks involved in federated XrootD monitoring
Update on federated XrootD monitoring
R. Gardner
ADC Weekly Operations
July 10, 2012
Federated monitoring discussion
• Last week we convened a meeting with several folks involved in federated XrootD monitoring:
  • CERN IT-ES dashboard (Julia Andreeva, Artem Petrosyan, Danila Oleynik, Sergey Belov)
  • CERN IT-ES popularity service (Domenico Giordano)
  • US CMS AAA project, Gled (Matevz Tadel)
• The purpose of the meeting was to get updated information from these groups so as to chart our course forward for a production-quality FAX infrastructure
• URL: https://indico.cern.ch/conferenceDisplay.py?confId=198771
Monitoring components
• Basic picture (adapted from Julia Andreeva)
[Diagram not reproduced. Labels: prototype, soon, Viz (existing), Panda callbacks. Credit: Julia Andreeva]
File transfer information
• Collection uses the "UCSD collector"
  • Based on Gled (http://www.gled.org/)
  • Already in use during the FAX development phase
  • Collects UDP packets from data servers
• Now has a publisher into ActiveMQ so that information from sites can be aggregated for a number of purposes:
  • Panda, federated dashboard, popularity
• Currently deployed at SLAC
• Matevz and Julia's team to deploy a second at CERN
• Both can publish into the ActiveMQ message bus
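The collect-then-publish flow above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Topic` class is an in-memory stand-in and not the ActiveMQ API, the JSON datagram layout and its field names (`site`, `path`, `bytes_read`) are hypothetical, and the real packet format sent by the data servers is not reproduced here.

```python
import json
from collections import defaultdict


class Topic:
    """In-memory stand-in for a message-bus topic (not the ActiveMQ API)."""

    def __init__(self, name):
        self.name = name
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        # Every subscriber receives its own copy of every message.
        for callback in self._subscribers:
            callback(dict(message))


def handle_datagram(payload, topic):
    """Decode one hypothetical JSON-encoded datagram and publish it."""
    record = json.loads(payload.decode("utf-8"))
    topic.publish(record)
    return record


# Three independent consumers of the same stream.
bytes_per_site = defaultdict(int)  # dashboard-style aggregate
access_count = defaultdict(int)    # popularity-style aggregate
panda_log = []                     # raw records for a Panda-style consumer


def dashboard_consumer(msg):
    bytes_per_site[msg["site"]] += msg["bytes_read"]


def popularity_consumer(msg):
    access_count[msg["path"]] += 1


def panda_consumer(msg):
    panda_log.append(msg)


topic = Topic("xrootd.file_summary")  # hypothetical topic name
for consumer in (dashboard_consumer, popularity_consumer, panda_consumer):
    topic.subscribe(consumer)

# Made-up sample datagrams, as they might arrive from two sites.
datagrams = [
    b'{"site": "SITE_A", "path": "/atlas/a.root", "bytes_read": 100}',
    b'{"site": "SITE_A", "path": "/atlas/b.root", "bytes_read": 200}',
    b'{"site": "SITE_B", "path": "/atlas/a.root", "bytes_read": 50}',
]
for payload in datagrams:
    handle_datagram(payload, topic)
```

The point of the topic is that the collector publishes each record once and each consumer builds its own view, so adding a further consumer does not require changes on the collector side.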
Publishing to ActiveMQ
• Discussed update frequency:
  • For completed file transfers: at file closing
  • During the transfer (for direct access reads): at fixed intervals, "real-time"
    • Some jobs may keep files open for long periods
    • Brokering decisions may require real-time information
• File-level summary information is collected by the UCSD collector and is now published into an ActiveMQ topic
• A second topic will be set up for the "real-time" information
• An open question is how to distinguish messages for completed file transfers from direct access reads in progress, and from LAN traffic
• dCache billing database only has completed transfer information (TBD w/ dCache team)
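The two update frequencies discussed above could look roughly like the sketch below: an interim report at fixed intervals while a file is open, and a final summary at close. Everything here is an assumption made for illustration, including the class and field names, the `type` tag on each message (one possible way to tell the two kinds apart, not something the meeting decided), and the two Python lists standing in for the two topics.

```python
from dataclasses import dataclass


@dataclass
class OpenFile:
    path: str
    opened_at: float
    bytes_read: int = 0
    last_report: float = 0.0


class TransferReporter:
    """Emits interim reports for open files and a summary at file close."""

    def __init__(self, interval_s):
        self.interval_s = interval_s
        self.open_files = {}
        self.summary_topic = []   # stand-in for the completed-transfer topic
        self.realtime_topic = []  # stand-in for the second, "real-time" topic

    def open(self, file_id, path, now):
        self.open_files[file_id] = OpenFile(path, now, last_report=now)

    def read(self, file_id, nbytes):
        self.open_files[file_id].bytes_read += nbytes

    def tick(self, now):
        # Report every file whose last report is at least one interval old.
        for file_id, f in self.open_files.items():
            if now - f.last_report >= self.interval_s:
                self.realtime_topic.append({
                    "type": "in_progress",
                    "file_id": file_id,
                    "path": f.path,
                    "bytes_read": f.bytes_read,
                    "time": now,
                })
                f.last_report = now

    def close(self, file_id, now):
        f = self.open_files.pop(file_id)
        self.summary_topic.append({
            "type": "closed",
            "file_id": file_id,
            "path": f.path,
            "bytes_read": f.bytes_read,
            "duration_s": now - f.opened_at,
        })


# A job keeps one file open for 130 s; reports are due every 60 s.
reporter = TransferReporter(interval_s=60)
reporter.open(1, "/atlas/a.root", now=0)
reporter.read(1, 100)
reporter.tick(now=30)    # too early, nothing is sent
reporter.tick(now=60)    # first interim report
reporter.read(1, 50)
reporter.tick(now=90)    # only 30 s since the last report
reporter.tick(now=120)   # second interim report
reporter.close(1, now=130)
```

Timestamps are passed in explicitly so the behaviour is easy to follow; a long-lived file shows up in the "real-time" stream well before its single summary record appears at close.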
Other topics discussed
• Federation functionality / availability status monitoring
  • Discussion started with Jarka and Stephane to replace the existing FAX status monitor (http://uct3-xrdp.uchicago.edu:8080/rsv/) with a production service
  • Integration with SSB to monitor site redirectors and storage visibility (will be Nagios-based)
  • Integration with SLS to monitor regional and global redirectors
• Separating local IO from federated IO
• Communicating endpoints: servers (site names from AGIS), clients (some will be ROAMING)
• Global WLCG transfer monitoring (transfers) and ATLAS-specific federation monitoring (more detailed, real-time)
• Privacy, access issues for publishing & accessing detailed information
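One way to picture the local versus federated split, together with the site naming of endpoints, is a lookup from host name to site. The sketch below is a guess at how that could work, not the method agreed in the meeting: the domain table is a hypothetical stand-in for a site-name lookup in AGIS, the host names are invented, and treating any client without a known site as ROAMING (and therefore as federated traffic) is an assumption.

```python
# Hypothetical domain-to-site table; a real service would take site names from AGIS.
SITE_BY_DOMAIN = {
    "site-a.example.org": "SITE_A",
    "site-b.example.org": "SITE_B",
}


def site_of(hostname):
    """Return the site whose domain is the longest matching suffix, else ROAMING."""
    host = hostname.lower().rstrip(".")
    best = None
    for domain, site in SITE_BY_DOMAIN.items():
        if host == domain or host.endswith("." + domain):
            if best is None or len(domain) > len(best[0]):
                best = (domain, site)
    return best[1] if best else "ROAMING"


def classify_io(server_host, client_host):
    """Label one access as local or federated, with the two site names."""
    server_site = site_of(server_host)
    client_site = site_of(client_host)
    if server_site == "ROAMING":
        # Servers are expected to be registered; flag the record if one is not.
        return ("unknown", server_site, client_site)
    kind = "local" if server_site == client_site else "federated"
    return (kind, server_site, client_site)


same_site = classify_io("xrd01.site-a.example.org", "wn17.site-a.example.org")
cross_site = classify_io("xrd01.site-a.example.org", "wn03.site-b.example.org")
roaming = classify_io("xrd01.site-a.example.org", "laptop.home.example.net")
```

Host-name matching is only a first approximation: clients behind NAT or with generic reverse DNS would all be counted as ROAMING.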
Next meetings
• Follow-up meeting focused on monitoring:
  • Thursday, July 26, 17:00 CEST
• Next atlas-adc-federated-xrootd meeting:
  • Monday, July 16, 17:00 CEST
• Email: atlas-adc-federated-xrootd@cern.ch