This project focuses on creating detailed installation and configuration guides, reinstallation procedures, and documentation for monitoring plug-ins. Pilot sites validate the documentation; XRootD monitoring has been implemented successfully at IHEP and JINR DNLP. The documentation is shared on the Twiki at CERN. Technologies used include PostgreSQL, Python, Ganglia, and a Web UI for visualization.
T3mon status
Artem Petrosyan, Danila Oleynik
DNG section meeting, 05.07.11
Testbed at JINR
• Multicore nodes, virtualization
• 5 clusters
  • XRootD
  • PROOF
  • PBS
  • OGE/SGE
  • Condor
• Lustre
• Ganglia
  • XRootD, PBS, OGE/SGE, Lustre
• Nagios
  • XRootD
• JobMonarch
  • PBS
  • OGE/SGE
ATLAS TIM
Load simulation
• XRootD
  • User login/logout, file access
• PBS
  • Test jobs
• OGE/SGE
  • Test jobs
• PROOF
  • Test physics analysis jobs
Validation
• Installation: create documentation that gathers references to the installation and configuration instructions for particular batch and storage solutions, plus documentation for the monitoring plug-ins
• Reinstallation based on the prepared documentation
• Validation of the documentation by pilot sites
  • Successful XRootD monitoring implementation at IHEP and JINR DNLP
• Documentation sharing
  • Twiki at CERN
T3mon-site dataflow ATLAS TIM 31.05.11
Technologies
• PostgreSQL
  • JobMonarch backend
  • MySQL has to be used as a temporary backend
• Python
  • Development language of ATLAS DDM and Dashboard
• Ganglia
  • RRD for storage
  • Web UI for visualization
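To illustrate how a Python agent could hand a value to Ganglia, here is a minimal sketch that only builds the argument list for Ganglia's `gmetric` command-line tool and does not execute it. The metric name `xrootd_active_users` and the helper `gmetric_command` are hypothetical, chosen for illustration; the slides do not say how the T3mon plug-ins actually publish their values.

```python
import shlex


def gmetric_command(name, value, metric_type="uint32", units=""):
    """Build the argv for one Ganglia `gmetric` call (the call is not executed)."""
    argv = ["gmetric", "--name", name, "--value", str(value), "--type", metric_type]
    if units:
        argv += ["--units", units]
    return argv


# Hypothetical metric name and value, for illustration only.
cmd = gmetric_command("xrootd_active_users", 12, units="users")
print(shlex.join(cmd))
```

On a node with Ganglia installed, the resulting list could be passed to `subprocess.run`; keeping construction separate from execution makes the agent easy to test without a running `gmond`.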
XRootD dataflow
XRootD add-on
• Summary database structure is ready
• Multithreaded application
  • Read detailed monitoring stream
  • Publish into Ganglia and database backend
• Metrics
  • User login/logout
  • File access
  • File transfer
• Status
  • Reader is ready
  • Writer is in development
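The reader/writer split described above can be sketched as two threads connected by a queue. This is a minimal illustration under stated assumptions, not the add-on's actual code: the whitespace-separated text records stand in for the real XRootD detailed monitoring stream, whose format is not reproduced here, and the writer only counts events where a real one would publish to Ganglia and the database backend. All function and field names are hypothetical.

```python
import queue
import threading
from collections import Counter

SENTINEL = None  # marks the end of the stream


def reader(stream, out_queue):
    """Parse raw records and hand them to the writer thread."""
    for line in stream:
        timestamp, user, event = line.split()
        out_queue.put({"time": int(timestamp), "user": user, "event": event})
    out_queue.put(SENTINEL)


def writer(in_queue, summary):
    """Count events per type; a real writer would publish them instead."""
    while True:
        record = in_queue.get()
        if record is SENTINEL:
            break
        summary[record["event"]] += 1


def run_pipeline(stream):
    """Run one reader and one writer thread over the given records."""
    events = queue.Queue()
    summary = Counter()
    threads = [
        threading.Thread(target=reader, args=(stream, events)),
        threading.Thread(target=writer, args=(events, summary)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return summary


# Hypothetical sample records: "<unix time> <user> <event>".
sample = [
    "1309860000 alice login",
    "1309860005 alice file_open",
    "1309860042 alice logout",
    "1309860050 bob login",
]
summary = run_pipeline(sample)
print(dict(summary))
```

Decoupling the two sides with a queue means a slow database or Ganglia call in the writer does not block reading of the incoming stream.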
PROOF dataflow
PROOF add-on
• Summary database structure is ready
  • Normalized structure
  • Triggers are used for data normalization
• Metrics
  • User
  • Start-end time
  • CPU
  • Wall time
  • Dataset name
  • Number of files in the dataset
  • Number of events
  • Number of workers
• Status
  • Data is being collected in the database
  • Publisher into Ganglia is in development
• Open issues
  • PostgreSQL connectivity
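The idea of normalizing incoming data with triggers can be sketched as follows. To keep the example self-contained it uses Python's built-in `sqlite3` module rather than PostgreSQL, so the trigger syntax differs from what the add-on would use. The table names, column names, and sample rows are hypothetical; only a subset of the metrics listed above is modelled.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE datasets (id INTEGER PRIMARY KEY, name TEXT UNIQUE, n_files INTEGER);
CREATE TABLE jobs (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER REFERENCES users(id),
    dataset_id INTEGER REFERENCES datasets(id),
    n_events   INTEGER,
    n_workers  INTEGER
);
-- Flat staging table that the collector writes into.
CREATE TABLE raw_jobs (user_name TEXT, dataset_name TEXT, n_files INTEGER,
                       n_events INTEGER, n_workers INTEGER);

-- On every raw insert, fill the lookup tables and add a normalized job row.
CREATE TRIGGER normalize_job AFTER INSERT ON raw_jobs
BEGIN
    INSERT OR IGNORE INTO users(name) VALUES (NEW.user_name);
    INSERT OR IGNORE INTO datasets(name, n_files)
        VALUES (NEW.dataset_name, NEW.n_files);
    INSERT INTO jobs(user_id, dataset_id, n_events, n_workers)
        SELECT u.id, d.id, NEW.n_events, NEW.n_workers
        FROM users u, datasets d
        WHERE u.name = NEW.user_name AND d.name = NEW.dataset_name;
END;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Hypothetical PROOF job records: user, dataset, files, events, workers.
conn.executemany(
    "INSERT INTO raw_jobs VALUES (?, ?, ?, ?, ?)",
    [
        ("alice", "dataset_a", 10, 5000, 4),
        ("alice", "dataset_a", 10, 7000, 8),
        ("bob", "dataset_b", 3, 1200, 2),
    ],
)

n_users = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
n_datasets = conn.execute("SELECT COUNT(*) FROM datasets").fetchone()[0]
n_jobs = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
print(n_users, n_datasets, n_jobs)
```

The collector only ever writes flat rows; repeated user and dataset names are stored once and referenced by key, which is the point of doing the normalization inside the database.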
T3mon-site summary (status table; legend: done, in progress)
T3mon-global
• Dashboard – common collector and presenter of Tier3 sites global monitoring
  • Metrics compatibility
  • Common technologies
• Metrics for global monitoring defined and collected on local sites
  • Job processing
  • Data transfers
  • Data access
• Messaging System for the Grid (MSG) based on ActiveMQ is used as a message bus
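A site-level producer has to package locally collected metrics into a message before it is put on the message bus. The sketch below shows one possible shape for such a message as JSON. The field names, the site name, and the metric keys are all hypothetical; the slides do not define the message schema, and the actual send over ActiveMQ is deliberately left out.

```python
import json
import time

# The three metric groups named for global monitoring.
CATEGORIES = {"job_processing", "data_transfers", "data_access"}


def build_message(site, category, metrics, timestamp=None):
    """Serialize one batch of site metrics into a JSON message body."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown metric category: {category}")
    body = {
        "site": site,
        "category": category,
        "timestamp": int(time.time()) if timestamp is None else timestamp,
        "metrics": metrics,
    }
    return json.dumps(body, sort_keys=True)


# Hypothetical site name and metric values, for illustration only.
message = build_message(
    "EXAMPLE-T3",
    "data_access",
    {"files_opened": 42, "active_users": 3},
    timestamp=1309860000,
)
decoded = json.loads(message)
print(message)
```

Validating the category on the producer side keeps malformed messages off the bus, so the global collector can rely on the three agreed metric groups.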
T3mon data flow
Job processing metrics with PROOF
Data transfer metrics
• Defined in Dashboard for FTS: https://twiki.cern.ch/twiki/bin/view/LCG/WLCGTransferMonitoring
• Should cover the list of metrics provided by XRootD
Todo list
• Finalize Tier3 site development
  • XRootD, PROOF solutions
• Ensure robust handling of the monitoring agents
• Documentation and testing
• Develop producers for transferring data to the global level