Monitoring and Control of the xrootd Servers in ALICE
Iosif Legrand, Caltech - CERN
May 2012
Proposed Architecture to Monitor & Control the Xrootd Servers in ALICE
• Run a MonALISA service on each host running an XROOTD server (a new dedicated group)
• Control the xrootd server
  • Start / Stop / Update
  • Dynamically change configuration parameters
• Collect monitoring information from the xrootd servers using the new, improved monitoring functionality in xrootd (version 3.1)
• Perform full network measurements and tests (RTT, available bandwidth, topology)
• Directly monitor the storage system used by the xrootd servers (capacity, I/O performance, resource utilization)
• Create a dedicated MonALISA repository
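The control part of the proposal (start / stop / update, dynamic configuration changes) can be pictured as a small command dispatcher running next to each xrootd server. This is a minimal sketch only: the class `XrootdControl`, its command names and the parameter key in the usage lines are hypothetical, and the sketch just records state in memory instead of driving a real xrootd process or configuration file.

```python
from dataclasses import dataclass, field


@dataclass
class XrootdControl:
    """Hypothetical per-host control module for one xrootd server.

    State is kept in memory only; a real module would call the site's
    start/stop scripts and rewrite the xrootd configuration.
    """
    version: str
    running: bool = False
    config: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def handle(self, command: str, **args) -> bool:
        """Apply one command received from the MonALISA service."""
        if command == "start":
            self.running = True
        elif command == "stop":
            self.running = False
        elif command == "update":
            # Update = stop, switch to the new version, start again.
            self.running = False
            self.version = args["version"]
            self.running = True
        elif command == "set":
            # Dynamic change of a single configuration parameter.
            self.config[args["key"]] = args["value"]
        else:
            self.log.append(f"unknown command: {command}")
            return False
        self.log.append(command)
        return True


ctl = XrootdControl(version="3.0")
ctl.handle("start")
ctl.handle("update", version="3.1")
ctl.handle("set", key="example.param", value=42)  # hypothetical key
```

Returning `False` for unknown commands lets the caller report a failed action back to the service instead of silently ignoring it.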
[Figure: Proposed architecture to monitor & control the xrootd servers in ALICE. Each XROOTD SERVER is paired with ApMon and a MonALISA service (Control Module, Network Monitoring, System Monitoring); Control / Update commands and Aggregated Data flow between these services and the MonALISA Xrootd Repository, which provides Alerts, Actions, Network Monitoring, System Monitoring and a Long History DB.]
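The "Aggregated Data" path from the per-host services to the repository can be illustrated as a per-parameter reduction across servers. A minimal sketch, assuming each server's report is a flat name-to-value map; the function `aggregate` and the parameter names `traffic_in` and `load` are hypothetical and not part of MonALISA or ApMon.

```python
from statistics import mean
from typing import Dict

# Hypothetical per-server reports: {server_name: {parameter: value}}.
Reports = Dict[str, Dict[str, float]]


def aggregate(reports: Reports, how: Dict[str, str]) -> Dict[str, float]:
    """Combine per-server values into one value per parameter.

    `how` maps a parameter name to "sum", "mean" or "max". Servers that
    did not report a parameter are skipped for that parameter.
    """
    funcs = {"sum": sum, "mean": mean, "max": max}
    result: Dict[str, float] = {}
    for param, mode in how.items():
        values = [r[param] for r in reports.values() if param in r]
        if values:
            result[param] = funcs[mode](values)
    return result


reports = {
    "server-a": {"traffic_in": 10.0, "load": 2.0},
    "server-b": {"traffic_in": 30.0, "load": 4.0},
    "server-c": {"load": 9.0},
}
summary = aggregate(reports, {"traffic_in": "sum", "load": "max"})
```

Summing makes sense for rates such as traffic, while load is better summarized by its maximum or mean.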
Local and Global Decision Framework in MonALISA (can be used to control the xrootd servers)
• Two levels of decisions: local (autonomous) and global (correlations).
• Actions are triggered by: values above / below given thresholds, absence / presence of values, correlations between any values.
• Action types: alerts (emails / instant messages / Atom feeds), automatic chart annotations in the repository, and running custom code, such as securely ordering ML services to change connectivity (optimize traffic), submitting jobs or (re)starting a global service.
[Figure: each ML Service takes local decisions, acting on local information from its sensors (temperature, humidity, A/C power, …); Global ML Services take global decisions, acting on traffic, jobs, data servers and apps based on global information.]
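The trigger types listed above (thresholds, absence of values, correlations across servers) can be illustrated with a few predicates. This is a simplified sketch, not the MonALISA decision framework itself: `Sample`, `above`, `below`, `absent`, `evaluate` and `global_rule` are hypothetical names, local rules look only at one host's latest values, and the single "global" rule is a simple count of servers over a threshold.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Sample:
    value: float
    timestamp: float  # seconds since the epoch


# A trigger receives the latest sample (or None) and the current time.
Trigger = Callable[[Optional[Sample], float], bool]


def above(limit: float) -> Trigger:
    # Fires when the latest value is above the threshold.
    return lambda s, now: s is not None and s.value > limit


def below(limit: float) -> Trigger:
    # Fires when the latest value is below the threshold.
    return lambda s, now: s is not None and s.value < limit


def absent(max_age: float) -> Trigger:
    # Fires when no value was received within the last max_age seconds.
    return lambda s, now: s is None or now - s.timestamp > max_age


def evaluate(rules: List[Tuple[str, Trigger, str]],
             latest: Dict[str, Sample], now: float) -> List[str]:
    """Local decision: actions of all rules that fire on one host."""
    return [action for param, trigger, action in rules
            if trigger(latest.get(param), now)]


def global_rule(per_server: Dict[str, Dict[str, Sample]],
                param: str, limit: float, min_servers: int) -> List[str]:
    """Global decision: servers with `param` over `limit`, returned only
    if at least `min_servers` of them exceed it at the same time."""
    hot = [srv for srv, latest in per_server.items()
           if param in latest and latest[param].value > limit]
    return hot if len(hot) >= min_servers else []


# Hypothetical rules for one host: (parameter, trigger, action).
rules = [
    ("load", above(10.0), "alert: high load"),
    ("disk_free", below(10.0), "alert: low disk"),
    ("xrootd_alive", absent(60.0), "restart xrootd"),
]
```

Keeping triggers as plain functions makes the local rules cheap to evaluate on every host, while only the cross-server rule needs data from many services.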
Advantages of Using a Dedicated MonALISA Group to Monitor and Control All the XROOTD Servers
• Easy to maintain and update a critical service for Offline computing
• Significantly improves the monitoring information and will help us better understand how the storage systems are used in the ALICE Computing model (Distributed Storage for Data)
• Controls the xrootd servers and can dynamically configure systems based on how they are used
• Monitors the true network connectivity
• Monitors the real storage used by the xrootd servers
• Monitors connections and transfers per client / job (this seems to require modifications in the xrootd server)
Current Status
• MonALISA services were installed on 5 xrootd servers of the CERN-SE for testing. The monitoring is configured to provide full system monitoring: applications, storage and networking (more than 350 parameters per node).
• Costin is working on setting up all the changes needed to upgrade the xrootd servers on the CERN-SE to the new software (3.1).
• Reporting of monitoring data from the new xrootd server (3.1) was thoroughly tested by Harsh on a small, dedicated setup here at CERN.
• We will need to verify all the functionality in the production environment at CERN and verify the monitoring data collected.
• The CERN-SE will be used to check and validate the monitoring information from the new xrootd software using full system monitoring.
• A new repository prototype is under development. It will include improved multi-parameter correlation plots.
Load on the CERN-SE (5 servers) using the new MonALISA services for xrootd servers
Incoming I/O traffic on the CERN-SE servers
RTT from voalicefs1 to the other servers. This huge RTT is not normal!
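A simple way to flag an abnormal RTT like the one above is to compare each measured value with the median over all destinations from the same source. A minimal sketch; the function name, the host names and the factor of 3 are arbitrary assumptions chosen for illustration.

```python
from statistics import median
from typing import Dict, List


def abnormal_rtt(rtts_ms: Dict[str, float], factor: float = 3.0) -> List[str]:
    """Return destinations whose RTT exceeds `factor` times the median RTT.

    `rtts_ms` maps a (hypothetical) destination host to the RTT measured
    from one source server, in milliseconds.
    """
    if not rtts_ms:
        return []
    reference = median(rtts_ms.values())
    return sorted(host for host, rtt in rtts_ms.items()
                  if rtt > factor * reference)


rtts = {"server-b": 0.3, "server-c": 0.4, "server-d": 0.35, "server-e": 25.0}
flagged = abnormal_rtt(rtts)
```

The median is used rather than the mean because a single huge RTT would otherwise pull the reference value up and hide itself.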
Load / traffic distribution for all xrootd servers (last month, with fine time granularity)
[Figure: load vs. traffic correlation plot for a total of 358 xrootd servers, with regions marked "Good" servers, "Bad" servers and "Overloaded" servers; plot annotation "Max ~ 3000".]
• Development of complex and flexible correlation plots to help understand the data access patterns on the full ALICE grid
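The "good" / "bad" / "overloaded" grouping in the plot could be approximated with two thresholds. The rule below is an assumption made for illustration (overloaded = load above an absolute limit; bad = load high relative to the traffic served); the plot itself does not define the categories, and the function name, server names and threshold values are hypothetical.

```python
from typing import Dict, Tuple


def classify(servers: Dict[str, Tuple[float, float]],
             max_load: float, max_load_per_mb: float) -> Dict[str, str]:
    """Label each server as "good", "bad" or "overloaded".

    `servers` maps a (hypothetical) server name to (load, traffic in MB/s).
    Both thresholds are illustrative assumptions, not ALICE criteria.
    """
    labels: Dict[str, str] = {}
    for name, (load, traffic) in servers.items():
        if load > max_load:
            labels[name] = "overloaded"  # absolute load limit exceeded
        elif load > max_load_per_mb * traffic:
            labels[name] = "bad"         # high load for the traffic served
        else:
            labels[name] = "good"
    return labels


servers = {
    "server-a": (2.0, 100.0),
    "server-b": (8.0, 5.0),
    "server-c": (40.0, 300.0),
    "server-d": (0.0, 0.0),
}
labels = classify(servers, max_load=20.0, max_load_per_mb=0.5)
```

Comparing `load` against `max_load_per_mb * traffic` avoids a division, so an idle server with zero load and zero traffic is labeled "good" rather than raising an error.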
Next Steps
• As soon as the validation of the xrootd server monitoring on the CERN-SE is done, we will create a procedure to easily install the new service at all the ALICE sites. It will be based on a standard MonALISA distribution (easier to maintain) and will be customized dynamically for each site.
• Test the MonALISA module that controls the xrootd server.
• Investigate how we can monitor the activity of each individual client from the xrootd servers. This functionality should be switchable on / off on demand.
• Configure the WAN monitoring between servers and organize the metrics to clearly show the connectivity among all the servers.
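For the last step, organizing WAN metrics to show the connectivity among all the servers, one option is a source-by-destination matrix. A minimal sketch, assuming measurements arrive as (source, destination) pairs mapped to a value such as RTT in ms; `connectivity_matrix` and `render` are hypothetical helpers, and missing pairs are shown as "-".

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[Optional[float]]]


def connectivity_matrix(
        measurements: Dict[Tuple[str, str], float]) -> Tuple[List[str], Matrix]:
    """Return the sorted host list and a row-per-source matrix.

    Entry [i][j] is the metric measured from hosts[i] to hosts[j],
    or None if that pair was not measured.
    """
    hosts = sorted({h for pair in measurements for h in pair})
    matrix = [[measurements.get((src, dst)) for dst in hosts] for src in hosts]
    return hosts, matrix


def render(hosts: List[str], matrix: Matrix) -> str:
    """Format the matrix as a fixed-width text table."""
    if not hosts:
        return ""
    cells = [["-" if v is None else f"{v:.1f}" for v in row] for row in matrix]
    # Column width fits the longest host name or cell, plus padding.
    width = max(len(s) for s in hosts + [c for row in cells for c in row]) + 2
    header = " " * width + "".join(h.rjust(width) for h in hosts)
    rows = [src.ljust(width) + "".join(c.rjust(width) for c in row)
            for src, row in zip(hosts, cells)]
    return "\n".join([header] + rows)


measurements = {("a", "b"): 1.5, ("b", "a"): 1.6, ("a", "c"): 20.0}
hosts, matrix = connectivity_matrix(measurements)
table = render(hosts, matrix)
```

Keeping both directions as separate entries makes asymmetric paths (different RTT or bandwidth in each direction) visible in the table.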