Xrd Monitoring Jacek Becla Stanford Linear Accelerator Center (SLAC)
XrdMon
• Allows monitoring I/O in real time
• Low overhead, non-intrusive
• Reconfigurable
  • granularity, flush intervals, output location, among others
Typical Architecture
Monitored Data
Red indicates “bulk” data. Everything else is available in “light” (default) mode.
• Client information
  • user name, client host name, process id, session duration, disconnect time
• File information
  • corresponding client, full path, open time, close time, # bytes read, # bytes written, { offset accessed, length, read/write mode, timestamp }
• Application information
  • depends on client. Can pass anything, e.g. job type, cache hit
  • Xrd forwards to the collector
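The fields listed on this slide can be pictured as three small record types. The sketch below is only an illustration: the class and field names are hypothetical and do not reflect the actual XrdMon packet or database layout. Treating the per-access trace list as the "bulk" part is also an assumption, since the slide's color coding did not survive.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AccessTrace:
    """One entry of the { offset, length, mode, timestamp } group (hypothetical layout)."""
    offset: int
    length: int
    is_write: bool
    timestamp: float


@dataclass
class ClientSession:
    """Client information as listed on the slide (names are illustrative)."""
    user: str
    host: str
    pid: int
    disconnect_time: Optional[float] = None
    duration: Optional[float] = None


@dataclass
class FileRecord:
    """File information; `traces` is assumed to be filled only when bulk tracing is on."""
    client: ClientSession
    path: str
    open_time: float
    close_time: Optional[float] = None
    bytes_read: int = 0
    bytes_written: int = 0
    traces: List[AccessTrace] = field(default_factory=list)

    def record(self, trace: AccessTrace) -> None:
        # Keep the per-access trace and update the running byte counters.
        self.traces.append(trace)
        if trace.is_write:
            self.bytes_written += trace.length
        else:
            self.bytes_read += trace.length
```

In light mode, a record like this would carry only the counters and timestamps; the per-access list is what makes bulk data so much larger.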
Overheads
• xrootd
  • unnoticeable
  • assuming reasonable configuration
• Collector / decoder
  • single collector + real-time decoder can easily keep up with typical load at SLAC
  • <5% of one 750 MHz CPU
  • decoding bulk data: not in real time, 1 CPU enough to keep up
• Space
  • light: few GB/year for all BaBar activities at SLAC
  • bulk: few TB/year
XrdMon Configuration @SLAC
• Light mode
  • continuous, production (24x7x365)
  • starting in a few weeks
  • use to monitor system, gather statistics, look for abnormal activities, understand server load (total and/or per application type)
• Bulk tracing
  • will turn on occasionally for chosen applications
  • use to understand access patterns
Demo
Data based on test setup. The xrootd production version doesn’t contain many xrdmon metrics yet.
Servers configured specifically for demo: 4 sec flush frequency, 3 sec time window. In production, expect longer (~min) delays.
First Analysis of Bulk Traces
• Easy to simulate effect of prefetching based on bulk data
  • played with different page sizes, # pages, page position relative to requested offset
• Example (optimal config)
  • for SP files
    • prefetch 32K pages, cache 75 pages
    • don’t reread already fetched section of a page
    • result: 52.08% cache hit, 43.40% used/read bytes
  • for SP deep-copied skims
    • prefetch 128K pages, cache 75 pages
    • don’t reread already fetched section of a page
    • result: 95% cache hit, 95% used/read bytes
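A prefetching simulation of this kind can be sketched by replaying a trace of (offset, length) reads against a small page cache. The version below is a minimal illustration, not the analysis code behind the numbers above. It assumes pages aligned to multiples of the page size, LRU eviction, and whole-page fetches on a miss; the function name and both metric definitions (requests served entirely from cache, and requested bytes divided by fetched bytes) are assumptions.

```python
from collections import OrderedDict


def simulate_prefetch(trace, page_size, max_pages):
    """Replay (offset, length) reads against an LRU cache of aligned pages.

    Returns (hit_rate, used_over_read):
      hit_rate       - fraction of requests served entirely from cached pages
      used_over_read - bytes requested by the application / bytes fetched
    """
    cache = OrderedDict()  # page index -> None, least recently used first
    requests = hits = used_bytes = fetched_bytes = 0

    for offset, length in trace:
        if length <= 0:
            continue  # ignore empty reads
        requests += 1
        used_bytes += length
        first = offset // page_size
        last = (offset + length - 1) // page_size
        all_cached = True
        for page in range(first, last + 1):
            if page in cache:
                cache.move_to_end(page)  # mark as most recently used
            else:
                all_cached = False
                fetched_bytes += page_size  # fetch the whole page once
                cache[page] = None
                if len(cache) > max_pages:
                    cache.popitem(last=False)  # evict least recently used
        if all_cached:
            hits += 1

    hit_rate = hits / requests if requests else 0.0
    used_over_read = used_bytes / fetched_bytes if fetched_bytes else 0.0
    return hit_rate, used_over_read


# Tiny made-up trace: three reads in page 0, one in page 1 (32K pages).
demo_trace = [(0, 100), (100, 100), (40000, 100), (0, 50)]
print(simulate_prefetch(demo_trace, page_size=32 * 1024, max_pages=75))
```

Sweeping `page_size` and `max_pages` over a real bulk trace is the kind of experiment the slide describes. Varying the page position relative to the requested offset would require dropping the aligned-page assumption.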
Current Status of XrdMon
• Server side: all done
• Collector + real-time decoder
  • ready to put in production @SLAC, should happen very soon
• Offline bulk decoder available
  • needs work to decode recently added metrics
• Have scripts to set up / load MySQL
  • alpha version
• To do includes:
  • fully automating data flow, backup
  • web interface to MySQL data
  • generating & sending application info
  • controlling monitoring (off/on, bulk/light) from application
  • docs
XrdMon Availability
• Available as part of xrootd distribution already
  • XrdMon
  • Not built by default (yet)
• Will be announced once we run it for a few weeks in production
  • and write documentation
• Contact me (becla@slac.stanford.edu) if you want to try it out today