1. Hawkeye
2. What is Hawkeye? A monitoring and management tool for distributed systems
That's great, but...
What does that mean?
What can Hawkeye do for me?
3. What does that mean? Hawkeye is a tool that can be used to monitor various aspects of your computers
Examples:
System load monitoring
Watching for run-away processes
Monitoring the health of your Condor pool
4. What can Hawkeye do? Hawkeye can alert you when things go wrong. For example, Hawkeye can:
Alert you when virtually any condition is found
Alert you when various Condor problems are identified
Allow you to specify your own custom alerts
5. Why Hawkeye? Make system administration easier
Make Condor pool maintenance easier
6. Hawkeye Architecture
7. Hawkeye Matchmaking Hawkeye alerts are implemented using ClassAd matchmaking.
8. Hawkeye ClassAds Hawkeye uses ClassAds to represent collected data
Schema-free data representation
Provides matching mechanism
Represent whatever data you gather in a way that works best for you
9. Hawkeye ClassAds Example ClassAd “snippet”:
RAM_MemFree = 841932800
RAM_MemShared = 0
RAM_MemTotal = 1055367168
RAM_SwapCached = 0
RAM_SwapFree = 2147483647
RAM_SwapTotal = 2147483647
10. Hawkeye ClassAds Example ClassAd “snippet” #2:
Condor_NumExecs = 2
Condor_NumMasters = 1
Condor_NumRunaway = 2
Condor_NumSchedds = 0
Condor_NumShadows = 0
Condor_NumStartds = 1
Condor_NumStarters = 2
Condor_RunawayPids = "3214,8753"
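To make the flat "Name = value" layout of these snippets concrete, here is a minimal Python sketch that reads a few of the attributes shown above into a dictionary and derives two simple facts from them. The parser `parse_flat_classad` is a hypothetical helper written for this illustration; it handles only integers and quoted strings and is not the real ClassAd library or part of Hawkeye.

```python
def parse_flat_classad(text):
    """Parse 'Name = value' lines into a dict.

    Hypothetical helper for illustration only: it understands just
    integers and double-quoted strings, not full ClassAd expressions.
    """
    ad = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        name, _, value = line.partition("=")
        name, value = name.strip(), value.strip()
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            ad[name] = value[1:-1]   # quoted string
        else:
            ad[name] = int(value)    # plain integer
    return ad


# Attribute values copied from the two example snippets.
SNIPPET = """
RAM_MemFree = 841932800
RAM_MemTotal = 1055367168
Condor_NumRunaway = 2
Condor_RunawayPids = "3214,8753"
"""

ad = parse_flat_classad(SNIPPET)

# List the run-away process IDs only if any were reported.
runaway_pids = []
if ad["Condor_NumRunaway"] > 0:
    runaway_pids = [int(pid) for pid in ad["Condor_RunawayPids"].split(",")]

# Fraction of physical memory that is free.
mem_free_fraction = ad["RAM_MemFree"] / ad["RAM_MemTotal"]

print(runaway_pids)
print(mem_free_fraction)
```

Because the representation is schema-free, the same parsing works for any attribute name a module chooses to publish.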
11. Sample Alert Trigger
[
AlertTrigger = ( MyType == "Pool" && Absent.count > 5 );
AlertSeverity = ( Absent.count > 5 ) ? 1 : 0;
Name = "Absent Nodes";
AlertText = StrCat(Absent.count,
" machines are missing in ",
Name)
]
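The logic of this trigger can be paraphrased in Python to show what each attribute computes. This is only a sketch of the expressions' meaning, not how Hawkeye or the ClassAd matchmaker evaluates them. The function `evaluate_absent_nodes_alert` and the nested-dictionary layout for `Absent.count` are assumptions made for illustration, as is the reading that `Name` inside `AlertText` refers to the trigger's own `Name` attribute.

```python
def evaluate_absent_nodes_alert(ad):
    """Mirror the sample trigger's three expressions in plain Python.

    'ad' is a hypothetical dict standing in for the ClassAd being matched;
    ad["Absent"]["count"] plays the role of Absent.count.
    """
    name = "Absent Nodes"                      # Name attribute of the trigger
    absent_count = ad["Absent"]["count"]

    # AlertTrigger = ( MyType == "Pool" && Absent.count > 5 )
    triggered = ad["MyType"] == "Pool" and absent_count > 5

    # AlertSeverity = ( Absent.count > 5 ) ? 1 : 0
    severity = 1 if absent_count > 5 else 0

    # AlertText = StrCat(Absent.count, " machines are missing in ", Name)
    text = f"{absent_count} machines are missing in {name}"

    return {"triggered": triggered, "severity": severity, "text": text}


# Made-up input data, purely for demonstration.
pool_ad = {"MyType": "Pool", "Absent": {"count": 7}}
machine_ad = {"MyType": "Machine", "Absent": {"count": 7}}

print(evaluate_absent_nodes_alert(pool_ad))
print(evaluate_absent_nodes_alert(machine_ad))
```

Note that the severity expression depends only on the count, so an ad of another type can have severity 1 while the trigger itself does not fire.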
12. Hawkeye at UW Currently at UW, we're using Hawkeye:
To monitor our Condor cluster
To aid in detecting and correcting cluster problems
To monitor the US/CMS testbed health
16. Customizing Hawkeye Hawkeye allows you to run your own custom “modules” to gather data.
Hawkeye allows you to set your own custom “alerts” on attributes generated by “standard” and “custom” modules.
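As a rough idea of what a data-gathering module could look like, the Python sketch below collects disk-space figures and prints them as "Name = value" lines in the style of the ClassAd snippets shown earlier. The slides do not describe the actual module interface, so the output convention, the `Disk` attribute prefix, and the function names here are all assumptions for illustration; consult the Hawkeye documentation for the real requirements.

```python
import shutil


def gather_disk_attributes(path="/", prefix="Disk"):
    """Collect disk usage for 'path' as attribute/value pairs.

    The attribute names (Disk_Total, ...) are hypothetical, chosen to
    resemble the RAM_* and Condor_* names in the example snippets.
    """
    usage = shutil.disk_usage(path)
    return {
        f"{prefix}_Total": usage.total,
        f"{prefix}_Used": usage.used,
        f"{prefix}_Free": usage.free,
    }


def format_attributes(attrs):
    """Render a dict as 'Name = value' lines, quoting string values."""
    lines = []
    for name, value in sorted(attrs.items()):
        if isinstance(value, str):
            lines.append(f'{name} = "{value}"')
        else:
            lines.append(f"{name} = {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Assumed convention: a module reports its data on standard output.
    print(format_attributes(gather_disk_attributes()))
```

A custom alert could then reference an attribute such as the hypothetical `Disk_Free` in the same way the sample trigger references `Absent.count`.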
17. What is the status of Hawkeye? Hawkeye 1.0 Release Candidate 1 (RC1)
Current module library includes modules to monitor system load, users, disk space, Condor, and more
Available from http://cs.wisc.edu/condor/hawkeye