A Deep Dive into Nagios Analytics

A Deep Dive into Nagios Analytics. Alexis Lê-Quôc (@alq), http://datadoghq.com. @alq: Dev & Ops, Nagios user since 2008, Datadog co-founder. A little survey. Top 3 failed checks: that I responded to last week, that woke me up, that most of my team responded to at least once.


Presentation Transcript


  1. A Deep Dive into Nagios Analytics • Alexis Lê-Quôc (@alq) • http://datadoghq.com

  2. @alq • Dev & Ops • Nagios user since 2008 • Datadog co-founder

  3. A little survey

  4. Top 3 failed checks

  5. Top 3 failed checks • That I responded to last week • That woke me up • That most of my team responded to at least once • That impacts our business the most? • That I responded to 5 weeks ago

  6. Top 3 failed checks • That I responded to last week • That woke me up • That most of my team responded to at least once • That impacts our business the most? • That I responded to 5 weeks ago

  7. Using memory to prioritize remediation... • At best, finding local optima • At worst, Brownian motion

  8. Analytics

  9. Performance Metrics • Nagios • Traffic • Other Sources • In the “Cloud”

  10. Nagios: a “chatty” source out of the 40+ Datadog supports

  11. One example

  12. Almost 13,000 Nagios “events” over the past week

  13. Constant stream

  14. 86 notifications!

  15. Pattern

  16. Pattern

  17. More data? More questions.

  18. Not a scientific study • A dialog with data

  19. Population: 25% → 20 • 50% → 93 • 75% → 322 • 100% → 904

  20. Does size matter?

  21. Weekly count per host, split by quartile

  22. Weekly count per host, split by quartile • Outliers: sick hosts, silenced checks

  23. Notifications

  24. Notifications • 1-3% of alerts notify • Little difference per quartile

  25. Does time of day matter?

  26. Mean about the same across quartiles • Time-based deviation?

  27. Does the day of week matter?

  28. Not really

  29. Squeaky wheels? (checks)

  30. Outlier

  31. Outlier in more detail

  32. Long Tail

  33. Squeaky wheels? (hosts)

  34. Same outlier

  35. Similar pattern to checks

  36. Long Tail

  37. Recurring alerts

  38. Young vs. Old • Seldom happens vs. Happens often

  39. Happen once in a while • Occur often, for a long time • Tolerated

  40. More data? More questions.

  41. HOWTO?

  42. Awk • Postgres • R • d3 • Find out tomorrow!

  43. Presentation matters

  44. Take-away?

  45. Take-aways • Don’t rely on your memory • Your Nagios logs are a treasure trove • Have a dialog with your data • Presentation matters
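Slide 12 reports almost 13,000 Nagios “events” in one week. A minimal Python sketch of such a count follows, assuming the standard `nagios.log` line shape `[epoch] EVENT TYPE: field;field;...` (for instance `SERVICE ALERT: host;service;state;state type;attempt;output`). The helper names `parse_line` and `count_events` and the sample lines are hypothetical, not part of Nagios or Datadog.

```python
import re
from collections import Counter

# Assumed nagios.log line shape: "[<epoch seconds>] <EVENT TYPE>: <semicolon-separated fields>".
# Free-form lines such as "Auto-save of retention data..." do not match and are skipped.
LOG_LINE = re.compile(r"^\[(\d+)\] ([A-Z ]+): (.*)$")


def parse_line(line):
    """Return (timestamp, event_type, fields), or None if the line is not a typed event."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    ts, event_type, payload = match.groups()
    return int(ts), event_type, payload.split(";")


def count_events(lines, since, until):
    """Count typed events (SERVICE ALERT, HOST ALERT, ...) with since <= timestamp < until."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, event_type, _fields = parsed
        if since <= ts < until:
            counts[event_type] += 1
    return counts


# Usage on a hand-written sample (hypothetical hosts and checks).
sample = [
    "[1364400000] SERVICE ALERT: web01;HTTP;CRITICAL;HARD;3;Connection refused",
    "[1364400060] SERVICE NOTIFICATION: ops;web01;HTTP;CRITICAL;notify-by-email;Connection refused",
    "[1364400300] HOST ALERT: db02;DOWN;SOFT;1;PING CRITICAL",
    "[1364400900] Auto-save of retention data completed successfully.",
]
week = count_events(sample, since=1364400000, until=1364400000 + 7 * 86400)
```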
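Slide 19 lists 20, 93, 322 and 904 against the 25%, 50%, 75% and 100% marks of the host population. Assuming those are cut points of the weekly event count per host (slides 21 and 22 split that count by quartile), they could be computed roughly as below with Python's `statistics.quantiles`. The quantile method used in the talk is not stated, so `method="inclusive"` is an assumption, as are the helper names.

```python
import statistics
from collections import Counter


def per_host_counts(events):
    """events: iterable of (host, check) pairs from one week -> Counter of events per host."""
    return Counter(host for host, _check in events)


def quartile_boundaries(counts):
    """Return the 25/50/75% cut points of the per-host counts plus the maximum (100%).

    Expects at least two hosts. method="inclusive" treats the data as the whole
    population rather than a sample; which method the talk used is unknown.
    """
    values = sorted(counts.values())
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"25%": q1, "50%": q2, "75%": q3, "100%": values[-1]}


# Usage with hypothetical hosts a..e carrying 1..5 events each.
events = [("a", "x")] + [("b", "x")] * 2 + [("c", "y")] * 3 + [("d", "y")] * 4 + [("e", "z")] * 5
bounds = quartile_boundaries(per_host_counts(events))
```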
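Slide 24 finds that 1-3% of alerts lead to a notification, with little difference per quartile. One hedged way to check that on your own logs: bucket hosts into quartiles by alert count and divide notifications by alerts in each bucket. The input shape `{host: (alerts, notifications)}` and the function name are illustrative assumptions.

```python
import statistics
from collections import defaultdict


def notification_rate_by_quartile(per_host):
    """per_host: {host: (alert_count, notification_count)} for one week.

    Hosts are bucketed into quartiles 1..4 by alert count (expects at least two
    hosts); for each quartile, returns total notifications / total alerts.
    """
    alert_counts = sorted(alerts for alerts, _ in per_host.values())
    cuts = statistics.quantiles(alert_counts, n=4, method="inclusive")
    totals = defaultdict(lambda: [0, 0])  # quartile -> [alerts, notifications]
    for alerts, notifications in per_host.values():
        quartile = 1 + sum(alerts > cut for cut in cuts)  # number of cut points exceeded
        totals[quartile][0] += alerts
        totals[quartile][1] += notifications
    return {q: sent / seen for q, (seen, sent) in sorted(totals.items()) if seen}


# Usage with four hypothetical hosts, one per quartile.
rates = notification_rate_by_quartile(
    {"h1": (10, 1), "h2": (20, 1), "h3": (30, 3), "h4": (40, 2)}
)
```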
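Slides 25 to 28 ask whether time of day or day of week matters. A minimal sketch that buckets event timestamps by hour and by weekday, then summarizes skew with the coefficient of variation (standard deviation over mean). That statistic is a swapped-in choice for illustration, not necessarily what the talk measured, and the UTC default is an assumption: Nagios logs epoch timestamps, but the local time zone of your on-call rotation is usually the one that matters.

```python
import statistics
from collections import Counter
from datetime import datetime, timezone

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def hour_and_weekday_histograms(timestamps, tz=timezone.utc):
    """Bucket epoch timestamps by hour of day (0-23) and by weekday name in `tz`."""
    by_hour, by_weekday = Counter(), Counter()
    for ts in timestamps:
        moment = datetime.fromtimestamp(ts, tz=tz)
        by_hour[moment.hour] += 1
        by_weekday[WEEKDAYS[moment.weekday()]] += 1
    return by_hour, by_weekday


def coefficient_of_variation(histogram, buckets):
    """Population std dev / mean over all buckets (empty buckets count as 0).

    0.0 means a perfectly flat profile; larger values mean more time-based skew.
    """
    values = [histogram.get(bucket, 0) for bucket in buckets]
    mean = statistics.mean(values)
    return statistics.pstdev(values) / mean if mean else 0.0


# Usage: epoch 0 is Thursday 1970-01-01 00:00 UTC.
by_hour, by_weekday = hour_and_weekday_histograms([0, 13 * 3600, 4 * 86400])
hourly_skew = coefficient_of_variation(by_hour, range(24))
```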
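Slides 29 to 36 look for squeaky wheels among checks and hosts and find a long tail behind a single outlier. A sketch that ranks items by weekly event count and reports how few of them make up a given share of all events; the 80% default is an arbitrary illustrative threshold, and the same function works for per-check or per-host counts.

```python
def long_tail(counts, share=0.8):
    """Rank items (checks or hosts) by event count, highest first.

    Returns (head, tail_size): the smallest prefix of the ranking whose events
    reach `share` of the total, and how many items remain after it.
    """
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    total = sum(counts.values())
    running = 0
    for i, (_name, count) in enumerate(ranked, start=1):
        running += count
        if running >= share * total:
            return ranked[:i], len(ranked) - i
    return ranked, 0  # reached only for empty input or share > 1


# Usage with hypothetical per-check weekly counts.
head, tail_size = long_tail({"disk": 50, "http": 30, "ntp": 10, "swap": 5, "smtp": 5})
```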
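Slides 37 to 39 sort recurring alerts along two axes, young versus old and seldom versus often, and slide 39 mentions alerts that “occur often, for a long time” alongside the word “Tolerated”. A minimal sketch of that two-axis classification per (host, check) pair; the 7-day age cut-off and the one-per-day frequency cut-off are hypothetical defaults, not values from the talk.

```python
from collections import defaultdict

DAY = 86400  # seconds


def classify_recurring(alerts, now, old_after_days=7, often_per_day=1.0):
    """alerts: iterable of (epoch_ts, host, check) tuples.

    Groups alerts by (host, check) and labels each group on two axes:
    "young"/"old" by days since the first occurrence, and
    "seldom"/"often" by average occurrences per day over that age.
    Both thresholds are hypothetical defaults.
    """
    first_seen = {}
    counts = defaultdict(int)
    for ts, host, check in alerts:
        key = (host, check)
        first_seen[key] = min(ts, first_seen.get(key, ts))
        counts[key] += 1
    labels = {}
    for key, first in first_seen.items():
        age_days = max(now - first, DAY) / DAY  # floor at one day to avoid tiny ages
        age_label = "old" if age_days >= old_after_days else "young"
        freq_label = "often" if counts[key] / age_days >= often_per_day else "seldom"
        labels[key] = (age_label, freq_label)
    return labels


# Usage over a hypothetical 30-day window.
now = 30 * DAY
alerts = (
    [(i * DAY // 2, "web01", "HTTP") for i in range(60)]  # twice a day for a month
    + [(25 * DAY, "db02", "disk")]  # once, five days ago
    + [(d * DAY, "app03", "ntp") for d in (10, 15, 20)]  # three times in 20 days
)
labels = classify_recurring(alerts, now)
```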
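Slide 42 names the toolchain: Awk, Postgres, R and d3. As a self-contained stand-in for the Postgres step, the sketch below loads parsed events into an in-memory SQLite table with Python's `sqlite3` module and asks for the top 3 failed checks, the question slide 4 opens with. The table layout (`ts`, `host`, `check_name`, `state`) and the rule that “failed” means any state other than OK are assumptions; `check_name` avoids `CHECK`, which is an SQL keyword.

```python
import sqlite3


def top_failed_checks(rows, limit=3):
    """rows: iterable of (ts, host, check_name, state) tuples.

    Returns [(check_name, failures), ...] for the `limit` checks with the most
    non-OK events, ties broken alphabetically.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE events (ts INTEGER, host TEXT, check_name TEXT, state TEXT)"
        )
        conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
        return conn.execute(
            """
            SELECT check_name, COUNT(*) AS failures
            FROM events
            WHERE state != 'OK'
            GROUP BY check_name
            ORDER BY failures DESC, check_name
            LIMIT ?
            """,
            (limit,),
        ).fetchall()
    finally:
        conn.close()


# Usage with hypothetical rows.
rows = [
    (1, "web01", "HTTP", "CRITICAL"),
    (2, "web02", "HTTP", "WARNING"),
    (3, "db01", "disk", "CRITICAL"),
    (4, "db01", "disk", "OK"),
    (5, "web01", "ntp", "UNKNOWN"),
]
top = top_failed_checks(rows)
```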
