
DSM Scalability Considerations for Unicenter NSM r11

DSM Scalability Considerations for Unicenter NSM r11. Last updated June 5, 2006. Best practice summary (see notes): 50k local objects polled in one DSM is fine for r11; manage polling so that it does not exceed 600 polls per second; the –m parameter must be configured to allow this load.


Presentation Transcript


  1. DSM Scalability Considerations for Unicenter NSM r11 Last Updated June 5 2006

  2. Best Practice Summary – see notes • 50k local objects polled in one DSM is fine for r11 • Manage polling so that it does not exceed 600 polls per second • The –m parameter must be configured to allow this load • Manage the poll cycle so that average use stays above 20% and below 50% of the poll time window • More than 100 DSMs can report to one MDB

  3. Detailed DSM Performance

  4. Objectives • Understand issues affecting DSM performance • Understand issues affecting scalability • Consider architectural options • Recommendations

  5. Issues affecting DSM performance

  6. Understand issues affecting DSM performance • Hardware • Local vs remote DSM(s) • Cold start vs. warm start • Electronic proximity to hosts • Network configuration and congestion • Number of hosts • Number of managed objects • Polling configuration

  7. Hardware • See Hardware Requirements in NSM r11 Implementation Guide for latest guidance

  8. Hardware • Does hardware matter? • 30,000 objects ~= 2 subnets with 50 objects per host

  9. Local vs remote DSM(s) • For smaller implementations a local DSM on the MDB machine is OK • For larger implementations, remote DSM(s) should be strongly considered • DSM should be electronically close to what it polls and may connect to a remote MDB

  10. Local vs remote DSM(s)

  11. Multiple Remote DSMs • Multiple remote DSMs have a synergistic effect

  12. Local vs remote DSM(s) • A local DSM combined with a remote DSM does not perform as strongly as multiple remote DSMs

  13. Cold start vs. warm start • Set “WarmStart=yes” option in %AGENTWORKS_DIR%\services\config\atmanager.ini • Warm start uses previously discovered objects • Reduces MDB access time • Reduces discovery process time • Must still confirm status

  14. Cold start vs. warm start • Startup is measured as the time until the DSM settles (DSM start complete)

  15. Cold start vs. warm start • Startup elapsed times

  16. Electronic proximity to hosts • Standard best practice: no more than 3 hops • High-performance LAN access to hosts and MDB • Avoid WAN links • Given a choice, put a DSM close to what it polls instead of close to its MDB • Missed traps are an indication of excessive load or a busy network: reduce the distance that polls and traps must travel

  17. LAN Polling

  18. Network configuration and congestion • DSM should usually handle whole subnets • Fast/stable path to MDB • Network utilization • Errors, timeouts, and retries • Missed traps must be addressed • Poll cycle must have free time for load peaking • Size counts

  19. WAN Polling

  20. Number of hosts • Affects startup and first stage discovery • Affects total DSM object population • Affects DSM host configuration

  21. Number of objects • Each managed host may spawn dozens of objects • Agents • Watchers • Split DSMs to keep the number of objects constrained • Split DSMs to keep them electronically close to what they poll • Obrowser and query with no argument display all objects; the number of objects actually polled is usually lower
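The object-count reasoning on this slide can be sketched as a quick estimate. This is a rough planning aid, not a Unicenter tool: the function names are hypothetical, the 50 objects per host figure is taken from the hardware slide, and the 50,000 objects per DSM ceiling is taken from the best practice summary. It considers object count only; the slides treat electronic proximity as the more important reason to split DSMs.

```python
import math

def total_objects(hosts: int, objects_per_host: int) -> int:
    """Estimated DSM object population for a set of managed hosts."""
    return hosts * objects_per_host

def dsms_needed(objects: int, max_objects_per_dsm: int = 50_000) -> int:
    """Minimum number of DSMs so that none exceeds the per-DSM ceiling.

    The default ceiling is the 50k figure from the best practice summary;
    it is an assumption that this ceiling applies to every DSM equally.
    """
    return max(1, math.ceil(objects / max_objects_per_dsm))

if __name__ == "__main__":
    small = total_objects(hosts=600, objects_per_host=50)
    large = total_objects(hosts=2_500, objects_per_host=50)
    print(small, dsms_needed(small))
    print(large, dsms_needed(large))
```

Treat the result as a lower bound: network distance or WAN links may justify more, smaller DSMs than the object count alone suggests.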

  22. Polling configuration – see notes • Polling interval • Polling rate for the r11 DSM was sustained at up to 1,000 polls/second (laboratory only; do not exceed 600) • Speeds discovery (?) • Not needed for status polling • A 10 to 20 minute polling interval is still best practice • 50,000 pollable objects at a 10 minute polling interval is about 80 polls/second • Timeouts are critical • Assume a timeout of 10 seconds with 2 retries = 30 second delay • A DSM thread waits for a reply or a timeout on each SNMPGET • IP policy makes extensive use of SNMPGET
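The two figures on this slide (about 80 polls/second and the 30 second delay) can be reproduced with a short calculation. This is a minimal sketch with hypothetical function names. It assumes every object is polled once per interval, that the timeout is in seconds, and that the initial attempt and each retry wait for the full timeout.

```python
def polls_per_second(objects: int, interval_minutes: float) -> float:
    """Average poll rate when every object is polled once per interval."""
    return objects / (interval_minutes * 60)

def worst_case_wait_seconds(timeout_seconds: float, retries: int) -> float:
    """Time one polling thread can block on an unresponsive target.

    Assumes the first attempt and every retry each wait the full timeout.
    """
    return timeout_seconds * (1 + retries)

if __name__ == "__main__":
    rate = polls_per_second(objects=50_000, interval_minutes=10)
    wait = worst_case_wait_seconds(timeout_seconds=10, retries=2)
    print(f"{rate:.1f} polls/second")
    print(f"{wait:.0f} second worst-case wait per unresponsive target")
```

The average works out to roughly 83 polls/second, which the slide rounds to about 80. The wait figure shows why timeouts are critical: each unresponsive target can hold a polling thread for the full 30 seconds.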

  23. Polling configuration • Calculating polling rates • Target no more than 50% and no less than 20% MaxPollRate utilization • Example with MaxPollRate at 200 polls/second: a five minute interval is 300 seconds, so do not attempt more than 30k polls per interval (300 seconds × 0.50 × 200 polls per second = 30k objects polled every 5 minutes) • Configure MaxPollRate under [aws_snmp] in atservices.ini
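The sizing rule above can be written as a small calculation. This is a sketch with hypothetical helper names. It only restates the slide's arithmetic (interval × utilization × MaxPollRate) and the 20% to 50% target band, and it does not read or write atservices.ini.

```python
def max_polls_per_interval(max_poll_rate: float,
                           interval_seconds: float,
                           utilization: float = 0.50) -> int:
    """Largest number of polls to schedule in one interval at a given utilization."""
    return int(max_poll_rate * interval_seconds * utilization)

def poll_utilization(objects: int,
                     max_poll_rate: float,
                     interval_seconds: float) -> float:
    """Fraction of the poll time window used when polling `objects` once per interval."""
    return objects / (max_poll_rate * interval_seconds)

def within_target(objects: int,
                  max_poll_rate: float,
                  interval_seconds: float,
                  low: float = 0.20,
                  high: float = 0.50) -> bool:
    """True when utilization falls inside the recommended 20% to 50% band."""
    return low <= poll_utilization(objects, max_poll_rate, interval_seconds) <= high

if __name__ == "__main__":
    # Slide example: MaxPollRate of 200 polls/second, five minute interval.
    print(max_polls_per_interval(200, 300))   # 30000
    print(within_target(30_000, 200, 300))    # True (exactly 50%)
    print(within_target(6_000, 200, 300))     # False (10%, below the band)
```

A load below the 20% floor suggests the interval or the MaxPollRate setting leaves capacity unused. A load above 50% leaves too little free time in the poll cycle for load peaks.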

  24. Issues affecting scalability

  25. Issues affecting scalability • Hardware • What hardware is available? • Can it support MDB + DSM? • Network • How electronically close are managed objects? • Is there capacity to handle polling and trap traffic? • How reliable is the network? • Geographic proximity • Do managed objects exist on other side of WAN? • Polling • What are the polling requirements?

  26. Issues affecting scalability • Type of host activity • Web server • Application server • Database server • Batch server

  27. Architectural options

  28. Architectural Options • Local DSM • Fine for smaller shops • Add remote DSMs as necessary • Add remote DSMs to improve performance • Use several smaller DSMs • Closer to managed objects (most important tuning choice!) • Faster startup • More robust (no single point of failure) • Reduces the effect of an outage • Bridged MDBs • Distribute MDBs for better DSM access; not critical unless bandwidth to the MDB is limited and update activity is high

  29. Questions?
