Nagios XI Environment Size Suggestions

IMTECH
Posts: 53
Joined: Fri Nov 25, 2011 6:35 am

Nagios XI Environment Size Suggestions

Post by IMTECH »

Hi,

Are there any recommendations for the maximum size (number of hosts/checks) that a single Nagios XI environment should not exceed?

The hardware specs are the following:
32 cpu cores/threads (usually at a load of 2-3)
16GB memory (soon upgraded to 64GB, but we only hit the limits when running reports)
plenty of free disk space (ssd for Nagios XI, hdd for OS)

The MySQL DB is not offloaded.
gearmand is enabled to offload checks to several workers.

Thank you!

Kind regards,
Gerhard
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Nagios XI Environment Size Suggestions

Post by tmcdonald »

There are some general recommendations I make based on common hurdles I have seen over the years:
  • 10k checks (host plus services) on a 5-minute interval can be done with less hardware than you have and run just fine.
  • 20k checks will need some optimizations along the lines of offloaded mysql DB, ramdisk, and gearman.
  • 30k checks is achievable, but the issues you will run into can't really be solved by throwing more hardware at the problem. At this point you need to look into tweaking check frequencies, decreasing the historical record retention, and having a somewhat structured procedure for managing configs.
A lot of the issues with adding more checks come down to the time it takes to run reports and generate portions of the interface, and the sheer complexity of all the configs. A single person can't possibly remember details of how 30,000 objects are supposed to correlate and keep it straight without documentation, and several people are likely to step on toes if working simultaneously.
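
To put the three tiers above in terms of raw throughput, here is a rough back-of-the-envelope sketch in Python. It assumes every check shares the same 5-minute interval and that the scheduler spreads checks evenly over that interval, which a real installation will only approximate. The function name is made up for illustration and is not part of Nagios XI.

```python
def checks_per_second(total_checks: int, interval_seconds: float) -> float:
    """Average check rate if all checks share one interval (an assumption)."""
    return total_checks / interval_seconds


# The three tiers mentioned above, all on a 5-minute (300 s) interval.
for total in (10_000, 20_000, 30_000):
    rate = checks_per_second(total, 300)
    print(f"{total} checks every 5 minutes is about {rate:.1f} checks per second")
```

This only estimates scheduling load. It says nothing about plugin run time, report generation, or database size, which are where the problems described above tend to show up.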
Former Nagios employee
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Nagios XI Environment Size Suggestions

Post by mcapra »

We don't have specific sizing documentation. We've seen servers begin to drag at 7k checks, and we've seen servers easily handle upwards of 20k checks. It depends on the plugins used, how the configuration set is written, and many other factors. Here is some documentation on tuning and maximizing Nagios XI performance:

https://assets.nagios.com/downloads/nag ... ios-XI.pdf
https://assets.nagios.com/downloads/nag ... ios-XI.pdf
https://assets.nagios.com/downloads/nag ... h_NRDS.pdf
https://assets.nagios.com/downloads/nag ... ios_XI.pdf
https://assets.nagios.com/downloads/nag ... giosXI.pdf
https://assets.nagios.com/downloads/nag ... Server.pdf

I would definitely consider offloading MySQL if you're dealing with a large environment.
Former Nagios employee
https://www.mcapra.com/
IMTECH
Posts: 53
Joined: Fri Nov 25, 2011 6:35 am

Re: Nagios XI Environment Size Suggestions

Post by IMTECH »

Thank you for your responses!

We're currently well beyond 30k checks. mod_gearman is used heavily to offload 99% of all checks to workers, which keeps the CPU load to a minimum, and we have over 70 checks executed per second.

We have already encountered issues that we could only solve by truncating nagios_logentries (18 million entries that, as far as we could find out, are never used).
So the suggestion would be clearly to split up environments?
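
As a rough sanity check on those numbers, the sketch below works backwards from the execution rate to the average check interval it implies. It uses 30,000 as a floor for the check count (the real number is higher and not stated here) and 70 checks per second as the rate, and it assumes the load is spread evenly. The function name is hypothetical.

```python
def average_interval_seconds(total_checks: int, checks_per_second: float) -> float:
    """Average time between runs of one check, assuming an evenly spread load."""
    return total_checks / checks_per_second


# 30_000 is only a lower bound for the check count; 70/s is the stated rate.
interval = average_interval_seconds(30_000, 70)
print(f"Implied average interval: at least {interval / 60:.1f} minutes")
```

With a larger real check count the implied average interval grows accordingly, so this is a lower bound and not a measurement.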
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: Nagios XI Environment Size Suggestions

Post by dwhitfield »

IMTECH wrote: So the suggestion would be clearly to split up environments?
Yes, but there are a variety of ways to do that: https://assets.nagios.com/downloads/gen ... utions.pdf

There is (was?) a customer beta program for the new Fusion if you'd like to take a look at that, although Fusion by itself is not going to fix this issue. Fusion works with Core though, so you could use Core for some of the checks if that sounds like a better option than two XI servers. That said, the person who runs the customer beta program is out today, so I am not sure of the current status of the program. I could check with the lead Fusion developer if that would be of interest.