Nagios Server Performance Slowdown Evidence

Posted: Fri May 17, 2019 1:08 pm
by awilson
Hi. Does the attached status information indicate that our server was processing a large number of checks simultaneously? Is there a reference doc on how to ensure that they don't bunch up?

Thanks!

Re: Nagios Server Performance Slowdown Evidence

Posted: Fri May 17, 2019 2:58 pm
by npolovenko
@awilson, the dashlet shows roughly the same average number of host and service checks executed over the last 5 and 15 minutes. Nagios is processing a large number of checks every minute, but the check times and load look evenly spread out.
You can limit the number of service checks running at any given time by changing the following directive in the /usr/local/nagios/etc/nagios.cfg file:

max_concurrent_checks
Set it to the maximum number of checks that may run at the same time. I wouldn't recommend doing this, though.
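
Before changing anything, it may help to see what the directive is currently set to. The following is only a minimal sketch: `read_directive` is a hypothetical helper, not part of Nagios, and the sample text stands in for the real nagios.cfg. In stock Nagios Core a value of 0 means no limit is placed on concurrent checks.

```python
def read_directive(cfg_text, name):
    """Return the value assigned to `name` in nagios.cfg-style text, or None.

    Commented-out lines are ignored. If the directive appears more than once,
    the last occurrence is returned (a choice made by this sketch).
    """
    value = None
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == name:
            value = val.strip()
    return value


# Sample fragment for illustration only, not a real configuration.
sample_cfg = """# sample fragment
log_file=/usr/local/nagios/var/nagios.log
max_concurrent_checks=0
"""

print(read_directive(sample_cfg, "max_concurrent_checks"))
```

To run it against the real file, read /usr/local/nagios/etc/nagios.cfg into a string and pass that string in place of `sample_cfg`.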

How many hosts and services are monitored by this Nagios server?

Also, I'm seeing that at least one service is taking 60 seconds to execute. That is a long execution time, and it could be affecting your system. If this check runs every minute and hangs for 60 seconds each time, it could be adding to the system load.

We have a script that you can run from the command line to see how long Nagios takes to execute each service check. You can use it to figure out which service takes 60 seconds to run.
https://exchange.nagios.org/directory/P ... me/details
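
If you'd rather not install the script, a similar list can be pulled from the status file. This is a rough sketch and not the script linked above: it assumes the status file is in the usual status.dat format with `servicestatus { ... }` blocks containing `host_name`, `service_description` and `check_execution_time`, and the host names in the sample are made up. `slow_services` is a hypothetical helper name.

```python
def slow_services(status_text, threshold=10.0):
    """Return (seconds, host, service) tuples for service checks whose last
    execution time was at least `threshold` seconds, slowest first."""
    results = []
    block = None  # dict for the servicestatus block being read, else None
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.startswith("servicestatus") and line.endswith("{"):
            block = {}
        elif line == "}":
            if block is not None:
                try:
                    secs = float(block.get("check_execution_time", "0"))
                except ValueError:
                    secs = 0.0
                if secs >= threshold:
                    results.append((secs,
                                    block.get("host_name", "?"),
                                    block.get("service_description", "?")))
            block = None
        elif block is not None and "=" in line:
            key, _, val = line.partition("=")
            block[key] = val
    return sorted(results, reverse=True)


# Made-up sample data; real files contain many more fields per block.
sample_status = """
hoststatus {
    host_name=web01
    check_execution_time=0.015
    }
servicestatus {
    host_name=web01
    service_description=HTTP
    check_execution_time=60.002
    }
servicestatus {
    host_name=db01
    service_description=PING
    check_execution_time=4.012
    }
"""

for secs, host, service in slow_services(sample_status):
    print(f"{secs:8.3f}s  {host}  {service}")
```

On a default install the status file is usually /usr/local/nagios/var/status.dat, but that path is an assumption here; check the `status_file` setting in nagios.cfg, since it may point elsewhere.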

Re: Nagios Server Performance Slowdown Evidence

Posted: Mon May 20, 2019 3:32 pm
by awilson
The counts are:
Total Hosts: 1050
Total Services: 4848

Nagios XI - System Info
System:
Nagios XI Version : 5.4.4
lbschpnagxi00.fossil.com 2.6.32-754.2.1.el6.x86_64 x86_64
Red Hat Enterprise Linux Server release 6.10 (Santiago)

The system is a VM
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 4
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 45
Model name: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Stepping: 7
CPU MHz: 2600.000
BogoMIPS: 5200.00
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K
NUMA node0 CPU(s): 0-3

12G RAM

I'll take a look at the execution times for the service checks.

Re: Nagios Server Performance Slowdown Evidence

Posted: Mon May 20, 2019 3:55 pm
by npolovenko
@awilson, the hardware specs you listed should be sufficient for the number of hosts and services you're monitoring. Looking at the dashboard you provided, it doesn't seem that Nagios is processing an unusual number of checks at the same time, relative to the total number of checks on your server.
If you believe your server has slowed down, consider implementing a ramdisk. We have an automated script that handles the whole installation.
https://assets.nagios.com/downloads/nag ... giosXI.pdf

Check that there are no duplicate nagios processes running and no extra IPC message queues:

ipcs -q
ps -ef | grep nagios.cfg
You can also increase the check interval for services. Nagios will then execute fewer checks per minute, which should reduce the system load.
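
As a rough illustration of the effect of the check interval, here is the arithmetic for the 4848 services reported earlier in this thread. It is a simplification: it assumes every service uses the same interval and ignores retry checks and host checks. The intervals shown are examples, since the actual intervals on this server weren't posted.

```python
def checks_per_minute(total_checks, interval_minutes):
    """Average checks per minute if every check runs once per interval."""
    return total_checks / interval_minutes


services = 4848  # service count reported in this thread

# Example intervals only; the real intervals on the server are not known.
for interval in (1, 5, 10):
    rate = checks_per_minute(services, interval)
    print(f"{interval:>2} min interval: {rate:.1f} checks/min")
```

Doubling the interval halves the average number of checks per minute, so raising it only on the less critical services can still make a noticeable difference.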

Here are some additional recommendations for large XI installations:
https://assets.nagios.com/downloads/nag ... ios-XI.pdf

Re: Nagios Server Performance Slowdown Evidence

Posted: Wed Jun 12, 2019 10:41 am
by awilson
Thanks. You can close the post.