Hi. Does the attached status information indicate that our server was processing a large number of checks simultaneously? Is there a reference doc on how to ensure that they don't bunch up?
Thanks!
Nagios Server Performance Slowdown Evidence
-
npolovenko
- Support Tech
- Posts: 3457
- Joined: Mon May 15, 2017 5:00 pm
Re: Nagios Server Performance Slowdown Evidence
@awilson, the dashlet shows roughly the same average number of host and service checks executed over the last 5 and 15 minutes. Nagios appears to be processing a large number of checks every minute, but check execution times and load are spread out evenly.
You can limit the number of service checks running at any given time by changing the following directive in the /usr/local/nagios/etc/nagios.cfg file:
max_concurrent_checks
Set it to the maximum number of checks that may run at the same time. I wouldn't recommend doing this, though.
How many hosts and services are monitored by this Nagios server?
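Before changing anything, it may help to see what max_concurrent_checks is currently set to. The following is a minimal sketch, assuming a POSIX shell and that the path you pass in is your nagios.cfg; show_max_concurrent_checks is a hypothetical helper name, not part of Nagios. The Nagios Core documentation describes a value of 0 as no limit, and the sketch assumes an absent directive behaves the same way.

```shell
# Print the max_concurrent_checks value from a Nagios main config file.
# Hypothetical helper; pass the path to your nagios.cfg.
show_max_concurrent_checks() {
    cfg=$1
    if [ ! -r "$cfg" ]; then
        echo "cannot read $cfg" >&2
        return 1
    fi
    # Keep uncommented assignments only; if there are several, report the last one.
    value=$(grep -E '^[[:space:]]*max_concurrent_checks[[:space:]]*=' "$cfg" \
        | tail -n 1 | cut -d= -f2 | tr -d '[:space:]')
    # Assumption: an absent directive behaves like 0, which means no limit.
    if [ -z "$value" ] || [ "$value" = "0" ]; then
        echo "0 (no limit)"
    else
        echo "$value"
    fi
}
```

For example: show_max_concurrent_checks /usr/local/nagios/etc/nagios.cfg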
Also, I'm seeing that one or more services take 60 seconds to execute. That's quite a long delay, and it could be affecting your system: if such a check runs every minute and hangs for 60 seconds, it is effectively running all the time, which adds to the system load.
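To put a 60-second check in perspective, the average number of checks running at once can be estimated as the sum over services of execution time divided by the time between checks (Little's law). A check that runs every 60 seconds and takes 60 seconds to finish keeps one check slot busy all the time. The sketch below applies that to status.dat. It assumes the default Nagios XI location /usr/local/nagios/var/status.dat, that check_interval there is expressed in interval_length units (60 seconds unless nagios.cfg says otherwise), and it ignores retry intervals; estimate_check_load is a hypothetical helper name.

```shell
# Rough load estimate from a Nagios status.dat file (a sketch, not an official tool).
# Assumptions: check_interval is in interval_length units (default 60 s) and every
# active service runs at its normal check_interval (retry_interval is ignored).
# Average checks in flight = sum over services of execution_time / check_period.
estimate_check_load() {
    awk -F= -v unit="${2:-60}" '
        /^servicestatus[ \t]*[{]/ { in_svc = 1; interval = 0; exec_time = 0; active = 0; next }
        in_svc {
            key = $1
            sub(/^[ \t]+/, "", key)           # status.dat indents fields with tabs
            if (key == "check_interval") interval = $2 + 0
            else if (key == "check_execution_time") exec_time = $2 + 0
            else if (key == "active_checks_enabled") active = $2 + 0
            else if (key == "}") {            # end of this servicestatus block
                if (active && interval > 0) {
                    period = interval * unit  # seconds between checks
                    per_min += 60 / period
                    busy += exec_time / period
                    n++
                }
                in_svc = 0
            }
        }
        END { printf "services=%d checks_per_minute=%.1f avg_checks_in_flight=%.2f\n", n, per_min, busy }
    ' "$1"
}
```

For example: estimate_check_load /usr/local/nagios/var/status.dat, passing your interval_length as a second argument if it is not 60. Lengthening check intervals lowers both checks_per_minute and avg_checks_in_flight.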
We have a script that you can run from the command line to report how long Nagios takes to execute each service check. You can use it to find out which service takes 60 seconds to run.
https://exchange.nagios.org/directory/P ... me/details
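If the linked script is not handy, a rough equivalent can be pulled from status.dat, which records the execution time of each service's most recent check. This is a stand-in sketch, not the Exchange script; slowest_services is a hypothetical name, and the default XI path /usr/local/nagios/var/status.dat is assumed.

```shell
# List the N slowest service checks (default 10) recorded in a Nagios status.dat file.
# Output columns: execution time in seconds, host name, service description.
slowest_services() {
    awk -F= '
        /^servicestatus[ \t]*[{]/ { in_svc = 1; host = ""; svc = ""; t = 0; next }
        in_svc {
            key = $1
            sub(/^[ \t]+/, "", key)           # status.dat indents fields with tabs
            if (key == "host_name") host = substr($0, index($0, "=") + 1)
            else if (key == "service_description") svc = substr($0, index($0, "=") + 1)
            else if (key == "check_execution_time") t = $2 + 0
            else if (key == "}") { printf "%.3f\t%s\t%s\n", t, host, svc; in_svc = 0 }
        }
    ' "$1" | LC_ALL=C sort -rn | head -n "${2:-10}"
}
```

For example, slowest_services /usr/local/nagios/var/status.dat 20 prints the 20 slowest checks. Because status.dat only holds the latest run, a check that hangs intermittently may need several samples to show up.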
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: Nagios Server Performance Slowdown Evidence
The counts are:
Total Hosts: 1050
Total Services: 4848
Nagios XI - System Info
System:
Nagios XI Version : 5.4.4
lbschpnagxi00.fossil.com 2.6.32-754.2.1.el6.x86_64 x86_64
Red Hat Enterprise Linux Server release 6.10 (Santiago)
The system is a VM
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 4
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 45
Model name: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Stepping: 7
CPU MHz: 2600.000
BogoMIPS: 5200.00
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K
NUMA node0 CPU(s): 0-3
12G RAM
I'll take a look at the execution times for the service checks.
npolovenko
Re: Nagios Server Performance Slowdown Evidence
@awilson, the hardware you listed should be sufficient for the number of hosts and services you're monitoring. Looking at the dashboard you provided, Nagios does not appear to be processing an unusual number of checks at the same time, relative to the total number of checks on your server.
If you believe your server has slowed down, consider implementing a ramdisk. We have an automated script that handles the whole installation.
https://assets.nagios.com/downloads/nag ... giosXI.pdf
Check that there are no duplicate nagios processes running and no multiple ipcs queues:
ipcs -q
ps -ef | grep nagios.cfg
Another thing you can do is increase the check interval for services. That way Nagios has to execute fewer checks per minute, which will reduce the system load.
Here are some additional recommendations for large XI installations:
https://assets.nagios.com/downloads/nag ... ios-XI.pdf
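The duplicate-process check can also be reduced to a single number. This is a minimal sketch, assuming the daemon runs as /usr/local/nagios/bin/nagios with nagios.cfg on its command line; count_nagios_instances is a hypothetical helper. It counts such processes whose parent is not also one, so child processes forked by a running daemon are not counted as extra instances.

```shell
# Count independent Nagios daemons from "pid ppid args" lines on stdin.
count_nagios_instances() {
    awk '
        # $1 = pid, $2 = ppid, $3 = executable; keep nagios binaries started with nagios.cfg
        ($3 == "nagios" || $3 ~ /\/nagios$/) && index($0, "nagios.cfg") { parent[$1] = $2 }
        END {
            n = 0
            for (pid in parent)
                if (!(parent[pid] in parent)) n++   # parent is not itself a nagios.cfg daemon
            print n
        }
    '
}
```

Feed it ps output: ps -eo pid=,ppid=,args= | count_nagios_instances. A result greater than 1 suggests more than one Nagios daemon is running.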
Re: Nagios Server Performance Slowdown Evidence
Thanks. You can close the post.