NXI 5.5.2 - High IPCS queues w/ high latency checks

Post by **Aezox** » Tue Jan 07, 2020 9:13 am

Hi community!

For the past 10 days, our Nagios XI server has been showing some really strange behavior.
The message queues stay constantly high and never drain, so check results show up long after the checks run. Results arrive about 7 minutes after the check command executes, which is far too late for active monitoring.
We have run several investigations on the server and on the database.
CPU usage is not high, and the database is running properly.

We are running Nagios XI 5.5.2 with Nagios Core 4.2.4, along with 6 Mod Gearman 2 servers. The Nagios server runs 15% of all checks, and the Mod Gearman (MG) servers run the rest. Below are the packages installed on the Nagios server:

nagiosxi-nrds-5.5.2-1.el6.x86_64
nagiosxi-5.5.2-1.el6.x86_64
nagiosxi-wkhtmltox-5.5.2-1.el6.x86_64
nagiosxi-nsca-5.5.2-1.el6.x86_64
nagiosxi-pnp-5.5.2-1.el6.x86_64
nagiosxi-shellinabox-5.5.2-1.el6.x86_64
nagiosxi-nxti-5.5.2-1.el6.x86_64
nagiosxi-nagioscore-5-4.13.el6.x86_64
nagiosxi-nrpe-5.5.2-1.el6.x86_64
nagiosxi-nagiosmobile-5.5.2-1.el6.x86_64
nagiosxi-mrtg-5.5.2-1.el6.x86_64
nagiosxi-nagvis-5.5.2-1.el6.x86_64
nagiosxi-wmic-5.5.2-1.el6.x86_64
nagiosxi-ndoutils-5.5.2-1.el6.x86_64
nagiosxi-nagiosplugins-5.5.2-1.el6.x86_64
gearmand-server-0.33-2.x86_64
gearmand-0.33-2.x86_64
gearmand-devel-0.33-2.x86_64
mod_gearman2-2.1.1-1.el6.x86_64

We have 7,600 active hosts and 35,206 active services.
Our MySQL DB is ingesting 20,000 queries per second.
The IPCS message queues stall above 500,000 messages.

We are running MySQL 5.1.73-8 on the same server as Nagios.
'mysqld.log' is not showing any errors.
We have applied all of the recommended performance settings from the Nagios documentation.
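The queue depth described above can be watched from a script. Below is a minimal Python sketch, assuming the util-linux `ipcs -q` column layout (`key msqid owner perms used-bytes messages`); it sums the `messages` column across all System V message queues visible to the user running it. The function names and the `STALL_THRESHOLD` constant are illustrative only: the threshold just reuses the stall level reported in this post and is not a Nagios setting.

```python
import subprocess

# Hypothetical alert level: reuses the ~500,000-message stall point reported in this thread.
STALL_THRESHOLD = 500_000


def read_ipcs_queues() -> str:
    """Run `ipcs -q` on the local host and return its raw text output."""
    result = subprocess.run(["ipcs", "-q"], capture_output=True, text=True, check=True)
    return result.stdout


def total_queued_messages(ipcs_output: str) -> int:
    """Sum the 'messages' column of `ipcs -q` output (util-linux layout assumed)."""
    header = None
    total = 0
    for line in ipcs_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "key":
            # Header row: remember the column names so we can find 'messages'.
            header = fields
            continue
        if header is None or len(fields) != len(header):
            # Skip the "------ Message Queues --------" banner and malformed rows.
            continue
        total += int(fields[header.index("messages")])
    return total


def queue_backlogged(total: int, threshold: int = STALL_THRESHOLD) -> bool:
    """True when the summed queue depth is above the chosen threshold."""
    return total > threshold
```

On the Nagios host, something like `queue_backlogged(total_queued_messages(read_ipcs_queues()))` could feed a simple local check; parsing is kept separate from running `ipcs` so it can be tested against captured output.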

I've searched the forum but cannot find any similar issue. If anyone has a clue, please let us know.

Thanks in advance for your help

Post by **mbellerue** » Tue Jan 07, 2020 1:45 pm

The very first question: is this a physical server? We recommend moving to physical servers once you reach about 20,000 checks.

It looks like you're running Mod Gearman. How many workers do you have running? Are they all configured the same, and what is their max jobs setting? Also, how many checks are run through Gearman versus from Nagios itself? If a lot of checks run from Nagios, could they be offloaded to Gearman, or could some active checks be turned into passive checks?

You might also consider off-loading MySQL to its own box.
https://assets.nagios.com/downloads/nag ... Server.pdf

Another quick trick is modifying the check-host-alive command, which is the default host check. It just pings each host to see if it's up, but it sends 5 pings. In your environment, that's 7,600 checks that each stick around for about 5 seconds. If many of the hosts are on the LAN, you could create an alternate check-host-alive command that sends only 3 pings.
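A rough back-of-the-envelope version of that arithmetic, as a Python sketch. It assumes, as the reply does, about one second per ping, and it counts check-seconds summed over all hosts rather than wall-clock time, since Nagios and the Gearman workers run many checks in parallel. The real duration depends on which plugin the command uses and on its packet interval and timeout settings; `ping_check_seconds` is a hypothetical helper for illustration.

```python
def ping_check_seconds(hosts: int, pings_per_check: int, seconds_per_ping: float = 1.0) -> float:
    """Total time, summed over hosts, spent on host pings in one check cycle.

    Assumes each ping costs about `seconds_per_ping`; this is an estimate,
    not a measurement of any particular plugin.
    """
    return hosts * pings_per_check * seconds_per_ping


HOSTS = 7_600  # active hosts reported in this thread

five_pings = ping_check_seconds(HOSTS, 5)
three_pings = ping_check_seconds(HOSTS, 3)
saved_fraction = (five_pings - three_pings) / five_pings

print(f"5 pings: {five_pings:,.0f} check-seconds per cycle")
print(f"3 pings: {three_pings:,.0f} check-seconds per cycle")
print(f"saved:   {saved_fraction:.0%}")
```

Under these assumptions, dropping from 5 to 3 pings trims about 40% of the time host checks spend waiting (38,000 versus 22,800 check-seconds per cycle).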

Post by **Aezox** » Wed Jan 08, 2020 5:47 am

Thanks for your reply @mbellerue
We are running Nagios on a VM hosted on physical ESX servers.
We have 6 running workers, all configured the same way, and 85% of the global load is handled by the workers.
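To put these figures next to the 20,000-check guideline from the earlier reply, here is a small Python sketch of the split. It assumes every host and service check counts as one unit of load (i.e. similar check intervals and durations) and that Gearman spreads the worker share evenly across the 6 identically configured workers; the thread does not say how the 85% figure was measured. `check_split` is a hypothetical helper, not part of Nagios or Mod Gearman.

```python
def check_split(hosts: int, services: int, worker_share: float, workers: int) -> dict:
    """Estimate how many checks land on the Nagios host vs. each Mod Gearman worker.

    Assumes one check per host/service object and an even spread across workers.
    """
    total = hosts + services
    on_workers = total * worker_share
    return {
        "total": total,
        "nagios_local": total - on_workers,
        "per_worker": on_workers / workers,
    }


split = check_split(hosts=7_600, services=35_206, worker_share=0.85, workers=6)
for name, value in split.items():
    print(f"{name:>12}: {value:,.0f}")
```

Under those assumptions the installation carries about 42,800 checks, roughly twice the guideline, with around 6,400 still executed on the Nagios VM itself and about 6,100 per worker.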

Offloading the MySQL DB is not among our primary concerns, because this configuration has been running under this amount of load for a year without any problems.

Regarding check-host-alive, we will dig into this trick and tune it. Thanks.

On the other hand, we have partially solved the problem: we found that it is located on the ESX farm. Some ESX hosts have a huge impact on Nagios behavior, so this problem is outside the scope of Nagios.

Thanks anyway for the provided answers

Post by **lmiltchev** » Wed Jan 08, 2020 10:11 am

@Aezox, do you have any further questions, or is it OK to close this topic? Thanks!

Post by **Aezox** » Wed Jan 08, 2020 11:00 am

@lmiltchev, the ticket can be closed.
Thanks

Nagios Support Forum