NXI 5.5.2 - High IPCS queues w/ high latency checks

Aezox
Posts: 25
Joined: Fri Feb 09, 2018 9:31 am

NXI 5.5.2 - High IPCS queues w/ high latency checks

Post by Aezox »

Hi community !

Our Nagios XI server has been behaving strangely for the past 10 days.
The message queues are constantly high and do not drain, so check results show up long after the checks run. Results appear about 7 minutes after the check command executes, which is far too late for active monitoring.
We have run several investigations on the server and on the database.
We are not seeing high CPU usage, and the database is running properly.

We are running Nagios XI 5.5.2 with Nagios Core 4.2.4, along with 6 Mod Gearman 2 servers. The Nagios server runs 15% of all checks and the rest are run by the Mod Gearman servers. Below are the packages installed on the Nagios server:
nagiosxi-nrds-5.5.2-1.el6.x86_64
nagiosxi-5.5.2-1.el6.x86_64
nagiosxi-wkhtmltox-5.5.2-1.el6.x86_64
nagiosxi-nsca-5.5.2-1.el6.x86_64
nagiosxi-pnp-5.5.2-1.el6.x86_64
nagiosxi-shellinabox-5.5.2-1.el6.x86_64
nagiosxi-nxti-5.5.2-1.el6.x86_64
nagiosxi-nagioscore-5-4.13.el6.x86_64
nagiosxi-nrpe-5.5.2-1.el6.x86_64
nagiosxi-nagiosmobile-5.5.2-1.el6.x86_64
nagiosxi-mrtg-5.5.2-1.el6.x86_64
nagiosxi-nagvis-5.5.2-1.el6.x86_64
nagiosxi-wmic-5.5.2-1.el6.x86_64
nagiosxi-ndoutils-5.5.2-1.el6.x86_64
nagiosxi-nagiosplugins-5.5.2-1.el6.x86_64
gearmand-server-0.33-2.x86_64
gearmand-0.33-2.x86_64
gearmand-devel-0.33-2.x86_64
mod_gearman2-2.1.1-1.el6.x86_64
We are monitoring 7,600 active hosts and 35,206 active services.
Our MySQL DB is handling 20,000 queries per second.
The IPC message queues (as reported by ipcs) stall above 500,000 messages.
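
For anyone who wants to track the queue depth over time, the figure above can be obtained by totalling the `messages` column of `ipcs -q`. This is a minimal Python sketch, assuming the usual util-linux column layout (key, msqid, owner, perms, used-bytes, messages). The function name and the sample text are illustrative only and were not captured from the server described here.

```python
def total_queued_messages(ipcs_output: str) -> int:
    """Sum the 'messages' column of `ipcs -q` text (assumed util-linux layout)."""
    total = 0
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Data rows have six columns and start with a hex key such as 0x4f010002.
        if len(fields) == 6 and fields[0].startswith("0x") and fields[5].isdigit():
            total += int(fields[5])
    return total


# Illustrative sample, not real output from the system in this thread.
SAMPLE = """\
------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0x4f010002 0          nagios     600        131072       512
0x50010002 32769      nagios     600        0            0
"""

if __name__ == "__main__":
    print(total_queued_messages(SAMPLE))
```

On a live server the input could come from `subprocess.run(["ipcs", "-q"], capture_output=True, text=True).stdout`, sampled every minute or so, to see whether the backlog is growing or draining.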

We are running MySQL version 5.1.73-8 on the same server as Nagios.
'mysqld.log' does not show any errors.
Our configuration follows all the performance recommendations in the Nagios documentation.

I've searched the forum but could not find a similar issue. If anyone has a clue :)
Thanks in advance for your help.
mbellerue
Posts: 1403
Joined: Fri Jul 12, 2019 11:10 am

Re: NXI 5.5.2 - High IPCS queues w/ high latency checks

Post by mbellerue »

The very first question: Is this a physical server? We recommend going to physical servers once you reach about 20,000 checks.

It looks like you're running mod gearman. How many workers do you have running? Are they all configured the same, and what are their max jobs set to? Also how many checks are run through gearman vs run from Nagios? If you have a lot of checks running from Nagios, could they be off-loaded to gearman? Or some active checks turned into passive checks?
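
To help answer the worker questions, the per-queue backlog can be read from the gearmand admin `status` output (for example via `gearadmin --status`), which lists one tab-separated line per queue: name, total jobs, running jobs, available workers, terminated by a single `.`. The sketch below parses that text. The class and function names are hypothetical, the sample values are made up, and "waiting" is derived here as total minus running, which is an assumption about how the counters relate.

```python
from typing import List, NamedTuple


class QueueStatus(NamedTuple):
    name: str
    total: int     # jobs in the queue, assumed to include running ones
    running: int   # jobs currently being executed
    workers: int   # workers registered for this queue

    @property
    def waiting(self) -> int:
        # Derived value; clamp at zero in case the counters race.
        return max(self.total - self.running, 0)


def parse_gearman_status(text: str) -> List[QueueStatus]:
    """Parse tab-separated gearmand `status` lines into QueueStatus rows."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line == ".":   # "." marks the end of the listing
            continue
        parts = line.split("\t")
        if len(parts) != 4:
            continue                  # skip anything that is not a queue row
        name, total, running, workers = parts
        rows.append(QueueStatus(name, int(total), int(running), int(workers)))
    return rows


# Illustrative sample, not taken from the environment in this thread.
SAMPLE = "service\t1200\t150\t150\nhost\t40\t40\t60\n.\n"

if __name__ == "__main__":
    for q in parse_gearman_status(SAMPLE):
        print(q.name, "waiting:", q.waiting, "workers:", q.workers)
```

A queue whose waiting count keeps growing while all of its workers are busy suggests the workers (or their max jobs setting) are the bottleneck rather than the Nagios server itself.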

You might also consider off-loading MySQL to its own box.
https://assets.nagios.com/downloads/nag ... Server.pdf

Another quick trick is modifying the check-host-alive command. This is the default host check. It just pings each host to see if it's up. The trick is that it sends 5 pings. In your environment, that's 7,600 checks that are sticking around for 5 seconds. If there are a lot of hosts that are on the LAN, you could have an alternate check-host-alive command that only sends 3 pings.
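
As a rough illustration of the saving, the arithmetic can be sketched as follows. It assumes roughly one second per ping packet, matching the "5 pings, 5 seconds" approximation above, and that every host answers. Real timings depend on round-trip time and timeouts, and the function name is hypothetical.

```python
def host_check_seconds(hosts: int, packets: int, seconds_per_packet: float = 1.0) -> float:
    """Total seconds spent in ping-based host checks for one full pass over all hosts."""
    return hosts * packets * seconds_per_packet


HOSTS = 7600  # host count quoted in this thread

five_pings = host_check_seconds(HOSTS, 5)
three_pings = host_check_seconds(HOSTS, 3)
saving = 1 - three_pings / five_pings  # fraction of check time saved

if __name__ == "__main__":
    print(f"5 pings: {five_pings:.0f} s per pass")
    print(f"3 pings: {three_pings:.0f} s per pass")
    print(f"saving:  {saving:.0%}")
```

Under these assumptions, dropping from 5 to 3 packets removes two fifths of the time host checks spend occupying a check slot, whether that slot is on the Nagios server or on a worker.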
Aezox
Posts: 25
Joined: Fri Feb 09, 2018 9:31 am

Re: NXI 5.5.2 - High IPCS queues w/ high latency checks

Post by Aezox »

Thanks for your reply @mbellerue.
We are running Nagios on a VM hosted on physical ESX servers.
We have 6 workers running, all configured the same way, and 85% of the total load is handled by the workers.

Off-loading the MySQL DB is not a priority for us, because this configuration has handled this load for a year without any problem.

Regarding check-host-alive, we will look into tuning it, thanks.

On the other hand, we have partially solved the problem: we found that it originates in the ESX farm. Some ESX hosts have a heavy impact on Nagios behavior, so this problem is outside the scope of Nagios :)

Thanks anyway for the answers.
lmiltchev
Former Nagios Staff
Posts: 13587
Joined: Mon May 23, 2011 12:15 pm

Re: NXI 5.5.2 - High IPCS queues w/ high latency checks

Post by lmiltchev »

@Aezox, do you have any further questions, or is it OK to close this topic? Thanks!
Aezox
Posts: 25
Joined: Fri Feb 09, 2018 9:31 am

Re: NXI 5.5.2 - High IPCS queues w/ high latency checks

Post by Aezox »

@lmiltchev the ticket can be closed.
Thanks