Services missing from Nagios XI

Posted: Thu Nov 15, 2018 2:19 pm
by daniel.ledford
Today I noticed that our XI system is not reflecting the number of services we actually have, and the difference is large. Right now it shows about 2,300 services (the number fluctuates), but we actually have about 50,000. To my knowledge, nothing major has changed on the box since yesterday, when it was not having this problem.

Here are some results from the local mysql database:

mysql> select * from nagios.nagios_services;
Empty set (0.00 sec)

mysql> select * from nagios.nagios_hosts;
Empty set (0.00 sec)

mysql> select count(*) from nagiosql.tbl_host;
+----------+
| count(*) |
+----------+
| 8841 |
+----------+
1 row in set (0.01 sec)

mysql> select count(*) from nagiosql.tbl_service;
+----------+
| count(*) |
+----------+
| 55445 |
+----------+
1 row in set (0.00 sec)

I have tried restarting the server itself, the main nagios process, and the ndo2db process, but nothing has helped.

Here is a quick look at the message queue:

-bash-4.1$ ipcs -q
------ Message Queues --------
key msqid owner perms used-bytes messages
0xbe05801f 524288 nagios 600 221273088 216087

As a side note, there does not seem to be any problem showing the hosts, only the services.

Re: Services missing from Nagios XI

Posted: Thu Nov 15, 2018 2:22 pm
by daniel.ledford
Attached the system Profile.

This is a RedHat 6 virtual server.

Re: Services missing from Nagios XI

Posted: Thu Nov 15, 2018 3:16 pm
by benjaminsmith
Hi @daniel.ledford

You may have multiple processes running, and I noticed some crashed database tables in the error log. Let's try the following:

Stop Nagios and clear the queues:

service nagios stop
service ndo2db stop
killall -9 nagios
for i in `ipcs -q | grep nagios |awk '{print $2}'`; do ipcrm -q $i; done

Run the repair database script:

cd /usr/local/nagiosxi/scripts/
./repair_databases.sh

Restart Nagios XI:

service ndo2db start
service nagios start

Let me know if this resolves the issue for you.

References:
NDOUtils - Message Queue Exceeded
https://support.nagios.com/kb/article/n ... d-139.html
Nagios XI - Crashed Database Tables
https://support.nagios.com/kb/article/n ... es-24.html

Re: Services missing from Nagios XI

Posted: Fri Nov 16, 2018 11:09 am
by daniel.ledford
The problem corrected itself yesterday afternoon, but has come back this morning. I tried your steps, but they did not resolve the issue either.
I am also noticing that when our queue fills up, nagios stops processing checks and the log goes stale.

------ Message Queues --------
key msqid owner perms used-bytes messages
0x8b000002 1703936 nagios 600 1310718976 1279999

We have always had it run hot and fill up, but nagios would keep processing checks. This now looks like a silent crash or failure, as there are no errors in the nagios log. The only messages that show up in the system message log are the queue-full errors. There are also no errors in mysqld.log.
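
For tracking how close the queue gets to its limit before checks stall, here is a minimal sketch. The function name `queue_fill` is hypothetical, and it assumes `ipcs -q` prints the same six columns as in the output above. The limit is passed in as an argument, so nothing about the actual kernel setting on this box is assumed.

```shell
#!/bin/sh
# queue_fill LIMIT_BYTES
# Reads `ipcs -q` output on stdin and prints "<msqid> <percent>%" for every
# queue owned by the nagios user. LIMIT_BYTES is the per-queue byte limit
# to compare against (for example, the current kernel.msgmnb value).
queue_fill() {
    awk -v limit="$1" '
        # Data rows: key msqid owner perms used-bytes messages
        $3 == "nagios" && $5 ~ /^[0-9]+$/ {
            printf "%s %d%%\n", $2, int(($5 * 100) / limit)
        }'
}
```

On a live system this could be run as `ipcs -q | queue_fill "$(sysctl -n kernel.msgmnb)"`, for example from cron, to see whether the queue is already near full when the nagios log goes stale.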

About two weeks ago we upgraded our Nagios XI from 5.2.3 to 5.5.3.

Could that be a factor?

Re: Services missing from Nagios XI

Posted: Fri Nov 16, 2018 2:20 pm
by benjaminsmith
Hi Daniel,

Your server is under high load due to the large number of hosts and services you have, which is why the kernel message queue is building up.

I would start by upgrading to the latest version, Nagios XI 5.5.7, to take advantage of recent performance improvements.
https://assets.nagios.com/downloads/nag ... ctions.pdf

You can increase the kernel settings to allow more messages to be queued and processed (see: https://support.nagios.com/kb/article/n ... d-139.html )

You can increase the following parameters by a factor of 4:

kernel.msgmnb = 131072000
kernel.msgmax = 131072000
kernel.msgmni = 256000

Maximizing Performance In Nagios XI
https://assets.nagios.com/downloads/nag ... ios-XI.pdf
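
As a rough sketch of the "factor of 4" arithmetic: the helper below (the name `scale_sysctl` is hypothetical) reads `key = value` lines in sysctl.conf style and prints each numeric value multiplied by a given factor. The post does not say whether the values listed above are the starting point or the result, so this only does the multiplication on whatever lines it is given; check the current settings on your own box first.

```shell
#!/bin/sh
# scale_sysctl FACTOR
# Reads "key = value" lines on stdin and prints them with each numeric
# value multiplied by FACTOR. Lines that do not match are skipped.
scale_sysctl() {
    awk -v factor="$1" -F' *= *' '
        NF == 2 && $2 ~ /^[0-9]+$/ {
            printf "%s = %.0f\n", $1, $2 * factor
        }'
}
```

For example, `grep '^kernel\.msg' /etc/sysctl.conf | scale_sysctl 4` prints candidate values for review. After editing /etc/sysctl.conf by hand, `sysctl -p` loads the new settings.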