
ndo2db hangs

Posted: Thu Jun 14, 2018 4:02 pm
by bchabotdg
I'm seeing frequent log entries on my Nagios XI server from ndo2db:

Code: Select all

Jun 14 16:56:18 ip-10-35-32-9 ndo2db: Warning: Retrying message send. This can occur because you have too few messages allowed or too few total bytes allowed in message queues. You are currently using 128000 of 32768 messages and 131072000 of 131072000 bytes in the queue. See README for kernel tuning options.
Jun 14 16:56:38 ip-10-35-32-9 ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may need to be tuned. See README.
Jun 14 16:56:38 ip-10-35-32-9 ndo2db: Warning: queue send error, retrying...
This results in no checks happening until ndo2db comes back (sometimes it doesn't) or is restarted. When it comes back on its own, it overloads the server and I get a slew of these errors:

Code: Select all

Jun 14 16:52:01 ip-10-35-32-9 nagios: #011Max concurrent service checks (2000) has been reached.  Nudging $CHECK_NAME
This setup uses mod_gearman and three remote gearman workers without any restrictions as to hostgroups or servicegroups. The remote workers sit idle when ndo2db is dead.

What can be done to fix this?

Re: ndo2db hangs

Posted: Thu Jun 14, 2018 4:12 pm
by scottwilkerson
Here is a document on increasing the kernel values and tuning the message queue:
https://support.nagios.com/kb/article.php?id=139

Re: ndo2db hangs

Posted: Tue Jun 19, 2018 8:42 am
by bchabotdg
I've re-adjusted the queue sizes as suggested in that doc.

No joy.

I still see these repeated:

Code: Select all

Jun 19 09:36:41 SERVER ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may need to be tuned. See README.
Jun 19 09:36:41 SERVER ndo2db: Warning: queue send error, retrying...
Jun 19 09:37:01 SERVER ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may need to be tuned. See README.
Jun 19 09:37:01 SERVER ndo2db: Warning: queue send error, retrying...
Settings are currently:

Code: Select all

kernel.msgmnb = 1262144000
kernel.msgmax = 1262144000
kernel.msgmni = 512000

Code: Select all

# ipcs -q

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0xd5010002 0          nagios     600        1262143488   1232562

#
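
For reference, the figures above can be compared directly: the used-bytes column of `ipcs -q` against the `kernel.msgmnb` limit. The sketch below is only an illustration; `queue_fill_pct` is a hypothetical helper name, not part of Nagios or ndo2db, and the two numbers are copied from the output in this post.

```shell
#!/bin/sh
# queue_fill_pct USED_BYTES LIMIT_BYTES
# Hypothetical helper: prints the whole-number percentage of the queue's
# byte limit that is currently in use (truncated, not rounded).
queue_fill_pct() {
    used=$1
    limit=$2
    # awk does the arithmetic so large byte counts are not limited by
    # the shell's integer width.
    awk -v u="$used" -v l="$limit" 'BEGIN { printf "%d\n", (u * 100) / l }'
}

# used-bytes from `ipcs -q` and the kernel.msgmnb value shown above
queue_fill_pct 1262143488 1262144000
```

With these values the queue is within 512 bytes of its limit, so it is effectively full even after the increase. That suggests the consumer side is not draining the queue, rather than the limit being too small.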

Re: ndo2db hangs

Posted: Tue Jun 19, 2018 8:53 am
by scottwilkerson
Is the MySQL service running?

Can you send a system profile (Admin -> System Profile)?

Re: ndo2db hangs

Posted: Tue Jun 19, 2018 9:06 am
by bchabotdg

Code: Select all

# service mysqld status
mysqld (pid  8547) is running...
#
Profile attached:
profile.zip

Re: ndo2db hangs

Posted: Tue Jun 19, 2018 9:53 am
by scottwilkerson
I noticed in the profile that there were database errors that may have just been corrected, and also that there are multiple copies of the dbmaint job running.

If this is taking a really long time to complete, it could be causing your issues, as it runs some table optimizations that lock tables.
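
One way to confirm whether several copies are running at once is to count the matching entries in the process list. This is a rough sketch: `count_dbmaint` is a hypothetical helper name, and it assumes the job shows up as `php ... dbmaint.php`, as in the cron entry.

```shell
#!/bin/sh
# count_dbmaint: hypothetical helper. Reads "PID ARGS" lines on stdin
# (the format printed by `ps -eo pid,args`) and prints how many of them
# are dbmaint.php runs.
count_dbmaint() {
    # [.] matches a literal dot and keeps this awk command itself from
    # matching when the input is a live process list.
    awk '/php .*dbmaint[.]php/ { n++ } END { print n + 0 }'
}

# Live usage (left commented out so the sketch has no side effects):
# ps -eo pid,args | count_dbmaint
```

A count above 1 would mean a new run is starting before the previous one finishes. On the database side, `SHOW FULL PROCESSLIST` in the mysql client lists the statements currently running and their state.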

Here are my suggestions

Edit this line in /etc/cron.d/nagiosxi

Code: Select all

*/5 * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php >> /usr/local/nagiosxi/var/dbmaint.log 2>&1
to

Code: Select all

0 1 * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php >> /usr/local/nagiosxi/var/dbmaint.log 2>&1
so it only runs once per day (at 01:00).
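
If you would rather script the change than edit the file by hand, something like the following could work. It is a sketch only: `reschedule_dbmaint` is a hypothetical helper name, and it assumes the cron line looks exactly like the one quoted above.

```shell
#!/bin/sh
# reschedule_dbmaint: hypothetical helper. Reads cron lines on stdin and
# rewrites the dbmaint.php entry from "every 5 minutes" to "01:00 daily".
# Every other line passes through unchanged.
reschedule_dbmaint() {
    sed 's|^\*/5 \* \* \* \* \(nagios .*dbmaint\.php.*\)$|0 1 * * * \1|'
}

# Preview the result without touching the file (commented out here):
# reschedule_dbmaint < /etc/cron.d/nagiosxi
```

Check the preview and keep a backup copy of /etc/cron.d/nagiosxi before writing the result back.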

Then run the following to stop any current processes

Code: Select all

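# killall matches on the process name, which is "php" here, so match on the full command line instead
pkill -9 -f dbmaint.php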
service mysqld restart