Nagios Support Forum

Posted: **Mon Apr 20, 2015 1:33 pm**

Do you still have access to the ndodb.debug.old log file with the errors? Could you post it to the forum so we can see it?

Posted: **Mon Apr 20, 2015 5:46 pm**

Even with ndo2db logging set to everything and debug set to verbose, all that is in the log is the insert statements. I did have an Exchange server going nuts with about 500 errored application log events per hour for most of the day. I have check_wmi_plus checking this, and it returns a list of events in the body of the check. Typically, since this reports only the last hour, the list is small, and when the results are put through the DB they are truncated. I'm assuming that these large check returns were choking ndo2db. I know I should not return large checks, but sometimes things do go askew, and I would argue we need to harden the system against such chaos. It should NOT simply stop processing and keep showing everything is OK. I just got done wrapping check_wmi_plus in a shell script to truncate its output before Nagios consumes it. I don't know for sure if that was the issue. Of course, by the time I knocked the rust off my shell scripting and got it done, whatever was causing this issue was also gone.
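
For anyone wanting to try the same workaround, here is a minimal sketch of such a wrapper. It is not the script used above; the function name `run_truncated`, the `MAX_CHARS` variable, and the 4096-character default are assumptions chosen for illustration. It runs whatever command it is given, caps the captured output, and passes the plugin's exit code through so Nagios still sees the right state.

```shell
#!/bin/bash
# Hypothetical wrapper: run a plugin, cap its output length, keep its exit code.
# MAX_CHARS is an assumed limit; tune it for your own setup.
MAX_CHARS="${MAX_CHARS:-4096}"

run_truncated() {
    local output status
    # Capture stdout and stderr of the wrapped command
    output="$("$@" 2>&1)"
    status=$?
    # Cut the output if it is longer than the limit and flag that we did so
    if [ "${#output}" -gt "$MAX_CHARS" ]; then
        output="${output:0:$MAX_CHARS} [truncated]"
    fi
    printf '%s\n' "$output"
    # Hand the plugin's original exit code back to the caller
    return "$status"
}

# Only act when a command line was passed to the script
if [ "$#" -gt 0 ]; then
    run_truncated "$@"
    exit $?
fi
```

Saved under a name of your choosing, the script would be called with the plugin's usual command line as its arguments. Be aware that a hard cut like this can also chop off performance data at the end of the output, so a smarter wrapper might truncate only the long event list.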

Posted: **Tue Apr 21, 2015 2:22 pm**

Thanks for the log file. It will help in debugging the issue.

Posted: **Wed Apr 22, 2015 5:07 pm**

I have been able to replicate the issue. I simply opened up the check to show all event logs for the last couple of days and BAM! I started getting queue messages and invalid check results.

I do know that I shouldn't return that much data, but I also feel that a little GIGO checking would go a long way toward improving system stability.

Posted: **Thu Apr 23, 2015 11:48 am**

Someone (tmcdonald) mentioned that they had a way of inspecting these messages. I would like to know how, so that it can be used while the problem is occurring. Also, can anyone tell me more about the NDO processes, like why there are three and what each one does?

In researching this issue, I found this in the NDOUtils README:

Code: Select all

***************
!! IMPORTANT !!
***************
This code is still an alpha/beta quality, so expect problems if you intend to use
it.  Make sure that you aren't using it with your only production installation of
Nagios, or it could take down the Nagios process if the NDOMOD module segfaults.
Nagios could segfault silently and you might never know that Nagios crashed...

And later in the document:

Code: Select all

        ndomod-2x.o = NDOMOD module for Nagios 2.x
        ndomod-3x.o = NDOMOD module for Nagios 3.x
        ndomod-4x.o = NDOMOD module for Nagios 4.x (unstable)

Some things I found out with ipcs. If you stop and start, or crash and restart, the old message queue will persist. You can use the -p option to figure out which queue is relevant, and which process ID is pitching (lspid) and which one is catching (lrpid). I'm beginning to think that MySQL is blocking the third NDO2DB process, which is backing up the queue.

Code: Select all

[root@nagios ~]# ps -ef | grep ndo2db | grep -v grep
nagios   49744     1  0 11:10 ?        00:00:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios   49778 49744  0 11:10 ?        00:00:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios   49779 49778  0 11:10 ?        00:00:08 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
[root@nagios ~]# ipcs -q -p

------ Message Queues PIDs --------
msqid      owner      lspid      lrpid
131072   nagios       8348      8349
163841   nagios      49778     49779

[root@nagios ~]# ipcs -q -i 163841

Message Queue msqid=163841
uid=500 gid=500 cuid=500        cgid=500        mode=0600
cbytes=0        qbytes=131072000        qnum=0  lspid=49778     lrpid=49779
send_time=Thu Apr 23 12:41:51 2015
rcv_time=Thu Apr 23 12:41:51 2015
change_time=Thu Apr 23 11:10:38 2015

[root@nagios ~]# ipcs -q -t

------ Message Queues Send/Recv/Change Times --------
msqid    owner      send                 recv                 change
131072   nagios     Apr 23 11:04:52      Apr 23 11:04:52      Apr 22 20:17:29
163841   nagios     Apr 23 12:41:59      Apr 23 12:41:59      Apr 23 11:10:38

[root@nagios ~]# ipcs -q

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0xbc000002 131072     nagios     600        0            0
0x92000078 163841     nagios     600        0            0
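
The `ipcs -q` listing above could also be watched from a script so that a backed-up queue is noticed before checks start failing. The following is only a sketch: the function names `queue_depths` and `backed_up_queues` and the default threshold of 1000 messages are made up for illustration, and the parsing assumes the Linux column layout shown in the output above (key, msqid, owner, perms, used-bytes, messages).

```shell
#!/bin/bash
# Print "msqid used-bytes messages" for each message queue owned by a user.
# Reads `ipcs -q` output on stdin; data rows are the ones whose key starts with 0x.
queue_depths() {
    local owner="${1:-nagios}"
    awk -v owner="$owner" '$1 ~ /^0x/ && $3 == owner { print $2, $5, $6 }'
}

# Print the msqid of every queue holding more messages than a threshold.
# The default of 1000 is an arbitrary, assumed value.
backed_up_queues() {
    local owner="${1:-nagios}" max_msgs="${2:-1000}"
    queue_depths "$owner" | awk -v max="$max_msgs" '$3 + 0 > max + 0 { print $1 }'
}

# Example: ipcs -q | backed_up_queues nagios 1000
```

Any msqid printed by `backed_up_queues` is a queue worth inspecting with `ipcs -q -i`. A leftover queue from an earlier ndo2db run, once `ipcs -q -p` confirms nothing is using it, can be removed with `ipcrm -q <msqid>`.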

Posted: **Thu Apr 23, 2015 12:16 pm**

Working on some Perl code that can dump the queues, just need to look into the NDO code to figure out the message structure. Will update when I make more progress.

Update:

Run the following to install the correct perl module:

Code: Select all

perl -MCPAN -e 'install IPC::SysV'

then save this as dumpq.pl:

Code: Select all

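#!/usr/bin/perl
use strict;
use warnings;
use IPC::SysV;

# Queue id (msqid) as reported by ipcs -q
my $id = shift @ARGV;
die "Usage: $0 <queue id>\n" unless defined $id;

# Blocking receive of one type-1 message, up to 32000 bytes.
# This removes the message from the queue.
msgrcv($id, my $msg, 32000, 1, 0) or die "msgrcv failed: $!\n";

# The buffer begins with the native long message type; split it off.
my ($type, $body) = unpack("l! a*", $msg);

print "Message is:\n$body\nEND OF MESSAGE\n";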

Make sure to chmod +x it. Run ipcs -q to get the id of the full queue, then run the perl program like so:

Code: Select all

./dumpq.pl [queue id] > queue_contents.txt

It should (hopefully) write the contents of a single message to queue_contents.txt, and if they are sane we can see what's in the rest of the queue. They might be ASCII or they might be binary, so post the output file once it runs and we'll see.

If it hangs it means you ran it against an empty queue.
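
To avoid the hang, the message count can be checked first. This is a rough sketch rather than part of the tool above: the function names `queue_msg_count` and `dump_if_nonempty` are invented for illustration, and the parsing relies on the `qnum=` field that `ipcs -q -i` prints on Linux. There is still a small window in which the queue could drain between the check and the dump, and a queue holding only messages of a type other than 1 would still block.

```shell
#!/bin/bash
# Extract the qnum= value (number of queued messages) from
# `ipcs -q -i <msqid>` output read on stdin.
queue_msg_count() {
    sed -n 's/.*qnum=\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical guard: run dumpq.pl only when the queue holds at least one message.
dump_if_nonempty() {
    local id="$1" count
    count="$(ipcs -q -i "$id" | queue_msg_count)"
    if [ "${count:-0}" -gt 0 ]; then
        ./dumpq.pl "$id"
    else
        echo "queue $id is empty (or not found); nothing to dump" >&2
        return 1
    fi
}

# Example: dump_if_nonempty 163841 > queue_contents.txt
```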

Posted: **Thu Apr 23, 2015 1:23 pm**

Hopefully I will not be back to this thread again and the good ship NMS stays upright. If it does I will most certainly share what I find.

Posted: **Thu Apr 23, 2015 5:20 pm**

rseiwert wrote: Hopefully I will not be back to this thread again and the good ship NMS stays upright.

As do we.

rseiwert wrote: If it does I will most certainly share what I find.

Many thanks as always.
Have a great weekend!

Nagios IPCS stops processing
