Nagios IPCS stops processing
Re: Nagios IPCS stops processing
Do you still have access to the ndo2db.debug.old log file with the errors? Could you post it to the forum so we can see it?
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: Nagios IPCS stops processing
Even with ndo2db logging set to everything and debug set to verbose, all that is in the log is the insert statements. I did have an Exchange server going nuts, with about 500 errored application log events per hour for most of the day. I have check_wmi_plus checking this, and it returns a list of the events in the body of the check. Typically, since this reports only the last hour, the list is small, and when the results are put through the db they are truncated. I'm assuming that these large check returns were choking ndo2db. I know I should not return large checks, but sometimes things do go askew, and I would argue we need to harden the system against such chaos. It should NOT simply stop processing and keep showing everything is OK. I just got done wrapping check_wmi_plus in a shell script to truncate its output before Nagios consumes it. I don't know for sure if that was the issue. Of course, by the time I knocked the rust off my shell scripting and got it done, whatever was causing this issue was also gone.
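A wrapper along these lines is one way to cap plugin output before Nagios reads it. This is only a sketch, not the script mentioned above: the function name `run_truncated` and the 4096-byte default for `MAX_BYTES` are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper: run a plugin, cap its output, keep its exit code.
# MAX_BYTES is an assumed limit; pick whatever your setup tolerates.
MAX_BYTES="${MAX_BYTES:-4096}"

run_truncated() {
    local output status
    # Capture stdout of the wrapped command and remember its exit status.
    output="$("$@")"
    status=$?
    # Emit at most MAX_BYTES bytes of the captured output.
    printf '%s\n' "$output" | head -c "$MAX_BYTES"
    # Hand the plugin's original status (0/1/2/3) back to the caller.
    return "$status"
}

# When executed directly with arguments, wrap that command.
if [ "$#" -gt 0 ]; then
    run_truncated "$@"
    exit $?
fi
```

Saved as a script, it would be called with the real plugin command line as its arguments, so the check definition only changes by one leading word. Note that a byte-based cut can land in the middle of a line; cutting by line count with `head -n` is an alternative if that matters.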
Grumpy Olde IT Guy
Re: Nagios IPCS stops processing
Thanks for the log file. It will help in debugging the issue.
Re: Nagios IPCS stops processing
I have been able to replicate the issue. I simply opened up the check to show all event logs for the last couple of days and BAM! I started getting queue messages and invalid check results.
I do know that I shouldn't return that much data but I also feel that a little GIGO checking will go a long way to improving system stability.
Grumpy Olde IT Guy
Re: Nagios IPCS stops processing
In researching this issue, I found this in the NDOUtils README:
Code: Select all
***************
!! IMPORTANT !!
***************
This code is still an alpha/beta quality, so expect problems if you intend to use
it. Make sure that you aren't using it with your only production installation of
Nagios, or it could take down the Nagios process if the NDOMOD module segfaults.
Nagios could segfault silently and you might never know that Nagios crashed...
Code: Select all
ndomod-2x.o = NDOMOD module for Nagios 2.x
ndomod-3x.o = NDOMOD module for Nagios 3.x
ndomod-4x.o = NDOMOD module for Nagios 4.x (unstable)
Code: Select all
[root@nagios ~]# ps -ef | grep ndo2db | grep -v grep
nagios 49744 1 0 11:10 ? 00:00:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios 49778 49744 0 11:10 ? 00:00:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios 49779 49778 0 11:10 ? 00:00:08 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
[root@nagios ~]# ipcs -q -p
------ Message Queues PIDs --------
msqid owner lspid lrpid
131072 nagios 8348 8349
163841 nagios 49778 49779
[root@nagios ~]# ipcs -q -i 163841
Message Queue msqid=163841
uid=500 gid=500 cuid=500 cgid=500 mode=0600
cbytes=0 qbytes=131072000 qnum=0 lspid=49778 lrpid=49779
send_time=Thu Apr 23 12:41:51 2015
rcv_time=Thu Apr 23 12:41:51 2015
change_time=Thu Apr 23 11:10:38 2015
[root@nagios ~]# ipcs -q -t
------ Message Queues Send/Recv/Change Times --------
msqid owner send recv change
131072 nagios Apr 23 11:04:52 Apr 23 11:04:52 Apr 22 20:17:29
163841 nagios Apr 23 12:41:59 Apr 23 12:41:59 Apr 23 11:10:38
[root@nagios ~]# ipcs -q
------ Message Queues --------
key msqid owner perms used-bytes messages
0xbc000002 131072 nagios 600 0 0
0x92000078 163841 nagios 600 0 0
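To watch for a queue that is filling up, the `used-bytes` and `messages` columns of `ipcs -q` are the ones to track. Here is a small sketch that filters that output for one owner. The function name `queue_summary` is made up for this example, and the column layout is assumed to match the output shown above.

```shell
# Hypothetical helper: read `ipcs -q` output on stdin and print
# "msqid used-bytes messages" for every queue owned by the given user.
queue_summary() {
    owner="${1:-nagios}"
    # Data rows have six fields and start with a hex key (0x...);
    # this skips the banner and the column header line.
    awk -v owner="$owner" \
        'NF == 6 && $1 ~ /^0x/ && $3 == owner { print $2, $5, $6 }'
}

# Example: ipcs -q | queue_summary nagios
```

Run from cron or a check of its own, a message count that keeps climbing between runs would point at a consumer that has stopped draining the queue.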
Grumpy Olde IT Guy
Re: Nagios IPCS stops processing
Working on some Perl code that can dump the queues, just need to look into the NDO code to figure out the message structure. Will update when I make more progress.
Update:
Run the following to install the correct perl module:
Code: Select all
perl -MCPAN -e 'install IPC::SysV'
Then save this as dumpq.pl:
Code: Select all
#!/usr/bin/perl
use strict;
use warnings;
use IPC::SysV;

# Queue id as reported by `ipcs -q`
my $id = $ARGV[0];
# Receive (and remove) one message of type 1; this blocks while the queue is empty
msgrcv($id, my $msg, 32000, 1, 0) or die "msgrcv failed: $!\n";
print "Message is:\n$msg\nEND OF MESSAGE\n";
Make sure to chmod +x it. Run ipcs -q to get the id of the full queue, then run the perl program like so:
Code: Select all
./dumpq.pl [queue id] > queue_contents.txt
It should (hopefully) write the contents of a single message to queue_contents.txt, and if they are sane we can see what's in the rest of the queue. They might be ASCII or they might be binary, so post the output file once it runs and we'll see.
If it hangs it means you ran it against an empty queue.
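Because a blocking receive waits forever on an empty queue, it can help to check the message count first. Below is a minimal sketch that pulls `qnum` out of `ipcs -q -i` output. The function name `queue_qnum` is an assumption, and the `qnum=` field is assumed to appear as it does in the output posted earlier in the thread.

```shell
# Hypothetical helper: read `ipcs -q -i <msqid>` output on stdin and
# print the number of messages currently waiting (the qnum= field).
queue_qnum() {
    # Put each key=value token on its own line, then extract qnum.
    tr -s ' \t' '\n' | sed -n 's/^qnum=\([0-9][0-9]*\)$/\1/p' | head -n 1
}

# Example: only run the dump when the queue is not empty.
# n="$(ipcs -q -i 163841 | queue_qnum)"
# [ "${n:-0}" -gt 0 ] && ./dumpq.pl 163841 > queue_contents.txt
```

The count can still drop to zero between the check and the receive, so this narrows the window for a hang but does not remove it.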
Former Nagios employee
Re: Nagios IPCS stops processing
Hopefully I will not be back to this thread again and the good ship NMS stays upright. If it does I will most certainly share what I find.
Grumpy Olde IT Guy
Re: Nagios IPCS stops processing
rseiwert wrote: Hopefully I will not be back to this thread again and the good ship NMS stays upright.
As do we.
rseiwert wrote: If it does I will most certainly share what I find.
Many thanks as always.
Have a great weekend!
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.