Nagios IPCS stops processing

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
rseiwert
Posts: 196
Joined: Wed Jun 22, 2011 10:33 pm
Location: Somewhere between Here and Now

Post by rseiwert »

I'm still having an issue where XI stops updating while Core keeps functioning. If the message queues keep growing, which process is supposed to be processing these messages?

Code:

[root@nagios var]# ipcs -q
------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0x67000002 0          nagios     600        41368576     40399
Grumpy Olde IT Guy
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Nagios IPCS stops processing

Post by abrist »

Usually ndo2db. I have been trying to figure out a good way to view what is in those messages in the queue. If you know of a good way to do so, do tell.
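This does not show what is inside the messages, but it can help confirm which process last read from a queue: on Linux, `ipcs -q -p` lists the PID of the last sender (lspid) and last receiver (lrpid) for each queue. A minimal sketch, assuming the util-linux column layout (msqid, owner, lspid, lrpid); the function name `last_receiver_pid` is made up for illustration.

```shell
#!/bin/sh
# Print the PID of the last process that received from a given queue.
# Reads `ipcs -q -p` output on stdin so it can also be run on saved output.
# Assumes the util-linux layout: msqid owner lspid lrpid.
last_receiver_pid() {
    msqid="$1"
    awk -v id="$msqid" 'NF == 4 && $1 == id && $4 ~ /^[0-9]+$/ { print $4 }'
}

# Example on a live system (msqid 0 is the queue from the first post):
#   ipcs -q -p | last_receiver_pid 0
```

Comparing that PID against a `ps -ef | grep ndo2db` listing should show which ndo2db process was last draining the queue.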
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
rseiwert
Posts: 196
Joined: Wed Jun 22, 2011 10:33 pm
Location: Somewhere between Here and Now

Re: Nagios IPCS stops processing

Post by rseiwert »

It indeed is something with ndo2db.

Code:

[root@nagios var]# ps -ef | grep ndo2db
nagios    1604     1  0 Apr07 ?        00:00:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios    1863  1604  0 Apr07 ?        00:00:10 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios    1864  1863 11 Apr07 ?        02:53:32 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
Notice that the 4th column (C, the CPU utilization) for PID 1864 is 11, which is rather high. Killing off the ndo2db process and restarting it cleared the queue immediately.
I have enabled debug logging in ndo2db.cfg for now and will hopefully catch whatever is crashing XI soon.
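To spot the busy child without eyeballing the listing, something like the following could filter `ps -ef` output on that 4th column. This is only a sketch: the function name `busy_ndo2db` and the default cutoff of 10 are arbitrary choices for illustration, and it assumes the usual `ps -ef` layout where the command is the 8th field.

```shell
#!/bin/sh
# Print "PID C" for every ndo2db process whose C column (4th field of
# `ps -ef`, integer CPU utilization) is at or above a cutoff.
# Reads `ps -ef` output on stdin; the cutoff defaults to 10.
busy_ndo2db() {
    min_c="${1:-10}"
    awk -v min="$min_c" '$8 ~ /ndo2db$/ && $4 + 0 >= min { print $2, $4 }'
}

# Example on a live system:
#   ps -ef | busy_ndo2db 10
```

Matching on the 8th field also means the `grep ndo2db` line never shows up, since its command is `grep`.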
Grumpy Olde IT Guy
lmiltchev
Former Nagios Staff
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Nagios IPCS stops processing

Post by lmiltchev »

Post the debug file, along with the ndo2db.cfg when you are ready (hide sensitive info).
Be sure to check out our Knowledgebase for helpful articles and solutions!
rseiwert
Posts: 196
Joined: Wed Jun 22, 2011 10:33 pm
Location: Somewhere between Here and Now

Re: Nagios IPCS stops processing

Post by rseiwert »

Still waiting for it to crash again. XI has been working fine lately. We will see if that lasts the week.
Grumpy Olde IT Guy
cmerchant
Posts: 546
Joined: Wed Sep 24, 2014 11:19 am

Re: Nagios IPCS stops processing

Post by cmerchant »

Hope for two things: that it does keep working, and that if it stops we can catch the elusive ipcs queue bug. Keep us posted. Thanks.
rseiwert
Posts: 196
Joined: Wed Jun 22, 2011 10:33 pm
Location: Somewhere between Here and Now

Re: Nagios IPCS stops processing

Post by rseiwert »

XI has been stable for a while now. I'm sure I had a check going nuts somewhere, but with XI crashed I couldn't tell what it was.

Could someone please make a feature request for me: the ndo2db check in sysstat.php should verify that the inter-process communication (IPC) message queue is being processed. A backed-up IPC queue would indicate a hung or choked ndo2db process, for one reason or another. If ndo2db is not processing these messages, then XI is not updating. When I have experienced this issue, the system health in XI continued to show green and reported that ndo2db was running. It seems to me sysstat.php should check that the process is running, that it is actually ndo2db, and that it is functional, either via some heartbeat and/or by checking the message queues (ipcs -q). If there are more than, say, 100 messages in the queue, go red. I'm really not sure what an acceptable number of messages is, but mine is normally at zero and was over 40 thousand when ndo2db was non-functional.

Before anyone suggests building a Nagios check to watch this, remember that XI doesn't update when this is down. This is why sysstat.php needs to check for this.
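The logic I have in mind is roughly the following, written as a shell sketch rather than as anything sysstat.php actually does. The function names are made up, and the threshold of 100 is just the placeholder number from above, not a tested value. It would have to run outside of Nagios itself (from cron, for instance) for the reason already given.

```shell
#!/bin/sh
# Sum the "messages" column of `ipcs -q` for all queues owned by one user.
# Reads `ipcs -q` output on stdin; columns are:
# key msqid owner perms used-bytes messages
queue_depth() {
    owner="$1"
    awk -v owner="$owner" \
        '$1 ~ /^0x/ && $3 == owner { total += $6 } END { print total + 0 }'
}

# Turn a queue depth into a red/green status.
# The threshold defaults to 100, which is only a guess.
queue_status() {
    depth="$1"
    threshold="${2:-100}"
    if [ "$depth" -gt "$threshold" ]; then
        echo "RED"
    else
        echo "GREEN"
    fi
}

# Example on a live system:
#   depth=$(ipcs -q | queue_depth nagios)
#   queue_status "$depth" 100
```

Summing across all queues owned by nagios, rather than watching a single key, avoids depending on one specific queue key.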

Feel free to close this.
Last edited by rseiwert on Thu Apr 16, 2015 11:29 am, edited 1 time in total.
Grumpy Olde IT Guy
rseiwert
Posts: 196
Joined: Wed Jun 22, 2011 10:33 pm
Location: Somewhere between Here and Now

Re: Nagios IPCS stops processing

Post by rseiwert »

Actually could someone post this as a feature request.
Grumpy Olde IT Guy
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Nagios IPCS stops processing

Post by tmcdonald »

I can make that request for you.

Are you still seeing or able to reproduce the kernel queue filling up? I think I might have a way to peek into the message queue, but I would need to do it on a live system that is exhibiting the behavior and we have not been able to reproduce this in-house.
Former Nagios employee
rseiwert
Posts: 196
Joined: Wed Jun 22, 2011 10:33 pm
Location: Somewhere between Here and Now

Re: Nagios IPCS stops processing

Post by rseiwert »

Just did it again today. ndo2db is choked. Looking at ndodb.debug there is nothing there, but in ndodb.debug.old I did see some checks that seemed to return huge amounts of data. Of course, once in the DB they are truncated, but I wonder about them while they are in the processing queue. I'm looking at the application log check to see if I can limit / truncate the results. I'm also going to look at ndo2db and see what I can find. Leaving the system crashed for now to troubleshoot, but I will need to restart soon.

Code:

[root@nagios libexec]# ipcs -q

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0x67000002 0          nagios     600        0            0
0xa5000002 65537      nagios     600        21795840     21285

[root@nagios libexec]# ps -ef | grep ndo2db | grep -v grep
nagios   50309 61487  0 Apr16 ?        00:00:44 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios   50310 50309  2 Apr16 ?        02:21:43 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios   61487     1  0 Apr08 ?        00:00:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
Of course, as I was writing this it managed to get past its choke point and continue on.
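On limiting the results from the application log check mentioned above, one option might be a small wrapper that caps the plugin output before Nagios ever sees it. This is a sketch only: `run_truncated` is a made-up name, the byte limit is whatever you choose, and it relies on `head -c`, which GNU and BSD `head` support but POSIX does not require.

```shell
#!/bin/sh
# Run a plugin command, print at most max_bytes of its combined output,
# and return the plugin's own exit code so the check state is preserved.
run_truncated() {
    max_bytes="$1"
    shift
    if output=$("$@" 2>&1); then
        status=0
    else
        status=$?
    fi
    printf '%s' "$output" | head -c "$max_bytes"
    echo
    return "$status"
}

# Example with a hypothetical plugin path and arguments:
#   run_truncated 4096 /usr/local/nagios/libexec/check_something --args
```

A byte cut like this can chop off performance data after the `|` or split a multi-byte character, so it is a blunt tool until the check itself can be told to return less.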
Grumpy Olde IT Guy