Re: NDO2DB Issue out of the blue
Posted: Thu Aug 20, 2015 4:28 pm
by Box293
rseiwert wrote: I created a check with a large text section to replicate it or, as someone found out, just running check_wmi_plus in debug mode will replicate the issue. Debug mode on NDOUtils is also useless in troubleshooting this.
Can I have a copy of that check? I'm keen to get to the bottom of this, or at least work out how to improve NDO debugging.
Re: NDO2DB Issue out of the blue
Posted: Thu Aug 20, 2015 4:32 pm
by BanditBBS
There goes Troy, hijacking my thread, tsk tsk, lol
Seriously though... I'm thinking my Nagios is stable now with the auto rescheduling off, but my NDO2DB is still not stable; something is making it puke every so often. I believe everything rseiwert said about the debugging not being useful for this, so how am I going to figure out which check is causing my issue (if that is what's doing it)?
Re: NDO2DB Issue out of the blue
Posted: Thu Aug 20, 2015 5:10 pm
by rseiwert
If you have any check_wmi_plus checks, just add "-d -d -d" (without the quotes) to the end. That's triple debug mode.
Re: NDO2DB Issue out of the blue
Posted: Thu Aug 20, 2015 7:25 pm
by BanditBBS
Broke again, so I restarted NDO2DB once more... so now I have this:
Code:
[root@iss-chi-nag09 ~]# ipcs -q
------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0xee070002 1409024    nagios     600        100672512    98313
0x8b070002 1474561    nagios     600        131072000    128000
0xff070002 1507330    nagios     600        0            0
We can't function like this much longer; every 3-5 hours NDO2DB stops processing and we have to restart it. We're of course losing those queued messages too.
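To see at a glance which queues are backing up, the ipcs -q listing above can be filtered by owner and message count. This is a minimal sketch, assuming the column layout shown in that listing (key, msqid, owner, perms, used-bytes, messages). The function name backed_up_queues and the 50000 threshold are made up for illustration.

```shell
#!/bin/sh
# Print "msqid used-bytes messages" for every queue owned by a given user
# that holds more than a given number of messages.
# Reads `ipcs -q` output on stdin so it can also be run against saved output.
backed_up_queues() {
    owner="$1"      # queue owner to match, e.g. nagios
    threshold="$2"  # report queues holding more messages than this (arbitrary)
    awk -v owner="$owner" -v max="$threshold" '
        # Data rows start with a hex key; banner and header lines do not.
        $1 ~ /^0x[0-9a-fA-F]+$/ && $3 == owner && ($6 + 0) > (max + 0) {
            print $2, $5, $6
        }'
}

# Typical use (hypothetical threshold):
#   ipcs -q | backed_up_queues nagios 50000
```

Against the listing above, a threshold of 50000 reports queues 1409024 and 1474561 and skips the empty one.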
Re: NDO2DB Issue out of the blue
Posted: Fri Aug 21, 2015 8:05 am
by BanditBBS
My Indian guy had to restart ndo 3 times overnight, and I woke up to 4 queues. I don't want to watch ndo all weekend. Are there any options for me at this point?
Re: NDO2DB Issue out of the blue
Posted: Fri Aug 21, 2015 10:10 am
by tmcdonald
rseiwert wrote: If you have any check_wmi_plus checks, just add "-d -d -d" (without the quotes) to the end. That's triple debug mode.
I would like to reproduce this, but I am curious as to the significance of the "-d -d -d" flags. Is that just for generating long output, or does it generate special characters that might cause NDO to hang? If it's just for long output I've already tested up to 100,000 characters of output with no hangup.
BanditBBS wrote: My Indian guy had to restart ndo 3 times overnight, and I woke up to 4 queues. I don't want to watch ndo all weekend. Are there any options for me at this point?
For the sake of your sanity over the weekend, would the event handler "restart services when the queue fills" band-aid work for you? This NDO bug has been nearly impossible to replicate on our end, so fixing it has been difficult on the best of days.
Re: NDO2DB Issue out of the blue
Posted: Fri Aug 21, 2015 10:14 am
by BanditBBS
I wish, Trevor. I had put in a cron job to restart ndo every 3 hours last night, but that wasn't even enough and he had to do those 3 restarts manually. Plus, we can't monitor that stuff and kick off an event handler: Nagios XI doesn't function when it happens, so it would neither see that the queue is full nor kick off the event handler.
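Since Nagios XI cannot watch itself while this is happening, one stop-gap is a check that runs from cron, outside Nagios, and restarts NDO2DB only when a queue is actually backing up. This is a sketch only: restart_if_backed_up is a hypothetical helper, the 100000 threshold is arbitrary, and the restart command is passed in by the caller because the right one depends on the install.

```shell
#!/bin/sh
# Run the given command if any message queue holds more messages than
# the threshold. Reads `ipcs -q` output on stdin.
restart_if_backed_up() {
    threshold="$1"
    shift   # everything after the threshold is the command to run
    # Largest message count (6th column) across all data rows.
    depth=$(awk '$1 ~ /^0x/ && ($6 + 0) > max { max = $6 + 0 }
                 END { print max + 0 }')
    if [ "$depth" -gt "$threshold" ]; then
        "$@"
    fi
}

# Possible cron usage; the restart command here is an assumption:
#   ipcs -q | restart_if_backed_up 100000 service ndo2db restart
```

This does not fix the underlying hang and any messages already queued may still be lost; it only shortens the time NDO2DB stays stuck.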
Re: NDO2DB Issue out of the blue
Posted: Fri Aug 21, 2015 11:13 am
by jfrickson
Could you enable debugging? Save a copy of the current executable, then in ndo2db.c starting at line 82:
Code:
#define DEBUG_NDO2DB 1 /* don't daemonize */
/*#define DEBUG_NDO2DB_EXIT_AFTER_CONNECTION 1*/ /* exit after first client disconnects */
#define DEBUG_NDO2DB2 1
#define NDO2DB_DEBUG_MBUF 1
Leave DEBUG_NDO2DB_EXIT_AFTER_CONNECTION commented out.
Start it from the command line and redirect stdout to a file. When it dies, restart the original executable and send us the debug output.
I can't find anything blatantly obvious in the code, but hopefully the debug output will point me in the right direction.
Re: NDO2DB Issue out of the blue
Posted: Fri Aug 21, 2015 11:18 am
by BanditBBS
jfrickson wrote:Could you enable debugging? Save a copy of the current executable, then in ndo2db.c starting at line 82:
Code:
#define DEBUG_NDO2DB 1 /* don't daemonize */
/*#define DEBUG_NDO2DB_EXIT_AFTER_CONNECTION 1*/ /* exit after first client disconnects */
#define DEBUG_NDO2DB2 1
#define NDO2DB_DEBUG_MBUF 1
Leave DEBUG_NDO2DB_EXIT_AFTER_CONNECTION commented out.
Start it from the command line and redirect stdout to a file. When it dies, restart the original executable and send us the debug output.
I can't find anything blatantly obvious in the code, but hopefully the debug output will point me in the right direction.
Umm, I understand the gist of what you are asking... but let's pretend I am an idiot when it comes to C (not hard, as I am). I'll gladly do what you are asking, but I need better instructions.

Re: NDO2DB Issue out of the blue
Posted: Fri Aug 21, 2015 11:42 am
by jfrickson

Ok.
Go to /usr/local/nagios/bin and enter: cp ndo2db ndo2db.bak
Go to the directory where ndoutils is, then to the src sub-directory, and enter: cp ndo2db.c ndo2db.c.bak
Edit ndo2db.c in your editor of choice. Change lines 82-91 from this:
Code:
/*#define DEBUG_NDO2DB 1*/ /* don't daemonize */
/*#define DEBUG_NDO2DB_EXIT_AFTER_CONNECTION 1*/ /* exit after first client disconnects */
/*#define DEBUG_NDO2DB2 1*/
/*#define NDO2DB_DEBUG_MBUF 1*/
/*
#ifdef NDO2DB_DEBUG_MBUF
unsigned long mbuf_bytes_allocated = 0;
unsigned long mbuf_data_allocated = 0;
#endif
*/
to this:
Code:
#define DEBUG_NDO2DB 1 /* don't daemonize */
/*#define DEBUG_NDO2DB_EXIT_AFTER_CONNECTION 1*/ /* exit after first client disconnects */
#define DEBUG_NDO2DB2 1
#define NDO2DB_DEBUG_MBUF 1
#ifdef NDO2DB_DEBUG_MBUF
unsigned long mbuf_bytes_allocated = 0;
unsigned long mbuf_data_allocated = 0;
#endif
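(i.e. remove most of the /* and */ comment markers).
Save the file, then rebuild ndo2db and install the new binary over /usr/local/nagios/bin/ndo2db; the edit has no effect until the source is recompiled.
Shut down the current ndo2db process, then: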
Code:
cd /usr/local/nagios/bin
./ndo2db -c /usr/local/nagios/etc/ndo2db.cfg > ../var/ndo2db.log
Wait until it dies. Then run mv ndo2db.bak ndo2db and restart ndo2db. If the ndo2db.log file is huge (it probably will be), you can just send the last hundred lines or so.
Let me know if you have any problems.
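If hand-editing the source feels risky, the change to lines 82-91 described above could also be scripted. This is a rough sketch, assuming those lines match the "from this" listing exactly (no extra whitespace) and that the line numbers apply to your copy of ndo2db.c. The helper name enable_debug_defines is made up for illustration.

```shell
#!/bin/sh
# Uncomment the three debug defines and unwrap the #ifdef block inside a
# given line range. Reads the source on stdin and writes the result to stdout.
enable_debug_defines() {
    awk -v first="$1" -v last="$2" '
        NR >= first && NR <= last {
            # Drop the bare comment markers that wrap the #ifdef block.
            if ($0 == "/*" || $0 == "*/") next
            # Strip the comment wrapper from the three debug defines only;
            # DEBUG_NDO2DB_EXIT_AFTER_CONNECTION stays commented out.
            if ($0 ~ /^\/\*#define (DEBUG_NDO2DB|DEBUG_NDO2DB2|NDO2DB_DEBUG_MBUF) 1\*\//) {
                sub(/^\/\*/, "")
                sub(/1\*\//, "1")
            }
        }
        { print }'
}

# Possible use, reading from the backup made earlier:
#   enable_debug_defines 82 91 < ndo2db.c.bak > ndo2db.c
```

Compare the result with the "to this" listing before rebuilding, since a different ndoutils version may not have these defines on the same lines.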