
Re: NDO2DB Issue out of the blue

Posted: Thu Aug 20, 2015 4:28 pm
by Box293
rseiwert wrote: I created a check with a large text section to replicate it or, as someone found out, just running checkwmiplus in debug mode will replicate the issue. Debug mode on NDOUtils is also useless in troubleshooting this.
Can I have a copy of that check? I'm keen to get to the bottom of this, or at least work out how to improve NDO debugging.

Re: NDO2DB Issue out of the blue

Posted: Thu Aug 20, 2015 4:32 pm
by BanditBBS
There goes Troy, hijacking my thread, tsk tsk, lol :)

Seriously though... I'm thinking my Nagios is stable now with the auto rescheduling off, but my NDO2DB is still not stable. Something is making it puke every so often. I believe everything rseiwert said about the debugging not being useful for this, so how am I going to figure out which check is causing my issue (if that is what's doing it)?

Re: NDO2DB Issue out of the blue

Posted: Thu Aug 20, 2015 5:10 pm
by rseiwert
If you have any checkwmiplus checks, just add "-d -d -d" (without the quotes) to the end of the command. That's triple debug mode.

Re: NDO2DB Issue out of the blue

Posted: Thu Aug 20, 2015 7:25 pm
by BanditBBS
It broke again, so I restarted NDO2DB once more. Now I have this:

Code: Select all

[root@iss-chi-nag09 ~]# ipcs -q

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0xee070002 1409024    nagios     600        100672512    98313
0x8b070002 1474561    nagios     600        131072000    128000
0xff070002 1507330    nagios     600        0            0
We can't function like this much longer. Every 3-5 hours NDO2DB stops processing and we have to restart it. We're of course losing those queued messages too.
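To spot this state without reading the table by eye, here is a minimal sketch that filters ipcs -q output and lists the queues holding at least a given number of messages. The function name busy_queues and the threshold of 1000 are illustrative assumptions, not part of NDOUtils, and the sketch assumes the six-column layout shown above.

```shell
#!/bin/sh
# Sketch only: busy_queues is a hypothetical helper, not an NDOUtils tool.
# Reads `ipcs -q` output on stdin and prints "msqid messages" for every
# queue holding at least $1 messages (default 1000).
# Assumed column layout: key msqid owner perms used-bytes messages
busy_queues() {
    threshold="${1:-1000}"
    awk -v min="$threshold" '
        # Data rows start with a hex key such as 0xee070002;
        # the banner and the header row are skipped.
        $1 ~ /^0x[0-9a-fA-F]+$/ && NF >= 6 && $6 + 0 >= min + 0 {
            print $2, $6
        }
    '
}

# Example (on the Nagios host): ipcs -q | busy_queues 1000
```

Since this does not depend on Nagios itself being responsive, it could be run from cron or by hand on the host while the interface is down.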

Re: NDO2DB Issue out of the blue

Posted: Fri Aug 21, 2015 8:05 am
by BanditBBS
My Indian guy had to restart NDO 3 times overnight, and I woke up to 4 queues. I don't want to watch NDO all weekend. Are there any options for me at this point?

Re: NDO2DB Issue out of the blue

Posted: Fri Aug 21, 2015 10:10 am
by tmcdonald
rseiwert wrote: If you have any checkwmiplus checks, just add "-d -d -d" (without the quotes) to the end. That's triple debug mode.
I would like to reproduce this, but I am curious as to the significance of the "-d -d -d" flags. Is that just for generating long output, or does it generate special characters that might cause NDO to hang? If it's just for long output I've already tested up to 100,000 characters of output with no hangup.
BanditBBS wrote: My Indian guy had to restart NDO 3 times overnight, and I woke up to 4 queues. I don't want to watch NDO all weekend. Are there any options for me at this point?
For the sake of your sanity over the weekend, would the event handler "restart services when the queue fills" band-aid work for you? This NDO bug has been nearly impossible to replicate on our end, so fixing it has been difficult on the best of days.

Re: NDO2DB Issue out of the blue

Posted: Fri Aug 21, 2015 10:14 am
by BanditBBS
I wish, Trevor. I had put in a cron job to restart NDO every 3 hours last night, but that wasn't even enough and he had to do those 3 restarts manually. Plus, we can't monitor that stuff and kick off an event handler: Nagios XI doesn't function when it happens, so it would neither see that the queue has filled nor kick off the event handler.

Re: NDO2DB Issue out of the blue

Posted: Fri Aug 21, 2015 11:13 am
by jfrickson
Could you enable debugging? Save a copy of the current executable, then in ndo2db.c starting at line 82:

Code: Select all

#define DEBUG_NDO2DB 1 /* don't daemonize */
/*#define DEBUG_NDO2DB_EXIT_AFTER_CONNECTION 1*/ /* exit after first client disconnects */
#define DEBUG_NDO2DB2 1
#define NDO2DB_DEBUG_MBUF 1
Leave DEBUG_NDO2DB_EXIT_AFTER_CONNECTION commented out.

Start it from the command line and redirect stdout to a file. When it dies, restart the original executable and send us the debug output.

I can't find anything blatantly obvious in the code, but hopefully the debug output will point me in the right direction.

Re: NDO2DB Issue out of the blue

Posted: Fri Aug 21, 2015 11:18 am
by BanditBBS
Umm, I understand the gist of what you are asking... but let's pretend I am an idiot when it comes to C (not hard, as I am). I'll gladly do what you are asking, but I need better instructions :oops:

Re: NDO2DB Issue out of the blue

Posted: Fri Aug 21, 2015 11:42 am
by jfrickson
:lol: Ok.

Go to /usr/local/nagios/bin and enter cp ndo2db ndo2db.bak.

Go to the directory where the ndoutils source is, then to the src sub-directory. Enter cp ndo2db.c ndo2db.c.bak. Edit ndo2db.c in your editor of choice. Change lines 82-91 from this:

Code: Select all

/*#define DEBUG_NDO2DB 1*/ /* don't daemonize */
/*#define DEBUG_NDO2DB_EXIT_AFTER_CONNECTION 1*/ /* exit after first client disconnects */
/*#define DEBUG_NDO2DB2 1*/
/*#define NDO2DB_DEBUG_MBUF 1*/
/*
#ifdef NDO2DB_DEBUG_MBUF
unsigned long mbuf_bytes_allocated = 0;
unsigned long mbuf_data_allocated = 0;
#endif
*/
to this:

Code: Select all

#define DEBUG_NDO2DB 1 /* don't daemonize */
/*#define DEBUG_NDO2DB_EXIT_AFTER_CONNECTION 1*/ /* exit after first client disconnects */
#define DEBUG_NDO2DB2 1
#define NDO2DB_DEBUG_MBUF 1

#ifdef NDO2DB_DEBUG_MBUF
unsigned long mbuf_bytes_allocated = 0;
unsigned long mbuf_data_allocated = 0;
#endif

(i.e. remove most of the /* and */ comment markers, but leave DEBUG_NDO2DB_EXIT_AFTER_CONNECTION commented out)

Save the file, then do:

Code: Select all

cd ..
make all
make install
Shut down the current ndo2db process, then run:

Code: Select all

cd /usr/local/nagios/bin
./ndo2db -c /usr/local/nagios/etc/ndo2db.cfg > ../var/ndo2db.log
Wait until it dies. Then mv ndo2db.bak ndo2db and restart ndo2db. If the ndo2db.log file is huge (it probably will be) you can just send the last hundred lines or so.
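If trimming the log by hand is a nuisance, something along these lines should do it. This is only a sketch: log_tail is a made-up helper name, and the output path in the example is a placeholder you can change to anything.

```shell
#!/bin/sh
# Sketch only: log_tail is a hypothetical helper, not part of NDOUtils.
# Copies the last N lines (default 100) of a log file into a smaller file
# that is easier to attach to a post.
log_tail() {
    src="$1"            # full debug log, e.g. the redirected ndo2db output
    dest="$2"           # where to write the trimmed copy
    lines="${3:-100}"   # how many trailing lines to keep
    tail -n "$lines" "$src" > "$dest"
}

# Example (the destination path is a placeholder):
# log_tail /usr/local/nagios/var/ndo2db.log /tmp/ndo2db-tail.log
```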

Let me know if you have any problems.