NDO2DB Issue out of the blue

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: NDO2DB Issue out of the blue

Post by Box293 »

rseiwert wrote: I created a check with a large text section to replicate this; or, as someone found out, just running checkwmiplus in debug mode will replicate the issue. Debug mode on NDOUtils is also useless in troubleshooting this.
Can I have a copy of that check? I'm keen to get to the bottom of this, or at least work out how to improve NDO debugging.
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: NDO2DB Issue out of the blue

Post by BanditBBS »

There goes Troy, hijacking my thread, tsk tsk, lol :)

Seriously though... I think my Nagios is stable now with auto rescheduling off, but NDO2DB is still not stable; something is making it puke every so often. I believe everything rseiwert said about the debugging not being useful for this, so how am I going to figure out which check is causing my issue (if that is what's doing it)?
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
rseiwert
Posts: 196
Joined: Wed Jun 22, 2011 10:33 pm
Location: Somewhere between Here and Now

Re: NDO2DB Issue out of the blue

Post by rseiwert »

If you have any checkwmiplus checks just add "-d -d -d" without the quotes to the end. That's triple debug mode
Grumpy Olde IT Guy
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: NDO2DB Issue out of the blue

Post by BanditBBS »

Broke again, so restarted NDO2DB once more.....so now I have this:

Code: Select all

[root@iss-chi-nag09 ~]# ipcs -q

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0xee070002 1409024    nagios     600        100672512    98313
0x8b070002 1474561    nagios     600        131072000    128000
0xff070002 1507330    nagios     600        0            0
We can't function like this much longer: every 3-5 hours NDO2DB stops processing and we have to restart it. We're of course losing those queued messages too.
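To see at a glance which queues are backing up, the ipcs output above can be filtered down to the non-empty queues. This is only a minimal sketch: the function name `nonempty_queues` is made up for illustration, and it assumes the six-column `ipcs -q` layout shown above.

```shell
#!/bin/sh
# nonempty_queues (hypothetical helper): read `ipcs -q` output on stdin and
# print "msqid used-bytes messages" for every queue holding at least one message.
# Assumes the column order: key msqid owner perms used-bytes messages.
nonempty_queues() {
    awk '$1 ~ /^0x/ && NF >= 6 && $6 > 0 { print $2, $5, $6 }'
}

# Usage: ipcs -q | nonempty_queues
```

Run against the listing above, this would leave only the two queues that still hold messages and drop the empty one.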
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: NDO2DB Issue out of the blue

Post by BanditBBS »

My Indian guy had to restart ndo 3 times overnight, and I woke up to 4 queues. I don't want to watch ndo all weekend; are there any options for me at this point?
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: NDO2DB Issue out of the blue

Post by tmcdonald »

rseiwert wrote:If you have any checkwmiplus checks just add "-d -d -d" without the quotes to the end. That's triple debug mode
I would like to reproduce this, but I am curious as to the significance of the "-d -d -d" flags. Is that just for generating long output, or does it generate special characters that might cause NDO to hang? If it's just for long output I've already tested up to 100,000 characters of output with no hangup.
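A long-output test like the one described above could be driven by a throwaway plugin along these lines. It is only a sketch: `long_output` is a hypothetical helper, not part of Nagios or checkwmiplus, and it emits plain ASCII only, so it exercises output length rather than special characters.

```shell
#!/bin/sh
# long_output N (hypothetical helper): print "OK - " followed by N 'x'
# characters and a newline, returning 0 like a passing plugin would.
long_output() {
    n=${1:-100000}
    awk -v n="$n" 'BEGIN {
        printf "OK - "
        for (i = 0; i < n; i++) printf "x"
        printf "\n"
    }'
}

# Usage: long_output 100000
```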
BanditBBS wrote:My Indian guy had to restart ndo 3 times over night, woke up to 4 queues. I don't want to watch ndo all weekend, are there any options for me at this point?
For the sake of your sanity over the weekend, would the event handler "restart services when the queue fills" band-aid work for you? This NDO bug has been nearly impossible to replicate on our end, so fixing it has been difficult on the best of days.
Former Nagios employee
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: NDO2DB Issue out of the blue

Post by BanditBBS »

I wish, Trevor. I put in a cron job to restart ndo every 3 hours last night, but even that wasn't enough and he had to do those 3 restarts manually. Plus, we can't monitor that stuff and kick off an event handler: Nagios XI doesn't function when it happens, so it would neither see that the queue has filled nor kick off the event handler.
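Because Nagios XI cannot watch itself while ndo2db is hung, one stopgap would be a watchdog run from cron, outside Nagios, that restarts ndo2db only when a queue backs up instead of on a fixed schedule. The following is a rough sketch, not a tested fix: the function name, the 50000-message threshold, and the `service ndo2db restart` command are all assumptions to adjust for the local install.

```shell
#!/bin/sh
# queue_backed_up LIMIT (hypothetical helper): read `ipcs -q` output on stdin;
# succeed (status 0) if any queue holds more than LIMIT messages, else status 1.
queue_backed_up() {
    awk -v limit="$1" '
        $1 ~ /^0x/ && NF >= 6 && $6 + 0 > limit + 0 { found = 1 }
        END { exit(found ? 0 : 1) }'
}

# Example cron body (threshold and restart command are assumptions):
# if ipcs -q | queue_backed_up 50000; then
#     service ndo2db restart
# fi
```

A restart still discards whatever is sitting in the queue, so this only limits how long processing stays stalled; it does not address the underlying hang.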
jfrickson

Re: NDO2DB Issue out of the blue

Post by jfrickson »

Could you enable debugging? Save a copy of the current executable, then in ndo2db.c, starting at line 82, change the defines to:

Code: Select all

#define DEBUG_NDO2DB 1 /* don't daemonize */
/*#define DEBUG_NDO2DB_EXIT_AFTER_CONNECTION 1*/ /* exit after first client disconnects */
#define DEBUG_NDO2DB2 1
#define NDO2DB_DEBUG_MBUF 1
Leave DEBUG_NDO2DB_EXIT_AFTER_CONNECTION commented out.

Start it from the command line and redirect stdout to a file. When it dies, restart the original executable and send us the debug output.

I can't find anything blatantly obvious in the code, but hopefully the debug output will point me in the right direction.
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: NDO2DB Issue out of the blue

Post by BanditBBS »

jfrickson wrote:Could you enable debugging? Save a copy of the current executable, then in ndo2db.c starting at line 82:

Code: Select all

#define DEBUG_NDO2DB 1 /* don't daemonize */
/*#define DEBUG_NDO2DB_EXIT_AFTER_CONNECTION 1*/ /* exit after first client disconnects */
#define DEBUG_NDO2DB2 1
#define NDO2DB_DEBUG_MBUF 1
Leave DEBUG_NDO2DB_EXIT_AFTER_CONNECTION commented out.

Start it from the command line and redirect stdout to a file. When it dies, restart the original executable and send us the debug output.

I can't find anything blatantly obvious in the code, but hopefully the debug output will point me in the right direction.
Umm, I understand the gist of what you are asking... but let's pretend I am an idiot when it comes to C (not hard, as I am). I'll gladly do what you are asking, but I need better instructions :oops:
jfrickson

Re: NDO2DB Issue out of the blue

Post by jfrickson »

:lol: Ok.

Go to /usr/local/nagios/bin and enter cp ndo2db ndo2db.bak.

Go to the directory where the ndoutils source is, then to the src sub-directory. Enter cp ndo2db.c ndo2db.c.bak. Edit ndo2db.c in your editor of choice. Change lines 82-91 from this:

Code: Select all

/*#define DEBUG_NDO2DB 1*/ /* don't daemonize */
/*#define DEBUG_NDO2DB_EXIT_AFTER_CONNECTION 1*/ /* exit after first client disconnects */
/*#define DEBUG_NDO2DB2 1*/
/*#define NDO2DB_DEBUG_MBUF 1*/
/*
#ifdef NDO2DB_DEBUG_MBUF
unsigned long mbuf_bytes_allocated = 0;
unsigned long mbuf_data_allocated = 0;
#endif
*/
to this:

Code: Select all

#define DEBUG_NDO2DB 1 /* don't daemonize */
/*#define DEBUG_NDO2DB_EXIT_AFTER_CONNECTION 1*/ /* exit after first client disconnects */
#define DEBUG_NDO2DB2 1
#define NDO2DB_DEBUG_MBUF 1

#ifdef NDO2DB_DEBUG_MBUF
unsigned long mbuf_bytes_allocated = 0;
unsigned long mbuf_data_allocated = 0;
#endif

(i.e., remove most of the /* and */ markers, but leave DEBUG_NDO2DB_EXIT_AFTER_CONNECTION commented out.)
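Before rebuilding, it may be worth confirming that the edit came out as intended. The check below is a sketch under the assumption that the defines start in column one, as in the listing above; `debug_defines_ok` is a made-up name, not something shipped with ndoutils.

```shell
#!/bin/sh
# debug_defines_ok FILE (hypothetical helper): succeed only if the three debug
# macros are uncommented and DEBUG_NDO2DB_EXIT_AFTER_CONNECTION is still
# commented out. Assumes each #define starts at the beginning of its line.
debug_defines_ok() {
    f=$1
    grep -q '^#define DEBUG_NDO2DB 1' "$f" &&
    grep -q '^#define DEBUG_NDO2DB2 1' "$f" &&
    grep -q '^#define NDO2DB_DEBUG_MBUF 1' "$f" &&
    ! grep -q '^#define DEBUG_NDO2DB_EXIT_AFTER_CONNECTION' "$f"
}

# Usage (from the src directory):
# debug_defines_ok ndo2db.c && echo "defines look right"
```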

Save the file, then do:

Code: Select all

cd ..
make all
make install
Shut down the current ndo2db process, then run:

Code: Select all

cd /usr/local/nagios/bin
./ndo2db -c /usr/local/nagios/etc/ndo2db.cfg > ../var/ndo2db.log
Wait until it dies. Then mv ndo2db.bak ndo2db and restart ndo2db. If the ndo2db.log file is huge (it probably will be) you can just send the last hundred lines or so.
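For trimming the log down to the last hundred lines or so, something like the following would do. It is a small sketch; `log_tail` is a made-up wrapper around the standard `tail -n`, and the path in the usage comment is where the redirect above writes when run from /usr/local/nagios/bin.

```shell
#!/bin/sh
# log_tail FILE [N] (hypothetical helper): print the last N lines of FILE,
# defaulting to 100.
log_tail() {
    tail -n "${2:-100}" "$1"
}

# Usage: log_tail /usr/local/nagios/var/ndo2db.log > ndo2db-tail.log
```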

Let me know if you have any problems.