rseiwert wrote: I created a check with a large text section to replicate the issue; as someone found out, just running checkwmiplus in debug mode will also replicate it. Debug mode on NDOUtils is also useless in troubleshooting this.
Can I have a copy of that check? I'm keen to get to the bottom of this, or at least work out how to improve NDO debugging.
NDO2DB Issue out of the blue
- Box293
- Too Basu
- Posts: 5126
- Joined: Sun Feb 07, 2010 10:55 pm
- Location: Deniliquin, Australia
Re: NDO2DB Issue out of the blue
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: NDO2DB Issue out of the blue
There goes Troy, hijacking my thread, tsk tsk, lol 
Seriously though... I'm thinking my Nagios is stable now with the auto rescheduling off, but my NDO2DB is still not stable. Something is making it puke every so often. I believe everything rseiwert said about the debugging not being useful for this, so how am I going to figure out which check is causing my issue (if that is what's doing it)?
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
Re: NDO2DB Issue out of the blue
If you have any checkwmiplus checks, just add "-d -d -d" (without the quotes) to the end. That's triple debug mode.
Grumpy Olde IT Guy
Re: NDO2DB Issue out of the blue
Broke again, so I restarted NDO2DB once more... so now I have this:
Code: Select all
[root@iss-chi-nag09 ~]# ipcs -q
------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0xee070002 1409024    nagios     600        100672512    98313
0x8b070002 1474561    nagios     600        131072000    128000
0xff070002 1507330    nagios     600        0            0
We can't function like this much longer. Every 3-5 hours NDO2DB stops processing and we have to restart it. We're of course losing those queued messages too.
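For anyone watching these queues by hand, a minimal sketch of a helper that boils the `ipcs -q` output down to one line per queue might help. The function name `queue_summary` is made up for illustration, and it assumes the Linux column layout shown above (key, msqid, owner, perms, used-bytes, messages).

```shell
#!/bin/sh
# Hypothetical helper: summarize `ipcs -q` output read from stdin.
# Prints one line per queue owned by the given user (default: nagios).
# Assumes the column order: key msqid owner perms used-bytes messages.
queue_summary() {
    owner="${1:-nagios}"
    awk -v owner="$owner" '
        # Data rows start with a hex key; header and separator rows are skipped.
        $1 ~ /^0x/ && $3 == owner {
            printf "msqid=%s messages=%s used_bytes=%s\n", $2, $6, $5
        }'
}
```

Usage would be something like `ipcs -q | queue_summary nagios`, run every so often to see which queue is backing up.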
Re: NDO2DB Issue out of the blue
My Indian guy had to restart ndo 3 times overnight, and I woke up to 4 queues. I don't want to watch ndo all weekend; are there any options for me at this point?
Re: NDO2DB Issue out of the blue
rseiwert wrote: If you have any checkwmiplus checks just add "-d -d -d" without the quotes to the end. That's triple debug mode
I would like to reproduce this, but I am curious as to the significance of the "-d -d -d" flags. Is that just for generating long output, or does it generate special characters that might cause NDO to hang? If it's just for long output, I've already tested up to 100,000 characters of output with no hangup.
BanditBBS wrote: My Indian guy had to restart ndo 3 times over night, woke up to 4 queues. I don't want to watch ndo all weekend, are there any options for me at this point?
For the sake of your sanity over the weekend, would the event handler "restart services when the queue fills" band-aid work for you? This NDO bug has been nearly impossible to replicate on our end, so fixing it has been difficult on the best of days.
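On the long-output test mentioned above: a dummy check along these lines could be used to push a chosen number of characters through NDO. This is only a sketch; the function name `long_output_check` and the filler character are made up, and it does not reproduce whatever special characters checkwmiplus debug mode may emit.

```shell
#!/bin/sh
# Hypothetical test plugin body: print an OK status line followed by N filler
# characters (default 100000) and return 0, which Nagios treats as OK.
long_output_check() {
    n="${1:-100000}"
    printf 'OK - long output: '
    # Turn N NUL bytes into N printable filler characters.
    head -c "$n" /dev/zero | tr '\0' 'x'
    printf '\n'
    return 0
}
```

Saved as a script that calls `long_output_check "$1"`, it could be defined as a command and scheduled like any other check.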
Former Nagios employee
Re: NDO2DB Issue out of the blue
I wish, Trevor. I put in a cronjob last night to restart ndo every 3 hours, but even that wasn't enough and he had to do those 3 restarts manually. Plus, we can't monitor that stuff and kick off an event handler: Nagios XI doesn't function when this happens, so it would neither see that the queue has filled nor kick off the event handler.
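Since XI itself can't be relied on to notice the full queue, one option is a watchdog that runs from cron, outside Nagios, and restarts ndo2db based on queue depth rather than on a fixed timer. The following is a rough sketch only: the function names, the 50000-message threshold, and the `service ndo2db restart` command are all assumptions to adjust for your system.

```shell
#!/bin/sh
# Return success (0) if any queue owned by user $1 holds more than $2 messages.
# Reads `ipcs -q` style output on stdin (messages are in column 6).
queue_over_threshold() {
    awk -v owner="$1" -v max="$2" '
        $1 ~ /^0x/ && $3 == owner && $6 + 0 > max + 0 { over = 1 }
        END { exit(over ? 0 : 1) }'
}

# Hypothetical watchdog body, meant to be called from cron every minute or so.
# MAX_MESSAGES and RESTART_CMD are assumptions, not values from this thread.
watchdog() {
    if ipcs -q | queue_over_threshold nagios "${MAX_MESSAGES:-50000}"; then
        ${RESTART_CMD:-service ndo2db restart}
    fi
}
```

This does not fix the underlying hang, and messages already sitting in a stuck queue would presumably still be lost on restart.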
-
jfrickson
Re: NDO2DB Issue out of the blue
Could you enable debugging? Save a copy of the current executable, then in ndo2db.c starting at line 82:
Code: Select all
#define DEBUG_NDO2DB 1 /* don't daemonize */
/*#define DEBUG_NDO2DB_EXIT_AFTER_CONNECTION 1*/ /* exit after first client disconnects */
#define DEBUG_NDO2DB2 1
#define NDO2DB_DEBUG_MBUF 1
Leave DEBUG_NDO2DB_EXIT_AFTER_CONNECTION commented out.
Start it from the command line and redirect stdout to a file. When it dies, restart the original executable and send us the debug output.
I can't find anything blatantly obvious in the code, but hopefully the debug output will point me in the right direction.
Re: NDO2DB Issue out of the blue
jfrickson wrote: Could you enable debugging? Save a copy of the current executable, then in ndo2db.c starting at line 82:
Code: Select all
#define DEBUG_NDO2DB 1 /* don't daemonize */
/*#define DEBUG_NDO2DB_EXIT_AFTER_CONNECTION 1*/ /* exit after first client disconnects */
#define DEBUG_NDO2DB2 1
#define NDO2DB_DEBUG_MBUF 1
Leave DEBUG_NDO2DB_EXIT_AFTER_CONNECTION commented out.
Start it from the command line and redirect stdout to a file. When it dies, restart the original executable and send us the debug output.
I can't find anything blatantly obvious in the code, but hopefully the debug output will point me in the right direction.
Umm, I understand the gist of what you are asking... but let's pretend I am an idiot when it comes to C (not hard, as I am). I'll gladly do what you are asking, but I need better instructions.
-
jfrickson
Re: NDO2DB Issue out of the blue
Go to /usr/local/nagios/bin and enter cp ndo2db ndo2db.bak.
Go to the directory where ndoutils is, then to the src sub-directory. Enter cp ndo2db.c ndo2db.c.bak. Edit ndo2db.c in your editor of choice. Change lines 82-91 from this:
Code: Select all
/*#define DEBUG_NDO2DB 1*/ /* don't daemonize */
/*#define DEBUG_NDO2DB_EXIT_AFTER_CONNECTION 1*/ /* exit after first client disconnects */
/*#define DEBUG_NDO2DB2 1*/
/*#define NDO2DB_DEBUG_MBUF 1*/
/*
#ifdef NDO2DB_DEBUG_MBUF
unsigned long mbuf_bytes_allocated = 0;
unsigned long mbuf_data_allocated = 0;
#endif
*/
to this:
Code: Select all
#define DEBUG_NDO2DB 1 /* don't daemonize */
/*#define DEBUG_NDO2DB_EXIT_AFTER_CONNECTION 1*/ /* exit after first client disconnects */
#define DEBUG_NDO2DB2 1
#define NDO2DB_DEBUG_MBUF 1
#ifdef NDO2DB_DEBUG_MBUF
unsigned long mbuf_bytes_allocated = 0;
unsigned long mbuf_data_allocated = 0;
#endif
Save the file, then do:
Code: Select all
cd ..
make all
make install
Then start it from the command line, redirecting stdout to a file:
Code: Select all
cd /usr/local/nagios/bin
./ndo2db -c /usr/local/nagios/etc/ndo2db.cfg > ../var/ndo2db.log
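If it helps, here is a small wrapper sketch that also captures stderr and records the exit status when the process stops, which may be useful for telling a crash from a clean exit. The function name `run_logged` is made up; it simply runs whatever command it is given in the foreground.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command in the foreground, append its stdout and
# stderr to a log file, and record the exit status once it stops.
run_logged() {
    log="$1"; shift
    echo "=== started $(date): $*" >> "$log"
    "$@" >> "$log" 2>&1
    status=$?
    # A status above 128 usually means the process was killed by a signal.
    echo "=== exited with status $status at $(date)" >> "$log"
    return "$status"
}
```

For example, from /usr/local/nagios/bin: `run_logged ../var/ndo2db.log ./ndo2db -c /usr/local/nagios/etc/ndo2db.cfg`.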
Let me know if you have any problems.