NDO2DB Issue out of the blue

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
jfrickson

Re: NDO2DB Issue out of the blue

Post by jfrickson »

rseiwert wrote:Bandit, I did see this behavior in the unpatched version as well, but with Nagios XI also choking in the exact same way you were experiencing. When it was borderline, XI would work past its choke state and then I would notice the invalid results.
So the bogus check data is not new with my patch? That will make a big difference in where I look for that.

Does the bogus data ever work itself out, or is it fixed only by restarting ndo2db?
rseiwert
Posts: 196
Joined: Wed Jun 22, 2011 10:33 pm
Location: Somewhere between Here and Now

Re: NDO2DB Issue out of the blue

Post by rseiwert »

Again, educated guesses here. It is related to oversized results. The bogus data persists until the result size reduces to normal. Right now I'm using brute force to make it crash, but with some effort I'm sure we could figure out the breaking point. As for where to look, you might already have enhanced the ndo2db debug logging to show that, but in my mind there is no such thing as too much debug logging.

As soon as I change my test back to listing only the errors in the last hour the results return to normal.
Grumpy Olde IT Guy
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: NDO2DB Issue out of the blue

Post by tmcdonald »

BanditBBS wrote:Well, this was the first night all week that it hasn't crashed 2-3 times between 10pm and 8:10am. I have high hopes, but I'm not calling this completed/fixed until it goes the weekend with no issues as well... but looking good :)
Are we still looking good? I know @rseiwert and @jfrickson are discussing some possible related issues, but I wanted to check on yours.
Former Nagios employee
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: NDO2DB Issue out of the blue

Post by BanditBBS »

tmcdonald wrote:
BanditBBS wrote:Well, this was the first night all week that it hasn't crashed 2-3 times between 10pm and 8:10am. I have high hopes, but I'm not calling this completed/fixed until it goes the weekend with no issues as well... but looking good :)
Are we still looking good? I know @rseiwert and @jfrickson are discussing some possible related issues, but I wanted to check on yours.
Yeah Trevor, I think the issue can be marked closed and maybe this fixes the weird issue so many others sometimes have had with ndo2db. Feel free to keep this open so they can finish the discussion related to the other bug.
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
jfrickson

Re: NDO2DB Issue out of the blue

Post by jfrickson »

rseiwert wrote:Again, educated guesses here. It is related to oversized results. The bogus data persists until the result size reduces to normal.
The attached patch does away with all realloc()s and calloc()s by limiting output to ~64K, so if there's a memory corruption issue, this might take care of it. Try this when you get a chance and let us know if your bogus data issue is still there or not.
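For illustration, the fixed-buffer idea described here can be sketched roughly as below. This is not the attached patch (which isn't shown in the thread); the helper name copy_truncated and the constant MAX_OUTPUT_LEN are hypothetical, and the 64K cap simply mirrors the figure mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cap, chosen to mirror the "~64K" figure from the post. */
#define MAX_OUTPUT_LEN 65536

/*
 * Copy src into the fixed-size buffer dst, whose capacity is cap bytes
 * including the terminating NUL. Input longer than cap - 1 bytes is cut
 * off instead of growing the buffer with realloc().
 * Returns 1 if the input was truncated, 0 otherwise.
 */
static int copy_truncated(char *dst, size_t cap, const char *src)
{
    size_t len;
    int truncated = 0;

    assert(dst != NULL && src != NULL && cap > 0);

    len = strlen(src);
    if (len >= cap) {
        len = cap - 1;      /* leave room for the terminator */
        truncated = 1;
    }
    memcpy(dst, src, len);
    dst[len] = '\0';
    return truncated;
}
```

With a buffer declared as char buf[MAX_OUTPUT_LEN], a call like copy_truncated(buf, sizeof buf, plugin_output) never allocates, and the return value could be logged so that truncated results are at least visible in the debug file.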
rseiwert
Posts: 196
Joined: Wed Jun 22, 2011 10:33 pm
Location: Somewhere between Here and Now

Re: NDO2DB Issue out of the blue

Post by rseiwert »

Code:

--- ndo2db.c.orig	2015-08-31 12:20:39.433892447 -0500
+++ ndo2db.c	2015-08-31 13:02:08.089690711 -0500
Is this a patch to the patch or to the original source? The timestamps have me worried.
Grumpy Olde IT Guy
jfrickson

Re: NDO2DB Issue out of the blue

Post by jfrickson »

rseiwert wrote:Is this a patch to the patch or to the original source? The timestamps have me worried.
Patch to the original.
rseiwert
Posts: 196
Joined: Wed Jun 22, 2011 10:33 pm
Location: Somewhere between Here and Now

Re: NDO2DB Issue out of the blue

Post by rseiwert »

I can still get critical results to show green by overloading the results. This doesn't matter that much to me. In my case the problem was Check WMI Plus, which added a flag to limit the output in version 1.6. It is still my humble opinion that a runaway critical event with too much to say should not go green. I do agree that checks should not be returning half a meg worth of results in a perfect world, but if it were a perfect world, we would not need monitoring.
Check WMI Plus Version 1.6
• Added --forcetruncateoutput so you can restrict the maximum length of the plugin output. Does not affect debug mode. Default value set to 8192 bytes.
Grumpy Olde IT Guy
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: NDO2DB Issue out of the blue

Post by tmcdonald »

It's going to take some time, but I definitely agree that this is not expected or desirable behavior. It's a side-effect of the testing process, and I can't imagine we will call this bug squashed until the overflow issue is resolved. Like the heads of a hydra, sometimes when fixing one bug you create another. 'tis the nature of the beast.
Former Nagios employee
jfrickson

Re: NDO2DB Issue out of the blue

Post by jfrickson »

rseiwert wrote:I can still get critical results to show green by overloading the results. This doesn't matter that much to me. In my case the problem was Check WMI Plus, which added a flag to limit the output in version 1.6. It is still my humble opinion that a runaway critical event with too much to say should not go green. I do agree that checks should not be returning half a meg worth of results in a perfect world, but if it were a perfect world, we would not need monitoring.
When you get a chance, apply the attached patch to the original source. It adds some debugging info to the ndo2db.debug file. Then change the ndo2db.cfg file. Set debug_level=-1 and debug_verbosity=2. Maybe bump up the size of max_debug_file_size while you're in there.

When you get critical results to show green, turn off debugging (debug_level=0) and send me the output. Hopefully that will tell me where the problem lies.
Locked