
Re: nagios dies - sometimes

Posted: Fri Nov 18, 2016 12:50 pm
by dwhitfield
Did you end up upgrading to 4.2.2? If we end up finding a bug, I just want to make sure it is in the latest version. Thanks!

Re: nagios dies - sometimes

Posted: Mon Nov 21, 2016 3:59 am
by MalcolmPreen
I wanted to prove I could reproduce it at will before upgrading.

The plan, at present, is:

- reproduce it again tonight
- upgrade tomorrow
- attempt to reproduce it tomorrow
- report back

Hope that makes sense,

Malcolm

Re: nagios dies - sometimes

Posted: Mon Nov 21, 2016 10:19 am
by dwhitfield
Sounds good. Please be aware that Thursday is Thanksgiving in the US, and we will be closing early on Wednesday, and will not be open Thursday or Friday. Of course, the community can still contribute on the forums.

Re: nagios dies - sometimes

Posted: Fri Nov 25, 2016 5:35 am
by MalcolmPreen
Update:

No reproduction on Nagios 4.2.1 on Tuesday 22nd.
I needed to upgrade to 4.2.2 on Tuesday, or it would have been next month...
After the upgrade... no reproduction as yet.

I've increased the reproduction attempts from:
4 lots of 10 attempts (40 total over 4 minutes), to
8 lots of 10 attempts (80 total over 8 minutes), to
8 lots of 20 attempts (160 total over 8 minutes).

Currently no reproduction.

So... it MIGHT be fixed, but it also might be a timing issue.

I've left it at 8*20 over the weekend and will re-check on Monday.
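The reproduction script itself isn't shown on this page of the thread. As a hypothetical sketch of generating escalating batches like the ones above, assume the load is "set downtime" external commands (the kind counted in the logs later in the thread) written to the Nagios command file. The command file path is the usual source-install default and, like the host name and the function name, is an assumption.

```shell
# Hypothetical load generator: submits batches ("lots") of
# SCHEDULE_HOST_DOWNTIME external commands to the Nagios command file.
# The default path below is an assumption (source-install default);
# override CMD_FILE to match your installation.
CMD_FILE="${CMD_FILE:-/usr/local/nagios/var/rw/nagios.cmd}"

# schedule_downtime_burst HOST LOTS PER_LOT INTERVAL
#   HOST      host to put into downtime (hypothetical test host)
#   LOTS      number of batches to send
#   PER_LOT   commands per batch
#   INTERVAL  seconds to wait between batches (60 = one lot per minute)
schedule_downtime_burst() {
    local host=$1 lots=$2 per_lot=$3 interval=$4
    local lot=1 i now
    while [ "$lot" -le "$lots" ]; do
        i=1
        while [ "$i" -le "$per_lot" ]; do
            now=$(date +%s)
            # [time] SCHEDULE_HOST_DOWNTIME;host;start;end;fixed;trigger_id;duration;author;comment
            # Here: a fixed 5-minute downtime starting now, with no trigger.
            printf '[%s] SCHEDULE_HOST_DOWNTIME;%s;%s;%s;1;0;300;loadtest;lot %s attempt %s\n' \
                "$now" "$host" "$now" "$((now + 300))" "$lot" "$i" >> "$CMD_FILE"
            i=$((i + 1))
        done
        # Pause between lots, but not after the last one.
        if [ "$lot" -lt "$lots" ]; then
            sleep "$interval"
        fi
        lot=$((lot + 1))
    done
}
```

For example, `schedule_downtime_burst testhost 8 20 60` would submit 8 lots of 20 commands, roughly one lot per minute, for 160 in total.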

Re: nagios dies - sometimes

Posted: Mon Nov 28, 2016 2:13 pm
by dwhitfield
I think it's at least Monday everywhere in the world. How are things looking?

Re: nagios dies - sometimes

Posted: Tue Nov 29, 2016 4:24 am
by MalcolmPreen
Still failing to reproduce, despite increasing the "attempts" to 40 a minute for 8 minutes...

I'll leave this in place, and continue to monitor for a month or so...

If it makes it to the New Year, I suspect we can say that the nagios update has fixed it....

Malcolm

Re: nagios dies - sometimes

Posted: Tue Nov 29, 2016 10:05 am
by dwhitfield
I know it's technically incomplete news, but it sounds like good news to me. We'll keep the thread open and await news in 2017!

Re: nagios dies - sometimes

Posted: Thu Dec 01, 2016 5:08 am
by MalcolmPreen
I'd agree... but we had a failure again last night.

It was slightly different, in that the "bigger hammer" resulted in the whole system dying. But checking the logs, I see 13 "set downtime" commands out of the 320 scheduled, so I suspect it is related.

So I've reduced the hammer back to 10 per minute for 8 minutes, and I need to be patient for a "clean" failure (or a non-failure).

Sorry for the less-than-positive news...

The action is still with me.
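Tallying the downtime commands in the log, as described above, can be scripted. A minimal sketch, assuming `log_external_commands=1` in nagios.cfg (so each processed command is logged as an `EXTERNAL COMMAND:` line) and the default source-install log path; both are assumptions to check against your setup, and the function name is hypothetical.

```shell
# count_downtime_commands [LOGFILE]
# Prints how many SCHEDULE_*_DOWNTIME external commands (host, service,
# hostgroup, ...) appear in the Nagios log. The default path is assumed.
count_downtime_commands() {
    grep -c 'EXTERNAL COMMAND: SCHEDULE_[A-Z_]*DOWNTIME;' \
        "${1:-/usr/local/nagios/var/nagios.log}"
}
```

Comparing the result with the number of commands submitted (320 at 40 per minute for 8 minutes) shows how many of them Nagios logged before it died.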

Re: nagios dies - sometimes

Posted: Thu Dec 01, 2016 10:33 am
by dwhitfield
While we wait for the error, I'm also going to send this thread to our Core dev to see if he thinks there are any logs or traces he wants us to get in case there is a bug here.

UPDATE: John suggested getting a core (lower case c) dump. How to do this depends on a few factors, but if you don't know how to do it, you may find https://stackoverflow.com/questions/179 ... tion-fault useful.
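A core file is only written if the crashing process has a non-zero core size limit and the kernel's `core_pattern` allows it. Nagios Core also has a `daemon_dumps_core` option in nagios.cfg, which, as I understand it, must be 1 for the daemon to dump core when started with `-d`. A minimal sketch that reports these settings; the function name and the default config path are assumptions.

```shell
# core_dump_report [NAGIOS_CFG]
# Prints the settings that decide whether a crashing process started from
# this shell can leave a core file. The nagios.cfg path is an assumption.
core_dump_report() {
    local cfg=${1:-/usr/local/nagios/etc/nagios.cfg}
    echo "soft core limit: $(ulimit -c)"
    echo "hard core limit: $(ulimit -H -c)"
    # Linux only: where (and under what name) the kernel writes cores.
    if [ -r /proc/sys/kernel/core_pattern ]; then
        echo "core_pattern: $(cat /proc/sys/kernel/core_pattern)"
    fi
    # Nagios's own switch for allowing the daemon to dump core.
    if [ -r "$cfg" ]; then
        echo "nagios.cfg: $(grep '^daemon_dumps_core=' "$cfg" || echo 'daemon_dumps_core not set')"
    fi
}
```

If the soft limit is 0, `ulimit -c unlimited` (up to the hard limit) in the script that starts Nagios raises it; a limit set in an interactive shell only affects processes started from that shell. For a systemd service, `LimitCORE=` plays the same role.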

Re: nagios dies - sometimes

Posted: Mon Dec 05, 2016 10:48 am
by MalcolmPreen
OK, I've set up core file generation and tested it by running:

sleep 300 &
kill -ABRT $!    # $! is the PID of the sleep just started in the background

which creates core.<pid> (named with the PID of the killed sleep).
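The manual test above can be wrapped in a small function that also checks how the process died. A minimal sketch; the function name is hypothetical, and it only looks for `core.<pid>` or `core` in the current directory, which is an assumption about `core_pattern`.

```shell
# check_core_dump: starts a throwaway process, aborts it with SIGABRT,
# reports how it died, and looks for a core file in the current directory
# (where the core actually lands depends on core_pattern).
check_core_dump() {
    sleep 300 &
    local pid=$!
    local status=0
    kill -ABRT "$pid"
    wait "$pid" || status=$?
    # A process killed by signal N reports 128 + N; SIGABRT is 6, so 134.
    echo "exit status: $status"
    if [ -e "core.$pid" ] || [ -e core ]; then
        echo "core file found"
    else
        echo "no core file in $(pwd); check ulimit -c and core_pattern"
    fi
}
```

An exit status of 134 only confirms the signal was delivered; whether a core appears still depends on the core size limit and `core_pattern`.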

Fingers crossed this will help if we need it.

Malcolm