by MalcolmPreen
Fri Jan 13, 2017 4:03 am
Forum: Open Source Nagios Projects
Topic: nagios dies - sometimes
Replies: 44
Views: 17771

Re: nagios dies - sometimes

Did you read my recent note? Whilst thruk is installed (and can't be uninstalled, as people use it), the problem seems to be explicitly related to scheduled downtime (either the start or the end, and typically multiple downtimes, possibly overlapping). Whilst I could set up a "non-thruk"...
by MalcolmPreen
Thu Jan 12, 2017 10:02 am

Re: nagios dies - sometimes

OK... just when we thought we were out of the trees.... (no reproductions since xmas). We had a "day time" one today.... Nothing to do with thruk scheduled downtimes.... BUT - one of my colleagues had been working on a data centre, so approximately 116 hosts and 1100 services were in downt...
by MalcolmPreen
Thu Dec 29, 2016 6:50 am

Re: nagios dies - sometimes

Thanks... proceeding down that line (contacting thruk direct). Examining their website suggested updating using the ConSol labs repository. Having ironed out a couple of local network routing issues, I've added the "stable" repository, and have upgraded thruk from 1.80 to 2.00. I'll ...
by MalcolmPreen
Wed Dec 28, 2016 11:08 am

Re: nagios dies - sometimes

OK, the current status: nagios core is now 4.2.4. Over the xmas holidays we had a pair of failures. So, as discussed, I'm investigating updating thruk. We are currently running 1.80-3, and 2.12-3 is available. I've downloaded all of the available rpms:

4963208 Dec 28 13:56 libthruk-2.10-1.rhel...
by MalcolmPreen
Mon Dec 19, 2016 5:17 am

Re: nagios dies - sometimes

Still planning to investigate thruk, but just for information: we had a repeat on nagios core 4.2.3 (upgrading to 4.2.4 tomorrow). Well aware that we can schedule downtime without thruk, which is why the investigation needs to head in that direction, but keeping it in place so I don't forget....
by MalcolmPreen
Fri Dec 09, 2016 12:01 pm

Re: nagios dies - sometimes

Interesting situation last night. Still running nagios core 4.2.2, and a similar failure occurred. But this time, the 80 cron jobs started OK (10 per minute for 8 minutes):

20 0 * * * cd /usr/share/thruk && /bin/bash -l -c '/usr/bin/thruk -a downtimetask="hst_hostnameinsertedhere"' ...
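
For anyone wanting to reproduce the "10 per minute for 8 minutes" pattern, here is a rough sketch of how such a staggered set of crontab lines could be generated. The host names (host01 to host80) and the helper name gen_cron are made up for illustration, and only the part of the cron command that is visible above is reproduced; whatever followed it in the real entries is not known here.

```shell
#!/bin/bash
# Sketch only: print 80 crontab lines, 10 per minute, starting at 00:20.
# Host names host01..host80 are hypothetical placeholders.

total=80          # number of downtime jobs
per_minute=10     # jobs that share the same minute
start_minute=20   # first batch runs at 00:20, as in the entry quoted above

gen_cron() {
  local i minute
  for ((i = 1; i <= total; i++)); do
    # Integer division groups jobs 1-10 into minute 20, 11-20 into 21, ...
    minute=$(( start_minute + (i - 1) / per_minute ))
    printf "%d 0 * * * cd /usr/share/thruk && /bin/bash -l -c '/usr/bin/thruk -a downtimetask=\"hst_host%02d\"'\n" \
      "$minute" "$i"
  done
}

gen_cron
```

Changing per_minute to 40 and total to 320 would give the heavier "40 a minute for 8 minutes" load, still spread over minutes 20 to 27.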
by MalcolmPreen
Tue Dec 06, 2016 4:14 am

Re: nagios dies - sometimes

4.2.3 is going through testing... current plan is to install early next week....
by MalcolmPreen
Mon Dec 05, 2016 10:48 am

Re: nagios dies - sometimes

OK, I've set up core file generation, and have tested this by running:

sleep 300 &
kill -ABRT $!    # $! is the PID of the sleep started above

which creates a core file named core.<pid>
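
The same check can be wrapped in a small script. This is only a sketch, assuming bash on Linux; whether a file called core.<pid> actually appears in the current directory depends on the core size limit and on the kernel.core_pattern setting, so the script just reports what it finds rather than assuming a location.

```shell
#!/bin/bash
# Sketch: abort a throwaway process and see whether a core file appears.

# Raise the soft core-size limit for this shell; this can fail if the
# hard limit is lower, so only warn about it.
ulimit -c unlimited 2>/dev/null || echo "warning: could not raise core size limit" >&2

sleep 300 &
pid=$!                 # PID of the background sleep
kill -ABRT "$pid"      # default action for SIGABRT is terminate + core dump
wait "$pid"
status=$?              # bash reports 128 + signal number, so 134 for SIGABRT (6)

echo "exit status: $status"
if [ -e "core.$pid" ]; then
  echo "core file written: core.$pid"
else
  echo "no core.$pid in $(pwd); check ulimit -c and kernel.core_pattern"
fi
```

Note that the limit has to be raised in the environment that starts nagios itself; setting it in an interactive shell only affects processes launched from that shell.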

Fingers crossed this will help if we need it.

Malcolm
by MalcolmPreen
Thu Dec 01, 2016 5:08 am

Re: nagios dies - sometimes

I'd agree.... but.... we had a failure again last night... It was slightly different, in that the "bigger hammer" resulted in the whole system dying - but checking the logs, I see 13 "set downtime" commands - out of the 320 scheduled.... so I suspect it is related. So, I've reduc...
by MalcolmPreen
Tue Nov 29, 2016 4:24 am

Re: nagios dies - sometimes

Still failing to reproduce, despite increasing the "attempts" to 40 a minute for 8 minutes...

I'll leave this in place, and continue to monitor for a month or so...

If it makes it to the New Year, I suspect we can say that the nagios update has fixed it....

Malcolm