Hi,
I am running Nagios core version 4.2.3, I have around 6k checks and 300 hosts.
I was making changes to each of the host files (300 files) and config-testing/reloading Nagios. Each host file change took around five minutes, and I reloaded some 10 or 20 times in an hour.
After a while I realized that Nagios had totally stopped running any checks. I confirmed this by doing the following:
1) Rescheduled checks, no check ran and there was no reschedule command log in the log file, the log file went totally silent.
2) Ran ps on loop to see the checks being run, none were running.
3) Hashed out a couple of checks and reloaded Nagios, the checks did not get disabled.
Finally, I restarted Nagios and all the changes started to take effect and I could see that Nagios started running checks.
There was no indication on the web interface that Nagios had stopped working; everything appeared normal. I was able to catch the anomaly only because the changes I made were not taking effect.
Any ideas what could have caused this, and how can I avoid it in the future?
Thanks,
Termcap.
Nagios stops running checks after reload
Re: Nagios stops running checks after reload
We have also had issues like this when reloading Nagios 4.3.?.
The only thing I could see, was that when using reload we got some zombie ("defunct") processes, that in some cases "hogged" quite a lot of memory.
Since then we have stopped doing a reload and do a restart instead and everything works ok now.
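
For anyone who wants to check for the same symptom, here is a minimal sketch of how the defunct processes could be listed after a reload. It assumes a Linux `ps` that accepts `-eo pid,ppid,stat,comm`; the function names `find_zombies` and `list_zombies` are made up for illustration and are not part of Nagios.

```python
import subprocess


def find_zombies(ps_output):
    """Return (pid, ppid, command) for every process whose state starts with 'Z'.

    Expects the text produced by `ps -eo pid,ppid,stat,comm`, header included.
    """
    zombies = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)  # pid, ppid, stat, rest of the line
        if len(fields) == 4 and fields[2].startswith("Z"):
            zombies.append((int(fields[0]), int(fields[1]), fields[3]))
    return zombies


def list_zombies():
    """Run ps and return the zombie processes currently on the system."""
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,stat,comm"],
        capture_output=True,
        text=True,
        check=True,
    )
    return find_zombies(result.stdout)
```

Comparing the PPID column against the PID of the main Nagios process would show whether the zombies belong to Nagios.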
--
D/\N
Re: Nagios stops running checks after reload
danjoh wrote:
We have also had issues like this when reloading Nagios 4.3.?.
The only thing I could see, was that when using reload we got some zombie ("defunct") processes, that in some cases "hogged" quite a lot of memory.
Since then we have stopped doing a reload and do a restart instead and everything works ok now.

Restart works, but it's not a very elegant solution; considering the size of your setup, it can take a while for Nagios to restart.
Re: Nagios stops running checks after reload
Yes, that is true - not very elegant, we would also prefer to have a working reload.
But we also prefer a running monitoring over a stale one
--
D/\N
Re: Nagios stops running checks after reload
This has already been addressed before: https://github.com/NagiosEnterprises/na ... issues/441
Rob Hassing
Re: Nagios stops running checks after reload
danjoh wrote:
Yes, that is true - not very elegant, we would also prefer to have a working reload.
But we also prefer a running monitoring over a stale one

True.
What I have done is write a Nagios check that runs every 5 minutes and touches a file on the Nagios server. In parallel, a cron job runs every 3 minutes and checks whether the file is older than 10 minutes. (If the file is older than 10 minutes, that indicates Nagios has stopped running checks.)
If the file is indeed more than 10 minutes old, the cron script simply stops Apache, which signals to us that Nagios has stopped functioning as desired. We can afford to reload now.
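
The idea can be sketched roughly as below. This is not the actual script (that one is bash); it is a minimal Python illustration, and the file path, the threshold constant, and both function names are assumptions chosen for the example.

```python
import os
import time

# Hypothetical location and threshold; adjust both to the local setup.
HEARTBEAT_FILE = "/var/tmp/nagios_heartbeat"
MAX_AGE_SECONDS = 10 * 60


def touch_heartbeat(path=HEARTBEAT_FILE):
    """Called by the Nagios check: create the file or refresh its mtime."""
    with open(path, "a"):
        pass
    os.utime(path, None)  # set access and modification time to now
    return 0  # exit status 0 means OK to Nagios


def heartbeat_is_stale(path=HEARTBEAT_FILE, max_age=MAX_AGE_SECONDS, now=None):
    """Called from cron: True if the file is missing or older than max_age."""
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(path)
    except FileNotFoundError:
        return True
    return (now - mtime) > max_age
```

The cron side would call `heartbeat_is_stale()` every 3 minutes and, when it returns True, take whatever action makes the failure visible (stopping Apache in my case).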
Re: Nagios stops running checks after reload
Great to hear!
Did you resolve your issue?
Are we okay to close this thread or did you have more questions?
Re: Nagios stops running checks after reload
kyang wrote:
Great to hear!
Did you resolve your issue?
Are we okay to close this thread or did you have more questions?

The issue is not resolved; I just wrote a bash script to monitor Nagios temporarily. Is there an ongoing bug for this problem, or should I raise one on GitHub?