Nagios stops running checks after reload

termcap
Posts: 27
Joined: Sun Nov 27, 2016 3:09 pm

Nagios stops running checks after reload

Post by termcap »

Hi,

I am running Nagios Core 4.2.3 with around 6,000 checks and 300 hosts.

I was making changes to each of the host files (300 files), then running a config test and reloading Nagios after each change. Each host file change took around five minutes, and I reloaded some 10 to 20 times in an hour.
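For reference, the verify-then-reload cycle described here can be sketched as a small shell helper. This is only a sketch: `nagios -v` is the standard pre-flight check, but the paths below assume a source install under /usr/local/nagios, and `verify_then` is a hypothetical name, not something from this thread.

```shell
# Assumed defaults for a source install; adjust for your system.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}

# Run the pre-flight check (nagios -v) and, only if it passes, run the
# command given as arguments (for example a reload or restart command).
verify_then() {
    if ! "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null; then
        echo "config check failed, leaving the running Nagios alone" >&2
        return 1
    fi
    "$@"
}
```

It would be called as, for example, `verify_then systemctl reload nagios`. The service name and init system are assumptions and depend on how Nagios was installed.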

After a while I realized that Nagios had completely stopped running checks. I confirmed this as follows:

1) I rescheduled checks: no check ran, the reschedule command never appeared in the log file, and the log went completely silent.
2) I ran ps in a loop to watch for running checks: none were running.
3) I commented out a couple of checks and reloaded Nagios: the checks did not get disabled.
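Another way to confirm the same thing, as a rough sketch: Nagios records a `last_check` timestamp for every host and service in its status file, so the newest `last_check` value shows when a check last ran. The status file path below is an assumption for a source install (the real location is set by `status_file` in nagios.cfg), and the function names are made up for illustration.

```shell
# Assumed location for a source install; see status_file in nagios.cfg.
STATUS_FILE=${STATUS_FILE:-/usr/local/nagios/var/status.dat}

# Print the newest last_check timestamp (epoch seconds) in a status file.
# Prints 0 if the file contains no last_check lines.
newest_last_check() {
    awk -F= '$1 ~ /^[ \t]*last_check$/ && $2 + 0 > max { max = $2 + 0 }
             END { printf "%d\n", max }' "$1"
}

# Print how many seconds ago the most recent check ran.
seconds_since_last_check() {
    echo $(( $(date +%s) - $(newest_last_check "$1") ))
}
```

Running `seconds_since_last_check "$STATUS_FILE"` a few times a minute apart should show a small, fluctuating number on a healthy server. A value that only grows suggests that no checks are being executed.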

Finally, I restarted Nagios. All the changes took effect and I could see Nagios running checks again.

There was no indication on the web interface that Nagios had stopped working; everything appeared normal. I caught the anomaly only because the changes I made were not being reflected.

Any ideas what could have caused this, and how can I avoid it in the future?

Thanks,
Termcap.
danjoh
Posts: 73
Joined: Mon Dec 07, 2015 10:43 am
Location: Zürich, Switzerland

Re: Nagios stops running checks after reload

Post by danjoh »

We have also had issues like this when reloading Nagios 4.3.?.
The only thing I could see was that after a reload we got some zombie ("defunct") processes, which in some cases "hogged" quite a lot of memory.
Since then we have stopped reloading and do a restart instead, and everything works OK now.
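A quick way to look for such defunct processes after a reload, as a sketch only: the helper below counts process-list lines whose state starts with `Z` and whose command name matches a pattern. `count_zombies` is a hypothetical name, and the default pattern `nagios` is an assumption about how the processes are named on your system.

```shell
# Read "STAT COMMAND" lines on stdin and count zombies (state Z) whose
# command name matches the given pattern (default: nagios).
count_zombies() {
    awk -v pat="${1:-nagios}" '$1 ~ /^Z/ && $2 ~ pat { n++ } END { print n + 0 }'
}
```

Typical use would be `ps -eo stat=,comm= | count_zombies`, run before and after a reload to see whether the number grows.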
--
D/\N
termcap
Posts: 27
Joined: Sun Nov 27, 2016 3:09 pm

Re: Nagios stops running checks after reload

Post by termcap »

danjoh wrote: We have also had issues like this when reloading Nagios 4.3.?.
The only thing I could see was that after a reload we got some zombie ("defunct") processes, which in some cases "hogged" quite a lot of memory.
Since then we have stopped reloading and do a restart instead, and everything works OK now.
A restart works, but it's not a very elegant solution: considering the size of your setup, it can take a while for Nagios to restart.
danjoh
Posts: 73
Joined: Mon Dec 07, 2015 10:43 am
Location: Zürich, Switzerland

Re: Nagios stops running checks after reload

Post by danjoh »

Yes, that is true - not very elegant, we would also prefer to have a working reload.
But we also prefer a running monitoring over a stale one ;)
--
D/\N
rhassing
Posts: 412
Joined: Sat Oct 05, 2013 10:29 pm
Location: Netherlands

Re: Nagios stops running checks after reload

Post by rhassing »

Rob Hassing
kyang

Re: Nagios stops running checks after reload

Post by kyang »

Thanks @rhassing!

@termcap, did you have any more questions? Or are we okay to close this?
termcap
Posts: 27
Joined: Sun Nov 27, 2016 3:09 pm

Re: Nagios stops running checks after reload

Post by termcap »

kyang wrote: Thanks @rhassing!

@termcap, did you have any more questions? Or are we okay to close this?
Hi kyang, I'm a bit confused here: which GitHub thread should I follow? My issue is not defunct Nagios processes but rather Nagios not running checks after a reload. Are the two related?
termcap
Posts: 27
Joined: Sun Nov 27, 2016 3:09 pm

Re: Nagios stops running checks after reload

Post by termcap »

danjoh wrote: Yes, that is true - not very elegant, we would also prefer to have a working reload.
But we also prefer a running monitoring over a stale one ;)
True :D

What I have done is write a Nagios check that runs every 5 minutes and touches a file on the Nagios server. In parallel, a cron job runs every 3 minutes and checks whether the file is older than 10 minutes. (If it is, that indicates Nagios has stopped running checks.)

If the file is indeed more than 10 minutes old, the cron script simply stops Apache, which signals to us that Nagios has stopped functioning as desired. We can afford to reload now.
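The heartbeat scheme described above might look roughly like this on the cron side. It is a minimal sketch, not the actual script: the heartbeat path and the function names are hypothetical, and the reaction to a stale file is left as a comment because service names differ between systems.

```shell
# Hypothetical heartbeat path; the Nagios check would run "touch" on it.
HEARTBEAT=${HEARTBEAT:-/var/tmp/nagios_heartbeat}

# Succeed (return 0) when the file is missing or older than the given
# number of minutes, i.e. when Nagios has apparently stopped running checks.
heartbeat_is_stale() {
    [ -e "$1" ] || return 0
    [ -n "$(find "$1" -mmin +"$2" 2>/dev/null)" ]
}

# Intended to be called from cron every few minutes.
watchdog() {
    if heartbeat_is_stale "$HEARTBEAT" 10; then
        echo "Nagios heartbeat is stale: $HEARTBEAT" >&2
        # React here: stop Apache as described above, or restart Nagios.
        return 1
    fi
}
```

Treating a missing file as stale means the watchdog also fires if the heartbeat check was never scheduled. The 10-minute threshold against a 5-minute check interval leaves room for one late or missed run before anything is reported.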
kyang

Re: Nagios stops running checks after reload

Post by kyang »

Great to hear!

Did you resolve your issue?

Are we okay to close this thread or did you have more questions?
termcap
Posts: 27
Joined: Sun Nov 27, 2016 3:09 pm

Re: Nagios stops running checks after reload

Post by termcap »

kyang wrote: Great to hear!

Did you resolve your issue?

Are we okay to close this thread or did you have more questions?
The issue is not resolved; I just wrote a bash script to monitor Nagios temporarily :) Is there an open bug for this problem, or should I raise one on GitHub?
Locked