Nagios stops running checks after reload
Posted: Sun Oct 29, 2017 5:11 am
Hi,
I am running Nagios Core version 4.2.3 with around 6k checks and 300 hosts.
I was making changes to each of the host files (300 files), then running a config test and reloading Nagios after each change. Each host file change took around five minutes, and I reloaded some 10 to 20 times in an hour.
After a while I realized that Nagios had completely stopped running checks. I confirmed this by doing the following:
1) Rescheduled checks: no check ran and no reschedule command appeared in the log file. The log file went totally silent.
2) Ran ps in a loop to watch for running checks: none were running.
3) Commented out a couple of checks and reloaded Nagios: the checks did not get disabled.
Finally, I restarted Nagios. All the changes took effect and I could see that Nagios had started running checks again.
There was no indication on the web interface that Nagios had stopped working; everything appeared normal. I was able to catch the anomaly only because the changes I made were not taking effect.
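Since the only visible symptom was the log file going silent, one thing that might have caught this sooner is an external watchdog (run from cron, outside Nagios) that alerts when the log has not been written to for a while. Below is a minimal sketch of that idea, not a tested solution. The log path is an assumption (the usual default for a source install), the 600-second threshold is arbitrary, and the helper names are made up for illustration.

```python
import os
import time

# Assumed log location (common default for a source install); adjust as needed.
NAGIOS_LOG = "/usr/local/nagios/var/nagios.log"


def seconds_since_modified(path, now=None):
    """Return how many seconds ago the file at `path` was last written to."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path)


def is_stale(path, max_age_seconds, now=None):
    """True if the file is missing or older than max_age_seconds."""
    try:
        return seconds_since_modified(path, now) > max_age_seconds
    except OSError:
        # A missing or unreadable log is also worth an alert.
        return True


if __name__ == "__main__":
    # 600 seconds is an arbitrary threshold chosen for illustration.
    if is_stale(NAGIOS_LOG, 600):
        print("WARNING: nagios.log has not been written to for over 10 minutes")
```

With 6k checks the log should normally be written to often enough that a long silence is suspicious, but the right threshold depends on how much the installation actually logs.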
Any ideas what could have caused this, and how can I avoid it in the future?
Thanks,
Termcap.