Website check hard failure instead of soft on first attempt

Posted: Wed Feb 12, 2020 8:34 pm
by hbouma
We have a website check set up for a known problematic website. Usually, it requires the full number of failures before going to a hard state, but today it went to a hard failure after the first failure.

We are running Nagios XI 5.6.10 on RHEL 7 VMs.
[Attachment: Issue.png]

Re: Website check hard failure instead of soft on first attempt

Posted: Wed Feb 12, 2020 8:59 pm
by Box293
Are you able to open /usr/local/nagios/var/objects.cache, locate the service definition, and post its config here?

Was there anything recorded in /usr/local/nagios/var/nagios.log for this check?
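
If it helps with digging through the log, here is a minimal Python sketch that pulls the HOST ALERT and SERVICE ALERT entries for one host out of nagios.log. It assumes the standard Nagios Core alert line layout (host;service;state;state type;attempt;output for services, and the same without the service field for hosts). The host name, service name, timestamps, and plugin output in the sample lines are made up for illustration, and `parse_alerts` is just a helper name chosen here, not part of Nagios.

```python
import re

# Matches alert lines such as:
# [1581557641] SERVICE ALERT: web01;Website Check;CRITICAL;HARD;1;some output
ALERT_RE = re.compile(r"^\[(\d+)\] (HOST|SERVICE) ALERT: (.*)$")


def parse_alerts(lines, host):
    """Return HOST/SERVICE alert records for one host, in log order."""
    alerts = []
    for line in lines:
        match = ALERT_RE.match(line.strip())
        if not match:
            continue  # not an alert line
        timestamp, kind, rest = match.groups()
        fields = rest.split(";")
        if fields[0] != host:
            continue  # alert belongs to a different host
        if kind == "SERVICE":
            # host;service;state;state_type;attempt;output
            if len(fields) < 6:
                continue
            service = fields[1]
            state, state_type, attempt = fields[2:5]
            output = ";".join(fields[5:])
        else:
            # host;state;state_type;attempt;output
            if len(fields) < 5:
                continue
            service = None
            state, state_type, attempt = fields[1:4]
            output = ";".join(fields[4:])
        alerts.append({
            "time": int(timestamp),
            "kind": kind,
            "service": service,
            "state": state,
            "state_type": state_type,
            "attempt": int(attempt),
            "output": output,
        })
    return alerts


# Hypothetical sample lines; in practice read them from
# /usr/local/nagios/var/nagios.log instead.
SAMPLE = [
    "[1581557640] HOST ALERT: web01;DOWN;SOFT;1;CRITICAL - Host Unreachable",
    "[1581557641] SERVICE ALERT: web01;Website Check;CRITICAL;HARD;1;CRITICAL - Socket timeout",
    "[1581557700] SERVICE ALERT: other;PING;OK;HARD;1;PING OK",
]

for alert in parse_alerts(SAMPLE, "web01"):
    print(alert["time"], alert["kind"], alert["service"],
          alert["state"], alert["state_type"], alert["attempt"])
```

A host alert sitting right next to a service alert that is HARD on attempt 1 is the kind of pattern worth looking for.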

Re: Website check hard failure instead of soft on first attempt

Posted: Thu Feb 13, 2020 8:01 am
by hbouma
Ok, I see what happened now. In the /usr/local/nagios/var/nagios.log logs, I see that both the Host Check and Service Check were running at the same time. Because of a known issue with the server overloading the CPU, the system failed a HOST check while running the service check. This caused the service check to immediately go to a hard state.

We do already have the setting in place to stop service checks if the host check is down, but in this case, both were already running.
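
For anyone who finds this later, the behavior can be sketched roughly as below. This is a simplified model of the rule described in the Nagios Core state type documentation (a non-OK service result goes straight to HARD when its host is DOWN or UNREACHABLE), not the actual Nagios code. The function name and the max_check_attempts value of 5 are assumptions for illustration only.

```python
def failure_state_type(attempt, max_check_attempts, host_is_up):
    """Classify a non-OK service check result as 'SOFT' or 'HARD'.

    Simplified model: only non-OK results are considered, and the host
    is treated as either up or not up (DOWN/UNREACHABLE).
    """
    if not host_is_up:
        # Exception to the normal retry logic: with the host down,
        # the service problem is treated as HARD right away.
        return "HARD"
    # Normal case: stay SOFT until the retries are used up.
    return "HARD" if attempt >= max_check_attempts else "SOFT"


# Hypothetical setting: 5 attempts before a hard state.
MAX_ATTEMPTS = 5

# Usual behavior: first failure with the host up is SOFT.
print(failure_state_type(1, MAX_ATTEMPTS, host_is_up=True))

# What happened here: the host check failed around the same time,
# so the first service failure was already HARD.
print(failure_state_type(1, MAX_ATTEMPTS, host_is_up=False))
```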

Please feel free to lock this post.

Re: Website check hard failure instead of soft on first attempt

Posted: Thu Feb 13, 2020 8:20 am
by scottwilkerson
hbouma wrote:Ok, I see what happened now. In the /usr/local/nagios/var/nagios.log logs, I see that both the Host Check and Service Check were running at the same time. Because of a known issue with the server overloading the CPU, the system failed a HOST check while running the service check. This caused the service check to immediately go to a hard state.

We do already have the setting in place to stop service checks if the host check is down, but in this case, both were already running.

Please feel free to lock this post.
Great!

Locking