We have a website check set up for a known problematic website. Usually, it requires the full number of failed check attempts before going to a hard state, but today it went to a hard failure after the first failure.
We are running Nagios XI 5.6.10 on RHEL 7 VMs.
Website check hard failure instead of soft on first attempt
- Box293
- Too Basu
- Posts: 5126
- Joined: Sun Feb 07, 2010 10:55 pm
- Location: Deniliquin, Australia
Re: Website check hard failure instead of soft on first atte
Are you able to open /usr/local/nagios/var/objects.cache, locate the service definition, and post its config here?
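If the file is large, a small script can pull out the matching definition. The following is a minimal sketch, assuming objects.cache uses the usual "define service {" ... "}" blocks with one whitespace-separated directive per line. The host name "web01" and service description "Website Check" are hypothetical placeholders, not values from this thread.

```python
def find_service_definitions(cache_text, host_name, service_description):
    """Return matching service definitions from objects.cache text as dicts."""
    matches = []
    current = None  # directives of the service block being read, if any
    for raw_line in cache_text.splitlines():
        line = raw_line.strip()
        tokens = line.split()
        if tokens[:2] == ["define", "service"]:
            # Exact match on "service" skips servicegroup, servicedependency, etc.
            current = {}
        elif line == "}":
            if (current is not None
                    and current.get("host_name") == host_name
                    and current.get("service_description") == service_description):
                matches.append(current)
            current = None
        elif current is not None and line:
            parts = line.split(None, 1)  # directive name, then the rest as value
            if len(parts) == 2:
                current[parts[0]] = parts[1]
    return matches


# Hypothetical sample in the assumed objects.cache layout.
SAMPLE_CACHE = """define servicegroup {
\tservicegroup_name\twebsites
\t}

define service {
\thost_name\tweb01
\tservice_description\tWebsite Check
\tmax_check_attempts\t5
\tretry_interval\t1.000000
\t}

define service {
\thost_name\tweb01
\tservice_description\tPing
\tmax_check_attempts\t3
\t}
"""

found = find_service_definitions(SAMPLE_CACHE, "web01", "Website Check")
```

To run it against the real file, read /usr/local/nagios/var/objects.cache into a string and pass that in place of SAMPLE_CACHE, with your own host name and service description.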
Was there anything recorded in /usr/local/nagios/var/nagios.log for this check?
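One way to answer that is to filter the log down to the alerts for this host and service. This is a rough sketch, assuming the standard "[epoch] SERVICE ALERT: host;service;state;state_type;attempt;output" and "[epoch] HOST ALERT: host;state;state_type;attempt;output" line layouts. The host, service, and log lines below are made-up examples, not entries from the poster's system.

```python
import re

# Only the two alert entry types are considered in this sketch.
ALERT_RE = re.compile(r"^\[(\d+)\] (SERVICE ALERT|HOST ALERT): (.*)$")


def alerts_for_check(lines, host_name, service_description):
    """Return (epoch, kind, state, state_type, attempt) tuples for one check.

    Host alerts for the same host are included, since the host state
    affects how the service result is classified.
    """
    results = []
    for line in lines:
        match = ALERT_RE.match(line.strip())
        if not match:
            continue
        epoch, kind, payload = match.groups()
        fields = payload.split(";")
        if fields[0] != host_name:
            continue
        if kind == "SERVICE ALERT":
            if len(fields) < 5 or fields[1] != service_description:
                continue
            state, state_type, attempt = fields[2], fields[3], fields[4]
        else:
            if len(fields) < 4:
                continue
            state, state_type, attempt = fields[1], fields[2], fields[3]
        results.append((int(epoch), kind, state, state_type, int(attempt)))
    return results


# Hypothetical log excerpt for illustration only.
SAMPLE_LOG = [
    "[1573048795] HOST ALERT: web01;DOWN;SOFT;1;CRITICAL - Host Unreachable",
    "[1573048800] SERVICE ALERT: web01;Website Check;CRITICAL;HARD;1;CRITICAL - Socket timeout",
    "[1573048800] SERVICE ALERT: web01;Ping;OK;HARD;1;PING OK",
    "[1573048801] SERVICE ALERT: db01;Website Check;CRITICAL;SOFT;1;timeout",
    "[1573048802] Auto-save of retention data completed successfully.",
]

alerts = alerts_for_check(SAMPLE_LOG, "web01", "Website Check")
```

For a real run, pass the lines of /usr/local/nagios/var/nagios.log instead of SAMPLE_LOG. A service alert marked HARD with attempt 1, next to a host alert for the same host, is the pattern worth looking for.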
Re: Website check hard failure instead of soft on first atte
OK, I see what happened now. In /usr/local/nagios/var/nagios.log, I see that the host check and the service check were running at the same time. Because of a known issue with the server overloading its CPU, the host check failed while the service check was still running. This caused the service check to go to a hard state immediately.
We do already have the setting in place to stop service checks if the host is down, but in this case, both checks were already running.
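For anyone finding this later, the behavior can be illustrated with a simplified model of how a non-OK service result is classified. This is only a sketch based on the state type rules described in the Nagios Core documentation, not the actual Nagios code, and the attempt counts used below are hypothetical.

```python
def state_type_for_problem(current_attempt, max_check_attempts, host_state):
    """Classify a non-OK service check result as "SOFT" or "HARD".

    Simplified model: recoveries, flapping, and other details of the
    real scheduler are deliberately ignored.
    """
    # A problem found while the host is not UP is treated as HARD right away,
    # so no retries are used up.
    if host_state in ("DOWN", "UNREACHABLE"):
        return "HARD"
    # Otherwise the problem becomes HARD only once the retries are exhausted.
    if current_attempt >= max_check_attempts:
        return "HARD"
    return "SOFT"


# Normal case: first failure while the host is UP stays SOFT.
normal_first_failure = state_type_for_problem(1, 5, "UP")
# Case from this thread: first failure while the host check has failed.
failure_with_host_down = state_type_for_problem(1, 5, "DOWN")
# Retries exhausted while the host is UP.
final_attempt = state_type_for_problem(5, 5, "UP")
```

With the host UP, the first failure is SOFT and only the last attempt is HARD. With the host DOWN, the very first failure is already HARD, which matches what we saw.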
Please feel free to lock this post.
- scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Website check hard failure instead of soft on first atte
hbouma wrote:
OK, I see what happened now. In /usr/local/nagios/var/nagios.log, I see that the host check and the service check were running at the same time. Because of a known issue with the server overloading its CPU, the host check failed while the service check was still running. This caused the service check to go to a hard state immediately.
We do already have the setting in place to stop service checks if the host is down, but in this case, both checks were already running.
Please feel free to lock this post.
Great!
Locking