Website check hard failure instead of soft on first attempt

hbouma
Posts: 483
Joined: Tue Feb 27, 2018 9:31 am

Post by hbouma »

We have a website check set up for a known problematic website. Usually, it requires the full number of failed attempts before going to a hard state, but today it went to a hard state after the first failure.

We are running Nagios XI 5.6.10 on RHEL 7 VMs.
Issue.png
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: Website check hard failure instead of soft on first atte

Post by Box293 »

Are you able to open /usr/local/nagios/var/objects.cache, locate the service definition, and post its config here?

Was there anything recorded in /usr/local/nagios/var/nagios.log for this check?
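
For anyone following along, here is a minimal Python sketch of one way to pull that information out of the two files. It assumes objects.cache stores each service as a `define service { ... }` block with one directive per line, and that alerts in nagios.log look like `SERVICE ALERT: host;service;...` and `HOST ALERT: host;...`. The function names are made up for illustration and are not part of Nagios.

```python
import re


def find_service_definitions(cache_text, service_description):
    """Return every 'define service { ... }' block whose
    service_description directive equals the given name."""
    # Assumption: each block ends with a closing brace on its own line.
    blocks = re.findall(r"define service\s*\{.*?\n\s*\}", cache_text, flags=re.DOTALL)
    matches = []
    for block in blocks:
        for line in block.splitlines():
            # Each directive line is "<name><whitespace><value>".
            parts = line.strip().split(None, 1)
            if (len(parts) == 2
                    and parts[0] == "service_description"
                    and parts[1].strip() == service_description):
                matches.append(block)
                break
    return matches


def filter_log_lines(log_text, host, service):
    """Return log lines that are alerts for the given host or host/service pair."""
    # Assumption: alert lines use the 'TYPE ALERT: field;field;...' layout.
    wanted = (f"SERVICE ALERT: {host};{service};", f"HOST ALERT: {host};")
    return [line for line in log_text.splitlines()
            if any(marker in line for marker in wanted)]
```

To use it, read each file into a string (for example with `open("/usr/local/nagios/var/objects.cache").read()`) and pass in your own host name and service description.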
hbouma
Posts: 483
Joined: Tue Feb 27, 2018 9:31 am

Re: Website check hard failure instead of soft on first atte

Post by hbouma »

Ok, I see what happened now. In /usr/local/nagios/var/nagios.log, I see that the host check and the service check were both running at the same time. Because of a known issue with the server overloading the CPU, the host check failed while the service check was still running. With the host considered down, the failed service check went immediately to a hard state.

We do already have the setting in place to stop service checks if the host is down, but in this case, both checks were already running.
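
For later readers, the behavior described above can be sketched as a small rule. This is a simplified model written for illustration, not Nagios source code: it only covers a failed (non-OK) service check result, and the function name is hypothetical. `max_check_attempts` is the usual service directive.

```python
def state_type_for_failed_check(host_is_up, current_attempt, max_check_attempts):
    """Simplified model of how a non-OK service result is classified.

    Returns "HARD" or "SOFT". Assumption: only failed results are modeled;
    recoveries and state-to-state transitions are ignored.
    """
    if not host_is_up:
        # A failed service check on a host that is not UP skips the retries.
        return "HARD"
    if current_attempt >= max_check_attempts:
        # All retries are used up, so the problem is confirmed.
        return "HARD"
    # Otherwise the service stays in a soft state and will be rechecked.
    return "SOFT"
```

With a service allowing 5 attempts, `state_type_for_failed_check(True, 1, 5)` gives `"SOFT"`, while the same first failure with the host down, `state_type_for_failed_check(False, 1, 5)`, gives `"HARD"`, which matches what showed up in the log.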

Please feel free to lock this post.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Website check hard failure instead of soft on first atte

Post by scottwilkerson »

Great!

Locking