Nagios Pre & Post Services Check

rajasegar · Post by **rajasegar** » Thu Dec 03, 2015 4:17 am

Is there any facility to execute pre- or post-service checks in Nagios?

We are having lots of issues with service checks getting executed even when the host is down.

Thanks.

ssax · Post by **ssax** » Thu Dec 03, 2015 5:16 pm

I'm not aware of any pre and post events.

If you're running XI 5 you should be able to edit your /usr/local/nagios/etc/nagios.cfg file and set:

Code:

host_down_disable_service_checks=1

This will stop the services from checking if the host is down.

Will that work for you?
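
For anyone who prefers to script the change, here is a minimal sketch. The helper name `set_directive` is made up for this example, and it assumes GNU `sed` (for `-i`) and a config file that ends with a newline. It only shows one way to update or append a `key=value` line, so back up nagios.cfg before trying it.

```shell
#!/bin/bash
# set_directive FILE KEY VALUE  (hypothetical helper, not part of Nagios)
# Updates "KEY=..." in place if the directive already exists at the start
# of a line, otherwise appends "KEY=VALUE" to the end of the file.
set_directive() {
   local file=$1 key=$2 value=$3
   if grep -q "^${key}=" "$file"; then
      sed -i "s|^${key}=.*|${key}=${value}|" "$file"
   else
      echo "${key}=${value}" >> "$file"
   fi
}

# Example call, commented out so nothing is changed by accident:
# set_directive /usr/local/nagios/etc/nagios.cfg host_down_disable_service_checks 1
```

After the edit, the configuration can usually be verified with `nagios -v /usr/local/nagios/etc/nagios.cfg` (the binary is assumed to be in /usr/local/nagios/bin on a default install), and Nagios needs a restart to pick up the change.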

rajasegar · Post by **rajasegar** » Thu Dec 03, 2015 6:46 pm

ssax wrote:I'm not aware of any pre and post events.

If you're running XI 5 you should be able to edit your /usr/local/nagios/etc/nagios.cfg file and set:
Code:
host_down_disable_service_checks=1
This will stop the services from checking if the host is down.

Will that work for you?

Thanks. What is the logic behind this?
Does Nagios check the host status before every service check, or only once before firing all the service checks?

Post by **Box293** » Thu Dec 03, 2015 9:54 pm

rajasegar wrote:Thanks. What is the logic behind this?
Nagios checks host status before every check or only once before firing all services check?

It gets a little complicated, but here are the basics on a standard Nagios server:

Every time a service check is due to be executed, Nagios looks at the host object and determines whether it is in a HARD down state.
If the host is HARD down, the service check is still executed and re-scheduled at the next check_interval or retry_interval, HOWEVER no service notifications are sent.
While the service is progressing through its max_check_attempts, the retry_interval is used.
Once max_check_attempts has been reached, the service is in a HARD problem state and the check_interval is used.
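
To make the interval rule above concrete, here is a small sketch. It is only a model of the behaviour described in this post, not code from Nagios itself; the function name `next_interval` and its arguments are invented for the example.

```shell
#!/bin/bash
# next_interval STATE ATTEMPT MAX_ATTEMPTS CHECK_INTERVAL RETRY_INTERVAL
# Prints the interval used to schedule the next service check:
#  - a non-OK service still working through max_check_attempts (SOFT)
#    is retried at retry_interval
#  - an OK service, or one that has reached max_check_attempts (HARD),
#    goes back to check_interval
next_interval() {
   local state=$1 attempt=$2 max_attempts=$3 check_interval=$4 retry_interval=$5
   if [ "$state" != "OK" ] && [ "$attempt" -lt "$max_attempts" ]; then
      echo "$retry_interval"
   else
      echo "$check_interval"
   fi
}

# Hypothetical service: max_check_attempts 3, check_interval 5, retry_interval 1
next_interval CRITICAL 1 3 5 1   # SOFT state, so the retry_interval is printed
next_interval CRITICAL 3 3 5 1   # HARD state, so the check_interval is printed
```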

When host_down_disable_service_checks=1 is implemented, some of this is a little different.

Assume that the host object has gone through the max_check_attempts in its definition and is now in a HARD down state.
Assume that while this has been happening, the service has a larger check_interval, so it has not had a check since the host was last UP.
So when the service check determines the host is down HARD, it is re-scheduled at the next check_interval of the service object, because the last state of the service object was OK.
The service check is NOT executed (the purpose of host_down_disable_service_checks=1) and hence the service stays in an OK state.

Realistically, if the service has a small check_interval, the service check detects an issue while the host object is still going through the max_check_attempts in its definition, i.e. before the host reaches a HARD down state. The service object then starts re-scheduling its next check using the retry_interval directive.
So the service is checked more often until its own max_check_attempts is reached, and while this is happening the service object is in a SOFT state.
Once the host object goes into a HARD down state, the service check is NOT executed (the purpose of host_down_disable_service_checks=1) and it continues to be re-scheduled at the retry_interval, as the service is currently in a SOFT state.

However, because it is in a SOFT state, it will remain in a Critical/Warning/Unknown state until the next successful execution of the service check, once the host has returned to a HARD UP state.

Because it can sometimes take a while for host objects to go down hard, you will get some services that appear in a SOFT state which will appear in the Tactical Overview. It is very hard to avoid this.
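
The two scenarios above (service still HARD OK vs. service already SOFT when the host goes HARD down) can be summarised in a short sketch. Again, this only models the behaviour as described in this post with host_down_disable_service_checks=1; `decide_service_check` is a made-up name and this is not how Nagios is implemented internally.

```shell
#!/bin/bash
# decide_service_check HOST_STATE HOST_STATE_TYPE SERVICE_STATE_TYPE
# Prints what happens when a service check comes due:
#   "run"                  the check is executed as normal
#   "skip retry_interval"  not executed, service was SOFT so it keeps
#                          being re-scheduled at retry_interval
#   "skip check_interval"  not executed, service was in a HARD state
#                          (e.g. still OK) so check_interval is used
decide_service_check() {
   local host_state=$1 host_state_type=$2 service_state_type=$3
   if [ "$host_state" = "DOWN" ] && [ "$host_state_type" = "HARD" ]; then
      if [ "$service_state_type" = "SOFT" ]; then
         echo "skip retry_interval"
      else
         echo "skip check_interval"
      fi
   else
      echo "run"
   fi
}

decide_service_check DOWN HARD HARD   # service never noticed the outage, stays OK
decide_service_check DOWN HARD SOFT   # service stuck in its SOFT problem state
decide_service_check DOWN SOFT HARD   # host not HARD down yet, check still runs
```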

Does this make sense?

Here's a detailed example that explains host and service check intervals:
http://sites.box293.com/nagios/guides/c ... -intervals

Here's a detailed example that explains hard and soft states:
http://sites.box293.com/nagios/guides/c ... oft-states

One key topic in all of this is to make sure your host objects go HARD down BEFORE the services get a chance to. Using the same intervals on your host and service objects will cause unnecessary notifications. I covered some of this in my talk at the Nagios World Conference on Nagios XI Best Practices:
https://www.youtube.com/watch?v=6WlZrG-_sAI
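
As a rough way to compare how quickly a host and a service reach a HARD state, the arithmetic can be sketched like this. It assumes the first failed check counts as attempt 1 and every further attempt happens one retry_interval later; it ignores check latency and the fact that the host and service may notice the problem at different moments. The function name and the example values are hypothetical.

```shell
#!/bin/bash
# time_to_hard MAX_CHECK_ATTEMPTS RETRY_INTERVAL
# Prints the time (in interval units, normally minutes) between the first
# failed check and the HARD state: (max_check_attempts - 1) * retry_interval
time_to_hard() {
   local max_check_attempts=$1 retry_interval=$2
   echo $(( (max_check_attempts - 1) * retry_interval ))
}

host_time=$(time_to_hard 3 1)      # hypothetical host: 3 attempts, retry every 1 min
service_time=$(time_to_hard 3 2)   # hypothetical service: 3 attempts, retry every 2 min

if [ "$host_time" -lt "$service_time" ]; then
   echo "Host goes HARD first ($host_time vs $service_time)"
else
   echo "Service may go HARD before the host ($service_time vs $host_time)"
fi
```

With identical settings on host and service the two times are equal, which is the situation the post warns about.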

rajasegar · Post by **rajasegar** » Thu Dec 03, 2015 10:16 pm

That confused the hell out of me

If the host is not pingable (SOFT or HARD), it is technically down, so the service checks should be deferred with an UNKNOWN status.
The service should not stay in OK status, as we have missed the check interval and are unsure of its real status.

I wrote my own script. I need to test it first, as I am not sure whether it will break if the params contain any " or '.

Code:

#!/bin/bash

# VERSION 1.0

#########################################################################
## Check NRPE with Host Status Availability
#########################################################################

# Parameters: WIN|OTHERS <host address> <NRPE command> [up to 7 arguments]
sTYPE=$1
sHostAddress=$2
ARG1=$3

LIBEXEC=/usr/local/nagios/libexec
CHECK_NRPE=$LIBEXEC/check_nrpe
TIME_STAMP=$(date +%Y%m%d_%H%M%S)   # %H is zero-padded, no fix-up needed

# Ping the host first; if it does not answer, report UNKNOWN and stop
if ! "$LIBEXEC/check_icmp" -H "$sHostAddress" -w 3000.0 -c 5000.0 -n 3 >/dev/null; then
   echo "UNKNOWN - Check stopped, host $sHostAddress not responding ($TIME_STAMP)"
   exit 3
fi

# Collect the non-empty NRPE arguments ($4 to ${10}) in an array so that
# spaces and quote characters inside an argument survive intact
ARGS=()
for ARG in "${@:4:7}"; do
   [ -n "$ARG" ] && ARGS+=("$ARG")
done

# Build the command as an array as well, so nothing is re-split by the shell
COMMAND=("$CHECK_NRPE" -u -t 60 -H "$sHostAddress" -c "$ARG1")
[ "${#ARGS[@]}" -gt 0 ] && COMMAND+=(-a "${ARGS[@]}")

case "$sTYPE" in
   WIN)    COMMAND+=(ShowAll=long) ;;
   OTHERS) ;;
   *)      echo "UNKNOWN - Invalid check type '$sTYPE' (expected WIN or OTHERS)"
           exit 3 ;;
esac

# Run the NRPE check and hand its output and return code back to Nagios
OUTPUT1=$("${COMMAND[@]}")
RETURN_CODE=$?
echo "$OUTPUT1"
exit $RETURN_CODE

Post by **Box293** » Thu Dec 03, 2015 10:26 pm

rajasegar wrote:That confused the hell out of me

rajasegar wrote:If the host is not pingable (SOFT or HARD), it is technically down, the services check should be deferred with unknown status.
The service should not be in OK status as we have missed the check interval and unsure of its status.

Technically it is down, but Nagios uses max_check_attempts and retry_interval in an effort to prevent false positives. I mean, if it wasn't pingable but 10 seconds later it is OK, do you want to know about that immediately?
If it recovers at the next check interval, then there really wasn't a problem you needed to know about, right?

Your script will help overcome the issue.

rajasegar · Post by **rajasegar** » Thu Dec 03, 2015 10:35 pm

Box293 wrote:
rajasegar wrote:That confused the hell out of me

rajasegar wrote:If the host is not pingable (SOFT or HARD), it is technically down, the services check should be deferred with unknown status.
The service should not be in OK status as we have missed the check interval and unsure of its status.
Technically it is down, but nagios incorporates the max_check_attempts and retry_intervals in an effort to prevent false positives. I mean if it wasn't pingable but 10 seconds later it is OK, do you want to know about that immediately?
If it recovers at the next check interval then there really wasn't a problem you needed to know about right?

Your script will help overcome the issue.

We have disabled unknown notifications in our instance.
We will further enhance the email/sms script to ignore any alerts caused by status change from UNKNOWN to OK.
So this will settle the invalid alerts issue.
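
A minimal sketch of that filter, assuming the notification command passes the standard $LASTSERVICESTATE$ and $SERVICESTATE$ macros to the email/sms script as two arguments. The function name `should_send_alert` is made up for this example and the rest of the notification script is not shown.

```shell
#!/bin/bash
# should_send_alert LAST_STATE CURRENT_STATE
# Returns 1 (do not send) when the service merely went from UNKNOWN back
# to OK, and 0 (send) for every other state change.
should_send_alert() {
   local last_state=$1 current_state=$2
   if [ "$last_state" = "UNKNOWN" ] && [ "$current_state" = "OK" ]; then
      return 1
   fi
   return 0
}

# Inside the notification script it could be used like this
# (example values stand in for the macro arguments):
if should_send_alert "UNKNOWN" "OK"; then
   echo "sending alert"
else
   echo "alert suppressed"
fi
```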

Post by **lmiltchev** » Fri Dec 04, 2015 10:41 am

rajasegar wrote:We have disabled unknown notifications in our instance.
We will further enhance the email/sms script to ignore any alerts caused by status change from UNKNOWN to OK.
So this will settle the invalid alerts issue.

Let us know how it goes. We will keep this thread open for a while.

rajasegar · Post by **rajasegar** » Mon Dec 14, 2015 7:41 pm

Problem resolved. Please close this case.

rkennedy · Post by **rkennedy** » Tue Dec 15, 2015 10:14 am

Glad to see this fixed. I will now close this thread out, but feel free to open a new one if you ever need assistance in the future.

Nagios Support Forum
