Nagios Pre & Post Services Check
Is there any facility to execute pre- or post-service checks in Nagios?
We are having lots of issues with service checks being executed even when the host is down.
Thanks.
5 x Nagios 5.6.9 Enterprise Edition
RHEL 6 & 7
rrdcached & ramdisk optimisation
Re: Nagios Pre & Post Services Check
I'm not aware of any pre and post events.
If you're running XI 5 you should be able to edit your /usr/local/nagios/etc/nagios.cfg file and set:
Code: Select all
host_down_disable_service_checks=1
This will stop the services from checking if the host is down.
Will that work for you?
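A minimal sketch of applying that setting from the shell, assuming the default install layout mentioned above. `set_directive` is a hypothetical helper written for this example, not a Nagios tool; it only rewrites or appends a single `name=value` line.

```shell
#!/bin/bash
# set_directive FILE NAME VALUE  (hypothetical helper, not part of Nagios)
# Rewrites an existing "NAME=..." line, or appends one if none is present.
set_directive() {
  local file=$1 name=$2 value=$3
  if grep -q "^${name}=" "$file"; then
    # Write to a temporary file, then copy back so the original file's
    # ownership and permissions are kept.
    sed "s|^${name}=.*|${name}=${value}|" "$file" > "${file}.tmp" &&
      cat "${file}.tmp" > "$file" &&
      rm -f "${file}.tmp"
  else
    printf '%s=%s\n' "$name" "$value" >> "$file"
  fi
}

# Example, assuming the default paths (verify the config, then restart Nagios):
# set_directive /usr/local/nagios/etc/nagios.cfg host_down_disable_service_checks 1
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
```

Back up nagios.cfg before editing it this way.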
Re: Nagios Pre & Post Services Check
ssax wrote:
I'm not aware of any pre and post events.
If you're running XI 5 you should be able to edit your /usr/local/nagios/etc/nagios.cfg file and set:
Code: Select all
host_down_disable_service_checks=1
This will stop the services from checking if the host is down.
Will that work for you?
Thanks. What is the logic behind this?
Does Nagios check the host status before every service check, or only once before firing all the service checks?
- Box293
- Too Basu
- Posts: 5126
- Joined: Sun Feb 07, 2010 10:55 pm
- Location: Deniliquin, Australia
Re: Nagios Pre & Post Services Check
rajasegar wrote:
Thanks. What is the logic behind this?
Nagios checks host status before every check or only once before firing all services check?
It gets a little complicated but here's the basics in a standard nagios server:
Every time a service check is to be executed, it looks at the host object and determines whether the host is in a HARD down state.
If the host is HARD down, the service check is still executed and re-scheduled at the next check_interval or retry_interval; however, no service notifications are sent.
While the service is progressing through max_check_attempts, the retry_interval is used.
Once max_check_attempts has been reached, the service is in a HARD problem state and the check_interval is used.
When host_down_disable_service_checks=1 is implemented, some of this is a little different.
Assume that the host object has gone through the max_check_attempts in its definition and is now in a HARD down state.
Assume that while this has been happening, the service has a larger check_interval, so it has not had a check since the host was last UP.
So when the service check determines the host is down HARD, it is re-scheduled at the next check_interval of the service object, because the last state of the service object was OK.
The service check is NOT executed (the purpose of host_down_disable_service_checks=1) and hence the service stays in an OK state.
Realistically, if the service has a small check_interval, the service object detects an issue before the host object has worked through the max_check_attempts in its definition and reached a HARD down state, and it starts re-scheduling its next check using the retry_interval directive.
So the service is checked more often until its max_check_attempts is reached; while this is happening, the service object is in a SOFT state.
Once the host object goes into a HARD down state, the service check is NOT executed (the purpose of host_down_disable_service_checks=1) and will continue to be re-scheduled at the retry_interval as the service is currently in a SOFT state.
However, because it is in a SOFT state, it will remain in a Critical/Warning/Unknown state until the next successful execution of the service check when the host returns to a HARD UP state.
Because it can sometimes take a while for host objects to go down hard, some services will sit in a SOFT state and show up in the Tactical Overview. It is very hard to avoid this.
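The rules described above can be written down as a toy model. This is only a sketch of the behaviour as explained in this post, not Nagios source code; `check_action` and `next_interval` are hypothetical names, and the real scheduler has more cases than this.

```shell
#!/bin/bash
# Toy model of the behaviour described in this post; not Nagios source code.

# Prints "skip" when the service check would be suppressed, else "execute".
# $1 = host state (UP|DOWN), $2 = host state type (SOFT|HARD),
# $3 = value of host_down_disable_service_checks (0|1)
check_action() {
  if [ "$1" = "DOWN" ] && [ "$2" = "HARD" ] && [ "$3" = "1" ]; then
    echo skip
  else
    echo execute
  fi
}

# Prints the interval used to re-schedule the service.
# $1 = service state type (SOFT|HARD), $2 = check_interval, $3 = retry_interval
next_interval() {
  if [ "$1" = "SOFT" ]; then
    echo "$3"   # still working through max_check_attempts
  else
    echo "$2"   # HARD state (OK or problem)
  fi
}

check_action DOWN HARD 1   # prints: skip
check_action DOWN SOFT 1   # prints: execute
next_interval SOFT 5 1     # prints: 1
next_interval HARD 5 1     # prints: 5
```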
Does this make sense?
Here's a detailed example that explains host and service check intervals:
http://sites.box293.com/nagios/guides/c ... -intervals
Here's a detailed example that explains hard and soft states:
http://sites.box293.com/nagios/guides/c ... oft-states
One key topic in all of this is to make sure your host objects go HARD down BEFORE the services get a chance to. Using the same intervals on your host and service objects will cause unnecessary notifications. I covered some of this in my talk at the Nagios World Conference on Nagios XI Best Practices:
https://www.youtube.com/watch?v=6WlZrG-_sAI
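A rough way to compare how long a host and a service take to reach a HARD state, assuming that after the first failed check each further attempt is retry_interval minutes apart. The values are hypothetical, `time_to_hard` is a made-up helper name, and scheduling details such as on-demand host checks are ignored.

```shell
#!/bin/bash
# Minutes from the first failed check until the HARD state is reached:
# (max_check_attempts - 1) retries, each retry_interval apart.
time_to_hard() {
  local max_check_attempts=$1 retry_interval=$2
  echo $(( (max_check_attempts - 1) * retry_interval ))
}

host_minutes=$(time_to_hard 3 1)      # hypothetical host: 3 attempts, 1 min retry
service_minutes=$(time_to_hard 5 1)   # hypothetical service: 5 attempts, 1 min retry

if [ "$host_minutes" -lt "$service_minutes" ]; then
  echo "host goes HARD first (${host_minutes} min vs ${service_minutes} min)"
else
  echo "service may go HARD first (${service_minutes} min vs ${host_minutes} min)"
fi
```

With these example values the host reaches HARD after 2 minutes and the service after 4, which is the ordering recommended above.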
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: Nagios Pre & Post Services Check
That confused the hell out of me 
If the host is not pingable (SOFT or HARD), it is technically down, so the service checks should be deferred with an UNKNOWN status.
The service should not be in OK status, as we have missed the check interval and are unsure of its status.
I wrote my own script, but I need to test it first, as I am not sure whether it will break if the params contain any " or '.
Code: Select all
#!/bin/bash
# VERSION 1.0
#########################################################################
## Check NRPE with Host Status Availability
#########################################################################
# Arguments: <WIN|OTHERS> <host address> <NRPE command> [up to 7 NRPE arguments]
sTYPE=$1
sHostAddress=$2
NRPE_COMMAND=$3
LIBEXEC=/usr/local/nagios/libexec
CHECK_NRPE=$LIBEXEC/check_nrpe
TIME_STAMP=$(date +%Y%m%d_%H%M%S)
# Ping the host first; any non-zero result means the service check is skipped
if ! "$LIBEXEC/check_icmp" -H "$sHostAddress" -w 3000.0 -c 5000.0 -n 3 > /dev/null; then
  echo "UNKNOWN - Check stopped, host $sHostAddress not responding ($TIME_STAMP)"
  exit 3
fi
# Collect the non-empty NRPE arguments ($4..${10}) in an array so that values
# containing spaces, " or ' are passed through as single arguments
NRPE_ARGS=()
for ARG in "${@:4:7}"; do
  [ -n "$ARG" ] && NRPE_ARGS+=("$ARG")
done
case "$sTYPE" in
  WIN) NRPE_ARGS+=("ShowAll=long") ;;  # Windows checks always get ShowAll=long
  OTHERS) ;;
  *)
    # Without this branch an unknown type would run nothing and exit 0 (OK)
    echo "UNKNOWN - Unsupported type '$sTYPE' (expected WIN or OTHERS)"
    exit 3
    ;;
esac
COMMAND=("$CHECK_NRPE" -u -t 60 -H "$sHostAddress" -c "$NRPE_COMMAND")
# Only add -a when there is at least one argument to pass
if [ "${#NRPE_ARGS[@]}" -gt 0 ]; then
  COMMAND+=(-a "${NRPE_ARGS[@]}")
fi
OUTPUT1=$("${COMMAND[@]}")
RETURN_CODE=$?
echo "$OUTPUT1"
exit $RETURN_CODE
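On the worry about params containing " or ': a small sketch of how bash treats a command held in a string versus one held in an array. `show_args` is a hypothetical stand-in for check_nrpe that just prints what it receives, and the parameter value is made up.

```shell
#!/bin/bash
# Stand-in for check_nrpe: prints each argument it receives on its own line.
show_args() { printf '[%s]\n' "$@"; }

param="it's a \"test\" value"   # hypothetical parameter with both quote types

# String approach: the unquoted expansion is word-split on spaces, and the
# quote characters are passed through literally rather than interpreted.
cmd_string="show_args -a $param"
$cmd_string

# Array approach: the parameter stays a single argument.
cmd_array=(show_args -a "$param")
"${cmd_array[@]}"
```

The string form hands five separate arguments to the command, while the array form hands over two, with the quotes intact inside the second.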
Re: Nagios Pre & Post Services Check
rajasegar wrote:
If the host is not pingable (SOFT or HARD), it is technically down, the services check should be deferred with unknown status.
The service should not be in OK status as we have missed the check interval and unsure of its status.
Technically it is down, but nagios incorporates the max_check_attempts and retry_intervals in an effort to prevent false positives. I mean if it wasn't pingable but 10 seconds later it is OK, do you want to know about that immediately?
If it recovers at the next check interval then there really wasn't a problem you needed to know about, right?
Your script will help overcome the issue.
Re: Nagios Pre & Post Services Check
Box293 wrote:
Your script will help overcome the issue.
We have disabled unknown notifications in our instance.
We will further enhance the email/sms script to ignore any alerts caused by status change from UNKNOWN to OK.
So this will settle the invalid alerts issue.
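One possible shape for that filter, as a sketch only: `should_send` is a hypothetical helper, and it assumes the notification command passes in the standard Nagios macros $NOTIFICATIONTYPE$, $LASTSERVICESTATE$ and $SERVICESTATE$ as its three arguments.

```shell
#!/bin/bash
# should_send TYPE LAST_STATE CURRENT_STATE  (hypothetical helper)
# Returns 1 (suppress) for a recovery that follows an UNKNOWN state,
# and 0 (send) for everything else.
should_send() {
  local type=$1 last_state=$2 state=$3
  if [ "$type" = "RECOVERY" ] && [ "$last_state" = "UNKNOWN" ] && [ "$state" = "OK" ]; then
    return 1
  fi
  return 0
}

# Example calls with the values the macros would expand to:
should_send RECOVERY UNKNOWN OK   && echo send || echo suppress   # prints: suppress
should_send RECOVERY CRITICAL OK  && echo send || echo suppress   # prints: send
should_send PROBLEM OK CRITICAL   && echo send || echo suppress   # prints: send
```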
Re: Nagios Pre & Post Services Check
rajasegar wrote:
We have disabled unknown notifications in our instance.
We will further enhance the email/sms script to ignore any alerts caused by status change from UNKNOWN to OK.
So this will settle the invalid alerts issue.
Let us know how it goes. We will keep this thread open for a while.
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: Nagios Pre & Post Services Check
Problem resolved. Please close this case.
Re: Nagios Pre & Post Services Check
Glad to see this fixed. I will now close this thread out, but feel free to open a new one if you ever need assistance in the future.
Former Nagios Employee