decrease check_ping attempts - faster email notification

agentguerry · Post by **agentguerry** » Tue Feb 24, 2015 4:46 pm

I've been running Nagios Core with a fairly standard configuration, but I'm looking for a way to get faster email notifications when check_ping determines that a server is in a critical state.

If I halt a server to test check_ping/email notifications, it generally takes about 10 minutes for me to receive an email stating it's down.
When I bring the server back online, though, I get an "up" email within a minute.

How can I adjust check_ping and reduce the number of check attempts so that the "down" email arrives sooner?

Thanks!

jdalrymple · Post by **jdalrymple** » Tue Feb 24, 2015 4:57 pm

There are 2 object definition directives that affect this: check_interval and first_notification_delay.

check_interval defines approximately how often the host is checked, in minutes.
first_notification_delay defines approximately how long Nagios waits, in minutes, before the first notification is sent out. To avoid any delay, set this to 0.

Optionally, there are 2 other directives that could matter: retry_interval and max_check_attempts.

retry_interval defines approximately how often the check is retried, in minutes, after the state first changes from OK to any other state.
max_check_attempts defines how many times the check can fail before a notification is sent out. This is a required directive. To get the quickest possible alert, set this value to 1; if you do, the retry_interval described above becomes irrelevant.
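
As a rough sketch of how the four directives fit together in a host definition (the host name and the windows-server template are just the placeholders used in this thread, and the values are illustrative rather than recommended):

```cfg
define host{
        use                       windows-server  ; template; its values apply unless overridden here
        host_name                 myserver.com
        alias                     myserver.com
        check_interval            1   ; check roughly once per time unit (a minute by default)
        retry_interval            1   ; spacing of rechecks after a failure; irrelevant when max_check_attempts is 1
        max_check_attempts        1   ; the first failed check already counts as a hard state
        first_notification_delay  0   ; notify as soon as the hard state is reached
}
```

Anything not set here is inherited from the template named in `use`, so it is worth checking what that template supplies for these directives.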

http://nagios.sourceforge.net/docs/3_0/ ... .html#host

agentguerry · Post by **agentguerry** » Wed Feb 25, 2015 9:31 am

Am I correct in thinking that these entries would go into my hosts.cfg file?
OLD:
define host{
        use                     windows-server
        host_name               myserver.com
        alias                   myserver.com
}
define service{
        use                     generic-service
        host_name               myserver.com
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%
}

NEW:
define host{
        use                     windows-server
        host_name               myserver.com
        alias                   myserver.com
}
define service{
        use                     generic-service
        host_name               myserver.com
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%
        first_notification_delay        0
        check_interval          1
}

jdalrymple · Post by **jdalrymple** » Wed Feb 25, 2015 10:16 am

You actually don't even need to add a service. Each host has its own built-in host check that determines whether it's alive.

http://nagios.sourceforge.net/docs/3_0/hostchecks.html

If your host's check_command is functioning properly, then you have no need to add the PING service. Services are defined for things beyond host reachability, such as sshd, httpd, ntpd, etc.

In either case, if you want aggressive monitoring on the host, put these directives in the host definition; if you want it on the service, put them in the service definition; if you want it on both, put them in both.

Make sense?

Also, max_check_attempts is key. If your max_check_attempts is set to 60 and checks are retried once a minute, you're still not going to get your first alert until about an hour after the host goes down.
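
To make that concrete, here is a sketch of the PING service from earlier in the thread with the same directives applied at the service level. The values are only illustrative, and the timing in the comments is a rough worst-case estimate that assumes the default one-minute time unit and ignores check latency:

```cfg
define service{
        use                       generic-service
        host_name                 myserver.com
        service_description       PING
        check_command             check_ping!100.0,20%!500.0,60%
        check_interval            1   ; up to ~1 minute before the first failed check
        retry_interval            1   ; spacing between rechecks while the state is still soft
        max_check_attempts        3   ; 2 rechecks follow the first failure: ~2 more minutes
        first_notification_delay  0   ; no extra wait once the state is hard
}
; Rough worst case: 1 + (3 - 1) * 1 + 0 = about 3 minutes to the first alert.
; With max_check_attempts 60 and the same 1-minute retry_interval,
; the same sum comes to about an hour.
```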

agentguerry · Post by **agentguerry** » Wed Feb 25, 2015 12:49 pm

OK, cool.

My entry looks like this now and I'm getting alerts quicker.

define host{
        use                     windows-server
        host_name               myhost.com
        alias                   myhost.com
        check_command           check-host-alive
        max_check_attempts      1
        notification_interval   30
        notification_period     24x7
        notification_options    d,r
}

Can max_check_attempts be set to "0", or is 1 the lowest value Nagios accepts?

I did notice, though, that when I use the PING service check along with "check-host-alive", email notifications are slow again.
Are there directives that can be changed or added to the service below to speed it up, or can check_ping!100.0,20%!500.0,60% be edited for faster checking?

Thanks for the help. It's definitely helping.

define service{
        use                     generic-service
        host_name               myhost.com
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%
}

jdalrymple · Post by **jdalrymple** » Wed Feb 25, 2015 1:05 pm

Take a look at command_check_interval in your nagios.cfg.

Ignoring the problem you're seeing with both checks involved, are host down alerts taking more than 1 minute or so at this point? Do you have a specific need to alert faster than 1 minute (at the latest) after an outage? If so, that is going to be your best bet. Be aware though that this can become wildly resource intensive and adjusting that interval will affect all of your already defined hosts and services in the same way.

agentguerry · Post by **agentguerry** » Wed Feb 25, 2015 3:45 pm

Yeah, a host-down alert seems to take around 8 minutes. I timed it over a couple of rounds.
The recovery email is sent about a minute after the server is back up.

jdalrymple · Post by **jdalrymple** » Wed Feb 25, 2015 4:23 pm

Have you looked at the Alert History (under Reports on the left side) to correlate the time the service goes into a hard failed state with when the alert goes out? Mine works as expected with the settings configured as I explained:

        max_check_attempts      1
        first_notification_delay        0
        check_interval          1

Nagios Support Forum
