I've been a long-time Nagios user in a mostly Linux environment.
I'm new to this forum, so pointers to any prior discussion of this matter would be appreciated.
In several recent cases, I've run into problems when running multiple "check_nrpe" checks against the same host.
In two of these cases, I'm seeing "Socket timeout" errors even after raising the timeout value from 10s to 30s.
This frequently causes batches of error e-mails about check_nrpe failures.
I'm wondering whether some kind of per-host "back-off" kicks in when a check_nrpe check fails (or times out).
Any ideas would be appreciated.
Mike
Consolidating multiple check_nrpe tests for same host
Re: Consolidating multiple check_nrpe tests for same host
There isn't a backoff. However, there are host/service dependencies, and if the connection to the host is going down, the dependent services shouldn't notify.
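As a rough sketch of the dependency idea, the definition below makes one NRPE-based service depend on a single "master" NRPE check for the same host. The host name and both service descriptions are made up for illustration, so substitute your own. The assumption is that a lightweight check_nrpe service already exists and can stand in for "is NRPE reachable on this host at all".

```
define servicedependency {
    host_name                       linuxhost01     ; hypothetical host
    service_description             NRPE Agent      ; hypothetical master service (a basic check_nrpe check)
    dependent_host_name             linuxhost01     ; same host
    dependent_service_description   Disk Usage      ; hypothetical NRPE-based service
    execution_failure_criteria      w,u,c           ; skip active checks of the dependent while the master is not OK
    notification_failure_criteria   w,u,c           ; suppress the dependent's notifications while the master is not OK
}
```

With something like this in place, an NRPE outage on the host should produce one alert for the master service rather than one per NRPE check. One definition per dependent service is the simplest form to start with.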
Also, it is common to set up services with a configuration like this:
Code:
max_check_attempts 5    ; consecutive failed checks needed before notifying
check_interval 5        ; minutes between checks while the service is OK
retry_interval 1        ; minutes between rechecks after a failure
With these settings the service is checked every 5 minutes. After the first failure, it is rechecked every 1 minute until either a check succeeds or 5 consecutive checks have failed (the initial failure plus 4 retries).
This will require 5 failures in a row before the notification goes out.
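For context, here is a minimal sketch of where those three directives sit in a complete service definition. The host name, service description, and remote command name (check_load) are placeholders. It also assumes the stock generic-service template from the sample configuration is available, and that check_nrpe is defined as a command taking the remote command name as $ARG1$. The -t 30 in the command line matches the 30-second timeout mentioned in the original question.

```
define command {
    command_name    check_nrpe
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$   ; -t sets the plugin timeout in seconds
}

define service {
    use                     generic-service     ; assumes the sample template exists
    host_name               linuxhost01         ; hypothetical host
    service_description     Load                ; hypothetical service
    check_command           check_nrpe!check_load   ; check_load must be defined in nrpe.cfg on the remote host
    max_check_attempts      5                   ; 5 consecutive failures before a notification
    check_interval          5                   ; minutes between checks while OK
    retry_interval          1                   ; minutes between rechecks after a failure
}
```

Assuming the default interval_length of 60 seconds, a persistent failure would notify roughly 4 minutes after the first failed check, since the first failure counts as attempt 1 and the 4 retries follow at 1-minute intervals.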