Consolidating multiple check_nrpe tests for same host

Posted: Thu Jun 21, 2018 2:02 pm
by mhhall3
I've been a long-time Nagios user in a mostly Linux environment.
I'm new to this forum, so pointers to a prior discussion of this matter would be appreciated.
In multiple recent cases, I'm finding issues relating to the use of multiple "check_nrpe" tests against the same host.

In two of these recent cases, I'm seeing "Socket timeout" even after expanding the timeout value (from 10s to 30s).

Frequently this will cause batches of error e-mails related to check_nrpe failures.
I am just wondering if there might be some kind of per-host "back-off" created if a check_nrpe check fails (or times out).

Any ideas would be appreciated.

Mike

Re: Consolidating multiple check_nrpe tests for same host

Posted: Thu Jun 21, 2018 2:39 pm
by scottwilkerson
There isn't a backoff; however, there are host/service dependencies, and if the host connection is going down then the services shouldn't notify.
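
As a rough sketch of the dependency idea, one could make each NRPE-based service depend on a single "master" NRPE service for the same host, so that when the agent itself is unreachable only that one service notifies. The host name (linuxhost01) and the service descriptions ("NRPE", "Disk Usage") below are hypothetical, and the sketch assumes a service called "NRPE" is already defined for that host:

```
# Hypothetical example: "Disk Usage" depends on the "NRPE" service
# on the same host. Names are placeholders, not from this thread.
define servicedependency {
    host_name                       linuxhost01
    service_description             NRPE
    dependent_host_name             linuxhost01
    dependent_service_description   Disk Usage
    # Keep running the dependent check regardless of the master's state
    execution_failure_criteria      n
    # Suppress notifications for "Disk Usage" while "NRPE" is
    # WARNING, UNKNOWN or CRITICAL
    notification_failure_criteria   w,u,c
}
```

One such definition would be needed per dependent service (or a comma-separated list in dependent_service_description), which is where the consolidation for a host with many check_nrpe tests would come from.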

Also, it is common to set up services with a configuration like this:

	max_check_attempts		5
	check_interval			5
	retry_interval			1
This checks every 5 minutes. If a check fails, it switches to rechecking every 1 minute until there have been 5 failed attempts in total, and only then sends out an alert.

This will require 5 failures in a row before the notification goes out.
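
For context, here is how those three directives might sit inside a complete service definition. This is only a sketch: the host name, service description and remote command (check_disk) are hypothetical, and it assumes a "generic-service" template (as shipped in the sample configuration) plus a check_nrpe command definition that takes the remote command name as $ARG1$:

```
# Hypothetical service; only the three interval directives come
# from the post above.
define service {
    use                     generic-service   ; assumed template
    host_name               linuxhost01       ; placeholder host
    service_description     Disk Usage        ; placeholder name
    check_command           check_nrpe!check_disk
    max_check_attempts      5   ; 5 failed checks in a row before a hard state
    check_interval          5   ; normal checks every 5 time units
    retry_interval          1   ; rechecks every 1 time unit after a failure
}
# With the default interval_length of 60 seconds, a failure at minute 0
# is attempt 1; rechecks at minutes 1, 2, 3 and 4 are attempts 2 to 5,
# so the notification goes out roughly 4 minutes after the first failure.
```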