Probably most significant: I've found that this error only occurs with version 2.15 of the check_nrpe plugin. I rolled back to an earlier version, v2.12, which exits with code 2 (CRITICAL) on the same connection error, and Nagios interprets that correctly.
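For reference, Nagios plugins are expected to exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN); any other code is the "out of bounds" case. One possible stopgap until the v2.15 behaviour is sorted out is a wrapper that passes the plugin's output through and forces any out-of-range code to 2, which is what v2.12 returns here. This is only a sketch under my own assumptions: the script name check_rc_wrapper.py and the helper names are hypothetical and not part of NRPE.

```python
#!/usr/bin/env python3
# Hypothetical wrapper: run a Nagios plugin and map out-of-range exit codes to CRITICAL.
import subprocess
import sys

# Valid Nagios plugin return codes; anything else is "out of bounds".
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}
CRITICAL = 2


def normalize_exit_code(rc: int) -> int:
    """Return rc unchanged if it is a valid plugin code, otherwise CRITICAL.

    Assumption: treating every out-of-range code (including negative codes
    from a signal) as CRITICAL mimics check_nrpe v2.12 on a connection error.
    """
    return rc if rc in NAGIOS_STATES else CRITICAL


def run_plugin(argv: list[str]) -> int:
    """Run the wrapped plugin, forward its output, and return a normalized code."""
    result = subprocess.run(argv, capture_output=True, text=True)
    sys.stdout.write(result.stdout)  # Nagios reads the status text from stdout
    sys.stderr.write(result.stderr)
    return normalize_exit_code(result.returncode)


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. check_rc_wrapper.py /path/to/check_nrpe -H aumelbou-kof -c alias_mem
    sys.exit(run_plugin(sys.argv[1:]))
```

The command definition would then call the wrapper with the real check_nrpe path and arguments after it; the wrapper itself does not care which plugin it runs.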
Service config for the memory check:
define service{
use local-service,graphed-service
host_name aumelbou-kof
service_description Memory
contact_groups admins
check_command check_win_nrpe!alias_mem
}
Templates referenced:
define service{
name local-service
use generic-service
max_check_attempts 4
normal_check_interval 2
retry_check_interval 1
register 0
}
define service {
name graphed-service
action_url /nagios/cgi-bin/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$' onMouseOver='showGraphPopup(this)' onMouseOut='hideGraphPopup()' rel='/nagios/cgi-bin/showgraph.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&period=week&rrdopts=-w+450+-j
register 0
}
define service{
name generic-service
active_checks_enabled 1
passive_checks_enabled 1
parallelize_check 1
obsess_over_service 1
check_freshness 0
notifications_enabled 1
event_handler_enabled 1
flap_detection_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 10
retry_check_interval 2
contact_groups admins
notification_options w,u,c,r
notification_interval 0
notification_period 24x7
register 0
}
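With use local-service,graphed-service the Memory service inherits from both templates, and local-service in turn inherits from generic-service. As I understand the Nagios inheritance rules, directives set on the object itself win, the first template listed takes precedence over later ones, and register is not inherited. The sketch below is a simplified, hypothetical resolver that models only a handful of the directives above (it ignores additive "+" values and other edge cases) to show the effective retry settings for the Memory check.

```python
# Simplified, hypothetical model of Nagios multiple-template inheritance.
# Assumption: "name", "use" and "register" are never passed down to children.
NOT_INHERITED = {"name", "use", "register"}

TEMPLATES = {
    "generic-service": {
        "name": "generic-service",
        "check_period": "24x7",
        "max_check_attempts": "3",
        "normal_check_interval": "10",
        "retry_check_interval": "2",
        "contact_groups": "admins",
        "register": "0",
    },
    "local-service": {
        "name": "local-service",
        "use": "generic-service",
        "max_check_attempts": "4",
        "normal_check_interval": "2",
        "retry_check_interval": "1",
        "register": "0",
    },
    "graphed-service": {
        "name": "graphed-service",
        "action_url": "(graph popup URL, abbreviated here)",
        "register": "0",
    },
}


def resolve(obj: dict[str, str], templates: dict[str, dict[str, str]]) -> dict[str, str]:
    """Return the effective directives of obj after applying its templates."""
    merged: dict[str, str] = {}
    parents = [p.strip() for p in obj.get("use", "").split(",") if p.strip()]
    # The first template listed wins, so apply the list in reverse order.
    for parent_name in reversed(parents):
        inherited = resolve(templates[parent_name], templates)
        merged.update({k: v for k, v in inherited.items() if k not in NOT_INHERITED})
    # Directives defined on the object itself override anything inherited.
    merged.update({k: v for k, v in obj.items() if k != "use"})
    return merged


memory = {
    "use": "local-service,graphed-service",
    "host_name": "aumelbou-kof",
    "service_description": "Memory",
    "contact_groups": "admins",
    "check_command": "check_win_nrpe!alias_mem",
}
effective = resolve(memory, TEMPLATES)
print(effective["max_check_attempts"], effective["normal_check_interval"],
      effective["retry_check_interval"])  # → 4 2 1
```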
And the command called by the service check:
define command{
command_name check_win_nrpe
command_line $USER1$/check_nrpe -H $HOSTNAME$ -c $ARG1$
}
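When Nagios runs check_win_nrpe!alias_mem, the text after the ! becomes $ARG1$, $HOSTNAME$ is the service's host_name, and $USER1$ comes from resource.cfg. A rough model of that substitution follows; the /usr/local/nagios/libexec value for $USER1$ is an assumption (a common default), and expand_command is a hypothetical helper, not a Nagios API.

```python
import re

# Matches macros such as $USER1$, $HOSTNAME$ and $ARG1$.
MACRO_RE = re.compile(r"\$([A-Z0-9_]+)\$")


def expand_command(check_command: str, command_lines: dict[str, str],
                   macros: dict[str, str]) -> str:
    """Expand a check_command like 'name!arg1!arg2' into a full command line."""
    name, *args = check_command.split("!")
    values = dict(macros)
    for i, arg in enumerate(args, start=1):
        values[f"ARG{i}"] = arg
    # Replace each known $MACRO$ with its value; leave unknown macros untouched.
    return MACRO_RE.sub(lambda m: values.get(m.group(1), m.group(0)), command_lines[name])


COMMANDS = {"check_win_nrpe": "$USER1$/check_nrpe -H $HOSTNAME$ -c $ARG1$"}
MACROS = {"USER1": "/usr/local/nagios/libexec", "HOSTNAME": "aumelbou-kof"}
print(expand_command("check_win_nrpe!alias_mem", COMMANDS, MACROS))
# → /usr/local/nagios/libexec/check_nrpe -H aumelbou-kof -c alias_mem
```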
When you check a host that has port 5666 open (so that you can actually run the check successfully), what is returned?
$ ../../libexec/check_nrpe -H aumelbou-kof -c alias_mem
OK: physical memory: Total: 7.97G - Used: 5.22G (65%) - Free: 2.75G (35%)|'physical memory %'=65%;100;100 'physical memory'=5.21999G;7.96699;7.96699;0;7.96699
$ echo $?
0
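The successful output follows the usual plugin format: human-readable text, then a | and space-separated performance data of the form 'label'=value[UOM];warn;crit;min;max. Since generic-service sets process_perf_data 1, this is presumably the part the graphs are built from. A minimal parsing sketch, assuming thresholds are kept as raw strings (so range syntax such as 10:20 is not interpreted); parse_plugin_output is a made-up name.

```python
import re

# One perfdata item: a quoted or bare label, "=", then the semicolon-separated fields.
PERF_RE = re.compile(r"(?:'([^']+)'|([^\s=']+))=(\S+)")
# Numeric value followed by an optional unit of measure (%, G, s, ...).
VALUE_RE = re.compile(r"^(-?\d+(?:\.\d+)?)(.*)$")


def parse_plugin_output(line: str) -> tuple[str, dict[str, dict]]:
    """Split plugin output into its status text and a dict of perfdata metrics."""
    text, _, perf = line.partition("|")
    metrics: dict[str, dict] = {}
    for m in PERF_RE.finditer(perf):
        label = m.group(1) or m.group(2)
        fields = m.group(3).split(";")
        vm = VALUE_RE.match(fields[0])
        # Pad warn/crit/min/max so missing trailing fields become None.
        extra = (fields[1:] + [""] * 4)[:4]
        warn, crit, minimum, maximum = (x or None for x in extra)
        metrics[label] = {
            "value": float(vm.group(1)) if vm else None,
            "uom": vm.group(2) if vm else "",
            "warn": warn, "crit": crit, "min": minimum, "max": maximum,
        }
    return text.strip(), metrics


line = ("OK: physical memory: Total: 7.97G - Used: 5.22G (65%) - Free: 2.75G (35%)"
        "|'physical memory %'=65%;100;100 "
        "'physical memory'=5.21999G;7.96699;7.96699;0;7.96699")
text, metrics = parse_plugin_output(line)
print(metrics["physical memory %"]["value"])  # → 65.0
```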
Do you have multiple hosts that are out of bounds and are returning OK states instead of critical?
No, the out-of-bounds error only occurs when Nagios cannot reach the remote NRPE service (due to intermittent network issues, or when I deliberately turn NRPE off).
Thanks very much for your help so far, I'm really grateful that such a helpful community exists.