The data I'm getting isn't terribly descriptive. Here's the email I get regarding an alert:
***** Nagios *****
Notification Type: PROBLEM
Service: PROC
Host: <hostname>
Address: <ipaddress>
State: CRITICAL
Date/Time: Wed Dec 29 01:35:29 EST 2010
Additional Info:
CHECK_NRPE: Socket timeout after 10 seconds.
The check_nrpe socket timeout would seem to indicate that the Nagios server can't reach the NRPE agent on the monitored client, but the monitoring seems to work just fine (the Service State Information page shows an active check type, when the last check was done, everything green, etc.).
The service definition from services.cfg is thus:
define service{
        host_name               <hostname>
        service_description     PROC
        check_command           check_nrpe!check_cpu!"-n -w 95 -c 100"
        use                     generic-service
        contact_groups          Unix Admins
        }
Running the check by hand without -n gives the same timeout:
/etc/nagios3/conf.d# /usr/lib/nagios/plugins/check_nrpe -H <ipaddress> -c check_cpu
CHECK_NRPE: Socket timeout after 10 seconds.
When I added the -n argument, the check worked:
/etc/nagios3/conf.d# /usr/lib/nagios/plugins/check_nrpe -H <ipaddress> -c check_cpu -n
CPU Usage normal: CPU7: 8.08% CPU6: 7.86% CPU5: 8.13% CPU4: 15.16% CPU3: 7.17% CPU2: 7.51% CPU1: 7.40% CPU0: 35.51%
Any tips as to what I may have misconfigured such that the alert is throwing this "socket timeout"?
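In case it helps anyone reproduce the comparison, here is a minimal sketch of how the two manual runs could be scripted. The function names (build_nrpe_command, run_check, compare) are hypothetical and my own; only the plugin path and the -H, -c and -n flags come from the commands in this post, and the state names are the usual Nagios plugin exit codes. It is an illustration, not something I am running in production.

```python
import subprocess

# Plugin path as used in the manual runs in this post.
CHECK_NRPE = "/usr/lib/nagios/plugins/check_nrpe"

# Conventional Nagios plugin exit codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def build_nrpe_command(host, command, use_ssl=True):
    """Return the check_nrpe argv; -n is appended when SSL is disabled."""
    argv = [CHECK_NRPE, "-H", host, "-c", command]
    if not use_ssl:
        argv.append("-n")
    return argv


def run_check(host, command, use_ssl=True):
    """Run check_nrpe once and return (state name, first line of output)."""
    result = subprocess.run(
        build_nrpe_command(host, command, use_ssl),
        capture_output=True,
        text=True,
    )
    state = STATES.get(result.returncode, "UNKNOWN")
    lines = result.stdout.strip().splitlines()
    return state, (lines[0] if lines else "")


def compare(host, command="check_cpu"):
    """Run the same check with and without SSL so the results can be compared."""
    return {
        "ssl": run_check(host, command, use_ssl=True),
        "no_ssl": run_check(host, command, use_ssl=False),
    }
```

Calling compare("<ipaddress>") on the Nagios server runs both variants back to back, which should make it easier to confirm whether the only difference between the failing and working check is the -n flag.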
Regards,
Don