check_nrpe plugin can't connect, Nagios reports service "OK"
Posted: Tue Mar 25, 2014 8:30 pm
by pato
Just upgraded from 3.5.1 to 4.0.4 and noticed the following when testing notifications by disabling a remote NRPE service:
Nagios reports the service as being in "OK" state, when it should be reporting a critical error. The 4 services in OK state are checked by the check_nrpe plugin.
Running the check from the command line produces the following output:
Code: Select all
$ ../../libexec/check_nrpe -H aumelbou-kof -c check_load
connect to address 10.210.8.228 port 5666: Connection refused
connect to host aumelbou-kof port 5666: Connection refused
$ echo $?
255
I don't know why check_nrpe exits with 255 on a connection error, but Nagios 3.5.1 regarded out-of-bounds exit codes as errors. How can I change this new behaviour that regards them as "OK"?
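For reference, the Nagios plugin convention defines only four exit codes: 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN). Anything else, such as the 255 shown above, counts as out of bounds. A minimal shell sketch of that mapping follows; the function name `state_name` and the "OUT OF BOUNDS" label are illustrative assumptions, not something Nagios or check_nrpe provides.

```shell
#!/bin/sh
# state_name: hypothetical helper that maps a plugin exit code to the
# state name used by the Nagios plugin convention.
# Codes outside 0-3 have no defined state; the label for them is only
# an illustration.
state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "OUT OF BOUNDS" ;;
    esac
}
```

Called as `state_name 255`, the helper falls through to the last branch, which is the case this thread is about.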
Re: check_nrpe plugin can't connect, Nagios reports service
Posted: Wed Mar 26, 2014 12:48 pm
by slansing
You are correct, they should be returned as criticals. Can you share the service configuration of one of these checks, and also the command from commands.cfg that is being used by that service?
In addition, what version of the Nagios plugins are you running? When you check a host that has port 5666 open (so that you can actually run the check successfully), what is returned? Do you have multiple hosts that are out of bounds and are returning OK states instead of critical?
Code: Select all
grep 'out of bounds' /usr/local/nagios/var/status.dat
It may also be that the error code is being read in incorrectly. I'm going to do some looking around on that one.
Re: check_nrpe plugin can't connect, Nagios reports service
Posted: Wed Mar 26, 2014 5:00 pm
by pato
Probably most significant: I've found that this error only occurs with version 2.15 of the check_nrpe plugin. I've rolled back to an earlier version, v2.12, which exits with code 2 on the same connection error, and Nagios interprets this correctly.
Service config for memory check:
Code: Select all
define service{
use local-service,graphed-service
host_name aumelbou-kof
service_description Memory
contact_groups admins
check_command check_win_nrpe!alias_mem
}
Templates referenced:
Code: Select all
define service{
name local-service
use generic-service
max_check_attempts 4
normal_check_interval 2
retry_check_interval 1
register 0
}
Code: Select all
define service {
name graphed-service
action_url /nagios/cgi-bin/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$' onMouseOver='showGraphPopup(this)' onMouseOut='hideGraphPopup()' rel='/nagios/cgi-bin/showgraph.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&period=week&rrdopts=-w+450+-j
register 0
}
Code: Select all
define service{
name generic-service
active_checks_enabled 1
passive_checks_enabled 1
parallelize_check 1
obsess_over_service 1
check_freshness 0
notifications_enabled 1
event_handler_enabled 1
flap_detection_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 10
retry_check_interval 2
contact_groups admins
notification_options w,u,c,r
notification_interval 0
notification_period 24x7
register 0
}
And the command called by the service check:
Code: Select all
define command{
command_name check_win_nrpe
command_line $USER1$/check_nrpe -H $HOSTNAME$ -c $ARG1$
}
slansing wrote:When you check a host that has port 5666 open (so that you can actually run the check successfully), what is returned?
Code: Select all
$ ../../libexec/check_nrpe -H aumelbou-kof -c alias_mem
OK: physical memory: Total: 7.97G - Used: 5.22G (65%) - Free: 2.75G (35%)|'physical memory %'=65%;100;100 'physical memory'=5.21999G;7.96699;7.96699;0;7.96699
$ echo $?
0
slansing wrote:Do you have multiple hosts that are out of bounds and are returning OK states instead of critical?
No, the out-of-bounds error only occurs when the Nagios service cannot reach the remote NRPE service (due to intermittent network issues or when I purposely turn off NRPE).
Thanks very much for your help so far, I'm really grateful that such a helpful community exists.
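For setups that have to stay on a plugin version that exits with an out-of-range code, one possible workaround is a small wrapper that clamps the exit code before Nagios sees it. The sketch below is only an assumption about how that could be done: the function name `run_clamped` is hypothetical, and mapping every out-of-range code to CRITICAL (2) is a design choice made here, not documented Nagios behaviour.

```shell
#!/bin/sh
# run_clamped: hypothetical wrapper that runs a plugin command and keeps
# its exit code inside the 0-3 range defined by the plugin convention.
run_clamped() {
    rc=0
    # Run the plugin with its arguments; capture a non-zero exit code
    # without aborting if the calling shell uses "set -e".
    "$@" || rc=$?
    case "$rc" in
        0|1|2|3)
            # Already a valid plugin state: pass it through unchanged.
            return "$rc"
            ;;
        *)
            # Assumption: treat anything else (255, 126, 127, ...) as CRITICAL.
            echo "CRITICAL: plugin exit code $rc is out of bounds"
            return 2
            ;;
    esac
}
```

It would be invoked as `run_clamped /usr/local/nagios/libexec/check_nrpe -H aumelbou-kof -c check_load`. Because the shell itself returns 126 for a non-executable file and 127 for a missing one, the same wrapper also turns those cases into a CRITICAL result.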
Re: check_nrpe plugin can't connect, Nagios reports service
Posted: Thu Mar 27, 2014 1:23 pm
by lmiltchev
Let's clarify: when you use NRPE v2.12 and you have a "Connection refused" error, running:
produces the correct output:
but if you use NRPE v2.15, you get:
Right?
Re: check_nrpe plugin can't connect, Nagios reports service
Posted: Mon Apr 21, 2014 10:00 pm
by pato
lmiltchev wrote:Let's clarify: when you use NRPE v2.12 and you have a "Connection refused" error, running:
produces the correct output:
but if you use NRPE v2.15, you get:
Right?
That's correct (and sorry for the late reply on this one)
Re: check_nrpe plugin can't connect, Nagios reports service
Posted: Mon Apr 21, 2014 10:16 pm
by pato
Dragging this thread back up to update briefly. I've since found another scenario that results in an out-of-bounds error being interpreted with "OK" status when it really (really) shouldn't be: when a command file is not executable by the nagios user:
Code: Select all
$ whoami
nagios
$ /usr/local/nagios/libexec/check_psqsq20_waiting_tasks.pl
bash: /usr/local/nagios/libexec/check_psqsq20_waiting_tasks.pl: Permission denied
$ echo $?
126
Results in:
This is bad! As you can see this went unnoticed for almost a month due to the incorrect status applied to the return code. I've since put in a check for "out of bounds" in status.dat, but ideally this would be handled correctly by Nagios.
Now on Nagios Core 4.0.5, by the way.
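The status.dat check mentioned above could look roughly like the following sketch. It reuses the status.dat path and the 'out of bounds' search string from earlier in the thread; the function name `check_out_of_bounds` is hypothetical, and this is not the script pato actually deployed.

```shell
#!/bin/sh
# check_out_of_bounds: hypothetical plugin-style check that scans a Nagios
# status file for lines containing "out of bounds".
check_out_of_bounds() {
    # Default path taken from the grep command earlier in the thread.
    status_file=${1:-/usr/local/nagios/var/status.dat}

    if [ ! -r "$status_file" ]; then
        echo "UNKNOWN: cannot read $status_file"
        return 3
    fi

    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so "|| true" keeps the assignment from failing.
    count=$(grep -c 'out of bounds' "$status_file" || true)

    if [ "$count" -gt 0 ]; then
        echo "CRITICAL: $count line(s) in $status_file mention 'out of bounds'"
        return 2
    fi

    echo "OK: no 'out of bounds' results in $status_file"
    return 0
}
```

The function returns codes in the 0-3 range only, so its own result cannot be misread in the way described in this thread.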
Re: check_nrpe plugin can't connect, Nagios reports service
Posted: Tue Apr 22, 2014 2:08 pm
by lmiltchev
We will be testing this and filing a bug report if we are able to recreate the issue. If you wish to post a bug report on our bug tracker on your own, you are welcome to do so. Thanks for the feedback!
Re: check_nrpe plugin can't connect, Nagios reports service
Posted: Tue Apr 22, 2014 8:00 pm
by pato
lmiltchev wrote:We will be testing this and filing a bug report if we are able to recreate the issue. If you wish to post a bug report on our bug tracker on your own, you are welcome to do so. Thanks for the feedback!
Thank you. I've submitted a report:
http://tracker.nagios.org/view.php?id=602
Re: check_nrpe plugin can't connect, Nagios reports service
Posted: Wed Apr 23, 2014 10:26 am
by lmiltchev
Thanks, pato!
Re: check_nrpe plugin can't connect, Nagios reports service
Posted: Wed Apr 30, 2014 9:34 pm
by pato
Fixes in Nagios Core 4.0.6 resolve this issue :)