Another NRPE timeout puzzle

rdubya
Posts: 40
Joined: Mon Apr 11, 2016 8:38 am

Another NRPE timeout puzzle

Post by rdubya »

The Nagios installation in question is 4.2.4 core.
The OS it's on is CentOS 6.7; its NRPE version is nrpe-2.15-7.el6.x86_64
The client server is CentOS 6.6; its NRPE version is nrpe-2.15-7.el6.x86_64

I've surfed the solutions I could find but I'm still running into a problem with timeouts on an iperf plugin.
SET UP
NAGIOS Server

Code: Select all

cat  /usr/local/nagios/etc/nagios.cfg | grep service_check_timeout=
service_check_timeout=60

Code: Select all

cat /etc/nagios/nrpe.cfg | grep command_timeout=
command_timeout=60
From /usr/local/nagios/etc/objects/commands.cfg

Code: Select all

define command {
   command_name   check_nrpe_iperf
   command_line   $USER1$/check_nrpe_iperf -H $HOSTADDRESS$ -c $ARG1$ -t 60
 }
From /usr/local/nagios/etc/objects/hosts/resource.cfg

Code: Select all

define service{
        use                     generic-service,nagiosgraph
        host_name               resource
        service_description     Network stats
        check_command           check_nrpe!check_nrpe_iperf
        }
NRPE Client
From the NRPE client (which is also the server that runs the iperf query):

Code: Select all

/etc/nagios/nrpe.cfg
command[check_nrpe_iperf]=/usr/local/bin/check_iperf3.pl 10.6.117.4 50 60
TESTING:
NAGIOS Server
The command nominally takes about 50 seconds to set up and run.

FAIL: The command times out in the GUI.

FAIL: The command times out when run from the CLI.

Code: Select all

/usr/local/nagios/libexec/check_nrpe -H resource -c check_nrpe_iperf
CHECK_NRPE: Socket timeout after 10 seconds.
SUCCESS: The command works from the CLI when the timeout is manually applied:

Code: Select all

/usr/local/nagios/libexec/check_nrpe -H resource -c check_nrpe_iperf -t 90
Critical: iperf speed of '10.6.117.4' is 9,01 and [mincrit:50]|Bandwidth=9,01MB
<I'm snipping the rest of the data>
From the NRPE Client
SUCCESS: The plugin works when run directly on the client.

Code: Select all

[root@resource ~]# check_iperf3.pl 10.6.117.4 1 1
OK: iperf returns 2,29MB |Bandwidth=2,29MB
<I'm snipping this too>
One thread suggested checking out the command from the config, but I'm not seeing anything "telling" with this, other than the time out seems to have been ignored.
Image




So I see the timeout is set to 60 in the nagios and nrpe configs, yet NRPE is complaining after 10 seconds. I have it explicitly set in the command definition as well.
I'm hoping that I've spaced out and forgotten something or have a heinous typo somewhere. Any help is appreciated.
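
Three separate timeouts are involved in a setup like this: service_check_timeout in nagios.cfg (how long Nagios lets a service check run), command_timeout in nrpe.cfg (how long the NRPE daemon lets the plugin run), and the -t option of check_nrpe (the socket timeout, which defaults to 10 seconds). The "Socket timeout after 10 seconds" message comes from the last one. The service definition above uses check_nrpe!check_nrpe_iperf, which reads as the command named check_nrpe with check_nrpe_iperf passed as $ARG1$, so it may be worth confirming which command_line actually carries the -t. A minimal sketch, assuming the object file layout shown above; command_line_for is a hypothetical helper, not part of Nagios:

```shell
# Print the command_line of a named command from a Nagios object config file.
# Hypothetical helper. Usage: command_line_for <command_name> <config_file>
command_line_for() {
    awk -v want="$1" '
        # Start of a "define command" block: reset what we know about it.
        /^[ \t]*define[ \t]+command/ { in_block = 1; name = ""; line = ""; next }
        in_block && $1 == "command_name" { name = $2 }
        in_block && $1 == "command_line" {
            sub(/^[ \t]*command_line[ \t]+/, "")
            line = $0
        }
        # Closing brace: print the line only if this was the block we wanted.
        in_block && /^[ \t]*}/ { if (name == want) print line; in_block = 0 }
    ' "$2"
}

# Example (path taken from the post above):
#   command_line_for check_nrpe       /usr/local/nagios/etc/objects/commands.cfg
#   command_line_for check_nrpe_iperf /usr/local/nagios/etc/objects/commands.cfg
```

If the first call prints a line without -t, checks that go through check_nrpe would be expected to fall back to the 10-second default regardless of what the check_nrpe_iperf definition says.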
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: Another NRPE timeout puzzle

Post by dwhitfield »

At https://exchange.nagios.org/components/ ... 1&cf_id=29 the author lists the following as the check:

Code: Select all

 define command {
   command_name   check_nrpe_iperf
   command_line   $USER1$/check_nrpe -H $HOSTADDRESS$ -t 60 -c $ARG1$
 }
What happens if you put the timeout before the argument?

Can you post your entire nrpe.cfg from the remote host? Also, what's the output of find / -name 'nrpe*' on the remote host?
rdubya
Posts: 40
Joined: Mon Apr 11, 2016 8:38 am

Re: Another NRPE timeout puzzle

Post by rdubya »

dwhitfield wrote:At https://exchange.nagios.org/components/ ... 1&cf_id=29 the author lists the following as the check:

Code: Select all

 define command {
   command_name   check_nrpe_iperf
   command_line   $USER1$/check_nrpe -H $HOSTADDRESS$ -t 60 -c $ARG1$
 }
Good catch, same behavior.
dwhitfield wrote:What happens if you put the timeout before the argument?
...also same behavior. And yes, service restarts are being done where needed.
dwhitfield wrote:Can you post your entire nrpe.cfg from the remote host.

Code: Select all

#############################################################################
# Sample NRPE Config File
# Written by: Ethan Galstad (nagios@nagios.org)
#
# Last Modified: 11-23-2007
#
# NOTES:
# This is a sample configuration file for the NRPE daemon.  It needs to be
# located on the remote host that is running the NRPE daemon, not the host
# from which the check_nrpe client is being executed.
#############################################################################

# LOG FACILITY
# The syslog facility that should be used for logging purposes.

log_facility=daemon

# PID FILE
# The name of the file in which the NRPE daemon should write it's process ID
# number.  The file is only written if the NRPE daemon is started by the root
# user and is running in standalone mode.

pid_file=/var/run/nrpe/nrpe.pid

# PORT NUMBER
# Port number we should wait for connections on.
# NOTE: This must be a non-priviledged port (i.e. > 1024).
# NOTE: This option is ignored if NRPE is running under either inetd or xinetd

server_port=5666

# SERVER ADDRESS
# Address that nrpe should bind to in case there are more than one interface
# and you do not want nrpe to bind on all interfaces.
# NOTE: This option is ignored if NRPE is running under either inetd or xinetd

#server_address=127.0.0.1

# NRPE USER
# This determines the effective user that the NRPE daemon should run as.  
# You can either supply a username or a UID.
#
# NOTE: This option is ignored if NRPE is running under either inetd or xinetd

nrpe_user=nrpe

# NRPE GROUP
# This determines the effective group that the NRPE daemon should run as.  
# You can either supply a group name or a GID.
#
# NOTE: This option is ignored if NRPE is running under either inetd or xinetd

nrpe_group=nrpe

# ALLOWED HOST ADDRESSES
# This is an optional comma-delimited list of IP address or hostnames
# that are allowed to talk to the NRPE daemon. Network addresses with a bit mask
# (i.e. 192.168.1.0/24) are also supported. Hostname wildcards are not currently
# supported.
#
# Note: The daemon only does rudimentary checking of the client's IP
# address.  I would highly recommend adding entries in your /etc/hosts.allow
# file to allow only the specified host to connect to the port
# you are running this daemon on.
#
# NOTE: This option is ignored if NRPE is running under either inetd or xinetd
allowed_hosts=10.177.176.106,10.100.8.106,10.177.176.2

# COMMAND ARGUMENT PROCESSING
# This option determines whether or not the NRPE daemon will allow clients
# to specify arguments to commands that are executed.  This option only works
# if the daemon was configured with the --enable-command-args configure script
# option.
#
# *** ENABLING THIS OPTION IS A SECURITY RISK! ***
# Read the SECURITY file for information on some of the security implications
# of enabling this variable.
#
# Values: 0=do not allow arguments, 1=allow command arguments

dont_blame_nrpe=0

# BASH COMMAND SUBTITUTION
# This option determines whether or not the NRPE daemon will allow clients
# to specify arguments that contain bash command substitutions of the form
# $(...).  This option only works if the daemon was configured with both
# the --enable-command-args and --enable-bash-command-substitution configure
# script options.
#
# *** ENABLING THIS OPTION IS A HIGH SECURITY RISK! ***
# Read the SECURITY file for information on some of the security implications
# of enabling this variable.
#
# Values: 0=do not allow bash command substitutions, 
#         1=allow bash command substitutions

allow_bash_command_substitution=0

# COMMAND PREFIX
# This option allows you to prefix all commands with a user-defined string.
# A space is automatically added between the specified prefix string and the
# command line from the command definition.
#
# *** THIS EXAMPLE MAY POSE A POTENTIAL SECURITY RISK, SO USE WITH CAUTION! ***
# Usage scenario:
# Execute restricted commmands using sudo.  For this to work, you need to add
# the nagios user to your /etc/sudoers.  An example entry for alllowing
# execution of the plugins from might be:
#
# nagios          ALL=(ALL) NOPASSWD: /usr/lib/nagios/plugins/
#
# This lets the nagios user run all commands in that directory (and only them)
# without asking for a password.  If you do this, make sure you don't give
# random users write access to that directory or its contents!

# command_prefix=/usr/bin/sudo 

# DEBUGGING OPTION
# This option determines whether or not debugging messages are logged to the
# syslog facility.
# Values: 0=debugging off, 1=debugging on

debug=0

# COMMAND TIMEOUT
# This specifies the maximum number of seconds that the NRPE daemon will
# allow plugins to finish executing before killing them off.

command_timeout=60

# CONNECTION TIMEOUT
# This specifies the maximum number of seconds that the NRPE daemon will
# wait for a connection to be established before exiting. This is sometimes
# seen where a network problem stops the SSL being established even though
# all network sessions are connected. This causes the nrpe daemons to
# accumulate, eating system resources. Do not set this too low.

connection_timeout=300

# WEEK RANDOM SEED OPTION
# This directive allows you to use SSL even if your system does not have
# a /dev/random or /dev/urandom (on purpose or because the necessary patches
# were not applied). The random number generator will be seeded from a file
# which is either a file pointed to by the environment valiable $RANDFILE
# or $HOME/.rnd. If neither exists, the pseudo random number generator will
# be initialized and a warning will be issued.
# Values: 0=only seed from /dev/[u]random, 1=also seed from weak randomness

#allow_weak_random_seed=1

# INCLUDE CONFIG FILE
# This directive allows you to include definitions from an external config file.

#include=<somefile.cfg>

# COMMAND DEFINITIONS
# Command definitions that this daemon will run.  Definitions
# are in the following format:
#
# command[<command_name>]=<command_line>
#
# When the daemon receives a request to return the results of <command_name>
# it will execute the command specified by the <command_line> argument.
#
# Unlike Nagios, the command line cannot contain macros - it must be
# typed exactly as it should be executed.
#
# Note: Any plugins that are used in the command lines must reside
# on the machine that this daemon is running on!  The examples below
# assume that you have plugins installed in a /usr/local/nagios/libexec
# directory.  Also note that you will have to modify the definitions below
# to match the argument format the plugins expect.  Remember, these are
# examples only!

# The following examples use hardcoded command arguments...

command[check_users]=/usr/lib64/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib64/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_root]=/usr/lib64/nagios/plugins/check_disk -w 10% -c 5% -p /dev/mapper/vg_cent6goldenimagebob1-lv_root
command[check_data]=/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /dev/mapper/vg_resourcedatadisks-lv_datadisk1
command[check_hostedartifacts]=/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /var/www/html/hosted
command[check_zombie_procs]=/usr/lib64/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/lib64/nagios/plugins/check_procs -w 600 -c 800
command[check_nrpe_iperf]=/usr/local/bin/check_iperf3.pl 10.6.117.4 50 60


# The following examples allow user-supplied arguments and can
# only be used if the NRPE daemon was compiled with support for 
# command arguments *AND* the dont_blame_nrpe directive in this
# config file is set to '1'.  This poses a potential security risk, so
# make sure you read the SECURITY file before doing this.

#command[check_users]=/usr/lib64/nagios/plugins/check_users -w $ARG1$ -c $ARG2$
#command[check_load]=/usr/lib64/nagios/plugins/check_load -w $ARG1$ -c $ARG2$
#command[check_disk]=/usr/lib64/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
#command[check_procs]=/usr/lib64/nagios/plugins/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$

# INCLUDE CONFIG DIRECTORY
# This directive allows you to include definitions from config files (with a
# .cfg extension) in one or more directories (with recursion).

include_dir=/etc/nrpe.d/
dwhitfield wrote:Also, what's the output of find / -name 'nrpe*' on the remote host?

Code: Select all

/etc/sysconfig/nrpe
/etc/rc.d/init.d/nrpe
/etc/nagios/nrpe.cfg
/etc/nrpe.d
/var/lock/subsys/nrpe
/var/www/html/nrpetest
/var/run/nrpe
/var/run/nrpe/nrpe.pid
/usr/sbin/nrpe
/usr/share/augeas/lenses/dist/nrpe.aug
/usr/share/doc/nrpe-2.15
Thanks for the additional eyes, it's much appreciated.
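
Because include_dir pulls in every .cfg file under /etc/nrpe.d/, one way to rule out a stray override is to search the files that find turned up for any uncommented timeout setting. A rough sketch; list_timeout_settings is a hypothetical helper name, and the paths in the usage comment are the ones from the find output above:

```shell
# List uncommented lines that mention "timeout" in the given files or directories.
# Hypothetical helper; prints file:line:text for each match.
list_timeout_settings() {
    # -r recurse, -n line numbers, -H always show the file name, -i ignore case;
    # the second grep drops lines whose text starts with a comment marker.
    grep -rnHi 'timeout' "$@" 2>/dev/null |
        grep -v -E '^[^:]*:[0-9]+:[[:space:]]*#'
}

# Example:
#   list_timeout_settings /etc/nagios/nrpe.cfg /etc/nrpe.d /etc/sysconfig/nrpe
```

With the nrpe.cfg posted above, this would be expected to show command_timeout=60 and connection_timeout=300; any additional line would be worth a closer look.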
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Another NRPE timeout puzzle

Post by tgriep »

After increasing the timeout for the check_nrpe command on the Nagios server, did the check's timeout increase from 10 seconds to 60 seconds?

Can you run the iperf3 command on the remote system like below and post how long it took to run?
time check_iperf3.pl 10.6.117.4 1 1
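
When comparing runs, it can help to see the plugin's exit status next to the elapsed time, since Nagios plugins report state through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). A small sketch of a wrapper that prints both; timed_run is a hypothetical helper name, and the command in the usage comment is the one from earlier in this thread:

```shell
# Run a command, then report its exit status and wall-clock duration in
# whole seconds. Hypothetical helper; the command's own output passes through.
timed_run() {
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "exit=${status} elapsed=$((end - start))s"
    return "$status"
}

# Example, run from the Nagios server:
#   timed_run /usr/local/nagios/libexec/check_nrpe -H resource -c check_nrpe_iperf -t 60
```

An elapsed time close to 10 seconds alongside a socket timeout message would suggest the -t value is not reaching check_nrpe.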
Be sure to check out our Knowledgebase for helpful articles and solutions!
rdubya
Posts: 40
Joined: Mon Apr 11, 2016 8:38 am

Re: Another NRPE timeout puzzle

Post by rdubya »

Yep. It always fails at the 10 second mark. I estimated 60 seconds.
This: time check_iperf3.pl 10.6.117.4 1 1
...gets me this:
real 0m36.677s
user 0m0.204s
sys 0m0.514s

Running it with the arguments from the config gets me effectively the same:
time check_iperf3.pl 10.6.117.4 50 60
real 0m36.702s
user 0m0.220s
sys 0m0.575s
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Another NRPE timeout puzzle

Post by tgriep »

I would try this: edit the generic check_nrpe command on the Nagios server and add -t 60 to it.
That way, every service check that uses the check_nrpe command will have its timeout increased to 60 seconds.
rdubya
Posts: 40
Joined: Mon Apr 11, 2016 8:38 am

Re: Another NRPE timeout puzzle

Post by rdubya »

Nope, that didn't work no matter where I put the parameter. This 10 second timeout is tenacious.
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: Another NRPE timeout puzzle

Post by dwhitfield »

rdubya wrote: I'm still running into a problem with timeouts on an iperf plugin.
Let's take a step back. Are other checks having this issue? If not, I don't know if there are other plugins that will do what you want, but have you looked for replacements?

If other checks are having this issue, maybe it would be helpful to work with something like check_http which is easier to test.
rdubya
Posts: 40
Joined: Mon Apr 11, 2016 8:38 am

Re: Another NRPE timeout puzzle

Post by rdubya »

Everything else works fine. This isn't a huge environment but in addition to memory, storage and workload I watch websites, ports, Jenkins jobs, etc, etc and use custom plugins as well. This would be a home run if it worked, it just seems strange that I'm stuck behind an immovable 10 second timeout. I'd just as soon fix it now, since the use case for this is pretty important at the moment. Thanks.
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: Another NRPE timeout puzzle

Post by dwhitfield »

rdubya wrote: This isn't a huge environment but in addition to memory, storage and workload I watch websites, ports, Jenkins jobs, etc, etc and use custom plugins as well.
I'm not seeing a lot of these on that host. What else are you checking on this host?

It looks to me like you have a lot of NRPE stuff going on on that server. If it were me, I'd blow everything away on that server related to NRPE and start over with https://assets.nagios.com/downloads/nag ... _Agent.pdf

Alternatively, you could create a check that takes longer than 10 seconds and then see if it times out at 10 seconds. I suspect one of those other NRPE files has a timeout in it and that is what is getting used, which is why I suggested just blowing them all away and using our agent.
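
The "check that takes longer than 10 seconds" idea could look something like the following. This is only a sketch: check_sleep, the script path, and the nrpe.cfg entry in the comments are all hypothetical names chosen for illustration.

```shell
#!/bin/sh
# check_sleep: hypothetical test plugin that sleeps for a while, then reports OK.
# Usage: check_sleep <seconds>   (defaults to 15 seconds)
check_sleep() {
    secs=${1:-15}
    sleep "$secs"
    # Plugin output: status text, then performance data after the pipe.
    echo "OK: slept ${secs}s |slept=${secs}s"
    return 0
}

# When saved as a standalone script, end the file with:
#   check_sleep "$@"
#
# Hypothetical nrpe.cfg entry on the remote host:
#   command[check_sleep]=/usr/local/bin/check_sleep.sh 15
# Then, from the Nagios server:
#   /usr/local/nagios/libexec/check_nrpe -H resource -c check_sleep -t 30
```

If this also dies at the 10-second mark with -t 30 on the command line, the problem is unlikely to be specific to the iperf plugin.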