
NRPE Socket timeout after 10 seconds

Posted: Sun Feb 12, 2017 3:44 pm
by kwhogster
Nagios 4.1 Core

Every so often I get "NRPE Socket timeout after 10 seconds" on many of my services that use NRPE.

So I looked at my NRPE commands and found that no -T value was defined.

My current NRPE commands:

Code:

define command{
        command_name    check_nrpe
        command_line    /usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ $ARG2$ $ARG3$ $ARG4$
}

define command{
        command_name    check_nrpe_test
        command_line    /usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ $ARG2$ $ARG3$ $ARG4$ > /tmp/yourlog.txt
}

define command{
        command_name    check_users
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -a $ARG2$ $ARG3$
}
define command{
        command_name    check_windows_users
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_users -a 2 3 "$_HOSTALLOWEDUSERS$"
}
define command{
        command_name    check_ms_win_updates
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_ms_win_updates -a '-wd 15 -cd 30 -M PSWindowsUpdate'
}

As an example, I did this:

Code:

define command{
        command_name    check_nrpe
        command_line    /usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -T 60 -c $ARG1$ $ARG2$ $ARG3$ $ARG4$
}

define command{
        command_name    check_nrpe_test
        command_line    /usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -T 60 -c $ARG1$ $ARG2$ $ARG3$ $ARG4$ > /tmp/yourlog.txt
}

define command{
        command_name    check_users
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -T 60 -c $ARG1$ -a $ARG2$ $ARG3$
}
define command{
        command_name    check_windows_users
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -T 60 -c check_users -a 2 3 "$_HOSTALLOWEDUSERS$"
}
define command{
        command_name    check_ms_win_updates
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -T 60 -c check_ms_win_updates -a '-wd 15 -cd 30 -M PSWindowsUpdate'
}

After restarting Nagios, all the NRPE-defined services went to UNKNOWN status, so I had to revert to the original definitions.

I looked at the NRPE documentation, and it shows -T after the -c command option.

Does it matter where I place it?

Thanks
Tom

Re: NRPE Socket timeout after 10 seconds

Posted: Mon Feb 13, 2017 10:50 am
by dwhitfield
If you are trying to set a timeout, use the lowercase t. Please let us know if I am missing what you are trying to do, and if that doesn't work for you.

Re: NRPE Socket timeout after 10 seconds

Posted: Mon Feb 13, 2017 9:40 pm
by kwhogster
dwhitfield

Yes, I am trying to set the timeout value.

I used the lowercase t, and this time the services did not fail after the restart.

I set it to -t 60, hoping not to get any more timeouts.

Do you have a suggestion for a good timeout value?


Update:

Even after restarting Nagios with -t 60, I am still getting "CHECK_NRPE: Socket timeout after 10 seconds."

Shouldn't it be after 60 seconds?

Thoughts?

Re: NRPE Socket timeout after 10 seconds

Posted: Tue Feb 14, 2017 10:28 am
by dwhitfield
On the remote host, what's the output of grep command_timeout /usr/local/nagios/etc/nrpe.cfg?

60 is a reasonable timeout. You could probably get away with a smaller number, but since you already have that, you might as well stick with it.
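
To see both timeout settings at once, the grep can be widened a little. This is only a minimal sketch: `show_nrpe_timeouts` is a hypothetical helper name, and the default path is the one assumed above, so pass the real location if your nrpe.cfg lives elsewhere.

```shell
#!/bin/sh
# Hypothetical helper: print the active timeout settings from an NRPE config file.
# The default path is an assumption; pass another path as the first argument.
show_nrpe_timeouts() {
    cfg="${1:-/usr/local/nagios/etc/nrpe.cfg}"
    # Match only uncommented command_timeout / connection_timeout lines.
    grep -E '^[[:space:]]*(command|connection)_timeout[[:space:]]*=' "$cfg"
}
```

Run it on the remote host as `show_nrpe_timeouts` or `show_nrpe_timeouts /path/to/nrpe.cfg`.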

Re: NRPE Socket timeout after 10 seconds

Posted: Tue Feb 14, 2017 7:01 pm
by me@work55
grep does not run on Windows machines.

Which remote host do you mean?

Also, why is it still showing 10 seconds? Is that set in NSClient?

Re: NRPE Socket timeout after 10 seconds

Posted: Tue Feb 14, 2017 8:04 pm
by kwhogster
I found the nrpe.cfg file in /etc/nagios:

Code:

# COMMAND TIMEOUT
# This specifies the maximum number of seconds that the NRPE daemon will
# allow plugins to finish executing before killing them off.

command_timeout=60



# CONNECTION TIMEOUT
# This specifies the maximum number of seconds that the NRPE daemon will
# wait for a connection to be established before exiting. This is sometimes
# seen where a network problem stops the SSL being established even though
# all network sessions are connected. This causes the nrpe daemons to
# accumulate, eating system resources. Do not set this too low.

connection_timeout=300


command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/hda1
command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 150 -c 200


So why is it still timing out after only 10 seconds?

Re: NRPE Socket timeout after 10 seconds

Posted: Wed Feb 15, 2017 3:35 pm
by rkennedy
The configuration file looks fine.

Can you attempt to run the command over the CLI and show us the full input / output to verify things?

In my mind, it's always a bit easier here as you can make sure it works BEFORE moving it into a setup. Then you know what's working and what isn't for sure.

Re: NRPE Socket timeout after 10 seconds

Posted: Wed Feb 15, 2017 7:49 pm
by kwhogster
rkennedy

Over the CLI? From the Nagios server?

The checks are working; they just still time out after 10 seconds, as if the updated commands are not being used.

Should I place the -t on the service definition instead?

An example, please.
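
Since check_nrpe falls back to a 10-second timeout when it is run without -t, one thing worth ruling out is a command definition that still calls it without that option. The sketch below lists such lines. `find_nrpe_without_timeout` is a hypothetical helper name, and the directory to search is an assumption; point it at wherever your object files actually live.

```shell
#!/bin/sh
# Hypothetical helper: list command_line entries that call check_nrpe
# without a lowercase -t option. Pass the directory holding the object files.
find_nrpe_without_timeout() {
    dir="$1"
    grep -rn 'command_line' "$dir" \
        | grep 'check_nrpe' \
        | grep -v -- ' -t '
}
```

For example, `find_nrpe_without_timeout /usr/local/nagios/etc/objects` prints the file name and line number of each matching definition. No output means every check_nrpe command line in that directory already carries -t.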

Re: NRPE Socket timeout after 10 seconds

Posted: Thu Feb 16, 2017 10:15 am
by rkennedy
kwhogster wrote:
Over the CLI? From the Nagios server?

The checks are working; they just still time out after 10 seconds, as if the updated commands are not being used.

Should I place the -t on the service definition instead?

An example, please.

From the command line of the Nagios server. Please show us.
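
When running a plugin by hand, it helps to capture the exit code along with the output, because Nagios derives the service state from it (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). A minimal sketch follows; `run_check` is a hypothetical wrapper name, and the host in the usage comment is a placeholder.

```shell
#!/bin/sh
# Hypothetical wrapper: run a plugin command line, then report its exit code
# using the standard plugin return codes.
run_check() {
    rc=0
    "$@" || rc=$?
    case "$rc" in
        0) state=OK ;;
        1) state=WARNING ;;
        2) state=CRITICAL ;;
        3) state=UNKNOWN ;;
        *) state="out of range" ;;
    esac
    echo "exit code: $rc ($state)"
    return "$rc"
}

# Usage (replace <host> with the machine being checked):
# run_check /usr/lib/nagios/plugins/check_nrpe -H <host> -t 60 -c check_users
```

Posting the full command line together with the plugin output and the reported exit code gives us everything we need to compare against what Nagios itself is running.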

Re: NRPE Socket timeout after 10 seconds

Posted: Thu Feb 16, 2017 9:09 pm
by kwhogster
rkennedy

It runs great from the command line:

root@tgcs017:/usr/local/nagios/etc/objects# /usr/lib/nagios/plugins/check_nrpe -H TGCS011 -t 60 -c check_users -a 2 3 administrator
OK: 1 user logged in
Active Sessions: Administrator

Still puzzled.

The same thing happens at my job site, but there we are on an older Nagios version.