Command checkveeambu didn't terminate within the timeout per

Posted: Wed Mar 25, 2020 4:28 pm
by kwhogster
Using Nagios Core 4.3.4
I have two Windows servers, 2012 R2 and 2016; both run Veeam B&R 10.

I have a PowerShell script to check the backup jobs, replication jobs, and copy jobs.
The same script is on both servers.

On the 2016 server I am getting the following errors:

Host: TGCS024
Service: Win 12 VM Backup
Status: CRITICAL
Last check: 03-25-2020 14:25:37
Duration: 0d 2h 49m 43s
Attempt: 10/10
Status information: CHECK_NRPE: Socket timeout after 120 seconds.

Host: TGCS024
Service: Linux Backup Copy Job
Status: UNKNOWN
Last check: 03-25-2020 14:25:31
Duration: 0d 2h 50m 53s
Attempt: 10/10
Status information: Command checkveeambu didn't terminate within the timeout period 60s


From the Nagios server I run from the command line

root@tgcs017:/usr/lib/nagios/plugins# ./check_nrpe -u -H TGCS024 -t 120 -c checkveeambu -a 'Linux VM backup' 1
CHECK_NRPE: Socket timeout after 120 seconds.
root@tgcs017:/usr/lib/nagios/plugins# ./check_nrpe -u -H TGCS024 -t 240 -c checkveeambu -a 'Linux VM backup' 1
Command checkveeambu didn't terminate within the timeout period 60s


From Nagios cfg file

Code:

define service{
        use                     generic-service
        host_name               hostname
        service_description     Linux VM Backup
        check_interval          1440
        notification_interval   1440
        check_command           check_nrpe!checkveeambu! -a 'Linux VM Backup' 1
        servicegroups           Veeam
        }
From my nsclient

Code:

check veeam backups
checkveeambu = cmd /c echo scripts/powershell/check_veeam_backups.ps1 "$ARG1$" "$ARG2$"; exit $LastExitCode | powershell.exe -command -

On the 2012 R2 server all the checks work.

When I run the script directly on the server it takes a while to complete: 1 min 52 seconds.
PS C:\program files\nsclient++\scripts\powershell> .\check_veeam_backups.ps1 'linux vm backup' 1
Linux VM Backup Stopped 100% Success


Any thoughts or ideas?

Thank you

Tom

Re: Command checkveeambu didn't terminate within the timeout

Posted: Wed Mar 25, 2020 9:27 pm
by Box293
There are two separate things going on here but they both relate to timing.
kwhogster wrote:root@tgcs017:/usr/lib/nagios/plugins# ./check_nrpe -u -H TGCS024 -t 120 -c checkveeambu -a 'Linux VM backup' 1
CHECK_NRPE: Socket timeout after 120 seconds.
kwhogster wrote:When I run the script on the server directly it takes awhile to complete. It took 1min 52 seconds
Basically, a timeout of 120 is not enough, because establishing the connection and starting PowerShell on the remote system add overhead on top of the script's own 1 min 52 s run time. I suspect that if you set the timeout to 150, this command would succeed.
kwhogster wrote:Status information: Command checkveeambu didn't terminate within the timeout period 60s
Nagios itself has a default global timeout of 60 seconds. If you want to wait for a check with a timeout of 150, then the global timeout must be greater than 150. Please refer to the following KB article, specifically the section Nagios XI Global Timeout.
https://support.nagios.com/kb/article/n ... s-617.html
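
The layering described above can be sketched as a small shell function. This is only an illustration: the function name and the layer labels are hypothetical, and the elapsed time of 130 seconds is an assumed figure (the 1 min 52 s script run plus some connection and PowerShell start-up overhead). It reports which limit, if any, is reached before the check finishes.

```shell
#!/bin/sh
# Hypothetical helper: report which timeout layer would fire first.
# Arguments (all in seconds):
#   $1 elapsed time the check needs (script run time plus overhead, assumed)
#   $2 command timeout on the remote agent
#   $3 check_nrpe -t value
#   $4 service_check_timeout in nagios.cfg
first_timeout() {
    winner="none"
    limit=$1
    # A layer fires only if its limit is below the time the check needs;
    # among those, the smallest limit is reached first.
    for pair in "agent:$2" "check_nrpe:$3" "nagios:$4"; do
        name=${pair%%:*}
        value=${pair##*:}
        if [ "$value" -lt "$limit" ]; then
            winner=$name
            limit=$value
        fi
    done
    echo "$winner"
}

first_timeout 130 60 120 120    # remote agent limit is the shortest
first_timeout 130 150 120 240   # check_nrpe -t is now the bottleneck
first_timeout 130 150 240 240   # every layer allows enough time
```

The model is deliberately simple: it treats every layer as seeing the same elapsed time, which is not exactly true in practice, so it is best read as a rule of thumb that each limit has to exceed the real run time of the check.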

Re: Command checkveeambu didn't terminate within the timeout

Posted: Wed Mar 25, 2020 9:39 pm
by kwhogster
Box293

Thanks for the reply.

I thought the -t option increases the timeout; in one of my examples I had -t 240.

Is there another setting? This is Core, not XI.

Thanks

Tom

In my Nagios.cfg

Code:

# TIMEOUT VALUES
# These options control how much time Nagios will allow various
# types of commands to execute before killing them off.  Options
# are available for controlling maximum time allotted for
# service checks, host checks, event handlers, notifications, the
# ocsp command, and performance data commands.  All values are in
# seconds.

service_check_timeout=120
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5

Re: Command checkveeambu didn't terminate within the timeout

Posted: Thu Mar 26, 2020 7:38 am
by kwhogster
Update
I found NRPE.CFG

made this change

Code:

# COMMAND TIMEOUT
# This specifies the maximum number of seconds that the NRPE daemon will
# allow plugins to finish executing before killing them off.

#command_timeout=60
command_timeout=150



# CONNECTION TIMEOUT
# This specifies the maximum number of seconds that the NRPE daemon will
# wait for a connection to be established before exiting. This is sometimes
# seen where a network problem stops the SSL being established even though
# all network sessions are connected. This causes the nrpe daemons to
# accumulate, eating system resources. Do not set this too low.

connection_timeout=300


restarted the nrpe service

sudo /etc/init.d/nagios-nrpe-server restart
[ ok ] Restarting nagios-nrpe-server (via systemctl): nagios-nrpe-server.service.

I tried this one first
root@tgcs017:/usr/lib/nagios/plugins# ./check_nrpe -u -H TGCS024 -t 30 -c checkveeambu -a 'Linux VM backup' 1
CHECK_NRPE: Socket timeout after 30 seconds.

Then this one
/usr/lib/nagios/plugins# ./check_nrpe -u -H TGCS024 -t 120 -c checkveeambu -a 'Linux VM backup' 1
Command checkveeambu didn't terminate within the timeout period 60s

Any ideas?

Re: Command checkveeambu didn't terminate within the timeout

Posted: Thu Mar 26, 2020 3:49 pm
by Box293
kwhogster wrote:Then this one
/usr/lib/nagios/plugins# ./check_nrpe -u -H TGCS024 -t 120 -c checkveeambu -a 'Linux VM backup' 1
Command checkveeambu didn't terminate within the timeout period 60s
This is the key test we need to resolve.

If it's saying it timed out after 60 seconds, then the command_timeout setting on your NRPE client on the remote end is being ignored. Even though you said you restarted the service, something is not right. I would restart the entire server and test again, just to rule out the setting not having been applied.
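
The two error strings seen in this thread point at different layers, and a small shell helper can make that distinction explicit. The function name is hypothetical, and the patterns are taken only from the messages quoted in this thread; other versions of the tools may word them differently.

```shell
#!/bin/sh
# Hypothetical helper: map a check_nrpe output line to the layer that gave up.
classify_timeout() {
    case "$1" in
        "CHECK_NRPE: Socket timeout after "*)
            # check_nrpe itself stopped waiting (its -t value was reached)
            echo "check_nrpe -t on the Nagios server" ;;
        *"didn't terminate within the timeout period"*)
            # the remote agent stopped the command after its own limit
            echo "command timeout on the remote agent" ;;
        *)
            echo "not a timeout message seen in this thread" ;;
    esac
}

classify_timeout "CHECK_NRPE: Socket timeout after 120 seconds."
classify_timeout "Command checkveeambu didn't terminate within the timeout period 60s"
```

A message of the second kind means the limit to raise is on the remote host, so changing `-t` or settings on the Nagios server alone would not be expected to alter the 60s figure.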

Re: Command checkveeambu didn't terminate within the timeout

Posted: Fri Mar 27, 2020 7:41 am
by kwhogster
Troy,

I restarted the Ubuntu server that Nagios runs on; same issue.
Is it possible that I have more than one nrpe.cfg?

Re: Command checkveeambu didn't terminate within the timeout

Posted: Mon Mar 30, 2020 2:58 pm
by scottwilkerson
There is also the service_check_timeout in the nagios.cfg on the nagios server you could be hitting.

Re: Command checkveeambu didn't terminate within the timeout

Posted: Mon Mar 30, 2020 3:30 pm
by kwhogster
Thanks

My Nagios.cfg

Code:

# TIMEOUT VALUES
# These options control how much time Nagios will allow various
# types of commands to execute before killing them off.  Options
# are available for controlling maximum time allotted for
# service checks, host checks, event handlers, notifications, the
# ocsp command, and performance data commands.  All values are in
# seconds.

#service_check_timeout=120
service_check_timeout=240
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5
I restarted the Nagios service after saving the Nagios.cfg file

I doubled the service_check_timeout; still the same error.

root@tgcs017:/usr/lib/nagios/plugins# ./check_nrpe -u -H TGCS024 -t 120 -c checkveeambu -a 'Linux VM backup' 1
Command checkveeambu didn't terminate within the timeout period 60s


This is very strange.

Tom

Re: Command checkveeambu didn't terminate within the timeout

Posted: Mon Mar 30, 2020 3:37 pm
by scottwilkerson
On the remote system can you show the output of the following

Code:

netstat -nlp|grep 5666
ps -ef|grep nrpe

Re: Command checkveeambu didn't terminate within the timeout

Posted: Mon Mar 30, 2020 3:39 pm
by scottwilkerson
Wait, I just re-read your OP, I didn't realize this was a connection to NSClient++

Can you post your nscp.ini or nsclient.ini