NRPE Socket timeout after 10 seconds


Re: NRPE Socket timeout after 10 seconds

Postby rkennedy » Fri Feb 17, 2017 1:22 pm

I would verify all of your definitions, because it looks like the -t added at the object level isn't being respected. Adding -t 60 will fix it, but it needs to go in the proper place.

Re: NRPE Socket timeout after 10 seconds

Postby kwhogster » Fri Feb 17, 2017 7:52 pm

I only have them in the commands file.

Where else should I look?

Re: NRPE Socket timeout after 10 seconds

Postby tgriep » Mon Feb 20, 2017 12:56 pm

You can add the -t timeout to the check_nrpe command definition in the commands.cfg file, so that all of the checks that use the check_nrpe command will have their timeout increased.
If you are still having problems, please post how the check_nrpe command is defined in the commands.cfg file, along with the service check, and we can go from there.
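As a rough sketch of another way to place the timeout, the value could be passed as an argument so each service chooses its own. The command name check_nrpe_timeout, the host example-host, and the remote command check_users are hypothetical placeholders, and $USER1$ is assumed to point at your plugin directory:

```cfg
# Hypothetical command: $ARG1$ is the timeout in seconds, $ARG2$ the remote command
define command{
        command_name    check_nrpe_timeout
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t $ARG1$ -c $ARG2$
}

# Hypothetical service using a 60 second timeout for this one check
define service {
        host_name               example-host
        service_description     Example NRPE Check
        check_command           check_nrpe_timeout!60!check_users
        use                     generic-service
}
```

With a fixed -t in the command definition, every check sharing that command gets the same timeout; the argument form only helps if some checks need a longer timeout than others.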

Re: NRPE Socket timeout after 10 seconds

Postby kwhogster » Mon Feb 20, 2017 9:01 pm

I do have the -t on the commands

Code: Select all
define command{
        command_name    check_nrpe
        command_line    /usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -t 60 -c $ARG1$ $ARG2$ $ARG3$ $ARG4$
}

define command{
        command_name    check_nrpe_test
        command_line    /usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -t 60 -c $ARG1$ $ARG2$ $ARG3$ $ARG4$ > /tmp/yourlog.txt
}

define command{
        command_name    check_mem
        command_line    $USER1$/check_mem.sh
}

define command{
        command_name    check_users
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 60 -c $ARG1$ -a $ARG2$ $ARG3$
}
define command{
        command_name    check_windows_users
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 60 -c check_users -a 2 3 "$_HOSTALLOWEDUSERS$"
}
define command{
        command_name    check_ms_win_updates
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 60 -c check_ms_win_updates -a '-wd 15 -cd 30 -M PSWindowsUpdate'
}
define command{
        command_name    check_uptime
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 60 -c CheckUpTime -a MaxCrit=90d
}

define command{
        command_name    cpu_load
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 60 -c CheckCPU -a warn=80 crit=90 time=1m time=5m time=15m
}

define command{
        command_name    mem_check
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 60 -c CheckMEM -a warn=80 crit=90 time=1m time=5m time=15m
}

define command{
        command_name    service_check
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 60 -c CheckServiceState -a $ARG1$

}




Sample of my services

Code: Select all
define service {
        host_name                       TGCS008
        service_description             Check Disk Usage J:
        check_command                   check_nrpe!CheckDriveSize! -a ShowAll=long MinWarn=20% MinCrit=10% Drive=J: perf-unit=G
        check_interval                  1
        servicegroups                   DriveSpace
        use                             generic-service
}
define service {
        host_name                       TGCS009
        service_description             Check Disk Usage
        check_command                   check_nrpe!CheckDriveSize! -a ShowAll=long MinWarn=20% MinCrit=10% Drive=C: perf-unit=G
        check_interval                  1
        servicegroups                   DriveSpace
        use                             generic-service
}

define service {
        host_name                       TGCS001
        service_description             Check OS Version
        check_command                   check_nrpe!CheckWMI! -a "Query=Select Version,Caption from win32_OperatingSystem" columnSyntax="%value%" columnSeparator=", " ignore-perf-data
        servicegroups                   OSVersion
        check_interval                  1
        use                             generic-service
}
define service {
        host_name                       TGCS002
        service_description             Check OS Version
        check_command                   check_nrpe!Check_OS_Version! -a "perf-config=*(ignored:true)"
        servicegroups                   OSVersion
        check_interval                  1
        use                             generic-service
}



Sometimes I see 60 seconds, but mostly I see 10-second timeouts.

This happens on all my VMs, not on the physical hosts. The physical servers and computers are fine; it is just the VMs.

The VMs run on VMware ESXi 6.0 hosts.

Thanks

Tom

Re: NRPE Socket timeout after 10 seconds

Postby kwhogster » Mon Feb 20, 2017 9:18 pm

Guys, an update.

I also get this a lot:

Check Application Event Logs UNKNOWN 02-20-2017 21:13:52 0d 0h 2m 14s 1/3 CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages.

Again, only on my VMs.

Thoughts?

Re: NRPE Socket timeout after 10 seconds

Postby tgriep » Tue Feb 21, 2017 1:04 pm

The timeout settings for your commands look correct at 60 seconds, so I don't know where the 10 second setting is coming from unless one of the commands was missed.
You may want to check the nagios.cfg file and see whether the default service timeout is set higher than 10 seconds.
The directive to look at is service_check_timeout.
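For reference, a minimal sketch of that directive in the main configuration file. The value 60 is only an illustration chosen to match the -t 60 used in the commands above:

```cfg
# nagios.cfg (main configuration file)
# Maximum number of seconds a service check may run before Nagios kills it.
service_check_timeout=60
```

If this value is lower than the -t passed to check_nrpe, Nagios will end the check before the plugin's own timeout is reached, so the two are usually kept in line. Nagios has to be restarted for a change in nagios.cfg to take effect.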

Take a look at the following document for troubleshooting NRPE issues.
https://assets.nagios.com/downloads/nagiosxi/docs/NRPE-Troubleshooting-and-Common-Solutions.pdf

The "Received 0 bytes from daemon" message usually means that the agent service was not running on the remote server.
On Windows that means the NSClient agent was not running, and on Linux, the NRPE agent was not running.
Take a look at the document for more details and causes.

Re: NRPE Socket timeout after 10 seconds

Postby kwhogster » Tue Feb 21, 2017 9:13 pm

Thanks for the doc; I will review it later.

One other thought: since this is only happening on my VMs, I checked further, and it seems to happen during the backup window.

I am using Veeam B&R to back up my VMs.

I was thinking of changing the monitoring times on these machines to exclude the backup window. Also, not all VMs have this issue.

I have tried this on two services, but they still alert me.

Any suggestions would be helpful.


Thanks

Tom

Re: NRPE Socket timeout after 10 seconds

Postby tgriep » Wed Feb 22, 2017 10:56 am

Setting up an exclude window would be a good solution for this.
To do this, create a time period like the example below. It assumes your backups run between 1am and 3am; adjust it to your needs.
Code: Select all
define timeperiod {
        timeperiod_name                         backup_time
        alias                                   backup_time
        sunday                                  00:00-01:00,03:00-24:00
        monday                                  00:00-01:00,03:00-24:00
        tuesday                                 00:00-01:00,03:00-24:00
        wednesday                               00:00-01:00,03:00-24:00
        thursday                                00:00-01:00,03:00-24:00
        friday                                  00:00-01:00,03:00-24:00
        saturday                                00:00-01:00,03:00-24:00
        }

Then in your service check, you would define the following:
Code: Select all
check_period            backup_time
notification_period     backup_time


That would stop the check from running, and suppress its notifications, between 1am and 3am.
Alternatively, you could leave check_period set to 24x7 so the check still runs but sends no notifications during that window.
Either way should work for you.
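To make the two options concrete, here is a sketch of how each might look in a full service definition. The host name example-vm is a placeholder, the check commands are borrowed from the services posted earlier in the thread, and the 24x7 time period is assumed to exist as it does in the sample configuration:

```cfg
# Option 1: no checks and no notifications during the backup window
define service {
        host_name               example-vm
        service_description     Check Disk Usage
        check_command           check_nrpe!CheckDriveSize! -a ShowAll=long MinWarn=20% MinCrit=10% Drive=C: perf-unit=G
        check_period            backup_time
        notification_period     backup_time
        use                     generic-service
}

# Option 2: keep checking around the clock, but send no notifications during backups
define service {
        host_name               example-vm
        service_description     Check OS Version
        check_command           check_nrpe!Check_OS_Version! -a "perf-config=*(ignored:true)"
        check_period            24x7
        notification_period     backup_time
        use                     generic-service
}
```

Values set directly in a service override whatever the generic-service template provides, so only these two directives need to change. With option 2 the service can still show a problem state in the web interface during backups; it just will not alert.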

Re: NRPE Socket timeout after 10 seconds

Postby kwhogster » Wed Feb 22, 2017 8:51 pm

Thanks.

I will apply that to all my VMs and give it a few days.

I will report back with results.


Tom

Re: NRPE Socket timeout after 10 seconds

Postby tgriep » Thu Feb 23, 2017 10:08 am

OK, let us know how it works out.
