NRPE - CHECK_NRPE: Socket Timeout After n Seconds

Problem Description

This KB article addresses the following NRPE error:

CHECK_NRPE: Socket Timeout After n Seconds

 

Assumed Knowledge

The following KB article contains an explanation of how NRPE works and may need to be referenced to completely understand the problem and solution that is provided here:

NRPE - Agent and Plugin Explained

 

Troubleshooting The Error

This is one of the harder errors to pin down. More often than not, the solution to this problem can be found by following this KB article:

NRPE - CHECK_NRPE: Error - Could Not Complete SSL Handshake

However, sometimes the error is not related to SSL or your allowed hosts. In these instances, either a plugin is taking longer than "n" seconds to return its result, or there is a firewall/port issue.
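To tell a slow plugin apart from a network problem, it can help to time the plugin directly on the remote host. The following is a minimal sketch, not an official Nagios tool: the function name elapsed_seconds is made up for illustration, and the plugin path and arguments in the example comment are placeholders.

```shell
# Run a command and print how many whole seconds it took.
# The command's own output is discarded so only the duration is printed.
elapsed_seconds() {
    start=$(date +%s)
    # A non-zero exit code (for example WARNING or CRITICAL) is ignored here.
    "$@" >/dev/null 2>&1 || true
    end=$(date +%s)
    echo $((end - start))
}

# Example (placeholder plugin and arguments, adjust to the check that times out):
# elapsed_seconds /usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /
```

If the number printed is close to or above "n", the plugin's run time is the likely cause rather than the firewall.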

 

Timeout Issues

You can increase the timeout on the check, but you will have to alter both the check command in Nagios XI and the command and connection timeouts in the nrpe.cfg file on the remote host.

 

Nagios XI check_nrpe Timeout

This timeout is how long the check_nrpe command on the Nagios XI server will wait for a response from the NRPE agent. The check_nrpe default is 10 seconds, which is too short for certain checks (disk, filesystem and database checks, among others), so in Nagios XI the default has been set to 30 seconds.

In the Nagios XI web interface, navigate to Configure > Core Config Manager > Commands. This brings up the Commands page. Use the Search field to search for nrpe and click Search.

Click the check_nrpe command.

 


You can change the timeout in Nagios XI with the switch -t in the check_nrpe command.

In the Command Line field, change -t xx to a higher value (the Nagios XI default is 30 seconds).

Save your changes and then click the Apply Configuration button.

 

NRPE Client Timeout

This timeout is how long the NRPE client on the remote host will wait for a response from the plugin it executes before returning a result to Nagios XI. Depending on how high you set the timeout in Nagios XI, you may need to change a couple of settings in the remote host's /usr/local/nagios/etc/nrpe.cfg file. Edit the file with the following command:

vi /usr/local/nagios/etc/nrpe.cfg

When using the vi editor, to make changes press i on the keyboard first to enter insert mode. Press Esc to exit insert mode.

Search for the command_timeout= and connection_timeout= settings, which may need to be altered. Set both of these to at least the value of the timeout in Nagios XI. Usually connection_timeout=300 is more than enough, as is command_timeout, which defaults to 60 seconds. If you set your timeout in Nagios XI higher than that, increase command_timeout to match. If NRPE runs as a standalone daemon, restart it afterwards so the new values take effect.
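As a quick way to confirm which values are active without opening the editor, the two directives can be pulled out with grep. This is only a sketch: the function name show_nrpe_timeouts is hypothetical, and the path in the usage comment assumes the source-install location used in this article.

```shell
# Print the active (uncommented) timeout directives from an nrpe.cfg file.
# Lines that start with "#" are skipped because the pattern is anchored
# to the start of the line.
show_nrpe_timeouts() {
    grep -E '^(command_timeout|connection_timeout)=' "$1"
}

# Usage on the remote host:
# show_nrpe_timeouts /usr/local/nagios/etc/nrpe.cfg
```

If a directive is missing from the output, it is either commented out or not present in the file.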

 

Plugin Timeout

Certain plugins also have their own timeout argument. If one exists, your NRPE command definition needs to take it into account as well.

 

Nagios XI Global Timeout

Nagios XI by default has a global timeout for host (30 seconds) and service (60 seconds) check commands. This means that if you change the check_nrpe command timeout in Nagios XI to 120 with the -t switch, Nagios XI will not wait the full 120 seconds; the global timeout will stop the check at 60 seconds.
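The interaction between the two values can be sketched as "the smaller one wins". The helper below is purely illustrative (effective_timeout is a made-up name, not part of Nagios XI) and simply compares the -t value with the global service timeout.

```shell
# Print the timeout that actually applies to a service check:
# the smaller of the check_nrpe -t value ($1) and the global
# service_check_timeout ($2), both in seconds.
effective_timeout() {
    if [ "$1" -lt "$2" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}

# Example: -t 120 with the default global service timeout of 60
# effective_timeout 120 60
```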

To adjust the global timeout, navigate to Configure > Core Config Manager > CCM Admin > Core Configs. This brings up the Core Configs page and by default the General [nagios.cfg] tab is selected. The two directives to change are:

host_check_timeout=30
service_check_timeout=60

 

Click Save Changes to update these settings and then Apply Config via Quick Tools.

 

A Realistic Discussion On Timeouts

After reading all of that you might think to yourself "I'm going to go and change all the timeouts to 120 seconds". It's not as simple as that: each layer of timeout needs to allow for the layer above it. If the Nagios XI global timeout was set to 120 seconds and NRPE had command_timeout=120, it may take a whole second before the check even reaches NRPE, so each inner timeout should be slightly shorter than the one outside it. Here's an example of the "layers":

  • Nagios XI Global Timeout

    • 120
  • check_nrpe timeout on Nagios XI server

    • 119
  • connection_timeout= on NRPE Client

    • 118
  • command_timeout= on NRPE Client

    • 117
  • Plugin specific timeout (if any)

    • 116
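The layered values above can be derived from a single starting number. The sketch below assumes the one-second step per layer used in the example; the function name layered_timeouts is hypothetical and the step size is a rule of thumb, not a documented requirement.

```shell
# Print one timeout per layer, each one second shorter than the layer above it.
# $1 is the Nagios XI global timeout in seconds.
layered_timeouts() {
    t=$1
    for layer in \
        "Nagios XI global timeout" \
        "check_nrpe -t on Nagios XI server" \
        "connection_timeout on NRPE client" \
        "command_timeout on NRPE client" \
        "plugin-specific timeout (if any)"
    do
        echo "$layer: $t"
        t=$((t - 1))
    done
}

# Example: layered_timeouts 120
```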

 

This completes the section on timeouts. The remaining part of this KB article helps identify other reasons why Socket Timeout After n Seconds may be occurring.

 

 

Check the NRPE Service Status:

You may receive this error if the NRPE daemon is not running on the remote host. If you are using xinetd, you can check the status of the service by logging onto the remote host as root and running the following command:

service xinetd status

 

You should see output similar to the following:

xinetd (pid  1260) is running...

 

If you are using the init-script method, or if your distribution does not use the "service" command, you can always grep a process listing:

ps -aef | grep nrpe

 

You should see output similar to the following:

nagios   53213     1  0 Feb26 ?    00:00:07 /usr/libexec/nrpe -c /etc/nagios/nrpe.cfg --daemon

 

If NRPE/xinetd is not running, start it with the following command:

service xinetd start

 

Or if you are not using xinetd:

/path/to/init/script start

 

The following KB article provides details on the commands that each operating system uses to control NRPE:

NRPE - How To Install NRPE v3 From Source

 

 

Check Firewall and Port Settings:

The last of the probable causes of this error is associated with firewalls and ports. If the NRPE traffic cannot traverse a firewall between the Nagios XI server and the remote host, you will see the checks time out. Additionally, if port 5666 is not open on the remote host's firewall, you may receive a timeout error as well. Usually xinetd will open the ports automatically, as long as the /etc/xinetd.d/nrpe file is configured correctly and NRPE's port settings have been added to /etc/services.

First, you should make sure that port 5666 is open on the remote host. The easiest way to do this is to run check_nrpe from the remote host to itself. This also doubles as a good way to check that NRPE is functioning as expected. Log into the remote host as root and execute:

/usr/local/nagios/libexec/check_nrpe -H localhost

 

You should get something similar to the following output:

NRPE v2.15

 

If not, make sure that port 5666 is open on the remote host's firewall. If you are using xinetd, go back to the previous step (Check the NRPE Service Status), as it should automatically open the port for you.

Checking Remote Host's Ports and Configuring iptables:
You may have to open port 5666 on your firewall, which in the case of most Linux distributions, is iptables. To get a listing of the current iptables rules, run the following on the remote host as root:

iptables -L

 

The expected output is similar to:

ACCEPT - tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:5666 

OR

ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpt:nrpe
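On hosts with many rules it can be easier to filter the listing than to read it by eye. The following sketch reads an iptables listing on standard input and checks for either of the two forms shown above; the function name has_nrpe_accept_rule is made up for illustration, and a match only shows that such a rule exists, not that it is evaluated before an earlier DROP or REJECT rule.

```shell
# Read an iptables listing on stdin and succeed if an ACCEPT rule for NRPE is present.
# Matches both the numeric form (dpt:5666) and the service-name form (dpt:nrpe).
has_nrpe_accept_rule() {
    grep -Eq '^ACCEPT.*dpt:(5666|nrpe)($|[^0-9A-Za-z])'
}

# Usage on the remote host, as root:
# iptables -L INPUT | has_nrpe_accept_rule && echo "an ACCEPT rule for port 5666 exists"
```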

 

If the port is not open, you will have to add an iptables rule for it using the following commands:

iptables -I INPUT -p tcp --destination-port 5666 -j ACCEPT
service iptables save

 

Those commands were for TCP/IP v4. If you need TCP/IP v6 the commands are similar:

ip6tables -L

 

ip6tables -I INPUT -p tcp --destination-port 5666 -j ACCEPT
service ip6tables save

 

Checking Remote Host's Ports and Configuring firewalld:

Firewalld is present on Enterprise Linux 7 and higher. To get a listing of the current firewalld rules, run the following on the remote host as root:

firewall-cmd --list-all

 

The expected output is similar to:

ports: 5666/tcp

 

If the port is not open, you will have to add a firewalld rule for it using the following commands:

firewall-cmd --zone=public --add-port=5666/tcp
firewall-cmd --zone=public --add-port=5666/tcp --permanent

 

firewalld applies to both TCP/IP v4 and TCP/IP v6.

 

The following KB article provides details on the commands that each operating system uses to open firewall ports:

NRPE - How To Install NRPE v3 From Source

 

 


Checking Port 5666 From the Nagios XI Server with nmap:

You can use nmap (among other port scanners) to check the remote host's ports. If you do not have nmap installed, it can be installed using the following command (shown with yum, for RHEL/CentOS systems):

yum install -y nmap

 

Once installed, test the connection on port 5666 from the Nagios XI server to the remote host by logging in as root on your Nagios XI server and running the following command:

nmap <remote host ip> -Pn -p 5666

 

Replace <remote host ip> with the IP address of your remote host. The expected output should be similar to:

PORT     STATE SERVICE
5666/tcp open  nrpe
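If installing nmap is not an option, a rough equivalent can be sketched with bash's built-in /dev/tcp redirection. This assumes bash (built with /dev/tcp support) and the coreutils timeout command are available on the Nagios XI server; the function name port_open is hypothetical.

```shell
# Succeed if a TCP connection to host:port can be opened within the given
# number of seconds (default 5). Usage: port_open <host> <port> [seconds]
port_open() {
    # Inside the bash -c script, $0 is the host and $1 is the port.
    timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example (replace the placeholder with your remote host's IP address):
# port_open <remote host ip> 5666 && echo "5666/tcp open" || echo "5666/tcp closed or filtered"
```

Unlike nmap, this does not distinguish between a closed port and a filtered one; it only reports whether the connection succeeded.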

 

 

Final Thoughts

For any support related questions please visit the Nagios Support Forums at:

http://support.nagios.com/forum/

Posted by: - Sun, Jul 16, 2017 at 8:29 PM. This article has been viewed 87831 times.
Online URL: https://support.nagios.com/kb/article/nrpe-check_nrpe-socket-timeout-after-n-seconds-617.html
