Getting "Service check timed out after 120.01 seconds"

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.

Post by kporta

I recently upgraded a system that had been stable and had never had timeout issues before. Here is the history of what has happened and what I have tried:

During the upgrade, I noticed that the UI was stuck on "update in progress", so I ran the following commands, which fixed it:

echo "UPDATE xi_options SET value = 'yes' WHERE name = 'last_update_acknowledged';" | mysql -t -u root -pnagiosxi nagiosxi
echo "DELETE FROM xi_commands WHERE command = 1120;" | mysql -t -u root -pnagiosxi nagiosxi

Changed the service check timeout in nagios.cfg from 60 to 120 seconds.
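
To confirm which value Nagios is actually reading, the directive can be pulled straight out of the main config file. This is a minimal sketch: `get_cfg_value` is a hypothetical helper, not part of Nagios XI, and it assumes the usual XI path `/usr/local/nagios/etc/nagios.cfg` and that the timeout in question is the `service_check_timeout` directive, which is the one behind the "Service check timed out after ... seconds" message.

```shell
# Hypothetical helper: print the value of a key=value directive from a
# nagios.cfg-style file. Comment lines are skipped; if the key appears more
# than once, the last occurrence is printed. Returns 1 if the key is absent.
# Assumption: default Nagios XI location of the main config file.
get_cfg_value() {
    local key="$1"
    local cfg="${2:-/usr/local/nagios/etc/nagios.cfg}"
    awk -v k="$key" '
        /^[ \t]*#/ { next }
        {
            i = index($0, "=")
            if (i > 0 && substr($0, 1, i - 1) == k) { v = substr($0, i + 1); found = 1 }
        }
        END { if (found) print v; else exit 1 }
    ' "$cfg"
}
```

For example, `get_cfg_value service_check_timeout` should print `120` after the change described above.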

To troubleshoot the timeouts, I ran the database repair script:
/usr/local/nagiosxi/scripts/repair_databases.sh

Modified the check result reaper settings for a high check volume:
check_result_reaper_frequency=3
max_check_result_reaper_time=10
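
When directives like these are edited by hand, it is easy to leave an older duplicate line behind. The sketch below is a hypothetical helper (`set_cfg_value` is not an XI tool): it rewrites the first matching `key=value` line, drops any later duplicates of the same key, and appends the key if it is missing. It assumes the default `/usr/local/nagios/etc/nagios.cfg` path; taking a backup of the file first is still a good idea.

```shell
# Hypothetical helper: set key=value in a nagios.cfg-style file.
# The first uncommented occurrence of the key is replaced, later duplicates
# are removed, and the key is appended if it was not present.
set_cfg_value() {
    local key="$1" value="$2"
    local cfg="${3:-/usr/local/nagios/etc/nagios.cfg}"
    local tmp
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        /^[ \t]*#/ { print; next }
        {
            i = index($0, "=")
            if (i > 0 && substr($0, 1, i - 1) == k) {
                if (!done) print k "=" v
                done = 1
                next
            }
            print
        }
        END { if (!done) print k "=" v }
    ' "$cfg" > "$tmp" && cat "$tmp" > "$cfg"   # cat keeps the original file's owner and mode
    local rc=$?
    rm -f "$tmp"
    return "$rc"
}
```

For example, `set_cfg_value check_result_reaper_frequency 3` followed by `set_cfg_value max_check_result_reaper_time 10`, then a configuration check with `/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg` before restarting Nagios.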

None of the above has helped, and I'm still getting the timeouts. I'm on Nagios XI 5.6.7.

Let me know if you need any further information.

Thank you,

Kurt Porta
Blazent Inc
Last edited by scottwilkerson on Fri Oct 25, 2019 11:18 am, edited 2 times in total.
Reason: Profile removed and shared with the other Techs

Post by tgriep

Which service check is timing out?
Be sure to check out our Knowledgebase for helpful articles and solutions!

Post by kporta

Not any specific check. Most are standard check_procs checks, such as the Apache check, but check_mem and the Oracle checks are also alerting.

Post by tgriep

The Oracle plugin was updated, and its settings may need to be changed to point to the Oracle Instant Client installed on the server.
Open the following file on the server to edit the Oracle settings:
/usr/local/nagiosxi/etc/configwizards/oracle/oracle

You should see two lines like the following. Make sure they point to valid directories on the server; that should fix the Oracle issues.

export LD_LIBRARY_PATH=/usr/lib/oracle/11.2/client/lib
export ORACLE_HOME=/usr/lib/oracle/11.2/client
Your settings show the following paths, so update the file above to match:

LD_LIBRARY_PATH=/blzfs1/app/oracle/product/11.2.0/dbhome_1/lib
ORACLE_HOME=/blzfs1/app/oracle/product/11.2.0/dbhome_1
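
A quick way to confirm that both exports point at real directories is sketched below. `check_oracle_paths` is a hypothetical helper, not part of the Oracle wizard; it reads the settings file mentioned above and assumes each export holds a single unquoted directory, with no `:`-separated list and no reference to another variable.

```shell
# Hypothetical helper: report whether the LD_LIBRARY_PATH and ORACLE_HOME
# exports in the Oracle plugin settings file point at existing directories.
# Returns 0 if all are present, 1 if any is missing or none were found.
check_oracle_paths() {
    local file="${1:-/usr/local/nagiosxi/etc/configwizards/oracle/oracle}"
    local rc=0 found=0 name dir
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 2; }
    # sed emits "NAME DIR" for each uncommented export of the two variables;
    # the here-document keeps the loop in the current shell so rc survives.
    while read -r name dir; do
        [ -n "$name" ] || continue
        found=1
        if [ -d "$dir" ]; then
            echo "OK      $name=$dir"
        else
            echo "MISSING $name=$dir"
            rc=1
        fi
    done <<EOF
$(sed -nE 's/^[[:space:]]*export[[:space:]]+(LD_LIBRARY_PATH|ORACLE_HOME)=(.*)$/\1 \2/p' "$file")
EOF
    [ "$found" -eq 1 ] || { echo "no Oracle exports found in $file" >&2; rc=1; }
    return "$rc"
}
```

Run on the Nagios server after updating the file, both lines should report `OK`; any `MISSING` line marks a path that still needs correcting.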

The other plugins appear to run on remote servers through the NRPE agent.
The check_nrpe plugin was updated, and possibly the command definition as well, so please run the following commands on the Nagios server and post the output to the ticket.
Replace xxx.xxx.xxx.xxx with the IP address of a remote server running the NRPE agent.

/usr/local/nagios/libexec/check_nrpe -H xxx.xxx.xxx.xxx
/usr/local/nagios/libexec/check_nrpe -H xxx.xxx.xxx.xxx -n
/usr/local/nagios/libexec/check_nrpe -H xxx.xxx.xxx.xxx -2 
/usr/local/nagios/libexec/check_nrpe -H xxx.xxx.xxx.xxx -n -2
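
The four commands differ only in the `-n` (disable SSL) and `-2` (use version 2 packets only) options, so a loop can run them all and label each result with its exit status. This is a hypothetical wrapper, assuming the plugin path used above; the `CHECK_NRPE` override is a name introduced here only to make the path configurable.

```shell
# Hypothetical wrapper: run check_nrpe against one host with every
# combination of -n (no SSL) and -2 (version 2 packets only), labelling
# each result and its exit status. CHECK_NRPE overrides the plugin path.
test_nrpe_variants() {
    local host="$1"
    local plugin="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"
    local flags
    for flags in "" "-n" "-2" "-n -2"; do
        echo "== check_nrpe -H $host $flags"
        # $flags is left unquoted on purpose so "-n -2" becomes two arguments
        # and the empty entry adds no argument at all.
        "$plugin" -H "$host" $flags 2>&1
        echo "   exit status: $?"
    done
}
```

For example, `test_nrpe_variants xxx.xxx.xxx.xxx`, with the IP address of the remote server in place of the placeholder.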

Post by kporta

Here is the output; there definitely seems to be an issue:

[kporta@nagios ~]$ /usr/local/nagios/libexec/check_nrpe -H 52.20.106.xxx
NRPE v3.2.1
[kporta@nagios ~]$ /usr/local/nagios/libexec/check_nrpe -H 52.20.106.xxx -n
CHECK_NRPE: Receive header underflow - only -1 bytes received (4 expected).
[kporta@nagios ~]$ /usr/local/nagios/libexec/check_nrpe -H 52.20.106.xxx -2
NRPE v3.2.1
[kporta@nagios ~]$ /usr/local/nagios/libexec/check_nrpe -H 52.20.106.xxx -n -2
CHECK_NRPE: Receive header underflow - only -1 bytes received (4 expected).
Last edited by kporta on Mon Oct 28, 2019 11:26 am, edited 1 time in total.

Post by kporta

I also noticed that the nrpe service was not running locally on the Nagios server, so I started it.

Post by kporta

To be clear, I only get these service check timeout errors occasionally, approximately five per day.
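
With only a handful of timeouts a day, it helps to know which host and service pairs they come from. The hypothetical helper below counts `SERVICE ALERT` lines carrying the timeout message in the Nagios log. It assumes the default log path `/usr/local/nagios/var/nagios.log` and the standard `host;service;state;type;attempt;output` alert format. Because Nagios writes alert lines for state changes and retries rather than for every check result, the counts may understate how often checks actually time out.

```shell
# Hypothetical helper: count "Service check timed out" alerts per
# host;service pair in a Nagios log, most frequent first.
# Assumption: default Nagios XI log location and standard alert format.
count_service_timeouts() {
    local log="${1:-/usr/local/nagios/var/nagios.log}"
    grep 'SERVICE ALERT: .*Service check timed out after' "$log" |
        sed -E 's/^\[[0-9]+\] SERVICE ALERT: ([^;]*);([^;]*);.*/\1;\2/' |
        sort | uniq -c | sort -rn
}
```

Rotated logs, for example a file from `/usr/local/nagios/var/archives/`, can be passed as the argument to look further back.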

Post by kporta

I did some more testing: I put NRPE into debug mode and re-ran the following command on the Nagios server:

/usr/local/nagios/libexec/check_nrpe -H 52.20.106.xxx -n -2

Here is the output in the client's messages log:

Oct 28 13:51:31 dev2-i21399 nrpe[5256]: is_an_allowed_host (AF_INET): is host >50.16.202.xxx< an allowed host >50.16.202.xxx<
Oct 28 13:51:31 dev2-i21399 nrpe[5256]: is_an_allowed_host (AF_INET): host is in allowed host list!
Oct 28 13:51:31 dev2-i21399 nrpe[5256]: Error: (!log_opts) Could not complete SSL handshake with 50.16.202.xxx: 1
Oct 28 13:51:39 dev2-i21399 nrpe[5259]: CONN_CHECK_PEER: checking if host is allowed: 50.16.202.xxx port 56042
Oct 28 13:51:39 dev2-i21399 nrpe[5259]: is_an_allowed_host (AF_INET): is host >50.16.202.xxx< an allowed host >50.16.202.xxx<
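
In the excerpt above, the handshake error coincides with the `-n -2` test, which disables SSL. To see whether regular checks from the Nagios server also hit this error, a hypothetical helper can count handshake failures per source address in the agent's syslog. It assumes the `/var/log/messages` location and the message wording shown above, and handles IPv4 addresses only (an IPv6 address contains colons and would be cut short).

```shell
# Hypothetical helper: count NRPE "Could not complete SSL handshake" errors
# per source address in a syslog file, most frequent first.
# Assumptions: IPv4 peers, message wording as in the excerpt above.
count_nrpe_ssl_failures() {
    local log="${1:-/var/log/messages}"
    grep -o 'Could not complete SSL handshake with [^:]*' "$log" |
        awk '{print $NF}' | sort | uniq -c | sort -rn
}
```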

Post by mbellerue

Would it be possible to capture this with tcpdump on both the client and the server side?
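
A capture like this can be taken with `tcpdump` on each side while the test commands are run. The sketch below assumes the default NRPE port 5666, that `tcpdump` is installed, and that it runs as root (or with equivalent capture privileges). `capture_nrpe` and the `NRPE_PORT` and `TCPDUMP` overrides are hypothetical names used for illustration.

```shell
# Hypothetical helper: capture NRPE traffic to a pcap file.
# Arguments: output file (optional) and peer IP to filter on (optional).
# Assumptions: default NRPE port 5666 unless NRPE_PORT is set; tcpdump
# installed and run with capture privileges.
capture_nrpe() {
    local outfile="${1:-nrpe-$(hostname)-$(date +%Y%m%d%H%M%S).pcap}"
    local peer="${2:-}"
    local port="${NRPE_PORT:-5666}"
    local filter="tcp port $port"
    [ -n "$peer" ] && filter="$filter and host $peer"
    # -i any: all interfaces (Linux); -nn: no name or port resolution;
    # -s 0: capture whole packets; -w: write raw packets to the file.
    "${TCPDUMP:-tcpdump}" -i any -nn -s 0 -w "$outfile" "$filter"
}
```

Stop each capture with Ctrl+C once the test commands have finished; the resulting `.pcap` files can then be compared side by side in Wireshark.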

Post by kporta

I have uploaded the tcpdump captures you requested, running each of the following commands separately:

/usr/local/nagios/libexec/check_nrpe -H 52.20.106.xxx
/usr/local/nagios/libexec/check_nrpe -H 52.20.106.xxx -n
/usr/local/nagios/libexec/check_nrpe -H 52.20.106.xxx -2
/usr/local/nagios/libexec/check_nrpe -H 52.20.106.xxx -n -2

Thank you,

Kurt