Getting Service check timed out after 120.01 seconds)
Posted: Fri Oct 25, 2019 7:46 am
by kporta
I have recently performed an upgrade to a stable system and have never had timeout issues before. Here is the history of what has happened and what I have tried:
Initially during the upgrade I noticed that the UI was stuck in "update in progress" so I followed these steps that worked:
echo "UPDATE xi_options SET value = 'yes' WHERE name = 'last_update_acknowledged';" | mysql -t -u root -pnagiosxi nagiosxi
echo "DELETE FROM xi_commands WHERE command = 1120;" | mysql -t -u root -pnagiosxi nagiosxi
I then changed the service check timeout in nagios.cfg from 60 to 120.
To troubleshoot the timeouts, I ran the database repair:
/usr/local/nagiosxi/scripts/repair_databases.sh
I also modified the reaper settings for a high check volume:
check_result_reaper_frequency=3
max_check_result_reaper_time=10
The above has not helped and I'm still getting the timeout issues. I'm on Nagios XI 5.6.7
Let me know if you need any further information.
Thank you,
Kurt Porta
Blazent Inc
Re: Getting Service check timed out after 120.01 seconds)
Posted: Fri Oct 25, 2019 10:02 am
by tgriep
Which service check is timing out?
Re: Getting Service check timed out after 120.01 seconds)
Posted: Fri Oct 25, 2019 10:28 am
by kporta
No specific check. Most are standard check_procs checks, such as a check for Apache, but check_mem and the Oracle check are also alerting.
Re: Getting Service check timed out after 120.01 seconds)
Posted: Fri Oct 25, 2019 11:47 am
by tgriep
The Oracle plugin was updated and the settings may need to be updated to point to the path of the Oracle Instant Client you have installed on the server.
Open this file on the server for the Oracle settings.
/usr/local/nagiosxi/etc/configwizards/oracle/oracle
You should see two lines like the following. Make sure they are set up to point to valid folders on the server, and that should fix the Oracle issues.
export LD_LIBRARY_PATH=/usr/lib/oracle/11.2/client/lib
export ORACLE_HOME=/usr/lib/oracle/11.2/client
I found this in your settings so update the above file to match.
LD_LIBRARY_PATH=/blzfs1/app/oracle/product/11.2.0/dbhome_1/lib
ORACLE_HOME=/blzfs1/app/oracle/product/11.2.0/dbhome_1
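Before editing the file, it may help to confirm that both directories actually exist on the Nagios server. The following is a minimal sketch, not part of the plugin: `check_dir` is a hypothetical helper, and the paths are simply the two quoted above.

```shell
#!/bin/sh
# Hypothetical helper: report whether a directory exists.
check_dir() {
    if [ -d "$1" ]; then
        echo "OK: $1"
    else
        echo "MISSING: $1"
        return 1
    fi
}

# Paths quoted in this post; adjust them to the local installation.
ORACLE_HOME=/blzfs1/app/oracle/product/11.2.0/dbhome_1
LD_LIBRARY_PATH_DIR=$ORACLE_HOME/lib

status=0
for d in "$ORACLE_HOME" "$LD_LIBRARY_PATH_DIR"; do
    check_dir "$d" || status=1
done
```

If either line prints MISSING, the corresponding export in the oracle settings file is pointing at a folder that is not there.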
The other plugins look like they are being run on remote servers through the NRPE agent.
The check_nrpe plugin was updated, and possibly the command as well, so please run the following commands on the Nagios server and post the output to the ticket.
Replace xxx.xxx.xxx.xxx with the IP address of a remote server running the NRPE agent.
/usr/local/nagios/libexec/check_nrpe -H xxx.xxx.xxx.xxx
/usr/local/nagios/libexec/check_nrpe -H xxx.xxx.xxx.xxx -n
/usr/local/nagios/libexec/check_nrpe -H xxx.xxx.xxx.xxx -2
/usr/local/nagios/libexec/check_nrpe -H xxx.xxx.xxx.xxx -n -2
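The four commands differ only in their flags: `-n` disables SSL and `-2` forces version 2 packets. To run them in one pass with labelled output, something like the sketch below could be used. The function names `run_one` and `run_nrpe_variants` are hypothetical, and the binary path is the one used above.

```shell
#!/bin/sh
# Path used in this thread; override CHECK_NRPE if the plugin lives elsewhere.
CHECK_NRPE=${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}

# Hypothetical helper: run check_nrpe once with the given extra flags.
run_one() {
    label="check_nrpe -H $nrpe_host"
    if [ $# -gt 0 ]; then
        label="$label $*"
    fi
    echo "== $label"
    # Keep going even if one variant fails, so all four results are shown.
    "$CHECK_NRPE" -H "$nrpe_host" "$@" 2>&1 || true
}

# Hypothetical wrapper: try all four flag combinations against one host.
run_nrpe_variants() {
    nrpe_host=$1
    run_one          # defaults
    run_one -n       # no SSL
    run_one -2       # version 2 packets
    run_one -n -2    # no SSL, version 2 packets
}
```

Call it as `run_nrpe_variants xxx.xxx.xxx.xxx`, replacing the placeholder with the IP address of the remote server.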
Re: Getting Service check timed out after 120.01 seconds)
Posted: Mon Oct 28, 2019 7:22 am
by kporta
Here is the output. There definitely seems to be an issue here:
[kporta@nagios ~]$ /usr/local/nagios/libexec/check_nrpe -H 52.20.106.xxx
NRPE v3.2.1
[kporta@nagios ~]$ /usr/local/nagios/libexec/check_nrpe -H 52.20.106.xxx -n
CHECK_NRPE: Receive header underflow - only -1 bytes received (4 expected).
[kporta@nagios ~]$ /usr/local/nagios/libexec/check_nrpe -H 52.20.106.xxx -2
NRPE v3.2.1
[kporta@nagios ~]$ /usr/local/nagios/libexec/check_nrpe -H 52.20.106.xxx -n -2
CHECK_NRPE: Receive header underflow - only -1 bytes received (4 expected).
Re: Getting Service check timed out after 120.01 seconds)
Posted: Mon Oct 28, 2019 8:15 am
by kporta
I also noticed that the nrpe service was not running locally on the Nagios server, so I started it.
Re: Getting Service check timed out after 120.01 seconds)
Posted: Mon Oct 28, 2019 8:57 am
by kporta
Just to be clear on this, I'm only occasionally getting these service check timeout errors, approximately 5 per day.
Re: Getting Service check timed out after 120.01 seconds)
Posted: Mon Oct 28, 2019 12:55 pm
by kporta
I have done some more testing, putting nrpe into debug mode and rerunning the following command on the Nagios server:
/usr/local/nagios/libexec/check_nrpe -H 52.20.106.xxx -n -2
Here is the output on the client messages log file:
Oct 28 13:51:31 dev2-i21399 nrpe[5256]: is_an_allowed_host (AF_INET): is host >50.16.202.xxx< an allowed host >50.16.202.xxx<
Oct 28 13:51:31 dev2-i21399 nrpe[5256]: is_an_allowed_host (AF_INET): host is in allowed host list!
Oct 28 13:51:31 dev2-i21399 nrpe[5256]: Error: (!log_opts) Could not complete SSL handshake with 50.16.202.xxx: 1
Oct 28 13:51:39 dev2-i21399 nrpe[5259]: CONN_CHECK_PEER: checking if host is allowed: 50.16.202.xxx port 56042
Oct 28 13:51:39 dev2-i21399 nrpe[5259]: is_an_allowed_host (AF_INET): is host >50.16.202.xxx< an allowed host >50.16.202.xxx<
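The handshake error in this log is what one would expect if the daemon is using SSL while the request was sent with `-n`, which tells check_nrpe not to use SSL. To see how often the daemon logs this error on a client, a rough sketch follows. `count_handshake_errors` is a hypothetical helper, and the log path in the usage note is an assumption about where the messages file lives.

```shell
#!/bin/sh
# Hypothetical helper: count NRPE SSL handshake failures in a log file.
count_handshake_errors() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c 'Could not complete SSL handshake' "$1" || true
}
```

For example, `count_handshake_errors /var/log/messages` prints the number of matching lines in that file.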
Re: Getting Service check timed out after 120.01 seconds)
Posted: Mon Oct 28, 2019 3:35 pm
by mbellerue
Would it be possible to capture this in a tcpdump on both client and server side?
Re: Getting Service check timed out after 120.01 seconds)
Posted: Tue Oct 29, 2019 8:56 am
by kporta
I have uploaded the tcpdump you requested, running each of the commands separately.
/usr/local/nagios/libexec/check_nrpe -H 52.20.106.xxx
/usr/local/nagios/libexec/check_nrpe -H 52.20.106.xxx -n
/usr/local/nagios/libexec/check_nrpe -H 52.20.106.xxx -2
/usr/local/nagios/libexec/check_nrpe -H 52.20.106.xxx -n -2
Thank you,
Kurt