Getting Service check timed out after 120.01 seconds)
I recently upgraded a previously stable system that had never had timeout issues before. Here is the history of what has happened and what I have tried:
During the upgrade I noticed that the UI was stuck in "update in progress", so I ran these steps, which worked:
echo "UPDATE xi_options SET value = 'yes' WHERE name = 'last_update_acknowledged';" | mysql -t -u root -pnagiosxi nagiosxi
echo "DELETE FROM xi_commands WHERE command = 1120;" | mysql -t -u root -pnagiosxi nagiosxi
Changed the timeout in nagios.cfg from 60 to 120.
To troubleshoot the timeouts, I ran the database repair:
/usr/local/nagiosxi/scripts/repair_databases.sh
Modified reaper setting for a high check volume:
check_result_reaper_frequency=3
max_check_result_reaper_time=10
The above has not helped and I'm still getting the timeout issues. I'm on Nagios XI 5.6.7
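For anyone comparing settings, here is a minimal sketch that prints the timeout and reaper directives that are actually active in the config file. It assumes the timeout changed above is `service_check_timeout` and that the file lives at `/usr/local/nagios/etc/nagios.cfg`; both are assumptions, and `show_check_settings` is just an illustrative helper name.

```shell
# Print the active (uncommented) timeout and reaper directives.
# Assumption: the default path below matches your install; pass another path if not.
show_check_settings() {
    cfg="${1:-/usr/local/nagios/etc/nagios.cfg}"
    grep -E '^(service_check_timeout|check_result_reaper_frequency|max_check_result_reaper_time)=' "$cfg"
}
```

Calling `show_check_settings` with no argument reads the assumed default path. Commented-out copies of the directives are skipped, so the output shows only the values Nagios would load.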
Let me know if you need any further information.
Thank you,
Kurt Porta
Blazent Inc
Last edited by scottwilkerson on Fri Oct 25, 2019 11:18 am, edited 2 times in total.
Reason: Profile removed and shared with the other Techs
Re: Getting Service check timed out after 120.01 seconds)
Which service check is timing out?
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: Getting Service check timed out after 120.01 seconds)
No specific check. Most are the standard check_procs checks, such as the Apache check, and check_mem and the Oracle check are alerting as well.
Re: Getting Service check timed out after 120.01 seconds)
The Oracle plugin was updated and the settings may need to be updated to point to the path of the Oracle Instant Client you have installed on the server.
Open this file on the server for the Oracle settings.
/usr/local/nagiosxi/etc/configwizards/oracle/oracle
You should see 2 lines like the following. Make sure they are set up to point to valid folders on the server and that should fix the Oracle issues.
export LD_LIBRARY_PATH=/usr/lib/oracle/11.2/client/lib
export ORACLE_HOME=/usr/lib/oracle/11.2/client
I found this in your settings, so update the above file to match.
LD_LIBRARY_PATH=/blzfs1/app/oracle/product/11.2.0/dbhome_1/lib
ORACLE_HOME=/blzfs1/app/oracle/product/11.2.0/dbhome_1
It looks like the other plugins are being run on remote servers through the NRPE agent.
The check_nrpe plugin was updated, and possibly the command as well, so I will need you to run the following commands on the Nagios server and post the output to the ticket.
Replace xxx.xxx.xxx.xxx with the IP address of a remote server running the NRPE agent.
/usr/local/nagios/libexec/check_nrpe -H xxx.xxx.xxx.xxx
/usr/local/nagios/libexec/check_nrpe -H xxx.xxx.xxx.xxx -n
/usr/local/nagios/libexec/check_nrpe -H xxx.xxx.xxx.xxx -2
/usr/local/nagios/libexec/check_nrpe -H xxx.xxx.xxx.xxx -n -2
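If it is easier than typing the four check_nrpe commands one at a time, a small loop along these lines should produce the same output in one pass. This is only a sketch: `probe_nrpe` and the `NRPE_PLUGIN` override are names made up for illustration, and the flags are just the ones listed above.

```shell
# Run check_nrpe against one host with each flag combination from the post.
# NRPE_PLUGIN is a hypothetical override; the default is the plugin path used above.
probe_nrpe() {
    host="$1"
    plugin="${NRPE_PLUGIN:-/usr/local/nagios/libexec/check_nrpe}"
    for flags in "" "-n" "-2" "-n -2"; do
        # $flags is intentionally unquoted so "-n -2" splits into two arguments.
        out=$("$plugin" -H "$host" $flags 2>&1) && rc=0 || rc=$?
        printf 'flags=[%s] exit=%s output=%s\n' "$flags" "$rc" "$out"
    done
}
```

Running `probe_nrpe xxx.xxx.xxx.xxx` (with the real IP substituted) prints one line per variant, including the exit code, which makes it easier to paste the whole result into the ticket.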
Re: Getting Service check timed out after 120.01 seconds)
Here is the output. There definitely seems to be an issue here:
[kporta@nagios ~]$ /usr/local/nagios/libexec/check_nrpe -H 52.20.106.xxx
NRPE v3.2.1
[kporta@nagios ~]$ /usr/local/nagios/libexec/check_nrpe -H 52.20.106.xxx -n
CHECK_NRPE: Receive header underflow - only -1 bytes received (4 expected).
[kporta@nagios ~]$ /usr/local/nagios/libexec/check_nrpe -H 52.20.106.xxx -2
NRPE v3.2.1
[kporta@nagios ~]$ /usr/local/nagios/libexec/check_nrpe -H 52.20.106.xxx -n -2
CHECK_NRPE: Receive header underflow - only -1 bytes received (4 expected).
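To make the pattern in this output easier to see, here is a rough sketch that sorts each response into a category. The category names are my own, not anything check_nrpe reports. In the output above, only the runs with -n fail, which would be consistent with the agent having SSL enabled and rejecting the non-SSL connection, though that reading is an assumption until it is confirmed on the agent side.

```shell
# Classify one line of check_nrpe output into a rough, made-up category.
classify_nrpe_output() {
    case "$1" in
        "NRPE v"*) echo "ok" ;;                                  # agent answered with its version
        *"Receive header underflow"*) echo "transport-mismatch" ;; # reply was cut off before the header
        *) echo "other" ;;
    esac
}
```

For example, `classify_nrpe_output "NRPE v3.2.1"` prints `ok`, while the header underflow message prints `transport-mismatch`.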
Last edited by kporta on Mon Oct 28, 2019 11:26 am, edited 1 time in total.
Re: Getting Service check timed out after 120.01 seconds)
I also noticed that the nrpe service was not running locally on the Nagios server, so I started it.
Re: Getting Service check timed out after 120.01 seconds)
Just to be clear on this, I'm only occasionally getting these service check timeout errors, approximately 5 per day.
Re: Getting Service check timed out after 120.01 seconds)
I have done some more testing, putting nrpe into debug mode and re-running the following command on the Nagios server:
/usr/local/nagios/libexec/check_nrpe -H 52.20.106.xxx -n -2
Here is the output on the client messages log file:
Oct 28 13:51:31 dev2-i21399 nrpe[5256]: is_an_allowed_host (AF_INET): is host >50.16.202.xxx< an allowed host >50.16.202.xxx<
Oct 28 13:51:31 dev2-i21399 nrpe[5256]: is_an_allowed_host (AF_INET): host is in allowed host list!
Oct 28 13:51:31 dev2-i21399 nrpe[5256]: Error: (!log_opts) Could not complete SSL handshake with 50.16.202.xxx: 1
Oct 28 13:51:39 dev2-i21399 nrpe[5259]: CONN_CHECK_PEER: checking if host is allowed: 50.16.202.xxx port 56042
Oct 28 13:51:39 dev2-i21399 nrpe[5259]: is_an_allowed_host (AF_INET): is host >50.16.202.xxx< an allowed host >50.16.202.xxx<
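Since the timeouts only show up a few times a day, it may help to count the handshake errors per peer over a longer stretch of the log instead of reading it line by line. A minimal sketch, assuming the log lines keep the format shown above; `count_handshake_errors` is an illustrative helper and the log path is whatever file nrpe writes to on the client.

```shell
# Count "Could not complete SSL handshake" errors per peer address in a log file.
# Usage: count_handshake_errors /path/to/messages
count_handshake_errors() {
    awk '/Could not complete SSL handshake with/ {
        for (i = 1; i < NF; i++) {
            if ($i == "with") {
                peer = $(i + 1)       # address follows the word "with"
                sub(/:$/, "", peer)   # drop the trailing colon
                count[peer]++
            }
        }
    }
    END { for (p in count) print count[p], p }' "$1"
}
```

Each output line is a count followed by the peer address. Note that the command used here passes -n, so a handshake error for that particular run may simply reflect the client not using SSL while the agent expects it.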
Re: Getting Service check timed out after 120.01 seconds)
Would it be possible to capture this in a tcpdump on both client and server side?
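A capture along the following lines should be enough on each side. This is a sketch under a few assumptions: NRPE is on its default port 5666, the box is Linux (for `-i any`), and the command is run as root. `capture_nrpe`, `NRPE_PORT`, and `DRY_RUN` are names invented here for illustration.

```shell
# Capture NRPE traffic to or from one peer into a pcap file.
# Usage: capture_nrpe PEER_IP OUTPUT_FILE   (stop with Ctrl-C once the test is done)
# Set DRY_RUN=1 to print the tcpdump command instead of running it.
capture_nrpe() {
    peer="$1"
    out="$2"
    # -nn: no name or port resolution, -s 0: full packets, -w: write raw packets to file
    set -- tcpdump -i any -nn -s 0 -w "$out" host "$peer" and port "${NRPE_PORT:-5666}"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

On the Nagios server the peer would be the remote agent's address, and on the client it would be the Nagios server's address. Start the capture on both sides first, then run the check_nrpe commands.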
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: Getting Service check timed out after 120.01 seconds)
I have uploaded the tcpdump you requested, running each of the commands separately.
/usr/local/nagios/libexec/check_nrpe -H 52.20.106.xxx
/usr/local/nagios/libexec/check_nrpe -H 52.20.106.xxx -n
/usr/local/nagios/libexec/check_nrpe -H 52.20.106.xxx -2
/usr/local/nagios/libexec/check_nrpe -H 52.20.106.xxx -n -2
Thank you,
Kurt