Service check timed out after 60.01 seconds

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Mahesh786
Posts: 30
Joined: Mon Apr 05, 2021 9:21 am

Post by Mahesh786 »

Hi Team,

We are frequently getting "Service check timed out" errors for a server running the NCPA agent.

Please check and let us know how we can resolve the issue.

Alerts: (Service check timed out after 60.01 seconds) on Log Keyword for ucprs4apprd05 ucprs4apprd05 is CRITICAL


Regards,
Venkata Reddy
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Service check timed out after 60.01 seconds

Post by benjaminsmith »

Hi Venkata,

Greetings! Thanks for contacting the support team at Nagios.

In most cases, this is the result of a firewall or access issue. Are you seeing the timeout on just one service or multiple services?

Is the timeout intermittent? If so, it is likely caused by network congestion.

Another possibility is that the plugin is taking too long to return data. This can be caused by a number of factors (e.g. a slow or unresponsive server).

Can you post the check command or share the system profile, and let us know the exact name of the service that is timing out?
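
A quick way to rule out the firewall or access case is to test whether the NCPA port is reachable from the Nagios XI server. The following is only a minimal sketch: `port_open` is a hypothetical helper name, and it assumes `bash` and the coreutils `timeout` command are available on the server. Port 5693 is the one used in the check commands later in this thread.

```shell
# port_open HOST PORT [SECONDS]  (hypothetical helper, not part of Nagios or NCPA)
# Succeeds if a TCP connection to HOST:PORT can be opened within SECONDS (default 5).
port_open() {
    # bash's /dev/tcp pseudo-device opens a TCP socket; `timeout` bounds the attempt
    # so a silently dropped packet (typical firewall behavior) does not hang forever.
    timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Example (run on the Nagios XI server):
#   port_open ucprs4apprd05 5693 && echo "port reachable" || echo "port blocked or host down"
```

If the port is reachable every time but the checks still time out intermittently, a slow plugin or a busy agent becomes the more likely explanation.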

--Benjamin

### To Download a System Profile
To send us your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Mahesh786
Posts: 30
Joined: Mon Apr 05, 2021 9:21 am

Re: Service check timed out after 60.01 seconds

Post by Mahesh786 »

Hi,

Please find the below:

In most cases, this is the result of a firewall or access issue. Are you seeing the timeout on just one service or multiple services?
- Yes, we are seeing it on multiple services.

Is the timeout intermittent?
- Yes, the timeout is intermittent.

Can you post the check command or share the system profile and let us know the exact name of the service that is timing out?
- The profile is attached. The timeouts affect all services on the ucprnwcsprd02 and ucprs4apprd05 servers.

Regards,
Venkata Reddy

Moderator note: removed attached profile and placed it on local shared drive
Last edited by pbroste on Wed Sep 08, 2021 10:05 am, edited 1 time in total.
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Service check timed out after 60.01 seconds

Post by pbroste »

Hello @Mahesh786

Thanks for following up. After reviewing the System Profile, we are not seeing anything that stands out as a clear pain point.

To pin this down, let's first increase the service check timeout by editing /usr/local/nagios/etc/nagios.cfg and changing:
  • service_check_timeout=60
to:
  • service_check_timeout=120
Save the file and restart Nagios by running:
  • service nagios restart
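
The edit above can also be scripted. This is a rough sketch rather than an official procedure: `set_directive` is a hypothetical helper name, and it assumes GNU `sed` (for `-i`) and that the directive appears uncommented as `KEY=value` at the start of a line, as it does in the example above.

```shell
# set_directive FILE KEY VALUE  (hypothetical helper, not part of Nagios)
set_directive() {
    cp -p "$1" "$1.bak"              # keep a backup of the original file
    sed -i "s/^$2=.*/$2=$3/" "$1"    # rewrite only the uncommented KEY= line
    grep "^$2=" "$1"                 # print the resulting line as confirmation
}

# Example (run as root on the Nagios XI server):
#   set_directive /usr/local/nagios/etc/nagios.cfg service_check_timeout 120
#   /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg   # verify config
#   service nagios restart
```

The backup copy makes it easy to roll back if the config verification step reports a problem.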

Let's also find out what the command output looks like when the check is run manually:

/usr/local/nagios/libexec/check_ncpa.py -H <ipaddressorhostnameofucprs4apprd05> -t 'UltraTechXi' -P 5693 -M 'memory/virtual/percent' --verbose

Optionally, add a timeout to the command:

/usr/local/nagios/libexec/check_ncpa.py -H <ipaddressorhostnameofucprs4apprd05> -t 'UltraTechXi' -P 5693 -M 'memory/virtual/percent' --verbose --timeout=xxx
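
When running the check by hand, it helps to record how long it takes and what it exits with, since the alert reports the check being cut off at roughly 60 seconds. A small sketch, assuming a POSIX shell; `timed` is a hypothetical wrapper name, not a Nagios tool:

```shell
# timed COMMAND [ARGS...]  (hypothetical wrapper)
# Runs the command, then reports elapsed wall-clock seconds and the exit status on stderr.
timed() {
    timed_start=$(date +%s)
    timed_status=0
    "$@" || timed_status=$?          # remember the plugin's exit status
    echo "elapsed=$(( $(date +%s) - timed_start ))s exit=$timed_status" >&2
    return "$timed_status"
}

# Example: prefix the manual check with the wrapper.
#   timed /usr/local/nagios/libexec/check_ncpa.py -H <host> -t '<token>' -P 5693 -M 'memory/virtual/percent' --verbose
```

Nagios plugins exit with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). Running this several times and comparing the elapsed value against the configured timeout shows whether the agent is occasionally slow or the check is failing outright.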


Please let me know the results,
Perry
Mahesh786
Posts: 30
Joined: Mon Apr 05, 2021 9:21 am

Re: Service check timed out after 60.01 seconds

Post by Mahesh786 »

Hi,

We are unable to find service_check_timeout=60 in the ncpa.cfg file.

Please find the attached ncpa.cfg file and let us know exactly what changes we need to make.

Regards,
Venkata Reddy
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Service check timed out after 60.01 seconds

Post by pbroste »

Hello @Mahesh786

Thanks for following up. You are correct that there is a timeout setting in ncpa.cfg, but let's first increase the host and service timeout in nagios.cfg, which is a separate file on the Nagios XI server.

Increase the service check timeout by editing /usr/local/nagios/etc/nagios.cfg and changing:
service_check_timeout=60
to
service_check_timeout=120
Save the file and restart Nagios by running:
service nagios restart

There is also the option to increase the timeout for the NCPA check in /usr/local/ncpa/etc/ncpa.cfg.

Change the line: # plugin_timeout = 60

To: plugin_timeout = 120

Then restart nagios.service
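
Because the default line in ncpa.cfg is commented out, it is worth confirming that the leading `#` was actually removed after editing. The sketch below uses a hypothetical helper, `active_value`, and assumes the `key = value` layout shown above. Note also, as an assumption about the setup rather than something confirmed in this thread, that ncpa.cfg is read by the NCPA agent on the monitored host, so the agent service there (its name varies by NCPA version) would likely need a restart as well for the new value to apply.

```shell
# active_value FILE KEY  (hypothetical helper)
# Prints KEY's value only if the setting is active, i.e. not commented out.
# Prints nothing when the only matching line starts with '#'.
active_value() {
    sed -n -E "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*(.*)$/\1/p" "$1" | tail -n 1
}

# Example (run on the host where the NCPA agent is installed):
#   active_value /usr/local/ncpa/etc/ncpa.cfg plugin_timeout
```

An empty result means the agent is still using its built-in default.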

Thanks,
Perry
Mahesh786
Posts: 30
Joined: Mon Apr 05, 2021 9:21 am

Re: Service check timed out after 60.01 seconds

Post by Mahesh786 »

Hi Team,

We have increased plugin_timeout to 120 in the ncpa.cfg file and restarted the Nagios services, but timeout alerts are still being generated.

Please check and suggest if any changes need to be performed.

Regards,
Venkata Reddy
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Service check timed out after 60.01 seconds

Post by pbroste »

Hello Venkata,

Thanks for following up. It sounds like two hosts are timing out, so let's go ahead and tail the logs to get an event timeline:

while :; do find /usr/local/nagios/var/ -name "*.*" -not -path "/usr/local/nagios/var/rw/*" | xargs tail -F | grep -Ei "warn|error|fail|unknown|critical|ucprs4apprd05" >> /tmp/loggingit.txt; sleep 1; done

Please run this until a timeout has been logged (press Ctrl-C to break out) and send over '/tmp/loggingit.txt' via Private Message (PM).

Thanks,
Perry
Mahesh786
Posts: 30
Joined: Mon Apr 05, 2021 9:21 am

Re: Service check timed out after 60.01 seconds

Post by Mahesh786 »

Hi Team,

We are unable to execute the command and are getting an error.

Please find the attached template.

Regards,
Venkata Reddy
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Service check timed out after 60.01 seconds

Post by pbroste »

Let's go with this one:

tail -Fn0 /var/log/httpd/* /var/log/apache2/* /usr/local/nagios/var/* /usr/local/nagiosxi/tmp/* /usr/local/nagiosxi/var/* /var/log/syslog /var/log/messages /usr/local/nagios/var/spool/* /usr/local/nagiosxi/var/components/* | grep -Ei "warn|error|fail|unknown|critical|ucprs4apprd05" >> /tmp/results.txt
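
One thing to be aware of with a pipeline like this: when grep writes to a file rather than a terminal, it buffers its output, so matches may not show up in the results file until the buffer fills or the pipeline ends. A possible variation, assuming GNU grep (which supports `--line-buffered`), is sketched below. `filter_events` is a hypothetical helper name, and the host name default is taken from this thread.

```shell
# filter_events [HOSTNAME]  (hypothetical helper)
# Reads log lines on stdin and passes through those that mention a problem keyword
# or the given host. --line-buffered flushes each match immediately.
filter_events() {
    grep --line-buffered -Ei "warn|error|fail|unknown|critical|${1:-ucprs4apprd05}"
}

# Example: follow the Nagios log and collect matches as they happen.
#   tail -Fn0 /usr/local/nagios/var/nagios.log | filter_events ucprs4apprd05 >> /tmp/results.txt
```

With line buffering, the results file can be inspected while the tail is still running.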
Thanks,
Perry