
Service Check Timed Out for one Service

Posted: Thu Jan 18, 2018 7:50 am
by amitgupta19
Suddenly I have started getting the alert "Service Check Timed Out" for one of my services.

This is occurring very frequently.

I would like to know what could be causing it.

I have seen various threads about this problem, and they suggest increasing the timeout value. Currently the value is set at 420.
How much should I increase it?

But is it recommended to increase the timeout value at all?

Re: Service Check Timed Out for one Service

Posted: Thu Jan 18, 2018 1:05 pm
by mcapra
Many plugins include their own specific flags and methods for handling timeouts, but they do not take precedence over Nagios Core's internal workings when checks are executed.

"Service check timed out after x seconds" means the plugin's execution time exceeded the Nagios configuration's service_check_timeout setting, so Nagios stopped it. This is a global configuration directive that affects all service checks.
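
To see which value Nagios Core is actually enforcing, you can read the directive straight out of nagios.cfg. This is a minimal sketch, assuming the plain key=value layout with # comments that nagios.cfg uses. The helper name read_directive and the sample values are hypothetical.

```python
from typing import Optional


def read_directive(cfg_text: str, name: str) -> Optional[str]:
    """Return the value assigned to `name` in nagios.cfg-style text, or None."""
    value = None
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == name:
            value = val.strip()  # if repeated, this sketch keeps the last value
    return value


sample_cfg = """
# Applies to every active service check, in seconds
service_check_timeout=110
host_check_timeout=30
"""

print(read_directive(sample_cfg, "service_check_timeout"))  # prints 110
```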

You could certainly increase this value, but depending on your check interval this can lead to overlapping checks and generally bad things. Historically, it's been recommended that checks which run for an exceptionally long time be scheduled as cron jobs which submit their results to Nagios Core passively.
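
A rough way to sanity-check the overlap concern is to compare the timeout against the check interval converted to seconds. This sketch assumes check_interval is expressed in units of interval_length (60 seconds by default in nagios.cfg). The function name is hypothetical, and it does not model how Nagios Core actually schedules a check whose previous run is still executing.

```python
def may_overlap(timeout_s: float, check_interval: float,
                interval_length_s: float = 60) -> bool:
    """True if a check allowed to run for timeout_s could outlast its own interval."""
    interval_s = check_interval * interval_length_s  # e.g. 5 units * 60 s = 300 s
    return timeout_s >= interval_s


print(may_overlap(420, check_interval=5))   # True: 420 s > 300 s
print(may_overlap(420, check_interval=10))  # False: 420 s < 600 s
```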

Can you share the service definition, its corresponding command definition, and the plugin your command definition is using?

Re: Service Check Timed Out for one Service

Posted: Fri Jan 19, 2018 4:28 pm
by npolovenko
Agreed with @mcapra. 420 seconds already seems like a lot to me. Please send us all the information he requested so that we can identify what's going on.

Re: Service Check Timed Out for one Service

Posted: Mon Jan 22, 2018 4:24 am
by amitgupta19
Please find the required data attached.

I just want to identify what is suddenly causing this error so that I can correct the problem.

Re: Service Check Timed Out for one Service

Posted: Tue Jan 23, 2018 7:21 am
by amitgupta19
Did I miss any information or data?

Can someone please look into it?

Re: Service Check Timed Out for one Service

Posted: Tue Jan 23, 2018 2:23 pm
by mcapra
The -t 420 you've added to your service check is technically correct, but as I said this won't override Nagios Core's service_check_timeout setting. Increasing this setting has implications on each and every one of your checks, so if you do change it, I would suggest you change it with care and diligence.
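
Put another way, a plugin-side -t can only make a check give up sooner; it can never extend the window past service_check_timeout. Below is a minimal sketch of that rule. The function is hypothetical, and other layers (for example the NRPE daemon's own command_timeout) are not modeled.

```python
from typing import Optional


def effective_timeout(plugin_timeout_s: Optional[float],
                      service_check_timeout_s: float) -> float:
    """Longest a check can run before either the plugin or Nagios Core stops it."""
    if plugin_timeout_s is None:  # plugin has no timeout of its own
        return service_check_timeout_s
    # Nagios Core kills the check at service_check_timeout regardless of -t.
    return min(plugin_timeout_s, service_check_timeout_s)


print(effective_timeout(420, 110))  # prints 110
print(effective_timeout(30, 110))   # prints 30
```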

The plugin's own notes admit that the snapin itself takes a while to load:

# On the check_nrpe command include the -t 30, since it takes some time to load the Exchange cmdlet's.

I'm going to again suggest this be set up as a passive check due to its long runtime:
mcapra wrote: Historically, it's been recommended that checks which run for an exceptionally long time be scheduled as cron jobs which submit their results to Nagios Core passively.
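
For reference, a passive result is usually handed to Nagios Core by writing a PROCESS_SERVICE_CHECK_RESULT line to its external command file. This is a minimal sketch under several assumptions. It assumes the command file lives at the common source-install path /usr/local/nagios/var/rw/nagios.cmd (check command_file in your nagios.cfg), and that external commands and passive checks are enabled. The host and service names are placeholders, and the helper names are hypothetical.

```python
import time
from typing import Optional

# Assumed path; use whatever command_file is set to in your nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def passive_result_line(host: str, service: str, return_code: int, output: str,
                        timestamp: Optional[int] = None) -> str:
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    if return_code not in (0, 1, 2, 3):  # OK, WARNING, CRITICAL, UNKNOWN
        raise ValueError("return_code must be 0-3")
    ts = int(time.time()) if timestamp is None else timestamp
    # A newline would end the command early; semicolons are replaced conservatively
    # because they separate the command's fields.
    clean = output.replace("\n", " ").replace(";", ",")
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{return_code};{clean}\n"


def submit(line: str, command_file: str = COMMAND_FILE) -> None:
    """Write one command to the Nagios command pipe (blocks until Nagios reads it)."""
    with open(command_file, "a") as pipe:
        pipe.write(line)


if __name__ == "__main__":
    # Placeholder host/service names; a real script would use the plugin's result.
    print(passive_result_line("exch01", "Exchange Queue Health", 0, "OK - queues normal"), end="")
```

A cron entry would then run a script that calls the plugin and passes its result to submit(). Enabling check_freshness with a suitable freshness_threshold on the passive service is one way to get alerted if those submissions stop arriving.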

Re: Service Check Timed Out for one Service

Posted: Tue Jan 23, 2018 4:13 pm
by dwhitfield
mcapra wrote: this won't override Nagios Core's service_check_timeout setting.
That setting lives in nagios.cfg. Can you attach that file for review?

Re: Service Check Timed Out for one Service

Posted: Thu Jan 25, 2018 8:59 am
by amitgupta19
In nagios.cfg, service_check_timeout is set to 110.

But my question is: what suddenly changed that made it start reporting the service check timeout?

I would also like to let you know that I am no longer receiving any service check timeout errors.

So, again: what caused the sudden error?

Please suggest how I can identify this.

Re: Service Check Timed Out for one Service

Posted: Thu Jan 25, 2018 10:17 am
by mcapra
Based on my *very limited* reading on the subject, that particular snapin (Microsoft.Exchange.Management.PowerShell.E2010) is known to be slow. There are several different suggestions scattered around Google for speeding it up, but I can't vouch for their success.

Simply put, the check_queue_health plugin has room for improvement. Historically, it's been recommended that checks which run for an exceptionally long time be scheduled as cron jobs which submit their results to Nagios Core passively. This would allow you to work around the long run-time without worrying about Nagios Core internals.
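
The cron side of that workaround mainly needs to run the plugin with its own, longer deadline and translate the result into a standard plugin return code (0-3). This is a sketch under those assumptions. run_plugin is a hypothetical helper, and reporting a timeout or launch failure as UNKNOWN (3) is a choice made here, not something Nagios requires.

```python
import subprocess
import sys
from typing import List, Tuple

UNKNOWN = 3  # standard Nagios plugin return code


def run_plugin(argv: List[str], timeout_s: float) -> Tuple[int, str]:
    """Run a plugin outside Nagios Core; return (return_code, first line of output).

    Because this runs from cron rather than as an active check, timeout_s is
    not capped by service_check_timeout.
    """
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return UNKNOWN, f"UNKNOWN - plugin did not finish within {timeout_s} seconds"
    except OSError as exc:  # e.g. plugin path does not exist
        return UNKNOWN, f"UNKNOWN - could not start plugin: {exc}"
    # Anything outside 0-3 is not a valid plugin state, so treat it as UNKNOWN.
    code = proc.returncode if proc.returncode in (0, 1, 2, 3) else UNKNOWN
    lines = proc.stdout.strip().splitlines()
    return code, (lines[0] if lines else "(no output)")


if __name__ == "__main__":
    # Stand-in plugin for the demo: a Python one-liner that prints and exits 0.
    demo = [sys.executable, "-c", "print('OK - demo plugin')"]
    print(run_plugin(demo, timeout_s=600))  # prints (0, 'OK - demo plugin')
```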

Re: Service Check Timed Out for one Service

Posted: Thu Jan 25, 2018 11:17 am
by dwhitfield
amitgupta19 wrote: In the nagios.cfg service_check_timeout is mentioned as 110.
110 only shows up once on this page (which is really beside the point; thanks for giving the value), but if you want the timeout to be 420, you need to change it in nagios.cfg. Of course, as @mcapra mentioned, this change will apply to every check.
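
If you do decide to raise it globally, the edit is a single line in nagios.cfg, followed by a configuration check (for example nagios -v /path/to/nagios.cfg) and a restart. This is a minimal sketch of that edit, assuming the simple key=value layout. set_directive is a hypothetical helper that replaces any uncommented assignment or appends a new one.

```python
import re


def set_directive(cfg_text: str, name: str, value: str) -> str:
    """Return cfg_text with name=value set, replacing uncommented assignments or appending one."""
    # Match only lines that start with the directive; commented-out lines are left alone.
    pattern = re.compile(rf"^([ \t]*){re.escape(name)}[ \t]*=.*$", re.MULTILINE)
    if pattern.search(cfg_text):
        return pattern.sub(lambda m: f"{m.group(1)}{name}={value}", cfg_text)
    if cfg_text and not cfg_text.endswith("\n"):
        cfg_text += "\n"
    return cfg_text + f"{name}={value}\n"


before = "# Global timeout\nservice_check_timeout=110\nhost_check_timeout=30\n"
print(set_directive(before, "service_check_timeout", "420"), end="")
```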

As for why this is occurring now and wasn't before, I would suspect more load on the Windows side, given that the snapin is known to be slow. That said, if you added more checks on the Nagios side, that could be an issue as well.