About plugin time out state

Pitone_Maledetto
Posts: 69
Joined: Fri Jul 01, 2016 4:11 am
Location: Liverpool, United Kingdom

About plugin time out state

Post by Pitone_Maledetto »

Hi all,
I have check_raid set up via hpacucli on several PostgreSQL database servers monitored by Nagios Core 4.2.1.
All is fine, but on a couple of servers I get a CRITICAL notification when the plugin takes more than 120 seconds to reply.
I would like to avoid the alert altogether, since the check then returns OK on the second or third try, but I can't remove c from the notification_options list.
So I went and modified the nagios.cfg file:

# SERVICE CHECK TIMEOUT STATE
# This setting determines the state Nagios will report when a
# service check times out - that is does not respond within
# service_check_timeout seconds. This can be useful if a
# machine is running at too high a load and you do not want
# to consider a failed service check to be critical (the default).
# Valid settings are:
# c - Critical (default)
# u - Unknown
# w - Warning
# o - OK

service_check_timeout_state=u


to no avail, since I still get:

servername/Arrays is CRITICAL:
CRITICAL - Plugin timed out

I have resorted to increasing the timeout from 120 to 180 seconds, but I don't really want to go much higher than that just for the sake of silencing one check.
The plugin itself has no switch to change the timeout state from CRITICAL to UNKNOWN, so I am wondering why the main configuration setting does not change the timeout exit status globally.

Thank you all for the support.
Ciao
"It is impossible to work in information technology without also engaging in social engineering"
Jaron Lanier
avandemore
Posts: 1597
Joined: Tue Sep 27, 2016 4:57 pm

Re: About plugin time out state

Post by avandemore »

That seems extremely long for any type of RAID check, so I looked at these:

http://h20564.www2.hpe.com/hpsc/doc/pub ... -c03696601
https://github.com/glensc/nagios-plugin ... -138866801
Do either of those apply to you?
Previous Nagios employee
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: About plugin time out state

Post by tgriep »

Can you post your full nagios.cfg file so we can check its configuration for the UNKNOWN issue you are having when the timeout occurs?
Be sure to check out our Knowledgebase for helpful articles and solutions!
Pitone_Maledetto

Re: About plugin time out state

Post by Pitone_Maledetto »

Hi both,
Thank you for the reply.

I have read the thread on the check_raid GitHub page that suggests using cciss_vol_status, and I had indeed implemented the cciss check directly, bypassing the check_raid plugin.
I run cciss directly because of some bugs that I have promptly discussed with the plugin developer, and I am waiting for a patch.

However, in my case the cciss check was causing intermittent loss of heartbeat to some of our custom components, resulting in a major incident every 35 minutes, so I had to drop it in favour of hpacucli.
It might well be an issue specific to our setup; in fact the cciss checks caused problems on only a few servers and none on the others.

Anyhow, regardless of the plugin used, I would like Nagios to report UNKNOWN when a plugin times out.
Attached you will find my nagios.cfg file.

Thank you.
Attachments
nagios.cfg
(45.16 KiB)
tgriep

Re: About plugin time out state

Post by tgriep »

I tested the timeout state on Core version 4.2.2 and it worked as expected.
You may need to look at the nagios.log file when the plugin times out and see if there is some clue as to why it is not working for you.
Pitone_Maledetto

Re: About plugin time out state

Post by Pitone_Maledetto »

Just now...

[1478877034] Auto-save of retention data completed successfully.
[1478878259] SERVICE ALERT: servername;Arrays;CRITICAL;SOFT;1;CRITICAL - Plugin timed out
[1478878368] SERVICE ALERT: servername;Arrays;OK;SOFT;2;OK: hpacucli:[Smart Array P410i: Array A(OK)[LUN1:OK]]
Last edited by Pitone_Maledetto on Mon Nov 14, 2016 6:40 am, edited 1 time in total.
tgriep

Re: About plugin time out state

Post by tgriep »

That looks like the normal timeout message generated by the plugin itself, which returned a CRITICAL state.

When a plugin hits the service_check_timeout that is set in the nagios.cfg file, Nagios will generate a log entry like the one below.

[1478813340] SERVICE ALERT: localhost;test;UNKNOWN;SOFT;1;(Service check timed out after 60.05 seconds)
Mine is set to 60 seconds and you can see the UNKNOWN state and that it was a Service Check Timeout.
Try decreasing the service_check_timeout setting and see if that makes it work for you.
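
To tell the two cases apart in nagios.log, a small parser along these lines might help. This is a minimal sketch in Python, assuming the SERVICE ALERT line layout and the two message strings quoted in this thread; the function name timeout_source is made up for illustration and other plugins may word their timeout message differently.

```python
import re

# Layout assumed from the log lines quoted in this thread:
# [epoch] SERVICE ALERT: host;service;STATE;STATE_TYPE;attempt;plugin output
ALERT_RE = re.compile(
    r"^\[(?P<ts>\d+)\] SERVICE ALERT: "
    r"(?P<host>[^;]+);(?P<service>[^;]+);(?P<state>[^;]+);"
    r"(?P<state_type>[^;]+);(?P<attempt>\d+);(?P<output>.*)$"
)


def timeout_source(line):
    """Return "core", "plugin", or None for one nagios.log line.

    "core"   : Nagios itself stopped the check (service_check_timeout hit).
    "plugin" : the plugin reported its own timeout and chose the state.
    None     : not a SERVICE ALERT line, or not a timeout at all.
    """
    match = ALERT_RE.match(line.strip())
    if match is None:
        return None
    output = match.group("output")
    if output.startswith("(Service check timed out after"):
        return "core"
    if "Plugin timed out" in output:
        return "plugin"
    return None
```

Only lines classified as "core" are governed by service_check_timeout_state; for "plugin" lines the state comes from the plugin's exit code.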
Pitone_Maledetto

Re: About plugin time out state

Post by Pitone_Maledetto »

Hi tgriep,
thank you for the reply.
Why would decreasing the time out setting help?
Would this not make the time out alert trigger more often?

I am testing a check_ilo plugin that should also check the array/LUN statuses. I have altered said plugin to return an UNKNOWN exit status on timeout instead of CRITICAL.
If service_check_timeout_state won't override the plugin's own exit status (I thought it would), I don't see any solution other than modifying the plugin's exit status itself.
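
For plugins that can't easily be patched, one alternative might be a thin wrapper that runs the real plugin and remaps a timeout to UNKNOWN. The following is a rough Python sketch, not something tested against check_raid or check_ilo: the helper name, the wrapper's argument order, and the "Plugin timed out" marker string (taken from the output quoted earlier in this thread) are all assumptions.

```python
import subprocess
import sys

UNKNOWN = 3  # conventional Nagios plugin exit code for UNKNOWN


def run_with_timeout_as_unknown(cmd, timeout_s, marker="Plugin timed out"):
    """Run a plugin command and return (exit_code, output).

    Hypothetical helper: any timeout is reported as UNKNOWN instead of
    whatever state the plugin itself would have chosen.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The wrapper's own deadline expired; subprocess.run has killed the child.
        return UNKNOWN, "UNKNOWN - wrapper timed out after %s seconds" % timeout_s
    output = proc.stdout.strip()
    if marker in output:
        # The plugin hit its built-in timeout and reported its own state.
        return UNKNOWN, "UNKNOWN - " + output
    return proc.returncode, output


def main(argv=None):
    """Assumed usage: wrapper.py <timeout_seconds> <plugin> [plugin args...]"""
    argv = sys.argv if argv is None else argv
    code, output = run_with_timeout_as_unknown(argv[2:], float(argv[1]))
    print(output)
    return code
```

To use it as a check command, the script would end with sys.exit(main()), and the wrapper timeout would need to stay below service_check_timeout so that Nagios does not stop the wrapper first.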

Is there any other configuration that might inadvertently override the global timeout exit status?

Regards
Last edited by Pitone_Maledetto on Mon Nov 14, 2016 6:42 am, edited 1 time in total.
Pitone_Maledetto

Re: About plugin time out state

Post by Pitone_Maledetto »

Just now with check_load...

[1479123101] HOST ALERT: servername;DOWN;SOFT;1;PING CRITICAL - Packet loss = 100%
[1479123105] SERVICE ALERT: servername;Load;CRITICAL;SOFT;1;CRITICAL - Plugin timed out
[1479123133] HOST ALERT: servername;UP;SOFT;2;PING WARNING - Packet loss = 82%, RTA = 108.35 ms
[1479123207] SERVICE ALERT: servername;Load;OK;SOFT;2;OK - load average: 0.00, 0.00, 0.00


Shouldn't the Load check default to the global timeout state of UNKNOWN?
tgriep

Re: About plugin time out state

Post by tgriep »

Most plugins have a built-in timeout. If the plugin's own timeout is reached before the global service check timeout, the plugin returns whatever state it is programmed to return.
If a plugin doesn't have a built-in timeout, that is where the global timeout is useful. When the nagios process runs a plugin that will not time out on its own, the global timeout setting stops it from running, and that is when the service_check_timeout_state option is used.
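
The interaction described above can be sketched as a toy decision function. This is only a model of the explanation in this post, not Nagios source code: the function and parameter names are hypothetical, and runtime_s stands for how long the check would run if nothing interrupted it.

```python
def resulting_state(runtime_s, plugin_timeout_s, plugin_timeout_state,
                    core_timeout_s, core_timeout_state, normal_state="OK"):
    """Return the state that would be reported for one check run.

    plugin_timeout_s is None for a plugin without a built-in timeout.
    core_timeout_s / core_timeout_state model service_check_timeout and
    service_check_timeout_state from nagios.cfg.
    """
    plugin_fires = plugin_timeout_s is not None and runtime_s >= plugin_timeout_s
    core_fires = runtime_s >= core_timeout_s
    # The plugin's own timeout wins only if it expires before the global one.
    # An exact tie is a race in practice; this model gives it to the core.
    if plugin_fires and (not core_fires or plugin_timeout_s < core_timeout_s):
        return plugin_timeout_state
    if core_fires:
        return core_timeout_state
    return normal_state
```

With a plugin timeout shorter than the global one, the plugin's CRITICAL is what gets reported, which matches the log lines earlier in the thread; service_check_timeout_state only shows up when the global timeout expires first or the plugin has no timeout of its own.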