Using check_nt to check the status of IIS

Posted: Fri May 26, 2017 11:26 am
by neworderfac33
Good afternoon,
I use the following command within a service definition to see if the IIS service is up or down:

check_nt!SERVICESTATE!-d SHOWALL -l W3SVC
However, IIS can be down legitimately because we've turned it off in order to carry out a server sync, after which it gets turned back on again.
Is there any way in which I can add a parameter to this to generate a critical alert if the service has been down for say, more than 30 minutes?
I'm on leave now until Tuesday, so if anyone is good enough to reply and I don't get back until after then, that's why.
Have a good weekend all and thanks in advance
Pete

Re: Using check_nt to check the status of IIS

Posted: Fri May 26, 2017 11:37 am
by dwhitfield
Why not just put it in downtime?

Re: Using check_nt to check the status of IIS

Posted: Tue May 30, 2017 3:38 am
by neworderfac33
Thanks for coming back to me - the server in question can be synced at any time by our developers, so I would never be able to add this to downtime, because it might happen at different times of the day, and potentially more than once a day too.

Cheers

Pete

Re: Using check_nt to check the status of IIS

Posted: Tue May 30, 2017 8:28 am
by mcapra
peterooney wrote: Is there any way in which I can add a parameter to this to generate a critical alert if the service has been down for say, more than 30 minutes?
The key word here is alert. You can't change what the plugin itself reports as a critical state without modifying the plugin, but you can control when Nagios Core sends out an alert by adjusting the retry_interval and max_check_attempts values.

If you wanted to say "if the problem persists for 30 minutes, send an alert", step 1 would be baking that logic into your service definition, using the following retry_interval and max_check_attempts values:

retry_interval 1
max_check_attempts 30
This effectively means that once a problem is first detected, Nagios Core will check every 1 minute, up to a maximum of 30 checks, before triggering a HARD state. 1 * 30 = 30. This gives us effectively 30 minutes before the first notification/alert is triggered.

The first step in Nagios that determines whether or not an alert/notification should be sent is whether or not the host/service is in a HARD state. More on those topics here:
https://assets.nagios.com/downloads/nag ... tions.html
https://assets.nagios.com/downloads/nag ... types.html

However, maybe I don't want to check every 1 minute 30 times over. That's a bit heavy on the system in general. Perhaps I'd rather check every 5 minutes, but still want to keep the "don't alert unless this has been a problem for 30 minutes or longer" logic in there. I might do something like this instead:

retry_interval 5
max_check_attempts 6
With this configuration, once a problem is initially detected, Nagios Core will initiate a check every 5 minutes up to a maximum of 6 checks. 5 * 6 = 30. This, much like the previous example, gives us 30 minutes before an actual notification/alert is triggered.
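
For anyone wanting to see where those two lines would live, here is a minimal sketch of a complete service definition. The check_command is the one from the original post; the template name, host name and service description are placeholders I've assumed for illustration, and the timings assume the default interval_length of 60 seconds so that the interval values are in minutes.

```
# Sketch only: "use", "host_name" and "service_description" are hypothetical placeholders.
define service {
    use                     generic-service     ; assumed template name
    host_name               iis-web01           ; hypothetical host
    service_description     IIS W3SVC
    check_command           check_nt!SERVICESTATE!-d SHOWALL -l W3SVC
    check_interval          5                   ; normal checks every 5 minutes while the service is OK
    retry_interval          5                   ; re-check every 5 minutes once a problem is seen
    max_check_attempts      6                   ; the 6th consecutive non-OK result makes the state HARD
}
```

One thing to keep in mind when tuning the numbers: the first non-OK result counts as attempt 1, so the HARD state arrives about 25 minutes after the first failed check, plus however long the service had already been down before that check ran. If you need a strict 30-minute minimum, bumping max_check_attempts by one would be the simplest adjustment.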

This all assumes you're not doing things like service/host check stalking, you don't have a custom notification handler, etc.

Re: Using check_nt to check the status of IIS

Posted: Tue May 30, 2017 9:08 am
by dwhitfield
Thanks @mcapra for the detailed response!

I'd like to highlight the portion below as it's kinda the thing left over to consider after the "yay, it works" or "it doesn't work" moment.
mcapra wrote: This all assumes you're not doing things like service/host check stalking, you don't have a custom notification handler, etc.