Using check_nt to check the status of IIS

Post by neworderfac33 »

Good afternoon,
I use the following command within a service definition to see if the IIS service is up or down:

Code:

check_nt!SERVICESTATE!-d SHOWALL -l W3SVC

However, IIS can legitimately be down because we've turned it off to carry out a server sync, after which it gets turned back on again.
Is there any way I can add a parameter to this to generate a critical alert only if the service has been down for, say, more than 30 minutes?
I'm on leave now until Tuesday, so if anyone is kind enough to reply and I don't respond until then, that's why.
Have a good weekend, all, and thanks in advance.
Pete

Post by dwhitfield »

Why not just put it in downtime?

Post by neworderfac33 »

Thanks for coming back to me. The server in question can be synced at any time by our developers, so I can't schedule downtime for it in advance: a sync might happen at any time of day, and potentially more than once a day.

Cheers

Pete

Post by mcapra »

peterooney wrote: Is there any way in which I can add a parameter to this to generate a critical alert if the service has been down for say, more than 30 minutes?

The key word here is alert. You can't change what counts as a critical state without modifying the plugin itself, but you can control when Nagios Core sends out an alert by adjusting the retry_interval and max_check_attempts values.

If you wanted to say "if the problem persists for 30 minutes, send an alert", the first step would be baking that logic into your service definition, using the following retry_interval and max_check_attempts values:

Code:

retry_interval 1
max_check_attempts 30

This effectively means that once a problem is first detected, Nagios Core will check every 1 minute, up to a maximum of 30 checks, before triggering a HARD state. 1 * 30 = 30. This gives us effectively 30 minutes before the first notification/alert is triggered.
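
As a rough cross-check on that arithmetic, here is a minimal Python sketch. It assumes the default interval_length of 60 seconds in nagios.cfg, and the function name is made up for illustration. Nagios counts the first failed check as attempt 1, so strictly speaking the wait after first detection is (max_check_attempts - 1) retry intervals, which is why "30 minutes" is best read as an approximation.

```python
def minutes_until_hard_state(retry_interval, max_check_attempts, interval_length=60):
    """Minutes between the first failed check and the check that makes the state HARD.

    Assumptions: interval_length is in seconds (Nagios default is 60), and the
    first failed check counts as attempt 1, so only the remaining attempts
    add waiting time. Any outage time before the first failed check (up to
    one normal check_interval) is not included.
    """
    retries = max_check_attempts - 1
    return retries * retry_interval * interval_length / 60


print(minutes_until_hard_state(1, 30))  # 29.0
print(minutes_until_hard_state(5, 6))   # 25.0
```

If the notification really must not fire before a full 30 minutes have passed, bumping max_check_attempts by one covers the gap.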

The first thing Nagios looks at when deciding whether an alert/notification should be sent is whether the host/service is in a HARD state. More on those topics here:
https://assets.nagios.com/downloads/nag ... tions.html
https://assets.nagios.com/downloads/nag ... types.html

However, maybe I don't want to check every 1 minute 30 times over. That's a bit heavy on the system in general. Perhaps I'd rather check every 5 minutes, but still want to keep the "don't alert unless this has been a problem for 30 minutes or longer" logic in there. I might do something like this instead:

Code:

retry_interval 5
max_check_attempts 6

With this configuration, once a problem is initially detected, Nagios Core will initiate a check every 5 minutes up to a maximum of 6 checks. 5 * 6 = 30. This, much like the previous example, gives us 30 minutes before an actual notification/alert is triggered.
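
To pick max_check_attempts for some other retry_interval, a small helper like the one below may be handy. It is only a sketch: the function name is hypothetical, retry_interval is taken to be in minutes (the default interval_length of 60 seconds), and it treats the first failed check as attempt 1, so N attempts span N - 1 retry intervals. Under that stricter reading, 6 attempts at 5 minutes lands at about 25 minutes after first detection, and 7 attempts reaches the full 30.

```python
import math


def attempts_for_delay(target_minutes, retry_interval):
    """Smallest max_check_attempts whose retries span at least target_minutes.

    Assumes retry_interval is expressed in minutes and that the first failed
    check is attempt 1, so one extra attempt is added on top of the number
    of retry intervals needed.
    """
    if retry_interval <= 0:
        raise ValueError("retry_interval must be positive")
    return math.ceil(target_minutes / retry_interval) + 1


print(attempts_for_delay(30, 5))  # 7
print(attempts_for_delay(30, 1))  # 31
```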

This all assumes you're not doing things like service/host check stalking, you don't have a custom notification handler, etc.

Post by dwhitfield »

Thanks @mcapra for the detailed response!

I'd like to highlight the portion below as it's kinda the thing left over to consider after the "yay, it works" or "it doesn't work" moment.
mcapra wrote: This all assumes you're not doing things like service/host check stalking, you don't have a custom notification handler, etc.