Warning and Critical have different retry intervals?

Posted: Mon Feb 11, 2013 6:53 pm
by spcmidrange
Morning

I'll explain what I'm trying to accomplish, and hopefully some bright person will come up with an idea :) Here's what's going on:

We have a disk space check on an Oracle archive directory. When it hits the warning threshold, the event handler kicks off a backup, and when the backup finishes, it cleans up the partition. Sometimes the backup takes a bit longer than the normal retry intervals, so the check was changed to retry once every 15 minutes for a total of 10 retries (150 minutes). This gives the backup time to finish, and if the backup didn't start, the event handler runs again every 15 minutes to make sure it has started. This works great, as we don't get notified unless the check reaches a HARD state at the end of 10 tries.

Now we want to implement a critical threshold, so that if disk usage reaches this limit, it pages out immediately. The problem is that the check will wait 150 minutes before it sends out a critical notification. How can this be set up to have one retry interval for warnings and another for criticals?

Cheers!

Re: Warning and Critical have different retry intervals?

Posted: Tue Feb 12, 2013 10:23 am
by mguthrie
I think the easiest way to implement this is to build it right into your event handler. Since the event handler gets triggered on a state change, you can set a different condition for when the critical threshold is reached. If warning is reached, run the normal event handler; if critical is reached, send the alert.

Re: Warning and Critical have different retry intervals?

Posted: Wed Feb 13, 2013 9:40 am
by CGraham
I think his concern is that it will be in Critical for so long before the event handler kicks off.

I think the only way to achieve this is with 2 separate checks. One for Warnings and another for Criticals. It would probably be a good idea if they used the same event handler so they don't step on each other's toes.

Re: Warning and Critical have different retry intervals?

Posted: Wed Feb 13, 2013 10:40 am
by slansing
You could effectively do this with either of the methods above, depending on how much overhead you want. I can't think of another way off the top of my head. Let us know if you need help implementing this.

Re: Warning and Critical have different retry intervals?

Posted: Wed Feb 13, 2013 10:43 am
by abrist
Event handlers can be run at every state change. This includes soft critical states. As long as the logic in the event handler script is right and the proper macros are passed to the handler, this is definitely possible with just one check and one event handler. It may be easier with two, though.
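As a rough illustration of the "one check, one event handler" idea, here is a minimal Python sketch of the branching logic. It assumes the handler command is configured to pass the standard $SERVICESTATE$ and $LASTSERVICESTATE$ macros as arguments; the function name and the action labels ("backup", "page", "none") are hypothetical, and the actual backup or paging step is left to the caller.

```python
import sys


def choose_action(state: str, last_state: str) -> str:
    """Decide what the event handler should do for one state change.

    state      -- value of $SERVICESTATE$ (OK, WARNING, CRITICAL, UNKNOWN)
    last_state -- value of $LASTSERVICESTATE$
    The returned labels are hypothetical names for this sketch.
    """
    if state == "CRITICAL":
        # Page only on the transition into CRITICAL, so that later
        # soft retries do not send the same page again.
        return "page" if last_state != "CRITICAL" else "none"
    if state == "WARNING":
        # Runs on every warning retry, matching the existing behaviour
        # of (re)starting the backup if it is not already running.
        return "backup"
    return "none"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example invocation: handler.py "$SERVICESTATE$" "$LASTSERVICESTATE$"
    print(choose_action(sys.argv[1], sys.argv[2]))
```

Whether paging on a soft CRITICAL is acceptable depends on how noisy the check is; the sketch only shows where that decision would live.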

Re: Warning and Critical have different retry intervals?

Posted: Wed Feb 13, 2013 11:16 am
by CGraham
While having the Event Handler send the notification works, there wouldn't be a record of this notification in Nagios. Additionally, the $CONTACT$ macros aren't available for service checks so you'd have to manage the contacts manually in the script.

Just something to consider.

Re: Warning and Critical have different retry intervals?

Posted: Wed Feb 13, 2013 3:55 pm
by scottwilkerson
You could have your event handler send a command via the external command pipe

SEND_CUSTOM_HOST_NOTIFICATION
SEND_CUSTOM_SVC_NOTIFICATION

http://old.nagios.org/developerinfo/ext ... ndlist.php

http://old.nagios.org/developerinfo/ext ... and_id=135
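For anyone going this route, a small Python sketch of how an event handler might build and submit a SEND_CUSTOM_SVC_NOTIFICATION line. The field order (host, service, options, author, comment) follows the external command format; the function names, the sample host and service, and the command file path are assumptions for illustration, so check the command_file setting in your own nagios.cfg.

```python
import time


def build_custom_svc_notification(host, service, author, comment,
                                  options=0, now=None):
    """Format one SEND_CUSTOM_SVC_NOTIFICATION external command line.

    options is a bitmask: 0 = none, 1 = broadcast, 2 = forced,
    4 = increment the current notification number.
    """
    timestamp = int(time.time()) if now is None else int(now)
    return (f"[{timestamp}] SEND_CUSTOM_SVC_NOTIFICATION;"
            f"{host};{service};{options};{author};{comment}\n")


def submit(command_line, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Write the command to the external command pipe.

    The default path is an assumption (typical for source installs).
    """
    with open(command_file, "w") as pipe:
        pipe.write(command_line)


# Hypothetical host and service names, shown for illustration only.
line = build_custom_svc_notification(
    "dbhost", "Oracle Archive Disk", "eventhandler",
    "Critical threshold reached", options=2, now=1360800000)
```

Because the notification goes through Nagios itself, it uses the contacts defined on the service rather than a list maintained in the script.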

Re: Warning and Critical have different retry intervals?

Posted: Thu Feb 14, 2013 10:04 am
by spcmidrange
Thanks for the replies and ideas!

I just implemented two separate checks: one for the warning threshold with the event handler, which never pages out, and one for the critical threshold with no event handler, which uses the default values to notify us if the disk fills up that far.

Cheers!

Re: Warning and Critical have different retry intervals?

Posted: Thu Feb 14, 2013 10:39 am
by slansing
Excellent, thanks for posting your solution. Were you able to test and verify that it worked correctly?

Re: Warning and Critical have different retry intervals?

Posted: Thu Feb 14, 2013 3:30 pm
by spcmidrange
Yup! It's all good :)

Cheers!