Warning and Critical have different retry intervals?

spcmidrange
Posts: 47
Joined: Fri Jun 15, 2012 12:54 pm

Warning and Critical have different retry intervals?

Post by spcmidrange »

Morning

I'll explain what I'm trying to accomplish, and hopefully some bright person will come up with an idea :) Here's what's going on:

We have a disk space check on an archive directory for Oracle. When it hits the warning threshold, the event handler kicks off a backup, and when the backup finishes, it cleans up the partition. Sometimes the backup takes longer than the normal retry intervals allow, so the check was changed to retry every 15 minutes, for a total of 10 retries (150 minutes). This gives the backup time to finish, and if the backup didn't start, the event handler runs again every 15 minutes to make sure it does. This works great, as we don't get notified unless the check reaches a HARD state at the end of the 10 tries.

Now we want to implement a critical threshold so that, if the partition reaches this limit, we get paged immediately. The problem is that the check will wait 150 minutes before it sends out a critical notification. How can this be set up to have one retry interval for warnings and another for criticals?

Cheers!
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Warning and Critical have different retry intervals?

Post by mguthrie »

I think the easiest way to implement this is to build it right into your event handler. Since the event handler is triggered on a state change, you can set a different condition for when the critical threshold is reached. If warning is reached, run the normal event handler; if critical is reached, send the alert.
CGraham
Posts: 115
Joined: Tue Aug 16, 2011 2:43 pm

Re: Warning and Critical have different retry intervals?

Post by CGraham »

I think his concern is that it will be in Critical for so long before the event handler kicks off.

I think the only way to achieve this is with 2 separate checks. One for Warnings and another for Criticals. It would probably be a good idea if they used the same event handler so they don't step on each other's toes.
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Warning and Critical have different retry intervals?

Post by slansing »

You could do this with either of the methods above, depending on how much overhead you want. I can't think of another way off the top of my head. Let us know if you need help implementing this.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Warning and Critical have different retry intervals?

Post by abrist »

Event handlers can be run at every state change. This includes soft critical states. As long as the logic is right in the event handler script and the proper macros are passed to the handler, this is definitely possible with just 1 check and 1 event handler. It may be easier with two though.
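To illustrate the single-check approach, here is a minimal Python sketch of the decision logic such a handler might use. It assumes the event handler command passes `$SERVICESTATE$` and `$LASTSERVICESTATE$` as arguments. The function name `choose_action` and the action names `"page"`, `"backup"`, and `"none"` are made up for this example and are not part of Nagios.

```python
import sys


def choose_action(state, last_state):
    """Decide what the handler should do for one invocation.

    state      -> value of $SERVICESTATE$ ("OK", "WARNING", "CRITICAL", "UNKNOWN")
    last_state -> value of $LASTSERVICESTATE$
    The returned action names are hypothetical labels for this sketch.
    """
    if state == "CRITICAL":
        # Page only when the service has just entered CRITICAL, so that
        # later retries in the same state do not page again.
        return "page" if last_state != "CRITICAL" else "none"
    if state == "WARNING":
        # Run (or re-run) the backup on every handler call while in WARNING.
        return "backup"
    return "none"


if __name__ == "__main__":
    # Expected call: handler.py $SERVICESTATE$ $LASTSERVICESTATE$
    if len(sys.argv) == 3:
        print(choose_action(sys.argv[1], sys.argv[2]))
```

Because the handler also runs in SOFT states, the critical branch does not have to wait for the check to reach HARD. The actual backup and paging commands are left out here on purpose.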
Former Nagios employee
CGraham
Posts: 115
Joined: Tue Aug 16, 2011 2:43 pm

Re: Warning and Critical have different retry intervals?

Post by CGraham »

While having the event handler send the notification works, there wouldn't be a record of this notification in Nagios. Additionally, the $CONTACT$ macros aren't available to service checks or their event handlers, so you'd have to manage the contacts manually in the script.

Just something to consider.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Warning and Critical have different retry intervals?

Post by scottwilkerson »

You could have your event handler send a command via the external command pipe

SEND_CUSTOM_HOST_NOTIFICATION
SEND_CUSTOM_SVC_NOTIFICATION

http://old.nagios.org/developerinfo/ext ... ndlist.php

http://old.nagios.org/developerinfo/ext ... and_id=135
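As a rough sketch of what that could look like from a Python event handler: the function below builds a `SEND_CUSTOM_SVC_NOTIFICATION` line in the external command format (`[timestamp] COMMAND;host;service;options;author;comment`). The command file path is an assumed source-install default and may differ on your system. The helper names, host, service, and author values are hypothetical.

```python
import time

# Assumed location of the external command file; check command_file in nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def format_custom_svc_notification(host, service, author, comment,
                                   options=0, now=None):
    """Build one SEND_CUSTOM_SVC_NOTIFICATION external command line.

    options is a bitmask: 0 = none, 1 = broadcast, 2 = forced,
    4 = increment the notification number.
    """
    timestamp = int(time.time()) if now is None else int(now)
    return "[{}] SEND_CUSTOM_SVC_NOTIFICATION;{};{};{};{};{}\n".format(
        timestamp, host, service, options, author, comment)


def send_command(line, command_file=COMMAND_FILE):
    """Write one command line to the Nagios external command pipe."""
    with open(command_file, "w") as pipe:
        pipe.write(line)


if __name__ == "__main__":
    # Hypothetical host and service names, printed rather than sent.
    print(format_custom_svc_notification(
        "db01", "Oracle Archive Disk", "eventhandler",
        "Archive partition critical", options=2), end="")
```

Sending the notification through Nagios this way means it goes to the service's configured contacts, so the script does not have to manage a contact list itself.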
Former Nagios employee
spcmidrange
Posts: 47
Joined: Fri Jun 15, 2012 12:54 pm

Re: Warning and Critical have different retry intervals?

Post by spcmidrange »

Thanks for the replies and ideas!

I just implemented two separate checks: one for the warning threshold with the event handler, which never pages out, and one for the critical threshold with no event handler, which uses the default values to notify us if the disk fills up that far.

Cheers!
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Warning and Critical have different retry intervals?

Post by slansing »

Excellent, thanks for posting your solution. Were you able to test and verify that it worked correctly?
spcmidrange
Posts: 47
Joined: Fri Jun 15, 2012 12:54 pm

Re: Warning and Critical have different retry intervals?

Post by spcmidrange »

Yup! It's all good :)

Cheers!