First check_ script, need a hand with some time logic

Posted: Mon Jan 26, 2015 4:00 pm
by trk204
Hey guys, I'm writing a Perl-based script to check for the existence of some data files that should get ingested. I have the Perl sorted out to check if the files are there, I'm just not sure how to set up the monitoring as I *think* it's a somewhat unique situation.

I ingest a set of files 4 times a day (00z, 06z, 12z, 18z, in Zulu/UTC time) and only need to monitor for those files for a couple of hours after their release time. After the couple of hours, if the 00z files are present, they don't need to be checked again until the next day at 00z. Same for each of the other release hours.

Is my best bet to write custom time periods for each of the 4 releases, each providing a window in which to run the checks? How would I configure it to keep checking after the time period was over if there was still a crit/warn active at the end of the period? Or, if it's possible, run the checks at a high frequency during that window until OK, then back off to once an hour. Generally, though, once the data has arrived and been confirmed OK, there is no reason for additional checks.

So for 00z data

Code:

define timeperiod {
	timeperiod_name		00Z_data
	alias			Grib 00z run
	sunday			00:00-02:00	; Every Sunday of every week
	monday			00:00-02:00	; Every Monday of every week
	tuesday			00:00-02:00	; Every Tuesday of every week
	wednesday		00:00-02:00	; Every Wednesday of every week
	thursday		00:00-02:00	; Every Thursday of every week
	friday			00:00-02:00	; Every Friday of every week
	saturday		00:00-02:00	; Every Saturday of every week
	}

Thanks guys. Not sure why I haven't played with Nagios before, it's friggin great!

Re: First check_ script, need a hand with some time logic

Posted: Mon Jan 26, 2015 8:04 pm
by Box293
trk204 wrote: How would I configure it to keep checking after the time period was over if there was still a crit/warn active at the end of the period?
I don't think that's going to be possible. When you use time periods, that restricts the times at which a check can be scheduled.

Just thinking off the top of my head.

I think check_interval, retry_interval and max_check_attempts, combined with a separate check for each time period (4 separate checks), might work best.

00z time period:
Its window is 00:00 - 04:00.
check_interval is 240
retry_interval is 20
max_check_attempts is 1

When the check runs, if it is OK it will be scheduled again 240 minutes in the future. That will be outside of the time window, so it will default to the next available time in the time window, which is the next day.

If the check returns Warning/Critical it will be scheduled again for 20 minutes in the future, and will keep doing this until it goes back to an OK state. However, once the end of the time period window is reached it will remain Warning/Critical until the next day, when it is allowed to run again.
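
As a rough sketch, the 00z check described above might be configured along these lines. The time period name, host name, service description, command name and the generic-service template are placeholders, not anything taken from this thread. The window also assumes the Nagios server's clock runs on UTC, since time periods are evaluated in the server's local time. The interval values are in minutes only if interval_length is left at its default of 60.

Code:

define timeperiod {
	timeperiod_name		00Z_window		; hypothetical name
	alias			Grib 00z check window
	sunday			00:00-04:00
	monday			00:00-04:00
	tuesday			00:00-04:00
	wednesday		00:00-04:00
	thursday		00:00-04:00
	friday			00:00-04:00
	saturday		00:00-04:00
	}

define service {
	use			generic-service		; assumed template
	host_name		ingest-host		; hypothetical host
	service_description	Grib 00z files
	check_command		check_grib_files!00	; hypothetical command wrapping the Perl script
	check_period		00Z_window		; checks are only scheduled inside this window
	check_interval		240			; after an OK result, the next check falls outside the window
	retry_interval		20			; intended recheck spacing while Warning/Critical
	max_check_attempts	1
	}

The 06z, 12z and 18z checks would follow the same pattern, each with its own window. One thing worth verifying when testing: the Nagios documentation describes retry_interval as applying while a service is still being retried (soft state). With max_check_attempts set to 1, confirm that rechecks really happen every 20 minutes, and consider raising max_check_attempts if they don't.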

Re: First check_ script, need a hand with some time logic

Posted: Tue Jan 27, 2015 11:46 am
by trk204
Awesome, thanks for your help! I'm going to try and implement this sometime this week and see how it works out.

Thanks again!

Re: First check_ script, need a hand with some time logic

Posted: Tue Jan 27, 2015 11:50 am
by tmcdonald
Good luck!

We will keep this open for a while in case you run into any issues.