
Next Scheduled Check over the Time Period

Posted: Tue May 28, 2019 5:05 am
by antoniosc
Hi,

we're migrating from our old Nagios version 3.0.3 to Nagios Core version 4.4.3.
During our parallel run to validate the new installation, I see strange behavior for some alarms that are scheduled to run in a limited time period.

Example:
timeperiod: 04:00-04:05 (from sunday to saturday)
max_check_attempts 5
check_interval 20
retry_interval 3
notification_interval 30

Please correct me if I'm wrong. When computing the "Next Scheduled Check" value, Nagios checks whether the candidate date falls within a valid timeperiod:
if YES -> it sets the "Next Scheduled Check" to that date
if NO -> it sets the "Next Scheduled Check" to the first valid date in the next timeperiod (in this example, the next 04:00)
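
A minimal Python sketch of the rule described above, for illustration only. It is not Nagios source code: the function name `next_valid_check` is hypothetical, and treating the end of the window as exclusive is an assumption of this sketch.

```python
from datetime import datetime, time, timedelta

def next_valid_check(candidate: datetime, start: time, end: time) -> datetime:
    """Return `candidate` if it lies inside the daily window [start, end),
    otherwise the start of the next occurrence of the window."""
    window_start = datetime.combine(candidate.date(), start)
    window_end = datetime.combine(candidate.date(), end)
    if window_start <= candidate < window_end:
        return candidate                      # YES: keep the candidate time
    if candidate < window_start:
        return window_start                   # NO: window is still ahead today
    return window_start + timedelta(days=1)   # NO: window already passed today

# Last check at 04:00 plus check_interval 20 gives 04:20, outside 04:00-04:05.
last_check = datetime(2019, 5, 28, 4, 0)
candidate = last_check + timedelta(minutes=20)
print(next_valid_check(candidate, time(4, 0), time(4, 5)))  # prints 2019-05-29 04:00:00
```

With the timeperiod from the example, a candidate of 04:20 would be moved to 04:00 on the following day.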

Our old Nagios works this way, so there is no problem there.
With the new Nagios Core 4.4.3 this doesn't seem to work properly, and I can see a scenario like this:
Next Scheduled Check: the day after at 04:13
Last Check Time: the last time the check was performed during the valid timeperiod (days before)

So it seems that Nagios tries to perform the next check as scheduled but, after checking the timeperiod, doesn't run it because the scheduled date is not included in the timeperiod.

How can I check this? How can I "force" it?

To complete the picture: I've migrated all the alarm configurations from the old host to the new one, so the time periods, intervals and checks to run are exactly the same.

Here is a real example:
Current Status:   OK   (for 0d 1h 13m 3s+)
Status Information: OK, it works
Performance Data:  
Current Attempt: 1/3  (HARD state)
Last Check Time: 04-30-2019 14:15:55
Check Type: ACTIVE
Check Latency / Duration: 0.000 / 14.129 seconds
Next Scheduled Check:   05-29-2019 11:08:45

Service configuration:
check_interval 20
notification_interval 30
retry_interval 3
max_check_attempts 3

Timeperiod
monday 11:00-11:01
tuesday 11:00-11:01
wednesday 11:00-11:01
thursday 11:00-11:01
friday 11:00-11:01
saturday 11:00-11:01
sunday 11:00-11:01


EDIT: I've found this in the official documentation, but it's not what I'm seeing:
Specifying a timeperiod in the check_period directive allows you to restrict the time that Nagios Core perform regularly scheduled, active checks of the host or service. When Nagios Core attempts to reschedule a host or service check, it will make sure that the next check falls within a valid time range within the defined timeperiod. If it doesn't, Nagios Core will adjust the next check time to coincide with the next "valid" time in the specified timeperiod. This means that the host or service may not get checked again for another hour, day, or week, etc.
https://assets.nagios.com/downloads/nag ... riods.html


Thanks!

Re: Next Scheduled Check over the Time Period

Posted: Tue May 28, 2019 4:47 pm
by swolf
Hi @antoniosc,

I'm setting up a configuration to see whether I can reproduce this -- will give you an update in a couple days.

Re: Next Scheduled Check over the Time Period

Posted: Wed May 29, 2019 1:23 pm
by swolf
Okay, I tried this on my development machine and was able to reproduce the problem. I've opened an issue here, hopefully we'll be able to get this fixed shortly.

Re: Next Scheduled Check over the Time Period

Posted: Thu May 30, 2019 3:03 am
by antoniosc
swolf wrote:Okay, I tried this on my development machine and was able to reproduce the problem. I've opened an issue here, hopefully we'll be able to get this fixed shortly.
Ok, thanks for your fast feedback!
I'll wait for updates

Re: Next Scheduled Check over the Time Period

Posted: Thu May 30, 2019 3:23 pm
by benjaminsmith
Hi @antoniosc,
antoniosc wrote:Ok, thanks for your fast feedback! I'll wait for updates
Thanks for pointing this out. For new updates, you can follow the issue on GitHub.

Re: Next Scheduled Check over the Time Period

Posted: Thu Jun 06, 2019 10:19 am
by antoniosc
Hi,

in the meantime, could you suggest a workaround we can apply?

Thanks!

Re: Next Scheduled Check over the Time Period

Posted: Thu Jun 06, 2019 10:30 am
by swolf
Hi @antoniosc

I think you have two options here:
1.) Make the timeperiods as large as the check_interval

For the example you posted, this would mean setting your timeperiod to 11:00-11:20 each day, and keeping the service definition the same.

2.) If the check needs to run exactly at 11 AM, then this probably shouldn't be an active check. Instead, set up a daily cron job to run the check and submit a passive check result to Nagios.
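
As a rough sketch of option 2, a script run from cron could build a `PROCESS_SERVICE_CHECK_RESULT` external command and write it to the Nagios command file. The helper names, the host and service names, and the command file path below are assumptions for illustration; the actual path is whatever `command_file` is set to in your nagios.cfg.

```python
import time

# Assumed default location; check the command_file setting in nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"

def format_passive_result(host, service, return_code, output, now=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    ts = int(time.time() if now is None else now)
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{return_code};{output}\n"

def submit(line, command_file=COMMAND_FILE):
    """Write the command line to the Nagios external command file (a named pipe)."""
    with open(command_file, "w") as fh:
        fh.write(line)

# Hypothetical host and service names; return code 0 means OK.
line = format_passive_result("myhost", "My Service", 0, "OK, it works", now=1559034000)
print(line, end="")  # prints [1559034000] PROCESS_SERVICE_CHECK_RESULT;myhost;My Service;0;OK, it works
# submit(line)  # uncomment on the Nagios host, e.g. from a daily 11:00 cron entry
```

The script would run the real check first and pass its return code and output to `format_passive_result`. The service also needs passive checks enabled for the result to be accepted.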

Let us know if you need any further assistance.

Re: Next Scheduled Check over the Time Period

Posted: Thu Jun 06, 2019 10:57 am
by antoniosc
Hi,

thanks!

About solution 1):
Won't the issue occur again? I mean, I can schedule:
check_period 11:00 - 11:20
check_interval 20

The first check will run at 11:00 and the next one will be scheduled at 11:20 (and maybe that will still work). What about the one after that? It will be scheduled at 11:40, and the issue is still there.

Am I wrong?

Re: Next Scheduled Check over the Time Period

Posted: Fri Jun 07, 2019 4:33 pm
by cdienger
Setting the timeperiod to 11:00-11:20 and the check interval to 20 minutes should cause it to run sometime between 11:00 and 11:20.

Re: Next Scheduled Check over the Time Period

Posted: Tue Jun 11, 2019 7:49 am
by antoniosc
I'm wondering about updating the Next Scheduled Check value manually, with a sh script for example. Where is this information stored and updated?
I mean, I would like an automated script to do what we can do manually in the GUI using "Re-schedule the next check of this service".

In the meantime, we're following the updates on the GitHub issue.
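
One possible way to script the re-schedule action is the `SCHEDULE_FORCED_SVC_CHECK` external command, written to the Nagios command file. The sketch below only builds the command line; the helper name and the host and service names are hypothetical, and it is shown in Python rather than sh. Per the external command documentation, forced checks ignore timeperiod restrictions.

```python
import time

def format_forced_check(host, service, check_time, now=None):
    """Build one SCHEDULE_FORCED_SVC_CHECK external command line.

    check_time is the Unix timestamp at which the check should run.
    """
    ts = int(time.time() if now is None else now)
    return f"[{ts}] SCHEDULE_FORCED_SVC_CHECK;{host};{service};{int(check_time)}\n"

# Hypothetical names; schedule the check 60 seconds after a fixed "now".
now = 1559034000
line = format_forced_check("myhost", "My Service", now + 60, now=now)
print(line, end="")  # prints [1559034000] SCHEDULE_FORCED_SVC_CHECK;myhost;My Service;1559034060
```

The resulting line would then be written to the file named by `command_file` in nagios.cfg, in the same way as any other external command.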