
Check service at same time every day - persistenly

Posted: Fri Jan 24, 2014 4:32 am
by FTL
Hi All,

I have a few service checks that check the job status of Scheduled Tasks on Windows servers.

They run once a day, between 9 and 9.30am, on each of the servers.

However, if I restart the Nagios service or reboot the server, I lose the set check times and have to go in manually and reschedule the next check back to the required time of day for each service.

Is it possible to make the service check run persistently at a set time of day, even through reboots and service restarts?

Thank you

Re: Check service at same time every day - persistenly

Posted: Fri Jan 24, 2014 2:35 pm
by slansing
Okay, so you are manually setting a check reschedule time? You are not setting these services up on a time period that only exists between 9-9:30am in their service definition?

Re: Check service at same time every day - persistenly

Posted: Mon Jan 27, 2014 5:19 am
by FTL
Hi slansing,

Sorry, it appears I wasn't clear enough in my description.

I have 6 servers on which I run a service check to check the status of the scheduled tasks that are running, each at its own time:

Server 1 - 9.00am
Server 2 - 9.05am
Server 3 - 9.10am
Server 4 - 9.15am
Server 5 - 9.20am
Server 6 - 9.25am

My service check for one server, as an example:

define service{
    use                    service-schedtask        ; See Service Template section below
    host_name              SERVER 1
    service_description    SCHEDULED TASK RESULT
    check_command          check_schedtasks         ; See Commands section below
    }
The checks run once every 24 hours, as defined in my service-schedtask template:

define service{
    name                            service-schedtask             ; The name of this service template (used above in the checks)
    check_period                    server_24x7                   ; Servers are monitored at all times
    check_interval                  1440                          ; Servers are checked every 1 day when in OK state
    retry_interval                  180                           ; Servers are checked every 3 hours if in problem state
    max_check_attempts              3                             ; Servers are checked 3 times to determine their Up or Down state
    notification_period             server_24x7                   ; Emails and texts are sent out any time of day
    notification_interval           180                           ; Resend notifications every 3 hours
    notification_options            c,r                           ; Only send alerts for servers in CRITICAL or RECOVERY state
    notifications_enabled           1                             ; Notifications are enabled
    contact_groups                  servers email, servers sms    ; Alerts sent to contacts in these groups
    event_handler_enabled           1                             ; Service event handler is enabled
    process_perf_data               1                             ; Performance data is processed
    retain_status_information       1                             ; Status info is kept between server restarts
    retain_nonstatus_information    1                             ; Non-status information is kept between server restarts
    passive_checks_enabled          0                             ; Passive checks are disabled
    obsess_over_service             0                             ; We do not obsess over the server if in problem state
    check_freshness                 0                             ; We do not check this server for freshness
    flap_detection_enabled          0                             ; Flap detection is disabled
    failure_prediction_enabled      0                             ; We will wait for it to actually fail thankyou!!
    register                        0
    }
So after Nagios was restarted, I manually went in and forced a scheduled check of the service(s) on their respective hosts at the specific times set above.

If Nagios stays up then this works fine: Server 1 gets checked at 9am daily, Server 2 at 9.05am, and so on.

But if I reboot the Nagios server or restart the Nagios service, it loses those times.
So say I reboot Nagios at 6pm one evening; when it comes back up it might check Server 1 at 7.24pm, Server 2 at 9.43pm, and so on.

I then have to go back in and manually force-reschedule the checks for each service to the times listed above.

Now, I understand this is normal: when Nagios restarts, it reschedules all its checks depending on its load from the moment it comes back up.

My question is: can I make these particular service checks on the 6 servers run at the times set above, persistently through reboots and restarts, without having to go in and manually reschedule them whenever the server or service gets restarted?

Thanks

Re: Check service at same time every day - persistenly

Posted: Mon Jan 27, 2014 11:59 am
by sreinhardt
The hardest part would be getting the checks to run at those exact moments. My first suggestion would be to define a timeperiod restricted to 9:00-9:30 (or 9:00-10:00 or so) and use it as the check_period for all of these service checks. The issue with this is that the checks could get jumbled and not run in the right order. In that case you might need to define a separate 5 minute timeperiod for each service check, and use that to tell the Nagios engine specifically when to check it. Does that make more sense to you? I am happy to give an example if needed.
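
As a rough sketch of the per-service variant, assuming hypothetical timeperiod names (server1_schedtask, server2_schedtask) and a host called SERVER2 alongside the poster's SERVER1, each service could override the template's check_period with its own 5 minute window. With a 1440 minute check_interval, a check that falls outside its window should be pushed to the next valid time in that window, but that behaviour is worth verifying on your own install.

```
# Hypothetical 5 minute windows; the timeperiod names are made up for illustration.
define timeperiod{
    timeperiod_name    server1_schedtask
    alias              Server 1 scheduled task check window
    sunday             09:00-09:05
    monday             09:00-09:05
    tuesday            09:00-09:05
    wednesday          09:00-09:05
    thursday           09:00-09:05
    friday             09:00-09:05
    saturday           09:00-09:05
    }

define timeperiod{
    timeperiod_name    server2_schedtask
    alias              Server 2 scheduled task check window
    sunday             09:05-09:10
    monday             09:05-09:10
    tuesday            09:05-09:10
    wednesday          09:05-09:10
    thursday           09:05-09:10
    friday             09:05-09:10
    saturday           09:05-09:10
    }

# Each service keeps the shared template but overrides check_period.
define service{
    use                    service-schedtask
    host_name              SERVER1
    service_description    SCHEDULED TASK RESULT
    check_command          check_schedtasks
    check_period           server1_schedtask    ; overrides the template's check_period
    }

define service{
    use                    service-schedtask
    host_name              SERVER2              ; assumed host name
    service_description    SCHEDULED TASK RESULT
    check_command          check_schedtasks
    check_period           server2_schedtask
    }
```

The same pattern would repeat for the remaining servers, shifting each window by 5 minutes.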

Re: Check service at same time every day - persistenly

Posted: Tue Jan 28, 2014 4:54 am
by FTL
Yes, good suggestion. I didn't think of that.

The timeperiod would work, as I'm not really fussed which order they are checked in, as long as they are all checked between 9 and 9.30 each morning.
It's only that window so that the checks run first thing in the morning and the appropriate admin can sort out any issues before getting snowed under and forgetting about it!

Thank you sreinhardt

Re: Check service at same time every day - persistenly

Posted: Tue Jan 28, 2014 2:44 pm
by tmcdonald
We can leave this thread open until you test that time period setup out; otherwise, if you are satisfied it will work, we can close it now. Up to you.

Re: Check service at same time every day - persistenly

Posted: Wed Jan 29, 2014 5:37 am
by FTL
Can't seem to get it working :(

Apologies for the formatting of the code. That's Linux's way of telling me not to copy and paste into Windows first :)

I have set the service check:

define service{
    use                            service-schedtask               ; See Service Template section below
    host_name                 SERVER1
    service_description    SCHEDULED TASK RESULT
    check_command         check_schedtasks               ; See Commands section below
    }
The template that belongs to that service check:

define service{
    name                                       service-schedtask        ; The name of this service template (used above in the checks)
    check_period                            server_schedtask        ; Service is monitored only between 9am and 9.30am daily
    check_interval                          1440                           ; Service is checked every 1 day when in OK state
    retry_interval                            180                            ; Service is checked every 3 hours if in problem state
    max_check_attempts                 3                               ; Service is checked 3 times to determine if its Up or Down state
    notification_period                     server_24x7              ; Emails and Text are sent out any time of day
    notification_interval                   180                           ; Resend Notifications every 3 hours
    notification_options                    c,r                            ; Only send alerts for servers in CRITICAL or RECOVERY state
    notifications_enabled                  1                              ; Notifications are enabled
    contact_groups                          servers email, servers sms    ; Alerts sent to contacts in these groups
    event_handler_enabled               1                                ; Service event handler is enabled
    process_perf_data                      1                                ; Performance data is processed
    retain_status_information           1                               ; Status Info is kept between server restarts
    retain_nonstatus_information      1                               ; Non-Status information is kept between server restarts
    passive_checks_enabled              0                               ; Passive Checks are disabled
    obsess_over_service                   0                               ; We do not obsess over the server if in problem state
    check_freshness                         0                               ; We do not check this server for freshness
    flap_detection_enabled               0                               ; Flap Detection is disabled
    failure_prediction_enabled          0                               ; We will wait for it to actually fail thankyou!!
    register                                      0
    }
The time period that belongs to that template:

define timeperiod{
    timeperiod_name        server_schedtask
    alias                           Half Hour Period for scheduled task checks
    sunday                       09:00-09:30
    monday                     09:00-09:30
    tuesday                     09:00-09:30
    wednesday                 09:00-09:30
    thursday                    09:00-09:30
    friday                        09:00-09:30
    saturday                    09:00-09:30
    }

However, even after restarting the service this morning, and even rebooting the server, it doesn't schedule the next check within this timeperiod.

One of the servers shows: Next Scheduled Check: 01-29-2014 23:31:06
Another shows: Next Scheduled Check: 01-29-2014 17:40:09
Another shows: Next Scheduled Check: 01-29-2014 17:39:58

I can't see what I have missed.

Is it this line from the template?
retain_nonstatus_information 1 ; Non-Status information is kept between server restarts

Should this be 0?

Re: Check service at same time every day - persistenly

Posted: Thu Jan 30, 2014 12:39 pm
by lmiltchev
Hmm, the timeperiod looks fine. Try disabling the retention of non-status information and see if that helps. Setting "retain_nonstatus_information 0" will cause Nagios to take the initial values from the configs, rather than from the state retention file, when it restarts. Hope this helps.
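
A minimal sketch of that change, applied as an override on one service so the shared template stays untouched (the host and service names are the ones from the earlier post). The commented-out line at the end is an assumption about what may also be relevant: use_retained_scheduling_info is the main nagios.cfg option that controls whether next check times are carried over a restart. Nothing in this thread confirms that either setting changes the scheduled time here.

```
# Object config: override the template's retention setting for this service only.
define service{
    use                             service-schedtask
    host_name                       SERVER1
    service_description             SCHEDULED TASK RESULT
    check_command                   check_schedtasks
    retain_nonstatus_information    0    ; take non-status settings from the config files on restart
    }

# Main config file (nagios.cfg, not an object file), shown commented out as an untested assumption:
# use_retained_scheduling_info=0
```

Nagios needs a restart after either change before the new values are read.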

Re: Check service at same time every day - persistenly

Posted: Fri Jan 31, 2014 4:47 am
by FTL
It appears Nagios needed to run the final check it had already scheduled at those wrong times before picking up the new timeperiod check times.

All 6 servers have now rescheduled themselves and are being checked at 9am. I would like them separated, but as long as they are checked I'm happy with this.

Thanks for all your help guys.

Re: Check service at same time every day - persistenly

Posted: Fri Jan 31, 2014 10:11 am
by tmcdonald
Alright, well I'll go ahead and lock this up as Solved. If you have any problems in the future feel free to start a new topic.