Scheduled server reboots

Posted: Wed Jul 08, 2015 11:09 am
by mhixson2
Hello!

I'm looking for a way to schedule server reboots using Nagios. I asked this question during a training/demo session and was told it's possible using event handlers, but how to accomplish that escapes me. The goal is to reboot all of our Windows servers every weekend at a specific time during our maintenance window. We are using Nagios XI 2014R2.7, NSClient++ 0.4.3.143, and NRPE checks.

I have been playing around with this setup with mixed results so far. I loosely followed the guide here.
  - Batch file located in the C:\Program Files\NSClient++\scripts directory containing the reboot command
  - Entry in the nsclient.ini /settings/external scripts/scripts section: reboot_machine=scripts\reboot_machine.bat
  - Service assigned to the host running the command: $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$ where ARG1 = "reboot_machine"
  - Custom check period set up on the service for a duration of 1 minute at the time I want the reboot to happen
It seems to be working consistently, though I'm not sure what the check settings should be for this service to ensure it only executes the reboot command once. The check interval and retry settings really aren't applicable to this situation, but are required fields.

Is there a better way to do this? I would hope so. It would be awesome if in the same service setup we can have a reboot event triggered at a specific time, and have the last reboot time monitored and reported back.

Thanks!
Mike

Re: Scheduled server reboots

Posted: Wed Jul 08, 2015 11:19 am
by jdalrymple
Perhaps it would be better to use a windows scheduled task that sends a passive result to Nagios simply saying it's been done? Combine that with a check_uptime and you should have a fairly solid solution.

Otherwise there is no great way to promise the check only executes once. Nagios' scheduler wasn't written with executing things at specific times and intervals in mind.

Re: Scheduled server reboots

Posted: Wed Jul 08, 2015 11:41 am
by mhixson2
jdalrymple wrote:Perhaps it would be better to use a windows scheduled task that sends a passive result to Nagios simply saying it's been done?
Valid point. I know we didn't go that route in the past and I'm reaching out internally to find out why, so I'm not sure that will work for us. But I'll look into it again. If Nagios can't handle it, that might be our only choice. We're in the process of migrating to Nagios and our current monitoring solution handles reboots for us, and with our host group configuration in Nagios, controlling reboots there (as they're not exactly the same across all servers) would be ideal.

How would I send a passive check to Nagios via a scheduled task? Would any additional ports need to be configured for that? We are only allowing communication from the Nagios server to the monitored Windows boxes over port 5666 for NRPE right now, which I believe is an active-only check setup.

Thanks,
Mike

Re: Scheduled server reboots

Posted: Wed Jul 08, 2015 12:39 pm
by jdalrymple
mhixson2 wrote:If Nagios can't handle it, that might be our only choice.
Nagios can do it for you - the trick is going to be making it do it only once (if you care enough). If you want to play the odds you could have a 1 minute long time period. Assuming your machine takes over 60 seconds to reboot, it should theoretically only see one active check (reboot command) come in. The alternative is to add logic to your script, also not terribly hard, "if uptime is greater than 10 minutes reboot else exit"
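For what it's worth, here is a minimal sketch of that uptime guard in Python. It is only an illustration of the logic, not a drop-in replacement for the batch file: the function names, the threshold default, and the 60 second shutdown delay are all assumptions, and how you read the uptime on the box is left to the caller.

```python
import subprocess

# Assumed threshold: the "10 minutes" from the suggestion above.
MIN_UPTIME_SECONDS = 10 * 60


def should_reboot(uptime_seconds, min_uptime_seconds=MIN_UPTIME_SECONDS):
    """Return True only when the machine has been up longer than the threshold."""
    return uptime_seconds > min_uptime_seconds


def reboot_if_due(uptime_seconds, min_uptime_seconds=MIN_UPTIME_SECONDS,
                  run=subprocess.call):
    """Issue the reboot command at most once per boot and return a status line.

    `run` is injectable so the decision can be exercised without rebooting.
    """
    if not should_reboot(uptime_seconds, min_uptime_seconds):
        # A second check landing right after the reboot ends up here.
        return "OK - uptime %ds is below threshold, reboot skipped" % uptime_seconds
    # Windows shutdown command: restart after a 60 second delay (assumed value),
    # which leaves time for the check result to get back to Nagios.
    run(["shutdown", "/r", "/t", "60"])
    return "OK - uptime %ds exceeds threshold, reboot issued" % uptime_seconds
```

The script wrapped around this would print the returned line and exit 0, so the service stays OK whether or not a reboot was issued. Raising min_uptime_seconds widens the guard, at the cost of skipping a scheduled reboot if the server was restarted recently for some other reason.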
mhixson2 wrote:How would I send a passive check to Nagios via a scheduled task? Would any additional ports need to be configured for that?
I prefer NSCA because it's what I know - you'd have a bit of configuration to do but the wizard walks you through it. You would have to open port 5667, but again the wizard outlines how. The (probably preferred) alternative would be NRDS, which doesn't require any additional ports to be opened: the check is submitted in the form of XML to a URL on port 443, already open.
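To give a rough idea of what "XML to a URL" looks like, here is a sketch in Python that builds the form body for a passive service result. The field names (token, cmd=submitcheck, XMLDATA) and the checkresult layout reflect my understanding of the NRDP submit format, so verify them against your server's documentation. The token, host name, and service name in the usage note are made up for illustration.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape

# Assumed NRDP check result layout; checktype='1' marks the result as passive.
CHECKRESULT_XML = (
    "<?xml version='1.0'?>"
    "<checkresults>"
    "<checkresult type='service' checktype='1'>"
    "<hostname>{hostname}</hostname>"
    "<servicename>{servicename}</servicename>"
    "<state>{state}</state>"
    "<output>{output}</output>"
    "</checkresult>"
    "</checkresults>"
)


def build_nrdp_body(token, hostname, servicename, state, output):
    """Return the URL-encoded POST body for one passive service check result."""
    xml = CHECKRESULT_XML.format(
        hostname=escape(hostname),
        servicename=escape(servicename),
        state=int(state),       # 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
        output=escape(output),  # escape &, < and > so the XML stays well-formed
    )
    return urlencode({"token": token, "cmd": "submitcheck", "XMLDATA": xml})
```

A scheduled task would build the body with something like build_nrdp_body("mytoken", "winsrv01", "Weekly Reboot", 0, "Reboot started") and POST it over HTTPS to the NRDP URL on the Nagios server. The service name has to match a service that is defined on that host and accepts passive checks.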

Re: Scheduled server reboots

Posted: Wed Jul 08, 2015 1:56 pm
by mhixson2
jdalrymple wrote:The alternative is to add logic to your script, also not terribly hard, "if uptime is greater than 10 minutes reboot else exit"
Nice. This is the exact solution a coworker and I came up with after discussing the situation. Good to see it's logical to someone else too, haha. This is probably what we'll do. But we'll be using a much larger value for uptime to solve for the issue of someone running a force check on this service manually and accidentally rebooting the server. So the uptime value will span up to an hour or so before its next scheduled reboot. Time to test.

My plan is to set the check interval and retry to a value greater than the time period I'm setting the check to run in. So if the time period is for one minute, then set the check interval and retry for 5 minutes to keep it from kicking off a second time. Does that make sense?
jdalrymple wrote:I prefer NSCA because it's what I know - you'd have a bit of configuration to do but the wizard walks you through it. You would have to open port 5667, but again the wizard outlines how. The (probably preferred) alternative would be NRDS, which doesn't require any additional ports to be opened: the check is submitted in the form of XML to a URL on port 443, already open.
Ok, we are just getting things set up for an eventual, and gradual, mass deployment to our ~1000 servers. Unfortunately, we are a bit pressed for time so we're picking up what we can on the way. In your opinion, should we be entertaining the thought of using something other than NRPE? We were steered in that direction over check_nt for Windows in the beginning for reasons I'd have to look up. I think check_nt might be deprecated? Or going to be? We have not researched NSCA or NRDS much so any input is appreciated.

Thanks!
Mike

Re: Scheduled server reboots

Posted: Wed Jul 08, 2015 3:16 pm
by jdalrymple
mhixson2 wrote:But we'll be using a much larger value for uptime to solve for the issue of someone running a force check on this service manually and accidentally rebooting the server.
Note that you can prevent users from executing commands - I suggest taking advantage of that ability.
mhixson2 wrote:...then set the check interval and retry for 5 minutes to keep it from kicking off a second time. Does that make sense?
Yes
mhixson2 wrote:mass deployment to our ~1000 servers
Be sure to stagger the schedules - rebooting that many servers at once can be catastrophic. Even when it seems like they aren't sharing any resources that could be burdened - they generally are in one way or the other and you just don't know it or think about it.
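As a rough illustration of staggering, this Python sketch spreads a host list across a maintenance window in fixed-size batches. The function name, the batch size, and the host names are hypothetical. In Nagios itself you would turn the resulting offsets into separate check time periods per host group.

```python
import math


def stagger_offsets(hosts, window_minutes, batch_size):
    """Map each host to a start offset (minutes into the window).

    At most `batch_size` hosts share an offset, and the batches are spaced
    evenly so the last one still starts inside the window.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ordered = sorted(hosts)  # stable order so a host keeps its slot between runs
    if not ordered:
        return {}
    batches = math.ceil(len(ordered) / batch_size)
    spacing = window_minutes // batches
    if spacing < 1:
        raise ValueError("window is too short for this many batches")
    return {host: (index // batch_size) * spacing
            for index, host in enumerate(ordered)}
```

For example, ten hosts in a 60 minute window with batch_size=3 end up in four batches starting 0, 15, 30 and 45 minutes into the window. Grouping by something meaningful (shared storage, hypervisor, cluster role) instead of alphabetical order would be the next refinement.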
mhixson2 wrote:We were steered in that direction over check_nt for Windows in the beginning for reasons I'd have to look up. I think check_nt might be deprecated? Or going to be? We have not researched NSCA or NRDS much so any input is appreciated.
NRPE (nsclient++ on Windows) is the right choice
check_nt cannot run arbitrary commands, although it isn't deprecated
NSCA and NRDS are passive - this would take you back to thinking about using the Windows Scheduler.

Re: Scheduled server reboots

Posted: Wed Jul 08, 2015 3:19 pm
by mhixson2
Ok, thank you!
All points taken. I think I'm good here.