
Sort of automated downtime?

Posted: Thu Sep 04, 2014 9:58 pm
by BanditBBS
Ok, had an interesting question asked of me today.....

When taking down an environment, the DBAs run a script that shuts down a few things, does this and does that. How plausible would it be to add a step to that script that connects to the Nagios XI server and automagically schedules a downtime for the host the script was run from and all services on that host? They then run another script to bring everything back up. That would be the tricky part: we wouldn't want to remove the downtime (for reporting and other reasons) but rather end it, and once started it would be more or less indefinite until we told it to end.

They used to have this ability in OEM and asked if it could be added somehow. Every time I think of a way to do it, I think of a reason it won't work 30 seconds later, lol.

Go ahead with the ideas now.....I'll shoot them all down like a WWII pilot!

Re: Sort of automated downtime?

Posted: Fri Sep 05, 2014 2:34 am
by WillemDH
Hey Bandit,

This is pretty simple in fact. Check out my Nagios XI Downtime Framework http://exchange.nagios.org/directory/Pl ... rk/details

Somewhere in the code you'll find a line where I put the chosen hosts in downtime with NRDP. You could use something like this in your script. You just need a user with the right permissions, and to set the appropriate times and comment.


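		# Set host in downtime via NRDP; $server (host name), $start/$end (Unix timestamps) and $comment are set earlier
		$nagiosServer = "REPLACE WITH FQDN OF NAGIOS XI SERVER"
		$token = "REPLACE WITH NRDP AUTHENTICATION TOKEN"
		$command = "SCHEDULE_HOST_DOWNTIME;$server;$start;$end;1;0;0;Nagios XI Downtime Dummy User;$comment"
		# URL-encode the command so spaces, ';' and '&' in the comment cannot break the query string
		$url = "http://$nagiosServer/nrdp/?cmd=submitcmd&token=$token&command=" + [uri]::EscapeDataString($command)
		$request1 = [System.Net.WebRequest]::Create($url)
		$response1 = $request1.GetResponse()
		$response1.Close()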
Hope it helps.

Grtz
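For a host *and* all of its services, a minimal Python sketch of the same NRDP call might look like the following. It assumes the `/nrdp/?cmd=submitcmd&token=...&command=...` form from the PowerShell above, and it submits two external commands: `SCHEDULE_HOST_DOWNTIME` for the host itself and `SCHEDULE_HOST_SVC_DOWNTIME` for every service on it. The function names, the example host `db01` and the author string are hypothetical placeholders.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

def build_downtime_urls(nrdp_base, token, host, start, end, author, comment):
    """Build NRDP submitcmd URLs that put a host and all of its services in fixed downtime.

    SCHEDULE_HOST_DOWNTIME covers the host itself; SCHEDULE_HOST_SVC_DOWNTIME
    covers every service on that host, so both are built.
    """
    urls = []
    for verb in ("SCHEDULE_HOST_DOWNTIME", "SCHEDULE_HOST_SVC_DOWNTIME"):
        # Fields: host;start;end;fixed(1);trigger_id(0);duration(0, unused when fixed);author;comment
        command = ";".join([verb, host, str(int(start)), str(int(end)),
                            "1", "0", "0", author, comment])
        # urlencode escapes spaces, ';' and '&' so the comment cannot break the query string
        query = urlencode({"cmd": "submitcmd", "token": token, "command": command})
        urls.append(f"{nrdp_base.rstrip('/')}/?{query}")
    return urls

def submit(url, timeout=10):
    """Send one request to NRDP and return the raw response body."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Usage would be something like looping over `build_downtime_urls("http://nagios.example.com/nrdp", TOKEN, "db01", now, now + 3600, "dba-blackout", "DB maintenance")` and calling `submit()` on each URL, where `TOKEN` is your NRDP token.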

Re: Sort of automated downtime?

Posted: Fri Sep 05, 2014 8:19 am
by BanditBBS
That's the issue: not knowing the exact outage window. Their script does write a "blackout" file to the / directory. I'm thinking of adding a check for that file: if it exists and no downtime exists, schedule a 6-minute downtime. The check will run every 5 minutes, and if the file exists and a downtime is already active, it will extend it by 5 minutes. Not sure if you can modify an active downtime. I'll research that as time permits.
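The rolling-window idea can be sketched as a small decision function. As far as I know there is no external command that changes the end time of an existing downtime, so this sketch assumes "extend" means scheduling a new, overlapping window on each run. `next_window`, `CHECK_INTERVAL` and `GRACE` are hypothetical names, and the list of existing windows is assumed to come from wherever the check tracks its state.

```python
CHECK_INTERVAL = 300   # the check runs every 5 minutes
GRACE = 60             # extra minute so consecutive windows overlap

def next_window(blackout_exists, windows, now):
    """Return the (start, end) downtime to schedule on this run, or None.

    windows: (start, end) epoch pairs already scheduled for the host.
    """
    if not blackout_exists:
        return None  # no blackout file: schedule nothing and let the last window lapse
    active_ends = [end for start, end in windows if start <= now < end]
    if not active_ends:
        return (now, now + CHECK_INTERVAL + GRACE)  # first run: a 6-minute window
    # "Extend" by scheduling an overlapping window that reaches 5 minutes past the current end
    return (now, max(active_ends) + CHECK_INTERVAL)
```

One trade-off of this design: once the file disappears, the last window still runs out on its own, so alerts can stay suppressed for up to about 6 minutes after the environment is back.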

The other option I thought of: if the file exists, schedule a 1-month downtime. Then, as soon as the file no longer exists, modify the downtime to end. I pray that is possible; if you can modify an active downtime, I can already imagine the code in my head.
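The long-window option boils down to "schedule once, cancel once". A minimal sketch, assuming the check remembers the downtime IDs it created (in a flat file, or by looking them up) and that cancelling uses the standard `DEL_HOST_DOWNTIME;<downtime_id>` and `DEL_SVC_DOWNTIME;<downtime_id>` external commands. The function name and arguments are hypothetical; the returned strings would be submitted the same way as the NRDP command in the PowerShell snippet.

```python
MONTH = 30 * 24 * 3600  # "1 month" approximated as 30 days, in seconds

def blackout_commands(blackout_exists, host, host_ids, svc_ids, now, author, comment):
    """Return the external command lines this run of the check should submit.

    host_ids / svc_ids: downtime IDs this check scheduled earlier (empty if none).
    """
    if blackout_exists and not host_ids and not svc_ids:
        end = now + MONTH
        # Fixed downtime for the host and, separately, for all of its services
        return [f"SCHEDULE_HOST_DOWNTIME;{host};{now};{end};1;0;0;{author};{comment}",
                f"SCHEDULE_HOST_SVC_DOWNTIME;{host};{now};{end};1;0;0;{author};{comment}"]
    if not blackout_exists:
        # Blackout is over: cancel every downtime this check created, each by its own ID
        return ([f"DEL_HOST_DOWNTIME;{i}" for i in host_ids] +
                [f"DEL_SVC_DOWNTIME;{i}" for i in svc_ids])
    return []  # blackout still in progress and downtime already scheduled
```

Note that service downtimes each get their own ID, which is why the sketch cancels host and service entries separately.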

Re: Sort of automated downtime?

Posted: Fri Sep 05, 2014 9:07 am
by WillemDH
Ah, sorry, I misread. Not knowing when the downtime ends will be tougher to solve.

Have a nice weekend!

Re: Sort of automated downtime?

Posted: Fri Sep 05, 2014 2:24 pm
by tmcdonald
Probably will end up needing some sort of flat file to store the state of the downtime between runs of the remote script as you mentioned. You can certainly delete downtime programmatically:

http://old.nagios.org/developerinfo/ext ... and_id=126

Just gotta find the id for it. Might need some nasty SQL to find that :o
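As an alternative to SQL, the IDs can probably be read from Nagios Core's `status.dat`, which (as I understand the format) stores each downtime in a `hostdowntime { ... }` or `servicedowntime { ... }` block with `host_name=` and `downtime_id=` fields. The file location (on XI typically somewhere like `/usr/local/nagios/var/status.dat`) and the idea of filtering on the check's own comment are assumptions; `find_downtime_ids` is a hypothetical helper that works on the file's text.

```python
def find_downtime_ids(status_text, host, comment=None):
    """Return (host_ids, service_ids) for downtimes on `host` found in status.dat text.

    If `comment` is given, only downtimes with exactly that comment are returned,
    so a check can limit itself to the downtimes it created.
    """
    host_ids, svc_ids = [], []
    block_type, fields = None, {}
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.endswith("{") and "=" not in line:
            block_type, fields = line[:-1].strip(), {}   # start of a new block
        elif line == "}":
            if (block_type in ("hostdowntime", "servicedowntime")
                    and fields.get("host_name") == host
                    and (comment is None or fields.get("comment") == comment)):
                target = host_ids if block_type == "hostdowntime" else svc_ids
                target.append(int(fields["downtime_id"]))
            block_type = None
        elif block_type and "=" in line:
            key, _, value = line.partition("=")  # split on the first '=' only
            fields[key] = value
    return host_ids, svc_ids
```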

Re: Sort of automated downtime?

Posted: Fri Sep 05, 2014 2:30 pm
by BanditBBS
Trevor, yeah, I know you can delete a downtime. However, that would screw up the information for the SLA report and anything else that relies on that data. Is there any way to modify a downtime to make it end "now"? Or change the end date/time to 1 minute in the future?

Edit: I went and actually read your link, and found more awesomely vague wording (lol):


If the downtime is currently in effect, the service will come out of scheduled downtime (as long as there are no other overlapping active downtime entries)
WTF, does that mean it will just end it early, or that it will delete it as the command is supposed to do? If it just ends it early, then that is my answer.

Re: Sort of automated downtime?

Posted: Mon Sep 08, 2014 2:50 pm
by sreinhardt
Much like your 5-minute check with 6 minutes of downtime, this is saying that if you remove a downtime and it is the only downtime in effect at that time for that host or service, the host or service will be taken out of downtime. However, if you have overlapping downtimes (for example, your 5-minute check just ran and rescheduled, so the last minute or less of one downtime overlaps ~6 minutes of a new one), removing the older one would NOT take the host or service out of downtime while the new ~6-minute DT is still in effect. Make more sense? Still not really what you were looking for, though.
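The overlap rule amounts to counting how many windows cover "now", roughly the idea behind the `scheduled_downtime_depth` value Nagios keeps per host and service. This sketch is a simplified model, not Nagios code; the timeline (6-minute windows scheduled at t=0 and t=300) is the hypothetical one from the 5-minute check.

```python
def downtime_depth(now, windows):
    """Count how many scheduled (start, end) windows cover `now` (epoch seconds)."""
    return sum(1 for start, end in windows if start <= now < end)

def in_downtime(now, windows):
    # An object stays in downtime while at least one window is still in effect
    return downtime_depth(now, windows) > 0
```

At t=320 both `(0, 360)` and `(300, 660)` are active, so deleting the first still leaves the host in downtime; only when no window covers the current time does it come out.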

Re: Sort of automated downtime?

Posted: Mon Sep 08, 2014 3:11 pm
by BanditBBS
The wording "remove from downtime" is what is confusing. If it is the only downtime, how does it remove it: by ending the downtime early, or by deleting the downtime? That is the distinction I am trying to understand. If it deletes the downtime, then the data would be wrong in the reports that exclude downtime. However, if it somehow ends it early, then those reports will still have accurate data.

Does that explain my confusion better?

Re: Sort of automated downtime?

Posted: Mon Sep 08, 2014 5:05 pm
by abrist
If you delete the downtime, you will end it early, but it is still in the logs, so it will not affect reporting (other than potentially padding your "uptime" percentage a bit less than letting the downtime run for its full duration).
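Put as arithmetic, and assuming reports take a downtime's span from its logged start and its cancellation, deleting early just shortens the excluded interval. `covered_seconds` is a hypothetical helper for illustration only.

```python
def covered_seconds(start, scheduled_end, cancelled_at=None):
    """Seconds a downtime actually covered; cancelling it early cuts the window short."""
    end = scheduled_end if cancelled_at is None else min(scheduled_end, cancelled_at)
    return max(0, end - start)
```

For example, a 30-day window (2,592,000 s) deleted 2 hours in covers only 7,200 s, so a report would exclude just those two hours rather than the whole month.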

Re: Sort of automated downtime?

Posted: Mon Sep 08, 2014 5:11 pm
by BanditBBS
Andy, screencap this because I may only say it once a decade...

I love you, you are a god among men!

Technically it is only because you happened to be the one to reply, but you gotta take what you can get!

You can close this. I'll work on my automated downtime add/remove check and will add it to the Exchange when it's done. I'm excited to work on this one!