Sort of automated downtime?

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Sort of automated downtime?

Post by BanditBBS

OK, I had an interesting question asked of me today...

When taking down an environment, the DBAs run a script that shuts down a few things, does this and does that. How plausible would it be to add a step to that script that connects to the Nagios XI server and automagically schedules a downtime for the host the script was run on and all of its services? They then run another script to bring everything back up. That would be the tricky part: we wouldn't want to remove the downtime (for reporting and other reasons) but end it. And when starting it, it would more or less be indefinite until we told it to end.
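
As a rough sketch of what such a script could send: Nagios has separate external commands for the two halves, SCHEDULE_HOST_DOWNTIME for the host itself and SCHEDULE_HOST_SVC_DOWNTIME for all services on that host. The Python below only builds the two command strings. The host name, author, comment and the one-year stand-in for "indefinite" are hypothetical values, and how the commands get submitted (NRDP, the command file, etc.) is left open.

```python
import time


def build_downtime_commands(host, start, end, author, comment):
    """Return external command strings that put a host and all of its
    services into fixed downtime from start to end (Unix epoch seconds).

    Field order for both commands:
    host;start;end;fixed;trigger_id;duration;author;comment
    """
    fields = f"{host};{start};{end};1;0;0;{author};{comment}"
    return [
        f"SCHEDULE_HOST_DOWNTIME;{fields}",      # the host itself
        f"SCHEDULE_HOST_SVC_DOWNTIME;{fields}",  # every service on the host
    ]


if __name__ == "__main__":
    now = int(time.time())
    one_year = 365 * 24 * 3600  # hypothetical "open-ended" window, ended early later
    for cmd in build_downtime_commands("dbhost01", now, now + one_year,
                                       "dba-script", "Environment shutdown"):
        print(cmd)
```

Sending both strings at shutdown would cover the host and its services; ending that downtime at startup is the part the rest of the thread works through.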

They used to have this ability in OEM and asked if it could be added somehow. Every time I think of a way to do it, I think of a reason it won't work 30 seconds later, lol.

Go ahead with ideas now... I'll shoot them all down like a WWII pilot!
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: Sort of automated downtime?

Post by WillemDH

Hey Bandit,

This is pretty simple, in fact. Check out my Nagios XI Downtime Framework: http://exchange.nagios.org/directory/Pl ... rk/details

Somewhere in the code you'll find a line where I put the chosen hosts into downtime with NRDP. You could use something like this in your script. You just need a user with the right permissions, and to set the appropriate times and comment.

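		# Set host in downtime via NRDP's submitcmd.
		# $server, $start/$end (Unix epoch seconds) and $comment are set earlier in the script.
		# Fields: host;start;end;fixed=1;trigger_id=0;duration=0 (ignored for fixed downtime);author;comment
		$command = "SCHEDULE_HOST_DOWNTIME;$server;$start;$end;1;0;0;Nagios XI Downtime Dummy User;$comment"
		# URL-encode the command so spaces or an '&' in the comment cannot break the query string
		$URL = "http://'REPLACE WITH FQDN OF NAGIOS XI SERVER'/nrdp/?cmd=submitcmd&token='REPLACE WITH NRDP AUTHENTICATION TOKEN'&command=" + [uri]::EscapeDataString($command)
		$request1 = [System.Net.WebRequest]::Create($URL)
		$response1 = $request1.GetResponse()
		$response1.Close()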
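
If the DBAs' script isn't PowerShell, here is a minimal Python sketch of building the same NRDP request. It assumes the endpoint layout from the snippet above (/nrdp/?cmd=submitcmd&token=...&command=...); the server URL and token are placeholders, and urlencode() takes care of escaping the spaces, semicolons and any '&' in the command.

```python
from urllib.parse import urlencode


def nrdp_submitcmd_url(base_url, token, command):
    """Build an NRDP submitcmd URL; urlencode() escapes spaces, ';' and '&'."""
    query = urlencode({"cmd": "submitcmd", "token": token, "command": command})
    return f"{base_url.rstrip('/')}/nrdp/?{query}"


if __name__ == "__main__":
    url = nrdp_submitcmd_url(
        "http://nagiosxi.example.com",  # hypothetical XI server
        "REPLACE_WITH_NRDP_TOKEN",      # placeholder token
        "SCHEDULE_HOST_DOWNTIME;dbhost01;1700000000;1700003600;1;0;0;dba-script;Env shutdown",
    )
    print(url)
    # Actually sending it would be e.g. urllib.request.urlopen(url).read()
```

This only builds the URL; the response format and error handling are left out of the sketch.
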
Hope it helps.

Grtz
Nagios XI 5.8.1
https://outsideit.net
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: Sort of automated downtime?

Post by BanditBBS

That's the issue: not knowing the exact outage window. Their script does write a "blackout" file to the root (/) directory. I'm thinking of adding a check for that file: if it exists and no downtime exists, schedule a 6-minute downtime. The check will run every 5 minutes, and if the file exists and a downtime is active, extend it by 5 minutes. Not sure if you can modify an active downtime; I'll research that as time allows.
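
A minimal Python sketch of that 5-minute check, assuming the blackout file is /blackout (the thread only says it lands in /). Since it isn't established here whether an active downtime can be modified, the sketch swaps "extend by 5 minutes" for "schedule a fresh 6-minute window each run": consecutive windows overlap by a minute, so there is no gap while the file exists. The author and comment strings are hypothetical.

```python
import os
import time

BLACKOUT_FILE = "/blackout"  # assumed file name; the thread only says it is written to /
WINDOW = 6 * 60              # each downtime covers 6 minutes; the check runs every 5


def rolling_downtime_commands(host, blackout_exists, now):
    """Return the external commands to submit on this check cycle.

    While the blackout file exists, every run schedules a new fixed window
    [now, now + 6 min] for the host and its services, overlapping the previous
    window by about a minute. Once the file is gone nothing new is scheduled
    and the last window simply expires.
    """
    if not blackout_exists:
        return []
    fields = f"{host};{now};{now + WINDOW};1;0;0;blackout-check;DBA blackout in progress"
    return [f"SCHEDULE_HOST_DOWNTIME;{fields}",
            f"SCHEDULE_HOST_SVC_DOWNTIME;{fields}"]


if __name__ == "__main__":
    for cmd in rolling_downtime_commands("dbhost01",
                                         os.path.exists(BLACKOUT_FILE),
                                         int(time.time())):
        print(cmd)
```

The trade-off is that every run adds another downtime entry, and the last one can linger for up to 6 minutes after the file is removed.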

The other option I thought of: if the file exists, schedule a 1-month downtime. Then, whenever the file doesn't exist, modify the downtime to end. I pray that is possible; I can already imagine the code in my head if you can modify an active downtime.
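
The second option reduces to a small decision table, sketched here in Python. `downtime_scheduled` has to be remembered between runs somehow (a flat file, for instance), and the 30-day window is an assumed stand-in for "1 month". How to actually end the downtime early is the open question at this point, so the END action is left abstract.

```python
from enum import Enum

MONTH = 30 * 24 * 3600  # assumed stand-in for "1 month", in seconds


class Action(Enum):
    NONE = "none"          # nothing to do this run
    SCHEDULE = "schedule"  # blackout started: schedule the long downtime
    END = "end"            # blackout over: end the downtime we scheduled


def next_action(blackout_exists: bool, downtime_scheduled: bool) -> Action:
    """Decide what the periodic check should do on this run."""
    if blackout_exists and not downtime_scheduled:
        return Action.SCHEDULE
    if not blackout_exists and downtime_scheduled:
        return Action.END
    return Action.NONE


def long_window(now: int) -> tuple[int, int]:
    """(start, end) of the long fixed downtime, in Unix epoch seconds."""
    return now, now + MONTH
```

Because the check only acts on the two transitions, it can run as often as needed without stacking up downtimes.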
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: Sort of automated downtime?

Post by WillemDH

Ah, sorry, I misread part of your post. Not knowing when the downtime ends will be tougher to solve.

Have a nice weekend!
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Sort of automated downtime?

Post by tmcdonald

Probably will end up needing some sort of flat file to store the state of the downtime between runs of the remote script as you mentioned. You can certainly delete downtime programmatically:

http://old.nagios.org/developerinfo/ext ... and_id=126

Just gotta find the id for it. Might need some nasty SQL to find that :o
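
A possible shape for that flat file, sketched in Python under stated assumptions: the script records the IDs of the downtimes it created, and later builds DEL_HOST_DOWNTIME and DEL_SVC_DOWNTIME commands (each takes just the downtime ID). How the IDs are found in the first place (the "nasty SQL") is not shown; the JSON layout, the file location and the IDs in the example are all hypothetical.

```python
import json
from pathlib import Path


def save_state(path: Path, host: str, host_ids: list[int], svc_ids: list[int]) -> None:
    """Record the downtime IDs this script created, for the next run."""
    path.write_text(json.dumps({"host": host, "host_ids": host_ids, "svc_ids": svc_ids}))


def load_state(path: Path):
    """Return the saved state dict, or None if no downtime is recorded."""
    if not path.exists():
        return None
    return json.loads(path.read_text())


def clear_state(path: Path) -> None:
    """Forget the downtime once it has been ended."""
    path.unlink(missing_ok=True)


def delete_commands(state: dict) -> list[str]:
    """DEL_HOST_DOWNTIME / DEL_SVC_DOWNTIME each take a single downtime ID."""
    cmds = [f"DEL_HOST_DOWNTIME;{i}" for i in state["host_ids"]]
    cmds += [f"DEL_SVC_DOWNTIME;{i}" for i in state["svc_ids"]]
    return cmds


if __name__ == "__main__":
    state_file = Path("downtime_state.json")            # hypothetical location
    save_state(state_file, "dbhost01", [12], [13, 14])  # made-up IDs
    print(delete_commands(load_state(state_file)))
    clear_state(state_file)
```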
Former Nagios employee
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: Sort of automated downtime?

Post by BanditBBS

tmcdonald wrote: Probably will end up needing some sort of flat file to store the state of the downtime between runs of the remote script as you mentioned. You can certainly delete downtime programmatically:

http://old.nagios.org/developerinfo/ext ... and_id=126

Just gotta find the id for it. Might need some nasty SQL to find that :o

Trevor, yeah, I know you can delete a downtime. However, that would screw up the information for the SLA report and anything else that relies on that data. Is there any way to modify a downtime to make it end "now"? Or to change the end date/time to 1 minute in the future?

Edit: I actually went and read your link, and found more awesome vague wording (lol):

If the downtime is currently in effect, the service will come out of scheduled downtime (as long as there are no other overlapping active downtime entries)

WTF, does that mean it will just end it early, or that it will delete it as the command is supposed to do? If it just ends it early, then that is my answer.
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: Sort of automated downtime?

Post by sreinhardt

Much like your 5-minute check with 6 minutes of downtime, this is saying that if you delete a downtime and it is the only downtime in effect at that time for that host or service, the host or service will properly come out of downtime. However, if you have overlapping downtimes (for example, your 5-minute check just ran and scheduled another, so you have the last minute or less of one downtime plus ~6 minutes of a new one), deleting the older one would NOT take the host or service out of downtime, because the new ~6-minute downtime is still in effect. Make more sense? Still not really what you were looking for, though.
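
To make the overlap rule concrete, here is a toy Python model. It assumes an object counts as "in downtime" whenever at least one remaining active window covers the current time, and it ignores flexible and triggered downtime. The two windows mimic consecutive runs of the 5-minute check, each scheduling 6 minutes.

```python
def in_downtime(windows, now):
    """True if any remaining (start, end) downtime window covers `now`."""
    return any(start <= now < end for start, end in windows)


if __name__ == "__main__":
    # Check ran at t=0 and t=300, each scheduling 6 minutes (times in seconds).
    windows = [(0, 360), (300, 660)]
    now = 330                              # both windows are active
    after_delete = windows[1:]             # delete only the older downtime
    print(in_downtime(after_delete, now))  # True: the newer window still covers t=330
    print(in_downtime([], now))            # False: nothing left in effect
```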
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: Sort of automated downtime?

Post by BanditBBS

sreinhardt wrote: Much like your 5 minute check with 6 minutes of downtime, this is saying that if you attempt to remove a downtime, and it is the only downtime in effect at that time for that host or service, it will be properly removed from downtime. However if you have overlapping downtime, such as if you just ran the 5 minute check and rescheduled, so that you have the last minute or less of one downtime, and ~6 minutes of a new one from your check, it would NOT remove the host or service from downtime until only the ~6 minute DT was the only one in effect. Make more sense? Still not really what you were looking for though.

The wording "remove from downtime" is what is confusing. If it is the only downtime, how does it remove it: by ending the downtime early or by deleting the downtime? That is the distinction I am trying to understand. If it deletes the downtime, then the data would be wrong in the reports that exclude downtime. However, if it somehow ends it early, then those reports will still have accurate data.

Does that explain my confusion better?
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Sort of automated downtime?

Post by abrist

If you delete the downtime, you will end it early, but it is still in the logs, so it will not affect reporting (other than potentially padding your "uptime" percentage a bit less than leaving the downtime to run for its full duration).
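
The arithmetic behind that, as a small Python sketch. It assumes, per the reply above, that a deleted downtime counts only up to the moment it was deleted while the elapsed part stays in the logs. For a 30-day downtime deleted after 2 hours, only those 2 hours are treated as downtime.

```python
def effective_downtime(start, scheduled_end, deleted_at=None):
    """Seconds of downtime that remain on record.

    Assumption: deleting an active downtime stops it at deleted_at; the part
    that already elapsed stays in the logs and is still excluded by reports.
    """
    end = scheduled_end if deleted_at is None else min(scheduled_end, deleted_at)
    return max(0, end - start)


if __name__ == "__main__":
    month = 30 * 24 * 3600
    print(effective_downtime(0, month))            # 2592000: ran to completion
    print(effective_downtime(0, month, 2 * 3600))  # 7200: deleted after 2 hours
```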
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: Sort of automated downtime?

Post by BanditBBS

abrist wrote: If you delete the downtime. You will end it early though it is still in the logs so it will not effect reporting (other than it may potentially pad your "uptime" percentage a bit less than leaving the downtime to run for the duration.

Andy, screencap this because I may only say it once a decade...

I love you, you are a god among men!

Technically it is only because you happened to be the one to reply, but you gotta take what you can get!

You can close this. I'll work on my automated downtime add/remove check and will add it to the exchange when completed. I'm excited to work on this one!