Post 3.3 upgrade: scheduled (fixed) downtime failed to stick
Posted: Thu Sep 06, 2012 3:55 pm
by KiwiBloke
Hi,
Nagios XI VM appliance, recently (yesterday) upgraded to R3.3.
We have redeployed some hosts temporarily to perform other roles within our organization (blade server reassignment) and have opted to put the host configuration and monitoring for these into scheduled downtime rather than delete them (they will be coming back once we redeploy additional hardware).
They were already in maintenance mode when the upgrade was performed. Now we have nothing in maintenance mode, and every time I put these hosts down for, say, 60 days fixed, it commits the changes OK. Then a few minutes later we get alerts stating the servers are down, and when I recheck the scheduled maintenance page there is nothing listed.
We also have an alert on the tactical monitoring page showing one unacknowledged service issue, but when I click on the flashing value the search returns nothing. Not sure if these issues are related, but they were both noticed after the upgrade.
Cheers,
C.
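For anyone wanting to reproduce the 60-day fixed downtime outside the web GUI, here is a minimal sketch that builds the Nagios Core external command line `SCHEDULE_HOST_DOWNTIME`. The host name, author, comment and timestamp are made-up placeholders, and the function name is hypothetical. The sketch only builds the string; writing it to the Nagios command file (whose path depends on the install) is left out.

```python
def build_host_downtime_command(host, start, days, author, comment):
    """Build one SCHEDULE_HOST_DOWNTIME external-command line.

    Field order follows the Nagios Core external command:
    host;start;end;fixed;trigger_id;duration;author;comment
    """
    duration = days * 24 * 60 * 60  # downtime length in seconds
    end = start + duration
    fields = [
        "SCHEDULE_HOST_DOWNTIME",
        host,
        str(start),
        str(end),
        "1",            # fixed=1: downtime runs exactly from start to end
        "0",            # trigger_id=0: not triggered by another downtime
        str(duration),
        author,
        comment,
    ]
    # The leading [timestamp] is when the command is submitted.
    return "[{}] {}".format(start, ";".join(fields))


# Placeholder values for illustration only.
line = build_host_downtime_command(
    "blade-host-01", 1346900000, 60, "admin", "Redeployed temporarily"
)
print(line)
```

Submitting the same command again after a configuration commit, and then rechecking the scheduled downtime page, would be one way to test whether the entry survives.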
Re: Post 3.3 upgrade: scheduled (fixed) downtime failed to stick
Posted: Fri Sep 07, 2012 10:42 am
by slansing
I know this seems like a simple question, but did you log out and back in after the upgrade so that it completed? Also, have you had any other issues lately, such as the monitoring engine going down or any database issues?
Just making sure nothing has changed besides what you noted, post upgrade.
Re: Post 3.3 upgrade: scheduled (fixed) downtime failed to stick
Posted: Sun Sep 09, 2012 11:19 pm
by KiwiBloke
Hi,
Thanks for the reply. For the upgrade process, these are the steps we took:
Log in to the VM console and halt the VM, snapshot the disk, start the VM, log into the Nagios web GUI and confirm Nagios is working OK under R3.2 (previously the running config was checked and found to contain no errors or warnings).
Log in to the VM console and perform the upgrade (delete the old patch and ungzipped folders etc., run wget to get the new patch, extract the new patch, run the new patch, patch confirms upgrade complete).
Log into the Nagios web gui and complete the upgrade, run check version to reset all new version warnings and confirm running on R3.3, check operation etc. All seems to be OK.
Log into the VM console and halt the VM. Delete the snapshot (i.e. merge the changes), restart the VM, log into the web GUI and confirm version and operation are all good.
We have not had any issues with the database so far as I know. However, since the upgrade we are also getting a warning for a switch port being down on one of our Cisco switches. The switch port is indeed down, but the link in the alert email, when clicked, only says "bad ticket".
The only other issue we had recently was where the server was surging and trying to do all its checks in one very narrow timeslice. This resulted in dreadful performance of the VM, as it was slamming from 0 to 100% CPU for a few seconds every 5 minutes. I adjusted the reaper settings and this helped a little. In the end I set use_retained_scheduling_info=0 and rebooted the server, and it rescheduled its checks evenly over the 5 minutes. I left it in this state for about 30 minutes, then set use_retained_scheduling_info=1 and rebooted again. The checking schedule is still evenly spread over the 5 minutes.
That's about all I can think of.
C.
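The use_retained_scheduling_info toggle described above can be sketched as a small text edit on the main Nagios config. This is only an illustration: the function name and the sample config text are hypothetical, and the sketch deliberately works on a string rather than assuming where nagios.cfg lives on the appliance.

```python
import re


def set_directive(config_text, name, value):
    """Return config_text with `name=value` set.

    Replaces every uncommented `name=...` line; if none exists,
    appends the directive at the end of the text.
    """
    pattern = re.compile(
        r"^([ \t]*)" + re.escape(name) + r"[ \t]*=.*$", re.MULTILINE
    )
    new_line = "{}={}".format(name, value)
    # Keep any leading indentation, swap the rest of the line.
    new_text, count = pattern.subn(lambda m: m.group(1) + new_line, config_text)
    if count == 0:
        new_text = config_text.rstrip("\n") + "\n" + new_line + "\n"
    return new_text


# Sample text standing in for part of the main config file.
sample = "retain_state_information=1\nuse_retained_scheduling_info=1\n"
print(set_directive(sample, "use_retained_scheduling_info", 0))
```

After changing the value, the monitoring engine still has to be restarted for the new setting to take effect, as in the steps described in the post.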
Re: Post 3.3 upgrade: scheduled (fixed) downtime failed to stick
Posted: Mon Sep 10, 2012 3:16 pm
by scottwilkerson
We are looking into this. It is very likely a bug; we will post back once we have a resolution.
Re: Post 3.3 upgrade: scheduled (fixed) downtime failed to stick
Posted: Wed Sep 19, 2012 6:35 pm
by KiwiBloke
Hi,
I have found a pattern to this that may assist your efforts.
Any items placed into scheduled downtime seem to pop out after a new system configuration (host, service or group changes) is committed.
Cheers,
KB.
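One way to check this pattern is to compare the downtime entries recorded before and after a configuration commit. The sketch below assumes the Nagios 3.x status file layout, where downtimes appear as `hostdowntime {` and `servicedowntime {` blocks of `key=value` lines; the function names and the sample snapshots are hypothetical.

```python
def parse_downtimes(status_text):
    """Collect (kind, host_name, service_description) for each downtime block."""
    downtimes = set()
    kind = None      # block type currently being read, or None
    fields = {}
    for raw in status_text.splitlines():
        line = raw.strip()
        if kind is None:
            if line in ("hostdowntime {", "servicedowntime {"):
                kind = line.split()[0]
                fields = {}
        elif line == "}":
            downtimes.add(
                (kind, fields.get("host_name", ""),
                 fields.get("service_description", ""))
            )
            kind = None
        elif "=" in line:
            key, value = line.split("=", 1)
            fields[key] = value
    return downtimes


def dropped_downtimes(before_text, after_text):
    """Return downtime entries present before the commit but missing after."""
    return sorted(parse_downtimes(before_text) - parse_downtimes(after_text))


# Made-up snapshots standing in for copies of the status file.
before = "hostdowntime {\n\thost_name=blade-host-01\n\tfixed=1\n\t}\n"
after = "info {\n\tversion=3\n\t}\n"
print(dropped_downtimes(before, after))
```

A non-empty result after a commit, with no downtime having expired in between, would match the behaviour described above.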
Re: Post 3.3 upgrade: scheduled (fixed) downtime failed to stick
Posted: Thu Sep 20, 2012 9:43 am
by scottwilkerson
This is a bug we are aware of. We are hoping to have it resolved before 2012 goes live.