Post 3.3 upgrade: scheduled (fixed) downtime failed to stick

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
KiwiBloke
Posts: 81
Joined: Fri Apr 27, 2012 7:23 pm

Post 3.3 upgrade: scheduled (fixed) downtime failed to stick

Post by KiwiBloke »

Hi,

Nagios XI VM appliance, recently (yesterday) upgraded to R3.3.

We have temporarily redeployed some hosts to perform other roles within our organization (blade server reassignment). Rather than delete the host configuration and monitoring for these, we opted to put them into scheduled downtime, since they will be coming back once we redeploy additional hardware.

They were already in maintenance mode when the upgrade was performed. Now nothing is in maintenance mode, and every time I put these hosts down for, say, 60 days fixed, it commits the changes OK. A few minutes later we get alerts stating the servers are down, and when I recheck the scheduled maintenance page there is nothing listed.
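For reference, a fixed 60-day host downtime can also be expressed as a Nagios Core external command, which makes it easier to see exactly what is being requested. The sketch below only builds the `SCHEDULE_HOST_DOWNTIME` command line; the host name, author, and comment are hypothetical, and whether the XI web interface submits downtime the same way on this version is an assumption.

```python
import time


def build_host_downtime_command(host, days, author, comment, now=None):
    """Return a SCHEDULE_HOST_DOWNTIME line for the Nagios external command file."""
    start = int(time.time()) if now is None else int(now)
    duration = days * 24 * 60 * 60  # downtime length in seconds
    end = start + duration
    fixed = 1       # 1 = fixed downtime, runs from start to end
    trigger_id = 0  # 0 = not triggered by another downtime entry
    return (
        f"[{start}] SCHEDULE_HOST_DOWNTIME;{host};{start};{end};"
        f"{fixed};{trigger_id};{duration};{author};{comment}"
    )


if __name__ == "__main__":
    # Hypothetical host, author, and comment, for illustration only.
    print(build_host_downtime_command("blade-07", 60, "nagiosadmin", "Blade reassigned"))
```

To submit it, the line would be appended (with a trailing newline) to the Nagios command file, which on many installs is `/usr/local/nagios/var/rw/nagios.cmd`; that path is an assumption and should be checked against `command_file` in `nagios.cfg`.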

We also have an alert on the tactical monitoring page showing one unacknowledged service issue, but when I click on the flashing value the search returns nothing. Not sure if these issues are related, but both were noticed after the upgrade.

Cheers,

C.
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Post 3.3 upgrade: scheduled (fixed) downtime failed to stick

Post by slansing »

I know this seems like a simple question, but did you log out and back in after the upgrade so that it completed? Also, have you had any other issues lately, such as the monitoring engine going down or any database issues?

Just making sure nothing has changed besides what you noted, post upgrade.
KiwiBloke
Posts: 81
Joined: Fri Apr 27, 2012 7:23 pm

Re: Post 3.3 upgrade: scheduled (fixed) downtime failed to stick

Post by KiwiBloke »

Hi,

Thanks for the reply. For the upgrade process, these are the steps we took.

Log in to the VM console and halt the VM, snapshot the disk, start the VM, log into the Nagios web GUI and confirm Nagios is working OK under R3.2 (the running config had previously been checked and found to contain no errors or warnings).

Log in to the VM console and perform the upgrade (delete the old patch and ungzipped folders etc., run wget to get the new patch, extract the new patch, run the new patch, patch confirms upgrade complete).

Log into the Nagios web gui and complete the upgrade, run check version to reset all new version warnings and confirm we are running on R3.3, check operation etc. All seems to be OK.

Log into the VM console and halt the VM. Delete the snapshot (i.e. merge the changes), restart the VM, log into the web GUI and confirm version and operation are all good.

We have not had any issues with the database so far as I know. However, since the upgrade we are also getting a warning for a switch port being down on one of our Cisco switches. The switch port is indeed down, but the link in the alert email, when clicked, only says "bad ticket".

The only other issue we had recently was where the server was surging and trying to do all its checks in one very narrow timeslice. This resulted in dreadful performance of the VM, as it was slamming from 0 to 100% CPU for a few seconds every 5 minutes. I adjusted the reaper settings and this helped a little. In the end I toggled use_retained_scheduling_info=0 and rebooted the server, and it rescheduled its checks evenly over the 5 minutes. I left it in this state for about 30 minutes, then toggled use_retained_scheduling_info=1 and rebooted again. The checking schedule is still evenly spread over the 5 minutes.
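The toggle described above is a one-line change in the main Nagios configuration file. As a rough sketch (not the procedure actually used here), a small helper like the one below could flip the directive in the file's text; the function name is hypothetical, and the usual location `/usr/local/nagios/etc/nagios.cfg` is an assumption about this install.

```python
import re


def set_directive(config_text, name, value):
    """Return config_text with `name=value` set.

    An existing uncommented `name=...` line is replaced in place;
    if none exists, the directive is appended at the end.
    """
    pattern = re.compile(rf"^[ \t]*{re.escape(name)}[ \t]*=.*$", re.MULTILINE)
    new_line = f"{name}={value}"
    if pattern.search(config_text):
        # A function is used as the replacement so the value is inserted literally.
        return pattern.sub(lambda _match: new_line, config_text)
    separator = "" if config_text.endswith("\n") or not config_text else "\n"
    return config_text + separator + new_line + "\n"


if __name__ == "__main__":
    sample = "check_result_reaper_frequency=10\nuse_retained_scheduling_info=1\n"
    # Temporarily disable retained scheduling so checks are rescheduled on restart.
    print(set_directive(sample, "use_retained_scheduling_info", 0))
```

The change only takes effect after Nagios is restarted, and setting the value back to 1 afterwards mirrors the two-step toggle described in the post.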

That's about all I can think of.

C.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Post 3.3 upgrade: scheduled (fixed) downtime failed to stick

Post by scottwilkerson »

We are looking into this; it is very likely a bug. We will post back once we have a resolution.
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
KiwiBloke
Posts: 81
Joined: Fri Apr 27, 2012 7:23 pm

Re: Post 3.3 upgrade: scheduled (fixed) downtime failed to stick

Post by KiwiBloke »

Hi,

I have found a pattern to this that may assist your efforts.

Any items placed into scheduled downtime seem to pop out after a new system configuration (host, service, or group changes) is committed.
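One way to confirm this pattern would be to compare the downtime entries in the Nagios status file before and after a configuration commit. The sketch below assumes the Nagios Core 3.x status file layout, with `hostdowntime { ... }` and `servicedowntime { ... }` blocks containing `key=value` lines; the block names, the field names, and the function names are assumptions for illustration.

```python
import re

# Matches one downtime block, from its opening line to the closing brace line.
BLOCK_RE = re.compile(
    r"^\s*(hostdowntime|servicedowntime)\s*\{(.*?)^\s*\}",
    re.MULTILINE | re.DOTALL,
)


def downtime_entries(status_text):
    """Return a set of (kind, host_name, service_description) tuples."""
    entries = set()
    for kind, body in BLOCK_RE.findall(status_text):
        fields = {}
        for line in body.splitlines():
            key, sep, value = line.strip().partition("=")
            if sep:
                fields[key] = value
        entries.add(
            (kind, fields.get("host_name", ""), fields.get("service_description", ""))
        )
    return entries


def dropped_downtimes(before_text, after_text):
    """Return downtime entries present before a commit but missing afterwards."""
    return downtime_entries(before_text) - downtime_entries(after_text)
```

In use, the status file (often `/usr/local/nagios/var/status.dat`, though that path is an assumption) would be read once before applying the configuration and once after, and the two texts passed to `dropped_downtimes`.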

Cheers,

KB.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Post 3.3 upgrade: scheduled (fixed) downtime failed to stick

Post by scottwilkerson »

This is a bug we are aware of; we are hoping to have it resolved before 2012 goes live.