Need help with Nagios XI 2014 upgrade

Posted: Thu May 15, 2014 8:32 am
by jwelch
Hi,
I tried to upgrade my production server last night and it failed. Everything seemed OK until it stopped, saying there was a configuration problem. From then on I was in config file hell. It would say one service had no hosts (true), then no more errors were shown. I'd fix that and it would show a different service with no hosts (true). I'd fix that, then it would go back to the original. It looked like it was rolling back the config, so I had to redo the previous fixes before I could move on to the next. Eventually, it just showed an error but no details. Having run out of time, I restored the backup I made just before starting the upgrade. I'm back to my original version and config, but would still like to attempt the upgrade. I'll add more details but just wanted to get this started before I get pulled into other fires. If there are specific config changes I need to make before the upgrade, just let me know and I'll give that a try. Meanwhile, I'm going to restore my test box back to the 2012 version (it upgraded with no problems, but was only using a simple set of hosts/services), then try to restore my prod config to the test box so I can try the upgrade without the time pressure.

Re: Need help with Nagios XI 2014 upgrade

Posted: Thu May 15, 2014 12:37 pm
by lmiltchev
We've already seen quite a few failed upgrades caused by "ghost" hosts/services. I am not saying this is your issue, but you can check it out just in case. List the "hosts" and "services" directories to check the timestamps and see if all of the hosts/services are updating.

ll -t  /usr/local/nagios/etc/hosts
ll -t  /usr/local/nagios/etc/services
If you see a very old timestamp, it's possible you have a ghost host/service. Double-check if this host/service exists in the CCM. If these hosts/services are not in the CCM and you are not using them, but they are in the configs, you can remove them manually from the CLI.

cd /usr/local/nagios/etc/hosts
rm -rf <ghost host>
cd /usr/local/nagios/etc/services
rm -rf <ghost service>
After this is taken care of, you can try the upgrade again. Hope this helps.
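
The timestamp check above could also be scripted. This is only a minimal sketch, assuming GNU find and that the per-host/per-service config files end in .cfg. The function name list_stale_cfgs and the 7-day default are illustrative assumptions, not anything shipped with Nagios XI. It only lists candidates, so still confirm each one against the CCM before deleting anything.

```shell
#!/bin/sh
# list_stale_cfgs DIR [DAYS]   (hypothetical helper, not part of Nagios XI)
# Print the .cfg files directly inside DIR that find considers older than
# DAYS days (find's "-mtime +N" semantics; default 7). These are only
# candidates for "ghost" configs that are no longer being rewritten.
list_stale_cfgs() {
    dir=$1
    days=${2:-7}
    find "$dir" -maxdepth 1 -type f -name '*.cfg' -mtime +"$days" | sort
}

# Example usage (paths as given earlier in the thread):
#   list_stale_cfgs /usr/local/nagios/etc/hosts
#   list_stale_cfgs /usr/local/nagios/etc/services 30
```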

Re: Need help with Nagios XI 2014 upgrade

Posted: Sat May 17, 2014 8:11 pm
by jwelch
Yeah, I'd already deleted the orphaned services as part of troubleshooting. I decided to completely remove XI from my development box and reinstall. I did that, then copied a backup from my prod box and restored it. I couldn't get any config changes to take, though XI seemed to be running fine. Eventually I realized that I'd only tested simple configs on the devl box and had never bumped up the PHP resource settings in /etc/php.ini. That got the prod config working on the devl box. Then I did the 2014 upgrade and got the same problem I had seen on the prod box: a blurb about config errors with no details, and the upgrade stopped.

At this point an 'Apply Configuration' failed, but there was no snapshot to show errors, and running a config check by hand showed no errors.
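
For anyone following along, a by-hand config check like the one mentioned here is normally done with the nagios binary's -v option. A rough sketch, assuming the default install paths /usr/local/nagios/bin/nagios and /usr/local/nagios/etc/nagios.cfg, and assuming the verifier prefixes problem lines with "Error:" or "Warning:" and ends with "Total Errors:"/"Total Warnings:" summaries. The function name verify_nagios_config is made up for illustration.

```shell
#!/bin/sh
# verify_nagios_config [NAGIOS_BIN] [NAGIOS_CFG]   (hypothetical helper)
# Run the config verification and show only the error/warning lines and
# the summary totals. Default paths are assumptions; adjust as needed.
verify_nagios_config() {
    nagios_bin=${1:-/usr/local/nagios/bin/nagios}
    nagios_cfg=${2:-/usr/local/nagios/etc/nagios.cfg}
    "$nagios_bin" -v "$nagios_cfg" 2>&1 \
        | grep -E '^(Error|Warning):|^Total (Errors|Warnings):'
}

# Example usage:
#   verify_nagios_config
```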

After much wailing and gnashing of teeth, I determined that if I deleted the one dependent service config and the one service escalation config that I had implemented on my system (the two had no services in common), the configuration worked, and I was able to re-run the upgrade to completion on the development box.

Since then, I deleted the dependent and escalation settings on the prod box and successfully installed the upgrade. It's (mostly) working now. I did have to fix some problems with Socket6 definitions that I'd done a long time ago but that resurfaced, causing me to get a nasty email every 5 minutes from mrtg. I also had to tweak some checks that had used \\ to escape characters but now apparently only need a single backslash. And for some reason the https checks using check_http on our Cisco VPN hosts failed after the upgrade. I added a -N parameter to the checks to fix that.

The remaining problems are:

Apply Configuration works, but the screen that ticks off the time to completion never stops the ........ (or at least I got sick of waiting after a few minutes). The changes show as sync'd on another tab after 20-30 seconds and the checks start working, but the page never indicates that the reconfig is done.

The system status icons in the upper right corner show: System OK Ck Ck Ck i i i
i.e.:
Monitoring Engine is running
Performance Grapher is running
Database Backend is running
Active Host Checks are disabled
Active Service Checks are disabled
Notifications are disabled

I am seeing service and host checks being made, and I have gotten at least one notification since the upgrade, so I suspect that nagios is working.

Another weirdness is that when I schedule an immediate check for a host or service, the dialog box times out and says "the check was not able to be scheduled... the system may be busy", but the check does run, and the one I was looking at changed from Critical to OK.

These are annoying, but not showstoppers since checks do seem to be working. I think it will be ok to wait till Monday and pick this up again then.

Re: Need help with Nagios XI 2014 upgrade

Posted: Sun May 18, 2014 1:02 pm
by bmiadmins
Hi,

In our case I had to upgrade nagios core, nagios-xi, and rebuild ndo2utils as separate actions. The upgrade script did not work out of the box for me either. Steps taken:

1. Ran ./upgrade from /tmp/nagiosxi (or whatever path the decompressed tarball sits in).
2. Cleaned up config errors and ghost services. I actually removed the complaining configs and re-wrote the configs from the admin panel under tools.
3. Rebuilt ndo2utils. In my case I killed off all pids spawned by the nagios user.
4. Ran upgrade from the nagiosxi dir. This required that I set some static entries for the file name and url. Try the following:

bash -x /<path to nagiosxi folder>/scripts/upgrade_to_latest.sh -f http://assets.nagios.com/downloads/nagiosxi/<uri to file>

Once everything finished, all checks were working and we have all the nice new features that nagiosxi offers. On the ghost services issue, I found that if the service is listed as a dependency of anything, it gets re-written when you try to generate new configs. It seems the best option is to disable (set inactive) anything that calls the service definition, mv|rm|fix the problem service that shows up when doing the verify, and build new configs with the tools under the CCM; verify should then work.


YMMV

John

Re: Need help with Nagios XI 2014 upgrade

Posted: Sun May 18, 2014 7:12 pm
by jwelch
Hmmm....running another update manually is worth a try.

No change. (on dev box). Since I have a working system, I think I'll wait till Monday to see what the developers have to say before I screw with it any more. Thanks for the suggestions though.

Re: Need help with Nagios XI 2014 upgrade

Posted: Mon May 19, 2014 11:54 am
by jwelch
As for the 'Monitoring engine not running' indicator symptom, I found the following in the httpd error_log file:

[Mon May 19 11:59:16 2014] [error] [client 1.2.3.4] PHP Warning: Division by zero in /usr/local/nagiosxi/html/includes/components/xicore/ajaxhelpers-monitoringengine.inc.php on line 510, referer: https://myserver.mydomain/nagiosxi/admi ... ringengine
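
To see whether that warning is a one-off or fires on every page refresh, the error log can be summarised per source file and line. A minimal sketch, assuming the Apache error log lives at /var/log/httpd/error_log (typical on RHEL/CentOS, but an assumption here) and that entries follow the "PHP Warning: ... in FILE on line N" shape shown above. The function name php_warnings_by_file is hypothetical.

```shell
#!/bin/sh
# php_warnings_by_file LOG   (hypothetical helper)
# Count "PHP Warning" entries in an Apache error log, grouped by the
# PHP file and line number they point at, most frequent first.
php_warnings_by_file() {
    grep 'PHP Warning' "$1" \
        | sed -n 's|.* in \(/[^ ]*\) on line \([0-9]*\).*|\1:\2|p' \
        | sort | uniq -c | sort -rn
}

# Example usage (log path is an assumption):
#   php_warnings_by_file /var/log/httpd/error_log
```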

Re: Need help with Nagios XI 2014 upgrade

Posted: Mon May 19, 2014 3:35 pm
by abrist
Try tmcdonald's suggestion in the following thread: http://support.nagios.com/forum/viewtop ... 129#p98353

I had the same issue on my XI test box, and the new version of utils-backend.inc fixed the issue for me.

Re: Need help with Nagios XI 2014 upgrade

Posted: Mon May 19, 2014 6:18 pm
by jwelch
I copied the original to utils-backend.inc.php.orig, then put the new one in place. No effect as far as I can tell.
I restarted nagios and httpd and cleared my browser cache. No change. Tried IE on another workstation. No joy.
I even rebooted my development box after copying over the new file and I can see no difference.

-rwxr-x--- 1 nagios nagios 24992 May 15 14:52 utils-backend.inc.php
-rwxr-x--- 1 nagios nagios 24992 May 15 14:52 utils-backend.inc.php.new
-rwxr-x--- 1 nagios nagios 24276 May 17 18:36 utils-backend.inc.php.orig
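
The matching size and timestamp on the live file and the .new copy suggest the replacement is in place, but they do not prove it. A byte-for-byte comparison would settle it. A small sketch using cmp; the function name same_content is just for illustration, and the file names are the ones from the listing above.

```shell
#!/bin/sh
# same_content A B   (hypothetical helper)
# Exit status 0 if the two files are byte-identical, non-zero otherwise.
same_content() {
    cmp -s "$1" "$2"
}

# Example usage, run from the directory holding the files:
#   same_content utils-backend.inc.php utils-backend.inc.php.new \
#       && echo "new version is live" || echo "files differ"
```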

Re: Need help with Nagios XI 2014 upgrade

Posted: Tue May 20, 2014 1:45 pm
by lmiltchev
We may need to schedule a remote session to troubleshoot your problem further. Is this an option for you? If it is, please send us an email at [email protected]. Type "Need help with Nagios XI 2014 upgrade" in the email's subject field.

Re: Need help with Nagios XI 2014 upgrade

Posted: Tue May 20, 2014 2:43 pm
by jwelch
email sent