
Nagios XI VM freezing

Posted: Mon Dec 12, 2011 6:05 pm
by jsmurphy
Hey guys,

I recently upgraded (roughly a week ago) from XI R1.6 to R1.8, running on the 32-bit CentOS VMware image provided with 1.6. Since the update I've had a weird problem occur twice: CPU utilization hits 100%, memory usage drops to 0%, and the OS console becomes completely unavailable. After restarting the VM everything comes back fine, and there are absolutely no errors in any log file, be it base OS or Nagios.

Help?
Attachments: nximem.JPG, nxicpu.JPG
1 week CPU: nxicpuweek.JPG

Re: Nagios XI VM freezing

Posted: Tue Dec 13, 2011 6:04 pm
by niebais
That seems more like a hardware issue than anything else. I've never seen Linux do that unless there was some kernel problem. We use the 32 bit Nagios XI VM at work as well.

Re: Nagios XI VM freezing

Posted: Tue Dec 13, 2011 6:34 pm
by jsmurphy
If it were a hardware problem, we should be seeing issues on the other hosts residing on that ESX server... particularly the other *nix hosts. The XI server ran fine for months until the 1.8 upgrade, and the test server, which is still running 1.6, has yet to falter... I suppose I can upgrade the test server and see if I get the same level of instability, though all that really gets me is two unstable XI installs :lol: .

A kernel problem is possible, but was the kernel modified in the update? Probably not.

Re: Nagios XI VM freezing

Posted: Wed Dec 14, 2011 11:13 am
by mguthrie
A single XI license covers a production install, a test install, and a disaster recovery install. For a situation like this, I'd put those extra instances to work ; )

jsmurphy wrote: "A kernel problem is possible, but was the kernel modified in the update? Probably not."
It's possible the kernel was updated if a "yum update" has been run recently, but I'm not sure what else would cause this, other than maybe running out of hard disk space.

Re: Nagios XI VM freezing

Posted: Wed Dec 14, 2011 5:09 pm
by jsmurphy
I don't think that server even has internet access right now to run a yum update, and I've been watching the disk space like a hawk as I try to gauge how much space I'll need for perf data in the long term.
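
For anyone wanting to do the same kind of disk watching, here is a minimal Python sketch. The function names and the growth-rate figure are made up for illustration and are not part of Nagios XI; the only library call used is the standard shutil.disk_usage.

```python
import shutil


def usage_percent(path="/"):
    """Return the percentage of the filesystem holding `path` that is in use."""
    total, used, _free = shutil.disk_usage(path)
    return 100.0 * used / total


def days_until_full(free_bytes, growth_bytes_per_day):
    """Rough linear estimate of how long the remaining space will last.

    growth_bytes_per_day is an assumption you measure yourself, for example
    by comparing the size of the perf data directory a few days apart.
    """
    if growth_bytes_per_day <= 0:
        return float("inf")  # not growing, so it never fills on this estimate
    return free_bytes / growth_bytes_per_day


# Example: 10 GiB free, perf data growing by a hypothetical 512 MiB per day.
# days_until_full(10 * 1024**3, 512 * 1024**2)
```

A straight-line estimate like this is only a starting point, since perf data growth usually changes as hosts and checks are added.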

I suppose I'll have to go with your initial suggestion of putting those licenses to work :p... I may try turning on some debug logging to see if that catches anything; I'll post if I find something. I feel like there's a certain level of appreciable irony in your monitoring server being the least stable :D

Re: Nagios XI VM freezing

Posted: Tue Dec 20, 2011 12:16 am
by jsmurphy
As an update to this ongoing saga: I've updated the VM again to r1.9, and while I was setting up some checks I noticed it began to run slowly, with httpd as the culprit. After checking /var/log/httpd/error_log I found this:
[Tue Dec 20 16:06:18 2011] [error] [client 127.0.0.1] PHP Warning: include_once() [<a href='function.include'>function.include</a>]: Failed opening '/usr/local/nagiosxi/html/includes/components/bulkhostimport/../configwizardhelper.inc.php' for inclusion (include_path='.:/usr/share/pear:/usr/share/php') in /usr/local/nagiosxi/html/includes/components/bulkhostimport/bulkhostimport.inc.php on line 8
[Tue Dec 20 16:06:21 2011] [error] [client 172.31.121.248] PHP Warning: include_once(/usr/local/nagiosxi/html/includes/components/bulkhostimport/../configwizardhelper.inc.php) [<a href='function.include-once'>function.include-once</a>]: failed to open stream: No such file or directory in /usr/local/nagiosxi/html/includes/components/bulkhostimport/bulkhostimport.inc.php on line 8, referer: http://server/nagiosxi/config/nagioscorecfg/
[Tue Dec 20 16:06:21 2011] [error] [client 172.31.121.248] PHP Warning: include_once() [<a href='function.include'>function.include</a>]: Failed opening '/usr/local/nagiosxi/html/includes/components/bulkhostimport/../configwizardhelper.inc.php' for inclusion (include_path='.:/usr/share/pear:/usr/share/php') in /usr/local/nagiosxi/html/includes/components/bulkhostimport/bulkhostimport.inc.php on line 8, referer: http://server/nagiosxi/config/nagioscorecfg/
And so on and so forth; the log is full of these errors (200 to 300 MB worth per rotation). I don't think it's actually related to the freezing, but just in case, I thought I would post it anyway.
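
With a log that size, it can help to collapse the repeats and see how many distinct warnings there really are. Below is a rough Python sketch that does this. It assumes the "[timestamp] [level] [client ip] message" layout shown in the excerpt above, and the names LINE_RE and summarize are hypothetical helpers, not anything shipped with Nagios XI or Apache.

```python
import re
from collections import Counter

# Assumed line layout, taken from the excerpt above:
# [timestamp] [level] [client ip] message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<level>\w+)\] "
    r"(?:\[client (?P<client>[^\]]+)\] )?(?P<msg>.*)$"
)


def summarize(lines):
    """Count identical messages, ignoring timestamp, client and referer."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if not match:
            continue  # skip anything that does not look like an error_log line
        # Drop a trailing ", referer: ..." so the same warning groups together.
        msg = re.sub(r", referer: \S+$", "", match.group("msg"))
        counts[msg] += 1
    return counts


# Usage sketch (path as given in the post above):
# with open("/var/log/httpd/error_log", errors="replace") as fh:
#     for msg, n in summarize(fh).most_common(5):
#         print(n, msg[:160])
```

If the top few entries account for nearly all of the lines, the spam most likely comes from a single include failing on every request.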

Re: Nagios XI VM freezing

Posted: Tue Dec 20, 2011 10:53 am
by mguthrie
We'll take a look at that and see what's going on. Without having tested it, I'd suggest reinstalling the bulk host import wizard and seeing if the message goes away. If not, things like that definitely take a toll on performance, so we'll check it out.

Re: Nagios XI VM freezing

Posted: Wed Dec 28, 2011 10:38 pm
by jsmurphy
So, as a final update to this as I retire the old CentOS 5.6 server in favour of the 6.0 image: upgrading from r1.8 to r1.9 stopped it from committing suicide on a bi-nightly basis... it has been stable since the update, despite the log spam noted in a previous post.