Nagios XI VM freezing

jsmurphy
Posts: 989
Joined: Wed Aug 18, 2010 9:46 pm

Nagios XI VM freezing

Post by jsmurphy »

Hey guys,

Roughly a week ago I upgraded from XI R1.6 to R1.8, running on the 32-bit CentOS VMware image provided with 1.6. Since the update I've twice had a weird problem where CPU utilization hits 100%, memory usage drops to 0%, and the OS console becomes completely unresponsive. After restarting the VM everything comes back fine, and there are absolutely no errors in any log file, be it base OS or Nagios.

Help?
Attachments: nximem.JPG, nxicpu.JPG, nxicpuweek.JPG (1 week CPU)
niebais
Posts: 349
Joined: Tue Apr 13, 2010 2:15 pm

Re: Nagios XI VM freezing

Post by niebais »

That seems more like a hardware issue than anything else. I've never seen Linux do that unless there was some kernel problem. We use the 32-bit Nagios XI VM at work as well.
jsmurphy
Posts: 989
Joined: Wed Aug 18, 2010 9:46 pm

Re: Nagios XI VM freezing

Post by jsmurphy »

If it were a hardware problem, we should be seeing issues on the other hosts residing on that ESX server... particularly the other *nix hosts. The XI server ran fine for months until the 1.8 upgrade, and the test server, which is still running 1.6, has yet to falter... I suppose I can upgrade the test server and see if I get the same level of instability, though all that really gets me is two unstable XI installs :lol:

A kernel problem is possible, but was the kernel modified in the update? Probably not.
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Nagios XI VM freezing

Post by mguthrie »

A single XI license covers a production install, a test install, and a disaster recovery install. For a situation like this, I'd put those extra instances to work ; )

jsmurphy wrote: A kernel problem is possible, but was the kernel modified in the update? Probably not.

It's possible the kernel was updated if a "yum update" has been run recently, but I can't think of anything that would cause this other than maybe running out of hard disk space.
jsmurphy
Posts: 989
Joined: Wed Aug 18, 2010 9:46 pm

Re: Nagios XI VM freezing

Post by jsmurphy »

I don't think that server even has internet access right now to run a yum update, and I've been watching the disk space like a hawk as I try to gauge how much space I'll need for perf data in the long term.

I suppose I'll have to go with your initial suggestion of putting those licenses to work :p... I may try turning on some debug logging to see if that catches anything; I'll post if I find something. I feel like there's a certain level of appreciable irony in your monitoring server being the least stable :D
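
For anyone who wants to rule out the disk-space theory with something more repeatable than eyeballing it, here is a minimal Python sketch. It assumes Python 3.3 or later is available on the box (or on a host that mounts the same filesystem); the function names and the 90% threshold are made up for illustration and are not part of Nagios XI.

```python
import shutil


def percent_used(path):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report a size of zero; treat them as empty.
        return 0.0
    return 100.0 * usage.used / usage.total


def is_nearly_full(path, threshold=90.0):
    """True when usage is at or above `threshold` percent (hypothetical default)."""
    return percent_used(path) >= threshold
```

Calling `percent_used()` on whatever directory holds your perf data at regular intervals gives a rough growth rate to plan with. The figure can differ slightly from the Use% column in `df`, which accounts for blocks reserved for root.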
jsmurphy
Posts: 989
Joined: Wed Aug 18, 2010 9:46 pm

Re: Nagios XI VM freezing

Post by jsmurphy »

As an update to this ongoing saga: I've updated the VM again to r1.9, and while I was setting up some checks I noticed it began to run slowly. httpd was the culprit, so I checked /var/log/httpd/error_log and found this:
[Tue Dec 20 16:06:18 2011] [error] [client 127.0.0.1] PHP Warning: include_once() [<a href='function.include'>function.include</a>]: Failed opening '/usr/local/nagiosxi/html/includes/components/bulkhostimport/../configwizardhelper.inc.php' for inclusion (include_path='.:/usr/share/pear:/usr/share/php') in /usr/local/nagiosxi/html/includes/components/bulkhostimport/bulkhostimport.inc.php on line 8
[Tue Dec 20 16:06:21 2011] [error] [client 172.31.121.248] PHP Warning: include_once(/usr/local/nagiosxi/html/includes/components/bulkhostimport/../configwizardhelper.inc.php) [<a href='function.include-once'>function.include-once</a>]: failed to open stream: No such file or directory in /usr/local/nagiosxi/html/includes/components/bulkhostimport/bulkhostimport.inc.php on line 8, referer: http://server/nagiosxi/config/nagioscorecfg/
[Tue Dec 20 16:06:21 2011] [error] [client 172.31.121.248] PHP Warning: include_once() [<a href='function.include'>function.include</a>]: Failed opening '/usr/local/nagiosxi/html/includes/components/bulkhostimport/../configwizardhelper.inc.php' for inclusion (include_path='.:/usr/share/pear:/usr/share/php') in /usr/local/nagiosxi/html/includes/components/bulkhostimport/bulkhostimport.inc.php on line 8, referer: http://server/nagiosxi/config/nagioscorecfg/
And so on and so forth; the log is full of these errors (200-300 MB worth per rotation). I don't think it's actually related to the freezing, but I thought I would post it just in case.
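
To get a feel for how much of the log is this one warning, a rough Python sketch like the one below can tally the "Failed opening" lines per include path. It only relies on the message format shown in the excerpt above; the function names are hypothetical, and Python 3 is assumed. It also normalizes each path, which shows that `bulkhostimport/../configwizardhelper.inc.php` resolves to `/usr/local/nagiosxi/html/includes/components/configwizardhelper.inc.php`, i.e. PHP is looking for the helper one directory above the bulkhostimport component.

```python
import posixpath
import re
from collections import Counter

# Matches the quoted path in "Failed opening '<path>' for inclusion" warnings.
FAILED_OPENING = re.compile(r"Failed opening '([^']+)' for inclusion")


def count_failed_includes(lines):
    """Count 'Failed opening' warnings per normalized include path."""
    counts = Counter()
    for line in lines:
        match = FAILED_OPENING.search(line)
        if match:
            # normpath collapses the "bulkhostimport/.." segment.
            counts[posixpath.normpath(match.group(1))] += 1
    return counts


def summarize(log_path, top=10):
    """Print the most frequent missing include paths found in `log_path`."""
    with open(log_path, errors="replace") as log:
        for path, hits in count_failed_includes(log).most_common(top):
            print(f"{hits:8d}  {path}")
```

Running `summarize("/var/log/httpd/error_log")` on the server would list the missing files by frequency, which makes it easy to confirm whether the bulk of each rotation comes from a single missing include.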
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Nagios XI VM freezing

Post by mguthrie »

We'll take a look at that and see what's going on. Without testing it, I'd suggest reinstalling the bulk host import wizard to see if the message goes away. If it doesn't, things like that definitely take a toll on performance, so we'll check it out.
jsmurphy
Posts: 989
Joined: Wed Aug 18, 2010 9:46 pm

Re: Nagios XI VM freezing

Post by jsmurphy »

So, as a final update as I retire the old CentOS 5.6 server in favour of the 6.0 image: upgrading from r1.8 to r1.9 stopped it from committing suicide on a bi-nightly basis. It has been stable since the update, despite the log spam noted in a previous post.