Nagios XI Lockup Issues

Posted: Tue Dec 07, 2010 1:33 am
by Box293
When I came in to work today our Nagios XI server was not responsive and I could not access the web page.
The console of the VM was refreshing with the following information.
Nagios Frozen.png
I was unable to do a normal restart, and the VM under vCenter showed that VMware Tools was not running. I needed to hard power off the VM and then power it back on again.

I rebooted the VM at 2pm as I had an issue and I wasn't sure if it was Nagios or not. Turned out to be nothing to do with Nagios.

Then at 3:56pm it was locked up again with the same problems and needed another hard power cycle.
Nagios Frozen #2.png
Any ideas?

Re: Nagios XI Lockup Issues

Posted: Tue Dec 07, 2010 10:25 am
by tonyyarusso
This might be the VM running out of memory. I'm used to a slightly different format for those messages on a different Linux distro, but this may well be how CentOS prints it. Try increasing the RAM allocated to the VM, and keep an eye on graphs for available memory (preferably not counting buffers/caches).

Re: Nagios XI Lockup Issues

Posted: Wed Dec 08, 2010 1:02 am
by Box293
I've upgraded the memory from 1.5GB to 3GB and all seems good.

I've kept an eye on the Server Statistics dashlet and the free memory hovers between 450 MB and 730 MB.

Is there a way of observing the memory usage history of localhost?

Re: Nagios XI Lockup Issues

Posted: Wed Dec 08, 2010 11:15 am
by tonyyarusso
"I've upgraded the memory from 1.5GB to 3GB and all seems good."
Excellent.
"Is there a way of observing the memory usage history of localhost?"
Yes, but for whatever reason it's not available by default. What you'll need to do is add a plugin (I've attached the one I like), and create a service on localhost for checking RAM usage. Then you'll get the usual graphs alongside everything else. The thing I like about this particular plugin is that it lets you specify that caches should be treated as free rather than used, which is usually what you want for tracking this kind of problem. For instance, compare the "normal" output (first command) with the corrected output (second command, with -C added):

root@li133-235:/usr/lib/nagios/plugins# ./check_mem.pl -u -w 80 -c 90
CRITICAL - 96.2% (478 MB) used!|USED=478MB;397;447;0;497 FREE=19MB;;;0;497 CACHES=295MB;;;0;497
root@li133-235:/usr/lib/nagios/plugins# ./check_mem.pl -u -C -w 80 -c 90
OK - 36.7% (183 MB) used.|USED=183MB;398;448;0;498 FREE=315MB;;;0;498 CACHES=295MB;;;0;498
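
To make the gap between those two results concrete, here is a rough Python sketch of the arithmetic involved. It is not the plugin's actual code: parse_meminfo, used_percent and the SAMPLE figures are made up for illustration, and it assumes "caches" means the Buffers and Cached fields of /proc/meminfo.

```python
# Illustrative sketch only; not taken from check_mem.pl.
# SAMPLE uses hypothetical values in /proc/meminfo format (kB).
SAMPLE = """\
MemTotal:       509952 kB
MemFree:         20480 kB
Buffers:             0 kB
Cached:         302080 kB
"""


def parse_meminfo(text):
    """Return a dict mapping /proc/meminfo field names to values in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value"
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def used_percent(fields, caches_as_free=False):
    """Percentage of memory in use, optionally treating caches as free."""
    total = fields["MemTotal"]
    used = total - fields["MemFree"]
    if caches_as_free:
        # Assumption: buffers and page cache can be reclaimed on demand.
        used -= fields.get("Buffers", 0) + fields.get("Cached", 0)
    return 100.0 * used / total


if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    print("caches counted as used: %.1f%%" % used_percent(info))
    print("caches counted as free: %.1f%%" % used_percent(info, True))
```

On a real host you could pass the contents of /proc/meminfo instead of SAMPLE. The point is only that the same machine can look nearly full or comfortably idle depending on how the cache is counted.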

Re: Nagios XI Lockup Issues

Posted: Wed Dec 08, 2010 11:45 pm
by Box293
Thanks Tony, this is exactly what I was after.

I found this command to work the best for me:

./check_mem.pl -f -w 20 -c 10
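
For anyone reading later: as I understand it, -f checks free rather than used memory, so the thresholds run the other way round (warn when free drops to 20%, critical at 10%). A minimal Python sketch of that logic, assuming "at or below" comparisons; free_memory_state is a hypothetical name, the plugin's exact comparison may differ, and only the 0/1/2 exit codes are the standard Nagios plugin convention:

```python
# Illustrative sketch only; the real plugin's comparison rules may differ.
def free_memory_state(free_pct, warn=20.0, crit=10.0):
    """Map a free-memory percentage to a Nagios (exit_code, label) pair."""
    if free_pct <= crit:
        return 2, "CRITICAL"
    if free_pct <= warn:
        return 1, "WARNING"
    return 0, "OK"


if __name__ == "__main__":
    for pct in (63.3, 15.0, 3.8):
        code, label = free_memory_state(pct)
        print("%5.1f%% free -> %s (exit %d)" % (pct, label, code))
```

Note that the check must go critical before warning is tested, otherwise a value under 10% would be reported as WARNING only.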