Virtual Nagios and high disk rate

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
TenTimes
Posts: 40
Joined: Fri Oct 14, 2011 9:54 am

Virtual Nagios and high disk rate

Post by TenTimes »

Hello everyone,

I have a Nagios XI server running in a vCenter 4.1 environment, and its disk rate keeps switching between a steady 200-500 KB/s and a steady 3000 KB/s. I see no reason for this, because the workload has not changed. We have all the check results stored on a RAM disk because we don't need them to be saved.

The VM is a Nagios virtual appliance running CentOS 6 (32-bit) with Nagios XI 2011R2.2. We have 2 CPU cores reserved for this VM and 3 GB of RAM. The VM checks 70 hosts and about 3000 services.
The storage is a SATA SAN (NetApp filer). There is no distributed monitoring or offloading of the SQL database. All logging is turned off because it generated a lot of I/O. The only real add-on we have installed is the vSphere Perl SDK for the ESX/vCenter monitoring.

Is this a common issue in Nagios XI, and does it always go back to normal on its own (here it does so only sometimes)?

(By the way, does posting on the customer forum count as a call included with the Nagios XI license?)
Last edited by TenTimes on Tue Apr 17, 2012 3:28 am, edited 1 time in total.
Chasse Theater, Breda, NL
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Virtual Nagios and high disk rate

Post by scottwilkerson »

One of our resident developers put together a post a short while back about techniques to increase performance, specifically regarding disk I/O.

Instead of copying all the information, I am just going to provide a link here:
http://labs.nagios.com/2012/01/30/nagio ... g-disk-io/
Former Nagios employee
TenTimes
Posts: 40
Joined: Fri Oct 14, 2011 9:54 am

Re: Virtual Nagios and high disk rate

Post by TenTimes »

Thanks for the update, but I already offload to a RAM disk, and rrdcached isn't an option for us. I set that up after reading the article when it came out :D

So the only way to lower this disk I/O is through offloading the SQL DB?

The strange thing is that the read/write rate is about 200 KB/s (with some peaks up to 700 KB/s) after booting, stays that way for a few days, and then suddenly rises to about 2500 KB/s and stays there.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Virtual Nagios and high disk rate

Post by scottwilkerson »

What version of XI are you running?
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Virtual Nagios and high disk rate

Post by mguthrie »

As of last week we made some updates to the following docs:

http://assets.nagios.com/downloads/nagi ... rmance.pdf
http://assets.nagios.com/downloads/nagi ... giosXI.pdf

The RAM disk document outlines how to move check results and performance data to a RAM disk as well, which greatly reduces disk I/O. We recommend testing first in a dev environment.

We also committed some changes for our upcoming 2.3 release that will help reduce CPU and disk I/O spikes. We found that when Nagios detected a lot of problems around the same time, the event handlers and notifications created a noticeable increase in CPU and disk activity. Our in-house testing shows that our latest changes should help with that. These new changes are documented in the Maximizing XI Performance document, under "subsystem."
TenTimes
Posts: 40
Joined: Fri Oct 14, 2011 9:54 am

Re: Virtual Nagios and high disk rate

Post by TenTimes »

Thanks for the info, I will try the update when you release it.

By the way, we use ESX 4.1, in case that matters.

A lot of the tips in the Maximizing XI Performance doc are already implemented. I also have things like logging already turned way down.
I think the optimisation that reduces the I/O spikes will solve this problem, because spikes are what I keep seeing. The disk rate (when viewing realtime performance data) is stable at a baseline of about 200 KB/s, but it has spikes that peak at an average of 3000 KB/s, and some peaks even go up to 8000 KB/s. These peaks return every minute, and then the disk rate settles back to the baseline value.

I will let you know if the update has any effect when it's released.
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Virtual Nagios and high disk rate

Post by mguthrie »

Yeah, if the increases are at the top of each minute, then the issue is most likely the subsystem cron jobs, which launch once per minute. There are probably some other system level causes for that spike as well, but the 2.3 updates will hopefully help some of those.
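
One way to check whether the spikes really line up with the top of each minute is to sample the kernel's disk counters every few seconds and note the second at which each high reading occurs. The following is a minimal Python sketch, not something shipped with Nagios XI. It assumes a Linux guest that exposes /proc/diskstats (which counts 512-byte sectors); the function names and the device name used in the note below are hypothetical.

```python
SECTOR_BYTES = 512  # /proc/diskstats reports sectors of 512 bytes


def parse_diskstats(text, device):
    """Return (sectors_read, sectors_written) for one block device."""
    for line in text.splitlines():
        fields = line.split()
        # fields: major, minor, name, reads, reads merged, sectors read,
        # ms reading, writes, writes merged, sectors written, ...
        if len(fields) >= 10 and fields[2] == device:
            return int(fields[5]), int(fields[9])
    raise ValueError("device not found: " + device)


def rate_kb_per_s(before, after, seconds):
    """Combined read+write rate in KB/s between two snapshots."""
    sectors = (after[0] - before[0]) + (after[1] - before[1])
    return sectors * SECTOR_BYTES / 1024.0 / seconds
```

Taking a snapshot of /proc/diskstats every 5 seconds for a device such as `sda` (an assumed name), and printing each rate next to `time.localtime().tm_sec`, would show whether the peaks cluster around second 0.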
TenTimes
Posts: 40
Joined: Fri Oct 14, 2011 9:54 am

Re: Virtual Nagios and high disk rate

Post by TenTimes »

Could part of the problem also be the Apache service? I ask because the Nagios server consumes more and more RAM as time goes by. It starts out at 20-25% usage, and after a week without a restart the RAM usage has grown to 40-50%.
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Virtual Nagios and high disk rate

Post by mguthrie »

I don't suspect that they're related, although MySQL and Apache are both memory heavy, and they like to cache available memory. Apache can and does slowly leak memory over time, but that is easily remedied by restarting it once per day from a cron job.
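
To tell real growth apart from memory that is only being used as cache, it can help to compute usage both ways from /proc/meminfo. This is a minimal Python sketch under the assumption of a Linux guest whose /proc/meminfo has the MemTotal, MemFree, Buffers and Cached fields; the function names are hypothetical.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo key to its numeric value (kB for most keys)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.split():
            values[key.strip()] = int(rest.split()[0])
    return values


def used_percent(info, include_cache=False):
    """Percent of RAM in use, with or without buffers and page cache."""
    used = info["MemTotal"] - info["MemFree"]
    if not include_cache:
        # Buffers and page cache can be reclaimed by the kernel on demand.
        used -= info["Buffers"] + info["Cached"]
    return 100.0 * used / info["MemTotal"]
```

If the figure that includes cache keeps climbing while the figure without cache stays flat, the growth is mostly cache rather than a leak.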
TenTimes
Posts: 40
Joined: Fri Oct 14, 2011 9:54 am

Re: Virtual Nagios and high disk rate

Post by TenTimes »

Thanks for the advice. This way I might just be able to keep the memory usage and the disk rate down.
I'll try restarting just the MySQL service when the disk rate is high and see if that helps. If that works, I'll add two cron jobs to restart the MySQL and Apache services every hour. That should keep the growth in resource usage to a minimum.

If this doesn't work, I'll try a trigger in ESX: if the usage hits a certain threshold, the machine is restarted and the usage returns to normal. Then I just have to add a rule that automatically restarts the Nagios service so that all the services are monitored again.
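
A threshold trigger like this should probably react only to a sustained high rate, since the once-per-minute spikes described earlier in the thread would otherwise trip it. Here is a minimal Python sketch of that check; the function name, the 2000 KB/s threshold and the three-sample window are assumptions chosen for illustration, not ESX or Nagios settings.

```python
def sustained_high(rates_kb_s, threshold_kb_s, min_samples):
    """True if the most recent min_samples readings all exceed the threshold."""
    if min_samples <= 0 or len(rates_kb_s) < min_samples:
        return False
    return all(r > threshold_kb_s for r in rates_kb_s[-min_samples:])


# Short spikes over a 200 KB/s baseline do not count as sustained.
spiky = [200, 3000, 200, 200, 8000, 200]
# A rate that climbs to about 2500 KB/s and stays there does.
stuck = [200, 2500, 2600, 2500]

print(sustained_high(spiky, 2000, 3))
print(sustained_high(stuck, 2000, 3))
```

With readings taken once per minute, a window of several samples would ignore individual peaks and still catch the case where the rate rises and stays high.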