I/O wait reported from secondary NagiosXI
Posted: Wed Apr 02, 2014 8:27 am
by jpipitone
We have a secondary NagiosXI instance in one of our east coast offices. This NagiosXI instance monitors our other NagiosXI instance on the west coast.
We have noticed that the west coast NagiosXI (the one with the I/O wait) has been reporting various sites and services as critical and/or flapping, when in fact everything is up, with no issues.
Are there any tweaks that we can make to improve performance, and cut down on the i/o waits?
Currently, NagiosXI is reporting the following for our primary NagiosXI instance:
Critical: I/O Wait = 75.15%
Load Critical: load1=10.92, load5=24.86, load15=22.92
NagiosXI Jobs: Error: Could not parse XML from http://nagiosserver/nagiosxi ()
Any help would be appreciated.
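For reference, an I/O wait percentage like the one reported above is commonly derived from two samples of the aggregate "cpu" line in /proc/stat on Linux. The sketch below shows that calculation; the function names are illustrative, and the Nagios check in use here may compute its figure differently.

```python
import time


def parse_cpu_line(line):
    """Return the first eight counters (user..steal) from the aggregate cpu line."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Guest time is already included in user time, so stop after 'steal'.
    return [int(value) for value in fields[1:9]]


def iowait_percent(before, after):
    """Share of elapsed CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    # Counter order: user, nice, system, idle, iowait, irq, softirq, steal.
    return 100.0 * deltas[4] / total


def sample_iowait(interval=1.0, path="/proc/stat"):
    """Read /proc/stat twice, 'interval' seconds apart (Linux only)."""
    with open(path) as handle:
        before = parse_cpu_line(handle.readline())
    time.sleep(interval)
    with open(path) as handle:
        after = parse_cpu_line(handle.readline())
    return iowait_percent(before, after)
```

Calling sample_iowait(5.0) on the affected server gives a figure that can be compared against what the check reports.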
Re: I/O wait reported from secondary NagiosXI
Posted: Wed Apr 02, 2014 12:45 pm
by abrist
If the wait gets bad enough, checks will start to time out. That is my guess as to why you are getting inconsistent false alerts.
1) What type of disk subsystem is the server using?
2) How many cores/threads and how much RAM?
3) How many checks are you running/scheduling every 5 minutes?
This could be an unrelated issue, but we will have to get the resource usage under control before we can test this problem.
Re: I/O wait reported from secondary NagiosXI
Posted: Wed Apr 02, 2014 12:46 pm
by tmcdonald
Our two big optimizations are implementing a ramdisk and offloading the DB:
http://assets.nagios.com/downloads/nagi ... giosXI.pdf
http://assets.nagios.com/downloads/nagi ... Server.pdf
Both will lower the IO, and the DB offloading also helps drop the CPU load a bit.
Re: I/O wait reported from secondary NagiosXI
Posted: Wed Apr 02, 2014 1:38 pm
by jpipitone
Specs:
1 quad core Intel Xeon, 2.4ghz
4gb physical memory
3 x 7200 rpm disks (raid 1, 1 hotspare)
We have about 1022 service checks. We perform checks every minute. The checks are for various switches, servers, websites, a few DNS queries, etc.
This installation has been running fine for years, and then when we started replacing legacy Nagios Core checks with NagiosXI checks, we noticed the I/O issues.
I will start with configuring a RAM disk. With only 249mb free of 4096mb of physical memory, this may be a challenge.
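As a rough back-of-the-envelope sketch, the figures quoted in this thread (1022 service checks, a one-minute interval, a quad-core CPU, load1=10.92) can be turned into a check rate and a per-core load. The helper names are made up for illustration, and the calculation assumes every check runs exactly once per interval and ignores host checks.

```python
def checks_per_second(num_checks, interval_seconds):
    """Average executions per second if every check runs once per interval."""
    return num_checks / interval_seconds


def executions_per_window(num_checks, interval_seconds, window_seconds):
    """Total check executions in a window, e.g. the 5 minutes asked about."""
    return num_checks * window_seconds // interval_seconds


def load_per_core(load_average, cores):
    """Load average divided by core count; values above 1.0 mean queued work."""
    return load_average / cores


rate = checks_per_second(1022, 60)              # one-minute interval
per_5_min = executions_per_window(1022, 60, 300)
core_load = load_per_core(10.92, 4)             # load1 on a quad core
print(f"{rate:.1f} checks/s, {per_5_min} per 5 min, {core_load:.2f} load per core")
```

That works out to roughly 17 check executions per second, each of which produces a result to process, so it gives a sense of how much write activity the disks have to absorb.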
Re: I/O wait reported from secondary NagiosXI
Posted: Wed Apr 02, 2014 2:16 pm
by slansing
I would highly recommend bumping your memory up a bit; 249 MB is likely not a safe number for processing check results and performance data through a ramdisk on an installation with roughly 2000 host/service checks combined. You can give it a shot, but my recommendation stands.
Re: I/O wait reported from secondary NagiosXI
Posted: Wed Apr 02, 2014 2:43 pm
by jpipitone
slansing wrote: I would highly recommend bumping your memory up a bit, 249m is likely not a safe number for processing check results and performance data through a ramdisk on an installation with roughly 2000 host/service checks combined. You can give it a shot, but my recommendation stands.
Thanks. Any recommendation on how large to make the RAM disk, given that we have 249 MB free at this time? It looks like the PDF recommends 50 MB? Also, this is a 32-bit operating system. If we added more physical memory, would NagiosXI even be able to use it on a 32-bit OS?
The files add up to about 40 MB. I am going to start with 100 MB and see how it runs, unless you object.
Re: I/O wait reported from secondary NagiosXI
Posted: Wed Apr 02, 2014 4:50 pm
by scottwilkerson
Actually the PDF makes the following recommendation:
"Now we have to actually mount a RAM disk to that location. This is the point where we need to determine the size of the RAM disk that will be set aside. This can be determined by taking a look at your current status.dat and objects.cache which default to being in the /usr/local/nagios/var/ directory. On my test machine, these two files added up to be around 13MB, so I will set up the RAM disk to be 50MB to give some leeway and allow for growth. This will only make an improvement if you have enough available memory, otherwise this will mount the RAM disk and use swap memory for excess RAM allocated."
I would say at a minimum 4X the size of those files...
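The 4X rule of thumb can be sketched as a small calculation. This is only an illustration: the function names are made up, the default directory is the one quoted from the PDF, and the multiplier is the minimum suggested in this post rather than an official formula.

```python
import math
import os


def recommended_ramdisk_mb(file_sizes_bytes, multiplier=4):
    """Combined size of the files times the multiplier, rounded up to whole MB."""
    total_mb = sum(file_sizes_bytes) / (1024 * 1024)
    return math.ceil(total_mb * multiplier)


def nagios_file_sizes(var_dir="/usr/local/nagios/var"):
    """Sizes of status.dat and objects.cache, skipping any file that is missing."""
    paths = [os.path.join(var_dir, name) for name in ("status.dat", "objects.cache")]
    return [os.path.getsize(path) for path in paths if os.path.exists(path)]
```

Calling recommended_ramdisk_mb(nagios_file_sizes()) on the Nagios server returns a suggested size in MB. With the 13 MB from the PDF's test machine this gives 52 MB, close to the 50 MB used there; with the roughly 40 MB mentioned earlier in the thread it gives 160 MB, which is more than the 100 MB being tried.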
Re: I/O wait reported from secondary NagiosXI
Posted: Wed Apr 02, 2014 9:05 pm
by jpipitone
OK. After creating the RAM disk, I'm seeing fewer I/O alerts; however, NagiosXI is running super slow. The interface is basically non-responsive.
I can click on All Service Problems or All Host Problems, and they eventually display. If I click on Service Details, the page doesn't display. I don't even get a timeout.
If we added more physical memory to this server, will NagiosXI be able to take advantage of more than 4gb, considering the OS is only 32 bit?
Re: I/O wait reported from secondary NagiosXI
Posted: Thu Apr 03, 2014 7:48 am
by jpipitone
Just an update: this morning it seems to be running OK. I ran the database repair script. It is still lagging quite a bit, but at least I can work in Nagios now.
Re: I/O wait reported from secondary NagiosXI
Posted: Thu Apr 03, 2014 1:05 pm
by slansing
Lagging you say? What is the output of the following:
I'm wondering if something went wrong with the ramdisk creation and you are being strained for memory right now.