
Nagios Server Performance Issues

Posted: Thu May 08, 2014 10:09 pm
by Fred Kroeger
Yes... it's the old Nagios Server Performance issues post again......
I've implemented all the good things like RAMDisk, followed all the performance tuning tips, etc., and all has been working really well.
However, recently, we've had excessive CPU utilisation on a Nagios VM.
It also appears that this is related to when the Nagios service is restarted after a config change.
I've attached a graph showing the CPU utilisation. At about 10am yesterday, Nagios was restarted. CPU "User" shoots up and CPU "Idle" goes to almost zero.
I restarted Nagios again at 08:00 this morning and User goes down to its normal level and Idle increases to normal.

Looking at the process stats on the server, I can't find any process that is using this extra CPU which is really frustrating. I thought perhaps that the MySQL database was responsible, but of course that doesn't get restarted when the Nagios service is restarted. I even ran a DB repair but it made no difference.

I know that this is difficult for you to fault find especially as it isn't always present. I guess what I'm asking for is any clues to look for or for tips if anyone has had similar issues.

I am running NagiosXI 2012R2.8c with 280 Hosts and 2,300 Services

regards... Fred

Re: Nagios Server Performance Issues

Posted: Fri May 09, 2014 9:03 am
by tmcdonald
I'm not seeing the attachment, but I would take a look at the type of checks you are running. ESX and WMI checks can be hogs, and check_by_ssh doesn't always play nice either.

Re: Nagios Server Performance Issues

Posted: Sun May 11, 2014 9:56 pm
by Fred Kroeger
Sorry about the attachment - just tried to upload it again & discovered that I can't upload pdf files.
Yes I'm already "renicing" any CPU hog - but the problem is that when this issue starts, I can't identify any particular process that could be responsible.
I would also expect that this issue would always be consistent as the same monitors are always being scheduled. So restarting the Nagios service shouldn't change any of the monitors.
As you can see from the CPU graph it is so obvious when the Nagios service gets restarted.

regards... Fred

Re: Nagios Server Performance Issues

Posted: Mon May 12, 2014 10:44 am
by sreinhardt
Would we be correct in understanding that the blue area on the left is a single restart, and the one(s) in the middle are from multiple restarts? To be fair, I would expect load to increase for a little while when Nagios is restarted, as it has to recompile all of your configs and figure out templating and inheritance, both of which can take some time and resources. However, I did want to start by confirming that this is not happening for 6 hours straight due to one restart.

Re: Nagios Server Performance Issues

Posted: Mon May 19, 2014 10:02 pm
by Fred Kroeger
The start and the end of each of the "blue" sections corresponds to a single Nagios restart.

The left and right edges of the graph are what it looks like normally.
During the period of high user CPU, there is no indication via top that a nagios process is hogging all the resources. It was only when I applied a config change and then noticed that the utilisation had returned to normal that I started to suspect the nagios service. As I mentioned previously, I did try restarting the mysqld service (as it had used a large amount of CPU time), but that didn't make any difference.

I appreciate that this may be impossible to diagnose as I haven't had this experience on any of the other 9 Nagios servers I'm running. One even is on the same ESX host and shares the same SAN. I'm just putting this "out there" in case anyone else has had a similar experience.

To me it would appear that there is some nagios process not completing, or looping, causing this continuous CPU utilisation. However, it does not affect the monitoring, as it still runs normally.

regards... Fred

Re: Nagios Server Performance Issues

Posted: Tue May 20, 2014 9:23 am
by scottwilkerson
Fred Kroeger wrote: I appreciate that this may be impossible to diagnose as I haven't had this experience on any of the other 9 Nagios servers I'm running. One even is on the same ESX host and shares the same SAN. I'm just putting this "out there" in case anyone else has had a similar experience.
You certainly may be correct.

One thing I'll throw out there, is that I have seen this type of behavior if somehow a nagios process gets stuck running, and you end up with multiple processes running at the same time.

In this case the following usually fixes the problem by giving the process a little more time to exit on restart
http://support.nagios.com/wiki/index.ph ... ely_manner
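
As a rough way to check for the "multiple processes running at the same time" situation described above, here is a minimal sketch, assuming a Linux host where /proc is readable. The function names (find_processes, top_level) are made up for illustration and are not part of Nagios. It lists processes named nagios and picks out those whose parent is not itself a nagios process. Short-lived check children can show up here too, so treat the result as a hint and repeat the check a few times.

```python
import os


def find_processes(name, proc_root="/proc"):
    """Map PID -> parent PID for every process whose name equals `name`.

    Reads the Name and PPid fields from <proc_root>/<pid>/status.
    `proc_root` is a parameter only so the function can be tried
    against a fake directory tree.
    """
    found = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        fields = {}
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                for line in f:
                    key, _, value = line.partition(":")
                    fields[key] = value.strip()
        except OSError:
            continue  # process exited or is not readable
        if fields.get("Name") == name:
            found[int(entry)] = int(fields.get("PPid", "0"))
    return found


def top_level(processes):
    """Return PIDs whose parent is not one of the matched processes."""
    return sorted(pid for pid, ppid in processes.items() if ppid not in processes)
```

If top_level(find_processes("nagios")) keeps returning more than one PID across several runs, that would be consistent with a stuck or duplicate daemon, though it is only a heuristic.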

Re: Nagios Server Performance Issues

Posted: Tue Jun 10, 2014 8:24 pm
by Fred Kroeger
Well... it turned out to be an issue with the plugin and not the Nagios processes.
The CPU plugin reads the CPU values in /proc/stat, waits for an interval, then rereads the /proc/stat values.
Unfortunately the default interval is 1 sec, which really doesn't reflect any sort of average utilisation.
So changing the default interval to a larger number now shows CPU utilisation consistent with what I see using top, etc.
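
For anyone curious, the read, wait, reread approach can be sketched roughly as follows. This is not the plugin's actual code, just a minimal illustration assuming a Linux /proc/stat whose first line is the aggregate "cpu" line. The function names and the 5 second default are my own choices.

```python
import time


def read_cpu_times(path="/proc/stat"):
    """Return the counters from the aggregate 'cpu' line as a list of ints."""
    with open(path) as f:
        for line in f:
            fields = line.split()
            if fields and fields[0] == "cpu":
                return [int(v) for v in fields[1:]]
    raise RuntimeError("no aggregate 'cpu' line found in " + path)


def cpu_percentages(before, after):
    """Percent user and idle time between two samples.

    Field order in /proc/stat: user nice system idle iowait irq softirq
    steal guest guest_nice. Only the first eight are summed, because
    guest time is already counted inside user time.
    """
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas[:8])
    if total <= 0:
        return {"user": 0.0, "idle": 0.0}
    return {
        "user": 100.0 * deltas[0] / total,
        "idle": 100.0 * deltas[3] / total,
    }


def sample(interval=5.0):
    """Read the counters, wait `interval` seconds, reread, and compare."""
    before = read_cpu_times()
    time.sleep(interval)
    after = read_cpu_times()
    return cpu_percentages(before, after)
```

A longer interval averages over more scheduler ticks, so sample(interval=5.0) should be less sensitive to a momentary spike than sample(interval=1.0).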

thanks for your assistance..... Fred