High CPU load
Posted: Mon Oct 03, 2011 11:05 am
by cwscribner
Hi all.
I recently upgraded to R1.7 and have noticed a severely high CPU load on a regular basis. So high, in fact, that npcd shuts down frequently. Technically there are two issues I need addressed here: the high CPU load itself, and how to keep npcd running.
Note: I've heard from a client who went through Nagios training that using an older Linux kernel can fix the high CPU load. Thoughts?
Re: High CPU load
Posted: Mon Oct 03, 2011 3:14 pm
by lmiltchev
high cpu load issue:
I have never heard that "an older linux kernel can fix the high CPU load", but it is possible. I will talk to our developers to see if they know more about this issue.
I've heard of people solving the high CPU load problem on large Nagios installs by adding a second CPU to their master server, but I suppose this is not going to be a solution for everyone. Besides, adding more hardware can only get you so far.
how to keep npcd running:
This is actually related to the first (CPU load) issue. If you fix your CPU load so that it does not exceed the value of "load_threshold" in the "npcd.cfg" file, then npcd would not (*should not*) shut off automatically. You can try experimenting by changing the default value "load_threshold = 10.0" in your "npcd.cfg" file (in the "/usr/local/nagios/etc/pnp" directory). You can also create a custom script that checks whether npcd is running and starts it if necessary, and add it as a cron job. I am not sure if this is something you would like to do.
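For what it's worth, a watchdog script along those lines could look roughly like the sketch below. It is only an illustration, not something we ship: it assumes `pgrep` is available on the box and that npcd can be started through an init script at `/etc/init.d/npcd`, so adjust the start command to whatever your install actually uses.

```python
import subprocess

# Assumption: npcd is started through an init script at this path.
# Change this to match how npcd is managed on your system.
START_COMMAND = ["/etc/init.d/npcd", "start"]


def is_running(process_name, run=subprocess.run):
    """Return True if pgrep finds a process with exactly this name."""
    result = run(
        ["pgrep", "-x", process_name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def ensure_running(process_name="npcd", start_command=START_COMMAND,
                   run=subprocess.run):
    """Issue the start command if the process is absent.

    Returns True when a start was attempted, False when the process
    was already running.
    """
    if is_running(process_name, run=run):
        return False
    run(start_command, check=False)
    return True


if __name__ == "__main__":
    try:
        if ensure_running():
            print("npcd was not running; start command issued")
    except OSError as error:
        # Raised when pgrep or the start command cannot be executed.
        print(f"could not check or start npcd: {error}")
```

Saved as, say, `check_npcd.py` (a made-up name), it could be run from root's crontab every few minutes. Keep in mind that it only restarts npcd; it does nothing about the load that caused npcd to stop in the first place.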
Re: High CPU load
Posted: Tue Oct 04, 2011 8:00 am
by cwscribner
There seems to be a noticeable spike in CPU load when I start npcd from a stopped state. Most recently when I started npcd, the CPU load went from ~0.9 to ~8.5, an obviously huge jump. When npcd is stopped, the CPU load is calm. Any ideas on that correlation?
Re: High CPU load
Posted: Tue Oct 04, 2011 9:59 am
by mguthrie
My guess is that when npcd is stopped, you've got a large amount of perfdata results that are waiting to be processed, so the spike is the npcd daemon trying to get caught up and clear the result queue after being restarted.
I should note that npcd will stop processing performance data if the CPU load hits 10.0. This setting can be modified in the /usr/local/nagios/etc/pnp/npcd.cfg file with the "load_threshold" setting.
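If you want to see how close the box is to that cutoff, here is a rough sketch that reads "load_threshold" from npcd.cfg and compares it to the current 1-minute load average. The parsing is a simplification of my own (a plain `key = value` match), and I am assuming npcd backs off only when the load is strictly above the threshold, so treat the result as an approximation.

```python
import os
import re

NPCD_CFG = "/usr/local/nagios/etc/pnp/npcd.cfg"
DEFAULT_THRESHOLD = 10.0  # default value mentioned in this thread


def read_load_threshold(cfg_text, default=DEFAULT_THRESHOLD):
    """Pull load_threshold out of npcd.cfg text, ignoring commented lines."""
    match = re.search(
        r"^\s*load_threshold\s*=\s*([0-9]+(?:\.[0-9]+)?)",
        cfg_text,
        re.MULTILINE,
    )
    return float(match.group(1)) if match else default


def load_exceeds_threshold(load, threshold):
    """Assumed comparison: npcd backs off when load is above the threshold."""
    return load > threshold


if __name__ == "__main__":
    try:
        with open(NPCD_CFG) as handle:
            threshold = read_load_threshold(handle.read())
    except OSError:
        # Config file missing or unreadable: fall back to the default.
        threshold = DEFAULT_THRESHOLD
    try:
        one_minute_load = os.getloadavg()[0]
    except OSError:
        print("load average is not available on this system")
    else:
        state = "above" if load_exceeds_threshold(one_minute_load, threshold) else "within"
        print(f"1-minute load {one_minute_load:.2f} is {state} load_threshold {threshold}")
```

Running it right after an npcd restart should show the catch-up spike described above climbing toward the threshold.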
Re: High CPU load
Posted: Tue Oct 04, 2011 10:11 am
by cwscribner
I increased the load_threshold and saw the behavior you described: the load spiked pretty high, but npcd remained running with the increased parameter. I think this could be deemed solved, although I'd like a more permanent fix instead of just allowing the CPU to max out.
Re: High CPU load
Posted: Tue Oct 04, 2011 10:36 am
by mguthrie
We are looking into other methods for maximizing performance, but so far we've documented all of the methods we've tested. We may send you some other tweaks to test out once we get them tested and documented.