
Performance grapher randomly stops

Posted: Wed Sep 28, 2011 3:22 pm
by cwscribner
Hi all.

I'm having minor trouble with the performance grapher (npcd?) randomly stopping. It doesn't happen very frequently, but I would think that it shouldn't happen at all, or that if it does, it should be restarted automatically. Can I get some help/input on this? Either how to auto-restart it or how to figure out why it's stopping?

Re: Performance grapher randomly stops

Posted: Thu Sep 29, 2011 5:34 pm
by lmiltchev
cwscribner,

We will have to do some testing and will get back to you as soon as we can.

Re: Performance grapher randomly stops

Posted: Fri Sep 30, 2011 9:00 am
by cwscribner
Okay. Is there any kind of output I can provide you with or some obvious settings that I can change? Maybe in npcd.cfg?

Re: Performance grapher randomly stops

Posted: Fri Sep 30, 2011 9:53 am
by nscott
cwscribner,

The biggest cause of this issue is that npcd automatically stops processing when the CPU load reaches 10.0 (a load average, not a percentage). This is usually indicated in nagios/var/npcd.log and nagios/var/perfdata.log. The value is called load_threshold and is set in npcd.cfg in the nagios/etc/pnp directory. Increasing the value may remedy your problem, but high load is an issue that tends to snowball.
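
To make the threshold check concrete, here is a minimal Python sketch that reads load_threshold from npcd.cfg-style text and compares it with a load value. The function names are hypothetical, and two things are assumptions rather than confirmed npcd behavior: that the setting is written as "load_threshold = <number>", and that a value of 0 means the check is disabled.

```python
import os
import re


def parse_load_threshold(cfg_text, default=0.0):
    """Return the load_threshold value found in npcd.cfg-style text.

    Assumes a "key = value" layout with "#" starting a comment.
    """
    for line in cfg_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        match = re.match(r"load_threshold\s*=\s*(\S+)$", line)
        if match:
            try:
                return float(match.group(1))
            except ValueError:
                return default  # unparsable value: fall back to the default
    return default


def exceeds_threshold(load, threshold):
    """True if the load is above the threshold.

    Treating 0 as "disabled" is an assumption made for this sketch.
    """
    return threshold > 0 and load > threshold


def current_load():
    """1-minute load average (os.getloadavg is only available on Unix)."""
    return os.getloadavg()[0]
```

On the Nagios server, something like exceeds_threshold(current_load(), parse_load_threshold(open("/usr/local/nagios/etc/pnp/npcd.cfg").read())) would show whether the box is currently above the configured limit.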

Re: Performance grapher randomly stops

Posted: Fri Sep 30, 2011 10:46 am
by cwscribner
So what would be a more permanent remedy to the problem?

Re: Performance grapher randomly stops

Posted: Tue Oct 04, 2011 10:13 am
by mguthrie
I should note that npcd will stop processing performance data if the CPU load hits 10.0. This setting can be modified in the /usr/local/nagios/etc/pnp/npcd.cfg file with the "load_threshold" setting.
That will prevent the performance data processing from stopping, but it won't resolve the underlying issue of high CPU load. After some discussions with users with large environments at the conference, here are some other suggestions for decreasing CPU load on your system.

- Spread checks out whenever possible. The default check_interval is 5 minutes; use a higher number whenever you can.
- SNMP monitoring is a huge CPU grab. Use passive checks when possible, and use binary check plugins instead of scripted ones whenever possible.

You've probably looked at this already, but just in case:
http://assets.nagios.com/downloads/nagi ... p#boosting

We're also looking into some other system tweaks that might improve performance on an XI install.

Re: Performance grapher randomly stops

Posted: Thu Oct 06, 2011 2:52 pm
by cwscribner
Is there already a Nagios-sanctioned script in existence for auto-restarting a service (npcd)? I've looked around online and tried various solutions, but none seem to work.

Re: Performance grapher randomly stops

Posted: Thu Oct 06, 2011 3:21 pm
by mguthrie
There isn't currently, but I think the main issue is that the npcd daemon keeps turning off, which it shouldn't be doing. Do you have any interesting info in your /usr/local/nagios/var/npcd.log file that might give any clues as to why it keeps stopping?

Re: Performance grapher randomly stops

Posted: Thu Oct 06, 2011 3:40 pm
by cwscribner
Here's the last 500 lines of the log file. Nothing really jumps out at me but I also don't know much about npcd.

Re: Performance grapher randomly stops

Posted: Fri Oct 07, 2011 9:06 am
by mguthrie
Yeah, I'm not seeing anything obvious in the log at the moment. If it randomly shuts off again, can you capture the last 500 lines again and send them our way?

In the meantime, you can set up a check against the npcd process (I think some form of check_procs might work) and define a new event handler that just restarts the service.
http://nagios.sourceforge.net/docs/3_0/ ... dlers.html
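
As a rough illustration of the event handler idea, here is a minimal Python sketch, not an official Nagios script. It assumes the handler command is defined to pass the $SERVICESTATE$, $SERVICESTATETYPE$ and $SERVICEATTEMPT$ macros as arguments, and that npcd can be restarted with "service npcd restart" by the user Nagios runs as (that path and the permissions are assumptions; sudo may be needed). The function names are hypothetical.

```python
import subprocess
import sys

# Assumption: adjust to however npcd is restarted on your system.
RESTART_CMD = ["/sbin/service", "npcd", "restart"]


def should_restart(state, state_type, attempt, soft_attempt=3):
    """Decide whether the handler should restart npcd.

    Restart once the check is in a HARD CRITICAL state, or early on the
    given SOFT attempt, so a single blip does not trigger a restart.
    """
    if state != "CRITICAL":
        return False
    if state_type == "HARD":
        return True
    return state_type == "SOFT" and int(attempt) >= soft_attempt


def main(argv):
    """Entry point: argv is [script, state, state_type, attempt]."""
    if len(argv) != 4:
        sys.stderr.write("usage: restart_npcd.py STATE STATETYPE ATTEMPT\n")
        return 2
    _, state, state_type, attempt = argv
    if should_restart(state, state_type, attempt):
        # Returns the exit status of the restart command.
        return subprocess.call(RESTART_CMD)
    return 0

# When saved as a standalone handler script, finish with:
#     sys.exit(main(sys.argv))
```

The handler does nothing on OK or WARNING results, so it is safe for Nagios to call it on every state change of the npcd process check.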