Hi all.
I'm having minor trouble with the performance grapher (npcd?) randomly stopping. It doesn't happen very frequently, but I would think that it shouldn't happen at all. Or if it does, it should be restarted automatically. Can I get some help/input on this? Either how to auto-restart it, or how to figure out why it's stopping?
Performance grapher randomly stops
-
cwscribner
- Posts: 316
- Joined: Thu Mar 31, 2011 9:54 am
- Location: Patten, ME
- Contact:
Performance grapher randomly stops
Last edited by cwscribner on Mon Oct 31, 2011 4:58 pm, edited 1 time in total.
Re: Performance grapher randomly stops
cwscribner,
We will have to do some testing and will get back to you as soon as we can.
-
cwscribner
Re: Performance grapher randomly stops
Okay. Is there any kind of output I can provide you with or some obvious settings that I can change? Maybe in npcd.cfg?
Re: Performance grapher randomly stops
cwscribner,
The biggest cause of this issue is that npcd automatically stops processing when the system load reaches 10.0. This is usually indicated in nagios/var/npcd.log and nagios/var/perfdata.log. The value is called load_threshold and lives in npcd.cfg in the nagios/etc/pnp directory. Increasing the value may remedy your problems, but high load is an issue that snowballs.
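To see where a given box stands relative to that setting, something like the following Python sketch could help. It is only an illustration, not part of npcd: it assumes npcd.cfg uses plain `key = value` lines with `#` comments, that a threshold of 0 means the check is disabled, and that the 1-minute load average is the figure being compared. The function names `read_load_threshold` and `would_pause` are hypothetical.

```python
import os


def read_load_threshold(cfg_text, default=0.0):
    """Return the load_threshold value found in npcd.cfg-style text.

    Assumes "key = value" lines and "#" comments (an assumption about the
    file format, not something confirmed in this thread).
    """
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "load_threshold":
            return float(value)
    return default


def would_pause(threshold, load_1min):
    """True if the load is at or above a non-zero threshold.

    Treating 0 as "disabled" is an assumption made for this sketch.
    """
    return threshold > 0 and load_1min >= threshold


def check_file(path):
    """Compare the current 1-minute load average with the configured value."""
    with open(path) as handle:
        threshold = read_load_threshold(handle.read())
    load_1min = os.getloadavg()[0]
    return threshold, load_1min, would_pause(threshold, load_1min)
```

Calling `check_file("/usr/local/nagios/etc/pnp/npcd.cfg")` from a Python prompt on the Nagios server would return the configured threshold, the current load, and whether the load is currently at or above the threshold.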
Nicholas Scott
Former Nagios employee
-
cwscribner
Re: Performance grapher randomly stops
So what would be a more permanent remedy to the problem?
Re: Performance grapher randomly stops
That will prevent the performance data processing from stopping, but it won't resolve the underlying issue of high CPU load. I should note that npcd will stop processing performance data if the CPU load hits 10.0. This setting can be modified in the /usr/local/nagios/etc/pnp/npcd.cfg file with the "load_threshold" setting. After some discussions with users with large environments at the conference, here are some other suggestions for decreasing CPU load on your system:
- Spread checks out whenever possible. The default check_interval is 5 minutes; use a higher number whenever you can.
- SNMP monitoring is a huge CPU grab. Use passive checks when possible, and use binary check plugins instead of scripted ones whenever possible.
You've probably looked at this already, but just in case:
http://assets.nagios.com/downloads/nagi ... p#boosting
We're also looking into some other system tweaks that might improve performance on an XI install.
-
cwscribner
Re: Performance grapher randomly stops
Is there already a Nagios-sanctioned script in existence for auto-restarting a service (npcd)? I've looked around online and tried various solutions, but none seem to work.
Re: Performance grapher randomly stops
There isn't currently, but I think the main issue is that the npcd daemon keeps turning off, which it shouldn't be doing. Do you have any interesting info in your /usr/local/nagios/var/npcd.log file that might give any clues as to why it keeps stopping?
-
cwscribner
Re: Performance grapher randomly stops
Here's the last 500 lines of the log file. Nothing really jumps out at me but I also don't know much about npcd.
Re: Performance grapher randomly stops
Yeah, I'm not seeing anything obvious in the log at the moment. If it randomly shuts off again, can you capture the last 500 lines again and send them our way?
In the meantime, you can set up a check against the npcd process (I think some form of check_procs might work) and define a new event handler that just restarts the service.
http://nagios.sourceforge.net/docs/3_0/ ... dlers.html
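As a rough idea of what such an event handler might look like, here is a minimal Python sketch. It is not an official Nagios script. It assumes the handler is called with the service state, state type, and attempt number as its three arguments, and that npcd can be restarted with `service npcd restart` (the restart command and the function name `should_restart` are assumptions for illustration). The restart rule mirrors the pattern in the event handler documentation: act on the third SOFT CRITICAL attempt, or on a HARD CRITICAL state.

```python
import subprocess
import sys

# Assumption: npcd is managed as an init service called "npcd".
# Adjust this to whatever actually restarts npcd on your system.
RESTART_COMMAND = ["service", "npcd", "restart"]


def should_restart(state, state_type, attempt):
    """Decide whether this state change should trigger a restart."""
    if state != "CRITICAL":
        return False  # OK, WARNING and UNKNOWN are left alone
    if state_type == "HARD":
        return True  # retries are exhausted, so restart now
    # In a SOFT state, wait for the third check attempt before acting.
    return state_type == "SOFT" and attempt == 3


def main(argv):
    """Run the restart command when the state calls for it."""
    state, state_type, attempt = argv[1], argv[2], int(argv[3])
    if should_restart(state, state_type, attempt):
        return subprocess.call(RESTART_COMMAND)
    return 0


# Only act when invoked with the three expected arguments.
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(main(sys.argv))
```

The command definition that calls the script would pass `$SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$` as the arguments. The user Nagios runs as would most likely need permission (for example through sudo) to restart the service, so the restart command may need adjusting for that.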