Performance grapher randomly stops

cwscribner
Posts: 316
Joined: Thu Mar 31, 2011 9:54 am
Location: Patten, ME

Performance grapher randomly stops

Post by cwscribner »

Hi all.

I'm having minor trouble with the performance grapher (npcd?) randomly stopping. It doesn't happen very often, but I would think it shouldn't happen at all, or that if it does, the service should be restarted automatically. Can I get some help/input on this? Either how to auto-restart it, or how to figure out why it's stopping?
Last edited by cwscribner on Mon Oct 31, 2011 4:58 pm, edited 1 time in total.
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Performance grapher randomly stops

Post by lmiltchev »

cwscribner,

We will have to do some testing and will get back to you as soon as we can.
cwscribner
Posts: 316
Joined: Thu Mar 31, 2011 9:54 am
Location: Patten, ME

Re: Performance grapher randomly stops

Post by cwscribner »

Okay. Is there any kind of output I can provide you with or some obvious settings that I can change? Maybe in npcd.cfg?
nscott
Posts: 1040
Joined: Wed May 11, 2011 8:54 am

Re: Performance grapher randomly stops

Post by nscott »

cwscribner,

The most common cause of this issue is that npcd automatically shuts off when the system load reaches 10.0 (a load value, not a percentage). This is usually indicated in nagios/var/npcd.log and nagios/var/perfdata.log. The setting is called load_threshold and is found in npcd.cfg in the nagios/etc/pnp directory. Increasing the value may remedy your problem, but high load is an issue that snowballs.
Nicholas Scott
Former Nagios employee
cwscribner
Posts: 316
Joined: Thu Mar 31, 2011 9:54 am
Location: Patten, ME

Re: Performance grapher randomly stops

Post by cwscribner »

So what would be a more permanent remedy to the problem?
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Performance grapher randomly stops

Post by mguthrie »

I should note that npcd will stop processing performance data if the CPU load hits 10.0. This threshold can be changed with the "load_threshold" setting in the /usr/local/nagios/etc/pnp/npcd.cfg file.
That will prevent the performance data processing from stopping, but it won't resolve the underlying issue of high CPU load. After some discussions with users with large environments at the conference, here are some other suggestions for decreasing CPU load on your system.

- Spread checks out whenever possible. The default check_interval is 5 minutes; use a higher number whenever you can.
- SNMP monitoring is a huge CPU grab. Use passive checks when possible, and use binary check plugins instead of scripted ones whenever possible.

You've probably looked at this already, but just in case:
http://assets.nagios.com/downloads/nagi ... p#boosting

We're also looking into some other system tweaks that might improve performance on an XI install.
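
To make the load_threshold comparison described above concrete, here is a minimal Python sketch. It is not npcd's actual code. The helper names are hypothetical, and it assumes that the 1-minute load average is the value being compared and that a threshold of 0.0 means the check is disabled.

```python
import os
import re


def read_load_threshold(cfg_text, default=0.0):
    """Return the load_threshold value found in npcd.cfg-style text.

    Hypothetical helper: it only understands simple "key = value" lines
    and ignores anything after a "#" comment marker.
    """
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        match = re.match(r"load_threshold\s*=\s*([0-9]+(?:\.[0-9]+)?)$", line)
        if match:
            return float(match.group(1))
    return default


def over_threshold(load_1min, threshold):
    """True if the load is above the threshold.

    Assumption: a threshold of 0.0 disables the check entirely.
    """
    return threshold > 0 and load_1min > threshold


if __name__ == "__main__" and hasattr(os, "getloadavg"):
    # Sample text standing in for /usr/local/nagios/etc/pnp/npcd.cfg
    sample_cfg = "log_level = 0\nload_threshold = 10.0\n"
    threshold = read_load_threshold(sample_cfg)
    load_1min = os.getloadavg()[0]
    print("load", load_1min, "threshold", threshold,
          "over" if over_threshold(load_1min, threshold) else "ok")
```

Running something like this from cron and logging the result is one way to see whether the stops line up with load spikes before raising the threshold.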
cwscribner
Posts: 316
Joined: Thu Mar 31, 2011 9:54 am
Location: Patten, ME

Re: Performance grapher randomly stops

Post by cwscribner »

Is there already a Nagios-sanctioned script for auto-restarting a service (npcd)? I've looked around online and tried various solutions, but none seem to work.
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Performance grapher randomly stops

Post by mguthrie »

There isn't currently, but I think the main issue is that the npcd daemon keeps turning off, which it shouldn't be doing. Do you have any interesting info in your /usr/local/nagios/var/npcd.log file that might give any clues as to why it keeps stopping?
cwscribner
Posts: 316
Joined: Thu Mar 31, 2011 9:54 am
Location: Patten, ME

Re: Performance grapher randomly stops

Post by cwscribner »

Here's the last 500 lines of the log file. Nothing really jumps out at me but I also don't know much about npcd.
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Performance grapher randomly stops

Post by mguthrie »

Yeah, I'm not seeing anything obvious in the log at the moment. If it randomly shuts off again, can you capture the last 500 lines again and send them our way?

In the meantime, you can set up a check against the npcd process (I think some form of check_procs might work) and define a new event handler that just restarts the service.
http://nagios.sourceforge.net/docs/3_0/ ... dlers.html
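
As a rough illustration of the event handler idea above, here is a minimal Python sketch. It is not an official Nagios script. It assumes the handler command passes $SERVICESTATE$, $SERVICESTATETYPE$ and $SERVICEATTEMPT$ as its three arguments, that npcd can be restarted with "service npcd restart", and that the nagios user is allowed to run that command. The function names and the "third soft attempt" rule are illustrative choices.

```python
import subprocess
import sys

# Assumption: npcd is managed as a system service named "npcd".
RESTART_CMD = ["service", "npcd", "restart"]


def should_restart(state, state_type, attempt):
    """Decide whether to restart based on the Nagios state macros.

    Restart on a HARD CRITICAL state, or once a SOFT CRITICAL state
    has reached its third check attempt.
    """
    if state != "CRITICAL":
        return False
    if state_type == "HARD":
        return True
    return state_type == "SOFT" and int(attempt) >= 3


def handle(state, state_type, attempt, runner=subprocess.call):
    """Run the restart command if needed; return its exit code or None."""
    if should_restart(state, state_type, attempt):
        return runner(RESTART_CMD)
    return None


# Expected invocation: handler.py <state> <state_type> <attempt>
if __name__ == "__main__" and len(sys.argv) == 4:
    handle(*sys.argv[1:4])
```

The script would be attached to the service that watches the npcd process through that service's event_handler directive. The runner parameter exists only so the decision logic can be tested without actually restarting anything.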