Re: Blank space in graphs

Posted: Fri Feb 03, 2012 11:24 am
by cwscribner
Got it. So basically I should leave everything to the default...

Any thoughts on whether or not this would be considered solved?

Re: Blank space in graphs

Posted: Fri Feb 03, 2012 11:41 am
by scottwilkerson
I would suggest reducing the sleep time to 10.

As far as being resolved, I think it will only really be resolved once you get your new hardware ;)

Re: Blank space in graphs

Posted: Thu Feb 23, 2012 9:44 am
by cwscribner
Just wanted to follow up on this.

We added a second 8-core CPU for a total of 16 cores. The loads hover at about half of what they did previously; around 7-10 now. Unfortunately, I'm still seeing a lot of timeout errors in npcd.log and many graphs with blank spots.

The most recent 500 lines look like this...

Code:

[02-23-2012 09:40:49] NPCD: ERROR: Executed command exits with return code '7'
[02-23-2012 09:40:49] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1330008018'
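
To put a number on how often this happens, the errors in the log can be counted rather than eyeballed. The sketch below is only an illustration: the function name count_npcd_rc is made up for this example, and the log path in the usage comment is an assumption (adjust it to wherever your npcd.log actually lives). It assumes each error is written on its own line, as in the excerpt above.

```shell
# Count NPCD "return code" errors in a log file.
# Usage: count_npcd_rc LOGFILE [RETURN_CODE]   (RETURN_CODE defaults to 7)
# count_npcd_rc is a hypothetical helper name, not part of Nagios or PNP.
count_npcd_rc() {
    logfile=$1
    rc=${2:-7}
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps the function's status clean.
    grep -c "NPCD: ERROR: Executed command exits with return code '$rc'" "$logfile" || true
}

# Example (log path is an assumption):
#   count_npcd_rc /usr/local/nagios/var/npcd.log
```

Running it before and after a configuration change gives a rough way to tell whether the change helped.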

Re: Blank space in graphs

Posted: Thu Feb 23, 2012 10:16 am
by scottwilkerson
Return code 7 is a timeout. What is TIMEOUT set to in
/usr/local/nagios/etc/pnp/process_perfdata.cfg?
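
If it helps, here is one way to pull just that value out of the file. This is a rough sketch that assumes the setting is written as an uncommented "TIMEOUT = N" line; get_pnp_timeout is a made-up helper name for this example.

```shell
# Print the numeric TIMEOUT value from a process_perfdata.cfg-style file.
# Usage: get_pnp_timeout CONFIG_FILE
# get_pnp_timeout is a hypothetical helper name.
get_pnp_timeout() {
    # Only lines that start with TIMEOUT (optionally indented) are matched,
    # so commented-out lines beginning with '#' are ignored.
    sed -n 's/^[[:space:]]*TIMEOUT[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$1"
}

# Example:
#   get_pnp_timeout /usr/local/nagios/etc/pnp/process_perfdata.cfg
```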

Re: Blank space in graphs

Posted: Thu Feb 23, 2012 11:38 am
by cwscribner
5

Re: Blank space in graphs

Posted: Thu Feb 23, 2012 11:54 am
by scottwilkerson
That should be long enough...

How many files do we have in the folder?

Code:

ls /usr/local/nagios/var/spool/perfdata | wc -l

Re: Blank space in graphs

Posted: Thu Feb 23, 2012 12:08 pm
by cwscribner
497089

Re: Blank space in graphs

Posted: Thu Feb 23, 2012 5:08 pm
by mguthrie
Hmm, those files are supposed to be reaped and cleaned up every few seconds. From here you have two options. You can try increasing the timeout quite a bit, which gives each thread more time to scan through the directory, and see whether it slowly gets the file count back down to normal. Or you can just clear the directory and watch to verify that it's cleaning up the files on a regular basis. There shouldn't be more than a handful of files in there at a time, and they should be getting deleted every 15 seconds after they've been processed.

[EDIT]
I just checked the PNP changelog, and it looks like the latest version raised the default process_perfdata timeout from 5 seconds to 15. I'd suggest trying that on your system since it's a larger install.
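
A note on clearing the directory: with roughly half a million files in it, a plain "rm /path/*" will most likely fail with "Argument list too long", because the shell expands the wildcard into one enormous command line. Letting find do the deleting avoids that. The sketch below is one possible way to do it, not an official procedure; clear_spool is a made-up name, and -maxdepth and -delete are GNU/BSD find options rather than strict POSIX. Whether to stop npcd first is a judgment call that isn't covered here.

```shell
# Delete every regular file directly inside a spool directory.
# Usage: clear_spool SPOOL_DIR
# clear_spool is a hypothetical helper name. Subdirectories are left alone.
clear_spool() {
    spool_dir=$1
    # Refuse to run against something that is not a directory.
    if [ ! -d "$spool_dir" ]; then
        echo "not a directory: $spool_dir" >&2
        return 1
    fi
    # find removes the files itself, so no long argument list is ever built.
    find "$spool_dir" -maxdepth 1 -type f -delete
}

# Example (this discards any unprocessed perfdata in the spool):
#   clear_spool /usr/local/nagios/var/spool/perfdata
```

Keep in mind that whatever is deleted this way never makes it into the graphs, so the existing gaps for that period will remain.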

Re: Blank space in graphs

Posted: Thu Feb 23, 2012 6:44 pm
by cwscribner
I deleted all of the files, and now when I check the file count it fluctuates between 0 and 4 files, so it is indeed reaping files. I also increased the timeout to 15. I'll update in a few days after it's had a chance to grab data.
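
For anyone who wants to do the same check without re-running the count by hand, something along these lines would sample the file count at a fixed interval. It is only a sketch: watch_spool is a made-up name, and the defaults (5 samples, 15 seconds apart) are arbitrary choices for this example.

```shell
# Print a timestamped file count for a directory several times in a row.
# Usage: watch_spool DIR [SAMPLES] [INTERVAL_SECONDS]
# watch_spool is a hypothetical helper name; the defaults are arbitrary.
watch_spool() {
    dir=$1
    samples=${2:-5}
    interval=${3:-15}
    i=0
    while [ "$i" -lt "$samples" ]; do
        # Count regular files directly inside the directory.
        count=$(find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' ')
        printf '%s %s\n' "$(date '+%H:%M:%S')" "$count"
        i=$((i + 1))
        # Sleep between samples, but not after the last one.
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}

# Example:
#   watch_spool /usr/local/nagios/var/spool/perfdata 10 15
```

A count that stays in the single digits means the files are being reaped; a count that climbs steadily means they are piling up again.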

Re: Blank space in graphs

Posted: Fri Feb 24, 2012 10:13 am
by mguthrie
Thanks for the update. Let us know how it goes. We've had this happen on a few larger installs, so it'd be good to know how we can prevent this for users in the future.