Error in perfdata.log

Posted: Sun Jun 08, 2014 8:20 pm
by rajasegar
Nagios XI 2012R2.9
RHEL 6.5 x64
Manual Install
Firefox 23

Please advise on how to fix this warning.
[06-08-2014 00:12:21] NPCD: WARN: MAX load reached: load 25.390000/10.000000 at i=1
[06-08-2014 00:12:36] NPCD: WARN: MAX load reached: load 19.840000/10.000000 at i=1
[06-08-2014 00:12:51] NPCD: WARN: MAX load reached: load 15.520000/10.000000 at i=1
[06-08-2014 00:13:06] NPCD: WARN: MAX load reached: load 12.310000/10.000000 at i=1
npcd.zip

Re: Error in perfdata.log

Posted: Mon Jun 09, 2014 9:23 am
by tmcdonald
That error simply means that your load has gotten too high and as a result npcd has stopped processing data.

How many cores do you have? A good rule of thumb is to take the number of cores you have (really the number of threads) and multiply that by 10, then enter that value in:

Code:

/usr/local/nagios/etc/pnp/npcd.cfg
for the "load_threshold" entry. Then restart npcd and it should stop displaying that error. However, it would be preferable to cut down the load in the first place if possible.
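
As a rough illustration of the rule of thumb above, here is a minimal Python sketch. The function name `suggested_load_threshold` and the `per_thread` parameter are made up for this example and are not part of npcd or Nagios XI; `os.cpu_count()` is only assumed to be a reasonable stand-in for the number of threads on the box.

```python
import os


def suggested_load_threshold(threads, per_thread=10.0):
    """Apply the rule of thumb: number of threads multiplied by 10."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return threads * per_thread


if __name__ == "__main__":
    # os.cpu_count() reports logical CPUs; it can return None, so fall back to 1.
    threads = os.cpu_count() or 1
    # Prints a line in the same form as the npcd.cfg entry.
    print(f"load_threshold = {suggested_load_threshold(threads):.1f}")
```

For a machine with 8 threads this gives 80.0, which is the value you would put in the "load_threshold" entry.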

Re: Error in perfdata.log

Posted: Mon Jun 09, 2014 6:46 pm
by rajasegar
tmcdonald wrote:That error simply means that your load has gotten too high and as a result npcd has stopped processing data.

How many cores do you have? A good rule of thumb is to take the number of cores you have (really the number of threads) and multiply that by 10, then enter that value in:

Code:

/usr/local/nagios/etc/pnp/npcd.cfg
for the "load_threshold" entry. Then restart npcd and it should stop displaying that error. However, it would be preferable to cut down the load in the first place if possible.
I have an 8-core VM with a single thread per core. CPU utilization is almost always below 80%. It occasionally hovers momentarily around 50-60%.
Anyway, I changed load_threshold to 80.0 as recommended.

Re: Error in perfdata.log

Posted: Mon Jun 09, 2014 10:57 pm
by Box293
There are a few things that can cause the load to get high.

One of these is when all the service checks run on the same interval (5 minutes for example). Every five minutes the Nagios XI host gets pretty busy.

If you haven't already done so, I suggest looking at the different service checks you have and justifying the check intervals. For example, disk space might only need to be checked every 60 minutes. Also, instead of checking every 60 minutes, try 58 or 62 minutes. This just spreads the load out more.
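
To show why slightly different intervals help, here is a small Python sketch. It is a toy model, not how Nagios actually schedules checks: it assumes every check first runs at minute 0 and then repeats exactly on its interval, and the function name `peak_checks_per_minute` and the check counts are invented for the example.

```python
from collections import Counter


def peak_checks_per_minute(intervals, horizon_minutes):
    """Return the largest number of checks due in any single minute after the start.

    Simplifying assumption: each check first runs at minute 0 and then
    repeats exactly every `interval` minutes.
    """
    due = Counter()
    for interval in intervals:
        for minute in range(interval, horizon_minutes + 1, interval):
            due[minute] += 1
    return max(due.values(), default=0)


# 30 hypothetical checks, all on a 60 minute interval.
same = [60] * 30
# The same 30 checks split across 58, 60 and 62 minute intervals.
staggered = [58] * 10 + [60] * 10 + [62] * 10

one_day = 24 * 60
print(peak_checks_per_minute(same, one_day))       # 30
print(peak_checks_per_minute(staggered, one_day))  # 10
```

In this model the three staggered groups never land on the same minute within a day, so the busiest minute has 10 checks instead of 30.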

Re: Error in perfdata.log

Posted: Tue Jun 10, 2014 12:56 am
by rajasegar
Box293 wrote:There are a few things that can cause the load to get high.

One of these is when all the service checks run on the same interval (5 minutes for example). Every five minutes the Nagios XI host gets pretty busy.

If you haven't already done so, I suggest looking at the different service checks you have and justifying the check intervals. For example, disk space might only need to be checked every 60 minutes. Also, instead of checking every 60 minutes, try 58 or 62 minutes. This just spreads the load out more.
I wish I could do that. Almost all of mine are on 5-minute intervals, and even that, they say, is too long. :x
For service checks that use Java, especially the MQ ones, this was a big problem.
I solved it by using service dependencies, which makes those service checks run sequentially. CPU usage on the client dropped from 90% to around 15%.

Re: Error in perfdata.log

Posted: Tue Jun 10, 2014 6:33 am
by rajasegar
After making the changes, the load dropped considerably.
In fact, it is about half of what it was before.
CPU usage does not seem to have changed much.
10-06-2014 07-30-30 PM.png
How do I find out where these errors are coming from?

[06-10-2014 09:21:59] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1402363297.perfdata.service'
[06-10-2014 10:38:29] NPCD: ERROR: Executed command exits with return code '7'
[06-10-2014 10:38:29] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1402367882.perfdata.host'
[06-10-2014 10:38:29] NPCD: ERROR: Executed command exits with return code '7'
[06-10-2014 10:38:29] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1402367882.perfdata.service'
[06-10-2014 11:13:27] NPCD: ERROR: Executed command exits with return code '7'
[06-10-2014 11:13:27] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1402369982.perfdata.service'
[06-10-2014 11:13:27] NPCD: ERROR: Executed command exits with return code '7'
[06-10-2014 11:13:27] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1402369997.perfdata.service'
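
One way to get an overview of lines like these is to tally them by return code and by spool file type. The Python sketch below is only an illustration: the function name `summarize_npcd_errors` is made up, and it assumes each "Executed command exits" line is followed by its matching "Command line was" line, as in most of the excerpt above.

```python
import re
from collections import Counter

RETURN_CODE = re.compile(r"NPCD: ERROR: Executed command exits with return code '(\d+)'")
COMMAND = re.compile(r"NPCD: ERROR: Command line was '(.+)'")


def summarize_npcd_errors(lines):
    """Count (return code, spool file type) pairs found in NPCD error lines."""
    counts = Counter()
    pending_code = None
    for line in lines:
        match = RETURN_CODE.search(line)
        if match:
            pending_code = int(match.group(1))
            continue
        match = COMMAND.search(line)
        if match and pending_code is not None:
            # The spool file is the last argument; its suffix is "host" or "service".
            spool_file = match.group(1).split()[-1]
            kind = spool_file.rsplit(".", 1)[-1]
            counts[(pending_code, kind)] += 1
            pending_code = None
    return counts


sample = [
    "[06-10-2014 10:38:29] NPCD: ERROR: Executed command exits with return code '7'",
    "[06-10-2014 10:38:29] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1402367882.perfdata.host'",
    "[06-10-2014 10:38:29] NPCD: ERROR: Executed command exits with return code '7'",
    "[06-10-2014 10:38:29] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1402367882.perfdata.service'",
]

for (code, kind), count in sorted(summarize_npcd_errors(sample).items()):
    print(f"return code {code}, {kind} perfdata: {count}")
```

The same function could be fed the lines of the real perfdata.log. It only counts what is there; it does not say what a given return code means.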

Re: Error in perfdata.log

Posted: Tue Jun 10, 2014 10:49 am
by abrist
Enable perfdata debug logging as specified in the FAQ:
http://support.nagios.com/wiki/index.ph ... leshooting
And then wait 5 minutes and post a tail of perfdata.log:

Code:

tail -50 /usr/local/nagios/var/perfdata.log

Re: Error in perfdata.log

Posted: Tue Jun 10, 2014 6:44 pm
by rajasegar
abrist wrote:Enable perfdata debug logging as specified in the FAQ:
http://support.nagios.com/wiki/index.ph ... leshooting
And then wait 5 minutes and post a tail of perfdata.log:

Code:

tail -50 /usr/local/nagios/var/perfdata.log
Enabled the logging. So far I have not seen any errors. Will post if the error comes up again.

Re: Error in perfdata.log

Posted: Wed Jun 11, 2014 10:47 am
by abrist
Great, let us know if it recurs.

Re: Error in perfdata.log

Posted: Wed Jun 11, 2014 6:43 pm
by rajasegar
abrist wrote:Great, let us know if it recurs.
Found out the error is due to a timeout.
I increased the timeout to 25 and have not seen the errors since.