Error in perfdata.log

Post by **rajasegar** » Sun Jun 08, 2014 8:20 pm

Nagios XI 2012R2.9
RHEL 6.5 x64
Manual Install
Firefox 23

Please advise on how to fix this warning.
[06-08-2014 00:12:21] NPCD: WARN: MAX load reached: load 25.390000/10.000000 at i=1
[06-08-2014 00:12:36] NPCD: WARN: MAX load reached: load 19.840000/10.000000 at i=1
[06-08-2014 00:12:51] NPCD: WARN: MAX load reached: load 15.520000/10.000000 at i=1
[06-08-2014 00:13:06] NPCD: WARN: MAX load reached: load 12.310000/10.000000 at i=1

npcd.zip

Post by **tmcdonald** » Mon Jun 09, 2014 9:23 am

That error simply means that your load has gotten too high and as a result npcd has stopped processing data.

How many cores do you have? A good rule of thumb is to take the number of cores you have (really the number of threads) and multiply that by 10, then enter that value in:

/usr/local/nagios/etc/pnp/npcd.cfg

for the "load_threshold" entry. Then restart npcd and it should stop displaying that error. However, it would be preferable to cut down the load in the first place if possible.
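
As a rough sketch of that rule of thumb (thread count times 10), the snippet below prints a candidate `load_threshold` line. The function name `suggested_load_threshold` is made up for illustration, and `getconf _NPROCESSORS_ONLN` is assumed to report the number of online threads on this host.

```shell
# Rule of thumb from the post above: number of threads x 10.
# suggested_load_threshold is a hypothetical helper, not part of PNP or Nagios XI.
suggested_load_threshold() {
    # $1 = number of hardware threads
    echo $(( $1 * 10 ))
}

# Assumption: getconf reports the online thread count on this system.
threads=$(getconf _NPROCESSORS_ONLN)

# Print the line to place in /usr/local/nagios/etc/pnp/npcd.cfg.
echo "load_threshold = $(suggested_load_threshold "$threads").0"
```

On an 8-thread host this prints `load_threshold = 80.0`. npcd still needs a restart afterwards for the new value to take effect.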

Post by **rajasegar** » Mon Jun 09, 2014 6:46 pm

I have 8 cores, single-threaded, on a VM. CPU utilization is almost always below 80%. It occasionally hovers momentarily around 50 - 60%.
Anyway, I changed load_threshold to 80.0 as recommended.

Post by **Box293** » Mon Jun 09, 2014 10:57 pm

There are a few things that can cause the load to get high.

One of these is when all the service checks run on the same interval (5 minutes for example). Every five minutes the Nagios XI host gets pretty busy.

If you haven't already done so, I suggest looking at the different service checks you have and justifying the check intervals. For example, disk space might only need to be checked every 60 minutes. Also, instead of checking every 60 minutes, try 58 or 62 minutes. This just spreads the load out more.
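
To see why an odd interval such as 58 or 62 minutes helps, consider how often such a check lands on the same minute as the 5-minute checks. This is a simplified sketch that assumes both checks start at the same moment and run exactly on schedule, which real Nagios scheduling does not guarantee. `gcd` and `lcm` are helper names chosen here.

```shell
# Greatest common divisor of two positive integers (Euclid's algorithm).
gcd() {
    a=$1
    b=$2
    while [ "$b" -ne 0 ]; do
        t=$(( a % b ))
        a=$b
        b=$t
    done
    echo "$a"
}

# Least common multiple: how many minutes pass before two intervals line up again.
lcm() {
    echo $(( $1 * $2 / $(gcd "$1" "$2") ))
}

# Compare a 5-minute check against 60-, 58- and 62-minute checks.
for interval in 60 58 62; do
    echo "5 and $interval minutes coincide every $(lcm 5 "$interval") minutes"
done
```

Under that assumption, a 60-minute check always runs alongside the 5-minute batch, while a 58- or 62-minute check only does so every 290 or 310 minutes.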

Post by **rajasegar** » Tue Jun 10, 2014 12:56 am

I wish I could do that. Almost all of my checks run at 5-minute intervals, and I am told even that is too long.

For service checks that use Java, especially for MQ, this was a big problem.
I solved it by using service dependencies, which makes the service checks run sequentially. CPU usage on the client dropped from 90% to around 15%.

Post by **rajasegar** » Tue Jun 10, 2014 6:33 am

After making the changes, the load reduced considerably.
In fact, it is about half of what it was before.
CPU usage does not seem to have changed much.

10-06-2014 07-30-30 PM.png

How do I find out where these errors are coming from?

[06-10-2014 09:21:59] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1402363297.perfdata.service'
[06-10-2014 10:38:29] NPCD: ERROR: Executed command exits with return code '7'
[06-10-2014 10:38:29] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1402367882.perfdata.host'
[06-10-2014 10:38:29] NPCD: ERROR: Executed command exits with return code '7'
[06-10-2014 10:38:29] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1402367882.perfdata.service'
[06-10-2014 11:13:27] NPCD: ERROR: Executed command exits with return code '7'
[06-10-2014 11:13:27] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1402369982.perfdata.service'
[06-10-2014 11:13:27] NPCD: ERROR: Executed command exits with return code '7'
[06-10-2014 11:13:27] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1402369997.perfdata.service'

Post by **abrist** » Tue Jun 10, 2014 10:49 am

Enable perfdata debug logging as specified in the FAQ:
http://support.nagios.com/wiki/index.ph ... leshooting
And then wait 5 minutes and post a tail of perfdata.log:

tail -50 /usr/local/nagios/var/perfdata.log
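
Alongside the raw tail, it can help to count how often each return code appears. The sketch below assumes the log lines use the same `return code 'N'` wording as the NPCD errors quoted earlier in this thread. `count_return_codes` is a hypothetical helper, and the log path is passed in as an argument.

```shell
# Summarise "return code 'N'" entries in a log file, one line per code.
# $1 = path to the log file to scan
count_return_codes() {
    # Extract the numeric code from each matching line, then count duplicates.
    sed -n "s/.*return code '\([0-9][0-9]*\)'.*/\1/p" "$1" \
        | sort -n \
        | uniq -c \
        | awk '{ print "code " $2 ": " $1 }'
}

# Usage (path taken from the tail command above):
#   count_return_codes /usr/local/nagios/var/perfdata.log
```

If a single code dominates, as code 7 does in the excerpt posted above, that narrows down what to look for in the debug output.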

Post by **rajasegar** » Tue Jun 10, 2014 6:44 pm

Enabled the logging. So far I have not seen any errors. Will post if the error comes up.

Post by **abrist** » Wed Jun 11, 2014 10:47 am

Great, let us know if it recurs.

Post by **rajasegar** » Wed Jun 11, 2014 6:43 pm

Found out the error is due to a timeout.
Increased the timeout to 25 and did not see the errors anymore.
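
The post does not say which timeout setting was raised. One way to look for candidates, assuming the relevant setting lives in a `.cfg` file under the PNP configuration directory mentioned earlier in the thread (`/usr/local/nagios/etc/pnp`), is to list every non-comment line that mentions a timeout. `list_timeout_settings` is a hypothetical helper name.

```shell
# List non-comment lines mentioning "timeout" in the .cfg files of a directory.
# $1 = configuration directory to search
list_timeout_settings() {
    # /dev/null forces grep to print file names even if only one .cfg exists.
    grep -i -n 'timeout' "$1"/*.cfg /dev/null \
        | grep -v '^[^:]*:[0-9]*:[[:space:]]*#'
}

# Usage (directory taken from the npcd.cfg path earlier in the thread):
#   list_timeout_settings /usr/local/nagios/etc/pnp
```

Each output line has the form `file:line:setting`, which shows where a value such as 25 would be entered.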
