Perfdata graphs empty

Posted: Tue Oct 13, 2015 3:12 am
by vhoover
This is an important production issue. The performance data graphs are empty even though performance data is being collected and populated. I have tried the suggestions in every forum post I could find about this kind of issue and have had no luck. Please help.
[root@nagiosxi ~]# tail -25 /usr/local/nagios/var/perfdata.log
2015-09-11 21:52:03 [1789] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-09-11 21:52:03 [1788] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata/service-perfdata.1442029862-PID-1788 deleted
2015-09-11 21:52:03 [1789] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata/service-perfdata.1442029860-PID-1789 deleted
2015-09-11 21:52:03 [1788] [0] *** process_perfdata.pl terminated on signal ALRM
2015-09-11 21:52:03 [1789] [0] *** process_perfdata.pl terminated on signal ALRM
2015-09-11 21:52:03 [1785] [0] *** TIMEOUT: Timeout after 40 Sec. ****
2015-09-11 21:52:03 [1785] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-09-11 21:52:03 [1785] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-09-11 21:52:03 [1785] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata/host-perfdata.1442029860-PID-1785 deleted
2015-09-11 21:52:03 [1785] [0] *** process_perfdata.pl terminated on signal ALRM
2015-09-11 21:52:03 [1786] [0] *** TIMEOUT: Timeout after 40 Sec. ****
2015-09-11 21:52:03 [1786] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-09-11 21:52:03 [1786] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-09-11 21:52:03 [1786] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata/host-perfdata.1442029862-PID-1786 deleted
2015-09-11 21:52:03 [1786] [0] *** process_perfdata.pl terminated on signal ALRM
2015-09-11 21:55:18 [16841] [0] *** TIMEOUT: Timeout after 40 Sec. ****
2015-09-11 21:55:18 [16841] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-09-11 21:55:18 [16841] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-09-11 21:55:18 [16841] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata/service-perfdata.1442030059-PID-16841 deleted
2015-09-11 21:55:18 [16841] [0] *** process_perfdata.pl terminated on signal ALRM
2015-09-11 21:58:05 [21682] [0] *** TIMEOUT: Timeout after 40 Sec. ****
2015-09-11 21:58:05 [21682] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-09-11 21:58:05 [21682] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-09-11 21:58:05 [21682] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata/service-perfdata.1442030123-PID-21682 deleted
2015-09-11 21:58:05 [21682] [0] *** process_perfdata.pl terminated on signal ALRM
[root@nagiosxi ~]# tail -25 /usr/local/nagios/var/npcd.log
[09-11-2015 21:52:03] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata/host-perfdata.1442029862'
[09-11-2015 21:55:18] NPCD: ERROR: Executed command exits with return code '1'
[09-11-2015 21:55:18] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata/service-perfdata.1442030059'
[09-11-2015 21:58:05] NPCD: ERROR: Executed command exits with return code '1'
[09-11-2015 21:58:05] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata/service-perfdata.1442030123'
[09-11-2015 21:58:20] NPCD: WARN: MAX load reached: load 33.140000/20.000000 at i=0
[09-11-2015 21:58:35] NPCD: WARN: MAX load reached: load 27.330000/20.000000 at i=1
[09-11-2015 21:59:38] NPCD: WARN: MAX load reached: load 38.010000/20.000000 at i=1
[09-11-2015 21:59:53] NPCD: WARN: MAX load reached: load 30.580000/20.000000 at i=1
[09-11-2015 22:00:08] NPCD: WARN: MAX load reached: load 27.250000/20.000000 at i=1
[09-25-2015 03:14:32] NPCD: Caught Termination Signal - Hasta la vista... baby
[09-25-2015 04:05:42] NPCD: npcd Daemon (0.4.14) started with PID=23254
[09-25-2015 04:05:42] NPCD: Please have a look at 'npcd -V' to get license information
[09-25-2015 04:05:42] NPCD: HINT: load_threshold is enabled - ('20.000000')
[09-29-2015 04:56:47] NPCD: Caught Termination Signal - Hasta la vista... baby
[09-29-2015 04:56:47] NPCD: npcd Daemon (0.4.14) started with PID=31396
[09-29-2015 04:56:47] NPCD: Please have a look at 'npcd -V' to get license information
[09-29-2015 04:56:47] NPCD: HINT: load_threshold is enabled - ('40.000000')
[10-01-2015 00:02:00] NPCD: npcd Daemon (0.4.14) started with PID=1667
[10-01-2015 00:02:00] NPCD: Please have a look at 'npcd -V' to get license information
[10-01-2015 00:02:00] NPCD: HINT: load_threshold is enabled - ('40.000000')
[10-11-2015 22:04:47] NPCD: Caught Termination Signal - Hasta la vista... baby
[10-11-2015 22:56:05] NPCD: npcd Daemon (0.4.14) started with PID=12628
[10-11-2015 22:56:05] NPCD: Please have a look at 'npcd -V' to get license information
[10-11-2015 22:56:05] NPCD: HINT: load_threshold is enabled - ('40.000000')
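
A rough way to condense a perfdata.log excerpt like the one in this post is sketched below. It reports which timeout limits appear in the log and how many spool files were deleted after a timeout. The function name `summarize_perfdata_timeouts` is made up for illustration and is not part of PNP or Nagios XI; the match patterns are taken from the log lines quoted above and may need adjusting for other versions.

```shell
# Hypothetical helper (not shipped with PNP or Nagios XI): summarise the
# timeout messages in a perfdata.log-style file.
summarize_perfdata_timeouts() {
    # Default path is the one used in this thread; pass another file to override.
    log="${1:-/usr/local/nagios/var/perfdata.log}"

    # Distinct limits reported, e.g. "40" from "Timeout after 40 Sec."
    limits=$(sed -n 's/.*Timeout after \([0-9][0-9]*\) Sec.*/\1/p' "$log" | sort -un | xargs)

    # Spool files that were deleted because processing overran the limit.
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    deleted=$(grep -c 'TIMEOUT: .* deleted$' "$log" || true)

    echo "timeout limits seen (s): ${limits:-none}"
    echo "spool files deleted: $deleted"
}
```

Running `summarize_perfdata_timeouts` with no argument reads the default log path; a different file can be passed as the first argument.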

Re: Perfdata graphs empty

Posted: Tue Oct 13, 2015 12:43 pm
by tgriep
Can you run the following on your Nagios system to see whether the performance files are backing up in the spool directories? That could be the cause of the issue. Please post the output.

ls /usr/local/nagios/var/spool/xidpe | wc -l
ls /usr/local/nagios/var/spool/perfdata | wc -l
ls /usr/local/nagios/var/spool/checkresults | wc -l

Re: Perfdata graphs empty

Posted: Tue Oct 13, 2015 3:14 pm
by vhoover
Looks like they are not spooling.
[root@nagiosxi ~]# ls /usr/local/nagios/var/spool/xidpe | wc -l
0
[root@nagiosxi ~]# ls /usr/local/nagios/var/spool/perfdata | wc -l
0
[root@nagiosxi ~]# ls /usr/local/nagios/var/spool/checkresults | wc -l
0

Re: Perfdata graphs empty

Posted: Tue Oct 13, 2015 4:50 pm
by ssax
Are you seeing anything in your /var/log/cron?

Is it only one check where this is happening or all of them?

Re: Perfdata graphs empty

Posted: Tue Oct 13, 2015 4:52 pm
by vhoover
All of them

Re: Perfdata graphs empty

Posted: Tue Oct 13, 2015 6:12 pm
by SteveBeauchemin
I believe I have seen similar errors in my log file in the past.

The timeout in my process_perfdata.cfg file needed to be longer. Yours does too.
The file to edit is /usr/local/nagios/etc/pnp/process_perfdata.cfg

At my site, I increased the timeout. It is now set to 60.
I see from your log that your timeout is set to 40.
The system is throwing away your data because it takes
longer than 40 seconds to process your files.

This may not be your complete answer; there could be more to it.
But your log is saying timeout, and it shows the file being deleted before it is processed.
That's pretty clear.

Next log file...

The npcd log file shows that it wants to process files,
but they were deleted before it could get to them.

I would make changes to the npcd.cfg file in the same directory as the other config file.
I would increase npcd_max_threads. I have mine set to 15.
Also, I decreased sleep_time to 6.

I am not suggesting that you should use those numbers. I worked at this until
my settings were right for my site. Those numbers are where I ended up after trial and error.

Try changing those numbers gradually. Increase threads, reduce sleep.
Use "service npcd restart" after each change. Wait and see if the system starts working better.

The Timeout set to 60 should make the most difference, but npcd needs more
parallel processes so it can get the job done faster.
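
To keep track of where the settings stand between restarts, something like the sketch below could print the values being tuned. `show_pnp_tuning` is a hypothetical helper, and it assumes the timeout key in process_perfdata.cfg is spelled `TIMEOUT` (matched case-insensitively here); `npcd_max_threads`, `sleep_time`, and `load_threshold` are the names used in this thread and in the npcd log.

```shell
# Hypothetical helper: print the tuning values discussed above.
# It only reads the files; it never modifies them.
show_pnp_tuning() {
    # Directory named in this thread; pass another one to override.
    cfg_dir="${1:-/usr/local/nagios/etc/pnp}"

    # Timeout for process_perfdata.pl (key name assumed to be TIMEOUT).
    grep -iE '^[[:space:]]*timeout[[:space:]]*=' "$cfg_dir/process_perfdata.cfg" || true

    # npcd worker count, polling interval, and load cut-off.
    grep -E '^[[:space:]]*(npcd_max_threads|sleep_time|load_threshold)[[:space:]]*=' "$cfg_dir/npcd.cfg" || true
}
```

One possible routine: run `show_pnp_tuning`, edit a single value by hand, run `service npcd restart`, then watch the logs for a while before changing anything else.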

One last thought. Have you considered setting up a ram disk for these files?
You will still need the changes I suggested, ram disk or no ram disk.
It is much easier to set up than I thought it would be. Nagios has
instructions in a PDF somewhere. If you do use the ram disk, you need
to keep an eye on the space used so that you know early, before it fills
up, if there is a problem. I have mine set to 500MB at this time. I have noticed it
filling up a couple of times. One time I needed to restart npcd. Once I needed to
restart nagios. Lessons learned...
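
One simple way to keep an eye on the ram disk is a small check built on `df`, sketched below. The function names are hypothetical, the mount point has to be supplied by the caller because this thread does not say where the ram disk is mounted, and the 80 percent default threshold is only an illustrative choice.

```shell
# Hypothetical helpers for watching how full a ram disk is.

# Print the used percentage (digits only) of the filesystem holding $1.
ramdisk_usage_pct() {
    mount_point="$1"
    # df -P gives one header line and one data line; field 5 is "Capacity".
    df -P "$mount_point" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed (exit 0) when usage is at or above the threshold (default 80).
ramdisk_is_filling() {
    mount_point="$1"
    threshold="${2:-80}"
    used=$(ramdisk_usage_pct "$mount_point")
    [ "$used" -ge "$threshold" ]
}
```

For example, `ramdisk_is_filling /path/to/ramdisk 80 && echo "ram disk is getting full"` could be run from cron, with the placeholder path replaced by the real mount point.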

Good Luck.

Steve B

Re: Perfdata graphs empty

Posted: Tue Oct 13, 2015 7:21 pm
by Box293
SteveBeauchemin wrote:
One last thought. Have you considered setting up a ram disk for these files?
You will still need the changes I suggested ram disk or no ram disk.
It is much easier to setup than I thought it would be. Nagios has
instructions in pdf somewhere. If you do use the ram disk... You just need
to keep an eye on the space used and make sure you know early before it fills
up if there is a problem. I have mine set to 500MB at this time. I have noticed it
filling up a couple times. One time I needed to restart npcd. Once I needed to
restart nagios. Lessons learned...

Great advice @SteveBeauchemin, the RAM Disk is an invaluable performance enhancement for Nagios XI. Here is the official procedure for it:
https://assets.nagios.com/downloads/nag ... giosXI.pdf

Re: Perfdata graphs empty

Posted: Tue Oct 13, 2015 7:57 pm
by vhoover
I have a ram disk set up and I have changed the perfdata timeout, but the graphs still have not populated.

Re: Perfdata graphs empty

Posted: Tue Oct 13, 2015 8:03 pm
by Box293
Let's increase the logging verbosity and then take a deeper look into the logs. Follow the FAQ entry below to increase the log level of process_perfdata and npcd:

http://support.nagios.com/wiki/index.ph ... leshooting

Wait 15 to 20 minutes and then get a tail of the logs:

tail -250 /usr/local/nagios/var/perfdata.log > /tmp/perfdata.txt
tail -250 /usr/local/nagios/var/npcd.log > /tmp/npcd.txt

Send us a copy of /tmp/perfdata.txt and /tmp/npcd.txt.

Don't forget to turn down the log level as per the FAQ when you are done!

Re: Perfdata graphs empty

Posted: Tue Oct 13, 2015 8:20 pm
by vhoover
Here are the files.