Performance Graph broken

Posted: Thu Sep 12, 2013 9:58 am
by vmesquita
Hi,

Lately the performance graphs have stopped working. Every service with a graph shows a blank graph with just the thresholds, as in the attached file. Any ideas on how to fix this?

Re: Performance Graph broken

Posted: Thu Sep 12, 2013 11:08 am
by lmiltchev
Have you tried following the steps outlined on our wiki page?

http://support.nagios.com/wiki/index.ph ... h_Problems

Re: Performance Graph broken

Posted: Thu Sep 12, 2013 11:31 am
by vmesquita
I hadn't seen that page before, but I just tried the steps and they didn't fix the problem. The data seems to be collected, but somehow it doesn't make it to the graph:

Code: Select all

[root@nagios libexec]# ./check_rrdtraf -f '/var/lib/mrtg/172.27.134.1_10140.rrd' -w 200,200 -c 500,500 -l M
OK - Current BW in: .36Mbps Out: .29Mbps|in=.360971Mb/s;200;500 out=.293609Mb/s;200;500

Re: Performance Graph broken

Posted: Thu Sep 12, 2013 11:57 am
by lmiltchev
Run the following commands and show the output:

Code: Select all

service npcd status
ll /usr/local/nagios/share/perfdata
ls /usr/local/nagios/var/spool/xidpe | wc -l
ls /usr/local/nagios/var/spool/perfdata | wc -l
ls /usr/local/nagios/var/spool/checkresults | wc -l
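
The three spool counts above can also be gathered in one pass. This is only a rough sketch: the helper name `count_entries` and the `BACKLOG_LIMIT` cut-off of 1000 are made up for illustration and are not Nagios or PNP settings.

```shell
#!/bin/sh
# Count the entries in a directory; prints 0 if the directory is missing.
# (count_entries is a hypothetical helper, not part of Nagios.)
count_entries() {
    if [ -d "$1" ]; then
        ls -A "$1" | wc -l | tr -d ' '
    else
        echo 0
    fi
}

# Assumed cut-off: more queued files than this hints at a processing backlog.
BACKLOG_LIMIT=1000

spool=/usr/local/nagios/var/spool
for name in xidpe perfdata checkresults; do
    n=$(count_entries "$spool/$name")
    if [ "$n" -gt "$BACKLOG_LIMIT" ]; then
        echo "$name: $n entries (possible backlog)"
    else
        echo "$name: $n entries"
    fi
done
```

A count that keeps growing between runs is a stronger sign of trouble than any single number.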

Re: Performance Graph broken

Posted: Thu Sep 12, 2013 12:25 pm
by vmesquita

Code: Select all

[root@nagios /]# service npcd status
NPCD running (pid 10255).
[root@nagios /]#

Code: Select all

[root@nagios /]# ll /usr/local/nagios/share/perfdata
.....
drwxrwxrwx 2 nagios nagios  4096 Dec  2  2011 ********
drwxrwxrwx 2 nagios nagios  4096 Dec  2  2011 *********
drwxrwxrwx 2 nagios nagios  4096 Sep  9 12:02 *********
drwxrwxrwx 2 nagios nagios  4096 Sep  9 12:02 ********
drwxrwxrwx 2 nagios nagios  4096 Sep 11 00:01 *********
Note: the host names have been replaced by *****.

Code: Select all

[root@nagios /]# ls /usr/local/nagios/var/spool/xidpe | wc -l
1

Code: Select all

[root@nagios /]# ls /usr/local/nagios/var/spool/perfdata | wc -l
24845

Code: Select all

[root@nagios /]# ls /usr/local/nagios/var/spool/checkresults | wc -l
98062

Re: Performance Graph broken

Posted: Thu Sep 12, 2013 12:33 pm
by lmiltchev
Make sure logging is enabled in the process_perfdata.cfg and npcd.cfg (log level is set to "1"):

Code: Select all

grep -i "log_level =" /usr/local/nagios/etc/pnp/process_perfdata.cfg
grep -i "log_level =" /usr/local/nagios/etc/pnp/npcd.cfg
After you have modified the configs, restart npcd:

Code: Select all

service npcd restart
Then tail the logs and post the output:

Code: Select all

tail -n 30 /usr/local/nagios/var/perfdata.log
tail -n 30 /usr/local/nagios/var/npcd.log
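
For anyone who wants the log level check as a single step, here is a minimal sketch. The helper `get_setting` is hypothetical, and it assumes both files use plain `key = value` lines; key matching is case-insensitive, like the `grep -i` above.

```shell
#!/bin/sh
# Print the value of the last uncommented "key = value" line in a config file.
# $1 = config file, $2 = key (compared case-insensitively).
get_setting() {
    awk -F= -v key="$2" '
        {
            name = $1; value = $2
            gsub(/[ \t]/, "", name); gsub(/[ \t]/, "", value)
            if (tolower(name) == tolower(key)) found = value
        }
        END { if (found != "") print found }
    ' "$1"
}

for cfg in /usr/local/nagios/etc/pnp/process_perfdata.cfg \
           /usr/local/nagios/etc/pnp/npcd.cfg; do
    if [ -r "$cfg" ]; then
        level=$(get_setting "$cfg" log_level)
        echo "$cfg: log_level=${level:-unset}"
    fi
done
```

A value of 0 means logging is off, which would explain a log file with no recent entries.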

Re: Performance Graph broken

Posted: Thu Sep 12, 2013 12:43 pm
by vmesquita
Both were 0, so I changed them to 1 as suggested.

The last entries in perfdata.log seem to date back to Aug 30:

Code: Select all

==> /usr/local/nagios/var/perfdata.log <==
2013-08-29 17:21:39 [20265] [0] *** TIMEOUT: Please check your npcd.cfg
2013-08-29 17:21:39 [20265] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1377716242.perfdata.service-PID-20265 deleted
2013-08-29 17:21:39 [20265] [0] *** Timeout while processing Host: "*******" Service: "CPU_Stats"
2013-08-29 17:21:39 [20265] [0] *** process_perfdata.pl terminated on signal ALRM
2013-08-30 09:55:01 [31084] [0] *** TIMEOUT: Timeout after 5 secs. ***
2013-08-30 09:55:01 [31084] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-08-30 09:55:01 [31084] [0] *** TIMEOUT: Please check your npcd.cfg
2013-08-30 09:55:01 [31084] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1377864081.perfdata.service-PID-31084 deleted
2013-08-30 09:55:01 [31084] [0] *** Timeout while processing Host: "******" Service: "Ping"
2013-08-30 09:55:01 [31084] [0] *** process_perfdata.pl terminated on signal ALRM
tail 30 /usr/local/nagios/var/npcd.log

Code: Select all

reached: load 27.720000/10.000000 at i=1
[09-12-2013 14:28:42] NPCD: WARN: MAX load reached: load 26.700000/10.000000 at i=1
[09-12-2013 14:28:57] NPCD: WARN: MAX load reached: load 25.940000/10.000000 at i=1
[09-12-2013 14:29:12] NPCD: WARN: MAX load reached: load 25.300000/10.000000 at i=1
[09-12-2013 14:29:27] NPCD: WARN: MAX load reached: load 25.240000/10.000000 at i=1
[09-12-2013 14:29:42] NPCD: WARN: MAX load reached: load 24.720000/10.000000 at i=1
[09-12-2013 14:29:57] NPCD: WARN: MAX load reached: load 26.570000/10.000000 at i=1
[09-12-2013 14:30:12] NPCD: WARN: MAX load reached: load 25.310000/10.000000 at i=1
[09-12-2013 14:30:27] NPCD: WARN: MAX load reached: load 26.140000/10.000000 at i=1
[09-12-2013 14:30:42] NPCD: WARN: MAX load reached: load 25.070000/10.000000 at i=1
[09-12-2013 14:30:57] NPCD: WARN: MAX load reached: load 26.270000/10.000000 at i=1
[09-12-2013 14:31:12] NPCD: WARN: MAX load reached: load 25.180000/10.000000 at i=1
[09-12-2013 14:31:27] NPCD: WARN: MAX load reached: load 26.730000/10.000000 at i=1
[09-12-2013 14:31:42] NPCD: WARN: MAX load reached: load 25.060000/10.000000 at i=1
[09-12-2013 14:31:57] NPCD: WARN: MAX load reached: load 25.030000/10.000000 at i=1
[09-12-2013 14:32:12] NPCD: WARN: MAX load reached: load 24.920000/10.000000 at i=1
[09-12-2013 14:32:27] NPCD: WARN: MAX load reached: load 24.920000/10.000000 at i=1
[09-12-2013 14:32:42] NPCD: WARN: MAX load reached: load 24.550000/10.000000 at i=1
[09-12-2013 14:32:57] NPCD: WARN: MAX load reached: load 24.710000/10.000000 at i=1
[09-12-2013 14:33:12] NPCD: WARN: MAX load reached: load 24.290000/10.000000 at i=1
[09-12-2013 14:33:27] NPCD: WARN: MAX load reached: load 21.650000/10.000000 at i=1
[09-12-2013 14:33:42] NPCD: WARN: MAX load reached: load 21.080000/10.000000 at i=1
[09-12-2013 14:33:57] NPCD: WARN: MAX load reached: load 21.460000/10.000000 at i=1
[09-12-2013 14:34:12] NPCD: WARN: MAX load reached: load 21.590000/10.000000 at i=1
[09-12-2013 14:34:27] NPCD: WARN: MAX load reached: load 21.530000/10.000000 at i=1
[09-12-2013 14:34:42] NPCD: WARN: MAX load reached: load 21.730000/10.000000 at i=1
[09-12-2013 14:34:57] NPCD: WARN: MAX load reached: load 22.300000/10.000000 at i=1
[09-12-2013 14:35:12] NPCD: WARN: MAX load reached: load 23.170000/10.000000 at i=1
[09-12-2013 14:35:27] NPCD: WARN: MAX load reached: load 24.560000/10.000000 at i=1
[09-12-2013 14:35:43] NPCD: WARN: MAX load reached: load 24.070000/10.000000 at i=1
[09-12-2013 14:35:58] NPCD: WARN: MAX load reached: load 23.400000/10.000000 at i=1
[09-12-2013 14:36:13] NPCD: WARN: MAX load reached: load 23.080000/10.000000 at i=1
[09-12-2013 14:36:28] NPCD: WARN: MAX load reached: load 23.030000/10.000000 at i=1
[09-12-2013 14:36:43] NPCD: WARN: MAX load reached: load 24.670000/10.000000 at i=1
[09-12-2013 14:36:58] NPCD: WARN: MAX load reached: load 24.080000/10.000000 at i=1
[09-12-2013 14:37:13] NPCD: WARN: MAX load reached: load 23.560000/10.000000 at i=1
[09-12-2013 14:37:25] NPCD: Caught Termination Signal - Hasta la vista... baby
[09-12-2013 14:37:26] NPCD: npcd Daemon (0.4.14) started with PID=16225
[09-12-2013 14:37:26] NPCD: Please have a look at 'npcd -V' to get license information
[09-12-2013 14:37:26] NPCD: HINT: load_threshold is enabled - ('10.000000')
[09-12-2013 14:37:26] NPCD: WARN: MAX load reached: load 22.910000/10.000000 at i=0
[09-12-2013 14:37:41] NPCD: WARN: MAX load reached: load 21.240000/10.000000 at i=1
[09-12-2013 14:37:56] NPCD: WARN: MAX load reached: load 23.050000/10.000000 at i=1
[09-12-2013 14:38:11] NPCD: WARN: MAX load reached: load 23.160000/10.000000 at i=1
[09-12-2013 14:38:26] NPCD: WARN: MAX load reached: load 24.440000/10.000000 at i=1
[09-12-2013 14:38:41] NPCD: WARN: MAX load reached: load 24.350000/10.000000 at i=1
[09-12-2013 14:38:56] NPCD: WARN: MAX load reached: load 24.370000/10.000000 at i=1
[09-12-2013 14:39:11] NPCD: WARN: MAX load reached: load 24.250000/10.000000 at i=1
[09-12-2013 14:39:26] NPCD: WARN: MAX load reached: load 25.630000/10.000000 at i=1
[09-12-2013 14:39:41] NPCD: WARN: MAX load reached: load 29.220000/10.000000 at i=1
[09-12-2013 14:39:56] NPCD: WARN: MAX load reached: load 26.470000/10.000000 at i=1
[09-12-2013 14:40:12] NPCD: WARN: MAX load reached: load 27.450000/10.000000 at i=1
[09-12-2013 14:40:27] NPCD: WARN: MAX load reached: load 26.960000/10.000000 at i=1
[09-12-2013 14:40:42] NPCD: WARN: MAX load reached: load 27.350000/10.000000 at i=1
[09-12-2013 14:40:57] NPCD: WARN: MAX load reached: load 27.590000/10.000000 at i=1
[09-12-2013 14:41:12] NPCD: WARN: MAX load reached: load 27.590000/10.000000 at i=1
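
Since npcd can write several of these warnings on a single line, pulling the numbers out by eye is tedious. Below is a small sketch that extracts every reported load value and the highest one. The function names `extract_loads` and `max_load` are invented for this example, and it assumes a `grep` that supports `-o` (GNU and BSD grep do).

```shell
#!/bin/sh
# Print every load value from "MAX load reached: load X/Y" warnings, one per line.
# Works whether the entries are on separate lines or run together.
extract_loads() {
    grep -o 'load [0-9.]*/[0-9.]*' "$1" | sed 's/^load //; s|/.*$||'
}

# Print the highest load value found in the log (nothing if there are none).
max_load() {
    extract_loads "$1" | awk 'NR == 1 || $1 > max { max = $1 } END { if (NR) print max }'
}

log=/usr/local/nagios/var/npcd.log
if [ -r "$log" ]; then
    echo "highest load seen by npcd: $(max_load "$log")"
fi
```

Comparing that figure with the configured `load_threshold` shows how far above the limit the server is running.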

Re: Performance Graph broken

Posted: Thu Sep 12, 2013 12:54 pm
by sreinhardt
It looks like your system is hitting NPCD's max load threshold, so the perfdata is not actually getting processed. What is the current load on your system? About how many hosts and service checks are you running?

Code: Select all

top -n 1

Re: Performance Graph broken

Posted: Thu Sep 12, 2013 1:09 pm
by vmesquita

Code: Select all

Cpu(s): 26.6%us, 69.7%sy,  0.1%ni,  3.4%id,  0.1%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   3115156k total,  2791908k used,   323248k free,   288468k buffers
Swap:  4194296k total,        8k used,  4194288k free,   809184k cached
We have 123 hosts and 1619 checks.

Re: Performance Graph broken

Posted: Thu Sep 12, 2013 2:02 pm
by lmiltchev
The default load threshold in the "/usr/local/nagios/etc/pnp/npcd.cfg" file is 10, which assumes a single-core processor. Depending on your hardware, you can increase it (dual core: x 2, quad core: x 4, etc.), for example:

Code: Select all

load_threshold = 20.0
or

Code: Select all

load_threshold = 40.0
then restart npcd:

Code: Select all

service npcd restart
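
The per-core rule of thumb above (10 per core) can be worked out on the server itself. This is just a sketch of that arithmetic: `suggest_threshold` is a made-up helper, and reading the core count with `getconf _NPROCESSORS_ONLN` is an assumption about the platform.

```shell
#!/bin/sh
# Suggest a load_threshold value: 10 per CPU core, following the rule of thumb above.
# $1 = number of cores.
suggest_threshold() {
    echo "$(( $1 * 10 )).0"
}

# Number of online cores; falls back to 1 if getconf cannot report it.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
echo "load_threshold = $(suggest_threshold "$cores")"
```

The printed line can be compared with the current setting in npcd.cfg before editing it by hand.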
You have quite a few files piled up in the "/usr/local/nagios/var/spool/perfdata" and "/usr/local/nagios/var/spool/checkresults" directories. You will probably have to delete these files:

Code: Select all

cd /usr/local/nagios/var/spool
rm -rf perfdata
mkdir perfdata
chown nagios:nagios perfdata
chmod 755 perfdata
rm -rf checkresults
mkdir checkresults
chown nagios:nagios checkresults
chmod 755 checkresults
service npcd restart
What's your hardware like on your nagios server (CPU, RAM, HDD)?
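
As an alternative to removing and recreating the spool directories, the queued files can be deleted in place, which leaves each directory's owner and permissions as they were. This is a sketch only: `purge_spool` is a hypothetical helper, not a Nagios tool, and it deletes queued data for good, so use it with the same care as the commands above.

```shell
#!/bin/sh
# Delete the regular files inside a spool directory but keep the directory
# itself, so its existing owner and mode are preserved.
# $1 = directory to empty.
purge_spool() {
    dir=$1
    # Refuse to run on an empty argument or on something that is not a directory.
    [ -n "$dir" ] && [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    find "$dir" -type f -exec rm -f {} +
}
```

It would be called as `purge_spool /usr/local/nagios/var/spool/perfdata`, and likewise for `checkresults`, followed by `service npcd restart`.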