Performance Graph broken
Posted: Thu Sep 12, 2013 9:58 am
by vmesquita
Hi,
Lately the performance graphs have stopped working. Every service with a graph shows a blank graph with just the thresholds, as in the attached file. Any ideas on how to fix this?
Re: Performance Graph broken
Posted: Thu Sep 12, 2013 11:08 am
by lmiltchev
Have you tried following the steps outlined on our wiki page?
http://support.nagios.com/wiki/index.ph ... h_Problems
Re: Performance Graph broken
Posted: Thu Sep 12, 2013 11:31 am
by vmesquita
I hadn't seen it before, but I just tried it and it didn't fix the problem. The data seems to be collected, but somehow it doesn't make it to the graph:
Code: Select all
[root@nagios libexec]# ./check_rrdtraf -f '/var/lib/mrtg/172.27.134.1_10140.rrd' -w 200,200 -c 500,500 -l M
OK - Current BW in: .36Mbps Out: .29Mbps|in=.360971Mb/s;200;500 out=.293609Mb/s;200;500
Re: Performance Graph broken
Posted: Thu Sep 12, 2013 11:57 am
by lmiltchev
Run the following commands and show the output:
Code: Select all
service npcd status
ll /usr/local/nagios/share/perfdata
ls /usr/local/nagios/var/spool/xidpe | wc -l
ls /usr/local/nagios/var/spool/perfdata | wc -l
ls /usr/local/nagios/var/spool/checkresults | wc -l
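If the spool directories hold tens of thousands of files, the counts requested above can also be gathered with find instead of ls. This is only a minimal sketch: the function name count_spool_files is made up for illustration, it counts regular files only (ls | wc -l also counts subdirectories), and it assumes a find that supports -maxdepth (GNU or BSD).

```shell
#!/bin/sh
# Count regular files directly inside a directory.
# Prints 0 if the directory does not exist.
# count_spool_files is a hypothetical helper name, not part of Nagios.
count_spool_files() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo 0
        return 0
    fi
    # -maxdepth 1 keeps find from descending into subdirectories.
    find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Spool directories taken from the commands above.
for d in xidpe perfdata checkresults; do
    printf '%s: %s\n' "$d" "$(count_spool_files "/usr/local/nagios/var/spool/$d")"
done
```

A steadily growing count in perfdata or checkresults suggests files are arriving faster than they are being processed.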
Re: Performance Graph broken
Posted: Thu Sep 12, 2013 12:25 pm
by vmesquita
Code: Select all
[root@nagios /]# service npcd status
NPCD running (pid 10255).
[root@nagios /]#
Code: Select all
[root@nagios /]# ll /usr/local/nagios/share/perfdata
.....
drwxrwxrwx 2 nagios nagios 4096 Dec 2 2011 ********
drwxrwxrwx 2 nagios nagios 4096 Dec 2 2011 *********
drwxrwxrwx 2 nagios nagios 4096 Sep 9 12:02 *********
drwxrwxrwx 2 nagios nagios 4096 Sep 9 12:02 ********
drwxrwxrwx 2 nagios nagios 4096 Sep 11 00:01 *********
Note: the host names have been replaced by *****.
Code: Select all
[root@nagios /]# ls /usr/local/nagios/var/spool/xidpe | wc -l
1
Code: Select all
[root@nagios /]# ls /usr/local/nagios/var/spool/perfdata | wc -l
24845
Code: Select all
[root@nagios /]# ls /usr/local/nagios/var/spool/checkresults | wc -l
98062
Re: Performance Graph broken
Posted: Thu Sep 12, 2013 12:33 pm
by lmiltchev
Make sure logging is enabled in the process_perfdata.cfg and npcd.cfg (log level is set to "1"):
Code: Select all
grep -i "log_level =" /usr/local/nagios/etc/pnp/process_perfdata.cfg
grep -i "log_level =" /usr/local/nagios/etc/pnp/npcd.cfg
After you modify the configs, restart npcd (service npcd restart), then tail the logs and post the output:
Code: Select all
tail -n 30 /usr/local/nagios/var/perfdata.log
tail -n 30 /usr/local/nagios/var/npcd.log
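To see the current value rather than the whole matching line, the check can be sketched as below. The helper name get_log_level is hypothetical. It assumes the setting is written as "log_level = N" (in any letter case, as the grep -i commands imply) and ignores commented-out lines.

```shell
#!/bin/sh
# Print the value of the last uncommented "log_level = N" line in a config file.
# Matching is case-insensitive, like the grep -i commands above.
# Prints nothing if the file is unreadable or the setting is absent.
# get_log_level is a hypothetical helper name.
get_log_level() {
    [ -r "$1" ] || return 0
    grep -i '^[[:space:]]*log_level[[:space:]]*=' "$1" \
        | tail -n 1 \
        | sed 's/^[^=]*=[[:space:]]*//; s/[[:space:]]*$//'
}

for cfg in /usr/local/nagios/etc/pnp/process_perfdata.cfg \
           /usr/local/nagios/etc/pnp/npcd.cfg; do
    level=$(get_log_level "$cfg")
    printf '%s: log_level=%s\n' "$cfg" "${level:-not found}"
done
```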
Re: Performance Graph broken
Posted: Thu Sep 12, 2013 12:43 pm
by vmesquita
Both were 0, so I changed them to 1 as suggested.
The last entries in perfdata.log seem to date back to Aug 30:
Code: Select all
==> /usr/local/nagios/var/perfdata.log <==
2013-08-29 17:21:39 [20265] [0] *** TIMEOUT: Please check your npcd.cfg
2013-08-29 17:21:39 [20265] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1377716242.perfdata.service-PID-20265 deleted
2013-08-29 17:21:39 [20265] [0] *** Timeout while processing Host: "*******" Service: "CPU_Stats"
2013-08-29 17:21:39 [20265] [0] *** process_perfdata.pl terminated on signal ALRM
2013-08-30 09:55:01 [31084] [0] *** TIMEOUT: Timeout after 5 secs. ***
2013-08-30 09:55:01 [31084] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-08-30 09:55:01 [31084] [0] *** TIMEOUT: Please check your npcd.cfg
2013-08-30 09:55:01 [31084] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1377864081.perfdata.service-PID-31084 deleted
2013-08-30 09:55:01 [31084] [0] *** Timeout while processing Host: "******" Service: "Ping"
2013-08-30 09:55:01 [31084] [0] *** process_perfdata.pl terminated on signal ALRM
tail -n 30 /usr/local/nagios/var/npcd.log
Code: Select all
reached: load 27.720000/10.000000 at i=1
[09-12-2013 14:28:42] NPCD: WARN: MAX load reached: load 26.700000/10.000000 at i=1
[09-12-2013 14:28:57] NPCD: WARN: MAX load reached: load 25.940000/10.000000 at i=1
[09-12-2013 14:29:12] NPCD: WARN: MAX load reached: load 25.300000/10.000000 at i=1
[09-12-2013 14:29:27] NPCD: WARN: MAX load reached: load 25.240000/10.000000 at i=1
[09-12-2013 14:29:42] NPCD: WARN: MAX load reached: load 24.720000/10.000000 at i=1
[09-12-2013 14:29:57] NPCD: WARN: MAX load reached: load 26.570000/10.000000 at i=1
[09-12-2013 14:30:12] NPCD: WARN: MAX load reached: load 25.310000/10.000000 at i=1
[09-12-2013 14:30:27] NPCD: WARN: MAX load reached: load 26.140000/10.000000 at i=1
[09-12-2013 14:30:42] NPCD: WARN: MAX load reached: load 25.070000/10.000000 at i=1
[09-12-2013 14:30:57] NPCD: WARN: MAX load reached: load 26.270000/10.000000 at i=1
[09-12-2013 14:31:12] NPCD: WARN: MAX load reached: load 25.180000/10.000000 at i=1
[09-12-2013 14:31:27] NPCD: WARN: MAX load reached: load 26.730000/10.000000 at i=1
[09-12-2013 14:31:42] NPCD: WARN: MAX load reached: load 25.060000/10.000000 at i=1
[09-12-2013 14:31:57] NPCD: WARN: MAX load reached: load 25.030000/10.000000 at i=1
[09-12-2013 14:32:12] NPCD: WARN: MAX load reached: load 24.920000/10.000000 at i=1
[09-12-2013 14:32:27] NPCD: WARN: MAX load reached: load 24.920000/10.000000 at i=1
[09-12-2013 14:32:42] NPCD: WARN: MAX load reached: load 24.550000/10.000000 at i=1
[09-12-2013 14:32:57] NPCD: WARN: MAX load reached: load 24.710000/10.000000 at i=1
[09-12-2013 14:33:12] NPCD: WARN: MAX load reached: load 24.290000/10.000000 at i=1
[09-12-2013 14:33:27] NPCD: WARN: MAX load reached: load 21.650000/10.000000 at i=1
[09-12-2013 14:33:42] NPCD: WARN: MAX load reached: load 21.080000/10.000000 at i=1
[09-12-2013 14:33:57] NPCD: WARN: MAX load reached: load 21.460000/10.000000 at i=1
[09-12-2013 14:34:12] NPCD: WARN: MAX load reached: load 21.590000/10.000000 at i=1
[09-12-2013 14:34:27] NPCD: WARN: MAX load reached: load 21.530000/10.000000 at i=1
[09-12-2013 14:34:42] NPCD: WARN: MAX load reached: load 21.730000/10.000000 at i=1
[09-12-2013 14:34:57] NPCD: WARN: MAX load reached: load 22.300000/10.000000 at i=1
[09-12-2013 14:35:12] NPCD: WARN: MAX load reached: load 23.170000/10.000000 at i=1
[09-12-2013 14:35:27] NPCD: WARN: MAX load reached: load 24.560000/10.000000 at i=1
[09-12-2013 14:35:43] NPCD: WARN: MAX load reached: load 24.070000/10.000000 at i=1
[09-12-2013 14:35:58] NPCD: WARN: MAX load reached: load 23.400000/10.000000 at i=1
[09-12-2013 14:36:13] NPCD: WARN: MAX load reached: load 23.080000/10.000000 at i=1
[09-12-2013 14:36:28] NPCD: WARN: MAX load reached: load 23.030000/10.000000 at i=1
[09-12-2013 14:36:43] NPCD: WARN: MAX load reached: load 24.670000/10.000000 at i=1
[09-12-2013 14:36:58] NPCD: WARN: MAX load reached: load 24.080000/10.000000 at i=1
[09-12-2013 14:37:13] NPCD: WARN: MAX load reached: load 23.560000/10.000000 at i=1
[09-12-2013 14:37:25] NPCD: Caught Termination Signal - Hasta la vista... baby
[09-12-2013 14:37:26] NPCD: npcd Daemon (0.4.14) started with PID=16225
[09-12-2013 14:37:26] NPCD: Please have a look at 'npcd -V' to get license information
[09-12-2013 14:37:26] NPCD: HINT: load_threshold is enabled - ('10.000000')
[09-12-2013 14:37:26] NPCD: WARN: MAX load reached: load 22.910000/10.000000 at i=0
[09-12-2013 14:37:41] NPCD: WARN: MAX load reached: load 21.240000/10.000000 at i=1
[09-12-2013 14:37:56] NPCD: WARN: MAX load reached: load 23.050000/10.000000 at i=1
[09-12-2013 14:38:11] NPCD: WARN: MAX load reached: load 23.160000/10.000000 at i=1
[09-12-2013 14:38:26] NPCD: WARN: MAX load reached: load 24.440000/10.000000 at i=1
[09-12-2013 14:38:41] NPCD: WARN: MAX load reached: load 24.350000/10.000000 at i=1
[09-12-2013 14:38:56] NPCD: WARN: MAX load reached: load 24.370000/10.000000 at i=1
[09-12-2013 14:39:11] NPCD: WARN: MAX load reached: load 24.250000/10.000000 at i=1
[09-12-2013 14:39:26] NPCD: WARN: MAX load reached: load 25.630000/10.000000 at i=1
[09-12-2013 14:39:41] NPCD: WARN: MAX load reached: load 29.220000/10.000000 at i=1
[09-12-2013 14:39:56] NPCD: WARN: MAX load reached: load 26.470000/10.000000 at i=1
[09-12-2013 14:40:12] NPCD: WARN: MAX load reached: load 27.450000/10.000000 at i=1
[09-12-2013 14:40:27] NPCD: WARN: MAX load reached: load 26.960000/10.000000 at i=1
[09-12-2013 14:40:42] NPCD: WARN: MAX load reached: load 27.350000/10.000000 at i=1
[09-12-2013 14:40:57] NPCD: WARN: MAX load reached: load 27.590000/10.000000 at i=1
[09-12-2013 14:41:12] NPCD: WARN: MAX load reached: load 27.590000/10.000000 at i=1
Re: Performance Graph broken
Posted: Thu Sep 12, 2013 12:54 pm
by sreinhardt
Looks like you are hitting the max load on your system, so the performance data is not actually getting processed. What is the current load of your system? About how many hosts and service checks are you running?
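A quick way to compare the current load against the threshold from the log is sketched below. It assumes a Linux system with /proc/loadavg and assumes the 1-minute load average is the relevant figure. The helper name load_exceeds is made up for illustration.

```shell
#!/bin/sh
# Succeeds (exit status 0) when the load is strictly above the limit.
# awk does the floating-point comparison, which plain sh cannot do.
# load_exceeds is a hypothetical helper name.
load_exceeds() {
    awk -v cur="$1" -v max="$2" 'BEGIN { exit !(cur + 0 > max + 0) }'
}

threshold=10   # value reported in the npcd.log HINT line

if [ -r /proc/loadavg ]; then
    # The first field of /proc/loadavg is the 1-minute load average.
    read -r load1 rest < /proc/loadavg
    if load_exceeds "$load1" "$threshold"; then
        echo "load $load1 is above $threshold"
    else
        echo "load $load1 is within $threshold"
    fi
fi
```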
Re: Performance Graph broken
Posted: Thu Sep 12, 2013 1:09 pm
by vmesquita
Code: Select all
Cpu(s): 26.6%us, 69.7%sy, 0.1%ni, 3.4%id, 0.1%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 3115156k total, 2791908k used, 323248k free, 288468k buffers
Swap: 4194296k total, 8k used, 4194288k free, 809184k cached
We have 123 hosts and 1619 checks.
Re: Performance Graph broken
Posted: Thu Sep 12, 2013 2:02 pm
by lmiltchev
The default load threshold in the "/usr/local/nagios/etc/pnp/npcd.cfg" file is set to 10, which assumes a single-core processor. Depending on your hardware, you can increase it (dual core: x 2, quad core: x 4, etc.), for example to 20 on a dual-core or 40 on a quad-core machine, then restart npcd.
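The 10-per-core rule of thumb above can be sketched as a small script. The setting name load_threshold is taken from the HINT line in npcd.log. The helper names suggest_load_threshold and set_load_threshold are hypothetical, and the sketch assumes the setting sits on its own uncommented line in npcd.cfg.

```shell
#!/bin/sh
# Rule of thumb from the post above: 10 per CPU core.
# Both helper names below are hypothetical.
suggest_load_threshold() {
    cores=$1
    echo $((cores * 10))
}

# Rewrite the load_threshold line in place, keeping a .bak backup of the file.
set_load_threshold() {
    cfg=$1
    value=$2
    sed -i.bak "s/^[[:space:]]*load_threshold[[:space:]]*=.*/load_threshold = $value/" "$cfg"
}

# Number of online CPUs; falls back to 1 if getconf cannot report it.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
echo "cores=$cores suggested load_threshold=$(suggest_load_threshold "$cores")"

# To apply the value (as root) and restart npcd:
#   set_load_threshold /usr/local/nagios/etc/pnp/npcd.cfg "$(suggest_load_threshold "$cores")"
#   service npcd restart
```

The script only prints a suggestion by default. The lines that change the config are left commented out so nothing is modified until you decide to apply it.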
You have quite a lot of files piled up in the "/usr/local/nagios/var/spool/perfdata" and "/usr/local/nagios/var/spool/checkresults" directories. You will probably have to delete these files:
Code: Select all
cd /usr/local/nagios/var/spool
rm -rf perfdata
mkdir perfdata
chown nagios:nagios perfdata
chmod 755 perfdata
rm -rf checkresults
mkdir checkresults
chown nagios:nagios checkresults
chmod 755 checkresults
service npcd restart
What's your hardware like on your nagios server (CPU, RAM, HDD)?