rrd graphs showing zero data for given interval
Posted: Tue Feb 18, 2014 2:24 pm
by jericho_g
Hello -
We are running Nagios XI 2012R1.8 on CentOS 5.6. On our RRD graphs for Cisco routers, I'm seeing zero bandwidth reported during some intervals where there should be data, which is visible when compared with Cacti graphs running on a different host. Is there a reason why this is occurring? Performance-wise, I don't see any symptoms on the Nagios host itself.
Example graphs for comparison are attached: one from Nagios XI and one from Cacti, both for the same monitored router. The same symptom also occurs on our graphs for other routers.
Incorrect graph from Nagios XI:
Nagios_XI_rrd1.jpg
Correct graph from Cacti:
cacti_rrd1.jpg
-Jericho
Re: rrd graphs showing zero data for given interval
Posted: Tue Feb 18, 2014 2:48 pm
by abrist
I would start by checking the perfdata logs:
tail -25 /usr/local/nagios/var/perfdata.log
tail -25 /usr/local/nagios/var/npcd.log
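Beyond reading the last 25 lines, it can help to see whether the timeouts hit a few specific services or are spread across many. The following is a minimal sketch, assuming the perfdata.log line format shown later in this thread; `perfdata_timeouts` is a hypothetical helper name, not part of Nagios XI or PNP.

```shell
#!/bin/sh
# Hypothetical helper: summarize which host/service pairs hit the
# process_perfdata.pl timeout, most frequent first.
# Usage: perfdata_timeouts [LOGFILE]
perfdata_timeouts() {
  log=${1:-/usr/local/nagios/var/perfdata.log}
  # Keep only the "Timeout while processing" lines, pull out the quoted
  # Host and Service names (service names may contain spaces), then
  # count identical pairs and sort by count, highest first.
  grep 'Timeout while processing Host:' "$log" |
    sed -n 's/.*Host: "\([^"]*\)" Service: "\([^"]*\)".*/\1 | \2/p' |
    sort | uniq -c | sort -rn
}
```

Paste the function into a shell on the Nagios host and run `perfdata_timeouts`, or pass another log file path as the first argument. If one or two services dominate the output, the problem is more likely specific to those checks than to the host as a whole.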
Re: rrd graphs showing zero data for given interval
Posted: Tue Feb 18, 2014 3:35 pm
by jericho_g
From the perfdata.log:
tail -25 /usr/local/nagios/var/perfdata.log
2014-02-18 14:34:56 [37785] [0] *** process_perfdata.pl terminated on signal ALRM
2014-02-18 14:38:09 [60058] [0] *** TIMEOUT: Timeout after 10 secs. ***
2014-02-18 14:38:09 [60058] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2014-02-18 14:38:09 [60058] [0] *** TIMEOUT: Please check your npcd.cfg
2014-02-18 14:38:09 [60058] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1392752264.perfdata.service-PID-60058 deleted
2014-02-18 14:38:09 [60058] [0] *** Timeout while processing Host: "mpls1.###.com" Service: "GigabitEthernet0_0_Bandwidth"
2014-02-18 14:38:09 [60058] [0] *** process_perfdata.pl terminated on signal ALRM
2014-02-18 14:43:07 [52283] [0] *** TIMEOUT: Timeout after 10 secs. ***
2014-02-18 14:43:07 [52283] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2014-02-18 14:43:07 [52283] [0] *** TIMEOUT: Please check your npcd.cfg
2014-02-18 14:43:07 [52283] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1392752564.perfdata.service-PID-52283 deleted
2014-02-18 14:43:07 [52283] [0] *** Timeout while processing Host: "poe10.###.com" Service: "FastEthernet0_15_Bandwidth"
2014-02-18 14:43:07 [52283] [0] *** process_perfdata.pl terminated on signal ALRM
2014-02-18 15:06:19 [27024] [0] *** TIMEOUT: Timeout after 10 secs. ***
2014-02-18 15:06:19 [27024] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2014-02-18 15:06:19 [27024] [0] *** TIMEOUT: Please check your npcd.cfg
2014-02-18 15:06:19 [27024] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1392753959.perfdata.service-PID-27024 deleted
2014-02-18 15:06:19 [27024] [0] *** Timeout while processing Host: "poe22.###.com" Service: "FastEthernet0_9_Bandwidth"
2014-02-18 15:06:19 [27024] [0] *** process_perfdata.pl terminated on signal ALRM
2014-02-18 15:18:44 [12423] [0] *** TIMEOUT: Timeout after 10 secs. ***
2014-02-18 15:18:44 [12423] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2014-02-18 15:18:44 [12423] [0] *** TIMEOUT: Please check your npcd.cfg
2014-02-18 15:18:44 [12423] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1392754709.perfdata.service-PID-12423 deleted
2014-02-18 15:18:44 [12423] [0] *** Timeout while processing Host: "poe11.###.com" Service: "GigabitEthernet0_1_Bandwidth"
2014-02-18 15:18:44 [12423] [0] *** process_perfdata.pl terminated on signal ALRM
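Each deleted spool file name starts with a number that appears to be the Unix time at which the file was queued. For 1392752264 that works out to 2014-02-18 19:37:44 UTC, i.e. 14:37:44 on the Nagios host (ET, UTC-5 in February), about 25 seconds before the 14:38:09 deletion. A minimal sketch to do that conversion for every deleted file, assuming GNU `date` (for `-d @SECONDS`) and the log format above; `spool_file_times` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: for each spool file that perfdata.log reports as
# deleted after a timeout, print the time encoded in its file name
# (assumed to be a Unix epoch) next to the time the deletion was logged.
# Requires GNU date. Output times follow the current TZ setting.
spool_file_times() {
  log=${1:-/usr/local/nagios/var/perfdata.log}
  grep 'TIMEOUT: .*-PID-[0-9]* deleted' "$log" |
  while read -r day clock rest; do
    # Take the digits right after the last "/", e.g. 1392752264 from
    # .../spool/perfdata//1392752264.perfdata.service-PID-60058
    file=$(printf '%s\n' "$rest" | sed 's|.*/\([0-9][0-9]*\)\.perfdata.*|\1|')
    # Skip lines where no numeric prefix was found.
    case $file in ''|*[!0-9]*) continue ;; esac
    queued=$(date -d "@$file" '+%Y-%m-%d %H:%M:%S')
    printf 'queued %s  deleted %s %s\n' "$queued" "$day" "$clock"
  done
}
```

Run it with `TZ` set to the Nagios host's zone (the default on that host) so both columns are in the same local time; a large or growing gap between the two columns would point at files waiting in the spool rather than at slow processing alone.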
From npcd.log:
[02-18-2014 15:33:31] NPCD: ThreadCounter 0/4 File is 1392755594.perfdata.host
[02-18-2014 15:33:31] NPCD: Regular File: 1392755594.perfdata.host
[02-18-2014 15:33:31] NPCD: A thread was started on thread_counter = 0
[02-18-2014 15:33:31] NPCD: Processing file 1392755594.perfdata.host with ID 140737343776512 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1392755594.perfdata.host
[02-18-2014 15:33:31] NPCD: DEBUG: load 1.820000/40.000000
[02-18-2014 15:33:31] NPCD: Processing file '1392755594.perfdata.host'
[02-18-2014 15:33:31] NPCD: ThreadCounter 1/4 File is 1392755594.perfdata.service
[02-18-2014 15:33:31] NPCD: Regular File: 1392755594.perfdata.service
[02-18-2014 15:33:31] NPCD: A thread was started on thread_counter = 1
[02-18-2014 15:33:31] NPCD: Processing file 1392755594.perfdata.service with ID 140737333286656 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1392755594.perfdata.service
[02-18-2014 15:33:31] NPCD: DEBUG: load 1.820000/40.000000
[02-18-2014 15:33:31] NPCD: Processing file '1392755594.perfdata.service'
[02-18-2014 15:33:31] NPCD: ThreadCounter 2/4 File is 1392755609.perfdata.host
[02-18-2014 15:33:31] NPCD: Regular File: 1392755609.perfdata.host
[02-18-2014 15:33:31] NPCD: A thread was started on thread_counter = 2
[02-18-2014 15:33:31] NPCD: DEBUG: load 1.820000/40.000000
[02-18-2014 15:33:31] NPCD: ThreadCounter 3/4 File is 1392755609.perfdata.service
[02-18-2014 15:33:31] NPCD: Processing file 1392755609.perfdata.host with ID 140737320609536 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1392755609.perfdata.host
[02-18-2014 15:33:31] NPCD: Regular File: 1392755609.perfdata.service
[02-18-2014 15:33:31] NPCD: Processing file '1392755609.perfdata.host'
[02-18-2014 15:33:31] NPCD: A thread was started on thread_counter = 3
[02-18-2014 15:33:31] NPCD: Have to wait: Filecounter = 4 - thread_counter = 4
[02-18-2014 15:33:31] NPCD: Processing file 1392755609.perfdata.service with ID 140737310119680 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1392755609.perfdata.service
[02-18-2014 15:33:31] NPCD: Processing file '1392755609.perfdata.service'
[02-18-2014 15:33:32] NPCD: No more files to process... waiting for 10 seconds
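The timeout messages say to check npcd.cfg, and the NPCD log above shows four worker threads (`ThreadCounter 3/4`), a load check of `1.820000/40.000000`, and a 10-second wait between directory scans. Those appear to map to settings in npcd.cfg. The sketch below prints them; the path `/usr/local/nagios/etc/pnp/npcd.cfg` and the key names `npcd_max_threads`, `sleep_time`, `load_threshold` and `perfdata_spool_dir` are assumptions based on a stock PNP4Nagios install, and `npcd_settings` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print the active (uncommented) NPCD settings that
# seem to drive the "ThreadCounter n/4", "load x/40" and
# "waiting for 10 seconds" log lines. Path and key names are assumptions.
npcd_settings() {
  cfg=${1:-/usr/local/nagios/etc/pnp/npcd.cfg}
  grep -E '^[[:space:]]*(npcd_max_threads|sleep_time|load_threshold|perfdata_spool_dir)[[:space:]]*=' "$cfg"
}
```

Lines starting with `#` are ignored, so the output reflects only what NPCD should actually read.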
Re: rrd graphs showing zero data for given interval
Posted: Tue Feb 18, 2014 4:35 pm
by slansing
Does the same gap show up in the Graph Explorer? Home > Graph Explorer > Scalable Performance Graphs > Host > Service
Re: rrd graphs showing zero data for given interval
Posted: Tue Feb 18, 2014 5:56 pm
by jericho_g
Yes, although this graph is time-shifted to my local time zone (PT), whereas the earlier graphs reflected the time on the Nagios host (ET).
Nagios_XI_grph_xplr1.jpg
Re: rrd graphs showing zero data for given interval
Posted: Wed Feb 19, 2014 11:30 am
by abrist
jericho_g wrote:
2014-02-18 14:38:09 [60058] [0] *** process_perfdata.pl terminated on signal ALRM
2014-02-18 14:43:07 [52283] [0] *** TIMEOUT: Timeout after 10 secs. ***
2014-02-18 14:43:07 [52283] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2014-02-18 14:43:07 [52283] [0] *** TIMEOUT: Please check your npcd.cfg
You are experiencing timeouts. Increase the threshold by editing:
/usr/local/nagios/etc/pnp/process_perfdata.cfg
Change:
To:
Save the file and restart npcd:
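Before and after the edit, it may be worth confirming which value is actually active, since a commented-out or duplicated line is easy to miss. A minimal sketch, assuming the setting is a line of the form `TIMEOUT = <seconds>` as in stock PNP4Nagios; `current_timeout` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the value of every uncommented
# "TIMEOUT = <seconds>" line in process_perfdata.cfg.
# The key name TIMEOUT is an assumption based on stock PNP4Nagios.
current_timeout() {
  cfg=${1:-/usr/local/nagios/etc/pnp/process_perfdata.cfg}
  sed -n 's/^[[:space:]]*TIMEOUT[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$cfg"
}
```

Exactly one number should be printed; more than one means the file contains several active TIMEOUT lines.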
Re: rrd graphs showing zero data for given interval
Posted: Wed Feb 19, 2014 2:04 pm
by jericho_g
OK, thanks. Is there any service impact from this change, or does it only affect the RRD graphs? I'm trying to assess whether we need a maintenance window for this change.
Re: rrd graphs showing zero data for given interval
Posted: Wed Feb 19, 2014 5:02 pm
by sreinhardt
This should only affect performance data processing. It may cause somewhat longer processing runs and slightly higher load, since processing is allowed to run for longer, but that is necessary to properly reap your data.
Re: rrd graphs showing zero data for given interval
Posted: Fri Feb 21, 2014 1:08 pm
by jericho_g
I changed the timeout to 20 on Wednesday, but the problem persisted, so I changed it to 25 last night, as advised. We're still seeing dropouts in the RRD graphs and timeouts in the logs (see below).
[root@clesitonag1 ~]# tail -25 /usr/local/nagios/var/perfdata.log
2014-02-20 14:05:59 [47164] [0] *** process_perfdata.pl terminated on signal ALRM
2014-02-20 16:40:56 [49598] [0] *** TIMEOUT: Timeout after 20 secs. ***
2014-02-20 16:40:56 [49598] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2014-02-20 16:40:56 [49598] [0] *** TIMEOUT: Please check your npcd.cfg
2014-02-20 16:40:56 [49598] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1392932422.perfdata.service-PID-49598 deleted
2014-02-20 16:40:56 [49598] [0] *** Timeout while processing Host: "poe10.###.com" Service: "FastEthernet0_13_Bandwidth"
2014-02-20 16:40:56 [49598] [0] *** process_perfdata.pl terminated on signal ALRM
2014-02-21 00:35:37 [29889] [0] *** TIMEOUT: Timeout after 20 secs. ***
2014-02-21 00:35:37 [29889] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2014-02-21 00:35:37 [29889] [0] *** TIMEOUT: Please check your npcd.cfg
2014-02-21 00:35:37 [29889] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1392960907.perfdata.service-PID-29889 deleted
2014-02-21 00:35:37 [29889] [0] *** Timeout while processing Host: "ace01.###.com" Service: "windows_Virtual_Memory_Prd"
2014-02-21 00:35:37 [29889] [0] *** process_perfdata.pl terminated on signal ALRM
2014-02-21 03:05:59 [48197] [0] *** TIMEOUT: Timeout after 20 secs. ***
2014-02-21 03:05:59 [48197] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2014-02-21 03:05:59 [48197] [0] *** TIMEOUT: Please check your npcd.cfg
2014-02-21 03:05:59 [48197] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1392969922.perfdata.service-PID-48197 deleted
2014-02-21 03:05:59 [48197] [0] *** Timeout while processing Host: "wap10" Service: "wifi1_Bandwidth"
2014-02-21 03:05:59 [48197] [0] *** process_perfdata.pl terminated on signal ALRM
2014-02-21 04:07:00 [35114] [0] *** TIMEOUT: Timeout after 20 secs. ***
2014-02-21 04:07:00 [35114] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2014-02-21 04:07:00 [35114] [0] *** TIMEOUT: Please check your npcd.cfg
2014-02-21 04:07:00 [35114] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1392973582.perfdata.service-PID-35114 deleted
2014-02-21 04:07:00 [35114] [0] *** Timeout while processing Host: "dat25.###.com" Service: "FastEthernet0 27 Bandwidth"
2014-02-21 04:07:00 [35114] [0] *** process_perfdata.pl terminated on signal ALRM
[root@clesitonag1 ~]# tail -25 /usr/local/nagios/var/npcd.log
[02-21-2014 12:58:50] NPCD: Processing file 1393005513.perfdata.host with ID 140737343776512 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1393005513.perfdata.host
[02-21-2014 12:58:50] NPCD: Have to wait: Filecounter = 2 - thread_counter = 2
[02-21-2014 12:58:50] NPCD: Processing file 1393005513.perfdata.service with ID 140737333286656 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1393005513.perfdata.service
[02-21-2014 12:58:50] NPCD: Processing file '1393005513.perfdata.host'
[02-21-2014 12:58:50] NPCD: Processing file '1393005513.perfdata.service'
[02-21-2014 12:58:51] NPCD: No more files to process... waiting for 10 seconds
[02-21-2014 12:59:01] NPCD: Found 4 files in /usr/local/nagios/var/spool/perfdata/
[02-21-2014 12:59:01] NPCD: DEBUG: load 1.240000/40.000000
[02-21-2014 12:59:01] NPCD: ThreadCounter 0/4 File is .
[02-21-2014 12:59:01] NPCD: DEBUG: load 1.240000/40.000000
[02-21-2014 12:59:01] NPCD: ThreadCounter 0/4 File is ..
[02-21-2014 12:59:01] NPCD: DEBUG: load 1.240000/40.000000
[02-21-2014 12:59:01] NPCD: ThreadCounter 0/4 File is 1393005528.perfdata.host
[02-21-2014 12:59:01] NPCD: Regular File: 1393005528.perfdata.host
[02-21-2014 12:59:01] NPCD: A thread was started on thread_counter = 0
[02-21-2014 12:59:01] NPCD: DEBUG: load 1.240000/40.000000
[02-21-2014 12:59:01] NPCD: ThreadCounter 1/4 File is 1393005528.perfdata.service
[02-21-2014 12:59:01] NPCD: Regular File: 1393005528.perfdata.service
[02-21-2014 12:59:01] NPCD: A thread was started on thread_counter = 1
[02-21-2014 12:59:01] NPCD: Have to wait: Filecounter = 2 - thread_counter = 2
[02-21-2014 12:59:01] NPCD: Processing file 1393005528.perfdata.service with ID 140737333286656 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1393005528.perfdata.service
[02-21-2014 12:59:01] NPCD: Processing file '1393005528.perfdata.service'
[02-21-2014 12:59:01] NPCD: Processing file 1393005528.perfdata.host with ID 140737343776512 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1393005528.perfdata.host
[02-21-2014 12:59:01] NPCD: Processing file '1393005528.perfdata.host'
[02-21-2014 12:59:01] NPCD: No more files to process... waiting for 10 seconds
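The 21 Feb entries still report "Timeout after 20 secs", so it is worth checking when, and whether, a new value actually takes effect. Assuming the number in that message reflects the configured TIMEOUT, tallying it per day shows the change over time. `timeout_values` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: count "Timeout after N secs" messages per day and
# per N, to see which timeout value process_perfdata.pl was using.
timeout_values() {
  log=${1:-/usr/local/nagios/var/perfdata.log}
  grep 'TIMEOUT: Timeout after' "$log" |
    # Print the date (first field) and the number following "after".
    awk '{ for (i = 1; i < NF; i++) if ($i == "after") print $1, $(i + 1) "s" }' |
    sort | uniq -c
}
```

If entries after a change still show the old value, the edited file or the npcd restart is the first thing to re-check.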
Re: rrd graphs showing zero data for given interval
Posted: Fri Feb 21, 2014 1:22 pm
by abrist
You are still hitting the timeout limit. Increase it further, to 40.
Let's check the spool directories to make sure there is no backlog of perfdata files:
ls /usr/local/nagios/var/spool/perfdata | wc -l
ls /usr/local/nagios/var/spool/xidpe | wc -l
ls /usr/local/nagios/var/spool/checkresults | wc -l
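A single count only shows a snapshot; a backlog shows up as counts that keep growing between runs. A minimal sketch that counts the files in each given directory, meant to be run a few times a minute or so apart (`spool_counts` is a hypothetical helper):

```shell
#!/bin/sh
# Hypothetical helper: print "<file count> <directory>" for each directory
# argument, or "missing <directory>" if it does not exist.
spool_counts() {
  for d in "$@"; do
    if [ -d "$d" ]; then
      # Count regular files directly inside the directory (including
      # hidden ones), ignoring subdirectories; strip wc padding.
      n=$(find "$d" -mindepth 1 -maxdepth 1 -type f | wc -l | tr -d ' ')
      printf '%s %s\n' "$n" "$d"
    else
      printf 'missing %s\n' "$d"
    fi
  done
}
```

For example, `spool_counts /usr/local/nagios/var/spool/*/` covers every spool subdirectory at once, so the exact directory names do not need to be typed.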