Nagios XI Performance Graphs Not Processing

sav2880
Posts: 60
Joined: Tue Mar 13, 2012 8:24 am

Nagios XI Performance Graphs Not Processing

Post by sav2880 »

I'm having some issues with performance graphs on my 2012R2.9 installation of Nagios XI. At the end of 2013, a lot of the host graphs just stopped updating. I've turned the logging for npcd.log and perfdata.log up to the maximum.

The npcd.log file is updating, but the perfdata.log file isn't. Here are the last few lines of both:

npcd.log

Code:

[10-06-2014 14:05:41] NPCD: No more files to process... waiting for 15 seconds
[10-06-2014 14:05:56] NPCD: Found 2 files in /usr/local/nagios/var/spool/perfdata/
[10-06-2014 14:05:56] NPCD: DEBUG: load 1.800000/25.000000
[10-06-2014 14:05:56] NPCD: ThreadCounter 0/5 File is .
[10-06-2014 14:05:56] NPCD: DEBUG: load 1.800000/25.000000
[10-06-2014 14:05:56] NPCD: ThreadCounter 0/5 File is ..
[10-06-2014 14:05:56] NPCD: No more files to process... waiting for 15 seconds
[10-06-2014 14:06:11] NPCD: Found 2 files in /usr/local/nagios/var/spool/perfdata/
[10-06-2014 14:06:11] NPCD: DEBUG: load 1.980000/25.000000
[10-06-2014 14:06:11] NPCD: ThreadCounter 0/5 File is .
[10-06-2014 14:06:11] NPCD: DEBUG: load 1.980000/25.000000
[10-06-2014 14:06:11] NPCD: ThreadCounter 0/5 File is ..
[10-06-2014 14:06:11] NPCD: No more files to process... waiting for 15 seconds
[10-06-2014 14:06:26] NPCD: Found 2 files in /usr/local/nagios/var/spool/perfdata/
[10-06-2014 14:06:26] NPCD: DEBUG: load 1.920000/25.000000
[10-06-2014 14:06:26] NPCD: ThreadCounter 0/5 File is .
[10-06-2014 14:06:26] NPCD: DEBUG: load 1.920000/25.000000
[10-06-2014 14:06:26] NPCD: ThreadCounter 0/5 File is ..
[10-06-2014 14:06:26] NPCD: No more files to process... waiting for 15 seconds
[10-06-2014 14:06:41] NPCD: Found 2 files in /usr/local/nagios/var/spool/perfdata/
[10-06-2014 14:06:41] NPCD: DEBUG: load 1.720000/25.000000
[10-06-2014 14:06:41] NPCD: ThreadCounter 0/5 File is .
[10-06-2014 14:06:41] NPCD: DEBUG: load 1.720000/25.000000
[10-06-2014 14:06:41] NPCD: ThreadCounter 0/5 File is ..
[10-06-2014 14:06:41] NPCD: No more files to process... waiting for 15 seconds
perfdata.log

Code:

2013-12-18 17:02:51 [3021] [0] *** process_perfdata.pl terminated on signal ALRM
2013-12-19 10:20:36 [15435] [0] *** TIMEOUT: Timeout after 5 secs. ***
2013-12-19 10:20:36 [15435] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-12-19 10:20:36 [15435] [0] *** TIMEOUT: Please check your npcd.cfg
2013-12-19 10:20:36 [15435] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1387466421.perfdata.service-PID-15435 deleted
2013-12-19 10:20:36 [15435] [0] *** Timeout while processing Host: "lvsclshdc1dn018" Service: "CentOS_Memory_Usage"
2013-12-19 10:20:36 [15435] [0] *** process_perfdata.pl terminated on signal ALRM
2013-12-24 07:01:31 [30347] [0] *** TIMEOUT: Timeout after 5 secs. ***
2013-12-24 07:01:31 [30347] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-12-24 07:01:31 [30347] [0] *** TIMEOUT: Please check your npcd.cfg
2013-12-24 07:01:31 [30347] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1387886481.perfdata.service-PID-30347 deleted
2013-12-24 07:01:31 [30347] [0] *** Timeout while processing Host: "VCSCDEVSQL01" Service: "Drive_D__Disk_Transfers_Per_Second"
2013-12-24 07:01:31 [30347] [0] *** process_perfdata.pl terminated on signal ALRM
2013-12-24 07:04:32 [2261] [0] *** TIMEOUT: Timeout after 5 secs. ***
2013-12-24 07:04:32 [2261] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-12-24 07:04:32 [2261] [0] *** TIMEOUT: Please check your npcd.cfg
2013-12-24 07:04:32 [2261] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1387886601.perfdata.service-PID-2261 deleted
2013-12-24 07:04:32 [2261] [0] *** Timeout while processing Host: "SCSTPRODSQL03_LVS" Service: "MSSQL_Free_Pages_Per_Sec"
2013-12-24 07:04:32 [2261] [0] *** process_perfdata.pl terminated on signal ALRM
2013-12-28 15:47:56 [6470] [0] *** TIMEOUT: Timeout after 5 secs. ***
2013-12-28 15:47:56 [6470] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-12-28 15:47:56 [6470] [0] *** TIMEOUT: Please check your npcd.cfg
2013-12-28 15:47:56 [6470] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1388263641.perfdata.service-PID-6470 deleted
2013-12-28 15:47:56 [6470] [0] *** Timeout while processing Host: "SCSTPRODSQL02_LVS" Service: "Drive_Z__Bytes_Per_Second"
2013-12-28 15:47:56 [6470] [0] *** process_perfdata.pl terminated on signal ALRM
The other thing I've seen is that the host-perfdata file and the service-perfdata files are 2GB in size. I can't seem to move or rename them (I'm still new to Linux), even with the Nagios service stopped. Those files haven't been modified since January and February, respectively.

I'm sure this is causing it not to show graphs, although I am hoping that it is still collecting the data (I see RRD files that look up to date and valid). What should I look at next to troubleshoot this?
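One way to see whether the timeouts cluster on particular hosts is to tally the "Timeout while processing" lines in perfdata.log. This is only a sketch that assumes the log line format shown above; the function name timeouts_by_host is made up for illustration.

```shell
# Hypothetical helper: tally "Timeout while processing" entries per host.
# Assumes the perfdata.log line format shown in this thread.
timeouts_by_host() {
    # Split each line on double quotes so that field 2 is the host name.
    awk -F'"' '/Timeout while processing Host:/ { n[$2]++ }
               END { for (h in n) print n[h], h }' "$1" | sort -k2
}
```

Calling it as `timeouts_by_host /usr/local/nagios/var/perfdata.log` would print one "count host" line per host that appears in a timeout entry.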

Thanks!
sav2880

Re: Nagios XI Performance Graphs Not Processing

Post by sav2880 »

Quick update ... all of the .xml and .rrd files show a date of 2013-12-31. Worried the data might be lost.
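To check how widespread the stale files are, one option is to list the .rrd files that have not been modified for more than a given number of days. This is a rough sketch: stale_rrds is a made-up helper name, and the directory to pass it depends on where the install keeps its RRD files, which is not stated in this thread.

```shell
# Hypothetical helper: list .rrd files under DIR whose last modification
# is more than DAYS days in the past.
stale_rrds() {
    dir=$1
    days=$2
    # -mtime +N matches files last modified more than N days ago.
    find "$dir" -type f -name '*.rrd' -mtime +"$days"
}
```

An empty result for a small DAYS value would suggest the RRD files are still being written; a long list would match the 2013-12-31 dates described above.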
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Nagios XI Performance Graphs Not Processing

Post by abrist »

sav2880 wrote: The other thing I've seen is that the host-perfdata file and the service-perfdata files are 2GB in size.
This is your issue. Those files are the spools for host and service check performance data. They are probably too big now for the perfdata scripts to parse.
sav2880 wrote: Worried the data might be lost.
This is most likely the case. There are obviously some results in the files mentioned above, but you will have a hard time getting the scripts to parse them because of their size. You most likely need to remove them and restart npcd. Just an FYI: that data is most likely not easily recoverable, and the commands below will delete the files.

Code:

rm /usr/local/nagios/var/service-perfdata
rm /usr/local/nagios/var/host-perfdata
service npcd restart
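Before deleting anything, it may be worth confirming which of the two files is actually oversized. A minimal sketch, assuming a find that supports -maxdepth (GNU and BSD find do); big_perfdata_files is a made-up name and the size threshold is yours to choose.

```shell
# Hypothetical helper: print host-perfdata / service-perfdata files in DIR
# that are larger than MIN_BYTES bytes.
big_perfdata_files() {
    dir=$1
    min_bytes=$2
    # The "c" suffix makes find compare the size in bytes.
    find "$dir" -maxdepth 1 -type f \
        \( -name host-perfdata -o -name service-perfdata \) \
        -size +"${min_bytes}c"
}
```

For example, `big_perfdata_files /usr/local/nagios/var 1000000000` would list either file if it is larger than roughly 1 GB.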
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
sav2880

Re: Nagios XI Performance Graphs Not Processing

Post by sav2880 »

Steps completed. I assume it should be re-creating those files, though, and perfdata.log should be updating. Neither is happening. Could some other service be misbehaving here?
abrist

Re: Nagios XI Performance Graphs Not Processing

Post by abrist »

Are those files getting populated and reaped every 15 or so seconds? They should increase in size, and then get reaped and reset to 0 size and then repeat.
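A quick way to watch for that grow-and-reset pattern is to sample a file's size a few times in a row. This is just a sketch; sample_size is a made-up helper name, and the count and interval are arbitrary.

```shell
# Hypothetical helper: print the size of FILE in bytes COUNT times,
# sleeping INTERVAL seconds between samples.
sample_size() {
    file=$1
    count=$2
    interval=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        if [ -f "$file" ]; then
            # Arithmetic expansion strips the padding some wc versions add.
            echo "$(( $(wc -c < "$file") ))"
        else
            echo "missing"
        fi
        i=$((i + 1))
        # Skip the sleep after the final sample.
        [ "$i" -lt "$count" ] && sleep "$interval"
    done
    return 0
}
```

Something like `sample_size /usr/local/nagios/var/service-perfdata 8 5` takes eight samples five seconds apart. Going by the description above, a healthy system should show the size climbing and then dropping back to 0; a constant "missing" would mean the file is not being created at all.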
sav2880

Re: Nagios XI Performance Graphs Not Processing

Post by sav2880 »

It doesn't appear to be doing so. The thing that I don't get is that the perfdata.log file isn't updating at all.

Here's that /usr/local/nagios/var directory:

Code:

total 52712
drwxrwxr-x. 2 nagios nagios    45056 Oct  6 00:00 archives
-rw-r--r--. 1 apache apache     2617 Jun 21  2012 graphapi.log
-rw-r--r--  1 nagios nagios        6 Oct  6 14:28 nagios.lock
-rw-rw-r--  1 nagios nagios  1825613 Oct  6 16:30 nagios.log
-rw-r--r--  1 nagios nagios        5 Oct  6 13:56 ndo2db.lock
-rw-rw-r--  1 nagios nagios        0 Oct  6 14:28 ndomod.tmp
srwxr-xr-x  1 nagios nagios        0 Oct  6 13:56 ndo.sock
-rw-r--r--  1 nagios nagios 10366225 Oct  6 16:31 npcd.log
-rw-r--r--  1 nagios nagios 10485763 Oct  5 03:40 npcd.log.old
-rw-r--r--. 1 nagios nagios  7264609 Oct  6 14:28 objects.cache
-rw-rw-rw-. 1 nagios nagios  1311070 Dec 28  2013 perfdata.log
-rw-------  1 nagios users  11336933 Oct  6 16:28 retention.dat
drwxrwsr-x. 2 nagios nagcmd     4096 Oct  6 14:28 rw
drwxr-xr-x. 5 nagios nagios     4096 Aug 29  2011 spool
drwxr-xr-x. 2 nagios nagios     4096 Dec 31  2013 stats
-rw-rw-r--  1 nagios users  11285251 Oct  6 16:31 status.dat
abrist

Re: Nagios XI Performance Graphs Not Processing

Post by abrist »

Can you post a tail of the following logs:

Code:

tail -25 /usr/local/nagios/var/perfdata.log
tail -25 /usr/local/nagios/var/npcd.log
sav2880

Re: Nagios XI Performance Graphs Not Processing

Post by sav2880 »

perfdata.log

Code:

2013-12-18 17:02:51 [3021] [0] *** process_perfdata.pl terminated on signal ALRM
2013-12-19 10:20:36 [15435] [0] *** TIMEOUT: Timeout after 5 secs. ***
2013-12-19 10:20:36 [15435] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-12-19 10:20:36 [15435] [0] *** TIMEOUT: Please check your npcd.cfg
2013-12-19 10:20:36 [15435] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1387466421.perfdata.service-PID-15435 deleted
2013-12-19 10:20:36 [15435] [0] *** Timeout while processing Host: "lvsclshdc1dn018" Service: "CentOS_Memory_Usage"
2013-12-19 10:20:36 [15435] [0] *** process_perfdata.pl terminated on signal ALRM
2013-12-24 07:01:31 [30347] [0] *** TIMEOUT: Timeout after 5 secs. ***
2013-12-24 07:01:31 [30347] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-12-24 07:01:31 [30347] [0] *** TIMEOUT: Please check your npcd.cfg
2013-12-24 07:01:31 [30347] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1387886481.perfdata.service-PID-30347 deleted
2013-12-24 07:01:31 [30347] [0] *** Timeout while processing Host: "VCSCDEVSQL01" Service: "Drive_D__Disk_Transfers_Per_Second"
2013-12-24 07:01:31 [30347] [0] *** process_perfdata.pl terminated on signal ALRM
2013-12-24 07:04:32 [2261] [0] *** TIMEOUT: Timeout after 5 secs. ***
2013-12-24 07:04:32 [2261] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-12-24 07:04:32 [2261] [0] *** TIMEOUT: Please check your npcd.cfg
2013-12-24 07:04:32 [2261] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1387886601.perfdata.service-PID-2261 deleted
2013-12-24 07:04:32 [2261] [0] *** Timeout while processing Host: "SCSTPRODSQL03_LVS" Service: "MSSQL_Free_Pages_Per_Sec"
2013-12-24 07:04:32 [2261] [0] *** process_perfdata.pl terminated on signal ALRM
2013-12-28 15:47:56 [6470] [0] *** TIMEOUT: Timeout after 5 secs. ***
2013-12-28 15:47:56 [6470] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-12-28 15:47:56 [6470] [0] *** TIMEOUT: Please check your npcd.cfg
2013-12-28 15:47:56 [6470] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1388263641.perfdata.service-PID-6470 deleted
2013-12-28 15:47:56 [6470] [0] *** Timeout while processing Host: "SCSTPRODSQL02_LVS" Service: "Drive_Z__Bytes_Per_Second"
2013-12-28 15:47:56 [6470] [0] *** process_perfdata.pl terminated on signal ALRM

npcd.log

Code:

[10-06-2014 20:09:02] NPCD: No more files to process... waiting for 15 seconds
[10-06-2014 20:09:17] NPCD: Found 2 files in /usr/local/nagios/var/spool/perfdata/
[10-06-2014 20:09:17] NPCD: DEBUG: load 1.030000/25.000000
[10-06-2014 20:09:17] NPCD: ThreadCounter 0/5 File is .
[10-06-2014 20:09:17] NPCD: DEBUG: load 1.030000/25.000000
[10-06-2014 20:09:17] NPCD: ThreadCounter 0/5 File is ..
[10-06-2014 20:09:17] NPCD: No more files to process... waiting for 15 seconds
[10-06-2014 20:09:32] NPCD: Found 2 files in /usr/local/nagios/var/spool/perfdata/
[10-06-2014 20:09:32] NPCD: DEBUG: load 0.800000/25.000000
[10-06-2014 20:09:32] NPCD: ThreadCounter 0/5 File is .
[10-06-2014 20:09:32] NPCD: DEBUG: load 0.800000/25.000000
[10-06-2014 20:09:32] NPCD: ThreadCounter 0/5 File is ..
[10-06-2014 20:09:32] NPCD: No more files to process... waiting for 15 seconds
[10-06-2014 20:09:47] NPCD: Found 2 files in /usr/local/nagios/var/spool/perfdata/
[10-06-2014 20:09:47] NPCD: DEBUG: load 0.620000/25.000000
[10-06-2014 20:09:47] NPCD: ThreadCounter 0/5 File is .
[10-06-2014 20:09:47] NPCD: DEBUG: load 0.620000/25.000000
[10-06-2014 20:09:47] NPCD: ThreadCounter 0/5 File is ..
[10-06-2014 20:09:47] NPCD: No more files to process... waiting for 15 seconds
[10-06-2014 20:10:02] NPCD: Found 2 files in /usr/local/nagios/var/spool/perfdata/
[10-06-2014 20:10:02] NPCD: DEBUG: load 0.970000/25.000000
[10-06-2014 20:10:02] NPCD: ThreadCounter 0/5 File is .
[10-06-2014 20:10:02] NPCD: DEBUG: load 0.970000/25.000000
[10-06-2014 20:10:02] NPCD: ThreadCounter 0/5 File is ..
[10-06-2014 20:10:02] NPCD: No more files to process... waiting for 15 seconds
So as mentioned, npcd.log is updating but perfdata.log is not. Debug logging is turned on for both.
abrist

Re: Nagios XI Performance Graphs Not Processing

Post by abrist »

How many files are in the following directories?

Code:

ls /usr/local/nagios/var/spool/xidpe | wc -l
ls /usr/local/nagios/var/spool/perfdata | wc -l
ls /usr/local/nagios/var/spool/checkresults | wc -l
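The same counts can be gathered in one pass with a small loop. This is a sketch only; count_entries is a made-up name. It uses find rather than ls so that dotfiles are included, and it reports a directory that does not exist instead of printing an error.

```shell
# Hypothetical helper: print "COUNT DIR" for each directory given,
# or "missing DIR" if the directory does not exist.
count_entries() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            # Count direct children only; -mindepth 1 excludes DIR itself.
            n=$(( $(find "$dir" -mindepth 1 -maxdepth 1 | wc -l) ))
            echo "$n $dir"
        else
            echo "missing $dir"
        fi
    done
}
```

Usage would look like `count_entries /usr/local/nagios/var/spool/xidpe /usr/local/nagios/var/spool/perfdata /usr/local/nagios/var/spool/checkresults`.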
sav2880

Re: Nagios XI Performance Graphs Not Processing

Post by sav2880 »

Code:

# ls /usr/local/nagios/var/spool/xidpe | wc -l
0
# ls /usr/local/nagios/var/spool/perfdata | wc -l
0
# ls /usr/local/nagios/var/spool/checkresults | wc -l
8690
#