performance data stopped again
Posted: Tue Apr 08, 2014 1:54 pm
I am really getting frustrated with these servers! My newest box (32-bit Nagios converted to 64-bit Nagios from a VM image) quit processing performance data at about 10:30 PM last night.
npcd is still running. tail npcd.log:
[04-08-2014 14:23:55] NPCD: Processing file '1396981418.perfdata.host'
[04-08-2014 14:23:55] NPCD: Processing file '1396981418.perfdata.service'
[04-08-2014 14:23:55] NPCD: Processing file '1396981432.perfdata.host'
[04-08-2014 14:23:55] NPCD: Processing file '1396981432.perfdata.service'
[04-08-2014 14:23:56] NPCD: No more files to process... waiting for 15 seconds
[04-08-2014 14:24:11] NPCD: No more files to process... waiting for 15 seconds
perfdata.log has not been updated since 10:50 PM last night. tail perfdata.log:
2014-04-07 22:29:02 [8618] [0] *** TIMEOUT: Please check your npcd.cfg
2014-04-07 22:29:02 [8618] [0] *** TIMEOUT: /var/nagiosramdisk/spool/perfdata//1396924115.perfdata.service-PID-8618 deleted
2014-04-07 22:29:02 [8618] [0] *** Timeout while processing Host: "00926_OB01-MP" Service: "memAvailMB"
2014-04-07 22:29:02 [8618] [0] *** process_perfdata.pl terminated on signal ALRM
2014-04-07 22:50:45 [25469] [0] *** TIMEOUT: Timeout after 5 secs. ***
2014-04-07 22:50:45 [25469] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2014-04-07 22:50:45 [25469] [0] *** TIMEOUT: Please check your npcd.cfg
2014-04-07 22:50:45 [25469] [0] *** TIMEOUT: /var/nagiosramdisk/spool/perfdata//1396925420.perfdata.service-PID-25469 deleted
2014-04-07 22:50:45 [25469] [0] *** Timeout while processing Host: "01639_HTTP" Service: "Ping"
2014-04-07 22:50:45 [25469] [0] *** process_perfdata.pl terminated on signal ALRM
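To see whether the timeouts cluster on particular hosts or services, the "Timeout while processing" lines can be tallied. This is only a rough sketch: summarize_timeouts is a made-up helper name, it assumes the log line format shown in the tail above, and the log path in the usage comment is an assumption about where perfdata.log lives on this box.

```shell
# Hypothetical helper: count "Timeout while processing" entries per
# host/service pair, most frequent first.
# Assumes lines of the form:
#   ... *** Timeout while processing Host: "NAME" Service: "NAME"
summarize_timeouts() {
    # $1: path to perfdata.log
    grep 'Timeout while processing' "$1" \
        | sed -E 's/.*Host: "([^"]*)" Service: "([^"]*)".*/\1 \/ \2/' \
        | sort | uniq -c | sort -rn
}

# Usage (path is an assumption, adjust to the actual location):
#   summarize_timeouts /usr/local/nagios/var/perfdata.log
```

If one or two pairs dominate the output, those checks are the first place to look; an even spread would point more toward the box as a whole being too slow.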
I compared the npcd.cfg of my working system to the one on the box that quit last night and there is no difference, so I don't know what I am supposed to check in my npcd.cfg!
I stopped and restarted npcd and ndo2db, no joy. Stopped and started all Nagios services and still no joy.
service nagiosxi stop
service npcd stop
service ndo2db stop
service nagios stop
service postgresql stop
service mysqld stop
service httpd stop
service httpd start
service mysqld start
service postgresql start
service nagios start
service ndo2db start
service npcd start
service nagiosxi start
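The same stop/start sequence can be wrapped in a small loop so the order is identical every time. This is a sketch only: run_services and reverse_list are made-up helper names, the service names and order are copied from the commands above, and setting RUN=echo just prints the commands instead of running them.

```shell
# Service names in the stop order used above.
SERVICES="nagiosxi npcd ndo2db nagios postgresql mysqld httpd"

# Hypothetical helper: run "service <name> <action>" for each name given,
# in order. With RUN=echo the commands are printed, not executed.
run_services() {
    action=$1; shift
    for svc in "$@"; do
        ${RUN:-} service "$svc" "$action" \
            || echo "warning: $action $svc failed" >&2
    done
}

# Hypothetical helper: reverse a list of words so that the start order
# mirrors the stop order.
reverse_list() {
    rev=""
    for item in "$@"; do rev="$item $rev"; done
    echo $rev
}

# Usage:
#   run_services stop $SERVICES
#   run_services start $(reverse_list $SERVICES)
```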
The server has not been restarted:
[root@LPNAGV04 var]# uptime
14:31:28 up 1 day, 6:56, 1 user, load average: 12.93, 9.26, 5.53
So I shut down all Nagios services again and ran repairmysql on nagios and nagiosql, then ran dbmaint. STILL not getting perfdata and perfdata.log has not been updated, yet files are coming and going in the perfdata spool directory.
[root@LPNAGV04 var]# ls -lt /var/nagiosramdisk/spool/perfdata/
total 328
-rw-rw-r-- 1 nagios nagios 175933 Apr 8 14:51 1396983103.perfdata.service-PID-23515
-rw-rw-r-- 1 nagios nagios 30763 Apr 8 14:51 1396983103.perfdata.host-PID-23516
-rw-rw-r-- 1 nagios nagios 95863 Apr 8 14:51 1396983088.perfdata.service-PID-23514
-rw-rw-r-- 1 nagios nagios 27680 Apr 8 14:51 1396983088.perfdata.host-PID-23513
[root@LPNAGV04 var]# ls -lt /var/nagiosramdisk/spool/perfdata/
total 268
-rw-rw-r-- 1 nagios nagios 175933 Apr 8 14:51 1396983103.perfdata.service-PID-23515
-rw-rw-r-- 1 nagios nagios 95863 Apr 8 14:51 1396983088.perfdata.service-PID-23514
[root@LPNAGV04 var]# ls -lt /var/nagiosramdisk/spool/perfdata/
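Rather than eyeballing repeated ls output, the spool can be counted directly. A rough sketch, with made-up helper names: spool_count counts every regular file in the directory, and spool_in_progress counts only the files with a -PID- suffix. I am assuming those are files a worker has already picked up, since the suffix matches the PID in the perfdata.log lines above.

```shell
# Hypothetical helper: number of regular files in a spool directory.
spool_count() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Hypothetical helper: number of files carrying a -PID-<n> suffix
# (assumed to be files currently claimed by a worker process).
spool_in_progress() {
    find "$1" -maxdepth 1 -type f -name '*-PID-*' | wc -l | tr -d ' '
}

# Usage, sampling every 5 seconds:
#   while sleep 5; do
#       echo "$(date +%T) total=$(spool_count /var/nagiosramdisk/spool/perfdata) in_progress=$(spool_in_progress /var/nagiosramdisk/spool/perfdata)"
#   done
```

A total that keeps climbing would mean files arrive faster than they are processed; a total that keeps dropping back to zero, as in the listings above, means the files are at least being consumed.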
I've missed something and am clueless as to what it is....