
All Host Graphs stopped several weeks ago.

Posted: Thu Apr 23, 2015 3:29 am
by nottheadmin
Hi, I left my Nagios XI unattended while I was on leave, and came back to find it had several issues. The disk was full and somebody had done a dirty reboot on it. I increased the disk size and repaired the database, and it was up and running again.

I have just noticed that my host graphs have stopped updating. The graphs do display, but they are not updating. I *think* that npcd is responsible for this. I have checked that it is running and restarted it anyway. What else might it be?

I'm using Nagios XI 2012R2.8c

Code: Select all

service npcd status
NPCD running (pid 18705).

Code: Select all

ps -aef | grep npcd
root     15515 22034  0 08:19 pts/0    00:00:00 grep npcd
nagios   18705     1  0 Apr22 ?        00:00:00 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg

Code: Select all

 ls /usr/local/nagios/var/spool/checkresults/ | wc -l
55

Code: Select all

ls /usr/local/nagios/var/spool/perfdata/ | wc -l
1

Code: Select all

ls /usr/local/nagios/var/spool/xidpe/ | wc -l
568792
cat /usr/local/nagios/etc/nagios.cfg | grep process_performance_data
process_performance_data=1

Code: Select all

 tail -25 /usr/local/nagios/var/npcd.log
[04-23-2015 07:53:12] NPCD: WARN: MAX load reached: load 11.210000/10.000000 at i=0
[04-23-2015 07:54:57] NPCD: WARN: MAX load reached: load 14.190000/10.000000 at i=0
[04-23-2015 07:55:12] NPCD: WARN: MAX load reached: load 11.120000/10.000000 at i=1
[04-23-2015 07:55:57] NPCD: WARN: MAX load reached: load 16.940000/10.000000 at i=0
[04-23-2015 07:56:12] NPCD: WARN: MAX load reached: load 13.250000/10.000000 at i=1
[04-23-2015 07:56:27] NPCD: WARN: MAX load reached: load 10.320000/10.000000 at i=1
[04-23-2015 07:58:12] NPCD: WARN: MAX load reached: load 10.290000/10.000000 at i=0
[04-23-2015 07:59:57] NPCD: WARN: MAX load reached: load 12.720000/10.000000 at i=0
[04-23-2015 08:00:57] NPCD: WARN: MAX load reached: load 17.140000/10.000000 at i=0
[04-23-2015 08:01:12] NPCD: WARN: MAX load reached: load 13.410000/10.000000 at i=1
[04-23-2015 08:01:27] NPCD: WARN: MAX load reached: load 10.440000/10.000000 at i=1
[04-23-2015 08:03:12] NPCD: WARN: MAX load reached: load 10.630000/10.000000 at i=0
[04-23-2015 08:04:57] NPCD: WARN: MAX load reached: load 13.740000/10.000000 at i=0
[04-23-2015 08:05:12] NPCD: WARN: MAX load reached: load 10.770000/10.000000 at i=1
[04-23-2015 08:05:57] NPCD: WARN: MAX load reached: load 19.210000/10.000000 at i=0
[04-23-2015 08:06:12] NPCD: WARN: MAX load reached: load 15.090000/10.000000 at i=1
[04-23-2015 08:06:27] NPCD: WARN: MAX load reached: load 11.740000/10.000000 at i=1
[04-23-2015 08:09:57] NPCD: WARN: MAX load reached: load 12.150000/10.000000 at i=0
[04-23-2015 08:10:57] NPCD: WARN: MAX load reached: load 15.530000/10.000000 at i=0
[04-23-2015 08:11:12] NPCD: WARN: MAX load reached: load 12.090000/10.000000 at i=1
[04-23-2015 08:14:57] NPCD: WARN: MAX load reached: load 12.250000/10.000000 at i=0
[04-23-2015 08:15:57] NPCD: WARN: MAX load reached: load 15.310000/10.000000 at i=0
[04-23-2015 08:16:12] NPCD: WARN: MAX load reached: load 11.990000/10.000000 at i=1
[04-23-2015 08:20:57] NPCD: WARN: MAX load reached: load 14.350000/10.000000 at i=0
[04-23-2015 08:21:12] NPCD: WARN: MAX load reached: load 11.240000/10.000000 at i=1

Code: Select all

[root@amcmnagios01 ~]# tail -f /usr/local/nagios/var/perfdata.log
2015-02-12 12:03:17 [19331] [0] *** TIMEOUT: Please check your npcd.cfg
2015-02-12 12:03:17 [19331] [0] *** TIMEOUT: Could not delete /usr/local/nagios/var/spool/perfdata//1423742552.perfdata.host-PID-19331:No such file or directory
2015-02-12 12:03:17 [19331] [0] *** Timeout while processing Host: "" Service: ""
2015-02-12 12:03:17 [19331] [0] *** process_perfdata.pl terminated on signal ALRM
2015-02-12 12:03:17 [19332] [0] *** TIMEOUT: Timeout after 5 secs. ***
2015-02-12 12:03:17 [19332] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-02-12 12:03:17 [19332] [0] *** TIMEOUT: Please check your npcd.cfg
2015-02-12 12:03:17 [19332] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1423742552.perfdata.service-PID-19332 deleted
2015-02-12 12:03:17 [19332] [0] *** Timeout while processing Host: "amcmhddc02" Service: "CPU_Usage"
2015-02-12 12:03:17 [19332] [0] *** process_perfdata.pl terminated on signal ALRM

I checked npcd.cfg; it has not been modified since Jan 2014. The graphs stopped updating at the end of Feb 2015.
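
For anyone comparing the "MAX load reached" warnings against their own config, here is a rough sketch of how the numbers could be checked. It assumes the limit in the log (10.0) comes from a `load_threshold` key in npcd.cfg and that the file uses plain `key = value` lines. The helper names `get_cfg_value` and `load_over_threshold` are made up for illustration and are not part of Nagios XI or PNP.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a "key = value" line in a config file.
# $1 = config file, $2 = key. Commented-out lines are ignored; last match wins.
get_cfg_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# Hypothetical helper: succeed (exit 0) if load $1 is greater than threshold $2.
# awk is used because plain sh cannot compare decimal numbers.
load_over_threshold() {
    awk -v l="$1" -v t="$2" 'BEGIN { exit !(l + 0 > t + 0) }'
}

# Example usage (assumes Linux and the npcd.cfg path shown in the ps output):
#   threshold=$(get_cfg_value /usr/local/nagios/etc/pnp/npcd.cfg load_threshold)
#   load=$(cut -d' ' -f1 /proc/loadavg)
#   load_over_threshold "$load" "$threshold" && echo "load is above the npcd limit"
```

If the 1-minute load is regularly above the configured value, that would match the warnings in npcd.log above.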

Anyone got any ideas what might be causing this?

Thanks

Matt


Update

I just found this on the forum, which seemed appropriate:

Code: Select all

service npcd stop

cd /usr/local/nagios/var/spool/xidpe

find . -type f -delete

service npcd start

It seems to be collecting data and graphing it now. I will keep an eye on it, but needless to say, I have had to delete all of my historic perf data, which is a shame, but never mind.
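
A less drastic variation on the cleanup above might be to remove only the older spool files instead of everything. This is just a sketch, not what I actually ran. It assumes GNU find (for `-mmin` and `-delete`), and the function names `count_spool_files` and `purge_old_spool_files` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count regular files under a spool directory.
# find copes better than "ls | wc -l" once a directory holds hundreds of
# thousands of entries.
count_spool_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Hypothetical helper: delete only files last modified more than $2 minutes ago.
# $1 = spool directory, $2 = age in minutes.
purge_old_spool_files() {
    find "$1" -type f -mmin +"$2" -delete
}

# Example usage, with npcd stopped first as in the commands above:
#   service npcd stop
#   count_spool_files /usr/local/nagios/var/spool/xidpe
#   purge_old_spool_files /usr/local/nagios/var/spool/xidpe 60
#   service npcd start
```

The 60 minute cutoff is an arbitrary example. Pick whatever age makes sense for your backlog.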

Re: All Host Graphs stopped several weeks ago.

Posted: Thu Apr 23, 2015 9:08 am
by nottheadmin
I believe that this has now fixed my issue. I hope that it helps someone else in future.

Re: All Host Graphs stopped several weeks ago.

Posted: Thu Apr 23, 2015 9:33 am
by tmcdonald
Sorry for not getting to you sooner - we start right around the time you posted your last post. Glad it worked out for you though!

I'll be closing this thread now, but feel free to open another if you need anything in the future!