Performance Graphs stopped working

Posted: Mon Oct 27, 2014 3:28 pm
by almonitoradmin
Hi,

We are having trouble with our performance graphing. The graphs stopped receiving new data about 6 days ago. They still render and show historical data, but the data ends at around 6 days ago.

I did notice this in the process list on the machine:

nagios 7365 84.5 0.0 122624 2232 ? R Oct20 8724:20 /usr/bin/perl /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1413819918.perfdata.service


Is that hung? Should it be killed perhaps?

Here are some more details:

Code: Select all

ll /usr/local/nagios/share/perfdata/

Code: Select all

total 308
drwxrwxr-x 2 nagios nagios 4096 May 27 08:37 10.47.157.6
drwxrwxr-x 2 nagios nagios 4096 May 27 10:02 10.47.157.7
drwxrwxr-x 2 nagios nagios 4096 Jun 19 12:05 10.87.154.69
drwxrwxr-x 2 nagios nagios 4096 Oct 20 10:44 172-31-1-0--10-164-128-0--tunnel
drwxrwxr-x 2 nagios nagios 4096 Jul 15 10:20 172.31.1.14
drwxrwxr-x 2 nagios nagios 4096 May 27 09:50 Awards-Catalog
drwxrwxr-x 2 nagios nagios 4096 Jun  2 14:12 Awards-Catalog-1
drwxrwxr-x 2 nagios nagios 4096 Jun  2 14:14 Awards-Catalog-2
drwxrwxr-x 2 nagios nagios 4096 Jul  7 12:01 kgpprodweb32
drwxrwxr-x 2 nagios nagios 4096 Jul 29 09:22 kgpprodweb47
drwxrwxr-x 2 nagios nagios 4096 Aug  4 13:38 kgprodweb30
drwxrwxr-x 2 nagios nagios 4096 May 29 09:30 kgprodweb31
drwxrwxr-x 2 nagios nagios 4096 Oct 20 10:45 kgprodweb46
drwxrwxr-x 2 nagios nagios 4096 Oct 20 10:45 kgprodweb47
drwxrwxr-x 2 nagios nagios 4096 Oct 20 10:45 kgprodweb48
drwxrwxr-x 2 nagios nagios 4096 Oct 20 10:45 kgprodwww01
drwxrwxr-x 2 nagios nagios 4096 Oct 20 10:45 kgprodwww02
drwxrwxr-x 2 nagios nagios 4096 Sep  3 11:29 VPN_Tunnel

Code: Select all

tail -25 /usr/local/nagios/var/perfdata.log

Code: Select all

[root@usawepvl011 xidpe]# tail -25 /usr/local/nagios/var/perfdata.log
2014-04-29 15:20:14 [21316] [0] *** TIMEOUT: Timeout after 5 secs. ***
2014-04-29 15:20:14 [21316] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2014-04-29 15:20:14 [21316] [0] *** TIMEOUT: Please check your npcd.cfg
2014-04-29 15:20:14 [21316] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1398802784.perfdata.host-PID-21316 deleted
2014-04-29 15:20:14 [21316] [0] *** Timeout while processing Host: "lglproddb01" Service: "_HOST_"
2014-04-29 15:20:14 [21316] [0] *** process_perfdata.pl terminated on signal ALRM
2014-04-29 15:20:14 [21315] [0] *** TIMEOUT: Timeout after 5 secs. ***
2014-04-29 15:20:14 [21315] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2014-04-29 15:20:14 [21315] [0] *** TIMEOUT: Please check your npcd.cfg
2014-04-29 15:20:14 [21315] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1398802784.perfdata.service-PID-21315 deleted
2014-04-29 15:20:14 [21315] [0] *** Timeout while processing Host: "lglprodweb01" Service: "_dev_xvde_Disk_Usage"
2014-04-29 15:20:14 [21315] [0] *** process_perfdata.pl terminated on signal ALRM
2014-04-29 15:23:37 [21957] [0] *** TIMEOUT: Timeout after 5 secs. ***
2014-04-29 15:23:37 [21957] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2014-04-29 15:23:37 [21957] [0] *** TIMEOUT: Please check your npcd.cfg
2014-04-29 15:23:37 [21957] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1398802934.perfdata.host-PID-21957 deleted
2014-04-29 15:23:37 [21957] [0] *** Timeout while processing Host: "" Service: ""
2014-04-29 15:23:37 [21957] [0] *** process_perfdata.pl terminated on signal ALRM

Code: Select all

tail -25 /usr/local/nagios/var/npcd.log

Code: Select all

[09-03-2014 14:32:12] NPCD: WARN: MAX load reached: load 12.830000/10.000000 at i=1
[09-03-2014 14:32:27] NPCD: WARN: MAX load reached: load 11.890000/10.000000 at i=1
[09-03-2014 14:33:13] NPCD: WARN: MAX load reached: load 10.850000/10.000000 at i=37
[09-09-2014 22:53:12] NPCD: WARN: MAX load reached: load 10.590000/10.000000 at i=0
[09-09-2014 22:53:27] NPCD: WARN: MAX load reached: load 13.550000/10.000000 at i=1
[09-09-2014 22:53:42] NPCD: WARN: MAX load reached: load 14.850000/10.000000 at i=1
[09-09-2014 22:53:57] NPCD: WARN: MAX load reached: load 13.720000/10.000000 at i=1
[09-09-2014 22:54:12] NPCD: WARN: MAX load reached: load 15.220000/10.000000 at i=1
[09-09-2014 22:54:27] NPCD: WARN: MAX load reached: load 16.860000/10.000000 at i=1
[09-09-2014 22:54:42] NPCD: WARN: MAX load reached: load 16.480000/10.000000 at i=1
[09-09-2014 22:54:57] NPCD: WARN: MAX load reached: load 14.660000/10.000000 at i=1
[09-09-2014 22:55:12] NPCD: WARN: MAX load reached: load 16.660000/10.000000 at i=1
[09-09-2014 22:55:27] NPCD: WARN: MAX load reached: load 17.560000/10.000000 at i=1
[09-09-2014 22:55:42] NPCD: WARN: MAX load reached: load 17.410000/10.000000 at i=1
[09-09-2014 22:55:57] NPCD: WARN: MAX load reached: load 15.900000/10.000000 at i=1
[09-09-2014 22:56:12] NPCD: WARN: MAX load reached: load 17.900000/10.000000 at i=1
[09-09-2014 22:56:27] NPCD: WARN: MAX load reached: load 16.240000/10.000000 at i=1
[09-09-2014 22:56:42] NPCD: WARN: MAX load reached: load 16.470000/10.000000 at i=1
[09-09-2014 22:56:57] NPCD: WARN: MAX load reached: load 13.960000/10.000000 at i=1
[09-09-2014 22:57:12] NPCD: WARN: MAX load reached: load 12.690000/10.000000 at i=1
[09-09-2014 22:57:27] NPCD: WARN: MAX load reached: load 11.100000/10.000000 at i=1
[09-27-2014 01:53:30] NPCD: Caught Termination Signal - Hasta la vista... baby
[09-27-2014 02:09:42] NPCD: npcd Daemon (0.4.14) started with PID=1279
[09-27-2014 02:09:42] NPCD: Please have a look at 'npcd -V' to get license information
[09-27-2014 02:09:42] NPCD: HINT: load_threshold is enabled - ('10.000000')




We are running with:

Nagios XI 2014r1.0
CentOS 6.5 64-bit
Manual XI install
Running SSL

Code: Select all

System:
Nagios XI Version : 2014R1.0
usawepvl011.ficticious.com 2.6.32-431.11.2.el6.x86_64 x86_64
CentOS release 6.5 (Final)
Gnome is not installed
Apache Information
PHP Version: 5.3.3
Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0
Server Name: nagios.ficticious.com
Server Address: 10.164.130.252
Server Port: 443
Date/Time
PHP Timezone: America/Chicago
PHP Time: Mon, 27 Oct 2014 15:14:37 -0500
System Time: Mon, 27 Oct 2014 15:14:37 -0500
Nagios XI Data
License ends in: MTSVNN

nagios (pid 23514) is running...
NPCD running (pid 1279).
ndo2db (pid 1337) is running...
CPU Load 15: 7.39
Total Hosts: 36
Total Services: 473
Function 'get_base_uri' returns: https://nagios.ficticious.com/nagiosxi/
Function 'get_base_url' returns: https://nagios.ficticious.com/nagiosxi/
Function 'get_backend_url(internal_call=false)' returns: https://nagios.ficticious.com/nagiosxi/includes/components/profile/profile.php
Function 'get_backend_url(internal_call=true)' returns: http://localhost/nagiosxi/backend/
Ping Test localhost
Running:

/bin/ping -c 3 localhost 2>&1 

PING localhost.localdomain (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost.localdomain (127.0.0.1): icmp_seq=1 ttl=64 time=0.014 ms
64 bytes from localhost.localdomain (127.0.0.1): icmp_seq=2 ttl=64 time=0.018 ms
64 bytes from localhost.localdomain (127.0.0.1): icmp_seq=3 ttl=64 time=0.018 ms

--- localhost.localdomain ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2006ms
rtt min/avg/max/mdev = 0.014/0.016/0.018/0.005 ms
Test wget To locahost
WGET From URL: http://localhost/nagiosql/index.php
Running:

/usr/bin/wget http://localhost/nagiosql/index.php 

--2014-10-27 15:14:40-- http://localhost/nagiosql/index.php
Resolving localhost... 127.0.0.1
Connecting to localhost|127.0.0.1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5259 (5.1K) [text/html]
Saving to: "/usr/local/nagiosxi/tmp/nagiosql_index.tmp"

0K ..... 100% 345M=0s

2014-10-27 15:14:40 (345 MB/s) - "/usr/local/nagiosxi/tmp/nagiosql_index.tmp" saved [5259/5259]

Re: Performance Graphs stopped working

Posted: Mon Oct 27, 2014 4:41 pm
by abrist
almonitoradmin wrote:
nagios 7365 84.5 0.0 122624 2232 ? R Oct20 8724:20 /usr/bin/perl /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1413819918.perfdata.service
It has been running since the 20th, so kill it:
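One cautious way to do that is sketched below: ask the process to exit with SIGTERM first and only fall back to SIGKILL if it is still there after a short grace period. The helper name `stop_pid` and the 5-second default are illustrative assumptions, not part of Nagios XI or PNP, and PID 7365 is simply the one from the ps output quoted above, so confirm it still belongs to process_perfdata.pl before running anything.

```shell
# stop_pid PID [GRACE_SECONDS]   (hypothetical helper, not a Nagios tool)
# Send SIGTERM, wait up to GRACE_SECONDS (default 5) for the process to exit,
# then fall back to SIGKILL. Returns 0 once the process is gone.
stop_pid() {
    pid=$1
    grace=${2:-5}

    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    kill -0 "$pid" 2>/dev/null || return 0

    kill -TERM "$pid" 2>/dev/null
    i=0
    while [ "$i" -lt "$grace" ]; do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        i=$((i + 1))
    done

    # Still alive after the grace period: force it.
    kill -KILL "$pid" 2>/dev/null
    sleep 1
    ! kill -0 "$pid" 2>/dev/null
}

# Example (as root or the nagios user; PID from the ps output quoted above):
# stop_pid 7365
```

Trying SIGTERM first gives the Perl script a chance to exit on its own; SIGKILL is only the fallback.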
Let's check to see how big the spool files have become:

Code: Select all

ls -lah /usr/local/nagios/var/host-perfdata
ls -lah /usr/local/nagios/var/service-perfdata
And let's check the file counts in the spooler folders:

Code: Select all

ls /usr/local/nagios/var/spool/xidpe/ | wc -l
ls /usr/local/nagios/var/spool/perfdata/ | wc -l
ls /usr/local/nagios/var/spool/checkresults/ | wc -l
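A raw count alone does not show whether the files are a backlog or just current traffic. As a rough sketch, counting only the files older than some cutoff can help tell the two apart. The function name `count_stale` and the 60-minute default are assumptions made for illustration; they are not part of Nagios XI or PNP.

```shell
# count_stale DIR [MINUTES]   (hypothetical helper, not a Nagios tool)
# Print how many regular files directly inside DIR were last modified more
# than MINUTES minutes ago (default 60), using find's -mmin test.
count_stale() {
    dir=$1
    minutes=${2:-60}
    find "$dir" -maxdepth 1 -type f -mmin +"$minutes" | wc -l
}

# Example:
# count_stale /usr/local/nagios/var/spool/perfdata/ 60
```

A large number here would suggest that files are arriving in the spool directory faster than they are being processed.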
Is npcd running?

Code: Select all

service npcd status
ps -aef | grep npcd

Re: Performance Graphs stopped working

Posted: Mon Oct 27, 2014 8:48 pm
by almonitoradmin

Code: Select all

ls -lah /usr/local/nagios/var/host-perfdata
-rw-r--r-- 1 nagios users 933 Oct 27 20:41 /usr/local/nagios/var/host-perfdata

Code: Select all

ls -lah /usr/local/nagios/var/service-perfdata
-rw-r--r-- 1 nagios users 7.1K Oct 27 20:42 /usr/local/nagios/var/service-perfdata

Code: Select all

ls /usr/local/nagios/var/spool/xidpe/ | wc -l
2


Code: Select all

ls /usr/local/nagios/var/spool/perfdata/ | wc -l
82997

Code: Select all

ls /usr/local/nagios/var/spool/checkresults/ | wc -l
0

Code: Select all

service npcd status
NPCD running (pid 1279).


Code: Select all

ps -aef | grep npcd
nagios 1279 1 0 Sep27 ? 00:03:17 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg

RESOLVED: Re: Performance Graphs stopped working

Posted: Tue Oct 28, 2014 9:26 am
by almonitoradmin
I checked the graphs this morning, and they are all working.

Thanks for the help!

- Jamie

Re: Performance Graphs stopped working

Posted: Tue Oct 28, 2014 11:40 am
by cmerchant
Glad to see it's working for you. I'll go ahead and lock the thread. Thanks.