Nagios suddenly stopped displaying historical data

Posted: Tue Feb 03, 2015 2:24 pm
by aleksl
Hi, our Nagios suddenly stopped collecting historical performance data into the graphs as of last week. I checked the commit log for the config and nothing was changed that day. We did patch glibc on the RH server running Nagios that morning and rebooted.

Alerting is still active, and we get alerts on service monitors and counters, but within the Nagios graph explorer, nothing is new since that day.

Re: Nagios suddenly stopped displaying historical data

Posted: Tue Feb 03, 2015 2:33 pm
by aleksl
[root@10-165-2-24 OCASHVRGWA01P]# tail -25 ../../../var/perfdata.log
2015-01-21 14:47:23 [1982] [0] *** process_perfdata.pl terminated on signal ALRM
2015-01-21 15:03:00 [5988] [0] *** TIMEOUT: Timeout after 15 secs. ***
2015-01-21 15:03:00 [5988] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-01-21 15:03:00 [5988] [0] *** TIMEOUT: Please check your npcd.cfg
2015-01-21 15:03:00 [5988] [0] *** TIMEOUT: /var/nagiosramdisk/spool/perfdata//1421870556.perfdata.service-PID-5988 deleted
2015-01-21 15:03:00 [5988] [0] *** Timeout while processing Host: "OCASHVRGWA02P" Service: "Counter__VM_CPU_Time_Stolen_Total"
2015-01-21 15:03:00 [5988] [0] *** process_perfdata.pl terminated on signal ALRM
2015-01-21 15:03:00 [5987] [0] *** TIMEOUT: Timeout after 15 secs. ***
2015-01-21 15:03:00 [5987] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-01-21 15:03:00 [5987] [0] *** TIMEOUT: Please check your npcd.cfg
2015-01-21 15:03:00 [5987] [0] *** TIMEOUT: /var/nagiosramdisk/spool/perfdata//1421870556.perfdata.host-PID-5987 deleted
2015-01-21 15:03:00 [5987] [0] *** Timeout while processing Host: "OCASHVRGWA02P" Service: "_HOST_"
2015-01-21 15:03:00 [5987] [0] *** process_perfdata.pl terminated on signal ALRM
2015-01-23 03:12:58 [25828] [0] *** TIMEOUT: Timeout after 15 secs. ***
2015-01-23 03:12:58 [25828] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-01-23 03:12:58 [25828] [0] *** TIMEOUT: Please check your npcd.cfg
2015-01-23 03:12:58 [25828] [0] *** TIMEOUT: /var/nagiosramdisk/spool/perfdata//1422000756.perfdata.service-PID-25828 deleted
2015-01-23 03:12:58 [25828] [0] *** Timeout while processing Host: "OCASHVRGBPM01P" Service: "Counter__SmarterMail_Outgoing_Messages"
2015-01-23 03:12:58 [25828] [0] *** process_perfdata.pl terminated on signal ALRM
2015-01-23 03:15:04 [30407] [0] *** TIMEOUT: Timeout after 15 secs. ***
2015-01-23 03:15:04 [30407] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-01-23 03:15:04 [30407] [0] *** TIMEOUT: Please check your npcd.cfg
2015-01-23 03:15:04 [30407] [0] *** TIMEOUT: /var/nagiosramdisk/spool/perfdata//1422000878.perfdata.service-PID-30407 deleted
2015-01-23 03:15:04 [30407] [0] *** Timeout while processing Host: "OCASHVRGDB02S" Service: "Drive_D__Disk_Usage"
2015-01-23 03:15:04 [30407] [0] *** process_perfdata.pl terminated on signal ALRM
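
To see at a glance which hosts and services are hitting the timeout, the log above could be summarized with something like the sketch below. The function name `summarize_timeouts` is made up for illustration, and it assumes the log lines keep the `Timeout while processing Host: "..." Service: "..."` layout shown in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count process_perfdata.pl timeouts per host/service.
# Assumes lines of the form:
#   ... *** Timeout while processing Host: "H" Service: "S"
summarize_timeouts() {
    local logfile="$1"
    # Splitting on double quotes puts the host in field 2 and the service in field 4.
    awk -F'"' '/Timeout while processing Host:/ { n[$2 " / " $4]++ }
               END { for (k in n) print n[k], k }' "$logfile" | sort -rn
}

# Example call (path as used in this thread, relative to the Nagios var directory):
#   summarize_timeouts /usr/local/nagios/var/perfdata.log
```

Each output line is a count followed by `host / service`, most frequent first.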

Re: Nagios suddenly stopped displaying historical data

Posted: Tue Feb 03, 2015 2:34 pm
by aleksl
[root@10-165-2-24 var]# service npcd restart
NPCD Stopped.
NPCD started.
[root@10-165-2-24 var]# ls

--------

[root@10-165-2-24 var]# tail npcd.log
[01-29-2015 12:18:00] NPCD: Please have a look at 'npcd -V' to get license information
[01-29-2015 12:18:00] NPCD: HINT: load_threshold is enabled - ('10.000000')
[02-03-2015 14:18:22] NPCD: Caught Termination Signal - Hasta la vista... baby
[02-03-2015 14:18:51] NPCD: npcd Daemon (0.4.14) started with PID=1686
[02-03-2015 14:18:51] NPCD: Please have a look at 'npcd -V' to get license information
[02-03-2015 14:18:51] NPCD: HINT: load_threshold is enabled - ('10.000000')
[02-03-2015 14:31:16] NPCD: Caught Termination Signal - Hasta la vista... baby
[02-03-2015 14:31:16] NPCD: npcd Daemon (0.4.14) started with PID=23550
[02-03-2015 14:31:16] NPCD: Please have a look at 'npcd -V' to get license information
[02-03-2015 14:31:16] NPCD: HINT: load_threshold is enabled - ('10.000000')




---

[root@10-165-2-24 spool]# ls xidpe/ | wc -l
74
[root@10-165-2-24 spool]# ls perfdata/ | wc -l
0
[root@10-165-2-24 spool]# ls checkresults/ | wc -l
158

Re: Nagios suddenly stopped displaying historical data

Posted: Tue Feb 03, 2015 2:36 pm
by scottwilkerson
Let's start by running

Code: Select all

service npcd restart
Also, let's run the following and post back

Code: Select all

df -h
ls /var/nagiosramdisk/spool/perfdata/|wc -l
ls /var/nagiosramdisk/spool/xidpe/|wc -l

Re: Nagios suddenly stopped displaying historical data

Posted: Tue Feb 03, 2015 2:39 pm
by aleksl
I also posted some outputs above.

Code: Select all

[root@10-165-2-24 var]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/volGroup00-rootVol00
                       28G  6.4G   20G  25% /
tmpfs                 3.9G     0  3.9G   0% /dev/shm
/dev/sda1             190M  131M   50M  73% /boot
tmpfs                 2.0G  2.2M  2.0G   1% /var/nagiosramdisk
[root@10-165-2-24 var]# ls /var/nagiosramdisk/spool/perfdata/ | wc -l
0
[root@10-165-2-24 var]# ls /var/nagiosramdisk/spool/xidpe/ | wc -l
0
[root@10-165-2-24 var]#

Re: Nagios suddenly stopped displaying historical data

Posted: Tue Feb 03, 2015 2:41 pm
by scottwilkerson
Can you run the following to see if the dates are current?

Code: Select all

ls -l  /usr/local/nagios/share/perfdata/*/*.rrd
And also

Code: Select all

ls -ld  /usr/local/nagios/share/
ls -ld  /usr/local/nagios/share/perfdata

Re: Nagios suddenly stopped displaying historical data

Posted: Tue Feb 03, 2015 2:46 pm
by aleksl
For the first command, it looks like the last update was Jan 29th for most of the RRDs (a few are defunct).

Code: Select all

-rwxrwxr-x 1 nagios nagios  384952 Jan 19  2013 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/ASP.NET_Requests_Second.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 23  2013 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__ASP.NET_Exceptions-Second.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__ASP.NET_Requests_Current.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__ASP.NET_Requests_Queued.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__ASP.NET_Requests_Second.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__ASP.NET_Request_Wait_Time.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__Disk_Time_Percentage_Total.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__IIS_Current_Worker_Processes.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__IIS_VSuite_App_Pool_Uptime.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__IIS_Worker_Processes_Created.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__IIS_Worker_Process_Failures.rrd
-rwxrwxr-x 1 nagios nagios  384960 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__Memory_Pages_Second.rrd
-rwxrwxr-x 1 nagios nagios  384952 Feb 25  2013 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__Memory_Pages-Second.rrd
-rwxrwxr-x 1 nagios nagios  384960 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__Network_Bytes_Received_Sec.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__Processor_Queue_Length.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__Session_SQL_Server_Connections_Total.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__VM_CPU_Time_Stolen_Total.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 23  2013 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__VM_Processor_Time_Percentage_Total.rrd
-rwxrwxr-x 1 nagios nagios  384960 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__Web_Service_Total_Bytes_Second.rrd
-rwxrwxr-x 1 nagios nagios  384952 Feb 25  2013 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__Web_Service_Total_Bytes-Second.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__Web_Service_Total_Current_Connections.rrd
-rwxrwxr-x 1 nagios nagios  384960 Jul  2  2014 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter_Web_Service_Total_Current_Connections.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__Web_Service_VSuite_Current_Connections.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/CPU_Usage__1_Minute_Average.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/CPU_Usage__5_Minute_Average.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 18  2013 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/CPU_Usage.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Drive_C__Disk_Usage.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Drive_D__Disk_Usage.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Drive_E__Disk_Usage.rrd
-rwxrwxr-x 1 nagios nagios  384960 Jan 29 11:59 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Event_Logs__App_Pool_Crashes.rrd
-rwxrwxr-x 1 nagios nagios  768224 Jun  5  2014 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/_HOST_.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Memory_Usage__All.rrd
-rwxrwxr-x 1 nagios nagios  768232 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Memory_Usage__Physical.rrd
-rwxrwxr-x 1 nagios nagios  384960 Feb 22  2013 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Memory_Usage.rrd
-rwxrwxr-x 1 nagios nagios  768232 Feb 22  2013 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Physical_Memory_Usage.rrd
-rwxrwxr-x 1 nagios nagios  768224 Jun  5  2014 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Ping.rrd
-rwxrwxr-x 1 nagios nagios  384960 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Uptime.rrd
And for the latter:

Code: Select all

[root@10-165-2-24 var]# ls -ld /usr/local/nagios/share/
drwxrwxr-x 14 nagios nagios 4096 Jun 27  2014 /usr/local/nagios/share/
[root@10-165-2-24 var]# ls -ld /usr/local/nagios/share/perfdata/
drwxrwxr-x 70 nagios nagios 4096 Jan 22 16:05 /usr/local/nagios/share/perfdata/
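
Rather than eyeballing the dates in a long `ls -l` listing, the stale RRD files can be picked out with `find`. This is only a sketch: the function name `stale_rrds` and the 60-minute default are arbitrary choices for illustration, not anything Nagios or PNP ships.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list .rrd files not modified in the last N minutes.
# $1 = perfdata directory, $2 = age threshold in minutes (default 60, an arbitrary choice).
stale_rrds() {
    local dir="$1" minutes="${2:-60}"
    # -mmin +N matches files whose last modification is more than N minutes ago.
    find "$dir" -type f -name '*.rrd' -mmin +"$minutes" | sort
}

# Example call (path as used in this thread):
#   stale_rrds /usr/local/nagios/share/perfdata 60
```

Long-defunct files (for example, ones last touched in 2013) will show up too, so the interesting part is whether any actively checked service is missing from the output.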

Re: Nagios suddenly stopped displaying historical data

Posted: Tue Feb 03, 2015 3:11 pm
by lmiltchev
Let's try restarting nagios and npcd, and see how many files are in the "spool" sub-directories at the moment. Run the following commands and show us the output:

Code: Select all

service nagios restart
service nagios status
service npcd restart
service npcd status
ls /usr/local/nagios/var/spool/xidpe | wc -l
ls /usr/local/nagios/var/spool/perfdata | wc -l
ls /usr/local/nagios/var/spool/checkresults | wc -l

Re: Nagios suddenly stopped displaying historical data

Posted: Tue Feb 03, 2015 3:15 pm
by aleksl
Done, see below.

Code: Select all

[root@10-165-2-24 spool]# pwd
/usr/local/nagios/var/spool
[root@10-165-2-24 spool]# service nagios restart
Running configuration check...done.
Stopping nagios: .done.
Starting nagios: done.
[root@10-165-2-24 spool]# service nagios status
nagios (pid 32599) is running...
[root@10-165-2-24 spool]# service npcd restart
NPCD Stopped.
NPCD started.
[root@10-165-2-24 spool]# service npcd status
NPCD running (pid 710).
[root@10-165-2-24 spool]# ls xidpe/ | wc -l
74
[root@10-165-2-24 spool]# ls perfdata/ | wc -l
0
[root@10-165-2-24 spool]# ls checkresults/ | wc -l
158
[root@10-165-2-24 spool]#
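
Since two different spool locations have come up in this thread (`/var/nagiosramdisk/spool` and `/usr/local/nagios/var/spool`), it may help to count both side by side. A rough sketch, with `count_spool` being a made-up name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of entries in each spool sub-directory
# under every spool root passed as an argument.
count_spool() {
    local root sub n
    for root in "$@"; do
        for sub in xidpe perfdata checkresults; do
            if [ -d "$root/$sub" ]; then
                # Counts every direct entry, including dotfiles (unlike "ls | wc -l").
                n=$(( $(find "$root/$sub" -mindepth 1 -maxdepth 1 | wc -l) ))
                printf '%s/%s: %d\n' "$root" "$sub" "$n"
            else
                printf '%s/%s: missing\n' "$root" "$sub"
            fi
        done
    done
}

# Example call with the two locations mentioned in this thread:
#   count_spool /usr/local/nagios/var/spool /var/nagiosramdisk/spool
```

Running it a couple of times a minute apart shows which location is actually receiving and draining files.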

Re: Nagios suddenly stopped displaying historical data

Posted: Tue Feb 03, 2015 3:16 pm
by scottwilkerson
This is strange. Can you open a ticket with [email protected], include your profile from Admin -> System profile, and reference this thread?

It may be possible that the performance data entries in your nagios.cfg were modified.

I have also noticed some references to a RAM disk having been set up, and others that lead me to believe it may not be completely set up.
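
In the meantime, a quick way to see what the performance data entries currently look like is to pull the active (uncommented) perfdata directives out of nagios.cfg. This is just a sketch: `show_perfdata_settings` is a made-up name, and the config path in the example is the usual default, which is an assumption about this install.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the uncommented performance-data directives
# (process_performance_data, host_perfdata_*, service_perfdata_*) from a nagios.cfg.
show_perfdata_settings() {
    local cfg="$1"
    # Lines starting with "#" are skipped because the pattern is anchored at line start.
    grep -E '^[[:space:]]*(process_performance_data|(host|service)_perfdata[a-z_]*)[[:space:]]*=' "$cfg"
}

# Example call (default path assumed, adjust if yours differs):
#   show_perfdata_settings /usr/local/nagios/etc/nagios.cfg
```

If the perfdata file paths point at the RAM disk while npcd is watching the other spool directory (or the other way round), that would explain files never reaching the RRDs.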