Nagios suddenly stopped displaying historical data
Hi, our Nagios suddenly stopped collecting historical performance data into the graphs as of last week. I checked the commit log for the config and nothing was changed that day. We did patch glibc on the RH server running Nagios that morning and rebooted.
Alerting is still active, and we get alerts on service monitors and counters, but within the Nagios graph explorer, nothing is new since that day.
Re: Nagios suddenly stopped displaying historical data
[root@10-165-2-24 OCASHVRGWA01P]# tail -25 ../../../var/perfdata.log
2015-01-21 14:47:23 [1982] [0] *** process_perfdata.pl terminated on signal ALRM
2015-01-21 15:03:00 [5988] [0] *** TIMEOUT: Timeout after 15 secs. ***
2015-01-21 15:03:00 [5988] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-01-21 15:03:00 [5988] [0] *** TIMEOUT: Please check your npcd.cfg
2015-01-21 15:03:00 [5988] [0] *** TIMEOUT: /var/nagiosramdisk/spool/perfdata//1421870556.perfdata.service-PID-5988 deleted
2015-01-21 15:03:00 [5988] [0] *** Timeout while processing Host: "OCASHVRGWA02P" Service: "Counter__VM_CPU_Time_Stolen_Total"
2015-01-21 15:03:00 [5988] [0] *** process_perfdata.pl terminated on signal ALRM
2015-01-21 15:03:00 [5987] [0] *** TIMEOUT: Timeout after 15 secs. ***
2015-01-21 15:03:00 [5987] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-01-21 15:03:00 [5987] [0] *** TIMEOUT: Please check your npcd.cfg
2015-01-21 15:03:00 [5987] [0] *** TIMEOUT: /var/nagiosramdisk/spool/perfdata//1421870556.perfdata.host-PID-5987 deleted
2015-01-21 15:03:00 [5987] [0] *** Timeout while processing Host: "OCASHVRGWA02P" Service: "_HOST_"
2015-01-21 15:03:00 [5987] [0] *** process_perfdata.pl terminated on signal ALRM
2015-01-23 03:12:58 [25828] [0] *** TIMEOUT: Timeout after 15 secs. ***
2015-01-23 03:12:58 [25828] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-01-23 03:12:58 [25828] [0] *** TIMEOUT: Please check your npcd.cfg
2015-01-23 03:12:58 [25828] [0] *** TIMEOUT: /var/nagiosramdisk/spool/perfdata//1422000756.perfdata.service-PID-25828 deleted
2015-01-23 03:12:58 [25828] [0] *** Timeout while processing Host: "OCASHVRGBPM01P" Service: "Counter__SmarterMail_Outgoing_Messages"
2015-01-23 03:12:58 [25828] [0] *** process_perfdata.pl terminated on signal ALRM
2015-01-23 03:15:04 [30407] [0] *** TIMEOUT: Timeout after 15 secs. ***
2015-01-23 03:15:04 [30407] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-01-23 03:15:04 [30407] [0] *** TIMEOUT: Please check your npcd.cfg
2015-01-23 03:15:04 [30407] [0] *** TIMEOUT: /var/nagiosramdisk/spool/perfdata//1422000878.perfdata.service-PID-30407 deleted
2015-01-23 03:15:04 [30407] [0] *** Timeout while processing Host: "OCASHVRGDB02S" Service: "Drive_D__Disk_Usage"
2015-01-23 03:15:04 [30407] [0] *** process_perfdata.pl terminated on signal ALRM
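The repeated timeouts are easier to read when grouped by host and service. A minimal sketch, assuming the log line format shown in the excerpt above; `summarize_timeouts` is a hypothetical helper name, not something shipped with Nagios or PNP4Nagios:

```shell
# Hypothetical helper: count "Timeout while processing" entries per host/service.
# Assumes the perfdata.log line format shown in the excerpt above, where the
# host and service names are the first and second double-quoted fields.
summarize_timeouts() {
  awk -F'"' '
    /Timeout while processing Host:/ { count[$2 " " $4]++ }
    END { for (key in count) print count[key], key }
  ' "$1" | sort -k1,1nr -k2
}

# Usage: summarize_timeouts /path/to/perfdata.log
```

Each output line is a count followed by the host and service name, most frequent first. A timeout list spread across many unrelated hosts would point more toward a general processing problem than toward one bad check.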
Re: Nagios suddenly stopped displaying historical data
[root@10-165-2-24 var]# service npcd restart
NPCD Stopped.
NPCD started.
[root@10-165-2-24 var]# ls
--------
[root@10-165-2-24 var]# tail npcd.log
[01-29-2015 12:18:00] NPCD: Please have a look at 'npcd -V' to get license information
[01-29-2015 12:18:00] NPCD: HINT: load_threshold is enabled - ('10.000000')
[02-03-2015 14:18:22] NPCD: Caught Termination Signal - Hasta la vista... baby
[02-03-2015 14:18:51] NPCD: npcd Daemon (0.4.14) started with PID=1686
[02-03-2015 14:18:51] NPCD: Please have a look at 'npcd -V' to get license information
[02-03-2015 14:18:51] NPCD: HINT: load_threshold is enabled - ('10.000000')
[02-03-2015 14:31:16] NPCD: Caught Termination Signal - Hasta la vista... baby
[02-03-2015 14:31:16] NPCD: npcd Daemon (0.4.14) started with PID=23550
[02-03-2015 14:31:16] NPCD: Please have a look at 'npcd -V' to get license information
[02-03-2015 14:31:16] NPCD: HINT: load_threshold is enabled - ('10.000000')
---
[root@10-165-2-24 spool]# ls xidpe/ | wc -l
74
[root@10-165-2-24 spool]# ls perfdata/ | wc -l
0
[root@10-165-2-24 spool]# ls checkresults/ | wc -l
158
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Nagios suddenly stopped displaying historical data
Let's start by running
Code: Select all
service npcd restart
Also, let's run the following and post back
Code: Select all
df -h
ls /var/nagiosramdisk/spool/perfdata/ | wc -l
ls /var/nagiosramdisk/spool/xidpe/ | wc -l
Re: Nagios suddenly stopped displaying historical data
I also posted some outputs above.
Code: Select all
[root@10-165-2-24 var]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/volGroup00-rootVol00
28G 6.4G 20G 25% /
tmpfs 3.9G 0 3.9G 0% /dev/shm
/dev/sda1 190M 131M 50M 73% /boot
tmpfs 2.0G 2.2M 2.0G 1% /var/nagiosramdisk
[root@10-165-2-24 var]# ls /var/nagiosramdisk/spool/perfdata/ | wc -l
0
[root@10-165-2-24 var]# ls /var/nagiosramdisk/spool/xidpe/ | wc -l
0
[root@10-165-2-24 var]#
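Counting both candidate spool trees side by side may make it clearer which one Nagios is actually writing to. A rough sketch; `count_spool_files` is a hypothetical helper, and it assumes a `find` that supports `-mindepth`/`-maxdepth` (GNU and BSD versions do):

```shell
# Hypothetical helper: print the number of entries in each directory given,
# or "missing" if the directory does not exist.
count_spool_files() {
  for d in "$@"; do
    if [ -d "$d" ]; then
      # Count direct children only; tr strips padding some wc versions add.
      n=$(find "$d" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' ')
      printf '%s %s\n' "$n" "$d"
    else
      printf 'missing %s\n' "$d"
    fi
  done
}

# Usage, with the two spool roots that appear in this thread:
# count_spool_files /var/nagiosramdisk/spool/perfdata /var/nagiosramdisk/spool/xidpe \
#                   /usr/local/nagios/var/spool/perfdata /usr/local/nagios/var/spool/xidpe
```

If files pile up under one root while the other stays at 0, that would suggest the writer and the reader are configured with different paths.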
scottwilkerson
Re: Nagios suddenly stopped displaying historical data
Can you run the following to see if the dates are current?
Code: Select all
ls -l /usr/local/nagios/share/perfdata/*/*.rrd
And also
Code: Select all
ls -ld /usr/local/nagios/share/
ls -ld /usr/local/nagios/share/perfdata
Re: Nagios suddenly stopped displaying historical data
For the first command, it looks like the last update was Jan 29th for most of the RRDs (a few are defunct). Output of the first command, then the latter:
Code: Select all
-rwxrwxr-x 1 nagios nagios 384952 Jan 19 2013 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/ASP.NET_Requests_Second.rrd
-rwxrwxr-x 1 nagios nagios 384952 Jan 23 2013 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__ASP.NET_Exceptions-Second.rrd
-rwxrwxr-x 1 nagios nagios 384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__ASP.NET_Requests_Current.rrd
-rwxrwxr-x 1 nagios nagios 384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__ASP.NET_Requests_Queued.rrd
-rwxrwxr-x 1 nagios nagios 384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__ASP.NET_Requests_Second.rrd
-rwxrwxr-x 1 nagios nagios 384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__ASP.NET_Request_Wait_Time.rrd
-rwxrwxr-x 1 nagios nagios 384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__Disk_Time_Percentage_Total.rrd
-rwxrwxr-x 1 nagios nagios 384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__IIS_Current_Worker_Processes.rrd
-rwxrwxr-x 1 nagios nagios 384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__IIS_VSuite_App_Pool_Uptime.rrd
-rwxrwxr-x 1 nagios nagios 384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__IIS_Worker_Processes_Created.rrd
-rwxrwxr-x 1 nagios nagios 384952 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__IIS_Worker_Process_Failures.rrd
-rwxrwxr-x 1 nagios nagios 384960 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__Memory_Pages_Second.rrd
-rwxrwxr-x 1 nagios nagios 384952 Feb 25 2013 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__Memory_Pages-Second.rrd
-rwxrwxr-x 1 nagios nagios 384960 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__Network_Bytes_Received_Sec.rrd
-rwxrwxr-x 1 nagios nagios 384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__Processor_Queue_Length.rrd
-rwxrwxr-x 1 nagios nagios 384952 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__Session_SQL_Server_Connections_Total.rrd
-rwxrwxr-x 1 nagios nagios 384952 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__VM_CPU_Time_Stolen_Total.rrd
-rwxrwxr-x 1 nagios nagios 384952 Jan 23 2013 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__VM_Processor_Time_Percentage_Total.rrd
-rwxrwxr-x 1 nagios nagios 384960 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__Web_Service_Total_Bytes_Second.rrd
-rwxrwxr-x 1 nagios nagios 384952 Feb 25 2013 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__Web_Service_Total_Bytes-Second.rrd
-rwxrwxr-x 1 nagios nagios 384952 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__Web_Service_Total_Current_Connections.rrd
-rwxrwxr-x 1 nagios nagios 384960 Jul 2 2014 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter_Web_Service_Total_Current_Connections.rrd
-rwxrwxr-x 1 nagios nagios 384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__Web_Service_VSuite_Current_Connections.rrd
-rwxrwxr-x 1 nagios nagios 384952 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/CPU_Usage__1_Minute_Average.rrd
-rwxrwxr-x 1 nagios nagios 384952 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/CPU_Usage__5_Minute_Average.rrd
-rwxrwxr-x 1 nagios nagios 384952 Jan 18 2013 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/CPU_Usage.rrd
-rwxrwxr-x 1 nagios nagios 384952 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Drive_C__Disk_Usage.rrd
-rwxrwxr-x 1 nagios nagios 384952 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Drive_D__Disk_Usage.rrd
-rwxrwxr-x 1 nagios nagios 384952 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Drive_E__Disk_Usage.rrd
-rwxrwxr-x 1 nagios nagios 384960 Jan 29 11:59 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Event_Logs__App_Pool_Crashes.rrd
-rwxrwxr-x 1 nagios nagios 768224 Jun 5 2014 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/_HOST_.rrd
-rwxrwxr-x 1 nagios nagios 384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Memory_Usage__All.rrd
-rwxrwxr-x 1 nagios nagios 768232 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Memory_Usage__Physical.rrd
-rwxrwxr-x 1 nagios nagios 384960 Feb 22 2013 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Memory_Usage.rrd
-rwxrwxr-x 1 nagios nagios 768232 Feb 22 2013 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Physical_Memory_Usage.rrd
-rwxrwxr-x 1 nagios nagios 768224 Jun 5 2014 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Ping.rrd
-rwxrwxr-x 1 nagios nagios 384960 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Uptime.rrd
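Rather than reading modification dates by eye across every host directory, stale RRD files can be listed directly. A minimal sketch, assuming a `find` that supports `-mmin` (GNU and BSD versions do); `stale_rrds` is a hypothetical helper name and the threshold is arbitrary:

```shell
# Hypothetical helper: list .rrd files under a perfdata root that have not
# been modified within the last N minutes.
# $1 = perfdata root directory, $2 = age threshold in minutes
stale_rrds() {
  find "$1" -type f -name '*.rrd' -mmin +"$2" | sort
}

# Usage: RRDs untouched for more than an hour
# stale_rrds /usr/local/nagios/share/perfdata 60
```

Files for services that were renamed or retired long ago will also show up, so the interesting signal is whether the currently monitored services appear in the list.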
Code: Select all
[root@10-165-2-24 var]# ls -ld /usr/local/nagios/share/
drwxrwxr-x 14 nagios nagios 4096 Jun 27 2014 /usr/local/nagios/share/
[root@10-165-2-24 var]# ls -ld /usr/local/nagios/share/perfdata/
drwxrwxr-x 70 nagios nagios 4096 Jan 22 16:05 /usr/local/nagios/share/perfdata/
Re: Nagios suddenly stopped displaying historical data
Let's try restarting nagios and npcd, and see how many files are in the "spool" sub-directories at the moment. Run the following commands and show us the output:
Code: Select all
service nagios restart
service nagios status
service npcd restart
service npcd status
ls /usr/local/nagios/var/spool/xidpe | wc -l
ls /usr/local/nagios/var/spool/perfdata | wc -l
ls /usr/local/nagios/var/spool/checkresults | wc -l
Re: Nagios suddenly stopped displaying historical data
Done, see below.
Code: Select all
[root@10-165-2-24 spool]# pwd
/usr/local/nagios/var/spool
[root@10-165-2-24 spool]# service nagios restart
Running configuration check...done.
Stopping nagios: .done.
Starting nagios: done.
[root@10-165-2-24 spool]# service nagios status
nagios (pid 32599) is running...
[root@10-165-2-24 spool]# service npcd restart
NPCD Stopped.
NPCD started.
[root@10-165-2-24 spool]# service npcd status
NPCD running (pid 710).
[root@10-165-2-24 spool]# ls xidpe/ | wc -l
74
[root@10-165-2-24 spool]# ls perfdata/ | wc -l
0
[root@10-165-2-24 spool]# ls checkresults/ | wc -l
158
[root@10-165-2-24 spool]#
scottwilkerson
Re: Nagios suddenly stopped displaying historical data
This is strange. Can you open a ticket with [email protected], include your profile from Admin -> System Profile, and reference this thread?
It may be possible that the performance data entries in your nagios.cfg were modified.
I have also noticed some references to a RAM disk having been set up, and others that lead me to believe it may not be completely set up.
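One way to check for a half-configured RAM disk is to put the perfdata-related paths from both config files next to each other. A sketch under assumptions: `show_perfdata_settings` is a hypothetical helper, the directive names are the standard performance-data options in nagios.cfg and the spool directory option in npcd.cfg, and the file locations in the usage line are guesses that may differ on your install:

```shell
# Hypothetical helper: print the performance-data directives from nagios.cfg
# and the spool directory from npcd.cfg so the paths can be compared.
# $1 = path to nagios.cfg, $2 = path to npcd.cfg
show_perfdata_settings() {
  echo "== $1"
  grep -E '^[[:space:]]*(process_performance_data|(host|service)_perfdata_file(_processing_command)?)[[:space:]]*=' "$1"
  echo "== $2"
  grep -E '^[[:space:]]*perfdata_spool_dir[[:space:]]*=' "$2"
}

# Usage (both paths are assumptions; adjust to where the files actually live):
# show_perfdata_settings /usr/local/nagios/etc/nagios.cfg /usr/local/nagios/etc/pnp/npcd.cfg
```

If the perfdata files in nagios.cfg point into one tree (say, the RAM disk) while NPCD watches a spool directory in the other, the data would be written but never picked up, which would be consistent with the symptoms in this thread.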