Nagios suddenly stopped displaying historical data

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
aleksl
Posts: 11
Joined: Thu Sep 25, 2014 7:48 pm

Nagios suddenly stopped displaying historical data

Post by aleksl »

Hi, our Nagios suddenly stopped collecting historical performance data for the graphs as of last week. I checked the commit log for the config and nothing was changed that day. We did patch glibc on the RH server running Nagios that morning and rebooted.

Alerting is still active, and we get alerts on service monitors and counters, but the Nagios graph explorer shows nothing new since that day.
aleksl
Posts: 11
Joined: Thu Sep 25, 2014 7:48 pm

Re: Nagios suddenly stopped displaying historical data

Post by aleksl »

[root@10-165-2-24 OCASHVRGWA01P]# tail -25 ../../../var/perfdata.log
2015-01-21 14:47:23 [1982] [0] *** process_perfdata.pl terminated on signal ALRM
2015-01-21 15:03:00 [5988] [0] *** TIMEOUT: Timeout after 15 secs. ***
2015-01-21 15:03:00 [5988] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-01-21 15:03:00 [5988] [0] *** TIMEOUT: Please check your npcd.cfg
2015-01-21 15:03:00 [5988] [0] *** TIMEOUT: /var/nagiosramdisk/spool/perfdata//1421870556.perfdata.service-PID-5988 deleted
2015-01-21 15:03:00 [5988] [0] *** Timeout while processing Host: "OCASHVRGWA02P" Service: "Counter__VM_CPU_Time_Stolen_Total"
2015-01-21 15:03:00 [5988] [0] *** process_perfdata.pl terminated on signal ALRM
2015-01-21 15:03:00 [5987] [0] *** TIMEOUT: Timeout after 15 secs. ***
2015-01-21 15:03:00 [5987] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-01-21 15:03:00 [5987] [0] *** TIMEOUT: Please check your npcd.cfg
2015-01-21 15:03:00 [5987] [0] *** TIMEOUT: /var/nagiosramdisk/spool/perfdata//1421870556.perfdata.host-PID-5987 deleted
2015-01-21 15:03:00 [5987] [0] *** Timeout while processing Host: "OCASHVRGWA02P" Service: "_HOST_"
2015-01-21 15:03:00 [5987] [0] *** process_perfdata.pl terminated on signal ALRM
2015-01-23 03:12:58 [25828] [0] *** TIMEOUT: Timeout after 15 secs. ***
2015-01-23 03:12:58 [25828] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-01-23 03:12:58 [25828] [0] *** TIMEOUT: Please check your npcd.cfg
2015-01-23 03:12:58 [25828] [0] *** TIMEOUT: /var/nagiosramdisk/spool/perfdata//1422000756.perfdata.service-PID-25828 deleted
2015-01-23 03:12:58 [25828] [0] *** Timeout while processing Host: "OCASHVRGBPM01P" Service: "Counter__SmarterMail_Outgoing_Messages"
2015-01-23 03:12:58 [25828] [0] *** process_perfdata.pl terminated on signal ALRM
2015-01-23 03:15:04 [30407] [0] *** TIMEOUT: Timeout after 15 secs. ***
2015-01-23 03:15:04 [30407] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-01-23 03:15:04 [30407] [0] *** TIMEOUT: Please check your npcd.cfg
2015-01-23 03:15:04 [30407] [0] *** TIMEOUT: /var/nagiosramdisk/spool/perfdata//1422000878.perfdata.service-PID-30407 deleted
2015-01-23 03:15:04 [30407] [0] *** Timeout while processing Host: "OCASHVRGDB02S" Service: "Drive_D__Disk_Usage"
2015-01-23 03:15:04 [30407] [0] *** process_perfdata.pl terminated on signal ALRM
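To see whether the timeouts line up with the day the graphs stopped, the log can be tallied by date. This is only a rough sketch: the function name `timeouts_per_day` is made up for illustration, it assumes the line format shown in the excerpt above, and the example path is inferred from the relative path in the `tail` command rather than confirmed.

```shell
#!/usr/bin/env bash
# Count process_perfdata.pl timeouts per day in a perfdata.log-style file.
# Assumes lines look like the excerpt above:
#   YYYY-MM-DD HH:MM:SS [pid] [0] *** TIMEOUT: Timeout after N secs. ***
# timeouts_per_day is a hypothetical helper name, not part of Nagios or PNP.
timeouts_per_day() {
    local logfile="$1"
    grep -F 'TIMEOUT: Timeout after' "$logfile" \
        | awk '{ count[$1]++ } END { for (d in count) print d, count[d] }' \
        | sort
}

# Example call (path is an assumption; adjust to your install):
# timeouts_per_day /usr/local/nagios/var/perfdata.log
```

Each output line is a date followed by the number of timeouts logged on that date, so a gap or a spike around the day of the reboot should stand out.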
aleksl
Posts: 11
Joined: Thu Sep 25, 2014 7:48 pm

Re: Nagios suddenly stopped displaying historical data

Post by aleksl »

[root@10-165-2-24 var]# service npcd restart
NPCD Stopped.
NPCD started.
[root@10-165-2-24 var]# ls

--------

[root@10-165-2-24 var]# tail npcd.log
[01-29-2015 12:18:00] NPCD: Please have a look at 'npcd -V' to get license information
[01-29-2015 12:18:00] NPCD: HINT: load_threshold is enabled - ('10.000000')
[02-03-2015 14:18:22] NPCD: Caught Termination Signal - Hasta la vista... baby
[02-03-2015 14:18:51] NPCD: npcd Daemon (0.4.14) started with PID=1686
[02-03-2015 14:18:51] NPCD: Please have a look at 'npcd -V' to get license information
[02-03-2015 14:18:51] NPCD: HINT: load_threshold is enabled - ('10.000000')
[02-03-2015 14:31:16] NPCD: Caught Termination Signal - Hasta la vista... baby
[02-03-2015 14:31:16] NPCD: npcd Daemon (0.4.14) started with PID=23550
[02-03-2015 14:31:16] NPCD: Please have a look at 'npcd -V' to get license information
[02-03-2015 14:31:16] NPCD: HINT: load_threshold is enabled - ('10.000000')




---

[root@10-165-2-24 spool]# ls xidpe/ | wc -l
74
[root@10-165-2-24 spool]# ls perfdata/ | wc -l
0
[root@10-165-2-24 spool]# ls checkresults/ | wc -l
158
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Nagios suddenly stopped displaying historical data

Post by scottwilkerson »

Let's start by running

Code: Select all

service npcd restart
Also, let's run the following and post back the output

Code: Select all

df -h
ls /var/nagiosramdisk/spool/perfdata/|wc -l
ls /var/nagiosramdisk/spool/xidpe/|wc -l
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
aleksl
Posts: 11
Joined: Thu Sep 25, 2014 7:48 pm

Re: Nagios suddenly stopped displaying historical data

Post by aleksl »

I also posted some outputs above.

Code: Select all

[root@10-165-2-24 var]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/volGroup00-rootVol00
                       28G  6.4G   20G  25% /
tmpfs                 3.9G     0  3.9G   0% /dev/shm
/dev/sda1             190M  131M   50M  73% /boot
tmpfs                 2.0G  2.2M  2.0G   1% /var/nagiosramdisk
[root@10-165-2-24 var]# ls /var/nagiosramdisk/spool/perfdata/ | wc -l
0
[root@10-165-2-24 var]# ls /var/nagiosramdisk/spool/xidpe/ | wc -l
0
[root@10-165-2-24 var]#
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Nagios suddenly stopped displaying historical data

Post by scottwilkerson »

Can you run the following to see if the dates are current?

Code: Select all

ls -l  /usr/local/nagios/share/perfdata/*/*.rrd
And also

Code: Select all

ls -ld  /usr/local/nagios/share/
ls -ld  /usr/local/nagios/share/perfdata
aleksl
Posts: 11
Joined: Thu Sep 25, 2014 7:48 pm

Re: Nagios suddenly stopped displaying historical data

Post by aleksl »

For the first command, it looks like the last update was Jan 29th for most of the RRDs (a few are defunct).

Code: Select all

-rwxrwxr-x 1 nagios nagios  384952 Jan 19  2013 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/ASP.NET_Requests_Second.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 23  2013 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__ASP.NET_Exceptions-Second.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__ASP.NET_Requests_Current.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__ASP.NET_Requests_Queued.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__ASP.NET_Requests_Second.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__ASP.NET_Request_Wait_Time.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__Disk_Time_Percentage_Total.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__IIS_Current_Worker_Processes.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__IIS_VSuite_App_Pool_Uptime.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__IIS_Worker_Processes_Created.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__IIS_Worker_Process_Failures.rrd
-rwxrwxr-x 1 nagios nagios  384960 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__Memory_Pages_Second.rrd
-rwxrwxr-x 1 nagios nagios  384952 Feb 25  2013 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__Memory_Pages-Second.rrd
-rwxrwxr-x 1 nagios nagios  384960 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__Network_Bytes_Received_Sec.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__Processor_Queue_Length.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__Session_SQL_Server_Connections_Total.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__VM_CPU_Time_Stolen_Total.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 23  2013 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__VM_Processor_Time_Percentage_Total.rrd
-rwxrwxr-x 1 nagios nagios  384960 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__Web_Service_Total_Bytes_Second.rrd
-rwxrwxr-x 1 nagios nagios  384952 Feb 25  2013 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__Web_Service_Total_Bytes-Second.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__Web_Service_Total_Current_Connections.rrd
-rwxrwxr-x 1 nagios nagios  384960 Jul  2  2014 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter_Web_Service_Total_Current_Connections.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Counter__Web_Service_VSuite_Current_Connections.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/CPU_Usage__1_Minute_Average.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/CPU_Usage__5_Minute_Average.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 18  2013 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/CPU_Usage.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Drive_C__Disk_Usage.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Drive_D__Disk_Usage.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:00 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Drive_E__Disk_Usage.rrd
-rwxrwxr-x 1 nagios nagios  384960 Jan 29 11:59 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Event_Logs__App_Pool_Crashes.rrd
-rwxrwxr-x 1 nagios nagios  768224 Jun  5  2014 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/_HOST_.rrd
-rwxrwxr-x 1 nagios nagios  384952 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Memory_Usage__All.rrd
-rwxrwxr-x 1 nagios nagios  768232 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Memory_Usage__Physical.rrd
-rwxrwxr-x 1 nagios nagios  384960 Feb 22  2013 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Memory_Usage.rrd
-rwxrwxr-x 1 nagios nagios  768232 Feb 22  2013 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Physical_Memory_Usage.rrd
-rwxrwxr-x 1 nagios nagios  768224 Jun  5  2014 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Ping.rrd
-rwxrwxr-x 1 nagios nagios  384960 Jan 29 12:17 /usr/local/nagios/share/perfdata/OCASHVRGWA01P/Uptime.rrd
And for the latter:

Code: Select all

[root@10-165-2-24 var]# ls -ld /usr/local/nagios/share/
drwxrwxr-x 14 nagios nagios 4096 Jun 27  2014 /usr/local/nagios/share/
[root@10-165-2-24 var]# ls -ld /usr/local/nagios/share/perfdata/
drwxrwxr-x 70 nagios nagios 4096 Jan 22 16:05 /usr/local/nagios/share/perfdata/
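Rather than scanning a long `ls -l` listing by eye, the fresh and stale RRD files can be separated with `find -mmin`. The following is a minimal sketch; the helper names `recent_rrds` and `stale_rrds` are hypothetical, and the 60-minute window in the example calls is an arbitrary choice, not a Nagios default.

```shell
#!/usr/bin/env bash
# Split .rrd files under a perfdata directory by modification age.
# recent_rrds / stale_rrds are hypothetical helper names for illustration.

# Files modified less than N minutes ago.
recent_rrds() {
    local dir="$1" minutes="$2"
    find "$dir" -type f -name '*.rrd' -mmin "-$minutes" | sort
}

# Files last modified more than N minutes ago.
stale_rrds() {
    local dir="$1" minutes="$2"
    find "$dir" -type f -name '*.rrd' -mmin "+$minutes" | sort
}

# Example calls (the 60-minute window is an assumption):
# recent_rrds /usr/local/nagios/share/perfdata 60
# stale_rrds  /usr/local/nagios/share/perfdata 60
```

If `recent_rrds` prints nothing for any host, no RRD is being updated at all, which points at the processing pipeline rather than at individual checks.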
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Nagios suddenly stopped displaying historical data

Post by lmiltchev »

Let's try restarting nagios and npcd, then check how many files are currently in the "spool" sub-directories. Run the following commands and show us the output:

Code: Select all

service nagios restart
service nagios status
service npcd restart
service npcd status
ls /usr/local/nagios/var/spool/xidpe | wc -l
ls /usr/local/nagios/var/spool/perfdata | wc -l
ls /usr/local/nagios/var/spool/checkresults | wc -l
aleksl
Posts: 11
Joined: Thu Sep 25, 2014 7:48 pm

Re: Nagios suddenly stopped displaying historical data

Post by aleksl »

Done, see below.

Code: Select all

[root@10-165-2-24 spool]# pwd
/usr/local/nagios/var/spool
[root@10-165-2-24 spool]# service nagios restart
Running configuration check...done.
Stopping nagios: .done.
Starting nagios: done.
[root@10-165-2-24 spool]# service nagios status
nagios (pid 32599) is running...
[root@10-165-2-24 spool]# service npcd restart
NPCD Stopped.
NPCD started.
[root@10-165-2-24 spool]# service npcd status
NPCD running (pid 710).
[root@10-165-2-24 spool]# ls xidpe/ | wc -l
74
[root@10-165-2-24 spool]# ls perfdata/ | wc -l
0
[root@10-165-2-24 spool]# ls checkresults/ | wc -l
158
[root@10-165-2-24 spool]#
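Two different spool locations appear in this thread: `/var/nagiosramdisk/spool` (0 files in `perfdata` and `xidpe`) and `/usr/local/nagios/var/spool` (74 in `xidpe`, 0 in `perfdata`, 158 in `checkresults`). A small loop can print both side by side so the difference is easier to spot. This is just a sketch; `count_spool` is a hypothetical helper name, and unlike `ls | wc -l` it also counts hidden entries.

```shell
#!/usr/bin/env bash
# Print the number of entries in each spool sub-directory under every
# spool root passed as an argument.
# count_spool is a hypothetical helper name for illustration.
count_spool() {
    local root sub dir n
    for root in "$@"; do
        for sub in xidpe perfdata checkresults; do
            dir="$root/$sub"
            if [ -d "$dir" ]; then
                # Count direct children only; tr strips any padding from wc.
                n=$(find "$dir" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' ')
                printf '%s %s\n' "$dir" "$n"
            else
                printf '%s missing\n' "$dir"
            fi
        done
    done
}

# Example call using the two spool roots mentioned in this thread:
# count_spool /var/nagiosramdisk/spool /usr/local/nagios/var/spool
```

Files piling up in one root while the other stays empty would suggest that Nagios and the processing daemons are not looking at the same directory.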
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Nagios suddenly stopped displaying historical data

Post by scottwilkerson »

This is strange. Can you open a ticket with [email protected], include your profile from Admin -> System profile, and reference this thread?

It may be possible that the performance data entries in your nagios.cfg were modified.

I have also noticed some references to a RAM disk having been set up, and others that lead me to believe it may not be completely set up.
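One way to check both points is to print the active performance-data directives from nagios.cfg and compare the paths they reference with the spool directories seen earlier in the thread. The sketch below is illustrative only: `show_perfdata_settings` is a hypothetical helper name, and the config path in the example call is the usual source-install location, assumed here rather than taken from this thread.

```shell
#!/usr/bin/env bash
# Print the uncommented performance-data directives from a Nagios main
# config file (process_performance_data plus host_perfdata_* and
# service_perfdata_* settings).
# show_perfdata_settings is a hypothetical helper name for illustration.
show_perfdata_settings() {
    local cfg="$1"
    grep -E '^[[:space:]]*(process_performance_data|(host|service)_perfdata_[a-z_]+)[[:space:]]*=' "$cfg"
}

# Example call (path is an assumption; adjust to your install):
# show_perfdata_settings /usr/local/nagios/etc/nagios.cfg
```

If the directives point at one spool location while the processing daemons read from another, that mismatch would be consistent with a partially configured RAM disk.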