
Performance Graphs keep disappearing

Posted: Thu Jun 16, 2016 1:37 am
by CFT6Server
We had a long bout with this before, and everything had been working OK. But now I cannot get XI to settle into a state where the performance graphs keep working. Usually a restart helps by freeing memory, or we find a disk space issue that might be causing it, but this week I can't get it to settle.

Here's what it looks like over the last 7 days:
7days.JPG
And over the last 30 days:
30days.JPG
The gaps are where the graphs got stuck and I rebooted XI.

Code: Select all

[root@nagxi01 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         19990      19008        981         31        134      16752
-/+ buffers/cache:       2121      17868
Swap:         2015          9       2006

Code: Select all

[root@nagxi01 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
                      286G  183G   89G  68% /
tmpfs                 9.8G     0  9.8G   0% /dev/shm
/dev/sda1             477M   95M  357M  21% /boot
We have a pretty large number of hosts and checks. Our previous threads have some history on this.

Re: Performance Graphs keep disappearing

Posted: Thu Jun 16, 2016 9:22 am
by rkennedy
Could you download a profile the next time this happens? I'm wondering whether you're hitting high load, which could be causing issues with NPCD. We would need to see it at the time it happens, though.

Are you using a ramdisk currently?

Re: Performance Graphs keep disappearing

Posted: Thu Jun 16, 2016 10:34 am
by CFT6Server
It is no longer producing any graphs even after reboots. We are not using RAMDISK right now. The profile is attached.

Re: Performance Graphs keep disappearing

Posted: Thu Jun 16, 2016 11:01 am
by CFT6Server
Found some TIMEOUT errors in the logs:

Code: Select all

==> /usr/local/nagios/var/perfdata.log <==
2016-06-14 08:08:59 [3420] [2] No Custom Template found for check_xi_service_snmp_linux_storage (/usr/local/nagios/etc/pnp/check_commands/check_xi_service_snmp_linux_storage.cfg)
2016-06-14 08:08:59 [3420] [2] Template is check_xi_service_snmp_linux_storage.php
2016-06-14 08:08:59 [3420] [2] data2rrd called
2016-06-14 08:08:59 [3420] [2] RRDs::update /usr/local/nagios/share/perfdata/esbcprdmsg07/DiskAll.rrd 1465916621:1783
2016-06-14 08:08:59 [3420] [0] *** TIMEOUT: Timeout after 45 secs. ***
2016-06-14 08:08:59 [3420] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2016-06-14 08:08:59 [3420] [0] *** TIMEOUT: Please check your npcd.cfg
2016-06-14 08:08:59 [3420] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1465916622.perfdata.service-PID-3420 deleted
2016-06-14 08:08:59 [3420] [0] *** Timeout while processing Host: "esbcprdmsg07" Service: "DiskAll"
2016-06-14 08:08:59 [3420] [0] *** process_perfdata.pl terminated on signal ALRM

Code: Select all

[06-16-2016 09:12:44] NPCD: Processing file '1466093476.perfdata.service'
[06-16-2016 09:13:29] NPCD: ERROR: Executed command exits with return code '7'
[06-16-2016 09:13:29] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1466093460.perfdata.service'
[06-16-2016 09:13:30] NPCD: ERROR: Executed command exits with return code '7'
[06-16-2016 09:13:30] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1466093446.perfdata.service'
[06-16-2016 09:13:30] NPCD: ERROR: Executed command exits with return code '7'
[06-16-2016 09:13:30] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1466093476.perfdata.service'
[06-16-2016 09:13:32] NPCD: DEBUG: Will wait for th['3']
[06-16-2016 09:13:32] NPCD: DEBUG: Will wait for th['2']
[06-16-2016 09:13:32] NPCD: DEBUG: Will wait for th['1']
[06-16-2016 09:13:32] NPCD: DEBUG: Will wait for th['0']
[06-16-2016 09:13:32] NPCD: DEBUG: load 14.010000/20.000000
[06-16-2016 09:13:32] NPCD: ThreadCounter 0/5 File is 1466093490.perfdata.host
[06-16-2016 09:13:32] NPCD: Regular File: 1466093490.perfdata.host
[06-16-2016 09:13:32] NPCD: A thread was started on thread_counter = 0
[06-16-2016 09:13:32] NPCD: Processing file 1466093490.perfdata.host with ID 140307386648320 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1466093490.perfdata.host
[06-16-2016 09:13:32] NPCD: DEBUG: load 14.010000/20.000000
[06-16-2016 09:13:32] NPCD: ThreadCounter 1/5 File is 1466093490.perfdata.service
[06-16-2016 09:13:32] NPCD: Processing file '1466093490.perfdata.host'
[06-16-2016 09:13:32] NPCD: Regular File: 1466093490.perfdata.service
[06-16-2016 09:13:32] NPCD: A thread was started on thread_counter = 1
[06-16-2016 09:13:32] NPCD: Processing file 1466093490.perfdata.service with ID 140307376158464 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1466093490.perfdata.service
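To see whether the timeouts cluster on a few hosts or services, the "Timeout while processing" lines in perfdata.log can be tallied. The following is only a rough sketch: the function name `count_timeouts` is made up for illustration, and the pattern assumes the log lines look exactly like the excerpt above.

```python
import re
from collections import Counter

# Matches the summary line written when a perfdata run times out, e.g.
#   ... [0] *** Timeout while processing Host: "esbcprdmsg07" Service: "DiskAll"
# The pattern is an assumption based on the log excerpt in this thread.
TIMEOUT_RE = re.compile(
    r'\*\*\* Timeout while processing Host: "([^"]*)" Service: "([^"]*)"'
)


def count_timeouts(lines):
    """Return a Counter keyed by (host, service) with one count per timeout line."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

It could be run against the live log with something like `count_timeouts(open("/usr/local/nagios/var/perfdata.log", errors="replace")).most_common(10)` to list the ten worst offenders.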

Re: Performance Graphs keep disappearing

Posted: Thu Jun 16, 2016 1:27 pm
by CFT6Server
Update: Looks like a Database Repair and a reboot helped. It is now showing graphs and I will keep an eye on it. Are there any recommendations to make sure this stays working?

Re: Performance Graphs keep disappearing

Posted: Thu Jun 16, 2016 3:49 pm
by lmiltchev
You can increase the timeout in "/usr/local/nagios/etc/pnp/process_perfdata.cfg" and keep an eye on the system load, making sure it doesn't exceed the "load_threshold" value (set in "/usr/local/nagios/etc/pnp/npcd.cfg"). For general troubleshooting steps, please review our KB article on the topic:

https://support.nagios.com/kb/article.php?id=9
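As a rough way to check both values at once, the sketch below reads a setting from the text of a "key = value" style config file and reports how much headroom is left under "load_threshold". The helper names `read_setting` and `load_headroom` are hypothetical, and it assumes the timeout setting is called TIMEOUT, as in a stock PNP4Nagios process_perfdata.cfg; verify that against your own file.

```python
def read_setting(cfg_text, key):
    """Return the value of the first uncommented 'key = value' line, or None."""
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip().lower() == key.lower():
            return value.strip()
    return None


def load_headroom(npcd_cfg_text, load_1min):
    """Return load_threshold minus the 1-minute load, or None if it is not set."""
    threshold = read_setting(npcd_cfg_text, "load_threshold")
    if threshold is None:
        return None
    return float(threshold) - load_1min
```

For example, pass in the contents of npcd.cfg together with `os.getloadavg()[0]`. A small or negative result means the load is at or above the threshold. Likewise, `read_setting(text, "TIMEOUT")` on the contents of process_perfdata.cfg shows the timeout currently in effect (45 seconds according to the log above).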

Re: Performance Graphs keep disappearing

Posted: Fri Jun 17, 2016 2:31 am
by spurrellian
I had a similar issue in my environment and set up rrdcached within XI.

The document is located here; it may be worth a try:

https://exchange.nagios.org/directory/D ... XI/details

Re: Performance Graphs keep disappearing

Posted: Fri Jun 17, 2016 9:49 am
by lmiltchev
Thanks, spurrellian!