This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
CFT6Server
Posts: 506 Joined: Wed Apr 15, 2015 4:21 pm
by CFT6Server » Thu Jun 16, 2016 1:37 am
We had a long bout with this before, and everything had been working fine since. Now, however, I can't get XI to settle into a state where the performance graphs keep working. Usually a restart helps by freeing memory, or the cause turns out to be a disk space issue, but this week I can't get it to settle.
Here's what it looks like over the last 7 days:
(attachment: 7days.JPG)
And over the last 30 days:
(attachment: 30days.JPG)
The gaps are where the graphs got stuck and I had to reboot XI.
Code:
[root@nagxi01 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         19990      19008        981         31        134      16752
-/+ buffers/cache:       2121      17868
Swap:         2015          9       2006
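With this older `free` layout, the "used" figure on the `Mem:` row counts buffers and page cache, so 19008 MB used does not by itself mean the box is short of memory. The minimal sketch below, which assumes the exact column layout shown above (a `total used free shared buffers cached` header and a `Mem:` row), subtracts buffers and cache from used to get the memory actually held by processes:

```python
def app_used_mb(free_output: str) -> int:
    """Return MB held by processes: used minus buffers minus cached.

    Assumes the older `free -m` layout with a total/used/free/shared/
    buffers/cached header line and a `Mem:` row underneath it.
    """
    header = None
    for line in free_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "total":
            header = fields  # column names, in order
        elif fields[0] == "Mem:" and header is not None:
            values = dict(zip(header, map(int, fields[1:])))
            return values["used"] - values["buffers"] - values["cached"]
    raise ValueError("no Mem: row found in free output")


sample = """\
             total       used       free     shared    buffers     cached
Mem:         19990      19008        981         31        134      16752
-/+ buffers/cache:       2121      17868
Swap:         2015          9       2006
"""

print(app_used_mb(sample))  # 2122
```

For the output above this gives 2122 MB, which matches the 2121 MB on the `-/+ buffers/cache` row up to rounding, suggesting memory was not the bottleneck when this snapshot was taken.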
Code:
[root@nagxi01 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
                      286G  183G   89G  68% /
tmpfs                 9.8G     0  9.8G   0% /dev/shm
/dev/sda1             477M   95M  357M  21% /boot
We have a fairly large number of hosts and checks. Our previous threads have some history on this.
rkennedy
Posts: 6579 Joined: Mon Oct 05, 2015 11:45 am
by rkennedy » Thu Jun 16, 2016 9:22 am
Could you download a system profile the next time this happens? I'm wondering whether you're hitting a high load that is causing issues with NPCD. We would need to see it at the time it happens, though.
Are you using a ramdisk currently?
Former Nagios Employee
CFT6Server
Posts: 506 Joined: Wed Apr 15, 2015 4:21 pm
by CFT6Server » Thu Jun 16, 2016 10:34 am
It is no longer producing any graphs, even after reboots. We are not using a ramdisk right now. The profile is attached.
CFT6Server
Posts: 506 Joined: Wed Apr 15, 2015 4:21 pm
by CFT6Server » Thu Jun 16, 2016 11:01 am
I found some TIMEOUT errors in the logs:
Code:
==> /usr/local/nagios/var/perfdata.log <==
2016-06-14 08:08:59 [3420] [2] No Custom Template found for check_xi_service_snmp_linux_storage (/usr/local/nagios/etc/pnp/check_commands/check_xi_service_snmp_linux_storage.cfg)
2016-06-14 08:08:59 [3420] [2] Template is check_xi_service_snmp_linux_storage.php
2016-06-14 08:08:59 [3420] [2] data2rrd called
2016-06-14 08:08:59 [3420] [2] RRDs::update /usr/local/nagios/share/perfdata/esbcprdmsg07/DiskAll.rrd 1465916621:1783
2016-06-14 08:08:59 [3420] [0] *** TIMEOUT: Timeout after 45 secs. ***
2016-06-14 08:08:59 [3420] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2016-06-14 08:08:59 [3420] [0] *** TIMEOUT: Please check your npcd.cfg
2016-06-14 08:08:59 [3420] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1465916622.perfdata.service-PID-3420 deleted
2016-06-14 08:08:59 [3420] [0] *** Timeout while processing Host: "esbcprdmsg07" Service: "DiskAll"
2016-06-14 08:08:59 [3420] [0] *** process_perfdata.pl terminated on signal ALRM
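The perfdata.log excerpt shows process_perfdata.pl being killed by its 45-second alarm while updating a single RRD file. When many of these appear, it helps to know whether the same few host/service pairs keep timing out. A minimal sketch, assuming every timeout produces a line of the form `Timeout while processing Host: "<host>" Service: "<service>"` as quoted above:

```python
import re
from collections import Counter

# Summary line written when process_perfdata.pl times out (format assumed
# from the excerpt above).
TIMEOUT_RE = re.compile(r'Timeout while processing Host: "([^"]*)" Service: "([^"]*)"')


def timed_out_services(lines):
    """Count how often each (host, service) pair hit the timeout."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


sample = [
    '2016-06-14 08:08:59 [3420] [0] *** TIMEOUT: Timeout after 45 secs. ***',
    '2016-06-14 08:08:59 [3420] [0] *** Timeout while processing Host: "esbcprdmsg07" Service: "DiskAll"',
]

for (host, service), n in timed_out_services(sample).most_common(10):
    print(f"{n:5d}  {host} / {service}")
```

On the server, pass an open handle for `/usr/local/nagios/var/perfdata.log` instead of the sample list; file objects iterate line by line, so the function works unchanged.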
Code:
[06-16-2016 09:12:44] NPCD: Processing file '1466093476.perfdata.service'
[06-16-2016 09:13:29] NPCD: ERROR: Executed command exits with return code '7'
[06-16-2016 09:13:29] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1466093460.perfdata.service'
[06-16-2016 09:13:30] NPCD: ERROR: Executed command exits with return code '7'
[06-16-2016 09:13:30] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1466093446.perfdata.service'
[06-16-2016 09:13:30] NPCD: ERROR: Executed command exits with return code '7'
[06-16-2016 09:13:30] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1466093476.perfdata.service'
[06-16-2016 09:13:32] NPCD: DEBUG: Will wait for th['3']
[06-16-2016 09:13:32] NPCD: DEBUG: Will wait for th['2']
[06-16-2016 09:13:32] NPCD: DEBUG: Will wait for th['1']
[06-16-2016 09:13:32] NPCD: DEBUG: Will wait for th['0']
[06-16-2016 09:13:32] NPCD: DEBUG: load 14.010000/20.000000
[06-16-2016 09:13:32] NPCD: ThreadCounter 0/5 File is 1466093490.perfdata.host
[06-16-2016 09:13:32] NPCD: Regular File: 1466093490.perfdata.host
[06-16-2016 09:13:32] NPCD: A thread was started on thread_counter = 0
[06-16-2016 09:13:32] NPCD: Processing file 1466093490.perfdata.host with ID 140307386648320 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1466093490.perfdata.host
[06-16-2016 09:13:32] NPCD: DEBUG: load 14.010000/20.000000
[06-16-2016 09:13:32] NPCD: ThreadCounter 1/5 File is 1466093490.perfdata.service
[06-16-2016 09:13:32] NPCD: Processing file '1466093490.perfdata.host'
[06-16-2016 09:13:32] NPCD: Regular File: 1466093490.perfdata.service
[06-16-2016 09:13:32] NPCD: A thread was started on thread_counter = 1
[06-16-2016 09:13:32] NPCD: Processing file 1466093490.perfdata.service with ID 140307376158464 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1466093490.perfdata.service
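The NPCD log records two things worth tracking over time: the `DEBUG: load` lines, which appear to show the current load against the configured `load_threshold` (14.01 against 20 here), and the `return code '7'` errors from failed process_perfdata.pl runs. A minimal sketch that summarizes both; the regular expressions are written against the lines quoted above and are assumptions about the format elsewhere:

```python
import re
from collections import Counter

# Patterns assumed from the NPCD log lines quoted above.
LOAD_RE = re.compile(r"NPCD: DEBUG: load ([0-9.]+)/([0-9.]+)")
EXIT_RE = re.compile(r"NPCD: ERROR: Executed command exits with return code '(\d+)'")


def summarize_npcd(lines):
    """Return (peak load seen, last threshold seen, Counter of exit codes)."""
    peak, threshold = 0.0, None
    exit_codes = Counter()
    for line in lines:
        load = LOAD_RE.search(line)
        if load:
            peak = max(peak, float(load.group(1)))
            threshold = float(load.group(2))
            continue
        err = EXIT_RE.search(line)
        if err:
            exit_codes[int(err.group(1))] += 1
    return peak, threshold, exit_codes


sample = [
    "[06-16-2016 09:13:29] NPCD: ERROR: Executed command exits with return code '7'",
    "[06-16-2016 09:13:30] NPCD: ERROR: Executed command exits with return code '7'",
    "[06-16-2016 09:13:30] NPCD: ERROR: Executed command exits with return code '7'",
    "[06-16-2016 09:13:32] NPCD: DEBUG: load 14.010000/20.000000",
    "[06-16-2016 09:13:32] NPCD: ThreadCounter 0/5 File is 1466093490.perfdata.host",
]

peak, threshold, exit_codes = summarize_npcd(sample)
print(f"peak load {peak} of threshold {threshold}; failed runs: {dict(exit_codes)}")
```

A peak that sits close to the threshold while failed runs pile up would support the high-load theory raised earlier in the thread.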
CFT6Server
Posts: 506 Joined: Wed Apr 15, 2015 4:21 pm
by CFT6Server » Thu Jun 16, 2016 1:27 pm
Update: It looks like a database repair and a reboot helped. Graphs are showing again, and I will keep an eye on it. Are there any recommendations for keeping it working?
lmiltchev
Bugs find me
Posts: 13589 Joined: Mon May 23, 2011 12:15 pm
by lmiltchev » Thu Jun 16, 2016 3:49 pm
You can increase the timeout in "/usr/local/nagios/etc/pnp/process_perfdata.cfg" and keep an eye on the system load, making sure it doesn't exceed the "load_threshold" value set in "/usr/local/nagios/etc/pnp/npcd.cfg". For general troubleshooting steps, please review our KB article on the topic here:
https://support.nagios.com/kb/article.php?id=9
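Both settings mentioned above live in plain `key = value` files. A minimal sketch that reads them and compares the current 1-minute load average with `load_threshold`. It assumes one setting per line with `#` comments, assumes the process_perfdata.cfg timeout key is named `TIMEOUT` (as in stock PNP4Nagios), and treats a missing or 0.0 `load_threshold` as "check disabled", which is also an assumption:

```python
import os

NPCD_CFG = "/usr/local/nagios/etc/pnp/npcd.cfg"
PERFDATA_CFG = "/usr/local/nagios/etc/pnp/process_perfdata.cfg"


def parse_cfg(lines):
    """Parse `key = value` lines into a dict, ignoring '#' comments and blanks."""
    settings = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


def read_cfg(path):
    with open(path) as cfg:
        return parse_cfg(cfg)


def report():
    """Print the 1-minute load next to load_threshold and the perfdata TIMEOUT."""
    # Assumption: a missing or 0.0 load_threshold means the check is disabled.
    threshold = float(read_cfg(NPCD_CFG).get("load_threshold", "0"))
    # Assumption: the timeout key is TIMEOUT, as in stock PNP4Nagios.
    timeout = read_cfg(PERFDATA_CFG).get("TIMEOUT", "unset")
    load_1m = os.getloadavg()[0]
    print(f"load {load_1m:.2f} / threshold {threshold:.2f}; TIMEOUT={timeout}")
    if threshold and load_1m > threshold:
        print("load is above load_threshold; NPCD is expected to hold off new work")
```

Run `report()` on the XI server itself, for example from cron, to see how often the load approaches the threshold; `parse_cfg` can be checked on its own with sample lines.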
Be sure to check out our
Knowledgebase for helpful articles and solutions!
lmiltchev
Bugs find me
Posts: 13589 Joined: Mon May 23, 2011 12:15 pm
by lmiltchev » Fri Jun 17, 2016 9:49 am
Thanks, spurrellian!