After rebooting our Nagios XI server, the performance graphs no longer show any data; they only say "No data to display". I have done some troubleshooting myself and found several issues:
- The system time was set to 11 May 2017. I used "ntpdate <ip address>" to sync the time with our domain controller.
20 Apr 17:19:00 ntpdate[8244]: step time server x.x.x.x offset -1991168.028776 sec
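To put that offset in perspective, here is a minimal sketch that converts the seconds reported by ntpdate into days. The helper name `offset_days` is made up for illustration; it only does arithmetic with awk and does not touch the clock.

```shell
#!/bin/sh
# Hypothetical helper: convert an ntpdate step offset (in seconds) to days.
# A negative result means the local clock was ahead of the time server.
offset_days() {
    awk -v s="$1" 'BEGIN { printf "%.2f\n", s / 86400 }'
}

# Offset taken from the ntpdate line above.
offset_days -1991168.028776
```

For the offset above this comes to roughly 23 days, so the clock had been running about three weeks ahead before the step.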
# ll /var/lock/
drwxr-xr-x 2 root root 40 Apr 20 13:50 mrtg
# tail -75 /usr/local/nagios/var/npcd.log
[03-26-2017 14:20:19] NPCD: Caught Termination Signal - Hasta la vista... baby
[03-26-2017 16:37:42] NPCD: npcd Daemon (0.4.14) started with PID=1006
[03-26-2017 16:37:42] NPCD: Please have a look at 'npcd -V' to get license information
[03-26-2017 16:37:42] NPCD: HINT: load_threshold is enabled - ('10.000000')
[04-18-2017 17:51:14] NPCD: Caught Termination Signal - Hasta la vista... baby
[05-11-2017 18:58:01] NPCD: npcd Daemon (0.4.14) started with PID=1002
[05-11-2017 18:58:01] NPCD: Please have a look at 'npcd -V' to get license information
[05-11-2017 18:58:01] NPCD: HINT: load_threshold is enabled - ('10.000000')
[04-20-2017 16:09:52] NPCD: Caught Termination Signal - Hasta la vista... baby
[04-20-2017 16:09:52] NPCD: npcd Daemon (0.4.14) started with PID=21932
[04-20-2017 16:09:52] NPCD: Please have a look at 'npcd -V' to get license information
[04-20-2017 16:09:52] NPCD: HINT: load_threshold is enabled - ('10.000000')
[04-20-2017 16:42:01] NPCD: Caught Termination Signal - Hasta la vista... baby
[04-20-2017 16:42:01] NPCD: npcd Daemon (0.4.14) started with PID=61062
[04-20-2017 16:42:01] NPCD: Please have a look at 'npcd -V' to get license information
[04-20-2017 16:42:01] NPCD: HINT: load_threshold is enabled - ('10.000000')
[04-20-2017 16:42:13] NPCD: Caught Termination Signal - Hasta la vista... baby
[04-20-2017 16:42:13] NPCD: npcd Daemon (0.4.14) started with PID=61244
[04-20-2017 16:42:13] NPCD: Please have a look at 'npcd -V' to get license information
[04-20-2017 16:42:13] NPCD: HINT: load_threshold is enabled - ('10.000000')
[04-20-2017 16:46:26] NPCD: Caught Termination Signal - Hasta la vista... baby
[04-20-2017 16:46:26] NPCD: npcd Daemon (0.4.14) started with PID=1411
[04-20-2017 16:46:26] NPCD: Please have a look at 'npcd -V' to get license information
[04-20-2017 16:46:26] NPCD: HINT: load_threshold is enabled - ('10.000000')
[04-20-2017 16:48:21] NPCD: npcd Daemon (0.4.14) started with PID=3348
[04-20-2017 16:48:21] NPCD: Please have a look at 'npcd -V' to get license information
[04-20-2017 16:48:21] NPCD: HINT: load_threshold is enabled - ('10.000000')
[04-20-2017 16:59:37] NPCD: Caught Termination Signal - Hasta la vista... baby
[04-20-2017 16:59:37] NPCD: npcd Daemon (0.4.14) started with PID=18070
[04-20-2017 16:59:37] NPCD: Please have a look at 'npcd -V' to get license information
[04-20-2017 16:59:37] NPCD: HINT: load_threshold is enabled - ('10.000000')
[04-20-2017 17:01:45] NPCD: Caught Termination Signal - Hasta la vista... baby
[04-20-2017 17:01:45] NPCD: npcd Daemon (0.4.14) started with PID=20506
[04-20-2017 17:01:45] NPCD: Please have a look at 'npcd -V' to get license information
[04-20-2017 17:01:45] NPCD: HINT: load_threshold is enabled - ('10.000000')
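The clock jump is also visible in this log, where the 05-11-2017 entries sit between April entries. A minimal sketch for spotting such spots automatically, assuming every line starts with a `[MM-DD-YYYY HH:MM:SS]` stamp as in the excerpt; the function name `find_backward_timestamps` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print log lines whose timestamp is earlier than the
# timestamp of the previous stamped line (a sign that the clock went backwards).
# Reads the files given as arguments, or stdin when none are given.
find_backward_timestamps() {
    awk '
        /^\[[0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9] / {
            # Rebuild [MM-DD-YYYY HH:MM:SS] as a sortable YYYYMMDDHHMMSS key.
            date = substr($0, 8, 4) substr($0, 2, 2) substr($0, 5, 2)
            time = substr($0, 13, 2) substr($0, 16, 2) substr($0, 19, 2)
            key = date time
            if (prev != "" && key < prev) print
            prev = key
        }
    ' "$@"
}

# Usage (path as in the post):
#   find_backward_timestamps /usr/local/nagios/var/npcd.log
```

On the excerpt above this should flag only the 04-20-2017 16:09:52 termination line, the first entry written after the time was corrected.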
# tail -f /usr/local/nagios/var/perfdata.log
2016-06-20 17:25:43 [59268] [0] *** TIMEOUT: Please check your npcd.cfg
2016-06-20 17:25:43 [59268] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1466436330.perfdata.host-PID-59268 deleted
2016-06-20 17:25:43 [59268] [0] *** Timeout while processing Host: "nloospr1.dbgroup.local" Service: "_HOST_"
2016-06-20 17:25:43 [59268] [0] *** process_perfdata.pl terminated on signal ALRM
2016-06-21 19:12:01 [20078] [0] *** TIMEOUT: Timeout after 5 secs. ***
2016-06-21 19:12:01 [20078] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2016-06-21 19:12:01 [20078] [0] *** TIMEOUT: Please check your npcd.cfg
2016-06-21 19:12:01 [20078] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1466529105.perfdata.service-PID-20078 deleted
2016-06-21 19:12:01 [20078] [0] *** Timeout while processing Host: "nloosvmm.dbgroup.local" Service: "F_Schijf"
2016-06-21 19:12:01 [20078] [0] *** process_perfdata.pl terminated on signal ALRM
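To see which hosts these timeouts affect, the entries can be tallied per host. This is only a sketch: it assumes the `Timeout while processing Host: "..."` wording shown above, and `count_timeouts_by_host` is a name invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: count "Timeout while processing Host" entries per host.
# Splitting on double quotes puts the host name in field 2.
count_timeouts_by_host() {
    awk -F'"' '
        /Timeout while processing Host:/ { n[$2]++ }
        END { for (h in n) print n[h], h }
    ' "$@" | sort -k2
}

# Usage (path as in the post):
#   count_timeouts_by_host /usr/local/nagios/var/perfdata.log
```

Each output line holds the count followed by the host name, sorted by host.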
- Then I found that systemd cannot read the npcd PID file:
# systemctl status npcd
● npcd.service - SYSV: Visit the Website at http://sourceforge.net/projects/pnp4nagios/
Loaded: loaded (/etc/rc.d/init.d/npcd; bad; vendor preset: disabled)
Active: active (running) since Thu 2017-04-20 16:48:21 CEST; 14s ago
Docs: man:systemd-sysv-generator(8)
Process: 1401 ExecStop=/etc/rc.d/init.d/npcd stop (code=exited, status=0/SUCCESS)
Process: 3345 ExecStart=/etc/rc.d/init.d/npcd start (code=exited, status=0/SUCCESS)
Main PID: 3348 (npcd)
CGroup: /system.slice/npcd.service
├─1411 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg
└─3348 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg
Apr 20 16:48:21 nlgrpngs.dbgroup.local systemd[1]: Starting SYSV: Visit the Website at http://sourceforge.net/projects/pnp4nagios/...
Apr 20 16:48:21 nlgrpngs.dbgroup.local npcd[3345]: NPCD started.
Apr 20 16:48:21 nlgrpngs.dbgroup.local systemd[1]: Failed to read PID from file /usr/local/nagiosxi/var/subsys/npcd.pid: Invalid argument
Apr 20 16:48:21 nlgrpngs.dbgroup.local systemd[1]: Started SYSV: Visit the Website at http://sourceforge.net/projects/pnp4nagios/.
# ll /usr/local/nagiosxi/var/subsys/
total 8
-rw-r--r-- 1 root root 0 Apr 20 15:42 nagios
-rw-r--r-- 1 nagios nagios 0 Dec 11 15:14 ndo2db
-rw-r--r-- 1 root root 4 Apr 20 16:48 npcd.pid
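A quick way to check what is actually in that PID file and whether it points at a live process. This is a minimal sketch, not part of Nagios XI or PNP4Nagios; the function name `check_pidfile` is hypothetical and it relies only on `tr` and `ps -p`.

```shell
#!/bin/sh
# Hypothetical helper: report whether a PID file holds the numeric PID
# of a process that is currently running.
check_pidfile() {
    pidfile=$1
    # -s is false for a missing file as well as for a zero-length one.
    if [ ! -s "$pidfile" ]; then
        echo "empty or missing"
        return 1
    fi
    # Strip newlines and other whitespace before validating the content.
    pid=$(tr -d '[:space:]' < "$pidfile")
    case $pid in
        ''|*[!0-9]*) echo "not a number"; return 1 ;;
    esac
    if ps -p "$pid" > /dev/null 2>&1; then
        echo "running: $pid"
    else
        echo "stale: $pid"
        return 1
    fi
}

# Usage (path as in the systemd message):
#   check_pidfile /usr/local/nagiosxi/var/subsys/npcd.pid
```

The reported PID can then be compared with the npcd processes listed under CGroup in the `systemctl status npcd` output.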
Any suggestions?
Kind regards,
Dennis Lans