We have a problem with our performance graphs. Right after installing Nagios XI everything worked fine, including the performance graphs. Nagios XI is running on a bare-metal server with RHEL 6.x. We install our servers with kickstart, and the default password lifetime is one year, foolishly even for the nagios user (I missed that). Now, after a year, the password for the nagios user expired and many things in Nagios stopped working. After fixing my mistake everything works fine in Nagios XI again, but the performance graphs are no longer drawn.
In Nagios XI the "Performance Grapher" is shown as running, and in "/usr/local/nagios/var" the files "host-perfdata" and "service-perfdata" are updated periodically. In "/usr/local/nagios/share/perfdata/.." no more data is written. Of course I have rebooted the server. In the meantime an update for Nagios XI appeared; the installation of the update was successful, but the performance graphs are still not being drawn.
I tried the solutions shown in the Nagios XI FAQs already, but nothing changed.
I hope somebody can help me.
Best Regards
Reinhold Krinninger
I went through a similar issue and also followed all of the FAQ guides. From what I read, this is usually caused by permissions, so if you have triple-checked all permissions and are certain they are correct, the other possible cause is high load on the Nagios server.
If your server shows a load over 10 in top, you may need to look into why, but in the meantime you can change the threshold so PNP still runs.
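A minimal sketch of that load comparison, assuming a threshold of 10. The helper name `load_exceeds` is hypothetical and not part of PNP or Nagios; it only reads the same 1-minute load average that top reports.

```python
import os

def load_exceeds(threshold, load=None):
    """Return True if the 1-minute load average is above the threshold.

    `load` can be passed explicitly (e.g. a value copied from top);
    otherwise it is read from the running system.
    """
    if load is None:
        load = os.getloadavg()[0]  # tuple of (1 min, 5 min, 15 min)
    return load > threshold
```

Calling `load_exceeds(10.0)` on the Nagios server tells you whether the current load would be above that threshold at this moment; it says nothing about load spikes in the past.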
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
[01-11-2013 15:10:37] NPCD: HINT: load_threshold is enabled - ('10.000000')
[02-07-2013 11:31:37] NPCD: Caught Termination Signal - Hasta la vista... baby
[02-07-2013 11:35:25] NPCD: npcd Daemon (0.4.14) started with PID=3973
[02-07-2013 11:35:25] NPCD: Please have a look at 'npcd -V' to get license information
[02-07-2013 11:35:25] NPCD: HINT: load_threshold is enabled - ('10.000000')
[03-04-2013 08:55:28] NPCD: Caught Termination Signal - Hasta la vista... baby
[03-04-2013 08:59:10] NPCD: npcd Daemon (0.4.14) started with PID=4118
[03-04-2013 08:59:10] NPCD: Please have a look at 'npcd -V' to get license information
[03-04-2013 08:59:10] NPCD: HINT: load_threshold is enabled - ('10.000000')
[03-22-2013 13:31:50] NPCD: Caught Termination Signal - Hasta la vista... baby
[03-22-2013 13:35:44] NPCD: npcd Daemon (0.4.14) started with PID=4007
[03-22-2013 13:35:44] NPCD: Please have a look at 'npcd -V' to get license information
[03-22-2013 13:35:44] NPCD: HINT: load_threshold is enabled - ('10.000000')
[04-09-2013 17:24:27] NPCD: Caught Termination Signal - Hasta la vista... baby
[04-09-2013 17:28:20] NPCD: npcd Daemon (0.4.14) started with PID=4028
[04-09-2013 17:28:20] NPCD: Please have a look at 'npcd -V' to get license information
[04-09-2013 17:28:20] NPCD: HINT: load_threshold is enabled - ('10.000000')
[05-07-2013 16:08:38] NPCD: Caught Termination Signal - Hasta la vista... baby
[05-07-2013 16:12:29] NPCD: npcd Daemon (0.4.14) started with PID=4023
[05-07-2013 16:12:29] NPCD: Please have a look at 'npcd -V' to get license information
[05-07-2013 16:12:29] NPCD: HINT: load_threshold is enabled - ('10.000000')
[05-13-2013 11:41:29] NPCD: Caught Termination Signal - Hasta la vista... baby
[05-13-2013 11:45:19] NPCD: npcd Daemon (0.4.14) started with PID=4190
[05-13-2013 11:45:19] NPCD: Please have a look at 'npcd -V' to get license information
[05-13-2013 11:45:19] NPCD: HINT: load_threshold is enabled - ('10.000000')
Nagios XI is running on a powerful server (24 cores, 16 GB RAM, lots of local disk space) with only a few checks (<1000). I have never seen the server busy. Current state:
top - 13:32:15 up 1 day, 1:47, 1 user, load average: 0.94, 0.77, 0.64
Tasks: 624 total, 2 running, 620 sleeping, 0 stopped, 2 zombie
Cpu(s): 3.4%us, 0.8%sy, 0.0%ni, 95.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16327580k total, 4366096k used, 11961484k free, 246284k buffers
Swap: 4194296k total, 0k used, 4194296k free, 680864k cached
Now I rechecked the file and directory permissions as shown in the FAQs. The user "nagios" is able to change into the directory "/usr/local/nagios/share/perfdata" and can modify and write files in its subdirectories. I also ran the command "/usr/local/nagiosxi/scripts/reset_config_perms" again.
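For anyone repeating this permission check, here is a rough sketch that walks a directory tree and lists everything the current user cannot write to. It is meant to be run as the nagios user; the function name `unwritable_entries` is made up for illustration and is not a Nagios XI tool.

```python
import os

def unwritable_entries(root):
    """Return paths under root that the current user cannot write to."""
    blocked = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Check the directory itself plus every file directly inside it.
        candidates = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in candidates:
            if not os.access(path, os.W_OK):
                blocked.append(path)
    return blocked
```

An empty list for "/usr/local/nagios/share/perfdata" would match what is described above. Note that directories the user cannot read are not descended into, so their contents are not checked.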
So far nothing changed.
It seems that my problem is solved. Both subdirectories, "xidpe" and "perfdata", in "/usr/local/nagios/var/spool/" seem to have been corrupted. The subdirectory "xidpe" contained a very large number of files, but should be empty under normal conditions(?). The subdirectory "perfdata" was always empty, but the ls command showed an abnormal size for the directory itself. I deleted both directories and recreated them with the same permissions, user and group. After a restart of Nagios the performance graphs are drawn again.
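The two symptoms described here (a spool directory full of leftover files, and an empty directory whose own size is unusually large) can be checked with a small sketch like the following. The helper `spool_report` is hypothetical. The interpretation is also an assumption: on ext3/ext4 a directory typically does not shrink after its files are deleted, so a large size on an empty directory suggests it once held a great many entries.

```python
import os

def spool_report(path):
    """Return (number of entries, size in bytes of the directory itself)."""
    with os.scandir(path) as entries:
        count = sum(1 for _ in entries)
    # st_size of a directory is the space used by its entry table,
    # which is the size that `ls -ld` prints for it.
    return count, os.stat(path).st_size

if __name__ == "__main__":
    for sub in ("xidpe", "perfdata"):
        spool = os.path.join("/usr/local/nagios/var/spool", sub)
        if os.path.isdir(spool):
            print(spool, spool_report(spool))
```

A healthy spool directory should report a small entry count that does not keep growing between runs.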
I would like to thank everybody who has replied to my post.