Random performance data missing
Posted: Wed Dec 26, 2018 4:39 pm
Hello,
Every few days we have gaps in our performance data. The gaps usually resolve themselves after 1-3 days and have never lasted longer than 5-6 days; this has been going on for the past year. During these gaps I noticed the nagios process consumes nearly all available memory on CentOS.
I even added more virtual memory during one of these periods, and it slowly consumed a few more gigabytes, reaching around 7GB of the 8GB available, which leads me to believe this may be some sort of memory leak. Here are some things I have tried so far:
1. Changed the timeout value in /usr/local/nagios/etc/npcd.cfg; the current value is 35, but I have also tried 15 and 20
2. Modified the threshold value in /usr/local/nagios/etc/pnp/process_perfdata.cfg to 80.0%
"I thought that with a higher threshold, perfdata would still be processed during periods of high memory usage"
Here is the error in nagios.log when perfdata stops being processed:
------------------------
Warning: fork() in my_system_r() failed for command "/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/1545804007.perfdata.host" - errno: Cannot allocate memory
Warning: fork() in my_system_r() failed for command "/bin/mv /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/xidpe/1545804007.perfdata.service" - errno: Cannot allocate memory
-----------------------
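The "Cannot allocate memory" errno on fork() means the kernel refused to create the child process that would run /bin/mv, which matches the near-exhausted memory described above. As a hypothetical diagnostic sketch (not from the original post; the exact fields shown are my choice), capturing memory state when the warnings appear can help correlate nagios growth with the perfdata gaps:

```shell
#!/bin/sh
# Hypothetical diagnostic sketch: snapshot system-wide memory plus the
# biggest resident processes when the fork() warnings start appearing.
# All values from /proc/meminfo and ps are in kB.
grep -E 'MemTotal|MemAvailable|CommitLimit|Committed_AS' /proc/meminfo
# Top 5 processes by resident set size (RSS), to spot the nagios daemon:
ps axo pid,rss,comm --sort=-rss | head -n 5
```

Running this from cron every few minutes during a gap window would show whether the daemon's RSS climbs steadily (a leak pattern) or jumps suddenly.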
Nagios Core version: 4.2.4
"We rely heavily on mod_gearman, so we could not update Nagios Core"
Another related error is:
"could not write to destination directory /usr/local/nagios/var/spool/xidpe"
While this error was filling the logs, I could not see any spool files.
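Since the "could not write to destination directory" warning appears alongside the fork() failures, it is plausibly the same out-of-memory condition rather than a permissions problem, but a quick check of the spool directory rules the latter out. A minimal sketch, using the path from the log message (run it as the user the nagios daemon runs as):

```shell
#!/bin/sh
# Minimal sketch: verify the spool directory from the log message
# exists and is writable by the current user.
SPOOL=/usr/local/nagios/var/spool/xidpe
if [ -d "$SPOOL" ] && [ -w "$SPOOL" ]; then
    echo "spool dir writable"
else
    echo "spool dir missing or not writable"
fi
```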
Sorry for posting so much information; hopefully it is helpful.
Thank you