Random performance data missing

fortyx40 · Post by **fortyx40** » Wed Dec 26, 2018 4:39 pm

Hello,

Every few days we get gaps in our performance data. The gaps usually resolve themselves after 1-3 days, and over the past year we have never gone longer than 5-6 days without the issue recurring. During these gaps I noticed that the nagios process consumes nearly all available memory on our CentOS server.
I even added more virtual memory during one of these periods, and nagios slowly consumed a few extra gigabytes, reaching around 7GB of the 8GB available, which leads me to believe this may be some sort of memory leak. Here are some things I have tried so far:

1. Changed the timeout value in /usr/local/nagios/etc/npcd.cfg. The current value is 35, but I have also tried 15 and 20.

2. Modified the threshold value in /usr/local/nagios/etc/pnp/process_perfdata.cfg to 80.0%.
"I thought that with a higher threshold, perfdata might still be processed during high memory usage"

Here is the error that appears in nagios.log when perfdata stops being processed:
------------------------
Warning: fork() in my_system_r() failed for command "/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/1545804007.perfdata.host" - errno: Cannot allocate memory

Warning: fork() in my_system_r() failed for command "/bin/mv /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/xidpe/1545804007.perfdata.service" - errno: Cannot allocate memory
-----------------------
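To gauge how often these fork() failures occur, a minimal sketch like the following could count them. The function name `count_fork_failures` is hypothetical, and the default log path is an assumption about the install, not something confirmed in this thread:

```shell
# Count perfdata fork() failures caused by memory exhaustion in a Nagios log.
# $1: path to the log file. The default path below is an assumption; adjust it.
count_fork_failures() {
    # In a basic regular expression the parentheses match literally.
    grep -c 'fork() in my_system_r() failed.*Cannot allocate memory' \
        "${1:-/usr/local/nagios/var/nagios.log}"
}
```

Running `count_fork_failures /path/to/nagios.log` before and after a gap would show whether the failures line up with the missing performance data.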

Nagios core version 4.2.4
"We rely heavily on mod_gearman so could not update nagios core"

Another related error is
"could not write to destination directory /usr/local/nagios/var/spool/xidpe"
During the time this error was filling the logs I could not see any spool files

Sorry for posting so much information, hopefully it is helpful.

Thank you

npolovenko · Post by **npolovenko** » Wed Dec 26, 2018 5:53 pm

@fortyx40, have you run the "top" command to see which processes are consuming the memory? How many hosts and services are you monitoring? How much memory does your server have in total?
Decreasing the timeout value is not a good idea: if the NPCD delay exceeds the timeout, it will skip a portion of the spooled perfdata and not process it.
The load threshold needs to be increased in the /usr/local/nagios/etc/pnp/npcd.cfg file.
The timeout needs to be increased in /usr/local/nagios/etc/pnp/process_perfdata.cfg.
The npcd service needs to be restarted for the changes to be applied:

service npcd restart

Also, please run:

chmod u+w /usr/local/nagios/var/spool/xidpe/
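
As a non-interactive alternative to "top", here is a rough sketch that totals the resident memory of every process with a given name. The helper name `sum_rss_kib` is made up for illustration, and it assumes a procps-style `ps` that accepts `-eo rss=,comm=`:

```shell
# Sum resident memory (RSS, in KiB) for all processes with a given command name.
# Reads "RSS COMMAND" pairs on stdin, as printed by: ps -eo rss=,comm=
sum_rss_kib() {
    awk -v name="$1" '$2 == name { total += $1 } END { print total + 0 }'
}
```

On a live system, `ps -eo rss=,comm= | sum_rss_kib nagios` prints the combined KiB used by processes named nagios. Logging that value periodically would show whether the growth is gradual, as a leak would suggest.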

fortyx40 · Post by **fortyx40** » Thu Dec 27, 2018 9:19 am

I did run the top command, which is how I concluded that nagios was using all the memory. I increased the timeout and the load threshold; I did not decrease them.
I also gave full permissions to /usr/local/nagios/var/spool/xidpe/, though the write error only happens when nagios starts eating all the memory. I restart the npcd service after every change. Basically, I believe the nagios process itself is causing all the issues. We have around 1,100 hosts and 2,500 services.
Everything runs smoothly until nagios decides to consume all the resources.
I attached an example of our perfdata.

Thank you!

npolovenko · Post by **npolovenko** » Thu Dec 27, 2018 12:35 pm

@fortyx40, Your problem could be related to this reported issue:
https://github.com/NagiosEnterprises/na ... issues/455

The good news is that the fix that would allow mod gearman integration with the latest Core is already in the QA stage. Once we test it out it will become available for users.

fortyx40 · Post by **fortyx40** » Thu Dec 27, 2018 2:28 pm

That's great news! Will this fix be integrated into an official release of Nagios Core, or will it be a separate entity or patch?

Thank you

npolovenko · Post by **npolovenko** » Thu Dec 27, 2018 2:53 pm

@fortyx40, We'll publish a separate script for mod gearman and updated mod gearman packages to our repo. No changes required on the Core side.

fortyx40 · Post by **fortyx40** » Mon Jan 07, 2019 10:03 am

Is there any way to sign up for notifications on this script's release? Will it be available in the https://github.com/NagiosEnterprises/nagioscore master branch once it's released?

Thank you

scottwilkerson · Post by **scottwilkerson** » Mon Jan 07, 2019 3:23 pm

If you click the star at the top of the project page, you will be notified when releases are made.

fortyx40 · Post by **fortyx40** » Fri Feb 22, 2019 4:19 pm

I'm having trouble finding this script. I have looked all through the repo to no avail. Has it been published? I see that a new Core has been released. Do I still need this separate script, or is mod_gearman compatibility now built into the new Core?

Thank you

fortyx40 · Post by **fortyx40** » Fri Feb 22, 2019 4:27 pm

I believe I found the correct article: https://support.nagios.com/kb/article/n ... e-839.html
It looks like we just need to update to mod_gearman3 to use the newer Core versions.
