Random performance data missing

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
fortyx40
Posts: 6
Joined: Fri Dec 07, 2018 4:08 pm

Random performance data missing

Post by fortyx40 »

Hello,

Every few days we get gaps in our performance data. The gaps usually resolve themselves after 1-3 days, but for the past year we have never gone longer than 5-6 days without one. During these gaps I noticed that the nagios process consumes nearly all available memory on our CentOS server.
During one of these periods I even added more virtual memory, and nagios slowly consumed a few extra gigabytes on top of that, reaching around 7GB of the 8GB available. This makes me suspect some sort of memory leak. Here are the things I have tried so far:

1. Changed the timeout value in /usr/local/nagios/etc/npcd.cfg. The current value is 35, but I have also tried 15 and 20.

2. Modified the threshold value in /usr/local/nagios/etc/pnp/process_perfdata.cfg to 80.0%, thinking that with a higher threshold perfdata would still be processed during periods of high memory usage.

Here is the error that appears in nagios.log when perfdata stops being processed:
------------------------
Warning: fork() in my_system_r() failed for command "/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/1545804007.perfdata.host" - errno: Cannot allocate memory

Warning: fork() in my_system_r() failed for command "/bin/mv /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/xidpe/1545804007.perfdata.service" - errno: Cannot allocate memory
-----------------------
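To line these warnings up against the gaps in the graphs, it can help to pull every fork() failure out of nagios.log with a readable timestamp. The sketch below is a hypothetical helper (fork_failures is not a Nagios tool). It assumes the standard nagios.log format, where each line starts with "[<epoch seconds>] ", and GNU date, which CentOS ships; times are printed in UTC.

```shell
# Hypothetical helper: list fork() "Cannot allocate memory" warnings from a
# Nagios log, converting the "[<epoch>]" line prefix to a UTC timestamp.
fork_failures() {
    log_file=$1
    grep 'fork() in my_system_r() failed' "$log_file" |
        grep 'Cannot allocate memory' |
        while IFS= read -r line; do
            epoch=${line%%]*}      # "[1545804007] msg" -> "[1545804007"
            epoch=${epoch#\[}      # "[1545804007" -> "1545804007"
            message=${line#*] }    # text after the "] " prefix
            printf '%s  %s\n' "$(date -u -d "@$epoch" '+%Y-%m-%d %H:%M:%S')" "$message"
        done
}
```

For example, fork_failures /usr/local/nagios/var/nagios.log (assuming the default log location) prints one line per failure, which makes it easy to see when each gap started.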

Nagios Core version 4.2.4
(We rely heavily on mod_gearman, so we could not update Nagios Core.)

Another related error is:
"could not write to destination directory /usr/local/nagios/var/spool/xidpe"
While this error was filling the logs, I could not see any spool files in that directory.
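When this error shows up, three things are worth checking on the spool directory: whether it exists, whether it is writable, and whether its filesystem has free space. The helper below is a hypothetical sketch (spool_status is an illustrative name) that reports all three. The writability test is evaluated for whoever runs it, so run it from a shell running as the nagios user; as root it will always report writable. It assumes POSIX df output.

```shell
# Hypothetical helper: report writability, file count and free space (kB)
# for a spool directory. Returns non-zero if the directory is missing.
spool_status() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    # -w is checked for the user running this function, not for nagios itself.
    if [ -w "$dir" ]; then state=writable; else state=not-writable; fi
    # Count only regular files directly inside the spool directory.
    count=$(find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' ')
    # df -P prints one line per filesystem; column 4 is available 1K blocks.
    avail_kb=$(df -P -k "$dir" | awk 'NR == 2 {print $4}')
    echo "$state files=$count avail_kb=$avail_kb"
}
```

For example, spool_status /usr/local/nagios/var/spool/xidpe prints a line such as "writable files=... avail_kb=...", where the two numbers depend on your system.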

Sorry for posting so much information; hopefully it helps.

Thank you
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Random performance data missing

Post by npolovenko »

@fortyx40, have you run the "top" command to see which processes are consuming the memory? How many host and service checks are you monitoring? How much memory does your server have in total?
Decreasing the timeout value is not a good idea: if the NPCD delay exceeds the timeout value, it will skip a portion of the spooled perfdata and not process it.
The load threshold needs to be increased in the /usr/local/nagios/etc/pnp/npcd.cfg file.
The timeout needs to be increased in /usr/local/nagios/etc/pnp/process_perfdata.cfg.
The npcd service needs to be restarted for the changes to take effect:
service npcd restart
Also, please run:
chmod u+w /usr/local/nagios/var/spool/xidpe/
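To confirm which load threshold and timeout values are actually active before and after editing these files, a small lookup helps. The function below is a hypothetical sketch that prints every uncommented value assigned to a key in a "key = value" file. The key names load_threshold (npcd.cfg) and TIMEOUT (process_perfdata.cfg) are assumptions based on the stock PNP4Nagios config files; check them against your own copies.

```shell
# Hypothetical helper: print every active (uncommented) value assigned to a key
# in a "key = value" config file, so duplicate entries are easy to spot.
cfg_value() {
    file=$1
    key=$2
    # Lines starting with '#' never match because the key must lead the line.
    sed -n "s/^[[:space:]]*$key[[:space:]]*=[[:space:]]*//p" "$file"
}
```

Usage, assuming those key names: cfg_value /usr/local/nagios/etc/pnp/npcd.cfg load_threshold and cfg_value /usr/local/nagios/etc/pnp/process_perfdata.cfg TIMEOUT, followed by service npcd restart once the values are changed.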
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
fortyx40

Re: Random performance data missing

Post by fortyx40 »

I did run the top command; that is how I concluded that nagios was using all the memory. I increased the timeout and the threshold rather than decreasing them.
I also gave full permissions to /usr/local/nagios/var/spool/xidpe/, although the write error only happens once nagios starts eating all the memory. I restart the npcd service after every change. Basically, I believe the nagios process itself is causing all the issues. We have around 1,100 hosts and 2,500 services.
Everything runs smoothly until nagios decides to consume all the resources.
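To back up the leak suspicion with data rather than spot checks in top, one option is to record the resident memory of the nagios processes at regular intervals and see whether it climbs steadily between restarts. This is a sketch under assumptions: it reads /proc directly (Linux only), and the process name is assumed to be nagios. With Nagios Core 4 the main process and its worker processes typically share that name, so the figure is a sum across all of them.

```shell
# Hypothetical helpers for tracking memory growth (Linux only, reads /proc).

# Sum VmRSS (kB) over every process whose name is exactly $1.
rss_kb_by_name() {
    name=$1
    total=0
    for status in /proc/[0-9]*/status; do
        # A process can exit between the glob and the read; skip it if so.
        comm=$(awk '/^Name:/ {print $2; exit}' "$status" 2>/dev/null) || continue
        [ "$comm" = "$name" ] || continue
        # Kernel threads have no VmRSS line; count them as 0.
        kb=$(awk '/^VmRSS:/ {print $2; exit}' "$status" 2>/dev/null)
        total=$((total + ${kb:-0}))
    done
    echo "$total"
}

# Append one "UTC timestamp,total kB" sample for process name $1 to CSV file $2.
log_rss_sample() {
    printf '%s,%s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$(rss_kb_by_name "$1")" >> "$2"
}
```

Calling log_rss_sample nagios /var/tmp/nagios_rss.csv from cron every few minutes (the CSV path is only an example) would show whether the total keeps rising until the fork() failures begin.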
I attached an example of our perfdata


Thank you!
npolovenko

Re: Random performance data missing

Post by npolovenko »

@fortyx40, your problem could be related to this reported issue:
https://github.com/NagiosEnterprises/na ... issues/455

The good news is that the fix that would allow mod_gearman integration with the latest Core is already in the QA stage. Once we have tested it, it will become available to users.
fortyx40

Re: Random performance data missing

Post by fortyx40 »

That's great news! Will this fix be integrated into an official release of Nagios Core, or will it be a separate package or patch?

Thank you
npolovenko

Re: Random performance data missing

Post by npolovenko »

@fortyx40, we'll publish a separate script for mod_gearman, along with updated mod_gearman packages, to our repo. No changes are required on the Core side.
fortyx40

Re: Random performance data missing

Post by fortyx40 »

Is there any way to sign up for notifications about this script's release? Will it be available in the https://github.com/NagiosEnterprises/nagioscore master branch once it's released?


Thank you
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Random performance data missing

Post by scottwilkerson »

If you click the star at the top of the project page, you will be notified when releases are made.
Former Nagios employee
fortyx40

Re: Random performance data missing

Post by fortyx40 »

I'm having trouble finding this script; I have looked all through the repo to no avail. Has it been published? I see that a new Core has been released. Do I still need this separate script, or has mod_gearman compatibility been added natively to the new Core?

Thank you
fortyx40

Re: Random performance data missing

Post by fortyx40 »

I believe I found the correct article: https://support.nagios.com/kb/article/n ... e-839.html
It looks like we just need to update to mod_gearman3 to use the newer Core versions.
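Before and after following that article, it can be useful to confirm which gearman-related packages are actually installed. On an RPM-based system like CentOS, rpm -qa lists installed packages; the filter below is a hypothetical helper, and because the exact package names in the Nagios repo are not given in this thread, it simply keeps anything containing "gearman".

```shell
# Hypothetical helper: read package names on stdin and keep those mentioning
# gearman (case-insensitive), sorted, e.g. to tell mod_gearman from mod_gearman3.
gearman_packages() {
    grep -i 'gearman' | sort
}
```

Usage: rpm -qa | gearman_packages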