High use of cpu in avail report generation

Posted: Mon Oct 07, 2019 8:54 am
by evisus
Greetings!

A performance problem with SLA and availability report generation has been reported in the Nagios XI forums since 2014: the avail.cgi process consumes over 99% of the CPU, leaving the server unusable until report generation finishes.

Frankly, it is not very efficient.

I have reviewed a wide variety of Nagios XI installations, from servers with modest hardware to others with very powerful hardware, and I have followed the recommendations given in each of the cases opened for this problem.

However, there is still no solution to this known problem, and quite a while ago it was "referred to the development area".

My question is this:
Is there a known resolution for this problem?
If there is no solution, can we expect to have this problem solved?

The concern is not only mine. It is shared by each of the clients who run into this error, and it is happening more and more frequently.

For one of our clients with this problem, we plan to move the MySQL database to a dedicated server, but we are not sure that this is the solution. Do you have any experience with this?

I appreciate your response and your understanding of our concern.

Re: High use of cpu in avail report generation

Posted: Mon Oct 07, 2019 10:39 am
by mcapra
Relevant GitHub issue:
https://github.com/NagiosEnterprises/na ... issues/280
evisus wrote: In one of our clients that presents this problem, we plan to separate the MySQL database on a dedicated server, but we are not sure that this is the solution. Do you have any experience?
avail.cgi has no concept of a MySQL database (it parses raw log files), so I don't think this would offer much. Using off-loaded MySQL is a generally good idea on large installations.

Using a RAM disk can help a little bit, but most of the work is in iterating over lines in log files -- loading those files into memory isn't the bottleneck.
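To illustrate why the per-line work dominates, here is a minimal sketch in C. It is not the avail.cgi source. It assumes the usual Nagios Core log line layout (`[timestamp] HOST ALERT: ...` / `[timestamp] SERVICE ALERT: ...`), and the helper names `parse_alert_line` and `count_alert_lines` are hypothetical. The buffer passed in is already in memory, much like a log read from a RAM disk, yet every line still has to be scanned and matched.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not taken from avail.c.
 * Parses one line of the assumed form
 *   [1570456440] HOST ALERT: web01;DOWN;HARD;1;output
 * `line` must point into a NUL-terminated buffer; `len` excludes the newline.
 * Returns 1 and stores the timestamp for HOST/SERVICE alert lines, else 0. */
static int parse_alert_line(const char *line, size_t len, long *timestamp)
{
    static const char host_tag[] = " HOST ALERT: ";
    static const char svc_tag[] = " SERVICE ALERT: ";

    if (len < 3 || line[0] != '[' || !isdigit((unsigned char)line[1]))
        return 0;

    char *end = NULL;
    long ts = strtol(line + 1, &end, 10);   /* stops at the first non-digit */
    if (*end != ']')
        return 0;

    const char *rest = end + 1;             /* text after the closing ']' */
    size_t remaining = len - (size_t)(rest - line);

    if ((remaining >= sizeof host_tag - 1 &&
         strncmp(rest, host_tag, sizeof host_tag - 1) == 0) ||
        (remaining >= sizeof svc_tag - 1 &&
         strncmp(rest, svc_tag, sizeof svc_tag - 1) == 0)) {
        *timestamp = ts;
        return 1;
    }
    return 0;
}

/* Walks an in-memory log buffer line by line and counts alert entries.
 * The cost grows with the number of lines, wherever the file was stored. */
static size_t count_alert_lines(const char *buf)
{
    size_t alerts = 0;
    const char *line = buf;

    while (*line != '\0') {
        const char *nl = strchr(line, '\n');
        size_t len = nl ? (size_t)(nl - line) : strlen(line);
        long ts;

        if (parse_alert_line(line, len, &ts))
            alerts++;

        line += len;
        if (*line == '\n')
            line++;
    }
    return alerts;
}
```

Calling `count_alert_lines()` on the contents of a log file returns the number of HOST and SERVICE alert lines. A real availability calculation does considerably more per line than this, which only strengthens the point: faster storage does not reduce the number of lines that must be processed.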

Re: High use of cpu in avail report generation

Posted: Mon Oct 07, 2019 11:20 am
by mbellerue
Thanks for jumping in, Matt!

evisus, you're absolutely right. This has been a problem that we've mostly relied on mitigation to solve. However, we are actively working on a new reporting backend that is currently set to be released with Nagios XI 6 that should alleviate many of the reporting issues.

https://www.nagios.com/roadmaps/

Re: High use of cpu in avail report generation

Posted: Mon Oct 07, 2019 11:40 am
by evisus
Thanks for your replies Matt and mbellerue.

I will have to communicate this to our XI users, although waiting until the third quarter of 2020 will not be received as good news. It is a long time to wait.

The comments on GitHub mention a possible solution (https://github.com/NagiosEnterprises/na ... issues/271):

#######
status.cgi is using 100% cpu time on 8 cores in our environment
We localized the problem to the following for-loop in status.c around line 1016

if (is_host_member_of_servicegroup(find_servicegroup(servicegroup_name), temp_host) == TRUE) {
    count_host = 1;
    break;
}
Our environment has over 600 hosts. We temporarily disabled the host count by commenting this piece of code out of the file. With the lines in place it takes more than 60 seconds to load the CGI (our back-end server stops after 60 seconds), and without them it takes about 1 second.
#######
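As a rough illustration of why a loop like the one quoted above could be expensive, here is a self-contained sketch in C. It does not use the Nagios Core sources: the types and the functions `sketch_find_servicegroup` and `sketch_is_member` are simplified, hypothetical stand-ins, and the idea that the repeated lookup is the actual bottleneck is an assumption, not something confirmed in the issue. It only shows the general pattern of repeating a name lookup on every iteration versus resolving it once before the loop.

```c
#include <stddef.h>
#include <string.h>

#define TRUE 1
#define FALSE 0

/* Simplified, hypothetical stand-ins for the real Nagios Core structures. */
typedef struct host {
    const char *name;
} host;

typedef struct servicegroup {
    const char *name;
    const host **members;
    size_t member_count;
} servicegroup;

/* Counts how many times the name lookup runs, for comparison only. */
static size_t lookup_calls = 0;

/* Stand-in lookup: linear search by name (the real lookup may differ). */
static const servicegroup *sketch_find_servicegroup(const servicegroup *groups,
                                                    size_t group_count,
                                                    const char *name)
{
    lookup_calls++;
    for (size_t i = 0; i < group_count; i++)
        if (strcmp(groups[i].name, name) == 0)
            return &groups[i];
    return NULL;
}

/* Stand-in membership test: linear scan over the group's members. */
static int sketch_is_member(const servicegroup *group, const host *h)
{
    if (group == NULL)
        return FALSE;
    for (size_t i = 0; i < group->member_count; i++)
        if (group->members[i] == h)
            return TRUE;
    return FALSE;
}

/* Shape of the quoted snippet: the group is looked up again for every host. */
static int any_member_lookup_per_host(const servicegroup *groups, size_t group_count,
                                      const char *name,
                                      const host *hosts, size_t host_count)
{
    for (size_t i = 0; i < host_count; i++)
        if (sketch_is_member(sketch_find_servicegroup(groups, group_count, name),
                             &hosts[i]) == TRUE)
            return TRUE;
    return FALSE;
}

/* Variant: resolve the group once, then reuse the pointer inside the loop. */
static int any_member_lookup_once(const servicegroup *groups, size_t group_count,
                                  const char *name,
                                  const host *hosts, size_t host_count)
{
    const servicegroup *group = sketch_find_servicegroup(groups, group_count, name);

    for (size_t i = 0; i < host_count; i++)
        if (sketch_is_member(group, &hosts[i]) == TRUE)
            return TRUE;
    return FALSE;
}
```

Both functions return the same answer. The difference is that the first calls the lookup once per host examined, while the second calls it once in total, which matters only if the lookup itself is costly.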

Is there any evidence that this workaround helps while we wait for the new version of Nagios Core or Nagios XI?

Re: High use of cpu in avail report generation

Posted: Mon Oct 07, 2019 2:54 pm
by mbellerue
I don't know that we have any evidence to the contrary. It could be worth a shot.

The other thing that may work is using the Legacy Reporting system. That may give your users enough visibility to see them through to the next update.

Re: High use of cpu in avail report generation

Posted: Mon Oct 07, 2019 3:27 pm
by evisus
I had not heard of the "Legacy Reporting system". Could you tell me where I can find more information?

Thank you!

Re: High use of cpu in avail report generation

Posted: Mon Oct 07, 2019 3:58 pm
by mbellerue
Absolutely. Click on Reports, and towards the bottom of the left menu you will see Legacy Reports. There you will find Availability, Trends, Alert History, etc. Each of those is displayed using Nagios Core, so there is no graph generation or anything like that. The upside is that the load on your system is far lower in comparison. Running the reports this way should get your users the data they need, faster.

Re: High use of cpu in avail report generation

Posted: Mon Oct 07, 2019 4:15 pm
by evisus
OK, the old Nagios Core reports. I hadn't understood it that way :)

I will see whether I get any improvement by editing the status.c file, and I will post the results.

Please keep this case open.

Thank you

Re: High use of cpu in avail report generation

Posted: Mon Oct 07, 2019 4:56 pm
by mbellerue
Okay, we'll keep this open and wait to hear back.

Re: High use of cpu in avail report generation

Posted: Mon Oct 21, 2019 10:40 am
by evisus
Hello, nice to greet you again.

I ran the tests and did not get successful results, so we will have to wait for this to be resolved in an upcoming release.

We are still hoping to get a patch in the short term. :D

Thank you! You can now close this thread.