Greetings!
A problem has been reported in the Nagios XI forums since 2014 concerning performance when SLA and availability reports are generated: the avail.cgi process consumes over 99% of the CPU, leaving the server unresponsive until report generation finishes.
Frankly, it is not very efficient.
I have reviewed a wide variety of Nagios XI installations, from servers with modest hardware to others with monstrous hardware, and I have followed the recommendations given in each of the cases opened for this problem.
However, there is still no solution to this known problem, and quite a while ago it was "referred to the development area".
My question is this:
Is there a known resolution for this problem?
If there is no solution, can we expect to have this problem solved?
The concern is not only mine but also that of every client that runs into this error, which is becoming more and more frequent.
For one of our clients with this problem, we plan to move the MySQL database to a dedicated server, but we are not sure that this is the solution. Do you have any experience with this?
I appreciate your response and your understanding of our concern.
High use of cpu in avail report generation
Re: High use of cpu in avail report generation
Relevant GitHub issue:
https://github.com/NagiosEnterprises/na ... issues/280
evisus wrote: In one of our clients that presents this problem, we plan to separate the Mysql database on a dedicated server, but we are not sure that this is the solution. Do you have any experience?
avail.cgi has no concept of a MySQL database (it parses raw log files), so I don't think this would offer much. Using off-loaded MySQL is a generally good idea on large installations.
Using a RAM disk can help a little bit, but most of the work is in iterating over lines in log files -- loading those files into memory isn't the bottleneck.
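To illustrate the point about per-line work, here is a minimal sketch in C. It is not avail.cgi's actual code; the struct and function names (alert_counts, count_alerts) are made up for this example, and it only assumes the usual "[epoch] TYPE: details" layout of Nagios log entries. The buffer is already in memory, yet every line still has to be scanned and compared, which is the cost a RAM disk does not remove.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tally of entries found in a log buffer (illustration only). */
struct alert_counts {
    size_t host_alerts;
    size_t service_alerts;
    size_t other_lines;
};

/*
 * Walk an in-memory log buffer one line at a time and classify each entry.
 * No disk I/O happens here: all of the work is scanning and comparing text.
 */
struct alert_counts count_alerts(const char *buf)
{
    struct alert_counts counts = {0, 0, 0};
    assert(buf != NULL);

    while (*buf != '\0') {
        /* Find the end of the current line (or the end of the buffer). */
        const char *eol = strchr(buf, '\n');
        size_t len = eol ? (size_t)(eol - buf) : strlen(buf);

        /* Entries look like "[epoch] TYPE: details"; locate the text after "] ". */
        const char *close = memchr(buf, ']', len);
        const char *type = NULL;
        if (close != NULL && (size_t)(close - buf) + 2 <= len && close[1] == ' ')
            type = close + 2;

        size_t remaining = type ? len - (size_t)(type - buf) : 0;
        if (type && remaining >= 11 && strncmp(type, "HOST ALERT:", 11) == 0)
            counts.host_alerts++;
        else if (type && remaining >= 14 && strncmp(type, "SERVICE ALERT:", 14) == 0)
            counts.service_alerts++;
        else
            counts.other_lines++;

        /* Advance past this line and its newline, if any. */
        buf += len;
        if (*buf == '\n')
            buf++;
    }
    return counts;
}
```

Calling count_alerts() on a buffer holding months of archived logs touches every byte once per report, so the run time grows with the number of log lines in the reporting period regardless of where the files are stored.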
Former Nagios employee
https://www.mcapra.com/
Re: High use of cpu in avail report generation
Thanks for jumping in, Matt!
evisus, you're absolutely right. This has been a problem that we've mostly relied on mitigation to solve. However, we are actively working on a new reporting backend that is currently set to be released with Nagios XI 6 that should alleviate many of the reporting issues.
https://www.nagios.com/roadmaps/
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: High use of cpu in avail report generation
Thanks for your replies Matt and mbellerue.
I will have to communicate this to our XI users, although waiting until the third quarter of 2020 will not be received as good news; it is a long time to wait.
A comment on GitHub refers to a possible solution: (https://github.com/NagiosEnterprises/na ... issues/271)
#######
status.cgi is using 100% cpu time on 8 cores in our environment
We localized the problem to the following for-loop in status.c around line 1016
if (is_host_member_of_servicegroup (find_servicegroup (servicegroup_name), temp_host) == TRUE) {
    count_host = 1;
    break;
}
Our environment has over 600 hosts. We temporarily disabled the host count by commenting this piece of code out of the file. With the lines, it takes more than 60 seconds to load the CGI (our back-end server times out after 60 seconds), and without the lines it takes about 1 second.
#######
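One way to read the quoted condition is that the group lookup runs again on every pass through the loop. Whether that is the real cost in status.c is not confirmed anywhere in this thread, so the following C sketch is only an illustration of that general pattern. All names in it (demo_group, demo_find_group, demo_is_member, and the two count functions) are hypothetical stand-ins, not Nagios Core code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a group of hosts; not the real Nagios structure. */
struct demo_group {
    const char *name;
    const char **members;   /* host names that belong to this group */
    size_t member_count;
};

static size_t lookup_calls = 0;  /* counts linear searches, for illustration */

/* Linear search by name, standing in for a lookup such as find_servicegroup(). */
static const struct demo_group *demo_find_group(const struct demo_group *groups,
                                                size_t n, const char *name)
{
    lookup_calls++;
    for (size_t i = 0; i < n; i++)
        if (strcmp(groups[i].name, name) == 0)
            return &groups[i];
    return NULL;
}

/* Returns 1 if host is listed in the group, 0 otherwise. */
static int demo_is_member(const struct demo_group *g, const char *host)
{
    if (g == NULL)
        return 0;
    for (size_t i = 0; i < g->member_count; i++)
        if (strcmp(g->members[i], host) == 0)
            return 1;
    return 0;
}

/* Lookup repeated on every iteration, shaped like the quoted condition. */
size_t count_members_naive(const struct demo_group *groups, size_t n,
                           const char *group_name,
                           const char **hosts, size_t host_count)
{
    size_t count = 0;
    assert(groups != NULL && hosts != NULL);
    for (size_t i = 0; i < host_count; i++)
        if (demo_is_member(demo_find_group(groups, n, group_name), hosts[i]))
            count++;
    return count;
}

/* Same result, with the loop-invariant lookup done once before the loop. */
size_t count_members_hoisted(const struct demo_group *groups, size_t n,
                             const char *group_name,
                             const char **hosts, size_t host_count)
{
    size_t count = 0;
    assert(groups != NULL && hosts != NULL);
    const struct demo_group *g = demo_find_group(groups, n, group_name);
    for (size_t i = 0; i < host_count; i++)
        if (demo_is_member(g, hosts[i]))
            count++;
    return count;
}
```

Both functions return the same count; the hoisted version performs one name lookup instead of one per host. If the real lookup is similarly expensive, moving it out of the loop might be a gentler alternative to commenting the block out, but that would need to be measured on an actual installation.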
Is there any evidence that this workaround helps while we wait for the new version of Nagios Core or XI?
Re: High use of cpu in avail report generation
I don't know that we have any evidence to the contrary. It could be worth a shot.
The other thing that may work is using the Legacy Reporting system. That may give your users enough visibility to see them through to the next update.
Re: High use of cpu in avail report generation
I had not heard of the "Legacy Reporting system". Could you tell me where I can find more information?
Thank you!
Re: High use of cpu in avail report generation
Absolutely. Click on Reports, and then in the left menu, towards the bottom, you will find Legacy Reports. There you will find Availability, Trends, Alert History, etc. Each one of those displays using Nagios Core, so there is no graph generation or anything like that. The upside is that the load on your system is far lower in comparison. Running the reports this way should get your users the data they need, faster.
Re: High use of cpu in avail report generation
OK, the old Nagios Core reports; I had not understood it that way.
I will see whether I get any improvement by editing the status.c file and will post the results.
Please keep this case open.
Thank you
Re: High use of cpu in avail report generation
Okay, we'll keep this open and wait to hear back.
Re: High use of cpu in avail report generation
Hello again.
I ran the tests and did not get successful results, so we will have to wait for this to be resolved in an upcoming release.
We are still hoping to get a patch in the short term.
Thank you! You can now close this thread.