Availability Report slow
Posted: Wed Apr 08, 2015 4:25 pm
by mejokj
This is happening on the same server mentioned here:
http://support.nagios.com/forum/viewtop ... 16&t=32288
Availability report generation is very slow, and sometimes the report is not generated at all. I think the browser request eventually times out, and we have to restart the browser to generate the report again. I investigated a bit, and this seems to happen while the PDF is being generated with wkhtmltopdf. I ran 'ps aux | grep pdf' on the server and got the following:
apache 300 0.0 0.1 104024 27416 ? Sl Apr08 0:00 /usr/bin/wkhtmltopdf --no-outline --footer-spacing 3 --margin-bottom 15mm --footer-font-size 9 --footer-right Page [page] of [toPage] --footer-left 2015-04-08 09:47:10
http://10.10.10.9/nagiosxi//reports/ava ... cale=en_US /usr/local/nagiosxi/tmp/scheduledreport-nagiosadmin-1428472030-page.pdf
I have 27 processes similar to the one above on the server right now, but I couldn't post all of them here due to the following error: "Your message contains too many URLs. The maximum number of URLs allowed is 10."
It looks like wkhtmltopdf is executed, but for some reason it gets stuck and stays running forever. This doesn't happen every time, but it happens often enough to be annoying.
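One general way to keep a stuck renderer from running forever is to launch it with a hard timeout. The sketch below is not how Nagios XI invokes wkhtmltopdf; it only reuses the flags visible in the ps output above, and the function names and the timeout value are hypothetical. Passing an argument list (no shell) means the timeout kills wkhtmltopdf itself rather than a wrapper shell.

```python
import subprocess


def build_wkhtmltopdf_cmd(url, output_path, footer_left):
    """Build a wkhtmltopdf argument list using the flags seen in the ps listing."""
    return [
        "/usr/bin/wkhtmltopdf",
        "--no-outline",
        "--footer-spacing", "3",
        "--margin-bottom", "15mm",
        "--footer-font-size", "9",
        "--footer-right", "Page [page] of [toPage]",
        "--footer-left", footer_left,
        url,
        output_path,
    ]


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd directly (no shell). Return its exit code, or None on timeout.

    When the timeout expires, subprocess.run() kills the child and waits
    for it before re-raising TimeoutExpired, so the hung process is not
    left running in the background.
    """
    try:
        completed = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode
```

A caller would treat a `None` result as "PDF generation timed out" and report that to the user instead of leaving the browser waiting.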
Re: Availability Report slow
Posted: Thu Apr 09, 2015 9:23 am
by scottwilkerson
Depending on your system resources, 27 simultaneous calls to the availability report will likely bring your server to its knees, especially if you do not have a ton of RAM and super fast disks.
Do you have a ton of scheduled reports kicking off at the exact same time? Maybe you could spread them out by a few minutes apiece.
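If many scheduled reports do start together, spreading them out just means giving each one a start time a fixed gap after the previous one. The sketch below only computes such times; the report names and the 10-minute gap are hypothetical, and the actual schedule would still be entered through the Nagios XI interface.

```python
from datetime import datetime, timedelta


def stagger_schedule(report_names, first_run, gap_minutes=5):
    """Give each report a start time gap_minutes after the previous one."""
    return {
        name: first_run + timedelta(minutes=i * gap_minutes)
        for i, name in enumerate(report_names)
    }


# Hypothetical example: three nightly reports starting at 01:00.
schedule = stagger_schedule(["hosts", "services", "sla"], datetime(2015, 4, 9, 1, 0), 10)
for name, start in schedule.items():
    print(name, start.strftime("%H:%M"))
```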
Re: Availability Report slow
Posted: Sat Apr 11, 2015 2:48 am
by mejokj
I think you misunderstood me: we only generate one report at a time. Sometimes that report is not generated, and the browser crashes or times out waiting for it. Then we have to go back and try again.
What I have noticed is that when the report is not generated, the process keeps running forever. That is how we ended up with 27 processes running in the background; they come from several runs over the last few weeks. I know I can kill them, but I would like a permanent solution.
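Until there is a permanent fix, leftover processes like these can at least be found by age rather than by eye. The sketch below parses the output of `ps -eo pid,etimes,args` (assuming a ps that supports the `etimes` column, i.e. elapsed seconds; older procps releases may not) and returns the PIDs of wkhtmltopdf processes older than a threshold. The function name and threshold are hypothetical.

```python
def find_stale_pdf_pids(ps_output, max_age_seconds):
    """Return PIDs of wkhtmltopdf processes running longer than max_age_seconds.

    Expects text from `ps -eo pid,etimes,args`, with the header line first.
    Both the `sh -c` wrapper and the wkhtmltopdf process itself match, since
    both command lines contain "wkhtmltopdf".
    """
    stale = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 2)  # pid, elapsed seconds, full command line
        if len(fields) < 3:
            continue
        pid, elapsed, args = fields
        if not (pid.isdigit() and elapsed.isdigit()):
            continue
        if args.startswith("grep"):  # ignore the grep used to look for them
            continue
        if "wkhtmltopdf" in args and int(elapsed) > max_age_seconds:
            stale.append(int(pid))
    return stale
```

The returned PIDs could then be passed to `os.kill(pid, signal.SIGTERM)`, for example from a cron job, once you are sure nothing legitimate is still rendering.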
Re: Availability Report slow
Posted: Sat Apr 11, 2015 6:45 am
by WillemDH
Mejokj,
There is definitely something wrong with availability report generation for longer periods, as I also described here:
http://support.nagios.com/forum/viewtop ... it=+report
It just maxes out the CPU and makes the Nagios XI GUI unresponsive.
Reporting is quite important for our 'management', so it would be nice if a Nagios developer could find some time to take a good look at the whole report generation method and hopefully make it more stable and useful.
Quarterly reports are really important:
http://tracker.nagios.com/view.php?id=700
In the past, the Nagios response has been to schedule reports during the night. But since we need to run a report before we can schedule it, and report settings regularly need to be changed, that solution could only work if we weren't forced to run the report before scheduling it. This certainly applies to long-running reports.
http://tracker.nagios.com/view.php?id=648
As I have said in the past, I personally have no issue with a report running long, but it should not max out the CPU of our Nagios XI production server.
Please +1 my feature requests if you agree with me
Grtz
Willem
Re: Availability Report slow
Posted: Mon Apr 13, 2015 11:35 am
by abrist
I agree on all counts. BTW - I sent you a PM

Re: Availability Report slow
Posted: Tue May 19, 2015 12:14 pm
by mejokj
I am still facing this issue. The reports are generated properly, but when I click the 'pdf' icon to download the report in PDF format, it just hangs.
I see the following wkhtmltopdf processes on the server:
[root@NMSAPPSERVER1 ~]# ps aux | grep pdf
root 2146 0.0 0.0 4356 740 pts/0 S+ 21:05 0:00 grep pdf
apache 8457 0.0 0.0 5316 1196 ? S 21:03 0:00 sh -c /usr/bin/wkhtmltopdf --no-outline --footer-spacing 3 --margin-bottom 15mm --footer-font-size 9 --footer-right "Page [page] of [toPage]" --footer-left "2015-05-19 21:03:06" '
http://10.10.10.9/nagiosxi//reports/ava ... cale=en_US' '/usr/local/nagiosxi/tmp/scheduledreport-nagiosadmin-1432054986-page.pdf' 2>&1
apache 8458 0.2 0.1 100696 24120 ? Sl 21:03 0:00 /usr/bin/wkhtmltopdf --no-outline --footer-spacing 3 --margin-bottom 15mm --footer-font-size 9 --footer-right Page [page] of [toPage] --footer-left 2015-05-19 21:03:06
http://10.10.10.9/nagiosxi//reports/ava ... cale=en_US /usr/local/nagiosxi/tmp/scheduledreport-nagiosadmin-1432054986-page.pdf
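In this listing, wkhtmltopdf (PID 8458) appears to run as a child of an `sh -c` wrapper (PID 8457). If only the wrapper shell were killed on a timeout, the wkhtmltopdf child could be left behind, which is one plausible way orphans accumulate. A common POSIX remedy, sketched below with a hypothetical function name, is to start the shell in its own process group and kill the whole group on timeout. This is a general technique, not something Nagios XI is confirmed to do.

```python
import os
import signal
import subprocess


def run_shell_with_group_timeout(command_line, timeout_seconds):
    """Run a shell command line in its own process group.

    Returns the shell's exit code, or None if the timeout expired, in which
    case every process in the group (the sh -c wrapper and any children it
    started, such as wkhtmltopdf) is sent SIGKILL.
    """
    # start_new_session=True calls setsid() in the child, so the shell leads
    # a new process group whose ID equals its PID.
    proc = subprocess.Popen(command_line, shell=True, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the whole group exited in the meantime
        proc.wait()  # reap the shell so it does not linger as a zombie
        return None
```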
Re: Availability Report slow
Posted: Tue May 19, 2015 2:09 pm
by abrist
We have talked about this internally, and there have been some good suggestions for changing the current method of report generation. The current issue is that the CGI call that gathers the availability data is made through Apache, hanging the service until the thread is killed or the report is generated. It would be better to separate the CGI call from Apache, most likely through a CLI query. That would need hooks in the backend, etc. There are many ways to fix this issue, including the one above, but they all require rewriting the reporting engine.
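As a rough illustration of "separating the CGI call from Apache": under CGI/1.1 (RFC 3875), a GET request's parameters reach the program through the QUERY_STRING environment variable, so a CGI binary can be run from the command line by setting that variable yourself and writing its output to a file. The sketch below shows only that idea. The function name, command, and query string are hypothetical; a real Nagios CGI may also expect other variables such as REMOTE_USER for authorization, and its output starts with CGI headers that a caller would need to strip.

```python
import os
import subprocess


def run_cgi_offline(cmd, query_string, output_path, timeout_seconds=600):
    """Run a CGI program outside Apache, writing its stdout to output_path.

    Emulates a GET request by setting REQUEST_METHOD and QUERY_STRING,
    and enforces a timeout so a slow query cannot run forever.
    Returns the program's exit code.
    """
    env = dict(os.environ, REQUEST_METHOD="GET", QUERY_STRING=query_string)
    with open(output_path, "wb") as out:
        result = subprocess.run(
            cmd,
            env=env,
            stdout=out,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    return result.returncode
```

A background worker could run this and let the web page poll for the finished file, so no Apache worker is tied up while the data is gathered.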
We do not have a resolution yet, but we are very aware of this.
Re: Availability Report slow
Posted: Mon May 23, 2016 10:41 am
by mrochelle
Any update on this issue? We are experiencing the same problem, and at this point almost all reports fail to return data and run indefinitely. However, I believe this is unique to our circumstances because of the size of the Nagios server, with Active Host / Service Checks: 2351 / 18006. We have a number of smaller Nagios servers with no reporting problems at all, so I suspect we have exceeded some limit that the existing reporting code cannot handle. We are able to pull reports through the backend without a problem using Thruk, but the output is not as well formatted and graphical as native Nagios XI.
Thanks, Marcus
Re: Availability Report slow
Posted: Mon May 23, 2016 11:17 am
by tmcdonald
So a few things:
- I do believe your issue has to do with the number of objects being monitored, as 20k checks is usually the most I recommend for a single XI server for just this reason. Especially if your checks are running on average more than once every 5 minutes, you are going to see a lot of historical data that the reports need to dig through.
- You can somewhat mitigate the timeouts by increasing some PHP limits:
max_execution_time = 60 ; Maximum execution time of each script, in seconds
max_input_time = 60 ; Maximum amount of time each script may spend parsing request data
memory_limit = 256M ; Maximum amount of memory a script may consume
but even that will only get you so far, and it will not help much with the speed.
- I can't say we officially support/recommend running Thruk on an XI machine - it might not break anything, but we have not tested it so we can't guarantee its stability.
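Before raising the PHP limits from the second point above, it can help to check what /etc/php.ini currently sets. The sketch below reads simple `key = value ; comment` lines and converts PHP's shorthand byte values (K, M, G, as multiples of 1024) into bytes. The function names are hypothetical, and the parser is deliberately simple: it ignores section headers and does not handle quoted values containing semicolons.

```python
import re

# PHP shorthand byte suffixes are powers of 1024.
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_php_ini(text):
    """Return {directive: value} for simple `key = value ; comment` lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop trailing/whole-line comments
        if not line or line.startswith("[") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def php_bytes(value):
    """Convert a PHP shorthand size such as '256M' to bytes (-1 stays -1)."""
    match = re.fullmatch(r"(-?\d+)\s*([KMGkmg]?)", value.strip())
    if not match:
        raise ValueError(f"unrecognised size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.upper()]
```

For example, `php_bytes(parse_php_ini(open("/etc/php.ini").read())["memory_limit"])` would give the current memory limit in bytes, assuming the directive is present.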
As for the original issue itself, we now generate reports using AJAX so the timeouts should not be due to any hanging on the frontend. It still may take a long time, but as long as your PHP settings in /etc/php.ini are high enough it should eventually complete. It's basically now just a "too much data" issue causing the slowness.
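To put "too much data" in perspective, here is the raw check volume implied by the numbers in this thread: Marcus's 2351 + 18006 = 20,357 active checks, each run every 5 minutes. How much of that activity is actually stored and read back by the availability report depends on what gets logged, so treat these figures as an upper bound on check activity, not as a row count. The function name is hypothetical.

```python
def check_results(num_checks, interval_minutes, days):
    """Number of check executions over `days` at a fixed interval.

    Counts only whole runs per day (1440 minutes // interval).
    """
    runs_per_day = (24 * 60) // interval_minutes
    return num_checks * runs_per_day * days


total_checks = 2351 + 18006
per_day = check_results(total_checks, 5, 1)
per_quarter = check_results(total_checks, 5, 90)
print(f"{per_day:,} per day, {per_quarter:,} per 90-day quarter")
```

That works out to 5,862,816 check executions per day and about 528 million over a quarter, which is consistent with long-period reports on a server this size being slow.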
We do have a developer here who has a caching component for reports in the works that should help a bit, but it will not be available until 5.3.0. I won't have many more details until then, unfortunately.
Re: Availability Report slow
Posted: Fri Jun 17, 2016 12:09 pm
by mrochelle
Just a quick update. I split our master Nagios XI server into 3 Nagios XI servers with approximately 7000 - 9000 monitored objects on each. Reporting now works like a charm, taking only a few minutes even for some of the larger reports.
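Marcus doesn't say how he divided the objects. One simple way to balance check counts across several servers is the greedy "largest first onto the least-loaded server" heuristic, sketched below. The host names and check counts are hypothetical, and a real split would also need to account for dependencies, parent hosts, and who needs to see which reports.

```python
import heapq


def split_hosts(host_checks, num_servers):
    """Greedily assign hosts to servers, balancing total check counts.

    host_checks maps host name -> number of checks (host check + services).
    Hosts are placed largest first on whichever server currently has the
    fewest checks (the longest-processing-time heuristic).
    """
    heap = [(0, i) for i in range(num_servers)]  # (current load, server index)
    assignment = {i: [] for i in range(num_servers)}
    for host, checks in sorted(host_checks.items(), key=lambda kv: (-kv[1], kv[0])):
        load, server = heapq.heappop(heap)
        assignment[server].append(host)
        heapq.heappush(heap, (load + checks, server))
    return assignment
```

With `{"a": 10, "b": 8, "c": 5, "d": 4, "e": 3}` and two servers, this places `a` and `d` (14 checks) on one server and `b`, `c`, and `e` (16 checks) on the other.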
Marcus - Looking forward to Nagios 2016 World Conf.