Availability Report slow

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
mejokj
Posts: 353
Joined: Mon Jul 22, 2013 10:31 pm

Availability Report slow

Post by mejokj »

This is happening on the same server mentioned here: http://support.nagios.com/forum/viewtop ... 16&t=32288

Availability report generation is very slow, and sometimes the report doesn't get generated at all. I think the browser request eventually times out, and we have to restart the browser to generate the report again. I investigated a bit, and this seems to happen while the PDF is being generated with wkhtmltopdf. I ran 'ps aux | grep pdf' on the server and got the following:


apache 300 0.0 0.1 104024 27416 ? Sl Apr08 0:00 /usr/bin/wkhtmltopdf --no-outline --footer-spacing 3 --margin-bottom 15mm --footer-font-size 9 --footer-right Page [page] of [toPage] --footer-left 2015-04-08 09:47:10 http://10.10.10.9/nagiosxi//reports/ava ... cale=en_US /usr/local/nagiosxi/tmp/scheduledreport-nagiosadmin-1428472030-page.pdf

I have 27 processes similar to the above on the server right now, but I couldn't post all of them here because of the following error: "Your message contains too many URLs. The maximum number of URLs allowed is 10."

It looks like wkhtmltopdf is executed, but for some reason it gets stuck and just keeps running forever. This doesn't happen every time, but it happens often enough to be annoying.
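One way to keep a stuck converter from running forever is to start it with a hard timeout. This is a minimal Python sketch, assuming you control the code that launches the conversion (Nagios XI does this in its own PHP code, which is not shown here); `run_with_timeout` is a hypothetical helper. Passing the command as an argument list avoids the intermediate `sh -c` wrapper visible in the ps output, so the kill on timeout reaches the converter itself rather than possibly only the shell.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd (a list of arguments, no shell) and give up after timeout_s seconds.

    Returns (returncode, combined stdout/stderr text), or (None, "") on timeout.
    On timeout, subprocess.run() kills the child and waits for it before
    re-raising TimeoutExpired, so the child itself is not left running.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # same effect as the original "2>&1"
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None, ""
    return result.returncode, result.stdout.decode(errors="replace")
```

For example, `run_with_timeout(["/usr/bin/wkhtmltopdf", "--no-outline", url, out_pdf], 300)` would give up after five minutes; `url` and `out_pdf` are placeholders for the report URL and output path, and the 300-second limit is an arbitrary choice.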
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Availability Report slow

Post by scottwilkerson »

Depending on your system resources, 27 simultaneous calls to the availability report will likely bring your server to its knees, especially if you do not have a ton of RAM and super fast disks.

Do you have a ton of scheduled reports kicking off at exactly the same time? Maybe you could spread them out by a few minutes apiece.
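If several reports are scheduled, staggering their start times is simple arithmetic. This is a minimal sketch with hypothetical report names and a hypothetical 5-minute gap; the resulting times would still be entered in the XI scheduler by hand.

```python
from datetime import datetime, timedelta


def stagger_schedule(report_names, first_run, gap_minutes=5):
    """Give each report its own start time, gap_minutes apart.

    Returns a list of (report_name, start_datetime) in the given order.
    """
    gap = timedelta(minutes=gap_minutes)
    return [(name, first_run + i * gap) for i, name in enumerate(report_names)]
```

For instance, three reports starting at 01:00 with a 5-minute gap would run at 01:00, 01:05 and 01:10.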
Former Nagios employee
mejokj
Posts: 353
Joined: Mon Jul 22, 2013 10:31 pm

Re: Availability Report slow

Post by mejokj »

I think you misunderstood me: we only generate one report at a time. Sometimes that report isn't generated, and the browser crashes or times out waiting for it. Then we have to go back and try again.

What I have noticed is that when the report isn't generated, the process keeps running forever. That is how we ended up with 27 processes running in the background; they are from several runs over the last few weeks. I know I can kill them, but I would like a permanent solution.
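Until there is a real fix, the leftover processes can at least be found by age. This is a minimal sketch, assuming the input is the output of `ps -eo pid=,etime=,args=` (where etime is the elapsed time in `[[dd-]hh:]mm:ss` form); the helper names and the one-hour cutoff are hypothetical. It only cleans up after the hang and does not fix its cause.

```python
def etime_to_seconds(etime):
    """Convert ps's etime format ([[dd-]hh:]mm:ss) to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:          # pad missing hours (and minutes) with 0
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def stale_pids(ps_output, name="wkhtmltopdf", max_age_s=3600):
    """Return PIDs whose command line contains `name` and that are older than max_age_s.

    ps_output is expected to come from: ps -eo pid=,etime=,args=
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)   # pid, etime, full command line
        if len(fields) < 3:
            continue
        pid, etime, args = fields
        if name in args and etime_to_seconds(etime) > max_age_s:
            pids.append(int(pid))
    return pids
```

Each returned PID could then be passed to `os.kill(pid, signal.SIGTERM)`, run as root or as the apache user. Note that the `sh -c` wrapper lines also contain the name, so they are matched too.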
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: Availability Report slow

Post by WillemDH »

Mejokj,

There is definitely something wrong with availability report generation for longer periods, as I also described here: http://support.nagios.com/forum/viewtop ... it=+report
It just maxes out the CPU and makes the Nagios XI GUI unresponsive.
Reporting is quite important for our 'management', so it would be nice if a Nagios developer could take a good look at the whole report generation method and hopefully make it more stable and useful.
Quarterly reports are really important:
http://tracker.nagios.com/view.php?id=700
In the past, the Nagios response has been to schedule reports during the night. But since we have to run a report before we can schedule it, and the report settings regularly need to be changed, that would only work if we weren't forced to run the report before scheduling it. This certainly applies to long-running reports.
http://tracker.nagios.com/view.php?id=648

As I've said before, I personally have no issue with a report taking a long time to run, but it should not max out the CPU of our Nagios XI production server.

Please +1 my feature requests if you agree with me :)

Grtz

Willem
Nagios XI 5.8.1
https://outsideit.net
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Availability Report slow

Post by abrist »

I agree on all counts. BTW - I sent you a PM :)
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
mejokj
Posts: 353
Joined: Mon Jul 22, 2013 10:31 pm

Re: Availability Report slow

Post by mejokj »

I am still facing this issue. The reports are generated properly, but when I click the 'pdf' icon to download the report in PDF format, it just hangs.

I see the following wkhtmltopdf processes on the server:

Code:

[root@NMSAPPSERVER1 ~]# ps aux | grep pdf
root 2146 0.0 0.0 4356 740 pts/0 S+ 21:05 0:00 grep pdf
apache 8457 0.0 0.0 5316 1196 ? S 21:03 0:00 sh -c /usr/bin/wkhtmltopdf --no-outline --footer-spacing 3 --margin-bottom 15mm --footer-font-size 9 --footer-right "Page [page] of [toPage]" --footer-left "2015-05-19 21:03:06" 'http://10.10.10.9/nagiosxi//reports/ava ... cale=en_US' '/usr/local/nagiosxi/tmp/scheduledreport-nagiosadmin-1432054986-page.pdf' 2>&1
apache 8458 0.2 0.1 100696 24120 ? Sl 21:03 0:00 /usr/bin/wkhtmltopdf --no-outline --footer-spacing 3 --margin-bottom 15mm --footer-font-size 9 --footer-right Page [page] of [toPage] --footer-left 2015-05-19 21:03:06 http://10.10.10.9/nagiosxi//reports/ava ... cale=en_US /usr/local/nagiosxi/tmp/scheduledreport-nagiosadmin-1432054986-page.pdf
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Availability Report slow

Post by abrist »

We have talked about this internally, and there have been some decent suggestions for changing the current method of report generation. The current issue is that the CGI call that gathers the availability data is made through Apache, which ties up the request until the thread is killed or the report is generated. It would be better to separate the CGI call from Apache, most likely through a CLI query. That would need hooks in the backend, etc. There are many ways to fix this issue, including the one above, but they all require rewriting the reporting engine.
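Taking report generation out of the Apache request can be sketched as a job queue: the web request only enqueues the work and returns a job ID, and a separate worker does the slow part and records the result. This is a minimal sketch that uses an in-process thread as a stand-in for the separate CLI/backend process; `ReportJobs` and its methods are hypothetical and not part of Nagios XI.

```python
import queue
import threading
import uuid


class ReportJobs:
    """Toy job queue: submit() returns immediately, a worker does the slow work."""

    def __init__(self, generate):
        self._generate = generate      # slow function: params -> report result
        self._queue = queue.Queue()
        self._status = {}              # job_id -> (state, result_or_error)
        self._lock = threading.Lock()
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, params):
        """Enqueue a report and return its job ID without waiting for it."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._status[job_id] = ("pending", None)
        self._queue.put((job_id, params))
        return job_id                  # the web request can return here

    def status(self, job_id):
        """What a polling request (e.g. via AJAX) would read."""
        with self._lock:
            return self._status[job_id]

    def wait_all(self):
        """Block until every submitted job has finished (handy for testing)."""
        self._queue.join()

    def _worker(self):
        while True:
            job_id, params = self._queue.get()
            try:
                state = ("done", self._generate(params))
            except Exception as exc:   # record failures instead of hanging
                state = ("failed", str(exc))
            with self._lock:
                self._status[job_id] = state
            self._queue.task_done()
```

A real implementation would more likely keep job state in the database so that a separate process (for example one started from cron) can pick jobs up; the thread here just keeps the sketch self-contained.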

We do not have a resolution yet, but we are very aware of this.
Former Nagios employee
mrochelle
Posts: 238
Joined: Fri May 04, 2012 11:20 am
Location: Heart of America

Re: Availability Report slow

Post by mrochelle »

Any update on this issue? We are experiencing the same problem, and at this point almost all reports fail to return data and run indefinitely. However, I believe our case is unusual because of the size of the Nagios server (Active Host / Service Checks: 2351 / 18006). We have a number of smaller Nagios servers with no reporting problems at all, so I suspect we have exceeded some limit that the existing reporting code can't handle. We are able to pull reports from the backend without a problem using Thruk, but the output is not as well formatted or graphical as native Nagios XI.
Thanks, Marcus
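To put those numbers in perspective, here is a rough count of the raw check results such a server produces. This is a minimal sketch, assuming every host and service check runs exactly once every 5 minutes; it is an upper bound on raw results, not a measure of what the availability report actually queries (which may be far fewer rows, for example only state changes).

```python
def check_results(active_hosts, active_services, interval_min, days):
    """Rough number of check results, assuming every check runs once per interval."""
    runs_per_day = 24 * 60 // interval_min     # 288 runs/day at a 5-minute interval
    return (active_hosts + active_services) * runs_per_day * days
```

With 2351 hosts and 18006 services, `check_results(2351, 18006, 5, 1)` gives 5,862,816 results per day, and a 90-day quarterly report would span about 528 million.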
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Availability Report slow

Post by tmcdonald »

So a few things:
  • I do believe your issue is due to the number of objects being monitored; 20k checks is usually the most I recommend for a single XI server for exactly this reason. Especially if your checks run on average more often than once every 5 minutes, there will be a lot of historical data for the reports to dig through.
  • You can somewhat mitigate the timeouts by increasing some PHP limits:

    Code:

    max_execution_time = 60     ; Maximum execution time of each script, in seconds
    max_input_time = 60     ; Maximum amount of time each script may spend parsing request data
    memory_limit = 256M      ; Maximum amount of memory a script may consume
    but even that will only get you so far, and will not help much with the speed.
  • I can't say we officially support/recommend running Thruk on an XI machine - it might not break anything, but we have not tested it so we can't guarantee its stability.
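To confirm what the PHP settings actually are before and after editing, a small checker can help. This is a minimal sketch that parses php.ini-style `key = value ; comment` lines and flags values below the targets suggested above; the helper names are hypothetical, and it ignores details of the real php.ini grammar such as quoted values. Treating 0 and -1 as "no limit" is an assumption based on PHP using -1 for an unlimited memory_limit and 0 for an unlimited max_execution_time.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_ini_settings(text):
    """Return {key: value} from php.ini-style 'key = value ; comment' lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()    # drop ';' comments
        if not line or line.startswith("[") or "=" not in line:
            continue                           # skip blanks and [section] headers
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def to_bytes(value):
    """Convert a PHP shorthand size such as '256M' to an int; plain numbers pass through."""
    value = value.strip()
    unit = value[-1].upper()
    if unit in UNITS:
        return int(value[:-1]) * UNITS[unit]
    return int(value)


def too_low(settings, key, minimum):
    """True if key is missing (PHP's default applies) or set below minimum."""
    if key not in settings:
        return True
    value = to_bytes(settings[key])
    if value <= 0:                             # assumption: 0 / -1 mean "no limit"
        return False
    return value < to_bytes(minimum)
```

For example, `too_low(parse_ini_settings(open("/etc/php.ini").read()), "memory_limit", "256M")`. With mod_php, php.ini changes only take effect after Apache is restarted.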
As for the original issue itself, we now generate reports using AJAX, so the timeouts should no longer be caused by the frontend hanging. A report may still take a long time, but as long as the PHP settings in /etc/php.ini are high enough, it should eventually complete. At this point it's basically a "too much data" issue causing the slowness.

We do have a developer here who has a caching component for reports in the works that should help a bit, but it will not be available until 5.3.0. I won't have many more details until then, unfortunately.
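The general idea behind caching report results can be sketched as a lookup keyed by the report parameters with an expiry time, so that running the same report twice (for instance, once to check it and again when scheduling it) only does the expensive work once. This is a generic memoization sketch with hypothetical names and an arbitrary TTL, not the upcoming XI component, whose design is not described here.

```python
import time


class ReportCache:
    """Cache report results by their (hashable) parameters for ttl_s seconds."""

    def __init__(self, compute, ttl_s=3600, clock=time.monotonic):
        self._compute = compute    # expensive function taking keyword parameters
        self._ttl = ttl_s
        self._clock = clock        # injectable so expiry can be tested
        self._entries = {}         # key -> (expires_at, result)

    def get(self, **params):
        key = tuple(sorted(params.items()))
        now = self._clock()
        hit = self._entries.get(key)
        if hit and hit[0] > now:   # fresh entry: skip the expensive work
            return hit[1]
        result = self._compute(**params)
        self._entries[key] = (now + self._ttl, result)
        return result
```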
Former Nagios employee
mrochelle
Posts: 238
Joined: Fri May 04, 2012 11:20 am
Location: Heart of America

Re: Availability Report slow

Post by mrochelle »

Just a quick update: I split our master Nagios XI server into 3 Nagios XI servers with approximately 7000-9000 monitored objects each. Reporting now works like a charm, with even some of the larger reports finishing in only a few minutes.
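One way to decide which hosts go to which server is a greedy split that always gives the next-largest host to the least-loaded server (the longest-processing-time heuristic). This is a minimal sketch, assuming each host's services stay on the same server as the host; the function name is hypothetical, and this is not necessarily how the split above was done.

```python
import heapq


def partition_hosts(host_objects, n_servers):
    """Greedily spread hosts across n_servers so object counts stay even.

    host_objects maps host name -> monitored objects (the host plus its services).
    Returns a list of (total_objects, [hosts]) per server.
    """
    heap = [(0, i) for i in range(n_servers)]     # (current load, server index)
    buckets = [[] for _ in range(n_servers)]
    loads = [0] * n_servers
    # Place the largest hosts first; each goes to the least-loaded server.
    for host, count in sorted(host_objects.items(), key=lambda kv: -kv[1]):
        load, i = heapq.heappop(heap)
        buckets[i].append(host)
        loads[i] = load + count
        heapq.heappush(heap, (loads[i], i))
    return list(zip(loads, buckets))
```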
Marcus - Looking forward to Nagios 2016 World Conf.
Locked