NagiosXI server performance degradation due to avail.cgi

Locked
msmulpuri
Posts: 27
Joined: Thu Sep 22, 2016 7:40 am

NagiosXI server performance degradation due to avail.cgi

Post by msmulpuri »

Wondering if there is any remedy for the runaway avail.cgi script: 5 instances of it are running and causing performance degradation on the customer's primary NagiosXI server. Is this a known issue, and is there any information we can use to remedy it? Please advise. I am guessing someone tried to pull a huge report and then canceled or interrupted it. It has been like this for almost a week now. We asked the customer whether the process (script) could be killed, but the customer wanted this escalated to your team, as it could be a code-related issue on the NagiosXI side. Thanks!
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: NagiosXI server performance degradation due to avail.cgi

Post by scottwilkerson »

To escalate to our team please open a ticket at
https://support.nagios.com/tickets/
Former Nagios employee
Creator:
ahumandesign.com
enneagrams.com
msmulpuri

Re: NagiosXI server performance degradation due to avail.cgi

Post by msmulpuri »

Could we simply kill the zombie avail.cgi processes (5 instances running), and would that resolve the issue? This is also causing NagiosXI to intermittently throw a message to repair the database using /usr/local/nagiosxi/scripts/repair_databases.sh

Any suggestions where/how to resolve this issue would be much appreciated. Current version of NagiosXI is 5.4.11

Because of this, the 2 AM scheduled backup is not working either.

I am sure someone in the forum must have had the same issue.
scottwilkerson

Re: NagiosXI server performance degradation due to avail.cgi

Post by scottwilkerson »

You could certainly kill those processes; however, doing so will affect the reports they are running for.

My guess is that there are multiple scheduled reports (availability or SLA) kicking off at the same time, all of which use avail.cgi.
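
Before killing anything, it can help to confirm which avail.cgi processes have actually been running for a long time. The following is only a minimal sketch, not an official Nagios tool: the helper names (`parse_ps`, `long_running`, `snapshot`) and the one-hour threshold are hypothetical, and it assumes a Linux `ps` from procps-ng that supports the `etimes` (elapsed seconds) column.

```python
import subprocess
from typing import List, NamedTuple


class Proc(NamedTuple):
    pid: int
    elapsed_s: int   # seconds since the process started
    state: str       # ps STAT column, e.g. "R", "S", "Z"
    command: str     # full command line as reported by ps


def parse_ps(output: str) -> List[Proc]:
    """Parse header-less `ps` output with columns pid, etimes, stat, args."""
    procs = []
    for line in output.splitlines():
        parts = line.strip().split(None, 3)
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        pid, elapsed, state, command = parts
        procs.append(Proc(int(pid), int(elapsed), state, command))
    return procs


def long_running(procs: List[Proc], name: str = "avail.cgi",
                 min_age_s: int = 3600) -> List[Proc]:
    """Return processes whose command mentions `name` and that are at least
    `min_age_s` seconds old (the one-hour default is an arbitrary assumption)."""
    return [p for p in procs if name in p.command and p.elapsed_s >= min_age_s]


def snapshot() -> str:
    """Run ps once and return its raw output (assumes `etimes` is supported)."""
    return subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "etimes=", "-o", "stat=", "-o", "args="],
        capture_output=True, text=True, check=True,
    ).stdout
```

Calling `long_running(parse_ps(snapshot()))` on the XI server would list the candidates with their PID, age, and state. A state beginning with `Z` marks a truly defunct process, which no longer uses CPU and cannot be killed directly; processes in `R` or `S` state that are days old are the more likely cause of the load.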
msmulpuri

Re: NagiosXI server performance degradation due to avail.cgi

Post by msmulpuri »

That is what we ended up doing, and the server has returned to its normal state. I don't think a scheduled report caused the issue; rather, someone tried to run a huge report and then aborted or canceled it, which could have left multiple avail.cgi processes running in a zombie state. Is there an update that addresses this issue in the latest release or in a future release? This thread can be closed.
scottwilkerson

Re: NagiosXI server performance degradation due to avail.cgi

Post by scottwilkerson »

We are looking into removing the use of avail.cgi by XI 6 because of this bottleneck, but that is in the distant future.

Locking