Nagios 4.2.0 status.cgi is really slow and uses 100% CPU

Locked
GMont
Posts: 7
Joined: Tue Aug 30, 2016 4:31 am

Post by GMont »

Hi all,
I'm trying to upgrade from Nagios 3.5 to 4.2.0. The environment is this:

- Virtual machine (VMware)
- Red Hat Enterprise Linux 6.7
- SELinux: disabled
- using gearman from Con-sol Labs repositories

After cloning the Nagios 3.5 VM and building Nagios 4.2.0 from source with rpmbuild, I upgraded the RPMs and tested with a small subset
of hosts/services; it appeared to work fine.
With the full hosts/services configuration (~2600 hosts, ~25000 services), though, status.cgi uses 100% CPU
and takes 15s to 30s to complete.

status.dat is ~35MB big.

I also built a CentOS VM with only the Nagios 4 packages (no gearman) and disabled active checks completely, so that the VM was 100% idle
and the only running process was the CGI. I get similar timings there:

Active Host / Service Checks: 2629 / 24186

-sh-4.1$ ls -l /var/log/nagios/status.dat
-rw-rw-r-- 1 nagios nagios 33936069 Aug 30 11:08 /var/log/nagios/status.dat

-sh-4.1$ export REQUEST_METHOD=GET; export QUERY_STRING="host=all"; export REMOTE_USER="nagiosadmin"
-sh-4.1$ for i in 1 2 3 4 5 6; do time /usr/lib64/nagios/cgi/status.cgi > /dev/null; done

real 0m16.534s
user 0m16.428s
sys 0m0.092s

real 0m17.675s
user 0m17.527s
sys 0m0.116s

real 0m25.333s
user 0m25.123s
sys 0m0.109s

real 0m21.453s
user 0m21.333s
sys 0m0.099s

real 0m17.910s
user 0m17.812s
sys 0m0.081s

real 0m16.212s
user 0m16.115s
sys 0m0.082s


After moving status.dat to a filesystem in RAM, timings did not change either, so I'm assuming this is not an I/O issue:

-sh-4.1$ for i in 1 2 3 4 5; do time /usr/lib64/nagios/cgi/status.cgi > /dev/null; done

real 0m15.839s
user 0m15.761s
sys 0m0.065s

real 0m17.229s
user 0m17.147s
sys 0m0.065s

real 0m18.395s
user 0m18.271s
sys 0m0.099s

real 0m28.587s
user 0m28.249s
sys 0m0.089s

real 0m16.609s
user 0m16.520s
sys 0m0.077s
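
To repeat this comparison across versions, the timing loop above could be wrapped in a small helper. This is only a sketch: the `bench` function name is made up for illustration, and it relies on `date +%s`, so it resolves whole seconds only (enough to tell ~1.5s from ~16s, but not a replacement for `time`).

```shell
#!/bin/sh
# bench: hypothetical helper, not part of Nagios.
# Usage: bench RUNS COMMAND [ARGS...]
# Runs COMMAND the given number of times, discarding its output,
# and prints the elapsed wall-clock seconds of each run.
bench() {
    runs=$1
    shift
    i=1
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s)
        # Report a failing command on stderr but keep looping.
        "$@" > /dev/null 2>&1 || echo "run $i: command exited non-zero" >&2
        end=$(date +%s)
        echo "run $i: $((end - start))s"
        i=$((i + 1))
    done
}

# Example invocation, using the same CGI environment as the test above:
#   export REQUEST_METHOD=GET QUERY_STRING="host=all" REMOTE_USER="nagiosadmin"
#   bench 5 /usr/lib64/nagios/cgi/status.cgi
```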

These are the timings for the current Nagios 3.5 installation (same configuration, VM of Nagios 4 is a clone of
the original VM):

 for i in 1 2 3 4 5; do time /usr/lib64/nagios/cgi-bin/status.cgi > /dev/null; done

real    0m1.517s
user    0m1.472s
sys     0m0.046s

real    0m1.543s
user    0m1.493s
sys     0m0.050s

real    0m1.534s
user    0m1.486s
sys     0m0.049s

real    0m1.594s
user    0m1.523s
sys     0m0.071s

real    0m1.564s
user    0m1.508s
sys     0m0.055s

Is there a way I can profile status.cgi execution to understand why it's so slow? I tried valgrind/callgrind,
and it seems to spend most of its time (62%) in __strcmp_sse42.
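
For reference, a callgrind run along these lines can be scripted roughly as below. This is a sketch under assumptions: the `profile_cgi` function name and the default output path are placeholders, while `--tool=callgrind`, `--callgrind-out-file` and `callgrind_annotate` are standard valgrind pieces. It assumes the CGI environment variables from the timing test (REQUEST_METHOD, QUERY_STRING, REMOTE_USER) are already exported.

```shell
#!/bin/sh
# profile_cgi: hypothetical wrapper, not part of Nagios or valgrind.
# Usage: profile_cgi /path/to/status.cgi [OUTFILE]
profile_cgi() {
    cgi=$1
    out=${2:-/tmp/callgrind.status.out}   # placeholder output path

    # Refuse to run if the target binary is missing or not executable.
    [ -x "$cgi" ] || { echo "not executable: $cgi" >&2; return 2; }
    # Refuse to run if valgrind is not installed.
    command -v valgrind > /dev/null 2>&1 || { echo "valgrind not found" >&2; return 3; }

    # Run the CGI under callgrind; the CGI's own HTML output is discarded.
    valgrind --tool=callgrind --callgrind-out-file="$out" "$cgi" > /dev/null || return 1

    # Print the top of the per-function cost summary.
    callgrind_annotate "$out" | head -n 30
}
```

A binary built with debug symbols makes the annotated output easier to map back to source lines.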


Thanks.
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia
Contact:

Re: Nagios 4.2.0 status.cgi is really slow and uses 100% CPU

Post by Box293 »

There is a known issue in 4.2.0 where the verify takes a long time; this could be related.

The maint branch on GitHub has a fix for it:

https://github.com/NagiosEnterprises/na ... tree/maint

4.2.1 is due to be released early September.

Are you seeing any errors in /var/log/httpd/*_log ?


Alternatively, you could try the previous version, which does not have the issue you are reporting:

https://github.com/NagiosEnterprises/na ... gios-4.1.1
GMont
Posts: 7
Joined: Tue Aug 30, 2016 4:31 am

Re: Nagios 4.2.0 status.cgi is really slow and uses 100% CPU

Post by GMont »

Thanks for your reply.

I took the time to test both Nagios 4.1.1 and the maint branch from GitHub; in both cases the loading time of status.cgi was 2s or below,
so I think this solves the problem. I will probably wait for 4.2.1 to come out before upgrading the production environment.
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Nagios 4.2.0 status.cgi is really slow and uses 100% CPU

Post by rkennedy »

Glad to hear this worked out! Are we good to mark this as resolved?
Former Nagios Employee
GMont
Posts: 7
Joined: Tue Aug 30, 2016 4:31 am

Re: Nagios 4.2.0 status.cgi is really slow and uses 100% CPU

Post by GMont »

Hi,
I tried Nagios 4.2.1; status.cgi loads in less than 2s.
I think we can consider this problem fixed.

Thanks.