I'm trying to upgrade from Nagios 3.5 to 4.2.0. The environment is this:
- Virtual machine (VMware)
- Red Hat Enterprise Linux 6.7
- SELinux: disabled
- Using Gearman from the ConSol Labs repositories
After cloning the Nagios 3.5 VM and building Nagios 4.2.0 from source with rpmbuild, I upgraded the RPMs and tested with a small subset
of hosts/services (just a few). That appeared to work fine.
With the full hosts/services configuration (~2600 hosts, ~25000 services), however, status.cgi uses 100% CPU
and takes 15 to 30 seconds to complete.
status.dat is ~35 MB.
I then built a VM with CentOS and only the Nagios 4 packages (no Gearman) and disabled active checks completely, so that the VM was otherwise idle
and the CGI was the only running process. The timings are similar:
Active Host / Service Checks: 2629 / 24186
-sh-4.1$ ls -l /var/log/nagios/status.dat
-rw-rw-r-- 1 nagios nagios 33936069 Aug 30 11:08 /var/log/nagios/status.dat
-sh-4.1$ export REQUEST_METHOD=GET; export QUERY_STRING="host=all"; export REMOTE_USER="nagiosadmin"
-sh-4.1$ for i in 1 2 3 4 5 6; do time /usr/lib64/nagios/cgi/status.cgi > /dev/null; done
real 0m16.534s
user 0m16.428s
sys 0m0.092s
real 0m17.675s
user 0m17.527s
sys 0m0.116s
real 0m25.333s
user 0m25.123s
sys 0m0.109s
real 0m21.453s
user 0m21.333s
sys 0m0.099s
real 0m17.910s
user 0m17.812s
sys 0m0.081s
real 0m16.212s
user 0m16.115s
sys 0m0.082s
After moving status.dat to a filesystem in RAM, the timings did not change, so I assume this is not an I/O issue:
-sh-4.1$ for i in 1 2 3 4 5; do time /usr/lib64/nagios/cgi/status.cgi > /dev/null; done
real 0m15.839s
user 0m15.761s
sys 0m0.065s
real 0m17.229s
user 0m17.147s
sys 0m0.065s
real 0m18.395s
user 0m18.271s
sys 0m0.099s
real 0m28.587s
user 0m28.249s
sys 0m0.089s
real 0m16.609s
user 0m16.520s
sys 0m0.077s
These are the timings for the current Nagios 3.5 installation (same configuration; the Nagios 4 VM is a clone of
the original VM):
for i in 1 2 3 4 5; do time /usr/lib64/nagios/cgi-bin/status.cgi > /dev/null; done
real 0m1.517s
user 0m1.472s
sys 0m0.046s
real 0m1.543s
user 0m1.493s
sys 0m0.050s
real 0m1.534s
user 0m1.486s
sys 0m0.049s
real 0m1.594s
user 0m1.523s
sys 0m0.071s
real 0m1.564s
user 0m1.508s
sys 0m0.055s
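To put the two sets of numbers side by side, here is a small sketch that averages the "real" times listed above (the first six Nagios 4 runs and the five Nagios 3.5 runs). The `mean` helper and the variable names are made up for illustration; only the timing values come from the runs themselves.

```shell
# Wall-clock ("real") seconds copied from the runs above
nagios4="16.534 17.675 25.333 21.453 17.910 16.212"
nagios3="1.517 1.543 1.534 1.594 1.564"

# Hypothetical helper: print the arithmetic mean of a space-separated list
mean() {
    echo "$1" | awk '{ s = 0; for (i = 1; i <= NF; i++) s += $i; printf "%.2f\n", s / NF }'
}

m4=$(mean "$nagios4")
m3=$(mean "$nagios3")
echo "Nagios 4.2.0 mean: ${m4}s"
echo "Nagios 3.5 mean:   ${m3}s"

# Ratio of the two (rounded) means
slowdown=$(awk -v a="$m4" -v b="$m3" 'BEGIN { printf "%.1f", a / b }')
echo "slowdown: ${slowdown}x"
```

For these values the means come out at about 19.19s and 1.55s, i.e. roughly a 12x slowdown for the same configuration.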
The Nagios 4 status.cgi seems to spend most of its time (62%) in __strcmp_sse42.
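For anyone who wants to reproduce a per-symbol breakdown like this, one possible approach on Linux is `perf`. This is only a sketch: it assumes `perf` is installed, the function name and the output path are hypothetical, and it is not necessarily the tool that produced the 62% figure. The CGI path and environment variables are the ones from the timing runs.

```shell
# Sketch only: assumes `perf` is available on the host.
profile_status_cgi() {
    cgi=${1:-/usr/lib64/nagios/cgi/status.cgi}   # CGI path from the timing runs
    out=${2:-/tmp/status-cgi.perf.data}          # hypothetical output location

    # Record a profile of one CGI run with the same environment as the timings
    REQUEST_METHOD=GET QUERY_STRING="host=all" REMOTE_USER=nagiosadmin \
        perf record -o "$out" -- "$cgi" > /dev/null || return 1

    # Flat per-symbol report, hottest symbols first
    perf report -i "$out" --stdio --sort symbol | head -n 30
}
```

Calling `profile_status_cgi` with no arguments profiles the Nagios 4 CGI; passing the Nagios 3.5 path (`/usr/lib64/nagios/cgi-bin/status.cgi`) as the first argument on the old VM would give a profile to compare against.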
Thanks.