avail.cgi high cpu

Posted: Tue Apr 21, 2020 5:07 pm
by ebuttice
Nagios 4.4.5 running in a container (6 GB RAM, Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz x 2), checking 4500 services.
Normal system load:

top - 15:03:24 up 144 days, 20:45, 0 users, load average: 0.62, 0.59, 0.48
Tasks: 78 total, 1 running, 77 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.7 us, 1.9 sy, 0.0 ni, 94.2 id, 0.0 wa, 0.0 hi, 0.2 si, 0.0 st
KiB Mem : 5940400 total, 1622464 free, 909408 used, 3408528 buff/cache
KiB Swap: 2097148 total, 1705664 free, 391484 used. 4395548 avail Mem

When running availability reports for one month or more, avail.cgi sits at almost 100% CPU.
Other reports, for 7 days, take 1 second to complete.

Tasks: 53 total, 2 running, 51 sleeping, 0 stopped, 0 zombie
%Cpu(s): 27.3 us, 0.2 sy, 0.0 ni, 72.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 5940400 total, 1584688 free, 946856 used, 3408856 buff/cache
KiB Swap: 2097148 total, 1705664 free, 391484 used. 4358180 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
18497 nagios 20 0 140436 53984 9228 R 100.0 0.9 0:19.64 avail.cgi
21130 nagios 20 0 35344 1052 828 S 8.3 0.0 0:23.51 nagios

I then get a timeout, but avail.cgi continues to run and has to be killed manually.

Does anyone know if this is a bug or a settings issue?

Re: avail.cgi high cpu

Posted: Thu Apr 23, 2020 4:08 pm
by benjaminsmith
Hi @ebuttice,

Thanks for posting to the Nagios Community Forum. It's normal for the CPU to spike for a short period when running the availability report over a longer period of time, and the top output looks OK. The report is generated by parsing the nagios.log files, and this requires significant CPU resources to complete. I would check the Apache and system logs to make sure you don't have any other issues slowing things down.
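
To make the parsing cost a bit more concrete, here is a minimal Python sketch of reading nagios.log entries. It assumes the usual `[unix timestamp] text` line layout; `parse_line`, `count_entry_types`, and the sample entries are made up for illustration, and this is not how avail.cgi itself is implemented. The point is only that every entry in the reporting period has to be read, so a longer period or a larger log means more work.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Each nagios.log entry starts with "[<unix timestamp>] " followed by the text.
LINE_RE = re.compile(r"^\[(\d+)\] (.*)$")

def parse_line(line):
    """Return (timestamp, text) for a well-formed entry, or None otherwise."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    stamp = datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)
    return stamp, m.group(2)

def count_entry_types(lines):
    """Count entries by the label before the first colon (e.g. SERVICE ALERT)."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # Skip anything without a timestamp prefix.
        label = parsed[1].split(":", 1)[0]
        counts[label] += 1
    return counts

# Made-up sample entries, for illustration only.
sample = [
    "[1587481644] SERVICE ALERT: web01;HTTP;CRITICAL;HARD;3;Connection refused\n",
    "[1587481700] HOST ALERT: web01;DOWN;HARD;1;CRITICAL - Host Unreachable\n",
    "[1587481760] SERVICE ALERT: web01;HTTP;OK;HARD;1;HTTP OK\n",
]
print(count_entry_types(sample))
```

Running the same count over a real nagios.log gives a quick feel for how many entries a report has to walk through.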

Re: avail.cgi high cpu

Posted: Thu Apr 23, 2020 6:39 pm
by ebuttice
Thanks for your input.

I migrated from a physical server to a Docker container (one month ago). The physical server is still running, and a monthly report on that server is almost instantaneous. The old server runs RHEL 5.3 and Nagios 3.3.1, with 3 GB RAM and 1x Xeon(R) CPU E5-2680 v4 @ 2.40GHz.

On the new server the same report times out, with twice the RAM and CPU. There are two variables here: one is the Nagios version and the other is the container. The container is also running Ubuntu 16, but I doubt the OS would make such a difference.

My report is Availability, selecting all hostgroups and This Year as the reporting period.
I tried March 13 to April 23 and it worked, but running from March 1 to April 23 it times out again. Starting from March 11 works; starting from March 10 or earlier times out.

So going back past March 10 produces the error. Maybe something in the logs is triggering it.
I'll dig deeper.
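
A date boundary like this can be mapped onto the rotated log files. The following is a rough Python sketch, assuming the archives use the `nagios-MM-DD-YYYY-HH.log` naming; `archive_date` and `archives_between` are hypothetical helper names, and the file names in the example are invented. The date in the name is, as far as I know, the rotation time, so the entries inside are from the period just before it.

```python
import re
from datetime import date

# Assumed archive naming: nagios-MM-DD-YYYY-HH.log
ARCHIVE_RE = re.compile(r"^nagios-(\d{2})-(\d{2})-(\d{4})-(\d{2})\.log$")

def archive_date(name):
    """Return the date encoded in an archive file name, or None."""
    m = ARCHIVE_RE.match(name)
    if m is None:
        return None
    month, day, year = int(m.group(1)), int(m.group(2)), int(m.group(3))
    try:
        return date(year, month, day)
    except ValueError:
        return None  # e.g. month 13 or day 32 in the name

def archives_between(names, start, end):
    """Return archive names dated within [start, end], oldest first."""
    dated = [(archive_date(n), n) for n in names]
    in_range = [(d, n) for d, n in dated if d is not None and start <= d <= end]
    return [n for d, n in sorted(in_range)]

# Invented file names, for illustration only.
names = [
    "nagios-04-01-2020-00.log",
    "nagios-03-11-2020-00.log",
    "nagios-03-09-2020-00.log",
    "notes.txt",
]
print(archives_between(names, date(2020, 3, 1), date(2020, 3, 12)))
```

In practice `names` would come from `os.listdir()` on the archives directory (its location depends on the install), and comparing the sizes of the files on either side of the boundary is a cheap first check.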

Re: avail.cgi high cpu

Posted: Fri Apr 24, 2020 11:48 am
by ebuttice
I decided to turn my attention to the logs as a source of the problem. I archived and cleared the logs on the new server (the archives folder and nagios.log) and replaced them with a copy of the archives and logs from the old Nagios server. I then reran the reports, and this fixed the problem. Reports for Last Year and Last Month all generated within a second. I restarted the container and retested, and it still works. I'm fine with replacing the logs, as this server is replacing the old server and it would be good to be able to reference past data.

I don't know what specifically in the logs caused the issue, maybe a corrupt log file.. I'll use the system for a while and report back if anything else turns up.
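
For anyone who wants to look for a damaged log file, a simple scan along these lines might help. It is only a sketch: `scan_log` and `scan_file` are hypothetical names, the checks (NUL bytes, a missing `[timestamp]` prefix, timestamps that go backwards) are guesses at what a corrupt file could look like, and none of them is confirmed as the cause in this thread.

```python
import re

# A well-formed entry is assumed to start with "[<unix timestamp>] ".
PREFIX_RE = re.compile(r"^\[(\d{9,10})\] ")

def scan_log(lines):
    """Return (line_number, reason) pairs for lines that look suspicious."""
    problems = []
    last_ts = None
    for lineno, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text:
            continue  # Blank lines are skipped, not flagged.
        if "\x00" in text:
            problems.append((lineno, "NUL byte"))
            continue
        m = PREFIX_RE.match(text)
        if m is None:
            problems.append((lineno, "missing [timestamp] prefix"))
            continue
        ts = int(m.group(1))
        # Only the point where time jumps backwards is flagged.
        if last_ts is not None and ts < last_ts:
            problems.append((lineno, "timestamp goes backwards"))
        last_ts = ts
    return problems

def scan_file(path):
    """Scan one log file; undecodable bytes are replaced, not fatal."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return scan_log(fh)

# Made-up lines, for illustration only.
sample = [
    "[1583798400] LOG ROTATION: DAILY\n",
    "garbage line\n",
    "[1583798300] SERVICE ALERT: web01;HTTP;OK;HARD;1;HTTP OK\n",
    "[1583798500] bad\x00bytes\n",
]
for lineno, reason in scan_log(sample):
    print(lineno, reason)
```

Calling `scan_file()` on nagios.log and on each file in the archives folder, old set versus new set, would at least show whether the two sets differ in any of these ways.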

Thanks for reading, thanks for any comments/suggestions.

Re: avail.cgi high cpu

Posted: Fri Apr 24, 2020 2:32 pm
by benjaminsmith
Hi @ebuttice,
"I don't know what specifically in the logs caused the issue, maybe a corrupt log file.. I'll use the system for a while and report back if anything else turns up."
Hmmm. Not entirely sure, but it sounds like you got it working. Yeah, let us know if you find out anything.

Benjamin