Nagios process at 100% CPU, system is crawling

consulvation
Posts: 16
Joined: Thu Jul 19, 2018 10:07 pm

Re: Nagios process at 100% CPU, system is crawling

Post by consulvation »

Yes. It varies from 70% to 100%. I have attached a top example. The web UI is also unresponsive (see the attached screenshot). It eventually times out with a Bad Gateway error, but that takes about 10-15 minutes.
Attachments: TOP, UI
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: Nagios process at 100% CPU, system is crawling

Post by scottwilkerson »

Is there a reason Nagios isn't running in daemon mode? With the -d flag, the process list would normally look like this:

nagios   24190     1  1 10:33 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   24201 24190  0 10:33 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
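A quick way to check for this on a running system is to look for the -d flag in the process list. The following is only a minimal sketch: the helper name has_daemon_flag is made up for illustration, and the pattern assumes the /usr/local/nagios install path shown above.

```shell
# Hypothetical helper: reads command lines on stdin and succeeds if any
# of them is a nagios invocation that carries the -d (daemon) flag.
has_daemon_flag() {
    grep -Eq '/bin/nagios +-d +'
}

# Sample command lines as they appear in this thread.
with_d='/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg'
without_d='/usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg'

if printf '%s\n' "$with_d" | has_daemon_flag; then
    echo "first sample: daemon mode"
fi
if ! printf '%s\n' "$without_d" | has_daemon_flag; then
    echo "second sample: not daemon mode"
fi
```

On a live host, `ps -eo args | has_daemon_flag` would feed the real process list to the same check.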
Former Nagios employee
Creator:
ahumandesign.com
enneagrams.com

Post by consulvation »

I'm not sure. I followed these instructions for setting up Nagios.

https://www.digitalocean.com/community/ ... untu-16-04

I presume this is where that would be controlled:

sudo systemctl enable /etc/systemd/system/nagios.service
sudo systemctl start nagios

Should we do it differently?

Here is a service status:

● nagios.service - Nagios
Loaded: loaded (/etc/systemd/system/nagios.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2018-07-26 16:51:58 EDT; 20h ago
Main PID: 940 (nagios)
Tasks: 8
Memory: 9.7G
CPU: 9h 57min 22.607s
CGroup: /system.slice/nagios.service
├─ 940 /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg
├─1067 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
├─1068 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
├─1069 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
├─1070 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
├─1071 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
├─1072 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
├─1747 /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg
├─9205 /usr/local/nagios/libexec/check_nrpe -H *.*.*.* -c ****** -a ********
├─9210 /usr/local/nagios/libexec/check_nrpe -H *.*.*.* -c ***********
├─9211 /usr/local/nagios/libexec/check_dns -H ******************* -s *.*.*.*
└─9212 /usr/bin/nslookup -sil ***************** *.*.*.*
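As a rough reading of the status output above: the CPU line is, as far as I understand systemd's accounting, the time accumulated by everything in the service's control group, so dividing it by the uptime gives an average load in cores. The sketch below does that arithmetic. The helper name avg_cores is hypothetical, and the 20 h uptime is only as precise as the "20h ago" in the status line.

```shell
# Hypothetical helper: average number of cores kept busy, given
# accumulated CPU seconds and elapsed wall-clock seconds.
avg_cores() {
    awk -v cpu="$1" -v wall="$2" 'BEGIN { printf "%.2f\n", cpu / wall }'
}

# CPU: 9h 57min 22.607s  ->  9*3600 + 57*60 + 22.607 seconds
cpu_seconds=$(awk 'BEGIN { print 9 * 3600 + 57 * 60 + 22.607 }')
# Active: ... 20h ago    ->  roughly 20*3600 seconds
wall_seconds=$((20 * 3600))

avg_cores "$cpu_seconds" "$wall_seconds"
```

That works out to roughly half a core on average over the 20 hours. It is an average only, so it does not contradict the 70-100% readings seen in top at particular moments.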

Post by scottwilkerson »

Edit /etc/systemd/system/nagios.service and change this:

ExecStart=/usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg

to this:

ExecStart=/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

Then run:

systemctl daemon-reload
systemctl restart nagios.service

Also, how many hosts and services are on this machine, and at what check intervals?

And I didn't see a response on how many CPUs the server has.
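The ExecStart change above can also be scripted. This is a sketch under assumptions, not a tested procedure for this server: add_daemon_flag is a hypothetical helper, and it is demonstrated on a scratch file so nothing live is touched.

```shell
# Hypothetical helper: add -d to the nagios ExecStart line of a unit file,
# keeping a .bak copy of the original. A line that already has -d is left
# alone because the pattern requires the config path right after the binary.
add_daemon_flag() {
    unit_file=$1
    sed -i.bak \
        's|^ExecStart=/usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg|ExecStart=/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg|' \
        "$unit_file"
}

# Demonstration on a scratch copy rather than the live unit file.
scratch=$(mktemp)
printf '%s\n' '[Service]' \
    'ExecStart=/usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg' > "$scratch"
add_daemon_flag "$scratch"
grep '^ExecStart=' "$scratch"
rm -f "$scratch" "$scratch.bak"
```

On the real system this would be run with sudo against /etc/systemd/system/nagios.service, followed by the daemon-reload and restart. One thing worth checking, since the rest of the unit file isn't shown in this thread: a service that daemonizes itself usually needs Type=forking in the [Service] section so that systemd tracks the forked process.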

Post by consulvation »

I did that, and now Nagios doesn't start after a reboot:

sysadmin@monitoring:~$ sudo service nagios status
[sudo] password for sysadmin:
● nagios.service - Nagios
   Loaded: loaded (/etc/systemd/system/nagios.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Fri 2018-07-27 17:46:13 EDT; 1min 54s ago
  Process: 957 ExecStart=/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg (code=exited, status=254)
 Main PID: 957 (code=exited, status=254)

Jul 27 17:46:13 monitoring systemd[1]: Started Nagios.
Jul 27 17:46:13 monitoring nagios[957]: Failed to obtain lock on file /run/nagios.lock: Permission denied
Jul 27 17:46:13 monitoring nagios[957]: Bailing out due to errors encountered while attempting to daemonize... (PID=957)
Jul 27 17:46:13 monitoring systemd[1]: nagios.service: Main process exited, code=exited, status=254/n/a
Jul 27 17:46:13 monitoring systemd[1]: nagios.service: Unit entered failed state.
Jul 27 17:46:13 monitoring systemd[1]: nagios.service: Failed with result 'exit-code'.

Here is a listing of the permissions on the /usr/local/nagios/bin directory:

sysadmin@monitoring:~$ ls /usr/local/nagios/bin/ -la
total 776
drwxrwxr-x 2 nagios nagios   4096 Jul 26 16:50 .
drwxr-xr-x 9 nagios nagios   4096 Jul 11 12:10 ..
-rwxrwxr-- 1 nagios nagios 729560 Jul 26 16:50 nagios
-rwxrwxr-- 1 nagios nagios  39648 Jul 26 16:50 nagiostats
-rwxr-xr-x 1 nagios nagios  10675 Jul 11 12:06 nrpe-uninstall
The /run directory is owned by root:root - not sure why it can't create a lock file.
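A likely explanation, assuming the service runs as the nagios user and /run has the usual root-only write permissions: an unprivileged user cannot create a new file in a directory it has no write access to. A minimal sketch of that check follows; lock_dir_writable is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if the current user could create the
# given lock file, i.e. its parent directory exists and is writable.
lock_dir_writable() {
    lock_dir=$(dirname "$1")
    [ -d "$lock_dir" ] && [ -w "$lock_dir" ]
}

# Path taken from the error message in the log above.
lock=/run/nagios.lock
if lock_dir_writable "$lock"; then
    echo "$lock: directory is writable by $(id -un)"
else
    echo "$lock: directory is NOT writable by $(id -un)"
fi
```

The answer depends on who runs it, so it only says something about the service when run as the service user (for example through sudo -u nagios).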

I then checked for a running Nagios process; only the grep itself shows up:

sysadmin@monitoring:~$ ps -ef|grep nagios.cfg
sysadmin  1823  1804  0 17:48 pts/0    00:00:00 grep --color=auto nagios.cfg

Here are the stats, taken from the configuration pre-flight check:

Nagios Core 4.4.1
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2018-06-25
License: GPL

Website: https://www.nagios.org
Reading configuration data...
   Read main config file okay...
   Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
        Checked 549 services.
        Checked 98 hosts.
        Checked 29 host groups.
        Checked 3 service groups.
        Checked 1 contacts.
        Checked 1 contact groups.
        Checked 28 commands.
        Checked 7 time periods.
        Checked 0 host escalations.
        Checked 0 service escalations.
Checking for circular paths...
        Checked 98 hosts
        Checked 0 service dependencies
        Checked 0 host dependencies
        Checked 7 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check
As for the hardware, I believe it's 4 cores. Intel(R) Core(TM) i3-3240 CPU @ 3.40GHz.
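For reference, pre-flight output like the above comes from running the nagios binary with -v against the main config file. The sketch below pulls the error count out of such output so that a script could gate a restart on it. preflight_errors is a hypothetical helper, and the sample text is copied from the output in this post.

```shell
# Hypothetical helper: print the number after "Total Errors:" from
# pre-flight output read on stdin.
preflight_errors() {
    awk '/^Total Errors:/ { print $3 }'
}

# Sample lines copied from the pre-flight output in this post.
sample='Total Warnings: 0
Total Errors:   0'

errors=$(printf '%s\n' "$sample" | preflight_errors)
if [ "$errors" = "0" ]; then
    echo "config OK"
fi
```

On a live system the input would come from `/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg` instead of the sample text.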

Post by scottwilkerson »

Let's edit your nagios.cfg. Change this:

lock_file=/var/run/nagios.lock

to this:

lock_file=/usr/local/nagios/var/nagios.lock

Then try the restart.
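The same lock_file change can be made with sed. This is a sketch rather than a tested procedure for this server: set_lock_file is a hypothetical helper, and it is demonstrated on a scratch file instead of the live nagios.cfg.

```shell
# Hypothetical helper: point the lock_file directive of a nagios.cfg at
# a new path, keeping a .bak copy of the original file.
# Assumes the new path contains no '|' or '&' characters.
set_lock_file() {
    cfg=$1
    new_path=$2
    sed -i.bak "s|^lock_file=.*|lock_file=$new_path|" "$cfg"
}

# Demonstration on a scratch file rather than the live config.
scratch=$(mktemp)
echo 'lock_file=/var/run/nagios.lock' > "$scratch"
set_lock_file "$scratch" /usr/local/nagios/var/nagios.lock
cat "$scratch"
rm -f "$scratch" "$scratch.bak"
```

The new location presumably works because /usr/local/nagios/var is writable by the nagios user in a source install; that ownership is an assumption, not something shown in this thread.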

Post by consulvation »

Thanks. That got Nagios running. I have attached screenshots of where we are now, including performance stats. Response times in the UI are still pretty slow. I don't see nagios in the top results anymore: the CPU is mostly idle, and swap isn't used until you try to load a page in the UI. I had top running when I clicked the Service Problems link on the left, and the page took 1m 15s to load. I recorded a real-time video of top so you can see what is happening from that standpoint. Any ideas?
Attachments: Historical Load, Load last 24 hours, RAM historical

Post by consulvation »

I couldn't upload a 4th file in the last post. Here is the video.

Post by scottwilkerson »

Video didn't upload.

Are you seeing any errors in the httpd error_log?

Post by consulvation »

It's an MP4 file which is not allowed on this board. Not sure which format you would like.

From /var/log/apache2/error.log:

cat error.log
[Mon Jul 30 06:25:01.930424 2018] [mpm_prefork:notice] [pid 1212] AH00163: Apache/2.4.18 (Ubuntu) configured -- resuming normal operations
[Mon Jul 30 06:25:01.930448 2018] [core:notice] [pid 1212] AH00094: Command line: '/usr/sbin/apache2'
Looking at the access.log file, it's all 200 OK responses, nothing unusual.
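A quick way to confirm the "all 200" reading is to tally the status codes instead of eyeballing the file. This is a minimal sketch: status_counts is a hypothetical helper, it assumes Apache's common/combined log format (status in the 9th whitespace-separated field), and the sample lines are made up to show the input shape.

```shell
# Hypothetical helper: tally HTTP status codes from an access log in
# Apache's common/combined format, where the status is the 9th field.
status_counts() {
    awk '{ count[$9]++ } END { for (s in count) print s, count[s] }' | sort
}

# Made-up sample lines, only to show the shape of the input.
sample='127.0.0.1 - nagiosadmin [30/Jul/2018:09:00:00 -0400] "GET /nagios/ HTTP/1.1" 200 1024
127.0.0.1 - nagiosadmin [30/Jul/2018:09:00:05 -0400] "GET /nagios/cgi-bin/status.cgi HTTP/1.1" 200 2048
127.0.0.1 - nagiosadmin [30/Jul/2018:09:01:20 -0400] "GET /nagios/missing HTTP/1.1" 404 300'

printf '%s\n' "$sample" | status_counts
```

Against the real log this would be `status_counts < /var/log/apache2/access.log`. A clean tally only rules out HTTP-level errors; it says nothing about how long each request took.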