Nagios process at 100% CPU, system is crawling
-
- Posts: 16
- Joined: Thu Jul 19, 2018 10:07 pm
Re: Nagios process at 100% CPU, system is crawling
Yes. It varies from 70% to 100%. I have attached a top example. Also the web UI is unresponsive (see attached screen shot as well). It eventually times out to Bad Gateway - it takes about 10-15 minutes, but it eventually gets there.
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Nagios process at 100% CPU, system is crawling
Is there a reason nagios isn't running in daemon mode? Usually you would see the following with -d:
Code: Select all
nagios 24190 1 1 10:33 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 24201 24190 0 10:33 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
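A quick way to check this on a running box is sketched below. It is not from the original post: the helper name is_daemon_cmdline is made up for illustration, and it assumes the main process is started with nagios.cfg on its command line, as in the listing above.

```shell
#!/bin/sh
# Return 0 if a Nagios command line carries the -d (daemon) flag, 1 otherwise.
# is_daemon_cmdline is a hypothetical helper name, not part of Nagios.
is_daemon_cmdline() {
    case " $1 " in
        *" -d "*) return 0 ;;   # started with -d
        *)        return 1 ;;   # started in the foreground
    esac
}

# List processes whose arguments mention nagios.cfg.
# The [n] in the pattern keeps grep from matching its own command line.
ps -eo args | grep '[n]agios.cfg' | while IFS= read -r cmd; do
    if is_daemon_cmdline "$cmd"; then
        echo "daemon:     $cmd"
    else
        echo "foreground: $cmd"
    fi
done
```

If nothing is printed, no process with nagios.cfg in its arguments was found.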
-
- Posts: 16
- Joined: Thu Jul 19, 2018 10:07 pm
Re: Nagios process at 100% CPU, system is crawling
I'm not sure. I followed these instructions for setting up Nagios.
https://www.digitalocean.com/community/ ... untu-16-04
I presume this is where that would be controlled:
sudo systemctl enable /etc/systemd/system/nagios.service
sudo systemctl start nagios
Should we do it differently?
Here is a service status:
● nagios.service - Nagios
Loaded: loaded (/etc/systemd/system/nagios.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2018-07-26 16:51:58 EDT; 20h ago
Main PID: 940 (nagios)
Tasks: 8
Memory: 9.7G
CPU: 9h 57min 22.607s
CGroup: /system.slice/nagios.service
├─ 940 /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg
├─1067 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
├─1068 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
├─1069 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
├─1070 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
├─1071 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
├─1072 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
├─1747 /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg
├─9205 /usr/local/nagios/libexec/check_nrpe -H *.*.*.* -c ****** -a ********
├─9210 /usr/local/nagios/libexec/check_nrpe -H *.*.*.* -c ***********
├─9211 /usr/local/nagios/libexec/check_dns -H ******************* -s *.*.*.*
└─9212 /usr/bin/nslookup -sil ***************** *.*.*.*
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Nagios process at 100% CPU, system is crawling
Edit /etc/systemd/system/nagios.service
change this
Code: Select all
ExecStart=/usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg
to this
Code: Select all
ExecStart=/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
then
Code: Select all
systemctl daemon-reload
systemctl restart nagios.service
Also, how many hosts and services are on this machine, and with what type of check interval?
And I didn't see the response to how many CPUs the server has.
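To confirm the unit file edit took effect, something like the following sketch could be used. It is an illustration only: check_execstart is a hypothetical helper name, and the default path is the one mentioned in this thread.

```shell
#!/bin/sh
# Print the ExecStart line of a unit file and report whether it passes -d.
# check_execstart is a hypothetical helper name used for this sketch.
check_execstart() {
    unit=$1
    # grep exits non-zero when the unit file has no ExecStart= line.
    line=$(grep '^ExecStart=' "$unit") || { echo "no ExecStart in $unit"; return 2; }
    echo "$line"
    case "$line" in
        *" -d "*) echo "daemon mode (-d) is set" ;;
        *)        echo "daemon mode (-d) is NOT set"; return 1 ;;
    esac
}

# Default to the unit path from this thread; pass another path as the first argument.
UNIT=${1:-/etc/systemd/system/nagios.service}
if [ -r "$UNIT" ]; then
    check_execstart "$UNIT"
fi
```

Run it after the edit and before systemctl daemon-reload to catch a typo in the ExecStart line.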
-
- Posts: 16
- Joined: Thu Jul 19, 2018 10:07 pm
Re: Nagios process at 100% CPU, system is crawling
So I did that and now it doesn't start after reboot...
Code: Select all
sysadmin@monitoring:~$ sudo service nagios status
[sudo] password for sysadmin:
● nagios.service - Nagios
Loaded: loaded (/etc/systemd/system/nagios.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2018-07-27 17:46:13 EDT; 1min 54s ago
Process: 957 ExecStart=/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg (code=exited, status=254)
Main PID: 957 (code=exited, status=254)
Jul 27 17:46:13 monitoring systemd[1]: Started Nagios.
Jul 27 17:46:13 monitoring nagios[957]: Failed to obtain lock on file /run/nagios.lock: Permission denied
Jul 27 17:46:13 monitoring nagios[957]: Bailing out due to errors encountered while attempting to daemonize... (PID=957)
Jul 27 17:46:13 monitoring systemd[1]: nagios.service: Main process exited, code=exited, status=254/n/a
Jul 27 17:46:13 monitoring systemd[1]: nagios.service: Unit entered failed state.
Jul 27 17:46:13 monitoring systemd[1]: nagios.service: Failed with result 'exit-code'.
Here is a listing of the perms on that directory:
Code: Select all
sysadmin@monitoring:~$ ls /usr/local/nagios/bin/ -la
total 776
drwxrwxr-x 2 nagios nagios 4096 Jul 26 16:50 .
drwxr-xr-x 9 nagios nagios 4096 Jul 11 12:10 ..
-rwxrwxr-- 1 nagios nagios 729560 Jul 26 16:50 nagios
-rwxrwxr-- 1 nagios nagios 39648 Jul 26 16:50 nagiostats
-rwxr-xr-x 1 nagios nagios 10675 Jul 11 12:06 nrpe-uninstall
The /run directory is owned by root:root - not sure why it can't create a lock file.
I then tried this:
Code: Select all
sysadmin@monitoring:~$ ps -ef|grep nagios.cfg
sysadmin 1823 1804 0 17:48 pts/0 00:00:00 grep --color=auto nagios.cfg
Here are the stats:
Code: Select all
Nagios Core 4.4.1
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2018-06-25
License: GPL
Website: https://www.nagios.org
Reading configuration data...
Read main config file okay...
Read object config files okay...
Running pre-flight check on configuration data...
Checking objects...
Checked 549 services.
Checked 98 hosts.
Checked 29 host groups.
Checked 3 service groups.
Checked 1 contacts.
Checked 1 contact groups.
Checked 28 commands.
Checked 7 time periods.
Checked 0 host escalations.
Checked 0 service escalations.
Checking for circular paths...
Checked 98 hosts
Checked 0 service dependencies
Checked 0 host dependencies
Checked 7 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
As for the hardware, I believe it's 4 cores. Intel(R) Core(TM) i3-3240 CPU @ 3.40GHz.
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Nagios process at 100% CPU, system is crawling
Let's edit your nagios.cfg
change this
Code: Select all
lock_file=/var/run/nagios.lock
to this
Code: Select all
lock_file=/usr/local/nagios/var/nagios.lock
Then try the restart
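For anyone checking the same setting, here is a rough sketch that reads lock_file back out of the config and shows who owns the directory it points into. The helper name get_lock_file and the choice to take the last matching line are assumptions for this example; which occurrence Nagios honours when the key is repeated is not verified here.

```shell
#!/bin/sh
# Print the lock_file value from a Nagios main config file.
# get_lock_file is a hypothetical helper name used for this sketch.
get_lock_file() {
    # Commented-out lines start with '#', so they do not match '^lock_file='.
    # Assumption: if the key appears more than once, report the last one.
    sed -n 's/^lock_file=//p' "$1" | tail -n 1
}

# Default to the config path from this thread; pass another path as the first argument.
CFG=${1:-/usr/local/nagios/etc/nagios.cfg}
if [ -r "$CFG" ]; then
    lock=$(get_lock_file "$CFG")
    echo "lock_file: $lock"
    # Show owner and mode of the directory the lock file is created in.
    ls -ld "$(dirname "$lock")"
fi
```

The directory shown needs to be writable by whichever user the service runs as, otherwise the "Failed to obtain lock" error is likely to come back.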
-
- Posts: 16
- Joined: Thu Jul 19, 2018 10:07 pm
Re: Nagios process at 100% CPU, system is crawling
Thanks. That helped get Nagios running. I have attached screen shots of where we are now including performance stats. Response times are still pretty slow on the UI. I don't see nagios in the top results anymore. Looking at top, the CPU is mostly idle and swap memory isn't being used until you try to load a page on the UI. I had top running when I clicked on the Nagios Service Problems link on the left. It took 1m 15s to load the page. I attached a video in real time of the top running so you can see what is happening from that standpoint. Any ideas?
-
- Posts: 16
- Joined: Thu Jul 19, 2018 10:07 pm
Re: Nagios process at 100% CPU, system is crawling
I couldn't upload a 4th file in the last post. Here is the video.
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Nagios process at 100% CPU, system is crawling
Video didn't upload.
Are you seeing any errors in the httpd error_log?
-
- Posts: 16
- Joined: Thu Jul 19, 2018 10:07 pm
Re: Nagios process at 100% CPU, system is crawling
It's an MP4 file which is not allowed on this board. Not sure which format you would like.
From /var/log/apache2/error.log:
Code: Select all
cat error.log
[Mon Jul 30 06:25:01.930424 2018] [mpm_prefork:notice] [pid 1212] AH00163: Apache/2.4.18 (Ubuntu) configured -- resuming normal operations
[Mon Jul 30 06:25:01.930448 2018] [core:notice] [pid 1212] AH00094: Command line: '/usr/sbin/apache2'
Looking at the access.log file, it's all 200 OK responses, nothing unusual.