Nagios process at 100% CPU, system is crawling


Re: Nagios process at 100% CPU, system is crawling

Postby consulvation » Fri Jul 27, 2018 10:09 am

Yes. It varies from 70% to 100%. I have attached a screenshot of top. The web UI is also unresponsive (see the attached screenshot as well). It eventually times out with a Bad Gateway error; that takes about 10-15 minutes, but it gets there.

[Attachment: 2018-07-27_11-04-11.png (top)]

[Attachment: 2018-07-27_11-06-22.png (web UI)]
consulvation
 
Posts: 16
Joined: Thu Jul 19, 2018 10:07 pm

Re: Nagios process at 100% CPU, system is crawling

Postby scottwilkerson » Fri Jul 27, 2018 10:33 am

Is there a reason Nagios isn't running in daemon mode? With -d you would usually see something like this:
Code: Select all
nagios   24190     1  1 10:33 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   24201 24190  0 10:33 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
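To check which mode a running Nagios is in, you can look at the master process's arguments. The sketch below is illustrative only: the function name `nagios_masters` is hypothetical, and it assumes the `/usr/local/nagios/bin/nagios` path used throughout this thread. It reads `ps -eo pid,ppid,args` output, skips the `--worker` children, and labels each remaining nagios process as daemon or foreground depending on whether `-d` appears in its arguments.

```shell
#!/usr/bin/env bash
# Sketch: classify Nagios master processes from `ps -eo pid,ppid,args` output.
# Assumes the binary path /usr/local/nagios/bin/nagios seen in this thread.
# Prints: PID PPID daemon|foreground
nagios_masters() {
  awk '
    # Field 3 is the executable; --worker lines are the check workers, skip them.
    $3 == "/usr/local/nagios/bin/nagios" && $0 !~ /--worker/ {
      mode = "foreground"
      for (i = 4; i <= NF; i++) if ($i == "-d") mode = "daemon"
      print $1, $2, mode
    }'
}
```

Usage: `ps -eo pid,ppid,args | nagios_masters`.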
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
scottwilkerson
DevOps Engineer
 
Posts: 12594
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Nagios process at 100% CPU, system is crawling

Postby consulvation » Fri Jul 27, 2018 12:32 pm

I'm not sure. I followed these instructions for setting up Nagios.

https://www.digitalocean.com/community/ ... untu-16-04

I presume this is where that would be controlled:

Code: Select all
sudo systemctl enable /etc/systemd/system/nagios.service
sudo systemctl start nagios

Should we do it differently?

Here is the service status:

Code: Select all
● nagios.service - Nagios
   Loaded: loaded (/etc/systemd/system/nagios.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2018-07-26 16:51:58 EDT; 20h ago
 Main PID: 940 (nagios)
    Tasks: 8
   Memory: 9.7G
      CPU: 9h 57min 22.607s
   CGroup: /system.slice/nagios.service
           ├─ 940 /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg
           ├─1067 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
           ├─1068 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
           ├─1069 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
           ├─1070 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
           ├─1071 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
           ├─1072 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
           ├─1747 /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg
           ├─9205 /usr/local/nagios/libexec/check_nrpe -H *.*.*.* -c ****** -a ********
           ├─9210 /usr/local/nagios/libexec/check_nrpe -H *.*.*.* -c ***********
           ├─9211 /usr/local/nagios/libexec/check_dns -H ******************* -s *.*.*.*
           └─9212 /usr/bin/nslookup -sil ***************** *.*.*.*

Re: Nagios process at 100% CPU, system is crawling

Postby scottwilkerson » Fri Jul 27, 2018 2:40 pm

Edit /etc/systemd/system/nagios.service and change this:
Code: Select all
ExecStart=/usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg

to this:
Code: Select all
ExecStart=/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg


Then run:
Code: Select all
systemctl daemon-reload
systemctl restart nagios.service
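If you prefer to script the unit-file edit, here is a minimal sketch. The helper name `add_daemon_flag` is hypothetical; it assumes the exact ExecStart form shown above and GNU sed's `-i`. One related point, stated as general systemd behaviour rather than anything confirmed in this thread: a process started with `-d` forks into the background, and units for forking daemons are normally declared with `Type=forking` in `[Service]`. With the default `Type=simple`, systemd may treat the parent's exit as the service stopping, so it is worth checking which Type the unit declares.

```shell
#!/usr/bin/env bash
# Sketch: insert -d into the nagios ExecStart line of a systemd unit file.
# Assumes the ExecStart form shown in this thread and GNU sed (-i edits in place).
# Try it on a copy of the unit file first.
add_daemon_flag() {
  local unit="$1"
  # Already has -d: leave the file untouched so running it twice is harmless.
  if grep -q '^ExecStart=/usr/local/nagios/bin/nagios -d ' "$unit"; then
    return 0
  fi
  # "&" re-inserts the matched prefix, then "-d " is added right after it.
  sed -i 's|^ExecStart=/usr/local/nagios/bin/nagios |&-d |' "$unit"
}
```

After running it against /etc/systemd/system/nagios.service, the `systemctl daemon-reload` and restart steps above still apply.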


Also, how many hosts and services are on this machine, and what check interval do they use?

I also didn't see an answer to how many CPUs the server has.

Re: Nagios process at 100% CPU, system is crawling

Postby consulvation » Fri Jul 27, 2018 5:03 pm

I did that, and now it doesn't start after a reboot:

Code: Select all
sysadmin@monitoring:~$ sudo service nagios status
[sudo] password for sysadmin:
● nagios.service - Nagios
   Loaded: loaded (/etc/systemd/system/nagios.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Fri 2018-07-27 17:46:13 EDT; 1min 54s ago
  Process: 957 ExecStart=/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg (code=exited, status=254)
 Main PID: 957 (code=exited, status=254)

Jul 27 17:46:13 monitoring systemd[1]: Started Nagios.
Jul 27 17:46:13 monitoring nagios[957]: Failed to obtain lock on file /run/nagios.lock: Permission denied
Jul 27 17:46:13 monitoring nagios[957]: Bailing out due to errors encountered while attempting to daemonize... (PID=957)
Jul 27 17:46:13 monitoring systemd[1]: nagios.service: Main process exited, code=exited, status=254/n/a
Jul 27 17:46:13 monitoring systemd[1]: nagios.service: Unit entered failed state.
Jul 27 17:46:13 monitoring systemd[1]: nagios.service: Failed with result 'exit-code'.


Here are the permissions on /usr/local/nagios/bin:

Code: Select all
sysadmin@monitoring:~$ ls /usr/local/nagios/bin/ -la
total 776
drwxrwxr-x 2 nagios nagios   4096 Jul 26 16:50 .
drwxr-xr-x 9 nagios nagios   4096 Jul 11 12:10 ..
-rwxrwxr-- 1 nagios nagios 729560 Jul 26 16:50 nagios
-rwxrwxr-- 1 nagios nagios  39648 Jul 26 16:50 nagiostats
-rwxr-xr-x 1 nagios nagios  10675 Jul 11 12:06 nrpe-uninstall


The /run directory is owned by root:root; I'm not sure why Nagios can't create a lock file there.
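"Permission denied" on a file in a root-owned directory usually means the process creating it is not running as root at that moment, so what matters is whether the nagios user can write to the directory named by `lock_file`. A minimal sketch to check that, assuming the nagios.cfg layout used in this thread; both function names are hypothetical, and the second one needs sudo rights.

```shell
#!/usr/bin/env bash
# Sketch: print the value of the last uncommented lock_file= line in a nagios.cfg.
lock_file_of() {
  awk '/^[ \t]*lock_file=/ { v = substr($0, index($0, "=") + 1) } END { print v }' "$1"
}

# Sketch: test whether a given user can write to the lock file's directory.
# $1 = user (e.g. nagios), $2 = path to nagios.cfg. Requires sudo.
lock_dir_writable_by() {
  local dir
  dir=$(dirname "$(lock_file_of "$2")")
  if sudo -u "$1" test -w "$dir"; then
    echo "$dir is writable by $1"
  else
    echo "$dir is NOT writable by $1"
  fi
}
```

For example: `lock_dir_writable_by nagios /usr/local/nagios/etc/nagios.cfg`.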

I then tried this:
Code: Select all
sysadmin@monitoring:~$ ps -ef|grep nagios.cfg
sysadmin  1823  1804  0 17:48 pts/0    00:00:00 grep --color=auto nagios.cfg



Here are the stats from the pre-flight check:

Code: Select all
Nagios Core 4.4.1
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2018-06-25
License: GPL

Website: https://www.nagios.org
Reading configuration data...
   Read main config file okay...
   Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
        Checked 549 services.
        Checked 98 hosts.
        Checked 29 host groups.
        Checked 3 service groups.
        Checked 1 contacts.
        Checked 1 contact groups.
        Checked 28 commands.
        Checked 7 time periods.
        Checked 0 host escalations.
        Checked 0 service escalations.
Checking for circular paths...
        Checked 98 hosts
        Checked 0 service dependencies
        Checked 0 host dependencies
        Checked 7 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check


As for the hardware, I believe it's 4 cores: an Intel(R) Core(TM) i3-3240 CPU @ 3.40GHz.
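To put 98 hosts and 549 services in perspective, the average rate of scheduled active checks is roughly (hosts + services) / check interval. The interval is the missing piece: the thread does not say what check_interval these objects use, so the sketch below takes it as a parameter and assumes the default `interval_length` of 60 seconds (intervals in minutes). `checks_per_sec` is a hypothetical helper name. For reference, the i3-3240 is a dual-core part with Hyper-Threading, so the OS sees 4 logical CPUs.

```shell
#!/usr/bin/env bash
# Sketch: rough average scheduled-check rate.
# checks/sec = objects / (interval_minutes * 60)
# The interval is an ASSUMPTION; it is not stated in this thread.
checks_per_sec() {
  # $1 = number of hosts + services with active checks
  # $2 = check interval in minutes (interval_length assumed to be 60 s)
  awk -v n="$1" -v m="$2" 'BEGIN { printf "%.2f\n", n / (m * 60) }'
}
```

With 98 + 549 = 647 objects, `checks_per_sec 647 5` prints 2.16 for an assumed 5-minute interval, and `checks_per_sec 647 1` prints 10.78 for a 1-minute interval. Each check forks a plugin such as check_nrpe or check_dns, so this gives a rough sense of how many plugin processes start per second.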

Re: Nagios process at 100% CPU, system is crawling

Postby scottwilkerson » Mon Jul 30, 2018 7:27 am

Let's edit your nagios.cfg.

Change this:
Code: Select all
lock_file=/var/run/nagios.lock


to this:
Code: Select all
lock_file=/usr/local/nagios/var/nagios.lock


Then try the restart again.
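The same edit can be scripted. This is a minimal sketch: `set_lock_file` is a hypothetical helper, it assumes GNU sed's `-i`, and it only rewrites an uncommented `lock_file=` line at the start of a line. Running `nagios -v` afterwards repeats the pre-flight check shown earlier in the thread before you restart.

```shell
#!/usr/bin/env bash
# Sketch: point lock_file in a nagios.cfg at a new path.
# $1 = path to nagios.cfg, $2 = new lock file path.
# Uses "|" as the sed delimiter because the path contains "/".
set_lock_file() {
  sed -i "s|^lock_file=.*|lock_file=$2|" "$1"
}

# Afterwards, verify the configuration before restarting:
#   /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
```

For example: `set_lock_file /usr/local/nagios/etc/nagios.cfg /usr/local/nagios/var/nagios.lock`.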

Re: Nagios process at 100% CPU, system is crawling

Postby consulvation » Mon Jul 30, 2018 8:58 am

Thanks, that got Nagios running. I have attached screenshots of where things stand now, including performance stats. UI response times are still quite slow. Nagios no longer appears near the top of top's output: the CPU is mostly idle, and swap isn't used until you try to load a page in the UI. I had top running when I clicked the Service Problems link on the left, and the page took 1 min 15 s to load. I also attached a real-time video of top so you can see what happens from that side. Any ideas?
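Since swap only comes into play when a page is loaded, it may help to record swap usage while reproducing the slow page load, alongside the `Memory: 9.7G` figure systemd reported for the nagios service earlier in the thread. A minimal sketch; `swap_used_kb` is a hypothetical helper that reads SwapTotal and SwapFree (both in kB) from /proc/meminfo, or from any file in the same format.

```shell
#!/usr/bin/env bash
# Sketch: print swap in use, in kB, from a meminfo-style file.
# Defaults to /proc/meminfo; SwapTotal and SwapFree are reported in kB there.
swap_used_kb() {
  awk '/^SwapTotal:/ { t = $2 } /^SwapFree:/ { f = $2 } END { print t - f }' "${1:-/proc/meminfo}"
}

# Example: sample once a second while clicking the slow page
#   while sleep 1; do swap_used_kb; done
```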

[Attachment: load_historical.png (historical load)]

[Attachment: load_last24.png (load over the last 24 hours)]

[Attachment: ram_historical.png (historical RAM usage)]

Re: Nagios process at 100% CPU, system is crawling

Postby consulvation » Mon Jul 30, 2018 9:00 am

I couldn't attach a fourth file to the last post, so here is the video.

Re: Nagios process at 100% CPU, system is crawling

Postby scottwilkerson » Mon Jul 30, 2018 9:34 am

The video didn't upload.

Are you seeing any errors in the httpd error_log?

Re: Nagios process at 100% CPU, system is crawling

Postby consulvation » Mon Jul 30, 2018 10:49 am

It's an MP4 file, which this board doesn't allow. I'm not sure which format you would like.

From /var/log/apache2/error.log:

Code: Select all
cat error.log
[Mon Jul 30 06:25:01.930424 2018] [mpm_prefork:notice] [pid 1212] AH00163: Apache/2.4.18 (Ubuntu) configured -- resuming normal operations
[Mon Jul 30 06:25:01.930448 2018] [core:notice] [pid 1212] AH00094: Command line: '/usr/sbin/apache2'


Looking at the access.log file, it's all 200 OK responses, nothing unusual.
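To confirm that across the whole file rather than by eye, a small sketch can tally response codes. It assumes Apache's default common/combined log format, where the status code is the ninth whitespace-separated field; `status_counts` is a hypothetical helper name. Note that the default formats do not record how long each request took; adding Apache's `%D` directive (time to serve the request, in microseconds) to the LogFormat would be needed to see which requests are slow.

```shell
#!/usr/bin/env bash
# Sketch: count requests per HTTP status code in an Apache access log.
# Assumes common/combined format (status = field 9). Output: "STATUS COUNT", sorted.
status_counts() {
  awk '{ n[$9]++ } END { for (s in n) print s, n[s] }' "${1:-/var/log/apache2/access.log}" | sort
}
```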