Slowness troubleshooting --> 5.4.11 to 5.4.13.
Posted: Fri Apr 06, 2018 3:09 pm
by emartine
Our test nagios environment that is only checking about 100 services seems to be running slow after the upgrade we did a few weeks back.
We went from 5.4.11 to 5.4.13.
At this time the server is no longer running gearman and is being used as a standalone XI server.
Stats currently in use:
RAM 5G Total / 3G in use
2 x Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
02:50:08 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
02:50:08 PM all 8.14 0.00 1.76 0.32 0.00 0.87 0.00 0.00 88.90
Most of the CPU usage comes from httpd, at around 10%, but it is not constant and never goes higher than that.
I'm seeing a lot of these processes:
nagios 8718 1 0 Apr04 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 8728 8718 0 Apr04 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 8760 19605 0 Apr04 ? 00:00:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
I've looked at /var/log/messages and haven't noticed anything out of the ordinary.
I don't want this to happen in production, so I would like to troubleshoot this issue and find the cause of the slowness.
Re: Slowness troubleshooting --> 5.4.11 to 5.4.13.
Posted: Fri Apr 06, 2018 3:31 pm
by scottwilkerson
can you show the full output of the following
If there is more than one parent process, we should do the following:
Code:
service nagios stop
killall -9 nagios
service nagios start
Re: Slowness troubleshooting --> 5.4.11 to 5.4.13.
Posted: Mon Apr 09, 2018 9:55 am
by emartine
I've done as you suggested below but I have also rebooted the server.
ps -ef | grep nagios.cfg
nagios 1313 1 0 Apr04 ? 00:00:02 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 1509 1313 0 Apr04 ? 00:00:18 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 8718 1 0 Apr04 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 9000 8718 0 Apr04 ? 00:00:17 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 13545 1 0 Apr04 ? 00:00:09 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 13759 13545 0 Apr04 ? 00:00:18 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 15351 1 0 Apr08 ? 00:01:36 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 15434 15351 0 Apr08 ? 00:00:02 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 16449 1 0 Apr04 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 16996 1 0 Apr04 ? 00:01:30 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 17325 16996 0 Apr04 ? 00:00:16 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 22897 1 0 Apr06 ? 00:00:19 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 23239 22897 0 Apr06 ? 00:00:11 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 24140 1 0 Apr06 ? 00:00:07 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 24625 24140 0 Apr06 ? 00:00:10 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 30466 1 0 Apr04 ? 00:00:06 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 30786 30466 0 Apr04 ? 00:00:16 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 32313 1 0 Apr03 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 32733 32313 0 Apr03 ? 00:00:21 /usr/local/nagios/bin/nagios -d
Re: Slowness troubleshooting --> 5.4.11 to 5.4.13.
Posted: Mon Apr 09, 2018 12:19 pm
by scottwilkerson
You have multiple nagios parent processes. Run the following:
Code:
service nagios stop
killall -9 nagios
service nagios start
This should speed things up.
Re: Slowness troubleshooting --> 5.4.11 to 5.4.13.
Posted: Mon Apr 09, 2018 4:55 pm
by emartine
I think killing them is a temporary solution. These are the processes that have appeared since I last killed them:
nagios 5811 1 0 09:43 ? 00:00:04 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 5857 1 0 16:11 ? 00:00:03 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 6061 5857 0 16:12 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 6348 5811 0 09:45 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
Re: Slowness troubleshooting --> 5.4.11 to 5.4.13.
Posted: Mon Apr 09, 2018 5:01 pm
by scottwilkerson
It really should never happen, but this is wrong: you should NEVER have 2 nagios parent processes
Code:
nagios 5811 1 0 09:43 ? 00:00:04 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 5857 1 0 16:11 ? 00:00:03 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
and as such, they will compete with each other, possibly doubling the number of checks that run.
Re: Slowness troubleshooting --> 5.4.11 to 5.4.13.
Posted: Tue Apr 10, 2018 5:25 am
by elinagios
Hello
Scott, are you sure about that?
I have 3 environments: live on 5.4.10, test on 5.4.13, and an environment that monitors live, also on 5.4.13. In all cases there are 2 nagios parent processes:
nagios 3320 1 0 13:19 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 3339 3320 0 13:19 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
I restarted the test server and the server that monitors live, and even after a server restart there are 2 instances. If I restart nagios, there are still 2 instances afterwards. To me it seems that is how it is supposed to be.
All are running CentOS 7.4.
Re: Slowness troubleshooting --> 5.4.11 to 5.4.13.
Posted: Tue Apr 10, 2018 8:14 am
by scottwilkerson
elinagios wrote:Scott, are you sure about that?
Yes.
In what I posted, the 3rd column is the parent process ID, and it is 1 for both processes, so both are parent processes.
In your example there is only 1 parent process; the bottom one is a child process of the parent, pid 3320.
elinagios wrote:
nagios 3320 1 0 13:19 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 3339 3320 0 13:19 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
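The rule above (a nagios daemon line whose 3rd column is 1 is a parent process) can be checked mechanically. What follows is only a rough sketch, not an official Nagios tool: the function name count_nagios_parents is made up for illustration, and it assumes the `ps -ef` column layout shown in this thread, with the command starting in the 8th column.

```shell
#!/bin/sh
# Hypothetical helper: count nagios daemon processes whose parent PID is 1.
# Reads `ps -ef` style lines on stdin (UID PID PPID C STIME TTY TIME CMD...).
count_nagios_parents() {
    # $3 is the parent PID; $8 and $9 are the start of the command line.
    awk '$3 == 1 && $8 ~ /\/nagios$/ && $9 == "-d" { n++ } END { print n + 0 }'
}

# Example with the four lines posted earlier in this thread.
# Two of them have 1 in the 3rd column, so this prints 2.
count_nagios_parents <<'EOF'
nagios 5811 1 0 09:43 ? 00:00:04 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 5857 1 0 16:11 ? 00:00:03 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 6061 5857 0 16:12 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 6348 5811 0 09:45 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
EOF
```

On a live server the same function could be fed with `ps -ef | count_nagios_parents`. Going by the explanation above, any result other than 1 would point to extra parent processes.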
Re: Slowness troubleshooting --> 5.4.11 to 5.4.13.
Posted: Tue Apr 10, 2018 9:57 am
by emartine
So how can I tell why the other processes aren't dying?
I am assuming new ones spawn when changes are applied and the old ones are supposed to die but don't.
In our production environment I see that we have 3 parents.
nagios 36472 1 0 Mar19 ? 00:14:57 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 36609 36472 0 Mar19 ? 00:01:27 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 41292 1 8 18:14 ? 00:00:13 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 41425 41292 0 18:14 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 62726 1 0 Feb22 ? 00:00:27 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 62873 62726 0 Feb22 ? 00:03:07 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
So Mar19 and Feb22 need to die.
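One way to pick out the leftover parents without reading dates by eye is to compare elapsed run times and keep only the newest parent. This is a sketch under assumptions, not something confirmed in this thread: the function name list_stale_nagios_parents is hypothetical, the input format is "PID PPID ELAPSED_SECONDS COMMAND...", and the elapsed-seconds values in the example are invented for illustration (only the PIDs come from the listing above).

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs of every nagios parent process
# (parent PID 1, command "nagios -d") except the most recently started one.
# Expects lines of "PID PPID ELAPSED_SECONDS COMMAND..." on stdin.
list_stale_nagios_parents() {
    awk '$2 == 1 && $4 ~ /\/nagios$/ && $5 == "-d" { print $3, $1 }' |
        sort -n |      # smallest elapsed time (newest process) first
        tail -n +2 |   # skip the newest parent, keep the older ones
        awk '{ print $2 }'
}

# Example: PIDs from the production listing above, with made-up elapsed times.
# Prints 36472 and 62726, the two older parents.
list_stale_nagios_parents <<'EOF'
36472 1 1900000 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
36609 36472 1900000 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
41292 1 60 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
41425 41292 60 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
62726 1 4000000 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
62873 62726 4000000 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
EOF
```

Assuming a `ps` that supports the `etimes` output field (procps-ng does; older procps versions may not), the input could come from `ps -eo pid=,ppid=,etimes=,args= | list_stale_nagios_parents`. The sketch only lists PIDs and leaves the decision to kill them to the operator.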
Re: Slowness troubleshooting --> 5.4.11 to 5.4.13.
Posted: Tue Apr 10, 2018 10:02 am
by scottwilkerson
That is odd to see regularly, and you should kill them off.
Do you run mod_gearman? I've seen this cause them to stay open in the past.