Slowness troubleshooting --> 5.4.11 to 5.4.13.
Our test nagios environment that is only checking about 100 services seems to be running slow after the upgrade we did a few weeks back.
We went from 5.4.11 to 5.4.13.
At this time the server is no longer running gearman and is being used as a standalone XI server.
Stats currently in use:
RAM 5G Total / 3G in use
2 x Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
02:50:08 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
02:50:08 PM all 8.14 0.00 1.76 0.32 0.00 0.87 0.00 0.00 88.90
Most of the cpu usage ranges around 10% by httpd but it is not constant and nothing higher than that.
I'm seeing a lot of these processes.
nagios 8718 1 0 Apr04 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 8728 8718 0 Apr04 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 8760 19605 0 Apr04 ? 00:00:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
I've looked at /var/log/messages and I've not noticed anything out of the ordinary.
I don't want this to happen in production, so I would like to troubleshoot this issue and find the cause of the slowness.
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Slowness troubleshooting --> 5.4.11 to 5.4.13.
Can you show the full output of the following?
Code:
ps -ef | grep nagios.cfg
If there is more than one, we should do the following:
Code:
service nagios stop
killall -9 nagios
service nagios start
Re: Slowness troubleshooting --> 5.4.11 to 5.4.13.
I've done as you suggested below but I have also rebooted the server.
ps -ef | grep nagios.cfg
nagios 1313 1 0 Apr04 ? 00:00:02 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 1509 1313 0 Apr04 ? 00:00:18 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 8718 1 0 Apr04 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 9000 8718 0 Apr04 ? 00:00:17 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 13545 1 0 Apr04 ? 00:00:09 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 13759 13545 0 Apr04 ? 00:00:18 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 15351 1 0 Apr08 ? 00:01:36 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 15434 15351 0 Apr08 ? 00:00:02 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 16449 1 0 Apr04 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 16996 1 0 Apr04 ? 00:01:30 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 17325 16996 0 Apr04 ? 00:00:16 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 22897 1 0 Apr06 ? 00:00:19 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 23239 22897 0 Apr06 ? 00:00:11 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 24140 1 0 Apr06 ? 00:00:07 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 24625 24140 0 Apr06 ? 00:00:10 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 30466 1 0 Apr04 ? 00:00:06 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 30786 30466 0 Apr04 ? 00:00:16 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 32313 1 0 Apr03 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 32733 32313 0 Apr03 ? 00:00:21 /usr/local/nagios/bin/nagios -d
-
scottwilkerson
Re: Slowness troubleshooting --> 5.4.11 to 5.4.13.
You have multiple nagios parent processes. Run the following; this should speed things up:
Code:
service nagios stop
killall -9 nagios
service nagios start
Re: Slowness troubleshooting --> 5.4.11 to 5.4.13.
I think killing them is only a temporary solution. These are the processes that have appeared since I last killed them:
nagios 5811 1 0 09:43 ? 00:00:04 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 5857 1 0 16:11 ? 00:00:03 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 6061 5857 0 16:12 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 6348 5811 0 09:45 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
-
scottwilkerson
Re: Slowness troubleshooting --> 5.4.11 to 5.4.13.
This really should never happen, and it is wrong: you should NEVER have 2 nagios parent processes.
Code:
nagios 5811 1 0 09:43 ? 00:00:04 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 5857 1 0 16:11 ? 00:00:03 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
As such, they will compete with each other, possibly doubling the number of checks that run.
Re: Slowness troubleshooting --> 5.4.11 to 5.4.13.
Hello
Scott, are you sure about that?
I have 3 environments: live (5.4.10), test (5.4.13), and an environment that monitors live (also 5.4.13). In all cases there are 2 nagios parent processes:
nagios 3320 1 0 13:19 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 3339 3320 0 13:19 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
I restarted the test and live-monitoring servers, and even after a server restart there are 2 instances. If I restart nagios, there are still 2 instances afterwards. To me it seems that is how it is supposed to be.
All are running CentOS 7.4.
-
scottwilkerson
Re: Slowness troubleshooting --> 5.4.11 to 5.4.13.
elinagios wrote:Scott, are you sure about that?
Yes.
In what I posted, the 3rd column is the parent process ID (PPID), and for both processes it is 1.
In your example, there is 1 parent process, and the bottom one is a child process of the parent PID 3320:
elinagios wrote:nagios 3320 1 0 13:19 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 3339 3320 0 13:19 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
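The parent/child distinction described here can be checked mechanically instead of by eye. The following is a minimal sketch, not an official Nagios tool: `nagios_parents` is a hypothetical helper name, and it assumes the `ps -ef` column layout shown in this thread (PID in column 2, PPID in column 3, start time in column 5, command in column 8).

```shell
#!/bin/sh
# nagios_parents (hypothetical helper, not part of Nagios):
# reads `ps -ef` style lines on stdin and prints "PID STIME" for every
# nagios daemon whose parent PID (3rd column) is 1.
# Assumed column layout: UID PID PPID C STIME TTY TIME CMD ARGS...
nagios_parents() {
    awk '$3 == 1 && $8 ~ /\/nagios$/ && $9 == "-d" { print $2, $5 }'
}
```

Running `ps -ef | nagios_parents` should print exactly one line on a healthy server; more than one line means several parent daemons are running side by side.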
Re: Slowness troubleshooting --> 5.4.11 to 5.4.13.
So how can I tell why the other processes aren't dying?
I am assuming new ones spawn when changes are applied and the old ones are supposed to die but don't.
In our production environment I see that we have 3 parents.
nagios 36472 1 0 Mar19 ? 00:14:57 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 36609 36472 0 Mar19 ? 00:01:27 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 41292 1 8 18:14 ? 00:00:13 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 41425 41292 0 18:14 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 62726 1 0 Feb22 ? 00:00:27 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 62873 62726 0 Feb22 ? 00:03:07 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
So Mar19 and Feb22 need to die.
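One way to single out the leftover parents before killing anything is to compare each parent PID against the PID of the daemon that should stay. This is only a sketch under assumptions: `stale_nagios_parents` is a hypothetical helper name, the PID to keep is passed in by hand, and the input is assumed to be in the `ps -ef` layout shown above.

```shell
#!/bin/sh
# stale_nagios_parents LIVE_PID (hypothetical helper, not part of Nagios):
# reads `ps -ef` style lines on stdin and prints the PID of every nagios
# daemon with parent PID 1 whose own PID is not LIVE_PID.
# Assumed column layout: UID PID PPID C STIME TTY TIME CMD ARGS...
stale_nagios_parents() {
    awk -v live="$1" \
        '$3 == 1 && $8 ~ /\/nagios$/ && $9 == "-d" && $2 != live { print $2 }'
}
```

With the production listing above, `ps -ef | stale_nagios_parents 41292` would print 36472 and 62726, the Mar19 and Feb22 parents. The PID to keep might be read from the file named by the `lock_file` directive in nagios.cfg, assuming it records the current daemon on your install; review the list before passing it to `kill`, and rerun `ps -ef | grep nagios.cfg` afterwards to confirm no children were left behind.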
-
scottwilkerson
Re: Slowness troubleshooting --> 5.4.11 to 5.4.13.
That is odd to see regularly, and you should kill them off.
Do you run mod_gearman? I've seen this cause them to stay open in the past.