Slowness troubleshooting --> 5.4.11 to 5.4.13.

emartine
Posts: 660
Joined: Thu Dec 29, 2011 10:47 am

Post by emartine »

Our test Nagios environment, which checks only about 100 services, has been running slowly since the upgrade we did a few weeks back.
We went from 5.4.11 to 5.4.13.

At this time the server is no longer running gearman and is being used as a standalone XI server.

Stats currently in use:

RAM 5G Total / 3G in use
2 x Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz

02:50:08 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest  %idle
02:50:08 PM  all   8.14   0.00   1.76    0.32   0.00   0.87   0.00   0.00  88.90

Most of the CPU usage comes from httpd, which hovers around 10%, but it is not constant and nothing goes higher than that.

I'm seeing a lot of these processes:

nagios 8718 1 0 Apr04 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 8728 8718 0 Apr04 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 8760 19605 0 Apr04 ? 00:00:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg


I've looked at /var/log/messages and I've not noticed anything out of the ordinary.

I don't want this to happen in production, so I would like to troubleshoot this issue and find the cause of the slowness.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Slowness troubleshooting --> 5.4.11 to 5.4.13.

Post by scottwilkerson »

Can you show the full output of the following?

Code: Select all

ps -ef|grep nagios.cfg
If there is more than one nagios parent process, we should do the following:

Code: Select all

service nagios stop
killall -9 nagios
service nagios start
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart

Re: Slowness troubleshooting --> 5.4.11 to 5.4.13.

Post by emartine »

I've done as you suggested, and I have also rebooted the server.

ps -ef | grep nagios.cfg
nagios 1313 1 0 Apr04 ? 00:00:02 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 1509 1313 0 Apr04 ? 00:00:18 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 8718 1 0 Apr04 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 9000 8718 0 Apr04 ? 00:00:17 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 13545 1 0 Apr04 ? 00:00:09 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 13759 13545 0 Apr04 ? 00:00:18 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 15351 1 0 Apr08 ? 00:01:36 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 15434 15351 0 Apr08 ? 00:00:02 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 16449 1 0 Apr04 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 16996 1 0 Apr04 ? 00:01:30 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 17325 16996 0 Apr04 ? 00:00:16 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 22897 1 0 Apr06 ? 00:00:19 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 23239 22897 0 Apr06 ? 00:00:11 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 24140 1 0 Apr06 ? 00:00:07 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 24625 24140 0 Apr06 ? 00:00:10 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 30466 1 0 Apr04 ? 00:00:06 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 30786 30466 0 Apr04 ? 00:00:16 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 32313 1 0 Apr03 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 32733 32313 0 Apr03 ? 00:00:21 /usr/local/nagios/bin/nagios -d

Re: Slowness troubleshooting --> 5.4.11 to 5.4.13.

Post by scottwilkerson »

You have multiple nagios parent processes. Run the following:

Code: Select all

service nagios stop
killall -9 nagios
service nagios start
This should speed things up.

Re: Slowness troubleshooting --> 5.4.11 to 5.4.13.

Post by emartine »

I think killing them is only a temporary solution. Here are the processes that have appeared since I last killed them:

nagios 5811 1 0 09:43 ? 00:00:04 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 5857 1 0 16:11 ? 00:00:03 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 6061 5857 0 16:12 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 6348 5811 0 09:45 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

Re: Slowness troubleshooting --> 5.4.11 to 5.4.13.

Post by scottwilkerson »

It really should never happen, but this is wrong; you should NEVER have 2 nagios parent processes:

Code: Select all

nagios 5811 1 0 09:43 ? 00:00:04 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 5857 1 0 16:11 ? 00:00:03 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
As such, they will compete with each other, possibly doubling the number of checks that run.
elinagios
Posts: 146
Joined: Thu Feb 16, 2017 3:45 am

Re: Slowness troubleshooting --> 5.4.11 to 5.4.13.

Post by elinagios »

Hello

Scott, are you sure about that?
I have 3 environments: live (5.4.10), test (5.4.13), and an environment that monitors live (also 5.4.13). In all cases there are 2 nagios parent processes:
nagios 3320 1 0 13:19 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 3339 3320 0 13:19 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

I restarted the test and live-monitoring servers, and even after a server restart there are 2 instances. If I restart nagios, there are still 2 instances afterwards. To me it seems that is how it is supposed to be.

All are running CentOS 7.4.

Re: Slowness troubleshooting --> 5.4.11 to 5.4.13.

Post by scottwilkerson »

elinagios wrote: Scott, are you sure about that?
Yes.

In what I posted, the 3rd column is the parent process ID (PPID), and for both processes it is 1.

In your example, there is only 1 parent process, and the bottom one is a child process of the parent, PID 3320:
elinagios wrote:
nagios 3320 1 0 13:19 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 3339 3320 0 13:19 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
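
The third-column rule can be checked mechanically instead of by eye. What follows is only a minimal sketch, not an official Nagios tool: the function name list_nagios_parents is made up for illustration, and it assumes the `ps -ef` column layout seen in this thread (UID, PID, PPID, C, STIME, ...). Keep in mind that a child whose parent has exited is normally re-parented to PID 1 as well, so treat the result as a hint rather than proof.

```shell
# Hypothetical helper: print "PID STIME" for every nagios daemon line
# whose parent PID (3rd column of `ps -ef`) is 1.
# Reads `ps -ef` style lines on stdin.
list_nagios_parents() {
    awk '$3 == 1 && /nagios -d/ { print $2, $5 }'
}

# Possible usage on a live system (the [n] keeps grep from matching itself):
#   ps -ef | grep '[n]agios.cfg' | list_nagios_parents
```

If this prints more than one line, there is more than one parent process and the stop / killall / start sequence applies.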

Re: Slowness troubleshooting --> 5.4.11 to 5.4.13.

Post by emartine »

So how can I tell why the other processes aren't dying?

I am assuming new ones spawn when changes are applied and the old ones are supposed to die but don't.

In our production environment I see that we have 3 parents.

nagios 36472 1 0 Mar19 ? 00:14:57 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 36609 36472 0 Mar19 ? 00:01:27 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 41292 1 8 18:14 ? 00:00:13 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 41425 41292 0 18:14 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 62726 1 0 Feb22 ? 00:00:27 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 62873 62726 0 Feb22 ? 00:03:07 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg


So Mar19 and Feb22 need to die.
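
To pick out the stale parents without reading start dates by eye, one option is to sort the PPID 1 processes by elapsed running time and keep only the youngest. This is a sketch under assumptions: stale_parents is a hypothetical helper name, it expects "PID PPID ELAPSED_SECONDS" lines on stdin, and the commented ps invocation assumes a procps-ng ps that supports the etimes field. It only prints PIDs and does not kill anything.

```shell
# Hypothetical helper: read "PID PPID ELAPSED_SECONDS" lines on stdin and
# print the PID of every PPID-1 process except the most recently started one.
stale_parents() {
    awk '$2 == 1 { print $3, $1 }' |   # keep parents only: "elapsed pid"
        sort -n |                      # youngest (smallest elapsed) first
        awk 'NR > 1 { print $2 }'      # skip the youngest, print the rest
}

# Possible usage on a live system (assumes procps-ng with etimes support):
#   ps -C nagios -o pid= -o ppid= -o etimes= | stale_parents
```

Review the printed PIDs before passing them to kill, since the sketch assumes the newest parent is the one that should survive.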

Re: Slowness troubleshooting --> 5.4.11 to 5.4.13.

Post by scottwilkerson »

That is odd to see regularly, and you should kill them off.

Do you run mod_gearman? I've seen this cause them to stay open in the past.