Re: Hosts and services temporarily unavailable
Posted: Fri Aug 16, 2019 8:13 am
by drug
Is it a requirement to run this upgrade script after every Nagios XI update or just updates that contain newer versions of Core? We last upgraded gearman per those instructions after updating to 5.5.9, which included Nagios Core 4.4.3.
Re: Hosts and services temporarily unavailable
Posted: Fri Aug 16, 2019 2:26 pm
by ssax
Core 4.4.3 was incompatible with Gearman and required downgrading Core to 4.4.2 if you were running gearman. Now that you're on 4.4.3 AND you run gearman, you NEED to upgrade gearman. This is not usually the case; it was an incompatibility that was resolved in the latest gearman, hence the upgrade being required.
Let me know if you have any questions or if I can clarify anything.
Re: Hosts and services temporarily unavailable
Posted: Wed Aug 21, 2019 4:12 pm
by drug
The changelog within *ModGearmanInstall.sh* hasn't been updated since we last ran the update. Even so, I attempted the upgrade today during a maintenance window with
Code:
./ModGearmanInstall.sh --type=server --upgrade
I received the following error:
Code:
*******************************************************************************
ERROR
Package gearmand was detected prior to first-time installation. Did you remove your Mod Gearman 2 packages?
Please remove the following packages: gearmand gearmand-devel mod_gearman mod_gearman-debuginfo gearmand-server
*******************************************************************************
This is a bit confusing since I'm running an upgrade (not an installation). Do these packages really have to be removed first?
Re: Hosts and services temporarily unavailable
Posted: Thu Aug 22, 2019 3:54 pm
by benjaminsmith
Hi,
Did you follow the upgrade instructions (see: Server Installation – Upgrade (2 => 3))? You'll need to remove Mod Gearman 2 and then install Mod Gearman 3.
Remove Gearman 2
Code:
# Remove Mod Gearman 2
cp /etc/mod_gearman2/* /tmp/
yum remove gearmand gearmand-server gearmand-debuginfo gearmand-devel mod_gearman2 -y
sed -i 's/^broker\(.*\)gearman2\(.*\)/#broker\1gearman2\2/' /usr/local/nagios/etc/nagios.cfg
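To see what the sed expression above does before running it against nagios.cfg, it can be applied to sample input instead. This is a minimal sketch: `comment_gearman2_broker` is a hypothetical helper name, and the two broker_module lines are illustrative, not copied from a real configuration.

```shell
# Same expression as in the removal step, but reading stdin instead of
# editing nagios.cfg in place.
comment_gearman2_broker() {
    sed 's/^broker\(.*\)gearman2\(.*\)/#broker\1gearman2\2/'
}

# Hypothetical sample lines (paths are illustrative only).
printf '%s\n' \
    'broker_module=/usr/lib64/mod_gearman2/mod_gearman2.o config=/etc/mod_gearman2/module.conf' \
    'broker_module=/usr/local/nagios/bin/ndomod.o config=/usr/local/nagios/etc/ndomod.cfg' \
    | comment_gearman2_broker
```

Only a line that starts with "broker" and contains "gearman2" gets a leading "#"; other broker lines pass through unchanged.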
Install Gearman 3
Code:
# Download and install Mod Gearman 3
cd /tmp
wget https://assets.nagios.com/downloads/nagiosxi/scripts/ModGearmanInstall.sh
chmod +x ModGearmanInstall.sh
./ModGearmanInstall.sh --type=server
See:
Integrating Mod-Gearman With Nagios XI
Re: Hosts and services temporarily unavailable
Posted: Thu Aug 22, 2019 4:25 pm
by drug
We upgraded to Gearman 3 some time ago. There are no installations of (or configurations that reference) version 2 on this server.
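One way to confirm which of the packages named in the installer's error are actually present is to query rpm for each of them. This is a sketch, assuming an RPM-based system (the thread uses yum); `installed_packages` is a hypothetical helper name, and the package list is copied from the error message above.

```shell
# Print each named package that rpm reports as installed.
installed_packages() {
    for pkg in "$@"; do
        if rpm -q "$pkg" >/dev/null 2>&1; then
            echo "$pkg"
        fi
    done
}

# Package names taken from the installer's error message.
installed_packages gearmand gearmand-devel mod_gearman mod_gearman-debuginfo gearmand-server
```

Any name printed is a package the installer would detect; an empty result means none of them are installed.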
Re: Hosts and services temporarily unavailable
Posted: Fri Aug 23, 2019 10:28 am
by bheden
@drug I'm the original author of that script - and I'll be taking a look at it in depth. Currently, I'm reading through this post attempting to find something maybe/hopefully obvious that wasn't already pointed out.
What are your XI and db and modgearman workers system specifications? CPU count, memory size. What type of disks? Is it virtual? Which hypervisor, if so?
How many modgearman workers do you have? What do those configurations look like?
Can you enable ndo2db debugging and perhaps supply us with that output?
My apologies if any of this has been repeated to you, I'm a bit late to the game
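For the ndo2db debugging request, the relevant settings are typically `debug_level` and `debug_verbosity` in ndo2db.cfg. The sketch below prints a modified copy of the file rather than editing it in place. The helper name `with_ndo2db_debug`, the file path in the usage comment, and the values -1 and 2 are assumptions to check against the comments in your own ndo2db.cfg.

```shell
# Print the given ndo2db.cfg with debug_level and debug_verbosity replaced.
# Writes to stdout; the original file is left untouched.
with_ndo2db_debug() {
    cfg=$1 level=$2 verbosity=$3
    sed -e "s/^debug_level=.*/debug_level=${level}/" \
        -e "s/^debug_verbosity=.*/debug_verbosity=${verbosity}/" \
        "$cfg"
}

# Assumed path and values (-1 assumed to mean "log everything"):
# with_ndo2db_debug /usr/local/nagios/etc/ndo2db.cfg -1 2 > /tmp/ndo2db.cfg.debug
```

After reviewing the generated copy, it would replace the live file, and ndo2db would need a restart to pick up the change.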

Re: Hosts and services temporarily unavailable
Posted: Mon Aug 26, 2019 3:35 pm
by ssax
Is mod_gearman currently processing checks as it should?
What is the gearman_top output now? Do you see things being processed if you watch it?
If gearman is working as normal, you don't likely have a problem with gearman.
If it's only 30 seconds after an apply configuration, that's normal according to the devs:
The devs said that it can take up to 3 minutes after an apply configuration for the NDOUtils rebuild/update process to complete. At that point it starts the CCM permission building process (unrelated here but good to know), as the CCM permissions are built off of the NDOUtils permissions and those need to be ready.
Answer bheden's questions from his post above as well.
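Besides watching gearman_top, the same queue numbers can be read from gearmand's text admin interface, whose "status" command returns one tab-separated line per queue (function, total jobs, running jobs, available workers) followed by a line containing only ".". The sketch below filters such output for queues with jobs still waiting. `queues_with_waiting_jobs` is a hypothetical helper, the sample data is made up, and the use of `nc` and port 4730 in the comment assumes a default local gearmand.

```shell
# Read gearmand "status" output on stdin (tab-separated: function, total
# jobs, running jobs, available workers; terminated by a line with ".")
# and print each queue that has jobs waiting, with the waiting count.
queues_with_waiting_jobs() {
    awk -F'\t' '$1 != "." && ($2 - $3) > 0 { print $1, $2 - $3 }'
}

# Hypothetical sample; live output could come from, for example,
#   printf 'status\n' | nc -w 1 localhost 4730
printf 'host\t0\t0\t25\nservice\t12\t4\t150\n.\n' | queues_with_waiting_jobs
```

A queue that keeps showing a growing waiting count would point at gearman; consistently empty output suggests the workers are keeping up.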
Re: Hosts and services temporarily unavailable
Posted: Tue Aug 27, 2019 8:48 am
by drug
bheden wrote:
What are your XI and db and modgearman workers system specifications? CPU count, memory size. What type of disks? Is it virtual? Which hypervisor, if so?
XI: 8-core CPU, 8GB RAM (VMware VM)
Workers: It varies, some use ARM and have only 512M of RAM but our primaries are VMware 4-core CPU, 4GB RAM.
bheden wrote:
How many modgearman workers do you have? What do those configurations look like?
We have ~20 nodes, each able to exec ~150 workers on average at any given time. Each is configured to execute jobs only for specific hostgroups.
We haven't had any issues with gearmand or our workers keeping up with checks.
bheden wrote:
Can you enable ndo2db debugging and perhaps supply us with that output?
Sure, I will accumulate some information and PM to you.
bheden wrote:
My apologies if any of this has been repeated to you, I'm a bit late to the game
No worries, your help is appreciated!
Re: Hosts and services temporarily unavailable
Posted: Tue Aug 27, 2019 8:55 am
by drug
ssax wrote:
Is mod_gearman currently processing checks as it should?
Yes
ssax wrote:
What is the gearman_top output now? Do you see things being processed if you watch it?
Yes, we don't have any issues with gearmand or the mod gearman nodes. All checks are being processed in a timely fashion as they should be (no waiting, and averaging ~0.5 second latency for service checks).
ssax wrote:
If it's only 30 seconds after an apply configuration, that's normal according to the devs:
The devs said that it can take up to 3 minutes after an apply configuration for the NDOUtils rebuild/update process to complete. At that point it starts the CCM permission building process (unrelated here but good to know), as the CCM permissions are built off of the NDOUtils permissions and those need to be ready.
I think this speaks directly to the problem. Based on their comment, it sounds like the answer to my initial question is that this is normal and (short of an architecture/software design change) the only option for us would be to increase the resources on the XI instance and associated database in order to shorten the window during which the XI interface is unavailable?
Re: Hosts and services temporarily unavailable
Posted: Tue Aug 27, 2019 4:15 pm
by ssax
drug wrote:
this is normal and (short of an architecture/software design change) the only option for us would be to increase the resources on the XI instance and associated database in order to shorten the window during which the XI interface is unavailable?
That is correct. I only know this because I had this very discussion with the head of development when working a bug in the CCM's permission building. Until NDOUtils is removed from the architecture, you can count on this being the case:
Expect up to a 3 minute delay after an Apply Configuration (at the minimum), depending on how many objects/records are in the DB, total checks, active vs. inactive checks, load, whether the DB is offloaded, etc. They all have an impact.