Hosts and services temporarily unavailable

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
drug
Posts: 86
Joined: Wed Apr 03, 2013 3:19 pm

Re: Hosts and services temporarily unavailable

Post by drug »

ssax wrote:If you are running Core 4.4.3 (which you are), you are REQUIRED to upgrade gearman server on XI server and gearman workers.

https://assets.nagios.com/downloads/nag ... ios_XI.pdf
Is it a requirement to run this upgrade script after every Nagios XI update or just updates that contain newer versions of Core? We last upgraded gearman per those instructions after updating to 5.5.9 which included Nagios Core 4.4.3.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Hosts and services temporarily unavailable

Post by ssax »

Core 4.4.3 was incompatible with Gearman, so if you were running gearman you had to downgrade Core to 4.4.2 for it to work. Now that you're on 4.4.3 AND you run gearman, you NEED to upgrade gearman. This is not usually the case: it was an incompatibility that was resolved in the latest gearman, hence the upgrade being required.

Let me know if you have any questions or if I can clarify anything.
drug
Posts: 86
Joined: Wed Apr 03, 2013 3:19 pm

Re: Hosts and services temporarily unavailable

Post by drug »

The changelog within *ModGearmanInstall.sh* hasn't been updated since we last ran the upgrade. Even so, I attempted to upgrade today during a maintenance window with

Code:

./ModGearmanInstall.sh --type=server --upgrade
I received the following error:

Code:

*******************************************************************************
ERROR
Package gearmand was detected prior to first-time installation. Did you remove your Mod Gearman 2 packages?
Please remove the following packages: gearmand gearmand-devel mod_gearman mod_gearman-debuginfo gearmand-server

*******************************************************************************
This is a bit confusing since I'm running an upgrade (not an installation). Do these packages really have to be removed first?
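One way to see which of the packages named in that error are actually present is to ask rpm about each one. This is a minimal sketch, assuming an RPM-based system (the thread uses yum). The function name `report_gearman_packages` is made up for illustration, and the package list is copied from the error text rather than from the installer itself.

```shell
# Package names copied from the installer's error message (not from the script itself).
PACKAGES="gearmand gearmand-devel mod_gearman mod_gearman-debuginfo gearmand-server"

# Hypothetical helper: print each package with its status.
# rpm -q exits non-zero when a package is not installed.
report_gearman_packages() {
    for pkg in $PACKAGES; do
        if rpm -q "$pkg" >/dev/null 2>&1; then
            echo "$pkg installed"
        else
            echo "$pkg not-installed"
        fi
    done
}

report_gearman_packages
```

Anything reported as installed is presumably what the installer's first-time check is tripping over.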
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Hosts and services temporarily unavailable

Post by benjaminsmith »

Hi,

Did you follow the upgrade instructions (see: Server Installation – Upgrade (2 => 3))? You'll need to remove Mod Gearman 2 and then install Mod Gearman 3.

Remove Gearman 2

Code:

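# Remove Mod Gearman 2
# Back up the existing Mod Gearman 2 configuration files
cp /etc/mod_gearman2/* /tmp/
# Remove the gearmand and Mod Gearman 2 packages
yum remove gearmand gearmand-server gearmand-debuginfo gearmand-devel mod_gearman2 -y
# Comment out the gearman2 broker module line in nagios.cfg
sed -i 's/^broker\(.*\)gearman2\(.*\)/#broker\1gearman2\2/' /usr/local/nagios/etc/nagios.cfg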
Install Gearman 3

Code:

# Download and install Mod Gearman 3
cd /tmp
wget https://assets.nagios.com/downloads/nagiosxi/scripts/ModGearmanInstall.sh
chmod +x ModGearmanInstall.sh
./ModGearmanInstall.sh --type=server
See: Integrating Mod-Gearman With Nagios XI
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
drug
Posts: 86
Joined: Wed Apr 03, 2013 3:19 pm

Re: Hosts and services temporarily unavailable

Post by drug »

We upgraded to Gearman 3 some time ago. There are no installations of (or configurations that reference) version 2 on this server.
bheden
Product Development Manager
Posts: 179
Joined: Thu Feb 13, 2014 9:50 am
Location: Nagios Enterprises

Re: Hosts and services temporarily unavailable

Post by bheden »

@drug I'm the original author of that script - and I'll be taking a look at it in depth. Currently, I'm reading through this post attempting to find something maybe/hopefully obvious that wasn't already pointed out.

What are your XI and db and modgearman workers system specifications? CPU count, memory size. What type of disks? Is it virtual? Which hypervisor, if so?

How many modgearman workers do you have? What do those configurations look like?

Can you enable ndo2db debugging and perhaps supply us with that output?
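For anyone following along, ndo2db debugging is controlled by the `debug_level` option in ndo2db.cfg. The sketch below assumes the usual XI location of that file (`/usr/local/nagios/etc/ndo2db.cfg`) and that `-1` means "log everything", as described in the comments of the sample config. The helper name `set_ndo2db_debug_level` is hypothetical.

```shell
# Assumed default path on a Nagios XI server; adjust if yours differs.
NDO2DB_CFG="${NDO2DB_CFG:-/usr/local/nagios/etc/ndo2db.cfg}"

# Hypothetical helper: set debug_level in the given config file,
# keeping a .bak copy of the original next to it.
set_ndo2db_debug_level() {
    cfg="$1"
    level="$2"
    [ -f "$cfg" ] || { echo "config not found: $cfg" >&2; return 1; }
    cp "$cfg" "$cfg.bak"
    sed "s/^debug_level=.*/debug_level=$level/" "$cfg.bak" > "$cfg"
}

# Example (not run here): turn on full debugging.
# set_ndo2db_debug_level "$NDO2DB_CFG" -1
```

ndo2db would then need a restart to pick up the change, and the level should be set back to 0 afterwards since the debug file can grow quickly.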

My apologies if any of this has been repeated to you, I'm a bit late to the game :)

Nagios Enterprises
Senior Developer
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Hosts and services temporarily unavailable

Post by ssax »

Is mod_gearman currently processing checks as it should?

What is the gearman_top output now? Do you see things being processed if you watch it?

If gearman is working as normal, you don't likely have a problem with gearman.
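If you want a one-shot snapshot rather than watching gearman_top, the same queue counters can be read with `gearadmin --status` from the gearmand package. This is only a sketch: it assumes the standard status output of four whitespace-separated columns (queue name, total jobs, running jobs, available workers) terminated by a line containing a single dot. The function name `summarize_gearman_status` is made up.

```shell
# Hypothetical helper: summarize `gearadmin --status` output read from stdin.
# Expected columns: <queue> <total jobs> <running jobs> <available workers>.
# The terminating "." line and any malformed lines are skipped.
summarize_gearman_status() {
    awk '$1 != "." && NF == 4 {
        waiting = $2 - $3
        printf "%s waiting=%d running=%d workers=%d\n", $1, waiting, $3, $4
    }'
}

# Usage on the gearmand host (not run here):
# gearadmin --status | summarize_gearman_status
```

Queues whose waiting count stays near zero while running jobs keep changing suggest the workers are keeping up.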


If it's only 30 seconds after an apply configuration, that's normal according to the devs:

The devs said that after an Apply Configuration it can take up to 3 minutes for the NDOUtils rebuild/update process to complete. At that point it starts the CCM permission building process (unrelated here but good to know), as the CCM permissions are built off of the NDOUtils permissions and those need to be ready.

Please answer bheden's questions from his post above as well.
drug
Posts: 86
Joined: Wed Apr 03, 2013 3:19 pm

Re: Hosts and services temporarily unavailable

Post by drug »

bheden wrote: What are your XI and db and modgearman workers system specifications? CPU count, memory size. What type of disks? Is it virtual? Which hypervisor, if so?
XI: 8-core CPU, 8GB RAM (VMware VM)
Workers: It varies, some use ARM and have only 512M of RAM but our primaries are VMware 4-core CPU, 4GB RAM.
bheden wrote: How many modgearman workers do you have? What do those configurations look like?
We have ~20 nodes, each averaging the ability to exec ~150 workers at any given time. Each is configured to execute jobs only for specific hostgroups.

We haven't had any issues with gearmand or our workers keeping up with checks.
bheden wrote: Can you enable ndo2db debugging and perhaps supply us with that output?
Sure, I will accumulate some information and PM to you.
bheden wrote: My apologies if any of this has been repeated to you, I'm a bit late to the game :)
No worries, your help is appreciated!
drug
Posts: 86
Joined: Wed Apr 03, 2013 3:19 pm

Re: Hosts and services temporarily unavailable

Post by drug »

ssax wrote:Is mod_gearman currently processing checks as it should?
Yes
ssax wrote: What is the gearman_top output now? Do you see things being processed if you watch it?
Yes, we don't have any issues with gearmand or the mod gearman nodes. All checks are being processed in a timely fashion as they should be (no waiting and averaging ~.5 second latency for service checks).
ssax wrote:
If it's only 30 seconds after an apply configuration, that's normal according to the devs:

The devs said that after an Apply Configuration it can take up to 3 minutes for the NDOUtils rebuild/update process to complete. At that point it starts the CCM permission building process (unrelated here but good to know), as the CCM permissions are built off of the NDOUtils permissions and those need to be ready.
I think this speaks directly to the problem. Based on their comment, it sounds like the answer to my initial question is that this is normal and (short of an architecture/software design change) the only option for us would be to increase the resources on the XI instance and associated database in order to shorten the window during which the XI interface is unavailable?
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Hosts and services temporarily unavailable

Post by ssax »

drug wrote: this is normal and (short of an architecture/software design change) the only option for us would be to increase the resources on the XI instance and associated database in order to shorten the window during which the XI interface is unavailable?
That is correct. I only know this because I had this very discussion with the head of development when working a bug in the CCM's permission building. Until NDOUtils is removed from the architecture, you can count on this being the case:

Expect up to a 3 minute delay after an Apply Configuration (at the minimum)

The exact delay depends on how many objects/records are in the DB, total checks, active vs. inactive checks, load, whether the DB is offloaded, etc. They all have an impact.
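To put a number on that window for a given environment, one option is to poll some check right after an Apply Configuration and record how long it takes to pass. The sketch below is generic: `time_until_ok` is a made-up helper, and the command you pass it (for example a curl against your own XI URL) is an assumption that depends on what "unavailable" looks like on your system.

```shell
# Hypothetical helper: run a check command every $interval seconds until it
# succeeds or $timeout seconds have passed.
# Prints the elapsed seconds on success; returns 1 on timeout.
time_until_ok() {
    timeout="$1"; interval="$2"; shift 2
    start="$(date +%s)"
    while :; do
        if "$@" >/dev/null 2>&1; then
            echo "$(( $(date +%s) - start ))"
            return 0
        fi
        [ "$(( $(date +%s) - start ))" -ge "$timeout" ] && return 1
        sleep "$interval"
    done
}

# Example (not run here; the URL is a placeholder for your own XI page):
# time_until_ok 300 5 curl -sf "http://<xi-host>/nagiosxi/"
```

Comparing the result before and after adding resources to the XI instance or database would show whether the change actually shortens the window.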