Nagios XI Event queue stalling? (Mod Gearman)

mon-team
Posts: 171
Joined: Thu Jun 28, 2012 9:22 am

Post by mon-team »

Hi everybody,

Just to share that the orphaned checks issue caused by mod_gearman stalling happened again last Friday, May 16.
We are going to upgrade to mod_gearman v1.4.14 in our production environment tomorrow.
We will let Nagios run for some time and will report back here on whether the upgrade solves the issue.
I really think the gearman installation script should also be updated for Nagios 2012, as the issue has a great impact.
lance
Posts: 38
Joined: Wed Feb 17, 2010 5:00 pm

Post by lance »

Seems to be a bit of activity here...

Hey mon-team (& mods), I can confirm that I haven't seen the same behaviour since we upgraded our gearman deployment in mid-March. We did have one instance a couple of days ago where the monitoring engine queue filled up, but we put that down to the gearman NEB module not loading for some reason on a Nagios restart after a config change. Unfortunately we couldn't find the root cause, but we did see some mod_gearman check errors in Nagios, with the local gearman worker complaining that it could not connect to localhost:4730. It was fixed with a service gearmand restart followed by a service nagios restart.
Still not 100% sure whether it was the upgrade, some of the suggested config changes, moving virtual machines to low I/O disk, or a combination of everything. We still get a bit of a spike in I/O wait overnight, where the configured service & worker queues take a bit of a hit (i.e. jobs waiting), but have found that the workers catch up OK.
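
For anyone who wants to watch for the "jobs waiting" condition described above, here is a minimal sketch. It assumes the gearmand admin "status" output format (FUNCTION, TOTAL, RUNNING, AVAILABLE_WORKERS, tab separated, ended by a line containing only ".") and assumes that waiting jobs can be taken as TOTAL minus RUNNING. The function name waiting_queues and the nc invocation in the comment are illustrative, not part of mod_gearman.

```shell
#!/bin/sh
# Sketch: list gearmand queues that currently have jobs waiting.
# Input (stdin): gearmand admin "status" output, one line per queue:
#   FUNCTION<TAB>TOTAL<TAB>RUNNING<TAB>AVAILABLE_WORKERS
# terminated by a line containing only ".".
# Assumption: waiting jobs = TOTAL - RUNNING.
waiting_queues() {
    awk -F '\t' '
        $0 == "." { exit }                 # end-of-list marker
        NF == 4 {
            waiting = $2 - $3
            if (waiting > 0) printf "%s %d\n", $1, waiting
        }
    '
}

# Hypothetical usage against a local gearmand (needs nc; not run here):
#   (printf 'status\n'; sleep 1) | nc localhost 4730 | waiting_queues
```

If the nc command cannot connect at all, that would match the "not being able to connect to localhost:4730" symptom, and the gearmand restart followed by a nagios restart mentioned above is the fix that worked in this case.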

Anyway - good luck with the upgrade

Lincoln
Last edited by lance on Fri May 23, 2014 5:02 pm, edited 1 time in total.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Post by scottwilkerson »

@mon-team - Let us know how the upgrade goes. We can look at putting together a special script for those using the 2012 version.
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Post by BanditBBS »

My question is: what happens when they upgrade to the 2014 version, which requires the 1.4.0-nagios4 version? They would technically be downgrading their gearman again, since 1.4.14 hasn't been compiled for nagios4 yet (that I can find).
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Post by slansing »

We're currently evaluating options as far as your comment goes, Bandit. Unfortunately, at this moment this is what we can offer to keep mod_gearman running and working in environments that are upgraded to Core4/2014.
lance
Posts: 38
Joined: Wed Feb 17, 2010 5:00 pm

Post by lance »

Should have confirmed: we're still on XI 2012R2.9.
lmiltchev
Former Nagios Staff
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Post by lmiltchev »

Note: I will keep this topic open for a while.
vAJ
Posts: 456
Joined: Thu Nov 08, 2012 5:09 pm
Location: Austin, TX

Post by vAJ »

So what's the current consensus on upgrading mod_gearman alongside the upgrade to XI 2014?

The script posted states: "This script should only be used on a Redhat or CentOS Installation with no previous verion of Mod Gearman Installed"

So, should I remove the current version of gearman I have running, or can this script handle the upgrade? Also, should gearman be upgraded before or after the XI upgrade?

Sorry if this is cross-posting.
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Post by slansing »

In this case you should be able to run it on an already installed system, because it will simply update the existing install with the newer RPMs. That note is there because the script is not supposed to be used as a re-installation method. At this time we still have not gotten through all of the testing required to update the live document, but the method on page 2 still stands.
rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Post by rajasegar »

slansing wrote: Dropping this in a new reply since some of you subscribed to this thread. You can use the attached script to get a working version of gearmand on your XI 2014 / Core 4 server, and it can also be used to get a working worker install on your remote workers. I also added the option for you to specify a key (during install) if you are using encryption on the master server, so you don't have to go in and add it manually.

You can follow the same old document, though I'll be updating that as well once support is added to check your server for Core 4, so we don't break other people's installs if they are still using Core 3 and want gearman.

NOTE: If you did not know already, you cannot use the perfdata processing function of mod_gearman, as we are not shipping the version of PNP that it requires you to install remotely. We don't really use PNP heavily on the backend of XI anymore.

Currently, until we find a better way, you will also need to update your performance data commands in the CCM in order to get perf data pushed from the workers up to XI and displayed properly.

You will need to change the commands for process-host-perfdata-file-bulk and process-service-perfdata-file-bulk to:

sed -i 's/\\n//g' /usr/local/nagios/var/host-perfdata && /bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/$TIMET$.perfdata.host
And:

sed -i 's/\\n//g' /usr/local/nagios/var/service-perfdata && /bin/mv /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/xidpe/$TIMET$.perfdata.service
And apply configuration.
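
To make it clearer what those two commands do, here is a small sketch that can be run outside Nagios. When the sed expression is run through a shell as written, it strips the literal two-character sequence backslash + n (not real line breaks) from the perfdata file, and the mv then drops the file into the spool directory under a timestamped name. The helper names strip_literal_newlines and spool_perfdata are hypothetical, and $(date +%s) only stands in for the $TIMET$ macro, which Nagios itself expands to the current Unix timestamp. The sketch writes to a temporary copy instead of using sed -i, purely for portability.

```shell
#!/bin/sh
# strip_literal_newlines: remove the two-character sequence backslash + n,
# the same substitution as  sed 's/\\n//g'  in the commands above.
strip_literal_newlines() {
    sed 's/\\n//g'
}

# spool_perfdata SRC SPOOL_DIR KIND   (hypothetical helper)
# Cleans SRC, moves the result to SPOOL_DIR/<epoch>.perfdata.KIND, removes
# SRC, and prints the destination path on success.
# $(date +%s) stands in for the Nagios $TIMET$ macro.
spool_perfdata() {
    src=$1 spool_dir=$2 kind=$3
    dest="$spool_dir/$(date +%s).perfdata.$kind"
    strip_literal_newlines < "$src" > "$src.clean" &&
        mv "$src.clean" "$dest" &&
        rm -f "$src" &&
        printf '%s\n' "$dest"
}
```

For a quick check, point spool_perfdata at a scratch file and a scratch directory rather than at /usr/local/nagios/var, so the live spool is not touched.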
Please confirm the following steps for a fresh Gearman install:
1) Run ModGearmanFullinstallVersion.sh
2) Change the commands for process-host-perfdata-file-bulk and process-service-perfdata-file-bulk

Thanks