Nagios XI Event queue stalling? (Mod Gearman)
Re: Nagios XI Event queue stalling? (Mod Gearman)
Hi everybody,
I just wanted to share that the orphaned checks issue caused by mod_gearman stalling happened again last Friday, May 16.
We are going to upgrade to mod_gearman v1.4.14 in our production environment tomorrow.
We will let Nagios run for some time and report back here on whether the upgrade solves the issue.
I really think the gearman installation script should also be updated for Nagios XI 2012, as the issue has a big impact.
Re: Nagios XI Event queue stalling? (Mod Gearman)
Seems to be a bit of activity here...
Hey mon-team (& mods), I can confirm that I haven't seen the same behaviour since we upgraded our gearman deployment in mid-March. We did have one instance a couple of days ago where the monitoring engine queue filled up, but we put that down to the gearman NEB module not loading for some reason on a Nagios restart after a config change. Unfortunately we couldn't find the root cause, but we did see some mod_gearman check errors in Nagios, with the local gearman worker complaining that it could not connect to localhost:4730. It was fixed with a service gearmand restart followed by a service nagios restart.
I'm still not 100% sure whether it was the upgrade, some of the suggested config changes, moving virtual machines to low I/O disk, or a combination of everything. We still get a bit of a spike in I/O wait overnight, where the configured service & worker queues take a bit of a hit (i.e. jobs waiting), but we have found that the workers catch up OK.
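For anyone wanting to watch for the "jobs waiting" symptom described above, here is a minimal sketch, not an official tool. It assumes the usual gearmand status output (one line per queue with tab-separated name, total jobs, running jobs and available workers, ending with a line containing only a dot). The function name waiting_jobs is made up for this example.

```shell
# Hypothetical helper: reads gearmand "status" output on stdin and prints
# each queue that has jobs waiting (total jobs minus running jobs > 0).
# Assumed input format per line: name<TAB>total<TAB>running<TAB>workers
# The closing "." line has only one field and is skipped.
waiting_jobs() {
    awk -F'\t' 'NF == 4 {
        waiting = $2 - $3
        if (waiting > 0) printf "%s %d\n", $1, waiting
    }'
}
```

Assuming the gearadmin utility that ships with gearmand is installed, it could be fed with: gearadmin --status | waiting_jobs. If that command cannot connect to localhost:4730, that points at the same problem as the worker errors mentioned above.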
Anyway - good luck with the upgrade
Lincoln
Last edited by lance on Fri May 23, 2014 5:02 pm, edited 1 time in total.
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Nagios XI Event queue stalling? (Mod Gearman)
@mon-team - Let us know how the upgrade goes. We can look at putting together a special script for those using the 2012 version.
Re: Nagios XI Event queue stalling? (Mod Gearman)
My question is going to be: what happens when they upgrade to the 2014 version, which then requires the 1.4.0-nagios4 version? They would technically be downgrading their gearman again, since 1.4.14 hasn't been compiled for nagios4 yet (that I can find).
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
-
slansing
- Posts: 7698
- Joined: Mon Apr 23, 2012 4:28 pm
- Location: Travelling through time and space...
Re: Nagios XI Event queue stalling? (Mod Gearman)
We're currently evaluating options with regard to your comment, Bandit. Unfortunately, at this moment this is what we can offer to keep mod_gearman running and working in environments that are upgraded to Core4/2014.
Re: Nagios XI Event queue stalling? (Mod Gearman)
Should have confirmed, we're still on XI 2012R2.9
Re: Nagios XI Event queue stalling? (Mod Gearman)
Note: I will keep this topic open for a while.
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: Nagios XI Event queue stalling? (Mod Gearman)
So what's the current consensus on upgrading mod_gearman alongside the upgrade to XI 2014?
The script posted states: "This script should only be used on a Redhat or CentOS Installation with no previous verion of Mod Gearman Installed"
So, should I remove the version of gearman I currently have running, or can this script handle the upgrade? Also, should gearman be upgraded before or after the XI upgrade?
Sorry if this is cross-posting.
Andrew J. - Do you even grok?
-
slansing
- Posts: 7698
- Joined: Mon Apr 23, 2012 4:28 pm
- Location: Travelling through time and space...
Re: Nagios XI Event queue stalling? (Mod Gearman)
In this case you should be able to run it on an already installed system, because it will simply update to the newer RPMs. The note is there because the script is not meant to be used as a reinstallation method. At this time we still have not gotten through all of the testing required to update the live document, but the method on page 2 still stands.
Re: Nagios XI Event queue stalling? (Mod Gearman)
Please confirm the following.
slansing wrote:
Dropping this in a new reply since some of you subscribed to this thread. You can use the attached script to get a working version of gearmand on your XI 2014 / Core 4 server, and it can also be used to get a working worker install on your remote workers. I also added the option for you to specify a key (during install) if you are using encryption on the master server, so you don't have to go in and add it manually.
You can follow the same old document, though I'll be updating that as well once support is added to check your server for Core 4, so we don't break other people's installs if they are still using Core 3 and want gearman.
NOTE: If you did not know already, you cannot use the perfdata processing function of Mod Gearman, as we are not shipping the version of PNP that it requires you to install remotely. We don't really use PNP heavily on the backend of XI anymore.
Currently, until we find a better way, you will also need to update your performance data commands in the CCM in order to get perf data pushed from the workers up to XI and displayed properly.
You will need to change the process-host-perfdata-file-bulk and process-service-perfdata-file-bulk commands to:
Code:
sed -i 's/\\n//g' /usr/local/nagios/var/host-perfdata && /bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/$TIMET$.perfdata.host
And:
Code:
sed -i 's/\\n//g' /usr/local/nagios/var/service-perfdata && /bin/mv /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/xidpe/$TIMET$.perfdata.service
And apply configuration.
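To make clearer what the sed part of those two commands does, here is a small sketch run against a throwaway temp file rather than the real perfdata files. The expression s/\\n//g removes every literal backslash followed by "n" (the two characters, not real line breaks) from the file in place. The sample line and the function name are invented for illustration, and GNU sed is assumed for the -i option.

```shell
# Hypothetical wrapper around the same sed expression used in the commands above.
strip_literal_newline_escapes() {
    # \\n in the sed pattern matches a literal backslash followed by "n"
    sed -i 's/\\n//g' "$1"
}

# Build a throwaway file containing a made-up perfdata-like line
# with a literal "\n" sequence in it (printf %s does not expand it).
demo_file=$(mktemp)
printf '%s\n' 'rta=0.05ms;\n pl=0%' > "$demo_file"

strip_literal_newline_escapes "$demo_file"
result=$(cat "$demo_file")
rm -f "$demo_file"

# Print the cleaned line
printf '%s\n' "$result"
```

In the real commands, the /bin/mv after && only runs if sed succeeds, and $TIMET$ is the Nagios timestamp macro, so each moved file ends up with a timestamped name in the xidpe spool directory.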
For a fresh Gearman install:
1) Run ModGearmanFullinstallVersion.sh
2) Change the process-host-perfdata-file-bulk and process-service-perfdata-file-bulk commands as above
Thanks
5 x Nagios 5.6.9 Enterprise Edition
RHEL 6 & 7
rrdcached & ramdisk optimisation