Mod Gearman Issue post 2014R1.4 Upgrade

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Mod Gearman Issue post 2014R1.4 Upgrade

Post by rajasegar »

Nagios 2014R1.2 -> 2014R1.4

Mod Gearman services were working fine with 2014R1.2.
I did a test upgrade in Dev and it was fine.

However, when I upgraded production, the scheduling went haywire and could never keep up.

After the upgrade to R1.4, I waited for 30 minutes and it was still the same:
19-08-2014 07-35-33 AM.png
After restoring back to R1.2:
19-08-2014 07-55-46 AM.png
No error messages anywhere in logs or in upgrade log.

Please advise on this issue.
5 x Nagios 5.6.9 Enterprise Edition
RHEL 6 & 7
rrdcached & ramdisk optimisation
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Mod Gearman Issue post 2014R1.4 Upgrade

Post by slansing »

We had this reported from one user who upgraded from 2014 r1.2 to r1.3, the resolution was the following:

After the upgrade -

service nagios stop

mv /usr/local/nagios/var/retention.dat /usr/local/nagios/var/retention.dat.bak

service nagios start
Your scheduling should be clean now.
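
For anyone following these steps later, here is a minimal sketch of the "move retention.dat aside" step that keeps a timestamped backup rather than overwriting an earlier `.bak` file. The function name `backup_retention` and the `RETENTION_FILE` variable are made up for illustration; only the path and the `service` commands come from the post above.

```shell
#!/bin/sh
# Sketch only, not an official procedure.
# RETENTION_FILE defaults to the path given in the post above.
RETENTION_FILE="${RETENTION_FILE:-/usr/local/nagios/var/retention.dat}"

# backup_retention (hypothetical helper): move the given file to
# <file>.bak.<timestamp> and print the backup path.
# Returns non-zero if the file does not exist.
backup_retention() {
    src="$1"
    if [ ! -f "$src" ]; then
        echo "no retention file at $src" >&2
        return 1
    fi
    dest="$src.bak.$(date +%Y%m%d%H%M%S)"
    mv -- "$src" "$dest" && echo "$dest"
}

# Intended use, as root, with Nagios stopped first:
#   service nagios stop
#   backup_retention "$RETENTION_FILE"
#   service nagios start
```

Keeping the old file around means it can be inspected or put back if removing it causes side effects.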
rajasegar

Re: Mod Gearman Issue post 2014R1.4 Upgrade

Post by rajasegar »

slansing wrote:We had this reported from one user who upgraded from 2014 r1.2 to r1.3, the resolution was the following:

After the upgrade -

service nagios stop

mv /usr/local/nagios/var/retention.dat /usr/local/nagios/var/retention.dat.bak

service nagios start
Your scheduling should be clean now.
Please explain the side effects of doing this.
The last time you guys asked us to do this, Nagios sent notifications again for all the services that had been set for a single notification.
This caused a lot of issues at our end.

There must be a better way.

Since this issue was known, why was it not communicated? This would have saved us a lot of unnecessary work.
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Mod Gearman Issue post 2014R1.4 Upgrade

Post by tmcdonald »

You can simply disable notifications, do slansing's suggestion, then re-enable notifications once checking resumes. You shouldn't get emails "queued up" and sent after you re-enable.
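
One way to script the disable/re-enable step is through the Nagios external command file. This is only a sketch: the `CMD_FILE` path is an assumption about a default install, and the helper names are invented for illustration. `DISABLE_NOTIFICATIONS` and `ENABLE_NOTIFICATIONS` are the program-wide external commands.

```shell
#!/bin/sh
# Sketch only. CMD_FILE is an assumed default location of the external
# command file; check command_file in nagios.cfg on your system.
CMD_FILE="${CMD_FILE:-/usr/local/nagios/var/rw/nagios.cmd}"

# nagios_cmd_line (hypothetical helper): format one external command
# as "[epoch-seconds] COMMAND".
nagios_cmd_line() {
    printf '[%s] %s\n' "$(date +%s)" "$1"
}

# send_nagios_cmd (hypothetical helper): write the command to CMD_FILE.
send_nagios_cmd() {
    nagios_cmd_line "$1" > "$CMD_FILE"
}

# Intended use:
#   send_nagios_cmd DISABLE_NOTIFICATIONS
#   ... stop nagios, move retention.dat aside, start nagios ...
#   send_nagios_cmd ENABLE_NOTIFICATIONS
```

Whether the disabled state survives the restart once retention.dat has been moved aside is worth verifying on a test system before relying on it in production.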
Former Nagios employee
rajasegar

Re: Mod Gearman Issue post 2014R1.4 Upgrade

Post by rajasegar »

tmcdonald wrote:You can simply disable notifications, do slansing's suggestion, then re-enable notifications once checking resumes. You shouldn't get emails "queued up" and sent after you re-enable.
Sounds good in theory but this cannot be applied in an Enterprise Environment.

We have over 1200 devices and 10,000 services being monitored.
What if some important alert is not sent out during this time?
Most of them are set to notify only once until the status changes.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Mod Gearman Issue post 2014R1.4 Upgrade

Post by abrist »

I understand your concerns, and my apologies for the work to disable notifications. This is why it is best to perform upgrades and maintenance in a maintenance window.
Are you referring to losing your acknowledgements, or some other setting?
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
rajasegar

Re: Mod Gearman Issue post 2014R1.4 Upgrade

Post by rajasegar »

abrist wrote:I understand your concerns, and my apologies for the work to disable notifications. This is why it is best to perform upgrades and maintenance in a maintenance window.
Are you referring to losing your acknowledgements, or some other setting?
There is no maintenance window for monitoring systems. It is requested when required.
I am worried about new notifications not being sent out.
Disabling notifications only helps with alerts that have already been sent out and that we want to avoid resending.

I suggest that Nagios set up a known-issues tracker so that those attempting an upgrade are aware of the issues.
slansing

Re: Mod Gearman Issue post 2014R1.4 Upgrade

Post by slansing »

Well, "known issues" implies that the problems are occurring on all systems affected by a certain version. Unfortunately, we cannot pull a scare tactic and post all possible issues that may affect someone without verifying them against our systems internally, or against other customer/user systems, as the majority of them are one-offs on independent systems. There is a bug/feature request tracker at http://tracker.nagios.com/.

It would be nice to upgrade in place; unfortunately, that is not really possible unless you upgrade a complete carbon copy of your production server and fail your production server over to it. Even that would introduce some lag and the possibility of dropping monitoring/alerting for a short period of time. There is an expected period of monitoring downtime when you must restart services and change or move files on the server itself. If your hosts/services are still down when you re-enable notifications, or went down during the upgrade, Nagios should send a notification at the interval that you have set for those systems. You could also make use of the Ops Screen or Ops Center pages to keep an eye on the current issues in your monitoring environment; those should remain visible through an update, though they may lock up for a moment when apache/nagios are restarted.
rajasegar

Re: Mod Gearman Issue post 2014R1.4 Upgrade

Post by rajasegar »

slansing wrote:Well, "known issues" implies that the problems are occurring on all systems affected by a certain version. Unfortunately, we cannot pull a scare tactic and post all possible issues that may affect someone without verifying them against our systems internally, or against other customer/user systems, as the majority of them are one-offs on independent systems. There is a bug/feature request tracker at http://tracker.nagios.com/.
Sorry, I don't agree with you. I just don't have the time to filter through the bug tracker.

Please close this thread.