MOD-Gearman question

Posted: Fri Jan 15, 2021 6:02 pm
by aceadm
Hello,

I noticed that when a Mod-Gearman worker goes down, all of its checks are automatically performed by the Nagios server until the worker comes back up. In an infrastructure with 80,000 services, this can cause problems if two or more workers go down at the same time (for example, because of a connection issue between the servers).
Does anyone have a workaround or any ideas? Is there a way to prevent the service checks from ever running on the Nagios server itself?

Thanks

Re: MOD-Gearman question

Posted: Mon Jan 18, 2021 6:29 pm
by benjaminsmith
Hi @aceadm,

That's a good question. Adding more workers would be a simple solution. There is also the option to offload specific host groups to a remote worker, but in that setup I'm not 100% sure how the checks would be handled if that worker were to fail.

I'm going to consult with our Mod-Gearman expert here and will follow up with you shortly on this question.

Best Regards,
Benjamin

Reference
Nagios XI - Mod-Gearman Queues and Workers

Re: MOD-Gearman question

Posted: Wed Jan 20, 2021 10:14 am
by aceadm
Hello,

Any update on this?

Re: MOD-Gearman question

Posted: Thu Jan 21, 2021 11:08 am
by benjaminsmith
Hi

My apologies for the delay. Here is what I've found out:

* When a worker goes down, its checks should remain in the Gearman queue. After 10-15 minutes, the Nagios process is notified that the checks are orphaned.

* A check should not be run locally unless Mod-Gearman has been set up to do so.

* If you do not want checks to be run locally, do not install a worker on the XI server.
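
If you want to see whether checks are piling up in a queue that has no worker left, here is a minimal Python sketch that reads the Gearman job server's "status" admin command and flags such queues. It assumes gearmand is listening on the default port 4730, and that each status line has the form NAME, TOTAL, RUNNING, AVAILABLE_WORKERS separated by tabs, with a single "." ending the reply. The helper names (parse_status, stalled_queues, fetch_status) are made up for this example and are not part of Mod-Gearman or Nagios XI.

```python
import socket
from typing import List, NamedTuple


class QueueStatus(NamedTuple):
    name: str      # queue (Gearman function) name, e.g. "service" or "host"
    waiting: int   # jobs queued but not yet picked up by a worker
    running: int   # jobs currently being executed
    workers: int   # workers registered for this queue


def parse_status(text: str) -> List[QueueStatus]:
    """Parse the reply to gearmand's 'status' admin command.

    Assumed line format: NAME<TAB>TOTAL<TAB>RUNNING<TAB>WORKERS,
    terminated by a line containing only '.'.
    """
    queues = []
    for line in text.splitlines():
        line = line.strip()
        if line == ".":
            break
        parts = line.split("\t")
        if len(parts) != 4:
            continue  # skip blank or unexpected lines
        name, total, running, workers = parts
        queues.append(
            QueueStatus(name, int(total) - int(running), int(running), int(workers))
        )
    return queues


def stalled_queues(queues: List[QueueStatus]) -> List[QueueStatus]:
    """Return queues that have waiting jobs but no worker to run them."""
    return [q for q in queues if q.workers == 0 and q.waiting > 0]


def fetch_status(host: str = "localhost", port: int = 4730,
                 timeout: float = 5.0) -> str:
    """Send 'status' to the job server and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"status\n")
        data = b""
        while not data.endswith(b".\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break  # server closed the connection early
            data += chunk
    return data.decode("ascii", errors="replace")


# Example (needs a reachable job server):
#   for q in stalled_queues(parse_status(fetch_status("localhost"))):
#       print(f"{q.name}: {q.waiting} jobs waiting, no workers")
```

A queue that shows up here has jobs sitting in it with nobody to run them, which is the situation described in the first point above. The gearman_top utility that ships with Mod-Gearman shows the same counters interactively.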

Hope that helps clear things up. Let me know if you need clarification on anything.

--Benjamin