Hello,
I noticed that when a Mod-Gearman (MG) worker goes down, all of its checks are performed by the Nagios server itself until the worker comes back up. In an 80,000-service infrastructure, this can cause a problem if two or more workers go down at the same time (for example, because of a connection issue between the servers).
Does anyone have a workaround or an idea for this? Is there a way to prevent these checks from ever running on the Nagios server itself?
Thanks
MOD-Gearman question
benjaminsmith
- Posts: 5324
- Joined: Wed Aug 22, 2018 4:39 pm
- Location: saint paul
Re: MOD-Gearman question
Hi @aceadm,
That's a good question. Adding more workers would be a simple solution. There is also the option to offload specific host groups to a remote worker, but in that setup I'm not 100% sure how the checks would be handled if that worker were to fail.
I'm going to consult with our Mod-Gearman expert and will follow up with you shortly on this question.
Best Regards,
Benjamin
Reference
Nagios XI - Mod-Gearman Queues and Workers
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: MOD-Gearman question
Hello,
Any update on this?
Re: MOD-Gearman question
Hi,
My apologies for the delay. Here is what I've found out:
* When a worker goes down, its checks should remain in the Gearman queue, and after 10-15 minutes Mod-Gearman would notify the Nagios process that the checks are orphaned.
* The checks should not run locally on the XI server unless it has been set up to do so.
* If you do not want checks to run locally, do not install a worker on the XI server.
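To see whether checks are piling up in a Gearman queue because no worker is connected (the situation described in the first point above), one option is to query gearmand's plain-text admin interface. The sketch below assumes gearmand is reachable on its default port 4730 and uses its `status` command, which returns one line per queue (function name, total jobs including running ones, running jobs, available workers) and ends with a line containing a single `.`. The helper names and the hostname are hypothetical; this is a monitoring sketch, not part of Mod-Gearman or Nagios XI.

```python
from __future__ import annotations

import socket


def fetch_gearman_status(host: str = "localhost", port: int = 4730, timeout: float = 5.0) -> str:
    """Send the admin 'status' command to gearmand and return the raw reply.

    Assumes gearmand's text admin protocol on the given port (4730 is the default).
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"status\n")
        buf = b""
        # The listing is terminated by a line holding a single ".".
        while not (buf == b".\n" or buf.endswith(b"\n.\n")):
            data = sock.recv(4096)
            if not data:  # server closed the connection early
                break
            buf += data
    return buf.decode()


def parse_gearman_status(raw: str) -> dict[str, dict[str, int]]:
    """Turn 'FUNCTION<TAB>TOTAL<TAB>RUNNING<TAB>WORKERS' lines into a dict."""
    queues: dict[str, dict[str, int]] = {}
    for line in raw.splitlines():
        if line == ".":  # end-of-listing marker
            break
        if not line.strip():
            continue
        name, total, running, workers = line.split("\t")
        queues[name] = {"total": int(total), "running": int(running), "workers": int(workers)}
    return queues


def stranded_queues(queues: dict[str, dict[str, int]]) -> list[str]:
    """Names of queues that have jobs waiting but no worker to take them."""
    return sorted(name for name, q in queues.items() if q["total"] > 0 and q["workers"] == 0)
```

For example, `stranded_queues(parse_gearman_status(fetch_gearman_status("gearman.example.local")))` would list queues such as `service` or a `hostgroup_<name>` queue that have jobs waiting while no worker is connected, which is a reasonable signal to alert on before checks start being reported as orphaned.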
Hope that helps clear things up. Let me know if you need clarification on anything.
--Benjamin