Gearman Load Balancer Configuration

Post by **lmiltchev** » Thu Mar 26, 2015 1:28 pm

Can you post the gearman server and worker configuration files for those servers?

We haven't seen the configs yet.

Can you show us the errors that you are getting? Please provide us with as many details as possible.

Post by **rajasegar** » Thu Mar 26, 2015 6:22 pm

lmiltchev wrote:
Can you post the gearman server and worker configuration files for those servers?
We haven't seen the configs yet.

Can you show us the errors that you are getting? Please provide us with as many details as possible.

Attachment: Archive.zip

Please note that the configuration is back at the default on the master server.
I removed the hostgroups statement from the NEB config file.

There are no gearman errors. The timeout errors that I did get are related to a firewall or routing issue at the gearman worker server.

Post by **tgriep** » Fri Mar 27, 2015 9:54 am

With the hostgroup "LOAD_BALANCER_MSB" commented out in the NEB file, I am assuming that the checks are running on the mastersvr system.
Is that correct?

On the worker system, can you ping all of the hosts in the LOAD_BALANCER_MSB hostgroup and/or run the checks against them from the command line?

Post by **rajasegar** » Sun Mar 29, 2015 7:18 pm

tgriep wrote:
With the hostgroup "LOAD_BALANCER_MSB" commented out in the NEB file, I am assuming that the checks are running on the mastersvr system.
Is that correct?

On the worker system, can you ping all of the hosts in the LOAD_BALANCER_MSB hostgroup and/or run the checks against them from the command line?

Yes, it is currently running on master only. I commented it out because it was not working right.
Yes. I can run most of the checks just fine from the worker system.

The issue is that almost all the checks get dumped onto this worker. I only want those in the LOAD_BALANCER_MSB hostgroup to be processed by this worker.

Post by **abrist** » Mon Mar 30, 2015 11:07 am

Box had given you an example on Page 1 of how to configure workers for hostgroups. Could you post your gearman and worker configs?

Post by **rajasegar** » Mon Mar 30, 2015 9:14 pm

abrist wrote:
Box had given you an example on Page 1 of how to configure workers for hostgroups. Could you post your gearman and worker configs?

That is the example I followed.

Please note that currently, the hostgroup config is disabled.

Attachment: Archive.zip

Post by **jdalrymple** » Tue Mar 31, 2015 5:03 pm

Can we get you to PM a profile.zip? That way we can verify the hostgroup config coming down from XI.

I'm assuming that when you say you have the hostgroup disabled, that's the reason for the # in front of it in the NEB config? Obviously it's not going to work properly with that there, but it sounds like you know that. Also, you should know that by default mod_gearman may STILL be distributing checks to your other worker host even with that commented out. Is that happening?

Post by **Box293** » Tue Mar 31, 2015 6:11 pm

rajasegar wrote:
I set up another worker server as the second worker.
Created a hostgroup for a bunch of servers and set the hostgroups.

Job Server - mod_gearman_neb.conf
hostgroups=LOAD_BALANCER_MSB
Worker 1 - hostgroups option is commented out.

Server 2
Worker 2 - mod_gearman_worker.conf
hostgroups=LOAD_BALANCER_MSB

The problem is that almost all the checks are now going to server 2 instead of only those related to LOAD_BALANCER_MSB.
Both servers are in the same segment.
Since firewall rules and routing are not open for both servers, most of the checks fail.

Does anyone have any idea what is happening?
In Dev it sort of works fine with the same settings. Occasionally host checks from unrelated servers are sent to worker2.

So I've done some testing and observed the following behaviour with mod_gearman (MG) and groups / queues.

To summarise in two sentences:
When a worker is configured to target a queue, it will also action the default "host" and "service" queues (the catch-all queues). Defining hosts=no and services=no in the worker config stops this behaviour.

A better explanation is given at the end of the following example.

With a basic config, when Nagios starts it hands off the host and service checks to MG.
MG creates two queues called "host" and "service".

Next you can dedicate some checks to be run by specific workers.
In mod_gearman_neb.conf you do this by specifying the hostgroups= and servicegroups= options. These options directly relate to hostgroups and servicegroups in Nagios.
For example, I create a hostgroup called test_hostgroup1 and I put my "centos01" host in it.
In mod_gearman_neb.conf I specify hostgroups=test_hostgroup1

With this updated config, when Nagios starts, MG creates three queues called "host", "service" and "hostgroup_test_hostgroup1".
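
As a rough sketch of the master-side change described above: only the hostgroups= line comes from this test, the comments restate what was observed, and the rest of the file is assumed to be left as it was.

```ini
# mod_gearman_neb.conf on the Nagios server (fragment, not a complete file)
# test_hostgroup1 is the Nagios hostgroup that contains centos01.
hostgroups=test_hostgroup1

# Queues seen after restarting Nagios with this setting:
#   host                        catch-all for host checks
#   service                     catch-all for service checks
#   hostgroup_test_hostgroup1   checks for members of test_hostgroup1
```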

Without making any changes to the worker configs, all the checks will continue to be executed by all the workers EXCEPT for any centos01 HOST or SERVICE checks. All the checks for centos01 (HOST or SERVICE) will start to build up in the queue "hostgroup_test_hostgroup1". This can be observed in gearman_top.

Next I modify one of my workers (WORKER1). In mod_gearman_worker.conf I specify hostgroups=test_hostgroup1.
I restart the worker service and now all of those checks in the queue "hostgroup_test_hostgroup1" are executed.
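
For reference, the worker-side change at this step amounts to a single line. This is a fragment only; the other settings in the file are assumed to be untouched.

```ini
# mod_gearman_worker.conf on WORKER1 (fragment)
# Subscribe this worker to the hostgroup_test_hostgroup1 queue.
hostgroups=test_hostgroup1
```

The mod_gearman_worker service has to be restarted on WORKER1 before the new queue is picked up.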

Next, I stopped the mod_gearman_worker service on ALL of my workers.
Watching gearman_top I observe the three queues called "host", "service" and "hostgroup_test_hostgroup1" all building up with jobs waiting. This is expected.

Now I start the mod_gearman_worker service on WORKER1.
I observe that WORKER1 actions the hostgroup_test_hostgroup1 queue AND it also actions the "host" and "service" queues.

Next, I stopped the mod_gearman_worker service on ALL of my workers.
Watching gearman_top I observe the three queues called "host", "service" and "hostgroup_test_hostgroup1" all building up with jobs waiting.

Now I start the mod_gearman_worker service on a different worker.
I observe that this worker actions the "host" and "service" queues and it leaves the hostgroup_test_hostgroup1 queue alone.

Next, I stopped the mod_gearman_worker service on ALL of my workers.
Watching gearman_top I observe the three queues called "host", "service" and "hostgroup_test_hostgroup1" all building up with jobs waiting.

Next I modify WORKER1. In mod_gearman_worker.conf I specify hosts=no and services=no.
Now I start the mod_gearman_worker service on WORKER1.
I observe that WORKER1 actions the hostgroup_test_hostgroup1 queue ONLY and it leaves the "host" and "service" queues alone.

So, the behaviour of a worker is to action any host or service groups defined in its config AS WELL AS the "host" and "service" queues, which are the "catch all".

What is happening here is that when Nagios starts and hands off the checks to MG, MG creates the queues. If there are any host or service groups defined in the NEB config, checks for those specific hosts and services are put in these specific queues. Any other checks are put in the "host" and "service" queues, hence the term "catch all".

You have two different methods to work around this behaviour.

Method #1
For the workers that you don't want actioning the "catch all" "host" and "service" queues, simply modify mod_gearman_worker.conf on that worker and specify hosts=no and services=no. It will ONLY action the queues it has defined in its config.
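
Applied to the setup in the original question, Method #1 would look something like this on Worker 2. It is a sketch, assuming LOAD_BALANCER_MSB is still listed under hostgroups= in mod_gearman_neb.conf on the job server and that nothing else in the worker file needs to change.

```ini
# mod_gearman_worker.conf on Worker 2 (fragment)
# Only action the hostgroup_LOAD_BALANCER_MSB queue.
hostgroups=LOAD_BALANCER_MSB

# Do not action the catch-all "host" and "service" queues.
hosts=no
services=no
```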

Method #2
Don't allow the "catch all" queues to be populated.
If you don't want any checks ending up in the "catch all" "host" and "service" queues, you need to:

1. Create a different Nagios group for each worker, containing ALL the hosts that worker needs to action.
2. Define all the groups in mod_gearman_neb.conf using the hostgroups= and servicegroups= options.
3. Define each specific worker's config to target the queues it needs to action.

Following this procedure, nothing will ever end up in the "catch all" "host" and "service" queues and hence checks will be correctly executed on the correct workers.
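
A sketch of Method #2 with two workers. The group names worker1_hosts and worker2_hosts are made up for illustration, and between them they are assumed to contain every monitored host. The fragments belong to three separate files and are only shown together for comparison.

```ini
# File: mod_gearman_neb.conf on the Nagios server (fragment)
# List every per-worker hostgroup so each one gets its own queue.
hostgroups=worker1_hosts,worker2_hosts

# File: mod_gearman_worker.conf on the first worker (fragment)
hostgroups=worker1_hosts

# File: mod_gearman_worker.conf on the second worker (fragment)
hostgroups=worker2_hosts
```

Any host left out of both groups would still land in the catch-all queues, so the groups need to be kept complete as hosts are added.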

Let us know which method worked for you.

Post by **rajasegar** » Tue Mar 31, 2015 6:24 pm

Thanks Troy for the investigative work. It makes sense now.
Will test it out soon.

Post by **Box293** » Tue Mar 31, 2015 6:36 pm

Anytime

Sometimes I need to go and play with things to truly learn how they work. Sometimes the official documentation doesn't make a lot of sense lol.

Nagios Support Forum
