Page 2 of 3

Re: Gearman Load Balancer Configuration

Posted: Thu Mar 26, 2015 1:28 pm
by lmiltchev
Can you post the gearman server and worker configuration files for those servers?
We haven't seen the configs, yet.

Can you show us the errors that you are getting? Please, provide us with as many details as possible.

Re: Gearman Load Balancer Configuration

Posted: Thu Mar 26, 2015 6:22 pm
by rajasegar
lmiltchev wrote:
Can you post the gearman server and worker configuration files for those servers?
We haven't seen the configs, yet.

Can you show us the errors that you are getting? Please, provide us with as many details as possible.
Archive.zip
Please note that configuration is back at default at the master server.
Removed the hostgroups statement from the NEB config file.

No gearman errors. The timeout errors that I did get are related to a firewall or routing issue on the gearman worker server.

Re: Gearman Load Balancer Configuration

Posted: Fri Mar 27, 2015 9:54 am
by tgriep
With the hostgroup "LOAD_BALANCER_MSB" commented out in the NEB file, I am assuming that the checks are running on the mastersvr system.
Is that correct?

On the worker system, can you ping and/or run the checks from the command line against all of the hosts in the LOAD_BALANCER_MSB hostgroup?

Re: Gearman Load Balancer Configuration

Posted: Sun Mar 29, 2015 7:18 pm
by rajasegar
tgriep wrote:With the hostgroup "LOAD_BALANCER_MSB" commented out in the NEB file, I am assuming that the checks are running on the mastersvr system.
Is that correct?

On the worker system, can you ping and/or run the checks from the command line against all of the hosts in the LOAD_BALANCER_MSB hostgroup?
Yes, it is currently running on master only. I commented it out because it was not working right.
Yes. I can run most of the checks just fine from the worker system.

The issue is almost all the checks get dumped into this worker. I only want those in LOAD_BALANCER_MSB hostgroup to be processed by this worker.

Re: Gearman Load Balancer Configuration

Posted: Mon Mar 30, 2015 11:07 am
by abrist
Box had given you an example on Page 1 of how to configure workers for hostgroups. Could you post your gearman and worker configs?

Re: Gearman Load Balancer Configuration

Posted: Mon Mar 30, 2015 9:14 pm
by rajasegar
abrist wrote:Box had given you an example on Page 1 of how to configure workers for hostgroups. Could you post your gearman and worker configs?
That is the example I followed.

Please note that currently, the hostgroup config is disabled.
Archive.zip

Re: Gearman Load Balancer Configuration

Posted: Tue Mar 31, 2015 5:03 pm
by jdalrymple
Can we get you to PM a profile.zip? That way we can verify the hostgroup config coming down from XI.

I'm assuming that when you say you have the hostgroup disabled, that's the reason for the # in front of it in the neb config? Obviously it's not going to work properly with that there, but it sounds like you know that. Also, you should know that by default mod_gearman may STILL be distributing checks to your other worker host even with that commented out. Is that happening?

Re: Gearman Load Balancer Configuration

Posted: Tue Mar 31, 2015 6:11 pm
by Box293
rajasegar wrote:I set up another worker server as the second worker.
Created a hostgroup for a bunch of servers and set the hostgroups.

Job Server - mod_gearman_neb.conf
hostgroups=LOAD_BALANCER_MSB
Worker 1 - hostgroups option is remarked.

Server 2
Worker 2 - mod_gearman_worker.conf
hostgroups=LOAD_BALANCER_MSB

The problem is almost all the checks are now going to server 2 instead of those related to LOAD_BALANCER_MSB only.
Both servers are in the same segment.
Since firewall rules and routing are not open for both servers, most of the checks fail.

Does anyone have any idea what is happening?
In Dev it sort of works fine with the same settings. Occasionally host checks from unrelated servers are sent to worker2.

So I've done some testing and observed the following behaviour with mod gearman (MG) and groups / queues.

To summarise in two sentences:
When a worker is configured to target a queue, it will also action the default "host" and "service" queues (the catch all queues). Defining hosts=no and services=no in the worker config stops this behaviour.


A better explanation is given at the end of the following example.


With a basic config, when nagios starts it hands off the host and service checks to MG.
MG creates two queues called "host" and "service".

Next you can dedicate some checks to be run by specific workers.
In the mod_gearman_neb.conf you do this by specifying the hostgroups= and servicegroups= options. These options directly relate to hostgroups and servicegroups in nagios.
For example, I create a hostgroup called test_hostgroup1 and I put my "centos01" host in it.
In mod_gearman_neb.conf I specify hostgroups=test_hostgroup1

With this updated config, when nagios starts, MG creates three queues called "host", "service" and "hostgroup_test_hostgroup1".
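As a rough sketch of the job server side at this point, the relevant lines in mod_gearman_neb.conf might look like the following. Only hostgroups=test_hostgroup1 comes from the test described here; the hosts=, services= and eventhandler= lines are shown at what are assumed to be their usual values, and the rest of the file is omitted.

```ini
# mod_gearman_neb.conf (job server) - sketch, not a complete file

# Assumed settings: hand host, service and eventhandler jobs to mod_gearman
hosts=yes
services=yes
eventhandler=yes

# From the test above: checks for members of this Nagios hostgroup
# are placed in their own queue, "hostgroup_test_hostgroup1"
hostgroups=test_hostgroup1
```

The queue name is simply the hostgroup name with "hostgroup_" in front of it, which is what shows up in gearman_top.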

Without making any changes to the worker configs, all the checks will continue to be executed by all the workers EXCEPT for any centos01 HOST or SERVICE checks. All the checks for centos01 (HOST or SERVICE) will start to build up in the queue "hostgroup_test_hostgroup1", because no worker is targeting that queue yet. This can be observed in gearman_top.

Next I modify one of my workers (WORKER1). In mod_gearman_worker.conf I specify hostgroups=test_hostgroup1.
I restart the worker service and now all of those checks in the queue "hostgroup_test_hostgroup1" are executed.
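For illustration, the worker side of that step could be as small as the fragment below. This is a sketch that assumes the rest of mod_gearman_worker.conf (server address, encryption key and so on) is left as it was.

```ini
# mod_gearman_worker.conf on WORKER1 - sketch, other settings unchanged

# Subscribe this worker to the queue "hostgroup_test_hostgroup1"
hostgroups=test_hostgroup1

# hosts= and services= are not changed at this stage, so this worker
# still takes jobs from the "host" and "service" queues as well
```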

Next, I stopped the mod_gearman_worker service on ALL of my workers.
Watching gearman_top I observe the three queues called "host", "service" and "hostgroup_test_hostgroup1" all building up with jobs waiting. This is expected.

Now I start the mod_gearman_worker service on WORKER1.
I observe that WORKER1 actions the hostgroup_test_hostgroup1 queue AND it also actions the "host" and "service" queues.

Next, I stopped the mod_gearman_worker service on ALL of my workers.
Watching gearman_top I observe the three queues called "host", "service" and "hostgroup_test_hostgroup1" all building up with jobs waiting.

Now I start the mod_gearman_worker service on a different worker.
I observe that this worker actions the "host" and "service" queues and it leaves the hostgroup_test_hostgroup1 queue alone.

Next, I stopped the mod_gearman_worker service on ALL of my workers.
Watching gearman_top I observe the three queues called "host", "service" and "hostgroup_test_hostgroup1" all building up with jobs waiting.

Next I modify WORKER1. In mod_gearman_worker.conf I specify hosts=no and services=no.
Now I start the mod_gearman_worker service on WORKER1.
I observe that WORKER1 actions the hostgroup_test_hostgroup1 queue ONLY and it leaves the "host" and "service" queues alone.


So, the behaviour of a worker is to action any host or service groups defined in its config AS WELL AS the "host" and "service" queues which are the "catch all".

What is happening here is that when Nagios starts and hands off the checks to MG, MG creates the queues. If there are any host or service groups defined in the neb config, checks for those specific hosts and services are put in these specific queues. Any other checks are put in the "host" and "service" queues, hence the term "catch all".

You have two different methods to work around this behaviour.

Method #1
For the workers that should not action the "catch all" "host" and "service" queues, simply modify mod_gearman_worker.conf on that worker and specify hosts=no and services=no. It will ONLY action the queues it has defined in its config.
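Applied to the setup in this thread, Method #1 might look like the sketch below on the second worker. The hostgroup name LOAD_BALANCER_MSB is the one quoted above; everything else in the file is assumed to stay as it is.

```ini
# mod_gearman_worker.conf on the dedicated worker (Method #1) - sketch

# Do not take jobs from the catch all "host" and "service" queues
hosts=no
services=no

# Only take jobs from the queue for this hostgroup
hostgroups=LOAD_BALANCER_MSB
```

The same hostgroups=LOAD_BALANCER_MSB line still has to be present in mod_gearman_neb.conf on the job server, otherwise the dedicated queue is never populated. At least one other worker needs to keep hosts=yes and services=yes so that the catch all queues are still processed, and the worker service needs a restart after the change.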

Method #2
Don't allow the "catch all" queues to be populated.
If you don't want any checks ending up in the "catch all" "host" and "service" queues, you need to:
  • Create a different nagios group for each worker, containing ALL the hosts that the worker needs to action
  • Define all the groups in mod_gearman_neb.conf using the hostgroups= and servicegroups= options
  • Define each specific worker's config to target the queues it needs to action
Following this procedure, nothing will ever end up in the "catch all" "host" and "service" queues and hence checks will be correctly executed on the correct workers.
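A minimal sketch of Method #2 with two workers is shown below. GROUP_WORKER1 and GROUP_WORKER2 are hypothetical hostgroup names used only for illustration; they are assumed to exist in nagios and, between them, to contain every monitored host. The comma-separated form of hostgroups= is how multiple groups are normally listed, as far as I know.

```ini
# --- mod_gearman_neb.conf (job server) ---
# Hypothetical groups: one per worker, together covering every host
hostgroups=GROUP_WORKER1,GROUP_WORKER2

# --- mod_gearman_worker.conf on worker 1 ---
hostgroups=GROUP_WORKER1

# --- mod_gearman_worker.conf on worker 2 ---
hostgroups=GROUP_WORKER2
```

With this layout the hosts= and services= settings on the workers matter much less, because the catch all queues should stay empty. The trade-off is that any host left out of both groups falls back into the catch all queues, so group membership has to be kept complete.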


Let us know which method worked for you.

Re: Gearman Load Balancer Configuration

Posted: Tue Mar 31, 2015 6:24 pm
by rajasegar
Thanks Troy for the investigative work. It makes sense now.
Will test it out soon.

Re: Gearman Load Balancer Configuration

Posted: Tue Mar 31, 2015 6:36 pm
by Box293
Anytime :)

Sometimes I need to go and play with things to truly learn how they work. Sometimes the official documentation doesn't make a lot of sense lol.