rajasegar wrote: I set up another server as the second worker.
I created a hostgroup for a group of servers and set the hostgroups option.
Job Server - mod_gearman_neb.conf
hostgroups=LOAD_BALANCER_MSB
Worker 1 - hostgroups option is commented out.
Server 2
Worker 2 - mod_gearman_worker.conf
hostgroups=LOAD_BALANCER_MSB
The problem is that almost all the checks are now going to server 2, instead of only those related to LOAD_BALANCER_MSB.
Both servers are in the same segment.
Since firewall rules and routing are not open for both servers, most of the checks fail.
Does anyone have any idea what is happening?
In Dev it sort of works fine with the same settings. Occasionally host checks from unrelated servers are sent to worker2.
So I've done some testing and observed the following behaviour with mod gearman (MG) and groups / queues.
To summarise in two sentences:
When a worker is configured to target a queue, it will also action the default "host" and "service" queues (the catch all queues). Setting hosts=no and services=no in the worker config stops this behaviour.
A better explanation is given at the end of the following example.
With a basic config, when nagios starts it hands off the host and service checks to MG.
MG creates two queues called "host" and "service".
Next you can dedicate some checks to be run by specific workers.
In the mod_gearman_neb.conf you do this by specifying the hostgroups= and servicegroups= options. These options directly relate to hostgroups and servicegroups in nagios.
For example, I create a hostgroup called test_hostgroup1 and I put my "centos01" host in it.
In mod_gearman_neb.conf I specify hostgroups=test_hostgroup1
With this updated config, when nagios starts, MG creates three queues called "host", "service" and "hostgroup_test_hostgroup1".
Without making any changes to the worker configs, all the checks will continue to be executed by all the workers EXCEPT for any centos01 HOST or SERVICE checks. All the checks for centos01 (HOST or SERVICE) will start to build up in the queue "hostgroup_test_hostgroup1", because no worker is targeting that queue yet. This can be observed in gearman_top.
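To make the routing concrete, here is a rough Python model of what I observed. It is only a sketch of the behaviour described above, not Mod-Gearman's actual code. The function name queue_for_check and the second host centos02 are made up for illustration, and it only models the hostgroups= option.

```python
def queue_for_check(check_type, host_groups, neb_hostgroups):
    """Return the queue a check lands in, per the behaviour described above.

    check_type     -- "host" or "service"
    host_groups    -- nagios hostgroups the checked host belongs to
    neb_hostgroups -- groups listed under hostgroups= in mod_gearman_neb.conf
    """
    for group in neb_hostgroups:
        if group in host_groups:
            # Host AND service checks of a member host go to the group queue.
            return f"hostgroup_{group}"
    # Everything else falls through to the catch all queues.
    return check_type


if __name__ == "__main__":
    neb_hostgroups = ["test_hostgroup1"]
    # centos02 is a hypothetical host that is in no dedicated group.
    memberships = {"centos01": {"test_hostgroup1"}, "centos02": set()}
    for host, groups in memberships.items():
        for check_type in ("host", "service"):
            queue = queue_for_check(check_type, groups, neb_hostgroups)
            print(f"{host} {check_type} check -> {queue}")
```

Running it shows both centos01 checks going to the dedicated queue while the centos02 checks stay in "host" and "service".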
Next I modify one of my workers (WORKER1). In mod_gearman_worker.conf I specify hostgroups=test_hostgroup1.
I restart the worker service and now all of those checks in the queue "hostgroup_test_hostgroup1" are executed.
Next, I stopped the mod_gearman_worker service on ALL of my workers.
Watching gearman_top I observe the three queues called "host", "service" and "hostgroup_test_hostgroup1" all building up with jobs waiting. This is expected.
Now I start the mod_gearman_worker service on WORKER1.
I observe that WORKER1 actions the hostgroup_test_hostgroup1 queue AND it also actions the "host" and "service" queues.
Next, I stopped the mod_gearman_worker service on ALL of my workers.
Watching gearman_top I observe the three queues called "host", "service" and "hostgroup_test_hostgroup1" all building up with jobs waiting.
Now I start the mod_gearman_worker service on a different worker.
I observe that this worker actions the "host" and "service" queues and it leaves the hostgroup_test_hostgroup1 queue alone.
Next, I stopped the mod_gearman_worker service on ALL of my workers.
Watching gearman_top I observe the three queues called "host", "service" and "hostgroup_test_hostgroup1" all building up with jobs waiting.
Next I modify WORKER1. In mod_gearman_worker.conf I specify hosts=no and services=no.
Now I start the mod_gearman_worker service on WORKER1.
I observe that WORKER1 actions the hostgroup_test_hostgroup1 queue ONLY and it leaves the "host" and "service" queues alone.
So, the behaviour of a worker is to action any host or service groups defined in its config AS WELL AS the "host" and "service" queues which are the "catch all".
What is happening here is that when Nagios starts and hands off the checks to MG, MG creates the queues. If there are any host or service groups defined in the neb config, checks for those specific hosts and services are put in these specific queues. Any other checks are put in the "host" and "service" queues, hence the term "catch all".
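The worker side can be modelled the same way. Again, this is just a sketch of the behaviour I saw in my tests, not the real worker code. The function name worker_queues is hypothetical, and the defaults of hosts=True and services=True are an assumption that matches a worker config where those options have not been switched off.

```python
def worker_queues(hostgroups=(), hosts=True, services=True):
    """Return the queues a worker actions, per the tests described above.

    hostgroups -- groups listed under hostgroups= in mod_gearman_worker.conf
    hosts      -- models hosts=yes/no in the worker config (assumed on)
    services   -- models services=yes/no in the worker config (assumed on)
    """
    queues = []
    if hosts:
        queues.append("host")       # catch all queue for host checks
    if services:
        queues.append("service")    # catch all queue for service checks
    # Dedicated queues are added on top of the catch all queues, not instead of them.
    queues.extend(f"hostgroup_{group}" for group in hostgroups)
    return queues


if __name__ == "__main__":
    print("WORKER1, hostgroups only:", worker_queues(["test_hostgroup1"]))
    print("other worker, no groups: ", worker_queues())
    print("WORKER1, hosts/services off:",
          worker_queues(["test_hostgroup1"], hosts=False, services=False))
```

The first case is the one that catches people out: adding hostgroups= to a worker does not remove the "host" and "service" queues from it.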
You have two different methods to work around this behaviour.
Method #1
For the workers that you don't want actioning the "catch all" "host" and "service" queues, simply modify mod_gearman_worker.conf on that worker and specify hosts=no and services=no. It will then ONLY action the queues it has defined in its config.
Method #2
Don't allow the "catch all" queues to be populated.
If you don't want any checks ending up in the "catch all" "host" and "service" queues, you need to:
- Create a different nagios group for each worker which contains ALL the hosts that worker needs to action
- Define all the groups in mod_gearman_neb.conf using the hostgroups= and servicegroups= options
- Define each specific worker's config to target the queues it needs to action
Following this procedure, nothing will ever end up in the "catch all" "host" and "service" queues and hence checks will be correctly executed on the correct workers.
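Method #2 only holds if every host is in at least one of the dedicated groups and every dedicated group has a worker. A quick sanity check along these lines may help before you restart anything. It is a sketch under assumptions: the function names, host names and group names are made up, and you would have to build the dictionaries from your own nagios and worker configs.

```python
def uncovered_hosts(memberships, neb_hostgroups):
    """Hosts whose checks would still land in the catch all queues."""
    dedicated = set(neb_hostgroups)
    return sorted(host for host, groups in memberships.items()
                  if not dedicated & set(groups))


def unserved_groups(neb_hostgroups, worker_hostgroups):
    """Dedicated groups that no worker targets, so their queue only builds up."""
    served = set()
    for groups in worker_hostgroups.values():
        served.update(groups)
    return sorted(set(neb_hostgroups) - served)


if __name__ == "__main__":
    # Hypothetical example data: host -> hostgroups, worker -> hostgroups= value.
    memberships = {
        "lb01": {"group_worker1"},
        "web01": {"group_worker2"},
        "db01": set(),  # forgotten host, would fall into "host"/"service"
    }
    neb_hostgroups = ["group_worker1", "group_worker2"]
    worker_hostgroups = {"worker1": ["group_worker1"]}

    print("hosts still using catch all queues:",
          uncovered_hosts(memberships, neb_hostgroups))
    print("group queues with no worker:",
          unserved_groups(neb_hostgroups, worker_hostgroups))
```

If both lists come back empty, nothing should end up in the "host" and "service" queues and every dedicated queue has somewhere to go.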
Let us know which method worked for you.