Gearman Load Balancer Configuration

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Gearman Load Balancer Configuration

Post by lmiltchev

Can you post the gearman server and worker configuration files for those servers?
We haven't seen the configs yet.

Can you show us the errors that you are getting? Please provide us with as many details as possible.
Be sure to check out our Knowledgebase for helpful articles and solutions!
rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Re: Gearman Load Balancer Configuration

Post by rajasegar

lmiltchev wrote:
Can you post the gearman server and worker configuration files for those servers?
We haven't seen the configs yet.

Can you show us the errors that you are getting? Please provide us with as many details as possible.

Archive.zip
Please note that the configuration on the master server is back at the default.
I removed the hostgroups statement from the NEB config file.

No gearman errors. The timeout errors that I did get are related to a firewall or routing issue on the gearman worker server.
5 x Nagios 5.6.9 Enterprise Edition
RHEL 6 & 7
rrdcached & ramdisk optimisation
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Gearman Load Balancer Configuration

Post by tgriep

With the hostgroup "LOAD_BALANCER_MSB" commented out in the NEB file, I am assuming that the checks are running on the mastersvr system.
Is that correct?

On the worker system, can you ping and/or run the checks from the command line against all of the hosts in the LOAD_BALANCER_MSB hostgroup?
rajasegar

Re: Gearman Load Balancer Configuration

Post by rajasegar

tgriep wrote:
With the hostgroup "LOAD_BALANCER_MSB" commented out in the NEB file, I am assuming that the checks are running on the mastersvr system.
Is that correct?

On the worker system, can you ping and/or run the checks from the command line against all of the hosts in the LOAD_BALANCER_MSB hostgroup?

Yes, the checks are currently running on the master only. I commented the hostgroup out because it was not working correctly.
Yes. I can run most of the checks just fine from the worker system.

The issue is that almost all the checks get dumped onto this worker. I only want the checks for hosts in the LOAD_BALANCER_MSB hostgroup to be processed by this worker.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Gearman Load Balancer Configuration

Post by abrist

Box had given you an example on Page 1 of how to configure workers for hostgroups. Could you post your gearman and worker configs?
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
rajasegar

Re: Gearman Load Balancer Configuration

Post by rajasegar

abrist wrote:
Box had given you an example on Page 1 of how to configure workers for hostgroups. Could you post your gearman and worker configs?

That is the example I followed.

Please note that currently, the hostgroup config is disabled.
Archive.zip
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: Gearman Load Balancer Configuration

Post by jdalrymple

Can we get you to PM us a profile.zip? That way we can verify the hostgroup config coming down from XI.

I'm assuming that when you say you have the hostgroup disabled, that's the reason for the # in front of it in the neb config? Obviously it's not going to work properly with that there, but it sounds like you know that. Also, you should know that by default mod_gearman may STILL be distributing checks to your other worker host even with that commented out. Is that happening?
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: Gearman Load Balancer Configuration

Post by Box293

rajasegar wrote:
I set up another worker server as the second worker.
Created a hostgroup for a bunch of servers and set the hostgroups.

Job Server - mod_gearman_neb.conf
hostgroups=LOAD_BALANCER_MSB
Worker 1 - hostgroups option is commented out.

Server 2
Worker 2 - mod_gearman_worker.conf
hostgroups=LOAD_BALANCER_MSB

The problem is that almost all the checks are now going to server 2 instead of only those related to LOAD_BALANCER_MSB.
Both servers are in the same segment.
Since firewall rules and routing are not open for both servers, most of the checks fail.

Does anyone have any idea what is happening?
In Dev it sort of works fine with the same settings. Occasionally host checks from unrelated servers are sent to worker2.

So I've done some testing and observed the following behaviour with mod_gearman (MG) and groups / queues.

To summarise in two sentences:
When a worker is configured to target a queue, it will also action the default "host" and "service" queues (the "catch all" queues). Defining hosts=no and services=no in the worker config stops this behaviour.


A better explanation is given at the end of the following example.


With a basic config, when nagios starts it hands off the host and service checks to MG.
MG creates two queues called "host" and "service".

Next you can dedicate some checks to be run by specific workers.
In the mod_gearman_neb.conf you do this by specifying the hostgroups= and servicegroups= options. These options directly relate to hostgroups and servicegroups in nagios.
For example, I create a hostgroup called test_hostgroup1 and I put my "centos01" host in it.
In mod_gearman_neb.conf I specify hostgroups=test_hostgroup1

With this updated config, when nagios starts, MG creates three queues called "host", "service" and "hostgroup_test_hostgroup1".
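The routing described above can be modelled in a few lines of Python. This is a minimal sketch of the behaviour observed in this thread, not mod_gearman's source: the function name `route_check`, the `servicegroup_` queue prefix, and the assumption that a matching servicegroup wins over a matching hostgroup for service checks are all illustrative assumptions.

```python
from typing import Collection, Sequence


def route_check(
    host_groups: Collection[str],
    neb_hostgroups: Sequence[str],
    is_service: bool = False,
    service_groups: Collection[str] = (),
    neb_servicegroups: Sequence[str] = (),
) -> str:
    """Return the queue a check would land in (hypothetical model).

    Assumptions: for a service check, the first servicegroup listed in
    mod_gearman_neb.conf that the service belongs to wins; otherwise the
    first listed hostgroup that the host belongs to wins (this also routes
    the host's service checks, as observed with centos01); anything left
    over goes to the catch-all "host" or "service" queue.
    """
    if is_service:
        for group in neb_servicegroups:
            if group in service_groups:
                return f"servicegroup_{group}"
    for group in neb_hostgroups:
        if group in host_groups:
            return f"hostgroup_{group}"
    return "service" if is_service else "host"


if __name__ == "__main__":
    neb_hostgroups = ["test_hostgroup1"]  # hostgroups=test_hostgroup1 in the NEB config
    centos01_groups = {"test_hostgroup1"}
    other_groups = {"linux-servers"}  # hypothetical hostgroup not listed in the NEB config

    print(route_check(centos01_groups, neb_hostgroups))                   # hostgroup_test_hostgroup1
    print(route_check(centos01_groups, neb_hostgroups, is_service=True))  # hostgroup_test_hostgroup1
    print(route_check(other_groups, neb_hostgroups, is_service=True))     # service
```

Only hosts in a group named in the NEB config are diverted; every other check falls through to "host" or "service".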

Without making any changes to the worker configs, all the checks right now will continue to be executed by all the workers EXCEPT for any centos01 HOST or SERVICE checks. All the checks for centos01 (HOST or SERVICE) will start to build up in the queue "hostgroup_test_hostgroup1". This can be observed in gearman_top.

Next I modify one of my workers (WORKER1). In mod_gearman_worker.conf I specify hostgroups=test_hostgroup1.
I restart the worker service and now all of those checks in the queue "hostgroup_test_hostgroup1" are executed.

Next, I stopped the mod_gearman_worker service on ALL of my workers.
Watching gearman_top I observe the three queues called "host", "service" and "hostgroup_test_hostgroup1" all building up with jobs waiting. This is expected.

Now I start the mod_gearman_worker service on WORKER1.
I observe that WORKER1 actions the hostgroup_test_hostgroup1 queue AND it also actions the "host" and "service" queues.

Next, I stopped the mod_gearman_worker service on ALL of my workers.
Watching gearman_top I observe the three queues called "host", "service" and "hostgroup_test_hostgroup1" all building up with jobs waiting.

Now I start the mod_gearman_worker service on a different worker.
I observe that this worker actions the "host" and "service" queues and it leaves the hostgroup_test_hostgroup1 queue alone.

Next, I stopped the mod_gearman_worker service on ALL of my workers.
Watching gearman_top I observe the three queues called "host", "service" and "hostgroup_test_hostgroup1" all building up with jobs waiting.

Next I modify WORKER1. In mod_gearman_worker.conf I specify hosts=no and services=no.
Now I start the mod_gearman_worker service on WORKER1.
I observe that WORKER1 actions the hostgroup_test_hostgroup1 queue ONLY and it leaves the "host" and "service" queues alone.


So, the behaviour of a worker is to action any host or service groups defined in its config AS WELL AS the "host" and "service" queues, which are the "catch all".

What is happening here is that when Nagios starts and hands off the checks to MG, MG creates the queues. If there are any host or service groups defined in the neb config, checks for those specific hosts and services are put in these specific queues. Any other checks are put in the "host" and "service" queues, hence the term "catch all".
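The worker side of these experiments can be sketched as set arithmetic: a running worker drains every waiting queue it is subscribed to. This is a hypothetical model, assuming (as observed above) that `hosts` and `services` are effectively enabled unless set to `no`; `worker_queues` is an illustrative name, not part of mod_gearman.

```python
from typing import Iterable, Set


def worker_queues(
    hostgroups: Iterable[str] = (),
    servicegroups: Iterable[str] = (),
    hosts: bool = True,
    services: bool = True,
) -> Set[str]:
    """Queues a worker would pull jobs from (hypothetical model).

    hosts/services default to True to reflect the observed behaviour: a
    worker keeps draining the catch-all queues unless hosts=no and
    services=no are set in mod_gearman_worker.conf.
    """
    queues = {f"hostgroup_{g}" for g in hostgroups}
    queues |= {f"servicegroup_{g}" for g in servicegroups}
    if hosts:
        queues.add("host")
    if services:
        queues.add("service")
    return queues


if __name__ == "__main__":
    # Queues seen building up in gearman_top while every worker is stopped.
    waiting = {"host", "service", "hostgroup_test_hostgroup1"}

    worker1 = worker_queues(hostgroups=["test_hostgroup1"])
    other_worker = worker_queues()
    worker1_fixed = worker_queues(hostgroups=["test_hostgroup1"], hosts=False, services=False)

    print(sorted(waiting & worker1))        # ['host', 'hostgroup_test_hostgroup1', 'service']
    print(sorted(waiting & other_worker))   # ['host', 'service']
    print(sorted(waiting & worker1_fixed))  # ['hostgroup_test_hostgroup1']
```

Each print mirrors one of the experiments above: WORKER1 with only hostgroups set, a worker with no groups, and WORKER1 after adding hosts=no and services=no.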

You have two different methods to work around this behaviour.

Method #1
For the workers that you don't want actioning the "catch all" "host" and "service" queues, simply modify mod_gearman_worker.conf on that worker and specify hosts=no and services=no. It will then ONLY action the queues defined in its config.
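To spot a worker that still drains the catch-all queues, you could read its mod_gearman_worker.conf with a small script. The sketch below is a hypothetical helper, not a mod_gearman tool. It assumes simple key=value lines, '#' comments, comma-separated group lists, and that hosts/services count as enabled unless set to one of the "disabled" words it lists (that word list is itself an assumption).

```python
from typing import Dict, List, Optional

FALSE_WORDS = {"no", "off", "0", "false"}  # assumption: values treated as "disabled"


def parse_worker_conf(text: str) -> Dict[str, List[str]]:
    """Collect key=value options; repeated keys and comma lists are merged."""
    opts: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        values = [v.strip() for v in value.split(",") if v.strip()]
        opts.setdefault(key, []).extend(values)
    return opts


def is_enabled(opts: Dict[str, List[str]], key: str, default: bool = True) -> bool:
    """Last value wins; a missing option falls back to the observed default."""
    values = opts.get(key)
    if not values:
        return default
    return values[-1].lower() not in FALSE_WORDS


def catch_all_warning(text: str) -> Optional[str]:
    """Warn if a worker targets groups but still drains the catch-all queues."""
    opts = parse_worker_conf(text)
    groups = opts.get("hostgroups", []) + opts.get("servicegroups", [])
    draining = [key for key in ("hosts", "services") if is_enabled(opts, key)]
    if groups and draining:
        return f"targets {groups} but still drains the catch-all queues via {draining}"
    return None


if __name__ == "__main__":
    before = "# excerpt of mod_gearman_worker.conf\nhostgroups=LOAD_BALANCER_MSB\n"
    after = before + "hosts=no\nservices=no\n"
    print(catch_all_warning(before))
    print(catch_all_warning(after))  # None
```

The `before` excerpt matches Worker 2's setup from the original question, which is why that worker picked up almost all the checks.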

Method #2
Don't allow the "catch all" queues to be populated.
If you don't want any checks ending up in the "catch all" "host" and "service" queues, you need to:
  • Create a different nagios group for each worker containing ALL the hosts whose checks that worker needs to action
  • Define all the groups in mod_gearman_neb.conf using the hostgroups= and servicegroups= options
  • Define each specific worker's config to target the queues it needs to action
Following this procedure, nothing will ever end up in the "catch all" "host" and "service" queues and hence checks will be correctly executed on the correct workers.
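Before relying on Method #2, it can help to confirm that every host really belongs to a group listed in the NEB config and that every such group is targeted by some worker. The sketch below is hypothetical: the host names, the WORKER1_HOSTS group and the `coverage_report` function are illustrative, and it only models hostgroups (servicegroups would need the same treatment).

```python
from typing import List, Mapping, Sequence, Set, Tuple


def coverage_report(
    inventory: Mapping[str, Set[str]],
    neb_hostgroups: Sequence[str],
    workers: Mapping[str, Sequence[str]],
) -> Tuple[List[str], List[str]]:
    """Return (hosts that fall through to catch-all, NEB groups no worker serves)."""
    uncovered = sorted(
        host for host, groups in inventory.items()
        if not any(group in groups for group in neb_hostgroups)
    )
    served = {group for groups in workers.values() for group in groups}
    orphaned = sorted(group for group in neb_hostgroups if group not in served)
    return uncovered, orphaned


if __name__ == "__main__":
    inventory = {  # hypothetical hosts and their hostgroup memberships
        "msb01": {"LOAD_BALANCER_MSB"},
        "web01": {"WORKER1_HOSTS"},
        "db01": {"linux-servers"},
    }
    neb_hostgroups = ["LOAD_BALANCER_MSB", "WORKER1_HOSTS"]
    workers = {"worker1": ["WORKER1_HOSTS"], "worker2": ["LOAD_BALANCER_MSB"]}

    uncovered, orphaned = coverage_report(inventory, neb_hostgroups, workers)
    print(uncovered)  # ['db01']
    print(orphaned)   # []
```

A non-empty first list means some checks would still end up in the "catch all" queues; a non-empty second list means a group's queue would build up with no worker to action it.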


Let us know which method worked for you.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
rajasegar

Re: Gearman Load Balancer Configuration

Post by rajasegar

Thanks Troy for the investigative work. It makes sense now.
Will test it out soon.
Box293

Re: Gearman Load Balancer Configuration

Post by Box293

Anytime :)

Sometimes I need to go and play with things to truly learn how they work. Sometimes the official documentation doesn't make a lot of sense lol.