Beware - technical data follows.
I have many workers - all reply properly to the 3 core servers that send them tests.
-------- service file setup
On each mod_gearman worker, in /etc/systemd/system, I have 3 service files, one for each core server that sends tests. Giving each its own name avoids file name collisions.
(Each was copied from /usr/lib/systemd/system, then renamed and edited to fit my use.)
This is one of the files.
cat mod-gearman-worker-core01.service
[Unit]
Description=Mod-Gearman Worker for core01
Documentation=http://mod-gearman.org/docs.html
After=network.target
[Service]
EnvironmentFile=/etc/sysconfig/mod-gearman-worker-core01
Type=forking
PIDFile=/var/mod_gearman/mod_gearman_worker-core01.pid
ExecStart=/usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/worker-core01.conf --pidfile=/var/mod_gearman/mod_gearman_worker-core01.pid
ExecReload=/bin/kill -HUP $MAINPID
User=nagios
Group=nagios
StandardOutput=journal
StandardError=inherit
[Install]
WantedBy=multi-user.target
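Since the three service files differ only in the core name, they can be generated rather than edited by hand. Below is a minimal sketch, assuming the unit shown above is used as the template for all three cores. The function name write_unit and the OUT_DIR variable are my own illustration, not part of mod_gearman or Nagios XI, and the files go to a scratch directory for review rather than straight into /etc/systemd/system.

```shell
#!/bin/bash
# Sketch: write one unit file per core server, using the unit shown above
# as the template. write_unit and OUT_DIR are illustrative names only.

# write_unit CORE DEST_DIR
write_unit() {
    local core="$1" dest_dir="$2"
    # \$MAINPID is escaped so the literal text $MAINPID lands in the file.
    cat > "${dest_dir}/mod-gearman-worker-${core}.service" <<EOF
[Unit]
Description=Mod-Gearman Worker for ${core}
Documentation=http://mod-gearman.org/docs.html
After=network.target

[Service]
EnvironmentFile=/etc/sysconfig/mod-gearman-worker-${core}
Type=forking
PIDFile=/var/mod_gearman/mod_gearman_worker-${core}.pid
ExecStart=/usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/worker-${core}.conf --pidfile=/var/mod_gearman/mod_gearman_worker-${core}.pid
ExecReload=/bin/kill -HUP \$MAINPID
User=nagios
Group=nagios
StandardOutput=journal
StandardError=inherit

[Install]
WantedBy=multi-user.target
EOF
}

# Generate all three into a scratch directory (or OUT_DIR if it is set).
out_dir="${OUT_DIR:-$(mktemp -d)}"
for core in core01 core02 core03; do
    write_unit "$core" "$out_dir"
done
echo "unit files written to $out_dir"
```

After checking the output, the files would be copied to /etc/systemd/system and picked up with sudo systemctl daemon-reload. One thing to watch: systemd will not start a unit whose EnvironmentFile is missing unless the path is prefixed with "-", so each /etc/sysconfig/mod-gearman-worker-coreNN file needs to exist.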
------- start stop and status shell scripts
This is on the worker system.
I use systemctl to start and stop each mod-gearman-worker service, either via a shell script or by running cat on the script and copying the line I need.
On each worker I have 3 scripts: one each for start, stop, and status.
cat nagios-start.sh
#!/bin/bash
# turn on echo mode
set -x
#
# Start the services in the proper order
sudo systemctl start mod-gearman-worker-core01
sudo systemctl start mod-gearman-worker-core02
sudo systemctl start mod-gearman-worker-core03
# Finished
(I actually use:
sudo /usr/local/nagiosxi/scripts/manage_services.sh start mod-gearman-worker-core01
but manage_services.sh calls systemctl itself, as explained at the bottom.)
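The three start, stop and status scripts could also be folded into one that takes the action as an argument. This is only a sketch of that idea, not what I run: the names CORES, worker_units, workers_ctl and the DRY_RUN switch are made up for illustration, and the unit names assume the mod-gearman-worker-coreNN pattern shown above.

```shell
#!/bin/bash
# Sketch: one script for start, stop, restart and status.
# CORES, worker_units, workers_ctl and DRY_RUN are illustrative names.
CORES="core01 core02 core03"

# Print the unit name for each core server, one per line.
worker_units() {
    local core
    for core in $CORES; do
        echo "mod-gearman-worker-${core}"
    done
}

# Run "systemctl ACTION" on every worker unit.
# With DRY_RUN=1 the commands are only printed, not executed.
workers_ctl() {
    local action="$1" unit
    case "$action" in
        start|stop|restart|status) ;;
        *) echo "usage: workers_ctl start|stop|restart|status" >&2; return 2 ;;
    esac
    for unit in $(worker_units); do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "sudo systemctl $action $unit"
        else
            sudo systemctl "$action" "$unit"
        fi
    done
}

# Only act when an action is given on the command line.
if [ "$#" -gt 0 ]; then
    workers_ctl "$1"
fi
```

Saved under a name of your choosing, it would be called with one argument such as status. Setting DRY_RUN=1 first prints the systemctl lines without running them, which covers the "cat the file and use the line I need" habit.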
------------ mod_gearman conf files for each Nagios XI core server
I have 3 config files in /etc/mod_gearman, one per core server:
worker-core01.conf
worker-core02.conf
worker-core03.conf
On any one worker the 3 files are identical except for 3 lines: the log file, the core server IP, and the pid file. Each worker server has a different setup otherwise, but these 3 lines are the same on every worker.
Here are the 3 lines as they appear in each of the 3 files:
logfile=/var/log/mod_gearman/mod_gearman_worker-core01.log
logfile=/var/log/mod_gearman/mod_gearman_worker-core02.log
logfile=/var/log/mod_gearman/mod_gearman_worker-core03.log
server=[Core System 1 IP]:4730
server=[Core System 2 IP]:4730
server=[Core System 3 IP]:4730
pidfile=/var/mod_gearman/mod_gearman_worker-core01.pid
pidfile=/var/mod_gearman/mod_gearman_worker-core02.pid
pidfile=/var/mod_gearman/mod_gearman_worker-core03.pid
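Because only those 3 lines change, the core02 and core03 files can be derived from the core01 file. A minimal sketch, assuming each of the three keys appears once at the start of a line in the template; make_core_conf is a made-up helper name and the IP in the usage note is a documentation placeholder, not a real core server.

```shell
#!/bin/bash
# Sketch: derive a worker conf for another core server from an existing one.
# make_core_conf is an illustrative helper, not part of mod_gearman.

# make_core_conf TEMPLATE CORE SERVER_IP
# Prints the new config on stdout; every line except logfile=, server=
# and pidfile= is copied through unchanged.
make_core_conf() {
    local template="$1" core="$2" server_ip="$3"
    sed -e "s|^logfile=.*|logfile=/var/log/mod_gearman/mod_gearman_worker-${core}.log|" \
        -e "s|^server=.*|server=${server_ip}:4730|" \
        -e "s|^pidfile=.*|pidfile=/var/mod_gearman/mod_gearman_worker-${core}.pid|" \
        "$template"
}
```

For example, make_core_conf /etc/mod_gearman/worker-core01.conf core02 192.0.2.12 > worker-core02.conf writes the second file into the current directory. Running diff on the two files afterwards should show exactly 3 changed lines.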
This coexists perfectly on the workers and could probably support more core servers.
I run 10 workers presently and am adding more soon.
To be able to force tests to specific workers, I set the hostgroups and servicegroups in the worker configuration like this.
Sample taken from /etc/mod_gearman/worker-core01.conf:
# sets a list of hostgroups which this worker will work on.
# Either specify a comma separated list or use multiple lines.
hostgroups=Gearman-Only-Hosts
hostgroups=Gearman-Only-on-[This worker - each worker has a name that goes here per worker]
# sets a list of servicegroups which this worker will work on.
servicegroups=Gearman-Only-Services
servicegroups=Gearman-Service-on-[This worker - each worker has a name that goes here per worker]
The hostgroups and servicegroups match.
If I need a host and all its services run from a specific worker, I add that Hostgroup to the Host definition.
If I need just one service to run from a specific worker, I edit the service and assign the Servicegroup.
In general, I have 8 workers that are equal and can run all the tests. I let the core server send tests to all of them randomly.
I have some remote areas where I have a dedicated worker. Those tests can only run from that one worker.
Those hosts in that remote area have host definitions tied to the Hostgroup serviced by that specific mod_gearman worker.
---------------- Core server /etc/mod_gearman/module.conf file
The gearman module.conf file lists all the hostgroups and all the servicegroups. What follows is just a sample.
Notice that I have Hostgroups matching Servicegroups, so I can freely and easily assign tests where I need them.
# sets a list of hostgroups which will go into separate queues
hostgroups=Gearman-Only-Hosts
# Workers
hostgroups=Gearman-Only-on-[worker 1]
hostgroups=Gearman-Only-on-[worker 2]
hostgroups=Gearman-Only-on-[worker 3]
hostgroups=Gearman-Only-on-[worker 4] etc
# Core Server
hostgroups=Gearman-Only-on-core01
#
# Custom Groups
hostgroups=Gearman-Only-Hosts-custom
# Custom Worker
hostgroups=Gearman-Only-on-[worker 9]
#
# other Sites
hostgroups=Gearman-Only-Hosts-[worker 10]
# another Worker
hostgroups=Gearman-Only-on-VMG009001
# sets a list of servicegroups which will go into separate queues.
servicegroups=Gearman-Only-Services
# Workers
servicegroups=Gearman-Service-on-[worker 1]
servicegroups=Gearman-Service-on-[worker 2]
servicegroups=Gearman-Service-on-[worker 3]
servicegroups=Gearman-Service-on-[worker 4] etc
# Core Server
servicegroups=Gearman-Service-on-core01
#
# Custom Groups
servicegroups=Gearman-Only-Services-custom
# Custom Worker
servicegroups=Gearman-Service-on-[Worker 9]
#
# other Sites
servicegroups=Gearman-Only-Services-[worker 10]
# another Worker
servicegroups=Gearman-Service-on-VMG009001
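A group that a worker listens on but that is missing from module.conf never gets its own queue, so it is worth checking the two files against each other. A minimal sketch of such a check, assuming both files use plain hostgroups= and servicegroups= lines as in the samples above (comma separated lists are split); list_groups and missing_groups are names I made up for this example.

```shell
#!/bin/bash
# Sketch: find groups a worker conf listens on that module.conf does not list.
# list_groups and missing_groups are illustrative helper names.

# list_groups KEY FILE
# Print every value of KEY= lines (KEY is hostgroups or servicegroups),
# one per line, splitting comma separated lists. Commented lines are
# skipped because they do not start with KEY=.
list_groups() {
    local key="$1" file="$2"
    grep -E "^${key}=" "$file" \
        | cut -d= -f2- \
        | tr ',' '\n' \
        | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        | grep -v '^$' \
        | sort -u
}

# missing_groups KEY WORKER_CONF MODULE_CONF
# Print each group found in WORKER_CONF but not in MODULE_CONF.
missing_groups() {
    local key="$1" worker_conf="$2" module_conf="$3" group
    list_groups "$key" "$worker_conf" | while IFS= read -r group; do
        # -x whole line, -F fixed string (group names may contain brackets)
        list_groups "$key" "$module_conf" | grep -qxF -- "$group" || echo "$group"
    done
}
```

To use it, copy the worker conf over to the core server (or the other way round) and run missing_groups hostgroups and missing_groups servicegroups against the two files. No output means every group the worker serves is listed in module.conf.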
----------- Additional info - Management console
Congratulations for getting this far. It is a lot of data.
This data should get anyone going with multiple core servers (core server = one single Nagios XI host).
Of course I use the management component too. Each server can control its own config files remotely.
Installing it and getting shared keys in place is a whole other topic.
Here is one entry of the 10 in
/usr/local/nagiosxi/html/includes/components/modgearmanxi/modgearmanxi.config.inc.php
'[worker 1]' => array(
'ip' =>'[actual IP]',
'user' =>'nagios',
'cfg' =>'/etc/mod_gearman/worker-core01.conf',
'initd' =>'sudo systemctl status mod-gearman-worker-core01'
),
I actually use manage_services.sh, but I changed the entry above to systemctl to make it easier for folks to understand.
This is my actual syntax:
'initd' =>'sudo /usr/local/nagiosxi/scripts/manage_services.sh status mod-gearman-worker-core01'
---------- manage_services.sh changes to make management in a locked down environment work
I have modified manage_services.sh so that it can run these commands.
/usr/local/nagiosxi/scripts/manage_services.sh
I changed 2 lines:
first=("start" "stop" "restart" "status" "reload" "checkconfig" "enable" "disable")
and
second=("httpd" "mysqld" "nagios" "ndo2db" "npcd" "snmptt" "crond" "snmptrapd" "gearmand" "rrdcached" "mod-gearman-worker" "firewalld")
----------------- apologies
Sorry if this is a lot to digest, but you should be able to run many core servers against many mod_gearman workers.
core server = Nagios XI server
Feel free to PM me if you need more data. I've been real busy over the last year but will try to be helpful.
Steve B