What should I do to make my workers do the checks?

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
litsupport.box
Posts: 80
Joined: Wed Apr 02, 2014 7:24 am


Post by litsupport.box »

Hello again,

First, let me explain my current setup:

gearman_top output: [screenshot]

neb.conf on the master/Nagios XI server SR0336:

Code: Select all

server=127.0.0.1:4730
eventhandler=yes
services=yes
hosts=yes
hostgroups=worker_group
servicegroups=sgroup1,sgroup2,sgroup3
do_hostchecks=yes
result_workers=1
worker.conf on SR0363:

Code: Select all

# master server IP
server=xxx.xxx.xxx.xxx:4730
eventhandler=yes
services=yes
hosts=yes
hostgroups=worker_group
servicegroups=sgroup1
In Nagios XI:

Hosts (test servers, monitored through an account that can read WMI): [screenshot]

Host Groups: [screenshot]

Services: [screenshot]

Service Groups: [screenshot]

My goal:
A Nagios XI server that monitors all servers in our organisation, with multiple workers, each of which checks the servers at its own location and reports the results back to the master server.

Questions:
1: How exactly does it work? Should I add all the servers to Nagios XI, and can I then assign them to a worker that does the checks on them?
2: Do the workers also have to be added in Nagios XI as hosts, like SR0050?
3: What steps should I take to get closer to completing my task?
4: Looking at my screenshots, does anything seem off?
5: When I run a "scheduled check of all services on the server", I don't see any activity in gearman_top; nothing really changes in the overview. I guess I'm just doing something wrong.

Thank you in advance.

Farid
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Post by lmiltchev »

You need at least one Gearman job server running. The checks can be executed by local workers, or by remote workers on one or more worker clients.

Here's a typical workflow:
Nagios wants to execute a service check.

1. The check is intercepted by the Mod-Gearman NEB module.

2. Mod-Gearman puts the job into the service queue.

3. A worker grabs the job and puts the result back into the check_results queue.

4. Mod-Gearman grabs the result job and puts the result onto the check result list.

5. The Nagios reaper reads all checks from the result list and updates hosts and services.
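Step 2 is where the routing happens, and it is what decides which remote worker ends up with a check. The minimal POSIX shell sketch below models that choice for a service check. It assumes the precedence as we understand it from the Mod-Gearman documentation (a group listed in servicegroups= wins, then one listed in hostgroups=, otherwise the generic service queue) and ignores localhostgroups/localservicegroups and queue_custom_variable. The helper names in_list, first_match and pick_queue are hypothetical; please verify the precedence against your Mod-Gearman version.

```shell
#!/bin/sh
# Simplified model of how the Mod-Gearman NEB module might pick a queue for a
# service check. Assumed precedence: servicegroups, then hostgroups, then the
# generic "service" queue. Hypothetical helper names.

# in_list ITEM LIST: succeed if ITEM is one of the comma-separated entries in LIST.
in_list() {
    [ -n "$1" ] || return 1
    case ",$2," in
        *",$1,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# first_match OBJECT_GROUPS NEB_GROUPS: print the first group the object belongs
# to (comma-separated) that is also listed in the neb.conf option.
first_match() {
    for fm_group in $(printf '%s\n' "$1" | tr ',' ' '); do
        if in_list "$fm_group" "$2"; then
            printf '%s\n' "$fm_group"
            return 0
        fi
    done
    return 1
}

# pick_queue NEB_SERVICEGROUPS NEB_HOSTGROUPS SERVICE_GROUPS HOST_GROUPS
pick_queue() {
    if pq_group=$(first_match "$3" "$1"); then
        echo "servicegroup_$pq_group"
    elif pq_group=$(first_match "$4" "$2"); then
        echo "hostgroup_$pq_group"
    else
        echo "service"
    fi
}

# Service in no listed servicegroup, host in worker_group1:
pick_queue "service_group1,service_group2" "worker_group1,worker_group2" "" "worker_group1"
# prints: hostgroup_worker_group1
```

With the group names used later in this thread, a host in worker_group1 whose service is in none of the listed servicegroups lands in hostgroup_worker_group1, so only a worker listening on that queue will run it. For host checks, as far as we know, only hostgroups= is consulted, with host as the fallback queue.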

I would start troubleshooting by verifying that the Mod-Gearman services are running:

On the server:

Code: Select all

service gearmand status
service mod_gearman_worker status
On the client (worker):

Code: Select all

service mod_gearman_worker status
To start with, try restarting gearmand and Nagios on the server, as well as all the workers:

Code: Select all

service nagios restart
service gearmand restart
service mod_gearman_worker restart
Then run gearman_top again and watch it for a while.

Check for errors in the log:

Code: Select all

tail -100 /var/log/mod_gearman/mod_gearman_neb.log
Be sure to check out our Knowledgebase for helpful articles and solutions!
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Post by Box293 »

As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
litsupport.box

Post by litsupport.box »

Hello guys,

Box293: my configuration is based on your guides. @lmiltchev: there is a job server running (that's my server SR0336). Nagios XI is installed on SR0336, and I disabled the worker on it ("service mod_gearman_worker stop") because I don't want my master/job server to run checks on the hosts. I only want the workers to send the check results to my job server.

This is what I have accomplished so far:
For testing purposes I've installed and configured two workers. Each worker is assigned to worker_group1 or worker_group2 and to service_group1 or service_group2. In Nagios XI I've created the same host and service groups and added one test server to each group, as follows. The idea is that worker 1 only checks hosts in, for example, England, worker 2 only checks hosts in, for example, Germany, and both send their results back to the job server in France.

Host Groups: [screenshot]

Worker Group 1: [screenshot]

Worker Group 2: [screenshot]

Service Groups: [screenshot]

Service Group 1: [screenshot]

Service Group 2: [screenshot]

Also, when I run a "scheduled check" I don't get any "Jobs Waiting". To test whether it does what I want, I turned off the workers so the jobs would stay in the queue. What could be causing this? I also don't see check_results anymore:
With workers off: [screenshot]
With workers on: [screenshot]
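The same numbers can also be read as text, which is easier to compare than screenshots. A minimal sketch, assuming the gearadmin tool from the gearmand package is installed on the job server: gearadmin --status prints one tab-separated line per queue (function name, total jobs, running jobs, available workers) and ends with a line containing a single period. The hypothetical find_stuck_queues filter lists queues that have jobs waiting but no worker attached, which is what you would expect to see while the workers are stopped.

```shell
#!/bin/sh
# find_stuck_queues (hypothetical helper): read "gearadmin --status" output on
# stdin and print every queue with waiting jobs (total - running > 0) but zero
# available workers.
find_stuck_queues() {
    awk 'BEGIN { FS = "\t" }
         $1 == "." { exit }                          # end-of-status marker
         NF >= 4 && ($2 - $3) > 0 && $4 == 0 { print $1 }'
}

# Usage on the job server:
#   gearadmin --status | find_stuck_queues
```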

How I run checks (this is how I got jobs in the queue the first time I tried, using the http://sites.box293.com/nagios/guides/m ... -nagios-xi guide): [2 screenshots]

Sorry for posting so many screenshots. I hope you can help me again. Thanks in advance, guys.
Nagios XI Version : 2014R2.6
fqdn 2.6.32-431.17.1.el6.x86_64 x86_64
CentOS release 6.5 (Final)
Gnome is not installed
Proxy appears to be in use
VMware Image
Mod Gearman
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Post by tgriep »

Can you run this command and post the results back?

Code: Select all

grep broker_module /usr/local/nagios/etc/nagios.cfg
Could you post the Gearman server and worker config files so we can review them?
litsupport.box

Post by litsupport.box »

Job Server/Master/NagiosXI:

Code: Select all

[root@sr0336 ~]# grep broker_module /usr/local/nagios/etc/nagios.cfg         
broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg
broker_module=/usr/lib64/mod_gearman/mod_gearman.o config=/etc/mod_gearman/mod_gearman_neb.conf
Worker 1: my colleague installed this one before I joined, and I noticed Nagios is installed on it (does this matter?):

Code: Select all

[root@sr0338 ~]# grep broker_module /usr/local/nagios/etc/nagios.cfg
broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg
broker_module=/usr/local/nagios/lib/dnxPlugin.so /usr/local/nagios/etc/dnxServer.cfg
Worker 2 (the one I've been working with):

Code: Select all

[root@sr0363 ~]# grep broker_module /usr/local/nagios/etc/nagios.cfg
grep: /usr/local/nagios/etc/nagios.cfg: No such file or directory
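The broker_module check can be wrapped in a small function so it can be run the same way on every box. A minimal sketch with a hypothetical function name: it only looks for an uncommented broker_module= line that loads a mod_gearman*.o module, as on the master above. On worker 1 it would fail, since that box loads dnxPlugin.so instead, and on worker 2 it reports that nagios.cfg cannot be read.

```shell
#!/bin/sh
# has_mod_gearman_broker CFG (hypothetical helper): succeed if CFG loads a
# mod_gearman NEB module. Returns 2 if the file cannot be read
# (e.g. Nagios is not installed on this box).
has_mod_gearman_broker() {
    if [ ! -r "$1" ]; then
        echo "cannot read $1" >&2
        return 2
    fi
    grep -Eq '^[[:space:]]*broker_module=[^[:space:]]*mod_gearman[^[:space:]]*\.o' "$1"
}

# Usage:
#   has_mod_gearman_broker /usr/local/nagios/etc/nagios.cfg && echo "mod_gearman NEB loaded"
```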
Job server neb.conf:

Code: Select all

##############################################################################
#
#  Mod-Gearman - distribute checks with gearman
#
#  Copyright (c) 2010 Sven Nierlein
#
#  Mod-Gearman NEB Module Config
#
###############################################################################

# use debug to increase the verbosity of the module.
# Possible values are:
#     0 = only errors
#     1 = debug messages
#     2 = trace messages
#     3 = trace and all gearman related logs are going to stdout.
# Default is 0.
debug=0

# Path to the logfile.
logfile=/var/log/mod_gearman/mod_gearman_neb.log

# sets the address of your gearman job server. Can be specified
# more than once to add more servers.
server=127.0.0.1:4730


# sets the address of your 2nd (duplicate) gearman job server. Can
# be specified more than once to add more servers.
#dupserver=<host>:<port>


# defines if the module should distribute execution of
# eventhandlers.
eventhandler=yes


# defines if the module should distribute execution of
# service checks.
services=yes


# defines if the module should distribute execution of
# host checks.
hosts=yes


# sets a list of hostgroups which will go into separate
# queues. Either specify a comma-separated list or use
# multiple lines.
hostgroups=worker_group1,worker_group2
#hostgroups=name2,name3


# sets a list of servicegroups which will go into separate
# queues.
servicegroups=service_group1,service_group2

# Set this to 'no' if you want Mod-Gearman to only take care of
# servicechecks. No hostchecks will be processed by Mod-Gearman. Use
# this option to disable hostchecks and still have the possibility to
# use hostgroups for easy configuration of your services.
# If set to yes, you still have to define which hostchecks should be
# processed by either using 'hosts' or the 'hostgroups' option.
# Default is Yes.
do_hostchecks=yes

# This settings determines if all eventhandlers go into a single
# 'eventhandlers' queue or into the same queue like normal checks
# would do.
route_eventhandler_like_checks=no

# enables or disables encryption. It is strongly
# advised to not disable encryption. Anybody will be
# able to inject packages to your worker.
# Encryption is enabled by default and you have to
# explicitly disable it.
# When using encryption, you will either have to
# specify a shared password with key=... or a
# keyfile with keyfile=...
# Default is On.
encryption=yes


# A shared password which will be used for
# encryption of data packets. Should be at least 8
# bytes long. Maximum length is 32 characters.
key=Test!23456!


# The shared password will be read from this file.
# Use either key or keyfile. Only the first 32
# characters will be used.
#keyfile=/path/to/secret.file


# use_uniq_jobs
# Using uniq keys prevents the gearman queues from filling up when there
# is no worker. However, gearmand seems to have problems with the uniq
# key and sometimes jobs get stuck in the queue. Set this option to 'off'
# when you run into problems with stuck jobs but make sure your workers
# are running.
use_uniq_jobs=on



###############################################################################
#
# NEB Module Config
#
# the following settings are for the neb module only and
# will be ignored by the worker.
#
###############################################################################

# sets a list of hostgroups which will not be executed
# by gearman. They are just passed through.
# Default is none
localhostgroups=hosts_ignored_by_mod_gearman


# sets a list of servicegroups which will not be executed
# by gearman. They are just passed through.
# Default is none
localservicegroups=services_ignored_by_mod_gearman

# The queue_custom_variable can be used to define the target queue
# by a custom variable in addition to host/servicegroups. When set
# for ex. to 'WORKER' you then could define a '_WORKER' custom
# variable for your hosts and services to directly set the worker
# queue. The host queue is inherited unless overwritten
# by a service custom variable. Set the value of your custom
# variable to 'local' to bypass Mod-Gearman (Same behaviour as in
# localhostgroups/localservicegroups).
#queue_custom_variable=WORKER

# Number of result worker threads. Usually one is
# enough. You may increase the value if your
# result queue is not processed fast enough.
# Default: 1
result_workers=1


# defines if the module should distribute perfdata
# to gearman.
# Note: processing of perfdata is not part of
# mod_gearman. You will need additional worker for
# handling performance data. For example: pnp4nagios
# Performance data is just written to the gearman
# queue.
# Default: no
perfdata=no

# perfdata mode overwrite helps prevent the perfdata queue from getting too big
# 1 = overwrite
# 2 = append
perfdata_mode=1

# The Mod-Gearman NEB module will submit a fake result for orphaned host
# checks with a message saying there is no worker running for this
# queue. Use this option to get better reporting results, otherwise your
# hosts will keep their last state as long as there is no worker
# running.
# Default: yes
orphan_host_checks=yes

# Same like 'orphan_host_checks' but for services.
# Default: yes
orphan_service_checks=yes

# When accept_clear_results is enabled, the NEB module will accept unencrypted
# results too. This is quite useful if you have lots of passive checks and make
# use of send_gearman/send_multi where you would have to spread the shared key to
# all clients using these tools.
# Default is no.
accept_clear_results=no

Worker 2: worker.conf:

Code: Select all

###############################################################################
#
#  Mod-Gearman - distribute checks with gearman
#
#  Copyright (c) 2010 Sven Nierlein
#
#  Worker Module Config
#
###############################################################################

# Identifier, hostname will be used if undefined
#identifier=hostname

# use debug to increase the verbosity of the module.
# Possible values are:
#     0 = only errors
#     1 = debug messages
#     2 = trace messages
#     3 = trace and all gearman related logs are going to stdout.
# Default is 0.
debug=0

# Path to the logfile.
logfile=/var/log/mod_gearman/mod_gearman_worker.log

# sets the address of your gearman job server. Can be specified
# more than once to add more servers.
server=SR336:4730


# sets the address of your 2nd (duplicate) gearman job server. Can
# be specified more than once to add more servers.
#dupserver=<host>:<port>


# defines if the worker should execute eventhandlers.
eventhandler=yes


# defines if the worker should execute
# service checks.
services=yes


# defines if the worker should execute
# host checks.
hosts=yes


# sets a list of hostgroups which this worker will work
# on. Either specify a comma-separated list or use
# multiple lines.
hostgroups=worker_group1
#hostgroups=name2,name3


# sets a list of servicegroups which this worker will
# work on.
servicegroups=service_group1

# enables or disables encryption. It is strongly
# advised to not disable encryption. Anybody will be
# able to inject packages to your worker.
# Encryption is enabled by default and you have to
# explicitly disable it.
# When using encryption, you will either have to
# specify a shared password with key=... or a
# keyfile with keyfile=...
# Default is On.
encryption=yes


# A shared password which will be used for
# encryption of data packets. Should be at least 8
# bytes long. Maximum length is 32 characters.
key=should_be_changed


# The shared password will be read from this file.
# Use either key or keyfile. Only the first 32
# characters will be used.
#keyfile=/path/to/secret.file

# Path to the pidfile. Usually set by the init script
#pidfile=/var/mod_gearman/mod_gearman_worker.pid

# Default job timeout in seconds. Currently this value is only used for
# eventhandler. The worker will use the values from the core for host and
# service checks.
job_timeout=60

# Minimum number of worker processes which should
# run at any time.
min-worker=5

# Maximum number of worker processes which should
# run at any time. You may set this equal to
# min-worker setting to disable dynamic starting of
# workers. When setting this to 1, all services from
# this worker will be executed one after another.
max-worker=50

# Time after which an idling worker exits
# This parameter controls how fast your waiting workers will
# exit if there are no jobs waiting.
idle-timeout=30

# Controls the number of jobs a worker will do before it exits
# Use this to control how fast the number of workers will go down
# after high load times
max-jobs=1000

# max-age is the threshold for discarding too old jobs. When a new job is older
# than this amount of seconds it will not be executed and just discarded. Set to
# zero to disable this check.
#max-age=0

# defines the rate of spawned worker per second as long
# as there are jobs waiting
spawn-rate=1

# Use this option to disable an extra fork for each plugin execution. Disabling
# this option will reduce the load on the worker host but can lead to problems with
# unclean plugins. Default: yes
fork_on_exec=no

# Set a limit based on the 1min load average. When exceeding the load limit,
# no new worker will be started until the current load is below the limit.
# No limit will be used when set to 0.
load_limit1=0

# Same as load_limit1 but for the 5min load average.
load_limit5=0

# Same as load_limit1 but for the 15min load average.
load_limit15=0

# Use this option to show stderr output of plugins too.
# Default: yes
show_error_output=yes

# Use dup_results_are_passive to set if the duplicate result send to the dupserver
# will be passive or active.
# Default is yes (passive).
#dup_results_are_passive=yes

# When embedded perl has been compiled in, you can use this
# switch to enable or disable the embedded perl interpreter.
enable_embedded_perl=on

# Default value used when the perl script does not have a
# "nagios: +epn" or "nagios: -epn" set.
# Perl scripts not written for epn support usually fail with epn,
# so it's better to set the default to off.
use_embedded_perl_implicitly=off

# Cache compiled perl scripts. This makes the worker process a little
# bit bigger but makes execution of perl scripts even faster.
# When turned off, Mod-Gearman will still use the embedded perl
# interpreter, but will not cache the compiled script.
use_perl_cache=on

# path to p1 file which is used to execute and cache the
# perl scripts run by the embedded perl interpreter
p1_file=/usr/share/mod_gearman/mod_gearman_p1.pl


# Workarounds

# workaround for rc 25 bug
# duplicate jobs from gearmand result in exit code 25 of plugins
# because they are executed twice and get killed because of using
# the same resource.
# Sending results (when exit code is 25 ) will be skipped with this
# enabled.
workaround_rc_25=off
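To see which queues this worker would listen on, the relevant options can be read back from the file. A minimal sketch, based on our understanding that hosts=yes, services=yes and eventhandler=yes subscribe a worker to the generic host, service and eventhandler queues, and that each entry in hostgroups=/servicegroups= adds a hostgroup_NAME/servicegroup_NAME queue. The function name worker_queues and the config path in the usage line are assumptions. If that understanding holds, this worker would also pick up jobs from the generic queues, not only from worker_group1/service_group1.

```shell
#!/bin/sh
# worker_queues CONF (hypothetical helper): print the Gearman queues a
# mod_gearman worker config appears to subscribe to (assumed mapping, see text).
# Comment lines and lines without "=" are skipped.
worker_queues() {
    awk '
        /^[ \t]*#/ || index($0, "=") == 0 { next }
        {
            key = substr($0, 1, index($0, "=") - 1)
            val = substr($0, index($0, "=") + 1)
            gsub(/[ \t\r]/, "", key)
            gsub(/[ \t\r]/, "", val)
        }
        key == "hosts"        && val == "yes" { print "host" }
        key == "services"     && val == "yes" { print "service" }
        key == "eventhandler" && val == "yes" { print "eventhandler" }
        key == "hostgroups" || key == "servicegroups" {
            prefix = (key == "hostgroups") ? "hostgroup_" : "servicegroup_"
            n = split(val, groups, ",")
            for (i = 1; i <= n; i++)
                if (groups[i] != "") print prefix groups[i]
        }
    ' "$1"
}

# Usage on the worker (path assumed):
#   worker_queues /etc/mod_gearman/mod_gearman_worker.conf
```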
tgriep

Post by tgriep »

There are two things in Worker 2's config file that need to be changed.

The server name doesn't match the job server's hostname, so change this line from

Code: Select all

server=SR336:4730
to

Code: Select all

server=SR0336:4730
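A quick way to catch this kind of mismatch is to check that every server= entry in a config file resolves from the machine using it. A minimal sketch; the helper names are hypothetical, it assumes getent (glibc) is available as on CentOS, and it only tests name resolution, not whether gearmand is reachable on port 4730.

```shell
#!/bin/sh
# server_hosts CONF (hypothetical helper): print the host part of each
# uncommented server= line.
server_hosts() {
    awk '/^[ \t]*server=/ {
        v = $0
        sub(/^[ \t]*server=/, "", v)   # drop the key
        sub(/[ \t\r].*$/, "", v)       # drop any text after the value
        sub(/:[0-9]+$/, "", v)         # drop the :port suffix
        print v
    }' "$1"
}

# check_servers_resolve CONF (hypothetical helper): report whether each host resolves.
check_servers_resolve() {
    csr_rc=0
    for csr_host in $(server_hosts "$1"); do
        if getent hosts "$csr_host" >/dev/null 2>&1; then
            echo "OK:   $csr_host resolves"
        else
            echo "FAIL: $csr_host does not resolve" >&2
            csr_rc=1
        fi
    done
    return "$csr_rc"
}

# Usage on the worker (path assumed):
#   check_servers_resolve /etc/mod_gearman/mod_gearman_worker.conf
```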
Also, with encryption enabled, the encryption key must be the same on the server and the worker, and here it is not.

So this line needs to be changed from

Code: Select all

key=should_be_changed
to

Code: Select all

key=Test!23456!
After this, you need to restart the gearman worker service. Run this command to do that.

Code: Select all

service mod_gearman_worker restart
Check your Worker1 server config file to see if it needs to be updated too.
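Since nothing in the user interface points at a key mismatch, comparing the two files directly can save time. A minimal sketch with hypothetical helper names; it assumes both files have been copied to the same machine, ignores keyfile=, uses the last key= line if there are several, and never prints the key itself.

```shell
#!/bin/sh
# conf_value FILE KEY (hypothetical helper): print the value of the last
# uncommented KEY= line in FILE.
conf_value() {
    awk -v k="$2" '
        /^[ \t]*#/ { next }
        {
            line = $0
            sub(/^[ \t]+/, "", line)
            sub(/[ \t\r]+$/, "", line)
        }
        index(line, k "=") == 1 { v = substr(line, length(k) + 2) }
        END { print v }
    ' "$1"
}

# keys_match NEB_CONF WORKER_CONF (hypothetical helper): succeed only if both
# files define the same non-empty key=.
keys_match() {
    km_a=$(conf_value "$1" key)
    km_b=$(conf_value "$2" key)
    if [ -n "$km_a" ] && [ "$km_a" = "$km_b" ]; then
        echo "OK: encryption keys match"
    else
        echo "MISMATCH: encryption keys differ (or are missing)" >&2
        return 1
    fi
}

# Usage (worker config copied next to the neb config; paths assumed):
#   keys_match /etc/mod_gearman/mod_gearman_neb.conf ./mod_gearman_worker.conf
```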
litsupport.box

Post by litsupport.box »

OK, now I'm confused. I get the following on every check it has to run:

CRITICAL: Return code of 127 is out of bounds. Make sure the plugin you're trying to run actually exists. (worker: sr0363.COMPANYNAME)
[sh: /usr/local/nagios/libexec/check_esx3.pl: No such file or directory

Is it trying to run the script on the worker? I went to libexec and there is indeed no check_esx3.pl, or any of the other scripts it's asking for. If I need the scripts on the workers as well, how do I do that?
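Exit code 127 is what the shell returns when the command it was asked to run cannot be found, which matches the "No such file or directory" message: the worker executes the plugin locally, at the path from the command definition. A minimal sketch for listing which plugin paths are missing on a worker; the function name is hypothetical, and you supply the paths your commands use.

```shell
#!/bin/sh
# missing_plugins PATH... (hypothetical helper): print each plugin path that
# does not exist or is not executable on this machine; fail if any are missing.
missing_plugins() {
    mp_rc=0
    for mp_path in "$@"; do
        if [ ! -x "$mp_path" ]; then
            echo "missing or not executable: $mp_path"
            mp_rc=1
        fi
    done
    return "$mp_rc"
}

# Usage on the worker:
#   missing_plugins /usr/local/nagios/libexec/check_esx3.pl
```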

P.S. I did what you asked me to do, and the conf files should be correct now.

I rebooted the system and now I'm getting these errors and a blank page in Nagios XI:
[client ::1] (13)Permission denied: access to /nagiosxi/backend/index.html denied
[client ::1] (13)Permission denied: access to /nagiosxi/backend/index.html.var denied
[client ::1] (13)Permission denied: access to /nagiosxi/backend/index.php denied

(I tried repairing MySQL etc., following other posts about this problem, but nobody seems to have solved it.)

Frustrating to see that everything was working as it should, and now this :p
litsupport.box

Post by litsupport.box »

I'm going to reinstall the master server.
litsupport.box

Post by litsupport.box »

I did a reinstall of the master, and I'm back to being stuck with this: [2 screenshots]

OK... update: weird. The third time I restored from a snapshot, it turned green again. I'm confused now, and my workers are acting strange :p