Mod-Gearman integration
robinjporter
- Posts: 12
- Joined: Tue Jan 16, 2018 12:11 pm
Mod-Gearman integration
Hi there
We are producing a distributed Nagios XI hierarchy using Mod-Gearman, and I'm struggling with the installation process. I've followed the guide here: https://assets.nagios.com/downloads/nag ... ios_XI.pdf
Everything has completed successfully (I used the automated installation of Mod-Gearman on CentOS 7), and gearman_top reports 2 workers present (I disabled the worker on the Nagios XI server).
However, when I check the network traffic using tcpdump, I see that all service checks are still being performed by the XI server; the workers seem to be idle. I feel like I've missed a fundamental step in the setup and configuration of this architecture, but I'm not clear what that is. Could someone please help me complete the configuration so that the 2 external workers perform all service checks, removing as much load as possible from the XI host?
Many thanks
Robin
Re: Mod-Gearman integration
The configuration files for Mod Gearman are in the /etc/mod_gearman/ folder on the Nagios server and on the remote workers. Can you post them here so we can see them?
Most of the time, when the workers are not running the checks, it is because the host groups are not set correctly on both the server and the workers, so make sure they are set.
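To make the host group point concrete: as we understand Mod-Gearman, each name in a hostgroups= line of module.conf on the server gets its own hostgroup_<name> queue, and a worker only takes jobs from the hostgroup queues listed in its own worker.conf. The sketch below uses hypothetical helpers (parse_list_option, hostgroup_queues, unserved_queues) and assumes the plain key=value format with comma-separated values. It lists queues the server would fill but no worker serves; checks routed there would pile up as waiting jobs rather than being run.

```python
"""Sketch: find hostgroup queues that module.conf fills but no worker.conf serves.

Assumptions: key=value lines, '#' comments, comma-separated values,
repeated keys allowed, and the queue naming hostgroup_<name>.
"""


def parse_list_option(conf_text: str, key: str) -> set[str]:
    """Collect the comma-separated values of `key`, skipping comments and blanks."""
    values: set[str] = set()
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() != key:
            continue
        for item in value.split(","):
            item = item.strip()
            if item:
                values.add(item)
    return values


def hostgroup_queues(conf_text: str) -> set[str]:
    """Map each configured hostgroup to its (assumed) queue name."""
    return {f"hostgroup_{g}" for g in parse_list_option(conf_text, "hostgroups")}


def unserved_queues(module_conf: str, *worker_confs: str) -> set[str]:
    """Hostgroup queues the server would fill that no worker subscribes to."""
    served: set[str] = set()
    for conf in worker_confs:
        served |= hostgroup_queues(conf)
    return hostgroup_queues(module_conf) - served


if __name__ == "__main__":
    # Hypothetical example configs, not taken from this thread.
    module_conf = "hosts=no\nservices=no\nhostgroups=linux-servers,dmz\n"
    worker_a = "hostgroups=linux-servers\n"
    worker_b = "# hostgroups=dmz\n"  # commented out, so it serves nothing
    print(sorted(unserved_queues(module_conf, worker_a, worker_b)))
    # ['hostgroup_dmz']
```

Feed it the text of /etc/mod_gearman/module.conf from the Nagios server and each worker's /etc/mod_gearman/worker.conf; an empty result means every configured hostgroup queue has at least one subscriber.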
Run this on the Nagios XI server and post the output so we can see the Gearman queue.
Code:
gearman_top -b
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: Mod-Gearman integration
Hi there
Thanks for this. Is there a document or KB article that details this setup? I completed the Mod-Gearman setup KB from Nagios, and the way it is written implies (to me) that that was all that was required, but from your reply I believe I've misunderstood. If there's a document that goes over the basics of completing the configuration, that would be awesome!
Thanks in advance,
Robin
Re: Mod-Gearman integration
In the PDF from your first post, look at page 7 under the section "Notes About Queues". It links to a document that explains how to set up Mod Gearman and queues to run commands exclusively on remote workers.
Below is the link from that section.
https://support.nagios.com/kb/article.php?id=484
Re: Mod-Gearman integration
Hi there
Thanks so much for your continued help. On this topic, I have read the KB article you pointed me at, but it just highlights the issue I am seeing. From the document:
"With a default configuration, when nagios starts it hands off the host and service checks to MG. MG creates two queues called host and service. You can think of these queues as the default 'catch all' queues. By default, any worker that connects will execute checks from these queues."
This implies that upon installation of MG, all checks should be handed off to the workers by default. My testing has shown that the workers are idle, and all service checks are still coming from the Nagios XI server. It feels like the default configuration quoted in the document is not being created by the current version of XI and MG.
Many thanks!
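The quoted KB text describes a default in which the module on the Nagios server hands every check to the generic queues. A minimal sketch of that routing decision follows. It assumes the precedence we understand from the Mod-Gearman README (a matching servicegroup, then a matching hostgroup, then the generic host/service queue only when hosts=yes/services=yes) and ignores options such as localhostgroups; ModuleConf, host_queue and service_queue are hypothetical names. When a function returns None, Nagios keeps the check and runs it itself, which would match idle workers while tcpdump shows every check leaving the XI server.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModuleConf:
    """Hypothetical subset of module.conf options that affect queue choice."""
    hosts: bool = False     # hosts=yes enables the generic "host" queue
    services: bool = False  # services=yes enables the generic "service" queue
    hostgroups: list[str] = field(default_factory=list)     # in config order
    servicegroups: list[str] = field(default_factory=list)  # in config order


def host_queue(conf: ModuleConf, host_groups: set[str]) -> Optional[str]:
    """Queue for a host check, or None if Nagios runs it locally."""
    for group in conf.hostgroups:  # first configured match wins
        if group in host_groups:
            return f"hostgroup_{group}"
    return "host" if conf.hosts else None


def service_queue(conf: ModuleConf, host_groups: set[str],
                  service_groups: set[str]) -> Optional[str]:
    """Queue for a service check, or None if Nagios runs it locally."""
    for group in conf.servicegroups:  # servicegroups are checked first
        if group in service_groups:
            return f"servicegroup_{group}"
    for group in conf.hostgroups:     # then the service's host groups
        if group in host_groups:
            return f"hostgroup_{group}"
    return "service" if conf.services else None


if __name__ == "__main__":
    # hosts=no / services=no and no group queues: nothing leaves Nagios.
    print(service_queue(ModuleConf(), {"linux-servers"}, set()))  # None
    # hosts=yes / services=yes: the check goes to the generic queue.
    print(service_queue(ModuleConf(hosts=True, services=True),
                        {"linux-servers"}, set()))  # service
```

The group names above are invented for illustration; the point is only that the generic queues receive nothing unless the module side enables them.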
Re: Mod-Gearman integration
The Mod Gearman packages were updated in October, and a few of the default settings were changed so that checks are no longer sent to any worker by default.
This was done so that you have to make sure the workers have the plugins and the rights to run the checks, which avoids errors when a plugin runs on a worker but cannot complete the check for some reason.
To put that back, edit the module.conf file and change the following from
Code:
services=no
hosts=no
to
Code:
services=yes
hosts=yes
Save the change and restart nagios, and that will enable the generic host and service queues.
I updated the KB article to reflect the change.
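The from/to change above is a two-line edit. If you manage more than one server, a small helper makes it repeatable. This is a sketch under assumptions: set_option and enable_generic_queues are hypothetical names, and the file is assumed to use the plain key=value layout, so only uncommented hosts= and services= lines are rewritten, and they are appended if absent.

```python
import re


def set_option(conf_text: str, key: str, value: str) -> str:
    """Rewrite every uncommented `key=...` line, or append one if absent."""
    pattern = re.compile(rf"^([ \t]*){re.escape(key)}[ \t]*=.*$", re.MULTILINE)
    if pattern.search(conf_text):
        # Keep any leading indentation, replace the rest of the line.
        return pattern.sub(lambda m: f"{m.group(1)}{key}={value}", conf_text)
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + f"{key}={value}\n"


def enable_generic_queues(conf_text: str) -> str:
    """Same effect as changing services=no/hosts=no to services=yes/hosts=yes."""
    for key in ("services", "hosts"):
        conf_text = set_option(conf_text, key, "yes")
    return conf_text


if __name__ == "__main__":
    before = "services=no\nhosts=no\n"
    print(enable_generic_queues(before), end="")
```

In practice you would read /etc/mod_gearman/module.conf on the Nagios server, pass its text through enable_generic_queues, write it back, and then restart nagios as described.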
Re: Mod-Gearman integration
Hi there
Thanks for this, and for the KB update; it has made things much clearer. However, I'm still not seeing the behaviour described. To confirm, I have:
1. Gone through both worker nodes and enabled services and hosts in the module.conf file.
2. Restarted all mod_gearman workers.
3. Rebooted the Nagios XI host.
The output of gearman_top shows:
Code:
2021-01-25 16:01:58 - localhost:4730 - v1.1.19.1
Queue Name | Worker Available | Jobs Waiting | Jobs Running
-----------------------------------------------------------------------------
check_results | 1 | 0 | 0
host | 10 | 0 | 0
service | 10 | 0 | 0
worker_gearw01.example.org | 1 | 0 | 0
worker_gearw02.example.org | 1 | 0 | 0
-----------------------------------------------------------------------------
So it can see the 2 workers. However, when I run tcpdump on the example host, all packets are still coming from the Nagios XI host (even though I disabled the worker on this host).
I feel like there's a fundamental piece of configuration I've still missed. I have used rsync to copy all the check commands (/usr/local/nagios/libexec/*) to the worker nodes (target directory: /usr/local/nagios/libexec), and I have checked the worker logs, but there are no errors. It really looks like the Nagios XI node, in spite of running the mod_gearman service, is still not actively using the workers for service checks. In fact, the only entry I ever see in the logs is:
Code:
[2021-01-25 16:12:10][1733][INFO ] no checks in 2minutes, restarting all workers
To me this confirms that checks are not being sent to the workers. I can't see what I'm doing wrong here and would really appreciate your help.
Many thanks
Robin
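Because gearman_top -b prints plain text, a snapshot like the one above can be checked with a short script. The sketch below assumes the pipe-separated layout shown in the post; parse_gearman_top and idle_queues are hypothetical names. Two caveats: a single snapshot with zero jobs proves little, since checks finish quickly, so it is more telling to sample repeatedly. Also, as far as we understand gearmand's status output, a queue is listed with workers as soon as workers subscribe to it, so a worker count on host and service does not by itself show that the Nagios side is submitting jobs there.

```python
from dataclasses import dataclass


@dataclass
class QueueStats:
    name: str
    workers: int   # "Worker Available" column
    waiting: int   # "Jobs Waiting" column
    running: int   # "Jobs Running" column


def parse_gearman_top(output: str) -> list[QueueStats]:
    """Parse the pipe-separated table printed by `gearman_top -b`."""
    stats: list[QueueStats] = []
    for line in output.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Skip the timestamp, header and dashed separator lines.
        if len(cells) != 4 or not all(c.isdigit() for c in cells[1:]):
            continue
        name, workers, waiting, running = cells
        stats.append(QueueStats(name, int(workers), int(waiting), int(running)))
    return stats


def idle_queues(stats: list[QueueStats]) -> list[str]:
    """Queues with workers attached but no jobs waiting or running."""
    return [s.name for s in stats
            if s.workers > 0 and s.waiting == 0 and s.running == 0]


if __name__ == "__main__":
    # Rows taken from the gearman_top output posted above.
    sample = """\
Queue Name | Worker Available | Jobs Waiting | Jobs Running
check_results | 1 | 0 | 0
host | 10 | 0 | 0
service | 10 | 0 | 0
"""
    print(idle_queues(parse_gearman_top(sample)))
    # ['check_results', 'host', 'service']
```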
Re: Mod-Gearman integration
There may be another configuration issue.
On the 2 workers and the Nagios server, get the /etc/mod_gearman/worker.conf files and post them so we can view them.
On the Nagios server, get the /etc/mod_gearman/module.conf file and the /usr/local/nagios/etc/nagios.cfg file and post them.
Thanks
Re: Mod-Gearman integration
Sorry for the delay in getting back to you; I've had other project work get in the way of debugging this issue!
I have extracted the files you requested; please find them attached. The naming should make sense, but let me know if you have any queries. These have not been edited by hand at any point, apart from setting the following as requested:
Code:
services=yes
hosts=yes
Apart from this, they are exactly as generated by XI or by the install scripts. I'm quite sure I'm missing something obvious, but I can't spot from the documentation what this might be. I really hope you can help me figure this out!
Robin
Re: Mod-Gearman integration
In the module.conf file on the Nagios server, change the following from
Code:
services=no
hosts=no
to
Code:
services=yes
hosts=yes
Save the change and restart nagios.
That will enable the default host and service queues, and the workers should start running the checks.
If you do not want any checks running from the Nagios server at all, make sure the worker is not running on the Nagios server.