mod_gearman
Posted: Thu Oct 22, 2015 7:30 am
I'm currently deploying a new Nagios environment which needs distributed Nagios servers because of the number of hosts and services involved.
I am using Nagios 4.1.1 on SLES 11 SP3.
I've used NSCA before, but I don't think it would be ideal in this new environment because of the numbers involved and because I would have to add all the host/service configs onto 2 servers whenever I needed to add or change anything.
I looked at DNX, but it's not supported for Nagios Core 4.x, so I have looked at mod_gearman instead.
It looks pretty good, fits my needs, and seems to have installed correctly.
I have a few queries regarding mod_gearman.
1. Error in logs
I get the following in /var/log/gearmand/gearmand.log:
[ main ] Failed to listen on :::4730 -> libgearman-server/gearmand.cc:442
But I'm able to telnet to port 4730 ok:
# telnet localhost 4730
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
It's definitely gearmand that has the port open, because if I stop gearmand I can no longer connect to the port.
Why is this happening?
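The `:::4730` in that log line looks like the IPv6 wildcard address (`::`) with port 4730, whereas my telnet test went over IPv4 (127.0.0.1). To tell the two cases apart, here is a minimal Python sketch that tries a TCP connection over one address family at a time. The helper name `can_connect` is my own and the 2-second timeout is an arbitrary choice; it only tests reachability and says nothing about why a bind might have failed.

```python
import socket

def can_connect(host, port, family, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds over `family`.

    `family` is socket.AF_INET (IPv4) or socket.AF_INET6 (IPv6).
    """
    try:
        # Resolve the host for the requested address family only.
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        # The host has no address in this family.
        return False
    for fam, socktype, proto, _canonname, addr in infos:
        try:
            with socket.socket(fam, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(addr)
                return True
        except OSError:
            # Refused, timed out, or family unavailable: try the next address.
            continue
    return False
```

Calling `can_connect("localhost", 4730, socket.AF_INET)` and then the same with `socket.AF_INET6` should show which address family gearmand is actually accepting connections on.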
2. Nagios configs
Do I just set up my service definitions as I would on a standalone Nagios server? Or do I have to define them as if the services were all passive?
3. Checks
How do I know whether mod_gearman is handling the service checks, rather than Nagios running the checks itself?
I have a couple of hosts configured in a test environment with some services on each. I put them into their own hostgroup and defined this hostgroup in the mod_gearman configs as the only one to run checks for.
Currently this is all on one box as I'm waiting for new ones to be provisioned, so I have the gearman worker running on the main Nagios box.
Running gearman_top2 shows:
2015-10-22 13:28:23 - localhost:4730 - v0.33
Queue Name | Worker Available | Jobs Waiting | Jobs Running
-------------------------------------------------------------------
eventhandler | 10 | 0 | 0
host | 10 | 0 | 0
hostgroup_local | 10 | 0 | 0
service | 10 | 0 | 0
worker_nagla01lv | 1 | 0 | 0
-------------------------------------------------------------------
Every now and again the Worker Available values all drop to 1 or 0 and then come back to 10.
However, I see nothing in the logs to show that mod_gearman is doing the checks, even though I have debug set to its highest level.
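As far as I understand it, the counters above come from gearmand's plain-text admin protocol: sending `status` on port 4730 returns one tab-separated line per queue (name, total jobs, running jobs, available workers), terminated by a line containing a single `.`. Below is a minimal Python sketch, under that assumption, for polling those numbers myself. The function names `parse_status` and `fetch_status` and the dictionary keys are my own, not part of mod_gearman.

```python
import socket

def parse_status(text):
    """Parse a gearmand 'status' reply into {queue: {waiting, running, workers}}."""
    queues = {}
    for line in text.splitlines():
        line = line.strip()
        if line == ".":
            # A lone dot marks the end of the reply.
            break
        if not line:
            continue
        name, total, running, workers = line.split("\t")
        total, running, workers = int(total), int(running), int(workers)
        queues[name] = {
            "waiting": total - running,  # queued jobs not yet picked up
            "running": running,
            "workers": workers,
        }
    return queues

def fetch_status(host="localhost", port=4730, timeout=5.0):
    """Send 'status' to gearmand and return the parsed reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"status\n")
        data = b""
        # Read until the terminating ".\n" or until the server closes the socket.
        while not data.endswith(b".\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return parse_status(data.decode("ascii", errors="replace"))
```

Calling `fetch_status()` every second or so and watching for non-zero `running` or `waiting` values in the `service`, `host`, or `hostgroup_local` queues might catch short-lived jobs that a slower refresh would miss.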
Any help would be appreciated.
Thanks
Tony