mod_gearman

delboy1966
Posts: 98
Joined: Thu Oct 22, 2015 5:26 am

Post by delboy1966 »

I'm currently deploying a new Nagios environment which needs distributed Nagios servers because of the number of hosts/services involved.
I am using Nagios 4.1.1 on SLES 11 SP3.
I've used NSCA before but don't think it would be ideal in this new environment because of the numbers involved, and because I'd have to add all the host/service configs onto 2 servers whenever I needed to add or change anything.
I looked at DNX but it's not supported for Nagios Core 4.x, so I have looked at mod_gearman.

It looks pretty good, fits my needs, and seems to have installed correctly.
I have a few questions about mod_gearman.

1. Error in logs
I get the following in /var/log/gearmand/gearmand.log:

[ main ] Failed to listen on :::4730 -> libgearman-server/gearmand.cc:442

But I'm able to telnet to port 4730 ok:

# telnet localhost 4730
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

It's definitely gearmand that has the port open, because if I stop gearmand I can no longer connect to the port.
Why is this happening?

2. Nagios configs
Do I just set up my service definitions as I would on a standalone Nagios server? Or do I have to do it as if the services were all passive?

3. Checks
How do I know whether mod_gearman is handling the service checks rather than Nagios running the checks itself?
I have a couple of hosts configured in a test environment with some services on each. I dropped them into their own hostgroup and defined this hostgroup in the mod_gearman configs as the only one to run checks for.
Currently this is all on one box as I'm waiting for new ones to be provisioned, so I have the gearman worker running on the main Nagios box.
Running gearman_top2 shows:

2015-10-22 13:28:23 - localhost:4730 - v0.33

Queue Name       | Worker Available | Jobs Waiting | Jobs Running
-------------------------------------------------------------------
eventhandler     |               10 |            0 |            0
host             |               10 |            0 |            0
hostgroup_local  |               10 |            0 |            0
service          |               10 |            0 |            0
worker_nagla01lv |                1 |            0 |            0
-------------------------------------------------------------------

Now and again the Worker Available values all drop to 1 or 0 and then come back to 10.
However, I see nothing in the logs to show mod_gearman is doing the checks, even though I have debug set to its highest level.


Any help would be appreciated.
Thanks
Tony
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: mod_gearman

Post by jdalrymple »

1) Someone else saw the same thing. I wonder if it is a bug where gearmand tries to spin up twice, or something similar:

https://groups.google.com/forum/#!msg/g ... HOLux9cp8J

How did you install gearmand? It may be worth updating to the current version and building from source. Either way, it looks like a harmless error to me.
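
One thing that may help narrow it down: the address in the log line, `:::4730`, is the IPv6 wildcard address (`::`) on port 4730, while the telnet test above connected over IPv4 (127.0.0.1). It is only an assumption that the failure is limited to the IPv6 listener, but a minimal Python sketch like the one below can check each address family separately. The function name `probe` is made up for illustration.

```python
import socket


def probe(host, port, family, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds over `family`."""
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        # The host cannot be resolved for this address family.
        return False
    for fam, socktype, proto, _canonname, addr in infos:
        try:
            with socket.socket(fam, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(addr)
                return True
        except OSError:
            # Refused, timed out, or the family is unavailable; try the next address.
            continue
    return False
```

Calling `probe("127.0.0.1", 4730, socket.AF_INET)` and `probe("::1", 4730, socket.AF_INET6)` on the gearmand host shows which of the two families actually accepts connections.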

2) Nothing special

3)
delboy1966 wrote: "dropped them into their own hostgroup and defined this hostgroup in the mod_gearman configs"
That's how.

If I recall correctly, only orphaned checks get their worker name cited. For checks that run successfully, you can't really tell which worker handled them.

4)ish
gearman_top does that: the queues flicker to 0 and back. It's not harmful, and I'm not sure why it happens. Also be aware that the queue sizes adjust with workload, but the flickering to 0 is just an interface quirk.
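
If you want the same numbers without the display, gearmand answers a plain-text `status` command on its port. As far as I know each reply line is tab-separated (queue name, total jobs, running jobs, available workers) and the reply ends with a line containing a single dot. A rough Python sketch under that assumption follows; `parse_status` and `fetch_status` are hypothetical helper names, and "waiting" is computed as total minus running, which is my reading of how the Jobs Waiting column is derived.

```python
import socket


def parse_status(text):
    """Parse gearmand 'status' output into {queue: {waiting, running, workers}}."""
    queues = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line == ".":
            continue  # skip blank lines and the terminating dot
        parts = line.split("\t")
        if len(parts) != 4:
            continue  # not a status line in the assumed format
        name = parts[0]
        total, running, workers = (int(p) for p in parts[1:])
        queues[name] = {
            "waiting": total - running,  # assumption: total includes running jobs
            "running": running,
            "workers": workers,
        }
    return queues


def fetch_status(host="localhost", port=4730, timeout=5.0):
    """Send 'status' to gearmand and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"status\n")
        buf = b""
        # Read until the lone "." terminator line or until the server closes.
        while not (buf == b".\n" or buf.endswith(b"\n.\n")):
            chunk = sock.recv(4096)
            if not chunk:
                break
            buf += chunk
    return buf.decode("ascii", "replace")
```

Polling `parse_status(fetch_status())` every second or so while forcing a recheck of a host in the test hostgroup should show its queue's waiting or running count rise briefly if the job really goes through gearmand.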