host check orphaned

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: host check orphaned

Post by jdalrymple »

The neb file is (theoretically) not in use on the worker servers. Workers don't intercept checks the same way that the job server does. The neb module intercepts the checks from the Nagios Core process and distributes them to the appropriate workers.

On your job server, you define the queues, and that's what you're doing by adding those hostgroups lines.

On your worker servers you "subscribe" to the queues.
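
The define/subscribe relationship can be sketched in a few lines of Python. This is only an illustration, not part of mod_gearman: the dictionaries stand in for the hostgroups lines in the real config files, the function names are made up for this example, and the `hostgroup_` queue prefix is taken from the queue names that gearman_top prints in this thread.

```python
# Illustrative only: the dictionaries below stand in for the hostgroups
# lines in the neb config and the worker configs; nothing here reads real files.

def queue_name(hostgroup):
    """Queue name for a hostgroup, following the names gearman_top shows."""
    return "hostgroup_" + hostgroup


def unsubscribed_queues(neb_hostgroups, worker_hostgroups):
    """Return queues defined on the job server that no worker subscribes to."""
    defined = {queue_name(g) for g in neb_hostgroups}
    subscribed = {
        queue_name(g)
        for groups in worker_hostgroups.values()
        for g in groups
    }
    return sorted(defined - subscribed)


# Hostgroups listed in the neb config on the job server (defines the queues).
neb_hostgroups = ["gearman_dce1", "gearman_dcn1"]

# Hostgroups listed in each worker config (subscribes to the queues).
worker_hostgroups = {
    "gearmandce1": ["gearman_dce1"],
    "gearmandcn1": [],  # no hostgroups line set on this worker
}

print(unsubscribed_queues(neb_hostgroups, worker_hostgroups))
# prints ['hostgroup_gearman_dcn1']
```

A queue that shows up in this list is defined on the job server but has nobody listening for its jobs.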

Does that make sense?
bosecorp
Posts: 929
Joined: Thu Jun 26, 2014 1:00 pm

Re: host check orphaned

Post by bosecorp »

Yes, thank you for the explanation and your patience.

What about doing the same in mod_gearman_worker.conf?

I have restarted mod_gearman. We'll see how that works.
bosecorp
Posts: 929
Joined: Thu Jun 26, 2014 1:00 pm

Re: host check orphaned

Post by bosecorp »

After the change, monitoring seems to have almost stopped. I checked several devices and their last check was about an hour ago. Checks do seem to be running, just very slowly.

By the way, I am still getting orphans.
Last edited by bosecorp on Wed Mar 18, 2015 3:20 pm, edited 1 time in total.
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: host check orphaned

Post by jdalrymple »

What does gearman_top look like now?
bosecorp
Posts: 929
Joined: Thu Jun 26, 2014 1:00 pm

Re: host check orphaned

Post by bosecorp »

Something seems to be running, but for some reason it is happening very slowly.

By the way, I still see a lot of orphans.

Queue Name | Worker Available | Jobs Waiting | Jobs Running
----------------------------------------------------------------------------
check_results | 2 | 103 | 2
eventhandler | 34 | 0 | 0
host | 54 | 0 | 0
hostgroup_gearman_dce1 | 7 | 0 | 2
hostgroup_gearman_dcn1 | 7 | 0 | 3
service | 54 | 0 | 40
servicegroup_gearman_dce1 | 7 | 0 | 0
servicegroup_gearman_dcn1 | 7 | 0 | 0
worker_gearmandce1 | 1 | 0 | 0
worker_gearmandcn1 | 1 | 0 | 0
worker_nagmonus1 | 1 | 0 | 0
worker_nagmonus2 | 1 | 0 | 0
----------------------------------------------------------------------------
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: host check orphaned

Post by jdalrymple »

I'm sorry. I didn't notice that your hostgroups are also ALL commented out in your worker configs.

In each of your worker configs, add a line appropriate to the hostname:

    hostgroups=gearman_1

Only 1 hostgroup need be added per worker config, and they should coincide with the hostgroups you defined in your neb file.

This *should* result in you adding 1 host group to the worker config on 3 separate servers if I'm understanding your setup properly.
bosecorp
Posts: 929
Joined: Thu Jun 26, 2014 1:00 pm

Re: host check orphaned

Post by bosecorp »

Just to make sure: on the workers, I need to update mod_gearman_worker.conf, right? Based on your earlier explanation, the workers do not use mod_gearman_neb.conf.

Now, what if on the job server, where I also run a worker, I do not specify any hostgroup? Will that mean that any device that is not part of any hostgroup runs on the job server, where I also run a worker?

To recap what I have done:

On the job server (nagmonus1), I removed the comments in the neb.conf file and included all the hostgroups I have: gearman_dce1, gearman_dcn1 and gearman_no.
Still on the job server, in the worker.conf file, I have not included any hostgroup, on the assumption that the theory is correct: if a device is not a member of any of these groups, it will run on the job server.

Lastly, on the workers, per your instructions, I have only 1 hostgroup in the worker.conf file, and it matches one of the hostgroups in my neb.conf (on nagmonus1).
Last edited by bosecorp on Wed Mar 18, 2015 3:59 pm, edited 1 time in total.
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: host check orphaned

Post by jdalrymple »

This is all very confusing, largely because of the inconsistencies between the configuration and the output that we're getting from gearman_top.

    hostgroup_gearman_dce1 | 7 | 0 | 2
    hostgroup_gearman_dcn1 | 7 | 0 | 3

These rows indicate that you have workers available on those queues. This can't happen based on the configs you shared with me, where the hostgroup is commented out, unless there is a worker somewhere else picking up those hostgroups.

Additionally, no monitoring happening doesn't really jibe with the "Jobs Running" column being >0. Is it still the case that you don't see any service/host checks being submitted/returned?
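
For reading pasted gearman_top output more systematically, here is a minimal Python sketch. It assumes the pipe-separated layout exactly as it was pasted into this thread (the live tool's spacing may differ), and the function names are invented for this example. The sample rows are copied from the earlier paste.

```python
def parse_gearman_top(text):
    """Parse a pasted gearman_top table into a list of dicts."""
    rows = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Data rows have four columns with numeric counters; this skips
        # the header line, the dashed rulers, and blank lines.
        if len(parts) != 4 or not all(p.isdigit() for p in parts[1:]):
            continue
        rows.append({
            "queue": parts[0],
            "workers": int(parts[1]),
            "waiting": int(parts[2]),
            "running": int(parts[3]),
        })
    return rows


def backlogged(rows):
    """Return the names of queues that currently have jobs waiting."""
    return [r["queue"] for r in rows if r["waiting"] > 0]


sample = """\
Queue Name | Worker Available | Jobs Waiting | Jobs Running
----------------------------------------------------------------------------
check_results | 2 | 103 | 2
host | 54 | 0 | 0
hostgroup_gearman_dce1 | 7 | 0 | 2
service | 54 | 0 | 40
----------------------------------------------------------------------------
"""

rows = parse_gearman_top(sample)
print(backlogged(rows))
# prints ['check_results']
```

Any queue with a non-zero "Jobs Waiting" count is worth a closer look, since jobs are arriving there faster than they are being picked up.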

Your logic is sound, yes... modify mod_gearman_worker.conf on the individual gearman worker servers (as well as the job server if you have a hostgroup defined for it). After modifying the worker.conf files you have to restart the worker for it to pick up the changes.

Do you actually have hostgroups defined in your Nagios config file that correspond to the hostgroups being defined in the gearman configurations?
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: host check orphaned

Post by tgriep »

Yes, you will have to edit mod_gearman_worker.conf on each worker and set the hostgroup that the worker has to run.

And for your question:
"now, what if in the JOB server, where I also run as a worker I do not specify any group. any hostgroup. will that mean that any devices that is not part of any hostgroup will run on JOB server where I also run a worker"
Yes, if you do not have any hostgroups specified in its worker.conf file, it should take the local checks and run them.
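
As a rough model of that fallback, the Python sketch below picks a queue for a host check. It rests on an assumption that this thread does not spell out: a host that belongs to one of the hostgroups listed in the neb config goes to that `hostgroup_` queue, and every other host goes to the generic `host` queue seen in gearman_top, where any worker serving that queue can pick it up. The function name and the `linux` hostgroup are hypothetical.

```python
def host_check_queue(host_hostgroups, neb_hostgroups):
    """Pick the queue for a host check under the assumed routing rule."""
    for group in neb_hostgroups:  # first configured hostgroup that matches wins
        if group in host_hostgroups:
            return "hostgroup_" + group
    # Assumption: hosts outside every configured hostgroup use the generic queue.
    return "host"


# Hostgroups listed in the neb config on the job server.
neb_hostgroups = ["gearman_dce1", "gearman_dcn1"]

# "linux" is a hypothetical hostgroup that is not listed in the neb config.
print(host_check_queue({"gearman_dcn1", "linux"}, neb_hostgroups))
# prints hostgroup_gearman_dcn1
print(host_check_queue({"linux"}, neb_hostgroups))
# prints host
```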
bosecorp
Posts: 929
Joined: Thu Jun 26, 2014 1:00 pm

Re: host check orphaned

Post by bosecorp »

Hi jdalrymple,
This is the latest output from my gearman_top command after I made the changes:

2015-03-18 17:04:12 - 10.100.30.111:4730 - v0.33

Queue Name | Worker Available | Jobs Waiting | Jobs Running
----------------------------------------------------------------------------
check_results | 4 | 0 | 1
eventhandler | 51 | 0 | 0
host | 62 | 0 | 1
hostgroup_gearman_dce1 | 7 | 0 | 0
hostgroup_gearman_dcn1 | 5 | 0 | 0
service | 62 | 0 | 39
servicegroup_gearman_dce1 | 7 | 0 | 2
servicegroup_gearman_dcn1 | 5 | 0 | 0
worker_gearmandce1 | 1 | 0 | 0
worker_gearmandcn1 | 1 | 0 | 0
worker_nagmonus1 | 1 | 0 | 0
worker_nagmonus2 | 1 | 0 | 0
----------------------------------------------------------------------------

I do see things running, but monitoring still seems to be taking a long time. The last check for some devices was about 15 minutes ago.

After all these changes, my number of orphans has dropped drastically, but I still have a few consistently.