host check orphaned
-
jdalrymple
- Skynet Drone
- Posts: 2620
- Joined: Wed Feb 11, 2015 1:56 pm
Re: host check orphaned
The neb file is (theoretically) not in use on the worker servers. Workers don't intercept checks the same way that the job server does. The neb module intercepts the checks from the Nagios Core process and distributes them to the appropriate workers.
On your job server, you define the queues, and that's what you're doing by adding those hostgroups lines.
On your worker servers you "subscribe" to the queues.
Does that make sense?
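To make the define/subscribe idea a bit more concrete, here is a rough Python sketch of which queue names a set of settings would be expected to map to. It assumes the usual mod_gearman naming, where a hostgroup named X gets a queue called hostgroup_X and a servicegroup named Y gets servicegroup_Y. The function name and the group names dc_east and dc_west are made up for illustration; this is not mod_gearman code.

```python
def expected_queues(hostgroups=(), servicegroups=(), hosts=True,
                    services=True, eventhandler=True):
    """Return the queue names these settings would be expected to map to.

    Assumption: hostgroup X -> queue "hostgroup_X" and
    servicegroup Y -> queue "servicegroup_Y".
    """
    queues = []
    if hosts:
        queues.append("host")
    if services:
        queues.append("service")
    if eventhandler:
        queues.append("eventhandler")
    queues += [f"hostgroup_{name}" for name in hostgroups]
    queues += [f"servicegroup_{name}" for name in servicegroups]
    return queues


# Hypothetical job server: defines queues for two hostgroups.
defined = expected_queues(hostgroups=["dc_east", "dc_west"])

# Hypothetical worker: subscribes only to its own hostgroup queue.
subscribed = expected_queues(hostgroups=["dc_east"], hosts=False,
                             services=False, eventhandler=False)

# Queues defined on the job server that this one worker does not serve.
unserved = [q for q in defined if q not in subscribed]
print(unserved)
```

Any queue that is defined on the job server but that no worker subscribes to would have nobody to pick up its jobs, which is the situation to look for.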
Re: host check orphaned
Yes, thank you for the explanation and your patience.
What about doing the same in mod_gearman_worker.conf?
I have restarted mod_gearman. Will see how that works.
Re: host check orphaned
After the change, no monitoring seemed to be happening at all. I checked several devices and the last check was about an hour ago. Well, it does seem to be happening, but very slowly.
And by the way, I am still getting orphans.
Last edited by bosecorp on Wed Mar 18, 2015 3:20 pm, edited 1 time in total.
-
jdalrymple
Re: host check orphaned
What does gearman_top look like now?
Re: host check orphaned
Something seems to be running, but for some reason it is happening very slowly.
By the way, I still see a lot of orphans.
Queue Name | Worker Available | Jobs Waiting | Jobs Running
----------------------------------------------------------------------------
check_results | 2 | 103 | 2
eventhandler | 34 | 0 | 0
host | 54 | 0 | 0
hostgroup_gearman_dce1 | 7 | 0 | 2
hostgroup_gearman_dcn1 | 7 | 0 | 3
service | 54 | 0 | 40
servicegroup_gearman_dce1 | 7 | 0 | 0
servicegroup_gearman_dcn1 | 7 | 0 | 0
worker_gearmandce1 | 1 | 0 | 0
worker_gearmandcn1 | 1 | 0 | 0
worker_nagmonus1 | 1 | 0 | 0
worker_nagmonus2 | 1 | 0 | 0
----------------------------------------------------------------------------
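For anyone who wants to keep an eye on this without staring at the screen, here is a minimal Python sketch that parses pipe-separated text in the shape pasted above and lists the queues with jobs waiting. It only assumes the four-column layout shown in this post; the function names are made up, and the sample is a subset of the rows above.

```python
def parse_gearman_top(text):
    """Parse rows of 'queue | workers | waiting | running' into dicts.

    Header, separator and timestamp lines are skipped because they do
    not have three numeric columns after the queue name.
    """
    rows = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 4 or not all(p.isdigit() for p in parts[1:]):
            continue
        name, workers, waiting, running = parts
        rows.append({
            "queue": name,
            "workers": int(workers),
            "waiting": int(waiting),
            "running": int(running),
        })
    return rows


def backlogged(rows, threshold=0):
    """Return the names of queues with more than `threshold` jobs waiting."""
    return [r["queue"] for r in rows if r["waiting"] > threshold]


SAMPLE = """\
Queue Name | Worker Available | Jobs Waiting | Jobs Running
----------------------------------------------------------------------------
check_results | 2 | 103 | 2
host | 54 | 0 | 0
hostgroup_gearman_dce1 | 7 | 0 | 2
service | 54 | 0 | 40
"""

rows = parse_gearman_top(SAMPLE)
print(backlogged(rows))  # ['check_results']
```

In this sample only check_results has anything waiting (103 jobs against 2 available workers), while the host and service queues are empty.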
-
jdalrymple
Re: host check orphaned
I'm sorry. I didn't notice that your hostgroups are also ALL commented out in your worker configs.
In each of your worker configs add a line appropriate to the hostname:
Code:
hostgroups=gearman_1
Only 1 hostgroup need be added per worker config, and they should coincide with the hostgroups you defined in your neb file.
This *should* result in you adding 1 host group to the worker config on 3 separate servers if I'm understanding your setup properly.
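If it helps, the "should coincide" part can be checked mechanically. Below is a small Python sketch, under the assumption that both files use plain key=value lines, that # starts a comment, and that hostgroups= may hold a comma-separated list. The config snippets are made-up examples built from the group names in this thread, not your actual files, and the helper names are hypothetical.

```python
def parse_hostgroups(conf_text):
    """Collect names from hostgroups= lines, ignoring commented-out lines."""
    groups = set()
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "hostgroups":
            groups.update(n.strip() for n in value.split(",") if n.strip())
    return groups


def unknown_worker_groups(neb_conf, worker_conf):
    """Hostgroups a worker subscribes to that the neb config never defines."""
    return parse_hostgroups(worker_conf) - parse_hostgroups(neb_conf)


# Hypothetical file contents, for illustration only.
NEB = "hostgroups=gearman_dce1,gearman_dcn1\n"
WORKER_OK = "hostgroups=gearman_dce1\n"
WORKER_COMMENTED = "#hostgroups=gearman_dce1\n"
WORKER_TYPO = "hostgroups=gearmand_dce1\n"

print(unknown_worker_groups(NEB, WORKER_OK))    # set()
print(parse_hostgroups(WORKER_COMMENTED))       # set()
print(unknown_worker_groups(NEB, WORKER_TYPO))  # {'gearmand_dce1'}
```

An empty result from parse_hostgroups on a worker config means the line is still commented out. A non-empty result from unknown_worker_groups points at a name that does not match anything in the neb file.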
Re: host check orphaned
And just to make sure: on the workers, I need to update mod_gearman_worker.conf, right? Because based on your earlier explanation the workers do not use mod_gearman_neb.conf.
Now, what if on the job server, which I also run as a worker, I do not specify any hostgroup? Will that mean that any device that is not part of any hostgroup will run on the job server, where I also run a worker?
And just to recap what I have done:
On the job server (nagmonus1), I have removed the comments in the neb.conf file. I have included all the hostgroups I have, basically gearman_dce1, gearman_dcn1 and gearman_no.
Still on the job server, in the worker.conf file, I have not included any hostgroup, assuming the theory is correct that if a device is not a member of any of these groups then it will run on the job server.
And lastly, on the workers, per your instructions, I have only 1 hostgroup in the worker.conf file, matching one of the hostgroups in my neb.conf (nagmonus1).
Last edited by bosecorp on Wed Mar 18, 2015 3:59 pm, edited 1 time in total.
-
jdalrymple
Re: host check orphaned
This is all very confusing - a lot because of the inconsistencies between the configuration and the output that we're getting from gearman_top.
Code:
hostgroup_gearman_dce1 | 7 | 0 | 2
hostgroup_gearman_dcn1 | 7 | 0 | 3
This indicates that you have workers available on those queues. That can't happen based upon the configs you shared with me, where the hostgroup is commented out, unless a worker somewhere else is picking up those hostgroups.
Additionally, no monitoring happening doesn't really jibe with the last column, "Jobs Running", being >0. Is it still the case that you don't see any service/host checks being submitted/returned?
Your logic is sound, yes... modify mod_gearman_worker.conf on the individual gearman worker servers (as well as the job server if you have a hostgroup defined for it). After modifying the worker.conf files you have to restart the worker for it to pick up the changes.
Do you actually have hostgroups defined in your Nagios config file that correspond to the hostgroups being defined in the gearman configurations?
Re: host check orphaned
Yes, you will have to edit mod_gearman_worker.conf on each worker and add the hostgroup that you want that worker to run checks for.
And for your question:
"Now, what if on the job server, where I also run a worker, I do not specify any hostgroup? Will that mean that any device that is not part of any hostgroup will run on the job server, where I also run a worker?"
Yes, if you do not have any hostgroups specified in its worker.conf file, it should take the local checks and run them.
Re: host check orphaned
Hi jdalrymple,
This is the latest output from my gearman_top command after I made the changes:
2015-03-18 17:04:12 - 10.100.30.111:4730 - v0.33
Queue Name | Worker Available | Jobs Waiting | Jobs Running
----------------------------------------------------------------------------
check_results | 4 | 0 | 1
eventhandler | 51 | 0 | 0
host | 62 | 0 | 1
hostgroup_gearman_dce1 | 7 | 0 | 0
hostgroup_gearman_dcn1 | 5 | 0 | 0
service | 62 | 0 | 39
servicegroup_gearman_dce1 | 7 | 0 | 2
servicegroup_gearman_dcn1 | 5 | 0 | 0
worker_gearmandce1 | 1 | 0 | 0
worker_gearmandcn1 | 1 | 0 | 0
worker_nagmonus1 | 1 | 0 | 0
worker_nagmonus2 | 1 | 0 | 0
----------------------------------------------------------------------------
I do see things running, but monitoring still seems to be taking a long time. The last check for some devices was about 15 minutes ago.
After all these changes, my number of orphans has dropped drastically, but I still consistently have a few.