host check orphaned
Re: host check orphaned
I have PM'd you some of the IPs that are orphaned.
The IPs that you PM'd me are actually IPs that are down. That is fine, and I am aware of that. It's the orphaned checks that I am concerned about.
The logs I sent, I believe, were for more than 5 minutes.
Let me know if you need me to turn debugging back on.
-
jdalrymple
- Skynet Drone
- Posts: 2620
- Joined: Wed Feb 11, 2015 1:56 pm
Re: host check orphaned
Everything in the logs looks OK; even the checks appear to be returning results to the job server for those hosts you mentioned in PM:
Code: Select all
[2015-03-17 12:07:31][30318][TRACE] command: /usr/local/nagios/libexec/check_icmp -H <IPADDR> -w 3000.0,80% -c 5000.0,100% -p 5
[2015-03-17 12:07:31][30318][TRACE] data:
host_name=<hostname>
core_start_time=1426608445.0
start_time=1426608451.156580
finish_time=1426608451.162433
return_code=0
exited_ok=1
source=Mod-Gearman Worker @ nagmonus1
output=OK - <IPADDR>: rta 0.703ms, lost 0%|rta=0.703ms;3000.000;5000.000;0; pl=0%;80;100;; \n
[2015-03-17 12:07:31][30318][TRACE] add_job_to_queue(check_results, (null), 2, 1, 1, 1)
[2015-03-17 12:07:31][30318][TRACE] 281 --->host_name=<hostname>
core_start_time=1426608445.0
start_time=1426608451.156580
finish_time=1426608451.162433
return_code=0
exited_ok=1
source=Mod-Gearman Worker @ nagmonus1
output=OK - <IPADDR>: rta 0.703ms, lost 0%|rta=0.703ms;3000.000;5000.000;0; pl=0%;80;100;; \n
[2015-03-17 12:07:31][30318][TRACE] add_job_to_queue() finished successfully: 0 0
[2015-03-17 12:07:31][30318][TRACE] send_result_back() finished successfully
[2015-03-17 12:07:31][30318][TRACE] send_result_back() has no duplicate servers to send to.
[2015-03-17 12:07:31][30318][TRACE] set_state(1)
[2015-03-17 12:07:31][30318][TRACE] set_state(0)
bosecorp wrote: question, do I need to update the gearmand as well. I only updated mod_gearman
At this point I'm tempted to say yes.
The only other thing that may help (maybe try this beforehand) would be to increase the number of workers. At present I see 11 workers (right?), but I'm seeing about 8540 service checks in about 130 seconds - that's pretty aggressive.
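For a sense of scale, here is the back-of-the-envelope arithmetic on the figures quoted above (8540 checks, roughly 130 seconds, 11 workers). The even split across workers is an assumption made for illustration only; the log does not show how the load is actually distributed.

```python
# Rough throughput implied by the worker log figures quoted above.
checks = 8540      # service checks seen in the log window
window_s = 130     # approximate length of that window, in seconds
workers = 11       # worker count as read from the log (unconfirmed)

rate = checks / window_s        # checks per second overall
per_worker = rate / workers     # ASSUMES an even split across workers

print(f"{rate:.1f} checks/s overall, {per_worker:.1f} checks/s per worker")
```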
Re: host check orphaned
I upgraded the gearman servers as well last night. It did not make any difference.
Are you saying that this could be a performance issue and that we might therefore need to increase the number of workers?
I only have 4 workers.
2015-03-17 17:09:43 - 10.100.30.111:4730 - v0.33
Queue Name             | Worker Available | Jobs Waiting | Jobs Running
-----------------------------------------------------------------------
check_results          | 1                | 0            | 0
eventhandler           | 20               | 0            | 0
host                   | 20               | 0            | 0
hostgroup_gearman_dce1 | 0                | 0            | 0
hostgroup_gearman_dcn1 | 5                | 0            | 0
service                | 20               | 0            | 0
worker_gearmandce1     | 1                | 0            | 0
worker_gearmandcn1     | 1                | 0            | 0
worker_nagmonus1       | 1                | 0            | 0
worker_nagmonus2       | 1                | 0            | 0
-----------------------------------------------------------------------
And lastly, how do I verify that the workers, in this case gearmandce1 and gearmandcn1, are actually doing the monitoring activities as well? I am starting to think that maybe nagmonus1 is doing all the work.
-
jdalrymple
Re: host check orphaned
Your gearman_top should tell you, especially with the massive number of checks you have running. To be honest, I'm baffled by the output that you're showing us from your gearman_top. Does the number of jobs waiting/running ever change from 0? Based upon the log output, I would expect both of those columns to be double-digit numbers for AT LEAST 1 worker.
- Box293
- Too Basu
- Posts: 5126
- Joined: Sun Feb 07, 2010 10:55 pm
- Location: Deniliquin, Australia
Re: host check orphaned
bosecorp wrote: and lastly, how do I verify that the workers, in this case gearmandce1 and gearmandcn1 are actually doing the monitoring activities as well. I am starting to think that maybe nagmonus1 is doing all the work.
Run gearman_top.
Then, on a worker, stop the worker service. You should see the queues build up. Stopping all the workers should shed some light on how the queues are working.
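If watching the screen by eye is awkward, a saved copy of the gearman_top output can be checked with a few lines of Python. This is only a rough sketch: it assumes the pipe-delimited layout shown earlier in this thread, and the function names `parse_gearman_top` and `summarize` are made up for illustration. The sample is an excerpt of the output posted above.

```python
def parse_gearman_top(text):
    """Parse the pipe-delimited queue rows of gearman_top output into dicts."""
    queues = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Data rows have exactly four columns with numeric counters;
        # the timestamp line, header row and dashed rulers are skipped.
        if len(parts) != 4 or not all(p.isdigit() for p in parts[1:]):
            continue
        name, workers, waiting, running = parts
        queues.append({"queue": name, "workers": int(workers),
                       "waiting": int(waiting), "running": int(running)})
    return queues


def summarize(queues):
    """Return the queues with no workers and the total of waiting + running jobs."""
    no_workers = [q["queue"] for q in queues if q["workers"] == 0]
    busy = sum(q["waiting"] + q["running"] for q in queues)
    return no_workers, busy


# Excerpt of the gearman_top output posted earlier in the thread.
sample = """\
2015-03-17 17:09:43 - 10.100.30.111:4730 - v0.33
Queue Name | Worker Available | Jobs Waiting | Jobs Running
-------------------------------------------------------------------------
check_results | 1 | 0 | 0
host | 20 | 0 | 0
hostgroup_gearman_dce1 | 0 | 0 | 0
hostgroup_gearman_dcn1 | 5 | 0 | 0
service | 20 | 0 | 0
-------------------------------------------------------------------------
"""

no_workers, busy = summarize(parse_gearman_top(sample))
print("queues with no workers:", no_workers)
print("jobs waiting or running:", busy)
```

A queue that never shows waiting or running jobs while checks are being scheduled is the thing to look for, as is a queue that has no workers at all.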
Re: host check orphaned
We might be onto something.
I don't see Jobs Running/Waiting ever going above 1; in fact, I don't remember them ever being 1.
I have stopped one of the workers and I don't see anything building up.
This is after I stopped gearmandce1:
2015-03-17 19:02:48 - 10.100.30.111:4730 - v0.33
Queue Name             | Worker Available | Jobs Waiting | Jobs Running
-----------------------------------------------------------------------
check_results          | 1                | 0            | 0
eventhandler           | 10               | 0            | 0
host                   | 10               | 0            | 0
hostgroup_gearman_dce1 | 0                | 0            | 0
hostgroup_gearman_dcn1 | 5                | 0            | 0
service                | 10               | 0            | 0
worker_gearmandce1     | 0                | 0            | 0
worker_gearmandcn1     | 1                | 0            | 0
worker_nagmonus1       | 1                | 0            | 0
worker_nagmonus2       | 1                | 0            | 0
-----------------------------------------------------------------------
Could this be an issue with configuration?
I think I said this before: the way I control who does the monitoring activities is by hostgroups. The hostgroups I have are gearman_no, gearman_dce1 and gearman_dcn1.
I have PM'd you the config files of my workers.
This is what I am also seeing in the nagios.log file:
[1426688576] Warning: The check of service 'Port 13137 Status' on host 'uswb-idf-25.bose.com' looks like it was orphaned (results never came back; last_check=1426686135; next_check=1426687855). I'm scheduling an immediate check of the service...
[1426688576] Warning: The check of service 'Port 13602 Bandwidth' on host 'uswb-idf-25.bose.com' looks like it was orphaned (results never came back; last_check=1426686135; next_check=1426687855). I'm scheduling an immediate check of the service...
[1426688576] Warning: The check of service 'Port 13621 Status' on host 'uswb-idf-25.bose.com' looks like it was orphaned (results never came back; last_check=1426686153; next_check=1426687855). I'm scheduling an immediate check of the service...
[1426688576] Warning: The check of service 'Port 13630 Status' on host 'uswb-idf-25.bose.com' looks like it was orphaned (results never came back; last_check=1426686117; next_check=1426687855). I'm scheduling an immediate check of the service...
[1426688576] Warning: The check of service 'Port 13633 Status' on host 'uswb-idf-25.bose.com' looks like it was orphaned (results never came back; last_check=1426686135; next_check=1426687855). I'm scheduling an immediate check of the service...
[1426688576] Warning: The check of service 'Port 13634 Status' on host 'uswb-idf-25.bose.com' looks like it was orphaned (results never came back; last_check=1426686136; next_check=1426687855). I'm scheduling an immediate check of the service...
[1426688576] Warning: The check of service 'Port 13647 Status' on host 'uswb-idf-25.bose.com' looks like it was orphaned (results never came back; last_check=1426686093; next_check=1426687855). I'm scheduling an immediate check of the service...
[1426688576] Warning: The check of service 'Port 13649 Status' on host 'uswb-idf-25.bose.com' looks like it was orphaned (results never came back; last_check=1426685398; next_check=1426687855). I'm scheduling an immediate check of the service...
[1426688576] Warning: The check of service 'Port 13652 Status' on host 'uswb-idf-25.bose.com' looks like it was orphaned (results never came back; last_check=1426685442; next_check=1426687855). I'm scheduling an immediate check of the service...
[1426688576] Warning: The check of service 'Port 5001 Bandwidth' on host 'uswb-idf-25.bose.com' looks like it was orphaned (results never came back; last_check=1426685634; next_check=1426687855). I'm scheduling an immediate check of the service...
[1426688576] Warning: The check of service 'Port 5182 Bandwidth' on host 'uswb-idf-25.bose.com' looks like it was orphaned (results never came back; last_check=1426685670; next_check=1426687855). I'm scheduling an immediate check of the service...
[1426688576] Warning: The check of service 'Port 5188 Bandwidth' on host 'uswb-idf-25.bose.com' looks like it was orphaned (results never came back; last_check=1426686341; next_check=1426687855). I'm scheduling an immediate check of the service...
[1426688576] Warning: The check of service 'Port 10110 Status' on host 'uswb-mdf-5510.bose.com' looks like it was orphaned (results never came back; last_check=1426686093; next_check=1426687855). I'm scheduling an immediate check of the service...
[1426688576] Warning: The check of service 'Port 10114 Status' on host 'uswb-mdf-5510.bose.com' looks like it was orphaned (results never came back; last_check=1426686153; next_check=1426687855). I'm scheduling an immediate check of the service...
[1426688576] Warning: The check of service 'Port 10116 Bandwidth' on host 'uswb-mdf-5510.bose.com' looks like it was orphaned (results never came back; last_check=1426686153; next_check=1426687855). I'm scheduling an immediate check of the service...
[1426688576] Warning: The check of service 'Port 10121 Status' on host 'uswb-mdf-5510.bose.com' looks like it was orphaned (results never came back; last_check=1426686117; next_check=1426687855). I'm scheduling an immediate check of the service...
[1426688576] Warning: The check of service 'Port 10604 Bandwidth' on host 'uswb-mdf-5510.bose.com' looks like it was orphaned (results never came back; last_check=1426686116; next_check=1426687855). I'm scheduling an immediate check of the service...
[1426688576] Warning: The check of service 'Port 10702 Status' on host 'uswb-mdf-5510.bose.com' looks like it was orphaned (results never came back; last_check=1426686093; next_check=1426687855). I'm scheduling an immediate check of the service...
[1426688576] Warning: The check of service 'Port 10138 Bandwidth' on host 'uswb-ocg-lab.bose.com' looks like it was orphaned (results never came back; last_check=1426685652; next_check=1426687855). I'm scheduling an immediate check of the service...
[1426688576] Warning: The check of service 'Port 10202 Status' on host 'uswb-ocg-lab.bose.com' looks like it was orphaned (results never came back; last_check=1426686093; next_check=1426687855). I'm scheduling an immediate check of the service...
[1426688576] Warning: The check of service 'Port 10602 Status' on host 'uswb-ocg-lab.bose.com' looks like it was orphaned (results never came back; last_check=1426685398; next_check=1426687855). I'm scheduling an immediate check of the service...
[1426688576] Warning: The check of service 'Port 10611 Bandwidth' on host 'uswb-ocg-lab.bose.com' looks like it was orphaned (results never came back; last_check=1426686117; next_check=1426687855). I'm scheduling an immediate check of the service...
[1426688576] Warning: The check of service 'Port 10619 Status' on host 'uswb-ocg-lab.bose.com' looks like it was orphaned (results never came back; last_check=1426686153; next_check=1426687855). I'm scheduling an immediate check of the service...
[1426688576] Warning: The check of service 'Port 10623 Bandwidth' on host 'uswb-ocg-lab.bose.com' looks like it was orphaned (results never came back; last_check=1426686093; next_check=1426687855). I'm scheduling an immediate check of the service...
[1426688576] Warning: The check of service 'Port 10624 Status' on host 'uswb-ocg-lab.bose.com' looks like it was orphaned (results never came back; last_check=1426686093; next_check=1426687855). I'm scheduling an immediate check of the service...
[1426688576] Warning: The check of service 'Port 10626 Bandwidth' on host 'uswb-ocg-lab.bose.com' looks like it was orphaned (results never came back; last_check=1426685396; next_check=1426687855). I'm scheduling an immediate check of the service...
[1426688576] Warning: The check of service 'Port 10627 Bandwidth' on host 'uswb-ocg-lab.bose.com' looks like it was orphaned (results never came back; last_check=1426685396; next_check=1426687855). I'm scheduling an immediate check of the service...
[1426688576] Warning: The check of service 'Port 10629 Status' on host 'uswb-ocg-lab.bose.com' looks like it was orphaned (results never came back; last_check=1426686153; next_check=1426687855). I'm scheduling an immediate check of the service...
[1426688576] Warning: The check of service 'Port 10630 Bandwidth' on host 'uswb-ocg-lab.bose.com' looks like it was orphaned (results never came back; last_check=1426685443; next_check=1426687855). I'm scheduling an immediate check of the service...
[1426688576] Warning: The check of service 'Port 10631 Status' on host 'uswb-ocg-lab.bose.com' looks like it was orphaned (results never came back; last_check=1426686135; next_check=1426687855). I'm scheduling an immediate check of the service...
[1426688576] Warning: The check of service 'Port 10636 Bandwidth' on host 'uswb-ocg-lab.bose.com' looks like it was orphaned (results never came back; last_check=1426686117; next_check=1426687855). I'm scheduling an immediate check of the service.
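To see which hosts are hit hardest, warnings like the ones above can be tallied per host. The following is a minimal sketch, assuming the log lines keep exactly the format shown here; the function name `orphan_summary` is hypothetical, and the two sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches the "looks like it was orphaned" service warning as it appears above.
ORPHAN_RE = re.compile(
    r"^\[(?P<logged>\d+)\] Warning: The check of service '(?P<service>[^']+)' "
    r"on host '(?P<host>[^']+)' looks like it was orphaned "
    r"\(results never came back; last_check=(?P<last>\d+); next_check=(?P<next>\d+)\)"
)


def orphan_summary(lines):
    """Count orphan warnings per host and record the largest last_check age in seconds."""
    counts = Counter()
    oldest = {}
    for line in lines:
        m = ORPHAN_RE.match(line)
        if not m:
            continue  # not an orphaned-service warning
        host = m.group("host")
        age = int(m.group("logged")) - int(m.group("last"))
        counts[host] += 1
        oldest[host] = max(oldest.get(host, 0), age)
    return counts, oldest


sample = [
    "[1426688576] Warning: The check of service 'Port 13137 Status' on host 'uswb-idf-25.bose.com' looks like it was orphaned (results never came back; last_check=1426686135; next_check=1426687855). I'm scheduling an immediate check of the service...",
    "[1426688576] Warning: The check of service 'Port 13649 Status' on host 'uswb-idf-25.bose.com' looks like it was orphaned (results never came back; last_check=1426685398; next_check=1426687855). I'm scheduling an immediate check of the service...",
]

counts, oldest = orphan_summary(sample)
for host, n in counts.items():
    print(f"{host}: {n} orphaned checks, oldest last_check {oldest[host]} s before the warning")
```

In practice the list would be replaced by the lines of nagios.log, for example by iterating over an open file handle.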
-
jdalrymple
Re: host check orphaned
Why do you have the hostgroups commented out in your neb config? That is definitely causing some of the confusion:
Code: Select all
# sets a list of hostgroups which will go into seperate
# queues. Either specify a comma seperated list or use
# multiple lines.
#hostgroups=name1
#hostgroups=name2,name3
#hostgroups=gearman_a
#hostgroups=gearman_b
#hostgroups=gearman_c
Get rid of those comment marks on the gearmans you expect to get work allocated to them, then restart Nagios.
Re: host check orphaned
Which of the NEB configs? All 3 of them?
-
jdalrymple
Re: host check orphaned
Just the one on the host from which you're running gearman_top, the job server.
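A quick way to confirm which hostgroups are actually active in that file is to list its uncommented `hostgroups=` lines. This is only a sketch: the helper name `active_hostgroups` is made up, and the "enabled" example uses the hostgroup names mentioned earlier in the thread (gearman_dce1, gearman_dcn1) as an assumption about what the corrected file would contain.

```python
def active_hostgroups(config_text):
    """Return hostgroup names from uncommented `hostgroups=` lines."""
    groups = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments contribute nothing
        key, sep, value = line.partition("=")
        if sep and key.strip() == "hostgroups":
            # Both comma-separated lists and repeated lines are accepted.
            groups.extend(g.strip() for g in value.split(",") if g.strip())
    return groups


# Lines as they appear in the config quoted above: everything is commented out.
commented = """\
#hostgroups=name1
#hostgroups=name2,name3
#hostgroups=gearman_a
"""

# HYPOTHETICAL corrected lines, using hostgroup names mentioned in this thread.
enabled = """\
hostgroups=gearman_dce1
hostgroups=gearman_dcn1
"""

print(active_hostgroups(commented))
print(active_hostgroups(enabled))
```

An empty list for the job server's NEB config would be consistent with the hostgroup queues never receiving any jobs.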
Re: host check orphaned
Done.
Question: why not on the worker servers?