host check orphaned
For about a week I have been getting this error:
(host check orphaned, is the mod-gearman worker on queue 'host' running?)
I tried rebooting my server, but that did not help.
I am running Nagios 4. The gearman version is 1.4 because that is the only one supported by Nagios 4.
I am running RHEL 6, 64-bit. The servers are running in VMware. This was a manual installation.
The database server is a separate server.
Re: host check orphaned
Please run the following on the problem host and report the results:
Code: Select all
gearman_top
tail -n20 /var/log/gearmand.log
Re: host check orphaned
Code: Select all
Queue Name | Worker Available | Jobs Waiting | Jobs Running
-------------------------------------------------------------------------
check_results | 1 | 0 | 0
eventhandler | 6 | 0 | 0
host | 13 | 0 | 2
hostgroup_gearman_dce1 | 5 | 0 | 0
hostgroup_gearman_dcn1 | 5 | 0 | 0
service | 13 | 0 | 0
worker_gearmandce1 | 1 | 0 | 0
worker_gearmandcn1 | 1 | 0 | 0
worker_nagmonus1 | 1 | 0 | 0
worker_nagmonus2 | 1 | 0 | 0
-------------------------------------------------------------------------
root@nagmonus1:(03-10 11:32): /root
# tail -n20 /var/log/gearmand/gearmand.log
ERROR 2015-02-08 20:26:28.000000 [ 2 ] recv(Connection timed out) -> libgearman-server/io.cc:105
ERROR 2015-02-08 20:26:28.000000 [ 2 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
ERROR 2015-02-08 20:26:28.000000 [ 3 ] recv(Connection timed out) -> libgearman-server/io.cc:105
ERROR 2015-02-08 20:26:28.000000 [ 3 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
ERROR 2015-02-08 20:26:28.000000 [ 3 ] recv(Connection timed out) -> libgearman-server/io.cc:105
ERROR 2015-02-08 20:26:28.000000 [ 3 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
ERROR 2015-02-08 20:26:28.000000 [ 4 ] recv(Connection timed out) -> libgearman-server/io.cc:105
ERROR 2015-02-08 20:26:28.000000 [ 4 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
ERROR 2015-02-08 20:26:28.000000 [ 4 ] recv(Connection timed out) -> libgearman-server/io.cc:105
ERROR 2015-02-08 20:26:28.000000 [ 4 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
ERROR 2015-02-08 20:26:28.000000 [ 3 ] recv(Connection timed out) -> libgearman-server/io.cc:105
ERROR 2015-02-08 20:26:28.000000 [ 3 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
ERROR 2015-02-08 20:26:28.000000 [ 2 ] recv(Connection timed out) -> libgearman-server/io.cc:105
ERROR 2015-02-08 20:26:28.000000 [ 2 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
ERROR 2015-02-09 03:10:49.000000 [ 3 ] lost connection to client recv(EPIPE || ECONNRESET || EHOSTDOWN)(Connection reset by peer) -> libgearman-server/io.cc:100
ERROR 2015-02-09 03:10:49.000000 [ 3 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
ERROR 2015-02-09 14:48:56.000000 [ 4 ] lost connection to client recv(EPIPE || ECONNRESET || EHOSTDOWN)(Connection reset by peer) -> libgearman-server/io.cc:100
ERROR 2015-02-09 14:48:56.000000 [ 4 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
ERROR 2015-02-10 13:42:29.000000 [ 4 ] lost connection to client recv(EPIPE || ECONNRESET || EHOSTDOWN)(Connection reset by peer) -> libgearman-server/io.cc:100
ERROR 2015-02-10 13:42:29.000000 [ 4 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
root@nagmonus1:(03-10 11:32): /root
#
Re: host check orphaned
It's possible that the timeout on some of your host check plugins (depending on what you are using) is higher than the Nagios host check timeout.
You can try increasing the "host_check_timeout" value a bit, then restarting nagios, gearmand and the workers. Let us know if this resolves your issue.
Code: Select all
grep host_check_timeout /usr/local/nagios/etc/nagios.cfg
Be sure to check out our Knowledgebase for helpful articles and solutions!
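For anyone who wants to script the comparison described above, here is a minimal sketch. The function names `get_host_check_timeout` and `compare_timeouts` are made up for illustration, and the plugin timeout is simply passed in by hand (for example, whatever value you give your plugin's `-t` option). It also assumes that if the directive appears more than once in the file, the last one is the one that counts.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Nagios or mod_gearman.

# Print host_check_timeout from a Nagios main config file given as $1.
get_host_check_timeout() {
    # Commented-out lines do not match; if the directive is repeated,
    # print the last value (assumption: the last one wins).
    sed -n 's/^[[:space:]]*host_check_timeout[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$1" | tail -n 1
}

# Compare a plugin's own timeout (seconds, $2) with host_check_timeout in $1.
compare_timeouts() {
    cfg="$1"
    plugin_timeout="$2"
    nagios_timeout=$(get_host_check_timeout "$cfg")
    if [ -z "$nagios_timeout" ]; then
        echo "host_check_timeout not set in $cfg"
        return 2
    fi
    if [ "$plugin_timeout" -ge "$nagios_timeout" ]; then
        echo "WARN: plugin timeout ${plugin_timeout}s >= host_check_timeout ${nagios_timeout}s"
        return 1
    fi
    echo "OK: plugin timeout ${plugin_timeout}s < host_check_timeout ${nagios_timeout}s"
}
```

Usage would look like `compare_timeouts /usr/local/nagios/etc/nagios.cfg 10`, where 10 is the timeout your host check plugin is actually called with.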
Re: host check orphaned
This is the value:
# grep host_check_timeout /usr/local/nagios/etc/nagios.cfg
host_check_timeout=30
I tried increasing the value and it did not work.
Re: host check orphaned
Run the following commands and show us the output in code wraps:
Code: Select all
/usr/local/nagios/bin/nagios | head -2
/usr/local/nagios/bin/ndo2db | head -2
grep broker /usr/local/nagios/etc/nagios.cfg
rpm -qa | grep gearman
Re: host check orphaned
Code: Select all
# /usr/local/nagios/bin/nagios | head -2
Nagios Core 4.0.8
You have mail in /var/spool/mail/root
root@nagmonus1:(03-10 13:51): /usr/local/nagios/var
# /usr/local/nagios/bin/ndo2db | head -2
grep broker /usr/local/nagios/etc/nagios.cfg
NDO2DB 2.0.0
root@nagmonus1:(03-10 13:51): /usr/local/nagios/var
# grep broker /usr/local/nagios/etc/nagios.cfg
broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg
broker_module=/usr/lib64/mod_gearman/mod_gearman.o config=/etc/mod_gearman/mod_gearman_neb.conf
event_broker_options=-1
root@nagmonus1:(03-10 13:51): /usr/local/nagios/var
# rpm -qa | grep gearman
gearmand-0.25-1.x86_64
mod_gearman-1.4_nagios4-1.el6.x86_64
libgearman-1.1.8-2.el6.x86_64
gearmand-server-0.33-2.x86_64
gearmand-devel-0.25-1.x86_64
root@nagmonus1:(03-10 13:51): /usr/local/nagios/var
What are the implications of increasing host_check_timeout?
Why do we need to increase it, and what is the logic behind it? I am just trying to understand.
--------
I just increased the number all the way up to 290 and it is now working fine.
But my previous question still stands: what are the implications of increasing the number that high? What is this value for?
Edit: I take that back. It started again. it only worked for few
Re: host check orphaned
Regarding host_check_timeout
http://nagios.sourceforge.net/docs/3_0/configmain.html
This is the maximum number of seconds that Nagios will allow host checks to run. If checks exceed this limit, they are killed and a CRITICAL state is returned and the host will be assumed to be DOWN. A timeout error will also be logged.
There is often widespread confusion as to what this option really does. It is meant to be used as a last ditch mechanism to kill off plugins which are misbehaving and not exiting in a timely manner. It should be set to something high (like 60 seconds or more), so that each host check normally finishes executing within this time limit. If a host check runs longer than this limit, Nagios will kill it off thinking it is a runaway processes.
Re: host check orphaned
Thanks for the explanation.
Unfortunately, the problem came back.
Re: host check orphaned
Are the orphaned host checks only related to hosts that are not responding?
If you manually run the host check from the gearman server, do you get a timeout?
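One way to run a check by hand and see whether it times out is sketched below. It assumes GNU coreutils `timeout` is available on the box; `run_check_with_limit` is a made-up helper name, and the plugin path and address in the example comment are placeholders rather than values from this thread.

```shell
#!/bin/bash
# Hypothetical helper: run a check command with a time limit and report
# its exit status and how long it took.
# Usage: run_check_with_limit <limit_seconds> <command> [args...]
run_check_with_limit() {
    local limit="$1"; shift
    local start end rc
    start=$(date +%s)
    # GNU `timeout` exits with status 124 when it had to kill the command.
    timeout "$limit" "$@"
    rc=$?
    end=$(date +%s)
    if [ "$rc" -eq 124 ]; then
        echo "TIMED OUT after ${limit}s: $*"
    else
        echo "exit=$rc elapsed=$((end - start))s: $*"
    fi
    return "$rc"
}

# Example (placeholder plugin path and address):
# run_check_with_limit 30 /usr/local/nagios/libexec/check_ping -H 192.0.2.10 -w 3000.0,80% -c 5000.0,100% -p 5
```

Using the same limit as host_check_timeout makes it easy to see whether the plugin itself is slow, or whether it returns quickly and the result is getting lost somewhere between the worker and Nagios.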
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.