host check orphaned

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
bosecorp
Posts: 929
Joined: Thu Jun 26, 2014 1:00 pm

host check orphaned

Post by bosecorp »

For about a week now I have been getting this error:

(host check orphaned, is the mod-gearman worker on queue 'host' running?)

I tried rebooting my server, but that still doesn't help.

I am running Nagios 4. The Gearman version is 1.4 because that is the only one supported by Nagios 4.

I am running RHEL 6, 64-bit. The servers are running in VMware. This was a manual installation.

The database server is a separate server.
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: host check orphaned

Post by jolson »

Please run the following on the problem host and report the results:

Code: Select all

gearman_top
tail -n20 /var/log/gearmand.log
bosecorp
Posts: 929
Joined: Thu Jun 26, 2014 1:00 pm

Re: host check orphaned

Post by bosecorp »

Code: Select all

Queue Name             | Worker Available | Jobs Waiting | Jobs Running
-------------------------------------------------------------------------
 check_results          |               1  |           0  |           0
 eventhandler           |               6  |           0  |           0
 host                   |              13  |           0  |           2
 hostgroup_gearman_dce1 |               5  |           0  |           0
 hostgroup_gearman_dcn1 |               5  |           0  |           0
 service                |              13  |           0  |           0
 worker_gearmandce1     |               1  |           0  |           0
 worker_gearmandcn1     |               1  |           0  |           0
 worker_nagmonus1       |               1  |           0  |           0
 worker_nagmonus2       |               1  |           0  |           0
-------------------------------------------------------------------------

root@nagmonus1:(03-10 11:32): /root
# tail -n20 /var/log/gearmand/gearmand.log 
  ERROR 2015-02-08 20:26:28.000000 [     2 ] recv(Connection timed out) -> libgearman-server/io.cc:105
  ERROR 2015-02-08 20:26:28.000000 [     2 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
  ERROR 2015-02-08 20:26:28.000000 [     3 ] recv(Connection timed out) -> libgearman-server/io.cc:105
  ERROR 2015-02-08 20:26:28.000000 [     3 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
  ERROR 2015-02-08 20:26:28.000000 [     3 ] recv(Connection timed out) -> libgearman-server/io.cc:105
  ERROR 2015-02-08 20:26:28.000000 [     3 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
  ERROR 2015-02-08 20:26:28.000000 [     4 ] recv(Connection timed out) -> libgearman-server/io.cc:105
  ERROR 2015-02-08 20:26:28.000000 [     4 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
  ERROR 2015-02-08 20:26:28.000000 [     4 ] recv(Connection timed out) -> libgearman-server/io.cc:105
  ERROR 2015-02-08 20:26:28.000000 [     4 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
  ERROR 2015-02-08 20:26:28.000000 [     3 ] recv(Connection timed out) -> libgearman-server/io.cc:105
  ERROR 2015-02-08 20:26:28.000000 [     3 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
  ERROR 2015-02-08 20:26:28.000000 [     2 ] recv(Connection timed out) -> libgearman-server/io.cc:105
  ERROR 2015-02-08 20:26:28.000000 [     2 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
  ERROR 2015-02-09 03:10:49.000000 [     3 ] lost connection to client recv(EPIPE || ECONNRESET || EHOSTDOWN)(Connection reset by peer) -> libgearman-server/io.cc:100
  ERROR 2015-02-09 03:10:49.000000 [     3 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
  ERROR 2015-02-09 14:48:56.000000 [     4 ] lost connection to client recv(EPIPE || ECONNRESET || EHOSTDOWN)(Connection reset by peer) -> libgearman-server/io.cc:100
  ERROR 2015-02-09 14:48:56.000000 [     4 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
  ERROR 2015-02-10 13:42:29.000000 [     4 ] lost connection to client recv(EPIPE || ECONNRESET || EHOSTDOWN)(Connection reset by peer) -> libgearman-server/io.cc:100
  ERROR 2015-02-10 13:42:29.000000 [     4 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
root@nagmonus1:(03-10 11:32): /root
#
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: host check orphaned

Post by lmiltchev »

It's possible that the timeout on some of your host check plugins (depending on what you are using) is higher than the Nagios host check timeout.

Code: Select all

grep host_check_timeout /usr/local/nagios/etc/nagios.cfg
You can try increasing the "host_check_timeout" value a bit, then restarting Nagios, gearmand, and the workers. Let us know if this resolves your issue.
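To make the comparison concrete, here is a minimal sketch that reads host_check_timeout from nagios.cfg and compares it with a plugin's own timeout. The function names are made up for illustration, and the plugin timeout is whatever value your check command passes to the plugin (often via -t); neither is part of Nagios or mod_gearman.

```shell
#!/bin/sh
# Read host_check_timeout (in seconds) from a nagios.cfg-style file.
# Commented-out lines are ignored; the last matching line wins.
host_check_timeout() {
    sed -n 's/^[[:space:]]*host_check_timeout[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1
}

# Compare a plugin's own timeout with host_check_timeout.
# $1 = plugin timeout in seconds (hypothetical input, taken from your check command)
# $2 = path to nagios.cfg
plugin_timeout_fits() {
    plugin_timeout=$1
    cfg_file=$2
    limit=$(host_check_timeout "$cfg_file")
    if [ -z "$limit" ]; then
        echo "host_check_timeout not found in $cfg_file"
        return 2
    fi
    if [ "$plugin_timeout" -ge "$limit" ]; then
        echo "plugin timeout ${plugin_timeout}s >= host_check_timeout ${limit}s"
        return 1
    fi
    echo "plugin timeout ${plugin_timeout}s < host_check_timeout ${limit}s"
}
```

For example, `plugin_timeout_fits 60 /usr/local/nagios/etc/nagios.cfg` returns non-zero when the plugin is allowed to run for 60 seconds but the configured host check timeout is lower than that.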
bosecorp
Posts: 929
Joined: Thu Jun 26, 2014 1:00 pm

Re: host check orphaned

Post by bosecorp »

This is the value:

# grep host_check_timeout /usr/local/nagios/etc/nagios.cfg
host_check_timeout=30

I tried increasing the value, but it did not work.
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: host check orphaned

Post by lmiltchev »

Run the following commands and show us the output in code wraps:

Code: Select all

/usr/local/nagios/bin/nagios | head -2
/usr/local/nagios/bin/ndo2db | head -2
grep broker /usr/local/nagios/etc/nagios.cfg
rpm -qa | grep gearman
bosecorp
Posts: 929
Joined: Thu Jun 26, 2014 1:00 pm

Re: host check orphaned

Post by bosecorp »

Code: Select all

# /usr/local/nagios/bin/nagios | head -2

Nagios Core 4.0.8
You have mail in /var/spool/mail/root
root@nagmonus1:(03-10 13:51): /usr/local/nagios/var
# /usr/local/nagios/bin/ndo2db | head -2
grep broker /usr/local/nagios/etc/nagios.cfg

NDO2DB 2.0.0
root@nagmonus1:(03-10 13:51): /usr/local/nagios/var
# grep broker /usr/local/nagios/etc/nagios.cfg
broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg
broker_module=/usr/lib64/mod_gearman/mod_gearman.o config=/etc/mod_gearman/mod_gearman_neb.conf
event_broker_options=-1
root@nagmonus1:(03-10 13:51): /usr/local/nagios/var
# rpm -qa | grep gearman
gearmand-0.25-1.x86_64
mod_gearman-1.4_nagios4-1.el6.x86_64
libgearman-1.1.8-2.el6.x86_64
gearmand-server-0.33-2.x86_64
gearmand-devel-0.25-1.x86_64
root@nagmonus1:(03-10 13:51): /usr/local/nagios/var

What are the implications of increasing host_check_timeout? Why do we need to increase it, and what is the logic behind it? I am just trying to understand.

--------

I just increased the value all the way up to 290 and it is now working fine.

But my previous question still stands: what are the implications of increasing the value that high? What is this value for?

Edit: I take that back. It started again. It only worked for few
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: host check orphaned

Post by jolson »

Regarding host_check_timeout
This is the maximum number of seconds that Nagios will allow host checks to run. If checks exceed this limit, they are killed and a CRITICAL state is returned and the host will be assumed to be DOWN. A timeout error will also be logged.

There is often widespread confusion as to what this option really does. It is meant to be used as a last-ditch mechanism to kill off plugins which are misbehaving and not exiting in a timely manner. It should be set to something high (like 60 seconds or more), so that each host check normally finishes executing within this time limit. If a host check runs longer than this limit, Nagios will kill it off, thinking it is a runaway process.
http://nagios.sourceforge.net/docs/3_0/configmain.html
bosecorp
Posts: 929
Joined: Thu Jun 26, 2014 1:00 pm

Re: host check orphaned

Post by bosecorp »

Thanks for the explanation.

Unfortunately, the problem came back.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: host check orphaned

Post by abrist »

Are the orphaned host checks only related to hosts that are not responding?
If you manually run the host check from the gearman server, do you get a timeout?
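One way to test this by hand is to run the plugin under a time limit and see whether it finishes. The following is only a sketch: run_with_limit is a made-up helper name, and it relies on the coreutils `timeout` command, which exits with status 124 when the limit is reached.

```shell
#!/bin/sh
# Run a command under a time limit and report whether it finished in time.
# $1 = limit in seconds, remaining arguments = the command to run.
# Plugin output is passed through; a summary line is printed last.
run_with_limit() {
    limit=$1
    shift
    status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${limit}s"
    else
        echo "finished with exit status $status"
    fi
    return "$status"
}
```

A possible invocation, assuming the plugins live in /usr/local/nagios/libexec and using a placeholder address, would be `run_with_limit 30 /usr/local/nagios/libexec/check_ping -H 192.0.2.10 -w 3000.0,80% -c 5000.0,100% -p 5`. Substitute the actual check command and arguments defined for the affected host.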
Former Nagios employee