Remote Worker not Working - Mod_Gearman
Posted: Thu Jan 08, 2015 12:24 pm
Hello everyone, I'm trying to implement Mod_Gearman, but one particular problem is blocking me.
I've followed this guide: http://assets.nagios.com/downloads/nagi ... ios_XI.pdf
Mod_Gearman is running fine and all active checks are being intercepted by the gearman NEB module, with a single exception: only local workers are OK.
I want to use remote workers for monitoring hosts behind a firewall.
I've installed Mod_Gearman just as the guide describes and configured it to send the jobs of a specific hostgroup to a remote worker:
nagios.cfg
# Gearman_MOD
broker_module=/usr/lib64/mod_gearman/mod_gearman.o key=should_be_changed server=10.200.0.113 eventhandler=yes hosts=yes services=yes hostgroups=LB2
Local worker configuration (Nagios XI Server):
server=10.200.0.113:4730
eventhandler=yes
services=yes
hosts=yes
NEB Configuration (Nagios XI Server):
server=10.200.0.113:4730
eventhandler=yes
services=yes
hosts=yes
Remote Worker:
server=10.200.0.113:4730
eventhandler=no
services=no
hosts=no
hostgroups=LB2
After running netstat, I discovered that the remote worker process is unable to connect to the Mod_Gearman server (the connections are stuck in SYN_SENT):
[root@nagios-worker01 ~]# netstat -na | grep 10.200.0.113
tcp 0 1 129.0.0.95:60167 10.200.0.113:4730 SYN_SENT
tcp 0 1 129.0.0.95:60182 10.200.0.113:4730 SYN_SENT
tcp 0 1 129.0.0.95:60180 10.200.0.113:4730 SYN_SENT
tcp 0 1 129.0.0.95:60178 10.200.0.113:4730 SYN_SENT
tcp 0 1 129.0.0.95:60166 10.200.0.113:4730 SYN_SENT
tcp 0 1 129.0.0.95:60168 10.200.0.113:4730 SYN_SENT
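For longer captures, the same netstat output can be summarised by connection state. This is only a minimal sketch in Python: the function name `count_states` is hypothetical, and it assumes the six-column layout shown above (protocol, Recv-Q, Send-Q, local address, foreign address, state).

```python
from collections import Counter

def count_states(netstat_output, remote):
    """Count TCP states for lines whose foreign address equals `remote`."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: proto, recv-q, send-q, local, foreign, state
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[4] == remote:
            counts[fields[5]] += 1
    return counts

# The lines pasted above, used as sample input.
sample = """\
tcp 0 1 129.0.0.95:60167 10.200.0.113:4730 SYN_SENT
tcp 0 1 129.0.0.95:60182 10.200.0.113:4730 SYN_SENT
tcp 0 1 129.0.0.95:60180 10.200.0.113:4730 SYN_SENT
tcp 0 1 129.0.0.95:60178 10.200.0.113:4730 SYN_SENT
tcp 0 1 129.0.0.95:60166 10.200.0.113:4730 SYN_SENT
tcp 0 1 129.0.0.95:60168 10.200.0.113:4730 SYN_SENT
"""

print(dict(count_states(sample, "10.200.0.113:4730")))
```

Every connection to port 4730 is in SYN_SENT and none is ESTABLISHED, which matches what the worker is showing.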
Ping from the remote host:
[root@nagios-worker01 ~]# ping 10.200.0.113
PING 10.200.0.113 (10.200.0.113) 56(84) bytes of data.
64 bytes from 10.200.0.113: icmp_seq=1 ttl=62 time=1.49 ms
64 bytes from 10.200.0.113: icmp_seq=2 ttl=62 time=1.87 ms
Telnet from the remote host to ports 4730 (gearman) and 22 (just to prove that the problem is specific to gearman):
[root@nagios-worker01 ~]# telnet 10.200.0.113 4730
Trying 10.200.0.113...
telnet: connect to address 10.200.0.113: No route to host
[root@nagios-worker01 ~]# telnet 10.200.0.113 22
Trying 10.200.0.113...
Connected to 10.200.0.113.
Escape character is '^]'.
SSH-2.0-OpenSSH_5.3
Telnet on the Nagios XI server itself works fine:
[root@LB2-NagiosXI ~]# telnet localhost 4730
Trying ::1...
Connected to localhost.
Escape character is '^]'.
To rule out a network problem, I also tested telnet from a host on the same network:
# hostname -i
10.200.0.114
# telnet 10.200.0.113 4730
Trying 10.200.0.113...
telnet: connect to address 10.200.0.113: No route to host
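The telnet checks above can also be scripted so that the exact error code is captured for each port. This is a minimal sketch in Python, not part of Mod_Gearman: the helper name `probe` and the 3-second timeout are assumptions. On Linux, the "No route to host" message that telnet prints corresponds to the errno `EHOSTUNREACH`.

```python
import errno
import socket

def probe(host, port, timeout=3.0):
    """Attempt a TCP connect and return (ok, detail)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.timeout:
        # No answer at all within the timeout.
        return False, "timed out"
    except OSError as exc:
        # Report the symbolic errno name next to the system message.
        name = errno.errorcode.get(exc.errno, "unknown")
        return False, f"{name}: {exc.strerror}"

# Example usage against the addresses from this post:
#   for port in (4730, 22):
#       print(port, probe("10.200.0.113", port))
```

Running it from both the remote worker and the 10.200.0.114 host would show whether port 4730 fails with the same error from each side.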
Can someone help me?
Machine Configs:
CentOS 6.6 64-bit (VMware image), Nagios XI 2014 R2.0
Thanks in advance