Problem mod_gearman distributed nodes

Posted: Thu Mar 21, 2013 7:42 am
by quental
2 machines with CentOS 6, x86_64
Manual installation.

Hi,
I have a problem with Nagios and mod_gearman. I installed it on two nodes, but when I look at the status of the workers, I only see one node:

> gearman_top

Code: Select all

2013-03-21 13:33:56  -  localhost:4730   -  v0.25

 Queue Name                    | Worker Available | Jobs Waiting | Jobs Running
--------------------------------------------------------------------------------
 check_results                 |               1  |           0  |           0
 eventhandler                  |              11  |           0  |           0
 host                          |              11  |           0  |           0
 service                       |              11  |           0  |           6
 worker_nagiossp01.sanit.dom   |               1  |           0  |           0
--------------------------------------------------------------------------------
I do not see the second worker, which is on another machine (nagiossp02.sanit.dom).
In the configuration file, I have set the server value to point to the master node:
server=10.4.235.101:4730

Can you help me figure out what is happening?

Thanks.

Re: Problem mod_gearman distributed nodes

Posted: Thu Mar 21, 2013 9:20 am
by slansing
Is the worker running on the remote server?

Code: Select all

service mod_gearman_worker status
Have you made sure that the keyfile line has been changed on the worker? By default it will not be compatible with the Gearman Server.

Please see the Security section of the following document:

http://assets.nagios.com/downloads/nagi ... ios_XI.pdf
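
A quick way to check the keyfile point is to compare a checksum of the key on both nodes. This is only a minimal sketch: the helper name `key_fingerprint` is made up for illustration, and the path is the one that appears in the NEB config output later in this thread, assumed here to be the same on the worker.

```shell
# Hypothetical helper: print the SHA-256 checksum of a key file.
key_fingerprint() {
    sha256sum "$1" | awk '{ print $1 }'
}

# Run on both the gearmand host and the worker, then compare the two values.
# The path is an assumption; use whatever keyfile= points to in your config.
if [ -r /etc/mod_gearman/gearman_key.txt ]; then
    key_fingerprint /etc/mod_gearman/gearman_key.txt
fi
```

If the two values differ, the key on one of the nodes needs to be replaced with a copy of the other.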

Re: Problem mod_gearman distributed nodes

Posted: Thu Mar 21, 2013 9:24 am
by mguthrie
There should be logs you can look at for both the gearman server and the workers. I would start by increasing the logging output on both ends and then tail both logs to see what's going wrong on the second worker machine. If I remember correctly, there should be a gearman-specific log or directory somewhere in /var/log. You can increase the logging output by editing the settings in the .conf files under /etc/mod_gearman.

Re: Problem mod_gearman distributed nodes

Posted: Thu Mar 21, 2013 10:38 am
by quental
Hi,

I ran:

Code: Select all

service mod_gearman_worker status
and the process is running OK.

The keyfile exists on both machines and has permissions set.

In the log files in /var/log/mod_gearman/ there isn't anything significant. I changed the trace level to 3 and still nothing.

I attached the logs of the slave node and the master node.

From the worker node, if I run:

Code: Select all

 netstat -anp | grep 4730
tcp        0      1 10.4.235.102:47932          10.4.235.101:4730           SYN_SENT    7074/mod_gearman_wo
tcp        0      1 10.4.235.102:47933          10.4.235.101:4730           SYN_SENT    7073/mod_gearman_wo
tcp        0      1 10.4.235.102:47935          10.4.235.101:4730           SYN_SENT    7077/mod_gearman_wo
tcp        0      1 10.4.235.102:47936          10.4.235.101:4730           SYN_SENT    7076/mod_gearman_wo
tcp        0      1 10.4.235.102:47934          10.4.235.101:4730           SYN_SENT    7078/mod_gearman_wo
tcp        0      1 10.4.235.102:47937          10.4.235.101:4730           SYN_SENT    7075/mod_gearman_wo
Is this OK?

Any suggestions?

Thanks
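
On the netstat output above: a socket in SYN_SENT has sent its connection request and has not yet received a reply. Seeing every worker connection sit in that state usually means the packets are being dropped on the way to 10.4.235.101:4730, for example by a firewall. Below is a small sketch to count such sockets, assuming `netstat -anp` style columns (foreign address in column 5, state in column 6); the function name is made up.

```shell
# Hypothetical helper: count TCP sockets to a given remote port that are
# stuck in SYN_SENT. Reads `netstat -anp` style lines on stdin.
count_syn_sent() {
    port="$1"
    awk -v port="$port" '
        # $5 = foreign address (ip:port), $6 = connection state
        $6 == "SYN_SENT" && $5 ~ (":" port "$") { n++ }
        END { print n + 0 }
    '
}

# Usage on the worker node:
#   netstat -anp | count_syn_sent 4730
```

A count that stays above zero across several runs suggests the connections never complete.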

Re: Problem mod_gearman distributed nodes

Posted: Thu Mar 21, 2013 11:15 am
by lmiltchev
After you set up the key file, did you run:

Code: Select all

service nagios restart
service gearmand restart
service mod_gearman_worker restart

Re: Problem mod_gearman distributed nodes

Posted: Thu Mar 21, 2013 11:40 am
by quental
Hi,
On the master node:

Code: Select all

service nagios start
Starting nagios:[2013-03-21 17:32:43][19459][TRACE] parse_args_line(logfile=/var/log/mod_gearman/mod_gearman_neb.log, 1)
[2013-03-21 17:32:43][19459][TRACE] parse_args_line(server=localhost:4730, 1)
[2013-03-21 17:32:43][19459][TRACE] parse_args_line(eventhandler=yes, 1)
[2013-03-21 17:32:43][19459][TRACE] parse_args_line(services=yes, 1)
[2013-03-21 17:32:43][19459][TRACE] parse_args_line(hosts=no, 1)
[2013-03-21 17:32:43][19459][TRACE] parse_args_line(do_hostchecks=yes, 1)
[2013-03-21 17:32:43][19459][TRACE] parse_args_line(encryption=yes, 1)
[2013-03-21 17:32:43][19459][TRACE] parse_args_line(keyfile=/etc/mod_gearman/gearman_key.txt, 1)
[2013-03-21 17:32:43][19459][TRACE] parse_args_line(use_uniq_jobs=on, 1)
[2013-03-21 17:32:43][19459][TRACE] parse_args_line(localhostgroups=, 1)
[2013-03-21 17:32:43][19459][TRACE] parse_args_line(localservicegroups=, 1)
[2013-03-21 17:32:43][19459][TRACE] parse_args_line(result_workers=1, 1)
[2013-03-21 17:32:43][19459][TRACE] parse_args_line(perfdata=no, 1)
[2013-03-21 17:32:43][19459][TRACE] parse_args_line(perfdata_mode=1, 1)
[2013-03-21 17:32:43][19459][TRACE] parse_args_line(orphan_host_checks=yes, 1)
[2013-03-21 17:32:43][19459][TRACE] parse_args_line(orphan_service_checks=yes, 1)
[2013-03-21 17:32:43][19459][TRACE] parse_args_line(accept_clear_results=no, 1)
[2013-03-21 17:32:43][19459][TRACE] parse_args_line(eventhandler=no, 0)
 done.

Code: Select all

service gearmand start
Starting gearmand:                                         [  OK  ]

Code: Select all

service mod_gearman_worker start
Starting mod_gearman_worker...OK
On the slave node:

Code: Select all

service mod_gearman_worker start
Starting mod_gearman_worker...OK
Then, if I run this on the master node:

Code: Select all

gearman_top

Code: Select all

2013-03-21 17:33:56  -  localhost:4730   -  v0.25

Queue Name                    | Worker Available | Jobs Waiting | Jobs Running
--------------------------------------------------------------------------------
check_results                 |               1  |           0  |           0
eventhandler                  |              11  |           0  |           0
host                          |              11  |           0  |           0
service                       |              11  |           0  |           6
worker_nagiossp01.sanit.dom   |               1  |           0  |           0
--------------------------------------------------------------------------------
The slave node does not appear.

Re: Problem mod_gearman distributed nodes

Posted: Thu Mar 21, 2013 11:44 am
by lmiltchev
Also, can you run the following command on the worker, and show the output:

Code: Select all

iptables -L -n | grep 4730

Re: Problem mod_gearman distributed nodes

Posted: Thu Mar 21, 2013 1:44 pm
by quental
Hi all,
Problem solved!

The problem was that the port was being filtered on the worker node.

I ran nmap and got:

Code: Select all

nmap -p 4730 10.4.235.101

Starting Nmap 5.21 ( http://nmap.org ) at 2013-03-21 18:23 CET
Nmap scan report for nagiossp01.sanitas.dom (10.4.235.101)
Host is up (0.00064s latency).
PORT     STATE    SERVICE
4730/tcp filtered unknown
MAC Address: 00:50:56:83:41:B2 (VMware)

Nmap done: 1 IP address (1 host up) scanned in 0.11 seconds
The firewall was active on the worker node.

I ran:
system-config-firewall-tui
and disabled the internal firewall.

Thank you all for your time.
You can close the case.

:D
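
Disabling the whole firewall works, but a narrower option is to open only the gearmand port on the host that accepts connections on 4730 (10.4.235.101 in the nmap output above). Here is a sketch for CentOS 6 with the stock iptables service, to be run as root. The function name is made up, and restricting the rule to one source address is an optional assumption, not something from this thread.

```shell
# Hypothetical helper: allow inbound TCP to the gearmand port from one worker.
# Usage (as root, on the gearmand host): open_gearman_port 10.4.235.102
open_gearman_port() {
    worker_ip="$1"
    # Insert at the top of INPUT so it is evaluated before any REJECT rule.
    iptables -I INPUT -p tcp -s "$worker_ip" --dport 4730 -j ACCEPT &&
    # Persist the running rules to /etc/sysconfig/iptables (CentOS 6).
    service iptables save
}
```

Afterwards, `nmap -p 4730 10.4.235.101` from the worker should report the port as open rather than filtered, provided gearmand is listening.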

Re: Problem mod_gearman distributed nodes

Posted: Thu Mar 21, 2013 2:01 pm
by lmiltchev
Great! I am glad it works. :D I am locking this topic.