Mod Gearman Installation Issue

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
User avatar
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent
Contact:

Re: Mod Gearman Installation Issue

Post by WillemDH »

Hello,

So I was able to add a hostgroup and servicegroup to the gearman config files. When I do a gearman_top, I can see the configured hostgroup and servicegroup, but I never see any jobs waiting or jobs running. See screenshot.
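For reference, the counters gearman_top displays come from gearmand's admin "status" command: one tab-separated line per queue with the queue name, total jobs, running jobs, and available workers, terminated by a lone ".". A minimal sketch that parses such a reply -- the sample text below is made up for illustration, not from my setup:

```python
# Parse the reply of gearmand's admin "status" command (the data gearman_top
# displays). The sample below is hypothetical output, not a real capture.
sample = """\
hostgroup_all_srv_gearman\t0\t0\t1
servicegroup_all_gearman_services\t0\t0\t1
host\t0\t0\t0
service\t0\t0\t0
.
"""

def parse_status(text):
    """Return {queue_name: (jobs_waiting, jobs_running, workers)}."""
    queues = {}
    for line in text.splitlines():
        if line == ".":  # end-of-response marker
            break
        name, total, running, workers = line.split("\t")
        # 'total' counts queued + running jobs, so waiting = total - running
        queues[name] = (int(total) - int(running), int(running), int(workers))
    return queues

for name, (waiting, running, workers) in parse_status(sample).items():
    print(f"{name}: waiting={waiting} running={running} workers={workers}")
```

A queue with workers attached but waiting/running stuck at zero, like in my screenshot, means the workers are connected but no jobs are ever being submitted for that queue.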

Extract of mod_gearman_neb.conf on the Nagios XI Production server:

Code:

# defines if the module should distribute execution of
# eventhandlers.
eventhandler=no


# defines if the module should distribute execution of
# service checks.
services=yes


# defines if the module should distribute execution of
# host checks.
hosts=yes


# sets a list of hostgroups which will go into seperate
# queues. Either specify a comma seperated list or use
# multiple lines.
hostgroups=all_srv_gearman
#hostgroups=name2,name3


# sets a list of servicegroups which will go into seperate
# queues.
servicegroups=all_gearman_services

Extract of /etc/mod_gearman/mod_gearman_worker.conf on the worker node:

Code:

# defines if the module should distribute execution of
# eventhandlers.
eventhandler=no


# defines if the module should distribute execution of
# service checks.
services=yes


# defines if the module should distribute execution of
# host checks.
hosts=yes


# sets a list of hostgroups which will go into seperate
# queues. Either specify a comma seperated list or use
# multiple lines.
hostgroups=all_srv_gearman
#hostgroups=name2,name3


# sets a list of servicegroups which will go into seperate
# queues.
servicegroups=all_gearman_services
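
For completeness, my understanding of how the routing is supposed to work: with hostgroups/servicegroups set in the NEB config, checks for those groups go into queues named hostgroup_<name> and servicegroup_<name>, and the worker only picks them up if its config lists the same group names and points at the right gearmand. The lines I would expect to matter on the worker side (the server address below is a placeholder, not my real one):

```
# /etc/mod_gearman/mod_gearman_worker.conf -- assumed relevant lines
# gearmand job server the worker connects to (placeholder address)
server=10.0.0.1:4730
# consume the same group queues the NEB module feeds
hostgroups=all_srv_gearman
servicegroups=all_gearman_services
```

If the group names differ on either side, even by case, the worker registers for queues that never get jobs.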

The hostgroup contains four Windows hosts. I edited the NSClient++ ini file and added the IP address of the worker node.

What am I doing wrong? I'd like to see some jobs queued or running on the worker node when I schedule an immediate check for all services on some of the hosts in the hostgroup all_srv_gearman.

I tried setting debug=2, but /var/log/mod_gearman/mod_gearman_neb.log is not even being created, it seems. On the worker node, the /var/log/mod_gearman/mod_gearman_worker.log file does receive debug and trace info, but I'm not sure what it means.

Code:

[2015-01-09 11:58:04][6547][TRACE] idle_sighandler(14)
[2015-01-09 11:58:04][6547][TRACE] clean_worker_exit(0)
[2015-01-09 11:58:04][6547][TRACE] cleaning worker
[2015-01-09 11:58:04][6547][TRACE] cleaning client
[2015-01-09 11:58:04][6552][TRACE] idle_sighandler(14)
[2015-01-09 11:58:04][6552][TRACE] clean_worker_exit(0)
[2015-01-09 11:58:04][6552][TRACE] cleaning worker
[2015-01-09 11:58:04][6552][TRACE] cleaning client
[2015-01-09 11:58:04][6550][TRACE] idle_sighandler(14)
[2015-01-09 11:58:04][6550][TRACE] clean_worker_exit(0)
[2015-01-09 11:58:04][6550][TRACE] cleaning worker
[2015-01-09 11:58:04][6550][TRACE] cleaning client
[2015-01-09 11:58:04][6548][TRACE] idle_sighandler(14)
[2015-01-09 11:58:04][6548][TRACE] clean_worker_exit(0)
[2015-01-09 11:58:04][6548][TRACE] cleaning worker
[2015-01-09 11:58:04][6548][TRACE] cleaning client
[2015-01-09 11:58:04][6551][TRACE] idle_sighandler(14)
[2015-01-09 11:58:04][6551][TRACE] clean_worker_exit(0)
[2015-01-09 11:58:04][6551][TRACE] cleaning worker
[2015-01-09 11:58:04][6551][TRACE] cleaning client
[2015-01-09 11:58:04][6549][TRACE] idle_sighandler(14)
[2015-01-09 11:58:04][6549][TRACE] clean_worker_exit(0)
[2015-01-09 11:58:04][6549][TRACE] cleaning worker
[2015-01-09 11:58:04][6549][TRACE] cleaning client
[2015-01-09 11:58:05][5748][TRACE] waitpid() worker exited with: 0
[2015-01-09 11:58:05][5748][TRACE] waitpid() worker exited with: 0
[2015-01-09 11:58:05][5748][TRACE] waitpid() worker exited with: 0
[2015-01-09 11:58:05][5748][TRACE] waitpid() worker exited with: 0
[2015-01-09 11:58:05][5748][TRACE] waitpid() worker exited with: 0
[2015-01-09 11:58:05][5748][TRACE] waitpid() worker exited with: 0
[2015-01-09 11:58:05][5748][TRACE] make_new_child(2)
[2015-01-09 11:58:05][5748][TRACE] forking status worker
[2015-01-09 11:58:05][5748][TRACE] make_new_child(0)
[2015-01-09 11:58:05][5748][TRACE] forking worker
[2015-01-09 11:58:05][5748][TRACE] get_next_shm_index()
[2015-01-09 11:58:05][5748][TRACE] get_next_shm_index() -> 4
[2015-01-09 11:58:05][6554][DEBUG] child started with pid: 6554
[2015-01-09 11:58:05][6554][TRACE] status worker client started
[2015-01-09 11:58:05][6554][TRACE] set_worker()
[2015-01-09 11:58:05][5748][TRACE] make_new_child(0)
[2015-01-09 11:58:05][5748][TRACE] forking worker
[2015-01-09 11:58:05][5748][TRACE] get_next_shm_index()
[2015-01-09 11:58:05][5748][TRACE] get_next_shm_index() -> 5
[2015-01-09 11:58:05][6554][TRACE] create_client()
[2015-01-09 11:58:05][6554][TRACE] create_client_dup()
[2015-01-09 11:58:05][5748][TRACE] make_new_child(0)
[2015-01-09 11:58:05][5748][TRACE] forking worker
[2015-01-09 11:58:05][5748][TRACE] get_next_shm_index()
[2015-01-09 11:58:05][5748][TRACE] get_next_shm_index() -> 6
[2015-01-09 11:58:05][5748][TRACE] make_new_child(0)
[2015-01-09 11:58:05][5748][TRACE] forking worker
[2015-01-09 11:58:05][5748][TRACE] get_next_shm_index()
[2015-01-09 11:58:05][5748][TRACE] get_next_shm_index() -> 7
[2015-01-09 11:58:05][5748][TRACE] make_new_child(0)
[2015-01-09 11:58:05][5748][TRACE] forking worker
[2015-01-09 11:58:05][5748][TRACE] get_next_shm_index()
[2015-01-09 11:58:05][5748][TRACE] get_next_shm_index() -> 8
[2015-01-09 11:58:05][6559][DEBUG] child started with pid: 6559
[2015-01-09 11:58:05][6559][TRACE] job worker client started
[2015-01-09 11:58:05][6559][TRACE] set_worker()
[2015-01-09 11:58:05][6559][TRACE] create_client()
[2015-01-09 11:58:05][6559][TRACE] create_client_dup()
[2015-01-09 11:58:05][6557][DEBUG] child started with pid: 6557
[2015-01-09 11:58:05][6557][TRACE] job worker client started
[2015-01-09 11:58:05][6557][TRACE] set_worker()
[2015-01-09 11:58:05][6558][DEBUG] child started with pid: 6558
[2015-01-09 11:58:05][6556][DEBUG] child started with pid: 6556
[2015-01-09 11:58:05][6556][TRACE] job worker client started
[2015-01-09 11:58:05][6556][TRACE] set_worker()
[2015-01-09 11:58:05][6557][TRACE] create_client()
[2015-01-09 11:58:05][6557][TRACE] create_client_dup()
[2015-01-09 11:58:05][6556][TRACE] create_client()
[2015-01-09 11:58:05][6558][TRACE] job worker client started
[2015-01-09 11:58:05][6556][TRACE] create_client_dup()
[2015-01-09 11:58:05][6558][TRACE] set_worker()
[2015-01-09 11:58:05][6558][TRACE] create_client()
[2015-01-09 11:58:05][6558][TRACE] create_client_dup()
[2015-01-09 11:58:05][6555][DEBUG] child started with pid: 6555
[2015-01-09 11:58:05][6555][TRACE] job worker client started
[2015-01-09 11:58:05][6555][TRACE] set_worker()
[2015-01-09 11:58:05][6555][TRACE] create_client()
[2015-01-09 11:58:05][6555][TRACE] create_client_dup()
[... the same fork/idle/exit cycle repeats every ~30 seconds for the rest of the log: each worker gets idle_sighandler(14), logs clean_worker_exit(0), is reaped by the parent via waitpid(), and a fresh set of status/job workers is forked. No other message types appear ...]
[2015-01-09 12:01:11][6794][TRACE] create_client_dup()
[2015-01-09 12:01:11][6795][DEBUG] child started with pid: 6795
[2015-01-09 12:01:11][6795][TRACE] job worker client started
[2015-01-09 12:01:11][6795][TRACE] set_worker()
[2015-01-09 12:01:11][6795][TRACE] create_client()
[2015-01-09 12:01:11][6795][TRACE] create_client_dup()
What happens if, in the nsclient.ini file, the mrtg IP is not set in allowed_from? Will the check fail, or will it be executed on the Nagios XI production server?
Thanks for any help / advice on this. I've reread http://assets.nagios.com/downloads/nagi ... ios_XI.pdf five times, and the config is exactly as in the procedure. The shared key is the same. When I remove the IP of the Nagios XI prod server from allowed_from on a server in the all_srv_gearman test hostgroup, I get "CHECK_NRPE: Error - Could not complete SSL handshake.", so checks are still executed locally... :(
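For reference, a hedged sketch of what the nsclient.ini side usually looks like (the section and key names vary between NSClient++ versions, and the IP addresses below are placeholders, not taken from this thread):

Code: Select all

```
; nsclient.ini -- older-style key shown; newer NSClient++ builds use
; "allowed hosts" under [/settings/default] instead.
[Settings]
; list BOTH the Nagios XI server and the Mod-Gearman worker node,
; otherwise checks dispatched from the unlisted side are refused:
allowed_hosts=10.0.0.10,10.0.0.20
```

If only one of the two is listed, checks arriving from the other side typically fail in the way the SSL handshake error above shows.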

Grtz

Willem
Last edited by WillemDH on Fri Jan 09, 2015 11:54 am, edited 1 time in total.
Nagios XI 5.8.1
https://outsideit.net
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: Mod Gearman Installation Issue

Post by sreinhardt »

At least from what you provided there, it looks like no jobs are running through gearman at the moment. A couple of things:

Is the gearmand daemon local to the nagios server?
Are you running a local worker to the nagios server, if so, are you seeing jobs there?
Can you post the full (minus sensitive info) /etc/gearmand/gearmand.conf, /etc/mod_gearman/mod_gearman_worker.conf, /etc/mod_gearman/mod_gearman_neb.conf, and finally the current /usr/local/nagios/etc/nagios.cfg files, please?

I have to imagine that either the NEB module is not connecting to the gearman server, so no checks are routed through it, or that something is breaking between the gearmand daemon and your remote workers. The debug logs you show definitely indicate that the worker processes are running and waiting for work, but they don't seem to be getting any.
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.
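A quick sanity check for the first two questions above is to confirm which gearmand each side points at. A minimal, self-contained sketch (it writes sample content to a temp file only so the snippet runs anywhere; on a real system you would grep the actual files instead, e.g. `grep '^server=' /etc/mod_gearman/mod_gearman_neb.conf /etc/mod_gearman/mod_gearman_worker.conf`):

```shell
#!/bin/sh
# Self-contained sketch: the sample config content below stands in for
# the real /etc/mod_gearman/*.conf files quoted in this thread.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
debug=0
server=localhost:4730
EOF
# Extract the gearmand address(es) this config targets:
servers=$(grep '^server=' "$tmp")
echo "$servers"
rm -f "$tmp"
```

If the NEB module says localhost:4730 but the remote worker points somewhere else (or vice versa), jobs queue on one gearmand while the workers poll another.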
User avatar
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent
Contact:

Re: Mod Gearman Installation Issue

Post by WillemDH »

Spenser,

I can give you /etc/mod_gearman/mod_gearman_worker.conf, /etc/mod_gearman/mod_gearman_neb.conf and /usr/local/nagios/etc/nagios.cfg, but it seems /etc/gearmand/gearmand.conf does not exist. I guess it is supposed to exist? That might be the reason it doesn't work.

/etc/mod_gearman/mod_gearman_worker.conf

Code: Select all

###############################################################################
#
#  Mod-Gearman - distribute checks with gearman
#
#  Copyright (c) 2010 Sven Nierlein
#
#  Worker Module Config
#
###############################################################################

# Identifier, hostname will be used if undefined
#identifier=hostname

# use debug to increase the verbosity of the module.
# Possible values are:
#     0 = only errors
#     1 = debug messages
#     2 = trace messages
#     3 = trace and all gearman related logs are going to stdout.
# Default is 0.
debug=0

# Path to the logfile.
logfile=/var/log/mod_gearman/mod_gearman_worker.log

# sets the address of your gearman job server. Can be specified
# more than once to add more servers.
server=localhost:4730


# sets the address of your 2nd (duplicate) gearman job server. Can
# be specified more than once to add more servers.
#dupserver=<host>:<port>


# defines if the worker should execute eventhandlers.
eventhandler=yes


# defines if the worker should execute
# service checks.
services=yes


# defines if the worker should execute
# host checks.
hosts=yes


# sets a list of hostgroups which this worker will work
# on. Either specify a comma separated list or use
# multiple lines.
#hostgroups=name1
#hostgroups=name2,name3


# sets a list of servicegroups which this worker will
# work on.
#servicegroups=name1,name2,name3

# enables or disables encryption. It is strongly
# advised to not disable encryption. Anybody will be
# able to inject packets to your worker.
# Encryption is enabled by default and you have to
# explicitly disable it.
# When using encryption, you will either have to
# specify a shared password with key=... or a
# keyfile with keyfile=...
# Default is On.
encryption=yes


# A shared password which will be used for
# encryption of data packets. Should be at least 8
# bytes long. Maximum length is 32 characters.
key=pjJeLzIJkFgRyUNxgoefefe


# The shared password will be read from this file.
# Use either key or keyfile. Only the first 32
# characters will be used.
#keyfile=/path/to/secret.file

# Path to the pidfile. Usually set by the init script
#pidfile=/var/mod_gearman/mod_gearman_worker.pid

# Default job timeout in seconds. Currently this value is only used for
# eventhandler. The worker will use the values from the core for host and
# service checks.
job_timeout=60

# Minimum number of worker processes which should
# run at any time.
min-worker=5

# Maximum number of worker processes which should
# run at any time. You may set this equal to
# min-worker setting to disable dynamic starting of
# workers. When setting this to 1, all services from
# this worker will be executed one after another.
max-worker=50

# Time after which an idling worker exits.
# This parameter controls how fast your waiting workers will
# exit if there are no jobs waiting.
idle-timeout=30

# Controls the number of jobs a worker will do before it exits
# Use this to control how fast the number of workers will go down
# after high load times
max-jobs=1000

# max-age is the threshold for discarding too old jobs. When a new job is older
# than this amount of seconds it will not be executed and just discarded. Set to
# zero to disable this check.
#max-age=0

# defines the rate of spawned worker per second as long
# as there are jobs waiting
spawn-rate=1

# Use this option to disable an extra fork for each plugin execution. Disabling
# this option will reduce the load on the worker host but can lead to problems with
# unclean plugins. Default: yes
fork_on_exec=no

# Set a limit based on the 1min load average. When exceeding the load limit,
# no new worker will be started until the current load is below the limit.
# No limit will be used when set to 0.
load_limit1=0

# Same as load_limit1 but for the 5min load average.
load_limit5=0

# Same as load_limit1 but for the 15min load average.
load_limit15=0

# Use this option to show stderr output of plugins too.
# Default: yes
show_error_output=yes

# Use dup_results_are_passive to set if the duplicate result send to the dupserver
# will be passive or active.
# Default is yes (passive).
#dup_results_are_passive=yes

# When embedded perl has been compiled in, you can use this
# switch to enable or disable the embedded perl interpreter.
enable_embedded_perl=on

# Default value used when the perl script does not have a
# "nagios: +epn" or "nagios: -epn" set.
# Perl scripts not written for epn support usually fail with epn,
# so its better to set the default to off.
use_embedded_perl_implicitly=off

# Cache compiled perl scripts. This makes the worker process a little
# bit bigger but makes execution of perl scripts even faster.
# When turned off, Mod-Gearman will still use the embedded perl
# interpreter, but will not cache the compiled script.
use_perl_cache=on

# path to p1 file which is used to execute and cache the
# perl scripts run by the embedded perl interpreter
p1_file=/usr/share/mod_gearman/mod_gearman_p1.pl


# Workarounds

# workaround for rc 25 bug
# duplicate jobs from gearmand result in exit code 25 of plugins
# because they are executed twice and get killed because of using
# the same resource.
# Sending results (when exit code is 25 ) will be skipped with this
# enabled.
workaround_rc_25=off
/etc/mod_gearman/mod_gearman_neb.conf

Code: Select all

###############################################################################
#
#  Mod-Gearman - distribute checks with gearman
#
#  Copyright (c) 2010 Sven Nierlein
#
#  Mod-Gearman NEB Module Config
#
###############################################################################

# use debug to increase the verbosity of the module.
# Possible values are:
#     0 = only errors
#     1 = debug messages
#     2 = trace messages
#     3 = trace and all gearman related logs are going to stdout.
# Default is 0.
debug=2

# Path to the logfile.
logfile=/var/log/mod_gearman/mod_gearman_neb.log

# sets the address of your gearman job server. Can be specified
# more than once to add more servers.
server=localhost:4730


# sets the address of your 2nd (duplicate) gearman job server. Can
# be specified more than once to add more servers.
#dupserver=<host>:<port>


# defines if the module should distribute execution of
# eventhandlers.
eventhandler=no


# defines if the module should distribute execution of
# service checks.
services=yes


# defines if the module should distribute execution of
# host checks.
hosts=yes


# sets a list of hostgroups which will go into separate
# queues. Either specify a comma separated list or use
# multiple lines.
hostgroups=all_srv_gearman
#hostgroups=name2,name3


# sets a list of servicegroups which will go into separate
# queues.
#servicegroups=all_gearman_services

# Set this to 'no' if you want Mod-Gearman to only take care of
# servicechecks. No hostchecks will be processed by Mod-Gearman. Use
# this option to disable hostchecks and still have the possibility to
# use hostgroups for easy configuration of your services.
# If set to yes, you still have to define which hostchecks should be
# processed by either using 'hosts' or the 'hostgroups' option.
# Default is Yes.
do_hostchecks=yes

# This settings determines if all eventhandlers go into a single
# 'eventhandlers' queue or into the same queue like normal checks
# would do.
route_eventhandler_like_checks=no

# enables or disables encryption. It is strongly
# advised to not disable encryption. Anybody will be
# able to inject packets to your worker.
# Encryption is enabled by default and you have to
# explicitly disable it.
# When using encryption, you will either have to
# specify a shared password with key=... or a
# keyfile with keyfile=...
# Default is On.
encryption=yes


# A shared password which will be used for
# encryption of data packets. Should be at least 8
# bytes long. Maximum length is 32 characters.
key=pjJeLzIJkFgRyUNkkdjd


# The shared password will be read from this file.
# Use either key or keyfile. Only the first 32
# characters will be used.
#keyfile=/path/to/secret.file


# use_uniq_jobs
# Using uniq keys prevents the gearman queues from filling up when there
# is no worker. However, gearmand seems to have problems with the uniq
# key and sometimes jobs get stuck in the queue. Set this option to 'off'
# when you run into problems with stuck jobs, but make sure your workers
# are running.
use_uniq_jobs=on



###############################################################################
#
# NEB Module Config
#
# the following settings are for the neb module only and
# will be ignored by the worker.
#
###############################################################################

# sets a list of hostgroups which will not be executed
# by gearman. They are just passed through.
# Default is none
localhostgroups=


# sets a list of servicegroups which will not be executed
# by gearman. They are just passed through.
# Default is none
localservicegroups=

# The queue_custom_variable can be used to define the target queue
# by a custom variable in addition to host/servicegroups. When set
# for ex. to 'WORKER' you then could define a '_WORKER' custom
# variable for your hosts and services to directly set the worker
# queue. The host queue is inherited unless overwritten
# by a service custom variable. Set the value of your custom
# variable to 'local' to bypass Mod-Gearman (Same behaviour as in
# localhostgroups/localservicegroups).
#queue_custom_variable=WORKER

# Number of result worker threads. Usually one is
# enough. You may increase the value if your
# result queue is not processed fast enough.
# Default: 1
result_workers=1


# defines if the module should distribute perfdata
# to gearman.
# Note: processing of perfdata is not part of
# mod_gearman. You will need additional worker for
# handling performance data. For example: pnp4nagios
# Performance data is just written to the gearman
# queue.
# Default: no
perfdata=no

# The perfdata mode helps prevent the perfdata queue from getting too big
# 1 = overwrite
# 2 = append
perfdata_mode=1

# The Mod-Gearman NEB module will submit a fake result for orphaned host
# checks with a message saying there is no worker running for this
# queue. Use this option to get better reporting results, otherwise your
# hosts will keep their last state as long as there is no worker
# running.
# Default: yes
orphan_host_checks=yes

# Same like 'orphan_host_checks' but for services.
# Default: yes
orphan_service_checks=yes

# When accept_clear_results is enabled, the NEB module will accept unencrypted
# results too. This is quite useful if you have lots of passive checks and make
# use of send_gearman/send_multi where you would have to spread the shared key to
# all clients using these tools.
# Default is no.
accept_clear_results=no

/usr/local/nagios/etc/nagios.cfg

Code: Select all

# MODIFIED
admin_email=root@localhost
admin_pager=root@localhost
translate_passive_host_checks=1
log_event_handlers=0
use_large_installation_tweaks=1
enable_environment_macros=0


# NDOUtils module
broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg


# PNP settings - bulk mode with NCPD
process_performance_data=1
# service performance data
service_perfdata_file=/usr/local/nagios/var/service-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$\tSERVICEOUTPUT::$SERVICEOUTPUT$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=process-service-perfdata-file-bulk
# host performance data
host_perfdata_file=/usr/local/nagios/var/host-perfdata
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tHOSTOUTPUT::$HOSTOUTPUT$
host_perfdata_file_mode=a
host_perfdata_file_processing_interval=15
host_perfdata_file_processing_command=process-host-perfdata-file-bulk


# OBJECTS - UNMODIFIED
#cfg_file=/usr/local/nagios/etc/objects/commands.cfg
#cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
#cfg_file=/usr/local/nagios/etc/objects/templates.cfg
#cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg


# STATIC OBJECT DEFINITIONS (THESE DON'T GET EXPORTED/IMPORTED BY NAGIOSQL)
cfg_dir=/usr/local/nagios/etc/static

# OBJECTS EXPORTED FROM NAGIOSQL
cfg_file=/usr/local/nagios/etc/contacttemplates.cfg
cfg_file=/usr/local/nagios/etc/contactgroups.cfg
cfg_file=/usr/local/nagios/etc/contacts.cfg
cfg_file=/usr/local/nagios/etc/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/commands.cfg
cfg_file=/usr/local/nagios/etc/hostgroups.cfg
cfg_file=/usr/local/nagios/etc/servicegroups.cfg
cfg_file=/usr/local/nagios/etc/hosttemplates.cfg
cfg_file=/usr/local/nagios/etc/servicetemplates.cfg
cfg_file=/usr/local/nagios/etc/servicedependencies.cfg
cfg_file=/usr/local/nagios/etc/serviceescalations.cfg
cfg_file=/usr/local/nagios/etc/hostdependencies.cfg
cfg_file=/usr/local/nagios/etc/hostescalations.cfg
cfg_file=/usr/local/nagios/etc/hostextinfo.cfg
cfg_file=/usr/local/nagios/etc/serviceextinfo.cfg
cfg_dir=/usr/local/nagios/etc/hosts
cfg_dir=/usr/local/nagios/etc/services

# GLOBAL EVENT HANDLERS
global_host_event_handler=xi_host_event_handler
global_service_event_handler=xi_service_event_handler



# UNMODIFIED
accept_passive_host_checks=1
accept_passive_service_checks=1
additional_freshness_latency=15
auto_reschedule_checks=1
auto_rescheduling_interval=30
auto_rescheduling_window=45
bare_update_check=0
cached_host_check_horizon=15
cached_service_check_horizon=15
check_external_commands=1
check_for_orphaned_hosts=1
check_for_orphaned_services=1
check_for_updates=1
check_host_freshness=0
check_result_path=/usr/local/nagios/var/spool/checkresults
check_result_reaper_frequency=10
check_service_freshness=1
#command_check_interval=-1
command_file=/usr/local/nagios/var/rw/nagios.cmd
daemon_dumps_core=0
date_format=us
debug_file=/usr/local/nagios/var/nagios.debug
debug_level=0
debug_verbosity=1
#enable_embedded_perl=1
enable_event_handlers=1
enable_flap_detection=1
enable_notifications=1
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
event_broker_options=-1
event_handler_timeout=30
execute_host_checks=1
execute_service_checks=1
#external_command_buffer_slots=4096
high_host_flap_threshold=20.0
high_service_flap_threshold=20.0
host_check_timeout=30
host_freshness_check_interval=60
host_inter_check_delay_method=s
illegal_macro_output_chars=`~$&|'"<>
illegal_object_name_chars=`~!$%^&*|'"<>?,()=
interval_length=60
lock_file=/usr/local/nagios/var/nagios.lock
log_archive_path=/usr/local/nagios/var/archives
log_external_commands=0
log_file=/usr/local/nagios/var/nagios.log
log_host_retries=1
log_initial_states=0
log_notifications=1
log_passive_checks=0
log_rotation_method=d
log_service_retries=1
low_host_flap_threshold=5.0
low_service_flap_threshold=5.0
max_check_result_file_age=3600
max_check_result_reaper_time=30
max_concurrent_checks=0
max_debug_file_size=1000000
max_host_check_spread=30
max_service_check_spread=30
nagios_group=nagios
nagios_user=nagios
notification_timeout=30
object_cache_file=/usr/local/nagios/var/objects.cache
obsess_over_hosts=0
obsess_over_services=0
ocsp_timeout=5
#p1_file=/usr/local/nagios/bin/p1.pl
passive_host_checks_are_soft=0
perfdata_timeout=5
precached_object_file=/usr/local/nagios/var/objects.precache
resource_file=/usr/local/nagios/etc/resource.cfg
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0
retained_host_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_service_attribute_mask=0
retain_state_information=1
retention_update_interval=60
service_check_timeout=250
service_freshness_check_interval=60
service_inter_check_delay_method=s
service_interleave_factor=s
#sleep_time=0.25
soft_state_dependencies=0
state_retention_file=/usr/local/nagios/var/retention.dat
status_file=/usr/local/nagios/var/status.dat
status_update_interval=10
temp_file=/usr/local/nagios/var/nagios.tmp
temp_path=/tmp
use_aggressive_host_checking=0
#####use_embedded_perl_implicitly=1
use_regexp_matching=0
use_retained_program_state=1
use_retained_scheduling_info=1
use_syslog=1
use_true_regexp_matching=0
Grtz

Willem
User avatar
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia
Contact:

Re: Mod Gearman Installation Issue

Post by Box293 »

I think the problem lies in your /usr/local/nagios/etc/nagios.cfg

I think the broker module needs to be:

Code: Select all

broker_module=/usr/lib64/mod_gearman/mod_gearman.o config=/etc/mod_gearman/mod_gearman_neb.conf
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
User avatar
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent
Contact:

Re: Mod Gearman Installation Issue

Post by WillemDH »

Troy,

Well... I did as you described in your documentation and added

Code: Select all

broker_module=/usr/lib64/mod_gearman/mod_gearman.o config=/etc/mod_gearman/mod_gearman_neb.conf
after this line in nagios.cfg:

Code: Select all

broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg
Man, suddenly Nagios exploded. Instead of only distributing checks of hosts and services in the all_srv_gearman hostgroup,

Code: Select all

# sets a list of hostgroups which will go into seperate
# queues. Either specify a comma seperated list or use
# multiple lines.
hostgroups=all_srv_gearman
which contains only 4 hosts, all hosts and all services seemed to start sending their checks to the remote worker node. As my nsclient.ini files do not contain the remote worker node's IP address, they all failed, lol.

I removed the line from the nagios.cfg file, restarted Nagios, and things calmed down again...

So I suspected these settings were the cause:

Code: Select all

# defines if the module should distribute execution of
# service checks.
services=yes


# defines if the module should distribute execution of
# host checks.
hosts=yes
So I set them to no... I would have expected the above to only impact the hosts and services of the configured hostgroups, not ALL hosts and services... :o After setting these two to no, it seemed to work. gearman_top now shows jobs waiting and running. The services of the four hosts in the hostgroup are running on the external worker node.
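For reference, the NEB-side combination that ended up working here, as a fragment (values taken from this thread; with services/hosts set to no, only the checks of the named hostgroups/servicegroups are routed through gearman):

Code: Select all

```
# /etc/mod_gearman/mod_gearman_neb.conf
# do not distribute ALL host and service checks...
services=no
hosts=no
# ...only the checks of this hostgroup go into a gearman queue:
hostgroups=all_srv_gearman
```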

In gearman_top, there is a line
check_results => What exactly would this be showing? It seems to always stay on 1 worker available.
hostgroup_all_srv_gearman => The waiting and running jobs are always visible here, which seems to be correct.
worker_fqdn_worker_node => Also stays on 1 worker available all the time. What does this mean? Aren't the jobs running on this remote worker node?

Willem
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: Mod Gearman Installation Issue

Post by sreinhardt »

In gearman_top, there is a line
check_results => What exactly would this be showing? It seems to always stay on 1 worker available.
Unless you have an absurdly busy system, this should stay at 1; reaping check results should not take much time.
hostgroup_all_srv_gearman => The waiting and running jobs are always visible here, which seems to be correct.
Yes, as it should get the all_srv_gearman hostgroup's host and service checks, which would likely be busy based on the name.
worker_fqdn_worker_node => Also stays on 1 worker available all the time. What does this mean? Aren't the jobs running on this remote worker node?
Do you have a hostgroup or servicegroup with hosts and services assigned to it that matches this name? Offhand, I don't recall this being a standard worker name.
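For watching these counters outside the full-screen view, the per-queue "Jobs Waiting" numbers can be pulled out with awk. A sketch against sample output (the pipe-separated table layout is an assumption about how gearman_top renders; the queue names mirror the ones discussed above, and `worker_mrtg` is a hypothetical stand-in for the real worker queue):

```shell
#!/bin/sh
# Parse "Jobs Waiting" per queue from (sample) gearman_top output.
# Assumed columns: Queue Name | Worker Available | Jobs Waiting | Jobs Running
waiting=$(cat <<'EOF' | awk -F'|' 'NR>1 { gsub(/ /,"",$1); gsub(/ /,"",$3); print $1, $3 }'
 Queue Name                | Worker Available | Jobs Waiting | Jobs Running
 check_results             | 1                | 0            | 0
 hostgroup_all_srv_gearman | 1                | 3            | 1
 worker_mrtg               | 1                | 0            | 0
EOF
)
echo "$waiting"
```

A persistently non-zero waiting count on a queue that also shows a worker available would point at the worker not actually pulling jobs from that queue.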
User avatar
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent
Contact:

Re: Mod Gearman Installation Issue

Post by WillemDH »

Spenser,

Thanks for the clarification.
Do you have a hostgroup or servicegroup with hosts and services assigned to it that matches this name? Offhand, I don't recall this being a standard worker name.
No, we don't have a hostgroup or servicegroup with that name. It is in fact the FQDN of our remote worker node (mrtg) with a worker_ prefix.

Grtz
User avatar
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia
Contact:

Re: Mod Gearman Installation Issue

Post by Box293 »

WillemDH wrote:No, we don't have a hostgroup or servicegroup with that name. It is in fact the FQDN of our remote worker node (mrtg) with a worker_ prefix.
When you run gearman_top, "worker available" means it's ready to run jobs. If it is working quickly enough, you may not see the jobs waiting/running columns change.

With gearman_top still running, go to your worker and run:
service mod_gearman_worker stop

You should now see on gearman_top that Worker Available will be 0 and as time goes on the jobs waiting will queue up.

Start the service on the worker and the queue will be processed.

Does this help?
User avatar
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent
Contact:

Re: Mod Gearman Installation Issue

Post by WillemDH »

Thanks all for the many answers to my many questions. IMHO I had indeed only one problem: the broker line was missing from the nagios.cfg file. But I just double-checked, and I did not find this in the documentation for integrating Mod Gearman with Nagios XI. It seems like a crucial part of getting it working, so I'm not sure why it is not in the official documentation? http://assets.nagios.com/downloads/nagi ... ios_XI.pdf

(broker_module=/usr/lib64/mod_gearman/mod_gearman.o config=/etc/mod_gearman/mod_gearman_neb.conf)
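Put together, the broker section of nagios.cfg ends up loading both modules (both lines as quoted earlier in this thread, with the NDO line first, matching where the Mod-Gearman line was added):

Code: Select all

```
# NDOUtils module
broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg

# Mod-Gearman NEB module (the line that was missing)
broker_module=/usr/lib64/mod_gearman/mod_gearman.o config=/etc/mod_gearman/mod_gearman_neb.conf
```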

Grtz

Willem
User avatar
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Mod Gearman Installation Issue

Post by lmiltchev »

You are correct. We will be adding a few extra commands to our documentation.

Adding the broker line was done automatically in the past, as we used a script to install Mod Gearman. We switched to using RPMs, but the documentation hasn't been updated.
Be sure to check out our Knowledgebase for helpful articles and solutions!
Locked