Ping service queue

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
CIBTtechops
Posts: 13
Joined: Tue Dec 20, 2011 3:33 pm

Ping service queue

Post by CIBTtechops »

Hello,

I have many Cisco switches that I monitor with Nagios XI.
This should be a simple ping check. For two of them, however, I am constantly getting the following critical alert, even though the switches are pingable from the Nagios server:

***** Nagios XI Alert *****

Nagios has detected a problem with this service.

Notification Type: PROBLEM

Service: Ping
Host: XXX (omitted for privacy)
Address: 192.168.XXX.XXX (changed for privacy)
State: CRITICAL
Info:
(service check orphaned, is the mod-gearman worker on queue service running?)
Date/Time: 2015-07-20 04:13:43

Any ideas how to solve this problem?
Thanks.
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Ping service queue

Post by lmiltchev »

It seems like the issue is with mod gearman. Have you tried restarting the worker and the mod gearman daemon?

If restarting gearmand and the worker does not solve the issue, let us know which versions of Nagios XI and Mod Gearman you are currently using, and post the nagios.cfg, worker, and neb config files.
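
For reference, a minimal sketch of that restart on the XI server. It assumes SysV init scripts named gearmand and mod_gearman_worker under /etc/init.d; the script names, the INIT_DIR variable, and the function name are assumptions made for illustration, so adjust them to match your install:

```shell
#!/bin/sh
# Restart the Gearman job server, then the worker that connects to it.
# ASSUMPTIONS: init scripts "gearmand" and "mod_gearman_worker" exist in
# INIT_DIR (default /etc/init.d) and accept a "restart" argument.
restart_gearman_stack() {
    init_dir="${INIT_DIR:-/etc/init.d}"
    for svc in gearmand mod_gearman_worker; do
        if [ ! -x "$init_dir/$svc" ]; then
            echo "init script not found: $init_dir/$svc" >&2
            return 1
        fi
        # Stop at the first failure so a broken gearmand is noticed
        "$init_dir/$svc" restart || return 1
    done
}
```

Run restart_gearman_stack as root. Restarting gearmand before the worker is a choice made in this sketch so that the worker has a job server to connect to when it comes back up.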
Be sure to check out our Knowledgebase for helpful articles and solutions!
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Ping service queue

Post by jolson »

Following what lmiltchev said, it's important that we know your Nagios XI average server load.

You should be able to run the following command and return the output to us:

Code: Select all

sar
Twits Blog
Show me a man who lives alone and has a perpetually clean kitchen, and 8 times out of 9 I'll show you a man with detestable spiritual qualities.
CIBTtechops
Posts: 13
Joined: Tue Dec 20, 2011 3:33 pm

Re: Ping service queue

Post by CIBTtechops »

jolson wrote:Following what lmiltchev said, it's important that we know your Nagios XI average server load.

You should be able to run the following command and return the output to us:

Code: Select all

sar

Code: Select all

[root@colnagios ~]# sar
Linux 2.6.32-279.2.1.el6.i686 (colnagios.cibt.local)    07/20/2015      _i686_  (4 CPU)

12:00:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
12:10:01 AM     all      5.79      0.00      6.87      0.11      0.05     87.17
12:20:01 AM     all      5.82      0.00      6.77      0.01      0.05     87.34
12:30:01 AM     all      5.95      0.00      6.79      0.01      0.05     87.20
12:40:01 AM     all      5.70      0.00      6.72      0.08      0.05     87.46
12:50:01 AM     all      5.95      0.00      6.77      0.01      0.05     87.21
01:00:01 AM     all      5.99      0.00      6.83      0.01      0.06     87.12
01:10:01 AM     all      5.94      0.00      6.90      0.10      0.05     87.01
01:20:01 AM     all      5.95      0.00      6.91      0.08      0.05     87.01
01:30:01 AM     all      5.91      0.00      6.83      0.02      0.05     87.20
01:40:01 AM     all      6.00      0.00      6.89      0.17      0.05     86.89
01:50:01 AM     all      6.01      0.00      6.90      0.05      0.05     86.99
02:00:01 AM     all      5.48      0.00      6.95      0.03      0.05     87.50
02:10:01 AM     all      4.92      0.00      6.82      0.25      0.06     87.96
02:20:01 AM     all      4.87      0.00      6.70      0.01      0.05     88.37
02:30:01 AM     all      4.86      0.00      6.63      0.02      0.04     88.45
02:40:01 AM     all      4.81      0.00      6.62      0.01      0.05     88.50
02:50:01 AM     all      5.02      0.00      6.78      0.01      0.05     88.15
03:00:01 AM     all      4.95      0.00      6.70      0.01      0.05     88.28
03:10:01 AM     all      4.96      0.00      6.79      0.02      0.05     88.18
03:20:01 AM     all      5.13      0.00      6.92      0.12      0.05     87.77
03:30:01 AM     all      4.89      0.00      6.65      0.01      0.05     88.41
03:40:01 AM     all      4.90      0.01      6.73      0.02      0.05     88.29
03:50:01 AM     all      5.06      0.00      6.74      0.01      0.05     88.14
04:00:01 AM     all      4.94      0.00      6.64      0.01      0.05     88.35
04:10:01 AM     all      4.89      0.00      6.67      0.01      0.05     88.38
04:20:01 AM     all      4.87      0.00      6.52      0.12      0.05     88.43
04:30:01 AM     all      4.86      0.00      6.73      0.14      0.05     88.22
04:40:01 AM     all      4.85      0.00      6.71      0.05      0.05     88.33
04:50:01 AM     all      5.25      0.00      6.83      0.02      0.05     87.86
05:00:01 AM     all      8.50      0.00      7.09      0.05      0.06     84.30
05:10:01 AM     all      8.29      0.00      7.18      0.03      0.06     84.44
05:20:01 AM     all      9.06      0.00      7.39      0.04      0.09     83.42
05:30:01 AM     all      8.81      0.00      7.55      0.19      0.06     83.40
05:40:01 AM     all      7.60      0.00      7.47      0.07      0.08     84.78
05:50:01 AM     all      7.61      0.00      7.34      0.01      0.06     84.98
06:00:01 AM     all      5.62      0.00      6.70      0.01      0.05     87.62
06:10:01 AM     all      8.25      0.00      7.15      0.48      0.06     84.07
06:20:01 AM     all      8.46      0.00      7.16      0.01      0.05     84.31

06:20:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
06:30:01 AM     all      8.36      0.00      7.15      0.02      0.06     84.42
06:40:01 AM     all      8.33      0.00      7.39      0.05      0.06     84.18
06:50:01 AM     all      9.28      0.00      7.97      0.01      0.06     82.68
07:00:01 AM     all      8.48      0.00      7.44      0.02      0.06     83.99
07:10:01 AM     all      9.91      0.00      8.11      0.65      0.06     81.26
07:20:01 AM     all      8.31      0.00      7.64      0.01      0.06     83.98
07:30:01 AM     all      8.30      0.00      7.57      0.01      0.06     84.06
07:40:01 AM     all      8.28      0.00      7.68      0.15      0.05     83.84
07:50:01 AM     all      9.89      0.00      8.24      0.04      0.07     81.77
08:00:01 AM     all      8.57      0.00      7.57      0.03      0.05     83.77
08:10:01 AM     all      7.74      0.00      7.44      0.04      0.05     84.73
08:20:01 AM     all      6.46      0.00      7.28      0.02      0.05     86.19
08:30:01 AM     all      6.32      0.00      7.11      0.02      0.05     86.50
08:40:01 AM     all      6.45      0.00      7.22      0.27      0.06     86.00
08:50:01 AM     all      7.41      0.00      7.67      0.03      0.06     84.84
09:00:01 AM     all      8.78      0.00      7.95      0.08      0.07     83.12
09:10:01 AM     all      7.95      0.00      7.50      0.04      0.06     84.46
09:20:01 AM     all     11.73      0.00      8.20      0.02      0.09     79.96
09:30:01 AM     all      9.81      0.00      7.61      0.02      0.06     82.51
09:40:01 AM     all     12.41      0.00      8.18      0.02      0.06     79.33
09:50:01 AM     all     14.36      0.00      8.88      0.35      0.07     76.35
10:00:01 AM     all     13.88      0.00      8.87      0.01      0.09     77.15
10:10:01 AM     all     11.89      0.00      8.21      0.03      0.06     79.82
10:20:01 AM     all     10.81      0.00      7.78      0.03      0.06     81.31
10:30:01 AM     all      9.94      0.00      7.78      0.01      0.06     82.21
10:40:01 AM     all      8.99      0.00      7.62      0.08      0.09     83.21
10:50:01 AM     all      9.80      0.00      7.78      0.05      0.09     82.28
11:00:01 AM     all      8.93      0.00      7.54      0.01      0.09     83.43
11:10:01 AM     all     10.17      0.00      7.59      0.03      0.08     82.13
11:20:01 AM     all      9.76      0.00      7.44      0.02      0.06     82.72
11:30:01 AM     all      7.44      0.00      7.17      0.20      0.05     85.13
11:40:01 AM     all      7.04      0.00      7.06      0.03      0.05     85.82
11:50:01 AM     all      7.06      0.00      7.08      0.02      0.06     85.78
12:00:01 PM     all      7.06      0.00      7.01      0.24      0.06     85.63
12:10:01 PM     all      7.38      0.00      7.11      0.02      0.06     85.43
12:20:01 PM     all      6.97      0.00      6.77      0.02      0.05     86.19
12:30:01 PM     all      7.19      0.00      7.02      0.01      0.05     85.73
12:40:01 PM     all      6.97      0.00      6.96      0.03      0.05     85.99

12:40:01 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
12:50:01 PM     all      7.00      0.00      6.95      0.03      0.05     85.97
01:00:01 PM     all      7.13      0.00      7.08      0.10      0.06     85.64
01:10:01 PM     all      9.90      0.00      7.67      0.04      0.06     82.32
Average:        all      7.46      0.00      7.24      0.07      0.06     85.18
You have new mail in /var/spool/mail/root
[root@colnagios ~]#
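
To pull just the summary figure out of output like the above, here is a small sketch that reads sar output on stdin and prints the %idle value from the Average: line. The function name is made up for illustration, and it assumes %idle is the last column, as it is in this output:

```shell
#!/bin/sh
# Print the last field (%idle in the default CPU report) of the
# "Average:" summary line that sar prints at the end of its output.
sar_average_idle() {
    awk '$1 == "Average:" { print $NF }'
}

# Example:
#   sar | sar_average_idle
```

Note that plain sar reports CPU utilization. sar -q reports the run queue length and load averages, which may be closer to the "server load" that was asked about.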
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Ping service queue

Post by jolson »

CIBTtechops,

Please run through the troubleshooting steps provided by lmiltchev as well.
CIBTtechops
Posts: 13
Joined: Tue Dec 20, 2011 3:33 pm

Re: Ping service queue

Post by CIBTtechops »

lmiltchev wrote:It seems like the issue is with mod gearman. Have you tried restarting the worker and the mod gearman daemon?

If restarting gearmand and the worker does not solve the issue, let us know which versions of Nagios XI and Mod Gearman you are currently using, and post the nagios.cfg, worker, and neb config files.
Hello, sorry for the delayed reply; I am in the Central European time zone. My supervisor manages the system for the most part and I am slowly but surely getting involved. He assured me that he restarts gearman and mod-gearman a lot! By this, I think he means it breaks a lot, too. For my learning benefit, what are these and what is the difference between the two?

Installed Version of Nagios XI: 2012R2.3

Regarding those other files:

1) I don't know what a worker or neb config are
2) I don't know where to find those files, the neb.cfg does not seem to be here:

Code: Select all

broker_module=/usr/lib/mod_gearman/mod_gearman.o config=/etc/mod_gearman/mod_gearman_neb.conf
But here is the nagios.cfg file at least:

Code: Select all

=~=~=~=~=~=~=~=~=~=~=~= PuTTY log 2015.07.21 08:05:45 =~=~=~=~=~=~=~=~=~=~=~=
[root@colnagios ~]# less /usr/local/nagios/etc/nagios.cfg
# MODIFIED
admin_email=root@localhost
admin_pager=root@localhost
translate_passive_host_checks=1
log_event_handlers=0
use_large_installation_tweaks=1
enable_environment_macros=0


# NDOUtils module
broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg
broker_module=/usr/lib/mod_gearman/mod_gearman.o config=/etc/mod_gearman/mod_gearman_neb.conf


# PNP settings - bulk mode with NCPD
process_performance_data=1
# service performance data
service_perfdata_file=/usr/local/nagios/var/service-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$\tSERVICEOUTPUT::$SERVICEOUTPUT$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=process-service-perfdata-file-bulk
# host performance data
host_perfdata_file=/usr/local/nagios/var/host-perfdata
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tHOSTOUTPUT::$HOSTOUTPUT$
host_perfdata_file_mode=a
host_perfdata_file_processing_interval=15
host_perfdata_file_processing_command=process-host-perfdata-file-bulk


# OBJECTS - UNMODIFIED
#cfg_file=/usr/local/nagios/etc/objects/commands.cfg
#cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
#cfg_file=/usr/local/nagios/etc/objects/templates.cfg
#cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg


# STATIC OBJECT DEFINITIONS (THESE DON'T GET EXPORTED/IMPORTED BY NAGIOSQL)
cfg_dir=/usr/local/nagios/etc/static

# OBJECTS EXPORTED FROM NAGIOSQL
cfg_file=/usr/local/nagios/etc/contacttemplates.cfg
cfg_file=/usr/local/nagios/etc/contactgroups.cfg
cfg_file=/usr/local/nagios/etc/contacts.cfg
cfg_file=/usr/local/nagios/etc/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/commands.cfg
cfg_file=/usr/local/nagios/etc/hostgroups.cfg
cfg_file=/usr/local/nagios/etc/servicegroups.cfg
cfg_file=/usr/local/nagios/etc/hosttemplates.cfg
cfg_file=/usr/local/nagios/etc/servicetemplates.cfg
cfg_file=/usr/local/nagios/etc/servicedependencies.cfg
cfg_file=/usr/local/nagios/etc/serviceescalations.cfg
cfg_file=/usr/local/nagios/etc/hostdependencies.cfg
cfg_file=/usr/local/nagios/etc/hostescalations.cfg
cfg_file=/usr/local/nagios/etc/hostextinfo.cfg
cfg_file=/usr/local/nagios/etc/serviceextinfo.cfg
cfg_dir=/usr/local/nagios/etc/hosts
cfg_dir=/usr/local/nagios/etc/services

# GLOBAL EVENT HANDLERS
global_host_event_handler=xi_host_event_handler
global_service_event_handler=xi_service_event_handler



# UNMODIFIED
accept_passive_host_checks=1
accept_passive_service_checks=1
additional_freshness_latency=15
auto_reschedule_checks=0
auto_rescheduling_interval=30
auto_rescheduling_window=180
bare_update_check=0
cached_host_check_horizon=15
cached_service_check_horizon=15
check_external_commands=1
check_for_orphaned_hosts=1
check_for_orphaned_services=1
check_for_updates=1
check_host_freshness=0
#check_result_path=/usr/local/nagios/var/spool/checkresults
check_result_path=/var/nagiosramdisk/spool/checkresults
check_result_reaper_frequency=10
check_service_freshness=1
command_check_interval=-1
command_file=/usr/local/nagios/var/rw/nagios.cmd
daemon_dumps_core=0
date_format=us
debug_file=/usr/local/nagios/var/nagios.debug
debug_level=0
debug_verbosity=1
enable_embedded_perl=1
enable_event_handlers=1
enable_flap_detection=1
enable_notifications=1
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
event_broker_options=-1
event_handler_timeout=65
execute_host_checks=1
execute_service_checks=1
external_command_buffer_slots=4096
high_host_flap_threshold=20.0
high_service_flap_threshold=20.0
host_check_timeout=65
host_freshness_check_interval=60
host_inter_check_delay_method=s
illegal_macro_output_chars=`~$&|'"<>
illegal_object_name_chars=`~!$%^&*|'"<>?,()=
interval_length=60
lock_file=/usr/local/nagios/var/nagios.lock
log_archive_path=/usr/local/nagios/var/archives
log_external_commands=0
log_file=/usr/local/nagios/var/nagios.log
log_host_retries=1
log_initial_states=0
log_notifications=1
log_passive_checks=0
log_rotation_method=d
log_service_retries=1
low_host_flap_threshold=5.0
low_service_flap_threshold=5.0
max_check_result_file_age=3600
max_check_result_reaper_time=30
max_concurrent_checks=0
max_debug_file_size=1000000
max_host_check_spread=30
max_service_check_spread=30
nagios_group=nagios
nagios_user=nagios
notification_timeout=30
#object_cache_file=/usr/local/nagios/var/objects.cache
object_cache_file=/var/nagiosramdisk/objects.cache
obsess_over_hosts=0
obsess_over_services=0
ocsp_timeout=5
p1_file=/usr/local/nagios/bin/p1.pl
passive_host_checks_are_soft=0
perfdata_timeout=5
precached_object_file=/usr/local/nagios/var/objects.precache
resource_file=/usr/local/nagios/etc/resource.cfg
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0
retained_host_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_service_attribute_mask=0
retain_state_information=1
retention_update_interval=60
service_check_timeout=60
service_freshness_check_interval=60
service_inter_check_delay_method=s
service_interleave_factor=s
sleep_time=0.25
soft_state_dependencies=0
state_retention_file=/usr/local/nagios/var/retention.dat
#status_file=/usr/local/nagios/var/status.dat
status_file=/var/nagiosramdisk/status.dat
status_update_interval=10
temp_file=/usr/local/nagios/var/nagios.tmp
#temp_path=/tmp
temp_path=/var/nagiosramdisk/tmp
use_aggressive_host_checking=0
use_embedded_perl_implicitly=1
use_regexp_matching=0
use_retained_program_state=1
use_retained_scheduling_info=1
use_syslog=1
use_true_regexp_matching=0
cfg_dir=/usr/local/nagios/etc/cisco/ucsObjs
[root@colnagios ~]# exit
logout
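
The mod_gearman broker_module line in the nagios.cfg above already names the NEB config file in its config= argument. A rough sketch for pulling that path out of nagios.cfg follows; the function name is hypothetical, and the default path is the one used in the log above:

```shell
#!/bin/sh
# Print the config= path from the mod_gearman broker_module line.
# $1: path to nagios.cfg (defaults to the location used in this thread).
neb_config_path() {
    sed -n 's/^broker_module=.*mod_gearman\.o.* config=\([^ ]*\).*/\1/p' \
        "${1:-/usr/local/nagios/etc/nagios.cfg}"
}

# Example:
#   less "$(neb_config_path)"
```

The ndomod line is skipped because it does not reference mod_gearman.o. The worker config is a separate file; the init script trace posted further down in this thread shows it as /etc/mod_gearman/mod_gearman_worker.conf.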

tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Ping service queue

Post by tmcdonald »

mod_gearman is a module that is used to distribute Nagios checks to other systems, either to reduce load or to have distributed monitoring. So if you have other servers configured as workers and they go down, you might see this message.

https://labs.consol.de/nagios/mod-gearm ... index.html
https://labs.consol.de/nagios/mod-gearman/index.html

I would first make sure the worker processes are running on all of the worker servers.
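
One way to script that check, sketched under the assumption that the worker init script prints a line of the form "mod_gearman_worker is running with pid N" when the worker is up. The function name and the host name in the comment are placeholders:

```shell
#!/bin/sh
# Reads init-script status output on stdin and succeeds only if it reports
# a running worker. The matched wording is an assumption and may differ
# between mod_gearman versions.
worker_reports_running() {
    grep -q 'mod_gearman_worker is running with pid [0-9][0-9]*'
}

# Example (worker01 is a placeholder host name):
#   ssh root@worker01 /etc/init.d/mod_gearman_worker status | worker_reports_running \
#       || echo "worker01: mod_gearman_worker is NOT running"
```

Repeat the example line for each worker server, including the XI server itself if it also runs a worker.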
Former Nagios employee
CIBTtechops
Posts: 13
Joined: Tue Dec 20, 2011 3:33 pm

Re: Ping service queue

Post by CIBTtechops »

tmcdonald wrote:mod_gearman is a module that is used to distribute Nagios checks to other systems, either to reduce load or to have distributed monitoring. So if you have other servers configured as workers and they go down, you might see this message.

https://labs.consol.de/nagios/mod-gearm ... index.html
https://labs.consol.de/nagios/mod-gearman/index.html

I would first make sure the worker processes are running on all of the worker servers.
Thanks, will do that.
You sent me a lot of info to digest.
I will start by trying to figure out which worker servers are even involved in our mod gearman setup.
CIBTtechops
Posts: 13
Joined: Tue Dec 20, 2011 3:33 pm

Re: Ping service queue

Post by CIBTtechops »

Interesting stuff.

One observation I have made is that both servers have these files (by both, I mean what I consider the main nagios box and the worker box):

mod_gearman_neb.conf
mod_gearman_worker.conf

Main server: both gearmand and mod_gearman_worker services running.
Worker server: only mod_gearman_worker services running, gearmand is not recognized

So before I continue: is this all normal setup so far?

Another observation I have made:

Worker server:
/etc/init.d/mod_gearman_worker status
mod_gearman_worker is running with pid 21297

Main server (same command, but so much more output):
/etc/init.d/mod_gearman_worker status
+ PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
+ DAEMON=/usr/bin/mod_gearman_worker
+ NAME=mod_gearman_worker
+ CONFIG=/etc/mod_gearman/mod_gearman_worker.conf
+ PIDFILE=/var/mod_gearman/mod_gearman_worker.pid
+ USER=nagios
++ id -u
+ USERID=0
+ '[' -f /etc/sysconfig/mod_gearman_worker ']'
+ '[' 0 -eq 0 ']'
++ dirname /var/mod_gearman/mod_gearman_worker.pid
+ mkdir -p /var/mod_gearman
++ dirname /var/mod_gearman/mod_gearman_worker.pid
+ chown nagios: /var/mod_gearman
+ case "$1" in
++ cat /var/mod_gearman/mod_gearman_worker.pid
+ pid=14583
+ '[' 14583 '!=' '' ']'
+ ps -p 14583
+ '[' 0 -eq 0 ']'
+ echo 'mod_gearman_worker is running with pid 14583'
mod_gearman_worker is running with pid 14583
+ exit 0

Last observation:
I cannot start gearman_top on the worker server.
Failed to connect to localhost:4730 - Connection refused
I can start this on the Main box.

You will see the config difference here (does port # need to match or is this irrelevant?):

[root@main ]# cat mod_gearman_worker.conf
server=colnagios.cibt.local:4730

[root@worker mod_gearman]# cat mod_gearman_worker.conf
server=colnagios.cibt.local
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Ping service queue

Post by tgriep »

Having the gearmand server and the mod_gearman_worker running on the XI server and only the mod_gearman_worker running on the remote system is normal.

The gearman_top command interrogates the gearmand daemon so it will only run on the gearman server and not a worker. That is normal also.
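
The "Connection refused" on the worker fits this: nothing is listening on localhost:4730 there. If you want to confirm from the worker that gearmand on the main server is reachable, here is a rough sketch. It assumes bash (for its /dev/tcp feature) and the coreutils timeout command are installed; the function name is illustrative:

```shell
#!/bin/sh
# Return 0 if a TCP connection to HOST:PORT can be opened, non-zero otherwise.
# ASSUMPTIONS: bash and coreutils "timeout" are available on this machine.
gearmand_reachable() {
    host="$1"
    port="${2:-4730}"   # 4730 is the port used in this thread
    # bash opens a TCP socket when redirecting to /dev/tcp/HOST/PORT;
    # timeout bounds the wait if a firewall silently drops the packets.
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example, run on the worker:
#   gearmand_reachable colnagios.cibt.local 4730 && echo "gearmand reachable"
```

This only shows that the port accepts connections. It does not prove that the worker is registered with gearmand; gearman_top on the main server is still the place to see that.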

You should edit the mod_gearman_worker.conf file on the worker so that it gives both the server and the port of the gearman server. Change the following line to add the port. The port is defined in mod_gearman_neb.conf on the server, but it is best to set it explicitly on the worker as well.

Code: Select all

server=colnagios.cibt.local:4730
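
If you would rather script the change, here is a minimal sketch that appends the port to a server= line that has none. It assumes a single host per server= line, and the function name is made up for illustration. Take a backup of the file first:

```shell
#!/bin/sh
# Append :PORT to a "server=host" line that has no port yet.
# Lines that already contain a colon (host:port) are left untouched.
# $1: path to mod_gearman_worker.conf, $2: port (default 4730).
ensure_server_port() {
    conf="$1"
    port="${2:-4730}"
    tmp="$(mktemp)" || return 1
    # Write the edited copy to a temp file, then overwrite the original
    # in place so its owner and permissions are kept.
    sed 's/^\(server=[^:]*\)$/\1:'"$port"'/' "$conf" > "$tmp" && cat "$tmp" > "$conf"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Example, run on the worker:
#   ensure_server_port /etc/mod_gearman/mod_gearman_worker.conf 4730
```

Running it a second time changes nothing, because the line then already contains a port. The worker most likely needs a restart afterwards to pick up the new setting.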