
nagios/check_mk GUI, issue with check_ping

Posted: Thu Apr 20, 2017 12:34 pm
by sgennari
Hi All,

I've been spinning my wheels trying to fix an existing nagios and check_mk instance after a failed attempt at an update.

Centos 6.6
Nagios Core 3.5.1
CheckMK v1.2.3i3


It's all back up and running EXCEPT that the nagios & check_mk web interfaces both show PING status as UNREACHABLE or DOWN, even though multiple services are updating correctly in the web interfaces.

Running 'check_ping' at the CLI as root or nagios works just fine.

Code: Select all

[root@nagiosserver ~]# /usr/lib64/nagios/plugins/check_ping -H vhost1 -w 100.0,20% -c 500.0,60% -p 5
PING OK - Packet loss = 0%, RTA = 0.41 ms|rta=0.415000ms;100.000000;500.000000;0.000000 pl=0%;20;60;0

Code: Select all

[root@nagiosserver ~]# su - nagios
-bash-4.1$ /usr/lib64/nagios/plugins/check_ping -H vhost1 -w 100.0,20% -c 500.0,60% -p 5
PING OK - Packet loss = 0%, RTA = 0.39 ms|rta=0.392000ms;100.000000;500.000000;0.000000 pl=0%;20;60;0
Here's resource.cfg, although nothing changed in this file:

Code: Select all

# Sets $USER1$ to be the path to the plugins
$USER1$=/usr/lib64/nagios/plugins
more 

I assume this is what is being called from commands.cfg:

Code: Select all

<snip>
# 'check-host-alive' command definition
define command{
        command_name    check-host-alive
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
        }

Here are the permissions of the files in the plugin directory:

Code: Select all

[root@nagiosserver plugins]# ls  -l /usr/lib64/nagios/plugins 
total 2876
drwxrwxr-x. 3 root root   4096 Apr 20 09:13 .
drwxr-xr-x. 8 root root   4096 Apr 17 16:27 ..
-rwxr-xr-x. 1 root root   2346 Jan 17 20:28 check_breeze
-rwxr-xr-x. 1 root root  68584 Jan 17 20:28 check_by_ssh
lrwxrwxrwx. 1 root root      9 Apr 17 10:23 check_clamd -> check_tcp
-rwxr-xr-x. 1 root root  52696 Jan 17 20:28 check_cluster
-rwxr-xr-x. 1 root root  67224 Jan 17 20:28 check_dhcp
-rwxr-xr-x. 1 root root  68680 Jan 17 20:28 check_dig
-rwxr-xr-x. 1 root root  71520 Jan 17 20:28 check_disk
-rwxr-xr-x. 1 root root   9469 Jan 17 20:28 check_disk_smb
-rwxr-xr-x. 1 root root  73856 Jan 17 20:28 check_dns
-rwxr-xr-x. 1 root root  37168 Jan 17 20:28 check_dummy
-rwxr-xr-x. 1 root root   3860 Jan 17 20:28 check_file_age
-rwxr-xr-x. 1 root root   6412 Jan 17 20:28 check_flexlm
-rwxr-xr-x. 1 root root  67464 Jan 17 20:28 check_fping
lrwxrwxrwx. 1 root root      9 Apr 17 10:23 check_ftp -> check_tcp
-rwxr-xr-x. 1 root root  57040 Jan 17 20:28 check_game
-rwxr-xr-x. 1 root root  63296 Jan 17 20:28 check_hpjd
-rwxr-xr-x. 1 root root 150904 Jan 17 20:28 check_http
-rwxr-xr-x. 1 root root  71896 Apr 17 16:15 check_icmp
-rwxr-xr-x. 1 root root  54768 Jan 17 20:28 check_ide_smart
lrwxrwxrwx. 1 root root      9 Apr 17 10:23 check_imap -> check_tcp
-rwxr-xr-x. 1 root root   6984 Jan 17 20:28 check_ircd
lrwxrwxrwx. 1 root root      9 Apr 17 10:23 check_jabber -> check_tcp
-rwxr-xr-x. 1 root root  67304 Jan 17 20:28 check_ldap
lrwxrwxrwx. 1 root root     10 Apr 17 10:23 check_ldaps -> check_ldap
-rwxr-xr-x. 1 root root  56920 Jan 17 20:28 check_load
-rwxr-xr-x. 1 root root   6595 Jan 17 20:28 check_log
-rwxr-xr-x. 1 root root  22756 Jan 17 20:28 check_mailq
-rwxr-xr-x. 1 root root   3410 Oct 29  2013 check_mkevents
-rwxr-xr-x. 1 root root  55376 Jan 17 20:28 check_mrtg
-rwxr-xr-x. 1 root root  55240 Jan 17 20:28 check_mrtgtraf
-rwxr-xr-x. 1 root root  79944 Jan 17 20:28 check_mysql
-rwxr-xr-x. 1 root root  74288 Jan 17 20:28 check_mysql_query
-rwxr-xr-x. 1 root root  57768 Jan 17 20:28 check_nagios
lrwxrwxrwx. 1 root root      9 Apr 17 10:23 check_nntp -> check_tcp
lrwxrwxrwx. 1 root root      9 Apr 17 10:23 check_nntps -> check_tcp
-rwxr-xr-x. 1 root root  71000 Jan 17 20:28 check_nt
-rwxr-xr-x. 1 root root  68792 Jan 17 20:28 check_ntp
-rwxr-xr-x. 1 root root  67856 Jan 17 20:28 check_ntp_peer
-rwxr-xr-x. 1 root root  14314 Jan 17 20:28 check_ntp.pl
-rwxr-xr-x. 1 root root  64856 Jan 17 20:28 check_ntp_time
-rwxr-xr-x. 1 root root  80360 Jan 17 20:28 check_nwstat
-rwxr-xr-x. 1 root root   8926 Jan 17 20:28 check_oracle
-rwxr-xr-x. 1 root root  61776 Jan 17 20:28 check_overcr
-rwxr-xr-x. 1 root root  76096 Jan 17 20:28 check_pgsql
-rwxr-xr-x. 1 root root  70464 Jan 17 20:28 check_ping
lrwxrwxrwx. 1 root root      9 Apr 17 10:23 check_pop -> check_tcp
-rwxr-xr-x. 1 root root  69672 Jan 17 20:28 check_procs
-rwxr-xr-x. 1 root root  60544 Jan 17 20:28 check_real
-rwxr-xr-x. 1 root root   9679 Jan 17 20:28 check_rpc
-rwxr-xr-x. 1 root root   1465 Jan 17 20:28 check_sensors
lrwxrwxrwx. 1 root root      9 Apr 17 10:23 check_simap -> check_tcp
-rwxr-xr-x. 1 root root  90992 Jan 17 20:28 check_smtp
-rwxr-xr-x. 1 root root 114096 Jan 17 20:28 check_snmp
lrwxrwxrwx. 1 root root      9 Apr 17 10:23 check_spop -> check_tcp
-rwxr-xr-x. 1 root root  58824 Jan 17 20:28 check_ssh
lrwxrwxrwx. 1 root root      9 Apr 17 10:23 check_ssmtp -> check_tcp
-rwxr-xr-x. 1 root root  54088 Jan 17 20:28 check_swap
-rwxr-xr-x. 1 root root  79232 Jan 17 20:28 check_tcp
-rwxr-xr-x. 1 root root  59656 Jan 17 20:28 check_time
lrwxrwxrwx. 1 root root      9 Apr 17 10:23 check_udp -> check_tcp
-rwxr-xr-x. 1 root root  66136 Jan 17 20:28 check_ups
-rwxr-xr-x. 1 root root  50080 Jan 17 20:28 check_users
-rwxr-xr-x. 1 root root   3032 Jan 17 20:28 check_wave
drwxr-xr-x. 2 root root   4096 Aug 30  2013 eventhandlers
-rwxr-xr-x. 1 root root  51248 Jan 17 20:28 negate
-rwxr-xr-x. 1 root root  48248 Jan 17 20:28 urlize
-rwxr-xr-x. 1 root root   2088 Jan 17 20:28 utils.pm
-rwxr-xr-x. 1 root root   2791 Jan 17 20:28 utils.sh

The error message in CheckMK is:
CRITICAL: Return code of 127 is out of bounds. Make sure the plugin you're trying to run actually exists. (worker: nagiosserver)
All the plugins live in the same directory but only check_ping (or check_icmp?) is failing. It's got to be a permissions issue, right? I've tried setting the suid bit for check_ping/check_icmp but it did not make a difference. I tried group ownership 'nagios' for the plugins too. No joy.

Any suggestions would be greatly appreciated.

Thank you,
Scott

Re: nagios/check_mk GUI, issue with check_ping

Posted: Thu Apr 20, 2017 4:35 pm
by cdienger
Can you provide the output of "iptables -L -n" as well as the host definition? It'd be odd if there were permission problems on just a single check script. I'm thinking maybe it's not resolving the way we'd expect or firewall rules on the Nagios server may be preventing it.

Re: nagios/check_mk GUI, issue with check_ping

Posted: Thu Apr 20, 2017 4:39 pm
by tmcdonald
Also, we can't really help a ton with check_mk's GUI since that is not our product, but one more thing to look for is if you have multiple parent Core processes running:

ps -ef | grep bin/nagios
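
A rough sketch of the same check that skips the grep process itself and counts only daemonized parents (PPID 1). It assumes a `ps` that accepts the POSIX `-e -o` options, and the variable name is just illustrative:

```shell
#!/bin/sh
# List PPID and full command line for every process, then keep only
# entries whose parent is PID 1 and whose command line contains the
# daemon's binary path. The [n] keeps the pattern from matching itself.
parents=$(ps -e -o ppid= -o args= 2>/dev/null \
  | awk '$1 == 1 && /bin\/[n]agios/ { n++ } END { print n + 0 }')

echo "parent Core processes: $parents"
```

A count above 1 would be worth a closer look, since a leftover second daemon could still be writing its own results.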

Re: nagios/check_mk GUI, issue with check_ping

Posted: Fri Apr 21, 2017 2:45 pm
by sgennari
tmcdonald wrote:Also, we can't really help a ton with check_mk's GUI since that is not our product, but one more thing to look for is if you have multiple parent Core processes running:

ps -ef | grep bin/nagios
Sorry, I didn't realize Check_MK GUI was not part of the Nagios package!

Here is the output:

Code: Select all

[root@nagiosserver ~]# ps -ef | grep bin/nagios
root     10616 10158  0 14:30 pts/1    00:00:00 grep bin/nagios
nagios   14619     1  1 Apr20 ?        00:16:56 /usr/sbin/nagios -d /etc/nagios/nagios.cfg

Re: nagios/check_mk GUI, issue with check_ping

Posted: Fri Apr 21, 2017 2:57 pm
by sgennari
cdienger wrote:Can you provide the output of "iptables -L -n" as well as the host definition? It'd be odd if there were permission problems on just a single check script. I'm thinking maybe it's not resolving the way we'd expect or firewall rules on the Nagios server may be preventing it.

iptables output

Code: Select all

[root@nagiosserver ~]# iptables -L -n
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           state RELATED,ESTABLISHED
ACCEPT     icmp --  0.0.0.0/0            0.0.0.0/0
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW tcp dpt:22
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW tcp dpt:80
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW tcp dpt:443
REJECT     all  --  0.0.0.0/0            0.0.0.0/0           reject-with icmp-host-prohibited

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
REJECT     all  --  0.0.0.0/0            0.0.0.0/0           reject-with icmp-host-prohibited

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination


Running check_ping via the command line works on the same server.

Code: Select all

[root@nagiosserver ~]# /usr/lib64/nagios/plugins/check_ping -H vhost1 -w 100.0,20% -c 500.0,60% -p 5
PING OK - Packet loss = 0%, RTA = 0.41 ms|rta=0.415000ms;100.000000;500.000000;0.000000 pl=0%;20;60;0
In terms of the "host definition", the example host "vhost1" is defined in /etc/nagios/objects/check_mk_objects.cfg:

Code: Select all

[root@nagiosserver objects]# pwd
/etc/nagios/objects
[root@nagiosserver objects]# ls -la
total 284
drwxrwx---. 3 nagios nagios   4096 Apr 21 15:13 .
drwxrwxr-x. 6 nagios root     4096 Apr 21 14:54 ..
drwxr-xr-x. 2 nagios root     4096 Apr 21 14:52 backup_20161205
-rw-r--r--. 1 root   root   208883 Apr 19 13:05 check_mk_objects.cfg
lrwxrwxrwx. 1 nagios root       42 Feb  1  2013 check_mk_templates.cfg -> /usr/share/check_mk/check_mk_templates.cfg
-rw-rw-r--. 1 nagios root     7378 Apr 20 13:16 commands.cfg
-rw-r--r--. 1 root   root     7379 Apr 19 15:06 commands.cfg.20170419
-rw-rw-r--. 1 nagios root     7704 Aug 30  2013 commands.cfg.rpmnew
-rw-rw-r--. 1 nagios root     2733 Dec  5 13:45 contacts.cfg
-rw-r--r--. 1 nagios root       70 Feb 28  2013 hostgroups.cfg
-rw-rw-r--. 1 nagios root     5403 Aug 30  2013 localhost.cfg
-rw-rw-r--. 1 nagios root     3124 Aug 30  2013 printer.cfg
-rw-rw-r--. 1 nagios root     3293 Aug 30  2013 switch.cfg
-rw-rw-r--. 1 nagios root    11158 Aug 30  2013 templates.cfg
-rw-rw-r--. 1 nagios root     3208 Aug 30  2013 timeperiods.cfg
-rw-rw-r--. 1 nagios root     4019 Aug 30  2013 windows.cfg

Code: Select all

# ----------------------------------------------------
# vhost1
# ----------------------------------------------------

define host {
  host_name                     vhost1
  use                           check_mk_host
  address                       10.30.2.29
  _TAGS                         lan cmk-agent tcp critical linux serverroom wato /wato/servers/
  _FILENAME                     /wato/servers/hosts.mk
  hostgroups                    servers
  contact_groups                its
  notifications_enabled         1
  parents                       sw-4500
}

Is this what you were looking for?

Scott

Re: nagios/check_mk GUI, issue with check_ping

Posted: Mon Apr 24, 2017 2:42 pm
by tgriep
I am thinking that your server is running Mod Gearman and that it is set up to run that check on a system that doesn't have the plugin installed.
Is the server running Mod Gearman, and did it get updated?
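
For context on the 127 in that error: it is the exit status a POSIX shell returns when the command it was asked to run cannot be found, while a file that exists but is not executable gives 126. That is why a missing plugin path on the worker side looks more likely than a permissions problem. A minimal sketch showing both codes, using made-up file names in a scratch directory (nothing here comes from the poster's system):

```shell
#!/bin/sh
# Hypothetical file names, used only to illustrate the two exit codes.
tmpdir=$(mktemp -d)

# 1) Path that does not exist: the shell reports 127 (command not found).
rc_missing=0
sh -c "$tmpdir/no_such_plugin" 2>/dev/null || rc_missing=$?

# 2) File exists but has no execute bit: the shell reports 126.
printf '#!/bin/sh\nexit 0\n' > "$tmpdir/not_executable"
chmod 644 "$tmpdir/not_executable"
rc_noexec=0
sh -c "$tmpdir/not_executable" 2>/dev/null || rc_noexec=$?

echo "missing=$rc_missing not_executable=$rc_noexec"
rm -rf "$tmpdir"
```

If the plugin were present but blocked by permissions, the GUI error would be expected to mention 126 rather than 127.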

Re: nagios/check_mk GUI, issue with check_ping

Posted: Mon Apr 24, 2017 2:44 pm
by cdienger
Can you tell me if the following works:

Code: Select all

/usr/lib64/nagios/plugins/check_ping -H 10.30.2.29 -w 100.0,20% -c 500.0,60% -p 5
?

Re: nagios/check_mk GUI, issue with check_ping

Posted: Mon Apr 24, 2017 3:51 pm
by sgennari
cdienger wrote:Can you tell me if the following works:

Code: Select all

/usr/lib64/nagios/plugins/check_ping -H 10.30.2.29 -w 100.0,20% -c 500.0,60% -p 5
?

Yes, it works fine from the command line.

Code: Select all

[root@nagiosserver objects]# /usr/lib64/nagios/plugins/check_ping -H 10.30.2.29 -w 100.0,20% -c 500.0,60% -p 5
PING OK - Packet loss = 0%, RTA = 0.51 ms|rta=0.510000ms;100.000000;500.000000;0.000000 pl=0%;20;60;0

Re: nagios/check_mk GUI, issue with check_ping

Posted: Mon Apr 24, 2017 3:57 pm
by sgennari
tgriep wrote:I am thinking that your server is running Mod Gearman and that it is set up to run that check on a system that doesn't have the plugin installed.
Is the server running Mod Gearman, and did it get updated?

Yes, it is running Gearman, but since I'm not sure what this even is, it's likely the problem! It is very possible it was updated in the failed update attempt (via yum).

Code: Select all

[root@nagios objects]# ps -ef | grep gearman
 6975 ?        Ssl  2782:39 /usr/local/sbin/gearmand -p 4730 -P /usr/local/var/mod_gearman/gearmand.pid -d -j 0 -t 10 --log-file=/usr/local/var/mod_gearman/gearmand.log --listen=0.0.0.0
 7344 ?        Ss   113:53 /usr/local/bin/mod_gearman_worker -d --config=/usr/local/etc/mod_gearman_worker.conf --pidfile=/usr/local/var/mod_gearman/mod_gearman_worker.pid
 1778 ?        S      0:00  \_ /usr/local/bin/mod_gearman_worker -d --config=/usr/local/etc/mod_gearman_worker.conf --pidfile=/usr/local/var/mod_gearman/mod_gearman_worker.pid
 2712 ?        S      0:00  \_ /usr/local/bin/mod_gearman_worker -d --config=/usr/local/etc/mod_gearman_worker.conf --pidfile=/usr/local/var/mod_gearman/mod_gearman_worker.pid
15320 ?        S      0:00  \_ /usr/local/bin/mod_gearman_worker -d --config=/usr/local/etc/mod_gearman_worker.conf --pidfile=/usr/local/var/mod_gearman/mod_gearman_worker.pid
17252 ?        S      0:00  \_ /usr/local/bin/mod_gearman_worker -d --config=/usr/local/etc/mod_gearman_worker.conf --pidfile=/usr/local/var/mod_gearman/mod_gearman_worker.pid
18462 ?        S      0:00  \_ /usr/local/bin/mod_gearman_worker -d --config=/usr/local/etc/mod_gearman_worker.conf --pidfile=/usr/local/var/mod_gearman/mod_gearman_worker.pid
19153 ?        S      0:00  \_ /usr/local/bin/mod_gearman_worker -d --config=/usr/local/etc/mod_gearman_worker.conf --pidfile=/usr/local/var/mod_gearman/mod_gearman_worker.pid
19155 pts/0    S+     0:00          \_ grep gearman


Re: nagios/check_mk GUI, issue with check_ping

Posted: Mon Apr 24, 2017 4:08 pm
by tgriep
Try disabling Mod Gearman and see if that fixes the issue.
Edit the nagios.cfg file and look for a broker_module line that is similar to the following example.

Code: Select all

broker_module=/usr/lib64/mod_gearman2/mod_gearman2.o config=/etc/mod_gearman2/module.conf eventhandler=no
Comment it out, restart the nagios daemon, and see if that fixes the error.
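
A minimal sketch of that edit, run against a scratch copy rather than the live file. The first module line is the example from above and the second is a hypothetical placeholder; the real lines on your system may differ:

```shell
#!/bin/sh
# Scratch copy with illustrative contents; the second module path is hypothetical.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
broker_module=/usr/lib64/mod_gearman2/mod_gearman2.o config=/etc/mod_gearman2/module.conf eventhandler=no
broker_module=/path/to/some_other_module.o
EOF

# Show the active mod_gearman broker line(s) with line numbers.
grep -n '^broker_module=.*mod_gearman' "$cfg"

# Comment out only the mod_gearman line(s); a .bak copy of the original is kept.
sed -i.bak '/^broker_module=.*mod_gearman/ s/^/#/' "$cfg"

# Count what is still active afterwards (grep -c exits non-zero on a count of 0).
active_gearman=$(grep -c '^broker_module=.*mod_gearman' "$cfg" || true)
active_other=$(grep -c '^broker_module=' "$cfg" || true)
echo "active mod_gearman lines: $active_gearman, other broker modules: $active_other"

rm -f "$cfg" "$cfg.bak"
```

On the real server the file would be /etc/nagios/nagios.cfg (per the ps output earlier in the thread), followed by a restart of the nagios service. Matching only mod_gearman leaves any other broker modules active, such as Livestatus, which the check_mk GUI normally depends on.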