Erroneous Check_Ping emails with 4.2.4

neworderfac33
Posts: 329
Joined: Fri Jul 24, 2015 11:04 am

Erroneous Check_Ping emails with 4.2.4

Post by neworderfac33 »

Good morning, all

I have recently set up a new instance of Core 4.2.4 on a new server and I have started to receive a few alert emails indicating that some servers are down (but only sporadically). The servers AREN'T down, incidentally - I just think that the ping is taking a little longer than expected on the odd occasion from the new Nagios master, as the same alerts aren't being sent out from my 4.0.8 instance.
On a few occasions this morning, I received alerts for a small number of servers (all from the same hostgroup), then a bit later on, a few more, but at no point were any of the servers down.
The servers in question are Linux boxes, in a different OU from my other Linux and Windows servers, for which I'm (correctly) not receiving any ping alerts at all, either from the 4.0.8 or the 4.2.4 instances.

Here are the check-host-alive and check_ping commands from my commands.cfg (the config files are exactly as they were copied from my 4.0.8 instance):

Code: Select all

define command{
        command_name    check-host-alive
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
        }

define command{
        command_name    check_ping
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
        }
and here's the template definition used by the hosts for which the alerts are being received.

Code: Select all

define host{
        name                    linux-server-xxx; The name of this host template
        use                     generic-host    ; This template inherits other values from the generic-host template
        check_period            24x7            ; By default, Linux hosts are checked round the clock
        check_interval          1               ; Actively check the host every 1 minute
        retry_interval          1               ; Schedule host check retries at 1 minute intervals
        max_check_attempts      10              ; Check each Linux host 10 times (max)
        check_command           check-host-alive; Default command to check Linux hosts
        notification_period     24x7            ; Notifications can be sent at any time of day
                                                ; Note that the notification_period variable is being overridden from
                                                ; the value that is inherited from the generic-host template!
        notification_options    d,u,r           ; Only send notifications for specific host states
        notification_interval   10              ; XX Minutes or 0 to only send the FIRST notification
        contacts                my_id
        contact_groups          group_id      ;Notifications get sent to the admins by default
        #hostgroups              linux-servers   ; Host groups that Linux servers should be a member of
        register                0              ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
        }
I tried doubling the thresholds in check-host-alive (3000 to 6000 and 5000 to 10000), but I am still receiving emails, which contain:

Code: Select all

Notification Type: PROBLEM
Host: MyServerID
State: DOWN
Address: MyServerID
Info: (No output on stdout) stderr: execvp(/usr/local/nagios/libexec/check_ping, ...) failed. errno is 2: No such file or directory
Not sure about "No such file or directory" - check_ping is definitely there, as it's successfully used by over 400 other remote hosts!
Other Linux hosts that use the same template and commands are working fine too.

Thanks in advance for your help.
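
A note on the "Info" line above: it reports a failed execvp with errno 2 (ENOENT) rather than a slow or lost ping, so the -w/-c thresholds may not be involved at all. One hedged way to rule out the obvious cases is to test the plugin path directly on the 4.2.4 server. The sketch below is only an illustration; the function name check_plugin_path is made up here, and the path is the one quoted in the notification.

```shell
# Hypothetical helper: report whether a plugin path exists and is executable.
# Returns 0 if usable, 1 if the path is missing, 2 if it exists but is not executable.
check_plugin_path() {
    plugin="$1"
    if [ ! -e "$plugin" ]; then
        echo "MISSING: $plugin"
        return 1
    fi
    if [ ! -x "$plugin" ]; then
        echo "NOT EXECUTABLE: $plugin"
        return 2
    fi
    echo "OK: $plugin"
    return 0
}
```

For example, `check_plugin_path /usr/local/nagios/libexec/check_ping`, ideally run as the user the Nagios daemon runs as. Since the failures are sporadic, a single "OK" does not prove the path is always resolvable, so it may be worth repeating the test around the time an alert fires.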

rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Erroneous Check_Ping emails with 4.2.4

Post by rkennedy »

Can you please post a copy of your objects.cache (/usr/local/nagios/var/objects.cache) and the full output of ls -al /usr/local/nagios/libexec/?

It could be a formatting error somewhere in your configuration, but this will help us to see.
Former Nagios Employee
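
Because objects.cache can be very large, it may be enough to pull out only the command definitions in question rather than posting the whole file. The following is a minimal sketch, assuming the usual "define command {" ... "}" block layout of objects.cache; the function name extract_command is invented for this example.

```shell
# Hypothetical helper: print the "define command" block for one command name.
# $1 = path to objects.cache, $2 = command_name to look for
extract_command() {
    awk -v name="$2" '
        # Start collecting whenever a command block opens.
        /^define command[ \t]*[{]/ { block = ""; in_block = 1; wanted = 0 }
        in_block { block = block $0 "\n" }
        # Remember the block if its command_name matches.
        in_block && $1 == "command_name" && $2 == name { wanted = 1 }
        # On the closing brace, print the block only if it was the one asked for.
        in_block && /^[ \t]*[}]/ {
            if (wanted) printf "%s", block
            in_block = 0
        }
    ' "$1"
}
```

Usage would look like `extract_command /usr/local/nagios/var/objects.cache check-host-alive`. The same pattern could be adapted to "define host" blocks, but that is left out here to keep the sketch short.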
neworderfac33
Posts: 329
Joined: Fri Jul 24, 2015 11:04 am

Re: Erroneous Check_Ping emails with 4.2.4

Post by neworderfac33 »

Good morning,
My /usr/local/nagios/var/objects.cache is 83645 lines long! Is there any bit that you specifically need to see, apart from:

Code: Select all

define command {
        command_name    check-host-alive
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 6000.0,80% -c 10000.0,100% -p 5
        }
Here's the output of ls -al /usr/local/nagios/libexec:

Code: Select all

drwxrwxr-x 3 nagios     nagios    4096 Jan 19 10:10 .
drwxr-xr-x 9 root       root        87 Jan 17 15:22 ..
-rwxr-xr-x 1 nagios     nagios  210507 Jan 17 15:22 check_apt
-rwxr-xr-x 1 nagios     nagios    2342 Jan 17 15:22 check_breeze
-rwxr-xr-x 1 nagios     nagios  214648 Jan 17 15:22 check_by_ssh
lrwxrwxrwx 1 root       root         9 Jan 17 15:22 check_clamd -> check_tcp
-rwxr-xr-x 1 nagios     nagios  162873 Jan 17 15:22 check_cluster
-r-sr-xr-x 1 root       nagios  211537 Jan 17 15:22 check_dhcp
-rwxr-xr-x 1 nagios     nagios  222825 Jan 17 15:22 check_dig
-rwxr-xr-x 1 nagios     nagios  227152 Jan 17 15:22 check_disk
-rwxr-xr-x 1 nagios     nagios    9465 Jan 17 15:22 check_disk_smb
-rwxr-xr-x 1 nagios     nagios  230580 Jan 17 15:22 check_dns
-rwxr-xr-x 1 nagios     nagios  126858 Jan 17 15:22 check_dummy
-rwxr-xr-x 1 nagios     nagios    3839 Jan 17 15:22 check_file_age
-rwxr-xr-x 1 nagios     nagios    6408 Jan 17 15:22 check_flexlm
lrwxrwxrwx 1 root       root         9 Jan 17 15:22 check_ftp -> check_tcp
-rwxr-xr-x 1 nagios     nagios  208945 Jan 17 15:22 check_hpjd
-rwxr-xr-x 1 nagios     nagios  310838 Jan 17 15:22 check_http
-r-sr-xr-x 1 root       nagios  227035 Jan 17 15:22 check_icmp
-rwxr-xr-x 1 nagios     nagios  169751 Jan 17 15:22 check_ide_smart
-rwxr-xr-x 1 nagios     nagios   15271 Jan 17 15:22 check_ifoperstatus
-rwxr-xr-x 1 nagios     nagios   13419 Jan 17 15:22 check_ifstatus
lrwxrwxrwx 1 root       root         9 Jan 17 15:22 check_imap -> check_tcp
-rwxr-xr-x 1 nagios     nagios    6980 Jan 17 15:22 check_ircd
-rwxr-xr-x 1 nagios     nagios  187504 Jan 17 15:22 check_load
-rwxr-xr-x 1 nagios     nagios    6595 Jan 17 15:22 check_log
-rwxr-xr-x 1 nagios     nagios   22752 Jan 17 15:22 check_mailq
-rwxr-xr-x 1 nagios     nagios  173037 Jan 17 15:22 check_mrtg
-rwxr-xr-x 1 nagios     nagios  170298 Jan 17 15:22 check_mrtgtraf
-rwxr-xr-x 1 nagios     nagios  186636 Jan 17 15:22 check_nagios
lrwxrwxrwx 1 root       root         9 Jan 17 15:22 check_nntp -> check_tcp
-rwxrwxr-x 1 nagios     nagios   81518 Jan 19 09:14 check_nrpe
-rwxr-xr-x 1 nagios     nagios  219842 Jan 17 15:22 check_nt
-rwxr-xr-x 1 nagios     nagios  218730 Jan 17 15:22 check_ntp
-rwxr-xr-x 1 nagios     nagios  208239 Jan 17 15:22 check_ntp_peer
-rwxr-xr-x 1 nagios     nagios  203404 Jan 17 15:22 check_ntp_time
-rwxr-xr-x 1 nagios     nagios  250315 Jan 17 15:22 check_nwstat
-rwxr-xr-x 1 nagios     nagios    8926 Jan 17 15:22 check_oracle
-rwxr-xr-x 1 nagios     nagios  193037 Jan 17 15:22 check_overcr
-rwxr-xr-x 1 nagios     nagios  223058 Jan 17 15:22 check_ping
lrwxrwxrwx 1 root       root         9 Jan 17 15:22 check_pop -> check_tcp
-rwxr-xr-x 1 nagios     nagios  224043 Jan 17 15:22 check_procs
-rwxr-xr-x 1 nagios     nagios  185707 Jan 17 15:22 check_real
-rwxr-xr-x 1 nagios     nagios    9675 Jan 17 15:22 check_rpc
-rwxr-xr-x 1 nagios     nagios    1465 Jan 17 15:22 check_sensors
-rwxr-xr-x 1 nagios     nagios  216552 Jan 17 15:22 check_smtp
-rwxr-xr-x 1 nagios     nagios  270133 Jan 17 15:22 check_snmp
-rwxr-xr-x 1 nagios     nagios  185230 Jan 17 15:22 check_ssh
-rwxr-xr-x 1 nagios     nagios  166649 Jan 17 15:22 check_swap
-rwxr-xr-x 1 nagios     nagios  205566 Jan 17 15:22 check_tcp
-rwxr-xr-x 1 nagios     nagios  186977 Jan 17 15:22 check_time
lrwxrwxrwx 1 root       root         9 Jan 17 15:22 check_udp -> check_tcp
-rwxr-xr-x 1 nagios     nagios  199469 Jan 17 15:22 check_ups
-rwxr-xr-x 1 nagios     nagios  163155 Jan 17 15:22 check_uptime
-rwxr-xr-x 1 nagios     nagios  157514 Jan 17 15:22 check_users
-rwxr-xr-x 1 nagios     nagios    3028 Jan 17 15:22 check_wave
-rwxr-xr-x 1 nagios     nagios  157541 Jan 17 15:22 negate
drwxr-xr-x 2 myusername adusers   4096 Jan 17 10:12 hidensubfoldername
-rwxr-xr-x 1 nagios     nagios  151054 Jan 17 15:22 urlize
-rwxr-xr-x 1 nagios     nagios    1919 Jan 17 15:22 utils.pm
-rwxr-xr-x 1 nagios     nagios    2791 Jan 17 15:22 utils.sh
Cheers

Pete
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN
Contact:

Re: Erroneous Check_Ping emails with 4.2.4

Post by dwhitfield »

When these suspected misconfigs happen, it's easier if we can see the actual names of things (although we understand the security risks). You can PM me the objects.cache. If it's too big to PM, then I can PM you my email address. If you do PM the objects.cache, please update this thread saying you have done so. That's the only way for it to pop back up on our support dashboard.

Also, is it possible to temporarily move the misbehaving Linux servers into the behaving Linux server OU as a test?
neworderfac33
Posts: 329
Joined: Fri Jul 24, 2015 11:04 am

Re: Erroneous Check_Ping emails with 4.2.4

Post by neworderfac33 »

I won't be able to send you the file due to security considerations at our end, but one thing I HAVE noticed is that the owner and the group for /usr/local/nagios/libexec/check_ping on the instance that runs OK (4.0.8) is "root", whereas on the instance where I experience the sporadic errors (4.2.4), it's "nagios". Would it help if I changed owner and group to "root", do you think? Should I change the owner and group for EVERYTHING in /usr/local/nagios/libexec to "root"?

Thanks

Pete
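
Before changing any ownership, it may help to capture the current mode and owner of the plugins on both instances so the two can be compared side by side. This is a small sketch, assuming GNU coreutils stat is available (the -c option is GNU-specific); show_perms is a made-up name.

```shell
# Hypothetical helper: print mode, owner:group and path for each file given.
# Uses GNU stat format sequences: %A = mode string, %U = owner, %G = group, %n = name.
show_perms() {
    stat -c '%A %U:%G %n' "$@"
}
```

Running something like `show_perms /usr/local/nagios/libexec/check_ping /usr/local/nagios/libexec/check_icmp` on the 4.0.8 and 4.2.4 servers and comparing the two outputs would show exactly where they differ. Note that in the listing posted earlier, check_icmp and check_dhcp are setuid root, so a blanket ownership change across the directory is probably best held back until the difference is understood.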
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN
Contact:

Re: Erroneous Check_Ping emails with 4.2.4

Post by dwhitfield »

What's the output of ls -la /bin/ping on both?

Also, what OS are you running on both? I've got different permissions on CentOS than Debian, so just trying to get things as close to you as possible.
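
The two pieces of information requested here could be gathered on each server with a short helper like the one below. It is only a sketch: collect_ping_info is a hypothetical name, /bin/ping is the path from the question, and /etc/os-release is assumed to exist only on reasonably recent distributions (older releases may need /etc/redhat-release or similar instead).

```shell
# Hypothetical helper: show the ping binary's permissions and the OS name.
# $1 = path to ping (defaults to /bin/ping, as asked above)
collect_ping_info() {
    ping_path="${1:-/bin/ping}"
    ls -la "$ping_path" 2>/dev/null || echo "not found: $ping_path"
    if [ -r /etc/os-release ]; then
        # Read os-release in a subshell so its variables do not leak out.
        ( . /etc/os-release; echo "OS: ${PRETTY_NAME:-unknown}" )
    else
        echo "OS: $(uname -sr)"
    fi
}
```

Calling `collect_ping_info` with no arguments on both the 4.0.8 and 4.2.4 servers gives two short outputs that can be pasted into the thread.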