Nagios UI reports server Up - False positive

tezarin
Posts: 32
Joined: Tue Apr 07, 2015 8:03 am

Nagios UI reports server Up - False positive

Post by tezarin »

Hi all,

I have seen this behavior from Nagios a couple of times: my server is actually down and I can't even ping it, but Nagios reports that the host is UP. It does show all the services in a CRITICAL state, which is correct, but what I can't understand is why it reports the server as UP when it's really down.

This is the host alive command:

define command{
        command_name    check-host-alive
        command_line    sudo $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
        }
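For reference, check_ping thresholds take the form `<rta>,<pl>%`: a round-trip average in milliseconds and a packet-loss percentage. The sketch below is a hypothetical Python helper (`classify_ping`, not the plugin's real code) showing how the `-w 3000.0,80% -c 5000.0,100%` pair could map one ping result to a plugin exit code, assuming a value at or above a threshold triggers that state and CRITICAL is checked before WARNING.

```python
from typing import Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def parse_threshold(spec: str) -> Tuple[float, float]:
    """Split a threshold such as '3000.0,80%' into (rta_ms, loss_pct)."""
    rta, loss = spec.split(",")
    return float(rta), float(loss.rstrip("%"))


def classify_ping(rta_ms: float, loss_pct: float, warn: str, crit: str) -> int:
    """Map one ping result to a plugin exit code (hypothetical helper).

    Assumes a value at or above a threshold triggers that state and that
    CRITICAL is evaluated before WARNING.
    """
    w_rta, w_loss = parse_threshold(warn)
    c_rta, c_loss = parse_threshold(crit)
    if loss_pct >= c_loss or rta_ms >= c_rta:
        return CRITICAL
    if loss_pct >= w_loss or rta_ms >= w_rta:
        return WARNING
    return OK
```

With the thresholds above, `classify_ping(12.0, 80.0, "3000.0,80%", "5000.0,100%")` returns 1 (WARNING) and `classify_ping(12.0, 100.0, "3000.0,80%", "5000.0,100%")` returns 2 (CRITICAL). With `-p 5` each lost packet is 20%, so losing 4 of 5 already reaches the 80% warning level, and only losing all 5 reaches the 100% critical level.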
This is the host definition:

define host{
        use                     linux-server
        host_name               server-mysql-backup02
        alias                   server-mysql-backup02
        address                 192.125.12.14
        max_check_attempts      5
        check_period            24x7
        notification_interval   30
        notification_period     24x7
        }

I even tried adding the line below to the host definition, but that didn't make a difference:

 check_command                   my-check-host-alive
I also defined a new command in the commands.cfg file:

define command{
        command_name    my-check-host-alive
        command_line    sudo $USER1$/check_tcp -H $HOSTADDRESS$ -p 22
        }
Can someone please shed some light on this?

Thanks in advance
jasgot
Posts: 19
Joined: Sat Feb 14, 2015 7:52 pm

Re: Nagios UI reports server Up - False positive

Post by jasgot »

I see this all the time. In my case, the service that runs the checks and creates the HTML UI I see has stopped. I run "service nagios restart" and all is well.
tezarin

Re: Nagios UI reports server Up - False positive

Post by tezarin »

jasgot wrote: I see this all the time. In my case, the service that runs the checks and creates the HTML UI I see has stopped. I run "service nagios restart" and all is well.
Thanks, but I've already tried that; it still shows the host as UP.
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Nagios UI reports server Up - False positive

Post by jolson »

define command{
        command_name    check-host-alive
        command_line    sudo $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
}
Is there any reason why 'sudo' is in front of your command definition? Could you try removing it and reporting the results back to us?

define command{
        command_name    check-host-alive
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
}
Twits Blog
Show me a man who lives alone and has a perpetually clean kitchen, and 8 times out of 9 I'll show you a man with detestable spiritual qualities.
tezarin

Re: Nagios UI reports server Up - False positive

Post by tezarin »

jolson wrote:

define command{
        command_name    check-host-alive
        command_line    sudo $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
}
Is there any reason why 'sudo' is in front of your command definition? Could you try removing it and reporting the results back to us?

define command{
        command_name    check-host-alive
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
}
Without the sudo, the ping fails for all the hosts, even the ones that are up. I just removed it and now the GUI shows all the hosts as DOWN.
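One mechanism worth ruling out here: the Nagios Core documentation on host checks describes plugin result 1 (WARNING) as mapping to UP unless `use_aggressive_host_checking` is enabled, while 2 (CRITICAL) and 3 (UNKNOWN) map to DOWN. sudo itself exits with status 1 when it hits a configuration or permission problem, so a `sudo` wrapper that fails before check_ping ever runs could make an unreachable host look UP. That is a possibility to check, not a confirmed diagnosis. A minimal Python sketch of the documented mapping; `host_state` is a hypothetical name and the UNREACHABLE (parent host) logic is left out.

```python
def host_state(exit_code: int, aggressive_host_checking: bool = False) -> str:
    """Map a host-check exit code to a preliminary host state (sketch).

    Follows the mapping described in the Nagios Core docs:
      0 (OK)       -> UP
      1 (WARNING)  -> UP, or DOWN when use_aggressive_host_checking is on
      2 (CRITICAL) -> DOWN
      3 (UNKNOWN)  -> DOWN
    Parent/child UNREACHABLE handling is deliberately left out.
    """
    if exit_code == 0:
        return "UP"
    if exit_code == 1:
        return "DOWN" if aggressive_host_checking else "UP"
    # 2, 3 and (in this sketch) any other code are treated as DOWN.
    return "DOWN"
```

For example, `host_state(1)` returns "UP" while `host_state(1, aggressive_host_checking=True)` returns "DOWN".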
jolson

Re: Nagios UI reports server Up - False positive

Post by jolson »

Could you please cat your sudoers file? The nagios user should not need sudo to run check commands.

Let's try running the ping command manually as the nagios user:

su - nagios

/usr/local/nagios/libexec/check_ping -H x.x.x.x -w 3000.0,80% -c 5000.0,100% -p 5
Does the 'nagios' user have any issues running the command above?
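When running a plugin by hand it is worth looking at the exit code as well as the text, since Nagios sets the state from the exit code. A minimal Python 3.7+ sketch; `run_plugin` is a hypothetical helper, and reporting a missing or non-executable binary as UNKNOWN (3) is a choice made for this sketch, not Nagios behavior.

```python
import subprocess
from typing import List, Tuple

# Nagios plugin exit codes and their state names.
STATE_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_plugin(argv: List[str], timeout: float = 30.0) -> Tuple[int, str, str]:
    """Run a check command; return (exit_code, state_name, first_output_line)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except OSError as exc:
        # Binary missing or not executable: report UNKNOWN (a choice of this sketch).
        return 3, "UNKNOWN", str(exc)
    output = (proc.stdout or proc.stderr).strip()
    first_line = output.splitlines()[0] if output else ""
    # Codes outside 0-3 (or negative codes from signals) are labelled UNKNOWN.
    return proc.returncode, STATE_NAMES.get(proc.returncode, "UNKNOWN"), first_line
```

Run as the nagios user, `run_plugin(["/usr/local/nagios/libexec/check_ping", "-H", "x.x.x.x", "-w", "3000.0,80%", "-c", "5000.0,100%", "-p", "5"])` would return the exit code, its state name and the plugin's first output line. The plugin path is the one from this post and may differ on other installs (the later posts use /usr/lib64/nagios/plugins).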
tezarin

Re: Nagios UI reports server Up - False positive

Post by tezarin »

[root@nagios nagios]# su - nagios
could not open session
I installed Nagios via Puppet; maybe that's why. The ping command won't run unless I prefix it with sudo.

I actually got it to work five minutes ago by replacing the ping command with a check_tcp on port 22. That command works and doesn't even need sudo, but the fact that the ping command doesn't work unless I run it as root makes me feel something is wrong here!

define command{
        command_name    check-host-alive
        command_line    /usr/lib64/nagios/plugins/check_tcp -H $HOSTADDRESS$ -p 22
        }
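This fits with check_tcp working without sudo: a TCP connect is an ordinary unprivileged socket operation, whereas sending ICMP echo requests needs raw-socket access, which is usually provided by a setuid-root or capability-enabled ping binary. A minimal Python sketch of a TCP-connect host check that returns the plugin exit codes 0 or 2; `tcp_alive` is a hypothetical helper, not check_tcp itself.

```python
import socket

# Plugin exit codes used by this check.
OK, CRITICAL = 0, 2


def tcp_alive(host: str, port: int = 22, timeout: float = 10.0) -> int:
    """Return 0 if a TCP connection to host:port succeeds, otherwise 2.

    Hypothetical helper: it only opens an ordinary TCP connection, so it
    needs no special privileges, unlike sending ICMP echo requests.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK
    except OSError:
        # Connection refused, no route, DNS failure or timeout
        # (socket.timeout is a subclass of OSError).
        return CRITICAL
```

`tcp_alive("192.125.12.14", 22)` would return 0 if sshd on the host from this thread accepts the connection and 2 otherwise. A port check only proves that one service answers, which is a slightly different question from whether the host responds to ping.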
jolson

Re: Nagios UI reports server Up - False positive

Post by jolson »

Interesting. Let's check /etc/passwd and /etc/group:

cat /etc/passwd ; cat /etc/group
I have a hunch that the nagios user either isn't in the proper groups or doesn't have an appropriate shell. Let me know what you find out!
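To check those two things (login shell and group membership) without reading the whole dump, here is a minimal Python sketch that parses passwd- and group-format text. `user_info` and `AccountInfo` are hypothetical names; the sketch treats /sbin/nologin, /usr/sbin/nologin and /bin/false as non-login shells, and counts a group when it is either the user's primary group or lists the user as a member.

```python
from dataclasses import dataclass
from typing import List

# Shells that refuse interactive logins such as `su - nagios`.
NOLOGIN_SHELLS = {"/sbin/nologin", "/usr/sbin/nologin", "/bin/false"}


@dataclass
class AccountInfo:
    shell: str
    groups: List[str]

    @property
    def can_login(self) -> bool:
        return self.shell not in NOLOGIN_SHELLS


def user_info(username: str, passwd_text: str, group_text: str) -> AccountInfo:
    """Look up a user's login shell and group names in passwd/group-format text."""
    entry = None
    for line in passwd_text.splitlines():
        fields = line.split(":")
        # passwd lines have 7 fields: name:pw:uid:gid:gecos:home:shell
        if len(fields) == 7 and fields[0] == username:
            entry = fields
            break
    if entry is None:
        raise KeyError(f"user {username!r} not found")
    primary_gid, shell = entry[3], entry[6]

    groups: List[str] = []
    for line in group_text.splitlines():
        fields = line.split(":")
        # group lines have 4 fields: name:pw:gid:comma-separated members
        if len(fields) != 4:
            continue
        name, _pw, gid, members = fields
        if gid == primary_gid or username in members.split(","):
            groups.append(name)
    return AccountInfo(shell=shell, groups=groups)
```

On the server it could be called as `user_info("nagios", Path("/etc/passwd").read_text(), Path("/etc/group").read_text())` (with `from pathlib import Path`). It only sees the local files, so accounts served through LDAP (the output includes an nslcd user) would need `getent passwd` and `getent group` output instead.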
tezarin

Re: Nagios UI reports server Up - False positive

Post by tezarin »

jolson wrote: Interesting. Let's check /etc/passwd and /etc/group:

cat /etc/passwd ; cat /etc/group
I have a hunch that the nagios user either isn't in the proper groups or doesn't have an appropriate shell. Let me know what you find out!
You're probably right; here's the output of your command:

[root@nagios ~]# cat /etc/passwd ; cat /etc/group
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
vcsa:x:69:69:Virtual console memory owner:/dev:/sbin/nologin
haldaemon:x:68:68:HAL Daemon:/:/sbin/nologin
ntp:x:38:38::/etc/ntp:/sbin/nologin
saslauth:x:499:76:"Saslauthd user":/var/empty/saslauth:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
puppet:x:52:52:Puppet User:/var/lib/puppet:/sbin/nologin
stunnel:x:600:600::/var/run/stunnel:/sbin/nologin
nscd:x:28:28:Name Service Cache Daemon:/:/sbin/nologin
clam:x:409:409:Clam Anti Virus Checker:/var/lib/clamav:/sbin/nologin
clamav:x:410:410:Clamav database update user:/var/lib/clamav:/sbin/nologin
rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/sbin/nologin
rpc:x:32:32:Portmapper RPC User:/var/cache/rpcbind:/sbin/nologin
nagios:x:408:408::/var/spool/nagios:/sbin/nologin
nrpe:x:407:407:NRPE user for the NRPE service:/var/run/nrpe:/sbin/nologin
nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/sbin/nologin
rpm:x:37:37::/var/lib/rpm:/sbin/nologin
nslcd:x:65:55:LDAP Client User:/:/sbin/nologin
capadmin:x:8001:800::/srv/home/capadmin:/bin/bash
hdfs:x:4005:405::/srv/home/hdfs:/bin/bash
contentzone:x:7201:744::/srv/home/contentzone:/bin/bash
sstore:x:7001:724::/srv/home/sstore:/bin/bash
webadmin:x:7101:734::/srv/home/webadmin:/bin/bash
apache:x:48:48:Apache:/var/www:/sbin/nologin
root:x:0:sstore
bin:x:1:daemon,root
daemon:x:2:bin,root
sys:x:3:bin,adm,root
adm:x:4:daemon,root
tty:x:5:
disk:x:6:root
lp:x:7:daemon
mem:x:8:
kmem:x:9:
wheel:x:10:root
mail:x:12:mail,postfix
man:x:15:
video:x:39:
dip:x:40:
lock:x:54:
audio:x:63:
nobody:x:99:
users:x:100:
dbus:x:81:
utmp:x:22:
utempter:x:35:
floppy:x:19:
vcsa:x:69:
cdrom:x:11:
tape:x:33:
dialout:x:18:
haldaemon:x:68:haldaemon
ntp:x:38:
saslauth:x:76:
postdrop:x:90:
postfix:x:89:
sshd:x:74:
puppet:x:52:
stunnel:x:600:
nscd:x:28:
screen:x:84:
clam:x:409:
clamav:x:410:
rpcuser:x:29:
rpc:x:32:
wbpriv:x:88:
nagios:x:408:apache
nrpe:x:407:
slocate:x:21:
nfsnobody:x:65534:
rpm:x:37:
ldap:x:55:
netdump:x:34:
contentzone:x:744:
capadmin:x:800:
webadmin:x:734:
sitestore:x:724:webadmin
hadoop:x:405:contentzone,sstore,webadmin
apache:x:48:
jolson

Re: Nagios UI reports server Up - False positive

Post by jolson »

nagios:x:408:408::/var/spool/nagios:/sbin/nologin
Would you mind changing the nagios user's shell to /bin/bash?
nagios:x:408:408::/var/spool/nagios:/bin/bash
It's somewhat interesting that you don't have a 'nagcmd' group, but that shouldn't be a problem if Nagios was installed without that setting.
Let's check permissions on a few files:

ls -l /usr/local/nagios/var/rw/
Do any of the files in that directory have 'nagcmd' as their group? If so, we should create that group and add both nagios and apache to it.
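The same group check can be scripted. A minimal sketch with Python's standard `os` and `grp` modules (Unix only); `files_with_group` is a hypothetical helper, and it uses `os.lstat` so the external command FIFO is inspected without being opened.

```python
import grp
import os
from typing import List


def files_with_group(directory: str, group_name: str) -> List[str]:
    """Return names of entries in directory whose owning group is group_name."""
    matches = []
    for entry in sorted(os.listdir(directory)):
        # lstat does not follow symlinks or open the entry (safe for FIFOs).
        gid = os.lstat(os.path.join(directory, entry)).st_gid
        try:
            name = grp.getgrgid(gid).gr_name
        except KeyError:
            # No group entry for this gid: fall back to the number, as ls -l does.
            name = str(gid)
        if name == group_name:
            matches.append(entry)
    return matches
```

`files_with_group("/usr/local/nagios/var/rw", "nagcmd")` would list the matching entries; that directory is the one from this post and may be located elsewhere on a package-based install.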

After these changes, can you switch to the nagios user and run the ping command as described above?

Best,


Jesse