My host statuses are DOWN but check_ping shows they're up

jbruyet
Posts: 235
Joined: Wed Dec 28, 2011 12:14 pm

My host statuses are DOWN but check_ping shows they're up

Post by jbruyet »

Hey all, my Nagios server was up and running and working great until today. We're having some problems with our VLAN, so I added a check_ping command to my Nagios server to monitor it (by monitoring a server at the far end of the VLAN). Now all of my hosts have a status of DOWN even though the check_ping command shows them all up. Here's my check_ping command definition from commands.cfg:

command_line $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5

And here's the service definition from my linux.cfg file:

check_command check_ping!200.0,70%!400.0,90%

And here's the check_ping stat for one of my Linux servers:

Service: Ping stats | Status: OK | Last Check: 06-05-2012 16:23:45 | Duration: 0d 0h 58m 10s | Attempt: 1/3 | Status Information: PING OK - Packet loss = 0%, RTA = 0.46 ms

SO, why does every device on my network now have a status of DOWN when check_ping shows them ALL as being UP?

Thanks,

Joe B
jbruyet
Posts: 235
Joined: Wed Dec 28, 2011 12:14 pm

Re: My host statuses are DOWN but check_ping shows they're u

Post by jbruyet »

Hmmm... This is in my Host State Information:

Status Information: check_ping: %s: Warning threshold must be integer or percentage!
- -c
Usage: check_ping -H <host_address> -w <wrta>,<wpl>% -c <crta>,<cpl>% [-p packets] [-t timeout] [-4 -6]

Is the error saying it has to be one or the other? Doesn't my syntax follow what I see everywhere else?

check_command check_ping!200.0,70%!400.0,90%

Thanks,

Joe B
jsmurphy
Posts: 989
Joined: Wed Aug 18, 2010 9:46 pm

Re: My host statuses are DOWN but check_ping shows they're u

Post by jsmurphy »

You've confused my poor pre-coffee brain. So the actual host checks are saying all servers are down? But you also have a ping service for every host that says they are up? Have a look at the host check_command (usually in the host template) and make sure that is formatted correctly. Is the check_command you have posted for the service check or for the host check?

Maybe some more complete config files if I've misunderstood the problem.
jbruyet
Posts: 235
Joined: Wed Dec 28, 2011 12:14 pm

Re: My host statuses are DOWN but check_ping shows they're u

Post by jbruyet »

jsmurphy wrote: You've confused my poor pre-coffee brain.
Step one accomplished. :lol:

jsmurphy wrote: So the actual host checks are saying all servers are down? But you also have a ping service for every host that says they are up?
That is correct.

jsmurphy wrote: Have a look at the host check_command (usually in the host template) and make sure that is formatted correctly.
Ok, here's where things get a little interesting. In my host template the check_command is check_host_alive. I'm surprised to find out that inserting a check_ping command elsewhere would "nullify" this command.

jsmurphy wrote: Is the check_command you have posted for the service check or for the host check?
Hmmm... I'm still a bit of a Nagios Neophyte so I'm not sure I know the difference on this one. Oh, maybe I do know this one. The check_command I posted is in my linux.cfg file under "define service," so it's a service check. Here's that service definition:

define service {
    use                     generic-service
    hostgroup               LinuxHostGroup
    service_description     Ping stats
    normal_check_interval   5
    check_command           check_ping!200.0,70%!400.0,90%
}

jsmurphy wrote: Maybe some more complete config files if I've misunderstood the problem.
I believe you have things down. If you would like more files just let me know and I'll get them up here for you.

Thanks very much for your help,

Joe B
agriffin
Posts: 876
Joined: Mon May 09, 2011 9:36 am

Re: My host statuses are DOWN but check_ping shows they're u

Post by agriffin »

Today is World IPv6 Launch Day and that may have affected your ping checks. Try adding '-4' to your command definitions to force IPv4 and let us know if that helps.
jbruyet
Posts: 235
Joined: Wed Dec 28, 2011 12:14 pm

Re: My host statuses are DOWN but check_ping shows they're u

Post by jbruyet »

That would have been a nice, easy fix (those are the ones I prefer) but after one hour I'm still showing everything as being down. Windows servers, Linux servers, workstations, switches, everything.

Thanks,

Joe B
jsmurphy
Posts: 989
Joined: Wed Aug 18, 2010 9:46 pm

Re: My host statuses are DOWN but check_ping shows they're u

Post by jsmurphy »

agriffin wrote: Today is World IPv6 Launch Day and that may have affected your ping checks. Try adding '-4' to your command definitions to force IPv4 and let us know if that helps.
There I was wearing my metaphorical IPv6 party hat yesterday and this never even crossed my mind.

I'm struggling to think of another explanation as to why a change to a service check would affect your host check, especially when they are running two completely separate commands. Do you have multiple NICs on your server? Are they on different networks or VLANs? How does pinging from the server command line go? I'm not sure where I'm going with this to be honest... maybe some kind of networking issue on the server.
jbruyet
Posts: 235
Joined: Wed Dec 28, 2011 12:14 pm

Re: My host statuses are DOWN but check_ping shows they're u

Post by jbruyet »

Ok, I was just wandering around in files and I discovered that I have two check_ping definitions in my commands.cfg file. Here are the two command_lines:

command_line $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5

command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 10.0,80% -c 15.0,100% -p 5

What really bothers me about this is that I'm the only one who could have done this. Unless Nagios comes stock with the two different check_ping definitions.

Thanks,

Joe B
jbruyet
Posts: 235
Joined: Wed Dec 28, 2011 12:14 pm

Re: My host statuses are DOWN but check_ping shows they're u

Post by jbruyet »

Ok, more clarification. The following line is under the command_name of check_host_alive:

command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 10.0,80% -c 15.0,100% -p 5

And this line is under the command_name of check_ping:

command_line $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5

If I change the first line to match the second line, everything starts showing up as DOWN. If I leave the two lines configured this way, things are good. I just added check_ping back into my linux.cfg file and NOW the servers are showing up as UP and the ping stats are good too. I don't understand why the check_host_alive command has to have the argument variables but then again I've only scratched the surface of what my Nagios can do. I'm sure I'll find other "interesting issues" with it.

Thanks for the help,

Joe B
jsmurphy
Posts: 989
Joined: Wed Aug 18, 2010 9:46 pm

Re: My host statuses are DOWN but check_ping shows they're u

Post by jsmurphy »

jbruyet wrote: I don't understand why the check_host_alive command has to have the argument variables but then again
But... but... your check_ping definition has arguments, your check_host_alive doesn't? Either way I'm glad you solved your issue!
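
For anyone finding this later, here is one likely reading of what happened (an interpretation of the thread, not something confirmed in it). The host template calls check_host_alive with no "!" arguments, so if its command_line uses $ARG1$ and $ARG2$ they expand to empty strings, and check_ping ends up treating "-c" as the warning threshold. That would match the "Warning threshold must be integer or percentage! - -c" message. Below is a rough Python sketch of that expansion. It is a simplified model, not Nagios's real macro code: the expand() and option_value() helpers, the plugin path, and the 192.0.2.10 address are all made up for illustration, and shlex stands in for the shell's word splitting.

```python
import shlex

# Both commands use the argument macros here, i.e. the broken state
# where check_host_alive was changed to match check_ping.
COMMANDS = {
    "check_ping": "$USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5",
    "check_host_alive": "$USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5",
}


def expand(check_command, host_address, user1="/usr/local/nagios/libexec"):
    """Split check_command on '!' and substitute macros (simplified model)."""
    name, *args = check_command.split("!")
    line = COMMANDS[name]
    line = line.replace("$USER1$", user1).replace("$HOSTADDRESS$", host_address)
    # Only $ARG1$..$ARG4$ are modelled; a macro with no value becomes "".
    for n in range(1, 5):
        value = args[n - 1] if n <= len(args) else ""
        line = line.replace(f"$ARG{n}$", value)
    # Word-split roughly the way a shell would before the plugin sees it.
    return shlex.split(line)


def option_value(argv, flag):
    """Return the word that follows `flag` on the command line."""
    return argv[argv.index(flag) + 1]


# Service check: arguments are supplied after the '!' separators.
service_argv = expand("check_ping!200.0,70%!400.0,90%", "192.0.2.10")
# Host check: the host template names the command with no arguments.
host_argv = expand("check_host_alive", "192.0.2.10")

print(option_value(service_argv, "-w"))  # 200.0,70%
print(option_value(host_argv, "-w"))     # -c
```

With the thresholds hardcoded in check_host_alive (the 10.0,80% and 15.0,100% line), no argument macros are left to expand, which would explain why that version works when the host definition passes nothing.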