
A server goes down

Posted: Fri Dec 20, 2013 8:18 am
by ancovington
One of my VM hosts went down!
Is there a page in Nagios that will tell me why it went down?

Re: A server goes down

Posted: Fri Dec 20, 2013 9:49 am
by tmcdonald
It can definitely tell you *that* it went down, but the *why* is a lot harder. A host that is down can't send any data to Nagios, and aside from state and perfdata, Nagios doesn't have much to go by. Try looking at the data from just before it went down to see whether there is any indication of a problem (high CPU, slow disk, etc.).
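As a rough illustration of "looking at the data just before it went down": Nagios plugins append performance data in the form 'label'=value[UOM];[warn];[crit];[min];[max], so the last few perfdata strings recorded before an outage can be scanned for metrics creeping toward their warning thresholds. The sketch below is a minimal Python parser under simplifying assumptions: warn/crit are treated as plain numbers (range syntax such as 10:20 or @10:20 is ignored), higher values are assumed to be worse, and the hypothetical near_threshold helper with its 90% cutoff is an arbitrary heuristic, not a Nagios feature.

```python
import re
from typing import List, NamedTuple, Optional

# Simplified pattern for one perfdata item: 'label'=value[UOM];[warn];[crit];[min];[max]
PERF_RE = re.compile(
    r"""
    ('(?:[^']|'')+'|[^\s=']+)   # label, optionally single-quoted
    =
    ([-+]?\d+(?:\.\d+)?)        # numeric value
    ([a-zA-Z%]*)                # optional unit of measure (MB, %, s, ...)
    ((?:;[^;\s]*){0,4})         # up to four ;-separated warn/crit/min/max fields
    """,
    re.VERBOSE,
)


class Metric(NamedTuple):
    label: str
    value: float
    uom: str
    warn: Optional[float]
    crit: Optional[float]


def _num(field: str) -> Optional[float]:
    """Return a float for a plain numeric threshold, else None (ranges are not handled)."""
    try:
        return float(field)
    except ValueError:
        return None


def parse_perfdata(perfdata: str) -> List[Metric]:
    """Split a perfdata string into Metric tuples."""
    metrics = []
    for match in PERF_RE.finditer(perfdata):
        label, value, uom, rest = match.groups()
        # rest looks like ";80;90;0;100"; pad so warn and crit always exist.
        fields = rest.split(";")[1:] + ["", ""]
        metrics.append(
            Metric(label.strip("'"), float(value), uom, _num(fields[0]), _num(fields[1]))
        )
    return metrics


def near_threshold(metrics: List[Metric], fraction: float = 0.9) -> List[Metric]:
    """Hypothetical heuristic: flag metrics at or above `fraction` of their warning threshold."""
    return [m for m in metrics if m.warn is not None and m.value >= fraction * m.warn]
```

For a hypothetical sample such as "'/'=9200MB;9000;9500;0;10000 load1=0.8;4;6;0", only the "/" metric is flagged, because 9200 is above 90% of its 9000 MB warning threshold while the load is far below its own.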

Re: A server goes down

Posted: Fri Dec 20, 2013 10:00 am
by ancovington
Thank you. I don't think management will be too happy knowing that we bought a tool that really doesn't tell us what's going on, but we will try to make do. Thank you again.

Re: A server goes down

Posted: Fri Dec 20, 2013 10:09 am
by slansing
It would be hard for a host to tell you what went wrong while it is offline. Perhaps you should set up checks that are designed to warn you before an issue occurs. For instance, you might want to lower the warning thresholds on certain checks. Depending on why the VM went down, I would focus on a plugin or check that can monitor that particular cause. Are you a sysadmin, or do you have any in-house sysadmins who can look through the syslogs or kernel messages to tell why it went down? Did someone simply shut the VM down?

If you are only monitoring memory, partition space, CPU, etc., and you were not warned by those checks, either something happened IMMEDIATELY or your thresholds are far off from where they should be. Another possibility, as I noted above, is that you are not running any checks relevant to whatever caused that VM to crash; maybe it had nothing to do with CPU, memory, or share space.
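To make the threshold advice concrete: Nagios plugins report their result through exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The minimal Python sketch below maps a "higher is worse" reading, such as memory used in percent, onto those states, alerting when the value exceeds a threshold, and replays a series of readings to show how a lower warning threshold fires earlier. The function names, readings, and threshold values are hypothetical, chosen only for illustration.

```python
from enum import IntEnum
from typing import Optional, Sequence


class State(IntEnum):
    """Standard Nagios plugin return codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def evaluate(value: float, warn: float, crit: float) -> State:
    """Alert when a 'higher is worse' reading exceeds the warning or critical threshold."""
    if value > crit:
        return State.CRITICAL
    if value > warn:
        return State.WARNING
    return State.OK


def first_alert(readings: Sequence[float], warn: float, crit: float) -> Optional[int]:
    """Return the index of the first non-OK reading, or None if every reading is OK."""
    for i, value in enumerate(readings):
        if evaluate(value, warn, crit) != State.OK:
            return i
    return None
```

With hypothetical readings [62, 70, 78, 85, 91] trending upward before a crash, a warning threshold of 90 alerts only on the last sample (index 4), while a warning threshold of 75 alerts two samples earlier (index 2), leaving more time to react.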

Re: A server goes down

Posted: Fri Dec 20, 2013 10:16 am
by tmcdonald
ancovington wrote: "Thank you. I don't think management will be too happy knowing that we bought a tool that really doesn't tell us what's going on, but we will try to make do. Thank you again."
There isn't a tool in the world that can tell you why everything in your network happens. If there were, 70% of tech support would be out of business. It's not a software shortcoming; it's a technological limitation. Sorry if this is a bit morbid, but you can compare the situation to an autopsy. To determine the cause of death, you can't rely on the person telling you why they died; you need a professional diagnosis. To keep the analogy going, if the person calls you every day and says "I'm not feeling well", you can start to get an idea of what might be wrong, but you can't know for sure until they see a doctor.

Some things are obvious, like "Disk Usage 95%" or "CPU Load 5, 6, 6", but if a server just stops unexpectedly, you have to look at the logs.
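Whether a load like "5, 6, 6" (the 1-, 5- and 15-minute load averages) is alarming depends on how many CPUs the host has. A minimal Python sketch, assuming a Unix-like host, that divides the load averages by the logical CPU count; reading the per-CPU figure (roughly, above 1.0 means more runnable work than CPUs) is a common rule of thumb, not a Nagios setting, and load_per_cpu is a hypothetical helper name.

```python
import os
from typing import Sequence, Tuple


def load_per_cpu(loads: Sequence[float], cpus: int) -> Tuple[float, ...]:
    """Divide 1/5/15-minute load averages by the number of logical CPUs."""
    if cpus < 1:
        raise ValueError("cpus must be >= 1")
    return tuple(round(load / cpus, 2) for load in loads)


if __name__ == "__main__" and hasattr(os, "getloadavg"):
    # os.getloadavg() exists only on Unix-like systems; os.cpu_count() may return None.
    print(load_per_cpu(os.getloadavg(), os.cpu_count() or 1))
```

For the load quoted above, load_per_cpu((5, 6, 6), 4) gives (1.25, 1.5, 1.5), a moderately overloaded 4-CPU host, while on a single CPU the same load gives (5.0, 6.0, 6.0).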

Re: A server goes down

Posted: Mon Dec 23, 2013 8:22 am
by ancovington
Thank you. After working with support on Friday, we found that the Nagios server is only seeing one CPU, even though 4 were added to the OS at the start of the build. Could you please explain why this is?

Re: A server goes down

Posted: Mon Dec 23, 2013 10:44 am
by slansing
What do you mean, 4 were added to your OS? Are you running a VM? Did you make sure to add the new hardware allocations properly, following the instructions VMware provides?

Re: A server goes down

Posted: Mon Dec 23, 2013 10:45 am
by abrist
What does ESX report? Is the VM still provisioned with 4 cores?

Re: A server goes down

Posted: Mon Dec 23, 2013 11:42 am
by ancovington
Yes, the VM is still provisioned with 4 cores.

Re: A server goes down

Posted: Mon Dec 23, 2013 12:34 pm
by slansing
Are you sure that 4 cores, and not 4 CPUs, were provisioned when the VM was created? It sounds like you provisioned 1 CPU with 4 cores.
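One way to see the socket-versus-core distinction from inside a Linux guest is to compare sockets ("physical id"), physical cores ("core id" within a socket) and logical processors ("processor" entries) in /proc/cpuinfo. A minimal Python sketch, assuming an x86 Linux guest; field names differ on other architectures, some hypervisors omit "physical id" (the socket count then comes out as 0), and which of these counts a particular Nagios check reports is not established in this thread. cpu_topology is a hypothetical helper name.

```python
import os
from typing import Dict


def cpu_topology(cpuinfo: str) -> Dict[str, int]:
    """Count sockets, physical cores and logical processors in x86 /proc/cpuinfo text."""
    logical = 0
    sockets = set()
    cores = set()
    physical_id = core_id = None
    for line in cpuinfo.splitlines() + [""]:  # trailing "" flushes the last block
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            physical_id = value
        elif key == "core id":
            core_id = value
        elif not key:  # a blank line ends one processor's block
            if physical_id is not None:
                sockets.add(physical_id)
                cores.add((physical_id, core_id))
            physical_id = core_id = None
    return {"sockets": len(sockets), "cores": len(cores), "logical": logical}


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        print(cpu_topology(f.read()))
```

A VM provisioned as 1 CPU with 4 cores would report one socket with four cores, whereas 4 single-core CPUs would report four sockets; both show four logical processors, so a tool that counts sockets would see "one CPU" in the first case.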