VMware VMs stuck at shutting down
Posted: Thu Apr 27, 2017 7:14 pm
by jkinning
Most of our servers run on VMware. On the Monday after our maintenance window we get calls from users who are unable to connect, and when we look into it the VMs are stuck at shutting down. Does anyone have Nagios monitoring tips on how I can set up a check so that, if a VM guest is stuck at shutting down, Nagios sends out a notification and we can correct it before the Business Unit is impacted?
I am checking CPU, RAM, disk, NSClient++, and RDP, and they all show good, so I am looking for some other check that would show them stuck at shutting down.
Re: VMware VMs stuck at shutting down
Posted: Fri Apr 28, 2017 10:39 am
by cdienger
How are they trying to connect exactly? You could script something to try and connect every so often and trigger alerts if the connection fails.
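As a rough illustration of the "script something to try and connect" idea, here is a minimal Python sketch of a Nagios-style plugin. The function name `check_tcp`, the default port 3389 (standard RDP), and the timeout are assumptions for illustration, not part of any existing plugin. Note that a bare TCP connect only proves the port is listening, so a guest hung in shutdown may still pass; a deeper check (for example a full logon attempt) might be needed.

```python
import socket
import sys

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def check_tcp(host, port=3389, timeout=5.0):
    """Try one TCP connection and return (exit_code, message)."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"OK - {host}:{port} accepted a TCP connection"
    except OSError as exc:
        return CRITICAL, f"CRITICAL - {host}:{port} connect failed: {exc}"


# Act as a plugin only when exactly one host argument is supplied.
if __name__ == "__main__" and len(sys.argv) == 2:
    exit_code, text = check_tcp(sys.argv[1])
    print(text)
    sys.exit(exit_code)
```

Nagios would run this on its normal check interval and alert on the CRITICAL exit code.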
Re: VMware VMs stuck at shutting down
Posted: Fri Apr 28, 2017 10:49 am
by tmcdonald
Ouch, this is a tricky one. Technically speaking, the server (which I am assuming is Windows) is operating just fine according to the checks you have set up. One way you might be able to detect this is to have a trivial program running (something that just writes to a log every 24 hours) which would be terminated as part of the shutdown process, but before whatever process is causing the hanging. It would act like a canary, so you could check if this process is running, and if not you know the server is likely in a stuck state. Not pretty, but it might work.
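A minimal Python sketch of the canary idea follows. It swaps the "is the process running" test for a heartbeat-file freshness test, which is an assumption made here to keep the example free of third-party modules. It also assumes the canary writes far more often than every 24 hours. The names `write_heartbeat` and `heartbeat_status` and the file path are hypothetical.

```python
import os
import time

# Standard Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def write_heartbeat(path):
    """Canary side: record the current time; call this in a loop or scheduled task."""
    with open(path, "w") as handle:
        handle.write(str(time.time()))


def heartbeat_status(path, max_age_seconds, now=None):
    """Check side: return (exit_code, message) based on the file's modification time."""
    now = time.time() if now is None else now
    try:
        age = max(0.0, now - os.path.getmtime(path))
    except OSError as exc:
        return UNKNOWN, f"UNKNOWN - cannot read {path}: {exc}"
    if age > max_age_seconds:
        # The canary stopped writing, so it was probably killed during shutdown.
        return CRITICAL, f"CRITICAL - canary heartbeat is {age:.0f}s old"
    return OK, f"OK - canary heartbeat is {age:.0f}s old"
```

The check side would have to run on the guest itself, for instance through whatever agent is already answering the CPU and disk checks, since the file lives on the guest.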
Re: VMware VMs stuck at shutting down
Posted: Fri Apr 28, 2017 12:05 pm
by jkinning
cdienger wrote:How are they trying to connect exactly? You could script something to try and connect every so often and trigger alerts if the connection fails.
Primarily users connect with RDP. I have a check in place to monitor this, which uses check_x224 -H $HOSTADDRESS$. I found it on the Nagios Exchange:
https://exchange.nagios.org/directory/Plugins/Remote-Access/check_x224/details
It didn't detect the stuck VMs.
Re: VMware VMs stuck at shutting down
Posted: Fri Apr 28, 2017 12:08 pm
by jkinning
tmcdonald wrote:Ouch, this is a tricky one. Technically speaking, the server (which I am assuming is Windows) is operating just fine according to the checks you have set up. One way you might be able to detect this is to have a trivial program running (something that just writes to a log every 24 hours) which would be terminated as part of the shutdown process, but before whatever process is causing the hanging. It would act like a canary, so you could check if this process is running, and if not you know the server is likely in a stuck state. Not pretty, but it might work.
That is an idea, but I'm not sure how feasible it would be for over 1500 servers to add this canary. I'll keep looking at what other options I have. Maybe we are the only ones experiencing this issue with VMs getting stuck in the "shutting down" state.
Re: VMware VMs stuck at shutting down
Posted: Fri Apr 28, 2017 12:40 pm
by dwhitfield
jkinning wrote:not sure how feasible it would be for over 1500 servers to add this canary.
What aspect of the feasibility is the concern? You could push it out with group policy (assuming they are all Windows) and add the new services in XI via the API. Maybe I'm missing something.
Also, you might try the VMware forums. I don't mean to push you away, but if you're looking for examples of what people have actually used to solve this issue, I suspect you'll have better luck there.
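To give a sense of what "add the new services in XI via the API" could look like at this scale, here is a hedged Python sketch that only builds the HTTP request for one host. The endpoint path `api/v1/config/service`, the form field names, and the `applyconfig` flag are assumptions based on the Nagios XI config API and should be verified against the API documentation in your own XI instance. The helper name, the example URL, and the check command are hypothetical.

```python
import urllib.parse
import urllib.request


def build_service_request(base_url, api_key, host_name, description,
                          check_command, apply_config=False):
    """Build (but do not send) a POST request that would define one service."""
    query = urllib.parse.urlencode({"apikey": api_key})
    url = f"{base_url.rstrip('/')}/api/v1/config/service?{query}"
    fields = {
        "host_name": host_name,
        "service_description": description,
        "check_command": check_command,
        # Placeholder scheduling values; tune these for your environment.
        "check_interval": "5",
        "retry_interval": "1",
        "max_check_attempts": "3",
        "check_period": "24x7",
        "notification_interval": "60",
        "notification_period": "24x7",
        "contacts": "nagiosadmin",
    }
    if apply_config:
        # Assumed flag: apply the configuration once, on the last host only.
        fields["applyconfig"] = "1"
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(url, data=data, method="POST")
```

Looping over the host list and passing each request to `urllib.request.urlopen` would submit them; setting `apply_config=True` only on the final request avoids restarting the configuration 1500 times.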