Most of our servers run on VMware. After our maintenance window, on the following Monday, we get calls from users who are unable to connect. When we look into it, the VMs are stuck at "shutting down". Does anyone have Nagios monitoring tips on how I can set up a check so that, if a VM guest is stuck at shutting down, Nagios sends out a notification and we can correct it before the business unit is impacted?
I am checking CPU, RAM, disk, NSClient++, and RDP, and they all show good, so I am looking for some other check that would catch a guest stuck at shutting down.
VMware VMs stuck at shutting down
Re: VMware VMs stuck at shutting down
How are they trying to connect exactly? You could script something to try and connect every so often and trigger alerts if the connection fails.
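A rough sketch of what such a scripted connection check could look like as a Nagios-style plugin. The script name (check_connect.py) and the function names are made up for illustration; 3389 is the default RDP port. Keep in mind that a plain TCP handshake may still succeed on a guest that is hung in shutdown, so treat this as a starting point, not a confirmed fix.

```python
"""Sketch of a Nagios-style plugin: try one TCP connection and report the result."""
import socket

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp(host, port, timeout=5.0):
    """Return (exit_code, message) for a single connection attempt."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"OK - connected to {host}:{port}"
    except OSError as exc:  # covers refused connections and timeouts
        return CRITICAL, f"CRITICAL - cannot connect to {host}:{port} ({exc})"


def main(argv):
    """argv is [HOST] or [HOST, PORT]; returns the plugin exit code."""
    if not argv:
        print("UNKNOWN - usage: check_connect.py HOST [PORT]")
        return UNKNOWN
    host = argv[0]
    port = int(argv[1]) if len(argv) > 1 else 3389  # default RDP port
    code, message = check_tcp(host, port)
    print(message)
    return code
```

To run it as a plugin, end the file with `import sys` and `sys.exit(main(sys.argv[1:]))` so Nagios sees the exit code.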
Re: VMware VMs stuck at shutting down
Ouch, this is a tricky one. Technically speaking, the server (which I am assuming is Windows) is operating just fine according to the checks you have set up. One way you might be able to detect this is to have a trivial program running (something that just writes to a log every 24 hours) which would be terminated as part of the shutdown process, but before whatever process is causing the hang. It would act like a canary: you check whether this process is running, and if it is not, you know the server is likely in a stuck state. Not pretty, but it might work.
Former Nagios employee
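One possible shape for the canary idea, sketched in Python. This is a variant of the suggestion above: rather than checking whether the canary process is running, it checks how recently the canary last wrote its heartbeat file. The function names, the file path, and the thresholds are all assumptions, and a much shorter write interval than 24 hours would be needed to catch a hang quickly. The check has to run on the guest itself, for example as an external script behind the agent you already have.

```python
import os
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def write_heartbeat(path):
    """Canary side: record the current time. Run this in a loop or on a schedule."""
    with open(path, "w") as fh:
        fh.write(f"{time.time():.0f}\n")


def check_heartbeat(path, warn_seconds, crit_seconds, now=None):
    """Monitoring side: map the age of the heartbeat file to a Nagios status."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError:
        return UNKNOWN, f"UNKNOWN - heartbeat file {path} not found"
    if age >= crit_seconds:
        return CRITICAL, f"CRITICAL - canary silent for {age:.0f}s"
    if age >= warn_seconds:
        return WARNING, f"WARNING - canary silent for {age:.0f}s"
    return OK, f"OK - canary wrote {age:.0f}s ago"
```

If the canary is killed early in the shutdown sequence, the file stops updating and the check goes WARNING and then CRITICAL while the other checks still look healthy.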
Re: VMware VMs stuck at shutting down
cdienger wrote: How are they trying to connect exactly? You could script something to try and connect every so often and trigger alerts if the connection fails.
Primarily users connect with RDP. I have a check in place that uses check_x224 -H $HOSTADDRESS$. I found it on the Nagios Exchange:
https://exchange.nagios.org/directory/Plugins/Remote-Access/check_x224/details
Re: VMware VMs stuck at shutting down
tmcdonald wrote: Ouch, this is a tricky one. Technically speaking, the server (which I am assuming is Windows) is operating just fine according to the checks you have set up. One way you might be able to detect this is to have a trivial program running (something that just writes to a log every 24 hours) which would be terminated as part of the shutdown process, but before whatever process is causing the hanging. It would act like a canary, so you could check if this process is running, and if not you know the server is likely in a stuck state. Not pretty, but it might work.
That is an idea, but I'm not sure how feasible it would be to add this canary to over 1500 servers. I'll continue to look at what other options I have. Maybe we are the only ones experiencing this issue with VMs getting stuck in the "shutting down" state.
dwhitfield, Former Nagios Staff
Re: VMware VMs stuck at shutting down
jkinning wrote: not sure how feasible it would be for over 1500 servers to add this canary.
What aspect of the feasibility is the concern? You could push it out with Group Policy (assuming they are all Windows) and add the new services in XI via the API. Maybe I'm missing something.
Also, you might try the VMware forums. I don't mean to push you away, but if you're looking for examples of what people have actually used to solve this issue, I suspect you'll have better luck there.
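For the "add the new services in XI via the API" part, a hedged sketch of building the requests in bulk. The endpoint path and field names here follow the XI config API as I understand it, so verify them against the API documentation in your own XI instance before relying on this. The service description, the check_canary command name, the interval values, and the contact are placeholders. The code only builds the requests; it does not send anything.

```python
import urllib.parse
import urllib.request


def build_service_request(xi_base_url, api_key, host_name):
    """Build (without sending) a POST that would define one canary service."""
    query = urllib.parse.urlencode({"apikey": api_key, "pretty": 1})
    url = f"{xi_base_url.rstrip('/')}/api/v1/config/service?{query}"
    fields = {
        "host_name": host_name,
        "service_description": "Shutdown canary",  # placeholder name
        "check_command": "check_canary",           # hypothetical command
        "check_interval": 5,
        "retry_interval": 1,
        "max_check_attempts": 3,
        "check_period": "24x7",
        "notification_interval": 60,
        "notification_period": "24x7",
        "contacts": "nagiosadmin",                 # placeholder contact
    }
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(url, data=body, method="POST")


def build_all(xi_base_url, api_key, host_names):
    """One request per host, e.g. from an exported list of the 1500 servers."""
    return [build_service_request(xi_base_url, api_key, h) for h in host_names]
```

Each request could then be sent with `urllib.request.urlopen(request)`. The new configuration still has to be applied in XI afterwards, so try it against a couple of test hosts first.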