
box293 Which Guest VMs were on the host, when host failed?

Posted: Tue Jul 28, 2015 6:21 am
by srcadmin
Hi there,

I'm looking for a best-practice solution for the following scenario: if an ESX host fails and its guest VMs are moved to other hosts in the cluster, I would like to know which VMs were moved. Then I can check in vCenter how long those VMs were offline, etc. The key question here is: which VMs were on which host at a specific time?

A few possible approaches:
- There is a Cluster_vMotion_Info check in the box293 documentation which returns vMotion events, but is it possible to monitor which VMs were moved (and, if possible, between which hosts)?

- I could monitor a host (box293 method via vMA), somehow get a list of the guests on that host, and store that info in Nagios. In that case the "performance data" to be stored is not numeric but a text string. Some kind of "history of statuses": is that possible in Nagios?

- I could monitor a guest VM (box293 method via vMA), but then I need to know which host the VM is running on.
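For the second approach, a text "history of statuses" could be kept outside Nagios as timestamped snapshots per host, storing a new snapshot only when the guest list changes. The following is a minimal sketch, assuming some collector (not shown) hands it the guest list of each ESX(i) host on every poll; the `HostHistory` class and its method names are hypothetical.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class HostHistory:
    """Hypothetical text 'history of statuses': per-host snapshots of guest names.

    Assumes record() is called with non-decreasing timestamps for each host.
    """
    snapshots: dict = field(default_factory=dict)  # host -> [(timestamp, frozenset)]

    def record(self, host: str, timestamp: float, guests) -> bool:
        """Store a snapshot only if the guest set changed; return True if stored."""
        entries = self.snapshots.setdefault(host, [])
        guest_set = frozenset(guests)
        if entries and entries[-1][1] == guest_set:
            return False  # unchanged since the last poll, nothing new to store
        entries.append((timestamp, guest_set))
        return True

    def guests_at(self, host: str, timestamp: float) -> frozenset:
        """Return the guests last recorded on `host` at or before `timestamp`."""
        entries = self.snapshots.get(host, [])
        times = [t for t, _ in entries]
        idx = bisect.bisect_right(times, timestamp) - 1
        return entries[idx][1] if idx >= 0 else frozenset()
```

For example, if esxi001 was recorded with vm-a and vm-b at time 100 and with no guests at time 300, `guests_at("esxi001", 250)` returns the set {vm-a, vm-b}: the guests that were on the host just before it went away.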

The base challenge is: Which Guest VMs were on the host, when host failed?

What do you guys think?

Re: box293 Which Guest VMs were on the host, when host failed?

Posted: Tue Jul 28, 2015 4:30 pm
by jdalrymple
I'm not going to pretend to be any sort of expert regarding Box293's plugin, but I will offer up how I would do it if I were seeking the solution...

Instead of worrying about detecting the host failure itself, just have a service check for each VM, running at all times, whose $SERVICEOUTPUT$ is the host the VM is on. It will have to be stateful, so that it throws a warning or something similar when that host changes, but ultimately the idea is just to have a "journal" of sorts to look back upon.
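A minimal sketch of the stateful part of this idea, assuming the check is handed its own previous output through $SERVICEOUTPUT$ and is told (by some VMware query not shown here) which host the VM runs on now. The `Host=` output format and the function names are hypothetical; only the exit codes 0 (OK) and 1 (WARNING) are the standard Nagios plugin convention.

```python
# Standard Nagios plugin exit codes.
OK, WARNING = 0, 1
PREFIX = "Host="  # hypothetical output convention for this sketch


def parse_host(previous_output: str) -> str:
    """Extract the host name from the previous output, e.g. 'Host=esxi001 ...'."""
    if not previous_output.startswith(PREFIX):
        return ""
    words = previous_output[len(PREFIX):].split()
    return words[0] if words else ""


def host_check(previous_output: str, current_host: str):
    """Return (exit_code, output) for one run of a per-VM 'which host' check."""
    previous_host = parse_host(previous_output)
    output = f"{PREFIX}{current_host}"
    if previous_host and previous_host != current_host:
        # The host changed since the last check: raise a WARNING once so the
        # move is written to the Nagios state history (the "journal").
        return WARNING, f"{output} (moved from {previous_host})"
    return OK, output
```

Only the first word after `Host=` is read back, so the extra "(moved from ...)" text does not make the warning repeat: the next run on the same host returns OK again.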

The legwork of looking up all the guests that changed hosts at that time should be trivial.
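Once such a journal exists, the lookup could be as simple as the following sketch. It assumes the host changes have been exported (from the Nagios state history or elsewhere) into simple records; the `HostChange` record, the function name and the 15-minute default window are illustrative assumptions.

```python
from typing import Iterable, NamedTuple


class HostChange(NamedTuple):
    timestamp: float  # when the check noticed the change (epoch seconds)
    vm: str
    old_host: str
    new_host: str


def guests_moved_off(journal: Iterable[HostChange], failed_host: str,
                     failure_time: float, window: float = 900.0) -> dict:
    """Return {vm: new_host} for VMs that left `failed_host` within `window`
    seconds after `failure_time`.

    The window absorbs check-interval lag: a VM is only noticed on its new
    host at its next scheduled check.
    """
    moved = {}
    for change in sorted(journal):  # NamedTuples sort by timestamp first
        if (change.old_host == failed_host
                and failure_time <= change.timestamp <= failure_time + window):
            moved[change.vm] = change.new_host  # a later move overwrites an earlier one
    return moved
```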

I'm sure Box293 will have additional insight for you though.

Re: box293 Which Guest VMs were on the host, when host failed?

Posted: Tue Jul 28, 2015 8:48 pm
by Box293
I've put a lot of thought into this in the past, as I have wondered the same thing. Basically, what we are after is performance data for strings instead of numbers, which doesn't really exist right now.

At last year's conference I spoke to a handful of people who wanted Nagios to check a VM and see which ESX(i) host it is running on, then check the Nagios object for that VM to see whether it has that ESX(i) host as its parent and, if not, complain.

So I created a check called Guest_Host. It does work, but the current limitation is that Nagios cannot update a host's parent in the running configuration (or in the host definitions) by itself. I do plan on creating an event handler to solve these limitations; however, that is currently a work in progress.

Specifically, the Guest_Host check will tell you when a VM is moved to a host that is not its parent; however, once it has moved (intentionally or not), you'll need to update the Nagios object definitions to reflect this so that the WARNING state clears.
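The update step described above boils down to rewriting one directive in the guest's host object definition. The following is a hypothetical sketch of that text edit only, not the box293_event_handler itself; the function name is made up. Note that in Nagios object configuration files the host directive is spelled `parents`.

```python
import re

# In Nagios object configuration files the host directive is "parents".
# [ \t] (not \s) keeps the match on a single line.
_PARENTS_RE = re.compile(r"^([ \t]*parents[ \t]+)\S.*$", re.MULTILINE)


def set_parents(host_definition: str, new_parent: str) -> str:
    """Return a copy of a 'define host { ... }' block with parents set to new_parent."""
    if _PARENTS_RE.search(host_definition):
        # Keep the original indentation and spacing, replace only the value.
        return _PARENTS_RE.sub(lambda m: m.group(1) + new_parent,
                               host_definition, count=1)
    # No parents directive yet: add one just before the closing brace.
    head, sep, tail = host_definition.rpartition("}")
    if not sep:
        raise ValueError("no closing '}' found in host definition")
    return f"{head}    parents     {new_parent}\n{sep}{tail}"
```

After the file is rewritten, Nagios would still need its configuration verified (for example with `nagios -v`) and reloaded before the new parent takes effect.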

Here is the extract from the manual:
Guest_Host
• Description:
◦ Compare the ESX(i) host the guest is running on against the parent_hosts defined for this Nagios host object
◦ If ESX(i) and Nagios do not match then a WARNING state is triggered
◦ This uses the Nagios objectjson.cgi to query Nagios directly to determine this
◦ The purpose of this check is to return the name of the ESX(i) host the guest is currently running on. If a WARNING state is triggered, Nagios will execute the box293_event_handler to update the parent_hosts directive in this Nagios host object definition.
▪ NOTE: The box293_event_handler is currently a work in progress, stay tuned!
• Required Arguments:
◦ --guest
◦ --query_url
◦ --query_username
◦ --query_password
◦ --service_status_info
• Optional Arguments:
◦ --modifier
• Other Requirements:
◦ Nagios Core 4.0.8 is required on your Nagios server for the objectjson.cgi query to work (it may work from 4.0.4 onwards however it has only been tested with 4.0.8)
• Example 1:
◦ box293_check_vmware.pl --server vcenter.box293.local --check Guest_Host --guest windows10preview.box293.local --query_url 'http://xitest.box293.local/nagios/cgi-b ... ctjson.cgi' --query_username 'readonly' --query_password 'AV3ryStr0ngP@ssw0rd' --service_status_info "Current_Parent=esxi001.box293.local"
• Output for Example 1:
◦ Current_Parent=esxi001.box293.local
• Example 2:
◦ box293_check_vmware.pl --server vcenter.box293.local --check Guest_Host --guest HOST001 --query_url 'http://xitest.box293.local/nagios/cgi-b ... ctjson.cgi' --query_username 'readonly' --query_password 'AV3ryStr0ngP@ssw0rd' --service_status_info ""
• Output for Example 2:
◦ No_Parent_Defined_-_Should_Be=esxi001.box293.local
• Note: These two examples were executed at the command line. In a service definition, the --service_status_info argument would be followed by $SERVICEOUTPUT$
◦ Example:
▪ --service_status_info "$SERVICEOUTPUT$"
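The comparison Guest_Host performs can be pictured with the following sketch. It is not the plugin's code: it assumes the objectjson.cgi reply for the guest's host object has already been fetched and lists the parents under `data.host.parent_hosts`, it treats a missing parent as a mismatch, and the mismatch output string is made up for illustration. The other two output strings mirror the manual examples above.

```python
import json

OK, WARNING = 0, 1  # standard Nagios plugin exit codes


def compare_parent(objectjson_body: str, running_on: str):
    """Return (state, output) by comparing the ESX(i) host a guest runs on with
    the parents Nagios has for that host object.

    Assumes a body shaped like {"data": {"host": {"parent_hosts": [...]}}}.
    """
    host = json.loads(objectjson_body).get("data", {}).get("host", {})
    parents = host.get("parent_hosts") or []
    if not parents:
        return WARNING, f"No_Parent_Defined_-_Should_Be={running_on}"
    if running_on in parents:
        return OK, f"Current_Parent={running_on}"
    # Hypothetical wording for the mismatch case (not taken from the manual).
    return WARNING, f"Parent_Mismatch_-_Should_Be={running_on}"
```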
Have a play with it and see if it provides any useful functionality. Please do not hesitate to email me about this functionality as I haven't had any feedback from users yet. There's always room for improvement :D

Re: box293 Which Guest VMs were on the host, when host failed?

Posted: Wed Jul 29, 2015 4:07 am
by srcadmin
Hey, Troy!

Thanks for your reply and the leads. I'll look into them and see how I can solve my challenge.

I'll write back here with feedback, of course.

Thanks!

Re: box293 Which Guest VMs were on the host, when host failed?

Posted: Wed Jul 29, 2015 10:21 am
by jolson
srcadmin,

Thank you - we're looking forward to your response.