ESXi 5.x and Dell hardware monitoring

roffer · Post by **roffer** » Sat Jan 26, 2013 6:40 pm

Is it possible to monitor Dell hardware events on Nagios when the operating system in question is ESXi 5.x? All the plugins that I have found for OpenManage or ESXi only seem to support generic hardware notifications (hard drive full, CPU overloaded, etc.) but do not seem to support lower-level hardware issues, like SMART drive predictive failures, RAID controller errors, RAM CRC errors, etc.

Using the trap method seems like it would work, but from the web pages I found it seems you only get notifications when something goes wrong rather than a notification that everything is ok with a device status (like fan speed, etc.). Is there a plugin or method out there that has a detailed level of monitoring for Dell hardware on ESXi 5.x?!

Here are some of the items that are available via OpenManage, which we are trying to move away from....

Dell Chassis CPU failure
Dell Chassis Fan Status
Dell Chassis Memory Arrays
Dell Chassis Storage - Controllers
Dell Health Check
Dell Power Supply Status
Dell Amperage Status Combined
Dell Battery Status Combined
Dell Chassis Intrusion Status Combined
Dell Chassis Status
Dell Cooling Device Status Combined
Dell Cooling Unit Status Combined
Dell Event Log Status
Dell Global System Status
Dell Memory Device Status Combined
Dell Power Supply Status Combined
Dell Power Unit Status Combined
Dell Processor Device Status Combined
Dell Temperature Status Combined
Dell Voltage Status Combined
Storage Management Global System Status

scottwilkerson · Post by **scottwilkerson** » Sat Jan 26, 2013 11:13 pm

I believe this plugin can do what you are looking for
http://exchange.nagios.org/directory/Pl ... ge/details

roffer · Post by **roffer** » Sat Jan 26, 2013 11:39 pm

Regretfully that is not the case for ESXi 5.x, see the documentation here: http://folk.uio.no/trondham/software/ch ... -supported

scottwilkerson · Post by **scottwilkerson** » Mon Jan 28, 2013 12:18 am

Hmm, that doesn't seem fair...

There are some other plugins that look like they use the omreport utility. Do you know if that is available to connect to your server?

http://exchange.nagios.org/index.php?op ... openmanage

roffer · Post by **roffer** » Mon Jan 28, 2013 3:34 am

It seems like the Dell monitoring plugins fall into one of three categories:

OMSA based plugins: don't work on ESXi due to lack of SNMP/NRPE support
DRAC based plugins: don't supply drive/storage monitoring, which besides power supplies are the most critical items we need monitored
OMSA traps: don't give you a real-time status, just a message upon failure. That does not inspire great confidence, and it requires traps to be properly configured, which is easier said than done on Dell/ESXi due to bugs in OMSA

I'm looking for a Nagios/Dell/ESXi solution that has all the hardware info and up to date status. Anyone out there have this working?!

abrist · Post by **abrist** » Mon Jan 28, 2013 5:11 pm

This plugin was mentioned on the dell forums. Have you given it a try/does it meet your needs?

http://en.community.dell.com/support-fo ... 66244.aspx

roffer · Post by **roffer** » Wed Jan 30, 2013 11:35 am

Thanks for the recommendation, regretfully this plugin has two issues that I was hoping to avoid:

- The login and password for the ESXi server are submitted every time Nagios polls the server, and the information is stored in the clear in the config files. This is a real security concern.
- The output from this plugin is similar to using traps: it has only one service line listed under the server and gives an 'all ok' across the board when it does not detect any errors. It only shows an error condition when it arises and then names the specific failed device in the description (e.g. power supply 1 has failed). My preference is to see the state for each service that I know we are monitoring, like things used to work under Nagios for Dell hardware using OMSA's VIB via SNMP on ESX 4.x.

It sounds like this type of monitoring behavior is no longer available for Nagios/ESXi/Dell since SNMP for Dell OIDs has been dropped in ESXi, so I will probably deploy the plugin you mentioned.

Any plugin developers that are considering working on CIM based solutions for Dell/ESXi servers - I would love to see the services listed separately for each server so that we know what we are watching (drives, power supplies, fans, CRC memory errors, etc.) and thereby know what we are NOT monitoring (HBAs, iDRAC, status of lid sensor, etc.). One service listed under the server means we rely on the plugin to interpret all the OMSA messages correctly.
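For any plugin author picking this up, the per-component layout described above could be sketched roughly as follows. This is a minimal Python sketch, not a working CIM client: the `SAMPLE_HEALTH` data, the component names, and the state strings are made up for illustration, and a real plugin would have to obtain them from the host. Only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the standard Nagios plugin convention.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Severity order used to pick the worst state, least to most severe.
SEVERITY = [OK, UNKNOWN, WARNING, CRITICAL]

# Hypothetical health data; a real plugin would query the ESXi host instead.
SAMPLE_HEALTH = {
    "fans": ["ok", "ok", "ok"],
    "power_supplies": ["ok", "failed"],
    "drives": ["ok", "degraded", "ok"],
}

# Assumed mapping from the made-up state strings to Nagios states.
STATE_MAP = {"ok": OK, "degraded": WARNING, "failed": CRITICAL}


def check_component(component, health):
    """Return (exit_code, status_line) for one component class."""
    if not health.get(component):
        # UNKNOWN makes it visible that this item is NOT being monitored.
        return UNKNOWN, f"UNKNOWN - no data for {component}"
    states = [STATE_MAP.get(s, UNKNOWN) for s in health[component]]
    worst = max(states, key=SEVERITY.index)
    healthy = sum(1 for s in states if s == OK)
    return worst, f"{LABELS[worst]} - {component}: {healthy}/{len(states)} healthy"


if __name__ == "__main__":
    # One line per component class, instead of a single combined result.
    for name in ("fans", "power_supplies", "drives", "hbas"):
        code, line = check_component(name, SAMPLE_HEALTH)
        print(line)
```

In a real deployment each component class would be its own Nagios service, with the plugin called once per component and exiting with the returned code, so that an unmonitored item such as `hbas` shows up as UNKNOWN rather than being silently folded into an overall OK.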

abrist · Post by **abrist** » Wed Jan 30, 2013 11:47 am

roffer wrote: - the login and password for the ESX server get passed whenever Nagios polls the server and it also gets stored in the clear in the config files

You could look into using $USERn$ macros for the login information and then lock down the resource.cfg file. Not perfect, but protecting passwords was one of the original purposes of the $USERn$ macros.
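To illustrate the idea, here is a minimal Python sketch of how `$USERn$` substitution keeps credentials out of the command definitions. It is not Nagios's own code: the resource file contents, the account values, and the `check_example_hw` plugin name and its `-H`/`-U`/`-P` flags are all hypothetical, chosen only for illustration.

```python
import re

# Hypothetical resource.cfg contents. On a real system this file would be
# locked down so that only the nagios user can read it.
RESOURCE_CFG = """
# Path to the plugins
$USER1$=/usr/local/nagios/libexec
# ESXi monitoring account (made-up values)
$USER3$=monitor
$USER4$=s3cret
"""

# Hypothetical command line: the object config holds only macro names.
COMMAND_LINE = "$USER1$/check_example_hw -H $HOSTADDRESS$ -U $USER3$ -P $USER4$"


def parse_resource_cfg(text):
    """Return a dict such as {'$USER1$': '/usr/local/nagios/libexec'}."""
    macros = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, value = line.partition("=")
        macros[name.strip()] = value.strip()
    return macros


def expand_macros(command_line, macros):
    """Replace each $USERn$ token with its value; leave unknown ones as-is."""
    return re.sub(
        r"\$USER\d+\$",
        lambda m: macros.get(m.group(0), m.group(0)),
        command_line,
    )


if __name__ == "__main__":
    print(expand_macros(COMMAND_LINE, parse_resource_cfg(RESOURCE_CFG)))
```

The password then lives only in the resource file, not in the host, service, or command definitions. The expanded command line still carries it when the check runs, which is one reason this is "not perfect".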

roffer wrote: - The output, much like with using traps, gives an all clear across the board and only shows the error condition when it arises. My preference is to see the state for each service that I know we are monitoring, like things used to work under SNMP and ESX.

roffer wrote: It sounds like this type of monitoring behavior is no longer available for Nagios/ESXi/Dell since SNMP for extended OIDs has been dropped in ESXi, so I will probably deploy the plugin you mentioned.

I don't know the reasons behind the decision, but you are not alone in your disappointment over the ESXi snmp changes.

roffer · Post by **roffer** » Wed Jan 30, 2013 12:07 pm

I will explore your suggestion on the security workaround, sounds like a good idea.

As for SNMP, yes, you are 100% correct. I wish they had not removed it, and I find it surprising that their large institutional customers let them do so. I guess the industry is now moving toward CIM and dropping SNMP, but the CIM-based Nagios plugins have not matured to the same level and capabilities as the older SNMP plugins.

abrist · Post by **abrist** » Wed Jan 30, 2013 12:17 pm

Here are a couple of links to get you started with user macros. ESX has become a bit of a moving target over the last year. Hopefully the dust will settle and the community will get some solutions together. Many, many Nagios users monitor ESX and vSphere.

http://nagios.sourceforge.net/docs/3_0/ ... ource_file
http://nagios.sourceforge.net/docs/3_0/macros.html

Nagios Support Forum
