ESXi 5.x and Dell hardware monitoring

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
roffer
Posts: 5
Joined: Fri Jan 25, 2013 1:20 pm

ESXi 5.x and Dell hardware monitoring

Post by roffer »

Is it possible to monitor Dell hardware events in Nagios when the operating system in question is ESXi 5.x? All the plugins that I have found for OpenManage or ESXi only seem to support generic hardware notifications (hard drive full, CPU overloaded, etc.) and do not seem to support lower-level hardware issues, like SMART drive predictive failures, RAID controller errors, RAM CRC errors, etc.

Using the trap method seems like it would work, but from the web pages I found, it seems you only get a notification when something goes wrong, rather than a regularly updated status confirming that a device (fan speed, etc.) is OK. Is there a plugin or method out there that has a detailed level of monitoring for Dell hardware on ESXi 5.x?

Here are some of the items that are available via OpenManage, which we are trying to move away from....

Dell Chassis CPU failure
Dell Chassis Fan Status
Dell Chassis Memory Arrays
Dell Chassis Storage - Controllers
Dell Health Check
Dell Power Supply Status
Dell Amperage Status Combined
Dell Battery Status Combined
Dell Chassis Intrusion Status Combined
Dell Chassis Status
Dell Cooling Device Status Combined
Dell Cooling Unit Status Combined
Dell Event Log Status
Dell Global System Status
Dell Memory Device Status Combined
Dell Power Supply Status Combined
Dell Power Unit Status Combined
Dell Processor Device Status Combined
Dell Temperature Status Combined
Dell Voltage Status Combined
Storage Management Global System Status
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: ESXi 5.x and Dell hardware monitoring

Post by scottwilkerson »

I believe this plugin can do what you are looking for:
http://exchange.nagios.org/directory/Pl ... ge/details
Former Nagios employee
roffer
Posts: 5
Joined: Fri Jan 25, 2013 1:20 pm

Re: ESXi 5.x and Dell hardware monitoring

Post by roffer »

Regretfully, that is not the case for ESXi 5.x. See the documentation here: http://folk.uio.no/trondham/software/ch ... -supported
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: ESXi 5.x and Dell hardware monitoring

Post by scottwilkerson »

Hmm, that doesn't seem fair...

There are some other plugins that look like they use the omreport utility. Do you know if that is available to connect to your server?

http://exchange.nagios.org/index.php?op ... openmanage
roffer
Posts: 5
Joined: Fri Jan 25, 2013 1:20 pm

Re: ESXi 5.x and Dell hardware monitoring

Post by roffer »

It seems like the Dell monitoring plugins fall into one of three categories:

OMSA-based plugins: don't work on ESXi due to lack of SNMP/NRPE support
DRAC-based plugins: don't supply drive/storage monitoring, which, besides power supplies, is the most critical item we need monitored
OMSA traps: don't give you a real-time updated status, just a message upon failure. This does not inspire great confidence, and it requires traps to be properly configured, which is easier said than done on Dell/ESXi due to bugs in OMSA

I'm looking for a Nagios/Dell/ESXi solution that has all the hardware info and up to date status. Anyone out there have this working?!
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: ESXi 5.x and Dell hardware monitoring

Post by abrist »

This plugin was mentioned on the Dell forums. Have you given it a try, and does it meet your needs?

http://en.community.dell.com/support-fo ... 66244.aspx
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
roffer
Posts: 5
Joined: Fri Jan 25, 2013 1:20 pm

Re: ESXi 5.x and Dell hardware monitoring

Post by roffer »

Thanks for the recommendation. Regretfully, this plugin has two issues that I was hoping to avoid:

- The login and password for the ESXi server are submitted every time Nagios polls the server, and the information is stored in the clear in the config files. This is a real security concern.
- The output from this plugin is similar to using traps: it only has one service line listed under the server and gives an 'all ok' across the board when it does not detect any errors. It only shows an error condition when one arises, and then names the specific failed device in the description (e.g. power supply 1 has failed). My preference is to see the state for each service that I know we are monitoring, the way things used to work under Nagios for Dell hardware using OMSA's VIB via SNMP on ESX 4.x.

It sounds like this type of monitoring behavior is no longer available for Nagios/ESXi/Dell since SNMP for Dell OIDs has been dropped in ESXi, so I will probably deploy the plugin you mentioned.

Any plugin developers that are considering working on CIM based solutions for Dell/ESXi servers - I would love to see the services listed separately for each server so that we know what we are watching (drives, power supplies, fans, CRC memory errors, etc.) and thereby know what we are NOT monitoring (HBAs, iDRAC, status of lid sensor, etc.). One service listed under the server means we rely on the plugin to interpret all the OMSA messages correctly.
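
As a rough illustration of the per-component layout described above, the Nagios object definitions might look something like the following. This is only a sketch: check_dell_cim_component is a hypothetical plugin name (no such plugin is referenced in this thread), the --component flag and its values are made up for illustration, esxi01 is a placeholder host, and generic-service is assumed to be the template from the sample configuration.

```cfg
# HYPOTHETICAL: "check_dell_cim_component" and "--component" do not refer to
# any real plugin; they only illustrate one check per hardware component.
define command {
    command_name    check_dell_cim_component
    command_line    $USER1$/check_dell_cim_component -H $HOSTADDRESS$ --component $ARG1$
}

# One service per component, so each state is visible on its own line.
define service {
    use                     generic-service    ; assumed sample template
    host_name               esxi01             ; placeholder host name
    service_description     Dell Power Supply Status
    check_command           check_dell_cim_component!power_supply
}

define service {
    use                     generic-service
    host_name               esxi01
    service_description     Dell Chassis Fan Status
    check_command           check_dell_cim_component!fan
}

define service {
    use                     generic-service
    host_name               esxi01
    service_description     Dell Chassis Storage - Controllers
    check_command           check_dell_cim_component!storage_controller
}
```

With this layout, anything that does not appear as its own service is visibly not being monitored, rather than hidden behind a single combined 'all ok'.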
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: ESXi 5.x and Dell hardware monitoring

Post by abrist »

roffer wrote: - the login and password for the ESX server get passed whenever Nagios polls the server and it also gets stored in the clear in the config files
You could look into using $USERn$ macros for the login information and then lock down the resource.cfg file. Not perfect, but protecting passwords was one of the original purposes of the $USERn$ macros.
roffer wrote: - The output, much like with using traps, gives an all clear across the board and only shows the error condition when it arises. My preference is to see the state for each service that I know we are monitoring, like things used to work under SNMP and ESX.
roffer wrote: It sounds like this type of monitoring behavior is no longer available for Nagios/ESXi/Dell since SNMP for extended OIDs has been dropped in ESXi, so I will probably deploy the plugin you mentioned.
I don't know the reasons behind the decision, but you are not alone in your disappointment over the ESXi SNMP changes.
roffer
Posts: 5
Joined: Fri Jan 25, 2013 1:20 pm

Re: ESXi 5.x and Dell hardware monitoring

Post by roffer »

I will explore your suggestion on the security workaround; it sounds like a good idea.

As for SNMP, yes, you are 100% correct. I wish they had not removed it, and I find it surprising that their large institutional customers let them do so. I guess the industry is now moving toward CIM and dropping SNMP, but the CIM-based Nagios plugins have not yet matured to the same level and capabilities as the older SNMP plugins.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: ESXi 5.x and Dell hardware monitoring

Post by abrist »

Here are a couple of links to get you started with user macros. ESX has become a bit of a moving target over the last year. Hopefully the dust will settle and the community will get some solutions together. Many, many Nagios users monitor ESX and vSphere.

http://nagios.sourceforge.net/docs/3_0/ ... ource_file
http://nagios.sourceforge.net/docs/3_0/macros.html
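
A minimal sketch of the user-macro approach, assuming the credentials are stored in $USER3$ and $USER4$ (any unused $USERn$ slots would do). The plugin name check_esxi_hw_example and its --user/--pass flags are placeholders, not the options of any specific plugin, so substitute whatever your plugin actually expects:

```cfg
# --- resource.cfg ---
# Restrict this file so that only the account Nagios runs as can read it.
$USER1$=/usr/local/nagios/libexec    ; plugin directory (typical default, may differ)
$USER3$=monitor_user                 ; placeholder ESXi login
$USER4$=example_password             ; placeholder ESXi password

# --- commands.cfg ---
# HYPOTHETICAL plugin name and flags; only the $USERn$ substitution is the point.
define command {
    command_name    check_esxi_hw
    command_line    $USER1$/check_esxi_hw_example -H $HOSTADDRESS$ --user $USER3$ --pass $USER4$
}
```

This keeps the password out of the object config files, though it is still passed on the plugin's command line when the check runs, which is part of why this is "not perfect".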