Help using xi for a newbie - check_ilo_health

Posted: Sat Oct 26, 2019 12:47 am
by elliotK
Hi all, first post and new to Nagios.

I'm not a Linux guy at all and am Googling every step of the way to learn about chmod, outputting a command to a text file, editing with nano, installing Perl modules, etc. I'm currently on the free trial of XI, and if I can get it set up I can ask my client to purchase.

I've spent hours trying to figure out how to monitor a physical HP DL380 G9 server via iLO and the plugin check_ilo2_health, and it's time to ask for help. I gave up on SNMP monitoring as the MIBs and OIDs were leading me down a rabbit hole. I basically want an alert if there is a hardware failure or if any temperature goes above its threshold.

I'm actually pretty stoked with where I've got up to, and I think it's all working but I just don't know how to finish off and set up the services.

I'm running XI on the virtual appliance that you can download and install on an ESXi host.

I have an SSH session to the Nagios server (even this is new to me) so I can test the plugin, and I'm getting results, but I can't make head or tail of the switches for the different options, or how to set up the services for the host to show something meaningful.

I set up a Command. This did have variables for the username and password, but I've put them in plain text to make it work as I couldn't figure out where the variables were.

Code: Select all

$USER1$/check_ilo2_health.pl -u ## -p ## -H $HOSTADDRESS$ $ARG1$ $ARG2$ $ARG3$
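On where those variables live: in Nagios Core and XI the `$USERn$` macros are normally defined in `resource.cfg` (usually `/usr/local/nagios/etc/resource.cfg` on a default install), one `$USERn$=value` per line. The Python sketch below only illustrates how such definitions would be substituted into a command line. Nagios does this expansion itself, and the macro numbers `$USER3$`/`$USER4$` and the credential values are made up for the example.

```python
import re

def parse_resource_cfg(text):
    """Return a dict like {'$USER1$': '...'} from resource.cfg-style text."""
    macros = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        name, sep, value = line.partition("=")
        if sep and re.fullmatch(r"\$USER\d+\$", name.strip()):
            macros[name.strip()] = value.strip()
    return macros

def expand(command_line, macros):
    """Replace every known $USERn$ macro; leave all other macros untouched."""
    return re.sub(r"\$USER\d+\$",
                  lambda m: macros.get(m.group(0), m.group(0)),
                  command_line)

# Hypothetical resource.cfg content: $USER3$/$USER4$ are ASSUMED to hold
# the iLO username and password. Pick any free $USERn$ slots in practice.
RESOURCE_CFG = """
# path to the plugins
$USER1$=/usr/local/nagios/libexec
$USER3$=ilo_monitor
$USER4$=example-password
"""

COMMAND = "$USER1$/check_ilo2_health.pl -u $USER3$ -p $USER4$ -H $HOSTADDRESS$ -3 $ARG1$"

if __name__ == "__main__":
    print(expand(COMMAND, parse_resource_cfg(RESOURCE_CFG)))
```

`$HOSTADDRESS$` and `$ARG1$` are left alone here because they come from the host and service definitions rather than from `resource.cfg`.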
These are the options. I figured out I had to use '-3' to get it to work with iLO4

Code: Select all

[root@nagios libexec]# ./check_ilo2_health.pl -H 10.20.11.101 -u ## -p ## -3 -?
Usage: check_ilo2_health.pl [-H <host>] [ -u|--user=<USERNAME> ]
  [ -p|--password=<PASSWORD> ] [ -f|--inputfile=<filename> ]
  [ -a|--fanredundancy ] [ -c|--checkdrives ] [ -d|--perfdata ]
  [ -e|--skipsyntaxerrors ] [ -n|--notemperatures ] [ -3|--ilo3 ]
  [ -o|--powerredundancy ] [ -b|--locationlabel ] [ -l|--eventlogcheck]
  [ -i|--ignorelinkdown ] [ -x|--ignorebatterymissing ] [ -s|--sslv3 ]
  [ -t <timeout> ] [ -r <retries> ] [ -g|--getinfos ] [ --sslopts ]
  [ -U|--ignorelinkunknown ] [ -v|--verbose ]
No switch gives me this:

Code: Select all

[root@nagios libexec]# ./check_ilo2_health.pl -H 10.20.11.101 -u ## -p ## -3
ILO2_HEALTH OK - (Board-Version: ILO>=3) Temperatures: 01-Inlet_Ambient (OK): 16, 02-CPU_1 (OK): 40, 03-CPU_2 (OK): 40, 05-P1_DIMM_7-12 (OK): 32, 07-P2_DIMM_7-12 (OK): 28, 08-HD_Max (OK): 40, 10-Chipset (OK): 42, 11-PS_1_Inlet (OK): 28, 12-PS_2_Inlet (OK): 32, 13-VR_P1 (OK): 37, 14-VR_P2 (OK): 39, 15-VR_P1_Mem (OK): 26, 16-VR_P1_Mem (OK): 29, 17-VR_P2_Mem (OK): 33, 18-VR_P2_Mem (OK): 29, 19-PS_1_Internal (OK): 40, 20-PS_2_Internal (OK): 40, 27-HD_Controller (OK): 60, 29-LOM (OK): 42, 30-Front_Ambient (OK): 25, 31-PCI_1_Zone. (OK): 30, 32-PCI_2_Zone. (OK): 31, 33-PCI_3_Zone. (OK): 31, 37-HD_Cntlr_Zone (OK): 43, 38-I/O_Zone (OK): 31, 39-P/S_2_Zone (OK): 36, 40-Battery_Zone (OK): 33, 41-iLO_Zone (OK): 36, 43-Storage_Batt (OK): 20, 44-Fuse (OK): 33
'No Temperatures' and 'Perfdata' give me this:

Code: Select all

[root@nagios libexec]# ./check_ilo2_health.pl -H 10.20.11.101 -u ## -p ## -3 -n -d
ILO2_HEALTH OK - (Board-Version: ILO>=3)  | 01-Inlet_Ambient=16;42;50 13-VR_P1=36;115;120 14-VR_P2=40;115;120 15-VR_P1_Mem=26;115;120 16-VR_P1_Mem=29;115;120 17-VR_P2_Mem=33;115;120 18-VR_P2_Mem=28;115;120 31-PCI_1_Zone.=29;70;75 32-PCI_2_Zone.=31;70;75 33-PCI_3_Zone.=32;70;75 38-I/O_Zone=31;75;80 40-Battery_Zone=33;75;80 41-iLO_Zone=36;90;95
So for "Power Redundancy" for example, how would I set up the Service to show Green or Red if I do or don't have redundant power? -o is the switch for Power Redundancy. Running it from the shell I get the below three outputs:

Code: Select all

[root@nagios libexec]# ./check_ilo2_health.pl -H 10.20.11.101 -u ## -p ## -3 -n -d -o
ILO2_HEALTH OK - (Board-Version: ILO>=3)  | 01-Inlet_Ambient=16;42;50 13-VR_P1=36;115;120 14-VR_P2=41;115;120 15-VR_P1_Mem=26;115;120 16-VR_P1_Mem=29;115;120 17-VR_P2_Mem=33;115;120 18-VR_P2_Mem=28;115;120 31-PCI_1_Zone.=30;70;75 32-PCI_2_Zone.=31;70;75 33-PCI_3_Zone.=32;70;75 38-I/O_Zone=31;75;80 40-Battery_Zone=33;75;80 41-iLO_Zone=36;90;95
[root@nagios libexec]# ./check_ilo2_health.pl -H 10.20.11.101 -u ## -p ## -3 -n -o
ILO2_HEALTH OK - (Board-Version: ILO>=3)
[root@nagios libexec]# ./check_ilo2_health.pl -H 10.20.11.101 -u ## -p ## -3 -o
ILO2_HEALTH OK - (Board-Version: ILO>=3) Temperatures: 01-Inlet_Ambient (OK): 16, 02-CPU_1 (OK): 40, 03-CPU_2 (OK): 40, 05-P1_DIMM_7-12 (OK): 31, 07-P2_DIMM_7-12 (OK): 28, 08-HD_Max (OK): 40, 10-Chipset (OK): 42, 11-PS_1_Inlet (OK): 30, 12-PS_2_Inlet (OK): 30, 13-VR_P1 (OK): 36, 14-VR_P2 (OK): 40, 15-VR_P1_Mem (OK): 26, 16-VR_P1_Mem (OK): 29, 17-VR_P2_Mem (OK): 34, 18-VR_P2_Mem (OK): 29, 19-PS_1_Internal (OK): 40, 20-PS_2_Internal (OK): 40, 27-HD_Controller (OK): 59, 29-LOM (OK): 42, 30-Front_Ambient (OK): 25, 31-PCI_1_Zone. (OK): 29, 32-PCI_2_Zone. (OK): 31, 33-PCI_3_Zone. (OK): 32, 37-HD_Cntlr_Zone (OK): 42, 38-I/O_Zone (OK): 31, 39-P/S_2_Zone (OK): 35, 40-Battery_Zone (OK): 33, 41-iLO_Zone (OK): 36, 43-Storage_Batt (OK): 20, 44-Fuse (OK): 33
Same question for the other metrics - how would I get XI to show the Ambient Temperature, with the current value, and nothing else? For example, using the Configuration Wizard I was able to monitor a WatchGuard firewall, and it shows Active Connections, with a Status Information of "OK - Active Connections 378". I'd like to have check_ilo_health show a service "Ambient Temp" with Status Information of "OK - Temperature 20"
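One possible way to get a single-sensor service is a small wrapper that runs the plugin with `-d` and keeps only one perfdata entry. The Python sketch below shows just the parsing step. The function names and the output wording are illustrative and not part of check_ilo2_health, and it assumes the perfdata keeps the `label=value;warn;crit` shape shown in the outputs above.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios plugin exit codes
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

def parse_perfdata(plugin_output):
    """Return {label: (value, warn, crit)} from the text after the '|'."""
    _, sep, perf = plugin_output.partition("|")
    sensors = {}
    if not sep:
        return sensors  # no perfdata section at all
    for item in perf.split():
        label, eq, numbers = item.partition("=")
        fields = numbers.split(";")
        if not eq or len(fields) < 3:
            continue  # not in label=value;warn;crit form
        try:
            sensors[label] = tuple(float(f) for f in fields[:3])
        except ValueError:
            continue  # non-numeric field, ignore this entry
    return sensors

def check_sensor(plugin_output, label):
    """Return (exit_code, status_line) for one sensor label."""
    sensors = parse_perfdata(plugin_output)
    if label not in sensors:
        return UNKNOWN, "UNKNOWN - sensor %s not found in perfdata" % label
    value, warn, crit = sensors[label]
    if value >= crit:
        state = CRITICAL
    elif value >= warn:
        state = WARNING
    else:
        state = OK
    line = "%s - Temperature %g | %s=%g;%g;%g" % (
        STATE_NAMES[state], value, label, value, warn, crit)
    return state, line

# Shortened copy of the '-3 -n -d' output from this post.
SAMPLE = ("ILO2_HEALTH OK - (Board-Version: ILO>=3)  | "
          "01-Inlet_Ambient=16;42;50 13-VR_P1=36;115;120 41-iLO_Zone=36;90;95")

if __name__ == "__main__":
    print(check_sensor(SAMPLE, "01-Inlet_Ambient"))
```

A real wrapper would run the plugin, pass its output to `check_sensor`, print the status line, and exit with the returned code so that XI colours the service accordingly.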

I hope that's not all too confusing for a first post, and I'll upload some screen shots if that helps. Thanks for any help.

Re: Help using xi for a newbie - check_ilo_health

Posted: Mon Oct 28, 2019 1:49 pm
by benjaminsmith
Hello Elliot,
elliotK wrote: I'm currently on the free trial of XI and if I can get it set up I can ask my client to purchase.
As a trial customer you can register for a QuickStart session with a tech support engineer. If interested, just fill out the form on the page below.
Nagios Quickstarts

I believe the issue you are having is that this plugin returns a warning for high temperature, fan failure or overall health failure, and is not written to monitor specific metrics.

Try testing out the plugin below as it might be more suited to those metrics for temperature and power status and let us know if you have any questions.
check_proliant.py

Re: Help using xi for a newbie - check_ilo_health

Posted: Mon Oct 28, 2019 5:00 pm
by elliotK
Thanks a lot for the tip Benjamin, I'll try the check_proliant.py plugin.

I've already had a quickstart, but that was trying to get a plugin called check_hp working. I think that one needs the MIBs and OIDs.

Cheers,
Elliot

Re: Help using xi for a newbie - check_ilo_health

Posted: Tue Oct 29, 2019 2:50 pm
by benjaminsmith
Hi Elliot,
elliotK wrote: Thanks a lot for the tip Benjamin, I'll try the check_proliant.py plugin.
elliotK wrote: I've already had a quickstart, but that was trying to get a plugin called check_hp working
Sounds good. Let me know if you have any other questions.

Re: Help using xi for a newbie - check_ilo_health

Posted: Sun Nov 03, 2019 7:17 am
by elliotK
Hi Benjamin,

I've tried setting up check_proliant.py but not having any luck. I don't think I understand how it is supposed to be used - are you able to explain it? My searches have shown people using it with NRPE, but there's nothing about this from the plugin itself. It says it checks the HPASM/HPLog for the hardware information, but which server are these logs on, and how does the plugin know which server?

The usage is: check_proliant.py --type={fan|ps|temp|dimm|proc|all}

Thanks for your help.

Code: Select all

[root@nagios libexec]# ./check_proliant.py --type=all
UNKNOWN: Error in pexpect while running hpasmcli

Code: Select all

[root@nagios libexec]# ./check_proliant.py --help
check_proliant.py - GPL Python Script by Jason Antman
http://www.jasonantman.com
checks hplog and returns values for use by Nagios

Usage:
check_hplog.py --type=[fan|ps|temp|proc|dimm|all] [--ignore-redundant] [-h | --help]
   type:           what information to get - fan, ps, temp, proc, dimm, all
   -h --help:      print this usage summary

Re: Help using xi for a newbie - check_ilo_health

Posted: Mon Nov 04, 2019 4:37 pm
by benjaminsmith
Hello Elliot,

There are a couple of layers to getting this set up. One is how the plugin will be called by Nagios, and the second is getting this particular plugin to run successfully on the server.

I would recommend getting the plugin to work locally and testing it to make sure it does what you want. Once that's determined, you can choose to either install NRPE or set up monitoring via SSH.

Since the plugin is installed on the Proliant server, there has to be a way for Nagios to run it. You can use SSH or install the NRPE agent and call the plugin from Nagios using check_nrpe.
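To make the two options concrete: in both cases Nagios runs a small plugin on the XI server that asks the remote machine to run the real check. The Python sketch below only builds the two command lines. `check_proliant` is a hypothetical NRPE command name that would have to be defined in the remote `nrpe.cfg`, the host address and remote plugin path are placeholders, and the `/usr/local/nagios/libexec` directory is assumed from the paths used elsewhere in this thread.

```python
LIBEXEC = "/usr/local/nagios/libexec"  # ASSUMED plugin directory on the XI server

def remote_check_argv(transport, host, remote_command):
    """Build the argv Nagios would run locally to trigger a check on `host`.

    transport "nrpe": remote_command is a command NAME defined in nrpe.cfg
    transport "ssh":  remote_command is the full command line to run remotely
    """
    if transport == "nrpe":
        return [LIBEXEC + "/check_nrpe", "-H", host, "-c", remote_command]
    if transport == "ssh":
        return [LIBEXEC + "/check_by_ssh", "-H", host, "-C", remote_command]
    raise ValueError("unknown transport: %r" % transport)

if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only address, not a real server.
    print(" ".join(remote_check_argv("nrpe", "192.0.2.10", "check_proliant")))
    print(" ".join(remote_check_argv(
        "ssh", "192.0.2.10",
        "/usr/local/nagios/libexec/check_proliant.py --type=temp")))
```

Either way, the plugin itself (and hpasmcli, which it calls) has to exist on the machine being checked; the XI server only relays the request and the result.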

For this plugin to run on the server, it looks like there are a few things to set up.

1. Install the pexpect module for python. You can see the import requirements in the script.
# This script requires the pexpect module which is Expect implemented in pure python.
# it can be obtained from: http://www.noah.org/wiki/Pexpect (or SourceForge.net)
2. And then modify the sudoers file.
# NOTE: Nagios must have the ability to run hpasmcli as root via sudo. You can do this
# by adding the following line to /etc/sudoers, assuming the nagios user is 'nagios':
# nagios ALL=(ALL) NOPASSWD: /sbin/hpasmcli
Let me know if you're able to get the plugin to run successfully on the Proliant Server.

Re: Help using xi for a newbie - check_ilo_health

Posted: Tue Nov 05, 2019 12:09 am
by elliotK
Thanks Benjamin, I'm still confused by that further information.

I saw the requirements for installing pexpect and editing the sudoers file, and I did both successfully on the Nagios XI virtual appliance. The Nagios XI server is running on VMware on the host Proliant server. I also have another Proliant server with Windows Server 2016 installed.

The plugin is installed on the Nagios XI server using the Nagios XI plugin manager.

I was using SSH from a Windows laptop to run the plugin and test on the XI server.

Are you saying I need to install the plugin on the ESX operating system and the Windows server operating system? Is that possible?

Thanks a lot,
Elliot

Re: Help using xi for a newbie - check_ilo_health

Posted: Tue Nov 05, 2019 2:44 pm
by scottwilkerson
Can you show the output of the following?

Code: Select all

grep hpasmcli /etc/sudoers

Re: Help using xi for a newbie - check_ilo_health

Posted: Wed Nov 06, 2019 1:37 pm
by gormank
I saw above there was a question on getting the iLO ambient temp. Here's an example the OP can use to grab it. Just use CCM to make a service for the check and add thresholds.

/usr/local/nagios/libexec/check_snmp -H xxxx -o cpqHeTemperatureCelsius.0.1 -C public -P 2c -m CPQHLTH-MIB

Re: Help using xi for a newbie - check_ilo_health

Posted: Wed Nov 06, 2019 1:52 pm
by scottwilkerson
gormank wrote:I saw above there was a question on getting ilo ambient temp. Here's an example the OP can use to grab it. Just use CCM to make a service to check and add thresholds.

Code: Select all

/usr/local/nagios/libexec/check_snmp -H xxxx -o cpqHeTemperatureCelsius.0.1 -C public -P 2c -m CPQHLTH-MIB
Thanks for offering an alternative