Help using XI for a newbie - check_ilo2_health
Posted: Sat Oct 26, 2019 12:47 am
Hi all, first post and new to Nagios.
I'm not a Linux guy at all and am Googling every step of the way to learn about chmod, outputting a command to a text file, editing with nano, installing Perl modules, etc. I'm currently on the free trial of XI, and if I can get it set up I can ask my client to purchase it.
I've spent hours trying to figure out how to monitor a physical HP DL380 G9 server via iLO with the check_ilo2_health plugin, and it's time to ask for help. I gave up on SNMP monitoring as the MIBs and OIDs were leading me down a rabbit hole. I basically want an alert if there is a hardware failure or if a temperature goes above its threshold.
I'm actually pretty stoked with where I've got up to, and I think it's all working but I just don't know how to finish off and set up the services.
I'm running XI on the virtual appliance that you can download and install on an ESXi host.
I have an SSH session to the Nagios server (even this is new to me) so I can test the plugin, and I'm getting results, but I can't make head or tail of the switches for the different options, or how to set up the services for the host so they show something meaningful.
I set up a Command. This did have variables for the username and password, but I've put them in plain text to make it work as I couldn't figure out where the variables were.
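On where the variables come from: in Nagios, $USERn$ macros are defined in resource.cfg (typically /usr/local/nagios/etc/resource.cfg on XI), $HOSTADDRESS$ comes from the host definition, and $ARG1$, $ARG2$, $ARG3$ are the '!'-separated values after the command name in the service's check command. Here is a rough sketch of that substitution, only to illustrate the mapping. The function name, the command name check_ilo_health, and the $USER1$ path are assumptions, and the credentials stay elided as ##.

```shell
#!/bin/sh
# Hypothetical illustration only: Nagios does this substitution internally.
# The parentheses run the body in a subshell, so IFS and the positional
# parameters changed here do not leak into the calling shell.
build_command_line() (
    host=$1                           # stands in for $HOSTADDRESS$
    check=$2                          # e.g. command_name!arg1!arg2!arg3
    user1=/usr/local/nagios/libexec   # assumed value of $USER1$

    # Split the check command on '!': $1 becomes the command name,
    # $2, $3 and $4 become $ARG1$, $ARG2$ and $ARG3$.
    IFS='!'
    set -f
    set -- $check

    printf '%s\n' "$user1/check_ilo2_health.pl -u ## -p ## -H $host $2 $3 $4"
)

# Example:
#   build_command_line 10.20.11.101 'check_ilo_health!-3!-n!-o'
```

So a service that needs '-3 -n -o' would carry those three values as its arguments, and the same Command definition can be reused for every service on the host.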
These are the options. I figured out I had to use '-3' to get it to work with iLO 4.
No switch gives me this:
'No Temperatures' and 'Perfdata' give me this:
So for "Power Redundancy" for example, how would I set up the Service to show Green or Red if I do or don't have redundant power? -o is the switch for Power Redundancy. Running it from the shell I get the below three outputs:
Same question for the other metrics - how would I get XI to show the Ambient Temperature, with the current value, and nothing else? For example, using the Configuration Wizard I was able to monitor a WatchGuard firewall, and it shows Active Connections, with a Status Information of "OK - Active Connections 378". I'd like to have check_ilo2_health show a service "Ambient Temp" with a Status Information of "OK - Temperature 20".
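One possible approach, sketched here under assumptions rather than as a feature of the plugin: wrap check_ilo2_health.pl in a small script that picks a single sensor out of the perfdata the '-d' switch prints. Perfdata items follow the usual Nagios form label=value;warn;crit, so the sketch reuses the thresholds the plugin already reports. The function name one_sensor is hypothetical, and it assumes integer values and that every item carries both thresholds, as in the output in this post.

```shell
#!/bin/sh
# Hypothetical wrapper logic, not part of check_ilo2_health.pl.
# Usage: one_sensor <perfdata-label> <display-name> "<full plugin output>"
# The parentheses run the body in a subshell, so "exit" only ends this call.
one_sensor() (
    label=$1
    name=$2
    output=$3
    set -f   # no filename expansion while splitting the perfdata

    # Perfdata is everything after the first '|' in the plugin output.
    case "$output" in
        *"|"*) perf=${output#*"|"} ;;
        *) echo "UNKNOWN - no perfdata in plugin output"; exit 3 ;;
    esac

    for item in $perf; do
        [ "${item%%=*}" = "$label" ] || continue
        # Assumes the item looks like label=value;warn;crit
        fields=${item#*=}
        value=${fields%%;*}
        rest=${fields#*;}
        warn=${rest%%;*}
        crit=${rest#*;}
        crit=${crit%%;*}
        if [ "$value" -ge "$crit" ]; then
            echo "CRITICAL - $name $value | $item"; exit 2
        elif [ "$value" -ge "$warn" ]; then
            echo "WARNING - $name $value | $item"; exit 1
        fi
        echo "OK - $name $value | $item"; exit 0
    done

    echo "UNKNOWN - sensor $label not found in perfdata"
    exit 3
)

# Example, run from the libexec directory (credentials elided as ##):
#   one_sensor 01-Inlet_Ambient Temperature \
#       "$(./check_ilo2_health.pl -H 10.20.11.101 -u ## -p ## -3 -n -d)"
```

Saved as its own script in libexec and given its own Command definition, something like this could back an "Ambient Temp" service. Note that only sensors that appear in the '-d' perfdata can be selected this way.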
Code: Select all
$USER1$/check_ilo2_health.pl -u ## -p ## -H $HOSTADDRESS$ $ARG1$ $ARG2$ $ARG3$
Code: Select all
[root@nagios libexec]# ./check_ilo2_health.pl -H 10.20.11.101 -u ## -p ## -3 -?
Usage: check_ilo2_health.pl [-H <host>] [ -u|--user=<USERNAME> ]
[ -p|--password=<PASSWORD> ] [ -f|--inputfile=<filename> ]
[ -a|--fanredundancy ] [ -c|--checkdrives ] [ -d|--perfdata ]
[ -e|--skipsyntaxerrors ] [ -n|--notemperatures ] [ -3|--ilo3 ]
[ -o|--powerredundancy ] [ -b|--locationlabel ] [ -l|--eventlogcheck]
[ -i|--ignorelinkdown ] [ -x|--ignorebatterymissing ] [ -s|--sslv3 ]
[ -t <timeout> ] [ -r <retries> ] [ -g|--getinfos ] [ --sslopts ]
[ -U|--ignorelinkunknown ] [ -v|--verbose ]
Code: Select all
[root@nagios libexec]# ./check_ilo2_health.pl -H 10.20.11.101 -u ## -p ## -3
ILO2_HEALTH OK - (Board-Version: ILO>=3) Temperatures: 01-Inlet_Ambient (OK): 16, 02-CPU_1 (OK): 40, 03-CPU_2 (OK): 40, 05-P1_DIMM_7-12 (OK): 32, 07-P2_DIMM_7-12 (OK): 28, 08-HD_Max (OK): 40, 10-Chipset (OK): 42, 11-PS_1_Inlet (OK): 28, 12-PS_2_Inlet (OK): 32, 13-VR_P1 (OK): 37, 14-VR_P2 (OK): 39, 15-VR_P1_Mem (OK): 26, 16-VR_P1_Mem (OK): 29, 17-VR_P2_Mem (OK): 33, 18-VR_P2_Mem (OK): 29, 19-PS_1_Internal (OK): 40, 20-PS_2_Internal (OK): 40, 27-HD_Controller (OK): 60, 29-LOM (OK): 42, 30-Front_Ambient (OK): 25, 31-PCI_1_Zone. (OK): 30, 32-PCI_2_Zone. (OK): 31, 33-PCI_3_Zone. (OK): 31, 37-HD_Cntlr_Zone (OK): 43, 38-I/O_Zone (OK): 31, 39-P/S_2_Zone (OK): 36, 40-Battery_Zone (OK): 33, 41-iLO_Zone (OK): 36, 43-Storage_Batt (OK): 20, 44-Fuse (OK): 33
Code: Select all
[root@nagios libexec]# ./check_ilo2_health.pl -H 10.20.11.101 -u ## -p ## -3 -n -d
ILO2_HEALTH OK - (Board-Version: ILO>=3) | 01-Inlet_Ambient=16;42;50 13-VR_P1=36;115;120 14-VR_P2=40;115;120 15-VR_P1_Mem=26;115;120 16-VR_P1_Mem=29;115;120 17-VR_P2_Mem=33;115;120 18-VR_P2_Mem=28;115;120 31-PCI_1_Zone.=29;70;75 32-PCI_2_Zone.=31;70;75 33-PCI_3_Zone.=32;70;75 38-I/O_Zone=31;75;80 40-Battery_Zone=33;75;80 41-iLO_Zone=36;90;95
Code: Select all
[root@nagios libexec]# ./check_ilo2_health.pl -H 10.20.11.101 -u ## -p ## -3 -n -d -o
ILO2_HEALTH OK - (Board-Version: ILO>=3) | 01-Inlet_Ambient=16;42;50 13-VR_P1=36;115;120 14-VR_P2=41;115;120 15-VR_P1_Mem=26;115;120 16-VR_P1_Mem=29;115;120 17-VR_P2_Mem=33;115;120 18-VR_P2_Mem=28;115;120 31-PCI_1_Zone.=30;70;75 32-PCI_2_Zone.=31;70;75 33-PCI_3_Zone.=32;70;75 38-I/O_Zone=31;75;80 40-Battery_Zone=33;75;80 41-iLO_Zone=36;90;95
[root@nagios libexec]# ./check_ilo2_health.pl -H 10.20.11.101 -u ## -p ## -3 -n -o
ILO2_HEALTH OK - (Board-Version: ILO>=3)
[root@nagios libexec]# ./check_ilo2_health.pl -H 10.20.11.101 -u ## -p ## -3 -o
ILO2_HEALTH OK - (Board-Version: ILO>=3) Temperatures: 01-Inlet_Ambient (OK): 16, 02-CPU_1 (OK): 40, 03-CPU_2 (OK): 40, 05-P1_DIMM_7-12 (OK): 31, 07-P2_DIMM_7-12 (OK): 28, 08-HD_Max (OK): 40, 10-Chipset (OK): 42, 11-PS_1_Inlet (OK): 30, 12-PS_2_Inlet (OK): 30, 13-VR_P1 (OK): 36, 14-VR_P2 (OK): 40, 15-VR_P1_Mem (OK): 26, 16-VR_P1_Mem (OK): 29, 17-VR_P2_Mem (OK): 34, 18-VR_P2_Mem (OK): 29, 19-PS_1_Internal (OK): 40, 20-PS_2_Internal (OK): 40, 27-HD_Controller (OK): 59, 29-LOM (OK): 42, 30-Front_Ambient (OK): 25, 31-PCI_1_Zone. (OK): 29, 32-PCI_2_Zone. (OK): 31, 33-PCI_3_Zone. (OK): 32, 37-HD_Cntlr_Zone (OK): 42, 38-I/O_Zone (OK): 31, 39-P/S_2_Zone (OK): 35, 40-Battery_Zone (OK): 33, 41-iLO_Zone (OK): 36, 43-Storage_Batt (OK): 20, 44-Fuse (OK): 33
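All three '-o' runs above print the same OK line as without the switch, so the redundancy result appears to be folded into the overall status rather than shown as separate text. Nagios decides Green or Red from the plugin's exit code, not from the text, so one way to see what a "Power Redundancy" service would show is to print the exit code right after a run. A minimal sketch; the helper name state_name is hypothetical:

```shell
#!/bin/sh
# Map a plugin exit code to the state name used by the Nagios plugin convention.
state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "INVALID" ;;   # outside the 0-3 plugin convention
    esac
}

# Example, run from the libexec directory (credentials elided as ##):
#   ./check_ilo2_health.pl -H 10.20.11.101 -u ## -p ## -3 -n -o
#   state_name $?
```

If that prints OK with both power supplies connected and something else with one disconnected, a service whose arguments are '-3 -n -o' should presumably turn Green or Red the same way.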
I hope that's not all too confusing for a first post, and I'll upload some screen shots if that helps. Thanks for any help.