check_hp plugin
Posted: Mon Jul 21, 2014 5:36 am
by klee
Hi,
I'm trying to configure this check_hp plugin to work with NagiosXI.
http://exchange.nagios.org/directory/Pl ... hp/details
So far I've completed the steps below, but I'm not sure how to proceed because there is no documentation. Could you give me some assistance?
1) Configured SNMP communication between Linux server and Windows Client.
2) Tested check_hp script
[root@nagiostest1 check_hp-2.16]# ./check_hp -H 192.x.x.x -C public
Compaq/HP Agent Check: overall system state OK
3) Copied the check_hp script into /usr/local/nagios/libexec. Then defined check_hp as a command in CCM as $USER1$/check_hp
Thank you as always,
-klee
Re: check_hp plugin
Posted: Mon Jul 21, 2014 11:57 am
by abrist
Well, you have tested the script and created a command. The last thing you need to do is create a service check for it in the CCM. See the following doc:
http://assets.nagios.com/downloads/nagi ... ios-XI.pdf
Re: check_hp plugin
Posted: Tue Jul 22, 2014 12:51 pm
by klee
Thanks Abrist,
I defined the host and created a service for the check_hp plugin.
define command {
command_name check_hp
command_line $USERS1$/check_hp -H $HOSTADDRESS$ -C $ARG1$ -d
}
Now I'm getting this error:
Jul 22 12:50:24 Nagios1 nagios: Warning: Return code of 127 for check of service 'Check HP with HP Insight Manager' on host '192.x.x.x' was out of bounds. Make sure the plugin you're trying to run actually exists
Re: check_hp plugin
Posted: Tue Jul 22, 2014 7:29 pm
by belvdr
The command line variable is $USER1$ (singular, not plural).
Mine is configured as:
$USER1$/check_hp -H $HOSTADDRESS$ -C $ARG1$ $ARG2$ $ARG3$ $ARG4$
and I call it with:
$ARG1$ = Community String
$ARG2$ = -x cpqFcaHostCntlrStatus,cpqNicIfPhysAdapterStatus
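For reference, the return code 127 in the warning earlier in the thread is the status a shell reports when it cannot find the command it was asked to run, which is what happens when a misspelled macro leaves the command line pointing somewhere other than the plugin. A minimal sketch of that behaviour follows; the path is hypothetical and chosen only because it should not exist, and the IP address is a documentation placeholder.

```shell
# Hypothetical path, used only because it should not exist on a normal system.
missing_plugin=/nonexistent-libexec/check_hp

# Ask a shell to run the missing plugin and capture its exit status.
sh -c "$missing_plugin -H 192.0.2.10 -C public" 2>/dev/null
rc=$?

# A status of 127 means "command not found".
echo "exit code: $rc"
```

Nagios only accepts plugin return codes 0 through 3, so anything else is logged as "out of bounds".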
Re: check_hp plugin
Posted: Tue Jul 22, 2014 8:08 pm
by klee
Thank you belvdr.
I did indeed mistype $USER1$ as plural.
I'll use your $ARG1$ $ARG2$ format and see if it works.
I do have a question though: when the CHECK_HP script is run correctly, it returns "Compaq/HP Agent Check: overall system state OK".
If that is the case, why do we have to check components individually by using: $ARG2$ = -x cpqFcaHostCntlrStatus,cpqNicIfPhysAdapterStatus ?
... and if we can delimit multiple components using commas, why do we need the additional $ARG3$ $ARG4$ ?
Much appreciated.
-klee
Re: check_hp plugin
Posted: Wed Jul 23, 2014 10:19 am
by lmiltchev
... and if we can delimit multiple components using commas, why do we need the additional $ARG3$ $ARG4$ ?
You can use $ARG3$ and $ARG4$ to pass some other flags (port, timeout, etc.)...
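To make the argument mapping concrete: in a Nagios object definition, the check_command value is the command name followed by `!`-separated arguments, which fill $ARG1$, $ARG2$ and so on in order. The sketch below only imitates that split in plain shell (it is not Nagios code), using the values from belvdr's post; the community string `public` is taken from the test command at the top of the thread, and the variable names are made up for illustration.

```shell
# Illustrative check_command value: command name, then "!"-separated arguments.
check_command='check_hp!public!-x cpqFcaHostCntlrStatus,cpqNicIfPhysAdapterStatus'

# Split on "!" only, so the space inside the second argument is preserved.
old_ifs=$IFS
IFS='!'
set -f                      # disable globbing while splitting
set -- $check_command
set +f
IFS=$old_ifs

command_name=$1
arg1=$2
arg2=$3

echo "command: $command_name"
echo "ARG1:    $arg1"
echo "ARG2:    $arg2"
```

$ARG3$ and $ARG4$ are simply not supplied here; they stay available for any further flags the plugin accepts.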
Re: check_hp plugin
Posted: Wed Jul 23, 2014 10:31 am
by klee
Thanks lmiltchev,
Any idea on part 1 of my question?
...when the CHECK_HP script is run correctly, it returns "Compaq/HP Agent Check: overall system state OK".
If that is the case, why do we have to check components individually by using: $ARG2$ = -x cpqFcaHostCntlrStatus,cpqNicIfPhysAdapterStatus ?
Also, is it possible to get metrics on a more granular level (i.e. status of individual components), rather than of a blanket "Compaq/HP Agent Check: overall system state OK"?
Because, right now, I'm checking all of the following components since "./check_hp --help" claims they're supported. The check reports "overall system state OK"; meanwhile there's no tape drive installed on this server.
Currently the module supports the following components:
cpqHeThermalCpuFanStatus,
cpqNicIfLogMapStatus,
cpqHeFltTolFanCondition,
cpqDaLogDrvStatus,
cpqDaLogDrvCondition,
cpqDaTapeDrvStatus,
cpqHeFltTolPwrSupplyCondition,
cpqHeResilientMemCondition,
cpqNicIfPhysAdapterStatus,
cpqRackPowerSupplyCondition,
cpqHeFltTolPowerSupplyCondition,
cpqDaPhyDrvStatus,
cpqHeEventLogCondition,
cpqDaPhyDrvCondition,
cpqFcaHostCntlrStatus,
cpqSeCpuStatus,
cpqHeTemperatureCondition,
cpqHeThermalSystemFanStatus,
cpqDaPhyDrvSmartStatus,
cpqDaCntlrCondition,
cpqRackCommonEnclosureFanCondition
Any assistance would be much appreciated.
Thanks Again,
-klee
Re: check_hp plugin
Posted: Wed Jul 23, 2014 11:54 am
by klee
Just a follow-up to my previous comment. I ran the debug option and got the result below, which is much closer to what I'm looking for.
I will, of course, remove the unrecognized hardware.
./check_hp -H 192.x.x.x -C public -d cpqHeThermalCpuFanStatus,cpqNicIfLogMapStatus,cpqHeFltTolFanCondition,cpqDaLogDrvStatus,cpqDaLogDrvCondition,cpqDaTapeDrvStatus,cpqHeFltTolPwrSupplyCondition,cpqHeResilientMemCondition,cpqNicIfPhysAdapterStatus,cpqRackPowerSupplyCondition,cpqHeFltTolPowerSupplyCondition,cpqDaPhyDrvStatus,cpqHeEventLogCondition,cpqDaPhyDrvCondition,cpqFcaHostCntlrStatus,cpqSeCpuStatus,cpqHeTemperatureCondition,cpqHeThermalSystemFanStatus,cpqDaPhyDrvSmartStatus,cpqDaCntlrCondition,cpqRackCommonEnclosureFanCondition
Compaq/HP Agent Check:
cpqHeThermalCpuFanStatus.0 = 1 status of the fan(s) (other)
cpqNicIfLogMapStatus.2 = 2 status of the NIC logical group (ok)
cpqNicIfLogMapStatus.1 = 1 status of the NIC logical group (unknown)
cpqHeFltTolFanCondition.0.4 = 2 condition of the fan (0.4:ok)
cpqHeFltTolFanCondition.0.2 = 2 condition of the fan (0.2:ok)
cpqHeFltTolFanCondition.0.1 = 2 condition of the fan (0.1:ok)
cpqHeFltTolFanCondition.0.5 = 2 condition of the fan (0.5:ok)
cpqHeFltTolFanCondition.0.6 = 2 condition of the fan (0.6:ok)
cpqHeFltTolFanCondition.0.3 = 2 condition of the fan (0.3:ok)
cpqDaLogDrvStatus.2.1 = 2 logical drive status (2.1:ok)
cpqDaLogDrvCondition.2.1 = 2 logical drive and associated physical state (2.1:ok)
cpqDaTapeDrvStatus tape drive status - 1.3.6.1.4.1.232.3.2.9.1.1.8 (OID-tree not found, ignoring)
cpqHeFltTolPwrSupplyCondition.0 = 2 overall condition of power supply subsystem (ok)
cpqHeResilientMemCondition.0 = 2 condition of the memory protection subsystem (ok)
cpqNicIfPhysAdapterStatus.1 = 2 physical adapter status (ok)
cpqNicIfPhysAdapterStatus.2 = 2 physical adapter status (ok)
cpqRackPowerSupplyCondition condition of the power supply - 1.3.6.1.4.1.232.22.2.5.1.1.1.17 (OID-tree not found, ignoring)
cpqHeFltTolPowerSupplyCondition.0.2 = 2 condition of the power supply (0.2:ok)
cpqHeFltTolPowerSupplyCondition.0.1 = 2 condition of the power supply (0.1:ok)
cpqDaPhyDrvStatus.2.0 = 2 physical drive status (2.0:ok)
cpqDaPhyDrvStatus.2.1 = 2 physical drive status (2.1:ok)
cpqDaPhyDrvStatus.2.2 = 2 physical drive status (2.2:ok)
cpqHeEventLogCondition overall IML entries - 1.3.6.1.4.1.232.6.2.11.2.0 (OID-tree not found, ignoring)
cpqDaPhyDrvCondition.2.1 = 2 physical drive condition (2.1:ok)
cpqDaPhyDrvCondition.2.2 = 2 physical drive condition (2.2:ok)
cpqDaPhyDrvCondition.2.0 = 2 physical drive condition (2.0:ok)
cpqFcaHostCntlrStatus fibre channel host controller status - 1.3.6.1.4.1.232.16.2.7.1.1.4 (OID-tree not found, ignoring)
cpqSeCpuStatus.1 = 2 CPU status (ok)
cpqSeCpuStatus.0 = 2 CPU status (ok)
cpqHeTemperatureCondition.0.20 = 2 temperature sensor condition (0.20:ok)
cpqHeTemperatureCondition.0.10 = 2 temperature sensor condition (0.10:ok)
cpqHeTemperatureCondition.0.3 = 2 temperature sensor condition (0.3:ok)
cpqHeTemperatureCondition.0.8 = 2 temperature sensor condition (0.8:ok)
cpqHeTemperatureCondition.0.30 = 2 temperature sensor condition (0.30:ok)
cpqHeTemperatureCondition.0.21 = 2 temperature sensor condition (0.21:ok)
cpqHeTemperatureCondition.0.25 = 2 temperature sensor condition (0.25:ok)
cpqHeTemperatureCondition.0.12 = 2 temperature sensor condition (0.12:ok)
cpqHeTemperatureCondition.0.1 = 2 temperature sensor condition (0.1:ok)
cpqHeTemperatureCondition.0.9 = 2 temperature sensor condition (0.9:ok)
cpqHeTemperatureCondition.0.2 = 2 temperature sensor condition (0.2:ok)
cpqHeTemperatureCondition.0.23 = 2 temperature sensor condition (0.23:ok)
cpqHeTemperatureCondition.0.19 = 2 temperature sensor condition (0.19:ok)
cpqHeTemperatureCondition.0.4 = 2 temperature sensor condition (0.4:ok)
cpqHeTemperatureCondition.0.7 = 2 temperature sensor condition (0.7:ok)
cpqHeTemperatureCondition.0.6 = 2 temperature sensor condition (0.6:ok)
cpqHeTemperatureCondition.0.29 = 2 temperature sensor condition (0.29:ok)
cpqHeTemperatureCondition.0.26 = 2 temperature sensor condition (0.26:ok)
cpqHeTemperatureCondition.0.22 = 2 temperature sensor condition (0.22:ok)
cpqHeTemperatureCondition.0.5 = 2 temperature sensor condition (0.5:ok)
cpqHeTemperatureCondition.0.11 = 2 temperature sensor condition (0.11:ok)
cpqHeTemperatureCondition.0.24 = 2 temperature sensor condition (0.24:ok)
cpqHeThermalSystemFanStatus.0 = 2 status of the processor fan(s) (ok)
cpqDaPhyDrvSmartStatus.2.2 = 2 physical drive S.M.A.R.T status (2.2:ok)
cpqDaPhyDrvSmartStatus.2.0 = 2 physical drive S.M.A.R.T status (2.0:ok)
cpqDaPhyDrvSmartStatus.2.1 = 2 physical drive S.M.A.R.T status (2.1:ok)
cpqDaCntlrCondition.2 = 2 controller status (ok)
cpqRackCommonEnclosureFanCondition condition of the rack fan - 1.3.6.1.4.1.232.22.2.3.1.3.1.11 (OID-tree not found, ignoring)
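One way to pick out the unrecognized hardware from debug output like the above is to filter for the "OID-tree not found" lines and keep the first field. This is only a sketch: it assumes the debug output keeps the line format shown in this post, and it runs against a few sample lines copied from it rather than against a live host.

```shell
# Sample lines copied from the debug output in this thread.
debug_output='cpqDaLogDrvCondition.2.1 = 2 logical drive and associated physical state (2.1:ok)
cpqDaTapeDrvStatus tape drive status - 1.3.6.1.4.1.232.3.2.9.1.1.8 (OID-tree not found, ignoring)
cpqHeFltTolPwrSupplyCondition.0 = 2 overall condition of power supply subsystem (ok)
cpqRackPowerSupplyCondition condition of the power supply - 1.3.6.1.4.1.232.22.2.5.1.1.1.17 (OID-tree not found, ignoring)
cpqFcaHostCntlrStatus fibre channel host controller status - 1.3.6.1.4.1.232.16.2.7.1.1.4 (OID-tree not found, ignoring)'

# Keep the component name (first field) of every "not found" line,
# then join the names with commas.
missing=$(printf '%s\n' "$debug_output" \
    | awk '/OID-tree not found/ { print $1 }' \
    | paste -sd, -)

echo "$missing"
```

Against a real host, the same awk and paste pipeline could be fed from the output of ./check_hp run with -d instead of the sample text.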
This brings us to the statement below from the check_hp author, Günther Mair.
Please do not misread the "-d" parameter! The "-d" parameter stands for "DEBUG" and is not intended for production use inside Nagios! check_hp will give you information about which objects failed if there are any.
Can anyone actually attest to how this monitor is supposed to be run? As I mentioned earlier, I was hoping the monitor would report more detail than just "overall system state OK".
However, if that is the standard, then I'll have to be OK with that. Sorry, but I'm totally new to this.

Any advice would be greatly appreciated.
-klee
Re: check_hp plugin
Posted: Wed Jul 23, 2014 2:31 pm
by lmiltchev
Can anyone actually attest to how this monitor is supposed to be run? As I mentioned earlier, I was hoping the monitor would report more detail than just "overall system state OK".
What happens when one of the components is in "non-OK" state? Do you get more details then?
As this is a third-party plugin, your best bet would be to contact the plugin's author and request more info on its usage.
Re: check_hp plugin
Posted: Thu Jul 24, 2014 10:46 am
by klee
I've confirmed with the author of the check_hp plugin that the "overall system state OK" message is the standard return from this script when no problem is found.
If one or more problems are found, you will get the descriptions (in the debug-mode format) plus the respective error codes instead.
Issue resolved, please close thread.
Thanks,
-klee
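When running the plugin by hand, the exit status shows which state Nagios would assign to the result. The standard plugin convention is 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN; which of these check_hp returns for a given fault is not stated in this thread. A small sketch with a hypothetical helper named nagios_state:

```shell
# Map a plugin exit status to the state name Nagios assigns to it.
# nagios_state is a made-up helper name, not part of Nagios or check_hp.
nagios_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo "out of bounds" ;;
    esac
}

# Typical manual use (host and community string are placeholders):
#   ./check_hp -H 192.x.x.x -C public; nagios_state $?
for code in 0 1 2 3 127; do
    echo "$code -> $(nagios_state "$code")"
done
```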