check_hp plugin

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
klee
Posts: 147
Joined: Fri Apr 04, 2014 2:31 pm

check_hp plugin

Post by klee »

Hi,

I'm trying to configure this check_hp plugin to work with NagiosXI. http://exchange.nagios.org/directory/Pl ... hp/details

So far, I've done the steps below. However, I'm not sure how to proceed, as there is no documentation. Would you guys be able to give me some assistance?

1) Configured SNMP communication between Linux server and Windows Client.

2) Tested check_hp script
[root@nagiostest1 check_hp-2.16]# ./check_hp -H 192.x.x.x -C public
Compaq/HP Agent Check: overall system state OK

3) Copied the check_hp script into /usr/local/nagios/libexec. Then defined check_hp as a command in CCM as $USER1$/check_hp

Thank you as always,

-klee
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: check_hp plugin

Post by abrist »

Well, you have tested the script and created a command. The last thing you need to do is create a service check for it in the CCM. See the following doc:
http://assets.nagios.com/downloads/nagi ... ios-XI.pdf
klee
Posts: 147
Joined: Fri Apr 04, 2014 2:31 pm

Re: check_hp plugin

Post by klee »

Thanks Abrist,

I defined the host and created a service for the check_hp plugin.

define command {
command_name check_hp
command_line $USERS1$/check_hp -H $HOSTADDRESS$ -C $ARG1$ -d
}

Now I'm getting this error:

Jul 22 12:50:24 Nagios1 nagios: Warning: Return code of 127 for check of service 'Check HP with HP Insight Manager' on host '192.x.x.x' was out of bounds. Make sure the plugin you're trying to run actually exists
belvdr
Posts: 81
Joined: Tue Oct 08, 2013 9:17 pm

Re: check_hp plugin

Post by belvdr »

The command line variable is $USER1$ (singular, not plural).
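
For context, 127 is the exit status a POSIX shell returns when it cannot find the command it was asked to run, which fits a command line whose path does not resolve to the plugin. A minimal sketch, using a deliberately made-up path and placeholder address (neither comes from this thread's setup):

```shell
# Run a command whose path does not exist; the path and address are placeholders.
sh -c '/nonexistent/libexec/check_hp -H 192.0.2.1 -C public' 2>/dev/null
status=$?

# The shell reports "command not found" as exit status 127.
echo "exit status: $status"
```

Nagios only accepts plugin return codes 0 through 3, so it logs 127 as "out of bounds".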

Mine is configured as:

$USER1$/check_hp -H $HOSTADDRESS$ -C $ARG1$ $ARG2$ $ARG3$ $ARG4$
and I call it with:

$ARG1$ = Community String
$ARG2$ = -x cpqFcaHostCntlrStatus,cpqNicIfPhysAdapterStatus
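
Put together as plain object definitions, the command and a matching service might look roughly like the sketch below. The host name and the "generic-service" template are assumptions for illustration, and in Nagios XI these values would normally be entered through the CCM rather than written by hand. The service description is the one from the log message earlier in the thread.

```
# Command definition with the macro spelled $USER1$ (singular).
define command {
    command_name    check_hp
    command_line    $USER1$/check_hp -H $HOSTADDRESS$ -C $ARG1$ $ARG2$ $ARG3$ $ARG4$
}

# Service definition; host_name and the template name are hypothetical.
define service {
    use                     generic-service
    host_name               hp-server-01
    service_description     Check HP with HP Insight Manager
    check_command           check_hp!public!-x cpqFcaHostCntlrStatus,cpqNicIfPhysAdapterStatus
}
```

Arguments in check_command are separated by "!", so "public" becomes $ARG1$ and the "-x ..." string becomes $ARG2$; $ARG3$ and $ARG4$ are simply left empty here.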
klee
Posts: 147
Joined: Fri Apr 04, 2014 2:31 pm

Re: check_hp plugin

Post by klee »

Thank you belvdr.

I did indeed mistype $USER1$ as plural :oops: I'll use your $ARG1$ $ARG2$ format and see if it works.

I do have a question though: when the CHECK_HP script is run correctly, it returns "Compaq/HP Agent Check: overall system state OK".

If that is the case, why do we have to check components individually by using: $ARG2$ = -x cpqFcaHostCntlrStatus,cpqNicIfPhysAdapterStatus ?

... and if we can delimit multiple components using commas, why do we need the additional $ARG3$ $ARG4$ ?

Much appreciated.

-klee
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: check_hp plugin

Post by lmiltchev »

"... and if we can delimit multiple components using commas, why do we need the additional $ARG3$ $ARG4$ ?"
You can use $ARG3$ and $ARG4$ to pass some other flags (port, timeout, etc.)...
Be sure to check out our Knowledgebase for helpful articles and solutions!
klee
Posts: 147
Joined: Fri Apr 04, 2014 2:31 pm

Re: check_hp plugin

Post by klee »

Thanks lmiltchev,

Any idea on part 1 of my question?
...when the CHECK_HP script is run correctly, it returns "Compaq/HP Agent Check: overall system state OK".

If that is the case, why do we have to check components individually by using: $ARG2$ = -x cpqFcaHostCntlrStatus,cpqNicIfPhysAdapterStatus ?
Also, is it possible to get metrics on a more granular level (i.e. status of individual components), rather than of a blanket "Compaq/HP Agent Check: overall system state OK"?

Because, right now, I'm checking all of the following components, since "./check_hp --help" claims they're supported. The check reports "overall system state OK" even though there is no tape drive installed on this server.
Currently the module supports the following components:
cpqHeThermalCpuFanStatus,
cpqNicIfLogMapStatus,
cpqHeFltTolFanCondition,
cpqDaLogDrvStatus,
cpqDaLogDrvCondition,
cpqDaTapeDrvStatus,
cpqHeFltTolPwrSupplyCondition,
cpqHeResilientMemCondition,
cpqNicIfPhysAdapterStatus,
cpqRackPowerSupplyCondition,
cpqHeFltTolPowerSupplyCondition,
cpqDaPhyDrvStatus,
cpqHeEventLogCondition,
cpqDaPhyDrvCondition,
cpqFcaHostCntlrStatus,
cpqSeCpuStatus,
cpqHeTemperatureCondition,
cpqHeThermalSystemFanStatus,
cpqDaPhyDrvSmartStatus,
cpqDaCntlrCondition,
cpqRackCommonEnclosureFanCondition
Any assistance would be much appreciated.

Thanks Again,

-klee
Last edited by klee on Wed Jul 23, 2014 2:07 pm, edited 1 time in total.
klee
Posts: 147
Joined: Fri Apr 04, 2014 2:31 pm

Re: check_hp plugin

Post by klee »

Just a follow up to my previous comment. So I just ran the debug option and got the result below, which is really more of what I'm looking for.
I will, of course, remove the unrecognized hardware.
./check_hp -H 192.x.x.x -C public -d cpqHeThermalCpuFanStatus,cpqNicIfLogMapStatus,cpqHeFltTolFanCondition,cpqDaLogDrvStatus,cpqDaLogDrvCondition,cpqDaTapeDrvStatus,cpqHeFltTolPwrSupplyCondition,cpqHeResilientMemCondition,cpqNicIfPhysAdapterStatus,cpqRackPowerSupplyCondition,cpqHeFltTolPowerSupplyCondition,cpqDaPhyDrvStatus,cpqHeEventLogCondition,cpqDaPhyDrvCondition,cpqFcaHostCntlrStatus,cpqSeCpuStatus,cpqHeTemperatureCondition,cpqHeThermalSystemFanStatus,cpqDaPhyDrvSmartStatus,cpqDaCntlrCondition,cpqRackCommonEnclosureFanCondition

Compaq/HP Agent Check:
cpqHeThermalCpuFanStatus.0 = 1 status of the fan(s) (other)
cpqNicIfLogMapStatus.2 = 2 status of the NIC logical group (ok)
cpqNicIfLogMapStatus.1 = 1 status of the NIC logical group (unknown)
cpqHeFltTolFanCondition.0.4 = 2 condition of the fan (0.4:ok)
cpqHeFltTolFanCondition.0.2 = 2 condition of the fan (0.2:ok)
cpqHeFltTolFanCondition.0.1 = 2 condition of the fan (0.1:ok)
cpqHeFltTolFanCondition.0.5 = 2 condition of the fan (0.5:ok)
cpqHeFltTolFanCondition.0.6 = 2 condition of the fan (0.6:ok)
cpqHeFltTolFanCondition.0.3 = 2 condition of the fan (0.3:ok)
cpqDaLogDrvStatus.2.1 = 2 logical drive status (2.1:ok)
cpqDaLogDrvCondition.2.1 = 2 logical drive and associated physical state (2.1:ok)
cpqDaTapeDrvStatus tape drive status - 1.3.6.1.4.1.232.3.2.9.1.1.8 (OID-tree not found, ignoring)
cpqHeFltTolPwrSupplyCondition.0 = 2 overall condition of power supply subsystem (ok)
cpqHeResilientMemCondition.0 = 2 condition of the memory protection subsystem (ok)
cpqNicIfPhysAdapterStatus.1 = 2 physical adapter status (ok)
cpqNicIfPhysAdapterStatus.2 = 2 physical adapter status (ok)
cpqRackPowerSupplyCondition condition of the power supply - 1.3.6.1.4.1.232.22.2.5.1.1.1.17 (OID-tree not found, ignoring)
cpqHeFltTolPowerSupplyCondition.0.2 = 2 condition of the power supply (0.2:ok)
cpqHeFltTolPowerSupplyCondition.0.1 = 2 condition of the power supply (0.1:ok)
cpqDaPhyDrvStatus.2.0 = 2 physical drive status (2.0:ok)
cpqDaPhyDrvStatus.2.1 = 2 physical drive status (2.1:ok)
cpqDaPhyDrvStatus.2.2 = 2 physical drive status (2.2:ok)
cpqHeEventLogCondition overall IML entries - 1.3.6.1.4.1.232.6.2.11.2.0 (OID-tree not found, ignoring)
cpqDaPhyDrvCondition.2.1 = 2 physical drive condition (2.1:ok)
cpqDaPhyDrvCondition.2.2 = 2 physical drive condition (2.2:ok)
cpqDaPhyDrvCondition.2.0 = 2 physical drive condition (2.0:ok)
cpqFcaHostCntlrStatus fibre channel host controller status - 1.3.6.1.4.1.232.16.2.7.1.1.4 (OID-tree not found, ignoring)
cpqSeCpuStatus.1 = 2 CPU status (ok)
cpqSeCpuStatus.0 = 2 CPU status (ok)
cpqHeTemperatureCondition.0.20 = 2 temperature sensor condition (0.20:ok)
cpqHeTemperatureCondition.0.10 = 2 temperature sensor condition (0.10:ok)
cpqHeTemperatureCondition.0.3 = 2 temperature sensor condition (0.3:ok)
cpqHeTemperatureCondition.0.8 = 2 temperature sensor condition (0.8:ok)
cpqHeTemperatureCondition.0.30 = 2 temperature sensor condition (0.30:ok)
cpqHeTemperatureCondition.0.21 = 2 temperature sensor condition (0.21:ok)
cpqHeTemperatureCondition.0.25 = 2 temperature sensor condition (0.25:ok)
cpqHeTemperatureCondition.0.12 = 2 temperature sensor condition (0.12:ok)
cpqHeTemperatureCondition.0.1 = 2 temperature sensor condition (0.1:ok)
cpqHeTemperatureCondition.0.9 = 2 temperature sensor condition (0.9:ok)
cpqHeTemperatureCondition.0.2 = 2 temperature sensor condition (0.2:ok)
cpqHeTemperatureCondition.0.23 = 2 temperature sensor condition (0.23:ok)
cpqHeTemperatureCondition.0.19 = 2 temperature sensor condition (0.19:ok)
cpqHeTemperatureCondition.0.4 = 2 temperature sensor condition (0.4:ok)
cpqHeTemperatureCondition.0.7 = 2 temperature sensor condition (0.7:ok)
cpqHeTemperatureCondition.0.6 = 2 temperature sensor condition (0.6:ok)
cpqHeTemperatureCondition.0.29 = 2 temperature sensor condition (0.29:ok)
cpqHeTemperatureCondition.0.26 = 2 temperature sensor condition (0.26:ok)
cpqHeTemperatureCondition.0.22 = 2 temperature sensor condition (0.22:ok)
cpqHeTemperatureCondition.0.5 = 2 temperature sensor condition (0.5:ok)
cpqHeTemperatureCondition.0.11 = 2 temperature sensor condition (0.11:ok)
cpqHeTemperatureCondition.0.24 = 2 temperature sensor condition (0.24:ok)
cpqHeThermalSystemFanStatus.0 = 2 status of the processor fan(s) (ok)
cpqDaPhyDrvSmartStatus.2.2 = 2 physical drive S.M.A.R.T status (2.2:ok)
cpqDaPhyDrvSmartStatus.2.0 = 2 physical drive S.M.A.R.T status (2.0:ok)
cpqDaPhyDrvSmartStatus.2.1 = 2 physical drive S.M.A.R.T status (2.1:ok)
cpqDaCntlrCondition.2 = 2 controller status (ok)
cpqRackCommonEnclosureFanCondition condition of the rack fan - 1.3.6.1.4.1.232.22.2.3.1.3.1.11 (OID-tree not found, ignoring)
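
When reading a long debug listing like this by hand, one option is to filter out every line reported as ok so that only the unusual entries remain. A minimal sketch, using four sample lines copied from the output above; this is a manual troubleshooting aid only and not something check_hp itself provides:

```shell
# Sample lines copied from the debug output in this post.
debug_output='cpqHeThermalCpuFanStatus.0 = 1 status of the fan(s) (other)
cpqNicIfLogMapStatus.2 = 2 status of the NIC logical group (ok)
cpqHeFltTolFanCondition.0.4 = 2 condition of the fan (0.4:ok)
cpqDaTapeDrvStatus tape drive status - 1.3.6.1.4.1.232.3.2.9.1.1.8 (OID-tree not found, ignoring)'

# Keep only the lines that do not end in "ok)".
not_ok=$(printf '%s\n' "$debug_output" | grep -v 'ok)$')
printf '%s\n' "$not_ok"
```

With the sample above, the lines left over are the "(other)" fan status and the tape drive entry whose OID tree was not found.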
This brings us to the statement below from the check_hp author, Günther Mair:
Please do not misread the "-d" parameter! The "-d" parameter stands for "DEBUG" and is not intended for production use inside Nagios! check_hp will give you information about which objects failed if there are any.
Can anyone actually attest to how this monitor is supposed to be run? As I mentioned earlier, I was hoping the monitor would report more detail than just "overall system state OK".

However, if that is the standard, then I'll have to be OK with that. Sorry, but I'm totally new to this :? Any advice would be greatly appreciated.

-klee
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: check_hp plugin

Post by lmiltchev »

"Can anyone actually attest to how this monitor is supposed to be run? As I mentioned earlier, I was hoping the monitor would report more detail than just 'overall system state OK'."
What happens when one of the components is in "non-OK" state? Do you get more details then?
As this is a 3rd party plugin, your best bet would be to contact the plugin's author and request more info on the usage.
klee
Posts: 147
Joined: Fri Apr 04, 2014 2:31 pm

Re: check_hp plugin

Post by klee »

I’ve confirmed with the author of the CHECK_HP plugin that the "overall system state OK" message is the standard return from this script when no problem is found.

If one or more problems are found, you will get the descriptions (in the format of debug mode) plus the respective error codes instead.
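
For anyone testing this by hand, the plugin's exit status can be translated into the state Nagios will show, following the standard plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). A minimal sketch; the helper name state_name is made up for illustration, and the commented check_hp call uses the same placeholder address as earlier in the thread:

```shell
# Map a plugin exit status to the state name Nagios uses for it.
state_name() {
  case "$1" in
    0) echo "OK" ;;
    1) echo "WARNING" ;;
    2) echo "CRITICAL" ;;
    3) echo "UNKNOWN" ;;
    *) echo "out of bounds" ;;
  esac
}

# With a real host (address and community string are placeholders):
#   ./check_hp -H 192.x.x.x -C public; state_name "$?"
state_name 0
state_name 127
```

The two sample calls print "OK" and "out of bounds", the latter matching the 127 warning seen earlier in this thread.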

Issue resolved, please close thread.

Thanks,

-klee